Physical Design Questions & Answers | VLSI Back-End Adventure

Physical Design Q&A

Q141. After clock tree synthesis (CTS), many violating timing paths that end at clock-gate (ICG) enable pins appear. Why didn't these paths get fixed during placement, and how can I handle them?

After clock tree synthesis, clock gates become critical because, by default, they have the same latency applied to their clock-pin arrival times as the register clock pins do. Once the clock tree is built, the clock gates sit in the intermediate part of the tree, not at the leaves. Their clock therefore arrives earlier than at the leaf clock pins, and since the ICG captures the enable path, this earlier capture clock reduces setup slack.

The following shows a simple example:

Pre-CTS, the register clock pins and the clock-gate clock pins see an ideal clock latency of 0 ns, which models the same arrival time for both.
Post-CTS, the clock gates sit roughly halfway through the tree and see a latency of 800 ps, while all registers see a 1.5 ns arrival time at their clock pins because they are at the leaf level of the tree.
Any path from a register to a clock gate now sees this difference in clock arrival times: the capture clock at the ICG arrives 700 ps earlier than the launch clock at the register, so the pre-CTS slack is degraded by 700 ps (1.5 ns - 800 ps). Since the clock gate is supposed to sit at an intermediate point so that it can shut down portions of the clock tree, it is not correct to assume that the clock pins of clock gates should be balanced with the registers.
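The 700 ps degradation can be checked with a small setup-slack calculation. This is a minimal Python sketch; the 2.0 ns period, 1.0 ns data delay, 0.1 ns ICG setup time, and the function name are hypothetical, chosen only to make the numbers concrete.

```python
def setup_slack_ns(period, launch_clk, data_delay, capture_clk, setup_time):
    """Setup slack = required time - arrival time (all values in ns)."""
    arrival = launch_clk + data_delay              # register clock + clk->Q + logic
    required = period + capture_clk - setup_time   # next edge at the ICG clock pin
    return required - arrival

# Hypothetical register -> ICG enable path.
PERIOD, DATA, ICG_SETUP = 2.0, 1.0, 0.1

# Pre-CTS: ideal clocks, both clock pins at 0 ns latency.
pre_cts = setup_slack_ns(PERIOD, launch_clk=0.0, data_delay=DATA,
                         capture_clk=0.0, setup_time=ICG_SETUP)
# Post-CTS: register at the leaf (1.5 ns), ICG mid-tree (0.8 ns).
post_cts = setup_slack_ns(PERIOD, launch_clk=1.5, data_delay=DATA,
                          capture_clk=0.8, setup_time=ICG_SETUP)

print(f"pre-CTS {pre_cts:.2f} ns, post-CTS {post_cts:.2f} ns, "
      f"change {post_cts - pre_cts:+.2f} ns")
```

This prints `pre-CTS 0.90 ns, post-CTS 0.20 ns, change -0.70 ns`: the loss equals the 800 ps vs 1.5 ns latency difference, independent of the assumed period and data delay.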

These paths can be addressed in the following ways:

First, examine how far down the clock tree these ICGs lie post-CTS. Whether they are near the root of the tree or near the flop clock pins influences how you handle them.

If the clock gates are roughly halfway down the clock tree, you might benefit from splitting (replicating) the clock gate. Splitting creates parallel copies of the original driver, resulting in more clock-gate drivers with fewer loads per driver. If the splitting is done pre-CTS, we effectively push the clock gates further down the clock tree, increasing power but improving enable timing. See the split_clock_net command.
If the clock gates are either near the beginning or near the bottom of the tree, splitting them is unlikely to offer any improvement. In this case, add a pre-CTS clock latency value to the ICG clock pin so that the pre-CTS analysis models the expected ICG latency correctly. Using the above example, you would apply a -700 ps latency to the clock pin of the clock gate during place_opt, that is, before clock tree synthesis. Applying this latency lets you model the slack correctly before the actual clock-gate clock arrival time is known.
If the ICG is a single "top-level clock gate" fed by a relatively small cone of logic, you might apply a float pin constraint to the flip-flops feeding the enable logic so that their clocks arrive earlier (useful skew). Sometimes this technique is the best solution for top-level clock gates because it does not impact power, whereas splitting top-level clock gates can have a very large power impact.
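For the pre-CTS latency option, the offsets can be derived from expected post-CTS latencies. This Python sketch only builds SDC `set_clock_latency` command strings; the pin names (`u_top_cg/CK`, `u_mid_cg/CK`), the expected latencies, and the helper name are hypothetical. In practice the expected latencies would come from a trial CTS run or a clock-tree plan.

```python
def icg_latency_commands(leaf_latency_ns, icg_latency_ns):
    """Return SDC commands that model ICG clock pins arriving before the leaves.

    leaf_latency_ns: expected post-CTS latency at the register clock pins.
    icg_latency_ns: dict of ICG clock-pin name -> expected post-CTS latency.
    The offset (ICG latency - leaf latency) is negative for gates above the leaves.
    """
    commands = []
    for pin, latency in sorted(icg_latency_ns.items()):
        offset = round(latency - leaf_latency_ns, 3)  # ns, rounded for readability
        commands.append(f"set_clock_latency {offset} [get_pins {pin}]")
    return commands

cmds = icg_latency_commands(1.5, {"u_top_cg/CK": 0.8, "u_mid_cg/CK": 1.2})
for cmd in cmds:
    print(cmd)
```

For the example numbers this produces `set_clock_latency -0.3 [get_pins u_mid_cg/CK]` and `set_clock_latency -0.7 [get_pins u_top_cg/CK]`. Such pin latencies are meant for the ideal-clock (pre-CTS) stage; once clocks are propagated, the real tree latency takes over.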

Q142. How should CRPR be treated in SI analysis? That is, during setup analysis with SI (crosstalk) enabled, will you keep or remove the crosstalk pessimism on the common clock path? Why?

CRPR and Crosstalk Analysis

When you perform crosstalk analysis using PrimeTime SI, a change in delay due to crosstalk along the common segment of a clock path can be pessimistic, but only for a zero-cycle check. A zero-cycle check occurs when the same clock edge drives both the launch and capture events for the path. For other types of paths, a change in delay due to crosstalk is not pessimistic because the change cannot be assumed to be identical for the launch and capture clock edges.
Accordingly, the CRPR algorithm removes crosstalk-induced delays in a common portion of the launch and capture clock paths only if the check is a zero-cycle check. In a zero-cycle check, aggressor switching affects both the launch and capture signals in the same way at the same time.
Here are some cases where CRPR might apply to crosstalk-induced delays:
1. Standard hold check
2. Hold check on a register with the Q-bar output connected to the D input, as in a divide-by-2 clock circuit
3. Hold check with crosstalk feedback due to parasitic capacitance between the Q-bar output and D input of a register
4. Hold check on a multicycle path set to zero, such as a circuit that uses a single clock edge for launch and capture, with designed-in skew between launch and capture
5. Certain setup checks where transparent latches are involved

There is one important difference between the hold and setup analyses related to crosstalk on the common portion of the clock path.
The launch and capture clock edges are normally the same edge for hold analysis.
A single clock edge passing through the common clock portion cannot have different crosstalk contributions for the launch clock path and the capture clock path.
Therefore, the worst-case hold analysis removes the crosstalk contribution from the common clock path.

For setup analysis, launch and capture use different clock edges: the capture edge arrives one clock period later. The crosstalk contributions on the common clock path can therefore differ between the launch and capture edges, so the crosstalk contribution should not be removed from the common clock path.
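The setup/hold distinction can be sketched as a CRPR-credit calculation. This Python example is a simplified model, not how any particular tool computes it: the common clock segment is reduced to one late/early delay pair plus crosstalk deltas, and all values and names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CommonClockSegment:
    """Delays of the clock segment shared by launch and capture paths (ns)."""
    late_ns: float         # late (max) delay without crosstalk
    early_ns: float        # early (min) delay without crosstalk
    si_slowdown_ns: float  # crosstalk delta added to the late delay
    si_speedup_ns: float   # crosstalk delta subtracted from the early delay

def crpr_credit_ns(seg: CommonClockSegment, zero_cycle: bool) -> float:
    """Pessimism removed on the common segment.

    The non-crosstalk late/early difference is always removed. Crosstalk
    deltas are removed only for zero-cycle checks (e.g. a standard hold
    check), where one clock edge sees the same aggressor activity on both paths.
    """
    late, early = seg.late_ns, seg.early_ns
    if zero_cycle:
        late += seg.si_slowdown_ns
        early -= seg.si_speedup_ns
    return late - early

seg = CommonClockSegment(late_ns=1.10, early_ns=1.00,
                         si_slowdown_ns=0.05, si_speedup_ns=0.04)
hold_credit = crpr_credit_ns(seg, zero_cycle=True)    # same edge: SI removed too
setup_credit = crpr_credit_ns(seg, zero_cycle=False)  # next edge: SI kept
print(f"hold CRPR {hold_credit:.2f} ns, setup CRPR {setup_credit:.2f} ns")
```

With these numbers the hold check gets 0.19 ns of credit, while the setup check gets only the 0.10 ns non-crosstalk part, so the crosstalk pessimism stays in the setup analysis.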

Q143. What is uncertainty? Why do we have different uncertainties for setup and hold at pre-CTS and post-CTS?

Uncertainty: it specifies a window within which a clock edge can occur. In physical design, uncertainty is used to model several factors, such as jitter (the deviation of a clock edge from its ideal position), additional margins, and, before CTS, skew.

There will be different uncertainty values specified for setup and hold.
Hold checks are performed on the same clock edge, so any deviation of that edge (jitter) affects the launch and capture flops in the same way. Jitter therefore does not need to be modeled in the hold uncertainty, which is why the hold uncertainty is usually smaller than the setup uncertainty. Before CTS, the uncertainty also models the skew expected once the clock tree is built; after CTS, the actual skew values are available, so the uncertainty values are reduced.

Setup Uncertainty:

Pre-CTS = Jitter + Skew + Extra setup margin
Post-CTS = Jitter + Extra setup margin

Hold Uncertainty:

Pre-CTS = Skew + Extra hold margin
Post-CTS = Extra hold margin
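The four budgets above can be captured in one small function. This is a minimal Python sketch; the jitter, skew, and margin values are hypothetical placeholders, not recommended numbers.

```python
def clock_uncertainty_ns(check, stage, jitter, expected_skew, extra_margin):
    """Compose clock uncertainty following the budgets above.

    check: "setup" or "hold"; stage: "pre_cts" or "post_cts".
    Jitter applies only to setup; expected skew applies only before CTS,
    because after CTS the real skew comes from the built clock tree.
    """
    if check not in ("setup", "hold") or stage not in ("pre_cts", "post_cts"):
        raise ValueError("unknown check or stage")
    total = extra_margin
    if check == "setup":
        total += jitter
    if stage == "pre_cts":
        total += expected_skew
    return total

# Hypothetical budget in ns: 50 ps jitter, 100 ps expected skew.
JITTER, SKEW = 0.05, 0.10
for check, margin in (("setup", 0.03), ("hold", 0.02)):
    for stage in ("pre_cts", "post_cts"):
        value = clock_uncertainty_ns(check, stage, JITTER, SKEW, margin)
        print(f"{check:5s} {stage:8s} {value:.3f} ns")
```

The results (0.180/0.080 ns for setup, 0.120/0.020 ns for hold) would typically be applied with SDC `set_clock_uncertainty -setup` and `set_clock_uncertainty -hold`, with the smaller post-CTS values loaded once clocks are propagated.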

Q144. Why do we have different derating factors for clock cells and data cells? What is the reason?

Clock cells have much higher switching activity than data cells, so they experience more PVT variation. Because a clock cell is shared by many timing paths, an OCV-induced change in its delay can cause far more violations than a change on a data path. That is why clock cells get more derating than data cells.
The OCV effect is also typically more pronounced on clock paths because they travel longer distances across the chip.
Clock cells are subject to second-order effects as well, so their derates are larger, whereas data cells are dominated by first-order effects and get smaller derates.
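To see where separate clock and data derates enter a timing check, here is a minimal Python setup-slack sketch. The derates (1.08/0.92 on clock cells, 1.05 on data cells) and the path delays are hypothetical, and CRPR on the shared clock segment is ignored for simplicity.

```python
def derated_setup_slack_ns(period, launch_clk, data, capture_clk, setup_time,
                           clock_late, clock_early, data_late):
    """Setup slack with separate clock and data derates (all delays in ns).

    Late derates stretch the launch clock and data path; the early clock
    derate shrinks the capture clock, the pessimistic corner for setup.
    """
    arrival = launch_clk * clock_late + data * data_late
    required = period + capture_clk * clock_early - setup_time
    return required - arrival

# Same hypothetical path, without and with derates.
nominal = derated_setup_slack_ns(2.0, 1.0, 1.2, 1.0, 0.1, 1.0, 1.0, 1.0)
derated = derated_setup_slack_ns(2.0, 1.0, 1.2, 1.0, 0.1, 1.08, 0.92, 1.05)
print(f"nominal {nominal:.2f} ns, derated {derated:.2f} ns")
```

The slack drops from 0.70 ns to 0.48 ns; 0.16 ns of the 0.22 ns loss comes from the clock derates, because they act on both the launch and the capture clock paths.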

Q145. What are the pros and cons of using buffers versus inverters for CTS? Which one do you prefer for CTS?

Inverters: less area, and they can drive a longer distance, but they cause more switching. They are good at preserving pulse width and clock period.

In other words, for cells of the same drive strength an inverter has less delay than a buffer (a buffer is two inverter stages in series), i.e., inverters are faster than buffers.
So fewer inverters than buffers are needed for the same net length, and insertion delay is lower with inverters. This indirectly reduces the OCV effect on timing, since the OCV impact is proportional to insertion delay.
Because switching is higher in an inverter-based tree, it might increase OCV; this point is still open.
Inverter-based trees help maintain a 50% duty cycle, because rise/fall delay mismatches alternate in sign from stage to stage and cancel in pairs; an inverter also has a regenerative property.
Inverters have a better noise-cancellation effect than buffers.
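The duty-cycle point can be illustrated with a toy model in which every cell has a fixed output-rise delay and a fixed output-fall delay. In this Python sketch the delay values are hypothetical and a buffer is modeled only by its overall rise/fall delays; the mismatch accumulates through a non-inverting chain but alternates sign, and cancels in pairs, through an inverting chain.

```python
def pulse_width_after_chain(width_ps, stages, rise_ps, fall_ps, inverting):
    """Width of a pulse after `stages` identical cells (toy model).

    rise_ps: cell delay when its output rises; fall_ps: when its output falls.
    The input is a positive (high) pulse of width_ps.
    """
    high = True  # polarity of the pulse at the current node
    for _ in range(stages):
        if inverting:
            high = not high  # an inverter flips the pulse polarity
        if high:
            # output pulse starts with a rise and ends with a fall
            width_ps += fall_ps - rise_ps
        else:
            # output pulse starts with a fall and ends with a rise
            width_ps += rise_ps - fall_ps
    return width_ps

# Hypothetical cells: 30 ps output-rise delay, 25 ps output-fall delay.
buf_width = pulse_width_after_chain(500, 4, 30, 25, inverting=False)
inv_width = pulse_width_after_chain(500, 4, 30, 25, inverting=True)
print(f"4 buffers: {buf_width} ps, 4 inverters: {inv_width} ps")
```

This prints `4 buffers: 480 ps, 4 inverters: 500 ps`: each buffer shaves 5 ps off the high pulse, while successive inverters add and remove the same 5 ps.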

Q146. What is the purpose of TIE cells, and what is the internal structure of a TIE cell?

In lower technology nodes, the transistor gate oxide is very thin and sensitive to voltage fluctuations on the power supply. If a transistor gate is connected directly to the PG network, its gate oxide might be damaged by those fluctuations. To overcome this issue, TIE cells were introduced between the PG network and the transistor gates.
In other words, TIE cells protect gates from ESD-related damage. Internally, a tie-high cell typically drives its output through a PMOS whose gate is held low by a diode-connected NMOS, and a tie-low cell drives its output through an NMOS whose gate is held high by a diode-connected PMOS, so no gate is connected directly to a supply rail.
TIE cells can also be designed so that they are easily converted from 0 to 1, or vice versa, by changing only one metal layer.
Let's say you need to do an ECO with only one metal mask in order to change a 0 to a 1 on the input of one of your combinational gates, but you only have one tie cell available. If this tie cell was designed so that its function can be changed from 0 to 1 using only one metal layer, that would be a cost-effective change for a localized ECO.

Q147. Why do we still use clock uncertainties (setup and hold uncertainty) after CTS if we already use OCV derating factors?

Jitter is not part of OCV; it comes from PLL noise. So clock uncertainties and OCV derating factors should be kept as separate entities.
OCV derating is a path-based margin. It accounts only for on-chip PVT variations:
Process variations: transistor channel-length and gate-oxide-thickness variations caused by mask variations, CMP variations, and etching. For example, two instances of the same library cell (same drive strength) placed at different locations in the layout can have different delays because of these variations.
Temperature variations: junction temperature, clock-cell switching activity, and high-density areas can generate higher local temperatures, so cell delays vary.
Voltage variations: some cells see a reduced supply voltage because of IR drop, often in high-density regions.
The IR-drop margin depends on the IR-drop target you plan to achieve in your design. If you meet a 3% IR-drop target, you have the flexibility to reduce the flat margin in the OCV derating value.
If you are not using ENDCAP cells in the design, you need to pad the derating factors with more margin. Every standard cell is characterized assuming it sits in the middle of a row, surrounded by other cells. In that position the stress on the cell is lower and it performs as characterized; at the end of a row the stress is higher and the cell may not perform as expected. There are many factors like this on which the foundry and the company decide to reduce or increase the flat margin.
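As a rough illustration of how such a flat margin might be composed, here is a hypothetical Python sketch. Every component value, the assumed IR-drop-to-delay sensitivity, the simple additive combination, and the endcap padding are made-up placeholders; real flat margins come from the foundry's and the company's signoff methodology.

```python
def flat_late_derate(process_pct, ir_drop_pct, delay_per_ir_pct,
                     temperature_pct, has_endcaps, endcap_padding_pct):
    """Return a late derate factor built from hypothetical margin components.

    ir_drop_pct: IR-drop target as a percent of the supply voltage.
    delay_per_ir_pct: assumed percent delay increase per percent of IR drop.
    All margins are in percent of cell delay and are simply added.
    """
    margin_pct = process_pct + ir_drop_pct * delay_per_ir_pct + temperature_pct
    if not has_endcaps:
        margin_pct += endcap_padding_pct  # extra padding when endcaps are missing
    return 1.0 + margin_pct / 100.0

tight_ir = flat_late_derate(3.0, 3.0, 1.0, 1.0, True, 1.0)   # 3 % IR-drop target
loose_ir = flat_late_derate(3.0, 5.0, 1.0, 1.0, True, 1.0)   # 5 % IR-drop target
no_endcap = flat_late_derate(3.0, 3.0, 1.0, 1.0, False, 1.0)
print(f"{tight_ir:.3f} {loose_ir:.3f} {no_endcap:.3f}")
```

Under these assumptions, meeting the tighter IR-drop target lowers the late derate from 1.090 to 1.070, while dropping endcaps pushes it back up to 1.080. The derate is still separate from the jitter-based clock uncertainty.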

Q148. How do you fix a setup timing violation if the base layers are frozen?

Check whether the nets on that path have detours; if so, remove them and re-route.
Route on a higher metal layer (layer promotion)
Fix crosstalk issue on data path
Fix crosstalk issue on clock path
Insert buffers by converting fortune/spare cells.
Logic restructuring or pin swapping, i.e., connecting the timing-critical net of an AND gate to the input farthest from ground, and the timing-critical net of an OR gate to the input farthest from power. The non-timing-critical inputs then switch first and do not act as a load on the timing-critical net, so its delay is reduced.

Q149. What are the ways to fix antenna violations?

Adding antenna diodes near the gate
Switch to a higher metal layer near the gate (layer jumping)
Insert a buffer near the input gate if the path is not timing critical
Connect the violating net to the input pin of a buffer whose output pin is left floating or drives a dummy load
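The benefit of switching to a higher layer near the gate can be seen with a simplified partial-antenna-ratio calculation (metal area connected to the gate while a layer is etched, divided by gate area). In this Python sketch the route is a chain of (layer, area) segments from the gate outward; vias, diffusion connections, and foundry-specific cumulative rules are ignored, and the areas and layer names are hypothetical.

```python
def partial_antenna_ratios(route, gate_area_um2, layers):
    """Per-layer antenna ratio seen by a gate during fabrication (simplified).

    route: list of (layer, metal_area_um2) segments, ordered from the gate outward.
    layers: layer names from lowest to highest, e.g. ["M1", "M2", "M3"].
    When a layer is etched, only the contiguous run of segments on that layer
    or below, starting at the gate, is connected to it; a higher-layer segment
    is not yet built and breaks the chain.
    """
    rank = {name: i for i, name in enumerate(layers)}
    ratios = {}
    for layer in layers:
        connected_area = 0.0
        for seg_layer, area in route:
            if rank[seg_layer] > rank[layer]:
                break  # chain to the gate is not yet complete beyond this point
            if seg_layer == layer:
                connected_area += area
        ratios[layer] = connected_area / gate_area_um2
    return ratios

LAYERS = ["M1", "M2", "M3"]
GATE_AREA = 0.1  # hypothetical gate area in um^2

# Long M2 route connected straight down to the gate.
direct = partial_antenna_ratios([("M1", 0.5), ("M2", 100.0)], GATE_AREA, LAYERS)
# Same route with a short jump up to M3 right next to the gate.
jumped = partial_antenna_ratios([("M1", 0.5), ("M3", 0.2), ("M2", 100.0)],
                                GATE_AREA, LAYERS)
print("direct:", direct)
print("jumped:", jumped)
```

The long M2 wire gives an M2 ratio of about 1000 on the direct route, but 0 with the jump, because the M2 wire is not connected to the gate until M3 is built; the short M3 jumper itself only contributes a ratio of 2.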

Q150. Can net delay be reduced by splitting a net with a buffer?

Assume the net has length L (L unit sections) and model each section with a distributed RC model.
Assume the resistance per unit length is Rp and the capacitance per unit length is Cp.
The net's total resistance is Rt = L*Rp and its total capacitance is Ct = L*Cp, so the net delay is proportional to Dt = Rt * Ct = L^2 * Rp * Cp.
If you insert a buffer in the middle, each half has length L/2 and a wire delay of (L/2)^2 * Rp * Cp = Dt/4, so the total wire delay of the two halves is Dt/2, plus the delay of the inserted buffer.
Another way to derive the net delay: R is proportional to L/A = L/(W*t), where W is the wire width and t its thickness, and C is proportional to A/d = (t*L)/S, where S is the spacing to the neighboring wire. Multiplying gives RC proportional to L^2/(W*S), i.e., proportional to L^2.
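A quick numeric check of the L^2 dependence, as a minimal Python sketch in normalized units (L = 10 and Rp = Cp = 1 are hypothetical). Like the text, it drops the constant factor of the distributed-RC delay and leaves out buffer delays to isolate the wire term.

```python
def wire_delay(length, rp, cp):
    """Simplified model from the text: Dt = Rt * Ct = (L*Rp) * (L*Cp)."""
    return (length * rp) * (length * cp)

def split_wire_delay(length, rp, cp, sections):
    """Total wire delay when buffers split the net into equal sections.

    Buffer (gate) delays are deliberately excluded here.
    """
    seg = length / sections
    return sections * wire_delay(seg, rp, cp)

print(wire_delay(10.0, 1.0, 1.0))           # unsplit net
print(split_wire_delay(10.0, 1.0, 1.0, 2))  # one buffer in the middle
print(split_wire_delay(10.0, 1.0, 1.0, 4))  # three buffers
```

The wire delay goes from 100 to 50 to 25: each halving of the section length quarters the per-section delay, so the total wire delay halves.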
The concept of repeaters is the same as discussed in “Inserting the Buffer” (the point above); it is just explained here in a different way.
Long-distance routing means a large RC load made of a series of RC sections, as shown in the figure. A good alternative is to use repeaters, splitting the line into several pieces. Why can this be better in terms of delay? Because the RC delay of the line grows with the square of its length, while each repeater adds only a gate delay that is small compared with the RC delay.
For an interconnect driven by a single inverter with delay t_inv, the propagation delay (in the simplified model above) becomes:
$$D_1 \approx t_{inv} + R_p C_p L^2$$
If repeaters are inserted so that the line is split into m equal sections, each driven by its own inverter (for example, m = 3 for the driver plus two repeaters), the delay becomes:
$$D_m \approx m\,t_{inv} + m\,R_p C_p \left(\frac{L}{m}\right)^2 = m\,t_{inv} + \frac{R_p C_p L^2}{m}$$
So the RC term drops from Rp*Cp*L^2 without repeaters to Rp*Cp*L^2/m with repeaters.
Consequently, if the gate delay is much smaller than the RC delay, repeaters improve switching speed, at the price of higher power consumption.
As you keep adding repeaters on a net of fixed length L (which also improves the transition), the overall delay decreases at first. At some point the gate delay exceeds the RC net delay, i.e., gate delay dominates over net delay; adding repeaters beyond that point makes the overall delay increase again. So you should not add buffers beyond that sweet spot. This is how we calculate how much net length a specific buffer can drive.
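The sweet spot can be found numerically. This Python sketch assumes the simplified model above: n sections, each costing one fixed gate delay plus Rp*Cp*(L/n)^2 of wire delay (a real repeater's delay also depends on its load). Setting the derivative of n*t + Rp*Cp*L^2/n to zero gives n = L*sqrt(Rp*Cp/t), i.e., a section length of sqrt(t/(Rp*Cp)), at which wire delay per section equals the gate delay. The units and values are hypothetical.

```python
import math

def repeated_line_delay(length, rp, cp, gate_delay, sections):
    """Delay of a line split into equal sections, each driven by one repeater."""
    seg = length / sections
    return sections * (gate_delay + rp * cp * seg * seg)

def best_section_count(length, rp, cp, gate_delay, max_sections=1000):
    """Brute-force the section count with the smallest total delay."""
    return min(range(1, max_sections + 1),
               key=lambda n: repeated_line_delay(length, rp, cp, gate_delay, n))

def max_section_length(rp, cp, gate_delay):
    """Section length where wire delay equals gate delay: sqrt(t / (Rp*Cp))."""
    return math.sqrt(gate_delay / (rp * cp))

# Hypothetical normalized units: L = 10, Rp = Cp = 1, gate delay = 4.
for n in (1, 2, 5, 10, 20):
    print(n, repeated_line_delay(10.0, 1.0, 1.0, 4.0, n))
print("best:", best_section_count(10.0, 1.0, 1.0, 4.0),
      "section length:", max_section_length(1.0, 1.0, 4.0))
```

The delay falls from 104 (no repeaters) to a minimum of 40 at five sections of length 2, then rises again (50 at ten sections, 85 at twenty) once gate delay dominates. The section length of 2 is the "how much net length a buffer can drive" figure for this hypothetical buffer.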