Physical Design Question & Answers | Q&A |Physical Design| VLSI Back-End Adventure

Physical Design Q&A

Q151. Why can’t we use PMOS as footer & NMOS as header?

If we use an NMOS as the header (drain D connected to VDD, source S connected to the load CL and the shutdown block), the NMOS can pull its source only up to VDD - VT, because an NMOS passes a weak '1'. The shutdown block connected to the NMOS source therefore sees a reduced supply voltage, which degrades the performance of its cells.
If we use a PMOS as the footer (source S connected to the shutdown block, drain D connected to ground), the PMOS can pull its source only down to |VT|, because a PMOS passes a weak '0'. The shutdown block is therefore not tied to true ground.
In both arrangements the voltage across the shutdown block is attenuated.
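To make the voltage loss concrete, the small Python sketch below computes the voltage left across the shutdown block for each switch choice. It is an idealized model: it assumes the wrong-type switch passes exactly VDD - VT (or |VT|) and ignores on-resistance and body effect. The VDD and VT numbers are assumed values, not taken from any specific process.

```python
# Idealized voltage across a power-gated ("shutdown") block for different
# header/footer transistor types. Ignores on-resistance and body effect.

def nmos_header_virtual_vdd(vdd: float, vtn: float) -> float:
    """An NMOS passing VDD can only pull its source up to VDD - VTn."""
    return vdd - vtn

def pmos_footer_virtual_vss(vtp: float) -> float:
    """A PMOS passing ground can only pull its source down to |VTp|."""
    return abs(vtp)

def effective_supply(vdd, vtn, vtp, header="pmos", footer="nmos"):
    """Voltage actually seen across the shutdown block."""
    top = nmos_header_virtual_vdd(vdd, vtn) if header == "nmos" else vdd
    bottom = pmos_footer_virtual_vss(vtp) if footer == "pmos" else 0.0
    return top - bottom

VDD, VTN, VTP = 0.9, 0.3, -0.3  # assumed example values (V)
print("PMOS header, NMOS footer:", effective_supply(VDD, VTN, VTP))
print("NMOS header:", effective_supply(VDD, VTN, VTP, header="nmos"))
print("PMOS footer:", effective_supply(VDD, VTN, VTP, footer="pmos"))
```

With the usual PMOS header / NMOS footer the block sees the full VDD; either wrong choice costs one threshold voltage.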

Q152. Tell me about NLDM vs CCS.

CCS timing model:

CCS addresses the problem reported by PrimeTime's RC-009 warning. This warning occurs when the drive resistance of the driver model is much less than the network impedance to ground.
CCS handles the Miller effect, dynamic IR drop, and multivoltage analysis better than NLDM.
With the advent of smaller nanometer technologies, the CCS timing approach of modeling cell behavior has been developed to address the effects of deep submicron processes.
The driver model uses a time-varying current source. Its advantage is that it handles high-impedance nets and other non-monotonic behavior accurately.
The CCS timing receiver model uses two different capacitance values rather than a single lumped capacitance. The first capacitance is used as the load up to the input delay threshold. When the input waveform reaches this threshold, the load is dynamically adjusted to the second capacitance value. This model gives a much better approximation of loading effects in the presence of the Miller effect.
CCS timing models provide additional accuracy for modeling cell output drivers by using a time-varying, voltage-dependent current source. The timing information is provided by detailed models for the receiver pin capacitance and the output charging currents under different scenarios.
CCS models are not affected by the long-tail problem.
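The two-capacitance receiver idea can be sketched in Python as below: a time-varying driver current charges the receiver input through capacitance c1 until the input crosses the delay threshold, and through c2 afterwards. This is a toy forward-Euler illustration under assumed values (a constant 100 uA current, 1 fF / 2 fF, 1.0 V), not how a Liberty CCS model or PrimeTime actually evaluates current tables.

```python
# Toy CCS-style receiver: a driver current charges the receiver pin, whose
# capacitance switches from c1 to c2 when the input crosses the threshold.

def ccs_receiver_ramp(i_drive, c1, c2, vdd, vth_frac=0.5, dt=1e-13,
                      max_steps=10**6):
    """Return (time to threshold, time to reach vdd) in seconds.

    i_drive: callable t -> driver current in amperes (time-varying source).
    c1, c2: receiver capacitance (F) before / after the threshold crossing.
    """
    vth = vth_frac * vdd
    v, t, t_th = 0.0, 0.0, None
    for _ in range(max_steps):
        if v >= vdd:
            return t_th, t
        c = c1 if v < vth else c2        # receiver load switches at threshold
        v += i_drive(t) * dt / c         # dV = I * dt / C (forward Euler step)
        t += dt
        if t_th is None and v >= vth:
            t_th = t
    raise RuntimeError("input never reached vdd; check the drive current")

t_th, t_full = ccs_receiver_ramp(lambda t: 1e-4, c1=1e-15, c2=2e-15, vdd=1.0)
print(t_th, t_full)
```

Setting c1 equal to c2 reduces this to a single lumped capacitance, which is the NLDM-style receiver.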

NLDM:

The NLDM driver model uses a linear voltage ramp in series with a resistor (a Thevenin model). The resistor helps smooth out the voltage ramp so that the resulting driver waveform is similar to the curvature of the actual driver driving the RC network
When the drive resistor is much less than the impedance of the network to ground, the smoothing effect is reduced, potentially reducing the accuracy of RC delay calculation. When this condition occurs, PrimeTime adjusts the drive resistance to improve accuracy and issues an RC-009 warning.
The NLDM receiver model is a capacitor that represents the load capacitance of the receiver input. A different capacitance value can apply to different conditions such as the rising and falling transitions or the minimum and maximum timing analysis. A single capacitance value, however, applies to a given timing check, which does not support accurate modeling of the Miller Effect.
NLDM timing models represent the delay through the timing arcs as a function of output load capacitance and input transition time. In reality, the load seen by the cell output consists of capacitance as well as interconnect resistance. The interconnect resistance becomes an issue because the NLDM approach assumes that the output load is purely capacitive.
NLDM models suffer from the long-tail effect.
Conventional STA with an NLDM library cannot account for the Miller effect or the long-tail effect.
As a result, timing analysis results can be more optimistic than SPICE results.
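For contrast, an NLDM delay lookup is just a 2-D table indexed by input transition and output load (in Liberty, the input_net_transition and total_output_net_capacitance template variables), with bilinear interpolation between index points. The Python sketch below shows that lookup; the table values are hypothetical, and the load is a single capacitance, which is exactly the purely capacitive assumption described above. Real tools may extrapolate outside the table differently.

```python
# NLDM-style delay lookup: bilinear interpolation in a table indexed by
# input transition (rows) and output load capacitance (columns).
from bisect import bisect_right

def nldm_lookup(slews, loads, table, slew, load):
    """Interpolate table[i][j] = delay(slews[i], loads[j]) at (slew, load)."""
    def bracket(axis, x):
        # Lower grid index, clamped so edge cells are used to extrapolate.
        k = bisect_right(axis, x) - 1
        return min(max(k, 0), len(axis) - 2)

    i, j = bracket(slews, slew), bracket(loads, load)
    tx = (slew - slews[i]) / (slews[i + 1] - slews[i])
    ty = (load - loads[j]) / (loads[j + 1] - loads[j])
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return ((1 - tx) * (1 - ty) * d00 + (1 - tx) * ty * d01
            + tx * (1 - ty) * d10 + tx * ty * d11)

SLEWS = [0.01, 0.05, 0.20]    # ns, input transition index (hypothetical)
LOADS = [0.001, 0.01, 0.05]   # pF, output load index (hypothetical)
DELAY = [[0.020, 0.035, 0.090],
         [0.028, 0.043, 0.098],
         [0.050, 0.066, 0.122]]  # ns

print(nldm_lookup(SLEWS, LOADS, DELAY, 0.03, 0.005))
```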

Q153. How do you fix DRCs in a particular area of a routed database that is about to be taped out? Consider two cases: the cell density in that area is high, and the cell density is low.

Cell density is high:

Collect all the nets in that area and identify the non-critical ones, i.e. nets with a positive slack margin of more than 150 ps. Then reroute only those non-critical nets incrementally (ECO route) with the SI-driven and timing-driven options switched off: delete those nets and their global routes, and reroute them with the ECO route command route_zrt_eco, so that the tool routes them away from that area.
Collect the buffers/inverters on non-critical timing paths in that area and downsize them to free up routing tracks and space in that area.
Collect all the vias in that area and convert the multi-cut vias to single-cut vias.
Run area-based DRC cleanup.
Collect the nets on critical timing paths and reroute them incrementally on metal layers above the highest metal layer used in the block.
Cell padding or module padding can be applied blindly, but it may impact timing because it disturbs all the cells, including critical ones.
Finally, try trimming PG straps by removing some vias without exceeding the foundry's IR drop limit for that area. This is less preferable here because the cell density in that region is very high.
Add guide buffers on nets of non-critical timing paths that cross the DRC area, and place those buffers away from the DRC area.
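The net-selection step above (separating nets that cross the hotspot into non-critical reroute candidates and critical nets for higher layers) can be sketched as a simple filter. The Net record, bounding boxes and slacks here are hypothetical stand-ins for data you would pull from the P&R database and timing reports; the 150 ps margin is the one quoted above.

```python
# Split nets crossing a DRC hotspot into non-critical (ECO reroute away)
# and critical (reroute on higher layers) groups. Data is hypothetical.
from dataclasses import dataclass

@dataclass
class Net:
    name: str
    bbox: tuple       # (x1, y1, x2, y2) routing bounding box in um
    slack_ps: float   # worst slack of any timing path through the net

def overlaps(a, b):
    """True if two (x1, y1, x2, y2) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def split_hotspot_nets(nets, drc_box, margin_ps=150.0):
    """Return (noncritical, critical) names of nets crossing drc_box."""
    noncritical, critical = [], []
    for net in nets:
        if not overlaps(net.bbox, drc_box):
            continue  # net does not pass through the hotspot
        target = noncritical if net.slack_ps > margin_ps else critical
        target.append(net.name)
    return noncritical, critical

nets = [Net("n1", (0, 0, 20, 5), 320.0),
        Net("n2", (5, 2, 8, 30), 40.0),
        Net("n3", (100, 100, 120, 110), 500.0)]
print(split_hotspot_nets(nets, drc_box=(4, 0, 10, 10)))
```

The resulting name lists would then feed the tool's own ECO routing commands; that part is tool-specific and not shown.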

Cell density is low:

Most of the techniques above do not apply here because the cell density is very low.
The only thing we can do here is PG trimming. Because there are few cells, trimming the PG straps will not push IR drop past the limit.

Q154. How do you fix congestion in a particular area (core area) during the pre-CTS stage?

Change the max density value and rerun placement to see whether congestion comes under control.
If density (both cell and pin density) is high, apply cell padding or module padding to those cells, or partial density screens to that area.
Check whether a floorplan issue is causing module splitting.
Check whether a buffer/inverter chain runs through that area because of a floorplan issue.
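To locate the bins where density is too high (candidates for padding or partial density screens), the placement can be binned and each bin's cell utilization compared with the max density target. The sketch below is a simplification that assigns each cell's full area to the bin containing its origin; the bin size, target and cell data are assumed values.

```python
# Flag placement bins whose cell utilization exceeds a max-density target.
from collections import defaultdict

def hot_bins(cells, bin_size, max_density):
    """cells: iterable of (x, y, area) with (x, y) the cell origin in um.

    Returns {(bx, by): density} for bins above max_density.
    """
    used = defaultdict(float)
    for x, y, area in cells:
        used[(int(x // bin_size), int(y // bin_size))] += area
    bin_area = bin_size * bin_size
    return {b: a / bin_area for b, a in used.items()
            if a / bin_area > max_density}

cells = [(1, 1, 60.0), (2, 2, 30.0), (15, 15, 10.0)]  # hypothetical cells
print(hot_bins(cells, bin_size=10, max_density=0.7))
```

Pin density can be checked the same way by accumulating pin counts per bin instead of cell area.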

Q155. There are 10 macros, and they should be placed in a 5x2 array (10 macros in 2 columns). How wide a vertical channel will you leave to route all the macro pins? Assume a 10-metal-layer design where each macro has 200 pins and the macros are blocked up to metal 4 (the maximum layer used at block level is M8).

Total macro pins = 10 x 200 = 2000, and the vertical metal layers available are M3, M5 and M7. In this 28 nm example, the track pitch is 0.05 um for M1-M6, 0.1 um (2x) for M7-M8, and 0.8 um (8x) for M9-M10.
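Assume the minimum horizontal space needed is H. The vertical tracks in a channel of width H must cover all 2000 pins, so the total space needed is:
$$\frac{H}{0.05} + \frac{H}{0.05} + \frac{H}{0.1} = 20H + 20H + 10H = 50H = 2000 \;\Rightarrow\; H = 40\ \mu\text{m}$$
That means we have to leave 40 um of horizontal space between the macro columns, from the bottom side to the top side.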
If you do that, a lot of routing tracks are wasted at the bottom, because the two bottom macros have only 400 pins. To use that space effectively, don't leave 40 um at the bottom; leave only a distance equal to the VDD-VSS pitch.
The pin count adds up as you move toward the top, where you need the full 40 um (routing tracks for all 2000 pins).
Therefore the macros should be placed in a V shape for effective use of area and routing tracks.
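The same arithmetic, applied row by row, gives the V-shaped channel: the width needed between the two macro columns grows with the number of pins routed so far. This Python sketch assumes one vertical track per pin and uses only the M3/M5/M7 pitches from the example; in practice you would round the narrow bottom widths up to the VDD-VSS pitch as described above.

```python
# Channel width needed at each macro row when pins accumulate bottom to top.

VERTICAL_PITCH_UM = {"M3": 0.05, "M5": 0.05, "M7": 0.1}  # from the example

def tracks_per_um(pitches):
    """Vertical routing tracks available per um of channel width."""
    return sum(1.0 / p for p in pitches.values())

def channel_widths(rows=5, macros_per_row=2, pins_per_macro=200,
                   pitches=VERTICAL_PITCH_UM):
    """Required width (um) at each row, bottom row first."""
    density = tracks_per_um(pitches)
    widths = []
    for k in range(1, rows + 1):
        pins_so_far = k * macros_per_row * pins_per_macro
        widths.append(pins_so_far / density)
    return widths

print(channel_widths())  # widths in um, bottom row first
```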

Q156. What's the impact on timing if you insert an inverter on the capture clock pin?

Before inserting the inverter, the path has a full clock cycle available for setup.
After inserting the inverter, it becomes a half-cycle path for setup, so setup timing becomes much more critical. Hold timing is not a problem, however: the hold check is made against the capture (negative) edge that occurs half a clock period before the launching positive edge, so the hold path gains an extra half cycle and becomes less critical.
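A quick numeric check of this half-cycle behavior, as a Python sketch with ideal clocks: launch is on the rising edge at time 0, and the capture edge moves from T (setup) and 0 (hold) to T/2 and -T/2 once the capture clock is inverted. Clock skew, uncertainty and clock-to-Q are assumed to be folded into the data arrival time, and the numbers are made up.

```python
# Ideal-clock setup/hold slack for a rising-edge launch, with and without
# an inverter on the capture clock pin. Times in ns (assumed values).

def slacks(period, arrival, t_setup, t_hold, inverted_capture=False):
    """Return (setup_slack, hold_slack)."""
    if inverted_capture:
        # Capture on falling edges: next one at T/2, previous one at -T/2.
        setup_edge, hold_edge = period / 2, -period / 2
    else:
        setup_edge, hold_edge = period, 0.0
    setup_slack = setup_edge - t_setup - arrival
    hold_slack = arrival - (hold_edge + t_hold)
    return setup_slack, hold_slack

print(slacks(1.0, 0.6, 0.05, 0.03))
print(slacks(1.0, 0.6, 0.05, 0.03, inverted_capture=True))
```

Inverting the capture clock moves half a period of margin from setup to hold.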

Q157. Will you apply CRPR (clock reconvergence pessimism removal) to half-cycle setup and hold timing paths in the presence of crosstalk (i.e., when you insert an inverter on the capture clock pin)?

No. The crosstalk contributions on the common clock path differ between the launch and capture clock path calculations, because the launch and capture flops are triggered by different clock edges: the launch and capture edges are separated by half a cycle for both setup and hold. So the crosstalk values on the common path shouldn't be removed from either setup or hold analysis.
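The rule can be written as a small CRPR credit function: the derate-based spread on the common clock path is credited, but the crosstalk delta is credited only when launch and capture use the same clock edge. This is a conceptual Python sketch with hypothetical arrival values; it assumes the non-crosstalk part is still credited for opposite edges, and the exact behavior depends on the STA tool's CRPR settings.

```python
# Conceptual CRPR credit at the common clock point (times in ns).

def crpr_credit(late_common, early_common, si_late, si_early, same_edge):
    """late_common / early_common: arrival at the common point without SI.
    si_late: crosstalk delta added to the late arrival.
    si_early: crosstalk delta subtracted from the early arrival.
    """
    base = late_common - early_common   # derate/OCV spread: credited here
    si = si_late + si_early             # crosstalk spread on the common path
    # Crosstalk on different clock edges is not the same event, so it is
    # only removed when launch and capture share the same edge.
    return base + si if same_edge else base

print(crpr_credit(1.10, 1.00, 0.02, 0.01, same_edge=True))
print(crpr_credit(1.10, 1.00, 0.02, 0.01, same_edge=False))  # half-cycle path
```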

Q158. How do you improve insertion delay?

Use clock cells with proper drive strength, i.e. not low-drive-strength cells.
Prefer clock inverters over clock buffers.
Use double-width clock nets: doubling the width halves the resistance (R' = R/2) while ground capacitance increases only slightly. Overall, insertion delay improves because the resistance reduction dominates.
Place the clock port on a core edge at a point as close as possible to equidistant from all the corners (e.g., near the middle of the edge).
Place the first-level clock-gating element at the center of the design and build the clock tree from there.
Slightly relax the max transition and skew limits to improve insertion delay.
Multi-point CTS: divide the entire design area into 4 equal parts, build an H-tree from the main clock port to those 4 points, and add 1 big clock buffer in each area:
1. Disconnect all the CP pins from the main clock port.
2. Collect the register CP pins in each area, connect them to the output of the big clock buffer located in that area, and then build a regular clock tree from that buffer's output pin.
Clock mesh (at the cost of more routing resources and higher power consumption).
Keep congestion minimal before the CTS step. Otherwise congestion might detour clock nets, which can require a huge number of cells to fix DRVs and degrades insertion delay.
Fix floorplan issues such as missing density screens in macro channels; otherwise some registers may be placed in those areas, and the CTS engine will try to balance those registers against all other leaf pins by adding a huge number of clock cells.
Fishbone CTS.
Use only a single buffer/inverter type with proper drive strength for CTS, so that OCV effects across different corners are minimized in an MCMM design. This technique does not improve insertion delay, but it indirectly reduces the number of violations because of the smaller OCV effect.

(Using only one buffer/inverter is too optimistic. CTS has to drive different loads in the spine and at the root, so you have to give it plenty of cells to choose from. Otherwise it will add high-drive cells in places where they are not required, or, if you allow only a low-drive cell, too many cells will be added.)
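The double-width argument can be checked with an Elmore (0.5 * R * C) estimate for a uniform wire. The Python sketch below assumes resistance scales as 1/width, while ground capacitance has an area term that scales with width plus a fringe term that does not; the per-um values are hypothetical.

```python
# Elmore delay of a uniform RC wire at default vs double width.

def wire_delay_ps(length_um, width_mult=1.0, r_ohm_per_um=2.0,
                  c_area_ff_per_um=0.05, c_fringe_ff_per_um=0.15):
    """0.5 * R * C for a distributed RC line, in ps."""
    r = r_ohm_per_um / width_mult * length_um                  # ohms, ~1/width
    c = (c_area_ff_per_um * width_mult + c_fringe_ff_per_um) * length_um  # fF
    return 0.5 * r * c / 1000.0   # ohm * fF = fs; divide by 1000 for ps

print(wire_delay_ps(1000.0), wire_delay_ps(1000.0, width_mult=2.0))
```

With these assumed values, doubling the width halves R but raises C only from 200 fF to 250 fF, so the wire delay drops from about 200 ps to about 125 ps.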

Q159. How do you fix antenna violations? What are the possible methods for fixing them?

Antenna ratio = (metal area connected to the gate) / (gate area).
An antenna violation occurs if the antenna ratio exceeds the value specified for that metal layer.

Layer jumping to a higher metal layer (the metal area connected to the gate while the lower layer is etched is reduced).
Adding an antenna diode near the gate (the diode bleeds off the accumulated charge).
When a metal wire connected to a transistor gate is plasma-etched, it can charge up to a voltage sufficient to zap the thin gate oxide. This is called plasma-induced gate-oxide damage, or simply the antenna effect. It can increase the gate leakage, change the threshold voltage, and reduce the life expectancy of a transistor. Longer wires accumulate more charge and are more likely to damage the gates.
During the high-temperature plasma etch process, the diodes formed by source and drain diffusions can conduct significant amounts of current. These diodes bleed off charge from wires before gate oxide is damaged.
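A per-layer antenna check using the ratio defined above can be sketched as follows. The areas and per-layer limits are hypothetical, and real antenna rules also include cumulative ratios and diode-dependent limits that this sketch ignores. The usage compares a net before and after layer jumping moves part of a long M2 run up to M3.

```python
# Per-layer (non-cumulative) antenna ratio check. Areas in um^2.

def antenna_violations(metal_area_um2, gate_area_um2, max_ratio):
    """metal_area_um2, max_ratio: dicts keyed by layer name.

    Returns {layer: ratio} for layers whose ratio exceeds the limit.
    """
    out = {}
    for layer, area in metal_area_um2.items():
        ratio = area / gate_area_um2
        if ratio > max_ratio[layer]:
            out[layer] = ratio
    return out

LIMITS = {"M2": 400.0, "M3": 400.0}   # hypothetical max ratio per layer
before = antenna_violations({"M2": 500.0, "M3": 50.0}, 1.0, LIMITS)
after = antenna_violations({"M2": 300.0, "M3": 250.0}, 1.0, LIMITS)  # jumped
print(before, after)
```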

Q160. Can you fix antenna violation with buffer?

Yes. Replace the antenna diode with a buffer whose input pin is connected to the gate's net and whose output is left floating. The buffer's input adds gate area to the net, so the antenna ratio, and with it the violation, goes down.
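Since adding gate area lowers the ratio, one can estimate how many extra gate loads (for example, buffer inputs) a net needs to meet the limit. A minimal Python sketch, with hypothetical areas, limit and buffer input gate area:

```python
# Number of extra gate loads needed to bring a net's antenna ratio
# (metal area / gate area) under the limit. Areas in um^2 (hypothetical).
import math

def extra_gates_needed(metal_area, gate_area, max_ratio, buf_gate_area):
    """Smallest n with metal_area / (gate_area + n * buf_gate_area) <= max_ratio."""
    needed_gate = metal_area / max_ratio      # total gate area required
    if gate_area >= needed_gate:
        return 0
    return math.ceil((needed_gate - gate_area) / buf_gate_area)

print(extra_gates_needed(500.0, 1.0, 400.0, 0.2))
```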