Physical Design Q&A

Q141. After clock tree synthesis (CTS), many violating timing paths that end at clock gate (ICG) enable pins appear. Why weren't these paths fixed during placement, and how can I handle them?

After clock tree synthesis, clock gates become critical because, by default, their clock pins are given the same latency as the register clock pins during placement. Once the clock tree is built, the clock gates sit in the intermediate part of the tree, not at the leaves. Their clock therefore arrives earlier than the clock at the register (leaf) clock pins. Because the clock gate is the capturing endpoint of the enable path, this earlier capture clock reduces the setup slack.

The following shows a simple example:
  • Pre-CTS, the register clock pins and the clock pins of the clock gates both see a clock latency of 0ns, which models the same arrival time for both.
  • Post-CTS, the clock gates are halfway through the tree and see a latency of 800ps, whereas all registers see a 1.5ns arrival time at their clock pins because they are at the leaf level of the tree.
  • Any path from a register to a clock gate now sees this difference in clock arrival times, and the pre-CTS slack degrades by 700ps (1.5ns - 800ps). Because the clock gate must sit at an intermediate point so that it can shut down portions of the clock tree, it is not correct to assume that the clock pins of clock gates should be balanced with the registers.
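The arithmetic in the example above can be written as a small Python sketch. The function names and the +200ps pre-CTS slack are hypothetical; the model deliberately ignores uncertainty, derates, and CRPR and only tracks how the post-CTS latency difference moves the enable-path slack.

```python
def icg_skew_penalty_ps(launch_flop_latency_ps: float, icg_clock_latency_ps: float) -> float:
    """Slack lost on a register -> ICG enable path once real clock latencies exist.

    Pre-CTS both clock pins are ideal (0 ps). Post-CTS the launching flop sits at
    the leaf (late clock) while the capturing ICG sits mid-tree (early clock).
    """
    return launch_flop_latency_ps - icg_clock_latency_ps


def post_cts_enable_slack_ps(pre_cts_slack_ps: float,
                             launch_flop_latency_ps: float,
                             icg_clock_latency_ps: float) -> float:
    """Estimate the post-CTS setup slack of the enable path from its pre-CTS slack."""
    return pre_cts_slack_ps - icg_skew_penalty_ps(launch_flop_latency_ps, icg_clock_latency_ps)


if __name__ == "__main__":
    # Numbers from the example: registers at 1.5 ns (leaf), ICG at 800 ps (mid-tree).
    print(icg_skew_penalty_ps(1500, 800))            # 700
    print(post_cts_enable_slack_ps(200, 1500, 800))  # -500 (hypothetical +200 ps pre-CTS slack)
```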

These paths can be addressed in the following ways:
  • First, examine how far down the clock tree these ICGs lie post-CTS. Whether they are near the root of the clock tree or near the clock pins of the flops influences how you handle them.
    • If the clock gates are roughly halfway down the clock tree, you might benefit from splitting (replicating) the clock gate. Splitting creates parallel copies of the original driver, resulting in more clock-gate drivers with fewer loads per driver. If the splitting is done pre-CTS, it effectively pushes the clock gates further down the clock tree, increasing power but improving enable timing. See the split_clock_net command.
    • If the clock gates are either near the root or near the bottom of the tree, splitting them is unlikely to offer any improvement. In this case, add a pre-CTS clock latency value to the ICG clock pin so that the early ICG clock arrival is modeled correctly. Using the above example, you would apply a -700ps latency to the clock pin of the clock gate during place_opt, before clock tree synthesis. Applying this latency lets you model the slack correctly before the actual clock-gate clock arrival time is known.
    • If the ICG is a single "top-level clock gate" fed by a relatively small cone of logic, you might apply a float pin constraint to the flip-flops feeding the enable logic so that their clocks arrive earlier (useful skew). This is sometimes the best solution for top-level clock gates because it does not impact power, whereas splitting top-level clock gates can have a very large power impact.
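The decision flow in the list above can be sketched as a small Python helper. This is an illustration under stated assumptions, not a tool feature: the function name and the depth thresholds (0.25 and 0.75 of the leaf latency) are hypothetical placeholders for "near the root", "roughly halfway", and "near the bottom". When the latency approach is chosen, the value to put on the ICG clock pin is simply ICG latency minus leaf latency (-700ps for the 800ps/1.5ns example).

```python
from typing import Optional, Tuple


def suggest_icg_fix(icg_latency_ps: float,
                    leaf_latency_ps: float,
                    top_level_gate: bool = False,
                    small_enable_cone: bool = False,
                    near_root_frac: float = 0.25,
                    near_leaf_frac: float = 0.75) -> Tuple[str, Optional[float]]:
    """Map an ICG's post-CTS depth to one of the remedies listed above.

    near_root_frac / near_leaf_frac are hypothetical thresholds for illustration,
    not tool defaults. Returns (remedy, pre-CTS ICG clock-pin latency or None).
    """
    if top_level_gate and small_enable_cone:
        # Pull the enable-cone flops' clocks earlier instead of splitting a large gate.
        return ("useful_skew_float_pins", None)
    depth = icg_latency_ps / leaf_latency_ps  # 0.0 = at the root, 1.0 = at the leaves
    if near_root_frac < depth < near_leaf_frac:
        return ("split_clock_gate", None)
    # Near the root or near the leaves: model the early ICG clock pre-CTS instead.
    return ("pre_cts_icg_latency", icg_latency_ps - leaf_latency_ps)


if __name__ == "__main__":
    print(suggest_icg_fix(800, 1500))   # ('split_clock_gate', None)
    print(suggest_icg_fix(1300, 1500))  # ('pre_cts_icg_latency', -200)
    print(suggest_icg_fix(100, 1500, top_level_gate=True, small_enable_cone=True))
```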

    Q142. How should CRPR be treated in SI analysis? That is, during setup analysis with SI (crosstalk) analysis enabled, do you keep or remove the crosstalk-induced pessimism on the common clock path? Why?

    CRPR and Crosstalk Analysis

    • When you perform crosstalk analysis using PrimeTime SI, a change in delay due to crosstalk along the common segment of a clock path can be pessimistic, but only for a zero-cycle check. A zero-cycle check occurs when the same clock edge drives both the launch and capture events for the path. For other types of paths, a change in delay due to crosstalk is not pessimistic because the change cannot be assumed to be identical for the launch and capture clock edges.
    • Accordingly, the CRPR algorithm removes crosstalk-induced delays in a common portion of the launch and capture clock paths only if the check is a zero-cycle check. In a zero-cycle check, aggressor switching affects both the launch and capture signals in the same way at the same time.
    • Here are some cases where the CRPR might apply to crosstalk-induced delays:
      1. Standard hold check
      2. Hold check on a register with the Q-bar output connected to the D input, as in a divide-by-2 clock circuit
      3. Hold check with crosstalk feedback due to parasitic capacitance between the Q-bar output and D input of a register
      4. Hold check on a multicycle path set to zero, such as a circuit that uses a single clock edge for launch and capture, with designed-in skew between launch and capture
      5. Certain setup checks where transparent latches are involved


    • There is one important difference between the hold and setup analyses related to crosstalk on the common portion of the clock path.
    • The launch and capture clock edges are normally the same edge for the hold analysis.
    • The clock edge through the common clock portion cannot have different crosstalk contributions for the launch clock path and the capture clock path.
    • Therefore, the worst-case hold analysis removes the crosstalk contribution from the common clock path.

      For setup analysis: the launch and capture events occur on different clock edges, one clock period apart. The crosstalk contribution on the common clock path can therefore differ between the launch edge and the capture edge, so the crosstalk contribution should not be removed from the common clock path (the non-crosstalk CRPR credit is still applied).
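The rule above can be captured in a simplified Python model of the CRPR credit. The function and parameter names and the arrival numbers are hypothetical; the sketch assumes the common-point late/early arrivals are given without crosstalk and that the crosstalk delta is supplied separately as a slowdown (late analysis) and a speedup (early analysis).

```python
def crpr_credit_ps(common_late_ps: float,
                   common_early_ps: float,
                   xtalk_slowdown_ps: float,
                   xtalk_speedup_ps: float,
                   zero_cycle_check: bool) -> float:
    """Pessimism removed on the common clock path (simplified model).

    common_late_ps / common_early_ps: late and early arrival at the common point
        without crosstalk (for example, after OCV derates).
    xtalk_slowdown_ps / xtalk_speedup_ps: magnitudes of the crosstalk delta delay
        added in the late analysis and subtracted in the early analysis.
    """
    credit = common_late_ps - common_early_ps  # removed for every check
    if zero_cycle_check:
        # Same edge for launch and capture: aggressors cannot both slow down and
        # speed up that one edge, so the crosstalk delta is removed as well.
        credit += xtalk_slowdown_ps + xtalk_speedup_ps
    return credit


if __name__ == "__main__":
    hold = crpr_credit_ps(520, 480, 15, 12, zero_cycle_check=True)    # standard hold check
    setup = crpr_credit_ps(520, 480, 15, 12, zero_cycle_check=False)  # setup, edges one period apart
    print(hold, setup)  # 67 40
```

The setup check keeps the 27ps of crosstalk pessimism because the launch and capture edges can see different aggressor activity.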

    Q143. What is uncertainty? Why do we have different uncertainties for setup and hold at pre-CTS and post-CTS?

    Uncertainty: It specifies a window within which a clock edge can occur. In physical design, uncertainty is used to model several factors, such as jitter (the deviation of a clock edge from its ideal position), additional margins, and skew (pre-CTS).

    Different uncertainty values are specified for setup and hold.
    A hold check is performed with respect to the same clock edge, so any deviation of that edge (jitter) affects the launch and capture flops in the same way. Jitter therefore need not be modeled in hold uncertainty, which is why hold uncertainty is usually smaller than setup uncertainty. Before CTS, uncertainty also models the skew expected once the clock tree is built. After CTS, the actual skew values are available, so the uncertainty values are reduced.

    Setup Uncertainty:
    • Pre-CTS = Jitter + Skew + Extra setup margin
    • Post-CTS = Jitter + Extra setup margin
    Hold Uncertainty:
    • Pre-CTS = Skew + Extra hold margin
    • Post-CTS = Extra hold margin
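The budget formulas above can be sketched in Python, together with the SDC commands that would carry the result. The jitter, skew, and margin values and the clock name CLK are hypothetical, and the formatting assumes a library time unit of 1ns.

```python
def setup_uncertainty_ps(jitter_ps, margin_ps, est_skew_ps=0):
    """Setup uncertainty = jitter + expected skew (pre-CTS only) + extra margin."""
    return jitter_ps + est_skew_ps + margin_ps


def hold_uncertainty_ps(margin_ps, est_skew_ps=0):
    """Hold uncertainty = expected skew (pre-CTS only) + extra margin; no jitter term."""
    return est_skew_ps + margin_ps


def sdc_uncertainty(clock, setup_ps, hold_ps):
    """Format SDC set_clock_uncertainty commands, assuming a 1 ns time unit."""
    return [
        f"set_clock_uncertainty -setup {setup_ps / 1000:.3f} [get_clocks {clock}]",
        f"set_clock_uncertainty -hold {hold_ps / 1000:.3f} [get_clocks {clock}]",
    ]


if __name__ == "__main__":
    JITTER, EST_SKEW, SETUP_MARGIN, HOLD_MARGIN = 50, 80, 30, 20  # hypothetical values in ps
    pre = (setup_uncertainty_ps(JITTER, SETUP_MARGIN, EST_SKEW),
           hold_uncertainty_ps(HOLD_MARGIN, EST_SKEW))
    post = (setup_uncertainty_ps(JITTER, SETUP_MARGIN), hold_uncertainty_ps(HOLD_MARGIN))
    print(pre, post)  # (160, 100) (80, 20)
    for line in sdc_uncertainty("CLK", *pre):
        print(line)
```

Dropping the skew term post-CTS is what shrinks both values; the setup value keeps its jitter term in both stages.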

    Q144. Why do we have different derating factors for clock cells and data cells? What is the reason for that?

    • Clock cells have much higher switching activity than data cells, so they experience more PVT variation. OCV-induced delay changes on clock cells can therefore cause far more violations than the same changes on the data path, which is why clock cells get more derating than data cells.
    • The OCV effect is typically more pronounced on clock paths because they travel longer distances across the chip.
    • Clock cells are affected by second-order effects and hence get larger derates, whereas data cells are dominated by first-order effects and get smaller derates.
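To see why a larger clock derate matters, here is a Python sketch of a setup check with separate clock and data derates. The derate values and path delays are hypothetical, and the model assumes no common clock path (so no CRPR credit).

```python
def setup_slack_ps(period_ps, launch_clk_ps, data_ps, capture_clk_ps, setup_ps,
                   clk_late=1.0, clk_early=1.0, data_late=1.0):
    """Setup slack with separate clock and data derates (no common path, no CRPR).

    Late derates slow the launch clock and the data path; the early derate
    speeds up the capture clock, which is the pessimistic combination for setup.
    """
    arrival = launch_clk_ps * clk_late + data_ps * data_late
    required = period_ps + capture_clk_ps * clk_early - setup_ps
    return required - arrival


if __name__ == "__main__":
    args = (1000, 400, 550, 400, 30)  # hypothetical period, clock latencies, data delay, setup (ps)
    print(setup_slack_ps(*args))  # 420.0 (no derates)
    print(setup_slack_ps(*args, clk_late=1.05, clk_early=0.95, data_late=1.05))  # uniform 5%
    print(setup_slack_ps(*args, clk_late=1.08, clk_early=0.92, data_late=1.05))  # 8% on clock cells
```

With these numbers, raising only the clock derate from 5% to 8% costs 24ps of slack (352.5ps down to 328.5ps), because the extra 3% is applied to both the 400ps launch latency and the 400ps capture latency.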

    Q145. What are the pros and cons of using buffers versus inverters for CTS? Which one do you prefer for CTS?

    Inverter: Less area, and it can drive a longer distance, but more switching. Good for maintaining pulse width and pulse period.

    • In other words, an inverter has more current-driving capability than a buffer of the same drive strength, i.e., inverters are faster than buffers.
    • So fewer inverters than buffers are needed for the same net length, and insertion delay is better with inverters. This indirectly reduces the OCV impact on timing (OCV is proportional to insertion delay).
    • Open question: because switching is higher in an inverter-based clock tree, might it increase OCV?
    • Inverters maintain a 50% duty cycle, and an inverter has a regenerative property.
    • Inverters have a better noise-cancellation effect than buffers.
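The duty-cycle point can be illustrated with a Python sketch that tracks one clock pulse through a chain of identical cells whose rise and fall delays differ. The delays are hypothetical, and the model treats a buffer as a single non-inverting cell with mismatched rise/fall delays (a real buffer is two inverter stages internally), so the numbers are only illustrative.

```python
def propagate_pulse(width_ps, stages, t_rise_ps, t_fall_ps, inverting):
    """Width and polarity of one pulse after a chain of identical clock cells.

    t_rise_ps / t_fall_ps: delay to an output rising / falling edge.
    Returns (width_ps, is_high_pulse).
    """
    width = width_ps
    high_pulse = True  # input is a high pulse: rising edge first, then falling edge
    for _ in range(stages):
        out_high = (not high_pulse) if inverting else high_pulse
        # A high output pulse starts with an output rise and ends with an output fall;
        # a low output pulse starts with a fall and ends with a rise.
        lead, trail = (t_rise_ps, t_fall_ps) if out_high else (t_fall_ps, t_rise_ps)
        width += trail - lead
        high_pulse = out_high
    return width, high_pulse


if __name__ == "__main__":
    # Hypothetical: 1 GHz clock with a 500 ps high phase, rise delay 22 ps, fall delay 18 ps.
    print(propagate_pulse(500, 8, 22, 18, inverting=False))  # (468, True): 46.8% duty cycle
    print(propagate_pulse(500, 8, 22, 18, inverting=True))   # (500, True): 50% duty cycle
```

In the buffer chain every stage shrinks the pulse by the same 4ps, so the error accumulates; in the inverter chain the error alternates sign and cancels after each pair of stages.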

    Q146. What’s the purpose of TIE cells & what’s the internal structure of the TIE?

    • In lower technology nodes, the transistor gate oxide is very thin and sensitive to voltage fluctuations in the power supply. If a transistor gate is connected directly to the PG network, the gate oxide might be damaged by those fluctuations. To overcome this issue, TIE cells are inserted between the PG network and the transistor gates.
    • For the same reason, TIE cells help prevent ESD issues.
    • TIE cells can easily be converted from 0 to 1, or vice versa, by changing only one metal layer.
    • Let's say you need to do an ECO with only one metal mask to change a 0 to a 1 on the input of one of your combinational gates, but you only have one tie-down cell available. If that tie-down cell is designed so that its function can be changed from 0 to 1 using only one metal layer, the localized ECO becomes a cost-effective change.

    Q147. Why do we still use clock uncertainties (setup uncertainty and hold uncertainty) after the post-CTS stage if we already use OCV derating factors?

    • Jitter is not part of OCV; it comes from PLL noise. So uncertainties and OCV derating factors should be kept as separate entities.
    • OCV derating is a path-based margin that accounts for on-chip PVT variations.
    • Process variations: transistor channel-length and gate-oxide-thickness variations caused by mask, CMP, and etching variations. For example, two instances of the same library cell (same drive strength) placed at different locations in the layout can have different delays because of these variations.
    • Temperature variations: junction temperature, clock-cell switching activity, and high-density areas can generate higher local temperatures, so cell delays vary.
    • Voltage variations: some cells see a reduced supply voltage because of IR drop, for example in high-density regions.
    • The IR-drop margin depends on the IR-drop target for your design. If you meet a 3% IR drop, you have the flexibility to reduce the flat margin in the OCV derating value.
    • If you are not using ENDCAP cells in the design, you need to add more margin to the derating factors, because every standard cell is characterized assuming it sits in the interior of the placement area. A cell surrounded by neighbors experiences less stress and performs as characterized; a cell at the edge experiences more stress and may not perform as expected. The foundry and the design company weigh many such factors when deciding whether to reduce or increase the flat margin.

    Q148. How do you fix a setup timing violation if the base layers are frozen?

    • Check whether any nets in the path have detours; if so, remove them and re-route.
    • Route on higher metal layers (layer promotion)
    • Fix crosstalk issues on the data path
    • Fix crosstalk issues on the clock path
    • Insert buffers by converting fortune/spare cells.
    • Logic restructuring (pin swapping): connect the timing-critical input of an AND gate to the transistor farthest from ground, and the timing-critical input of an OR gate to the transistor farthest from power. The non-critical inputs then switch first, so their internal nodes do not act as extra load for the timing-critical input, and the delay is reduced.

    Q149. What are the ways to fix antenna violations?

    • Add antenna diodes near the gate
    • Jump to a higher metal layer near the gate (layer hopping)
    • Insert a buffer near the input gate if the path is not timing critical
    • Connect the violating net to the input pin of an extra buffer whose output is left floating or drives a dummy load (the extra gate area lowers the antenna ratio)
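The effect of the layer-hopping and extra-buffer-input fixes can be sketched with the basic antenna ratio (charge-collecting metal area divided by the connected gate area). All dimensions and the 400 limit are hypothetical; real limits, cumulative rules, and diode credits come from the foundry rule deck. The sketch also assumes the long run moved to an upper layer no longer counts, because that layer is etched after the net is connected to the driver's diffusion, which provides a discharge path.

```python
def antenna_ratio(metal_area_um2: float, gate_area_um2: float) -> float:
    """Charge-collecting metal area divided by the gate area it is connected to."""
    return metal_area_um2 / gate_area_um2


def passes(metal_area_um2: float, gate_area_um2: float, max_ratio: float) -> bool:
    """True if the ratio is within the (hypothetical) limit."""
    return antenna_ratio(metal_area_um2, gate_area_um2) <= max_ratio


if __name__ == "__main__":
    MAX_RATIO = 400.0  # hypothetical limit
    WIDTH = 0.1        # um, hypothetical M2 width
    GATE = 0.05        # um^2, hypothetical receiver gate area

    # Original: a 400 um M2 run is connected only to the gate when M2 is etched.
    print(antenna_ratio(400 * WIDTH, GATE), passes(400 * WIDTH, GATE, MAX_RATIO))
    # Layer hop: only a 20 um M2 stub stays on the gate side of the jump.
    print(antenna_ratio(20 * WIDTH, GATE), passes(20 * WIDTH, GATE, MAX_RATIO))
    # Extra buffer input (hypothetical 0.1 um^2 gate) on the same net adds gate area.
    print(antenna_ratio(400 * WIDTH, GATE + 0.1), passes(400 * WIDTH, GATE + 0.1, MAX_RATIO))
```

Here the original net is at roughly 800 (a violation), while the layer hop brings it to roughly 40 and the extra gate area to roughly 267.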

    Q150. Can net delay be reduced by splitting a net with a buffer?

    • Assume the net is L unit lengths (sections) long, and represent each section with a distributed RC model.
    • Assume that the resistance per unit length is $R_p$ and the capacitance per unit length is $C_p$.
    • The total net resistance is $R_t = L R_p$ and the total capacitance is $C_t = L C_p$, so the net delay scales as $D_t \propto R_t C_t = L^2 R_p C_p$ (the Elmore delay of a distributed RC line is $0.5 R_t C_t$; the constant does not change the $L^2$ dependence).
    • If you insert a buffer at the midpoint, each segment has length L/2 and a delay proportional to $(L/2)^2 R_p C_p = L^2 R_p C_p / 4$. The two segments together give $L^2 R_p C_p / 2$, half the original wire delay, at the cost of one buffer delay.
    • Another way to derive the net delay: $R \propto L/A = L/(Wt)$, where W is the wire width and t its thickness, and the coupling capacitance $C \propto A/D = tL/S$, where S is the spacing to the neighboring wire. Multiplying gives $RC \propto L^2/(WS)$, i.e., proportional to $L^2$.
    • The concept of repeaters is the same as the one discussed in "Inserting the Buffer" (above point); I am just explaining it in a different way.
    • Long-distance routing means a huge RC load due to a series of RC delays, as shown in the figure. A good alternative is to use repeaters, splitting the line into several pieces. Why can this solution be better in terms of delay? Because the gate delay is quite small compared with the RC delay.

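    • For an interconnect of n sections, each with resistance R and capacitance C, driven by a single inverter, the propagation delay becomes
        ○ $T_{delay} = t_{gate} + nR \cdot nC = t_{gate} + n^2 RC$

    • If two repeaters are inserted, the line is split into three pieces of n/3 sections each, and the delay becomes:
        ○ $T_{delay} = t_{gate}$ (delay of inverter) $+ 2t_{gate}$ (delay of repeaters) $+ 3 \cdot \frac{n}{3}R \cdot \frac{n}{3}C = 3t_{gate} + \frac{n^2 RC}{3}$

    • So you can see that the RC delay is reduced by a factor of 3 compared with the case without repeaters.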
    • Consequently, if the gate delay is much smaller than the RC delay, repeaters improve the switching speed, at the price of higher power consumption.
    • As you keep adding repeaters on a fixed net of length L to improve the transition, the overall delay decreases. At some point, however, the gate delay exceeds the RC net delay, i.e., gate delay dominates over net delay. If you add repeaters beyond that point, the overall delay starts increasing, so you should not add buffers beyond that sweet spot. This is how we calculate how much net length a specific buffer can drive.
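The sweet spot described above can be found numerically with a Python sketch. It assumes a simplified model: every stage (driver plus repeaters) has the same fixed delay t_gate, each wire piece has a distributed-RC Elmore delay of 0.5·r·c·length², and driver resistance and repeater input capacitance are ignored. The RC value, net length, and 20ps gate delay are hypothetical.

```python
import math


def wire_delay_ps(length_um: float, rc_ps_per_um2: float) -> float:
    """Elmore delay of a distributed RC wire: 0.5 * (r * L) * (c * L)."""
    return 0.5 * rc_ps_per_um2 * length_um ** 2


def buffered_delay_ps(length_um, segments, rc_ps_per_um2, t_gate_ps):
    """Driver plus (segments - 1) repeaters, each driving an equal piece of wire."""
    piece = length_um / segments
    return segments * (t_gate_ps + wire_delay_ps(piece, rc_ps_per_um2))


def best_segments(length_um, rc_ps_per_um2, t_gate_ps, max_segments=100):
    """Brute-force the segment count that minimizes total delay."""
    return min(range(1, max_segments + 1),
               key=lambda k: buffered_delay_ps(length_um, k, rc_ps_per_um2, t_gate_ps))


if __name__ == "__main__":
    RC = 2.0 * 0.2e-3  # hypothetical 2 ohm/um * 0.2 fF/um = 0.4 fs/um^2 = 4e-4 ps/um^2
    L, T_GATE = 5000.0, 20.0
    for k in (1, 2, 3, 16, 40):
        print(k, round(buffered_delay_ps(L, k, RC, T_GATE), 1))
    k_best = best_segments(L, RC, T_GATE)
    # Setting d/dk [k * t_gate + 0.5 * RC * L^2 / k] = 0 gives k = L * sqrt(0.5 * RC / t_gate).
    k_analytic = L * math.sqrt(0.5 * RC / T_GATE)
    print(k_best, round(k_analytic, 2))
```

With these numbers the delay falls from about 5020ps unbuffered to about 632ps at 16 segments and then rises again (about 925ps at 40 segments). The 3-segment case (two repeaters) has a wire term of 5000/3ps, matching the $3t_{gate} + n^2RC/3$ result.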
    Copyright © 2021