After clock tree synthesis (CTS), timing paths that end at clock gates often become critical. Before CTS, by default, the clock pins of clock gates have the same latency applied to their arrival times as the register clock pins. Once the clock tree is constructed, the clock gates sit in the intermediate part of the tree, not at the leaf level. Their clock arrival times are therefore earlier than those at the leaf clock pins, and paths that end at the clock gate have less time available.
The following shows a simple example:
- Pre-CTS, the register clock pins and the clock pins of the clock gates both see a clock latency of 0ns, which models the same arrival time for both.
- Post-CTS, the clock gates are now halfway through the tree and see a latency of 800ps. However, all registers see a 1.5ns arrival time for their clock pins since they are at the leaf level of the tree.
- Any path from a register to a clock gate now sees the difference in clock arrival times, so its slack is 700ps worse than the pre-CTS slack (1.5ns - 800ps). Since the clock gate is supposed to sit at an intermediate point so that it can shut down portions of the clock tree, it is not correct to assume that the clock pins of clock gates should be balanced with the registers.
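The arithmetic in this example can be checked with a minimal sketch. The latencies (0ns pre-CTS, 1.5ns at the leaf, 800ps at the clock gate) come from the example above; the clock period, the data-path delay, and the function name `setup_slack_ps` are hypothetical values chosen only for illustration.

```python
def setup_slack_ps(period_ps, launch_latency_ps, capture_latency_ps,
                   data_delay_ps, setup_ps=0):
    """Simplified setup slack (required time - arrival time), in picoseconds."""
    arrival = launch_latency_ps + data_delay_ps
    required = period_ps + capture_latency_ps - setup_ps
    return required - arrival


PERIOD_PS = 2000      # hypothetical clock period
DATA_DELAY_PS = 1200  # hypothetical register-to-clock-gate enable path delay

# Pre-CTS: register (launch) and clock gate (capture) both see 0 latency.
pre_cts_slack = setup_slack_ps(PERIOD_PS, 0, 0, DATA_DELAY_PS)

# Post-CTS: the register is at the leaf (1500 ps), the clock gate is
# halfway through the tree (800 ps).
post_cts_slack = setup_slack_ps(PERIOD_PS, 1500, 800, DATA_DELAY_PS)

degradation = pre_cts_slack - post_cts_slack
print(pre_cts_slack, post_cts_slack, degradation)
```

Whatever period and data delay are chosen, the difference between the two slacks equals the difference between the leaf latency and the clock gate latency, which is the 700ps in the example.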
These paths can be addressed in the following ways:
First, examine how far down the clock tree these integrated clock gates (ICGs) lie post-CTS. Whether they sit near the root of the clock tree or near the clock pins of the flops influences how you should handle them.
- If the clock gates are roughly halfway down the clock tree, you might benefit from splitting (replicating) the clock gate. Splitting creates parallel copies of the original driver, resulting in more clock gate drivers with fewer loads per driver. If the splitting is done pre-CTS, it effectively pushes the clock gates further down the clock tree, which increases power but improves enable timing. See the split_clock_net command.
- If the clock gates are either near the root or near the bottom of the tree, splitting them is unlikely to offer any improvement. In this case, add a pre-CTS clock latency value to the ICG clock pin so that the pre-CTS timing reflects the expected post-CTS arrival time. Using the above example, you would apply a -700ps latency to the clock pin of the clock gate during place_opt, before clock tree synthesis. Applying this latency lets you model the slack correctly before the actual clock arrival time at the clock gate is known.
- If the ICG is a single "top-level clock gate" fed by a relatively small cone of logic, you might apply a float pin constraint to the flip-flops that feed the enable logic so that their clocks arrive earlier (useful skew). This technique is sometimes the best solution for top-level clock gates because it does not impact power, whereas splitting top-level clock gates can have a very large power impact.
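As a rough sketch of the second option, the pre-CTS latency adjustment for each clock gate is simply its expected post-CTS arrival time minus the leaf-level arrival time. The pin names, the 200ps entry, and the helper `icg_latency_offsets` are hypothetical; only the 800ps and 1.5ns figures come from the example above.

```python
LEAF_LATENCY_PS = 1500  # expected arrival time at register clock pins


def icg_latency_offsets(icg_latency_ps, leaf_latency_ps):
    """Return the latency offset (ps) to model on each clock gate clock pin.

    A negative offset means the clock gate sees its clock earlier than
    the registers do.
    """
    return {pin: latency - leaf_latency_ps
            for pin, latency in icg_latency_ps.items()}


# Hypothetical pins with their estimated post-CTS clock arrival times.
estimated_icg_latency_ps = {
    "u_icg_mid/CK": 800,  # halfway down the tree, as in the example
    "u_icg_top/CK": 200,  # assumed value for a gate near the root
}

offsets = icg_latency_offsets(estimated_icg_latency_ps, LEAF_LATENCY_PS)
for pin, offset in offsets.items():
    print(f"{pin}: {offset} ps")
```

The closer a clock gate sits to the root, the larger the negative offset, and the more timing its enable path loses relative to the pre-CTS view.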
Uncertainty: It specifies a window within which a clock edge can occur. In physical design, uncertainty is used to model several factors: jitter (the deviation of a clock edge from its ideal position), additional margins, and skew (pre-CTS only).
Different uncertainty values are specified for setup and hold.
Because a hold check is performed with respect to the same clock edge, any deviation in that edge (jitter) affects the launch and capture flops in the same way. Hold uncertainty therefore does not need to model jitter, which is why the hold uncertainty value is normally smaller than the setup uncertainty value.
Before CTS, uncertainty also models the skew expected after the clock tree is implemented. At post-CTS stages the uncertainty values are reduced, because the actual skew is then available.
Setup Uncertainty:
- Pre-CTS = Jitter + Skew + Extra setup margin
- Post-CTS = Jitter + Extra setup margin
Hold Uncertainty:
- Pre-CTS = Skew + Extra hold margin
- Post-CTS = Extra hold margin
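The four formulas above can be expressed as a short sketch. The function name `clock_uncertainty_ps` and all numeric values are hypothetical and chosen only to illustrate the arithmetic; real values depend on the clock source, the technology, and the project's margin policy.

```python
def clock_uncertainty_ps(jitter_ps, skew_ps, setup_margin_ps,
                         hold_margin_ps, post_cts):
    """Return setup and hold uncertainty (ps) for the given design stage."""
    # The estimated skew is only modeled before the clock tree exists;
    # after CTS the real skew is part of the propagated clock arrival times.
    estimated_skew = 0 if post_cts else skew_ps
    return {
        "setup": jitter_ps + estimated_skew + setup_margin_ps,
        # Jitter is omitted for hold: launch and capture use the same edge.
        "hold": estimated_skew + hold_margin_ps,
    }


# Hypothetical budget, in picoseconds.
budget = dict(jitter_ps=50, skew_ps=100, setup_margin_ps=30, hold_margin_ps=20)

pre_cts = clock_uncertainty_ps(**budget, post_cts=False)
post_cts = clock_uncertainty_ps(**budget, post_cts=True)
print("pre-CTS :", pre_cts)
print("post-CTS:", post_cts)
```

With these assumed numbers, the setup uncertainty drops from 180ps to 80ps and the hold uncertainty from 120ps to 20ps once the estimated skew term is removed after CTS.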