Because of DPT (double patterning technology) rules, there is at least one week of DRC clean-up activity after PnR/timing converges. I think the suggestions below are useful for reducing the number of DPT DRCs upfront.
The most common type of DPT DRC is called an odd cycle violation, and it can be fixed by:
- 1. Increasing the spacing between the two polygons of a pair (to move them onto separate colored masks).
- 2. Making the cycle even by removing one polygon (break the loop).
- 3. Dividing one polygon involved in the odd cycle into two pieces and assigning them different colors, which turns the odd cycle into an even cycle of four. However, we have to make sure that the two pieces of the polygon overlap, to allow for lithographic rounding and misalignment, so that we still end up with a continuous polygon (this is also called a "stitch").
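The three fixes above can be illustrated with a small 2-coloring check. This is only a sketch, not how any DRC tool is implemented: it assumes the layout has already been reduced to a conflict graph in which each polygon is a node and each same-mask spacing conflict is an edge, and the function name `find_mask_coloring` and the polygon indices are hypothetical.

```python
from collections import deque

def find_mask_coloring(num_polygons, conflicts):
    """Try to assign each polygon to mask 0 or 1.

    conflicts is a list of (a, b) pairs that must land on different masks.
    Returns a list of mask ids, or None if an odd cycle makes it impossible.
    """
    adjacency = [[] for _ in range(num_polygons)]
    for a, b in conflicts:
        adjacency[a].append(b)
        adjacency[b].append(a)

    mask = [None] * num_polygons
    for start in range(num_polygons):
        if mask[start] is not None:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if mask[neighbor] is None:
                    mask[neighbor] = 1 - mask[current]
                    queue.append(neighbor)
                elif mask[neighbor] == mask[current]:
                    return None  # two conflicting polygons on one mask: odd cycle
    return mask

# Three polygons that all conflict with each other form an odd cycle.
odd_cycle = find_mask_coloring(3, [(0, 1), (1, 2), (2, 0)])
# Fix 1: more spacing between polygons 2 and 0 removes that conflict edge.
spaced = find_mask_coloring(3, [(0, 1), (1, 2)])
# Fix 2: removing polygon 2 breaks the loop.
removed = find_mask_coloring(2, [(0, 1)])
# Fix 3: polygon 2 is split into pieces 2 and 3; the stitch is modeled as an
# edge because the two pieces go on different masks, giving a cycle of four.
stitched = find_mask_coloring(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

In this sketch only `odd_cycle` comes back as `None`; each of the three fixes yields a valid two-mask assignment.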
What's the first thing you look at after CTS?
Skew, latency, timing, routing congestion.
- I look at the number of buffers/inverters the tool has added for each clock.
- If this number does not match my calculated estimate, I know something is wrong and CTS is messed up.
- Based on my experience, I can estimate the number of clock buffers for a clock tree, especially for clocks with a large sync fanout.
- A rule of thumb for estimating the clock buffer count is (sync_pins_of_clk ÷ cts_max_fanout constraint).
- If there is a major difference between this ratio and the actual number of buffers or inverters added, it should be investigated.
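The rule of thumb above can be turned into a quick sanity check. This is a minimal sketch: the function names are hypothetical, the pin and cell counts are made-up inputs you would read from your own CTS reports, and the 2x tolerance is an assumption rather than a recommended threshold.

```python
import math

def estimate_clock_buffers(sync_pins, max_fanout):
    """Rule-of-thumb estimate: sync pins of the clock / CTS max fanout."""
    if sync_pins < 0:
        raise ValueError("sync_pins must be non-negative")
    if max_fanout <= 0:
        raise ValueError("max_fanout must be positive")
    return math.ceil(sync_pins / max_fanout)

def needs_investigation(sync_pins, max_fanout, actual_cells, tolerance=2.0):
    """Flag a clock whose inserted buffer/inverter count is far from the estimate.

    tolerance is an assumed factor: a 2.0 means 'more than 2x off either way'.
    """
    estimate = estimate_clock_buffers(sync_pins, max_fanout)
    if estimate == 0:
        return actual_cells > 0
    ratio = actual_cells / estimate
    return ratio > tolerance or ratio < 1.0 / tolerance

# Hypothetical clock with 10,000 sync pins and a max fanout constraint of 32.
estimate = estimate_clock_buffers(10_000, 32)
looks_fine = needs_investigation(10_000, 32, actual_cells=350)
looks_wrong = needs_investigation(10_000, 32, actual_cells=1_500)
```

Here the estimate is 313 cells, so 350 inserted cells passes the check while 1,500 is flagged for a closer look.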
In my post above, I explained a few techniques by which we can #analyze the higher latency issue.
Here, I am writing some approaches which will help to #reduce it:
1. We can create a separate skew group for the main clock and for the clock that is pushing out our main clock.
2. We can add balance points to pull the clock in.
3. Sometimes we must explicitly stop a clock (e.g., on an input pin of a mux) if two asynchronous clocks arrive at the two input pins of that mux. Proper case-analysis values on the select pins of the mux also guide the tool to propagate the intended clock through its output pin.
4. We can verify the placement of the clock-related logic. Placing the clock cells according to their logical connections helps in building an optimized clock latency.
A timing path that converges at placement can still violate setup time post-CTS. I am listing the causes I can think of or have faced:
- CTS skew: During placement, the clock tree is ideal, and the uncertainty is defined based on an assumed clock skew of 80-100 ps. Post-CTS, however, we see the actual skew on those paths, which can be larger than the assumption made at placement (150-200 ps). This ultimately leads to a setup violation on that path.
- Crosstalk: At placement, we use the global router to check the overall routing congestion. Intuitively, we assume that a placement with less congestion should have less noise. However, the global route doesn't show a crosstalk noise map. After clock routing, crosstalk delay and noise play a significant role in timing degradation.
- HVT inter-corner delay: Moreover, we use HVT cells for hold fixing, and these inherently have a high inter-corner delay difference. This difference (comparatively lower for LVT/SVT cells) also increases the data path delay, leading to setup violations.
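The CTS skew effect above can be put into numbers with a simplified slack calculation. This is a rough sketch, not an STA engine: it assumes the whole skew works against the path, and the clock period, data delay, and setup time are hypothetical values chosen only for illustration; only the 100 ps and 180 ps skew figures fall in the ranges mentioned above.

```python
def setup_slack_ps(period_ps, data_delay_ps, setup_time_ps, skew_ps):
    """Simplified setup slack, treating skew as a pure penalty on the path."""
    return period_ps - skew_ps - data_delay_ps - setup_time_ps

# Hypothetical path: 1000 ps clock, 820 ps data delay, 60 ps setup time.
PERIOD_PS = 1000.0
DATA_DELAY_PS = 820.0
SETUP_TIME_PS = 60.0

# At placement the skew is only an assumption folded into uncertainty.
slack_at_placement = setup_slack_ps(PERIOD_PS, DATA_DELAY_PS, SETUP_TIME_PS, skew_ps=100.0)
# Post-CTS the real skew on this path turns out to be larger.
slack_post_cts = setup_slack_ps(PERIOD_PS, DATA_DELAY_PS, SETUP_TIME_PS, skew_ps=180.0)
```

With these made-up numbers the path has +20 ps of slack at placement and -60 ps post-CTS, even though the data path itself did not change.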
In most of the circuits that are currently used, power is a major concern.
Here are the different ways which I used to reduce power during the implementation of a design:
Clock gating - To save dynamic switching power, we use multiple clock gates in our clock paths. Clock gates can be introduced in the design both at the RTL and at the implementation (PnR) level. The concept of clock gating aims to stop the clock of those sequential elements whose data is not toggling.
Dynamic voltage & frequency scaling - DVFS is a technique where the clock frequency of a design is decreased to allow a corresponding reduction in the supply voltage of the design. Since the dynamic power consumption of a design is directly proportional to the square of the voltage, we achieve a significant power reduction with this technique.
Power recovery post-implementation - Once the timing of the design is closed, we run power recovery algorithms on our design. These algorithms look for timing paths with positive setup slack and swap the cells in those paths to a higher VT or downsize them. This helps us reduce some of the leakage/dynamic power of the design post-implementation.
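The DVFS point above can be checked with the usual first-order switching-power relation, P = activity x C x V^2 x f. This is a minimal sketch: the activity factor, capacitance, voltages, and frequencies are hypothetical numbers chosen for illustration, and leakage and short-circuit power are ignored.

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order switching power in watts: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz

# Hypothetical block: 0.2 activity factor, 1 nF of switched capacitance.
nominal_w = dynamic_power_w(0.2, 1e-9, voltage_v=1.0, frequency_hz=1.0e9)
# DVFS point: frequency lowered by 20%, which lets the supply drop to 0.8 V.
scaled_w = dynamic_power_w(0.2, 1e-9, voltage_v=0.8, frequency_hz=0.8e9)

power_ratio = scaled_w / nominal_w
```

Because voltage enters as a square, the 20% drop in both voltage and frequency leaves roughly 51% of the original dynamic power in this example, which is a larger saving than frequency scaling alone would give.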