Ankita Agrawal

Sr. Physical Design Engineer


Because of DPT (double patterning) rules, there is at least one week of DRC cleanup after PnR and timing converge. I think the suggestions below are useful for reducing the number of DPT DRCs up front.

The most common type of DPT DRC is the odd cycle violation, which can be fixed by:
  1. Increasing the spacing between the two polygons of a conflicting pair, so they no longer have to be placed on separate colored masks.
  2. Removing one polygon from the cycle (break the loop).
  3. Splitting one polygon of the odd cycle into two pieces and assigning them different colors, which turns the loop into an even cycle (e.g., a cycle of three becomes an even cycle of four). However, we must make sure the two pieces overlap, to allow for lithographic rounding and misalignment, so they still end up as one continuous polygon. This is called a "stitch".
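
The odd-cycle check behind these fixes is graph two-coloring: polygons are nodes, and any pair spaced closer than the same-mask minimum is a conflict edge that forces different masks. Below is a minimal Python sketch, assuming the conflict pairs have already been extracted (the polygon names are hypothetical). A breadth-first search either finds a legal mask assignment or reports a conflict edge that closes an odd cycle.

```python
from collections import deque


def assign_masks(polygons, conflicts):
    """Two-color a DPT conflict graph with breadth-first search.

    polygons:  polygon ids (hypothetical names).
    conflicts: pairs spaced closer than the same-mask minimum spacing,
               so the two polygons must go on different masks.
    Returns (masks, None) on success, or (None, edge) where both ends of
    edge received the same mask, which means an odd cycle exists.
    """
    adj = {p: [] for p in polygons}
    for a, b in conflicts:
        adj[a].append(b)
        adj[b].append(a)

    masks = {}
    for start in adj:
        if start in masks:
            continue
        masks[start] = 0
        queue = deque([start])
        while queue:
            p = queue.popleft()
            for q in adj[p]:
                if q not in masks:
                    masks[q] = 1 - masks[p]  # neighbor goes on the other mask
                    queue.append(q)
                elif masks[q] == masks[p]:
                    return None, (p, q)  # same mask on a conflict edge
    return masks, None


# Odd cycle: three polygons that all conflict with each other.
print(assign_masks(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")]))

# Fix 1: more spacing between A and C removes that conflict edge.
print(assign_masks(["A", "B", "C"], [("A", "B"), ("B", "C")]))

# Fix 3: stitch A into overlapping pieces A1/A2; the pieces need no
# spacing between them, so only the chain A1-B-C-A2 remains.
print(assign_masks(["A1", "A2", "B", "C"],
                   [("A1", "B"), ("B", "C"), ("C", "A2")]))
```

The first call reports the conflict edge ('B', 'C'); the other two return legal assignments, and in the stitched case A1 and A2 land on different masks, closing the even loop of four through the stitch.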

What's the first thing you look at after CTS?

Skew, latency, timing, and routing congestion are the usual answers. My first check is different:
  • I look at the number of buffers/inverters the tool has added for each clock.
  • If this number does not match my calculated estimate, I know something is off and CTS has gone wrong.
  • Based on my experience, I can estimate the number of clock buffers for a clock tree, especially for clocks with a large sink fanout.
  • A rule of thumb to estimate clock buffers is (number of sink pins of the clock ÷ CTS max_fanout constraint).
  • If we see a major difference between this ratio and the actual number of buffers/inverters added, it should be investigated.
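
The rule of thumb can be written down directly. In the Python sketch below, `leaf_buffer_estimate` is the ratio from the post, while `tree_buffer_estimate` is an added assumption, not part of the original rule: it also counts the upper levels of an ideal tree in which every buffer drives up to max_fanout loads. The 50% tolerance in `needs_investigation` and the example clock are hypothetical.

```python
import math


def leaf_buffer_estimate(sink_pins: int, max_fanout: int) -> int:
    """Rule of thumb from the post: sink pins of the clock / CTS max_fanout."""
    return math.ceil(sink_pins / max_fanout)


def tree_buffer_estimate(sink_pins: int, max_fanout: int) -> int:
    """Assumption: ideal tree where each buffer drives up to max_fanout loads,
    so every level needs ceil(loads / max_fanout) buffers until one is left."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    total, loads = 0, sink_pins
    while loads > 1:
        loads = math.ceil(loads / max_fanout)
        total += loads
    return total


def needs_investigation(actual: int, estimate: int, tolerance: float = 0.5) -> bool:
    """Flag the clock when the actual count deviates from the estimate
    by more than tolerance (hypothetical 50%)."""
    return abs(actual - estimate) > tolerance * estimate


# Hypothetical clock: 10,000 sinks with a CTS max_fanout of 32.
leaf = leaf_buffer_estimate(10_000, 32)   # 313
tree = tree_buffer_estimate(10_000, 32)   # 313 + 10 + 1 = 324
print(leaf, tree, needs_investigation(900, tree))  # 313 324 True
```

The leaf ratio dominates for large fanouts, which is why the simple rule is usually close enough to spot a CTS run that has added two or three times the expected count.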

In my previous post, I explained a few techniques to #analyze the high-latency issue.

Here, I am sharing some approaches that will help to #reduce it:

    1. We can create a separate skew group for the main clock and for the clock that is pushing the main clock's latency.
    2. We can add balance points to pull in the clock.
    3. Sometimes we must explicitly stop a clock (e.g., at an input pin of a mux) when two asynchronous clocks arrive at the two input pins of that mux. Proper case-analysis values on the mux select pins also guide the tool to propagate the intended clock through its output pin.
    4. We can verify the placement of the clock-related logic. Placing clock cells according to their logical connectivity helps build an optimized clock latency.

A timing path that converges at placement can still violate setup after CTS. Here are the causes I can think of or have faced:

  • CTS skew: During placement, the clock tree is ideal, and clock uncertainty is defined from an assumed skew of 80-100ps. After CTS, however, we see the actual skew on those paths, which can exceed the placement-stage assumption (e.g., 150-200ps). This ultimately leads to setup violations on those paths.
  • Crosstalk: At placement, we use the global router to check overall routing congestion. Intuitively, we assume that a placement with less congestion will have less noise, but the congestion map does not show crosstalk noise. After clock routing, crosstalk delay and noise play a significant role in degrading timing.
  • HVT inter-corner delay: We also use HVT cells for hold fixing, and these inherently have a large delay variation across corners (comparatively lower for LVT/SVT cells). In the setup corner, this increases the data path delay and leads to setup violations.
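
All three effects land in the same setup-slack equation. Here is a worked Python example with made-up numbers (all in ps): before CTS, the clock is ideal and uncertainty carries a 100ps skew budget plus an assumed 30ps of jitter; after CTS, clocks are propagated with 180ps of skew against the path, uncertainty keeps only the jitter, and the data path picks up an assumed 25ps crosstalk delta and 15ps of extra HVT delay in the setup corner.

```python
def setup_slack(period, launch_latency, capture_latency, data_delay,
                setup_time, uncertainty):
    """Single-cycle setup check: required time minus arrival time (ps)."""
    arrival = launch_latency + data_delay          # data_delay includes clk->Q
    required = period + capture_latency - setup_time - uncertainty
    return required - arrival


# Pre-CTS: ideal clock, skew budgeted inside uncertainty (100 skew + 30 jitter).
pre = setup_slack(period=1000, launch_latency=0, capture_latency=0,
                  data_delay=820, setup_time=40, uncertainty=130)

# Post-CTS: real latencies (capture clock 180ps earlier than launch), jitter
# only, plus hypothetical crosstalk (+25) and HVT setup-corner delay (+15).
post = setup_slack(period=1000, launch_latency=600, capture_latency=420,
                   data_delay=820 + 25 + 15, setup_time=40, uncertainty=30)

print(pre, post)  # 10 -110
```

Of the 120ps lost, 80ps comes from the actual skew exceeding the 100ps budget (180 - 100), and the remaining 40ps from crosstalk and HVT delay.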

In most circuits in use today, power is a major concern.
Here are the different ways I have used to reduce power during design implementation:

Clock gating - To save dynamic switching power, we use clock gates in our clock paths. Clock gates can be introduced in the design both at RTL and at the implementation (PnR) level. Clock gating stops the clock to sequential elements whose data is not toggling.
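
Clock gating is commonly implemented with an integrated clock-gating (ICG) cell: a latch that samples the enable while the clock is low, followed by an AND gate. The Python sketch below is only a behavioral model sampled once per clock phase (the waveforms are hypothetical); it shows the gated clock dropping pulses while enable is low, which is where the switching power is saved.

```python
def icg(clk, enable):
    """Behavioral latch + AND clock gate, one sample per clock phase.

    The latch is transparent while clk is low and holds while clk is high,
    so the gated clock only changes with the clock itself.
    """
    latched = 0
    gclk = []
    for c, en in zip(clk, enable):
        if c == 0:
            latched = en  # latch transparent during the low phase
        gclk.append(c & latched)
    return gclk


def rising_edges(wave):
    """Count 0 -> 1 transitions, i.e. clock edges seen by the flops."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


clk = [0, 1] * 4                      # four clock cycles
enable = [1, 1, 0, 0, 0, 0, 1, 1]     # flops need new data only in cycles 1 and 4
gclk = icg(clk, enable)
print(gclk, rising_edges(clk), rising_edges(gclk))
```

The gated clock delivers 2 of the 4 edges, so the downstream clock pins toggle half as often in this example.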

Dynamic voltage & frequency scaling - DVFS is a technique in which the clock frequency of a design is lowered to allow a corresponding reduction in supply voltage. Since dynamic power is proportional to the square of the supply voltage (and linearly proportional to frequency), this technique achieves a significant power reduction.
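
The switching-power relation is P_dyn = α · C · V² · f (α is the activity factor, C the switched capacitance). A short Python calculation with hypothetical operating points shows why lowering voltage and frequency together pays off more than lowering frequency alone.

```python
def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """Switching power P = alpha * C * V^2 * f, in watts."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical design: 20% activity, 1 nF of switched capacitance.
nominal = dynamic_power(0.2, 1e-9, 1.0, 1.0e9)   # 1.0 V, 1 GHz
scaled = dynamic_power(0.2, 1e-9, 0.8, 0.7e9)    # 0.8 V, 700 MHz

print(f"nominal {nominal:.3f} W, scaled {scaled:.3f} W, "
      f"ratio {scaled / nominal:.3f}")
```

Lowering the frequency alone would give a 0.7x ratio; the voltage drop contributes an extra 0.64x (0.8²), for about 0.45x overall.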

Power recovery post-implementation - Once the timing of the design is closed, we run power recovery algorithms on it. These algorithms look for timing paths with positive setup slack and swap the cells on those paths to higher-VT variants or downsize them. This recovers some of the design's leakage and dynamic power post-implementation.
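
A power-recovery pass can be pictured as a greedy loop over cells on paths with positive slack. The Python sketch below covers only the VT-swap half (not downsizing) and rests on loud simplifications: every cell sits on exactly one path, each swap adds a fixed delay and saves a fixed leakage, and a 5ps slack margin is kept. All cell names, delays, and leakage numbers are hypothetical; a real flow would re-time the design incrementally rather than use fixed deltas.

```python
from dataclasses import dataclass

VT_ORDER = ["LVT", "SVT", "HVT"]  # increasing threshold voltage
# Hypothetical cost of moving one step up in VT: extra delay and leakage saved.
SWAP_DELAY_PS = {"LVT": 8.0, "SVT": 12.0}
SWAP_LEAKAGE_SAVED_NW = {"LVT": 50.0, "SVT": 20.0}


@dataclass
class Cell:
    name: str
    path: str  # simplification: each cell lies on exactly one timing path
    vt: str


def recover_leakage(cells, path_slack_ps, margin_ps=5.0):
    """Greedily swap cells to the next higher VT while their path keeps
    at least margin_ps of setup slack. Mutates cell.vt in place."""
    slack = dict(path_slack_ps)
    saved_nw = 0.0
    changed = True
    while changed:
        changed = False
        for cell in cells:
            if cell.vt == VT_ORDER[-1]:
                continue  # already at the highest VT
            delay = SWAP_DELAY_PS[cell.vt]
            if slack[cell.path] - delay >= margin_ps:
                saved_nw += SWAP_LEAKAGE_SAVED_NW[cell.vt]
                slack[cell.path] -= delay
                cell.vt = VT_ORDER[VT_ORDER.index(cell.vt) + 1]
                changed = True
    return saved_nw, slack


cells = [Cell("u1", "p1", "LVT"), Cell("u2", "p1", "SVT"), Cell("u3", "p2", "LVT")]
saved, slack = recover_leakage(cells, {"p1": 30.0, "p2": 4.0})
print(saved, slack, [c.vt for c in cells])
```

Here u3 stays LVT because its path already has less than the 5ps margin, while path p1 gives up 20ps of slack for 70nW of (hypothetical) leakage savings.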

Copyright © 2021