Probability-Driven Multibit Flip-Flop Integration With Clock Gating
Abstract:
Data-driven clock gated (DDCG) and multibit flip-flops (MBFFs) are two low-power design techniques that are usually treated separately. Combining these techniques into a single grouping algorithm and design flow enables further power savings. We study MBFF multiplicity and its synergy with FF data-to-clock toggling probabilities. A probabilistic model is implemented to maximize the expected energy savings by grouping FFs in increasing order of their data-to-clock toggling probabilities. We present a front-end design flow, guided by physical layout considerations for a 65-nm 32-bit MIPS and a 28-nm industrial network processor. It is shown to achieve the power savings of 23% and 17%, respectively, compared with designs with ordinary FFs. About half of the savings was due to integrating the DDCG into the MBFFs. The proposed architecture of this paper analysis the logic size, area and power consumption using Tanner tool.
Existing System:
The data of digital systems are usually stored in flip-flops (FFs), each of which has its own internal clock driver. In an attempt to reduce the clock power, several FFs can be grouped into a module called a multibit FF (MBFF) that houses the clock drivers of all the underlying FFs. We denote the grouping of kFFs into an MBFF by a k-MBFF. Kapoor et al. reported a 15% reduction of the total dynamic power in a 90-nm processor design. Electronic design automation tools, such as Cadence Liberate, support MBFF characterization. The benefits of MBFFs do not come for free. By sharing common drivers, the clock slew rate is degraded, thus causing a larger short-circuit current and a longer clock-to-Q propagation delay tpCQ.
To remedy this, the MBFF internal drivers can be strengthened at the cost of some extra power. It is therefore recommended to apply the MBFF at the RTL design level to avoid the timing closure hurdles caused by the introduction of the MBFF at the backend design stage. Due to the fact that the average data-to-clock toggling ratio of FFs is very small, which usually ranges from 0.01 to 0.1, the clock power savings always outweigh the short-circuit power penalty of the data toggling. An MBFF grouping should be driven by logical, structural, and FF activity considerations. While FFs grouping at the layout level have been studied thoroughly, the front-end implications of MBFF group size and how it affects clock gating (CG) has attracted little attention. This brief responds to two questions. The first is what the optimal bit multiplicity k of data-driven clock-gated (DDCG) MBFFs should be. The second is how to maximize the power savings based on data-to-clock toggling ratio (also termed activity and data toggling probability).
Disadvantages:
- Power consumption is high
Proposed System:
Clearly, the best grouping of FFs that minimizes the energy consumption can be achieved for FFs whose toggling is highly correlated. Using toggling correlations for MBFF grouping has the drawback of requiring early knowledge of the value change dump vectors of a typical workload. Such data may not exist in the early design stage. More commonly available information is the average toggling bulk probability of each FF in the design, which can be estimated from earlier designs or the functional knowledge of modules. FFs’ toggling probabilities are usually different from each other. An important question is therefore how they affect their grouping. We show below that data-to-clock toggling probabilities matter and should be considered for energy minimization.
Figure 1: DDCG integrated into ak-MBFF
Capturing everything in a design flow:
In the following paragraphs, we combine the activity pand the MBFF multiplicityk in a design flow aimed at minimizing the expected wasted energy. Fig. 2(a)–(c) illustrates that the power savings of the 2-MBFF, 4-MBFF, and 8-MBFF, respectively, are used. Knowing the activitypof an FF, the decision as to which MBFF size kit best fits follows the interim lines, lines (d). To obtain the per-bit power consumption, lines (d) in Fig. 2(a)–(c), representing an MBFF realistic operation, were divided by their respective multiplicity. The result is shown in Fig. 3.
Figure 2: Power consumption ofk1-bit FFs compared to k-MBFF: 2-MBFF (a), 4-MBFF (b) and 8-MBFF (c). Line (a) is the power consumed byk1-bit FFs driven independently of each other. Line (b) is the ideal case of simultaneous (identical) toggling. Line (c) is the worst case of exclusive (disjoint) toggling. Line (d) is an example of realistic toggling.
To maximize the power savings, Fig. 3 divides the range of FF activity into regions. The black line follows the power consumed by a 1-bit ungated FF. The triangular areas bounded by the black line and each of the green, blue, and red per-bit lines show the amount of power savings per activity obtained by grouping an FF in the 2-MBFF, 4-MBFF, and 8-MBFF, respectively. It shows that for a very low activity, it pays to group FFs into an 8-MBFF. As activity increases, there will be some point where the 4-MBFF overtakes and pays off more than the 8-MBFF. At some higher activity, the 2-MBFF overtakes and pays off more than the 4-MBFF, up to an activity where the power savings stops. The remaining FFs can be grouped into ungated MBFFs, simply to reduce the number of internal.
Figure 3: Division of the activity into ranges of maximal savings.
A few practical comments are in order. The grouping should not cross clock domains. The clock enable signals introduced by the RTL synthesis and manually by designers are untouched. Groupings should also consider logical relations and practical layout concerns. One example is the pipeline registers of a microprocessor, which are natural candidates for MBFF implementation (see Section V). It is expected that the place and route tool will locate bits belonging to the same register close to each other, whereas FF clusters of registers belonging to distinct pipeline stages will be placed away from each other. FFs belonging to different pipeline registers should therefore not be mixed in an MBFF. Similar arguments hold for other system buses and registers such as those storing data, addresses, counters, statuses, and the like. Another example is the FFs of finite-state machines, whose MBFF grouping should not cross control logic borders.
Finally, the aforementioned post-placement MBFF clustering must consider the timing constraints, which are built into their algorithms. By contrast, the MBFF grouping algorithm does not require explicit timing constraints since it works at the RTL design level. In order to bridge the gap between the RTL grouping and the grouping driven by backend timing-closure considerations, we suggested appropriate DDCG design flow. The main idea involves providing “natural” physical layout directives for FF grouping by employing a prior placement. clock drivers.
Advantages:
- Power consumption is low
Software implementation:
- Tanner tool