Temporarily Fine-Grained Sleep Technique for Near- and Subthreshold Parallel Architectures
Abstract:
This paper presents a design approach for improving the energy efficiency and throughput of parallel architectures in near- and subthreshold voltage circuits. The focus is on suppressing the leakage energy dissipated by idle portions of circuits during active modes, so that the throughput improvement from parallel architectures can be converted wholly into energy savings via deep voltage scaling. We begin by investigating the efficacy of parallel and pipeline architectures in near- and subthreshold circuits. The investigation reveals that active-leakage energy dissipation largely undermines the ability of deep voltage scaling to transform excess throughput into energy savings. Techniques such as power-gating switches (PGSs) can mitigate active-leakage power dissipation; however, the overhead of entering and exiting sleep modes can offset the energy savings provided by sleep mode, particularly when sleep time is made fine grained to suppress active leakage. Therefore, in this paper, we propose a PGS design technique, inspired by the so-called zigzag super cutoff CMOS, that minimizes the mode-transition overheads of PGS in near- and subthreshold circuits. The proposed technique allows circuits to stay in sleep mode for as short as a single clock cycle with negligible energy and delay overheads. We apply the proposed design to parallel multiplier-based test circuits operating at near- and subthreshold voltages. Simulations show a significant improvement in energy efficiency over baselines at the same throughput. The proposed architecture is analyzed for logic size, area, and power consumption using the Tanner tool.
Existing System:
One of the most effective approaches to reducing power dissipation is to design digital circuits to operate at supply voltages (VDD) scaled from nominal to near or below the transistor threshold voltage (Vth). Such designs are often referred to as near- and subthreshold voltage circuits, and they can provide approximately one to two orders of magnitude savings in energy dissipation. Furthermore, we can employ parallel and pipeline architectures and, by scaling VDD, trade off the throughput improvements from those architectural techniques for higher energy efficiency. Several classic studies show that such combinations can improve throughput, energy efficiency, or both. The existing works on parallel and pipelined architectures, however, focus on nominal-VDD designs and pay little or no attention to a crucial issue that has greater significance in near- and subthreshold circuits: active-leakage dissipation. As VDD is scaled from nominal to near- and subthreshold levels, increasingly slow circuits accumulate more leakage energy per clock cycle. Eventually, leakage energy dissipation starts to offset the quadratic savings in dynamic energy dissipation. Active-leakage energy dissipation, consequently, becomes critical to runtime computing energy efficiency. The VDD level below which the total energy consumption starts to increase is defined as the energy-optimal voltage (VOPT). The energy consumption at VOPT is denoted by EOPT.
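The trade-off that produces VOPT can be illustrated with a minimal sketch. It assumes a textbook-style model that is not taken from the paper: dynamic energy scales as C·VDD², cycle time follows an EKV-style on-current interpolation, and leakage current is treated as independent of VDD. Every constant below is a hypothetical placeholder chosen only so that the minimum falls inside the swept range.

```python
import math

# Illustrative assumptions only; none of these values come from the paper.
VTH = 0.4            # threshold voltage (V)
N_VT = 1.5 * 0.026   # slope factor times thermal voltage (V)
C_EFF = 1e-12        # capacitance switched per cycle (F)
C_PATH = 1e-12       # capacitance setting the critical-path delay (F)
I_SPEC = 1e-4        # on-current scale factor (A)
I_LEAK = 1e-6        # total leakage current, assumed VDD-independent (A)


def on_current(vdd):
    """EKV-style interpolation: exponential below VTH, quadratic above."""
    return I_SPEC * math.log(1.0 + math.exp((vdd - VTH) / (2.0 * N_VT))) ** 2


def cycle_time(vdd):
    """Clock period, modeled as C*V/I for the critical path."""
    return C_PATH * vdd / on_current(vdd)


def energy_per_cycle(vdd):
    """Dynamic energy plus the leakage energy accumulated over one cycle."""
    e_dynamic = C_EFF * vdd ** 2
    e_leakage = I_LEAK * vdd * cycle_time(vdd)
    return e_dynamic + e_leakage


def find_vopt(v_min=0.15, v_max=1.0, steps=851):
    """Sweep VDD on a uniform grid and return (VOPT, EOPT) for this model."""
    grid = [v_min + i * (v_max - v_min) / (steps - 1) for i in range(steps)]
    v_opt = min(grid, key=energy_per_cycle)
    return v_opt, energy_per_cycle(v_opt)


if __name__ == "__main__":
    v_opt, e_opt = find_vopt()
    print(f"VOPT = {v_opt:.3f} V, EOPT = {e_opt:.3e} J")
```

In this model, lowering VDD below the minimum makes the cycle time grow exponentially, so the leakage term grows faster than the dynamic term shrinks. Reducing I_LEAK moves the minimum to a lower voltage and a lower energy, which is the motivation for suppressing active leakage.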
In order to improve computing energy efficiency beyond the conventional limit set by VOPT, it is of great importance to reduce leakage energy dissipation during active modes. One potential solution is to place idle parts of circuits into a low-leakage (sleep) mode. For example, although not targeting ultralow-voltage (ULV) circuits, Hu et al. and Tschanz et al. have proposed a power-gating switch (PGS) for each function block of an execution stage of a pipelined microprocessor. By opportunistically putting blocks that perform no useful work into a sleep mode, we can reduce active-leakage waste. While a low-leakage sleep mode is a valid approach, frequently entering and exiting it (i.e., mode transitions) to suppress active-leakage dissipation can incur non-negligible energy and delay overheads. A PGS, for example, can consume a significant amount of dynamic energy to charge and discharge parasitic capacitances during mode transitions. Furthermore, the delay required to transition from sleep to active mode can degrade throughput and complicate sleep control.
Disadvantages:
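- Energy consumption is high: active leakage offsets the savings from voltage scaling, and conventional PGS mode transitions add energy and delay overheads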
Proposed System:
Increasing hardware-level parallelism and pipelining are notable architectural strategies for enhancing computing throughput. These approaches can also provide significant energy-efficiency gains, since the improved throughput can be traded off for energy savings via voltage scaling.
We investigate the efficacy of those architectures in near- and subthreshold circuits. For nominal-voltage circuits, the classic study has shown that parallelism and pipelining allow the use of lower supply voltages at the same throughput, improving energy efficiency. Its results show that two-way parallel and two-stage pipeline architectures can reduce VDD from 5 to 2.9 V and achieve energy savings of 2.5× and 2.8×, respectively, over a baseline design. As we show shortly, however, those architectures become significantly less effective at improving energy efficiency in near- and subthreshold circuits.
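As a rough sanity check on the nominal-voltage figures, the sketch below computes the ideal dynamic-energy ratio for the same voltage reduction. It assumes energy scales purely with VDD² and ignores any capacitance added by duplicated hardware, multiplexers, or pipeline registers. The function name is hypothetical.

```python
def ideal_dynamic_energy_ratio(v_nominal, v_scaled):
    """Energy ratio if dynamic energy scales as VDD^2 with no added capacitance."""
    return (v_nominal / v_scaled) ** 2


# Scaling from 5 V to 2.9 V, the voltages quoted in the text.
ratio = ideal_dynamic_energy_ratio(5.0, 2.9)
print(f"ideal saving: {ratio:.2f}x")
```

The ideal ratio is slightly below 3×, so the reported 2.5× and 2.8× savings sit under this bound, which is what one would expect if the extra circuitry consumes part of the gain.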
Figure 1: Three test architectures based on a 16-bit multiplier. (a) Baseline, (b) two-stage pipelined, and (c) two-way parallel designs. Dashed lines: boundaries of the equivalent sequencing stage across the three designs.
Efficacy of Two-Way Parallel Architecture:
In order to investigate parallelism and pipelining in near- and subthreshold circuits, we use three test architectures based on 16-bit array multipliers in a 65-nm general-purpose CMOS. The baseline version, shown in Fig. 1(a), consists of 32 input D flip-flops and an array multiplier, and operates at the maximum clock frequency (FCLK,BASE) at each VDD. Fig. 1(b) shows the two-stage pipeline architecture, which consists of 32 input and pipeline flip-flops. The two-stage pipeline can halve the critical path delay, which allows us to use the same clock frequency (FCLK,PIPE = FCLK,BASE) at a lower VDD. This lower VDD can improve energy efficiency. Finally, Fig. 1(c) shows the two-way parallel architecture. This design includes a 32-bit 2-to-1 multiplexer to recombine the outputs of the two multipliers (Multipliers 1 and 2). In the parallel architecture, new inputs arrive at FCLK,BASE, but computation is interleaved by clocking the input flip-flops of each multiplier at FCLK,PARA, which is half of FCLK,BASE. Although the clock frequency is reduced, throughput is maintained. This slack, provided by parallelism, enables us to reduce VDD to increase power and energy savings. The energy dissipation of the output flip-flops is not included.
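The interleaving in the two-way parallel design can be sketched behaviorally as follows. This is a functional illustration only, not a circuit or timing model, and the function name is hypothetical: even-numbered inputs go to Multiplier 1, odd-numbered inputs to Multiplier 2, and a 2-to-1 multiplexer restores the original order.

```python
def multiply_interleaved(operand_pairs):
    """Route inputs alternately to two multipliers, then recombine in order.

    Returns the merged output stream and the per-multiplier output lists.
    """
    lane_outputs = ([], [])  # outputs of Multiplier 1 and Multiplier 2
    for index, (a, b) in enumerate(operand_pairs):
        lane = index % 2  # each lane receives every other input
        # A 16-bit by 16-bit product fits in 32 bits; the mask models the bus width.
        lane_outputs[lane].append((a * b) & 0xFFFFFFFF)

    # 2-to-1 multiplexer: alternate between the lanes to rebuild the stream.
    merged = [lane_outputs[i % 2][i // 2] for i in range(len(operand_pairs))]
    return merged, lane_outputs


if __name__ == "__main__":
    pairs = [(3, 5), (7, 11), (65535, 65535), (2, 4)]
    merged, lanes = multiply_interleaved(pairs)
    print(merged)
    print([len(lane) for lane in lanes])
```

Each lane handles half of the inputs, which is why each multiplier can be clocked at half of FCLK,BASE while the merged stream still delivers one result per input.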
Mode Transition Overhead of PGS:
Fig. 2(a) shows the conventional nMOS-based PGS design, and Fig. 2(b) shows the transient behavior of the virtual ground potential (VVG) and the power dissipation of the main circuits when entering, staying in, and exiting sleep mode. When the SLEEP BAR (SLPB) signal transitions to logic LOW (i.e., entering sleep mode), VVG starts to rise to a level close to VDD. The elapsed time associated with this transition is defined as the time to sleep (T2SLP). After that, the circuits reach deep sleep, in which they consume a small leakage power, referred to as PSLEEP. In order to exit sleep mode, the SLPB signal is set HIGH, and each node of the main circuits, including the virtual ground node (VG), returns to its stable state. The transition time from sleep to active mode is defined as the wake-up time (T2WKU). The total sleep time, TSLP, is defined as the sum of T2SLP, TSLEEP (the time spent in deep sleep), and T2WKU. The energy dissipated during a mode transition is defined as ETRAN.
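The quantities defined above suggest a simple break-even calculation, sketched below under stated assumptions. It assumes that ETRAN lumps together all energy spent during both the sleep-entry and wake-up transitions, and that a block which stays awake but idle dissipates a constant leakage power, here called p_idle_leak. That parameter and both function names are hypothetical and do not appear in the source.

```python
def net_sleep_saving(p_idle_leak, p_sleep, t2slp, t_sleep, t2wku, e_tran):
    """Energy saved by sleeping instead of idling for TSLP (negative = loss)."""
    t_slp = t2slp + t_sleep + t2wku            # total sleep time, TSLP
    e_stay_awake = p_idle_leak * t_slp         # leakage if the block never sleeps
    e_sleeping = p_sleep * t_sleep + e_tran    # deep-sleep leakage plus ETRAN
    return e_stay_awake - e_sleeping


def break_even_sleep_time(p_idle_leak, p_sleep, t2slp, t2wku, e_tran):
    """Deep-sleep duration TSLEEP at which the net saving is exactly zero."""
    if p_idle_leak <= p_sleep:
        raise ValueError("sleep mode must leak less than the idle active mode")
    deficit = e_tran - p_idle_leak * (t2slp + t2wku)
    return max(0.0, deficit / (p_idle_leak - p_sleep))


if __name__ == "__main__":
    # Placeholder values for illustration, not measurements.
    params = dict(p_idle_leak=1e-6, p_sleep=1e-8, t2slp=1e-8, t2wku=1e-8, e_tran=1e-13)
    t_be = break_even_sleep_time(**params)
    print(f"break-even TSLEEP = {t_be:.3e} s")
    print(f"saving at 10x break-even = {net_sleep_saving(t_sleep=10 * t_be, **params):.3e} J")
```

Under this model, a smaller ETRAN shortens the break-even time, which is the condition needed for sleep intervals as short as a single clock cycle to pay off.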
Figure 2: (a) Main circuits (two inverters) with an nMOS PGS, showing the critical discharging path during a wake-up process. (b) Timing and energy overheads during a mode transition.
Advantages:
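- Energy consumption is low: idle circuits can sleep for as short as a single clock cycle with negligible mode-transition energy and delay overheads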
Software implementation:
- Tanner tool