Sense Amplifier Half-Buffer (SAHB): A Low-Power High-Performance Asynchronous Logic QDI Cell Template

Abstract:

We propose a novel asynchronous logic (async) quasi-delay-insensitive (QDI) sense-amplifier half-buffer (SAHB) cell design approach, with emphasis on high operational robustness, high speed, and low power dissipation. There are five key features of our proposed SAHB. First, the SAHB cell embodies the async QDI 4-phase (4φ) signaling protocol to accommodate process–voltage–temperature (PVT) variations. Second, the sense amplifier (SA) block in SAHB cells embodies a cross-coupled latch with a positive feedback mechanism to speed up the output evaluation. Third, the evaluation block in the SAHB comprises both nMOS pull-up and pull-down networks with minimum transistor sizing to reduce the parasitic capacitance. Fourth, the evaluation block and SA block are tightly coupled to reduce redundant internal switching nodes. Fifth, the SAHB cell is designed in CMOS static logic and is hence appropriate for full-range dynamic voltage scaling operation, with VDD ranging from nominal voltage (1 V) to subthreshold voltage (∼0.3 V). When six library cells embodying our proposed SAHB are compared with those embodying the conventional async QDI pre-charged half-buffer (PCHB) approach, the proposed SAHB cells collectively feature ∼64% lower power, ∼21% higher speed, and ∼6% smaller IC area; the PCHB cell is inappropriate for subthreshold operation. A prototype 64-bit Kogge–Stone pipeline adder based on the SAHB approach (at 65 nm CMOS) is designed. For a 1-GHz throughput and at nominal VDD, the design based on the SAHB approach simultaneously features ∼56% lower energy and ∼24% lower transistor count than its PCHB counterpart. When benchmarked against the ubiquitous synchronous logic counterpart, our SAHB dissipates ∼39% lower energy at the 1-GHz throughput. The proposed architecture is analyzed for logic size, area, and power consumption using the Tanner tool.

Existing System:

Fig. 1 broadly classifies digital logic for the realization of operationally robust digital circuits. In the highest classification, there are the sync and async digital logic design philosophies. As the sync digital logic design philosophy requires timing assumptions associated with the clock (e.g., clock skews and setup/hold times), realizing operationally robust circuits under large PVT variations is challenging, and large timing margins are required to accommodate the worst-case conditions. In contrast, the async digital logic design philosophy, particularly the quasi-delay-insensitive (QDI) approach, is an alternative approach to mitigate the timing assumptions.

Figure 1: General classification of digital logic circuits

There are nevertheless other challenges, which will be discussed in the following two paragraphs.

In Fig. 1, the classifications within the async digital logic design philosophy are depicted. From the perspective of the timing approach classification, there are three async types: 1) delay-insensitive (DI); 2) bundled-data (BD); and 3) QDI/timed-pipeline (TP)/single-track (ST). The first type, DI circuits, is largely impractical because such circuits make no assumption on the gate/wire delays, leading to circuit realizations comprising only buffer cells and C-Muller cells. The second type, BD circuits, is similar to sync circuits in requiring delay assumptions for circuit realization. As their operation relies on bounded gate/wire delays, as in sync circuits, it is somewhat challenging to guarantee their operational robustness in unknown operating conditions. The third type, QDI, TP, and ST circuits, is grouped together because of the similar completion detection mechanisms. QDI circuits operate error-free for arbitrary wire delays and assume isochronic forks, i.e., the same wire delays are assumed for different branches. This assumption can be satisfied easily in the placement and routing stage. On the other hand, although TP circuits and ST circuits have completion detection mechanisms, they require delay assumptions for their circuit realizations. These delay assumptions consequently reduce the reliability of their circuits in unknown operating conditions. In short, as the QDI async approach detects the completion of data according to actual workloads and/or operating conditions, it offers the most practical approach to accommodate unknown PVT variations.

Disadvantages:

  • Low speed
  • High energy

Proposed System:

We further describe a 64-bit Kogge–Stone (KS) pipeline adder embodying the proposed SAHB approach for a power management application. Our SAHB pipeline adder is experimentally verified to be operationally robust within a wide supply voltage range (0.3 to 1.4 V) and a wide temperature range (−40 °C to 100 °C). When benchmarked against its competing async PCHB and sync equivalents (at 1-GHz throughput), our SAHB pipeline adder is more energy efficient.

Figure 2: SAHB cell template.

Sense Amplifier Half-Buffer:

Fig. 2 depicts the generic interface signals for the proposed dual-rail SAHB cell template. The data inputs are Datain and nDatain, and the data outputs are Q.T/Q.F and nQ.T/nQ.F. The left-channel handshake outputs are Lack and nLack, and the right-channel handshake inputs are Rack and nRack. nDatain, nQ.T, nQ.F, nLack, and nRack are the logical complements of the primary input/output signals Datain, Q.T, Q.F, Lack, and Rack, respectively. For the sake of brevity, we will only use the primary input/output signals to delineate the operations of an SAHB cell. The SAHB cell strictly abides by the async 4-phase (4φ) handshake protocol, which has two alternating operation sequences: evaluation and reset. Initially, Lack and Rack are reset to 0 and both Datain and Q.T/Q.F are empty, i.e., both of the rails in each signal are 0. During the evaluation sequence, when Datain is valid (i.e., one of the rails in each signal is 1) and Rack is 0, Q.T/Q.F is evaluated and latched, and Lack is asserted to 1 to indicate the validity of the output. During the reset sequence, when Datain is empty and Rack is 1, Q.T/Q.F becomes empty and Lack is deasserted to 0. Subsequently, the SAHB cell is ready for the next operation.
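The evaluation and reset sequences described above can be sketched as a small behavioral model. This is only an illustration at the signal level, not the transistor-level circuit: the class name SahbBufferModel and the method step are hypothetical, only the primary signals (Datain, Q.T/Q.F, Lack, Rack) are modeled, and holding the previous state in every case the text does not describe is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SahbBufferModel:
    """Behavioral sketch of a dual-rail SAHB buffer cell (hypothetical name)."""
    q_t: int = 0   # Q.T rail, initially empty
    q_f: int = 0   # Q.F rail, initially empty
    lack: int = 0  # left-channel acknowledge, initially 0

    def step(self, d_t: int, d_f: int, rack: int) -> Tuple[int, int, int]:
        """Apply Datain = (d_t, d_f) and Rack; return (Q.T, Q.F, Lack)."""
        if d_t and d_f:
            raise ValueError("both rails at 1 is not a legal dual-rail code")
        valid = bool(d_t or d_f)  # exactly one rail is 1
        if valid and rack == 0:
            # Evaluation: latch the input and assert Lack.
            self.q_t, self.q_f, self.lack = d_t, d_f, 1
        elif not valid and rack == 1:
            # Reset: output becomes empty and Lack is deasserted.
            self.q_t, self.q_f, self.lack = 0, 0, 0
        # Assumption: in all other cases the cell holds its state.
        return self.q_t, self.q_f, self.lack


cell = SahbBufferModel()
trace = [
    cell.step(1, 0, rack=0),  # evaluation with Datain = logic 1
    cell.step(1, 0, rack=1),  # right channel acknowledges; state is held
    cell.step(0, 0, rack=1),  # reset
    cell.step(0, 0, rack=0),  # idle, ready for the next operation
]
print(trace)
```

Each tuple in trace is (Q.T, Q.F, Lack) after one step, so one full 4φ cycle can be read off directly.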

Figure 3: Circuit schematic of a buffer cell embodying SAHB. (a) Evaluation block powered by VDD_L. (b) SA block powered by VDD.

For illustration, Fig. 3(a) and (b) depict the respective circuit schematics of an evaluation block and an SA block of a buffer cell embodying SAHB; the various sub-blocks are shown within the dotted blocks. The evaluation and SA blocks are powered, respectively, by VDD_L and by VDD, which can be the same or different voltages (see Section II-B). The nMOS transistor in green with RST is optional and is used for cell initialization. In Fig. 3(a), the evaluation block comprises an nMOS pull-up network and an nMOS pull-down network to, respectively, evaluate and reset the dual-rail output Q.T/Q.F. Of particular interest, the nMOS pull-up network features low parasitic capacitance (lower than that of the usual pMOS pull-up network, whose transistor sizing is often 2× larger than that of the nMOS).

Figure 4: Dual-rail SAHB library cells. (a) Two-input AND/NAND. (b) Two-input XOR/XNOR. (c) Three-input AO/AOI

Fig. 4(a)–(c) depict the circuit schematics of three basic SAHB library cells: 1) two-input AND/NAND; 2) two-input XOR/XNOR; and 3) three-input AO/AOI cells. The logic functions of the pull-up network for AND/NAND, XOR/XNOR, and AO/AOI cells are, respectively, expressed in (2), (3), and (4). Similar to the buffer cell, the structures of the evaluation block and SA block of these cells are constructed based on their logic functions and input signals. These library cells will be used for benchmarking and for realizing the 64-bit SAHB pipeline adder.
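As a rough illustration of what these dual-rail cells compute, the sketch below models the logical behavior of the three functions with the generic dual-rail encoding. It does not reproduce the pull-up network expressions (2) to (4). The helper names (encode, dual_rail, and2, xor2, ao21) are hypothetical, the AO function is assumed to be A·B + C, and the output is assumed to stay empty until every input is valid.

```python
EMPTY = (0, 0)  # dual-rail "no data" code, written as (T rail, F rail)


def encode(bit):
    """Encode a Boolean value as a valid dual-rail pair (T, F)."""
    return (1, 0) if bit else (0, 1)


def is_valid(signal):
    """A dual-rail signal is valid when exactly one rail is 1."""
    return signal in ((1, 0), (0, 1))


def dual_rail(function):
    """Wrap a Boolean function as an input-complete dual-rail cell model."""
    def cell(*inputs):
        # Assumption: no early output; wait until all inputs are valid.
        if not all(is_valid(signal) for signal in inputs):
            return EMPTY
        # The T rail of each input carries its Boolean value.
        return encode(function(*(signal[0] for signal in inputs)))
    return cell


and2 = dual_rail(lambda a, b: a & b)
xor2 = dual_rail(lambda a, b: a ^ b)
ao21 = dual_rail(lambda a, b, c: (a & b) | c)  # assumed AO form: A.B + C

# The complementary NAND/XNOR/AOI result is the same pair with rails swapped.
q = and2(encode(1), encode(0))
print("AND:", q, "NAND:", q[::-1])
print("AND with one input still empty:", and2(encode(1), EMPTY))
```

Swapping the two rails of a valid output yields the complementary function, which is why each cell is named as a pair such as AND/NAND.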

Circuit Configuration and Supply Voltage Setup:

In the evaluation block, there are two ways to configure the connection of the transistors for a multiple-input SAHB cell. Fig. 5(a) and (b) depict two different circuit configurations for Q.F of the two-input AND/NAND SAHB cell. Of these, the configuration in Fig. 5(a) is adopted in the cell library for its lower transistor count; in this configuration, Q.F will be partially charged up to VDD_L when either A.F or B.F is 1. The voltage levels of the supplies VDD_L and VDD are therefore critical to preventing an early output transition before all the inputs (A.F and B.F) are valid.

Figure 5: Circuit configurations in a two-input SAHB AND/NAND cell. (a) Transistors are shared and (b) transistors are not shared. The drawings depict the scenario when only input A is valid.

Advantages:

  • High speed
  • Low power consumption

Software implementation:

  • Tanner EDA