Dual-Quality 4:2 Compressors for Utilizing in Dynamic Accuracy Configurable Multipliers
Abstract:
In this paper, we propose four 4:2 compressors, which have the flexibility of switching between the exact and approximate operating modes. In the approximate mode, these dual-quality compressors provide higher speed and lower power consumption at the cost of lower accuracy. Each of these compressors has its own level of accuracy in the approximate mode as well as different delays and power dissipations in the approximate and exact modes. Using these compressors in the structures of parallel multipliers provides configurable multipliers whose accuracies (as well as their powers and speeds) may change dynamically during the runtime. The efficiencies of these compressors in a 32-bit Dadda multiplier are evaluated in a 45-nm standard CMOS technology by comparing their parameters with those of the state-of-the-art approximate multipliers. The comparison results indicate, on average, 46% lower delay and 68% lower power consumption in the approximate mode. Also, the effectiveness of these compressors is assessed in some image processing applications. The proposed architecture is analyzed in terms of logic size, area, and power consumption using Xilinx 14.2.
Existing System:
While there are many works on designing approximate multipliers, the research efforts on accuracy configurable approximate multipliers are limited. In this section, we review some of these works. In [10], a static segment method (SSM) is presented, which performs the multiplication operation on an m-bit segment starting from the leading 1 bit of the input operands, where m is equal to or greater than n/2. Hence, an m×m multiplier consumes much less energy than an n×n multiplier. Also, a dynamic range unbiased multiplier (DRUM), which selects an m-bit segment starting from the leading 1 bit of the input operands and sets the least significant bit of the truncated values to “1,” has been proposed. In this structure, the truncated values are multiplied and shifted to the left to generate the final output. Although, by exploiting smaller values for m, this structure provides higher accuracy designs than those of [10], its approach requires extra complex circuitry.
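The segment selection described for the DRUM-style multiplier can be sketched behaviorally in Python. This is a minimal sketch under stated assumptions, not the circuit of the cited work: the names `truncate_operand` and `drum_style_multiply` are hypothetical, operands are assumed to be unsigned integers, and an operand that already fits in m bits is assumed to pass through unchanged.

```python
from typing import Tuple


def truncate_operand(value: int, m: int) -> Tuple[int, int]:
    """Return (segment, shift) for an unsigned operand.

    The segment holds the m bits starting at the leading 1 bit, with its
    least significant bit forced to 1. Assumption: an operand that already
    fits in m bits is returned unchanged with a shift of 0.
    """
    length = value.bit_length()
    if length <= m:
        return value, 0
    shift = length - m
    return (value >> shift) | 1, shift


def drum_style_multiply(a: int, b: int, m: int) -> int:
    """Multiply the two m-bit segments, then shift the product back left."""
    seg_a, shift_a = truncate_operand(a, m)
    seg_b, shift_b = truncate_operand(b, m)
    return (seg_a * seg_b) << (shift_a + shift_b)


# Compare the approximate product with the exact one for a sample pair.
exact = 200 * 100
approx = drum_style_multiply(200, 100, m=4)
print(exact, approx, abs(approx - exact) / exact)
```

Only an m×m product is computed, which is where the energy saving over an n×n multiplier comes from. The leading-one detection and the shifters needed around it correspond to the extra circuitry mentioned above.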
A bioinspired approximate multiplier, called the broken array multiplier, has been proposed. In this structure, some carry save adder cells, in both vertical and horizontal directions during the summation of the partial products, have been omitted to save power and area and to reduce the delay. Two approximate 4:2 compressors have been proposed and utilized in a Dadda multiplier. These compressors operated only in the approximate mode. By modifying the Karnaugh map of a 2×2 multiplier (omitting one term in the Karnaugh map), an approximate 2×2 multiplier with a simpler structure has been proposed. This block may be used for constructing larger multipliers. In the same work, an error detection and correction (EDC) circuit has also been proposed. An inaccurate multiplier design strategy based on redesigning the multiplier into two multiplication and non-multiplication parts was introduced. The multiplication part was constructed based on the conventional multipliers, while the non-multiplication part was implemented in an approximate structure with a specified value of error. It should be noted that both of the approaches presented suffer from high relative errors.
Figure 1: Block diagram of 4:2 compressor.
Figure 2: Structure of the conventional 4:2 compressor.
A high accuracy approximate 4×4 Wallace tree multiplier was proposed. This multiplier employed a 4:2 approximate counter, leading to delay and power reductions in the partial product stage of the 4×4 Wallace tree. In that work, the proposed small multiplier was used to form larger multipliers. Due to the array structure of this approximate multiplier, its delay was large. In addition, an EDC unit was suggested for use at the output of the approximate 4×4 Wallace tree. The unit generated the exact output in the exact operating mode. By proposing an approximate adder with a small carry propagation delay, the partial product reduction stage was sped up. In that work, an OR-gate-based error reduction unit was also proposed. A rounding-based approximate multiplier (ROBA) has been proposed that rounds the input operands to the nearest exponent of two. This way, the multiplication operation became simpler. It should be noted that the error recovery unit increases the power consumption and delay of the multiplier. This implies that accuracy configurable multipliers would have large delay and power overheads.
Exact 4:2 Compressor:
To reduce the delay of the partial product summation stage of parallel multipliers, 4:2 and 5:2 compressors are widely employed. The focus of this paper is on approximate 4:2 compressors. First, some background on the exact 4:2 compressor is presented. This type of compressor, shown schematically in Fig. 1, has four inputs (x1–x4) along with an input carry (Cin), and two outputs (sum and carry) along with an output Cout.
The internal structure of an exact 4:2 compressor is composed of two serially connected full adders, as shown in Fig. 2. In this structure, the weights of all the inputs and the sum output are the same, whereas the weights of the carry and Cout outputs are one binary bit position higher. The outputs sum, carry, and Cout are obtained from
sum = x1 ⊕ x2 ⊕ x3 ⊕ x4 ⊕ Cin    (1)
carry = (x1 ⊕ x2 ⊕ x3 ⊕ x4) Cin + (x1 ⊕ x2 ⊕ x3 ⊕ x4)’ x4    (2)
Cout = (x1 ⊕ x2) x3 + (x1 ⊕ x2)’ x1.    (3)
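Equations (1)–(3) can be checked exhaustively with a short Python sketch. The function names `exact_compressor` and `check_all_inputs` are hypothetical. The check relies only on the weights stated above: for every one of the 32 input patterns, sum + 2(carry + Cout) should equal the number of inputs that are 1.

```python
from itertools import product


def exact_compressor(x1, x2, x3, x4, cin):
    """Evaluate Eqs. (1)-(3) on single-bit inputs (each 0 or 1)."""
    x = x1 ^ x2 ^ x3 ^ x4
    total = x ^ cin                                    # Eq. (1)
    carry = (x & cin) | ((x ^ 1) & x4)                 # Eq. (2), x ^ 1 is the complement
    cout = ((x1 ^ x2) & x3) | (((x1 ^ x2) ^ 1) & x1)   # Eq. (3)
    return total, carry, cout


def check_all_inputs():
    """Return True if the weighted outputs match the input count in all 32 cases."""
    for bits in product((0, 1), repeat=5):
        total, carry, cout = exact_compressor(*bits)
        if total + 2 * (carry + cout) != sum(bits):
            return False
    return True


print(check_all_inputs())
```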
Disadvantages:
- Delay is high
- Power consumption is high
Proposed System:
We present four dual-quality reconfigurable approximate 4:2 compressors, which provide the ability of switching between the exact and approximate operating modes during the runtime. The compressors may be utilized in the architectures of dynamic quality configurable parallel multipliers. The basic structures of the proposed compressors consist of two parts: approximate and supplementary. In the approximate mode, only the approximate part is active, whereas in the exact operating mode, the supplementary part along with some components of the approximate part is invoked.
Proposed Dual-Quality 4:2 Compressors:
The proposed DQ4:2Cs operate in two accuracy modes: approximate and exact. The general block diagram of the compressors is shown in Fig. 3. The diagram consists of two main parts: approximate and supplementary. During the approximate mode, only the approximate part is exploited while the supplementary part is power gated. During the exact operating mode, the supplementary part and some components of the approximate part are utilized. In the proposed structure, to reduce the power consumption and area, most of the components of the approximate part are also used during the exact operating mode. We use the power gating technique to turn OFF the unused components of the approximate part. Also note that, as is evident from Fig. 3, in the exact operating mode, tristate buffers are utilized to disconnect the outputs of the approximate part from the primary outputs.
Figure 3: Block diagram of the proposed approximate 4:2 compressors. The hatched box in the approximate part indicates the components that are not shared between this part and the supplementary part.
1) Structure 1 (DQ4:2C1): For the approximate part of the first proposed DQ4:2C structure, as shown in Fig. 4(a), the approximate output carry (i.e., carry’) is directly connected to the input x4 (carry’ = x4), and, in a similar approach, the approximate output sum (i.e., sum’) is directly connected to the input x1 (sum’ = x1). In the approximate part of this structure, the output Cout is ignored. While the approximate part of this structure is considerably fast and low power, its error rate is large (62.5%).
Figure 4: (a) Approximate part and (b) overall structure of DQ4:2C1
The supplementary part of this structure is an exact 4:2 compressor. The overall structure is shown in Fig. 4(b). In the exact operating mode, the delay of this structure is about the same as that of the exact 4:2 compressor.
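The two operating modes of DQ4:2C1 can be modeled behaviorally as follows. This is only a functional sketch: the names `exact_part` and `dq42c1` and the `exact_mode` flag are hypothetical, power gating and the tristate buffers are not modeled, and the ignored Cout is assumed to read as 0 in the approximate mode.

```python
def exact_part(x1, x2, x3, x4, cin):
    """Supplementary part: exact 4:2 compressor, Eqs. (1)-(3)."""
    x = x1 ^ x2 ^ x3 ^ x4
    total = x ^ cin
    carry = (x & cin) | ((x ^ 1) & x4)
    cout = ((x1 ^ x2) & x3) | (((x1 ^ x2) ^ 1) & x1)
    return total, carry, cout


def dq42c1(x1, x2, x3, x4, cin, exact_mode):
    """Return (sum, carry, Cout) of the modeled DQ4:2C1 for single-bit inputs."""
    if exact_mode:
        return exact_part(x1, x2, x3, x4, cin)
    # Approximate part: sum' = x1 and carry' = x4 are plain wires.
    # Assumption: the ignored Cout is modeled as a constant 0.
    return x1, x4, 0


# Same inputs, both modes: the exact mode reports Cout = 1, the approximate mode drops it.
print(dq42c1(0, 1, 1, 0, 0, exact_mode=True))
print(dq42c1(0, 1, 1, 0, 0, exact_mode=False))
```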
2) Structure 2 (DQ4:2C2): In the first structure, while ignoring Cout simplified the internal structure of the reduction stage of the multiplication, its error was large. In the second structure, compared with DQ4:2C1, the output Cout is generated by connecting it directly to the input x3 in the approximate part. Fig. 5 shows the internal structure of the approximate part and the overall structure of DQ4:2C2. While the error rate of this structure is the same as that of DQ4:2C1, namely, 62.5%, its relative error is lower.
Figure 5: (a) Approximate part and (b) overall structure of DQ4:2C2.
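The 62.5% error rate quoted for the approximate parts of DQ4:2C1 and DQ4:2C2 can be reproduced by enumeration. The counting convention here is an assumption, not taken from the text: Cin is held at 0, the 16 patterns of x1–x4 are equally weighted, and a pattern counts as an error when sum’ + 2(carry’ + Cout’) differs from x1 + x2 + x3 + x4. The function names are hypothetical.

```python
from itertools import product


def approx_c1(x1, x2, x3, x4):
    """DQ4:2C1 approximate part: sum' = x1, carry' = x4, Cout ignored (assumed 0)."""
    return x1, x4, 0


def approx_c2(x1, x2, x3, x4):
    """DQ4:2C2 approximate part: as DQ4:2C1, but with Cout' = x3."""
    return x1, x4, x3


def error_rate(approx):
    """Fraction of the 16 input patterns whose weighted output value is wrong."""
    wrong = 0
    for bits in product((0, 1), repeat=4):
        total, carry, cout = approx(*bits)
        if total + 2 * (carry + cout) != sum(bits):
            wrong += 1
    return wrong / 16


print(error_rate(approx_c1), error_rate(approx_c2))  # 0.625 0.625
```

Under this convention, both structures are wrong for 10 of the 16 patterns. The error rate alone therefore does not separate them, which is why the relative error is the quantity compared above.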
3) Structure 3 (DQ4:2C3): The previous structures, in the approximate operating mode, had maximum power and delay reductions compared with those of the exact compressor. In some applications, however, a higher accuracy may be needed. In the third structure, the accuracy of the approximate operating mode is improved by increasing the complexity of the approximate part, whose internal structure is shown in Fig. 6(a). In this structure, the accuracy of the output sum’ is increased. Similar to DQ4:2C1, the approximate part of this structure does not support the output Cout. The error rate of this structure, however, is reduced to 50%.
Figure 6: (a) Approximate part of DQ4:2C3 and (b) overall structure of DQ4:2C3
4) Structure 4 (DQ4:2C4): In this structure, we improve the accuracy of the output carry’ compared with that of DQ4:2C3 at the cost of larger delay and power consumption, and the error rate is reduced to 31.25%. The internal structure of the approximate part and the overall structure of DQ4:2C4 are shown in Fig. 7. The supplementary part is indicated by a red dashed rectangle, while the gates of the approximate part that are powered OFF during the exact operating mode are indicated by the blue dotted line.
Figure 7: (a) Approximate part of DQ4:2C4 and (b) overall structure of DQ4:2C4
Advantages:
- Delay is less
- Power consumption is less
Software implementation:
- Modelsim
- Xilinx ISE