RoBA Multiplier: A Rounding-Based ApproximateMultiplier for High-Speed yet Energy-EfficientDigital Signal Processing

Abstract:

In this paper, we propose an approximate multiplier that is high speed yet energy efficient. The approach is to round the operands to the nearest exponent of two. This way the computational intensive part of the multiplication is omitted improving speed and energy consumption at the price of a small error. The proposed approach is applicable to both signed and unsigned multiplications. We propose three hardware implementations of the approximate multiplier that includes one for the unsigned and two for the signed operations. The efficiency of the proposed multiplier is evaluated by comparing its performance with those of some approximate and accurate multipliers using different design parameters. In addition, the efficacy of the proposed approximate multiplier is studied in two image processing applications, i.e., image sharpening and smoothing. The proposed architecture of this paper analysis the logic size, area and power consumption using Xilinx 14.2.

Existing System:

Some of the previous works in the field ofapproximate multipliers are briefly reviewed. In an approximate multiplier and an approximate adder based on a technique named broken-array multiplier (BAM) were proposed.By applying the BAM approximation method tothe conventional modified Booth multiplier, an approximatesigned Booth multiplier was presented. The approximatemultiplier provided power consumption savings form 28% to58.6% and area reductions from19.7% to 41.8% for differentword lengths in comparison with a regular Booth multiplier.Kulkarnietal.suggested an approximate multiplier consisting of a number of 2×2 inaccurate building blocks thatsaved the power by 31.8%–45.4% over an accurate multiplier.An approximate signed 32-bit multiplier for speculationpurposes in pipelined processors was designed. It was20% faster than a full-adder-based tree multiplier while havinga probability of error of around 14%.An error-tolerantmultiplier, which computed the approximate result by dividingthe multiplication into one accurate and one approximate part,was introduced, in which the accuracies for different bit widthswere reported. In the case of a 12-bit multiplier, a power savingof more than 50% was reported. In two approximate 4:2compressors for utilizing in a regular Dadda multiplier weredesigned and analyzed.

The use of approximate multipliers in image processingapplications, which leads to reductions in power consumption,delay, and transistor count compared with those of an exactmultiplier design, has been discussed in the literature. In ,an accuracy-configurable multiplier architecture (ACMA) wassuggested for error-resilient systems. To increase its throughput, the ACMA made use of a technique called carry-inprediction that worked based on a precomputation logic.When compared with the exact one, the proposed approximatemultiplication resulted in nearly 50% reduction in the latencyby reducing the critical path. Also, Bhardwaj et al. presented an approximate Wallace tree multiplier (AWTM).Again, it invoked the carry-in prediction to reduce the criticalpath. In this work, AWTM was used in a real-time benchmarkimage application showing about 40% and 30% reductions inthe power and area, respectively, without any image qualityloss compared with the case of using an accurate Wallace treemultiplier (WTM) structure.

Disadvantages:

  • Complex design

Proposed System:

Multiplication Algorithm of RoBA Multiplier:

The main idea behind the proposed approximate multiplieris to make use of the ease of operation when the numbersare two to the powern(2n). To elaborate on the operationof the approximate multiplier, first, let us denote the roundednumbers of the input ofAandBbyAr andBr, respectively.The multiplication ofAbyBmay be rewritten as

A×B=(Ar −A)×(Br −B)+Ar ×B+Br ×A−Ar ×Br.                                                                  (1)

The key observation is that the multiplications ofAr ×Br,Ar×B,andBr×Amay be implemented just by the shift operation. The hardware implementation of (Ar −A)×(Br −B),however, is rather complex. The weight of this term inthe final result, which depends on differences of the exactnumbers from their rounded ones, is typically small. Hence,we propose to omit this part from (1), helping simplify themultiplication operation. Hence, to perform the multiplicationprocess, the following expression is used:

A×B∼=Ar ×B+Br ×A−Ar ×Br.                                                                                              (2)

Thus, one can perform the multiplication operation usingthree shift and two addition/subtraction operations. In thisapproach, the nearest values for Aand Bin the form of 2nshould be determined. When the value of A(or B) is equalto the 3 ×2p−2(where p is an arbitrary positive integerlarger than one), it has two nearest values in the form of2nwith equal absolute differences that are 2pand 2p−1. Whileboth values lead to the same effect on the accuracy of theproposed multiplier, selecting the larger one (except for thecase of p=2) leads to a smaller hardware implementationfor determining the nearest rounded value, and hence, it isconsidered in this paper. It originates from the fact that thenumbers in the form of 3×2p−2are considered as do notcare in both rounding up and down simplifying the process,and smaller logic expressions may be achieved if they are usedin the rounding up.

Figure 1: Block diagram for the hardware implementation of the proposed multiplier

Hardware Implementation of RoBA Multiplier:

Based on (2), we provide the block diagram for the hardwareimplementation of the proposed multiplier in Fig. 1 wherethe inputs are represented in two’s complement format. First,the signs of the inputs are determined, and for each negativevalue, the absolute value is generated. Next, the rounding blockextracts the nearest value for each absolute value in the formof 2n. It should be noted that the bit width of the output ofthis block is n(the most significant bit of the absolute valueof ann-bit number in the two’s complement format is zero).To find the nearest value of inputA, we use the following equation to determine each output bit of the rounding block:

Table 1: ALLPOSSIBLECASES FORAr ×Br ANDAr ×B+Br ×AVALUES

In the proposed equation, Ar[i] is one in two cases. In thefirst case,A[i] is one and all the bits on its left side are zerowhileA[i −1] is zero. In the second case, when A[i]andall its left-side bits are zero, A[i −1] andA[i −2] are bothone. Having determined the rounding values, using three barrelshifter blocks, the products Ar×Br, Ar×B,andBr×Aarecalculated. Hence, the amount of shifting is determined basedon logAr2 −1(orlogBr2 −1)in the case of A(or B) operand.Here, the input bit width of the shifter blocks isn, while theiroutputs are 2n.

A single 2n-bit Kogge-Stone adder is used to calculate thesummation of Ar ×BandBr ×A. The output of this adderand the result of Ar ×Br are the inputs of the subtractorblock whose output is the absolute value of the output of theproposed multiplier. BecauseArandBr are in the form of 2n,the inputs of the subtractor may take one of the three inputpatterns shown in Table I. The corresponding output patternsare also shown in Table I.

The forms of the inputs and output inspired us to conceivea simple circuit based on the following expression:

Out=(PXORZ)AND({(P<<1)XOR(PXORZ)}or {(PANDZ)<<1})                                                 (4)

WherePis Ar ×B+Br ×AandZis Ar ×Br. The corresponding circuit for implementing this expression is smallerand faster than the conventional subtraction circuit.

Advantages:

  • Design complexity is less

Software implementation:

  • Modelsim
  • Xilinx ISE