VLSI Design of 64bit × 64bit High Performance Multiplier with Redundant Binary Encoding

**Abstract:**

For multiplier dominated applications such as digital signal processing, wireless communications, and computer applications, high speed multiplier designs has always been a primary requisite. In this paper a high performance 16×16 bit redundant binary (RB) multiplier have been designed by using recently proposed redundant binary encoding approach to eliminate the error correcting word and a delay efficient parallel prefix Ling adder for final redundant binary to normal binary (RB-NB) conversion. Since redundant binary (RB) representation allows carry-free addition and adaptability, it has been used in 16×16 bit high-performance RB multiplier design for summation of partial product terms. The design of multiplier also reduces redundant partial product accumulation stage when eliminating the error correcting word which improves the complexity and the critical path delay. The performance of RB multiplier design compared with conventional RB modified booth encoding multiplier (CRBMBE). The comparison is based on synthesis result obtained by synthesizing both multiplier architectures targeting a Xilinx FPGA in terms of area and delay analysis. The proposed architecture of this paper analysis the logic size, area and power consumption using Xilinx 14.2.

**Enhancement of the project:**

In the proposed design use the ripple carry adder for replace the redundant binary adder and convertor design

**Existing System:**

For multiplier dominated applications such as wireless communications, computer applications, and digital signal processing, conventional high speed multiplier are no longer capable of providing the high speed computational needs at low power requirements of the portable devices. Thus to accelerate the overall performance of the systems, designing high performance multipliers has always been one of the main objectives for system designers. To achieve the multiplication process more efficiently, numerous high performance algorithms and architectures have been proposed. In general, multiplication of two operands can be carried out in three steps. In the first step partial product terms are generated. In the second step pair wise summation of partial product rows is perform until only two partial product rows are remain. In the third step the two remaining rows are added up by fast carry-propagate adder to generate the final product.

In the first step, Radix-4 Modified Booth Encoding (MBE) is used to great extent to reduce the total number of partial product rows to half. Partial product reduction schemes, suchas Wallace trees or compressor trees are embrace for fast summation of partial product rows in the second step. In the third step, the two rows from partial product summing tree are added by means of fast carry-look ahead or carry select adders to obtain the final multiplication product. To obtain an efficient multiplier, numerous approaches have been proposed in the past years to develop the above three steps. Modified Booth encoding scheme reduces the total partial product terms to half and thus asset the multiplier design by reducing the size and enhancing the speed of the summing tree. The advance predominant multiplier design employs compressors in partial product summing tree for high-speed parallel computation. This method is useful in eliminating the intermediate carry propagation, but still there is a possibility of carry propagation from the least significant bit (LSB) position to the most significant bit (MSB) position in the final addition stage, which needs a speedy, complicated, and high power adder. The redundant binary system which provides carry propagation free addition can be used as an alternative to add partial product rows. However, for n bit multiplier, MBE and Redundant Binary (RB) encoding scheme requires an extra partial product bits to be added to generate the correct results and thus instead of generating partial product rows, MBE and RB encoding scheme generates partial product terms.

**Carry free RB addition**:

The RB number representation is frequently employed in high speed multipliers to achieve the reduction in partial product rows at the rate of 2:1. The redundant binary representation permits the existence of redundancy. Owing to this redundancy, carry-propagation free addition of two RB numbers can be perform in a constant time independent of the word length of operands. In most of the high speed multiplier designs, RB summing tree employs Redundant Binary Adder (RBA) as a unit cell to speed up the addition process. The carry-free addition rule for RBA is well explained, also one of the fast modified RB adder structures for quick addition of RB partial product terms is proposed by Makino et al.. The experimental results of proposed RBA array shows improved results over 4:2 compressors. RBA proposed in has been used to design the summing tree to carry out fast parallel addition of all the partial product rows in a constant time until only two RBPP rows are remain.

**RB to NB convertor**:

RB summing tree adds all the partial product rows until only two RB partial product rows (p+,p-) are produce. To establish a proper communication with peripheral devices, this two RB rows needs to be converted to 2’s complement representation.

The RB to NB conversion requires adding two NB numbers. The addition of two NB numbers demands continuous carry propagation from the least significant bit position to the most significant bit position. The fast CLA are deployed in traditional RB-NB converters to enhance the conversion process. Several techniques to achieve RB-NB conversion has been proposed, which has the advantage over CLA’s. The parallel-prefix formulation of Ling adders has been proposed by Dimitrakopoulos et al., which is one of the fastest available structures to perform addition. The experimental results of paper reveal that compared to previous methods, this device achieve the delay reductions of approximately 10 percent to 14 percent. Due to the half fan out requirements and one-less logic level implementation, this adder exceeds the traditional Ladner-Fischer structure. To compensate the delay overheads introduced by Modified partial product generator the high speed adder structure proposed by Dimitrakopoulos et al. have been used in last conversion step of our high performance RB multiplier designs. The design of the multiplier is shows in the below figure.

**Disadvantages**:

- high area coverage

**Proposed System:**

Modified RB partial product generator:

The degree of computation to be performed by RB summing tree is highly depends upon Radix-4 Booth encoder and RB partial product generator. The total number of partial product terms that can saved by this stage determines the cost, performance, and power consumption of the summing tree and thus ultimately affects the overall multiplier design. Extra hardware is required to accumulate the error correction vector (extra partial product row). Also it can extent the number of accumulation stages in the RB summing tree for multipliers with exactly the power of two word length (e.g. 8×8,16×16,32×32 etc.). Some techniques have been proposed to generate more regular partial product rows. One such efficient technique to eliminate the error correction vector is recently proposed by Cui et al. In broad-brush, n-bit traditional RBMBE multiplier requires n/4 Redundant Binary Partial Product rows (RBPPR). An error-correction vector is also required as an additional partial product row by both RBE and the MBE to collectively represent the error correction bits from each partial product row. Therefore, the total number of redundant binary partial product accumulation stages (NRBPPAS) needed by 2^{N} bit multiplier is given by

One accumulation stage can be preserve, if the error correction vector is eliminated. Thus this elimination can eventually improve the performance and complexity of the multiplier design. One of the most effective techniques to eliminate the error correction vector is proposed by Cui et al.. In his proposed work, the error correction bits from each row are moved to its neighboring row. Furthermore, both the two most significant bits (MSBs) of the first partial product term and the two least significant bits (LSBs) of the last partial product term are modified to eliminate the extra error correction bits generated from the last partial product row. By using this technique one accumulation stage can be preserved in power of two word length multipliers. This architecture achieves reduction in area and power consumption with a modest reduction in speed (approximately 5%). The result in assures that the use of this RBPPG is useful in designing an area and PDP efficient RB multipliers.

**Ripple carry adder****(RCA)****:**

The Ripple carry adder is one of the types of adders which is built by using a one-bit full adder in series. The summation of two binary bits at any phase of the ripple carry is dependent on the 1-bit full adder. The carryout of first block is directly given to the carry-in of the next corresponding block. Ripple carry adder can be constructed by adding the n number of full adders or ripple carry adders of variable sizes may be added in series in order to sufficient space for binary vector chains of greater sizes. The n bit parallel adder needs n computational full adder blocks. An example of 8-bit ripple-carry adder is as shown in figure 3.8. It is framed of 8-bit full adders. Here A and B are the two inputs along with the input Cin. The sum(S) and carry out(Cout) are the outputs. Each bit summation generates a carry-out and a sum. The carry-out is then passed on to the carry in of the next corresponding higher order bit.

Figure 1: Regular 8-bit RCA architecture

Although RCA has very simple and easy architecture but it is not suitable in designing for large number of bit widths.

The major drawback of RCA is it has a higher delay since it has to wait for the carry-in to propagate from the previous adder block. Even though if the adder has a value at its output, but still it has to wait for the proliferation for the carry bit before the output influences to obtain a correct value and is shown in the below given figure 3.9. If one full adder takes T-fa seconds to go through its full operation, the final result will reach its steady-state value only after 8-Tfa seconds. Its area is n A-fa.

A very small and tiny perk up in area consumption can be attained only when if its first carry in (c0) is known in advance which will be always be zero. (If it is so, the first full adder block can be replaced by a half adder block). Usually, by assuming all the existing gates will be having the same and specific delay and area of NAND-2 represented by Tgate and Agate then this circuit will be having 3n Tgate delay and 5n-Agate. Where n is the number of existing full adders. One should be mindful that in static CMOs, this presumption is not trustworthy. Gate delay will b always bank on fanin delay + fanout delay + intrinsic delay.

**Advantages:**

- Area coverage is low

**Software implementation:**

- Modelsim
- Xilinx ISE