

# International Journal of VLSI System Design and Communication Systems

ISSN 2322-0929 Vol.03, Issue.08, October-2015, Pages:1213-1216

# Realization of Multi-Operand Adders Based 64-Bit Modified Wallace MAC

PONNANA PRABHAVATHI<sup>1</sup>, P. ASHOK<sup>2</sup>

<sup>1</sup>PG Scholar, Avanthi Institute of Engineering and Technology, Visakhapatnam, AP, India, E-mail: prabha433.p@gmail.com. <sup>2</sup>Asst Prof, Avanthi Institute of Engineering and Technology, Visakhapatnam, AP, India, E-mail: ashok.pentaece@gmail.com.

**Abstract:** The multiply-accumulate (MAC) unit is an essential component in many digital signal processing (DSP) applications involving multiplications and/or accumulations, and it is used in high-performance DSP systems. Typical DSP applications include filtering, convolution, and inner products. Multiply-and-accumulate operations are typical for digital filters, so the MAC unit enables high-speed filtering and similar processing. Because the MAC unit operates independently of the CPU, it can process data separately and thereby reduce CPU load. DSP-based applications such as optical communication systems require extremely fast processing of large amounts of digital data. A MAC unit consists of a multiplier and an accumulator that holds the sum of the previous products. The MAC inputs are read from memory and passed to the multiplier block. The design consists of a 64-bit modified Wallace multiplier, a 128-bit carry-save adder, and a register.

Keywords: MAC, Carry Save Adder, Modified Wallace.

# I. INTRODUCTION

Multi-input addition is an important operation for many DSP and video processing applications. On FPGAs, multi-input addition has traditionally been implemented using trees of carry-propagate adders. This approach has been used because the traditional lookup table (LUT) structure of FPGAs is not amenable to compressor trees, which are used to implement multi-input addition and parallel multiplication in ASIC technology. In prior work, we developed a greedy heuristic method to map compressor trees onto the general logic of an FPGA. Although redundant addition is widely used to design parallel multi-operand adders for ASIC implementations, the use of redundant adders on Field Programmable Gate Arrays (FPGAs) has generally been avoided.

The multiplier unit is an essential component in many DSP applications, and the modified Wallace multiplier is used for high-performance DSP systems. Such applications include filtering, convolution, and inner products. Many DSP methods use transforms such as the discrete cosine transform (DCT) or the discrete wavelet transform (DWT). Because these are computed by repeated multiplication and addition, the speed of the multiplication and addition arithmetic determines the execution speed and performance of the entire calculation. Multiplication-and-accumulate operations are typical for digital filters. Therefore, the multiplier unit enables high-speed filtering and other processing typical of DSP applications.
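To make the link between digital filtering and multiply-accumulate concrete, the following minimal Python sketch computes a direct-form FIR filter as a chain of MAC steps. The function name `fir_filter` and the sample values are illustrative assumptions and are not part of the design described in this paper.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * samples[n - k].

    Each output sample is built from repeated multiply-accumulate steps,
    which is the operation a hardware MAC unit performs.
    """
    outputs = []
    for n in range(len(samples)):
        acc = 0  # accumulator, cleared for every output sample
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += h * samples[n - k]  # one MAC step
        outputs.append(acc)
    return outputs


# Two-tap moving sum over a short ramp (illustrative values only).
print(fir_filter([1, 2, 3, 4], [1, 1]))  # [1, 3, 5, 7]
```

Every pass through the inner loop is one multiply followed by one accumulate, so the filter throughput is bounded by how fast that single step can be executed.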

# **II. COMPRESSOR**

A multiplier is one of the key hardware blocks in most digital and high-performance systems, such as FIR filters, digital signal processors, and microprocessors. With advances in technology, many researchers have worked on multipliers that offer high speed, low power consumption, small area, or a combination of these, making them suitable for high-speed, low-power, and compact VLSI implementations. However, area and speed are conflicting constraints, so improving speed generally results in larger area. The most efficient multiplier structure depends on the throughput requirement of the application. The first step of the design process is the selection of the optimum circuit structure, as shown in Fig.1.



Fig.1. 3:2 Compressor block diagram.

The combination of low power, low transistor count, and minimum delay makes the 5:2 and 4:2 compressors the appropriate choice. In these compressors, the outputs generated at each stage are used efficiently by replacing the XOR blocks with multiplexer blocks. The select bits of the multiplexers are available well ahead of the inputs, so the critical path delay is minimized. The various adder structures in the conventional architecture are replaced by compressors.
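The behaviour of these compressors can be sketched at the bit level in Python. This is a functional model only: the function names are hypothetical, the 4:2 compressor is assumed to be built from two cascaded 3:2 compressors, and `xor_via_mux` merely illustrates how an XOR can be expressed with a 2:1 multiplexer. It does not reproduce the gate-level circuit of this paper.

```python
from itertools import product


def mux(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def xor_via_mux(a, b):
    """XOR built from a multiplexer: select between b and its complement."""
    return mux(a, b, b ^ 1)


def compressor_3_2(a, b, c):
    """3:2 compressor (full adder): a + b + c == s + 2 * carry."""
    s = xor_via_mux(xor_via_mux(a, b), c)
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor from two cascaded 3:2 compressors.

    x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout).
    cout depends only on x1..x3, so it does not ripple through cin.
    """
    s1, cout = compressor_3_2(x1, x2, x3)
    s, carry = compressor_3_2(s1, x4, cin)
    return s, carry, cout


# Exhaustive check of the arithmetic identities over all input patterns.
for a, b, c in product((0, 1), repeat=3):
    s, cy = compressor_3_2(a, b, c)
    assert a + b + c == s + 2 * cy

for bits in product((0, 1), repeat=5):
    s, cy, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (cy + cout)
```

Because `cout` is produced before `cin` is consumed, a row of 4:2 compressors does not form a carry chain across columns.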

# **III. MODIFIED WALLACE IMPLEMENTATION**

A modified Wallace multiplier is an efficient hardware implementation of a digital circuit that multiplies two integers. Conventional Wallace multipliers use many full adders and half adders in their reduction phase. Half adders do not reduce the number of partial product bits, so minimizing the number of half adders in the reduction reduces the complexity. The Wallace reduction is therefore modified in such a way that the delay remains the same as for the conventional Wallace reduction. The modified reduction method greatly reduces the number of half adders with a very slight increase in the number of full adders.

The reduced-complexity Wallace multiplier consists of three phases. In the first phase, the N x N product matrix is formed, and before it is passed on to the second phase it is rearranged into the shape of an inverted pyramid. In the second phase, the rearranged product matrix is divided into non-overlapping groups of three rows, as shown in Fig.2. Within a group, columns containing a single bit or two bits are passed on unchanged to the next stage, and columns containing three bits are fed to a full adder. The number of rows in each stage of the reduction phase is calculated by the formula
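A minimal statement of this row-count rule, assuming it is the recurrence normally used for the reduced-complexity Wallace reduction (the symbols $r_i$ for the number of rows entering stage $i$ and $N$ for the operand width are introduced here for illustration):

$$r_0 = N, \qquad r_{i+1} = 2\left\lfloor \frac{r_i}{3} \right\rfloor + (r_i \bmod 3)$$

Under this assumption, $N = 64$ gives the sequence 64, 43, 29, 20, 14, 10, 7, 5, 4, 3, 2, that is, ten reduction stages before two rows remain.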





A half adder is used only when the number of rows calculated from the above equation for a stage of the second phase does not match the number of rows actually formed in that stage. The output of the second phase is two rows high and is passed on to the third phase, where a carry-propagate adder generates the final product. A 64-bit modified Wallace multiplier is constructed in this way, and the total number of stages in the second phase is 10. The number of rows in each of the 10 stages was calculated from the equation, and the use of half adders was restricted to the 10<sup>th</sup> stage. The total number of half adders used in the second phase is 8, and the number of full adders used in the second phase is slightly higher than in the conventional Wallace multiplier. Since the 64-bit modified Wallace multiplier is difficult to draw, a typical 10-bit by 10-bit reduction is shown in Fig.2 for illustration. The modified Wallace tree performs better when a carry-save adder is used in the final stage instead of a ripple-carry adder, as shown in Fig.3. The carry-save adder is considered the critical part of the multiplier because it is responsible for the largest amount of computation.
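The three phases can be illustrated with a word-level behavioural model in Python. This is a sketch under stated assumptions: each partial-product row is held as one integer, every group of three rows goes through a 3:2 carry-save step, and leftover rows pass through unchanged. It does not model individual full or half adders or the inverted-pyramid rearrangement, and the names `carry_save_add` and `wallace_multiply` are hypothetical.

```python
def carry_save_add(a, b, c):
    """Word-level 3:2 carry-save step: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_multiply(x, y, n=64):
    """Multiply two unsigned n-bit integers by Wallace-style row reduction.

    Returns the product and the number of rows present at each stage.
    """
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("operands must be unsigned n-bit integers")

    # Phase 1: one shifted partial-product row per multiplier bit.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    heights = [len(rows)]

    # Phase 2: reduce groups of three rows to two until two rows remain.
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])  # one or two leftover rows pass through
        rows = reduced
        heights.append(len(rows))

    # Phase 3: a final carry-propagate addition of the remaining rows.
    return sum(rows), heights


product, heights = wallace_multiply(0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0)
assert product == 0xFFFFFFFFFFFFFFFF * 0x123456789ABCDEF0
print(heights)           # [64, 43, 29, 20, 14, 10, 7, 5, 4, 3, 2]
print(len(heights) - 1)  # 10
```

In this model the row count falls from 64 to 2 in ten steps, which agrees with the ten second-phase stages reported above for the 64-bit multiplier.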



Fig.3. Flow chart for Modified Wallace Multiplier.

# **IV. MAC OPERATION**

The Multiplier-Accumulator (MAC) operation is the key operation not only in DSP applications but also in multimedia information processing and various other applications. As mentioned above, the MAC unit consists of a multiplier, an adder, and a register/accumulator. In this paper, we use a 64-bit modified Wallace multiplier, which is suitable for a 64-bit digital signal processor. The MAC inputs, each 64 bits wide, are read from memory and given to the multiplier block. The multiplier computes the product of the two 64-bit inputs, so its output is 128 bits wide. The multiplier output is given as an input to the carry-save adder, which performs the addition. The function of the MAC unit is given by the following equation:

$$\mathbf{F} = \sum \mathbf{P}_i \mathbf{Q}_i \tag{1}$$

The output of the carry-save adder is 129 bits wide: 128 bits for the sum plus one bit for the carry. This output is given to the accumulator register, which in this design is a Parallel In Parallel Out (PIPO) register. A PIPO register is used because the word is wide and the carry-save adder produces all of its output bits in parallel, so the bits are loaded and read in parallel. The output of the accumulator register is either taken out as the result or fed back as one of the inputs to the carry-save adder. Fig.4 shows the basic architecture of the MAC unit.



Fig.4. Basic Architecture of MAC unit.
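The data flow of Fig.4 and Eq. (1) can be summarised with a small behavioural model in Python. It is a sketch only: the class name `MacUnit` and its methods are hypothetical, the multiplier and adder are replaced by Python integer arithmetic, and wrapping the accumulator at 129 bits is an assumption based on the adder output width stated above.

```python
class MacUnit:
    """Behavioural model of a multiply-accumulate unit (not the RTL)."""

    def __init__(self, operand_bits=64, acc_bits=129):
        self.operand_mask = (1 << operand_bits) - 1
        self.acc_mask = (1 << acc_bits) - 1  # assumed accumulator width
        self.acc = 0                         # accumulator register contents

    def clear(self):
        """Reset the accumulator register."""
        self.acc = 0

    def step(self, p, q):
        """One MAC cycle: acc <- acc + p * q, with the register fed back."""
        product = (p & self.operand_mask) * (q & self.operand_mask)  # up to 128 bits
        self.acc = (self.acc + product) & self.acc_mask
        return self.acc


mac = MacUnit()
for p, q in [(3, 4), (5, 6), (7, 8)]:
    mac.step(p, q)
print(mac.acc)  # 98, i.e. 3*4 + 5*6 + 7*8
```

After `k` calls to `step`, the register holds the running sum of the first `k` products, provided that sum fits in the assumed accumulator width.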

# V. RESULTS

The results of this work are shown in Figs. 5 to 7 below.



Fig.5. Schematic View.



Fig.6. RTL Schematic View.



Fig.7. Waveform.

# VI. CONCLUSION

A high-performance 64-bit Multiplier-and-Accumulator (MAC) was designed and implemented in this paper. The complete MAC unit operates at a frequency of 217 MHz with a total power dissipation of 177.732 mW. Because the delay of the 64-bit MAC is low, the design can be used in high-performance processors that operate on wide operands. The functionality of the MAC was verified using XILINX ISE 12.3i and synthesized using the XILINX synthesizer.
