Comparison of Vedic Multiplier with Conventional Array and Wallace Tree Multiplier

ANU THOMAS¹, ASHLY JACOB², SERIN SHIBU³, SWATHI SUDHAKARAN⁴
¹B.Tech Scholar, Saintgits College of Engineering, Kottayam, KL, India, E-mail: anumt05@gmail.com.
²B.Tech Scholar, Saintgits College of Engineering, Kottayam, KL, India, E-mail: ashlyjacob@yahoo.com.
³B.Tech Scholar, Saintgits College of Engineering, Kottayam, KL, India, E-mail: serinsara29@gmail.com.
⁴B.Tech Scholar, Saintgits College of Engineering, Kottayam, KL, India, E-mail: swathipanachoor@gmail.com.

Abstract: In the design of most high performance systems such as FIR filters, image processing, microprocessors multipliers are the integral components and multiplication is the important fundamental function in arithmetic operations. A CPU (central processing unit) devotes a considerable amount of processing time for performing arithmetic operations. High speed processors depends greatly on the multipliers. Multiplication requires substantially more hardware resources and processing time than addition and subtraction. Since multiplication dominates the execution time of most DSP algorithms, there is a need of high speed multiplier. Currently the speed of the multipliers is limited by the speed of the adders used for partial product addition. In this paper, we proposed a high speed 8-bit multiplier using a Vedic Mathematics (Urdhva Tiryagbyham sutra) [1]for generating the partial products. And comparison of this 8-bit Vedic multiplier using RCA with conventional array multiplier, Wallace tree multiplier and 8-bit Vedic multiplier using CLA. Simulation and synthesise of the design is carry out using Xilinx ISE 14.7.

Keywords: Vedic Multiplier, Urdhva Tiryagbhayam, Array Multiplier; Wallace Tree Multiplier.

I. INTRODUCTION

The demand for high speed processing has been increasing as a result of expanding computer and signal processing applications. Higher throughput arithmetic operations are important to achieve the desired performance in many real-time signal and image processing applications. One of the key arithmetic operations in such applications is multiplication and the development of fast multiplier circuit has been a subject of interest over decades. Multiplication process is used in many applications like instrumentaiton and measurement, communications, audio and video processing, animations, special effect, Graphics, image enhancement, Navigation, radar, GPS, and control applications like robotics, machine vision etc. Reducing the time delay and power consumption are very essential requirements for these applications. This work presents different multiplier architectures such as basic array multiplier, Wallace tree multiplier and Vedic multiplier. Multiplier based on Vedic Mathematics is one of the fast and low power multiplier. The speed of multiplication operation is of great importance in DSP as well as in general processor. In the past multiplication was implemented generally with a sequence of addition, subtraction and shift operations. There have been many algorithms proposals in literature to perform multiplication, each offering different advantages and having trade-off in terms of speed, circuit complexity, area and power consumption. Digital multipliers are the core components of all the digital signal processors (DSPs) and the speed of the DSP is largely determined by the speed of its multipliers. The most common multiplication algorithms followed in the digital hardware is array multiplication algorithm.

The computation time taken by the array multiplier is comparatively less because the partial products are calculated independently in parallel. The delay associated with the array multiplier is the time taken by the signals to propagate through the gates that form the multiplication array. Another multiplication algorithm used is Wallace tree multiplication algorithm, having Parallel multipliers which uses carry save addition algorithm which include generation of partial products, Accumulation of partial products and Final addition phase. In this paper we compare vedic multiplier with array and Wallace multiplier. Vedic mathematics is the ancient Indian system of mathematics, and is mainly based on sixteen principles or word-formulae which are termed as Sutras. Among these 16 sutras two sutras are for multiplication urdhva tiryagbhayam and Nikhilam Sutra. It has been found that Urdhva tiryagbhayam Sutra is most efficient Sutra (Algorithm), giving minimum delay for multiplication of all types of numbers, either small or large. Integrating Vedic mathematics for the multiplier design will enhance the speed of multiplication operation. The multiplier architecture is based on Urdhva Tiryagbhayam (vertical and cross-wise algorithm) sutra[2]. By using Urdhva Tiryagbhayam Sutra in binary multiplication, the numbers of steps required to calculate the final product will be reduced and hence there is a reduction in computational time and increase in speed of the multiplier.
II. EXISTING SYSTEMS

The common multiplication method is add and shift algorithm. In parallel multipliers number of partial products to be added is the main parameter that determines the performance of the multiplier. To reduce the number of partial products to be added, Modified Booth algorithm[3] is one of the most popular algorithms. To achieve speed improvements Wallace Tree algorithm can be used, here the number of sequential adding stages can be reduced. However with increasing parallelism, the amount of shifts between the partial products and intermediate sums to be added will increase which may result in reduced speed, increase in silicon area due to irregularity of structure and also increased power consumption due to increase in interconnect resulting from complex routing. On the other hand, serial-parallel multipliers compromise speed to achieve better performance for area and power consumption. The selection of a parallel or serial multiplier actually depends on the nature of application. In this lecture we introduce the multiplication algorithms and architecture and compare them in terms of speed, power and combination of these metrics.

A. Array Multiplier

Array multiplier[4] is well known due to its regular structure. Multiplier circuit is based on add and shift algorithm. Each partial product is generated by the multiplication of the multiplicand with one multiplier bit. The partial product are shifted according to their bit orders and then added. The addition can be performed with normal carry propagate adder. N-1 adders are required where N is the multiplier length. In this paper conventional array multiplier is synthesized using 16T full adder cell. Consider the multiplication of two unsigned n-bit numbers, where A=An-1, An-2,....A0 is the multiplicand and B=Bn-1, Bn-2,....B0 is the multiplier. The product of these two bits can be written as S7S6S5S4S3S2S1S0, where S0 is the LSB and S7 is the MSB. Fig.1 shows multiplication of a 4x4 array multiplier using carry save adders.

\[
\begin{array}{c|c|c|c|c}
A & B & A0 & A1 & A2 & A3 \\
B0 & B1 & B2 & B3 & & \\
\hline
A0B0 & A1B0 & A2B0 & A3B0 & & \\
A0B1 & A1B1 & A2B1 & A3B1 & & \\
A0B3 & A1B3 & A2B3 & A3B3 & & \\
\hline
S7 & S6 & S5 & S4 & S3 & S2 & S1 & S0
\end{array}
\]

Fig.1. 4x4 Array Multiplication.

The conventional array multiplier uses carry save addition to add the products. In the carry save addition method, the first row will be either half adders or full adders. If the first row of the partial products is implemented with full adders, Cn will be considered “0”. Then the carries of each full adder can be diagonally forwarded to the next row of the adder. The resulting multiplier is said to be carry save array multiplier as the carry bits are not immediately added but rather saved for the next stage of addition. Hence the name carry save multiplier. In the design if the full adders have two input data the third input is considered as zero. The final adder which is used to add carries and sums of the multiplier is removed. Then the carries of the multiplier at the final stage is carefully added to the inputs of the multiplier. The carry of the fourth column of the multiplier is given to the input of the fifth column instead of zero. Then the carry of the fifth column is forwarded to the input of the sixth column and so on.

B. Wallace Tree Multiplier

The currently existing system is a normal Wallace tree multiplier[5]. The Wallace tree multiplier is considered as faster than a simple array multiplier and is an efficient implementation of a digital circuit which multiplies two integers. A Wallace tree multiplier is a parallel multiplier which uses the carry save addition algorithm to reduce the propagation delay. There are many researchers have been worked on the design of increasingly more efficient multipliers. They aim at achieving higher speed and lower power consumption even while occupying reduced silicon area.

![Image](image.png)

Fig.2. Logic for Wallace tree multiplier.

A fast process for multiplication of two numbers was developed by Wallace. Using this method, a three step process is used to multiply two numbers, first the bit products are formed, the bit product matrix is reduced to a two row matrix where sum of the row equals the sum of bit products, and the two resulting rows are summed with a fast adder to produce a final product. In the Wallace tree method, three bit signals are passed to a one bit full adder which is called a three input Wallace tree circuit, and the output signal(sum signal) is supplied to the next stage full adder of the same bit, and the carry output signal is supplied to the next stage of the full adder located at a one bit higher position. Fig. 2 shows the logic used for Wallace tree multiplication.
C. Limitations of Existing Systems

An array multiplier is one of the most basic parallel multiplier circuits. It has a regular structure and its pipelined architecture is easy to design. But the major limitation of an array multiplier is its size. As operand size increase, array grow in size at a rate equal to the square of the operand size. This increase in area result in an increase in power consumption. Also the large size will increase. The propagation delay which is \(O(\log n^2)\). The Wallace tree multiplier is an adder tree comprising of carry save adders. It is used to obtain quick reduction of partial products. From various denominations, the Wallace tree algorithm is one of the most efficient algorithm to be employed in digital multipliers. But because of its non regularity, the layout of Wallace tree multipliers suffers from a large area wastage. Also the delay offered by the various computational units such as half adders, full adders and the final stage ripple carry adder will increase proportionally with the size of the multiplier. So it is necessary to substitute the above mentioned units with alternate computational units which provide a considerable reduction in propagation delay which is otherwise equal to \(O(\log n)\), where \(n\) is the number of bits in the multiplier.

III. PROPOSED METHOD

The proposed method is based on techniques derived from an ancient system of mathematics called Vedic Mathematics. Vedic Mathematics is the name given to an ancient system of calculation which was rediscovered from the Vedas between 1911 and 1918 by Sri Bharati Krishna Tirthaji Maharaj (1884-1960). The methods of Vedic Mathematics give a quicker way to solve supposedly any mathematical problem. Vedic Mathematics is said to manifest the coherent and unified structure of arithmetic, and its methods are complementary, direct and easy. Vedic Mathematics also has the advantage of being flexible since solutions can be obtained by a number of methods and is not limited to a single method. According to Tirthaji, Vedic Mathematics is based on sixteen Sutras, or aphorisms, as algorithms and their upa-sutras or corollaries derived from these Sutras. The sixteen sutras and their corollaries are as follows:

- Ekadhikina Purvena -By one more than the previous one (Cor: Anurupyena)
- Nikhilam Navatashcaramam Dashatah -All from 9 and the last from 10 (Cor: Sisyte Sesamasjnah)
- Urdhva-Tiryagbyham-Vertically and crosswise (Cor: Adyamadynantyamantyena)
- Paraavartya Yojayet-Transpose and adjust (Cor: Kevalaih Saptakam Gunyat)
- Shunyam Saamyasamuccaye-When the sum is the same, that sum is zero. (Cor: Vestanam)
- (Anurupye) Shunyamanyat-If one is in ratio, the other is zero (Cor: Yavadunam Tavadunam)
- Sankalana-vyavakalanabhayam-By addition and by subtraction (Cor:Yavadunam Tavadunikritya Varga Yojayet)
- Puranapuranabyham-By the completion or non-completion (Cor: Antyayordashake)
- Chalana-Kalanabyham-Differences and Similarities (Cor: Antyayoreva)
- Yaavadunam-Whatever the extent of its deficiency (Cor: Samuccayagunitah)
- Vyashitsamanstih-Part and Whole (Cor: Lopanasthapanabhyam)
- Shesanyankena Charamenae-The remainders by the last digit (Cor: Vilokanam)
- Sopaantyadvayamantyam-The ultimate and twice the penultimate (Cor: Gunitasamuccayah Samuccayagunitah)
- Ekanyunanam Purvena-By one less than the previous one (Cor: Dhvajanka)
- Gunitasamuchyah-The product of the sum is equal to the sum of the product (Cor: Dwandwa Yoga)
- Gunakasamuchyah-The factors of the sum is equal to the sum of the factors

The proposed method is based on the Urdhva-Tiryagbyham (Vertically and crosswise)[6] Sutra. This method can be illustrated using the following example as shown in Fig.3:

![Fig.3. Illustration of Vertical and Crosswise algorithm.](image)

A. Generalized Algorithm For NxN Bit Multiplier

![Fig.4. NxN Multiplication](image)

In general, if both the inputs are N-bit binary numbers, both are first split[7] into two halves. In fig. 4, \(A_L\) and \(B_L\) are the...
halves containing the lower bits and \(A_M\) and \(B_M\) are the halves containing the higher bits. Vertical and Crosswise algorithm can then be directly used on these sub-groups as shown in figure. \(A_L \times B_L\) is the first vertical result, \((A_M \times B_L + A_L \times B_M)\) is the crosswise result, and finally \(A_M \times B_M\) is the last vertical result. Carries are forwarded to the respective next stages.

**B. Vedic Multiplier Architecture**

The 8x8 vedic multiplier consists of 4x4 vedic multipliers, 2x2 vedic multipliers and suitable adders. The 2x2 vedic multiplier implementation is the same as a normal 2x2 multiplier. It uses AND gates and half adders as shown in Fig.5.

![Fig. 5. 2x2 bit multiplication.](image)

The 4x4 multiplier is implemented as illustrated as shown in Fig.6.

![Fig. 6. 4x4 bit multiplication.](image)

In this case, \(A_1A_0\) forms \(A_L\), \(A_3A_2\) forms \(A_M\), and similarly \(B_1B_0\) forms \(B_L\), \(B_3B_2\) forms \(B_M\) of the two 4-bit inputs. Firstly, the vertical result of \(A_L\) and \(B_L\) is computed by passing it through a 2x2 vedic multiplier. Then, the crosswise result of \((A_M \times B_L + A_L \times B_M)\) is computed using 2x2 multipliers and a 4-bit adder[8]. Finally the vertical result of \(A_M\) and \(B_M\) are computed using a 2x2 multiplier. The 8x8 multiplier is implemented as illustrated as shown in Fig.7.

![Fig.7. 8x8 bit Vedic Multiplier.](image)

The implementation follows similar vertical and crosswise steps, but instead uses 4x4 vedic multipliers and 8-bit adders. Ripple carry adders were replaced with carry look-ahead adders to give lower path delays.

**IV. RESULTS**

Table 1 shows comparison of array, Wallace tree multiplier with Vedic multiplier in terms of delay and number of LUTs.

<table>
<thead>
<tr>
<th>Multiplier Type</th>
<th>No of LUTs</th>
<th>Processing delay (ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Array Multiplier</td>
<td>126</td>
<td>44</td>
</tr>
<tr>
<td>Wallace tree multiplier</td>
<td>133</td>
<td>39</td>
</tr>
<tr>
<td>Vedic multiplier with RCA</td>
<td>166</td>
<td>34</td>
</tr>
<tr>
<td>Vedic multiplier with CLA</td>
<td>167</td>
<td>30</td>
</tr>
</tbody>
</table>

Fig. 8 shows the Simulation result of 8x8 Vedic Multiplier using CLA.

![Fig. 8. Simulation Result.](image)
V. CONCLUSION

This paper presents a highly efficient method of multiplication “Urdhva Tiryakbhyam Sutra” based on Vedic mathematics. It gives us method for modular multiplier design and clearly indicates the computational advantages offered by Vedic methods. The computational path delay for the proposed new 8*8 Vedic multiplier is found to be less as compared to other multipliers like 8*8 array multiplier and 8*8 Wallace tree multiplier. Hence our motivation to reduce delay is finely fulfilled. Therefore, we observed that the new Vedic multiplier is much more efficient than Array and Wallace tree multiplier in terms of execution time (speed). Effective memory implementation and deployment of memory compression algorithms can yield even better results. Vedic mathematics is long been known but has not been implemented in the DSP and ADSP processors employing large number of multiplications in calculating the various transforms like FFTs and control applications such as P, PI, PID Controller implementing in FPGA etc. An awareness of Vedic mathematics can be effectively increased if it is included in engineering education. In future, all the major universities may set up appropriate research centers to promote research works in Vedic mathematics.

VI. REFERENCES


