Multiplier optimization in VHDL

When FPGAs do not implement a multiplier HW macro

Modern FPGAs implement multiplication using dedicated hardware resources. Such a resource generally implements an 18×18 multiply-and-accumulate function. In some FPGA families the same block can be configured either as one 18×18 multiplier or as two 9×9 multipliers. It depends on the technology you are using.

Depending on the technology and the FPGA you are using, it is also possible that no dedicated multiplier is available at all. In that case a multiplication has to be built from general-purpose logic, which is very demanding in terms of area and timing, so you need to pay attention to the number of bits of the operands in order to minimize the impact on your FPGA.

There are some particular cases in which it is possible to reduce the number of bits of the multiplier operands.

In this post I want to walk through an example you can use as a guideline for multiplier optimization.

Sine and Cosine quantization

A typical example where we can optimize the number of bits of a multiplier is when one of the two operands is the quantization of sine and cosine.

DISCLAIMER: this is a particular case; it cannot be applied to every multiplication.

The assumption is that we need to perform multiplications like these:

M1 = op1 * sin(a)                      EQ1
M2 = op2 * cos(a)

op1 and op2 are quantized with N bits; sin(a) and cos(a) are quantized with K bits.

There is a very particular case where the angle “a” takes values ONLY in a restricted range, for instance +/-10°.

In this case, cos(a) needs all K bits but sin(a) needs only a few. Let's work through a practical example with K=8.

Sin(-/+10°) = -/+ 0.1736 
Cos(-/+10°) = 0.9848

Using 8 bits (scale factor 128) we have the quantized values:

round(128*Sin(+10°)) = +22
round(128*Sin(-10°)) = -22
round(128*Cos(-/+10°)) = 126

As is clear, even with an 8-bit quantization the sine really needs only 6 bits instead of 8, since 6 signed bits cover the range [-32..+31].

For the cosine we have a problem: the value 126 needs all 8 bits!
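The quantized values above can be reproduced in simulation. The following is a minimal, simulation-only sketch (not synthesizable) that assumes a scale factor of 128 for all three values; the entity name quant_check and the constant names are hypothetical and chosen only for illustration.

```vhdl
library ieee;
use ieee.math_real.all;

-- Simulation-only sketch: recomputes the 8-bit quantization of
-- sin/cos at +/-10 degrees. Entity and constant names are hypothetical.
entity quant_check is
end entity quant_check;

architecture sim of quant_check is
begin

  p_check : process
    constant C_SCALE : real := 128.0;                   -- assumed scale factor
    constant C_ANGLE : real := 10.0 * MATH_PI / 180.0;  -- 10 degrees in radians
    variable v_sin_p : integer;
    variable v_sin_n : integer;
    variable v_cos   : integer;
  begin
    v_sin_p := integer(round(C_SCALE * sin( C_ANGLE)));
    v_sin_n := integer(round(C_SCALE * sin(-C_ANGLE)));
    v_cos   := integer(round(C_SCALE * cos( C_ANGLE)));

    -- the sine fits in 6 signed bits, the cosine does not
    assert (v_sin_p = 22) and (v_sin_n = -22)
      report "unexpected sine quantization" severity error;
    assert v_cos = 126
      report "unexpected cosine quantization" severity error;

    report "sin(+10) -> " & integer'image(v_sin_p) &
           ", sin(-10) -> " & integer'image(v_sin_n) &
           ", cos(10) -> "  & integer'image(v_cos);
    wait;  -- stop the process
  end process p_check;

end architecture sim;
```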

Optimizing the cosine quantization

We saw that the sine quantization ranges over [-22 .. +22], so 6 bits are enough, while the cosine quantization needs all 8 bits.

As you know, any number can be written as:

C = (C-1) + 1                  EQ2

We will not win the Nobel prize for this equation, but it is useful for cosine quantization. Using EQ2 we can rewrite the cosine value as follows:

Cos(-/+10°) = (Cos(-/+10°) - 1) + 1        EQ3
            = (0.9848 - 1) + 1 = -0.0152 + 1

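If we quantize EQ3 we have:

round(128*(Cos(-/+10°)-1)) + 128 = -2 + 128 = 126

As is clear, the variable part of the cosine quantization now needs far fewer than 8 bits: over the +/-10° range it only takes values in [-2 .. 0]. The constant term 128 is a power of two, so the scaled product can be computed as

op2*126 = op2*(-2) + op2*128

where the first term is a small multiplication and the second is just a shift of op2.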
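To make the bit counts concrete, here is a small simulation-only sketch that quantizes both the full cosine and the offset (cos(a)-1) and reports how many two's-complement bits each one needs. The helper f_signed_bits and the entity name are hypothetical, introduced only for this illustration.

```vhdl
library ieee;
use ieee.math_real.all;

-- Simulation-only sketch; all names are hypothetical.
entity offset_bits_check is
end entity offset_bits_check;

architecture sim of offset_bits_check is

  -- Smallest two's-complement width able to hold v.
  -- An n-bit signed word covers [-2**(n-1) .. 2**(n-1)-1].
  function f_signed_bits(v : integer) return natural is
    variable n : natural := 1;
  begin
    while (v < -(2**(n-1))) or (v > 2**(n-1) - 1) loop
      n := n + 1;
    end loop;
    return n;
  end function f_signed_bits;

begin

  p_check : process
    constant C_ANGLE  : real := 10.0 * MATH_PI / 180.0;  -- 10 degrees
    variable v_full   : integer;
    variable v_offset : integer;
  begin
    v_full   := integer(round(128.0 * cos(C_ANGLE)));          -- 126
    v_offset := integer(round(128.0 * (cos(C_ANGLE) - 1.0)));  -- -2

    -- the offset plus the constant 128 gives back the full value
    assert v_offset + 128 = v_full
      report "offset decomposition mismatch" severity error;
    assert f_signed_bits(v_full) = 8
      report "full cosine should need 8 bits" severity error;
    assert f_signed_bits(v_offset) = 2
      report "offset should need 2 bits" severity error;

    report "full = " & integer'image(v_full) &
           " (" & integer'image(f_signed_bits(v_full)) & " bits), offset = " &
           integer'image(v_offset) &
           " (" & integer'image(f_signed_bits(v_offset)) & " bits)";
    wait;
  end process p_check;

end architecture sim;
```

Two bits is the strict minimum for this angle range; a design may reasonably keep a few spare bits for the offset input as margin.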

VHDL implementation of multiplier optimization

The multiplications in EQ1 can be implemented in VHDL as follows.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult is
port (
	i_clk           : in  std_logic;
	i_rstb          : in  std_logic;
	i_sin           : in  std_logic_vector( 5 downto 0); -- round(128*sin(a)), +/-10° => [-22..+22]
	i_cos           : in  std_logic_vector( 3 downto 0); -- round(128*(cos(a)-1)), +/-10° => [-2..0]
	i_op1           : in  std_logic_vector( 7 downto 0);
	i_op2           : in  std_logic_vector( 7 downto 0);
	o_m1            : out std_logic_vector(15 downto 0);  -- 8+8 bit output
	o_m2            : out std_logic_vector(15 downto 0));
end mult;

architecture rtl of mult is
-- used to implement left shift for op2
constant C_ZERO_FILL   : std_logic_vector(6 downto 0):=(others=>'0');
signal r_op2           : signed(i_op2'length+C_ZERO_FILL'length-1 downto 0);
signal r_sin           : signed(i_op1'length-1 downto 0);
signal r_cos           : signed(i_op2'length-1 downto 0);
signal r_m1            : signed(i_op1'length*2-1 downto 0);
signal r_m2            : signed(i_op2'length*2-1 downto 0);


begin

p_input : process (i_rstb,i_clk)
begin
	if(i_rstb='0') then
		r_op2           <= (others=>'0');
		r_sin           <= (others=>'0');
		r_cos           <= (others=>'0');
		r_m1            <= (others=>'0');
		r_m2            <= (others=>'0');
		o_m1            <= (others=>'0');
		o_m2            <= (others=>'0');
	elsif(rising_edge(i_clk)) then
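		-- op2 * 2^7: append 7 zeros (left shift by 7)
		r_op2           <= signed(i_op2&C_ZERO_FILL);
		-- sign extension to the operand width
		r_sin           <= resize(signed(i_sin),i_op1'length);
		r_cos           <= resize(signed(i_cos),i_op2'length);
		-- products: the coefficients are registered, the operands are not,
		-- so the coefficient used here is one clock older than the operand
		r_m1            <= r_sin * signed(i_op1);
		r_m2            <= r_cos * signed(i_op2);
		o_m1            <= std_logic_vector(r_m1);
		-- op2*round(128*(cos(a)-1)) + op2*128
		o_m2            <= std_logic_vector(r_m2 + r_op2);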
	end if;
end process p_input;

end rtl;

In the VHDL code for the multiplier, the term op2*128 is obtained simply by shifting op2 left by 7 bits, then it is added to the small product op2*round(128*(cos(a)-1)). As you know, a multiplication by a power of two can be implemented as a left shift by N, where N is the value of the exponent.
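The shift-for-multiply equivalence can be checked exhaustively for 8-bit operands. The sketch below is simulation-only and uses hypothetical names; it compares the zero-fill concatenation style used in the listing with numeric_std's shift_left and with a plain multiplication by 128.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Simulation-only sketch; entity and variable names are hypothetical.
entity shift_check is
end entity shift_check;

architecture sim of shift_check is
  constant C_ZERO_FILL : signed(6 downto 0) := (others => '0');
begin

  p_check : process
    variable v_op     : signed(7 downto 0);
    variable v_concat : signed(14 downto 0);
    variable v_shift  : signed(14 downto 0);
  begin
    for i in -128 to 127 loop
      v_op     := to_signed(i, 8);
      -- appending 7 zeros, as the listing does with C_ZERO_FILL
      v_concat := v_op & C_ZERO_FILL;
      -- the same operation written with shift_left on a widened operand
      v_shift  := shift_left(resize(v_op, 15), 7);

      assert v_concat = to_signed(i * 128, 15)
        report "concatenation differs from i*128 for i = " & integer'image(i)
        severity error;
      assert v_shift = v_concat
        report "shift_left differs from concatenation for i = " & integer'image(i)
        severity error;
    end loop;

    report "shift check finished";
    wait;
  end process p_check;

end architecture sim;
```

Neither form costs multiplier logic: in hardware a constant shift is only wiring.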

Simulation of VHDL implementation of multiplier optimization

The simulation reports the results of the two multiplications

op1*sin(10°) 
op2*cos(10°)

where the second multiplication is implemented using the optimization presented in the previous section.

Optimised VHDL multiplier simulation results

In this case the value passed to the VHDL code for cos(10°) is not the quantization

round(128*Cos(10°)) = round(128*0.9848) = 126

but the value

round(128*(Cos(-/+10°)-1)) = -2

The error signals show the comparison between the outputs of the VHDL code above and the classical multiplication

op2*round(128*cos(10°)) = op2*126

As is clear, the output of the multiplication performed with the optimized VHDL matches the value obtained with the classical multiplication.
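A comparison of this kind can be automated with a self-checking testbench. The following is only a sketch under stated assumptions: the mult entity from this post is compiled into library work, the angle coefficients are held constant at +10° (i_sin = 22, i_cos = -2), and each operand is held for several clocks so the pipeline latency does not matter. The testbench name mult_tb and its signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Self-checking testbench sketch; names are hypothetical.
entity mult_tb is
end entity mult_tb;

architecture sim of mult_tb is
  signal clk   : std_logic := '0';
  signal rstb  : std_logic := '0';
  -- coefficients for a = +10 degrees, held constant during the test
  signal sin_q : std_logic_vector(5 downto 0) := std_logic_vector(to_signed(22, 6));
  signal cos_q : std_logic_vector(3 downto 0) := std_logic_vector(to_signed(-2, 4));
  signal op1   : std_logic_vector(7 downto 0) := (others => '0');
  signal op2   : std_logic_vector(7 downto 0) := (others => '0');
  signal m1    : std_logic_vector(15 downto 0);
  signal m2    : std_logic_vector(15 downto 0);
  signal done  : boolean := false;
begin

  -- 100 MHz clock, stopped when the stimulus is finished
  clk <= not clk after 5 ns when not done else '0';

  u_dut : entity work.mult
    port map (
      i_clk  => clk,
      i_rstb => rstb,
      i_sin  => sin_q,
      i_cos  => cos_q,
      i_op1  => op1,
      i_op2  => op2,
      o_m1   => m1,
      o_m2   => m2);

  p_stim : process
  begin
    wait for 20 ns;
    rstb <= '1';  -- release the active-low reset

    for i in -128 to 127 loop
      op1 <= std_logic_vector(to_signed(i, 8));
      op2 <= std_logic_vector(to_signed(i, 8));
      -- hold the operands for 5 clocks, more than the pipeline depth
      for k in 1 to 5 loop
        wait until rising_edge(clk);
      end loop;

      -- classical multiplication used as reference
      assert signed(m1) = to_signed(i * 22, 16)
        report "o_m1 mismatch for op1 = " & integer'image(i) severity error;
      assert signed(m2) = to_signed(i * 126, 16)
        report "o_m2 mismatch for op2 = " & integer'image(i) severity error;
    end loop;

    report "mult_tb finished";
    done <= true;
    wait;
  end process p_stim;

end architecture sim;
```

Holding each operand for several clocks keeps the check independent of the exact latency; a throughput test with a new operand on every clock would need the reference values delayed to match the pipeline.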

Conclusion

In this post, we addressed a possible optimization that can be adopted in a multiplier when the value of one of the operands stays close to its maximum. A typical example is the cosine value when the angle is in the range +/-10°.

In this case, we don’t need to use all the bits to represent the cosine value; we can take advantage of the identity:

Cos(a) = (cos(a)-1)+1
