Performance Analysis of Bit-Width Reduced Floating-Point Arithmetic Units in FPGAs: A Case Study of a Neural Network-Based Face Detector
Yongsoon Lee^{1}, Younhee Choi^{1}, Seok-Bum Ko^{1}, and Moon Ho Lee^{2}
DOI: 10.1155/2009/258921
© Yongsoon Lee et al. 2009
Received: 4 July 2008
Accepted: 31 March 2009
Published: 2 July 2009
Abstract
This paper implements a field-programmable gate array (FPGA) based face detector using a neural network (NN) and a bit-width reduced floating-point arithmetic unit (FPU). An analytical error model, using the maximum relative representation error (MRRE) and the average relative representation error (ARRE), is developed to obtain the maximum and average output errors for the bit-width reduced FPUs. After the development of the analytical error model, the bit-width reduced FPUs and an NN are designed using MATLAB and VHDL. Finally, the analytical (MATLAB) results are compared with the experimental (VHDL) results. The analytical and experimental results show conformity of shape. We demonstrate that incremental reductions in the number of bits used can produce significant cost reductions in area, speed, and power.
1. Introduction
Neural networks have been studied and applied since the 1950s in various fields requiring learning, classification, fault tolerance, and associative memory. Neural networks are frequently used to model complicated problems that are difficult to express analytically. Applications include pattern recognition and function approximation [1]. The most popular neural network is the multilayer perceptron (MLP) trained using the error back-propagation (BP) algorithm [2]. Because training in MLP-BP is slow, however, it is necessary to speed up the training time. A very attractive solution is to implement it on field-programmable gate arrays (FPGAs).
For implementing MLP-BP, each processing element must perform multiplication and addition. Another important calculation is the activation function, which is used to calculate the output of the neural network. One of the most important considerations for implementing a neural network on FPGAs is the arithmetic representation format. It is known that floating-point (FP) formats are more area efficient than fixed-point ones for implementing artificial neural networks, with their combination of addition and multiplication, on FPGAs [3].
The main advantage of the FP format is its wide range. This wide range benefits neural network systems because a large dynamic range is required when the learning weights are calculated or updated [4]. Another advantage of the FP format is its ease of use: a personal computer uses the floating-point format for its arithmetic, so if the target application also uses the FP format, no conversion to another arithmetic format is necessary.
FP hardware offers a wide dynamic range and high computation precision, but it occupies a large fraction of total chip area and energy consumption, so its usage is limited. Many embedded microprocessors do not even include a floating-point unit (FPU) due to its unacceptable hardware cost.
A bit-width reduced FPU solves this complexity problem [5, 6]. FP bit-width reduction can provide significant savings in hardware resources such as area and power. It is useful to understand the loss in accuracy and the reduction in costs as the number of bits in a floating-point implementation is reduced; incremental reductions in the number of bits used can produce useful cost reductions. In order to determine the required number of bits in a bit-width reduced FPU, analysis of the error caused by reduced precision is essential. Reduced-precision error analysis for neural network implementations was introduced in [7]. A formula that estimates the standard deviation of the output differences between fixed-point and floating-point networks was developed in [8]. Previous error analyses are useful for estimating possible errors; however, for a practical implementation it is necessary to know the maximum and average possible errors caused by a reduced-precision FPU.
Therefore, in this paper, an error model is developed using the maximum relative representation error (MRRE) and the average relative representation error (ARRE), which are representative indices of FPU accuracy.
After the error model for the reduced-precision FPU is developed, the bit-width reduced FPUs and the neural network for face detection are designed using MATLAB and the VHSIC Hardware Description Language (VHDL). Finally, the analytical (MATLAB) results are compared with the experimental (VHDL) results.
Detecting a face in an image means finding its position in the image plane and its size. There has been extensive research in this field, mostly in the software domain [9, 10]. A few hardware face detector implementations on FPGAs have been reported [11, 12], but most of the proposed solutions are not very compact, and the implementations are not purely in hardware. In our previous work, an FPGA-based standalone face detector to support a face recognition system was proposed, showing that such an embedded system is feasible [13].
Our central contribution here is to examine how a neural network-based face detector can employ the minimal number of bits in an FPU to reduce hardware resources, while maintaining the face detector's overall accuracy.
This paper is outlined as follows. In Section 2, the FPGA implementation of the neural network face detector using the bit-width reduced FPUs is described. Section 3 explains how representation errors theoretically affect the detection rate, in order to determine the required number of bits for the bit-width reduced FPUs. In Section 4, the experimental results are presented and compared with the analytical results to verify that both match closely. Section 5 draws conclusions.
2. A Neural Network-Based Face Detector Using a Bit-Width Reduced FPU in an FPGA
2.1. General Review on MLP
A neural network model can be categorized into two types: the single-layer perceptron and the multilayer perceptron (MLP). A single-layer perceptron has only two layers, the input layer and the output layer, each containing a certain number of neurons. The MLP is a neural network model that contains multiple layers, typically three or more, including one or more hidden layers. The MLP is a representative method of supervised learning.
Each neuron in one layer receives its input from the neurons in the previous layer and broadcasts its output to the neurons in the next layer. Every processing node in one particular layer is usually connected to every node in the previous layer and the next layer. The connections carry weights, and the weights are adjusted during training. The operation of the network consists of two stages: forward pass and backward pass or backpropagation. In the forward pass, an input pattern vector is presented to the network and the output of the input layer nodes is precisely the components of the input pattern. For successive layers, the input to each node is then the sum of the products of the incoming vector components with their respective weights.
The input to a node j is given simply by

net_{j} = Σ_{i} w_{ji} · out_{i}, (1)

where w_{ji} is the weight connecting node i to node j and out_{i} is the output from node i.

The output of a node j is simply

out_{j} = f(net_{j}), (2)

which is then sent to all nodes in the following layer. This continues through all the layers of the network until the output layer is reached and the output vector is computed. The input layer nodes do not perform any of the above calculations; they simply take the corresponding value from the input pattern vector. The function f denotes the activation function of each node and is discussed in the following section.
After the face data enters the input nodes, it is processed by multiply-and-accumulate (MAC) operations with the weights. Face or non-face data is determined by comparing the output results with a threshold; for example, if the output is larger than the threshold, the input is classified as a face. On the FPGA, this decision is easily made by checking the sign bit after subtracting the threshold from the output.
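The forward pass and threshold decision described above can be sketched as follows (a minimal pure-software illustration with a hypothetical 8-4-1 network and random weights; the paper's detector uses a trained 400-300-1 network in MATLAB/VHDL):

```python
import math
import random

def tansig(x):
    # Hyperbolic tangent sigmoid, as computed by MATLAB's `tansig`.
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def forward_pass(inputs, layers):
    # For each layer: net_j = sum_i w_ji * out_i, then out_j = f(net_j).
    out = inputs
    for weights in layers:  # `weights` holds one row of w_ji per node j
        out = [tansig(sum(w * o for w, o in zip(row, out))) for row in weights]
    return out

def is_face(inputs, layers, threshold):
    # Face/non-face decision: in hardware this reduces to checking the
    # sign bit of (output - threshold).
    return forward_pass(inputs, layers)[0] - threshold > 0.0

random.seed(0)
# Hypothetical 8-4-1 network with random weights, for illustration only.
layers = [[[random.uniform(-0.1, 0.1) for _ in range(8)] for _ in range(4)],
          [[random.uniform(-0.1, 0.1) for _ in range(4)]]]
x = [random.uniform(-1.0, 1.0) for _ in range(8)]
print(is_face(x, layers, threshold=0.5))
```

The layer sizes, seed, and threshold here are arbitrary stand-ins; only the MAC-then-activate structure mirrors the text.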
2.2. Estimation of Activation Function
An activation function is used to calculate the output of the neural network. The learning procedure of the neural network requires differentiation of the activation function to update the weight values; therefore, the activation function has to be differentiable. A sigmoid function, having an "S" shape, is used for the activation function, and either a logistic or a hyperbolic tangent function is commonly used as the sigmoid. In our experiments, the hyperbolic tangent function, with its antisymmetric feature, showed better learning ability than the logistic function. Therefore, the hyperbolic tangent sigmoid transfer function was used, as shown in (3). Its first-order derivative can be easily obtained, as shown in (4). MATLAB provides these as the commands "tansig" and "dtansig":

f(x) = 2 / (1 + e^{-2x}) - 1, (3)

f'(x) = 1 - f(x)^{2}, (4)

where x = net_{j} in (2).
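Equations (3) and (4) correspond to MATLAB's "tansig" and "dtansig"; a quick numerical check of both (a sketch, not from the paper):

```python
import math

def tansig(x):
    # f(x) = 2 / (1 + e^(-2x)) - 1, algebraically equal to tanh(x).
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def dtansig(x):
    # f'(x) = 1 - f(x)^2
    return 1.0 - tansig(x) ** 2

# tansig is just tanh; check the derivative against a central difference.
h = 1e-6
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(tansig(x) - math.tanh(x)) < 1e-12
    fd = (tansig(x + h) - tansig(x - h)) / (2 * h)
    assert abs(dtansig(x) - fd) < 1e-5
print("tansig/dtansig verified")
```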
The activation function can be estimated by different methods; among them, the Taylor-series and polynomial methods are effective and offer the highest speed and accuracy.
The polynomial method is used to estimate the activation function in this paper, as seen in (5) and (6), because it is simpler than the Taylor approximation. A first-degree polynomial estimation of the activation function is given in (5), and its first-order derivative in (6).
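The paper's coefficients for (5) and (6) are not reproduced here, so the sketch below uses a hypothetical first-degree (piecewise-linear) estimate of the hyperbolic tangent, clamped at ±1, purely to illustrate the approach and its approximation error:

```python
import math

def tanh_poly1(x):
    # Hypothetical first-degree estimate: a line through the origin with
    # slope 1 (the true slope of tanh at 0), saturated at +/-1.
    if x > 1.0:
        return 1.0
    if x < -1.0:
        return -1.0
    return x

def dtanh_poly1(x):
    # Its derivative: 1 inside the linear region, 0 in saturation.
    return 1.0 if -1.0 <= x <= 1.0 else 0.0

# Worst-case error of this particular estimate over [-4, 4]:
xs = [i / 100.0 for i in range(-400, 401)]
max_err = max(abs(tanh_poly1(x) - math.tanh(x)) for x in xs)
print(f"max |poly1 - tanh| = {max_err:.3f}")
```

The coefficients (slope 1, clamp at ±1) are illustrative assumptions, not the values used in (5) and (6).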
2.3. FPU Implementation
2.4. Implementation of the Neural Network-Based FPGA Face Detector Using MATLAB and VHDL
The Olivetti face database [17] is chosen for this study. It consists of monochrome face and non-face images, so it is easy to use. Other databases, which are larger, in color, or mixed with other pictures, are less suitable for this error analysis because they require more preprocessing, such as cropping, data classification, and color-model conversion.
3. Error Analysis of the Neural Network Caused by the Reduced-Precision FPU
3.1. MRRE and ARRE
MRRE and ARRE of five different FPUs.

Bit-width  (Base, exponent bits, fraction bits)
FPU32  2, 8, 23
FPU24  2, 6, 17
FPU20  2, 6, 13
FPU16  2, 6, 9
FPU12  2, 6, 5
where ulp is a unit in the last position and β is the exponent base.
An average relative representation error (ARRE) can be considered for practical use:
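The representation error underlying the MRRE can be observed empirically by truncating a double-precision significand to the reduced fraction widths of Table 1 (a sketch under the assumption of truncation rounding; not the paper's MATLAB code):

```python
import math
import random

def truncate_fraction(x, frac_bits):
    # Keep only `frac_bits` fraction bits of the significand (truncation,
    # i.e., chopping toward zero); the exponent is untouched.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)               # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (frac_bits + 1)     # +1 because frexp's m is 0.1xxx...
    return math.ldexp(math.trunc(m * scale) / scale, e)

random.seed(1)
samples = [random.uniform(-8.0, 8.0) for _ in range(10000)]
worst = {}
for frac_bits in (23, 17, 13, 9, 5):   # FPU32 ... FPU12 fraction widths
    worst[frac_bits] = max(abs(x - truncate_fraction(x, frac_bits)) / abs(x)
                           for x in samples)
    # With truncation, the relative error stays below one ulp = 2**-frac_bits.
    print(f"frac={frac_bits:2d}: worst rel. error = {worst[frac_bits]:.2e}")
```

The sample range and count are arbitrary; the observed worst case approaching one ulp per format is the point of the experiment.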
3.2. Output Error Estimation of the Neural Network
The error of the first layer is the difference between the output computed with finite-precision arithmetic and the ideal output, and it can be described as in (9). The nonlinear-function error introduced by the Taylor estimation is very small and negligible; therefore this term becomes 0. Other calculation errors occur when the differential of the activation function is calculated and when the final face determination is calculated as follows: f(x) = (x) + (0.5).
The multiplication error is not considered in this paper. The multiplication unit assigns twice the bit-width to store the result; for example, a 16-bit × 16-bit multiplication needs 32 bits. This larger register keeps the error negligible.
However, the summation error is not negligible and is added to the error term. The multiplication error and the addition error are bounded by the MRRE (assuming rounding mode = truncation), as given by (11) and (12), where the negative sign (-) indicates the direction of the error.
Note that the maximum error caused by the truncation rounding scheme and the error caused by the round-to-nearest scheme are each bounded in proportion to the MRRE. The truncation scheme creates a negative error, while the round-to-nearest scheme creates a positive error; the total error can be reduced by almost 50% with the round-to-nearest scheme [18].
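The roughly 50% gap between the two rounding schemes can be checked with a small experiment (a sketch; the 9 fraction bits correspond to FPU16):

```python
import math
import random

def quantize(x, frac_bits, mode):
    # Reduce the significand to `frac_bits` fraction bits, either by
    # truncation (chop toward zero) or by round-to-nearest.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)
    scale = 2.0 ** (frac_bits + 1)
    q = math.trunc(m * scale) if mode == "trunc" else round(m * scale)
    return math.ldexp(q / scale, e)

random.seed(7)
samples = [random.uniform(0.1, 8.0) for _ in range(10000)]
errs = {}
for mode in ("trunc", "nearest"):
    errs[mode] = max(abs(x - quantize(x, 9, mode)) / x for x in samples)
    print(f"{mode:7s}: worst relative error = {errs[mode]:.2e}")
# Truncation errors are one-sided and bounded by one ulp (2**-9 here);
# round-to-nearest errors are two-sided and bounded by half an ulp.
```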
From (9), the weight data and the input data both carry the reduced-precision error, so the multiplication of the weights and the input data includes both error contributions.
Equations (16) and (18) are obtained by applying the first-order Taylor series approximation, as given in [7, 8]:
From (9), the error of the first layer is given by
The error of the second layer can also be found as
By replacing the exact weight and input terms with their error-inclusive counterparts, (18) becomes
The error in (22) can be generalized to the l-th layer in a similar way:
3.3. Output Error Estimation by MRRE and ARRE
The error equation can be rewritten using the MRRE in the error term to find the maximum output error caused by reduced precision. The average error can be estimated in practical applications by replacing the MRRE with the ARRE.
From (16), the output error of the first layer is described as
Each term can be bounded by its MRRE-scaled magnitude; thus from (24), the error is bounded so that
Finally, the output error of the second layer is also derived from (22), as shown in (28), where the error of the weights can be written as
3.4. Relationship between MRRE and Output Error
In order to observe the relationship between the MRRE and the output error, (28) is rewritten as (30).
where
Some properties of the output error are derived from (26) and (30). The differential of the summations affects the output error proportionally, as in
One more finding is that the output error is also proportional to the MRRE.
From (30),
where the bound assumes rounding mode = truncation. Therefore, (33) can be described as
Finally, it is concluded that an n-bit reduction in the FPU multiplies the error by 2^{n}; if one bit is removed, for example, the output error is doubled. After substituting the MRRE of each reduced-precision FPU relative to FPU32 into the error terms in (26) and (28), using MATLAB and real face data, the total accumulated error of the neural network is obtained as shown in Table 11.
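The 2^{n} scaling can be checked directly by quantizing the same data at successively narrower fraction widths (a sketch, not the paper's MATLAB experiment):

```python
import math
import random

def truncate_fraction(x, frac_bits):
    # Chop the significand down to `frac_bits` fraction bits.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)
    scale = 2.0 ** (frac_bits + 1)
    return math.ldexp(math.trunc(m * scale) / scale, e)

random.seed(3)
samples = [random.uniform(0.1, 4.0) for _ in range(20000)]
worst = {b: max(abs(x - truncate_fraction(x, b)) / x for x in samples)
         for b in (13, 12, 11, 10, 9)}
for b in (12, 11, 10, 9):
    # Removing one more fraction bit roughly doubles the worst-case error.
    print(f"{b + 1} -> {b} fraction bits: error ratio = "
          f"{worst[b] / worst[b + 1]:.2f}")
```

The fraction widths sampled here are arbitrary; each one-bit reduction should show a ratio near 2, matching the 2^{n} behavior above.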
4. Results and Discussion
4.1. FPGA Synthesis Results
The FPGA-based face detector using the neural network and the reduced-precision FPU is implemented in this paper. The logic circuits of the neural network-based FPGA face detector are synthesized using the Xilinx ISE design tool on a Spartan-3 XC3S4000 [19]. To verify the error model, the neural network is first designed on a PC using MATLAB; then the weights and testbench data are saved to a file to verify the VHDL code.
After simulation, area and operating speed are obtained by synthesizing the logic circuits. Since the FPU uses the same floating-point arithmetic as the PC, the design is easy to verify and the neural network's structure is easy to change.
4.1.1. Timing
Timing results of the neural network-based FPGA face detector with different FPUs.

Bit-width  Max. clock (MHz)  1/f (ns)  Time/frame (ms)  Frame rate (frames/s)
PC  8.5  117  50  20
FPU32  48  21.7  8.7  114.4
FPU24  58 (+21%)  17.4  7.4  135.9
FPU20  77 (+60%)  13  5.5  182.1
FPU16  80 (+67%)  12.5  5.3  189.8
FPU12  85 (+77%)  11.7  5  201.8
The remaining question is whether the bit-width reduced FPU can still maintain the face detector's overall accuracy. For this purpose, the detection rate error for the bit-width reduced FPU is discussed in Section 4.2.2.
4.1.2. Area
Area results of the neural networkbased FPGA face detector by different FPUs.
Bitwidth  No. of Slices  No. of FFs  No. of LUTs 

FPU32  1077  771  1952 
FPU24  878 (–18.5%)  637  1577 
FPU20  750 (–30.4%)  569  1356 
FPU16  650 (–39.7%)  501  1167 
FPU12  556 (–48.4%)  433  998 
As the bit-width decreases, the number of slices decreases, by 18.5% (FPU24) to 39.7% (FPU16) compared to FPU32.
Area results of 32/24/20/16/12-bit FP adders.

FP adder bit-width  Memory (Kbits)  NN area (slices)  FP adder area (slices)
32  3760  1077  486
24  2820 (-25%)  878  403 (-17%)
20  2350 (-37%)  750  300 (-38%)
16  1880 (-50%)  650  250 (-49%)
12  1410 (-63%)  556  173 (-64%)
The floating-point adder accounts for 31% (FP12: 173/556) to 45% (FP32: 486/1077) of the total size of the neural network, as shown in Table 4.
4.1.3. Power
Power consumption of the neural network-based FPGA face detector with different FPUs (unit: mW).

Bit-width  CLBs  RAM (width)  Multiplier (blocks)  I/O  Total
FPU32  2  17 (36)  9 (5)  67  306
FPU24  2  17 (36)  7 (4)  49  286 (-6.5%)
FPU20  2  17 (36)  4 (2)  45  279 (-8.8%)
FPU16  2  8 (18)  4 (2)  36  261 (-14.7%)
FPU12  1  8 (18)  4 (2)  29  253 (-17.3%)
As the bit-width decreases, the power consumption decreases. For example, reducing from FPU32 to FPU16 lowers the total power by 14.7% (FPU32: 306 mW, FPU16: 261 mW) through savings in RAM, multiplier, and I/O, as shown in Table 5.
Changes in the logic cells affect power far less than the hard-wired IP blocks, such as memory and multipliers, which consume most of the power; see the number of configurable logic blocks (CLBs) in Table 5.
4.1.4. Architectures of FP Adder
The performance of the neural network system and the FPU hardware is greatly affected by the FP addition [21]. The bit-width reduced FP adder used in this study is modified from a commercial IP, the LEON processor, whose FPU uses a standard adder architecture [16]. The system performance and the clock speed can be further improved by the leading-one-prediction (LOP) algorithm and the two-path (close-and-far path) algorithm, respectively [18].
Comparison of different FP adder architectures (5 pipeline stages).

Adder type  Slices  FFs  LUTs  Max. freq. (MHz)
LEON IP  486  269  905  71.5
LOP  570 (+17%)  294  1052  102 (+42.7%)
2-path  1026 (+111%)  128  1988  200 (+180%)
4.1.5. Specification
Specifications of the neural network-based FPGA face detector.

Feature  Specification
FPU bit-width  32, 24, 20, 16, 12
Frequency  48/58/77/80/85 MHz
Slices (Xilinx Spartan-3)  1077/878/750/650/556 (FPU32/24/20/16/12)
Arithmetic unit  IEEE 754 single precision with bit-width reduced FPU
Networks  2 layers (400/300/1 nodes)
Input data size  20×20 (400-pixel image)
Operating time  8.7/7.4/5.5/5.3/5 ms/frame
Frame rate  114/136/182/190/201 frames/s
4.2. Detection Rate Error
Two factors affect the detection rate error. One is the polynomial estimation error, shown in Figure 2, which occurs when the activation function is estimated by the polynomial equation. The other is the error caused by the bit-width reduced FPU.
4.2.1. Detection Rate Error by Polynomial Estimation
To reduce the error caused by polynomial estimation, the polynomial equation (35) can be modified more elaborately, as shown in (36). The problem with (36) is that it is not differentiable at the segment boundaries, and the error in (30) becomes identically 0 over part of the input range, which makes error analysis difficult:
Difference between (3) and (5) in face detection rate (MATLAB).

Threshold  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0
Tansig (3)  34.09  34.55  37.27  45.91  53.64  61.36  73.09  77.73  75  72.73
Poly (5)  35  39.09  45.91  53.64  62.73  70  72.27  77.27  78.18  77.27
Abs. diff.  0.91  4.54  8.64  7.73  9.09  8.64  0.82  0.46  3.18  4.54

Average error: 4.9.
4.2.2. Detection Rate Error by Reduced-Precision FPU
Detection rate of the PC software face detector.

Threshold  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1
Face  60  60  60  53  50  43  29  21  17  10
Rate (%)  100  100  100  88.33  83.33  71.67  48.33  35  28.33  16.67
Non-face  17  26  41  65  88  111  130  149  155  160
Rate (%)  10.625  16.25  25.625  40.625  55  69.375  81.25  93.125  96.875  100
Total  35  39.09  45.91  53.64  62.73  70  72.27  77.27  78.18  77.27
Detection rate of reduced-precision FPUs (VHDL).

Threshold  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1  Avg. detection rate error
FPU64 (PC)  35  39.09  45.91  53.64  62.73  70  72.27  77.27  78.18  77.27
FPU32 NN  35  39.09  45.91  53.64  62.73  70  72.27  76.82  78.18  77.27  0
FPU24 NN  35  39.09  45.91  53.64  62.73  70  72.27  76.82  78.18  77.27  0
FPU20 NN  35  39.09  46.36  53.64  63.18  70  73.64  76.82  77.73  76.82  0.36
FPU18 NN  35  41.36  47.73  56.82  65.46  69.55  74.55  77.73  77.27  74.09  1.73
FPU16 NN  35.91  44.55  53.18  66.36  70.46  76.36  78.18  74.55  72.73  72.73  5.91
Abs. diff. (FPU64 vs. FPU16)  0.91  5.45  7.27  12.73  7.73  6.36  5.91  2.73  5.46  4.55  5.91
Results of output error on the neural network-based FPGA face detector.

Bit-width  Calculation (MRRE)  Calculation (ARRE)  Experiment (max)
FPU32  4E-05  2.89E-05  1.93E-05
FPU24  0.0026  0.0018  0.0012
FPU20  0.0410  0.0296  0.0192
FPU18  0.1641  0.1184  0.0766
FPU16  0.6560  0.4733  0.2816
FPU14  2.62  1.891  0.9872
FPU12  10.4  7.5256  1.0741
Table 10 shows the detection rate error (i.e., |detection rate of FPU64 (PC software) - detection rate of the reduced-precision FPU|) caused by reduced-precision FPUs. The detection rate changes from FPU64 (PC) to FPU16 by only 5.91%.
The analytical results are found to be in agreement with the simulation results, as shown in Figure 10. The analytical MRRE results match the shape of the maximum experimental results, and the analytical ARRE results match the shape of the minimum experimental results.
As the bits in the FPU are reduced within the range from 32 bits to 14 bits, the output error increases by 2^{n} times. For example, a 2-bit reduction from FPU16 to FPU14 multiplies the error by 4 (2^{2}).
Due to the small number of fraction bits (e.g., 5 bits in FPU12), no meaningful results are obtained below 14 bits. Therefore, at least 14 bits should be employed to achieve an acceptable face detection rate; see Figures 9 and 10.
5. Conclusion
In this paper, the analytical error model was developed using the maximum relative representation error (MRRE) and the average relative representation error (ARRE) to obtain the maximum and average output errors for the bit-width reduced FPUs.
After the development of the analytical error model, the bit-width reduced FPUs and the neural network were designed using MATLAB and VHDL. Finally, the analytical (MATLAB) results were compared with the experimental (VHDL) results.
The analytical results and the experimental results showed conformity of shape. According to both results, as n bits in the FPU are reduced within the range from 32 bits to 14 bits, the output error increases by 2^{n} times.
Operating speed was significantly improved in the FPGA-based face detector implementation using a reduced-precision FPU. For example, the FPU16 took only 5.3 milliseconds to process one frame, which is 9 times faster than the 50 milliseconds (40 milliseconds for loading time plus 10 milliseconds for calculation time) of the PC (Pentium 4, 1.4 GHz). It was found that bit reduction from FPU32 to FPU16 reduced the size of memory and arithmetic units by 50% and the total power consumption by 14.7%, while still maintaining 94.1% face detection accuracy. The developed error analysis for bit-width reduced FPUs will be helpful in determining the specification for an embedded neural network hardware system.
Declarations
Acknowledgments
The authors would like to acknowledge the Natural Science and Engineering Research Council of Canada (NSERC) / the University of Saskatchewan's Publications Fund, the Korea Research Foundation, and a Korean Federation of Science and Technology Societies grant funded by the South Korean government (MOEHRD, Basic Research Promotion Fund) for supporting this research and to thank the reviewers for their valuable suggestions.
Authors’ Affiliations
References
 1. Skrbek M: Fast neural network implementation. Neural Network World 1999, 9(5): 375-391.
 2. Rumelhart DE, McClelland JL: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1. MIT Press, Cambridge, Mass, USA; 1986.
 3. Li X, Moussa M, Areibi S: Arithmetic formats for implementing artificial neural networks on FPGAs. Canadian Journal of Electrical and Computer Engineering 2006, 31(1): 30-40.
 4. Brown HK, Cross DD, Whittaker AG: Neural network number systems. Proceedings of the International Joint Conference on Neural Networks (IJCNN '90), June 1990, San Diego, Calif, USA, 3: 903-908.
 5. Kontro J, Kalliojarvi K, Neuvo Y: Use of short floating-point formats in audio applications. IEEE Transactions on Consumer Electronics 1992, 38(3): 200-207.
 6. Tong J, Nagle D, Rutenbar R: Reducing power by optimizing the necessary precision/range of floating-point arithmetic. IEEE Transactions on VLSI Systems 2000, 8(3): 273-286.
 7. Holt JL, Hwang JN: Finite precision error analysis of neural network hardware implementations. IEEE Transactions on Computers 1993, 42(3): 281-290.
 8. Sen S, Robertson W, Phillips WJ: The effects of reduced precision bit lengths on feed forward neural networks for speech recognition. Proceedings of the IEEE International Conference on Neural Networks, June 1996, Washington, DC, USA, 4: 1986-1991.
 9. Feraud R, Bernier OJ, Viallet JE, Collobert M: A fast and accurate face detector based on neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001, 23(1): 42-53.
 10. Rowley HA, Baluja S, Kanade T: Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998, 20(1): 23-38.
 11. Theocharides T, Link G, Vijaykrishnan N, Irwin MJ, Wolf W: Embedded hardware face detection. Proceedings of the 17th IEEE International Conference on VLSI Design, January 2004, Mumbai, India, 133-138.
 12. Sadri M, Shams N, Rahmaty M, et al.: An FPGA based fast face detector. Global Signal Processing Expo and Conference (GSPX '04), September 2004, Santa Clara, Calif, USA.
 13. Lee Y, Ko SB: FPGA implementation of a face detector using neural networks. Canadian Conference on Electrical and Computer Engineering (CCECE '07), May 2006, Ottawa, Canada, 1914-1917.
 14. Chester D: Why two hidden layers are better than one. Proceedings of the International Joint Conference on Neural Networks (IJCNN '90), January 1990, Washington, DC, USA, 1: 265-268.
 15. IEEE Std 754-1985: IEEE Standard for Binary Floating-Point Arithmetic. Standards Committee of the IEEE Computer Society, New York, NY, USA, August 1985.
 16. LEON Processor, http://www.gaisler.com
 17. Olivetti & Oracle Research Laboratory: The Olivetti & Oracle Research Laboratory Face Database of Faces, http://www.camorl.co.uk/facedatabase.html
 18. Koren I: Computer Arithmetic Algorithms, 2nd edition. A K Peters, Natick, Mass, USA; 2001.
 19. Xilinx: Spartan-3 FPGA Family Complete Data Sheet. Product Specification, April 2008.
 20. Xilinx: Spartan-3 Web Power Tool, Version 8.1.01, http://www.xilinx.com/cgibin/power_tool/power_Spartan3
 21. Govindu G, Zhuo L, Choi S, Prasanna V: Analysis of high-performance floating-point arithmetic on FPGAs. Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS '04), April 2004, Santa Fe, NM, USA, 149-156.
 22. Malik A: Design trade-off analysis of floating-point adder in FPGAs. M.S. thesis, Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, Canada; 2005.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.