Compare View
Commits (2)
Diff
Showing 6 changed files Inline Diff
biographies/ah.jpg
43.4 KB
biographies/gmg.jpg
148 KB
biographies/jb.jpg
31.8 KB
biographies/jmf.jpg
46.6 KB
biographies/pyb.jpg
161 KB
ifcs2018_journal.tex
| \documentclass[a4paper,journal]{IEEEtran/IEEEtran} | 1 | |||
| \usepackage{graphicx,color,hyperref} | 2 | |||
| \usepackage{amsfonts} | 3 | |||
| \usepackage{amsthm} | 4 | |||
| \usepackage{amssymb} | 5 | |||
| \usepackage{amsmath} | 6 | |||
| \usepackage{algorithm2e} | 7 | |||
| \usepackage{url,balance} | 8 | |||
| \usepackage[normalem]{ulem} | 9 | |||
| \usepackage{tikz} | 10 | |||
| \usetikzlibrary{positioning,fit} | 11 | |||
| \usepackage{multirow} | 12 | |||
| \usepackage{scalefnt} | 13 | 1 | \documentclass[a4paper,journal]{IEEEtran/IEEEtran} | |
| \usepackage{caption} | 14 | 2 | \usepackage{graphicx,color,hyperref} | |
| \usepackage{subcaption} | 15 | 3 | \usepackage{amsfonts} | |
| 16 | 4 | \usepackage{amsthm} | ||
| 17 | 5 | \usepackage{amssymb} | ||
| \hyphenation{op-tical net-works semi-conduc-tor} | 18 | 6 | \usepackage{amsmath} | |
| \textheight=26cm | 19 | 7 | \usepackage{algorithm2e} | |
| \setlength{\footskip}{30pt} | 20 | 8 | \usepackage{url,balance} | |
| \pagenumbering{gobble} | 21 | 9 | \usepackage[normalem]{ulem} | |
| \begin{document} | 22 | 10 | \usepackage{tikz} | |
| \title{Filter optimization for real time digital processing of radiofrequency signals: application | 23 | 11 | \usetikzlibrary{positioning,fit} | |
| to oscillator metrology} | 24 | 12 | \usepackage{multirow} | |
| 25 | 13 | \usepackage{scalefnt} | ||
| \author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2}, | 26 | 14 | \usepackage{caption} | |
| G. Goavec-M\'erou\IEEEauthorrefmark{1}, | 27 | 15 | \usepackage{subcaption} | |
| P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}\\ | 28 | 16 | ||
| \IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }\\ | 29 | 17 | ||
| \IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\ | 30 | 18 | \hyphenation{op-tical net-works semi-conduc-tor} | |
| Email: \{pyb2,jmfriedt\}@femto-st.fr} | 31 | 19 | \textheight=26cm | |
| } | 32 | 20 | \setlength{\footskip}{30pt} | |
| \maketitle | 33 | 21 | \pagenumbering{gobble} | |
| \thispagestyle{plain} | 34 | 22 | \begin{document} | |
| \pagestyle{plain} | 35 | 23 | \title{Filter optimization for real time digital processing of radiofrequency signals: application | |
| \newtheorem{definition}{Definition} | 36 | 24 | to oscillator metrology} | |
| 37 | 25 | |||
| \begin{abstract} | 38 | 26 | \author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2}, | |
| Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to | 39 | 27 | G. Goavec-M\'erou\IEEEauthorrefmark{1}, | |
| radiofrequency signal processing. Applied to oscillator characterization in the context | 40 | 28 | P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}\\ | |
| of ultrastable clocks, stringent filtering requirements are defined by spurious signal or | 41 | 29 | \IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }\\ | |
| noise rejection needs. Since real time radiofrequency processing must be performed in a | 42 | 30 | \IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\ | |
| Field Programmable Array to meet timing constraints, we investigate optimization strategies | 43 | 31 | Email: \{pyb2,jmfriedt\}@femto-st.fr} | |
| to design filters meeting rejection characteristics while limiting the hardware resources | 44 | 32 | } | |
| required and keeping timing constraints within the targeted measurement bandwidths. The | 45 | 33 | \maketitle | |
| presented technique is applicable to scheduling any sequence of processing blocks characterized | 46 | 34 | \thispagestyle{plain} | |
| by a throughput, resource occupation and performance tabulated as a function of configuration | 47 | 35 | \pagestyle{plain} | |
| characateristics, as is the case for filters with their coefficients and resolution yielding | 48 | 36 | \newtheorem{definition}{Definition} | |
| rejection and number of multipliers. | 49 | 37 | ||
| \end{abstract} | 50 | 38 | \begin{abstract} | |
| 51 | 39 | Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to | ||
| \begin{IEEEkeywords} | 52 | 40 | radiofrequency signal processing. Applied to oscillator characterization in the context | |
| Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter | 53 | 41 | of ultrastable clocks, stringent filtering requirements are defined by spurious signal or | |
| \end{IEEEkeywords} | 54 | 42 | noise rejection needs. Since real time radiofrequency processing must be performed in a | |
| 55 | 43 | Field Programmable Array to meet timing constraints, we investigate optimization strategies | ||
| \section{Digital signal processing of ultrastable clock signals} | 56 | 44 | to design filters meeting rejection characteristics while limiting the hardware resources | |
| 57 | 45 | required and keeping timing constraints within the targeted measurement bandwidths. The | ||
| Analog oscillator phase noise characteristics are classically performed by downconverting | 58 | 46 | presented technique is applicable to scheduling any sequence of processing blocks characterized | |
| the radiofrequency signal using a saturated mixer to bring the radiofrequency signal to baseband, | 59 | 47 | by a throughput, resource occupation and performance tabulated as a function of configuration | |
| followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to carrier. In | 60 | 48 | characateristics, as is the case for filters with their coefficients and resolution yielding | |
| a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by | 61 | 49 | rejection and number of multipliers. | |
| multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}. | 62 | 50 | \end{abstract} | |
| 63 | 51 | |||
| \begin{figure}[h!tb] | 64 | 52 | \begin{IEEEkeywords} | |
| \begin{center} | 65 | 53 | Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter | |
| \includegraphics[width=.8\linewidth]{images/schema} | 66 | 54 | \end{IEEEkeywords} | |
| \end{center} | 67 | 55 | ||
| \caption{Fully digital oscillator phase noise characterization: the Device Under Test | 68 | 56 | \section{Digital signal processing of ultrastable clock signals} | |
| (DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and | 69 | 57 | ||
| downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals | 70 | 58 | Analog oscillator phase noise characteristics are classically performed by downconverting | |
| and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite | 71 | 59 | the radiofrequency signal using a saturated mixer to bring the radiofrequency signal to baseband, | |
| Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays | 72 | 60 | followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to carrier. In | |
| the spectral characteristics of the phase fluctuations.} | 73 | 61 | a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by | |
| \label{schema} | 74 | 62 | multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}. | |
| \end{figure} | 75 | 63 | ||
| 76 | 64 | \begin{figure}[h!tb] | ||
| As with the analog mixer, | 77 | 65 | \begin{center} | |
| the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as | 78 | 66 | \includegraphics[width=.8\linewidth]{images/schema} | |
| well as the generation of the frequency sum signal in addition to the frequency difference. | 79 | 67 | \end{center} | |
| These unwanted spectral characteristics must be rejected before decimating the data stream | 80 | 68 | \caption{Fully digital oscillator phase noise characterization: the Device Under Test | |
| for the phase noise spectral characterization \cite{andrich2018high}. The characteristics introduced between the | 81 | 69 | (DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and | |
| downconverter | 82 | 70 | downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals | |
| and the decimation processing blocks are core characteristics of an oscillator characterization | 83 | 71 | and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite | |
| system, and must reject out-of-band signals below the targeted phase noise -- typically in the | 84 | 72 | Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays | |
| sub -170~dBc/Hz for ultrastable oscillator we aim at characterizing. The filter blocks will | 85 | 73 | the spectral characteristics of the phase fluctuations.} | |
| use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency | 86 | 74 | \label{schema} | |
| datastream: optimizing the performance of the filter while reducing the needed resources is | 87 | 75 | \end{figure} | |
| hence tackled in a systematic approach using optimization techniques. Most significantly, we | 88 | 76 | ||
| tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with | 89 | 77 | As with the analog mixer, | |
| tunable number of coefficients and tunable number of bits representing the coefficients and the | 90 | 78 | the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as | |
| data being processed. | 91 | 79 | well as the generation of the frequency sum signal in addition to the frequency difference. | |
| 92 | 80 | These unwanted spectral characteristics must be rejected before decimating the data stream | ||
| \section{Finite impulse response filter} | 93 | 81 | for the phase noise spectral characterization \cite{andrich2018high}. The characteristics introduced between the | |
| 94 | 82 | downconverter | ||
| We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined | 95 | 83 | and the decimation processing blocks are core characteristics of an oscillator characterization | |
| by a set of weights $b_k$ applied to the inputs $x_k$ through a convolution to generate the | 96 | 84 | system, and must reject out-of-band signals below the targeted phase noise -- typically in the | |
| outputs $y_k$ | 97 | 85 | sub -170~dBc/Hz for ultrastable oscillator we aim at characterizing. The filter blocks will | |
| \begin{align} | 98 | 86 | use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency | |
| y_n=\sum_{k=0}^N b_k x_{n-k} | 99 | 87 | datastream: optimizing the performance of the filter while reducing the needed resources is | |
| \label{eq:fir_equation} | 100 | 88 | hence tackled in a systematic approach using optimization techniques. Most significantly, we | |
| \end{align} | 101 | 89 | tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with | |
| 102 | 90 | tunable number of coefficients and tunable number of bits representing the coefficients and the | ||
| As opposed to an implementation on a general purpose processor in which word size is defined by the | 103 | 91 | data being processed. | |
| processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since | 104 | 92 | ||
| not only the coefficient values and number of taps must be defined, but also the number of bits | 105 | 93 | \section{Finite impulse response filter} | |
| defining the coefficients and the sample size. For this reason, and because we consider pipeline | 106 | 94 | ||
| processing (as opposed to First-In, First-Out FIFO memory batch processing) of radiofrequency | 107 | 95 | We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined | |
| signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but | 108 | 96 | by a set of weights $b_k$ applied to the inputs $x_k$ through a convolution to generate the | |
| the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language | 109 | 97 | outputs $y_k$ | |
| (VHDL) level. | 110 | 98 | \begin{align} | |
| Since latency is not an issue in a openloop phase noise characterization instrument, | 111 | 99 | y_n=\sum_{k=0}^N b_k x_{n-k} | |
| the large | 112 | 100 | \label{eq:fir_equation} | |
| numbre of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter, | 113 | 101 | \end{align} | |
| is not considered as an issue as would be in a closed loop system. | 114 | 102 | ||
| 115 | 103 | As opposed to an implementation on a general purpose processor in which word size is defined by the | ||
| The coefficients are classically expressed as floating point values. However, this binary | 116 | 104 | processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since | |
| number representation is not efficient for fast arithmetic computation by an FPGA. Instead, | 117 | 105 | not only the coefficient values and number of taps must be defined, but also the number of bits | |
| we select to quantify these floating point values into integer values. This quantization | 118 | 106 | defining the coefficients and the sample size. For this reason, and because we consider pipeline | |
| will result in some precision loss. | 119 | 107 | processing (as opposed to First-In, First-Out FIFO memory batch processing) of radiofrequency | |
| 120 | 108 | signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but | ||
| \begin{figure}[h!tb] | 121 | 109 | the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language | |
| \includegraphics[width=\linewidth]{images/zero_values} | 122 | 110 | (VHDL) level. | |
| \caption{Impact of the quantization resolution of the coefficients: the quantization is | 123 | 111 | Since latency is not an issue in a openloop phase noise characterization instrument, | |
| set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting | 124 | 112 | the large | |
| the 30~first and 30~last coefficients out of the initial 128~band-pass | 125 | 113 | numbre of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter, | |
| filter coefficients to 0 (red dots).} | 126 | 114 | is not considered as an issue as would be in a closed loop system. | |
| \label{float_vs_int} | 127 | 115 | ||
| \end{figure} | 128 | 116 | The coefficients are classically expressed as floating point values. However, this binary | |
| 129 | 117 | number representation is not efficient for fast arithmetic computation by an FPGA. Instead, | ||
| The tradeoff between quantization resolution and number of coefficients when considering | 130 | 118 | we select to quantify these floating point values into integer values. This quantization | |
| integer operations is not trivial. As an illustration of the issue related to the | 131 | 119 | will result in some precision loss. | |
| relation between number of fiter taps and quantization, Fig. \ref{float_vs_int} exhibits | 132 | 120 | ||
| a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon | 133 | 121 | \begin{figure}[h!tb] | |
| quantization on 6~bit integers, 60 of the 128~coefficients in the beginning and end of the | 134 | 122 | \includegraphics[width=\linewidth]{images/zero_values} | |
| taps become null, making the large number of coefficients irrelevant: processing | 135 | 123 | \caption{Impact of the quantization resolution of the coefficients: the quantization is | |
| resources | 136 | 124 | set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting | |
| are hence saved by shrinking the filter length. This tradeoff aimed at minimizing resources | 137 | 125 | the 30~first and 30~last coefficients out of the initial 128~band-pass | |
| to reach a given rejection level, or maximizing out of band rejection for a given computational | 138 | 126 | filter coefficients to 0 (red dots).} | |
| resource, will drive the investigation on cascading filters designed with varying tap resolution | 139 | 127 | \label{float_vs_int} | |
| and tap length, as will be shown in the next section. Indeed, our development strategy closely | 140 | 128 | \end{figure} | |
| follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards} | 141 | 129 | ||
| in which basic blocks are defined and characterized before being assembled \cite{hide} | 142 | 130 | The tradeoff between quantization resolution and number of coefficients when considering | |
| in a complete processing chain. In our case, assembling the filter blocks is a simpler block | 143 | 131 | integer operations is not trivial. As an illustration of the issue related to the | |
| combination process since we assume a single value to be processed and a single value to be | 144 | 132 | relation between number of fiter taps and quantization, Fig. \ref{float_vs_int} exhibits | |
| generated at each clock cycle. The FIR filters will not be considered to decimate in the | 145 | 133 | a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon | |
| current implementation: the decimation is assumed to be located after the FIR cascade at the | 146 | 134 | quantization on 6~bit integers, 60 of the 128~coefficients in the beginning and end of the | |
| moment. | 147 | 135 | taps become null, making the large number of coefficients irrelevant: processing | |
| 148 | 136 | resources | ||
| \section{Methodology description} | 149 | 137 | are hence saved by shrinking the filter length. This tradeoff aimed at minimizing resources | |
| 150 | 138 | to reach a given rejection level, or maximizing out of band rejection for a given computational | ||
| Our objective is to develop a new methodology applicable to any Digital Signal Processing (DSP) | 151 | 139 | resource, will drive the investigation on cascading filters designed with varying tap resolution | |
| chain obtained by assembling basic processing blocks, with hardware and manufacturer independence. | 152 | 140 | and tap length, as will be shown in the next section. Indeed, our development strategy closely | |
| Achieving such a target requires defining an abstract model to represent some basic properties | 153 | 141 | follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards} | |
| of DSP blocks such as performance (i.e. rejection or ripples in the bandpass for filters) and | 154 | 142 | in which basic blocks are defined and characterized before being assembled \cite{hide} | |
| resource occupation. These abstract properties, not necessarily related to the detailed hardware | 155 | 143 | in a complete processing chain. In our case, assembling the filter blocks is a simpler block | |
| implementation of a given platform, will feed a scheduler solver aimed at assembling the optimum | 156 | 144 | combination process since we assume a single value to be processed and a single value to be | |
| target, whether in terms of maximizing performance for a given arbitrary resource occupation, or | 157 | 145 | generated at each clock cycle. The FIR filters will not be considered to decimate in the | |
| minimizing resource occupation for a given performance. In our approach, the solution of the | 158 | 146 | current implementation: the decimation is assumed to be located after the FIR cascade at the | |
| solver is then synthesized using the dedicated tool provided by each platform manufacturer | 159 | 147 | moment. | |
| to assess the validity of our abstract resource occupation indicator, and the result of running | 160 | 148 | ||
| the DSP chain on the FPGA allows for assessing the performance of the scheduler. We emphasize | 161 | 149 | \section{Methodology description} | |
| that all solutions found by the solver are synthesized and executed on hardware at the end | 162 | 150 | ||
| of the analysis. | 163 | 151 | Our objective is to develop a new methodology applicable to any Digital Signal Processing (DSP) | |
| 164 | 152 | chain obtained by assembling basic processing blocks, with hardware and manufacturer independence. | ||
| In this demonstration, we focus on only two operations: filtering and shifting the number of | 165 | 153 | Achieving such a target requires defining an abstract model to represent some basic properties | |
| bits needed to represent the data along the processing chain. | 166 | 154 | of DSP blocks such as performance (i.e. rejection or ripples in the bandpass for filters) and | |
| We have chosen these basic operations because shifting and the filtering have already been studied | 167 | 155 | resource occupation. These abstract properties, not necessarily related to the detailed hardware | |
| in the literature \cite{lim_1996, lim_1988, young_1992, smith_1998} providing a framework for | 168 | 156 | implementation of a given platform, will feed a scheduler solver aimed at assembling the optimum | |
| assessing our results. Furthermore, filtering is a core step in any radiofrequency frontend | 169 | 157 | target, whether in terms of maximizing performance for a given arbitrary resource occupation, or | |
| requiring pipelined processing at full bandwidth for the earliest steps, including for | 170 | 158 | minimizing resource occupation for a given performance. In our approach, the solution of the | |
| time and frequency transfer or characterization \cite{carolina1,carolina2,rsi}. | 171 | 159 | solver is then synthesized using the dedicated tool provided by each platform manufacturer | |
| 172 | 160 | to assess the validity of our abstract resource occupation indicator, and the result of running | ||
| Addressing only two operations allows for demonstrating the methodology but should not be | 173 | 161 | the DSP chain on the FPGA allows for assessing the performance of the scheduler. We emphasize | |
| considered as a limitation of the framework which can be extended to assembling any number | 174 | 162 | that all solutions found by the solver are synthesized and executed on hardware at the end | |
| of skeleton blocks as long as performance and resource occupation can be determined. | 175 | 163 | of the analysis. | |
| Hence, | 176 | 164 | ||
| in this paper we will apply our methodology on simple DSP chains: a white noise input signal | 177 | 165 | In this demonstration, we focus on only two operations: filtering and shifting the number of | |
| is generated using a Pseudo-Random Number (PRN) generator or by sampling a wideband (125~MS/s) | 178 | 166 | bits needed to represent the data along the processing chain. | |
| 14-bit Analog to Digital Converter (ADC) loaded by a 50~$\Omega$ resistor. Once samples have been | 179 | 167 | We have chosen these basic operations because shifting and the filtering have already been studied | |
| digitized at a rate of 125~MS/s, filtering is applied to qualify the processing block performance -- | 180 | 168 | in the literature \cite{lim_1996, lim_1988, young_1992, smith_1998} providing a framework for | |
| practically meeting the radiofrequency frontend requirement of noise and bandwidth reduction | 181 | 169 | assessing our results. Furthermore, filtering is a core step in any radiofrequency frontend | |
| by filtering and decimating. Finally, bursts of filtered samples are stored for post-processing, | 182 | 170 | requiring pipelined processing at full bandwidth for the earliest steps, including for | |
| allowing to assess either filter rejection for a given resource usage, or validating the rejection | 183 | 171 | time and frequency transfer or characterization \cite{carolina1,carolina2,rsi}. | |
| when implementing a solution minimizing resource occupation. | 184 | 172 | ||
| 185 | 173 | Addressing only two operations allows for demonstrating the methodology but should not be | ||
| The first step of our approach is to model the DSP chain. Since we aim at only optimizing | 186 | 174 | considered as a limitation of the framework which can be extended to assembling any number | |
| the filtering part of the signal processing chain, we have not included the PRN generator or the | 187 | 175 | of skeleton blocks as long as performance and resource occupation can be determined. | |
| ADC in the model: the input data size and rate are considered fixed and defined by the hardware. | 188 | 176 | Hence, | |
| The filtering can be done in two ways, either by considering a single monolithic FIR filter | 189 | 177 | in this paper we will apply our methodology on simple DSP chains: a white noise input signal | |
| requiring many coefficients to reach the targeted noise rejection ratio, or by | 190 | 178 | is generated using a Pseudo-Random Number (PRN) generator or by sampling a wideband (125~MS/s) | |
| cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter. | 191 | 179 | 14-bit Analog to Digital Converter (ADC) loaded by a 50~$\Omega$ resistor. Once samples have been | |
| 192 | 180 | digitized at a rate of 125~MS/s, filtering is applied to qualify the processing block performance -- | ||
| After each filter we leave the possibility of shifting the filtered data to consume | 193 | 181 | practically meeting the radiofrequency frontend requirement of noise and bandwidth reduction | |
| less resources. Hence in the case of cascaded filter, we define a stage as a filter | 194 | 182 | by filtering and decimating. Finally, bursts of filtered samples are stored for post-processing, | |
| and a shifter (the shift could be omitted if we do not need to divide the filtered data). | 195 | 183 | allowing to assess either filter rejection for a given resource usage, or validating the rejection | |
| 196 | 184 | when implementing a solution minimizing resource occupation. | ||
| \subsection{Model of a FIR filter} | 197 | 185 | ||
| 198 | 186 | The first step of our approach is to model the DSP chain. Since we aim at only optimizing | ||
| A cascade of filters is composed of $n$ FIR stages. In stage $i$ ($1 \leq i \leq n$) | 199 | 187 | the filtering part of the signal processing chain, we have not included the PRN generator or the | |
| the FIR has $C_i$ coefficients and each coefficient is an integer value with $\pi^C_i$ | 200 | 188 | ADC in the model: the input data size and rate are considered fixed and defined by the hardware. | |
| bits while the filtered data are shifted by $\pi^S_i$ bits. We define also $\pi^-_i$ as | 201 | 189 | The filtering can be done in two ways, either by considering a single monolithic FIR filter | |
| the size of input data and $\pi^+_i$ as the size of output data. The figure~\ref{fig:fir_stage} | 202 | 190 | requiring many coefficients to reach the targeted noise rejection ratio, or by | |
| shows a filtering stage. | 203 | 191 | cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter. | |
| 204 | 192 | |||
| \begin{figure} | 205 | 193 | After each filter we leave the possibility of shifting the filtered data to consume | |
| \centering | 206 | 194 | less resources. Hence in the case of cascaded filter, we define a stage as a filter | |
| \begin{tikzpicture}[node distance=2cm] | 207 | 195 | and a shifter (the shift could be omitted if we do not need to divide the filtered data). | |
| \node[draw,minimum size=1.3cm] (FIR) { $C_i, \pi_i^C$ } ; | 208 | 196 | ||
| \node[draw,minimum size=1.3cm] (Shift) [right of=FIR, ] { $\pi_i^S$ } ; | 209 | 197 | \subsection{Model of a FIR filter} | |
| \node (Start) [left of=FIR] { } ; | 210 | 198 | ||
| \node (End) [right of=Shift] { } ; | 211 | 199 | A cascade of filters is composed of $n$ FIR stages. In stage $i$ ($1 \leq i \leq n$) | |
| 212 | 200 | the FIR has $C_i$ coefficients and each coefficient is an integer value with $\pi^C_i$ | ||
| \node[draw,fit=(FIR) (Shift)] (Filter) { } ; | 213 | 201 | bits while the filtered data are shifted by $\pi^S_i$ bits. We define also $\pi^-_i$ as | |
| 214 | 202 | the size of input data and $\pi^+_i$ as the size of output data. The figure~\ref{fig:fir_stage} | ||
| \draw[->] (Start) edge node [above] { $\pi_i^-$ } (FIR) ; | 215 | 203 | shows a filtering stage. | |
| \draw[->] (FIR) -- (Shift) ; | 216 | 204 | ||
| \draw[->] (Shift) edge node [above] { $\pi_i^+$ } (End) ; | 217 | 205 | \begin{figure} | |
| \end{tikzpicture} | 218 | 206 | \centering | |
| \caption{A single filter is composed of a FIR (on the left) and a Shifter (on the right)} | 219 | 207 | \begin{tikzpicture}[node distance=2cm] | |
| \label{fig:fir_stage} | 220 | 208 | \node[draw,minimum size=1.3cm] (FIR) { $C_i, \pi_i^C$ } ; | |
| \end{figure} | 221 | 209 | \node[draw,minimum size=1.3cm] (Shift) [right of=FIR, ] { $\pi_i^S$ } ; | |
| 222 | 210 | \node (Start) [left of=FIR] { } ; | ||
| FIR $i$ has been characterized through numerical simulation as able to reject $F(C_i, \pi_i^C)$ dB. | 223 | 211 | \node (End) [right of=Shift] { } ; | |
| This rejection has been computed using GNU Octave software FIR coefficient design functions | 224 | 212 | ||
| (\texttt{firls} and \texttt{fir1}). | 225 | 213 | \node[draw,fit=(FIR) (Shift)] (Filter) { } ; | |
| For each configuration $(C_i, \pi_i^C)$, we first create a FIR with floating point coefficients and a given $C_i$ number of coefficients. | 226 | 214 | ||
| Then, the floating point coefficients are discretized into integers. In order to ensure that the coefficients are coded on $\pi_i^C$~bits effectively, | 227 | 215 | \draw[->] (Start) edge node [above] { $\pi_i^-$ } (FIR) ; | |
| the coefficients are normalized by their absolute maximum before being scaled to integer coefficients. | 228 | 216 | \draw[->] (FIR) -- (Shift) ; | |
| At least one coefficient is coded on $\pi_i^C$~bits, and in practice only $b_{C_i/2}$ is coded on $\pi_i^C$~bits while the others are coded on much fewer bits. | 229 | 217 | \draw[->] (Shift) edge node [above] { $\pi_i^+$ } (End) ; | |
| 230 | 218 | \end{tikzpicture} | ||
| With these coefficients, the \texttt{freqz} function is used to estimate the magnitude of the filter | 231 | 219 | \caption{A single filter is composed of a FIR (on the left) and a Shifter (on the right)} | |
| transfer function. | 232 | 220 | \label{fig:fir_stage} | |
| Comparing the performance between FIRs requires however defining a unique criterion. As shown in figure~\ref{fig:fir_mag}, | 233 | 221 | \end{figure} | |
| the FIR magnitude exhibits two parts: we focus here on the transitions width and the rejection rather than on the | 234 | 222 | ||
| bandpass ripples as emphasized in \cite{lim_1988,lim_1996}. Throughout this demonstration, | 235 | 223 | FIR $i$ has been characterized through numerical simulation as able to reject $F(C_i, \pi_i^C)$ dB. | |
| we arbitrarily set a bandpass of 40\% of the Nyquist frequency and a bandstop from 60\% | 236 | 224 | This rejection has been computed using GNU Octave software FIR coefficient design functions | |
| of the Nyquist frequency to the end of the band, as would be typically selected to prevent | 237 | 225 | (\texttt{firls} and \texttt{fir1}). | |
| aliasing before decimating the dataflow by 2. The method is however generalized to any filter | 238 | 226 | For each configuration $(C_i, \pi_i^C)$, we first create a FIR with floating point coefficients and a given $C_i$ number of coefficients. | |
| shape as long as it is defined from the initial modeling steps: Fig. \ref{fig:rejection_pyramid} | 239 | 227 | Then, the floating point coefficients are discretized into integers. In order to ensure that the coefficients are coded on $\pi_i^C$~bits effectively, | |
| as described below is indeed unique for each filter shape. | 240 | 228 | the coefficients are normalized by their absolute maximum before being scaled to integer coefficients. | |
| 241 | 229 | At least one coefficient is coded on $\pi_i^C$~bits, and in practice only $b_{C_i/2}$ is coded on $\pi_i^C$~bits while the others are coded on much fewer bits. | ||
| \begin{figure} | 242 | 230 | ||
| \begin{center} | 243 | 231 | With these coefficients, the \texttt{freqz} function is used to estimate the magnitude of the filter | |
| \scalebox{0.8}{ | 244 | 232 | transfer function. | |
| \centering | 245 | 233 | Comparing the performance between FIRs requires however defining a unique criterion. As shown in figure~\ref{fig:fir_mag}, | |
| \begin{tikzpicture}[scale=0.3] | 246 | 234 | the FIR magnitude exhibits two parts: we focus here on the transitions width and the rejection rather than on the | |
| \draw[<->] (0,15) -- (0,0) -- (21,0) ; | 247 | 235 | bandpass ripples as emphasized in \cite{lim_1988,lim_1996}. Throughout this demonstration, | |
| \draw[thick] (0,12) -- (8,12) -- (20,0) ; | 248 | 236 | we arbitrarily set a bandpass of 40\% of the Nyquist frequency and a bandstop from 60\% | |
| 249 | 237 | of the Nyquist frequency to the end of the band, as would be typically selected to prevent | ||
| \draw (0,14) node [left] { $P$ } ; | 250 | 238 | aliasing before decimating the dataflow by 2. The method is however generalized to any filter | |
| \draw (20,0) node [below] { $f$ } ; | 251 | 239 | shape as long as it is defined from the initial modeling steps: Fig. \ref{fig:rejection_pyramid} | |
| 252 | 240 | as described below is indeed unique for each filter shape. | ||
| \draw[>=latex,<->] (0,14) -- (8,14) ; | 253 | 241 | ||
| \draw (4,14) node [above] { passband } node [below] { $40\%$ } ; | 254 | 242 | \begin{figure} | |
| 255 | 243 | \begin{center} | ||
| \draw[>=latex,<->] (8,14) -- (12,14) ; | 256 | 244 | \scalebox{0.8}{ | |
| \draw (10,14) node [above] { transition } node [below] { $20\%$ } ; | 257 | 245 | \centering | |
| 258 | 246 | \begin{tikzpicture}[scale=0.3] | ||
| \draw[>=latex,<->] (12,14) -- (20,14) ; | 259 | 247 | \draw[<->] (0,15) -- (0,0) -- (21,0) ; | |
| \draw (16,14) node [above] { stopband } node [below] { $40\%$ } ; | 260 | 248 | \draw[thick] (0,12) -- (8,12) -- (20,0) ; | |
| 261 | 249 | |||
| \draw[>=latex,<->] (16,12) -- (16,8) ; | 262 | 250 | \draw (0,14) node [left] { $P$ } ; | |
| \draw (16,10) node [right] { rejection } ; | 263 | 251 | \draw (20,0) node [below] { $f$ } ; | |
| 264 | 252 | |||
| \draw[dashed] (8,-1) -- (8,14) ; | 265 | 253 | \draw[>=latex,<->] (0,14) -- (8,14) ; | |
| \draw[dashed] (12,-1) -- (12,14) ; | 266 | 254 | \draw (4,14) node [above] { passband } node [below] { $40\%$ } ; | |
| 267 | 255 | |||
| \draw[dashed] (8,12) -- (16,12) ; | 268 | 256 | \draw[>=latex,<->] (8,14) -- (12,14) ; | |
| \draw[dashed] (12,8) -- (16,8) ; | 269 | 257 | \draw (10,14) node [above] { transition } node [below] { $20\%$ } ; | |
| 270 | 258 | |||
| \end{tikzpicture} | 271 | 259 | \draw[>=latex,<->] (12,14) -- (20,14) ; | |
| } | 272 | 260 | \draw (16,14) node [above] { stopband } node [below] { $40\%$ } ; | |
| \end{center} | 273 | 261 | ||
| \caption{Shape of the filter transmitted power $P$ as a function of frequency $f$: | 274 | 262 | \draw[>=latex,<->] (16,12) -- (16,8) ; | |
| the passband is considered to occupy the initial 40\% of the Nyquist frequency range, | 275 | 263 | \draw (16,10) node [right] { rejection } ; | |
| the stopband the last 40\%, allowing 20\% transition width.} | 276 | 264 | ||
| \label{fig:fir_mag} | 277 | 265 | \draw[dashed] (8,-1) -- (8,14) ; | |
| \end{figure} | 278 | 266 | \draw[dashed] (12,-1) -- (12,14) ; | |
| 279 | 267 | |||
| In the transition band, the behavior of the filter is left free, we only define the passband and the stopband characteristics. | 280 | 268 | \draw[dashed] (8,12) -- (16,12) ; | |
| Initial considered criteria include the mean value of the stopband rejection which yields unacceptable results since notches | 281 | 269 | \draw[dashed] (12,8) -- (16,8) ; | |
| overestimate the rejection capability of the filter. | 282 | 270 | ||
| An intermediate criterion considered the maximal rejection within the stopband, to which the sum of the absolute values | 283 | 271 | \end{tikzpicture} | |
| within the passband is subtracted to avoid filters with excessive ripples, normalized to the | 284 | 272 | } | |
| bin width to remain consistent with the passband criterion (dBc/Hz units in all cases). | 285 | 273 | \end{center} | |
| In this case, cascading too many filters with individual excessive ($>$ 1~dB) passband ripples | 286 | 274 | \caption{Shape of the filter transmitted power $P$ as a function of frequency $f$: | |
| led to unacceptable ($>$ 10~dB) final ripple levels, especially close to the transition band. | 287 | 275 | the passband is considered to occupy the initial 40\% of the Nyquist frequency range, | |
| Hence, the final criterion considers the minimal rejection in the stopband to which the | 288 | 276 | the stopband the last 40\%, allowing 20\% transition width.} | |
| the maximal amplitude in the passband (maximum value minus the minimum value) is substracted, with | 289 | 277 | \label{fig:fir_mag} | |
| a 1~dB threshold on the latter quantity over which the filter is discarded. | 290 | 278 | \end{figure} | |
| With this | 291 | 279 | ||
| criterion, we meet the expected rejection capability of low pass filters as shown in figure~\ref{fig:custom_criterion}. | 292 | 280 | In the transition band, the behavior of the filter is left free, we only define the passband and the stopband characteristics. | |
| The best filter has a correct rejection estimation and the worst filter | 293 | |||
| is discarded based on the excessive passband ripple criterion. | 294 | 281 | Initial considered criteria include the mean value of the stopband rejection which yields unacceptable results since notches | |
| 295 | 282 | overestimate the rejection capability of the filter. | ||
| \begin{figure} | 296 | 283 | An intermediate criterion considered the maximal rejection within the stopband, to which the sum of the absolute values | |
| \centering | 297 | |||
| \includegraphics[width=\linewidth]{images/custom_criterion} | 298 | |||
| \caption{Selected filter qualification criterion computed as the maximum rejection in the stopband | 299 | |||
| minus the maximal ripple amplitude in the passband with a $>$ 1~dB threshold above which the filter is discarded: | 300 | |||
| comparison between monolithic filter (blue, rejected in this case) and cascaded filters (red).} | 301 | 284 | within the passband is subtracted to avoid filters with excessive ripples, normalized to the | |
| \label{fig:custom_criterion} | 302 | 285 | bin width to remain consistent with the passband criterion (dBc/Hz units in all cases). | |
| \end{figure} | 303 | 286 | In this case, cascading too many filters with individual excessive ($>$ 1~dB) passband ripples | |
| 304 | 287 | led to unacceptable ($>$ 10~dB) final ripple levels, especially close to the transition band. | ||
| Thanks to the latter criterion which will be used in the remainder of this paper, we are able to automatically generate multiple FIR taps | 305 | 288 | Hence, the final criterion considers the minimal rejection in the stopband to which the | |
| and estimate their rejection. Figure~\ref{fig:rejection_pyramid} exhibits the | 306 | 289 | the maximal amplitude in the passband (maximum value minus the minimum value) is substracted, with | |
| rejection as a function of the number of coefficients and the number of bits representing these coefficients. | 307 | 290 | a 1~dB threshold on the latter quantity over which the filter is discarded. | |
| The curve shaped as a pyramid exhibits optimum configurations sets at the vertex where both edges meet. | 308 | |||
| Indeed for a given number of coefficients, increasing the number of bits over the edge will not improve the rejection. | 309 | |||
| Conversely when setting the a given number of bits, increasing the number of coefficients will not improve | 310 | |||
| the rejection. Hence the best coefficient set are on the vertex of the pyramid. Notice that the word length | 311 | |||
| and number of coefficients do not start at 1: filters with too few coefficients or too little tap word size are rejected | 312 | |||
| by the excessive ripple constraint of the criterion. Hence, the size of the pyramid is significantly reduced by discarding | 313 | 291 | With this | |
| these filters and so is the solution search space. | 314 | 292 | criterion, we meet the expected rejection capability of low pass filters as shown in figure~\ref{fig:custom_criterion}. | |
| 315 | 293 | The best filter has a correct rejection estimation and the worst filter | ||
| \begin{figure} | 316 | 294 | is discarded based on the excessive passband ripple criterion. | |
| \centering | 317 | |||
| \includegraphics[width=\linewidth]{images/rejection_pyramid} | 318 | |||
| \caption{Filter rejection as a function of number of coefficients and number of bits | 319 | |||
| : this lookup table will be used to identify which filter parameters -- number of bits | 320 | |||
| representing coefficients and number of coefficients -- best match the targeted transfer function. Filters | 321 | |||
| with fewer than 10~taps or with coefficients coded on fewer than 5~bits are discarded due to excessive | 322 | |||
| ripples in the passband.} | 323 | |||
| \label{fig:rejection_pyramid} | 324 | 295 | ||
| \end{figure} | 325 | 296 | \begin{figure} | |
| 326 | 297 | \centering | ||
| Although we have an efficient criterion to estimate the rejection of one set of coefficients (taps), | 327 | 298 | \includegraphics[width=\linewidth]{images/custom_criterion} | |
| we have a problem when we cascade filters and estimate the criterion as a sum two or more individual criteria. | 328 | 299 | \caption{Selected filter qualification criterion computed as the maximum rejection in the stopband | |
| If the FIR filter coefficients are the same between the stages, we have: | 329 | 300 | minus the maximal ripple amplitude in the passband with a $>$ 1~dB threshold above which the filter is discarded: | |
| $$F_{total} = F_1 + F_2$$ | 330 | 301 | comparison between monolithic filter (blue, rejected in this case) and cascaded filters (red).} | |
| But selecting two different sets of coefficient will yield a more complex situation in which | 331 | 302 | \label{fig:custom_criterion} | |
| the previous relation is no longer valid as illustrated on figure~\ref{fig:sum_rejection}. The red and blue curves | 332 | 303 | \end{figure} | |
| are two different filters with maximums and notches not located at the same frequency offsets. | 333 | 304 | ||
| Hence when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved | 334 | 305 | Thanks to the latter criterion which will be used in the remainder of this paper, we are able to automatically generate multiple FIR taps | |
| with respect to a basic sum of the rejection criteria shown as a the dotted yellow line. | 335 | 306 | and estimate their rejection. Figure~\ref{fig:rejection_pyramid} exhibits the | |
| Thus, estimating the rejection of filter cascades is more complex than taking the sum of all the rejection | 336 | 307 | rejection as a function of the number of coefficients and the number of bits representing these coefficients. | |
| criteria of each filter. However since the individual filter rejection sum underestimates the rejection capability of the cascade, | 337 | 308 | The curve shaped as a pyramid exhibits optimum configurations sets at the vertex where both edges meet. | |
| this upper bound is considered as a conservative and acceptable criterion for deciding on the suitability | 338 | 309 | Indeed for a given number of coefficients, increasing the number of bits over the edge will not improve the rejection. | |
| of the filter cascade to meet design criteria. | 339 | 310 | Conversely when setting the a given number of bits, increasing the number of coefficients will not improve | |
| 340 | 311 | the rejection. Hence the best coefficient set are on the vertex of the pyramid. Notice that the word length | ||
| \begin{figure} | 341 | 312 | and number of coefficients do not start at 1: filters with too few coefficients or too little tap word size are rejected | |
| \centering | 342 | 313 | by the excessive ripple constraint of the criterion. Hence, the size of the pyramid is significantly reduced by discarding | |
| \includegraphics[width=\linewidth]{images/cascaded_criterion} | 343 | 314 | these filters and so is the solution search space. | |
| \caption{Transfer function of individual filters and after cascading the two filters, | 344 | 315 | ||
| demonstrating that the selected criterion of maximum rejection in the bandstop (horizontal | 345 | 316 | \begin{figure} | |
| lines) is met. Notice that the cascaded filter has better rejection than summing the bandstop | 346 | 317 | \centering | |
| maximum of each individual filter. | 347 | 318 | \includegraphics[width=\linewidth]{images/rejection_pyramid} | |
| } | 348 | 319 | \caption{Filter rejection as a function of number of coefficients and number of bits | |
| \label{fig:sum_rejection} | 349 | 320 | : this lookup table will be used to identify which filter parameters -- number of bits | |
| \end{figure} | 350 | 321 | representing coefficients and number of coefficients -- best match the targeted transfer function. Filters | |
| 351 | 322 | with fewer than 10~taps or with coefficients coded on fewer than 5~bits are discarded due to excessive | ||
| Finally in our case, we consider that the input signal are fully known. The | 352 | 323 | ripples in the passband.} | |
| resolution of the input data stream are fixed and still the same for all experiments | 353 | 324 | \label{fig:rejection_pyramid} | |
| in this paper. | 354 | 325 | \end{figure} | |
| 355 | 326 | |||
| Based on this analysis, we address the estimate of resource consumption (called | 356 | 327 | Although we have an efficient criterion to estimate the rejection of one set of coefficients (taps), | |
| silicon area -- in the case of FPGAs this means processing cells) as a function of | 357 | 328 | we have a problem when we cascade filters and estimate the criterion as a sum two or more individual criteria. | |
| filter characteristics. As a reminder, we do not aim at matching actual hardware | 358 | 329 | If the FIR filter coefficients are the same between the stages, we have: | |
| configuration but consider an arbitrary silicon area occupied by each processing function, | 359 | 330 | $$F_{total} = F_1 + F_2$$ | |
| and will assess after synthesis the adequation of this arbitrary unit with actual | 360 | 331 | But selecting two different sets of coefficient will yield a more complex situation in which | |
| hardware resources provided by FPGA manufacturers. The sum of individual processing | 361 | 332 | the previous relation is no longer valid as illustrated on figure~\ref{fig:sum_rejection}. The red and blue curves | |
| unit areas is constrained by a total silicon area representative of FPGA global resources. | 362 | 333 | are two different filters with maximums and notches not located at the same frequency offsets. | |
| Formally, variable $a_i$ is the area taken by filter~$i$ | 363 | 334 | Hence when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved | |
| (in arbitrary unit). Variable $r_i$ is the rejection of filter~$i$ (in dB). | 364 | 335 | with respect to a basic sum of the rejection criteria shown as a the dotted yellow line. | |
| Constant $\mathcal{A}$ is the total available area. We model our problem as follows: | 365 | |||
| 366 | 336 | Thus, estimating the rejection of filter cascades is more complex than taking the sum of all the rejection | ||
| \begin{align} | 367 | 337 | criteria of each filter. However since the individual filter rejection sum underestimates the rejection capability of the cascade, | |
| \text{Maximize } & \sum_{i=1}^n r_i \notag \\ | 368 | |||
| \sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\ | 369 | 338 | this upper bound is considered as a conservative and acceptable criterion for deciding on the suitability | |
| a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\ | 370 | 339 | of the filter cascade to meet design criteria. | |
| r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\ | 371 | 340 | ||
| \pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\ | 372 | 341 | \begin{figure} | |
| \pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\ | 373 | 342 | \centering | |
| \pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_j}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\ | 374 | 343 | \includegraphics[width=\linewidth]{images/cascaded_criterion} | |
| \pi_1^- &= \Pi^I \label{eq:init} | 375 | 344 | \caption{Transfer function of individual filters and after cascading the two filters, | |
| \end{align} | 376 | 345 | demonstrating that the selected criterion of maximum rejection in the bandstop (horizontal | |
| 377 | 346 | lines) is met. Notice that the cascaded filter has better rejection than summing the bandstop | ||
| Equation~\ref{eq:area} states that the total area taken by the filters must be | 378 | 347 | maximum of each individual filter. | |
| less than the available area. Equation~\ref{eq:areadef} gives the definition of | 379 | 348 | } | |
| the area used by a filter, considered as the area of the FIR since the Shifter is | 380 | 349 | \label{fig:sum_rejection} | |
| assumed not to require significant resources. We consider that the FIR needs $C_i$ registers of size | 381 | 350 | \end{figure} | |
| $\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the | 382 | 351 | ||
| input data with the coefficients. Equation~\ref{eq:rejectiondef} gives the | 383 | 352 | Finally in our case, we consider that the input signal are fully known. The | |
| definition of the rejection of the filter thanks to the tabulated function~$F$ that we defined | 384 | 353 | resolution of the input data stream are fixed and still the same for all experiments | |
| previously. The Shifter does not introduce negative rejection as we will explain later, | 385 | 354 | in this paper. | |
| so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the | 386 | 355 | ||
| relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add | 387 | 356 | Based on this analysis, we address the estimate of resource consumption (called | |
| $\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes | 388 | |||
| $\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of | 389 | 357 | silicon area -- in the case of FPGAs this means processing cells) as a function of | |
| a filter is the same as the input number of bits of the next filter. | 390 | 358 | filter characteristics. As a reminder, we do not aim at matching actual hardware | |
| Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative | 391 | 359 | configuration but consider an arbitrary silicon area occupied by each processing function, | |
| rejection. Indeed, the results of the FIR can be right shifted without compromising | 392 | 360 | and will assess after synthesis the adequation of this arbitrary unit with actual | |
| the quality of the rejection until a threshold. Each bit of the output data | 393 | 361 | hardware resources provided by FPGA manufacturers. The sum of individual processing | |
| increases the maximum rejection level by 6~dB. We add one to take the sign bit | 394 | 362 | unit areas is constrained by a total silicon area representative of FPGA global resources. | |
| into account. If equation~\ref{eq:maxshift} was not present, the Shifter could | 395 | 363 | Formally, variable $a_i$ is the area taken by filter~$i$ | |
| shift too much and introduce some noise in the output data. Each supplementary | 396 | 364 | (in arbitrary unit). Variable $r_i$ is the rejection of filter~$i$ (in dB). | |
| shift bit would cause an additional 6~dB rejection rise. A totally equivalent equation is: | 397 | 365 | Constant $\mathcal{A}$ is the total available area. We model our problem as follows: | |
| $\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_j}{6}\right)$. | 398 | 366 | ||
| Finally, equation~\ref{eq:init} gives the number of bits of the global input. | 399 | 367 | \begin{align} | |
| 400 | 368 | \text{Maximize } & \sum_{i=1}^n r_i \notag \\ | ||
| This model is non-linear since we multiply some variable with another variable | 401 | 369 | \sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\ | |
| and it is even non-quadratic, as the cost function $F$ does not have a known | 402 | 370 | a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\ | |
| linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations. | 403 | 371 | r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\ | |
| This variable $p$ is defined by the user, and represents the number of different | 404 | 372 | \pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\ | |
| set of coefficients generated (remember, we use \texttt{firls} and \texttt{fir1} | 405 | 373 | \pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\ | |
| functions from GNU Octave) based on the targeted filter characteristics and implementation | 406 | 374 | \pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_j}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\ | |
| assumptions (estimated number of bits defining the coefficients). Hence, $C_{ij}$ and | 407 | 375 | \pi_1^- &= \Pi^I \label{eq:init} | |
| $\pi_{ij}^C$ become constants and | 408 | 376 | \end{align} | |
| we define $1 \leq j \leq p$ so that the function $F$ can be estimated (Look Up Table) | 409 | 377 | ||
| for each configurations thanks to the rejection criterion. We also define the binary | 410 | 378 | Equation~\ref{eq:area} states that the total area taken by the filters must be | |
| variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$ | 411 | 379 | less than the available area. Equation~\ref{eq:areadef} gives the definition of | |
| and 0 otherwise. The new equations are as follows: | 412 | 380 | the area used by a filter, considered as the area of the FIR since the Shifter is | |
| 413 | 381 | assumed not to require significant resources. We consider that the FIR needs $C_i$ registers of size | ||
| \begin{align} | 414 | 382 | $\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the | |
| a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\ | 415 | 383 | input data with the coefficients. Equation~\ref{eq:rejectiondef} gives the | |
| r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\ | 416 | 384 | definition of the rejection of the filter thanks to the tabulated function~$F$ that we defined | |
| \pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\ | 417 | 385 | previously. The Shifter does not introduce negative rejection as we will explain later, | |
| \sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config} | 418 | 386 | so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the | |
| \end{align} | 419 | 387 | relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add | |
| 420 | 388 | $\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes | ||
| Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace | 421 | 389 | $\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of | |
| respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}. | 422 | 390 | a filter is the same as the input number of bits of the next filter. | |
| Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most. | 423 | 391 | Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative | |
| 424 | 392 | rejection. Indeed, the results of the FIR can be right shifted without compromising | ||
| The problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2} | 425 | 393 | the quality of the rejection until a threshold. Each bit of the output data | |
| we multiply | 426 | 394 | increases the maximum rejection level by 6~dB. We add one to take the sign bit | |
| $\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable we can | 427 | 395 | into account. If equation~\ref{eq:maxshift} was not present, the Shifter could | |
| linearize this multiplication. The following formula shows how to linearize | 428 | 396 | shift too much and introduce some noise in the output data. Each supplementary | |
| this situation in general case with $y$ a binary variable and $x$ a real variable ($0 \leq x \leq X^{max}$): | 429 | 397 | shift bit would cause an additional 6~dB rejection rise. A totally equivalent equation is: | |
| \begin{equation*} | 430 | 398 | $\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_j}{6}\right)$. | |
| m = x \times y \implies | 431 | 399 | Finally, equation~\ref{eq:init} gives the number of bits of the global input. | |
| \left \{ | 432 | 400 | ||
| \begin{split} | 433 | 401 | This model is non-linear since we multiply some variable with another variable | |
| m & \geq 0 \\ | 434 | 402 | and it is even non-quadratic, as the cost function $F$ does not have a known | |
| m & \leq y \times X^{max} \\ | 435 | 403 | linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations. | |
| m & \leq x \\ | 436 | |||
| m & \geq x - (1 - y) \times X^{max} \\ | 437 | |||
| \end{split} | 438 | |||
| \right . | 439 | |||
| \end{equation*} | 440 | |||
| So if we bound up $\pi_i^-$ by 128~bits which is the maximum data size whose estimation is | 441 | |||
| assumed on hardware characteristics, | 442 | |||
| the Gurobi (\url{www.gurobi.com}) optimization software will be able to linearize | 443 | |||
| for us the quadratic problem so the model is left as is. This model | 444 | |||
| has $O(np)$ variables and $O(n)$ constraints. | 445 | 404 | This variable $p$ is defined by the user, and represents the number of different | |
| 446 | 405 | set of coefficients generated (remember, we use \texttt{firls} and \texttt{fir1} | ||
| Two problems will be addressed using the workflow described in the next section: on the one | 447 | 406 | functions from GNU Octave) based on the targeted filter characteristics and implementation | |
| hand maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary | 448 | 407 | assumptions (estimated number of bits defining the coefficients). Hence, $C_{ij}$ and | |
| silicon area (section~\ref{sec:fixed_area}) and on the second hand the dual problem of minimizing the silicon area | 449 | 408 | $\pi_{ij}^C$ become constants and | |
| for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the | 450 | 409 | we define $1 \leq j \leq p$ so that the function $F$ can be estimated (Look Up Table) | |
| objective function is replaced with: | 451 | 410 | for each configurations thanks to the rejection criterion. We also define the binary | |
| \begin{align} | 452 | 411 | variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$ | |
| \text{Minimize } & \sum_{i=1}^n a_i \notag | 453 | 412 | and 0 otherwise. The new equations are as follows: | |
| \end{align} | 454 | 413 | ||
| We adapt our constraints of quadratic program to replace equation \ref{eq:area} | 455 | 414 | \begin{align} | |
| with equation \ref{eq:rejection_min} where $\mathcal{R}$ is the minimal | 456 | 415 | a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\ | |
| rejection required. | 457 | 416 | r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\ | |
| 458 | 417 | \pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\ | ||
| \begin{align} | 459 | 418 | \sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config} | |
| \sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min} | 460 | 419 | \end{align} | |
| \end{align} | 461 | 420 | ||
| 462 | 421 | Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace | ||
| \section{Design workflow} | 463 | 422 | respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}. | |
| \label{sec:workflow} | 464 | 423 | Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most. | |
| 465 | 424 | |||
| In this section, we describe the workflow to compute all the results presented in sections~\ref{sec:fixed_area} | 466 | |||
| and \ref{sec:fixed_rej}. Figure~\ref{fig:workflow} shows the global workflow and the different steps involved | 467 | |||
| in the computation of the results. | 468 | |||
| 469 | ||||
| \begin{figure} | 470 | |||
| \centering | 471 | |||
| \begin{tikzpicture}[node distance=0.75cm and 2cm] | 472 | |||
| \node[draw,minimum size=1cm] (Solver) { Filter Solver } ; | 473 | |||
| \node (Start) [left= 3cm of Solver] { } ; | 474 | |||
| \node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ; | 475 | |||
| \node (Input) [above= of TCL] { } ; | 476 | 425 | The problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2} | |
| \node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ; | 477 | 426 | we multiply | |
| \node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ; | 478 | 427 | $\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable we can | |
| \node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ; | 479 | 428 | linearize this multiplication. The following formula shows how to linearize | |
| \node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ; | 480 | 429 | this situation in general case with $y$ a binary variable and $x$ a real variable ($0 \leq x \leq X^{max}$): | |
| \node (Results) [left= of Postproc] { } ; | 481 | 430 | \begin{equation*} | |
| 482 | 431 | m = x \times y \implies | ||
| \draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ; | 483 | 432 | \left \{ | |
| \draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ; | 484 | 433 | \begin{split} | |
| \draw[->] (Solver) edge node [below] { (1a) } (TCL) ; | 485 | 434 | m & \geq 0 \\ | |
| \draw[->] (Solver) edge node [right] { (1b) } (Deploy) ; | 486 | 435 | m & \leq y \times X^{max} \\ | |
| \draw[->] (TCL) edge node [left] { (2) } (Bitstream) ; | 487 | 436 | m & \leq x \\ | |
| \draw[->,dashed] (Bitstream) -- (Deploy) ; | 488 | 437 | m & \geq x - (1 - y) \times X^{max} \\ | |
| \draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ; | 489 | 438 | \end{split} | |
| \draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ; | 490 | 439 | \right . | |
| \draw[->] (Deploy) edge node [left] { (5) } (Postproc) ; | 491 | 440 | \end{equation*} | |
| \draw[->] (Postproc) -- (Results) ; | 492 | 441 | So if we bound up $\pi_i^-$ by 128~bits which is the maximum data size whose estimation is | |
| \end{tikzpicture} | 493 | 442 | assumed on hardware characteristics, | |
| \caption{Design workflow from the input parameters to the results allowing for | 494 | 443 | the Gurobi (\url{www.gurobi.com}) optimization software will be able to linearize | |
| a fully automated optimal solution search.} | 495 | 444 | for us the quadratic problem so the model is left as is. This model | |
| \label{fig:workflow} | 496 | 445 | has $O(np)$ variables and $O(n)$ constraints. | |
| \end{figure} | 497 | 446 | ||
| 498 | ||||
| The filter solver is a C++ program that takes as input the maximum area | 499 | |||
| $\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$, | 500 | |||
| the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates | 501 | |||
| the quadratic programs and uses the Gurobi solver to estimate the optimal results. | 502 | |||
| Then it produces two scripts: a TCL script ((1a) on figure~\ref{fig:workflow}) | 503 | |||
| and a deploy script ((1b) on figure~\ref{fig:workflow}). | 504 | |||
| 505 | ||||
| The TCL script describes the whole digital processing chain from the beginning | 506 | |||
| (the raw signal data) to the end (the filtered data) in a language compatible | 507 | |||
| with proprietary synthesis software, namely Vivado for Xilinx and Quartus for | 508 | |||
| Intel/Altera. The raw input data generated from a 20-bit Pseudo Random Number (PRN) | 509 | |||
| generator inside the FPGA and $\Pi^I$ is fixed at 16~bits. | 510 | |||
| Then the script builds each stage of the chain with a generic FIR task that | 511 | |||
| comes from a skeleton library. The generic FIR is highly configurable | 512 | |||
| with the number of coefficients and the size of the coefficients. The coefficients | 513 | |||
| themselves are not stored in the script. | 514 | |||
| As the signal is processed in real-time, the output signal is stored as | 515 | |||
| consecutive bursts of data for post-processing, mainly assessing the consistency of the | 516 | |||
| implemented FIR cascade transfer function with the design criteria and the expected | 517 | |||
| transfer function. | 518 | |||
| 519 | ||||
| The TCL script is used by Vivado to produce the FPGA bitstream ((2) on figure~\ref{fig:workflow}). | 520 | |||
| We use the 2018.2 version of Xilinx Vivado and we execute the synthesized | 521 | |||
| bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series | 522 | |||
| FPGA (xc7z010clg400-1) and two LTC2145 14-bit 125~MS/s ADC, loaded with 50~$\Omega$ resistors to | 523 | |||
| provide a broadband noise source. | 524 | |||
| The board runs the Linux kernel and surrounding environment produced from the | 525 | |||
| Buildroot framework available at \url{https://github.com/trabucayre/redpitaya/}: configuring | 526 | |||
| the Zynq FPGA, feeding the FIR with the set of coefficients, executing the simulation and | 527 | |||
| fetching the results is automated. | 528 | 447 | Two problems will be addressed using the workflow described in the next section: on the one | |
| 529 | 448 | hand maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary | ||
| The deploy script uploads the bitstream to the board ((3) on | 530 | 449 | silicon area (section~\ref{sec:fixed_area}) and on the second hand the dual problem of minimizing the silicon area | |
| figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers, | 531 | 450 | for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the | |
| configures the coefficients of the FIR filters. It then waits for the results | 532 | 451 | objective function is replaced with: | |
| and retrieves the data to the main computer ((4) on figure~\ref{fig:workflow}). | 533 | 452 | \begin{align} | |
| 534 | 453 | \text{Minimize } & \sum_{i=1}^n a_i \notag | ||
| Finally, an Octave post-processing script computes the final results thanks to | 535 | 454 | \end{align} | |
| the output data ((5) on figure~\ref{fig:workflow}). | 536 | 455 | We adapt our constraints of quadratic program to replace equation \ref{eq:area} | |
| The results are normalized so that the Power Spectrum Density (PSD) starts at zero | 537 | 456 | with equation \ref{eq:rejection_min} where $\mathcal{R}$ is the minimal | |
| and the different configurations can be compared. | 538 | 457 | rejection required. | |
| 539 | 458 | |||
| \section{Maximizing the rejection at fixed silicon area} | 540 | 459 | \begin{align} | |
| \label{sec:fixed_area} | 541 | 460 | \sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min} | |
| This section presents the output of the filter solver {\em i.e.} the computed | 542 | 461 | \end{align} | |
| configurations for each stage, the computed rejection and the computed silicon area. | 543 | 462 | ||
| Such results allow for understanding the choices made by the solver to compute its solutions. | 544 | 463 | \section{Design workflow} | |
| 545 | 464 | \label{sec:workflow} | ||
| The experimental setup is composed of three cases. The raw input is generated | 546 | 465 | ||
| by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$. | 547 | 466 | In this section, we describe the workflow to compute all the results presented in sections~\ref{sec:fixed_area} | |
| Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500 | 548 | 467 | and \ref{sec:fixed_rej}. Figure~\ref{fig:workflow} shows the global workflow and the different steps involved | |
| arbitrary units. Hence, the three cases have been named: MAX/500, MAX/1000, MAX/1500. | 549 | 468 | in the computation of the results. | |
| The number of configurations $p$ is 1133, with $C_i$ ranging from 3 to 60 and $\pi^C$ | 550 | 469 | ||
| ranging from 2 to 22. In each case, the quadratic program has been able to give a | 551 | 470 | \begin{figure} | |
| result up to five stages ($n = 5$) in the cascaded filter. | 552 | 471 | \centering | |
| 553 | 472 | \begin{tikzpicture}[node distance=0.75cm and 2cm] | ||
| Table~\ref{tbl:gurobi_max_500} shows the results obtained by the filter solver for MAX/500. | 554 | 473 | \node[draw,minimum size=1cm] (Solver) { Filter Solver } ; | |
| Table~\ref{tbl:gurobi_max_1000} shows the results obtained by the filter solver for MAX/1000. | 555 | 474 | \node (Start) [left= 3cm of Solver] { } ; | |
| Table~\ref{tbl:gurobi_max_1500} shows the results obtained by the filter solver for MAX/1500. | 556 | 475 | \node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ; | |
| 557 | 476 | \node (Input) [above= of TCL] { } ; | ||
| \renewcommand{\arraystretch}{1.4} | 558 | 477 | \node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ; | |
| 559 | 478 | \node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ; | ||
| \begin{table} | 560 | 479 | \node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ; | |
| \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500} | 561 | 480 | \node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ; | |
| \label{tbl:gurobi_max_500} | 562 | 481 | \node (Results) [left= of Postproc] { } ; | |
| \centering | 563 | 482 | ||
| {\scalefont{0.77} | 564 | 483 | \draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ; | |
| \begin{tabular}{|c|ccccc|c|c|} | 565 | 484 | \draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ; | |
| \hline | 566 | 485 | \draw[->] (Solver) edge node [below] { (1a) } (TCL) ; | |
| $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | 567 | 486 | \draw[->] (Solver) edge node [right] { (1b) } (Deploy) ; | |
| \hline | 568 | 487 | \draw[->] (TCL) edge node [left] { (2) } (Bitstream) ; | |
| 1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\ | 569 | 488 | \draw[->,dashed] (Bitstream) -- (Deploy) ; | |
| 2 & (3, 5, 18) & (33, 10, 0) & - & - & - & 48~dB & 492 \\ | 570 | 489 | \draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ; | |
| 3 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\ | 571 | 490 | \draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ; | |
| 4 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\ | 572 | 491 | \draw[->] (Deploy) edge node [left] { (5) } (Postproc) ; | |
| 5 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\ | 573 | 492 | \draw[->] (Postproc) -- (Results) ; | |
| \hline | 574 | 493 | \end{tikzpicture} | |
| \end{tabular} | 575 | 494 | \caption{Design workflow from the input parameters to the results allowing for | |
| } | 576 | 495 | a fully automated optimal solution search.} | |
| \end{table} | 577 | 496 | \label{fig:workflow} | |
| 578 | 497 | \end{figure} | ||
| \begin{table} | 579 | 498 | ||
| \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000} | 580 | 499 | The filter solver is a C++ program that takes as input the maximum area | |
| \label{tbl:gurobi_max_1000} | 581 | 500 | $\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$, | |
| \centering | 582 | 501 | the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates | |
| {\scalefont{0.77} | 583 | 502 | the quadratic programs and uses the Gurobi solver to estimate the optimal results. | |
| \begin{tabular}{|c|ccccc|c|c|} | 584 | 503 | Then it produces two scripts: a TCL script ((1a) on figure~\ref{fig:workflow}) | |
| \hline | 585 | 504 | and a deploy script ((1b) on figure~\ref{fig:workflow}). | |
| $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | 586 | 505 | ||
| \hline | 587 | 506 | The TCL script describes the whole digital processing chain from the beginning | |
| 1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\ | 588 | 507 | (the raw signal data) to the end (the filtered data) in a language compatible | |
| 2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\ | 589 | 508 | with proprietary synthesis software, namely Vivado for Xilinx and Quartus for | |
| 3 & (3, 13, 26) & (31, 9, 1) & (27, 9, 0) & - & - & 92~dB & 999 \\ | 590 | 509 | Intel/Altera. The raw input data generated from a 20-bit Pseudo Random Number (PRN) | |
| 4 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\ | 591 | 510 | generator inside the FPGA and $\Pi^I$ is fixed at 16~bits. | |
| 5 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\ | 592 | 511 | Then the script builds each stage of the chain with a generic FIR task that | |
| \hline | 593 | 512 | comes from a skeleton library. The generic FIR is highly configurable | |
| \end{tabular} | 594 | 513 | with the number of coefficients and the size of the coefficients. The coefficients | |
| } | 595 | 514 | themselves are not stored in the script. | |
| \end{table} | 596 | 515 | As the signal is processed in real-time, the output signal is stored as | |
| 597 | 516 | consecutive bursts of data for post-processing, mainly assessing the consistency of the | ||
| \begin{table} | 598 | 517 | implemented FIR cascade transfer function with the design criteria and the expected | |
| \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500} | 599 | 518 | transfer function. | |
| \label{tbl:gurobi_max_1500} | 600 | 519 | ||
| \centering | 601 | 520 | The TCL script is used by Vivado to produce the FPGA bitstream ((2) on figure~\ref{fig:workflow}). | |
| {\scalefont{0.77} | 602 | 521 | We use the 2018.2 version of Xilinx Vivado and we execute the synthesized | |
| \begin{tabular}{|c|ccccc|c|c|} | 603 | 522 | bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series | |
| \hline | 604 | 523 | FPGA (xc7z010clg400-1) and two LTC2145 14-bit 125~MS/s ADC, loaded with 50~$\Omega$ resistors to | |
| $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | 605 | 524 | provide a broadband noise source. | |
| \hline | 606 | 525 | The board runs the Linux kernel and surrounding environment produced from the | |
| 1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\ | 607 | 526 | Buildroot framework available at \url{https://github.com/trabucayre/redpitaya/}: configuring | |
| 2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 102~dB & 1489 \\ | 608 | 527 | the Zynq FPGA, feeding the FIR with the set of coefficients, executing the simulation and | |
| 3 & (15, 9, 18) & (31, 8, 0) & (27, 9, 0) & - & - & 116~dB & 1488 \\ | 609 | 528 | fetching the results is automated. | |
| 4 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\ | 610 | 529 | ||
| 5 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\ | 611 | 530 | The deploy script uploads the bitstream to the board ((3) on | |
| \hline | 612 | 531 | figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers, | |
| \end{tabular} | 613 | 532 | configures the coefficients of the FIR filters. It then waits for the results | |
| } | 614 | 533 | and retrieves the data to the main computer ((4) on figure~\ref{fig:workflow}). | |
| \end{table} | 615 | 534 | ||
| 616 | 535 | Finally, an Octave post-processing script computes the final results thanks to | ||
| \renewcommand{\arraystretch}{1} | 617 | 536 | the output data ((5) on figure~\ref{fig:workflow}). | |
| 618 | 537 | The results are normalized so that the Power Spectrum Density (PSD) starts at zero | ||
| By analyzing these tables, we can first state that we reach an optimal solution | 619 | 538 | and the different configurations can be compared. | |
| for each case : $n = 3$ for MAX/500, and $n = 4$ for MAX/1000 and MAX/1500. Moreover | 620 | 539 | ||
| the cascaded filters always exhibit better performance than the monolithic solution. | 621 | 540 | \section{Maximizing the rejection at fixed silicon area} | |
| It was an expected result as it has | 622 | 541 | \label{sec:fixed_area} | |
| been previously observed that many small filters are better than | 623 | 542 | This section presents the output of the filter solver {\em i.e.} the computed | |
| a single large filter \cite{lim_1988, lim_1996, young_1992}, despite such conclusions | 624 | 543 | configurations for each stage, the computed rejection and the computed silicon area. | |
| being hardly used in practice due to the lack of tools for identifying individual filter | 625 | 544 | Such results allow for understanding the choices made by the solver to compute its solutions. | |
| coefficients in the cascaded approach. | 626 | 545 | ||
| 627 | 546 | The experimental setup is composed of three cases. The raw input is generated | ||
| Second, the larger the silicon area, the better the rejection. This was also an | 628 | 547 | by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$. | |
| expected result as more area means a filter of better quality with more coefficients | 629 | 548 | Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500 | |
| or more bits per coefficient. | 630 | 549 | arbitrary units. Hence, the three cases have been named: MAX/500, MAX/1000, MAX/1500. | |
| 631 | 550 | The number of configurations $p$ is 1133, with $C_i$ ranging from 3 to 60 and $\pi^C$ | ||
| Then, we also observe that the first stage can have a larger shift than the other | 632 | 551 | ranging from 2 to 22. In each case, the quadratic program has been able to give a | |
| stages. This is explained by the fact that the solver tries to use just enough | 633 | 552 | result up to five stages ($n = 5$) in the cascaded filter. | |
| bits for the computed rejection after each stage. In the first stage, a | 634 | 553 | ||
| balance between a strong rejection with a low number of bits is targeted. Equation~\ref{eq:maxshift} | 635 | 554 | Table~\ref{tbl:gurobi_max_500} shows the results obtained by the filter solver for MAX/500. | |
| gives the relation between both values. | 636 | 555 | Table~\ref{tbl:gurobi_max_1000} shows the results obtained by the filter solver for MAX/1000. | |
| 637 | 556 | Table~\ref{tbl:gurobi_max_1500} shows the results obtained by the filter solver for MAX/1500. | ||
| Finally, we note that the solver consumes all the given silicon area. | 638 | 557 | ||
| 639 | 558 | \renewcommand{\arraystretch}{1.4} | ||
| The following graphs present the rejection for real data on the FPGA. In all the following | 640 | 559 | ||
| figures, the solid line represents the actual rejection of the filtered | 641 | 560 | \begin{table} | |
| data on the FPGA as measured experimentally and the dashed line are the noise levels | 642 | 561 | \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500} | |
| given by the quadratic solver. The configurations are those computed in the previous section. | 643 | 562 | \label{tbl:gurobi_max_500} | |
| 644 | 563 | \centering | ||
| Figure~\ref{fig:max_500_result} shows the rejection of the different configurations in the case of MAX/500. | 645 | 564 | {\scalefont{0.77} | |
| Figure~\ref{fig:max_1000_result} shows the rejection of the different configurations in the case of MAX/1000. | 646 | |||
| Figure~\ref{fig:max_1500_result} shows the rejection of the different configurations in the case of MAX/1500. | 647 | 565 | \begin{tabular}{|c|ccccc|c|c|} | |
| 648 | 566 | \hline | ||
| \begin{figure} | 649 | 567 | $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | |
| \centering | 650 | 568 | \hline | |
| \begin{subfigure}{\linewidth} | 651 | 569 | 1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\ | |
| \includegraphics[width=\linewidth]{images/max_500} | 652 | 570 | 2 & (3, 5, 18) & (33, 10, 0) & - & - & - & 48~dB & 492 \\ | |
| \caption{Filter transfer functions for varying number of cascaded filters solving | 653 | 571 | 3 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\ | |
| the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).} | 654 | 572 | 4 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\ | |
| \label{fig:max_500_result} | 655 | 573 | 5 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\ | |
| \end{subfigure} | 656 | 574 | \hline | |
| 657 | 575 | \end{tabular} | ||
| \begin{subfigure}{\linewidth} | 658 | 576 | } | |
| \includegraphics[width=\linewidth]{images/max_1000} | 659 | 577 | \end{table} | |
| \caption{Filter transfer functions for varying number of cascaded filters solving | 660 | 578 | ||
| the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).} | 661 | 579 | \begin{table} | |
| \label{fig:max_1000_result} | 662 | 580 | \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000} | |
| \end{subfigure} | 663 | 581 | \label{tbl:gurobi_max_1000} | |
| 664 | 582 | \centering | ||
| \begin{subfigure}{\linewidth} | 665 | 583 | {\scalefont{0.77} | |
| \includegraphics[width=\linewidth]{images/max_1500} | 666 | 584 | \begin{tabular}{|c|ccccc|c|c|} | |
| \caption{Filter transfer functions for varying number of cascaded filters solving | 667 | 585 | \hline | |
| the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).} | 668 | 586 | $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | |
| \label{fig:max_1500_result} | 669 | 587 | \hline | |
| \end{subfigure} | 670 | 588 | 1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\ | |
| \caption{Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing | 671 | 589 | 2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\ | |
| rejection for a given resource allocation. | 672 | 590 | 3 & (3, 13, 26) & (31, 9, 1) & (27, 9, 0) & - & - & 92~dB & 999 \\ | |
| The filter shape constraint (bandpass and bandstop) is shown as thick | 673 | 591 | 4 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\ | |
| horizontal lines on each chart.} | 674 | 592 | 5 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\ | |
| \end{figure} | 675 | 593 | \hline | |
| 676 | 594 | \end{tabular} | ||
| In all cases, we observe that the actual rejection is close to the rejection computed by the solver. | 677 | 595 | } | |
| 678 | 596 | \end{table} | ||
| We compare the actual silicon resources given by Vivado to the | 679 | 597 | ||
| resources in arbitrary units. | 680 | 598 | \begin{table} | |
| The goal is to check that our arbitrary units of silicon area models well enough | 681 | 599 | \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500} | |
| the real resources on the FPGA. Especially we want to verify that, for a given | 682 | 600 | \label{tbl:gurobi_max_1500} | |
| number of arbitrary units, the actual silicon resources do not depend on the | 683 | 601 | \centering | |
| number of stages $n$. Most significantly, our approach aims | 684 | 602 | {\scalefont{0.77} | |
| at remaining far enough from the practical logic gate implementation used by | 685 | 603 | \begin{tabular}{|c|ccccc|c|c|} | |
| various vendors to remain platform independent and be portable from one | 686 | 604 | \hline | |
| architecture to another. | 687 | 605 | $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | |
| 688 | 606 | \hline | ||
| Table~\ref{tbl:resources_usage} shows the resources usage in the case of MAX/500, MAX/1000 and | 689 | 607 | 1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\ | |
| MAX/1500 \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000 | 690 | 608 | 2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 102~dB & 1489 \\ | |
| and 1500 arbitrary units. We have taken care to extract solely the resources used by | 691 | 609 | 3 & (15, 9, 18) & (31, 8, 0) & (27, 9, 0) & - & - & 116~dB & 1488 \\ | |
| the FIR filters and remove additional processing blocks including FIFO and Programmable | 692 | 610 | 4 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\ | |
| Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication. | 693 | 611 | 5 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\ | |
| 694 | 612 | \hline | ||
| \begin{table}[h!tb] | 695 | 613 | \end{tabular} | |
| \caption{Resource occupation following synthesis of the solutions found for | 696 | 614 | } | |
| the problem of maximizing rejection for a given resource allocation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.} | 697 | 615 | \end{table} | |
| \label{tbl:resources_usage} | 698 | 616 | ||
| \centering | 699 | 617 | \renewcommand{\arraystretch}{1} | |
| \begin{tabular}{|c|c|ccc|c|} | 700 | 618 | ||
| \hline | 701 | 619 | By analyzing these tables, we can first state that we reach an optimal solution | |
| $n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline | 702 | |||
| & LUT & 249 & 453 & 627 & \emph{17600} \\ | 703 | |||
| 1 & BRAM & 1 & 1 & 1 & \emph{120} \\ | 704 | 620 | for each case : $n = 3$ for MAX/500, and $n = 4$ for MAX/1000 and MAX/1500. Moreover | |
| & DSP & 21 & 37 & 47 & \emph{80} \\ \hline | 705 | 621 | the cascaded filters always exhibit better performance than the monolithic solution. | |
| & LUT & 2253 & 474 & 691 & \emph{17600} \\ | 706 | 622 | It was an expected result as it has | |
| 2 & BRAM & 2 & 2 & 2 & \emph{120} \\ | 707 | 623 | been previously observed that many small filters are better than | |
| & DSP & 0 & 50 & 70 & \emph{80} \\ \hline | 708 | 624 | a single large filter \cite{lim_1988, lim_1996, young_1992}, despite such conclusions | |
| & LUT & 1329 & 2006 & 3158 & \emph{17600} \\ | 709 | 625 | being hardly used in practice due to the lack of tools for identifying individual filter | |
| 3 & BRAM & 3 & 3 & 3 & \emph{120} \\ | 710 | 626 | coefficients in the cascaded approach. | |
| & DSP & 15 & 30 & 42 & \emph{80} \\ \hline | 711 | 627 | ||
| & LUT & 1329 & 1600 & 2260 & \emph{17600} \\ | 712 | 628 | Second, the larger the silicon area, the better the rejection. This was also an | |
| 4 & BRAM & 3 & 4 & 4 & \emph{120} \\ | 713 | 629 | expected result as more area means a filter of better quality with more coefficients | |
| & DPS & 15 & 38 & 49 & \emph{80} \\ \hline | 714 | 630 | or more bits per coefficient. | |
| & LUT & 1329 & 1600 & 2260 & \emph{17600} \\ | 715 | 631 | ||
| 5 & BRAM & 3 & 4 & 4 & \emph{120} \\ | 716 | 632 | Then, we also observe that the first stage can have a larger shift than the other | |
| & DPS & 15 & 38 & 49 & \emph{80} \\ \hline | 717 | 633 | stages. This is explained by the fact that the solver tries to use just enough | |
| \end{tabular} | 718 | 634 | bits for the computed rejection after each stage. In the first stage, a | |
| \end{table} | 719 | 635 | balance between a strong rejection with a low number of bits is targeted. Equation~\ref{eq:maxshift} | |
| 720 | 636 | gives the relation between both values. | ||
| In case $n = 2$ for MAX/500, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that, | 721 | 637 | ||
| when the filter coefficients are small enough, or when the input size is small | 722 | 638 | Finally, we note that the solver consumes all the given silicon area. | |
| enough, Vivado optimizes resource consumption by selecting multiplexers to | 723 | 639 | ||
| implement the multiplications instead of a DSP. In this case, it is quite difficult | 724 | 640 | The following graphs present the rejection for real data on the FPGA. In all the following | |
| to compare the whole silicon budget. | 725 | 641 | figures, the solid line represents the actual rejection of the filtered | |
| 726 | 642 | data on the FPGA as measured experimentally and the dashed line are the noise levels | ||
| However, a rough estimation can be made with a simple equivalence: looking at | 727 | 643 | given by the quadratic solver. The configurations are those computed in the previous section. | |
| the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$, | 728 | 644 | ||
| we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon | 729 | 645 | Figure~\ref{fig:max_500_result} shows the rejection of the different configurations in the case of MAX/500. | |
| area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs, | 730 | 646 | Figure~\ref{fig:max_1000_result} shows the rejection of the different configurations in the case of MAX/1000. | |
| 1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond | 731 | 647 | Figure~\ref{fig:max_1500_result} shows the rejection of the different configurations in the case of MAX/1500. | |
| to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary | 732 | 648 | ||
| unit map well to actual hardware resources. The relatively small differences can probably be explained | 733 | |||
| by the optimizations done by Vivado based on the detailed map of available processing resources. | 734 | |||
| 735 | ||||
| We now present the computation time needed to solve the quadratic problem. | 736 | |||
| For each case, the filter solver software is executed on a Intel(R) Xeon(R) CPU E5606 | 737 | |||
| clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve | 738 | |||
| the quadratic problem. Table~\ref{tbl:area_time} shows the time needed to solve the quadratic | 739 | |||
| problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units. | 740 | |||
| 741 | ||||
| \begin{table}[h!tb] | 742 | |||
| \caption{Time needed to solve the quadratic program with Gurobi} | 743 | |||
| \label{tbl:area_time} | 744 | |||
| \centering | 745 | |||
| \begin{tabular}{|c|c|c|c|}\hline | 746 | |||
| $n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline | 747 | |||
| 1 & 0.01~s & 0.02~s & 0.03~s \\ | 748 | |||
| 2 & 0.1~s & 1~s & 2~s \\ | 749 | |||
| 3 & 5~s & 27~s & 351~s ($\approx$ 6~min) \\ | 750 | |||
| 4 & 4~s & 141~s ($\approx$ 3~min) & 1134~s ($\approx$ 18~min) \\ | 751 | |||
| 5 & 6~s & 630~s ($\approx$ 10~min) & 49400~s ($\approx$ 13~h) \\\hline | 752 | |||
| \end{tabular} | 753 | |||
| \end{table} | 754 | |||
| 755 | 649 | \begin{figure} | ||
| As expected, the computation time seems to rise exponentially with the number of stages. | 756 | 650 | \centering | |
| When the area is limited, the design exploration space is more limited and the solver is able to | 757 | 651 | \begin{subfigure}{\linewidth} | |
| find an optimal solution faster. | 758 | 652 | \includegraphics[width=\linewidth]{images/max_500} | |
| We also notice that the solution with $n$ greater than the optimal value | 759 | 653 | \caption{Filter transfer functions for varying number of cascaded filters solving | |
| takes more time to be found than the optimal one. This can be explained since the search space is | 760 | 654 | the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).} | |
| larger and we need more time to ensure that the previous solution (from the | 761 | 655 | \label{fig:max_500_result} | |
| smaller value of $n$) still remains the optimal solution. | 762 | 656 | \end{subfigure} | |
| 763 | 657 | |||
| \subsection{Minimizing resource occupation at fixed rejection} | 764 | 658 | \begin{subfigure}{\linewidth} | |
| \label{sec:fixed_rej} | 765 | 659 | \includegraphics[width=\linewidth]{images/max_1000} | |
| 766 | 660 | \caption{Filter transfer functions for varying number of cascaded filters solving | ||
| This section presents the results of the complementary quadratic program aimed at | 767 | 661 | the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).} | |
| minimizing the area occupation for a targeted rejection level. | 768 | 662 | \label{fig:max_1000_result} | |
| 769 | 663 | \end{subfigure} | ||
| The experimental setup is composed of four cases. The raw input is the same | 770 | 664 | ||
| as in the previous section, from a PRN generator, which fixes the input data size $\Pi^I$. | 771 | 665 | \begin{subfigure}{\linewidth} | |
| Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60, 80 or 100~dB. | 772 | 666 | \includegraphics[width=\linewidth]{images/max_1500} | |
| Hence, the three cases have been named: MIN/40, MIN/60, MIN/80 and MIN/100. | 773 | 667 | \caption{Filter transfer functions for varying number of cascaded filters solving | |
| The number of configurations $p$ is the same as previous section. | 774 | 668 | the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).} | |
| 775 | 669 | \label{fig:max_1500_result} | ||
| Table~\ref{tbl:gurobi_min_40} shows the results obtained by the filter solver for MIN/40. | 776 | 670 | \end{subfigure} | |
| Table~\ref{tbl:gurobi_min_60} shows the results obtained by the filter solver for MIN/60. | 777 | 671 | \caption{Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing | |
| Table~\ref{tbl:gurobi_min_80} shows the results obtained by the filter solver for MIN/80. | 778 | 672 | rejection for a given resource allocation. | |
| Table~\ref{tbl:gurobi_min_100} shows the results obtained by the filter solver for MIN/100. | 779 | 673 | The filter shape constraint (bandpass and bandstop) is shown as thick | |
| 780 | 674 | horizontal lines on each chart.} | ||
| \renewcommand{\arraystretch}{1.4} | 781 | 675 | \end{figure} | |
| 782 | 676 | |||
| \begin{table}[h!tb] | 783 | 677 | In all cases, we observe that the actual rejection is close to the rejection computed by the solver. | |
| \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40} | 784 | 678 | ||
| \label{tbl:gurobi_min_40} | 785 | 679 | We compare the actual silicon resources given by Vivado to the | |
| \centering | 786 | 680 | resources in arbitrary units. | |
| {\scalefont{0.77} | 787 | 681 | The goal is to check that our arbitrary units of silicon area models well enough | |
| \begin{tabular}{|c|ccccc|c|c|} | 788 | 682 | the real resources on the FPGA. Especially we want to verify that, for a given | |
| \hline | 789 | 683 | number of arbitrary units, the actual silicon resources do not depend on the | |
| $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | 790 | 684 | number of stages $n$. Most significantly, our approach aims | |
| \hline | 791 | 685 | at remaining far enough from the practical logic gate implementation used by | |
| 1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\ | 792 | 686 | various vendors to remain platform independent and be portable from one | |
| 2 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\ | 793 | 687 | architecture to another. | |
| 3 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\ | 794 | 688 | ||
| 4 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\ | 795 | 689 | Table~\ref{tbl:resources_usage} shows the resources usage in the case of MAX/500, MAX/1000 and | |
| 5 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\ | 796 | 690 | MAX/1500 \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000 | |
| \hline | 797 | 691 | and 1500 arbitrary units. We have taken care to extract solely the resources used by | |
| \end{tabular} | 798 | 692 | the FIR filters and remove additional processing blocks including FIFO and Programmable | |
| } | 799 | 693 | Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication. | |
| \end{table} | 800 | 694 | ||
| 801 | 695 | \begin{table}[h!tb] | ||
| \begin{table}[h!tb] | 802 | 696 | \caption{Resource occupation following synthesis of the solutions found for | |
| \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60} | 803 | 697 | the problem of maximizing rejection for a given resource allocation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.} | |
| \label{tbl:gurobi_min_60} | 804 | 698 | \label{tbl:resources_usage} | |
| \centering | 805 | |||
| {\scalefont{0.77} | 806 | 699 | \centering | |
| \begin{tabular}{|c|ccccc|c|c|} | 807 | 700 | \begin{tabular}{|c|c|ccc|c|} | |
| \hline | 808 | 701 | \hline | |
| $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | 809 | 702 | $n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline | |
| \hline | 810 | 703 | & LUT & 249 & 453 & 627 & \emph{17600} \\ | |
| 1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\ | 811 | 704 | 1 & BRAM & 1 & 1 & 1 & \emph{120} \\ | |
| 2 & (15, 6, 16) & (23, 9, 0) & - & - & - & 60~dB & 675 \\ | 812 | 705 | & DSP & 21 & 37 & 47 & \emph{80} \\ \hline | |
| 3 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\ | 813 | 706 | & LUT & 2253 & 474 & 691 & \emph{17600} \\ | |
| 4 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\ | 814 | 707 | 2 & BRAM & 2 & 2 & 2 & \emph{120} \\ | |
| 5 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\ | 815 | 708 | & DSP & 0 & 50 & 70 & \emph{80} \\ \hline | |
| \hline | 816 | 709 | & LUT & 1329 & 2006 & 3158 & \emph{17600} \\ | |
| \end{tabular} | 817 | 710 | 3 & BRAM & 3 & 3 & 3 & \emph{120} \\ | |
| } | 818 | 711 | & DSP & 15 & 30 & 42 & \emph{80} \\ \hline | |
| \end{table} | 819 | 712 | & LUT & 1329 & 1600 & 2260 & \emph{17600} \\ | |
| 820 | 713 | 4 & BRAM & 3 & 4 & 4 & \emph{120} \\ | ||
| \begin{table}[h!tb] | 821 | 714 | & DPS & 15 & 38 & 49 & \emph{80} \\ \hline | |
| \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80} | 822 | 715 | & LUT & 1329 & 1600 & 2260 & \emph{17600} \\ | |
| \label{tbl:gurobi_min_80} | 823 | 716 | 5 & BRAM & 3 & 4 & 4 & \emph{120} \\ | |
| \centering | 824 | 717 | & DPS & 15 & 38 & 49 & \emph{80} \\ \hline | |
| {\scalefont{0.77} | 825 | 718 | \end{tabular} | |
| \begin{tabular}{|c|ccccc|c|c|} | 826 | 719 | \end{table} | |
| \hline | 827 | 720 | ||
| $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | 828 | 721 | In case $n = 2$ for MAX/500, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that, | |
| \hline | 829 | 722 | when the filter coefficients are small enough, or when the input size is small | |
| 1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\ | 830 | 723 | enough, Vivado optimizes resource consumption by selecting multiplexers to | |
| 2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\ | 831 | 724 | implement the multiplications instead of a DSP. In this case, it is quite difficult | |
| 3 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\ | 832 | 725 | to compare the whole silicon budget. | |
| 4 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\ | 833 | 726 | ||
| 5 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\ | 834 | 727 | However, a rough estimation can be made with a simple equivalence: looking at | |
| \hline | 835 | 728 | the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$, | |
| \end{tabular} | 836 | 729 | we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon | |
| } | 837 | 730 | area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs, | |
| \end{table} | 838 | 731 | 1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond | |
| 839 | 732 | to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary | ||
| \begin{table}[h!tb] | 840 | 733 | unit map well to actual hardware resources. The relatively small differences can probably be explained | |
| \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/100} | 841 | 734 | by the optimizations done by Vivado based on the detailed map of available processing resources. | |
| \label{tbl:gurobi_min_100} | 842 | 735 | ||
| \centering | 843 | 736 | We now present the computation time needed to solve the quadratic problem. | |
| {\scalefont{0.77} | 844 | 737 | For each case, the filter solver software is executed on a Intel(R) Xeon(R) CPU E5606 | |
| \begin{tabular}{|c|ccccc|c|c|} | 845 | 738 | clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve | |
| \hline | 846 | 739 | the quadratic problem. Table~\ref{tbl:area_time} shows the time needed to solve the quadratic | |
| $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | 847 | 740 | problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units. | |
| \hline | 848 | 741 | ||
| 1 & - & - & - & - & - & - & - \\ | 849 | 742 | \begin{table}[h!tb] | |
| 2 & (27, 9, 15) & (35, 11, 0) & - & - & - & 100~dB & 1410 \\ | 850 | 743 | \caption{Time needed to solve the quadratic program with Gurobi} | |
| 3 & (3, 5, 18) & (35, 11, 1) & (27, 9, 0) & - & - & 100~dB & 1147 \\ | 851 | 744 | \label{tbl:area_time} | |
| 4 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\ | 852 | 745 | \centering | |
| 5 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\ | 853 | |||
| \hline | 854 | 746 | \begin{tabular}{|c|c|c|c|}\hline | |
| \end{tabular} | 855 | 747 | $n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline | |
| } | 856 | 748 | 1 & 0.01~s & 0.02~s & 0.03~s \\ | |
| \end{table} | 857 | 749 | 2 & 0.1~s & 1~s & 2~s \\ | |
| \renewcommand{\arraystretch}{1} | 858 | 750 | 3 & 5~s & 27~s & 351~s ($\approx$ 6~min) \\ | |
| 859 | 751 | 4 & 4~s & 141~s ($\approx$ 3~min) & 1134~s ($\approx$ 18~min) \\ | ||
| From these tables, we can first state that almost all configurations reach the targeted rejection | 860 | 752 | 5 & 6~s & 630~s ($\approx$ 10~min) & 49400~s ($\approx$ 13~h) \\\hline | |
| level or even better thanks to our underestimate of the cascade rejection as the sum of the | 861 | 753 | \end{tabular} | |
| individual filter rejection. The only exception is for the monolithic case ($n = 1$) in | 862 | 754 | \end{table} | |
| MIN/100: no solution is found for a single monolithic filter reach a 100~dB rejection. | 863 | 755 | ||
| Furthermore, the area of the monolithic filter is twice as big as the two cascaded filters | 864 | 756 | As expected, the computation time seems to rise exponentially with the number of stages. | |
| (675 and 1131 arbitrary units v.s 990 and 1760 arbitrary units for 60 and 80~dB rejection | 865 | 757 | When the area is limited, the design exploration space is more limited and the solver is able to | |
| respectively). More generally, the more filters are cascaded, the lower the occupied area. | 866 | 758 | find an optimal solution faster. | |
| 867 | 759 | We also notice that the solution with $n$ greater than the optimal value | ||
| Like in previous section, the solver chooses always a little filter as first | 868 | 760 | takes more time to be found than the optimal one. This can be explained since the search space is | |
| filter stage and the second one is often the biggest filter. This choice can be explained | 869 | 761 | larger and we need more time to ensure that the previous solution (from the | |
| as in the previous section, with the solver using just enough bits not to degrade the input | 870 | 762 | smaller value of $n$) still remains the optimal solution. | |
| signal and in the second filter selecting a better filter to improve rejection without | 871 | 763 | ||
| having too many bits in the output data. | 872 | 764 | \subsection{Minimizing resource occupation at fixed rejection} | |
| 765 | \label{sec:fixed_rej} | |||
| 873 | 766 | |||
| For each case, we found an optimal solution with $n < 5$: for MIN/40 $n=2$, | 874 | 767 | This section presents the results of the complementary quadratic program aimed at | |
| for MIN/60 and MIN/80 $n = 3$ and for MIN/100 $n = 4$. In all cases, the solutions | 875 | 768 | minimizing the area occupation for a targeted rejection level. | |
| when $n$ is greater than this optimal $n$ remain identical to the optimal one. | 876 | 769 | ||
| 877 | 770 | The experimental setup is composed of four cases. The raw input is the same | ||
| The following graphs present the rejection for real data on the FPGA. In all the following | 878 | 771 | as in the previous section, from a PRN generator, which fixes the input data size $\Pi^I$. | |
| figures, the solid line represents the actual rejection of the filtered | 879 | 772 | Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60, 80 or 100~dB. | |
| data on the FPGA as measured experimentally and the dashed line is the noise level | 880 | 773 | Hence, the three cases have been named: MIN/40, MIN/60, MIN/80 and MIN/100. | |
| given by the quadratic solver. | 881 | 774 | The number of configurations $p$ is the same as previous section. | |
| 882 | 775 | |||
| Figure~\ref{fig:min_40} shows the rejection of the different configurations in the case of MIN/40. | 883 | 776 | Table~\ref{tbl:gurobi_min_40} shows the results obtained by the filter solver for MIN/40. | |
| Figure~\ref{fig:min_60} shows the rejection of the different configurations in the case of MIN/60. | 884 | 777 | Table~\ref{tbl:gurobi_min_60} shows the results obtained by the filter solver for MIN/60. | |
| Figure~\ref{fig:min_80} shows the rejection of the different configurations in the case of MIN/80. | 885 | 778 | Table~\ref{tbl:gurobi_min_80} shows the results obtained by the filter solver for MIN/80. | |
| Figure~\ref{fig:min_100} shows the rejection of the different configurations in the case of MIN/100. | 886 | 779 | Table~\ref{tbl:gurobi_min_100} shows the results obtained by the filter solver for MIN/100. | |
| 887 | 780 | |||
| \begin{figure} | 888 | 781 | \renewcommand{\arraystretch}{1.4} | |
| \centering | 889 | 782 | ||
| \begin{subfigure}{\linewidth} | 890 | 783 | \begin{table}[h!tb] | |
| \includegraphics[width=.91\linewidth]{images/min_40} | 891 | 784 | \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40} | |
| \caption{Filter transfer functions for varying number of cascaded filters solving | 892 | 785 | \label{tbl:gurobi_min_40} | |
| the MIN/40 problem of minimizing resource allocation for reaching a 40~dB rejection.} | 893 | 786 | \centering | |
| \label{fig:min_40} | 894 | 787 | {\scalefont{0.77} | |
| \end{subfigure} | 895 | 788 | \begin{tabular}{|c|ccccc|c|c|} | |
| 896 | 789 | \hline | ||
| \begin{subfigure}{\linewidth} | 897 | 790 | $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | |
| \includegraphics[width=.91\linewidth]{images/min_60} | 898 | 791 | \hline | |
| \caption{Filter transfer functions for varying number of cascaded filters solving | 899 | 792 | 1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\ | |
| the MIN/60 problem of minimizing resource allocation for reaching a 60~dB rejection.} | 900 | 793 | 2 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\ | |
| \label{fig:min_60} | 901 | 794 | 3 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\ | |
| \end{subfigure} | 902 | 795 | 4 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\ | |
| 903 | 796 | 5 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\ | ||
| \begin{subfigure}{\linewidth} | 904 | 797 | \hline | |
| \includegraphics[width=.91\linewidth]{images/min_80} | 905 | 798 | \end{tabular} | |
| \caption{Filter transfer functions for varying number of cascaded filters solving | 906 | 799 | } | |
| the MIN/80 problem of minimizing resource allocation for reaching a 80~dB rejection.} | 907 | 800 | \end{table} | |
| \label{fig:min_80} | 908 | 801 | ||
| \end{subfigure} | 909 | 802 | \begin{table}[h!tb] | |
| 910 | 803 | \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60} | ||
| \begin{subfigure}{\linewidth} | 911 | 804 | \label{tbl:gurobi_min_60} | |
| \includegraphics[width=.91\linewidth]{images/min_100} | 912 | 805 | \centering | |
| \caption{Filter transfer functions for varying number of cascaded filters solving | 913 | 806 | {\scalefont{0.77} | |
| the MIN/100 problem of minimizing resource allocation for reaching a 100~dB rejection.} | 914 | 807 | \begin{tabular}{|c|ccccc|c|c|} | |
| \label{fig:min_100} | 915 | 808 | \hline | |
| \end{subfigure} | 916 | 809 | $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | |
| \caption{Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a | 917 | 810 | \hline | |
| given rejection while minimizing resource allocation. The filter shape constraint (bandpass and | 918 | 811 | 1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\ | |
| bandstop) is shown as thick | 919 | 812 | 2 & (15, 6, 16) & (23, 9, 0) & - & - & - & 60~dB & 675 \\ | |
| horizontal lines on each chart.} | 920 | 813 | 3 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\ | |
| \end{figure} | 921 | 814 | 4 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\ | |
| 922 | 815 | 5 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\ | ||
| We observe that all rejections given by the quadratic solver are close to the experimentally | 923 | 816 | \hline | |
| measured rejection. All curves prove that the constraint to reach the target rejection is | 924 | 817 | \end{tabular} | |
| respected with both monolithic (except in MIN/100 which has no monolithic solution) or cascaded filters. | 925 | 818 | } | |
| 926 | 819 | \end{table} | ||
| Table~\ref{tbl:resources_usage} shows the resource usage in the case of MIN/40, MIN/60; | 927 | 820 | ||
| MIN/80 and MIN/100 \emph{i.e.} when the target rejection is fixed to 40, 60, 80 and 100~dB. We | 928 | 821 | \begin{table}[h!tb] | |
| have taken care to extract solely the resources used by | 929 | 822 | \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80} | |
| the FIR filters and remove additional processing blocks including FIFO and PL to | 930 | 823 | \label{tbl:gurobi_min_80} | |
| PS communication. | 931 | 824 | \centering | |
| 932 | 825 | {\scalefont{0.77} | ||
| \renewcommand{\arraystretch}{1.2} | 933 | 826 | \begin{tabular}{|c|ccccc|c|c|} | |
| \begin{table} | 934 | 827 | \hline | |
| \caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.} | 935 | 828 | $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | |
| \label{tbl:resources_usage_comp} | 936 | 829 | \hline | |
| \centering | 937 | 830 | 1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\ | |
| {\scalefont{0.90} | 938 | 831 | 2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\ | |
| \begin{tabular}{|c|c|cccc|c|} | 939 | 832 | 3 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\ | |
| \hline | 940 | 833 | 4 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\ | |
| $n$ & & MIN/40 & MIN/60 & MIN/80 & MIN/100 & \emph{Zynq 7010} \\ \hline\hline | 941 | 834 | 5 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\ | |
| & LUT & 343 & 334 & 772 & - & \emph{17600} \\ | 942 | 835 | \hline | |
| 1 & BRAM & 1 & 1 & 1 & - & \emph{120} \\ | 943 | 836 | \end{tabular} | |
| & DSP & 27 & 39 & 55 & - & \emph{80} \\ \hline | 944 | 837 | } | |
| & LUT & 1664 & 2329 & 474 & 620 & \emph{17600} \\ | 945 | 838 | \end{table} | |
| 2 & BRAM & 2 & 2 & 2 & 2 & \emph{120} \\ | 946 | 839 | ||
| & DSP & 0 & 15 & 50 & 62 & \emph{80} \\ \hline | 947 | 840 | \begin{table}[h!tb] | |
| & LUT & 1664 & 3114 & 1884 & 2873 & \emph{17600} \\ | 948 | 841 | \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/100} | |
| 3 & BRAM & 2 & 3 & 3 & 3 & \emph{120} \\ | 949 | 842 | \label{tbl:gurobi_min_100} | |
| & DSP & 0 & 0 & 22 & 27 & \emph{80} \\ \hline | 950 | 843 | \centering | |
| & LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\ | 951 | 844 | {\scalefont{0.77} | |
| 4 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\ | 952 | 845 | \begin{tabular}{|c|ccccc|c|c|} | |
| & DPS & 0 & 15 & 19 & 19 & \emph{80} \\ \hline | 953 | 846 | \hline | |
| & LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\ | 954 | 847 | $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ | |
| 5 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\ | 955 | 848 | \hline | |
| & DPS & 0 & 0 & 19 & 19 & \emph{80} \\ \hline | 956 | 849 | 1 & - & - & - & - & - & - & - \\ | |
| \end{tabular} | 957 | 850 | 2 & (27, 9, 15) & (35, 11, 0) & - & - & - & 100~dB & 1410 \\ | |
| } | 958 | 851 | 3 & (3, 5, 18) & (35, 11, 1) & (27, 9, 0) & - & - & 100~dB & 1147 \\ | |
| \end{table} | 959 | 852 | 4 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\ | |
| \renewcommand{\arraystretch}{1} | 960 | 853 | 5 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\ | |
| 961 | 854 | \hline | ||
| If we keep the previous estimation of cost of one DSP in terms of LUT (1 DSP $\approx$ 100 LUT) | 962 | 855 | \end{tabular} | |
| the real resource consumption decreases as a function of the number of stages in the cascaded | 963 | 856 | } | |
| filter according | 964 | 857 | \end{table} | |
| to the solution given by the quadratic solver. Indeed, we have always a decreasing | 965 | 858 | \renewcommand{\arraystretch}{1} | |
| consumption even if the difference between the monolithic and the two cascaded | 966 | 859 | ||
| filters is less than expected. | 967 | 860 | From these tables, we can first state that almost all configurations reach the targeted rejection | |
| 968 | 861 | level or even better thanks to our underestimate of the cascade rejection as the sum of the | ||
| Finally, table~\ref{tbl:area_time_comp} shows the computation time to solve | 969 | 862 | individual filter rejection. The only exception is for the monolithic case ($n = 1$) in | |
| the quadratic program. | 970 | 863 | MIN/100: no solution is found for a single monolithic filter reach a 100~dB rejection. | |
| 971 | 864 | Furthermore, the area of the monolithic filter is twice as big as the two cascaded filters | ||
| \renewcommand{\arraystretch}{1.2} | 972 | 865 | (675 and 1131 arbitrary units v.s 990 and 1760 arbitrary units for 60 and 80~dB rejection | |
| \begin{table}[h!tb] | 973 | 866 | respectively). More generally, the more filters are cascaded, the lower the occupied area. | |
| \caption{Time to solve the quadratic program with Gurobi} | 974 | 867 | ||
| \label{tbl:area_time_comp} | 975 | 868 | Like in previous section, the solver chooses always a little filter as first | |
| \centering | 976 | 869 | filter stage and the second one is often the biggest filter. This choice can be explained | |
| {\scalefont{0.90} | 977 | 870 | as in the previous section, with the solver using just enough bits not to degrade the input | |
| \begin{tabular}{|c|c|c|c|c|}\hline | 978 | 871 | signal and in the second filter selecting a better filter to improve rejection without | |
| $n$ & Time (MIN/40) & Time (MIN/60) & Time (MIN/80) & Time (MIN/100) \\\hline\hline | 979 | 872 | having too many bits in the output data. | |
| 1 & 0.04~s & 0.01~s & 0.01~s & - \\ | 980 | 873 | ||
| 2 & 2.7~s & 2.4~s & 2.4~s & 0.8~s \\ | 981 | 874 | For each case, we found an optimal solution with $n < 5$: for MIN/40 $n=2$, | |
| 3 & 4.6~s & 7~s & 7~s & 18~s \\ | 982 | 875 | for MIN/60 and MIN/80 $n = 3$ and for MIN/100 $n = 4$. In all cases, the solutions | |
| 4 & 3~s & 22~s & 70~s & 220~s ($\approx$ 3~min) \\ | 983 | 876 | when $n$ is greater than this optimal $n$ remain identical to the optimal one. | |
| 5 & 5~s & 122~s & 200~s & 384~s ($\approx$ 5~min) \\\hline | 984 | |||
| \end{tabular} | 985 | |||
| } | 986 | |||
| \end{table} | 987 | 877 | ||
| \renewcommand{\arraystretch}{1} | 988 | 878 | The following graphs present the rejection for real data on the FPGA. In all the following | |
| 989 | 879 | figures, the solid line represents the actual rejection of the filtered | ||
| The time needed to solve this configuration is significantly shorter than the time | 990 | 880 | data on the FPGA as measured experimentally and the dashed line is the noise level | |
| needed in the previous section. Indeed the worst time in this case is only 5~minutes, | 991 | 881 | given by the quadratic solver. | |
| compared to 13~hours in the previous section: this problem is more easily solved than the | 992 | 882 | ||
| previous one. | 993 | 883 | Figure~\ref{fig:min_40} shows the rejection of the different configurations in the case of MIN/40. | |
| 994 | 884 | Figure~\ref{fig:min_60} shows the rejection of the different configurations in the case of MIN/60. | ||
| To conclude, we compare our monolithic filters with the FIR Compiler provided by | 995 | 885 | Figure~\ref{fig:min_80} shows the rejection of the different configurations in the case of MIN/80. | |
| Xilinx in the Vivado software suite (v.2018.2). For each experiment we use the | 996 | 886 | Figure~\ref{fig:min_100} shows the rejection of the different configurations in the case of MIN/100. | |
| same coefficient set and we compare the resource consumption, having checked that | 997 | 887 | ||
| the transfer functions are indeed the same with both implementations. | 998 | |||
| Table~\ref{tbl:xilinx_resources} exhibits the results. | 999 | |||
| The FIR Compiler never uses BRAM while our filter implementation uses one block. This difference | 1000 | |||
| is explained be our wish to have a dynamically reconfigurable FIR filter whose | 1001 | |||
| coefficients can be updated from the processing system without having to update the FPGA design. | 1002 | |||
| With the FIR compiler, the coefficients are defined during the FPGA design so that | 1003 | |||
| changing coefficients required generating a new design. The difference with the LUT consumption | 1004 | |||
| is also attributed to the reconfigurability logic. However the DSP consumption, the scarcest | 1005 | |||
| resource, is the same between the Xilinx FIR Compiler end | 1006 | |||
| our FIR block: we hence conclude that our solutions are as good as the Xilinx implementation. | 1007 | |||
| 1008 | ||||
| \renewcommand{\arraystretch}{1.2} | 1009 | |||
| \begin{table} | 1010 | |||
| \centering | 1011 | |||
| \caption{Resource consumption compared between the FIR Compiler from Xilinx and our FIR block} | 1012 | |||
| \label{tbl:xilinx_resources} | 1013 | |||
| \begin{tabular}{|c|c|c|c|c|c|c|} | 1014 | |||
| \hline | 1015 | |||
| \multirow{2}{*}{} & \multicolumn{3}{c|}{Xilinx} & \multicolumn{3}{c|}{Our FIR block} \\ \cline{2-7} | 1016 | |||
| & LUT & BRAM & DSP & LUT & BRAM & DSP \\ \hline | 1017 | |||
| MAX/500 & 177 & 0 & 21 & 249 & 1 & 21 \\ \hline | 1018 | |||
| MAX/1000 & 306 & 0 & 37 & 453 & 1 & 37 \\ \hline | 1019 | |||
| MAX/1500 & 418 & 0 & 47 & 627 & 1 & 47 \\ \hline | 1020 |