Commit e8887034382ad7d93d7a8f2367bebf4fe4a263b3

Authored by jfriedt
Exists in master

Merge branch 'master' of https://lxsd.femto-st.fr/gitlab/jfriedt/ifcs2018-article


ifcs2018_journal.tex
\documentclass[a4paper,conference]{IEEEtran/IEEEtran}
\usepackage{graphicx,color,hyperref}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{algorithm2e}
\usepackage{url,balance}
\usepackage[normalem]{ulem}
\usepackage{tikz}
\usetikzlibrary{positioning,fit}
\usepackage{multirow}
\usepackage{scalefnt}

% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}
\textheight=26cm
\setlength{\footskip}{30pt}
\pagenumbering{gobble}
\begin{document}
\title{Filter optimization for real time digital processing of radiofrequency signals: application
to oscillator metrology}

\author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2},
G. Goavec-M\'erou\IEEEauthorrefmark{1},
P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}
\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }
\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\
Email: \{pyb2,jmfriedt\}@femto-st.fr}
}
\maketitle
\thispagestyle{plain}
\pagestyle{plain}
\newtheorem{definition}{Definition}

\begin{abstract}
Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to
radiofrequency signal processing. When applied to oscillator characterization in the context
of ultrastable clocks, stringent filtering requirements are defined by spurious signal or
noise rejection needs. Since real-time radiofrequency processing must be performed in a
Field Programmable Gate Array (FPGA) to meet timing constraints, we investigate optimization strategies
to design filters meeting rejection characteristics while limiting the hardware resources
required and keeping timing constraints within the targeted measurement bandwidths.
\end{abstract}

\begin{IEEEkeywords}
Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter
\end{IEEEkeywords}

\section{Digital signal processing of ultrastable clock signals}

Analog oscillator phase noise characterization is classically performed by downconverting
the radiofrequency signal to baseband using a saturated mixer,
followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to the carrier. In
a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by
multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}.

\begin{figure}[h!tb]
\begin{center}
\includegraphics[width=.8\linewidth]{images/schema}
\end{center}
\caption{Fully digital oscillator phase noise characterization: the Device Under Test
(DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and
downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals
and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite
Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays
the spectral characteristics of the phase fluctuations.}
\label{schema}
\end{figure}

As with the analog mixer,
the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as
well as the generation of the frequency sum signal in addition to the frequency difference.
These unwanted spectral components must be rejected before decimating the data stream
for the phase noise spectral characterization \cite{andrich2018high}. The filtering stages introduced between the
downconverter
and the decimation processing blocks are core components of an oscillator characterization
system, and must reject out-of-band signals below the targeted phase noise -- typically in the
sub $-170$~dBc/Hz range for the ultrastable oscillators we aim at characterizing. The filter blocks will
use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency
datastream: optimizing the performance of the filter while reducing the needed resources is
hence tackled in a systematic approach using optimization techniques. Most significantly, we
tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with
a tunable number of coefficients and a tunable number of bits representing the coefficients and the
data being processed.

\section{Finite impulse response filter}

We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined
by a set of weights $b_k$ applied to the inputs $x_k$ through a convolution to generate the
outputs $y_k$:
\begin{align}
y_n=\sum_{k=0}^N b_k x_{n-k}
\label{eq:fir_equation}
\end{align}
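
As an illustration of Eq.~\eqref{eq:fir_equation}, the following Python/NumPy sketch evaluates the convolution sample by sample. It is not the VHDL implementation discussed in this paper: the function name \texttt{fir\_filter}, the zero initial state ($x_m = 0$ for $m < 0$) and the numerical values are assumptions chosen for the example.

```python
import numpy as np

def fir_filter(b, x):
    """Direct-form FIR: y[n] = sum_{k=0}^{N} b[k] * x[n-k].

    Assumption: samples before the start of x are zero.
    """
    b = np.asarray(b)
    x = np.asarray(x)
    y = np.zeros(len(x), dtype=np.result_type(b, x))
    for n in range(len(x)):
        for k in range(len(b)):
            if n - k >= 0:  # skip taps that would read before the first sample
                y[n] += b[k] * x[n - k]
    return y

# Hypothetical 4-tap filter with unit weights applied to a short integer ramp.
b = np.array([1, 1, 1, 1])
x = np.array([1, 2, 3, 4, 5])
y = fir_filter(b, x)
```

The same samples are obtained from \texttt{numpy.convolve(b, x)} truncated to the length of the input, which gives a simple cross-check of the double loop.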

As opposed to an implementation on a general purpose processor in which word size is defined by the
processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since
not only the coefficient values and number of taps must be defined, but also the number of bits
defining the coefficients and the sample size. For this reason, and because we consider pipeline
processing (as opposed to First-In, First-Out FIFO memory batch processing) of radiofrequency
signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but
the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language (VHDL) level.
Since latency is not an issue in an open-loop phase noise characterization instrument, the large
number of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter,
is not considered an issue as it would be in a closed-loop system.

The coefficients are classically expressed as floating point values. However, this binary
number representation is not efficient for fast arithmetic computation by an FPGA. Instead,
we choose to quantize these floating point values into integer values. This quantization
will result in some precision loss.

\begin{figure}[h!tb]
\includegraphics[width=\linewidth]{images/zero_values}
\caption{Impact of the quantization resolution of the coefficients: the quantization is
set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting
the 30~first and 30~last coefficients out of the initial 128~band-pass
filter coefficients to 0 (red dots).}
\label{float_vs_int}
\end{figure}

The tradeoff between quantization resolution and number of coefficients when considering
integer operations is not trivial. As an illustration of the issue related to the
relation between number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits
a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon
quantization on 6~bit integers, 60 of the 128~coefficients at the beginning and end of the
taps become null, making the large number of coefficients irrelevant and making it possible to save
processing resources by shrinking the filter length. This tradeoff, aimed at minimizing resources
to reach a given rejection level, or maximizing out-of-band rejection for a given computational
resource, will drive the investigation on cascading filters designed with varying tap resolution
and tap length, as will be shown in the next section. Indeed, our development strategy closely
follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards}
in which basic blocks are defined and characterized before being assembled \cite{hide}
in a complete processing chain. In our case, assembling the filter blocks is a simpler block
combination process since we assume a single value to be processed and a single value to be
generated at each clock cycle. The FIR filters are not considered to decimate in the
current implementation: the decimation is assumed to be located after the FIR cascade for the
time being.
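
The shrinking of the filter length mentioned above can be sketched as follows, assuming the coefficients have already been quantized to integers. The helper name \texttt{trim\_null\_taps} and the coefficient values are hypothetical and only serve the illustration.

```python
import numpy as np

def trim_null_taps(coeffs):
    """Drop the leading and trailing zero coefficients of an integer FIR."""
    coeffs = np.asarray(coeffs)
    nonzero = np.flatnonzero(coeffs)
    if nonzero.size == 0:
        # Every tap is null: nothing is left to implement.
        return coeffs[:0]
    return coeffs[nonzero[0]:nonzero[-1] + 1]

# Hypothetical quantized taps whose outer coefficients rounded to zero.
quantized = np.array([0, 0, 0, -1, 3, 12, 31, 12, 3, -1, 0, 0])
trimmed = trim_null_taps(quantized)
```

Removing trailing zeros leaves the filter output unchanged, and removing leading zeros only shortens the delay, so the magnitude response of the trimmed filter is that of the original one while fewer multipliers are needed.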

\section{Methodology description}
We want to create a new methodology to develop any Digital Signal Processing (DSP) chain
for any hardware platform (Altera, Xilinx...). To do this, we have defined an
abstract model to represent some basic DSP operations.

For the moment, we focus on only two operations: the filtering and the shifting of data.
We have chosen these basic operations because shifting and filtering have already been studied in
many works \cite{lim_1996, lim_1988, young_1992, smith_1998}; hence it will be easier
to check and validate our results.

Having only two operations is insufficient to work with complex DSP chains, but
in this paper we only want to demonstrate the relevance and the efficiency of our approach.
In future work it will be possible to add more operations, so that we will be able to
model any DSP chain.

We apply our methodology to a very simple DSP chain. We generate a digital signal
either with a Pseudo-Random Number (PRN) generator or with an Analog to Digital
Converter (ADC). Once we have a digital signal, we filter it to decrease the noise level.
Finally, we store bursts of filtered samples before post-processing them.
% TODO: draw a diagram
In this particular case, we want to optimize the filtering step to obtain the best noise
rejection for a constrained amount of resources, or the minimal resource
consumption for a given rejection objective.

The first step of our approach is to model the DSP chain. Since we only optimize
the filtering, we do not model the PRN generator or the ADC. The filtering can be
done in two ways. In the first one, we use a single FIR filter with many coefficients
to reject the noise: we call this the monolithic approach. In the second one,
we select several FIR filters, each with fewer coefficients than the monolithic filter, and we cascade
them to filter the signal.

After each filter we leave the possibility of shifting the filtered data to consume
fewer resources. Hence, in the case of cascaded filters, we define a stage as a filter
and a shifter (the shift can be omitted if we do not need to divide the filtered data).

\subsection{Model of a FIR filter}
A cascade of filters is composed of $n$ stages. In stage $i$ ($1 \leq i \leq n$)
the FIR has $C_i$ coefficients, each coefficient being an integer value coded on $\pi^C_i$
bits, and the filtered data are shifted by $\pi^S_i$ bits. We also define $\pi^-_i$ as
the size of the input data and $\pi^+_i$ as the size of the output data. Figure~\ref{fig:fir_stage}
shows a filtering stage.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=2cm]
\node[draw,minimum size=1.3cm] (FIR) { $C_i, \pi_i^C$ } ;
\node[draw,minimum size=1.3cm] (Shift) [right of=FIR, ] { $\pi_i^S$ } ;
\node (Start) [left of=FIR] { } ;
\node (End) [right of=Shift] { } ;

\node[draw,fit=(FIR) (Shift)] (Filter) { } ;

\draw[->] (Start) edge node [above] { $\pi_i^-$ } (FIR) ;
\draw[->] (FIR) -- (Shift) ;
\draw[->] (Shift) edge node [above] { $\pi_i^+$ } (End) ;
\end{tikzpicture}
\caption{A single filter is composed of a FIR (on the left) and a Shifter (on the right)}
\label{fig:fir_stage}
\end{figure}
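
The bookkeeping of a stage can be sketched in a few lines of Python, using the area and bit-width relations of the optimization model stated later in this section (Eqs.~\eqref{eq:areadef}, \eqref{eq:bits} and \eqref{eq:inout}). The class name \texttt{Stage}, the helper \texttt{chain} and all numerical values are hypothetical and only illustrate how the sizes propagate along a cascade.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    n_coeffs: int    # C_i, number of FIR coefficients
    coeff_bits: int  # pi_i^C, bits per coefficient
    shift_bits: int  # pi_i^S, bits dropped by the shifter
    input_bits: int  # pi_i^-, size of the input data

    @property
    def output_bits(self):
        # pi_i^+ = pi_i^- + pi_i^C - pi_i^S
        return self.input_bits + self.coeff_bits - self.shift_bits

    @property
    def area(self):
        # a_i = C_i * (pi_i^C + pi_i^-), in arbitrary units
        return self.n_coeffs * (self.coeff_bits + self.input_bits)

def chain(first_input_bits, params):
    """Build a cascade in which each stage input is the previous stage output."""
    stages = []
    bits = first_input_bits
    for n_coeffs, coeff_bits, shift_bits in params:
        stage = Stage(n_coeffs, coeff_bits, shift_bits, bits)
        stages.append(stage)
        bits = stage.output_bits
    return stages

# Hypothetical two-stage cascade fed with 16-bit samples.
stages = chain(16, [(32, 8, 6), (16, 6, 4)])
total_area = sum(stage.area for stage in stages)
```

In this example the first stage outputs 18-bit data, which become the input of the second stage, and the total area is the sum of the per-stage areas that the area constraint of the model bounds.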

FIR $i$ can reject $F(C_i, \pi_i^C)$ dB. $F$ is determined numerically.
To measure this rejection, we use the GNU Octave software to design FIR filter coefficients with two
algorithms (\texttt{firls} and \texttt{fir1}).
For each configuration $(C_i, \pi_i^C)$, we first create a FIR with floating point coefficients and a given number $C_i$ of coefficients.
Then, the floating point coefficients are discretized into integers. In order to ensure that the coefficients are effectively coded on $\pi_i^C$~bits,
the coefficients are normalized by their absolute maximum before being scaled to integer coefficients.
At least one coefficient is coded on $\pi_i^C$~bits, and in practice only $b_{C_i/2}$ is coded on $\pi_i^C$~bits while the others are coded on far fewer bits.
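
A minimal sketch of this normalization and scaling step is given below in Python/NumPy rather than GNU Octave. The full-scale value $2^{\pi_i^C - 1} - 1$ (signed coefficients) and the rounding to the nearest integer are assumptions of this sketch, since the exact scaling rule is not specified above; the function name \texttt{quantize} and the tap values are hypothetical.

```python
import numpy as np

def quantize(coeffs, n_bits):
    """Normalize by the absolute maximum, then scale and round to integers.

    Assumption: signed coefficients, so the largest magnitude is
    2**(n_bits - 1) - 1.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    full_scale = 2 ** (n_bits - 1) - 1
    scaled = coeffs / np.max(np.abs(coeffs)) * full_scale
    return np.round(scaled).astype(int)

# Hypothetical symmetric low-pass taps quantized on 6 bits.
taps = np.array([0.002, -0.01, 0.05, 0.3, 0.5, 0.3, 0.05, -0.01, 0.002])
q = quantize(taps, 6)
```

With these values only the central coefficient reaches the full scale of 31, while the outermost taps round to zero, which is the behavior described for $b_{C_i/2}$ and for the null coefficients of Fig.~\ref{float_vs_int}.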

With these coefficients, the \texttt{freqz} function is used to estimate the magnitude response of the filter.
Comparing the performance of FIRs however requires a unique criterion. As shown in figure~\ref{fig:fir_mag},
the FIR magnitude exhibits two parts.

\begin{figure}
\centering
\begin{tikzpicture}[scale=0.3]
\draw[<->] (0,15) -- (0,0) -- (21,0) ;
\draw[thick] (0,12) -- (8,12) -- (20,0) ;

\draw (0,14) node [left] { $P$ } ;
\draw (20,0) node [below] { $f$ } ;

\draw[>=latex,<->] (0,14) -- (8,14) ;
\draw (4,14) node [above] { passband } node [below] { $40\%$ } ;

\draw[>=latex,<->] (8,14) -- (12,14) ;
\draw (10,14) node [above] { transition } node [below] { $20\%$ } ;

\draw[>=latex,<->] (12,14) -- (20,14) ;
\draw (16,14) node [above] { stopband } node [below] { $40\%$ } ;

\draw[>=latex,<->] (16,12) -- (16,8) ;
\draw (16,10) node [right] { rejection } ;

\draw[dashed] (8,-1) -- (8,14) ;
\draw[dashed] (12,-1) -- (12,14) ;

\draw[dashed] (8,12) -- (16,12) ;
\draw[dashed] (12,8) -- (16,8) ;

\end{tikzpicture}

% \includegraphics[width=.5\linewidth]{images/fir_magnitude}
\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:
the passband is considered to occupy the initial 40\% of the Nyquist frequency range,
the stopband the last 40\%, allowing 20\% transition width.}
\label{fig:fir_mag}
\end{figure}

In the transition band, the behavior of the filter is left free: we only care about the passband and the stopband.
Our first criterion considers the mean value of the stopband rejection, as shown in figure~\ref{fig:mean_criterion}. This criterion does not work because it does not take the shape of the passband into account.
A second criterion considers the maximum rejection within the stopband minus the mean of the absolute value of passband rejection. With this criterion, the results are significantly improved as shown in figure~\ref{fig:custom_criterion}.
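
One possible reading of the second criterion is sketched below in Python/NumPy. Several points are assumptions of this sketch and not statements about the actual implementation: the response is computed with a zero-padded FFT instead of \texttt{freqz}, the gain is normalized to 0~dB at zero frequency (low-pass case), the ``maximum rejection within the stopband'' is taken as the worst-case (highest) stopband level, and the example filter is a Hamming-windowed sinc rather than a \texttt{firls} or \texttt{fir1} design. The function name \texttt{rejection\_criterion} is hypothetical.

```python
import numpy as np

def rejection_criterion(coeffs, n_points=1024):
    """Worst-case stopband rejection minus mean absolute passband deviation, in dB.

    Passband: first 40% of the Nyquist range; stopband: last 40%.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    # Magnitude response sampled on [0, Nyquist) with a zero-padded FFT.
    spectrum = np.abs(np.fft.rfft(coeffs, 2 * n_points))[:n_points]
    spectrum = spectrum / spectrum[0]  # assumption: 0 dB gain at zero frequency
    mag_db = 20 * np.log10(np.maximum(spectrum, 1e-12))
    passband = mag_db[: int(0.4 * n_points)]
    stopband = mag_db[int(0.6 * n_points):]
    return -np.max(stopband) - np.mean(np.abs(passband))

# Hypothetical 65-tap low-pass filter with its cutoff at half the Nyquist frequency.
n = np.arange(65) - 32
lowpass = 0.5 * np.sinc(0.5 * n) * np.hamming(65)
criterion_db = rejection_criterion(lowpass)
```

Applying the same function to the floating point and to the quantized coefficients of a given design gives a direct measure of the rejection lost to quantization.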

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/colored_mean_criterion}
\caption{Mean criterion comparison between monolithic filter and cascade filters}
\label{fig:mean_criterion}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/colored_custom_criterion}
\caption{Custom criterion comparison between monolithic filter and cascade filters}
\label{fig:custom_criterion}
\end{figure}

Although we have an efficient criterion to estimate the rejection of one set of coefficients,
a problem arises when we sum two or more criteria. If the FIR filter coefficients are the same
in all stages, we have:
\[ F_{\text{total}} = F_1 + F_2 \]
But when we choose two different sets of coefficients, the previous equality no longer
holds. Figure~\ref{fig:sum_rejection} illustrates the problem. The red and blue curves
are two different sets of filter coefficients, and we can see that their maxima in the stopband
are not at the same frequency. So when we sum the rejection criteria (the dotted yellow line)
we do not meet the dashed yellow line. Defining the rejection of cascaded filters
is more difficult than just summing the rejection criteria of each filter.
However, this sum remains a conservative estimate (it bounds the residual stopband level from above):
in practice we obtain better rejection than it predicts.
278 278
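The bound can be motivated with a short sketch. The notation is an assumption
introduced for this illustration only: $H_1(f)$ and $H_2(f)$ denote the magnitude
responses of the two stages in dB, and $S$ denotes the stopband. Cascading
multiplies the transfer functions, so the responses add in dB, and
\begin{align*}
\max_{f \in S} \left( H_1(f) + H_2(f) \right) \leq \max_{f \in S} H_1(f) + \max_{f \in S} H_2(f),
\end{align*}
with equality only when both stages reach their stopband maximum at a common
frequency. If the criterion is dominated by the highest stopband level, the
cascade therefore rejects at least as much as the sum of the individual criteria suggests.
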
\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/cascaded_criterion}
\caption{Rejection of two cascaded filters}
\label{fig:sum_rejection}
\end{figure}

The first problem we address is to maximize the rejection under bounded silicon area
and feasibility constraints. Variable $a_i$ is the area taken by filter~$i$
(in arbitrary units). Variable $r_i$ is the rejection of filter~$i$ (in dB).
Constant $\mathcal{A}$ is the total available area. Our abstract model is described
by the following expressions:
\begin{align}
\text{Maximize } & \sum_{i=1}^n r_i \notag \\
\sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\
a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\
r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\
\pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\
\pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\
\pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\
\pi_1^- &= \Pi^I \label{eq:init}
\end{align}

{\color{red} I know the idea is not to discuss the linear program, but
it still seems essential to me. At worst, I will try to revisit this if we
are really short on space.}

Equation~\ref{eq:area} states that the total area taken by the filters must not
exceed the available area. Equation~\ref{eq:areadef} gives the definition of
the area for a filter. More precisely, it is the area of the FIR, as the Shifter
does not need any circuitry. We consider that the FIR needs $C_i$ registers of size
$\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the
input data and the coefficients. Equation~\ref{eq:rejectiondef} gives the
definition of the rejection of the filter thanks to function~$F$ that we defined
previously. The Shifter does not introduce negative rejection, as we explain later,
so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the
relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add
$\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes
$\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of
a filter is the same as the input number of bits of the next filter.
Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative
rejection. Indeed, the results of the FIR can be right shifted without compromising
the quality of the rejection up to a threshold. Each bit of the output data
increases the maximum rejection level by 6~dB. We add one to take the sign bit
into account. If equation~\ref{eq:maxshift} were not present, the Shifter could
shift too much and introduce some noise in the output data. Each supplementary
shift bit would cause 6~dB of noise. A totally equivalent equation is:
$\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right)$.
Finally, equation~\ref{eq:init} gives the number of bits of the global input.

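As an illustrative check, these constraints can be evaluated by hand on the
two-stage MAX/500 solution of Table~\ref{tbl:gurobi_max_500}, namely $(3, 3, 15)$
followed by $(31, 9, 0)$, with $\Pi^I = 16$. The last line assumes that the
rejection reported in the table (58~dB) is the sum $r_1 + r_2$:
\begin{align*}
\pi_1^+ &= 16 + 3 - 15 = 4 \\
\pi_2^+ &= 4 + 9 - 0 = 13 \\
a_1 &= 3 \times (3 + 16) = 57 \\
a_2 &= 31 \times (9 + 4) = 403 \\
a_1 + a_2 &= 460 \leq 500 \\
1 + \sum_{k=1}^{2} \left(1 + \frac{r_k}{6}\right) &= 3 + \frac{58}{6} \approx 12.7 \leq \pi_2^+
\end{align*}
The total area matches the value reported in the table, and the last stage keeps
just enough bits for the computed rejection.
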
This model is non-linear and even non-quadratic, as $F$ does not have a known
linear or quadratic expression. We therefore introduce $p$ FIR configurations
$(C_{ij}, \pi_{ij}^C), 1 \leq j \leq p$ that are constants. We define the binary
variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
and 0 otherwise. The new equations are as follows:

\begin{align}
a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
\pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
\sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
\end{align}

Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace
equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}, respectively.
Equation~\ref{eq:config} states that at most one configuration is chosen for each stage.

This modified model is quadratic, and it can be linearised if necessary. The Gurobi
(\url{www.gurobi.com}) optimization software is used to solve this quadratic
model, and since Gurobi is able to linearise it, the model is left as is. This model
has $O(np)$ variables and $O(n)$ constraints.

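To make the quadratic term explicit, equation~\ref{eq:areadef2} can be expanded
as shown below. The sketch also gives one standard way to linearise it. The
auxiliary variables $y_{ij}$ and the bound $M$ are notation assumed for this
illustration, and this is not a description of what Gurobi does internally.
\begin{align*}
a_i &= \sum_{j=1}^p \delta_{ij} C_{ij} \pi_{ij}^C + \sum_{j=1}^p C_{ij} \, \delta_{ij} \pi_i^-
\end{align*}
The first sum is linear in $\delta_{ij}$, while the second contains the products
$\delta_{ij} \pi_i^-$ of two variables. If $0 \leq \pi_i^- \leq M$, each product
can be replaced by a variable $y_{ij}$ subject to
\begin{align*}
0 \leq y_{ij} & \leq M \delta_{ij}, \\
\pi_i^- - M (1 - \delta_{ij}) \leq y_{ij} & \leq \pi_i^-,
\end{align*}
so that $y_{ij} = \pi_i^-$ when $\delta_{ij} = 1$ and $y_{ij} = 0$ otherwise.
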
Section~\ref{sec:fixed_area} shows the results for this first version of the
quadratic program, while section~\ref{sec:fixed_rej}
presents the results for the complementary problem. In this case we want to
minimize the occupied area for a targeted rejection level. Hence we replace
the objective function with:
\begin{align}
\text{Minimize } & \sum_{i=1}^n a_i \notag
\end{align}
We adapt the constraints of the quadratic program by replacing equation~\ref{eq:area}
with equation~\ref{eq:rejection_min}, where $\mathcal{R}$ is the minimal
rejection required.

\begin{align}
\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}
\end{align}

\section{Design workflow}
\label{sec:workflow}

In this section, we describe the workflow to compute all the results presented in section~\ref{sec:fixed_area}.
Figure~\ref{fig:workflow} shows the global workflow and the different steps involved in the computation of the results.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=0.75cm and 2cm]
\node[draw,minimum size=1cm] (Solver) { Filter Solver } ;
\node (Start) [left= 3cm of Solver] { } ;
\node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;
\node (Input) [above= of TCL] { } ;
\node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;
\node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;
\node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;
\node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;
\node (Results) [left= of Postproc] { } ;

\draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;
\draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;
\draw[->] (Solver) edge node [below] { (1a) } (TCL) ;
\draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;
\draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;
\draw[->,dashed] (Bitstream) -- (Deploy) ;
\draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;
\draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;
\draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;
\draw[->] (Postproc) -- (Results) ;
\end{tikzpicture}
\caption{Design workflow from the input parameters to the results}
\label{fig:workflow}
\end{figure}

The filter solver is a C++ program that takes as input the maximum area
$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,
the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates
the quadratic programs and uses the Gurobi solver to get the optimal results.
Then it produces two scripts: a TCL script ((1a) on figure~\ref{fig:workflow})
and a deploy script ((1b) on figure~\ref{fig:workflow}).

The TCL script describes the whole digital processing chain from the beginning
(the raw signal data) to the end (the filtered data).
The raw input data are generated by a Pseudo Random Number (PRN)
generator inside the FPGA, and $\Pi^I$ is fixed at 16~bits.
Then the script builds each stage of the chain with a generic FIR task that
comes from a skeleton library. The generic FIR is highly configurable
through the number of coefficients and the size of the coefficients. The coefficients
themselves are not stored in the script.
While the signal is processed in real time, the output signal is stored as
consecutive bursts of data.

The TCL script is used by Vivado to produce the FPGA bitstream ((2) on figure~\ref{fig:workflow}).
We use the 2018.2 version of Xilinx Vivado and we execute the synthesized
bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series
FPGA (xc7z010clg400-1) and two 125~MS/s ADCs.
The board works with a Buildroot Linux image. We have developed some tools and
drivers to flash and communicate with the FPGA. They are used to automate the
whole workflow on the board: load the filter coefficients and retrieve the
computed data.

The deploy script uploads the bitstream to the board ((3) on
figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers,
and configures the coefficients of the FIR filters. It then waits for the results
and retrieves the data to the main computer ((4) on figure~\ref{fig:workflow}).

Finally, an Octave post-processing script computes the final results from
the output data ((5) on figure~\ref{fig:workflow}).
The results are normalized so that the Power Spectral Density (PSD) starts at zero
and the different configurations can be compared.

For the results in section~\ref{sec:fixed_rej}, we have only adapted the
quadratic program; the rest of the workflow is unchanged.

\section{Experiments with fixed area space}
\label{sec:fixed_area}
This section presents the output of the filter solver {\em i.e.} the computed
configurations for each stage, the computed rejection and the computed silicon area.
These outputs help to understand the choices made by the solver to compute its solutions.

The experimental setup is composed of three cases. The raw input is generated
by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.
Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500
arbitrary units. Hence, the three cases have been named: MAX/500, MAX/1000, MAX/1500.
The number of configurations $p$ is 1827, with $C_i$ ranging from 3 to 60 and $\pi^C$
ranging from 2 to 22. In each case, the quadratic program has been able to give a
result for up to five stages ($n = 5$) in the cascaded filter.

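To give an order of magnitude, for the largest cascade considered here the
number of binary variables $\delta_{ij}$ is
\begin{align*}
n \times p = 5 \times 1827 = 9135,
\end{align*}
in addition to the bit-width, area and rejection variables of each stage.
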
Tables~\ref{tbl:gurobi_max_500}, \ref{tbl:gurobi_max_1000} and
\ref{tbl:gurobi_max_1500} show the results obtained by the filter solver for
MAX/500, MAX/1000 and MAX/1500, respectively.

\renewcommand{\arraystretch}{1.4}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}
\label{tbl:gurobi_max_500}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\
2 & (3, 3, 15) & (31, 9, 0) & - & - & - & 58~dB & 460 \\
3 & (3, 3, 15) & (27, 9, 0) & (5, 3, 0) & - & - & 66~dB & 488 \\
4 & (3, 3, 15) & (19, 7, 0) & (11, 5, 0) & (3, 3, 0) & - & 74~dB & 499 \\
5 & (3, 3, 15) & (23, 8, 0) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & 78~dB & 489 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}
\label{tbl:gurobi_max_1000}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\
2 & (3, 3, 15) & (51, 14, 0) & - & - & - & 87~dB & 975 \\
3 & (3, 3, 15) & (35, 11, 0) & (19, 7, 0) & - & - & 99~dB & 1000 \\
4 & (3, 4, 16) & (27, 8, 0) & (19, 7, 1) & (11, 5, 0) & - & 103~dB & 998 \\
5 & (3, 3, 15) & (31, 9, 0) & (19, 7, 0) & (3, 3, 1) & (3, 3, 0) & 111~dB & 984 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}
\label{tbl:gurobi_max_1500}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\
2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 103~dB & 1489 \\
3 & (3, 3, 15) & (35, 11, 0) & (35, 11, 0) & - & - & 122~dB & 1492 \\
4 & (3, 3, 15) & (27, 8, 0) & (19, 7, 0) & (27, 9, 0) & - & 129~dB & 1498 \\
5 & (3, 3, 15) & (23, 9, 2) & (27, 9, 0) & (19, 7, 0) & (3, 3, 0) & 136~dB & 1499 \\
\hline
\end{tabular}
}
\end{table}

\renewcommand{\arraystretch}{1}

From these tables, we can first state that the more stages are used to define
the cascaded FIR filters, the better the rejection. This was an expected result, as it has
been previously observed that many small filters are better than
a single large filter \cite{lim_1988, lim_1996, young_1992}, although this conclusion
is hardly used in practice due to the lack of tools for identifying individual filter
coefficients in the cascaded approach.

Second, the larger the silicon area, the better the rejection. This was also an
expected result as more area means a filter of better quality (more coefficients
or more bits per coefficient).

Then, we also observe that the first stage can have a larger shift than the other
stages. This is explained by the fact that the solver tries to use just enough
bits for the computed rejection after each stage. In the first stage, a
balance between a strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}
gives the relation between both values.

Finally, we note that the solver consumes nearly all the given silicon area.

The following graphs present the rejection for real data on the FPGA. In all following
figures, the solid line represents the actual rejection of the filtered
data on the FPGA as measured experimentally and the dashed line is the noise level
given by the quadratic solver. The configurations are those computed in the previous section.

Figures~\ref{fig:max_500_result}, \ref{fig:max_1000_result} and
\ref{fig:max_1500_result} show the rejection of the different configurations
in the case of MAX/500, MAX/1000 and MAX/1500, respectively.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/max_500}
\caption{Signal spectrum for MAX/500}
\label{fig:max_500_result}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/max_1000}
\caption{Signal spectrum for MAX/1000}
\label{fig:max_1000_result}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/max_1500}
\caption{Signal spectrum for MAX/1500}
\label{fig:max_1500_result}
\end{figure}

In all cases, we observe that the actual rejection is close to the rejection computed by the solver.

We compare the actual silicon resources reported by Vivado to the
resources expressed in arbitrary units.
The goal is to check that our arbitrary units of silicon area model well enough
the real resources on the FPGA. In particular, we want to verify that, for a given
number of arbitrary units, the actual silicon resources do not depend on the
number of stages $n$. Most significantly, our approach aims
at remaining far enough from the practical logic gate implementation used by
various vendors to remain platform independent and be portable from one
architecture to another.

Table~\ref{tbl:resources_usage} shows the resource usage in the case of MAX/500, MAX/1000 and
MAX/1500, \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000
and 1500 arbitrary units. We have taken care to extract solely the resources used by
the FIR filters and to remove additional processing blocks including the FIFO and the PL to
PS communication.

\begin{table}
\caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
\label{tbl:resources_usage}
\centering
\begin{tabular}{|c|c|ccc|c|}
\hline
$n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq-7010} \\ \hline\hline
& LUT & 249 & 453 & 627 & \emph{17600} \\
1 & BRAM & 1 & 1 & 1 & \emph{120} \\
& DSP & 21 & 37 & 47 & \emph{80} \\ \hline
& LUT & 2374 & 5494 & 691 & \emph{17600} \\
2 & BRAM & 2 & 2 & 2 & \emph{120} \\
& DSP & 0 & 0 & 70 & \emph{80} \\ \hline
& LUT & 2443 & 3304 & 3521 & \emph{17600} \\
3 & BRAM & 3 & 3 & 3 & \emph{120} \\
& DSP & 0 & 19 & 35 & \emph{80} \\ \hline
& LUT & 2634 & 3753 & 2557 & \emph{17600} \\
4 & BRAM & 4 & 4 & 4 & \emph{120} \\
& DSP & 0 & 19 & 46 & \emph{80} \\ \hline
& LUT & 2423 & 3047 & 2847 & \emph{17600} \\
5 & BRAM & 5 & 5 & 5 & \emph{120} \\
& DSP & 0 & 22 & 46 & \emph{80} \\ \hline
\end{tabular}
\end{table}

In some cases, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that,
when the filter coefficients are small enough, or when the input size is small
enough, Vivado optimizes resource consumption by selecting multiplexers to
implement the multiplications instead of a DSP. In this case, it is quite difficult
to compare the whole silicon budget.

However, a rough estimation can be made with a simple equivalence. Looking at
the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,
we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon
area use. With this equivalence, our 500 arbitrary units correspond to 2500~LUTs,
1000 arbitrary units correspond to 5000~LUTs and 1500 arbitrary units correspond
to 7300~LUTs. The conclusion is that the orders of magnitude of our arbitrary
unit are quite good. The relatively small differences can probably be explained
by the optimizations done by Vivado based on the detailed map of available processing resources.
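
To illustrate how this equivalence can be read from Table~\ref{tbl:resources_usage}, the following sketch assumes that the total silicon budget of the MAX/500 column is the same for all $n$, and introduces $x$, a symbol used only for this estimate, for the number of LUTs equivalent to one DSP. Equating the $n = 1$ row (249~LUTs and 21~DSPs) with the mean LUT count of the rows $n \geq 2$ (no DSP) gives
\begin{align*}
\frac{2374 + 2443 + 2634 + 2423}{4} &\approx 2469, \\
249 + 21\,x \approx 2469 \quad &\Rightarrow \quad x \approx 106 \approx 100.
\end{align*}
Taking the $n = 3$ row as an example, $x = 100$ yields $3304 + 19 \times 100 = 5204$ LUT equivalents for MAX/1000 and $3521 + 35 \times 100 = 7021$ for MAX/1500, which is of the same order as the 5000 and 7300~LUTs quoted above.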

We present the computation time needed to solve the quadratic problem.
For each case, the filter solver software is executed on an Intel(R) Xeon(R) CPU E5606
clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve
the quadratic problem.

Table~\ref{tbl:area_time} shows the time needed to solve the quadratic
problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units.

\begin{table}
\caption{Time to solve the quadratic program with Gurobi}
\label{tbl:area_time}
\centering
\begin{tabular}{|c|c|c|c|}\hline
$n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline
1 & 0.1~s & 0.1~s & 0.3~s \\
2 & 1.1~s & 2.2~s & 12~s \\
3 & 17~s & 137~s ($\approx$ 2~min) & 275~s ($\approx$ 4~min) \\
4 & 52~s & 5448~s ($\approx$ 90~min) & 5505~s ($\approx$ 17~h) \\
5 & 286~s ($\approx$ 4~min) & 4119~s ($\approx$ 68~min) & 235479~s ($\approx$ 3~days) \\\hline
\end{tabular}
\end{table}

As expected, the computation time seems to rise exponentially with the number of stages. % TODO: exponential?
When the area is limited, the design exploration space is smaller and the solver is able to
find an optimal solution faster. On the contrary, in the case of MAX/1500 with
5~stages, we were not able to obtain a result after 40~hours of computation so we decided to stop.
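
As a rough check of this trend, assuming a purely geometric growth (an assumption made here for illustration only) and using the MAX/500 column of Table~\ref{tbl:area_time}, the average multiplicative factor $g$ per additional stage, a symbol introduced only for this estimate, is
\begin{equation*}
g = \left(\frac{286~\mathrm{s}}{0.1~\mathrm{s}}\right)^{1/4} \approx 7.3,
\end{equation*}
while the individual ratios between consecutive rows ($11$, $15.5$, $3.1$ and $5.5$) scatter widely around this value. Five data points are too few to confirm an exponential law, but they are consistent with a solving time that grows by a factor of roughly 3 to 15 for each added stage.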

\section{Experiments with fixed rejection target}
\label{sec:fixed_rej}
This section presents the results of the complementary quadratic program, in which we
minimize the area occupation for a targeted noise level.

The experimental setup is also composed of three cases. The raw input is the same
as in the previous section, a PRN generator, which fixes the input data size $\Pi^I$.
The targeted rejection $\mathcal{R}$ has been fixed to either 40, 60 or 80~dB.
Hence, the three cases have been named MIN/40, MIN/60 and MIN/80.
The number of configurations $p$ is the same as in the previous section.

Tables~\ref{tbl:gurobi_min_40}, \ref{tbl:gurobi_min_60} and \ref{tbl:gurobi_min_80} show the results obtained by the filter solver for MIN/40, MIN/60 and MIN/80, respectively.

\renewcommand{\arraystretch}{1.4}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40}
\label{tbl:gurobi_min_40}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\
2 & (3, 2, 14) & (19, 7, 0) & - & - & - & 40~dB & 263 \\
3 & (3, 3, 15) & (11, 5, 0) & (3, 3, 0) & - & - & 41~dB & 192 \\
4 & (3, 3, 15) & (3, 3, 0) & (3, 3, 0) & (3, 3, 0) & - & 42~dB & 147 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60}
\label{tbl:gurobi_min_60}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\
2 & (3, 3, 15) & (35, 10, 0) & - & - & - & 60~dB & 547 \\
3 & (3, 3, 15) & (27, 8, 0) & (3, 3, 0) & - & - & 62~dB & 426 \\
4 & (3, 2, 14) & (11, 5, 1) & (11, 5, 0) & (3, 3, 0) & - & 60~dB & 344 \\
5 & (3, 2, 14) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & (3, 3, 0) & 60~dB & 279 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80}
\label{tbl:gurobi_min_80}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\
2 & (3, 3, 15) & (47, 14, 0) & - & - & - & 80~dB & 903 \\
3 & (3, 3, 15) & (23, 9, 0) & (19, 7, 0) & - & - & 80~dB & 698 \\
4 & (3, 3, 15) & (27, 9, 0) & (7, 7, 4) & (3, 3, 0) & - & 80~dB & 605 \\
5 & (3, 2, 14) & (27, 8, 0) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & 81~dB & 534 \\
\hline
\end{tabular}
}
\end{table}
\renewcommand{\arraystretch}{1}

From these tables, we can first state that all configurations reach the target rejection
level. Furthermore, the area of the monolithic filter is about twice that of the two cascaded filters.
More generally, the more stages are cascaded, the lower the occupied area in arbitrary units.
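
For reference, the ratios between the monolithic ($n = 1$) and two-stage ($n = 2$) areas can be read directly from Tables~\ref{tbl:gurobi_min_40}, \ref{tbl:gurobi_min_60} and \ref{tbl:gurobi_min_80}. This is simple arithmetic on the tabulated values rather than a new result:
\begin{align*}
\text{MIN/40:} \quad & 648 / 263 \approx 2.5, \\
\text{MIN/60:} \quad & 1131 / 547 \approx 2.1, \\
\text{MIN/80:} \quad & 1760 / 903 \approx 1.9.
\end{align*}
The area keeps decreasing with each additional stage, for instance from 1760 down to 534 arbitrary units between $n = 1$ and $n = 5$ for MIN/80, a reduction by a factor of about 3.3.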

As in the previous section, the solver always chooses a small filter as the first
filter stage, and the second one is often the biggest filter. This choice can be explained
as in the previous section: the solver uses just enough bits not to degrade the input
signal, and in the second stage it can choose a better filter to improve the rejection without
having too many bits in the output data.

For the specific case of MIN/40 with $n = 5$, the solver has determined that the optimal
number of filters is 4, so it did not choose any configuration for the last filter. Hence this
solution is equivalent to the result for $n = 4$.

The following graphs present the rejection for real data on the FPGA. In all the following
figures, the solid line represents the actual rejection of the filtered
data on the FPGA as measured experimentally and the dashed line is the noise level
given by the quadratic solver.

Figures~\ref{fig:min_40}, \ref{fig:min_60} and \ref{fig:min_80} show the rejection of the different configurations in the case of MIN/40, MIN/60 and MIN/80, respectively.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/min_40}
\caption{Signal spectrum for MIN/40}
\label{fig:min_40}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/min_60}
\caption{Signal spectrum for MIN/60}
images/cascaded_criterion.pdf
images/colored_custom_criterion.pdf
images/colored_mean_criterion.pdf
images/criterion_cascaded.pdf
images/zero_values.pdf