Commit a5c9e7b9422523032c79c324e28452311fd46215

Authored by Arthur HUGEAT
1 parent c9c460c6b3
Exists in master

Add the rejection pyramid.

Showing 2 changed files with 20 additions and 7 deletions

ifcs2018_journal.tex
% merge: maximize rejection at a given area vs. minimize area at a given rejection
% demonstrate how quantization pushes noise towards high frequencies => 6 dB of
% rejection per bit, and a loss if fewer bits than rejection/6
% develop the linear program, including the bit shift
% emphasize that previously the design was synthesizable but not implementable, whereas now we
% implement it and demonstrate that it runs
% gwen: why is the FIR now implementable when it was not, even on the zedboard -> new FIR?
% Gwen: can we build a real phase noise test bench with this FIR, i.e. add ADC, NCO and mixer
% (zedboard or redpit)

% add the "correct" pyramid
% schema label: check that "argue for the FIR cascade" is done

\documentclass[a4paper,conference]{IEEEtran/IEEEtran}
\usepackage{graphicx,color,hyperref}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{algorithm2e}
\usepackage{url,balance}
\usepackage[normalem]{ulem}
\usepackage{tikz}
\usetikzlibrary{positioning,fit}
\usepackage{multirow}
\usepackage{scalefnt}

% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}
\textheight=26cm
\setlength{\footskip}{30pt}
\pagenumbering{gobble}
\begin{document}
\title{Filter optimization for real time digital processing of radiofrequency signals: application
to oscillator metrology}

\author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2},
G. Goavec-M\'erou\IEEEauthorrefmark{1},
P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}
\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }
\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\
Email: \{pyb2,jmfriedt\}@femto-st.fr}
}
\maketitle
\thispagestyle{plain}
\pagestyle{plain}
\newtheorem{definition}{Definition}

\begin{abstract}
Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to
radiofrequency signal processing. When applied to oscillator characterization in the context
of ultrastable clocks, stringent filtering requirements are set by spurious signal or
noise rejection needs. Since real time radiofrequency processing must be performed in a
Field Programmable Gate Array to meet timing constraints, we investigate optimization strategies
to design filters meeting rejection characteristics while limiting the hardware resources
required and keeping timing constraints within the targeted measurement bandwidths.
\end{abstract}

\begin{IEEEkeywords}
Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter
\end{IEEEkeywords}

\section{Digital signal processing of ultrastable clock signals}

Analog oscillator phase noise characterization is classically performed by downconverting
the radiofrequency signal to baseband using a saturated mixer,
followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to the carrier. In
a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by
multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}.

\begin{figure}[h!tb]
\begin{center}
\includegraphics[width=.8\linewidth]{images/schema}
\end{center}
\caption{Fully digital oscillator phase noise characterization: the Device Under Test
(DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and
downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals
and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite
Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays
the spectral characteristics of the phase fluctuations.}
\label{schema}
\end{figure}

As with the analog mixer,
the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as
well as the generation of the frequency sum signal in addition to the frequency difference.
These unwanted spectral components must be rejected before decimating the data stream
for the phase noise spectral characterization \cite{andrich2018high}. The filtering stages introduced between the
downconverter
and the decimation processing blocks are core elements of an oscillator characterization
system, and must reject out-of-band signals below the targeted phase noise, typically
below $-170$~dBc/Hz for the ultrastable oscillators we aim at characterizing. The filter blocks will
use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency
datastream: optimizing the performance of the filter while reducing the needed resources is
hence tackled in a systematic approach using optimization techniques. Most significantly, we
tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with
a tunable number of coefficients and a tunable number of bits representing the coefficients and the
data being processed.

\section{Finite impulse response filter}

We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined
by a set of weights $b_k$ applied to the inputs $x_k$ through a convolution to generate the
outputs $y_n$
\begin{align}
y_n=\sum_{k=0}^N b_k x_{n-k}
\label{eq:fir_equation}
\end{align}

As opposed to an implementation on a general purpose processor in which word size is defined by the
processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since
not only the coefficient values and number of taps must be defined, but also the number of bits
defining the coefficients and the sample size. For this reason, and because we consider pipeline
processing (as opposed to First-In, First-Out (FIFO) memory batch processing) of radiofrequency
signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but
the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language (VHDL) level.
Since latency is not an issue in an open-loop phase noise characterization instrument, the large
number of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter,
is not considered as an issue as it would be in a closed-loop system.

The coefficients are classically expressed as floating point values. However, this binary
number representation is not efficient for fast arithmetic computation by an FPGA. Instead,
we choose to quantize these floating point values into integer values. This quantization
will result in some precision loss.

\begin{figure}[h!tb]
\includegraphics[width=\linewidth]{images/zero_values}
\caption{Impact of the quantization resolution of the coefficients: the quantization is
set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting
the 30~first and 30~last coefficients out of the initial 128~band-pass
filter coefficients to 0 (red dots).}
\label{float_vs_int}
\end{figure}

The tradeoff between quantization resolution and number of coefficients when considering
integer operations is not trivial. As an illustration of the issue related to the
relation between the number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits
a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon
quantization on 6~bit integers, 60 of the 128~coefficients in the beginning and end of the
taps become null, making the large number of coefficients irrelevant and allowing
processing resources to be saved by shrinking the filter length. This tradeoff, aimed at minimizing resources
to reach a given rejection level or maximizing out-of-band rejection for a given computational
resource, will drive the investigation on cascading filters designed with varying tap resolution
and tap length, as will be shown in the next section. Indeed, our development strategy closely
follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards}
in which basic blocks are defined and characterized before being assembled \cite{hide}
in a complete processing chain. In our case, assembling the filter blocks is a simpler block
combination process since we assume a single value to be processed and a single value to be
generated at each clock cycle. The FIR filters are not considered to decimate in the
current implementation: the decimation is assumed to be located after the FIR cascade for the
time being.

\section{Methodology description}
We want to create a new methodology to develop any Digital Signal Processing (DSP) chain
for any hardware platform (Altera, Xilinx...). To do this, we have defined an
abstract model to represent some basic DSP operations.

For the moment, we focus on only two operations: filtering and shifting of data.
We have chosen these basic operations because shifting and filtering have already been studied in
many works \cite{lim_1996, lim_1988, young_1992, smith_1998}, hence it will be easier
to check and validate our results.

Having only two operations is insufficient to work with complex DSP chains, but
in this paper we only want to demonstrate the relevance and the efficiency of our approach.
In future work it will be possible to add more operations so that any DSP chain
can be modeled.

We apply our methodology to a very simple DSP chain. We generate a digital signal
using either a Pseudo-Random Number (PRN) generator or an Analog to Digital
Converter (ADC). Once we have a digital signal, we filter it to decrease the noise level.
Finally, we store bursts of filtered samples before post-processing them.
% TODO: draw a diagram
In this particular case, we want to optimize the filtering step to have the best noise
rejection for a constrained amount of resources, or to have the minimal resource
consumption for a given rejection objective.

The first step of our approach is to model the DSP chain. Since we only optimize
the filtering, we do not model the PRN generator or the ADC. The filtering can be
done in two ways. In the first one, we use a single FIR filter with many coefficients
to reject the noise: we call this the monolithic approach. In the second one,
we select several FIR filters, each with fewer coefficients than the monolithic filter, and we cascade
them to filter the signal.

After each filter we leave the possibility of shifting the filtered data to consume
fewer resources. Hence, in the case of cascaded filters, we define a stage as a filter
and a shifter (the shift can be omitted if we do not need to divide the filtered data).

\subsection{Model of a FIR filter}
A cascade of filters is composed of $n$ stages. In stage $i$ ($1 \leq i \leq n$)
the FIR has $C_i$ coefficients, each coefficient is an integer value coded on $\pi^C_i$
bits, and the filtered data are shifted by $\pi^S_i$ bits. We also define $\pi^-_i$ as
the size of the input data and $\pi^+_i$ as the size of the output data. Figure~\ref{fig:fir_stage}
shows a filtering stage.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=2cm]
  \node[draw,minimum size=1.3cm] (FIR) { $C_i, \pi_i^C$ } ;
  \node[draw,minimum size=1.3cm] (Shift) [right of=FIR, ] { $\pi_i^S$ } ;
  \node (Start) [left of=FIR] { } ;
  \node (End) [right of=Shift] { } ;

  \node[draw,fit=(FIR) (Shift)] (Filter) { } ;

  \draw[->] (Start) edge node [above] { $\pi_i^-$ } (FIR) ;
  \draw[->] (FIR) -- (Shift) ;
  \draw[->] (Shift) edge node [above] { $\pi_i^+$ } (End) ;
\end{tikzpicture}
\caption{A single filter is composed of a FIR (on the left) and a Shifter (on the right)}
\label{fig:fir_stage}
\end{figure}

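As a worked illustration of how the sizes of a stage relate to each other, and only as a sketch under
assumptions introduced here (no overflow is tolerated; the constraint actually used in the optimization
may differ), consider the worst case. Multiplying a $\pi_i^-$-bit sample by a $\pi_i^C$-bit coefficient
yields at most $\pi_i^- + \pi_i^C$ bits, and accumulating $C_i$ such products adds at most
$\lceil \log_2 C_i \rceil$ bits, so that after the shift
\begin{align*}
\pi_i^+ \leq \pi_i^- + \pi_i^C + \lceil \log_2 C_i \rceil - \pi_i^S .
\end{align*}
For example, with the hypothetical values $\pi_i^- = 16$, $\pi_i^C = 8$, $C_i = 32$ and $\pi_i^S = 5$,
the output needs at most $16 + 8 + 5 - 5 = 24$~bits.
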
FIR $i$ can reject $F(C_i, \pi_i^C)$ dB. $F$ is determined numerically.
To measure this rejection, we use the GNU Octave software to design FIR filter coefficients with two
algorithms (\texttt{firls} and \texttt{fir1}).
For each configuration $(C_i, \pi_i^C)$, we first create a FIR with floating point coefficients and a given number $C_i$ of coefficients.
Then, the floating point coefficients are discretized into integers. In order to ensure that the coefficients are effectively coded on $\pi_i^C$~bits,
the coefficients are normalized by their absolute maximum before being scaled to integer coefficients.
At least one coefficient is coded on $\pi_i^C$~bits, and in practice only $b_{C_i/2}$ is coded on $\pi_i^C$~bits while the others are coded on far fewer bits.

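The normalization and scaling step can be written explicitly. This is a sketch under assumptions
that are not stated above: signed coefficients, round-to-nearest, and a full-scale value of
$2^{\pi_i^C-1}-1$. Denoting by $\tilde{b}_k$ the integer coefficients (a notation introduced here
for illustration),
\begin{align*}
\tilde{b}_k = \mathrm{round}\left( \frac{b_k}{\max_j |b_j|} \left(2^{\pi_i^C-1}-1\right) \right).
\end{align*}
A coefficient is then rounded to zero as soon as
$|b_k| / \max_j |b_j| < 1 / \left(2\left(2^{\pi_i^C-1}-1\right)\right)$.
For $\pi_i^C = 6$, the full-scale value is $2^5-1 = 31$ and the threshold is $1/62 \approx 0.016$:
every tap smaller than about 1.6\% of the largest one vanishes, which is the effect visible in
Fig.~\ref{float_vs_int}.
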
With these coefficients, the \texttt{freqz} function is used to estimate the magnitude response of the filter.
Comparing the performance of FIRs however requires a single criterion. As shown in figure~\ref{fig:fir_mag},
the FIR magnitude response exhibits two regions of interest, the passband and the stopband, separated by a transition band.

\begin{figure}
\centering
\begin{tikzpicture}[scale=0.3]
  \draw[<->] (0,15) -- (0,0) -- (21,0) ;
  \draw[thick] (0,12) -- (8,12) -- (20,0) ;

  \draw (0,14) node [left] { $P$ } ;
  \draw (20,0) node [below] { $f$ } ;

  \draw[>=latex,<->] (0,14) -- (8,14) ;
  \draw (4,14) node [above] { passband } node [below] { $40\%$ } ;

  \draw[>=latex,<->] (8,14) -- (12,14) ;
  \draw (10,14) node [above] { transition } node [below] { $20\%$ } ;

  \draw[>=latex,<->] (12,14) -- (20,14) ;
  \draw (16,14) node [above] { stopband } node [below] { $40\%$ } ;

  \draw[>=latex,<->] (16,12) -- (16,8) ;
  \draw (16,10) node [right] { rejection } ;

  \draw[dashed] (8,-1) -- (8,14) ;
  \draw[dashed] (12,-1) -- (12,14) ;

  \draw[dashed] (8,12) -- (16,12) ;
  \draw[dashed] (12,8) -- (16,8) ;

\end{tikzpicture}

% \includegraphics[width=.5\linewidth]{images/fir_magnitude}
\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:
the passband is considered to occupy the initial 40\% of the Nyquist frequency range,
the stopband the last 40\%, allowing 20\% transition width.}
\label{fig:fir_mag}
\end{figure}

In the transition band, the behavior of the filter is left free: we only care about the passband and the stopband.
Our first criterion considers the mean value of the stopband rejection, as shown in figure~\ref{fig:mean_criterion}. This criterion is not satisfactory because it does not take the shape of the passband into account.
A second criterion considers the maximum rejection within the stopband minus the mean of the absolute value of the passband rejection. With this criterion, the results are significantly improved, as shown in figure~\ref{fig:custom_criterion}.

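The two criteria can be stated more formally. The notation below is introduced here for illustration
and reflects one possible reading of the description: the ``maximum'' in the stopband is taken to be
the largest magnitude, i.e. the worst-case rejection. With
$H_{\mathrm{dB}}(f) = 20\log_{10}|H(f)|$ the magnitude response estimated with \texttt{freqz},
expressed in dB, and $f$ normalized to the Nyquist frequency so that the passband is $[0, 0.4]$ and
the stopband $[0.6, 1]$ as in figure~\ref{fig:fir_mag},
\begin{align*}
F_{\mathrm{mean}} &= - \operatorname*{mean}_{f \in [0.6,1]} H_{\mathrm{dB}}(f), \\
F_{\mathrm{custom}} &= - \max_{f \in [0.6,1]} H_{\mathrm{dB}}(f)
  - \operatorname*{mean}_{f \in [0,0.4]} \left| H_{\mathrm{dB}}(f) \right| .
\end{align*}
Under this reading, a filter with deep notches but high stopband lobes is no longer favored, and
passband ripple or attenuation is penalized.
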
\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/colored_mean_criterion}
\caption{Mean criterion comparison between monolithic filter and cascade filters}
\label{fig:mean_criterion}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/colored_custom_criterion}
\caption{Custom criterion comparison between monolithic filter and cascade filters}
\label{fig:custom_criterion}
\end{figure}

Thanks to this criterion we are able to automatically generate many sets of FIR coefficients
and estimate their rejection. Figure~\ref{fig:rejection_pyramid} shows the
rejection as a function of the number of coefficients and of the number of bits per coefficient.
The surface looks like a pyramid whose edge holds the best
coefficient sets. Indeed, for a given number of coefficients, increasing the number
of bits beyond the edge does not improve the rejection. Conversely, for a given
number of bits, increasing the number of coefficients beyond the edge does not improve
the rejection either. Hence the best coefficient sets lie on the edge of the pyramid.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/rejection_pyramid}
\caption{Rejection as a function of number of coefficients and number of bits}
\label{fig:rejection_pyramid}
\end{figure}

Although this criterion efficiently estimates the rejection of one set of coefficients,
a problem arises when we sum the criteria of two or more filters. If the FIR filter coefficients are the same
in every stage, we have:
$$F_{total} = F_1 + F_2$$
But when the stages use different sets of coefficients, this equality no longer
holds. Figure~\ref{fig:sum_rejection} illustrates the problem. The red and blue curves
are the responses of two different sets of filter coefficients, and their maxima in the stopband
are not at the same frequency. So the sum of the rejection criteria (the dotted yellow line)
does not meet the dashed yellow line. Defining the rejection of cascaded filters
is therefore more difficult than just summing the rejection criteria of each filter.
However, this sum remains a conservative estimate: in practice the cascade achieves
a better rejection than the sum predicts.
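
A short argument shows why the sum is conservative, at least for the stopband term of the criterion. Writing $S_1(f)$ and $S_2(f)$ for the stopband levels of the two filters in dB (a notation introduced here for illustration only), the cascade has level $S_1(f) + S_2(f)$ and
\begin{align}
\max_f \left( S_1(f) + S_2(f) \right) & \leq \max_f S_1(f) + \max_f S_2(f), \notag
\end{align}
with equality only when both maxima are reached at a common frequency, which is the case when the two stages share the same coefficients.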

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/cascaded_criterion}
\caption{Rejection of two cascaded filters}
\label{fig:sum_rejection}
\end{figure}

The first problem we address is to maximize the rejection under bounded silicon area
and feasibility constraints. Variable $a_i$ is the area taken by filter~$i$
(in arbitrary units). Variable $r_i$ is the rejection of filter~$i$ (in dB).
Constant $\mathcal{A}$ is the total available area. We model our problem as follows:
\begin{align}
\text{Maximize } & \sum_{i=1}^n r_i \notag \\
\sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\
a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\
r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\
\pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\
\pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\
\pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\
\pi_1^- &= \Pi^I \label{eq:init}
\end{align}

Equation~\ref{eq:area} states that the total area taken by the filters must not
exceed the available area. Equation~\ref{eq:areadef} gives the definition of
the area of a filter. More precisely, it is the area of the FIR, as the Shifter
does not need any circuitry. We consider that the FIR needs $C_i$ registers of
$\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the
input data by the coefficients. Equation~\ref{eq:rejectiondef} gives the
definition of the rejection of the filter through the function~$F$ defined
previously. The Shifter does not introduce negative rejection, as we explain below,
so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the
relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add
$\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes
$\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of
a filter is the same as the input number of bits of the next filter.
Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative
rejection. Indeed, the results of the FIR can be right-shifted without compromising
the quality of the rejection up to a threshold. Each bit of the output data
increases the maximum rejection level by 6~dB. We add one to take the sign bit
into account. If equation~\ref{eq:maxshift} were not present, the Shifter could
shift too much and introduce noise in the output data. Each supplementary
shift bit would cause 6~dB of noise. An equivalent formulation is:
$\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right) $.
Finally, equation~\ref{eq:init} gives the number of bits of the global input.
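
The bookkeeping of equations~\ref{eq:areadef}, \ref{eq:bits}, \ref{eq:inout} and \ref{eq:maxshift} can be sketched in a few lines of Python. This is only an illustration, not the C++ solver: the function names are hypothetical, and the per-stage rejections passed to \texttt{shift\_is\_safe} are assumed to be known.

```python
def cascade_area(stages, pi_in=16):
    """Return (total area, output bits of each stage) for a cascade.

    stages is a list of (C, pi_C, pi_S) triples: number of coefficients,
    bits per coefficient and shift of each stage.
    """
    bits_in = pi_in
    total_area = 0
    bits_out = []
    for n_coeffs, coeff_bits, shift in stages:
        total_area += n_coeffs * (coeff_bits + bits_in)  # area of one FIR
        bits_in = bits_in + coeff_bits - shift           # output feeds the next stage
        bits_out.append(bits_in)
    return total_area, bits_out


def shift_is_safe(bits_out, rejections_db):
    """Check the shift constraint after every stage, given rejections in dB."""
    needed = 1.0  # sign bit
    for bits, rejection in zip(bits_out, rejections_db):
        needed += 1 + rejection / 6
        if bits < needed:
            return False
    return True
```

With $\Pi^I = 16$~bits, \texttt{cascade\_area} reproduces the areas listed for MAX/500 in Table~\ref{tbl:gurobi_max_500}, for instance 483 for the single stage $(21, 7, 0)$ and 460 for the two-stage cascade $(3, 3, 15)$, $(31, 9, 0)$.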

This model is non-linear and even non-quadratic, as $F$ does not have a known
linear or quadratic expression. We introduce $p$ FIR configurations
$(C_{ij}, \pi_{ij}^C), 1 \leq j \leq p$ that are constants. We define the binary
variable $\delta_{ij}$, which has value 1 if stage~$i$ is in configuration~$j$
and 0 otherwise. The new equations are as follows:

\begin{align}
a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
\pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
\sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
\end{align}

Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace
equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}, respectively.
Equation~\ref{eq:config} states that at most one configuration is chosen for each stage.
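
To make the role of the configurations concrete, the following toy sketch enumerates every choice of configuration and shift for a small cascade and keeps the feasible one with the largest total rejection. It is a brute-force illustration, not the quadratic program solved in this work. The configuration list is hypothetical: only the 32~dB value of $(21, 7)$ is taken from Table~\ref{tbl:gurobi_max_500}, the other two rejections are made up, and every stage is assumed to be used.

```python
from itertools import product

# (C, pi_C, F): number of coefficients, bits per coefficient, rejection in dB.
CONFIGS = [
    (3, 3, 9.0),    # hypothetical rejection
    (11, 5, 20.0),  # hypothetical rejection
    (21, 7, 32.0),  # single-stage value from the MAX/500 table
]


def best_cascade(n, area_max, pi_in=16, max_shift=20):
    """Exhaustively search n-stage cascades; return (rejection, area, stages)."""
    best = None
    choices = [(cfg, s) for cfg in CONFIGS for s in range(max_shift + 1)]
    for stages in product(choices, repeat=n):
        bits, area, rejection, needed, feasible = pi_in, 0, 0.0, 1.0, True
        for (n_coeffs, coeff_bits, f), shift in stages:
            area += n_coeffs * (coeff_bits + bits)  # area definition
            rejection += f
            needed += 1 + f / 6                     # bits required so far
            bits = bits + coeff_bits - shift        # output bits of this stage
            if bits < needed:                       # shift constraint violated
                feasible = False
                break
        if feasible and area <= area_max and (best is None or rejection > best[0]):
            best = (rejection, area, stages)
    return best
```

The search space grows as $(p \times s)^n$ for $s$ candidate shifts, which is why an exhaustive search is only usable on toy instances like this one.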

This modified model is quadratic, and it can be linearised if necessary. We use the Gurobi
(\url{www.gurobi.com}) optimization software to solve it,
and since Gurobi is able to linearise the model itself, we leave it as is. This model
has $O(np)$ variables and $O(n)$ constraints.
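
For reference, one standard way to linearise the product $\delta_{ij} \times \pi_i^-$ is the so-called big-M technique; this is a generic sketch and not necessarily what the solver does internally. Assuming a constant $M$ such that $0 \leq \pi_i^- \leq M$, and introducing an auxiliary variable $y_{ij}$ (a name chosen here for illustration) that stands for the product, we impose:
\begin{align}
y_{ij} & \leq M \, \delta_{ij}, \notag \\
y_{ij} & \leq \pi_i^-, \notag \\
y_{ij} & \geq \pi_i^- - M \, (1 - \delta_{ij}), \notag \\
y_{ij} & \geq 0. \notag
\end{align}
When $\delta_{ij} = 1$ these constraints force $y_{ij} = \pi_i^-$, and when $\delta_{ij} = 0$ they force $y_{ij} = 0$. The area then becomes the linear expression $a_i = \sum_{j=1}^p C_{ij} \times (\delta_{ij} \, \pi_{ij}^C + y_{ij})$, at the cost of $O(np)$ additional variables and constraints.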

Section~\ref{sec:fixed_area} shows the results for this first version of the quadratic program, while section~\ref{sec:fixed_rej}
presents the results for the complementary problem. In that case we want to
minimize the occupied area for a targeted rejection level. Hence we replace
the objective function with:
\begin{align}
\text{Minimize } & \sum_{i=1}^n a_i \notag
\end{align}
We also adapt the constraints of the quadratic program by replacing equation~\ref{eq:area}
with equation~\ref{eq:rejection_min}, where $\mathcal{R}$ is the minimal
rejection required.

\begin{align}
\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}
\end{align}

\section{Design workflow}
\label{sec:workflow}

In this section, we describe the workflow used to compute all the results presented in section~\ref{sec:fixed_area}.
Figure~\ref{fig:workflow} shows the global workflow and the different steps involved in the computation of the results.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=0.75cm and 2cm]
  \node[draw,minimum size=1cm] (Solver) { Filter Solver } ;
  \node (Start) [left= 3cm of Solver] { } ;
  \node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;
  \node (Input) [above= of TCL] { } ;
  \node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;
  \node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;
  \node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;
  \node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;
  \node (Results) [left= of Postproc] { } ;

  \draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;
  \draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;
  \draw[->] (Solver) edge node [below] { (1a) } (TCL) ;
  \draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;
  \draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;
  \draw[->,dashed] (Bitstream) -- (Deploy) ;
  \draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;
  \draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;
  \draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;
  \draw[->] (Postproc) -- (Results) ;
\end{tikzpicture}
\caption{Design workflow from the input parameters to the results}
\label{fig:workflow}
\end{figure}

The filter solver is a C++ program that takes as input the maximum area
$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,
the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates
the quadratic programs and uses the Gurobi solver to get the optimal results.
Then it produces two scripts: a TCL script ((1a) on figure~\ref{fig:workflow})
and a deploy script ((1b) on figure~\ref{fig:workflow}).

The TCL script describes the whole digital processing chain from the beginning
(the raw signal data) to the end (the filtered data).
The raw input data is generated by a Pseudo Random Number (PRN)
generator inside the FPGA, and $\Pi^I$ is fixed at 16~bits.
Then the script builds each stage of the chain with a generic FIR task that
comes from a skeleton library. The generic FIR is highly configurable:
both the number of coefficients and the size of the coefficients can be set. The coefficients
themselves are not stored in the script.
While the signal is processed in real time, the output signal is stored as
consecutive bursts of data.

The TCL script is used by Vivado to produce the FPGA bitstream ((2) on figure~\ref{fig:workflow}).
We use the 2018.2 version of Xilinx Vivado and we execute the synthesized
bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series
FPGA (xc7z010clg400-1) and two 125~MS/s ADCs.
The board runs a Buildroot Linux image. We have developed tools and
drivers to flash and communicate with the FPGA. They are used to automate
the whole workflow inside the board: loading the filter coefficients and retrieving the
computed data.

The deploy script uploads the bitstream to the board ((3) on
figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers and
configures the coefficients of the FIR filters. It then waits for the results
and retrieves the data to the main computer ((4) on figure~\ref{fig:workflow}).

Finally, an Octave post-processing script computes the final results from
the output data ((5) on figure~\ref{fig:workflow}).
The results are normalized so that the Power Spectrum Density (PSD) starts at zero
and the different configurations can be compared.
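
The normalization step can be sketched as follows. This is a Python illustration rather than the Octave script itself, and it assumes that ``starts at zero'' means that the first frequency bin is taken as the 0~dB reference; the function name is hypothetical.

```python
import math


def normalize_psd_db(psd):
    """Convert a linear PSD to dB relative to its first bin (curve starts at 0 dB)."""
    reference = psd[0]
    return [10 * math.log10(p / reference) for p in psd]
```

With a common 0~dB starting point, curves obtained for different configurations can be overlaid and their stopband levels read directly as rejections.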

To compute the results in section~\ref{sec:fixed_rej}, we
have only adapted the quadratic program; the rest of the workflow is unchanged.

\section{Experiments with fixed area space}
\label{sec:fixed_area}
This section presents the output of the filter solver, {\em i.e.} the computed
configurations for each stage, the computed rejection and the computed silicon area.
This helps to understand the choices made by the solver to compute its solutions.

The experimental setup is composed of three cases. The raw input is generated
by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.
Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500
arbitrary units. Hence, the three cases have been named: MAX/500, MAX/1000, MAX/1500.
The number of configurations $p$ is 1827, with $C_i$ ranging from 3 to 60 and $\pi^C$
ranging from 2 to 22. In each case, the quadratic program has been able to give a
result up to five stages ($n = 5$) in the cascaded filter.

Tables~\ref{tbl:gurobi_max_500}, \ref{tbl:gurobi_max_1000} and \ref{tbl:gurobi_max_1500}
show the results obtained by the filter solver for MAX/500, MAX/1000 and MAX/1500, respectively.

\renewcommand{\arraystretch}{1.4}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}
\label{tbl:gurobi_max_500}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\
2 & (3, 3, 15) & (31, 9, 0) & - & - & - & 58~dB & 460 \\
3 & (3, 3, 15) & (27, 9, 0) & (5, 3, 0) & - & - & 66~dB & 488 \\
4 & (3, 3, 15) & (19, 7, 0) & (11, 5, 0) & (3, 3, 0) & - & 74~dB & 499 \\
5 & (3, 3, 15) & (23, 8, 0) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & 78~dB & 489 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}
\label{tbl:gurobi_max_1000}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\
2 & (3, 3, 15) & (51, 14, 0) & - & - & - & 87~dB & 975 \\
3 & (3, 3, 15) & (35, 11, 0) & (19, 7, 0) & - & - & 99~dB & 1000 \\
4 & (3, 4, 16) & (27, 8, 0) & (19, 7, 1) & (11, 5, 0) & - & 103~dB & 998 \\
5 & (3, 3, 15) & (31, 9, 0) & (19, 7, 0) & (3, 3, 1) & (3, 3, 0) & 111~dB & 984 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}
\label{tbl:gurobi_max_1500}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\
2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 103~dB & 1489 \\
3 & (3, 3, 15) & (35, 11, 0) & (35, 11, 0) & - & - & 122~dB & 1492 \\
4 & (3, 3, 15) & (27, 8, 0) & (19, 7, 0) & (27, 9, 0) & - & 129~dB & 1498 \\
5 & (3, 3, 15) & (23, 9, 2) & (27, 9, 0) & (19, 7, 0) & (3, 3, 0) & 136~dB & 1499 \\
\hline
\end{tabular}
}
\end{table}

\renewcommand{\arraystretch}{1}

From these tables, we can first state that the more stages are used to define
the cascaded FIR filters, the better the rejection. This was expected, as it has
been previously observed that many small filters are better than
a single large filter \cite{lim_1988, lim_1996, young_1992}, even though this conclusion
is hardly used in practice due to the lack of tools for identifying individual filter
coefficients in the cascaded approach.

Second, the larger the silicon area, the better the rejection. This was also an
expected result as more area means a filter of better quality (more coefficients
or more bits per coefficient).

Then, we also observe that the first stage can have a larger shift than the other
stages. This is explained by the fact that the solver tries to use just enough
bits for the computed rejection after each stage. In the first stage, a
balance between a strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}
gives the relation between both values.

Finally, we note that the solver consumes nearly all the given silicon area:
at least 92\% of $\mathcal{A}$ in every case.

The following graphs present the rejection for real data on the FPGA. In all following
figures, the solid line represents the actual rejection of the filtered
data on the FPGA as measured experimentally, and the dashed line is the noise level
given by the quadratic solver. The configurations are those listed in the tables above.

Figures~\ref{fig:max_500_result}, \ref{fig:max_1000_result} and \ref{fig:max_1500_result}
show the rejection of the different configurations for MAX/500, MAX/1000 and MAX/1500, respectively.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/max_500}
\caption{Signal spectrum for MAX/500}
\label{fig:max_500_result}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/max_1000}
\caption{Signal spectrum for MAX/1000}
\label{fig:max_1000_result}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/max_1500}
\caption{Signal spectrum for MAX/1500}
\label{fig:max_1500_result}
\end{figure}

In all cases, we observe that the actual rejection is close to the rejection computed by the solver.

We compare the actual silicon resources reported by Vivado to the
resources in arbitrary units.
The goal is to check that our arbitrary units of silicon area model well enough
the real resources on the FPGA. In particular, we want to verify that, for a given
number of arbitrary units, the actual silicon resources do not depend on the
number of stages $n$. Most significantly, our approach aims
at remaining far enough from the practical logic gate implementation used by
various vendors to remain platform independent and be portable from one
architecture to another.

Table~\ref{tbl:resources_usage} shows the resource usage in the case of MAX/500, MAX/1000 and
MAX/1500, \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000
and 1500 arbitrary units. We have taken care to extract solely the resources used by
the FIR filters and remove additional processing blocks including FIFO and PL to
PS communication.

\begin{table}
\caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
\label{tbl:resources_usage}
\centering
\begin{tabular}{|c|c|ccc|c|}
\hline
$n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline
 & LUT & 249 & 453 & 627 & \emph{17600} \\
1 & BRAM & 1 & 1 & 1 & \emph{120} \\
 & DSP & 21 & 37 & 47 & \emph{80} \\ \hline
 & LUT & 2374 & 5494 & 691 & \emph{17600} \\
2 & BRAM & 2 & 2 & 2 & \emph{120} \\
 & DSP & 0 & 0 & 70 & \emph{80} \\ \hline
 & LUT & 2443 & 3304 & 3521 & \emph{17600} \\
3 & BRAM & 3 & 3 & 3 & \emph{120} \\
 & DSP & 0 & 19 & 35 & \emph{80} \\ \hline
 & LUT & 2634 & 3753 & 2557 & \emph{17600} \\
4 & BRAM & 4 & 4 & 4 & \emph{120} \\
 & DSP & 0 & 19 & 46 & \emph{80} \\ \hline
 & LUT & 2423 & 3047 & 2847 & \emph{17600} \\
5 & BRAM & 5 & 5 & 5 & \emph{120} \\
 & DSP & 0 & 22 & 46 & \emph{80} \\ \hline
\end{tabular}
\end{table}

In some cases, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that,
when the filter coefficients are small enough, or when the input size is small
enough, Vivado optimizes resource consumption by selecting multiplexers to
implement the multiplications instead of a DSP. In this case, it is quite difficult
to compare the whole silicon budget.

However, a rough estimation can be made with a simple equivalence. Looking at
the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,
we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon
area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs,
1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond
to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary
unit are quite good. The relatively small differences can probably be explained
by the optimizations done by Vivado based on the detailed map of available processing resources.
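As an illustrative check of this equivalence, the LUT-equivalent count
$L_{\mathrm{eq}} = \mathrm{LUT} + 100 \times \mathrm{DSP}$ can be evaluated for $n = 3$.
This is only a sketch: it reuses the values of Table~\ref{tbl:resources_usage} and the
assumed rate of 100~LUTs per DSP, it ignores the BRAM, and the symbol $L_{\mathrm{eq}}$
is introduced here for convenience.
\[
\begin{array}{lcl}
\mbox{MAX/500:}  & 2443 + 100 \times 0  & = 2443 \\
\mbox{MAX/1000:} & 3304 + 100 \times 19 & = 5204 \\
\mbox{MAX/1500:} & 3521 + 100 \times 35 & = 7021
\end{array}
\]
These values stay within a few percent of the 2500, 5000 and 7300~LUTs quoted above.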

We present the computation time to solve the quadratic problem.
For each case, the filter solver software is executed on an Intel(R) Xeon(R) CPU E5606
clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve
the quadratic problem.

Table~\ref{tbl:area_time} shows the time needed to solve the quadratic
problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units.

\begin{table}
\caption{Time to solve the quadratic program with Gurobi}
\label{tbl:area_time}
\centering
\begin{tabular}{|c|c|c|c|}\hline
$n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline
1 & 0.1~s & 0.1~s & 0.3~s \\
2 & 1.1~s & 2.2~s & 12~s \\
3 & 17~s & 137~s ($\approx$ 2~min) & 275~s ($\approx$ 4~min) \\
4 & 52~s & 5448~s ($\approx$ 90~min) & 5505~s ($\approx$ 17~h) \\
5 & 286~s ($\approx$ 4~min) & 4119~s ($\approx$ 68~min) & 235479~s ($\approx$ 3~days) \\\hline
\end{tabular}
\end{table}

As expected, the computation time seems to rise exponentially with the number of stages. % TODO: exponential?
When the area is limited, the design exploration space is smaller and the solver is able to
find an optimal solution faster. On the contrary, in the case of MAX/1500 with
5~stages, we were not able to obtain a result after 40~hours of computation so we decided to stop.
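To give an order of magnitude for this growth, the ratios between the solving times
for successive values of $n$ can be computed from the MAX/500 column of
Table~\ref{tbl:area_time}. This is an illustrative computation on five data points,
not a fitted model.
\[
\frac{1.1}{0.1} = 11, \quad
\frac{17}{1.1} \approx 15.5, \quad
\frac{52}{17} \approx 3.1, \quad
\frac{286}{52} = 5.5
\]
The geometric mean of these ratios is $(286/0.1)^{1/4} \approx 7.3$ per additional stage.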

\section{Experiments with fixed rejection target}
\label{sec:fixed_rej}
This section presents the results of the complementary quadratic program, in which we
minimize the area occupation for a targeted noise level.

The experimental setup is also composed of three cases. The raw input is the same
as in the previous section, a PRN generator, which fixes the input data size $\Pi^I$.
Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60 or 80~dB.
Hence, the three cases have been named: MIN/40, MIN/60, MIN/80.
The number of configurations $p$ is the same as in the previous section.

Tables~\ref{tbl:gurobi_min_40}, \ref{tbl:gurobi_min_60} and \ref{tbl:gurobi_min_80}
show the results obtained by the filter solver for MIN/40, MIN/60 and MIN/80, respectively.

\renewcommand{\arraystretch}{1.4}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40}
\label{tbl:gurobi_min_40}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\
2 & (3, 2, 14) & (19, 7, 0) & - & - & - & 40~dB & 263 \\
3 & (3, 3, 15) & (11, 5, 0) & (3, 3, 0) & - & - & 41~dB & 192 \\
4 & (3, 3, 15) & (3, 3, 0) & (3, 3, 0) & (3, 3, 0) & - & 42~dB & 147 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60}
\label{tbl:gurobi_min_60}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\
2 & (3, 3, 15) & (35, 10, 0) & - & - & - & 60~dB & 547 \\
3 & (3, 3, 15) & (27, 8, 0) & (3, 3, 0) & - & - & 62~dB & 426 \\
4 & (3, 2, 14) & (11, 5, 1) & (11, 5, 0) & (3, 3, 0) & - & 60~dB & 344 \\
5 & (3, 2, 14) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & (3, 3, 0) & 60~dB & 279 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80}
\label{tbl:gurobi_min_80}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\
2 & (3, 3, 15) & (47, 14, 0) & - & - & - & 80~dB & 903 \\
3 & (3, 3, 15) & (23, 9, 0) & (19, 7, 0) & - & - & 80~dB & 698 \\
4 & (3, 3, 15) & (27, 9, 0) & (7, 7, 4) & (3, 3, 0) & - & 80~dB & 605 \\
5 & (3, 2, 14) & (27, 8, 0) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & 81~dB & 534 \\
\hline
\end{tabular}
}
\end{table}
\renewcommand{\arraystretch}{1}

From these tables, we can first state that all configurations reach the target rejection
level. Furthermore, the area of the monolithic filter is roughly twice that of the
two cascaded filters. More generally, the more filter stages there are, the lower
the occupied area in arbitrary units.
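The tabulated areas can be reproduced by hand. As a sketch, assume that a stage with
input size $\pi_i^I$ occupies $a_i = C_i (\pi_i^C + \pi_i^I)$ arbitrary units, that the
next stage receives $\pi_{i+1}^I = \pi_i^I + \pi_i^C - \pi_i^S$ bits, and that
$\pi_1^I = \Pi^I = 16$~bits. These relations, the 16-bit input and the symbols $a_i$
and $\pi_i^I$ are assumptions inferred here from the table rows, not restated from the
model definition. For MIN/40 with $n = 2$, \emph{i.e.} $(3, 2, 14)$ followed by $(19, 7, 0)$:
\[
\begin{array}{rcl}
a_1 & = & 3 \times (2 + 16) = 54 \\
\pi_2^I & = & 16 + 2 - 14 = 4 \\
a_2 & = & 19 \times (7 + 4) = 209 \\
a_1 + a_2 & = & 263
\end{array}
\]
This matches the area reported in Table~\ref{tbl:gurobi_min_40}, whereas the monolithic
filter $(27, 8, 0)$ requires $27 \times (8 + 16) = 648$ arbitrary units.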

As in the previous section, the solver always chooses a small filter as the first
filter stage, and the second one is often the biggest filter. This choice can be explained
as in the previous section. The solver uses just enough bits to avoid degrading the input
signal, and in the second stage it can choose a better filter to improve the rejection without
having too many bits in the output data.

For the specific case of MIN/40 with $n = 5$, the solver has determined that the optimal
number of filters is 4, so it did not choose any configuration for the last filter. Hence this
solution is equivalent to the result for $n = 4$.

The following graphs present the rejection for real data on the FPGA. In all following
figures, the solid line represents the actual rejection of the filtered
data on the FPGA as measured experimentally and the dashed line is the noise level
given by the quadratic solver.

Figures~\ref{fig:min_40}, \ref{fig:min_60} and \ref{fig:min_80} show the rejection
of the different configurations in the case of MIN/40, MIN/60 and MIN/80, respectively.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/min_40}
\caption{Signal spectrum for MIN/40}