Commit 863c2480f40eba7278a2ffca3115a7eb6435c606

Authored by Arthur HUGEAT
1 parent c7f6afba76
Exists in master

Article with biographies.

Showing 6 changed files with 29 additions and 0 deletions

ifcs2018_journal.tex
\documentclass[a4paper,journal]{IEEEtran/IEEEtran}
\usepackage{graphicx,color,hyperref}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{algorithm2e}
\usepackage{url,balance}
\usepackage[normalem]{ulem}
\usepackage{tikz}
\usetikzlibrary{positioning,fit}
\usepackage{multirow}
\usepackage{scalefnt}
\usepackage{caption}
\usepackage{subcaption}


\hyphenation{op-tical net-works semi-conduc-tor}
\textheight=26cm
\setlength{\footskip}{30pt}
\pagenumbering{gobble}
\begin{document}
\title{Filter optimization for real time digital processing of radiofrequency signals: application
to oscillator metrology}

\author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2},
G. Goavec-M\'erou\IEEEauthorrefmark{1},
P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}\\
\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }\\
\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\
Email: \{pyb2,jmfriedt\}@femto-st.fr}
}
\maketitle
\thispagestyle{plain}
\pagestyle{plain}
\newtheorem{definition}{Definition}

\begin{abstract}
Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to
radiofrequency signal processing. When applied to oscillator characterization in the context
of ultrastable clocks, stringent filtering requirements are defined by spurious signal or
noise rejection needs. Since real-time radiofrequency processing must be performed in a
Field Programmable Gate Array to meet timing constraints, we investigate optimization strategies
to design filters meeting rejection characteristics while limiting the hardware resources
required and keeping timing constraints within the targeted measurement bandwidths. The
presented technique is applicable to scheduling any sequence of processing blocks characterized
by a throughput, resource occupation and performance tabulated as a function of configuration
characteristics, as is the case for filters with their coefficients and resolution yielding
rejection and number of multipliers.
\end{abstract}

\begin{IEEEkeywords}
Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter
\end{IEEEkeywords}

\section{Digital signal processing of ultrastable clock signals}

Analog oscillator phase noise characterization is classically performed by downconverting
the radiofrequency signal to baseband using a saturated mixer,
followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to the carrier. In
a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by
multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}.

\begin{figure}[h!tb]
\begin{center}
\includegraphics[width=.8\linewidth]{images/schema}
\end{center}
\caption{Fully digital oscillator phase noise characterization: the Device Under Test
(DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and
downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals
and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite
Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays
the spectral characteristics of the phase fluctuations.}
\label{schema}
\end{figure}

As with the analog mixer,
the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as
well as the generation of the frequency sum signal in addition to the frequency difference.
These unwanted spectral components must be rejected before decimating the data stream
for the phase noise spectral characterization \cite{andrich2018high}. The filters introduced between the
downconverter
and the decimation processing blocks are core components of an oscillator characterization
system, and must reject out-of-band signals below the targeted phase noise -- typically in the
sub $-170$~dBc/Hz range for the ultrastable oscillators we aim at characterizing. The filter blocks will
use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency
datastream: optimizing the performance of the filter while reducing the needed resources is
hence tackled in a systematic approach using optimization techniques. Most significantly, we
tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with
tunable number of coefficients and tunable number of bits representing the coefficients and the
data being processed.

\section{Finite impulse response filter}

We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined
by a set of weights $b_k$ applied to the inputs $x_k$ through a convolution to generate the
outputs $y_k$
\begin{align}
y_n=\sum_{k=0}^N b_k x_{n-k}
\label{eq:fir_equation}
\end{align}

As opposed to an implementation on a general purpose processor in which word size is defined by the
processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since
not only the coefficient values and number of taps must be defined, but also the number of bits
defining the coefficients and the sample size. For this reason, and because we consider pipeline
processing (as opposed to First-In, First-Out (FIFO) memory batch processing) of radiofrequency
signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but
the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language
(VHDL) level.
Since latency is not an issue in an open-loop phase noise characterization instrument,
the large
number of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter,
is not considered an issue as it would be in a closed-loop system.

The coefficients are classically expressed as floating point values. However, this binary
number representation is not efficient for fast arithmetic computation by an FPGA. Instead,
we choose to quantize these floating point values into integer values. This quantization
will result in some precision loss.

\begin{figure}[h!tb]
\includegraphics[width=\linewidth]{images/zero_values}
\caption{Impact of the quantization resolution of the coefficients: the quantization is
set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting
the 30~first and 30~last coefficients out of the initial 128~band-pass
filter coefficients to 0 (red dots).}
\label{float_vs_int}
\end{figure}

The tradeoff between quantization resolution and number of coefficients when considering
integer operations is not trivial. As an illustration of the issue related to the
relation between number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits
a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon
quantization to 6-bit integers, 60 of the 128~coefficients at the beginning and end of the
tap sequence become null, making the large number of coefficients irrelevant: processing
resources
are hence saved by shrinking the filter length. This tradeoff aimed at minimizing resources
to reach a given rejection level, or maximizing out of band rejection for a given computational
resource, will drive the investigation on cascading filters designed with varying tap resolution
and tap length, as will be shown in the next section. Indeed, our development strategy closely
follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards}
in which basic blocks are defined and characterized before being assembled \cite{hide}
in a complete processing chain. In our case, assembling the filter blocks is a simpler block
combination process since we assume a single value to be processed and a single value to be
generated at each clock cycle. The FIR filters are not assumed to decimate in the
current implementation: for now, the decimation is assumed to be located after the FIR cascade.

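As a rough worked example of why the small taps vanish, assume (our assumption for this sketch, not a statement about the actual design flow) that the coefficients are normalized to their largest magnitude, scaled to signed $p$-bit integers and rounded to the nearest integer. A tap $b_k$ is then quantized to zero whenever
\begin{align}
\frac{|b_k|}{\max_j |b_j|} < \frac{1}{2\left(2^{p-1}-1\right)},
\end{align}
which for $p=6$ gives $1/62 \approx 0.016$, i.e. about $-36$~dB relative to the largest tap. With truncation instead of rounding, the threshold would double to $1/31$. The outer taps of a long impulse response can easily fall below such a threshold, which is consistent with the zeroed coefficients of Fig.~\ref{float_vs_int}.
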
\section{Methodology description}

Our objective is to develop a new methodology applicable to any Digital Signal Processing (DSP)
chain obtained by assembling basic processing blocks, with hardware and manufacturer independence.
Achieving such a target requires defining an abstract model to represent some basic properties
of DSP blocks such as performance (e.g. rejection or passband ripples for filters) and
resource occupation. These abstract properties, not necessarily related to the detailed hardware
implementation of a given platform, will feed a scheduler solver aimed at assembling the optimum
target, whether in terms of maximizing performance for a given arbitrary resource occupation, or
minimizing resource occupation for a given performance. In our approach, the solution of the
solver is then synthesized using the dedicated tool provided by each platform manufacturer
to assess the validity of our abstract resource occupation indicator, and the result of running
the DSP chain on the FPGA allows for assessing the performance of the scheduler. We emphasize
that all solutions found by the solver are synthesized and executed on hardware at the end
of the analysis.

In this demonstration, we focus on only two operations: filtering and shifting the number of
bits needed to represent the data along the processing chain.
We have chosen these basic operations because shifting and filtering have already been studied
in the literature \cite{lim_1996, lim_1988, young_1992, smith_1998}, providing a framework for
assessing our results. Furthermore, filtering is a core step in any radiofrequency frontend
requiring pipelined processing at full bandwidth for the earliest steps, including for
time and frequency transfer or characterization \cite{carolina1,carolina2,rsi}.

Addressing only two operations allows for demonstrating the methodology but should not be
considered as a limitation of the framework, which can be extended to assembling any number
of skeleton blocks as long as performance and resource occupation can be determined.
Hence,
in this paper we will apply our methodology to simple DSP chains: a white noise input signal
is generated using a Pseudo-Random Number (PRN) generator or by sampling a wideband (125~MS/s)
14-bit Analog to Digital Converter (ADC) loaded by a 50~$\Omega$ resistor. Once samples have been
digitized at a rate of 125~MS/s, filtering is applied to qualify the processing block performance --
practically meeting the radiofrequency frontend requirement of noise and bandwidth reduction
by filtering and decimating. Finally, bursts of filtered samples are stored for post-processing,
allowing us either to assess the filter rejection for a given resource usage, or to validate the rejection
when implementing a solution minimizing resource occupation.

The first step of our approach is to model the DSP chain. Since we aim at only optimizing
the filtering part of the signal processing chain, we have not included the PRN generator or the
ADC in the model: the input data size and rate are considered fixed and defined by the hardware.
The filtering can be done in two ways, either by considering a single monolithic FIR filter
requiring many coefficients to reach the targeted noise rejection ratio, or by
cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter.

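As a short reminder of why cascading can help, consider the idealized case of linear stages with no loss from intermediate rounding or shifting (an assumption made only for this sketch). Writing $H_i(f)$ for the transfer function of the $i$-th of $n$ cascaded filters, a notation introduced here for illustration, the transfer function of the cascade is the product of the individual transfer functions, so that magnitudes expressed in dB add up. For instance, two stages each rejecting 40~dB at a given stopband frequency would ideally reject 80~dB at that frequency:
\begin{align}
|H(f)| = \prod_{i=1}^{n} |H_i(f)|
\quad\Longrightarrow\quad
20\log_{10}|H(f)| = \sum_{i=1}^{n} 20\log_{10}|H_i(f)| .
\end{align}
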
After each filter we leave the possibility of shifting the filtered data to consume
fewer resources. Hence in the case of cascaded filters, we define a stage as a filter
and a shifter (the shift could be omitted if we do not need to divide the filtered data).

\subsection{Model of a FIR filter}

A cascade of filters is composed of $n$ FIR stages. In stage $i$ ($1 \leq i \leq n$)
the FIR has $C_i$ coefficients and each coefficient is an integer value with $\pi^C_i$
bits while the filtered data are shifted by $\pi^S_i$ bits. We also define $\pi^-_i$ as
the size of input data and $\pi^+_i$ as the size of output data. Figure~\ref{fig:fir_stage}
shows a filtering stage.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=2cm]
\node[draw,minimum size=1.3cm] (FIR) { $C_i, \pi_i^C$ } ;
\node[draw,minimum size=1.3cm] (Shift) [right of=FIR, ] { $\pi_i^S$ } ;
\node (Start) [left of=FIR] { } ;
\node (End) [right of=Shift] { } ;

\node[draw,fit=(FIR) (Shift)] (Filter) { } ;

\draw[->] (Start) edge node [above] { $\pi_i^-$ } (FIR) ;
\draw[->] (FIR) -- (Shift) ;
\draw[->] (Shift) edge node [above] { $\pi_i^+$ } (End) ;
\end{tikzpicture}
\caption{A single filter is composed of a FIR (on the left) and a Shifter (on the right)}
\label{fig:fir_stage}
\end{figure}

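As an illustrative bound on how word sizes grow through one stage (a generic worst-case argument, not necessarily the sizing rule adopted in the rest of this model): multiplying a $\pi_i^-$-bit sample by a $\pi_i^C$-bit coefficient yields at most $\pi_i^-+\pi_i^C$ bits, and accumulating $C_i$ such products adds at most $\lceil\log_2 C_i\rceil$ bits. After dropping the $\pi_i^S$ least significant bits in the shifter, an output word size satisfying the bound below is therefore sufficient to avoid overflow. For example, 14-bit samples, 8-bit coefficients, $C_i=32$ and $\pi_i^S=11$ would give $14+8+5-11=16$~bits:
\begin{align}
\pi_i^+ \leq \pi_i^- + \pi_i^C + \lceil \log_2 C_i \rceil - \pi_i^S .
\end{align}
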
FIR $i$ has been characterized through numerical simulation as able to reject $F(C_i, \pi_i^C)$ dB.
This rejection has been computed using the GNU Octave FIR coefficient design functions
(\texttt{firls} and \texttt{fir1}).
For each configuration $(C_i, \pi_i^C)$, we first create a FIR with floating point coefficients and a given number $C_i$ of coefficients.
Then, the floating point coefficients are discretized into integers. In order to ensure that the coefficients are effectively coded on $\pi_i^C$~bits,
the coefficients are normalized by their absolute maximum before being scaled to integer coefficients.
At least one coefficient is coded on $\pi_i^C$~bits, and in practice only $b_{C_i/2}$ is coded on $\pi_i^C$~bits while the others are coded on much fewer bits.

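The normalization and scaling step described above can be sketched as follows, assuming signed integer coefficients and rounding to the nearest integer; the symbols $\tilde b_k$ and $b^{\max}$ are introduced here for illustration only. Under these assumptions the largest tap maps to $\pm\left(2^{\pi_i^C-1}-1\right)$ and uses the full $\pi_i^C$~bits, while smaller taps need fewer bits:
\begin{align}
\tilde b_k = \mathrm{round}\left(\left(2^{\pi_i^C-1}-1\right)\frac{b_k}{b^{\max}}\right),
\qquad
b^{\max}=\max_{0\leq j< C_i} |b_j| .
\end{align}
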
With these coefficients, the \texttt{freqz} function is used to estimate the magnitude of the filter
transfer function.
Comparing the performance between FIRs however requires defining a unique criterion. As shown in figure~\ref{fig:fir_mag},
the FIR magnitude exhibits two parts: we focus here on the transition width and the rejection rather than on the
passband ripples as emphasized in \cite{lim_1988,lim_1996}. Throughout this demonstration,
we arbitrarily set a passband of 40\% of the Nyquist frequency and a stopband from 60\%
of the Nyquist frequency to the end of the band, as would be typically selected to prevent
aliasing before decimating the dataflow by 2. The method however generalizes to any filter
shape as long as it is defined from the initial modeling steps: Fig. \ref{fig:rejection_pyramid}
as described below is indeed unique for each filter shape.

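For reference, the quantity evaluated by \texttt{freqz} for a FIR with coefficients $b_k$ is the discrete-time frequency response of Eq.~\eqref{eq:fir_equation}. Writing $\nu\in[0,1]$ for the frequency normalized to the Nyquist frequency (a notation introduced here for illustration), the band edges selected above correspond to a passband $0\leq\nu\leq 0.4$ and a stopband $0.6\leq\nu\leq 1$, and the magnitude in dB is $20\log_{10}|H(\nu)|$ with
\begin{align}
H(\nu)=\sum_{k=0}^{N} b_k\, e^{-j\pi\nu k} .
\end{align}
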
\begin{figure}
\begin{center}
\scalebox{0.8}{
\centering
\begin{tikzpicture}[scale=0.3]
\draw[<->] (0,15) -- (0,0) -- (21,0) ;
\draw[thick] (0,12) -- (8,12) -- (20,0) ;

\draw (0,14) node [left] { $P$ } ;
\draw (20,0) node [below] { $f$ } ;

\draw[>=latex,<->] (0,14) -- (8,14) ;
\draw (4,14) node [above] { passband } node [below] { $40\%$ } ;

\draw[>=latex,<->] (8,14) -- (12,14) ;
\draw (10,14) node [above] { transition } node [below] { $20\%$ } ;

\draw[>=latex,<->] (12,14) -- (20,14) ;
\draw (16,14) node [above] { stopband } node [below] { $40\%$ } ;

\draw[>=latex,<->] (16,12) -- (16,8) ;
\draw (16,10) node [right] { rejection } ;

\draw[dashed] (8,-1) -- (8,14) ;
\draw[dashed] (12,-1) -- (12,14) ;

\draw[dashed] (8,12) -- (16,12) ;
\draw[dashed] (12,8) -- (16,8) ;

\end{tikzpicture}
}
\end{center}
\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:
the passband is considered to occupy the initial 40\% of the Nyquist frequency range,
the stopband the last 40\%, allowing 20\% transition width.}
\label{fig:fir_mag}
\end{figure}

In the transition band, the behavior of the filter is left free: we only define the passband and the stopband characteristics.
The criteria initially considered include the mean value of the stopband rejection, which yields unacceptable results since notches
overestimate the rejection capability of the filter.
An intermediate criterion considered the maximal rejection within the stopband, from which the sum of the absolute values
within the passband is subtracted to avoid filters with excessive ripples, normalized to the
bin width to remain consistent with the passband criterion (dBc/Hz units in all cases).
In this case, cascading too many filters with individual excessive ($>$ 1~dB) passband ripples
led to unacceptable ($>$ 10~dB) final ripple levels, especially close to the transition band.
Hence, the final criterion considers the minimal rejection in the stopband, from which
the maximal ripple amplitude in the passband (maximum value minus minimum value) is subtracted, with
a 1~dB threshold on the latter quantity above which the filter is discarded.
With this
criterion, we meet the expected rejection capability of low-pass filters as shown in figure~\ref{fig:custom_criterion}.
The best filter has a correct rejection estimate and the worst filter
is discarded based on the excessive passband ripple criterion.

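To make the final criterion concrete, it can be written for a magnitude response $G(f)$ expressed in dB and assumed to be normalized to 0~dB in the passband. This is only a sketch: the symbols $G$, $\rho$, $B_p$ and $B_s$ (passband and stopband frequency sets) are notation introduced here for illustration, and the numerical values below are hypothetical rather than measured.
\begin{align*}
\rho &= \max_{f \in B_p} G(f) - \min_{f \in B_p} G(f), \\
F &= \min_{f \in B_s} \bigl(-G(f)\bigr) - \rho .
\end{align*}
The filter is kept only if $\rho \leq 1$~dB. For instance, a hypothetical filter whose stopband response never exceeds $-42$~dB and whose passband response stays within $[-0.3, +0.1]$~dB has $\rho = 0.4$~dB and $F = 42 - 0.4 = 41.6$~dB, whereas a filter with the same stopband but a passband ripple of $1.3$~dB is discarded.
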
\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/custom_criterion}
\caption{Selected filter qualification criterion computed as the maximum rejection in the stopband
minus the maximal ripple amplitude in the passband, with a 1~dB threshold above which the filter is discarded:
comparison between monolithic filter (blue, rejected in this case) and cascaded filters (red).}
\label{fig:custom_criterion}
\end{figure}

Thanks to the latter criterion, which will be used in the remainder of this paper, we are able to automatically generate multiple sets of FIR taps
and estimate their rejection. Figure~\ref{fig:rejection_pyramid} exhibits the
rejection as a function of the number of coefficients and the number of bits representing these coefficients.
The pyramid-shaped curve exhibits optimal configuration sets at the vertex where both edges meet.
Indeed, for a given number of coefficients, increasing the number of bits beyond the edge will not improve the rejection.
Conversely, for a given number of bits, increasing the number of coefficients will not improve
the rejection. Hence the best coefficient sets lie on the vertex of the pyramid. Notice that the word length
and number of coefficients do not start at 1: filters with too few coefficients or too small a tap word size are rejected
by the excessive ripple constraint of the criterion. Hence, the size of the pyramid is significantly reduced by discarding
these filters, and so is the solution search space.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/rejection_pyramid}
\caption{Filter rejection as a function of number of coefficients and number of bits:
this lookup table will be used to identify which filter parameters -- number of bits
representing coefficients and number of coefficients -- best match the targeted transfer function. Filters
with fewer than 10~taps or with coefficients coded on fewer than 5~bits are discarded due to excessive
ripples in the passband.}
\label{fig:rejection_pyramid}
\end{figure}

Although we have an efficient criterion to estimate the rejection of one set of coefficients (taps),
a difficulty arises when we cascade filters and estimate the criterion as a sum of two or more individual criteria.
If the FIR filter coefficients are the same between the stages, we have:
$$F_{total} = F_1 + F_2$$
But selecting two different sets of coefficients yields a more complex situation in which
the previous relation is no longer valid, as illustrated in figure~\ref{fig:sum_rejection}. The red and blue curves
are two different filters with maxima and notches not located at the same frequency offsets.
Hence, when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved
with respect to a basic sum of the rejection criteria shown as the dotted yellow line.
Thus, estimating the rejection of filter cascades is more complex than taking the sum of the rejection
criteria of each filter. However, since the sum of the individual filter rejections underestimates the rejection capability of the cascade,
this lower bound is considered as a conservative and acceptable criterion for deciding on the suitability
of the filter cascade to meet design criteria.

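The conservative nature of the summed criterion can be checked with a short derivation, given here as a sketch. It assumes that each criterion is evaluated on the magnitude response in dB, written $G_1(f)$ and $G_2(f)$ (notation introduced for this illustration, with $B_p$ and $B_s$ the passband and stopband frequency sets), so that the cascade response is $G_1 + G_2$, and that the 1~dB ripple threshold is applied to each stage individually:
\begin{align*}
\max_{f \in B_s} \bigl(G_1 + G_2\bigr) &\leq \max_{f \in B_s} G_1 + \max_{f \in B_s} G_2, \\
\max_{f \in B_p} \bigl(G_1 + G_2\bigr) - \min_{f \in B_p} \bigl(G_1 + G_2\bigr) &\leq \sum_{k=1}^{2} \Bigl(\max_{f \in B_p} G_k - \min_{f \in B_p} G_k\Bigr).
\end{align*}
The first line shows that the worst-case stopband rejection of the cascade is at least the sum of the individual worst-case rejections, and the second that the passband ripple of the cascade is at most the sum of the individual ripples. Together they give $F_{total} \geq F_1 + F_2$, with equality when both stages use the same coefficients. As a purely illustrative numerical case, two stages with $F_1 = 30$~dB and $F_2 = 25$~dB reach at least 55~dB once cascaded.
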
\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/cascaded_criterion}
\caption{Transfer function of individual filters and after cascading the two filters,
demonstrating that the selected criterion of maximum rejection in the stopband (horizontal
lines) is met. Notice that the cascaded filter has better rejection than summing the stopband
maximum of each individual filter.
}
\label{fig:sum_rejection}
\end{figure}

Finally, in our case, we consider that the input signal is fully known. The
resolution of the input data stream is fixed and remains the same for all experiments
in this paper.

Based on this analysis, we address the estimate of resource consumption (called
silicon area -- in the case of FPGAs this means processing cells) as a function of
filter characteristics. As a reminder, we do not aim at matching actual hardware
configuration but consider an arbitrary silicon area occupied by each processing function,
and will assess after synthesis the adequacy of this arbitrary unit with actual
hardware resources provided by FPGA manufacturers. The sum of individual processing
unit areas is constrained by a total silicon area representative of FPGA global resources.
Formally, variable $a_i$ is the area taken by filter~$i$
(in arbitrary units). Variable $r_i$ is the rejection of filter~$i$ (in dB).
Constant $\mathcal{A}$ is the total available area. We model our problem as follows:

\begin{align}
\text{Maximize } & \sum_{i=1}^n r_i \notag \\
\sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\
a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\
r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\
\pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\
\pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\
\pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\
\pi_1^- &= \Pi^I \label{eq:init}
\end{align}

Equation~\ref{eq:area} states that the total area taken by the filters must not
exceed the available area. Equation~\ref{eq:areadef} gives the definition of
the area used by a filter, considered as the area of the FIR since the Shifter is
assumed not to require significant resources. We consider that the FIR needs $C_i$ registers of size
$\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the
input data with the coefficients. Equation~\ref{eq:rejectiondef} gives the
definition of the rejection of the filter thanks to the tabulated function~$F$ that we defined
previously. The Shifter does not introduce negative rejection, as we will explain later,
so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the
relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add
$\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes
$\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of
a filter is the same as the input number of bits of the next filter.
Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative
rejection. Indeed, the results of the FIR can be right-shifted without compromising
the quality of the rejection up to a threshold. Each bit of the output data
increases the maximum rejection level by 6~dB. We add one to take the sign bit
into account. If equation~\ref{eq:maxshift} were not present, the Shifter could
shift too much and introduce some noise in the output data. Each supplementary
shift bit would cause an additional 6~dB rejection rise. An equivalent formulation is:
$\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right)$.
Finally, equation~\ref{eq:init} gives the number of bits of the global input.

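As a worked example of equations~\ref{eq:areadef}, \ref{eq:bits} and \ref{eq:maxshift}, consider a single stage with purely illustrative values that are not taken from the experiments: $\pi_1^- = \Pi^I = 16$~bits, $C_1 = 20$ coefficients coded on $\pi_1^C = 8$~bits, and a tabulated rejection $r_1 = 30$~dB:
\begin{align*}
a_1 &= C_1 \times (\pi_1^C + \pi_1^-) = 20 \times (8 + 16) = 480, \\
\pi_1^+ &\geq 1 + \left(1 + \frac{30}{6}\right) = 7, \\
\pi_1^S &= \pi_1^- + \pi_1^C - \pi_1^+ \leq 16 + 8 - 7 = 17.
\end{align*}
Under these assumptions the stage occupies 480 arbitrary area units, and the Shifter may drop up to 17 of the 24~bits produced by the FIR while keeping at least 7 output bits.
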
This model is non-linear since we multiply some variables with other variables,
and it is not even quadratic, as the cost function $F$ does not have a known
linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations.
This parameter $p$ is defined by the user, and represents the number of different
sets of coefficients generated (recall that we use the \texttt{firls} and \texttt{fir1}
functions from GNU Octave) based on the targeted filter characteristics and implementation
assumptions (estimated number of bits defining the coefficients). Hence, $C_{ij}$ and
$\pi_{ij}^C$ become constants, with
$1 \leq j \leq p$, so that the function $F$ can be tabulated (look-up table)
for each configuration thanks to the rejection criterion. We also define the binary
variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
and 0 otherwise. The new equations are as follows:

\begin{align}
a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
\pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
\sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
\end{align}

Equations~\ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace
equations~\ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}, respectively.
Equation~\ref{eq:config} states that for each stage, at most one configuration is chosen.

The problem remains quadratic at this stage since in constraint~\ref{eq:areadef2}
we multiply
$\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable, we can
linearize this multiplication. The following formula shows how to linearize
this situation in the general case, with $y$ a binary variable and $x$ a real variable ($0 \leq x \leq X^{max}$):
\begin{equation*}
m = x \times y \implies
\left \{
\begin{split}
m & \geq 0 \\
m & \leq y \times X^{max} \\
m & \leq x \\
m & \geq x - (1 - y) \times X^{max} \\
\end{split}
\right .
\end{equation*}
So if we bound $\pi_i^-$ from above by 128~bits, which is the maximum data size estimated
from hardware characteristics,
the Gurobi (\url{www.gurobi.com}) optimization software is able to linearize
the quadratic problem for us, so the model is left as is. This model
has $O(np)$ variables and $O(n)$ constraints.

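The linearization above can be verified by examining both values of the binary variable. The check below is a sketch using the same symbols $m$, $x$, $y$ and $X^{max}$:
\begin{align*}
y = 0: &\quad 0 \leq m \leq 0 \implies m = 0 = x \times y, \\
y = 1: &\quad x \leq m \leq x \implies m = x = x \times y.
\end{align*}
In the first case the two remaining constraints, $m \leq x$ and $m \geq x - X^{max}$, are satisfied by $m = 0$ since $0 \leq x \leq X^{max}$. In the second case $m \geq 0$ and $m \leq X^{max}$ hold for the same reason. Applied by hand to constraint~\ref{eq:areadef2}, this would amount to introducing one auxiliary variable $m_{ij} = \delta_{ij} \times \pi_i^-$ per stage and configuration (a name chosen here for illustration only) with $X^{max} = 128$.
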
Two problems will be addressed using the workflow described in the next section: on the one
hand, maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary
silicon area (section~\ref{sec:fixed_area}) and, on the other hand, the dual problem of minimizing the silicon area
for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the
objective function is replaced with:
\begin{align}
\text{Minimize } & \sum_{i=1}^n a_i \notag
\end{align}
We adapt the constraints of the quadratic program by replacing equation~\ref{eq:area}
with equation~\ref{eq:rejection_min}, where $\mathcal{R}$ is the minimal
rejection required.

\begin{align}
\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}
\end{align}

\section{Design workflow}
\label{sec:workflow}

In this section, we describe the workflow to compute all the results presented in sections~\ref{sec:fixed_area}
and \ref{sec:fixed_rej}. Figure~\ref{fig:workflow} shows the global workflow and the different steps involved
in the computation of the results.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=0.75cm and 2cm]
\node[draw,minimum size=1cm] (Solver) { Filter Solver } ;
\node (Start) [left= 3cm of Solver] { } ;
\node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;
\node (Input) [above= of TCL] { } ;
\node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;
\node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;
\node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;
\node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;
\node (Results) [left= of Postproc] { } ;

\draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;
\draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;
\draw[->] (Solver) edge node [below] { (1a) } (TCL) ;
\draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;
\draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;
\draw[->,dashed] (Bitstream) -- (Deploy) ;
\draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;
\draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;
\draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;
\draw[->] (Postproc) -- (Results) ;
\end{tikzpicture}
\caption{Design workflow from the input parameters to the results allowing for
a fully automated optimal solution search.}
\label{fig:workflow}
\end{figure}

The filter solver is a C++ program that takes as input the maximum area
$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,
the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates
the quadratic programs and uses the Gurobi solver to estimate the optimal results.
Then it produces two scripts: a TCL script ((1a) in figure~\ref{fig:workflow})
and a deploy script ((1b) in figure~\ref{fig:workflow}).

The TCL script describes the whole digital processing chain from the beginning 506 506 The TCL script describes the whole digital processing chain from the beginning
(the raw signal data) to the end (the filtered data) in a language compatible 507 507 (the raw signal data) to the end (the filtered data) in a language compatible
with proprietary synthesis software, namely Vivado for Xilinx and Quartus for 508 508 with proprietary synthesis software, namely Vivado for Xilinx and Quartus for
Intel/Altera. The raw input data generated from a 20-bit Pseudo Random Number (PRN) 509 509 Intel/Altera. The raw input data generated from a 20-bit Pseudo Random Number (PRN)
generator inside the FPGA and $\Pi^I$ is fixed at 16~bits. 510 510 generator inside the FPGA and $\Pi^I$ is fixed at 16~bits.
Then the script builds each stage of the chain with a generic FIR task that 511 511 Then the script builds each stage of the chain with a generic FIR task that
comes from a skeleton library. The generic FIR is highly configurable 512 512 comes from a skeleton library. The generic FIR is highly configurable
with the number of coefficients and the size of the coefficients. The coefficients 513 513 with the number of coefficients and the size of the coefficients. The coefficients
themselves are not stored in the script. 514 514 themselves are not stored in the script.
As the signal is processed in real-time, the output signal is stored as 515 515 As the signal is processed in real-time, the output signal is stored as
consecutive bursts of data for post-processing, mainly assessing the consistency of the 516 516 consecutive bursts of data for post-processing, mainly assessing the consistency of the
implemented FIR cascade transfer function with the design criteria and the expected 517 517 implemented FIR cascade transfer function with the design criteria and the expected
transfer function. 518 518 transfer function.

The TCL script is used by Vivado to produce the FPGA bitstream ((2) in figure~\ref{fig:workflow}).
We use the 2018.2 version of Xilinx Vivado and we execute the synthesized
bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series
FPGA (xc7z010clg400-1) and two LTC2145 14-bit 125~MS/s ADCs, loaded with 50~$\Omega$ resistors to
provide a broadband noise source.
The board runs the Linux kernel and surrounding environment produced from the
Buildroot framework available at \url{https://github.com/trabucayre/redpitaya/}. Configuring
the Zynq FPGA, feeding the FIR with the set of coefficients, executing the simulation and
fetching the results are automated.

The deploy script uploads the bitstream to the board ((3) in
figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers and
configures the coefficients of the FIR filters. It then waits for the results
and retrieves the data to the main computer ((4) in figure~\ref{fig:workflow}).

Finally, an Octave post-processing script computes the final results from
the output data ((5) in figure~\ref{fig:workflow}).
The results are normalized so that the Power Spectral Density (PSD) starts at zero
and the different configurations can be compared.

\section{Maximizing the rejection at fixed silicon area}
\label{sec:fixed_area}
This section presents the output of the filter solver, {\em i.e.} the computed
configurations for each stage, the computed rejection and the computed silicon area.
These results help to understand the choices made by the solver to compute its solutions.

The experimental setup is composed of three cases. The raw input is generated
by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.
Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500
arbitrary units. Hence, the three cases have been named MAX/500, MAX/1000 and MAX/1500.
The number of configurations $p$ is 1133, with $C_i$ ranging from 3 to 60 and $\pi^C$
ranging from 2 to 22. In each case, the quadratic program has been able to give a
result for up to five stages ($n = 5$) in the cascaded filter.

Tables~\ref{tbl:gurobi_max_500}, \ref{tbl:gurobi_max_1000} and \ref{tbl:gurobi_max_1500}
show the results obtained by the filter solver for MAX/500, MAX/1000 and MAX/1500, respectively.

\renewcommand{\arraystretch}{1.4}

\begin{table}
  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}
  \label{tbl:gurobi_max_500}
  \centering
  {\scalefont{0.77}
  \begin{tabular}{|c|ccccc|c|c|}
    \hline
    $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
    \hline
    1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\
    2 & (3, 5, 18) & (33, 10, 0) & - & - & - & 48~dB & 492 \\
    3 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\
    4 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\
    5 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\
    \hline
  \end{tabular}
  }
\end{table}

\begin{table}
  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}
  \label{tbl:gurobi_max_1000}
  \centering
  {\scalefont{0.77}
  \begin{tabular}{|c|ccccc|c|c|}
    \hline
    $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
    \hline
    1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\
    2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\
    3 & (3, 13, 26) & (31, 9, 1) & (27, 9, 0) & - & - & 92~dB & 999 \\
    4 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\
    5 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\
    \hline
  \end{tabular}
  }
\end{table}

\begin{table}
  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}
  \label{tbl:gurobi_max_1500}
  \centering
  {\scalefont{0.77}
  \begin{tabular}{|c|ccccc|c|c|}
    \hline
    $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
    \hline
    1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\
    2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 102~dB & 1489 \\
    3 & (15, 9, 18) & (31, 8, 0) & (27, 9, 0) & - & - & 116~dB & 1488 \\
    4 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\
    5 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\
    \hline
  \end{tabular}
  }
\end{table}

\renewcommand{\arraystretch}{1}

By analyzing these tables, we can first state that we reach an optimal solution
for each case: $n = 3$ for MAX/500, and $n = 4$ for MAX/1000 and MAX/1500. Moreover,
the cascaded filters always exhibit better performance than the monolithic solution.
This was an expected result, as it has
been previously observed that many small filters are better than
a single large filter \cite{lim_1988, lim_1996, young_1992}, although such conclusions
are rarely used in practice due to the lack of tools for identifying individual filter
coefficients in the cascaded approach.
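
As a worked reading of Tables~\ref{tbl:gurobi_max_500}, \ref{tbl:gurobi_max_1000}
and \ref{tbl:gurobi_max_1500}, the gain of the best cascade over the monolithic
filter ($n = 1$) can be obtained by subtracting the tabulated rejections
(a simple illustration based on the solver outputs, not an additional measurement):
\begin{align*}
\text{MAX/500:}  \quad & 56 - 32 = 24~\text{dB} \quad (n = 3),\\
\text{MAX/1000:} \quad & 98 - 56 = 42~\text{dB} \quad (n = 4),\\
\text{MAX/1500:} \quad & 125 - 71 = 54~\text{dB} \quad (n = 4).
\end{align*}
The benefit of cascading hence appears to grow with the available area.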

Second, the larger the silicon area, the better the rejection. This was also an
expected result as more area means a filter of better quality with more coefficients
or more bits per coefficient.

Third, we observe that the first stage can have a larger shift than the other
stages. This is explained by the fact that the solver tries to use just enough
bits for the computed rejection after each stage. In the first stage, a
balance between a strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}
gives the relation between both values.

Finally, we note that the solver consumes nearly all the given silicon area.
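
The area column can be cross-checked by hand. The sketch below assumes, as
inferred here from the tabulated values rather than restated from the definition
of $F$, that stage $i$ with input size $\pi_i^-$ occupies
$a_i = C_i \times (\pi_i^C + \pi_i^-)$ arbitrary units and outputs
$\pi_i^+ = \pi_i^- + \pi_i^C - \pi_i^S$ bits, with $\pi_1^- = \Pi^I = 16$ and
$\pi_{i+1}^- = \pi_i^+$; the symbols $a_i$, $\pi_i^-$ and $\pi_i^+$ are notation
introduced for this example. For MAX/500 with $n = 3$
(Table~\ref{tbl:gurobi_max_500}):
\begin{align*}
a_1 &= 3 \times (5 + 16) = 63,  & \pi_1^+ &= 16 + 5 - 18 = 3,\\
a_2 &= 19 \times (7 + 3) = 190, & \pi_2^+ &= 3 + 7 - 1 = 9,\\
a_3 &= 15 \times (7 + 9) = 240, & \pi_3^+ &= 9 + 7 - 0 = 16,
\end{align*}
so that $a_1 + a_2 + a_3 = 63 + 190 + 240 = 493 \leq 500$, which matches the
tabulated area. Under these assumptions, the large shift of the first stage
reduces the data to 3~bits, which keeps the following, longer stages cheap.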

The following graphs present the rejection for real data on the FPGA. In all the following
figures, the solid lines represent the actual rejection of the filtered
data on the FPGA as measured experimentally and the dashed lines are the noise levels
given by the quadratic solver. The configurations are those computed in the previous section.

Figures~\ref{fig:max_500_result}, \ref{fig:max_1000_result} and \ref{fig:max_1500_result}
show the rejection of the different configurations for MAX/500, MAX/1000 and MAX/1500, respectively.

\begin{figure}
  \centering
  \begin{subfigure}{\linewidth}
    \includegraphics[width=\linewidth]{images/max_500}
    \caption{Filter transfer functions for varying number of cascaded filters solving
    the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).}
    \label{fig:max_500_result}
  \end{subfigure}

  \begin{subfigure}{\linewidth}
    \includegraphics[width=\linewidth]{images/max_1000}
    \caption{Filter transfer functions for varying number of cascaded filters solving
    the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).}
    \label{fig:max_1000_result}
  \end{subfigure}

  \begin{subfigure}{\linewidth}
    \includegraphics[width=\linewidth]{images/max_1500}
    \caption{Filter transfer functions for varying number of cascaded filters solving
    the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).}
    \label{fig:max_1500_result}
  \end{subfigure}
  \caption{Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing
  rejection for a given resource allocation.
  The filter shape constraint (bandpass and bandstop) is shown as thick
  horizontal lines on each chart.}
\end{figure}

In all cases, we observe that the actual rejection is close to the rejection computed by the solver.

We compare the actual silicon resources given by Vivado to the
resources in arbitrary units.
The goal is to check that our arbitrary units of silicon area model
the real resources on the FPGA well enough. In particular, we want to verify that, for a given
number of arbitrary units, the actual silicon resources do not depend on the
number of stages $n$. Most significantly, our approach aims
at remaining far enough from the practical logic gate implementation used by
various vendors to remain platform independent and be portable from one
architecture to another.

Table~\ref{tbl:resources_usage} shows the resource usage in the case of MAX/500, MAX/1000 and
MAX/1500, \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000
and 1500 arbitrary units. We have taken care to extract solely the resources used by
the FIR filters and remove additional processing blocks including FIFO and Programmable
Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication.

\begin{table}[h!tb]
  \caption{Resource occupation following synthesis of the solutions found for
  the problem of maximizing rejection for a given resource allocation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
  \label{tbl:resources_usage}
  \centering
  \begin{tabular}{|c|c|ccc|c|}
    \hline
    $n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline
      & LUT  & 249  & 453  & 627  & \emph{17600} \\
    1 & BRAM & 1    & 1    & 1    & \emph{120} \\
      & DSP  & 21   & 37   & 47   & \emph{80} \\ \hline
      & LUT  & 2253 & 474  & 691  & \emph{17600} \\
    2 & BRAM & 2    & 2    & 2    & \emph{120} \\
      & DSP  & 0    & 50   & 70   & \emph{80} \\ \hline
      & LUT  & 1329 & 2006 & 3158 & \emph{17600} \\
    3 & BRAM & 3    & 3    & 3    & \emph{120} \\
      & DSP  & 15   & 30   & 42   & \emph{80} \\ \hline
      & LUT  & 1329 & 1600 & 2260 & \emph{17600} \\
    4 & BRAM & 3    & 4    & 4    & \emph{120} \\
      & DSP  & 15   & 38   & 49   & \emph{80} \\ \hline
      & LUT  & 1329 & 1600 & 2260 & \emph{17600} \\
    5 & BRAM & 3    & 4    & 4    & \emph{120} \\
      & DSP  & 15   & 38   & 49   & \emph{80} \\ \hline
  \end{tabular}
\end{table}

In the case $n = 2$ for MAX/500, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that,
when the filter coefficients are small enough, or when the input size is small
enough, Vivado optimizes resource consumption by selecting multiplexers to
implement the multiplications instead of a DSP. In this case, it is quite difficult
to compare the whole silicon budget.

However, a rough estimation can be made with a simple equivalence: looking at
the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,
we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon
area use. With this equivalence, our 500 arbitrary units correspond to about 2500 LUTs,
1000 arbitrary units correspond to about 5000 LUTs and 1500 arbitrary units correspond
to about 7300 LUTs. The conclusion is that the order of magnitude of our arbitrary
unit maps well to actual hardware resources. The relatively small differences can probably be explained
by the optimizations done by Vivado based on the detailed map of available processing resources.
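
The equivalence can be illustrated on the $n = 3$ row of
Table~\ref{tbl:resources_usage}. Writing $N_{\mathrm{LUT}}$ and $N_{\mathrm{DSP}}$
for the tabulated counts (notation introduced for this example), assuming
100~LUTs per DSP as above and ignoring the BRAMs, the LUT-equivalent footprint
$N_{\mathrm{LUT}} + 100\,N_{\mathrm{DSP}}$ reads:
\begin{align*}
\text{MAX/500:}  \quad & 1329 + 100 \times 15 = 2829,\\
\text{MAX/1000:} \quad & 2006 + 100 \times 30 = 5006,\\
\text{MAX/1500:} \quad & 3158 + 100 \times 42 = 7358.
\end{align*}
These values are of the same order as the figures quoted above, \emph{i.e.}
about 5~LUTs per arbitrary unit. This remains a rough estimate, since the other
rows of the table give somewhat different totals.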

We now present the computation time needed to solve the quadratic problem.
For each case, the filter solver software is executed on an Intel(R) Xeon(R) CPU E5606
clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve
the quadratic problem. Table~\ref{tbl:area_time} shows the time needed to solve the quadratic
problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units.

\begin{table}[h!tb]
  \caption{Time needed to solve the quadratic program with Gurobi}
  \label{tbl:area_time}
  \centering
  \begin{tabular}{|c|c|c|c|}\hline
    $n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline
    1 & 0.01~s & 0.02~s & 0.03~s \\
    2 & 0.1~s & 1~s & 2~s \\
    3 & 5~s & 27~s & 351~s ($\approx$ 6~min) \\
    4 & 4~s & 141~s ($\approx$ 2~min) & 1134~s ($\approx$ 19~min) \\
    5 & 6~s & 630~s ($\approx$ 10~min) & 49400~s ($\approx$ 14~h) \\\hline
  \end{tabular}
\end{table}

As expected, the computation time seems to rise exponentially with the number of stages.
When the area is smaller, the design exploration space is more limited and the solver is able to
find an optimal solution faster.
We also notice that a solution with $n$ greater than the optimal value
generally takes more time to be found than the optimal one. This can be explained since the search space is
larger and we need more time to ensure that the previous solution (from the
smaller value of $n$) still remains the optimal solution.
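
As a rough indication of this growth, the average factor per additional stage
can be estimated from Table~\ref{tbl:area_time} as $g = (t_5 / t_1)^{1/4}$,
where $t_n$ denotes the solving time for $n$ stages (both $g$ and $t_n$ are
notation introduced for this estimate):
\begin{align*}
\text{MAX/500:}  \quad & g = (6 / 0.01)^{1/4} \approx 5,\\
\text{MAX/1000:} \quad & g = (630 / 0.02)^{1/4} \approx 13,\\
\text{MAX/1500:} \quad & g = (49400 / 0.03)^{1/4} \approx 36.
\end{align*}
Given the few significant digits of the shortest timings, these factors are only
indicative, but they suggest that the growth rate itself increases with the
allowed area.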

\subsection{Minimizing resource occupation at fixed rejection}
\label{sec:fixed_rej}

This section presents the results of the complementary quadratic program aimed at
minimizing the area occupation for a targeted rejection level.

The experimental setup is composed of four cases. The raw input is the same
as in the previous section, from a PRN generator, which fixes the input data size $\Pi^I$.
Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60, 80 or 100~dB.
Hence, the four cases have been named MIN/40, MIN/60, MIN/80 and MIN/100.
The number of configurations $p$ is the same as in the previous section.

Tables~\ref{tbl:gurobi_min_40}, \ref{tbl:gurobi_min_60}, \ref{tbl:gurobi_min_80} and
\ref{tbl:gurobi_min_100} show the results obtained by the filter solver for MIN/40, MIN/60,
MIN/80 and MIN/100, respectively.

\renewcommand{\arraystretch}{1.4}

\begin{table}[h!tb]
  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40}
  \label{tbl:gurobi_min_40}
  \centering
  {\scalefont{0.77}
  \begin{tabular}{|c|ccccc|c|c|}
    \hline
    $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
    \hline
    1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\
    2 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
    3 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
    4 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
    5 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
    \hline
  \end{tabular}
  }
\end{table}

\begin{table}[h!tb]
  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60}
  \label{tbl:gurobi_min_60}
  \centering
  {\scalefont{0.77}
  \begin{tabular}{|c|ccccc|c|c|}
    \hline
    $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
    \hline
    1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\
    2 & (15, 6, 16) & (23, 9, 0) & - & - & - & 60~dB & 675 \\
    3 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\
    4 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\
    5 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\
    \hline
  \end{tabular}
  }
\end{table}

\begin{table}[h!tb]
  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80}
  \label{tbl:gurobi_min_80}
  \centering
  {\scalefont{0.77}
  \begin{tabular}{|c|ccccc|c|c|}
    \hline
    $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
    \hline
    1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\
    2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\
    3 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\
    4 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\
    5 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\
    \hline
  \end{tabular}
  }
\end{table}
839 839
\begin{table}[h!tb] 840 840 \begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/100} 841 841 \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/100}
\label{tbl:gurobi_min_100} 842 842 \label{tbl:gurobi_min_100}
\centering 843 843 \centering
{\scalefont{0.77} 844 844 {\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|} 845 845 \begin{tabular}{|c|ccccc|c|c|}
\hline 846 846 \hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\ 847 847 $n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline 848 848 \hline
1 & - & - & - & - & - & - & - \\ 849 849 1 & - & - & - & - & - & - & - \\
2 & (27, 9, 15) & (35, 11, 0) & - & - & - & 100~dB & 1410 \\ 850 850 2 & (27, 9, 15) & (35, 11, 0) & - & - & - & 100~dB & 1410 \\
3 & (3, 5, 18) & (35, 11, 1) & (27, 9, 0) & - & - & 100~dB & 1147 \\ 851 851 3 & (3, 5, 18) & (35, 11, 1) & (27, 9, 0) & - & - & 100~dB & 1147 \\
4 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\ 852 852 4 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\
5 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\ 853 853 5 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\
\hline 854 854 \hline
\end{tabular} 855 855 \end{tabular}
} 856 856 }
\end{table} 857 857 \end{table}
\renewcommand{\arraystretch}{1} 858 858 \renewcommand{\arraystretch}{1}
859 859
From these tables, we can first state that almost all configurations reach the targeted rejection 860 860 From these tables, we can first state that almost all configurations reach the targeted rejection
level or even exceed it, thanks to our underestimate of the cascade rejection as the sum of the 861 861 level or even exceed it, thanks to our underestimate of the cascade rejection as the sum of the
individual filter rejections. The only exception is the monolithic case ($n = 1$) in 862 862 individual filter rejections. The only exception is the monolithic case ($n = 1$) in
MIN/100: no solution is found for a single monolithic filter to reach a 100~dB rejection. 863 863 MIN/100: no solution is found for a single monolithic filter to reach a 100~dB rejection.
Furthermore, the area of the monolithic filter is nearly twice that of the two cascaded filters 864 864 Furthermore, the area of the monolithic filter is nearly twice that of the two cascaded filters
(1131 vs.\ 675 arbitrary units for 60~dB and 1760 vs.\ 990 arbitrary units for 80~dB 865 865 (1131 vs.\ 675 arbitrary units for 60~dB and 1760 vs.\ 990 arbitrary units for 80~dB
rejection). More generally, the more filters are cascaded, the lower the occupied area. 866 866 rejection). More generally, the more filters are cascaded, the lower the occupied area.
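
The areas listed in the tables can be cross-checked with a short script. The sketch below assumes a 16-bit input, a per-stage area of $C_i \times (\pi_i^C + \pi_i^-)$ and an output width $\pi_i^+ = \pi_i^- + \pi_i^C - \pi_i^S$; these assumptions reproduce the tabulated values, but the function and variable names are illustrative only and not taken from the authors' code.

```python
# Illustrative sketch (assumptions: 16-bit input data, stage area = C * (pi_C + pi_in),
# stage output width = pi_in + pi_C - pi_S). Names are hypothetical.
INPUT_BITS = 16


def cascade_area(stages, input_bits=INPUT_BITS):
    """Return (total area, output bit width) for a list of (C, pi_C, pi_S) stages."""
    bits = input_bits
    area = 0
    for coeffs, coeff_bits, shift in stages:
        area += coeffs * (coeff_bits + bits)  # area of this stage
        bits = bits + coeff_bits - shift      # width fed to the next stage
    return area, bits


# Configurations copied from the MIN/60 table.
min_60 = {
    1: [(39, 13, 0)],
    2: [(15, 6, 16), (23, 9, 0)],
    3: [(3, 5, 18), (15, 6, 2), (23, 8, 0)],
}

areas_60 = {n: cascade_area(stages)[0] for n, stages in min_60.items()}
print(areas_60)
```

Under these assumptions the script yields the same areas as the MIN/60 table, which makes the gain obtained by cascading easy to verify for the other tables as well.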
867 867
As in the previous section, the solver always chooses a small filter as the first 868 868 As in the previous section, the solver always chooses a small filter as the first
filter stage, and the second one is often the biggest filter. This choice can be explained 869 869 filter stage, and the second one is often the biggest filter. This choice can be explained
as in the previous section: the solver uses just enough bits in the first stage not to degrade the input 870 870 as in the previous section: the solver uses just enough bits in the first stage not to degrade the input
signal, then selects a better filter in the second stage to improve rejection without 871 871 signal, then selects a better filter in the second stage to improve rejection without
having too many bits in the output data. 872 872 having too many bits in the output data.
873 873
For each case, we found an optimal solution with $n < 5$: $n = 2$ for MIN/40, 874 874 For each case, we found an optimal solution with $n < 5$: $n = 2$ for MIN/40,
$n = 3$ for MIN/60 and MIN/80, and $n = 4$ for MIN/100. In all cases, the solutions 875 875 $n = 3$ for MIN/60 and MIN/80, and $n = 4$ for MIN/100. In all cases, the solutions
found for larger values of $n$ remain identical to the optimal one. 876 876 found for larger values of $n$ remain identical to the optimal one.
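
The optimal number of stages can be read directly from the area columns. As a minimal sketch, the following script takes the areas of the MIN/60, MIN/80 and MIN/100 tables and returns the smallest $n$ that reaches the minimum area; the helper name is hypothetical and \texttt{None} marks the case without a solution.

```python
# Areas (arbitrary units) copied from the MIN/60, MIN/80 and MIN/100 tables.
# None marks a case for which the solver found no solution.
areas = {
    "MIN/60": {1: 1131, 2: 675, 3: 543, 4: 543, 5: 543},
    "MIN/80": {1: 1760, 2: 990, 3: 783, 4: 783, 5: 783},
    "MIN/100": {1: None, 2: 1410, 3: 1147, 4: 1067, 5: 1067},
}


def optimal_stage_count(area_by_n):
    """Smallest number of stages that reaches the minimum area."""
    feasible = {n: a for n, a in area_by_n.items() if a is not None}
    best = min(feasible.values())
    return min(n for n, a in feasible.items() if a == best)


optimal = {case: optimal_stage_count(a) for case, a in areas.items()}
print(optimal)
```

Allowing more stages than this value does not change the area, consistent with the identical rows of the tables.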
877 877
The following graphs present the rejection for real data on the FPGA. In all the following 878 878 The following graphs present the rejection for real data on the FPGA. In all the following
figures, the solid line represents the actual rejection of the filtered 879 879 figures, the solid line represents the actual rejection of the filtered
data on the FPGA as measured experimentally and the dashed line is the noise level 880 880 data on the FPGA as measured experimentally and the dashed line is the noise level
given by the quadratic solver. 881 881 given by the quadratic solver.
882 882
Figure~\ref{fig:min_40} shows the rejection of the different configurations in the case of MIN/40. 883 883 Figure~\ref{fig:min_40} shows the rejection of the different configurations in the case of MIN/40.
Figure~\ref{fig:min_60} shows the rejection of the different configurations in the case of MIN/60. 884 884 Figure~\ref{fig:min_60} shows the rejection of the different configurations in the case of MIN/60.
Figure~\ref{fig:min_80} shows the rejection of the different configurations in the case of MIN/80. 885 885 Figure~\ref{fig:min_80} shows the rejection of the different configurations in the case of MIN/80.
Figure~\ref{fig:min_100} shows the rejection of the different configurations in the case of MIN/100. 886 886 Figure~\ref{fig:min_100} shows the rejection of the different configurations in the case of MIN/100.
887 887
\begin{figure} 888 888 \begin{figure}
\centering 889 889 \centering
\begin{subfigure}{\linewidth} 890 890 \begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_40} 891 891 \includegraphics[width=.91\linewidth]{images/min_40}
\caption{Filter transfer functions for varying number of cascaded filters solving 892 892 \caption{Filter transfer functions for varying number of cascaded filters solving
the MIN/40 problem of minimizing resource allocation for reaching a 40~dB rejection.} 893 893 the MIN/40 problem of minimizing resource allocation for reaching a 40~dB rejection.}
\label{fig:min_40} 894 894 \label{fig:min_40}
\end{subfigure} 895 895 \end{subfigure}
896 896
\begin{subfigure}{\linewidth} 897 897 \begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_60} 898 898 \includegraphics[width=.91\linewidth]{images/min_60}
\caption{Filter transfer functions for varying number of cascaded filters solving 899 899 \caption{Filter transfer functions for varying number of cascaded filters solving
the MIN/60 problem of minimizing resource allocation for reaching a 60~dB rejection.} 900 900 the MIN/60 problem of minimizing resource allocation for reaching a 60~dB rejection.}
\label{fig:min_60} 901 901 \label{fig:min_60}
\end{subfigure} 902 902 \end{subfigure}
903 903
\begin{subfigure}{\linewidth} 904 904 \begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_80} 905 905 \includegraphics[width=.91\linewidth]{images/min_80}
\caption{Filter transfer functions for varying number of cascaded filters solving 906 906 \caption{Filter transfer functions for varying number of cascaded filters solving
the MIN/80 problem of minimizing resource allocation for reaching an 80~dB rejection.} 907 907 the MIN/80 problem of minimizing resource allocation for reaching an 80~dB rejection.}
\label{fig:min_80} 908 908 \label{fig:min_80}
\end{subfigure} 909 909 \end{subfigure}
910 910
\begin{subfigure}{\linewidth} 911 911 \begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_100} 912 912 \includegraphics[width=.91\linewidth]{images/min_100}
\caption{Filter transfer functions for varying number of cascaded filters solving 913 913 \caption{Filter transfer functions for varying number of cascaded filters solving
the MIN/100 problem of minimizing resource allocation for reaching a 100~dB rejection.} 914 914 the MIN/100 problem of minimizing resource allocation for reaching a 100~dB rejection.}
\label{fig:min_100} 915 915 \label{fig:min_100}
\end{subfigure} 916 916 \end{subfigure}
\caption{Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a 917 917 \caption{Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a
given rejection while minimizing resource allocation. The filter shape constraint (bandpass and 918 918 given rejection while minimizing resource allocation. The filter shape constraint (bandpass and
bandstop) is shown as thick 919 919 bandstop) is shown as thick
horizontal lines on each chart.} 920 920 horizontal lines on each chart.}
\end{figure} 921 921 \end{figure}
922 922
We observe that all rejections given by the quadratic solver are close to the experimentally 923 923 We observe that all rejections given by the quadratic solver are close to the experimentally
measured rejection. All curves confirm that the target rejection constraint is 924 924 measured rejection. All curves confirm that the target rejection constraint is
met by both the monolithic (except in MIN/100, which has no monolithic solution) and the cascaded filters. 925 925 met by both the monolithic (except in MIN/100, which has no monolithic solution) and the cascaded filters.
926 926
Table~\ref{tbl:resources_usage_comp} shows the resource usage in the case of MIN/40, MIN/60, 927 927 Table~\ref{tbl:resources_usage_comp} shows the resource usage in the case of MIN/40, MIN/60,
MIN/80 and MIN/100, \emph{i.e.} when the target rejection is fixed to 40, 60, 80 and 100~dB. We 928 928 MIN/80 and MIN/100, \emph{i.e.} when the target rejection is fixed to 40, 60, 80 and 100~dB. We
have taken care to extract solely the resources used by 929 929 have taken care to extract solely the resources used by
the FIR filters and remove additional processing blocks including FIFO and PL to 930 930 the FIR filters and remove additional processing blocks including FIFO and PL to
PS communication. 931 931 PS communication.
932 932
\renewcommand{\arraystretch}{1.2} 933 933 \renewcommand{\arraystretch}{1.2}
\begin{table} 934 934 \begin{table}
\caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.} 935 935 \caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
\label{tbl:resources_usage_comp} 936 936 \label{tbl:resources_usage_comp}
\centering 937 937 \centering
{\scalefont{0.90} 938 938 {\scalefont{0.90}
\begin{tabular}{|c|c|cccc|c|} 939 939 \begin{tabular}{|c|c|cccc|c|}
\hline 940 940 \hline
$n$ & & MIN/40 & MIN/60 & MIN/80 & MIN/100 & \emph{Zynq 7010} \\ \hline\hline 941 941 $n$ & & MIN/40 & MIN/60 & MIN/80 & MIN/100 & \emph{Zynq 7010} \\ \hline\hline
& LUT & 343 & 334 & 772 & - & \emph{17600} \\ 942 942 & LUT & 343 & 334 & 772 & - & \emph{17600} \\
1 & BRAM & 1 & 1 & 1 & - & \emph{120} \\ 943 943 1 & BRAM & 1 & 1 & 1 & - & \emph{120} \\
& DSP & 27 & 39 & 55 & - & \emph{80} \\ \hline 944 944 & DSP & 27 & 39 & 55 & - & \emph{80} \\ \hline
& LUT & 1664 & 2329 & 474 & 620 & \emph{17600} \\ 945 945 & LUT & 1664 & 2329 & 474 & 620 & \emph{17600} \\
2 & BRAM & 2 & 2 & 2 & 2 & \emph{120} \\ 946 946 2 & BRAM & 2 & 2 & 2 & 2 & \emph{120} \\
& DSP & 0 & 15 & 50 & 62 & \emph{80} \\ \hline 947 947 & DSP & 0 & 15 & 50 & 62 & \emph{80} \\ \hline
& LUT & 1664 & 3114 & 1884 & 2873 & \emph{17600} \\ 948 948 & LUT & 1664 & 3114 & 1884 & 2873 & \emph{17600} \\
3 & BRAM & 2 & 3 & 3 & 3 & \emph{120} \\ 949 949 3 & BRAM & 2 & 3 & 3 & 3 & \emph{120} \\
& DSP & 0 & 0 & 22 & 27 & \emph{80} \\ \hline 950 950 & DSP & 0 & 0 & 22 & 27 & \emph{80} \\ \hline
& LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\ 951 951 & LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\
4 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\ 952 952 4 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\
& DSP & 0 & 15 & 19 & 19 & \emph{80} \\ \hline 953 953 & DSP & 0 & 15 & 19 & 19 & \emph{80} \\ \hline
& LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\ 954 954 & LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\
5 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\ 955 955 5 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\
& DSP & 0 & 0 & 19 & 19 & \emph{80} \\ \hline 956 956 & DSP & 0 & 0 & 19 & 19 & \emph{80} \\ \hline
\end{tabular} 957 957 \end{tabular}
} 958 958 }
\end{table} 959 959 \end{table}
\renewcommand{\arraystretch}{1} 960 960 \renewcommand{\arraystretch}{1}
961 961
If we keep the previous estimate of the cost of one DSP in terms of LUTs (1 DSP $\approx$ 100 LUT), 962 962 If we keep the previous estimate of the cost of one DSP in terms of LUTs (1 DSP $\approx$ 100 LUT),
the real resource consumption decreases as a function of the number of stages in the cascaded 963 963 the real resource consumption decreases as a function of the number of stages in the cascaded
filter, in agreement 964 964 filter, in agreement
with the solution given by the quadratic solver. Indeed, the consumption always decreases, 965 965 with the solution given by the quadratic solver. Indeed, the consumption always decreases,
even if the difference between the monolithic and the two cascaded 966 966 even if the difference between the monolithic and the two cascaded
filters is less than expected. 967 967 filters is less than expected.
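
The comparison between the monolithic and the two-stage filters can be made explicit with the 1 DSP $\approx$ 100 LUT equivalence. The sketch below uses the LUT and DSP counts of the resource table for $n = 1$ and $n = 2$ only and ignores BRAM; the function name and the data layout are illustrative assumptions.

```python
# LUT-equivalent cost, assuming 1 DSP ~ 100 LUT as stated in the text.
# (LUT, DSP) pairs copied from the resource table for n = 1 and n = 2;
# BRAM usage is ignored in this sketch.
LUT_PER_DSP = 100

resources = {
    "MIN/40": {1: (343, 27), 2: (1664, 0)},
    "MIN/60": {1: (334, 39), 2: (2329, 15)},
    "MIN/80": {1: (772, 55), 2: (474, 50)},
}


def lut_equivalent(lut, dsp, lut_per_dsp=LUT_PER_DSP):
    """Fold DSP blocks into an equivalent number of LUTs."""
    return lut + lut_per_dsp * dsp


equivalent = {
    case: {n: lut_equivalent(*usage) for n, usage in by_n.items()}
    for case, by_n in resources.items()
}

for case, by_n in equivalent.items():
    ratio = by_n[2] / by_n[1]
    print(f"{case}: n=1 -> {by_n[1]}, n=2 -> {by_n[2]}, ratio {ratio:.2f}")
```

For MIN/60 and MIN/80 the measured two-stage cost stays within roughly 10 to 15\% of the monolithic cost, whereas the area model predicts a reduction of about 40\% (675 vs.\ 1131 and 990 vs.\ 1760 arbitrary units).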
968 968
Finally, Table~\ref{tbl:area_time_comp} shows the computation time needed to solve 969 969 Finally, Table~\ref{tbl:area_time_comp} shows the computation time needed to solve
the quadratic program. 970 970 the quadratic program.
971 971
\renewcommand{\arraystretch}{1.2} 972 972 \renewcommand{\arraystretch}{1.2}
\begin{table}[h!tb] 973 973 \begin{table}[h!tb]
\caption{Time to solve the quadratic program with Gurobi} 974 974 \caption{Time to solve the quadratic program with Gurobi}