jfriedt / IFCS2018 article

Compare View

Commits (3)

7c951bd35 Typo + text in black.

Arthur HUGEAT
2019-09-10 10:33:04 +0200
b5ace9bdc Revision 2.

Arthur HUGEAT
2019-09-16 13:21:25 +0200
dda4cf0c1 Letter to the reviewers.

Arthur HUGEAT
2019-09-16 15:21:04 +0200

Diff

Showing 12 changed files

ifcs2018_journal.tex
ifcs2018_journal_reponse2.tex
images/custom_criterion.pdf
images/letter_pondered_criterion.pdf
images/max_1000.pdf
images/max_1500.pdf
images/max_500.pdf
images/min_100.pdf
images/min_40.pdf
images/min_60.pdf
images/min_80.pdf
images/rejection_pyramid.pdf

ifcs2018_journal.tex

% merge "maximize rejection for a given area" vs. "minimize area for a given rejection"
% demonstrate how quantization pushes noise towards high frequencies => 6 dB of
% rejection per bit, and a loss if fewer bits than rejection/6
% develop the linear program including the bit shift
% stress that we were previously synthesizable but not implementable, whereas now we
% implement and demonstrate that it runs
% gwen: why is the FIR now implementable when it was not, even on zedboard -> new FIR?
% Gwen: can we build a real phase noise bench with this FIR, i.e. add ADC, NCO and mixer
% (zedboard or redpit)

% schema label: check that "argue for the FIR cascade" is done

\documentclass[a4paper,journal]{IEEEtran/IEEEtran}
\usepackage{graphicx,color,hyperref}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{algorithm2e}
\usepackage{url,balance}
\usepackage[normalem]{ulem}
\usepackage{tikz}
\usetikzlibrary{positioning,fit}
\usepackage{multirow}
\usepackage{scalefnt}
\usepackage{caption}
\usepackage{subcaption}

% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}
\textheight=26cm
\setlength{\footskip}{30pt}
\pagenumbering{gobble}
\begin{document}
\title{Filter optimization for real time digital processing of radiofrequency signals: application
to oscillator metrology}

\author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2},
G. Goavec-M\'erou\IEEEauthorrefmark{1},
P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}\\
\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }\\
\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\
Email: \{pyb2,jmfriedt\}@femto-st.fr}
}
\maketitle
\thispagestyle{plain}
\pagestyle{plain}
\newtheorem{definition}{Definition}

\begin{abstract}
Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to
radiofrequency signal processing. When SDR is applied to oscillator characterization in the context
of ultrastable clocks, stringent filtering requirements are defined by spurious signal or
noise rejection needs. Since real time radiofrequency processing must be performed in a
Field Programmable Gate Array to meet timing constraints, we investigate optimization strategies
to design filters meeting rejection characteristics while limiting the hardware resources
required and keeping timing constraints within the targeted measurement bandwidths. The
presented technique is applicable to scheduling any sequence of processing blocks characterized
by a throughput, resource occupation and performance tabulated as a function of configuration
characteristics, as is the case for filters with their coefficients and resolution yielding
rejection and number of multipliers.
\end{abstract}

\begin{IEEEkeywords}
Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter
\end{IEEEkeywords}

\section{Digital signal processing of ultrastable clock signals}

Analog oscillator phase noise characterization is classically performed by downconverting
the radiofrequency signal to baseband using a saturated mixer,
followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to carrier. In
a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by
multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}.

\begin{figure}[h!tb]
\begin{center}
\includegraphics[width=.8\linewidth]{images/schema}
\end{center}
\caption{Fully digital oscillator phase noise characterization: the Device Under Test
(DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and
downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals
and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite
Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays
the spectral characteristics of the phase fluctuations.}
\label{schema}
\end{figure}

As with the analog mixer,
the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as
well as the generation of the frequency sum signal in addition to the frequency difference.
These unwanted spectral characteristics must be rejected before decimating the data stream
for the phase noise spectral characterization \cite{andrich2018high}. The filtering characteristics introduced between the
downconverter
and the decimation processing blocks are core properties of an oscillator characterization
system, and must reject out-of-band signals below the targeted phase noise -- typically
below $-170$~dBc/Hz for the ultrastable oscillators we aim at characterizing. The filter blocks will
use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency
datastream: optimizing the performance of the filter while reducing the needed resources is
hence tackled in a systematic approach using optimization techniques. Most significantly, we
tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with
tunable number of coefficients and tunable number of bits representing the coefficients and the
data being processed.

\section{Finite impulse response filter}

We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined
by a set of weights $b_k$ applied to the inputs $x_n$ through a convolution to generate the
outputs $y_n$
\begin{align}
y_n=\sum_{k=0}^N b_k x_{n-k}
\label{eq:fir_equation}
\end{align}

As opposed to an implementation on a general purpose processor in which word size is defined by the
processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since
not only the coefficient values and number of taps must be defined, but also the number of bits
defining the coefficients and the sample size. For this reason, and because we consider pipeline
processing (as opposed to First-In, First-Out (FIFO) memory batch processing) of radiofrequency
signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered:
the problem is instead tackled at the Very-high-speed-integrated-circuit Hardware Description Language
(VHDL) level.
Since latency is not an issue in an open-loop phase noise characterization instrument,
the large
number of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter,
is not considered an issue, as it would be in a closed-loop system.

The coefficients are classically expressed as floating point values. However, this binary
number representation is not efficient for fast arithmetic computation by an FPGA. Instead,
we choose to quantize these floating point values into integer values. This quantization
will result in some precision loss.

\begin{figure}[h!tb]
\includegraphics[width=\linewidth]{images/zero_values}
\caption{Impact of the quantization resolution of the coefficients: the quantization is
set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting
the 30~first and 30~last coefficients out of the initial 128~band-pass
filter coefficients to 0 (red dots).}
\label{float_vs_int}
\end{figure}

The tradeoff between quantization resolution and number of coefficients when considering
integer operations is not trivial. As an illustration of the issue related to the
relation between number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits
a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon
quantization to 6-bit integers, 60 of the 128~coefficients, at the beginning and end of the
tap sequence, become null, making the large number of coefficients irrelevant: processing
resources
are hence saved by shrinking the filter length. This tradeoff, aimed at minimizing resources
to reach a given rejection level, or maximizing out of band rejection for a given computational
resource, will drive the investigation on cascading filters designed with varying tap resolution
and tap length, as will be shown in the next section. Indeed, our development strategy closely
follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards}
in which basic blocks are defined and characterized before being assembled \cite{hide}
in a complete processing chain. In our case, assembling the filter blocks is a simpler block
combination process since we assume a single value to be processed and a single value to be
generated at each clock cycle. The FIR filters do not decimate in the
current implementation: decimation is for now assumed to take place after the FIR cascade.

\section{Methodology description}

Our objective is to develop a new methodology applicable to any Digital Signal Processing (DSP)
chain obtained by assembling basic processing blocks, with hardware and manufacturer independence.
Achieving such a target requires defining an abstract model to represent some basic properties
of DSP blocks such as performance (e.g. rejection or ripples in the passband for filters) and
resource occupation. These abstract properties, not necessarily related to the detailed hardware
implementation of a given platform, will feed a scheduler solver aimed at assembling the optimum
target, whether in terms of maximizing performance for a given arbitrary resource occupation, or
minimizing resource occupation for a given performance. In our approach, the solution of the
solver is then synthesized using the dedicated tool provided by each platform manufacturer
to assess the validity of our abstract resource occupation indicator, and the result of running
the DSP chain on the FPGA allows for assessing the performance of the scheduler. We emphasize
that all solutions found by the solver are synthesized and executed on hardware at the end
of the analysis.

In this demonstration, we focus on only two operations: filtering and shifting the number of
bits needed to represent the data along the processing chain.
We have chosen these basic operations because shifting and filtering have already been studied
in the literature \cite{lim_1996, lim_1988, young_1992, smith_1998}, providing a framework for
assessing our results. Furthermore, filtering is a core step in any radiofrequency frontend
requiring pipelined processing at full bandwidth for the earliest steps, including for
time and frequency transfer or characterization \cite{carolina1,carolina2,rsi}.

Addressing only two operations allows for demonstrating the methodology but should not be
considered as a limitation of the framework, which can be extended to assembling any number
of skeleton blocks as long as performance and resource occupation can be determined.
Hence,
in this paper we will apply our methodology to simple DSP chains: a white noise input signal
is generated using a Pseudo-Random Number (PRN) generator or by a wideband (125~MS/s)
14-bit Analog to Digital Converter (ADC) whose input is loaded by a 50~$\Omega$ resistor. Once samples have been
digitized at a rate of 125~MS/s, filtering is applied to qualify the processing block performance --
practically meeting the radiofrequency frontend requirement of noise and bandwidth reduction
by filtering and decimating. Finally, bursts of filtered samples are stored for post-processing,
allowing us either to assess filter rejection for a given resource usage, or to validate the rejection
when implementing a solution minimizing resource occupation.

The first step of our approach is to model the DSP chain. Since we aim at only optimizing
the filtering part of the signal processing chain, we have not included the PRN generator or the
ADC in the model: the input data size and rate are considered fixed and defined by the hardware.
The filtering can be done in two ways, either by considering a single monolithic FIR filter
requiring many coefficients to reach the targeted noise rejection ratio, or by
cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter.
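
The interest of the cascade can be summarized by a standard identity, given here only as a sketch
for ideal stages, i.e. ignoring the rounding introduced by intermediate bit shifts. Writing
$H_i(\omega)=\sum_{k} b^{(i)}_k e^{-j\omega k}$ for the frequency response of the $i$-th of $n$ cascaded
filters (the superscript $(i)$ is a notation introduced only for this illustration), the response of
the cascade is the product of the individual responses, so that magnitudes expressed in dB add:
\begin{align*}
H(\omega) &= \prod_{i=1}^{n} H_i(\omega), \\
20\log_{10}\left|H(\omega)\right| &= \sum_{i=1}^{n} 20\log_{10}\left|H_i(\omega)\right|.
\end{align*}
Under this idealization, several short filters each providing a moderate rejection accumulate into
a larger overall rejection in the frequency bands where their stopbands overlap.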

After each filter we leave the possibility of shifting the filtered data to consume
fewer resources. Hence in the case of cascaded filters, we define a stage as a filter
and a shifter (the shift could be omitted if we do not need to divide the filtered data).

\subsection{Model of a FIR filter}

A cascade of filters is composed of $n$ FIR stages. In stage $i$ ($1 \leq i \leq n$)
the FIR has $C_i$ coefficients and each coefficient is an integer value with $\pi^C_i$
bits while the filtered data are shifted by $\pi^S_i$ bits. We also define $\pi^-_i$ as
the size of input data and $\pi^+_i$ as the size of output data. Figure~\ref{fig:fir_stage}
shows a filtering stage.
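
As an illustration of why the shifter matters, the following worst-case bound gives the word length
needed to hold the FIR output of stage $i$ without overflow before shifting. It is a general property of
signed integer arithmetic rather than a result taken from this paper, and the model used in the
remainder of the paper may account for word growth differently. Each product of a $\pi_i^-$-bit sample
by a $\pi_i^C$-bit coefficient fits in $\pi_i^- + \pi_i^C$ bits, and summing $C_i$ such products adds at
most $\lceil \log_2 C_i \rceil$ bits:
\begin{align*}
\pi_i^{\mathrm{acc}} \leq \pi_i^- + \pi_i^C + \lceil \log_2 C_i \rceil ,
\end{align*}
where $\pi_i^{\mathrm{acc}}$ is a hypothetical symbol for the accumulator width, introduced only here.
For example, with the illustrative values $\pi_i^-=14$, $\pi_i^C=8$ and $C_i=32$, the bound is
$14+8+5=27$~bits; dropping $\pi_i^S$ least significant bits in the shifter reduces the width seen by
the next stage accordingly.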

\begin{figure}
\centering
\begin{tikzpicture}[node distance=2cm]
\node[draw,minimum size=1.3cm] (FIR) { $C_i, \pi_i^C$ } ;
\node[draw,minimum size=1.3cm] (Shift) [right of=FIR, ] { $\pi_i^S$ } ;
\node (Start) [left of=FIR] { } ;
\node (End) [right of=Shift] { } ;

\node[draw,fit=(FIR) (Shift)] (Filter) { } ;

\draw[->] (Start) edge node [above] { $\pi_i^-$ } (FIR) ;
\draw[->] (FIR) -- (Shift) ;
\draw[->] (Shift) edge node [above] { $\pi_i^+$ } (End) ;
\end{tikzpicture}
\caption{A single filtering stage is composed of a FIR (on the left) and a shifter (on the right)}
\label{fig:fir_stage}
\end{figure}

FIR $i$ has been characterized through numerical simulation as able to reject $F(C_i, \pi_i^C)$ dB.
This rejection has been computed using GNU Octave software FIR coefficient design functions
(\texttt{firls} and \texttt{fir1}).
For each configuration $(C_i, \pi_i^C)$, we first create a FIR with floating point coefficients and a given $C_i$ number of coefficients.
Then, the floating point coefficients are discretized into integers. In order to ensure that the coefficients are effectively coded on $\pi_i^C$~bits,
the coefficients are normalized by their absolute maximum before being scaled to integer coefficients.
At least one coefficient is coded on $\pi_i^C$~bits, and in practice only $b_{C_i/2}$ is coded on $\pi_i^C$~bits while the others are coded on far fewer bits.
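
The normalization and scaling step can be sketched as follows, assuming signed coefficients and
rounding to the nearest integer (the exact rounding convention is an assumption made for this
illustration, and $\hat{b}_k$ is a notation introduced here for the quantized coefficient):
\begin{align*}
\hat{b}_k = \mathrm{round}\left( \left(2^{\pi_i^C-1}-1\right) \frac{b_k}{\max_j |b_j|} \right).
\end{align*}
The largest coefficient is then mapped to $\pm\left(2^{\pi_i^C-1}-1\right)$ and uses all $\pi_i^C$~bits, while a
coefficient is quantized to zero whenever
\begin{align*}
\frac{|b_k|}{\max_j |b_j|} < \frac{1}{2\left(2^{\pi_i^C-1}-1\right)} .
\end{align*}
For $\pi_i^C=6$~bits this threshold is $1/62 \approx 0.016$, i.e. any coefficient more than about 36~dB below
the largest one vanishes, in line with the outer taps being set to zero in Fig. \ref{float_vs_int}.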

With these coefficients, the \texttt{freqz} function is used to estimate the magnitude of the filter
transfer function.
Comparing the performance between FIRs requires however defining a unique criterion. As shown in figure~\ref{fig:fir_mag},
the FIR magnitude exhibits two parts: we focus here on the transition width and the rejection rather than on the
passband ripples as emphasized in \cite{lim_1988,lim_1996}. Throughout this demonstration,
we arbitrarily set a passband of 40\% of the Nyquist frequency and a stopband from 60\%
of the Nyquist frequency to the end of the band, as would be typically selected to prevent
aliasing before decimating the dataflow by 2. The method can however be generalized to any filter
shape as long as it is defined from the initial modeling steps: Fig. \ref{fig:rejection_pyramid}
as described below is indeed unique for each filter shape.
	253	252	as described below is indeed unique for each filter shape.
\begin{figure}	254	253
\begin{center}	255	254	\begin{figure}
\scalebox{0.8}{	256	255	\begin{center}
\centering	257	256	\scalebox{0.8}{
\begin{tikzpicture}[scale=0.3]	258	257	\centering
\draw[<->] (0,15) -- (0,0) -- (21,0) ;	259	258	\begin{tikzpicture}[scale=0.3]
\draw[thick] (0,12) -- (8,12) -- (20,0) ;	260	259	\draw[<->] (0,15) -- (0,0) -- (21,0) ;
	261	260	\draw[thick] (0,12) -- (8,12) -- (20,0) ;
\draw (0,14) node [left] { $P$ } ;	262	261
\draw (20,0) node [below] { $f$ } ;	263	262	\draw (0,14) node [left] { $P$ } ;
	264	263	\draw (20,0) node [below] { $f$ } ;
\draw[>=latex,<->] (0,14) -- (8,14) ;	265	264
\draw (4,14) node [above] { passband } node [below] { $40\%$ } ;	266	265	\draw[>=latex,<->] (0,14) -- (8,14) ;
	267	266	\draw (4,14) node [above] { passband } node [below] { $40\%$ } ;
\draw[>=latex,<->] (8,14) -- (12,14) ;	268	267
\draw (10,14) node [above] { transition } node [below] { $20\%$ } ;	269	268	\draw[>=latex,<->] (8,14) -- (12,14) ;
	270	269	\draw (10,14) node [above] { transition } node [below] { $20\%$ } ;
\draw[>=latex,<->] (12,14) -- (20,14) ;	271	270
\draw (16,14) node [above] { stopband } node [below] { $40\%$ } ;	272	271	\draw[>=latex,<->] (12,14) -- (20,14) ;
	273	272	\draw (16,14) node [above] { stopband } node [below] { $40\%$ } ;
\draw[>=latex,<->] (16,12) -- (16,8) ;	274	273
\draw (16,10) node [right] { rejection } ;	275	274	\draw[>=latex,<->] (16,12) -- (16,8) ;
	276	275	\draw (16,10) node [right] { rejection } ;
\draw[dashed] (8,-1) -- (8,14) ;	277	276
\draw[dashed] (12,-1) -- (12,14) ;	278	277	\draw[dashed] (8,-1) -- (8,14) ;
	279	278	\draw[dashed] (12,-1) -- (12,14) ;
\draw[dashed] (8,12) -- (16,12) ;	280	279
\draw[dashed] (12,8) -- (16,8) ;	281	280	\draw[dashed] (8,12) -- (16,12) ;
	282	281	\draw[dashed] (12,8) -- (16,8) ;
\end{tikzpicture}	283	282
}	284	283	\end{tikzpicture}
\end{center}	285	284	}
\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:	286	285	\end{center}
the passband is considered to occupy the initial 40\% of the Nyquist frequency range,	287	286	\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:
the stopband the last 40\%, allowing 20\% transition width.}	288	287	the passband is considered to occupy the initial 40\% of the Nyquist frequency range,
\label{fig:fir_mag}	289	288	the stopband the last 40\%, allowing 20\% transition width.}
\end{figure}	290	289	\label{fig:fir_mag}
	291	290	\end{figure}
In the transition band, the behavior of the filter is left free: we only define the passband and the stopband characteristics.	292	291	
% r2.7	293	292	In the transition band, the behavior of the filter is left free: we only define the passband and the stopband characteristics.
The criteria initially considered include the mean value of the stopband rejection, which yields unacceptable results since notches	294	293	% r2.7
overestimate the rejection capability of the filter.	295	294	The criteria initially considered include the mean value of the stopband rejection, which yields unacceptable results since notches
% Furthermore, the losses within	296	295	overestimate the rejection capability of the filter.
% the passband are not considered and might be excessive for excessively wide transitions widths introduced for filters with few coefficients.	297	296	% Furthermore, the losses within
{\color{red} An intermediate criterion considered the minimal rejection within the stopband, from which the sum of the absolute values	298	297	% the passband are not considered and might be excessive for excessively wide transitions widths introduced for filters with few coefficients.
within the passband is subtracted to avoid filters with excessive ripples, normalized to the	299	298	{\color{red} An intermediate criterion considered the minimal rejection within the stopband, from which the sum of the absolute values
bin width to remain consistent with the passband criterion (dBc/Hz units in all cases).	300
In this case, when we cascaded two filters with an excessive deviation in passband ($>$ 1~dB),	301
the final deviation in passband may be excessive ($>$ 10~dB). Hence our final	302	299	within the passband is subtracted to avoid filters with excessive ripples, normalized to the
criterion always takes the minimal rejection in stopband but we subtract the maximal	303	300	bin width to remain consistent with the passband criterion (dBc/Hz units in all cases).
		301	In this case, when we cascaded two filters with an excessive deviation in passband ($>$ 1~dB),
		302	the final deviation in passband may be excessive ($>$ 10~dB). Hence our final
		303	criterion always takes the minimal rejection in stopband but we subtract the maximal
		304	amplitude in passband (maximum value minus the minimum value). If this amplitude is
		305	greater than 1~dB, we discard the filter.}
		306	% Our final criterion to compute the filter rejection considers
		307	% % r2.8 et r2.2 r2.3
		308	% the minimal rejection within the stopband, to which the sum of the absolute values
		309	% within the passband is subtracted to avoid filters with excessive ripples, normalized to the
		310	% bin width to remain consistent with the passband criterion (dBc/Hz units in all cases).
		311	With this
amplitude in passband (maximum value minus the minimum value). If this amplitude is	304	312	criterion, we meet the expected rejection capability of low pass filters as shown in figure~\ref{fig:custom_criterion}.
		313	{\color{red} The best filter has a correct rejection estimation and the worst filter
		314	is discarded.} % AH 20191609: Utile ?
greater than 1~dB, we discard the filter.}	305	315
% Our final criterion to compute the filter rejection considers	306	316	% \begin{figure}
% % r2.8 et r2.2 r2.3	307	317	% \centering
% the minimal rejection within the stopband, to which the sum of the absolute values	308	318	% \includegraphics[width=\linewidth]{images/colored_mean_criterion}
% within the passband is subtracted to avoid filters with excessive ripples, normalized to the	309	319	% \caption{Mean stopband rejection criterion comparison between monolithic filter and cascaded filters}
% bin width to remain consistent with the passband criterion (dBc/Hz units in all cases).	310	320	% \label{fig:mean_criterion}
With this	311	321	% \end{figure}
criterion, we meet the expected rejection capability of low pass filters as shown in figure~\ref{fig:custom_criterion}.	312	322
{\color{red} The best filter has a correct rejection estimation and the worst filter	313	323	\begin{figure}
is discarded.} % AH 20191609: Utile ?	314	324	\centering
	315	325	\includegraphics[width=\linewidth]{images/custom_criterion}
% \begin{figure}	316	326	\caption{\color{red}Custom criterion (maximum rejection in the stopband minus the maximal
% \centering	317	327	amplitude in passband (if $>$ 1~dB the filter is discarded) rejection normalized to the bandwidth)
% \includegraphics[width=\linewidth]{images/colored_mean_criterion}	318	328	comparison between monolithic filter and cascaded filters}
% \caption{Mean stopband rejection criterion comparison between monolithic filter and cascaded filters}	319	329	\label{fig:custom_criterion}
% \label{fig:mean_criterion}	320	330	\end{figure}
% \end{figure}	321	331
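As a compact restatement of the final criterion described above, the following sketch may help. Its notation is an assumption and is not taken from the original text: $|H(f)|_{\mathrm{dB}}$ is the magnitude response in dB relative to the passband level, $\mathcal{P}$ and $\mathcal{S}$ are the passband and stopband frequency sets, and $\Delta$ is the passband amplitude (maximum minus minimum). Any normalization to the bandwidth is left out.
\begin{equation*}
\Delta = \max_{f \in \mathcal{P}} |H(f)|_{\mathrm{dB}} - \min_{f \in \mathcal{P}} |H(f)|_{\mathrm{dB}}, \qquad
F =
\begin{cases}
\min\limits_{f \in \mathcal{S}} \left( -|H(f)|_{\mathrm{dB}} \right) - \Delta & \text{if } \Delta \leq 1~\text{dB} \\
\text{filter discarded} & \text{otherwise}
\end{cases}
\end{equation*}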
	322	332	Thanks to the latter criterion which will be used in the remainder of this paper, we are able to automatically generate multiple FIR taps
\begin{figure}	323	333	and estimate their rejection. Figure~\ref{fig:rejection_pyramid} exhibits the
\centering	324	334	rejection as a function of the number of coefficients and the number of bits representing these coefficients.
\includegraphics[width=\linewidth]{images/custom_criterion}	325	335	The curve shaped as a pyramid exhibits optimum configuration sets at the vertex where both edges meet.
\caption{\color{red}Custom criterion (maximum rejection in the stopband minus the maximal	326	336	Indeed for a given number of coefficients, increasing the number of bits beyond the edge will not improve the rejection.
amplitude in passband (if $>$ 1~dB the filter is discarded) rejection normalized to the bandwidth)	327	337	Conversely, for a given number of bits, increasing the number of coefficients will not improve
comparison between monolithic filter and cascaded filters}	328	338	the rejection. Hence the best coefficient sets are on the vertex of the pyramid.
\label{fig:custom_criterion}	329	339
\end{figure}	330	340	\begin{figure}
	331	341	\centering
Thanks to the latter criterion which will be used in the remainder of this paper, we are able to automatically generate multiple FIR taps	332	342	\includegraphics[width=\linewidth]{images/rejection_pyramid}
and estimate their rejection. Figure~\ref{fig:rejection_pyramid} exhibits the	333	343	\caption{\color{red}Filter rejection as a function of number of coefficients and number of bits
rejection as a function of the number of coefficients and the number of bits representing these coefficients.	334	344	: this lookup table will be used to identify which filter parameters -- number of bits
The curve shaped as a pyramid exhibits optimum configuration sets at the vertex where both edges meet.	335	345	representing coefficients and number of coefficients -- best match the targeted transfer function.}
Indeed for a given number of coefficients, increasing the number of bits beyond the edge will not improve the rejection.	336	346	\label{fig:rejection_pyramid}
Conversely, for a given number of bits, increasing the number of coefficients will not improve	337	347	\end{figure}
the rejection. Hence the best coefficient sets are on the vertex of the pyramid.	338	348	
	339	349	Although we have an efficient criterion to estimate the rejection of one set of coefficients (taps),
\begin{figure}	340	350	we have a problem when we cascade filters and estimate the criterion as a sum of two or more individual criteria.
\centering	341	351	If the FIR filter coefficients are the same between the stages, we have:
\includegraphics[width=\linewidth]{images/rejection_pyramid}	342	352	$$F_{total} = F_1 + F_2$$
\caption{\color{red}Filter rejection as a function of number of coefficients and number of bits	343	353	But selecting two different sets of coefficients will yield a more complex situation in which
: this lookup table will be used to identify which filter parameters -- number of bits	344	354	the previous relation is no longer valid as illustrated in figure~\ref{fig:sum_rejection}. The red and blue curves
representing coefficients and number of coefficients -- best match the targeted transfer function.}	345	355	are two different filters with maximums and notches not located at the same frequency offsets.
\label{fig:rejection_pyramid}	346	356	Hence when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved
\end{figure}	347	357	with respect to a basic sum of the rejection criteria shown as the dotted yellow line.
	348	358	% r2.9
Although we have an efficient criterion to estimate the rejection of one set of coefficients (taps),	349	359	Thus, estimating the rejection of filter cascades is more complex than taking the sum of all the rejection
we have a problem when we cascade filters and estimate the criterion as a sum of two or more individual criteria.	350	360	criteria of each filter. However since the individual filter rejection sum underestimates the rejection capability of the cascade,
If the FIR filter coefficients are the same between the stages, we have:	351	361	% r2.10
$$F_{total} = F_1 + F_2$$	352	362	this lower bound on the cascade rejection is considered as a conservative and acceptable criterion for deciding on the suitability
But selecting two different sets of coefficients will yield a more complex situation in which	353	363	of the filter cascade to meet design criteria.
the previous relation is no longer valid as illustrated in figure~\ref{fig:sum_rejection}. The red and blue curves	354	364	
are two different filters with maximums and notches not located at the same frequency offsets.	355	365	\begin{figure}
Hence when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved	356	366	\centering
with respect to a basic sum of the rejection criteria shown as the dotted yellow line.	357	367	\includegraphics[width=\linewidth]{images/cascaded_criterion}
% r2.9	358	368	\caption{Transfer function of individual filters and after cascading the two filters,
Thus, estimating the rejection of filter cascades is more complex than taking the sum of all the rejection	359	369	demonstrating that the selected criterion of maximum rejection in the bandstop (horizontal
criteria of each filter. However since the individual filter rejection sum underestimates the rejection capability of the cascade,	360	370	lines) is met. Notice that the cascaded filter has better rejection than summing the bandstop
% r2.10	361	371	maximum of each individual filter.
this lower bound on the cascade rejection is considered as a conservative and acceptable criterion for deciding on the suitability	362	372	}
of the filter cascade to meet design criteria.	363	373	\label{fig:sum_rejection}
	364	374	\end{figure}
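The conservativeness of the sum can be checked in one line. The sketch below uses notation that is assumed here rather than taken from the text: $G_k(f)$ is the gain of stage~$k$ in dB relative to its passband level, $\mathcal{S}$ is the stopband, and $r_k = -\max_{f \in \mathcal{S}} G_k(f)$ is the stopband rejection of stage~$k$. The passband amplitude term of the criterion is ignored. Since gains expressed in dB add when filters are cascaded:
\begin{equation*}
r_{1+2} = -\max_{f \in \mathcal{S}} \left( G_1(f) + G_2(f) \right)
\geq -\max_{f \in \mathcal{S}} G_1(f) - \max_{f \in \mathcal{S}} G_2(f) = r_1 + r_2
\end{equation*}
Equality holds when both stages reach their worst-case stopband level at the same frequency, which is the case for identical stages.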
\begin{figure}	365	375
\centering	366
\includegraphics[width=\linewidth]{images/cascaded_criterion}	367
\caption{Transfer function of individual filters and after cascading the two filters,	368	376	Finally in our case, we consider that the input signal is fully known. The
demonstrating that the selected criterion of maximum rejection in the bandstop (horizontal	369	377	resolution of the input data stream is fixed and identical for all experiments
lines) is met. Notice that the cascaded filter has better rejection than summing the bandstop	370	378	in this paper.
maximum of each individual filter.	371	379
}	372	380	Based on this analysis, we address the estimate of resource consumption (called
\label{fig:sum_rejection}	373	381	% r2.11
\end{figure}	374	382	silicon area -- in the case of FPGAs this means processing cells) as a function of
	375	383	filter characteristics. As a reminder, we do not aim at matching actual hardware
Finally in our case, we consider that the input signal is fully known. The	376	384	configuration but consider an arbitrary silicon area occupied by each processing function,
resolution of the input data stream is fixed and identical for all experiments	377	385	and will assess after synthesis how well this arbitrary unit matches actual
in this paper.	378	386	hardware resources provided by FPGA manufacturers. The sum of individual processing
	379	387	unit areas is constrained by a total silicon area representative of FPGA global resources.
Based on this analysis, we address the estimate of resource consumption (called	380	388	Formally, variable $a_i$ is the area taken by filter~$i$
% r2.11	381	389	(in arbitrary unit). Variable $r_i$ is the rejection of filter~$i$ (in dB).
silicon area -- in the case of FPGAs this means processing cells) as a function of	382	390	Constant $\mathcal{A}$ is the total available area. We model our problem as follows:
filter characteristics. As a reminder, we do not aim at matching actual hardware	383	391
configuration but consider an arbitrary silicon area occupied by each processing function,	384	392	\begin{align}
and will assess after synthesis how well this arbitrary unit matches actual	385	393	\text{Maximize } & \sum_{i=1}^n r_i \notag \\
hardware resources provided by FPGA manufacturers. The sum of individual processing	386	394	\sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\
unit areas is constrained by a total silicon area representative of FPGA global resources.	387	395	a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\
Formally, variable $a_i$ is the area taken by filter~$i$	388	396	r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\
(in arbitrary unit). Variable $r_i$ is the rejection of filter~$i$ (in dB).	389	397	\pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\
Constant $\mathcal{A}$ is the total available area. We model our problem as follows:	390	398	\pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\
	391	399	\pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\
\begin{align}	392	400	\pi_1^- &= \Pi^I \label{eq:init}
\text{Maximize } & \sum_{i=1}^n r_i \notag \\	393	401	\end{align}
\sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\	394	402
a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\	395	403	Equation~\ref{eq:area} states that the total area taken by the filters must be
r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\	396	404	less than the available area. Equation~\ref{eq:areadef} gives the definition of
\pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\	397	405	the area used by a filter, considered as the area of the FIR since the Shifter is
\pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\	398	406	assumed not to require significant resources. We consider that the FIR needs $C_i$ registers of size
\pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\	399	407	$\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the
\pi_1^- &= \Pi^I \label{eq:init}	400	408	input data with the coefficients. Equation~\ref{eq:rejectiondef} gives the
\end{align}	401	409	definition of the rejection of the filter thanks to the tabulated function~$F$ that we defined
	402	410	previously. The Shifter does not introduce negative rejection as we will explain later,
Equation~\ref{eq:area} states that the total area taken by the filters must be	403	411	so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the
less than the available area. Equation~\ref{eq:areadef} gives the definition of	404	412	relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add
the area used by a filter, considered as the area of the FIR since the Shifter is	405	413	$\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes
assumed not to require significant resources. We consider that the FIR needs $C_i$ registers of size	406	414	$\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of
$\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the	407	415	a filter is the same as the input number of bits of the next filter.
input data with the coefficients. Equation~\ref{eq:rejectiondef} gives the	408	416	Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative
definition of the rejection of the filter thanks to the tabulated function~$F$ that we defined	409	417	rejection. Indeed, the results of the FIR can be right shifted without compromising
previously. The Shifter does not introduce negative rejection as we will explain later,	410	418	the quality of the rejection until a threshold. Each bit of the output data
so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the	411	419	increases the maximum rejection level by 6~dB. We add one to take the sign bit
relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add	412	420	into account. If equation~\ref{eq:maxshift} was not present, the Shifter could
$\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes	413	421	shift too much and introduce some noise in the output data. Each supplementary
$\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of	414	422	shift bit would cause an additional 6~dB rejection rise. A totally equivalent equation is:
a filter is the same as the input number of bits of the next filter.	415	423	$\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right)$.
Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative	416	424	Finally, equation~\ref{eq:init} gives the number of bits of the global input.
rejection. Indeed, the results of the FIR can be right shifted without compromising	417	425
the quality of the rejection until a threshold. Each bit of the output data	418
increases the maximum rejection level by 6~dB. We add one to take the sign bit	419	426	This model is non-linear since we multiply some variable with another variable
into account. If equation~\ref{eq:maxshift} was not present, the Shifter could	420	427	and it is even non-quadratic, as the cost function $F$ does not have a known
shift too much and introduce some noise in the output data. Each supplementary	421	428	linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations.
shift bit would cause an additional 6~dB rejection rise. A totally equivalent equation is:	422	429	% AH: conflit merge
$\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right)$.	423	430	% This variable must be defined by the user, it represent the number of different
Finally, equation~\ref{eq:init} gives the number of bits of the global input.	424	431	% set of coefficients generated (for memory, we use \texttt{firls} and \texttt{fir1}
	425	432	% functions from GNU Octave). To choose this value, we consider a subset of the figure~\ref{fig:rejection_pyramid}
This model is non-linear since we multiply some variable with another variable	426	433	% to restrict the number of configurations. Indeed, it is useless to have too many coefficients or
and it is even non-quadratic, as the cost function $F$ does not have a known	427	434	% too many bits, hence we take the configurations close to edge of pyramid. Thank to theses
linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations.	428	435	% configurations $C_{ij}$ and $\pi_{ij}^C$ ($1 \leq j \leq p$) become constant
% AH: conflit merge	429	436	% and the function $F$ can be estimate for each configurations
% This variable must be defined by the user, it represent the number of different	430	437	% thanks our rejection criterion. We also defined binary
% set of coefficients generated (for memory, we use \texttt{firls} and \texttt{fir1}	431	438	This variable $p$ is defined by the user, and represents the number of different
% functions from GNU Octave). To choose this value, we consider a subset of the figure~\ref{fig:rejection_pyramid}	432	439	sets of coefficients generated (recall that we use \texttt{firls} and \texttt{fir1}
% to restrict the number of configurations. Indeed, it is useless to have too many coefficients or	433	440	functions from GNU Octave) based on the targeted filter characteristics and implementation
% too many bits, hence we take the configurations close to edge of pyramid. Thank to theses	434	441	assumptions (estimated number of bits defining the coefficients). Hence, $C_{ij}$ and
% configurations $C_{ij}$ and $\pi_{ij}^C$ ($1 \leq j \leq p$) become constant	435	442	$\pi_{ij}^C$ become constants and
% and the function $F$ can be estimate for each configurations	436	443	we define $1 \leq j \leq p$ so that the function $F$ can be estimated (Look Up Table)
% thanks our rejection criterion. We also defined binary	437	444	for each configuration thanks to the rejection criterion. We also define the binary
This variable $p$ is defined by the user, and represents the number of different	438	445	variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
sets of coefficients generated (recall that we use \texttt{firls} and \texttt{fir1}	439	446	and 0 otherwise. The new equations are as follows:
functions from GNU Octave) based on the targeted filter characteristics and implementation	440
assumptions (estimated number of bits defining the coefficients). Hence, $C_{ij}$ and	441	447
$\pi_{ij}^C$ become constants and	442	448	\begin{align}
we define $1 \leq j \leq p$ so that the function $F$ can be estimated (Look Up Table)	443	449	a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
for each configuration thanks to the rejection criterion. We also define the binary	444	450	r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$	445	451	\pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
and 0 otherwise. The new equations are as follows:	446	452	\sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
	447	453	\end{align}
\begin{align}	448	454
a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\	449	455	Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace
r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\	450	456	respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.
\pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\	451	457	Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most.
\sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}	452	458
\end{align}	453
	454	459	% JM: conflict merge
Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace	455	460	% However the problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2}
respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.	456	461	% we multiply
Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most.	457	462	% $\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable we can
	458	463	% linearise this multiplication if we can bound $\pi_i^-$. As $\pi_i^-$ is the data size,
% JM: conflict merge	459	464	% we define $0 < \pi_i^- \leq 128$ which is the maximum data size whose estimation is
% However the problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2}	460	465	% assumed on hardware characteristics.
% we multiply	461	466	% The Gurobi (\url{www.gurobi.com}) optimization software used to solve this quadratic
% $\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable we can	462	467	% model is able to linearize the model provided as is. This model
% linearise this multiplication if we can bound $\pi_i^-$. As $\pi_i^-$ is the data size,	463	468	% has $O(np)$ variables and $O(n)$ constraints.}
% we define $0 < \pi_i^- \leq 128$ which is the maximum data size whose estimation is	464	469	The problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2}
% assumed on hardware characteristics.	465	470	we multiply
% The Gurobi (\url{www.gurobi.com}) optimization software used to solve this quadratic	466	471	$\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable we can
% model is able to linearize the model provided as is. This model	467	472	linearize this multiplication. The following formula shows how to linearize
% has $O(np)$ variables and $O(n)$ constraints.}	468	473	this situation in the general case, with $y$ a binary variable and $x$ a real variable ($0 \leq x \leq X^{max}$):
The problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2}	469	474	\begin{equation*}
we multiply	470	475	m = x \times y \implies
$\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable we can	471	476	\left \{
linearize this multiplication. The following formula shows how to linearize	472	477	\begin{split}
this situation in the general case, with $y$ a binary variable and $x$ a real variable ($0 \leq x \leq X^{max}$):	473	478	m & \geq 0 \\
\begin{equation*}	474	479	m & \leq y \times X^{max} \\
m = x \times y \implies	475	480	m & \leq x \\
\left \{	476	481	m & \geq x - (1 - y) \times X^{max} \\
\begin{split}	477	482	\end{split}
m & \geq 0 \\	478	483	\right .
m & \leq y \times X^{max} \\	479	484	\end{equation*}
m & \leq x \\	480	485	Hence, if we bound $\pi_i^-$ above by 128~bits, a maximum data size estimated
m & \geq x - (1 - y) \times X^{max} \\	481	486	from hardware characteristics,
\end{split}	482	487	the Gurobi (\url{www.gurobi.com}) optimization software is able to linearize
\right .	483	488	the quadratic problem for us, so the model is left as is. This model
\end{equation*}	484	489	has $O(np)$ variables and $O(n)$ constraints.
Hence, if we bound $\pi_i^-$ above by 128~bits, a maximum data size estimated	485	490	
from hardware characteristics,	486	491	% This model is non-linear and even non-quadratic, as $F$ does not have a known
the Gurobi (\url{www.gurobi.com}) optimization software is able to linearize	487	492	% linear or quadratic expression. We introduce $p$ FIR configurations
the quadratic problem for us, so the model is left as is. This model	488	493	% $(C_{ij}, \pi_{ij}^C), 1 \leq j \leq p$ that are constants.
has $O(np)$ variables and $O(n)$ constraints.	489	494	% % r2.12
	490	495	% This variable must be defined by the user, it represent the number of different
% This model is non-linear and even non-quadratic, as $F$ does not have a known	491	496	% set of coefficients generated (for memory, we use \texttt{firls} and \texttt{fir1}
% linear or quadratic expression. We introduce $p$ FIR configurations	492	497	% functions from GNU Octave).
% $(C_{ij}, \pi_{ij}^C), 1 \leq j \leq p$ that are constants.	493	498	% We define binary
% % r2.12	494	499	% variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
% This variable must be defined by the user, it represent the number of different	495	500	% and 0 otherwise. The new equations are as follows:
% set of coefficients generated (for memory, we use \texttt{firls} and \texttt{fir1}	496	501	%
% functions from GNU Octave).	497	502	% \begin{align}
% We define binary	498	503	% a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
% variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$	499	504	% r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
% and 0 otherwise. The new equations are as follows:	500	505	% \pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
%	501	506	% \sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
% \begin{align}	502	507	% \end{align}
% a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\	503	508	%
% r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\	504	509	% Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace
% \pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\	505	510	% respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.
% \sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}	506	511	% Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most.
% \end{align}	507	512	%
%	508	513	% % r2.13
% Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace	509	514	% This modified model is quadratic since we multiply two variables in the
% respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.	510	515	% equation~\ref{eq:areadef2} ($\delta_{ij}$ by $\pi_i^-$) but it can be linearised if necessary.
% Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most.	511	516	% The Gurobi
%	512	517	% (\url{www.gurobi.com}) optimization software is used to solve this quadratic
% % r2.13	513	518	% model, and since Gurobi is able to linearize, the model is left as is. This model
% This modified model is quadratic since we multiply two variables in the	514	519	% has $O(np)$ variables and $O(n)$ constraints.
% equation~\ref{eq:areadef2} ($\delta_{ij}$ by $\pi_i^-$) but it can be linearised if necessary.	515	520
% The Gurobi	516	521	Two problems will be addressed using the workflow described in the next section: on the one
% (\url{www.gurobi.com}) optimization software is used to solve this quadratic	517	522	hand maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary
% model, and since Gurobi is able to linearize, the model is left as is. This model	518	523	silicon area (section~\ref{sec:fixed_area}) and on the other hand the dual problem of minimizing the silicon area
% has $O(np)$ variables and $O(n)$ constraints.	519	524	for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the
	520	525	objective function is replaced with:
Two problems will be addressed using the workflow described in the next section: on the one	521	526	\begin{align}
hand maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary	522	527	\text{Minimize } & \sum_{i=1}^n a_i \notag
silicon area (section~\ref{sec:fixed_area}) and on the other hand the dual problem of minimizing the silicon area	523	528	\end{align}
for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the	524	529	We adapt the constraints of our quadratic program by replacing equation~\ref{eq:area}
objective function is replaced with:	525	530	with equation \ref{eq:rejection_min} where $\mathcal{R}$ is the minimal
\begin{align}	526	531	rejection required.
\text{Minimize } & \sum_{i=1}^n a_i \notag	527	532
\end{align}	528	533	\begin{align}
We adapt the constraints of our quadratic program by replacing equation~\ref{eq:area}	529	534	\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}
with equation \ref{eq:rejection_min} where $\mathcal{R}$ is the minimal	530	535	\end{align}
rejection required.	531	536
	532	537	\section{Design workflow}
\begin{align}	533	538	\label{sec:workflow}
\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}	534	539
\end{align}	535	540	In this section, we describe the workflow to compute all the results presented in sections~\ref{sec:fixed_area}
	536	541	and \ref{sec:fixed_rej}. Figure~\ref{fig:workflow} shows the global workflow and the different steps involved
\section{Design workflow}	537	542	in the computation of the results.
\label{sec:workflow}	538	543
	539	544	\begin{figure}
In this section, we describe the workflow to compute all the results presented in sections~\ref{sec:fixed_area}	540	545	\centering
and \ref{sec:fixed_rej}. Figure~\ref{fig:workflow} shows the global workflow and the different steps involved	541	546	\begin{tikzpicture}[node distance=0.75cm and 2cm]
in the computation of the results.	542	547	\node[draw,minimum size=1cm] (Solver) { Filter Solver } ;
	543	548	\node (Start) [left= 3cm of Solver] { } ;
\begin{figure}	544	549	\node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;
\centering	545	550	\node (Input) [above= of TCL] { } ;
\begin{tikzpicture}[node distance=0.75cm and 2cm]	546	551	\node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;
\node[draw,minimum size=1cm] (Solver) { Filter Solver } ;	547	552	\node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;
\node (Start) [left= 3cm of Solver] { } ;	548	553	\node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;
\node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;	549	554	\node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;
\node (Input) [above= of TCL] { } ;	550	555	\node (Results) [left= of Postproc] { } ;
\node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;	551	556
\node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;	552	557	\draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;
\node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;	553	558	\draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;
\node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;	554	559	\draw[->] (Solver) edge node [below] { (1a) } (TCL) ;
\node (Results) [left= of Postproc] { } ;	555	560	\draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;
	556	561	\draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;
\draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;	557	562	\draw[->,dashed] (Bitstream) -- (Deploy) ;
\draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;	558	563	\draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;
\draw[->] (Solver) edge node [below] { (1a) } (TCL) ;	559	564	\draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;
\draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;	560	565	\draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;
\draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;	561	566	\draw[->] (Postproc) -- (Results) ;
\draw[->,dashed] (Bitstream) -- (Deploy) ;	562	567	\end{tikzpicture}
\draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;	563	568	\caption{Design workflow from the input parameters to the results allowing for
\draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;	564	569	a fully automated optimal solution search.}
\draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;	565	570	\label{fig:workflow}
\draw[->] (Postproc) -- (Results) ;	566	571	\end{figure}
\end{tikzpicture}	567	572
\caption{Design workflow from the input parameters to the results allowing for	568	573	The filter solver is a C++ program that takes as input the maximum area
a fully automated optimal solution search.}	569	574	$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,
\label{fig:workflow}	570	575	the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates
\end{figure}	571	576	the quadratic programs and uses the Gurobi solver to estimate the optimal results.
	572	577	Then it produces two scripts: a TCL script ((1a) on figure~\ref{fig:workflow})
The filter solver is a C++ program that takes as input the maximum area	573	578	and a deploy script ((1b) on figure~\ref{fig:workflow}).
$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,	574	579
the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates	575	580	The TCL script describes the whole digital processing chain from the beginning
the quadratic programs and uses the Gurobi solver to estimate the optimal results.	576	581	(the raw signal data) to the end (the filtered data) in a language compatible
Then it produces two scripts: a TCL script ((1a) on figure~\ref{fig:workflow})	577	582	with proprietary synthesis software, namely Vivado for Xilinx and Quartus for
and a deploy script ((1b) on figure~\ref{fig:workflow}).	578	583	Intel/Altera. The raw input data is generated by a 20-bit Pseudo Random Number (PRN)
	579	584	generator inside the FPGA and $\Pi^I$ is fixed at 16~bits.
The TCL script describes the whole digital processing chain from the beginning	580	585	Then the script builds each stage of the chain with a generic FIR task that
(the raw signal data) to the end (the filtered data) in a language compatible	581	586	comes from a skeleton library. The generic FIR is highly configurable
with proprietary synthesis software, namely Vivado for Xilinx and Quartus for	582	587	with the number of coefficients and the size of the coefficients. The coefficients
Intel/Altera. The raw input data is generated by a 20-bit Pseudo Random Number (PRN)	583	588	themselves are not stored in the script.
generator inside the FPGA and $\Pi^I$ is fixed at 16~bits.	584	589	As the signal is processed in real-time, the output signal is stored as
Then the script builds each stage of the chain with a generic FIR task that	585	590	consecutive bursts of data for post-processing, mainly assessing the consistency of the
comes from a skeleton library. The generic FIR is highly configurable	586	591	implemented FIR cascade transfer function with the design criteria and the expected
with the number of coefficients and the size of the coefficients. The coefficients	587	592	transfer function.
themselves are not stored in the script.	588	593
As the signal is processed in real-time, the output signal is stored as	589	594	The TCL script is used by Vivado to produce the FPGA bitstream ((2) on figure~\ref{fig:workflow}).
consecutive bursts of data for post-processing, mainly assessing the consistency of the	590	595	We use the 2018.2 version of Xilinx Vivado and we execute the synthesized
implemented FIR cascade transfer function with the design criteria and the expected	591	596	bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series
transfer function.	592	597	FPGA (xc7z010clg400-1) and two LTC2145 14-bit 125~MS/s ADCs, loaded with 50~$\Omega$ resistors to
	593	598	provide a broadband noise source.
The TCL script is used by Vivado to produce the FPGA bitstream ((2) on figure~\ref{fig:workflow}).	594	599	The board runs the Linux kernel and surrounding environment produced from the
We use the 2018.2 version of Xilinx Vivado and we execute the synthesized	595	600	Buildroot framework available at \url{https://github.com/trabucayre/redpitaya/}: configuring
bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series	596	601	the Zynq FPGA, feeding the FIR with the set of coefficients, executing the simulation and
FPGA (xc7z010clg400-1) and two LTC2145 14-bit 125~MS/s ADCs, loaded with 50~$\Omega$ resistors to	597	602	fetching the results are automated.
provide a broadband noise source.	598	603
The board runs the Linux kernel and surrounding environment produced from the	599	604	The deploy script uploads the bitstream to the board ((3) on
Buildroot framework available at \url{https://github.com/trabucayre/redpitaya/}: configuring	600	605	figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers,
the Zynq FPGA, feeding the FIR with the set of coefficients, executing the simulation and	601	606	and configures the coefficients of the FIR filters. It then waits for the results
fetching the results are automated.	602	607	and retrieves the data to the main computer ((4) on figure~\ref{fig:workflow}).
	603	608
The deploy script uploads the bitstream to the board ((3) on	604	609	Finally, an Octave post-processing script computes the final results thanks to
figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers,	605	610	the output data ((5) on figure~\ref{fig:workflow}).
and configures the coefficients of the FIR filters. It then waits for the results	606	611	The results are normalized so that the Power Spectral Density (PSD) starts at zero
and retrieves the data to the main computer ((4) on figure~\ref{fig:workflow}).	607	612	and the different configurations can be compared.
	608	613
Finally, an Octave post-processing script computes the final results thanks to	609	614	\section{Maximizing the rejection at fixed silicon area}
the output data ((5) on figure~\ref{fig:workflow}).	610	615	\label{sec:fixed_area}
The results are normalized so that the Power Spectral Density (PSD) starts at zero	611	616	This section presents the output of the filter solver {\em i.e.} the computed
and the different configurations can be compared.	612	617	configurations for each stage, the computed rejection and the computed silicon area.
	613	618	Such results allow for understanding the choices made by the solver to compute its solutions.
\section{Maximizing the rejection at fixed silicon area}	614	619
\label{sec:fixed_area}	615	620	The experimental setup is composed of three cases. The raw input is generated
This section presents the output of the filter solver {\em i.e.} the computed	616	621	by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.
configurations for each stage, the computed rejection and the computed silicon area.	617	622	Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500
Such results allow for understanding the choices made by the solver to compute its solutions.	618	623	arbitrary units. Hence, the three cases have been named: MAX/500, MAX/1000, MAX/1500.
	619	624	The number of configurations $p$ is 1133, with $C_i$ ranging from 3 to 60 and $\pi^C$
The experimental setup is composed of three cases. The raw input is generated	620	625	ranging from 2 to 22. In each case, the quadratic program has been able to give a
by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.	621	626	result up to five stages ($n = 5$) in the cascaded filter.
Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500	622	627
arbitrary units. Hence, the three cases have been named: MAX/500, MAX/1000, MAX/1500.	623	628	Table~\ref{tbl:gurobi_max_500} shows the results obtained by the filter solver for MAX/500.
The number of configurations $p$ is 1133, with $C_i$ ranging from 3 to 60 and $\pi^C$	624	629	Table~\ref{tbl:gurobi_max_1000} shows the results obtained by the filter solver for MAX/1000.
ranging from 2 to 22. In each case, the quadratic program has been able to give a	625	630	Table~\ref{tbl:gurobi_max_1500} shows the results obtained by the filter solver for MAX/1500.
result up to five stages ($n = 5$) in the cascaded filter.	626	631
	627	632	\renewcommand{\arraystretch}{1.4}
Table~\ref{tbl:gurobi_max_500} shows the results obtained by the filter solver for MAX/500.	628	633
Table~\ref{tbl:gurobi_max_1000} shows the results obtained by the filter solver for MAX/1000.	629	634	\begin{table}
Table~\ref{tbl:gurobi_max_1500} shows the results obtained by the filter solver for MAX/1500.	630	635	\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}
	631	636	\label{tbl:gurobi_max_500}
\renewcommand{\arraystretch}{1.4}	632	637	\centering
	633	638	{\color{red}
		639	\scalefont{0.77}
\begin{table}	634	640	\begin{tabular}{|c|ccccc|c|c|}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}	635	641	\hline
\label{tbl:gurobi_max_500}	636	642	$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\centering	637	643	\hline
{\color{red}	638	644	1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\
\scalefont{0.77}	639	645	2 & (3, 5, 18) & (33, 10, 0) & - & - & - & 48~dB & 492 \\
\begin{tabular}{|c|ccccc|c|c|}	640	646	3 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\
\hline	641	647	4 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\	642	648	5 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\
\hline	643	649	\hline
1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\	644	650	\end{tabular}
2 & (3, 5, 18) & (33, 10, 0) & - & - & - & 48~dB & 492 \\	645	651	}
3 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\	646	652	\end{table}
4 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\	647	653
5 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\	648	654	\begin{table}
\hline	649	655	\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}
\end{tabular}	650	656	\label{tbl:gurobi_max_1000}
}	651	657	\centering
\end{table}	652	658	{\color{red}\scalefont{0.77}
	653	659	\begin{tabular}{|c|ccccc|c|c|}
\begin{table}	654	660	\hline
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}	655	661	$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\label{tbl:gurobi_max_1000}	656	662	\hline
\centering	657	663	1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\
{\color{red}\scalefont{0.77}	658	664	2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\
\begin{tabular}{|c|ccccc|c|c|}	659	665	3 & (3, 13, 26) & (31, 9, 1) & (27, 9, 0) & - & - & 92~dB & 999 \\
\hline	660	666	4 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\	661	667	5 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\
\hline	662	668	\hline
1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\	663	669	\end{tabular}
2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\	664	670	}
3 & (3, 13, 26) & (31, 9, 1) & (27, 9, 0) & - & - & 92~dB & 999 \\	665	671	\end{table}
4 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\	666	672
5 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\	667	673	\begin{table}
\hline	668	674	\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}
\end{tabular}	669	675	\label{tbl:gurobi_max_1500}
}	670	676	\centering
\end{table}	671	677	{\color{red}\scalefont{0.77}
	672	678	\begin{tabular}{|c|ccccc|c|c|}
\begin{table}	673	679	\hline
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}	674	680	$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\label{tbl:gurobi_max_1500}	675	681	\hline
\centering	676	682	1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\
{\color{red}\scalefont{0.77}	677	683	2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 102~dB & 1489 \\
\begin{tabular}{|c|ccccc|c|c|}	678	684	3 & (15, 9, 18) & (31, 8, 0) & (27, 9, 0) & - & - & 116~dB & 1488 \\
\hline	679	685	4 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\	680	686	5 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\
\hline	681	687	\hline
1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\	682	688	\end{tabular}
2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 102~dB & 1489 \\	683	689	}
3 & (15, 9, 18) & (31, 8, 0) & (27, 9, 0) & - & - & 116~dB & 1488 \\	684	690	\end{table}
4 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\	685	691
5 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\	686	692	\renewcommand{\arraystretch}{1}
\hline	687	693
\end{tabular}	688	694	% From these tables, we can first state that the more stages are used to define
}	689	695	% the cascaded FIR filters, the better the rejection.
		696	{\color{red} From these tables, we can first state that we reach an optimal solution
		697	for each case: $n = 3$ for MAX/500 and $n = 4$ for MAX/1000 and MAX/1500. Moreover,
		698	the cascaded filters always perform better than the monolithic solution.}
		699	It was an expected result as it has
\end{table}	690	700	been previously observed that many small filters are better than
	691	701	a single large filter \cite{lim_1988, lim_1996, young_1992}, despite such conclusions
\renewcommand{\arraystretch}{1}	692	702	being hardly used in practice due to the lack of tools for identifying individual filter
	693	703	coefficients in the cascaded approach.
% From these tables, we can first state that the more stages are used to define	694	704
% the cascaded FIR filters, the better the rejection.	695	705	Second, the larger the silicon area, the better the rejection. This was also an
{\color{red} From these tables, we can first state that we reach an optimal solution	696	706	expected result as more area means a filter of better quality with more coefficients
for each case: $n = 3$ for MAX/500 and $n = 4$ for MAX/1000 and MAX/1500. Moreover,	697	707	or more bits per coefficient.
the cascaded filters always perform better than the monolithic solution.}	698	708
It was an expected result as it has	699	709	Then, we also observe that the first stage can have a larger shift than the other
been previously observed that many small filters are better than	700	710	stages. This is explained by the fact that the solver tries to use just enough
a single large filter \cite{lim_1988, lim_1996, young_1992}, despite such conclusions	701	711	bits for the computed rejection after each stage. In the first stage, a
being hardly used in practice due to the lack of tools for identifying individual filter	702	712	balance between a strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}
coefficients in the cascaded approach.	703	713	gives the relation between both values.
	704	714
Second, the larger the silicon area, the better the rejection. This was also an	705	715	Finally, we note that the solver consumes all the given silicon area.
expected result as more area means a filter of better quality with more coefficients	706	716
or more bits per coefficient.	707	717	The following graphs present the rejection for real data on the FPGA. In all the following
	708	718	figures, the solid line represents the actual rejection of the filtered
Then, we also observe that the first stage can have a larger shift than the other	709	719	data on the FPGA as measured experimentally and the dashed line shows the noise levels
stages. This is explained by the fact that the solver tries to use just enough	710	720	given by the quadratic solver. The configurations are those computed in the previous section.
bits for the computed rejection after each stage. In the first stage, a	711	721
balance between a strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}	712	722	Figure~\ref{fig:max_500_result} shows the rejection of the different configurations in the case of MAX/500.
gives the relation between both values.	713	723	Figure~\ref{fig:max_1000_result} shows the rejection of the different configurations in the case of MAX/1000.
	714	724	Figure~\ref{fig:max_1500_result} shows the rejection of the different configurations in the case of MAX/1500.
Finally, we note that the solver consumes all the given silicon area.	715	725
	716	726	% \begin{figure}
The following graphs present the rejection for real data on the FPGA. In all the following	717	727	% \centering
figures, the solid line represents the actual rejection of the filtered	718	728	% \includegraphics[width=\linewidth]{images/max_500}
data on the FPGA as measured experimentally and the dashed line shows the noise levels	719	729	% \caption{Signal spectrum for MAX/500}
given by the quadratic solver. The configurations are those computed in the previous section.	720	730	% \label{fig:max_500_result}
	721	731	% \end{figure}
Figure~\ref{fig:max_500_result} shows the rejection of the different configurations in the case of MAX/500.	722	732	%
Figure~\ref{fig:max_1000_result} shows the rejection of the different configurations in the case of MAX/1000.	723	733	% \begin{figure}
Figure~\ref{fig:max_1500_result} shows the rejection of the different configurations in the case of MAX/1500.	724	734	% \centering
	725	735	% \includegraphics[width=\linewidth]{images/max_1000}
% \begin{figure}	726	736	% \caption{Signal spectrum for MAX/1000}
% \centering	727	737	% \label{fig:max_1000_result}
% \includegraphics[width=\linewidth]{images/max_500}	728	738	% \end{figure}
% \caption{Signal spectrum for MAX/500}	729	739	%
% \label{fig:max_500_result}	730	740	% \begin{figure}
% \end{figure}	731	741	% \centering
%	732	742	% \includegraphics[width=\linewidth]{images/max_1500}
% \begin{figure}	733	743	% \caption{Signal spectrum for MAX/1500}
% \centering	734	744	% \label{fig:max_1500_result}
% \includegraphics[width=\linewidth]{images/max_1000}	735	745	% \end{figure}
% \caption{Signal spectrum for MAX/1000}	736	746
% \label{fig:max_1000_result}	737	747	% r2.14 et r2.15 et r2.16
% \end{figure}	738	748	\begin{figure}
%	739	749	\centering
% \begin{figure}	740	750	\begin{subfigure}{\linewidth}
% \centering	741	751	\includegraphics[width=\linewidth]{images/max_500}
% \includegraphics[width=\linewidth]{images/max_1500}	742	752	\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
% \caption{Signal spectrum for MAX/1500}	743	753	the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).}
% \label{fig:max_1500_result}	744	754	\label{fig:max_500_result}
% \end{figure}	745	755	\end{subfigure}
	746	756
% r2.14 et r2.15 et r2.16	747	757	\begin{subfigure}{\linewidth}
\begin{figure}	748	758	\includegraphics[width=\linewidth]{images/max_1000}
\centering	749	759	\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
\begin{subfigure}{\linewidth}	750	760	the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).}
\includegraphics[width=\linewidth]{images/max_500}	751	761	\label{fig:max_1000_result}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving	752	762	\end{subfigure}
the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).}	753	763
\label{fig:max_500_result}	754	764	\begin{subfigure}{\linewidth}
\end{subfigure}	755	765	\includegraphics[width=\linewidth]{images/max_1500}
	756	766	\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
\begin{subfigure}{\linewidth}	757	767	the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).}
\includegraphics[width=\linewidth]{images/max_1000}	758	768	\label{fig:max_1500_result}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving	759	769	\end{subfigure}
the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).}	760	770	\caption{\color{red}Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing
\label{fig:max_1000_result}	761	771	rejection for a given resource allocation.
\end{subfigure}	762	772	The filter shape constraint (bandpass and bandstop) is shown as thick
	763	773	horizontal lines on each chart.}
\begin{subfigure}{\linewidth}	764	774	\end{figure}
\includegraphics[width=\linewidth]{images/max_1500}	765	775
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving	766	776	In all cases, we observe that the actual rejection is close to the rejection computed by the solver.
the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).}	767	777
\label{fig:max_1500_result}	768	778	We compare the actual silicon resources given by Vivado to the
\end{subfigure}	769	779	resources in arbitrary units.
\caption{\color{red}Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing	770	780	The goal is to check that our arbitrary units of silicon area model well enough
rejection for a given resource allocation.	771	781	the real resources on the FPGA. In particular, we want to verify that, for a given
The filter shape constraint (bandpass and bandstop) is shown as thick	772	782	number of arbitrary units, the actual silicon resources do not depend on the
horizontal lines on each chart.}	773	783	number of stages $n$. Most significantly, our approach aims
\end{figure}	774	784	at remaining far enough from the practical logic gate implementation used by
	775	785	various vendors to remain platform independent and be portable from one
In all cases, we observe that the actual rejection is close to the rejection computed by the solver.	776	786	architecture to another.
	777	787
We compare the actual silicon resources given by Vivado to the	778	788	Table~\ref{tbl:resources_usage} shows the resource usage in the case of MAX/500, MAX/1000 and
resources in arbitrary units.	779	789	MAX/1500 \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000
The goal is to check that our arbitrary units of silicon area model well enough	780	790	and 1500 arbitrary units. We have taken care to extract solely the resources used by
the real resources on the FPGA. In particular, we want to verify that, for a given	781	791	the FIR filters and remove additional processing blocks including FIFO and Programmable
number of arbitrary units, the actual silicon resources do not depend on the	782	792	Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication.
number of stages $n$. Most significantly, our approach aims	783	793
at remaining far enough from the practical logic gate implementation used by	784	794	\begin{table}[h!tb]
various vendors to remain platform independent and be portable from one	785	795	\caption{Resource occupation following synthesis of the solutions found for
architecture to another.	786	796	the problem of maximizing rejection for a given resource allocation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
	787	797	\label{tbl:resources_usage}
		798	\color{red}
Table~\ref{tbl:resources_usage} shows the resource usage in the case of MAX/500, MAX/1000 and	788	799	\centering
MAX/1500 \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000	789	800	\begin{tabular}{|c|c|ccc|c|}
and 1500 arbitrary units. We have taken care to extract solely the resources used by	790	801	\hline
the FIR filters and remove additional processing blocks including FIFO and Programmable	791	802	$n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline
Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication.	792	803	& LUT & 249 & 453 & 627 & \emph{17600} \\
	793	804	1 & BRAM & 1 & 1 & 1 & \emph{120} \\
\begin{table}[h!tb]	794	805	& DSP & 21 & 37 & 47 & \emph{80} \\ \hline
\caption{Resource occupation following synthesis of the solutions found for	795	806	& LUT & 2253 & 474 & 691 & \emph{17600} \\
the problem of maximizing rejection for a given resource allocation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}	796	807	2 & BRAM & 2 & 2 & 2 & \emph{120} \\
\label{tbl:resources_usage}	797	808	& DSP & 0 & 50 & 70 & \emph{80} \\ \hline
\color{red}	798	809	& LUT & 1329 & 2006 & 3158 & \emph{17600} \\
\centering	799	810	3 & BRAM & 3 & 3 & 3 & \emph{120} \\
\begin{tabular}{|c|c|ccc|c|}	800	811	& DSP & 15 & 30 & 42 & \emph{80} \\ \hline
\hline	801	812	& LUT & 1329 & 1600 & 2260 & \emph{17600} \\
$n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline	802	813	4 & BRAM & 3 & 4 & 4 & \emph{120} \\
& LUT & 249 & 453 & 627 & \emph{17600} \\	803	814	& DPS & 15 & 38 & 49 & \emph{80} \\ \hline
1 & BRAM & 1 & 1 & 1 & \emph{120} \\	803	814	& DSP & 15 & 38 & 49 & \emph{80} \\ \hline
& DSP & 21 & 37 & 47 & \emph{80} \\ \hline	805	816	5 & BRAM & 3 & 4 & 4 & \emph{120} \\
& LUT & 2253 & 474 & 691 & \emph{17600} \\	806	817	& DPS & 15 & 38 & 49 & \emph{80} \\ \hline
2 & BRAM & 2 & 2 & 2 & \emph{120} \\	806	817	& DSP & 15 & 38 & 49 & \emph{80} \\ \hline
& DSP & 0 & 50 & 70 & \emph{80} \\ \hline	808	819	\end{table}
& LUT & 1329 & 2006 & 3158 & \emph{17600} \\	809	820
3 & BRAM & 3 & 3 & 3 & \emph{120} \\	810	821	{\color{red} In case $n = 2$ for MAX/500}, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that,
& DSP & 15 & 30 & 42 & \emph{80} \\ \hline	811	822	when the filter coefficients are small enough, or when the input size is small
& LUT & 1329 & 1600 & 2260 & \emph{17600} \\	812	823	enough, Vivado optimizes resource consumption by selecting multiplexers to
4 & BRAM & 3 & 4 & 4 & \emph{120} \\	813	824	implement the multiplications instead of a DSP. In this case, it is quite difficult
& DPS & 15 & 38 & 49 & \emph{80} \\ \hline	814	825	to compare the whole silicon budget.
& LUT & 1329 & 1600 & 2260 & \emph{17600} \\	815	826
5 & BRAM & 3 & 4 & 4 & \emph{120} \\	816	827	However, a rough estimation can be made with a simple equivalence: looking at
& DPS & 15 & 38 & 49 & \emph{80} \\ \hline	817	828	the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,
\end{tabular}	818	829	we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon
\end{table}	819	830	area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs,
	820	831	1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond
{\color{red} In the case $n = 2$ for MAX/500}, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that,
when the filter coefficients are small enough, or when the input size is small
enough, Vivado optimizes resource consumption by selecting multiplexers to
implement the multiplications instead of a DSP. In this case, it is quite difficult
to compare the whole silicon budget.

However, a rough estimation can be made with a simple equivalence: looking at
the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,
we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon
area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs,
1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond
to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary
units map well to actual hardware resources. The relatively small differences can probably be explained
by the optimizations done by Vivado based on the detailed map of available processing resources.
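
As an illustrative check of this equivalence, the LUT-equivalent footprint of the
three-stage ($n = 3$) solutions of Table~\ref{tbl:resources_usage} can be estimated by
counting each DSP as 100~LUTs. This is only a sketch: it assumes the rough
1~DSP~$\approx$~100~LUTs figure stated above and ignores BRAM usage.
\[
\begin{array}{lrcl}
\mbox{MAX/500:}  & 1329 + 100 \times 15 & = & 2829~\mbox{LUTs} \\
\mbox{MAX/1000:} & 2006 + 100 \times 30 & = & 5006~\mbox{LUTs} \\
\mbox{MAX/1500:} & 3158 + 100 \times 42 & = & 7358~\mbox{LUTs}
\end{array}
\]
These values are of the same order as the 2500, 5000 and 7300~LUTs quoted above.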

We now present the computation time needed to solve the quadratic problem.
For each case, the filter solver software is executed on an Intel(R) Xeon(R) CPU E5606
clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve
the quadratic problem. Table~\ref{tbl:area_time} shows the time needed to solve the quadratic
problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units.

\begin{table}[h!tb]
\caption{Time needed to solve the quadratic program with Gurobi}
\label{tbl:area_time}
\centering
\color{red}
\begin{tabular}{|c|c|c|c|}\hline
$n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline
1 & 0.01~s & 0.02~s & 0.03~s \\
2 & 0.1~s & 1~s & 2~s \\
3 & 5~s & 27~s & 351~s ($\approx$ 6~min) \\
4 & 4~s & 141~s ($\approx$ 3~min) & 1134~s ($\approx$ 18~min) \\
5 & 6~s & 630~s ($\approx$ 10~min) & 49400~s ($\approx$ 13~h) \\\hline
\end{tabular}
\end{table}

As expected, the computation time seems to rise exponentially with the number of stages.
When the area is more constrained, the design exploration space is smaller and the solver is able to
find an optimal solution faster.
{\color{red} We can also notice that solving for $n$ greater than the optimal $n$
takes more time than solving for the optimal one. This can be explained by the larger
search space: more time is needed to ensure that the previous solution (from the
smaller value of $n$) is still the optimal solution.}

\subsection{Minimizing resource occupation at fixed rejection}\label{sec:fixed_rej}

This section presents the results of the complementary quadratic program aimed at
minimizing the area occupation for a targeted rejection level.

The experimental setup is composed of four cases. The raw input is the same
as in the previous section, from a PRN generator, which fixes the input data size $\Pi^I$.
Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60, 80 or 100~dB.
Hence, the four cases have been named: MIN/40, MIN/60, MIN/80 and MIN/100.
The number of configurations $p$ is the same as in the previous section.

Table~\ref{tbl:gurobi_min_40} shows the results obtained by the filter solver for MIN/40.
Table~\ref{tbl:gurobi_min_60} shows the results obtained by the filter solver for MIN/60.
Table~\ref{tbl:gurobi_min_80} shows the results obtained by the filter solver for MIN/80.
Table~\ref{tbl:gurobi_min_100} shows the results obtained by the filter solver for MIN/100.

\renewcommand{\arraystretch}{1.4}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40}
\label{tbl:gurobi_min_40}
\centering
{\scalefont{0.77}\color{red}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\
2 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
3 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
4 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
5 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60}
\label{tbl:gurobi_min_60}
\centering
{\scalefont{0.77}\color{red}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\
2 & (15, 6, 16) & (23, 9, 0) & - & - & - & 60~dB & 675 \\
3 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\
4 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\
5 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80}
\label{tbl:gurobi_min_80}
\centering
{\scalefont{0.77}\color{red}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\
2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\
3 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\
4 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\
5 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/100}
\label{tbl:gurobi_min_100}
\centering
{\scalefont{0.77}\color{red}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & - & - & - & - & - & - & - \\
2 & (27, 9, 15) & (35, 11, 0) & - & - & - & 100~dB & 1410 \\
3 & (3, 5, 18) & (35, 11, 1) & (27, 9, 0) & - & - & 100~dB & 1147 \\
4 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\
5 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\
\hline
\end{tabular}
}
\end{table}
\renewcommand{\arraystretch}{1}

From these tables, we can first state that almost all configurations reach the targeted rejection
level or even better thanks to our underestimate of the cascade rejection as the sum of the
individual filter rejections. The only exception is for the monolithic case ($n = 1$) in
MIN/100: no solution is found for a single monolithic filter reaching a 100~dB rejection.
Furthermore, the area of the monolithic filter is almost twice as big as that of the two cascaded filters
{\color{red}(1131 vs.\ 675 arbitrary units for 60~dB and 1760 vs.\ 990 arbitrary units for 80~dB
rejection)}. More generally, the more filters are cascaded, the lower the occupied area.
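
As a quick check of this factor, the area ratios between the monolithic ($n = 1$) and
two-stage ($n = 2$) solutions can be read directly from the Area columns of
Tables~\ref{tbl:gurobi_min_40}, \ref{tbl:gurobi_min_60} and \ref{tbl:gurobi_min_80}
(rounded to two decimals):
\[
\begin{array}{lrcl}
\mbox{MIN/40:} & 648 / 360   & =       & 1.80 \\
\mbox{MIN/60:} & 1131 / 675  & \approx & 1.68 \\
\mbox{MIN/80:} & 1760 / 990  & \approx & 1.78
\end{array}
\]
The monolithic filter is thus roughly 1.7 to 1.8 times larger than the two-stage cascade.
MIN/100 is omitted since it has no monolithic solution.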

As in the previous section, the solver always chooses a small filter as the first
filter stage, and the second one is often the largest filter. This choice can be explained
as in the previous section, with the solver using just enough bits not to degrade the input
signal and in the second filter selecting a better filter to improve rejection without
having too many bits in the output data.

{\color{red} For each case, we found an optimal solution with $n < 5$: for MIN/40 $n=2$,
for MIN/60 and MIN/80 $n = 3$ and for MIN/100 $n = 4$. In all cases, the solutions
for $n$ greater than the optimal $n$ remain identical to the optimal one.}
% For the specific case of MIN/40 for $n = 5$ the solver has determined that the optimal
% number of filters is 4 so it did not choose any configuration for the last filter. Hence this
% solution is equivalent to the result for $n = 4$.

The following graphs present the rejection for real data on the FPGA. In all the following
figures, the solid line represents the actual rejection of the filtered
data on the FPGA as measured experimentally and the dashed line is the noise level
given by the quadratic solver.

Figure~\ref{fig:min_40} shows the rejection of the different configurations in the case of MIN/40.
Figure~\ref{fig:min_60} shows the rejection of the different configurations in the case of MIN/60.
Figure~\ref{fig:min_80} shows the rejection of the different configurations in the case of MIN/80.
Figure~\ref{fig:min_100} shows the rejection of the different configurations in the case of MIN/100.

% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_40}
% \caption{Signal spectrum for MIN/40}
% \label{fig:min_40}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_60}
% \caption{Signal spectrum for MIN/60}
% \label{fig:min_60}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_80}
% \caption{Signal spectrum for MIN/80}
% \label{fig:min_80}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_100}
% \caption{Signal spectrum for MIN/100}
% \label{fig:min_100}
% \end{figure}

% r2.14, r2.15 and r2.16
\begin{figure}
\centering
\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_40}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MIN/40 problem of minimizing resource allocation for reaching a 40~dB rejection.}
\label{fig:min_40}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_60}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MIN/60 problem of minimizing resource allocation for reaching a 60~dB rejection.}
\label{fig:min_60}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_80}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MIN/80 problem of minimizing resource allocation for reaching an 80~dB rejection.}
\label{fig:min_80}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_100}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MIN/100 problem of minimizing resource allocation for reaching a 100~dB rejection.}
\label{fig:min_100}
\end{subfigure}
\caption{\color{red}Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a
given rejection while minimizing resource allocation. The filter shape constraint (bandpass and
bandstop) is shown as thick
horizontal lines on each chart.}
\end{figure}

We observe that all rejections given by the quadratic solver are close to the experimentally
measured rejection. All curves confirm that the constraint to reach the target rejection is
respected with both monolithic (except in MIN/100 which has no monolithic solution) and cascaded filters.

Table~\ref{tbl:resources_usage_comp} shows the resource usage in the case of MIN/40, MIN/60,
MIN/80 and MIN/100, \emph{i.e.} when the target rejection is fixed to 40, 60, 80 and 100~dB. We
have taken care to extract solely the resources used by
the FIR filters and remove additional processing blocks including FIFO and PL to
PS communication.

\renewcommand{\arraystretch}{1.2}
\begin{table}
\caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
\label{tbl:resources_usage_comp}
\centering
{\scalefont{0.90}\color{red}
\begin{tabular}{|c|c|cccc|c|}
\hline
$n$ & & MIN/40 & MIN/60 & MIN/80 & MIN/100 & \emph{Zynq 7010} \\ \hline\hline
& LUT & 343 & 334 & 772 & - & \emph{17600} \\
1 & BRAM & 1 & 1 & 1 & - & \emph{120} \\
& DSP & 27 & 39 & 55 & - & \emph{80} \\ \hline
& LUT & 1664 & 2329 & 474 & 620 & \emph{17600} \\
2 & BRAM & 2 & 2 & 2 & 2 & \emph{120} \\
& DSP & 0 & 15 & 50 & 62 & \emph{80} \\ \hline
& LUT & 1664 & 3114 & 1884 & 2873 & \emph{17600} \\
3 & BRAM & 2 & 3 & 3 & 3 & \emph{120} \\
& DSP & 0 & 0 & 22 & 27 & \emph{80} \\ \hline
& LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\
4 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\
& DSP & 0 & 15 & 19 & 19 & \emph{80} \\ \hline
& LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\
5 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\
& DSP & 0 & 0 & 19 & 19 & \emph{80} \\ \hline
\end{tabular}
}
\end{table}
\renewcommand{\arraystretch}{1}

If we keep the previous estimation of the cost of one DSP in terms of LUTs (1 DSP $\approx$ 100 LUTs),
the real resource consumption decreases as a function of the number of stages in the cascaded
filter according
to the solution given by the quadratic solver. Indeed, the consumption always decreases,
even if the difference between the monolithic and the two cascaded
filters is less than expected.

Finally, Table~\ref{tbl:area_time_comp} shows the computation time to solve
the quadratic program.

\renewcommand{\arraystretch}{1.2}
\begin{table}[h!tb]
\caption{Time to solve the quadratic program with Gurobi}
\label{tbl:area_time_comp}
\centering
{\scalefont{0.90}\color{red}
\begin{tabular}{|c|c|c|c|c|}\hline
$n$ & Time (MIN/40) & Time (MIN/60) & Time (MIN/80) & Time (MIN/100) \\\hline\hline
1 & 0.04~s & 0.01~s & 0.01~s & - \\
2 & 2.7~s & 2.4~s & 2.4~s & 0.8~s \\
3 & 4.6~s & 7~s & 7~s & 18~s \\
4 & 3~s & 22~s & 70~s & 220~s ($\approx$ 3~min) \\
5 & 5~s & 122~s & 200~s & 384~s ($\approx$ 5~min) \\\hline
\end{tabular}
}
\end{table}
\renewcommand{\arraystretch}{1}

The time needed to solve this configuration is significantly shorter than the time	1122	1141	\renewcommand{\arraystretch}{1.2}
needed in the previous section. Indeed the worst time in this case is only {\color{red}5~minutes,	1123	1142	\begin{table}
compared to 13~hours} in the previous section: this problem is more easily solved than the	1124	1143	\centering
previous one.	1125	1144	\caption{Resource consumption compared between the FIR Compiler from Xilinx and our FIR block}
	1126	1145	\label{tbl:xilinx_resources}
To conclude, we compare our monolithic filters with the FIR Compiler provided by	1127	1146	\begin{tabular}{\|c\|c\|c\|c\|c\|c\|c\|}
Xilinx in the Vivado software suite (v.2018.2). For each experiment we use the	1128	1147	\hline
same coefficient set and we compare the resource consumption, having checked that	1129	1148	\multirow{2}{*}{} & \multicolumn{3}{c\|}{Xilinx} & \multicolumn{3}{c\|}{Our FIR block} \\ \cline{2-7}
the transfer functions are indeed the same with both implementations.	1130	1149	& LUT & BRAM & DSP & LUT & BRAM & DSP \\ \hline
Table~\ref{tbl:xilinx_resources} exhibits the results.
The FIR Compiler never uses BRAM while our filter implementation uses one block. This difference
is explained by our wish to have a dynamically reconfigurable FIR filter whose
coefficients can be updated from the processing system without having to update the FPGA design.
With the FIR Compiler, the coefficients are defined during the FPGA design so that
changing coefficients requires generating a new design. The difference in LUT consumption
is also attributed to the reconfigurability logic. However the DSP consumption, the scarcest
resource, is the same between the Xilinx FIR Compiler and
our FIR block: we hence conclude that, for this resource, our solutions are as good as the Xilinx implementation.

\renewcommand{\arraystretch}{1.2}
\begin{table}
\centering
\caption{Resource consumption compared between the FIR Compiler from Xilinx and our FIR block}
\label{tbl:xilinx_resources}
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
\multirow{2}{*}{} & \multicolumn{3}{c|}{Xilinx} & \multicolumn{3}{c|}{Our FIR block} \\ \cline{2-7}
& LUT & BRAM & DSP & LUT & BRAM & DSP \\ \hline
MAX/500 & 177 & 0 & 21 & 249 & 1 & 21 \\ \hline
MAX/1000 & 306 & 0 & 37 & 453 & 1 & 37 \\ \hline
MAX/1500 & 418 & 0 & 47 & 627 & 1 & 47 \\ \hline
MIN/40 & 225 & 0 & 27 & 347 & 1 & 27 \\ \hline
MIN/60 & 322 & 0 & 39 & 334 & 1 & 39 \\ \hline
MIN/80 & 482 & 0 & 55 & 772 & 1 & 55 \\ \hline
\end{tabular}
\end{table}
\renewcommand{\arraystretch}{1}

\section{Conclusion}

We have proposed a new approach to optimize a set of signal processing blocks whose performance
and resource consumption have been tabulated, and applied this methodology to the practical
case of implementing cascaded FIR filters inside an FPGA.
This method aims to be hardware independent and focuses on a high level of abstraction.
We have modeled the FIR filter operation and the impact of data shift. Thanks to this model,
we have created a quadratic program to select the optimal FIR taps to reach a targeted
rejection. Individual filter taps have been identified using commonly available tools and the
emphasis is on FIR assembly rather than individual FIR coefficient identification.

Our experimental results are very promising in providing a rational approach to selecting
the coefficients of each FIR filter in the context of a performance target for a chain of
such filters. The FPGA design that is produced automatically by the proposed
workflow is able to filter an input signal as expected, experimentally validating our model and our approach.
The quadratic program can be adapted to another problem based on assembling skeleton blocks.

A perspective is to model and add the decimators to the processing chain to have a classical
FIR filter and decimator. The impact of the decimator is not trivial, especially in terms of silicon
area usage for subsequent stages since some hardware optimization can be applied in
this case.

The software used to demonstrate the concepts developed in this paper is based on the
CPU-FPGA co-design framework available at \url{https://github.com/oscimp/oscimpDigital}.

\section*{Acknowledgement}

This work is supported by the ANR Programme d'Investissement d'Avenir in
progress at the Time and Frequency Departments of the FEMTO-ST Institute
(Oscillator IMP, First-TF and Refimeve+), and by R\'egion de Franche-Comt\'e.
The authors would like to thank E. Rubiola, F. Vernotte, and G. Cabodevila
for support and fruitful discussions.

\bibliographystyle{IEEEtran}
\balance
\bibliography{references,biblio}
\end{document}

ifcs2018_journal_reponse2.tex


% MANUSCRIPT NO. TUFFC-09469-2019.R1
% MANUSCRIPT TYPE: Papers
% TITLE: Filter optimization for real time digital processing of radiofrequency signals: application to oscillator metrology
% AUTHOR(S): HUGEAT, Arthur; BERNARD, Julien; Goavec-Mérou, Gwenhaël; Bourgeois, Pierre-Yves; Friedt, Jean-Michel

\documentclass[a4paper]{article}
\usepackage[english]{babel}
\usepackage{fullpage,graphicx,amsmath, subcaption}
\begin{document}
\begin{center}
{\bf\Large
Rebuttal letter to the review \#2 of the manuscript entitled

``Filter optimization for real time digital processing of radiofrequency
signals: application to oscillator metrology''
}

by A. Hugeat et al.
\end{center}

%
% REVIEWERS' COMMENTS:
% Reviewer: 1
%
% Comments to the Author
% The Authors have implemented all Reviewers’ remarks except the one related to the criterion that, in my opinion, is the most important one. By considering ``the minimal rejection within the stopband, to which the sum of the absolute values within the passband is subtracted to avoid filters with excessive ripples, normalized to the bin width to remain consistent with the passband criterion (dBc/Hz units in all cases)'' (please, find a way to state criterions more clearly), the Authors get filters with very different behaviors in pass band and, consequently, their comparison loses its meaning.
% In practice, the Authors use a good method based on a bad criterion, and this point weakens a lot the results they present.
% In phase noise metrology, the target is an uncertainty of 1 dB, even less. In this regard, I would personally use a maximum ripple in pass band of 1 dB (or less), while, in some cases, the filters presented in the Manuscript exceed 10 dB of ripple, which is definitely too much.
% The Authors seem to be reactive in redoing the measures and it does not seem a big problem for them to re-run the analysis with a better criterion. The article would gain a lot, because, in addition to the methodology, the reader could understand if it is actually better to put a cascade of small filters rather than a single large filter that is an interesting point.
% To help the Authors in finding a better criterion (``…finding a better criterion to avoid the ripples in the passband is challenging...''), in addition to the minimum rejection in stop band, I suggest to specify also the maximum ripple in pass band as it is done, for example, in fig. 4.10, pg. 146 of Crochierie R. E. and Rabiner L. R. (1983) ``Multirate Digital Signal Processing'', Prentice-Hall (see attach). This suggestion, in practice, specify the maximum allowed deviation from the transfer function modulus of an ideal filter: 1 in pass band and 0 in stop band. As a result, it should solve one of the Authors’ concerns: ``Selecting a strong constraint such as the sum of absolute values in the passband is too selective because it considers all frequency bins in the passband while the stopband criterion is limited to a single bin at which rejection is poorest…'' since both pass and stop bands are considered in the same way.
%
% I understand that the Manuscript is devoted to present a methodology (``In this article we focus on the methodology, so even if our criterion could be improved, our methodology still remains and works independently of rejection criterion.''). Please, remember that a methodology is a solution to a class of problems and the example chosen to present the methodology plays a key role in showing to the reader if the method is valid or not. Here the example problem is represented by the synthesis of a decimation filter to be used in phase noise metrology. Many of the filters presented by the Authors in figures 9 and 10 as the output of this methodology are not suitable to be used in this context, since, for example, some of them have an attenuation as high as 50 dB in DC (!) that poses severe problems in interpreting the phase noise power spectral densities. What is the cause of this fail? The methodology or the criterion?

{\bf
In my opinion, it is mandatory to correct the criterion and to re-run the analysis for checking if the methodology works properly or not.
In the end, I suggest to publish the Manuscript After Minor Revisions.
}

We have changed our criterion to be more selective in the passband: a filter whose
response exceeds 1~dB in the passband is now discarded. We have re-run all experiments
and updated the dataset and our conclusion. The methodology provides the
same results, but since fewer filters remain, the optimal solution is found earlier.
Our argument about the time needed to compute the optimal solution is therefore
less valid, since less time is needed, although we also observe that the largest cases
need more time.
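
As an illustrative formalization only, the rejection rule above could be written as follows.
The symbols are notation assumed here and not taken from the manuscript: $H(f)$ is the filter
frequency response normalized to unit passband gain and $f_p$ is the passband edge. The 1~dB
bound is read as a bound on the deviation from the ideal unit gain, which is one possible
interpretation of the reviewer's suggestion:
\[
\max_{0 \le f \le f_p} \bigl| 20 \log_{10} |H(f)| \bigr| > 1\,\mathrm{dB}
\quad\Longrightarrow\quad \text{the filter is discarded.}
\]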
\end{document}