% jfriedt / IFCS2018 article
%
% merge maximizing rejection for a given area vs. minimizing area for a given rejection
% demonstrate how quantization pushes noise towards the high frequencies => 6 dB of
% rejection per bit, and a loss if fewer bits than rejection/6
% develop the linear program including the bit shift
% emphasize that previously the design was synthesizable but not implementable, whereas now we
% implement it and demonstrate that it runs
% gwen: why is the FIR now implementable when it was not, even on the zedboard -> new FIR?
% Gwen: can a real phase noise test bench be built with this FIR, i.e. adding ADC, NCO and mixer
% (zedboard or redpit)

% schema label: check that "arguing for the FIR cascade" has been done

\documentclass[a4paper,journal]{IEEEtran/IEEEtran}
\usepackage{graphicx,color,hyperref}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{algorithm2e}
\usepackage{url,balance}
\usepackage[normalem]{ulem}
\usepackage{tikz}
\usetikzlibrary{positioning,fit}
\usepackage{multirow}
\usepackage{scalefnt}
\usepackage{caption}
\usepackage{subcaption}

% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}
\textheight=26cm
\setlength{\footskip}{30pt}
\pagenumbering{gobble}
\begin{document}
\title{Filter optimization for real time digital processing of radiofrequency signals: application
to oscillator metrology}

\author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2},
G. Goavec-M\'erou\IEEEauthorrefmark{1},
P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}\\
\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }\\
\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\
Email: \{pyb2,jmfriedt\}@femto-st.fr}
}
\maketitle
\thispagestyle{plain}
\pagestyle{plain}
\newtheorem{definition}{Definition}

\begin{abstract}
Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to
radiofrequency signal processing. When applied to oscillator characterization in the context
of ultrastable clocks, stringent filtering requirements are set by spurious signal or
noise rejection needs. Since real time radiofrequency processing must be performed in a
Field Programmable Gate Array (FPGA) to meet timing constraints, we investigate optimization strategies
to design filters meeting rejection characteristics while limiting the hardware resources
required and keeping timing constraints within the targeted measurement bandwidths. The
presented technique is applicable to scheduling any sequence of processing blocks characterized
by a throughput, resource occupation and performance tabulated as a function of configuration
characteristics, as is the case for filters with their coefficients and resolution yielding
rejection and number of multipliers.
\end{abstract}

\begin{IEEEkeywords}
Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter
\end{IEEEkeywords}

\section{Digital signal processing of ultrastable clock signals}

Oscillator phase noise characterization is classically performed by downconverting
the radiofrequency signal to baseband using a saturated mixer,
followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to the carrier. In
a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by
multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}.

\begin{figure}[h!tb]
\begin{center}
\includegraphics[width=.8\linewidth]{images/schema}
\end{center}
\caption{Fully digital oscillator phase noise characterization: the Device Under Test
(DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and
downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals
and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite
Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays
the spectral characteristics of the phase fluctuations.}
\label{schema}
\end{figure}

As with the analog mixer,
the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as
well as the generation of the frequency sum signal in addition to the frequency difference.
These unwanted spectral components must be rejected before decimating the data stream
for the phase noise spectral characterization \cite{andrich2018high}. The filters introduced between the
downconverter
and the decimation processing blocks are core components of an oscillator characterization
system, and must reject out-of-band signals below the targeted phase noise -- typically
below $-170$~dBc/Hz for the ultrastable oscillators we aim at characterizing. The filter blocks will
use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency
datastream: optimizing the performance of the filter while reducing the needed resources is
hence tackled in a systematic approach using optimization techniques. Most significantly, we
tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with
a tunable number of coefficients and a tunable number of bits representing the coefficients and the
data being processed.

\section{Finite impulse response filter}

We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined
by a set of weights $b_k$ applied to the inputs $x_n$ through a convolution to generate the
outputs $y_n$
\begin{align}
y_n=\sum_{k=0}^N b_k x_{n-k}
\label{eq:fir_equation}
\end{align}
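As a hedged illustration (the symbols $H$ and $\omega$ are introduced here and are not part of the original notation), the frequency response associated with Eq.~\eqref{eq:fir_equation} is the discrete-time Fourier transform of the weights,
\begin{align*}
H(e^{j\omega})=\sum_{k=0}^{N} b_k e^{-j\omega k}, \qquad \omega\in[0,\pi],
\end{align*}
where $\omega=\pi$ corresponds to the Nyquist frequency. For instance, the two-tap averaging filter $b_0=b_1=1/2$ (i.e. $N=1$) yields $|H(e^{j\omega})|=|\cos(\omega/2)|$, equal to 1 at DC and to 0 at the Nyquist frequency.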

As opposed to an implementation on a general purpose processor in which word size is defined by the
processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since
not only the coefficient values and number of taps must be defined, but also the number of bits
defining the coefficients and the sample size. For this reason, and because we consider pipeline
processing (as opposed to First-In, First-Out (FIFO) memory batch processing) of radiofrequency
signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but
the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language
(VHDL) level.
{\color{red}Since latency is not an issue in an open-loop phase noise characterization instrument,
the large
number of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter,
is not considered an issue, as it would be in a closed-loop system.} % r2.4

The coefficients are classically expressed as floating point values. However, this binary
number representation is not efficient for fast arithmetic computation by an FPGA. Instead,
we choose to quantize these floating point values into integer values. This quantization
will result in some precision loss.
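As an order-of-magnitude sketch, assuming the rounding error behaves as a bounded perturbation of the ideal value (the symbols $p$ and $q$, denoting the number of bits and the quantization step, are introduced here for illustration only), representing values normalized to $[-1,1)$ on $p$~bits gives a step $q=2^{1-p}$ and a rounding error bounded by $q/2$. Each additional bit halves $q$, lowering the error floor by
\begin{align*}
20\log_{10}2\simeq 6.02~\text{dB per bit.}
\end{align*}
As a rule of thumb under this assumption, a rejection of $R$~dB cannot be expected from coefficients coded on fewer than about $R/6$~bits.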

\begin{figure}[h!tb]
\includegraphics[width=\linewidth]{images/zero_values}
\caption{Impact of the quantization resolution of the coefficients: the quantization is
set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting
the 30~first and 30~last coefficients out of the initial 128~band-pass
filter coefficients to 0 (red dots).}
\label{float_vs_int}
\end{figure}

The tradeoff between quantization resolution and number of coefficients when considering
integer operations is not trivial. As an illustration of the issue related to the
relation between number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits
a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon
quantization on 6~bit integers, 60 of the 128~coefficients at the beginning and end of the
impulse response become null, {\color{red}making the large number of coefficients irrelevant: processing
resources % r1.1
are hence saved by shrinking the filter length.} This tradeoff, aimed at minimizing resources
to reach a given rejection level, or maximizing out of band rejection for a given computational
resource, will drive the investigation on cascading filters designed with varying tap resolution
and tap length, as will be shown in the next section. Indeed, our development strategy closely
follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards}
in which basic blocks are defined and characterized before being assembled \cite{hide}
in a complete processing chain. In our case, assembling the filter blocks is a simpler block
combination process since we assume a single value to be processed and a single value to be
generated at each clock cycle. The FIR filters do not decimate in the
current implementation: the decimation is assumed to be located after the FIR cascade for the
time being.

\section{Methodology description}

Our objective is to develop a new methodology applicable to any Digital Signal Processing (DSP)
chain obtained by assembling basic processing blocks, with hardware and manufacturer independence.
Achieving such a target requires defining an abstract model to represent some basic properties
of DSP blocks such as performance (e.g. rejection or ripples in the passband for filters) and
resource occupation. These abstract properties, not necessarily related to the detailed hardware
implementation of a given platform, will feed a scheduler solver aimed at assembling the optimum
target, whether in terms of maximizing performance for a given arbitrary resource occupation, or
minimizing resource occupation for a given performance. In our approach, the solution of the
solver is then synthesized using the dedicated tool provided by each platform manufacturer
to assess the validity of our abstract resource occupation indicator, and the result of running
the DSP chain on the FPGA allows for assessing the performance of the scheduler. We emphasize
that all solutions found by the solver are synthesized and executed on hardware at the end
of the analysis.

In this demonstration, we focus on only two operations: filtering and shifting the number of
bits needed to represent the data along the processing chain.
We have chosen these basic operations because shifting and filtering have already been studied
in the literature \cite{lim_1996, lim_1988, young_1992, smith_1998}, providing a framework for
assessing our results. Furthermore, filtering is a core step in any radiofrequency frontend
requiring pipelined processing at full bandwidth for the earliest steps, including for
time and frequency transfer or characterization \cite{carolina1,carolina2,rsi}.

Addressing only two operations allows for demonstrating the methodology but should not be
considered as a limitation of the framework, which can be extended to assembling any number
of skeleton blocks as long as performance and resource occupation can be determined. {\color{red}
Hence,
in this paper we will apply our methodology to simple DSP chains: a white noise input signal % r1.2
is generated using a Pseudo-Random Number (PRN) generator or by sampling a wideband (125~MS/s)
14-bit Analog to Digital Converter (ADC) loaded by a 50~$\Omega$ resistor.} Once samples have been
digitized at a rate of 125~MS/s, filtering is applied to qualify the processing block performance --
practically meeting the radiofrequency frontend requirement of noise and bandwidth reduction
by filtering and decimating. Finally, bursts of filtered samples are stored for post-processing,
allowing us either to assess the filter rejection for a given resource usage, or to validate the rejection
when implementing a solution minimizing resource occupation.

{\color{red}
The first step of our approach is to model the DSP chain. Since we only aim at optimizing % r1.3
the filtering part of the signal processing chain, we have not included the PRN generator or the
ADC in the model: the input data size and rate are considered fixed and defined by the hardware.
The filtering can be done in two ways, either by considering a single monolithic FIR filter
requiring many coefficients to reach the targeted noise rejection ratio, or by
cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter.}
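To make the cascading option explicit, and as an idealized sketch that neglects the quantization of intermediate data (the symbols $H_i$ and $n$, denoting the transfer function of the $i$-th filter and the number of cascaded filters, are used here for illustration), the transfer function of a cascade is the product of the individual transfer functions, so that magnitudes expressed in dB add up:
\begin{align*}
|H(e^{j\omega})| = \prod_{i=1}^{n} |H_i(e^{j\omega})|
\quad\Rightarrow\quad
20\log_{10}|H(e^{j\omega})| = \sum_{i=1}^{n} 20\log_{10}|H_i(e^{j\omega})| .
\end{align*}
Under this assumption, two filters each attenuating a given stopband frequency by 40~dB attenuate it by 80~dB in total.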

After each filter we leave the possibility of shifting the filtered data to consume
fewer resources. Hence in the case of cascaded filters, we define a stage as a filter
and a shifter (the shift could be omitted if we do not need to divide the filtered data).

\subsection{Model of a FIR filter}

A cascade of filters is composed of $n$ FIR stages. In stage $i$ ($1 \leq i \leq n$)
the FIR has $C_i$ coefficients and each coefficient is an integer value with $\pi^C_i$
bits while the filtered data are shifted by $\pi^S_i$ bits. We also define $\pi^-_i$ as
the size of input data and $\pi^+_i$ as the size of output data. Figure~\ref{fig:fir_stage}
shows a filtering stage.
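As a sketch of how data sizes propagate through a stage, under a worst-case assumption that is made here for illustration and is not necessarily the sizing rule retained in the implementation: the product of a $\pi_i^-$-bit sample by a $\pi_i^C$-bit coefficient fits on $\pi_i^- + \pi_i^C$~bits, and accumulating $C_i$ such products adds at most $\lceil\log_2 C_i\rceil$~bits, so that after dropping the $\pi_i^S$ least significant bits
\begin{align*}
\pi_i^+ \leq \pi_i^- + \pi_i^C + \lceil \log_2 C_i \rceil - \pi_i^S .
\end{align*}
For example, 14-bit samples filtered by $C_i=32$ coefficients coded on 8~bits and shifted by $\pi_i^S=11$~bits yield at most $14+8+5-11=16$~bits at the stage output.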

\begin{figure}
\centering
\begin{tikzpicture}[node distance=2cm]
\node[draw,minimum size=1.3cm] (FIR) { $C_i, \pi_i^C$ } ;
\node[draw,minimum size=1.3cm] (Shift) [right of=FIR, ] { $\pi_i^S$ } ;
\node (Start) [left of=FIR] { } ;
\node (End) [right of=Shift] { } ;

\node[draw,fit=(FIR) (Shift)] (Filter) { } ;

\draw[->] (Start) edge node [above] { $\pi_i^-$ } (FIR) ;
\draw[->] (FIR) -- (Shift) ;
\draw[->] (Shift) edge node [above] { $\pi_i^+$ } (End) ;
\end{tikzpicture}
\caption{A single filter is composed of a FIR (on the left) and a Shifter (on the right)}
\label{fig:fir_stage}
\end{figure}

FIR $i$ has been characterized through numerical simulation as able to reject $F(C_i, \pi_i^C)$ dB.
This rejection has been computed using GNU Octave software FIR coefficient design functions
(\texttt{firls} and \texttt{fir1}).
For each configuration $(C_i, \pi_i^C)$, we first create a FIR with floating point coefficients and a given $C_i$ number of coefficients.
Then, the floating point coefficients are discretized into integers. In order to ensure that the coefficients are effectively coded on $\pi_i^C$~bits,
the coefficients are normalized by their absolute maximum before being scaled to integer coefficients.
At least one coefficient is coded on $\pi_i^C$~bits, and in practice only $b_{C_i/2}$ is coded on $\pi_i^C$~bits while the others are coded on far fewer bits.
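The normalization and scaling steps described above can be sketched as follows, where the rounding operator, the notation $\tilde{b}_k$ and the scale factor $2^{\pi_i^C-1}-1$ (signed full scale) are assumptions made for illustration rather than the exact convention of the characterization scripts:
\begin{align*}
\tilde{b}_k = \mathrm{round}\left( \left(2^{\pi_i^C-1}-1\right) \frac{b_k}{\max_j |b_j|} \right) .
\end{align*}
With $\pi_i^C=6$~bits the scale factor is 31, so that any coefficient with $|b_k| < \max_j|b_j|/62$ is rounded to 0, which is the mechanism illustrated in Fig.~\ref{float_vs_int}.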

With these coefficients, the \texttt{freqz} function is used to estimate the magnitude of the filter
transfer function.
Comparing the performance between FIRs requires however defining a unique criterion. As shown in figure~\ref{fig:fir_mag},
the FIR magnitude exhibits two parts: we focus here on the transition width and the rejection rather than on the
passband ripples as emphasized in \cite{lim_1988,lim_1996}. {\color{red}Throughout this demonstration,
we arbitrarily set a passband of 40\% of the Nyquist frequency and a stopband from 60\%
of the Nyquist frequency to the end of the band, as would be typically selected to prevent
aliasing before decimating the dataflow by 2. The method is however generalized to any filter
shape as long as it is defined from the initial modelling steps: Fig. \ref{fig:rejection_pyramid}
as described below is indeed unique for each filter shape.}

\begin{figure}
\begin{center}
\scalebox{0.8}{
\centering
\begin{tikzpicture}[scale=0.3]
\draw[<->] (0,15) -- (0,0) -- (21,0) ;
\draw[thick] (0,12) -- (8,12) -- (20,0) ;
\draw (0,14) node [left] { $P$ } ;
\draw (20,0) node [below] { $f$ } ;
\draw[>=latex,<->] (0,14) -- (8,14) ;
\draw (4,14) node [above] { passband } node [below] { $40\%$ } ;
\draw[>=latex,<->] (8,14) -- (12,14) ;
\draw (10,14) node [above] { transition } node [below] { $20\%$ } ;
\draw[>=latex,<->] (12,14) -- (20,14) ;
\draw (16,14) node [above] { stopband } node [below] { $40\%$ } ;
\draw[>=latex,<->] (16,12) -- (16,8) ;
\draw (16,10) node [right] { rejection } ;
\draw[dashed] (8,-1) -- (8,14) ;
\draw[dashed] (12,-1) -- (12,14) ;
\draw[dashed] (8,12) -- (16,12) ;
\draw[dashed] (12,8) -- (16,8) ;
\end{tikzpicture}
}
\end{center}
\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:
the passband is considered to occupy the initial 40\% of the Nyquist frequency range,
the stopband the last 40\%, allowing 20\% transition width.}
\label{fig:fir_mag}
\end{figure}

In the transition band, the behavior of the filter is left free: we only {\color{red}define} the passband and the stopband characteristics.
% r2.7
% Our initial criterion considered the mean value of the stopband rejection, as shown in figure~\ref{fig:mean_criterion}. This criterion
% yields unacceptable results since notches overestimate the rejection capability of the filter. Furthermore, the losses within
% the passband are not considered and might be excessive for excessively wide transitions widths introduced for filters with few coefficients.
Our criterion to compute the filter rejection considers
% r2.8 and r2.2 r2.3
the maximum magnitude within the stopband, from which the {\color{red}sum of the absolute values
within the passband is subtracted to avoid filters with excessive ripples, normalized to the
bin width to remain consistent with the passband criterion (dBc/Hz units in all cases)}. With this
criterion, we meet the expected rejection capability of low-pass filters as shown in figure~\ref{fig:custom_criterion}.
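One possible reading of this criterion is sketched below in Python with NumPy. The sign convention (rejection counted as a positive number of dB), the use of a per-bin mean for the normalized passband sum, the band edges taken from figure~\ref{fig:fir_mag} and the function name \texttt{rejection\_criterion} are assumptions of this illustration, not a transcription of the actual implementation.

\begin{verbatim}
```python
import numpy as np

def rejection_criterion(mag_db, pass_edge=0.4, stop_edge=0.6):
    """Worst-case stopband rejection minus the mean absolute passband deviation (dB)."""
    mag_db = np.asarray(mag_db, dtype=float)
    f = np.arange(mag_db.size) / mag_db.size      # bin frequency as a fraction of Nyquist
    passband = mag_db[f <= pass_edge]
    stopband = mag_db[f >= stop_edge]
    worst_stop = -np.max(stopband)                # rejection of the highest stopband lobe
    ripple_penalty = np.mean(np.abs(passband))    # passband sum normalized per bin
    return worst_stop - ripple_penalty

# Idealized response: flat 0 dB passband, linear transition, -40 dB stopband
# with a single -35 dB lobe that sets the criterion.
f = np.arange(1000) / 1000
ideal = np.where(f <= 0.4, 0.0, np.where(f >= 0.6, -40.0, -200.0 * (f - 0.4)))
ideal[800] = -35.0
```
\end{verbatim}

Because the stopband term uses the maximum rather than the mean, deep notches elsewhere in the stopband do not inflate the result.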

% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/colored_mean_criterion}
% \caption{Mean stopband rejection criterion comparison between monolithic filter and cascaded filters}
% \label{fig:mean_criterion}
% \end{figure}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/colored_custom_criterion}
\caption{Custom criterion (maximum rejection in the stopband minus the {\color{red} sum of the
absolute values of the passband rejection normalized to the bandwidth})
comparison between monolithic filter and cascaded filters}
\label{fig:custom_criterion}
\end{figure}

Thanks to the latter criterion, which will be used in the remainder of this paper, we are able to automatically generate multiple sets of FIR taps
and estimate their rejection. Figure~\ref{fig:rejection_pyramid} exhibits the
rejection as a function of the number of coefficients and the number of bits representing these coefficients.
The pyramid-shaped curve exhibits optimum configuration sets at the vertex where both edges meet.
Indeed, for a given number of coefficients, increasing the number of bits beyond the edge will not improve the rejection.
Conversely, for a given number of bits, increasing the number of coefficients will not improve
the rejection. Hence the best coefficient sets lie on the vertex of the pyramid.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/rejection_pyramid}
\caption{{\color{red}{Filter}} rejection as a function of number of coefficients and number of bits
{\color{red}: this lookup table will be used to identify which filter parameters -- number of bits
representing coefficients and number of coefficients -- best match the targeted transfer function.}}
\label{fig:rejection_pyramid}
\end{figure}

Although we have an efficient criterion to estimate the rejection of one set of coefficients (taps),
a difficulty arises when we cascade filters and estimate the criterion as the sum of two or more individual criteria.
If the FIR filter coefficients are the same between the stages, we have:
$$F_{total} = F_1 + F_2$$
But selecting two different sets of coefficients yields a more complex situation in which
the previous relation is no longer valid, as illustrated in figure~\ref{fig:sum_rejection}. The red and blue curves
are two different filters with maxima and notches not located at the same frequency offsets.
Hence when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved
with respect to a basic sum of the rejection criteria shown as the dotted yellow line.
% r2.9
Thus, estimating the rejection of filter cascades is more complex than taking the sum of the rejection
criteria of each filter. However, since the {\color{red}individual filter rejection} sum underestimates the rejection capability of the cascade,
% r2.10
this bound is considered as a conservative and acceptable criterion for deciding on the suitability
of the filter cascade to meet design criteria.
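The conservative nature of the summed criterion follows from the fact that the maximum of a sum never exceeds the sum of the maxima. The Python/NumPy sketch below checks this on two synthetic stopband responses whose lobes are interleaved; the responses \texttt{h1} and \texttt{h2} and the helper \texttt{stopband\_rejection} are hypothetical and only serve this illustration.

\begin{verbatim}
```python
import numpy as np

def stopband_rejection(mag_db, stop_edge=0.6):
    """Rejection (positive dB) of the highest lobe at or above stop_edge."""
    f = np.arange(mag_db.size) / mag_db.size
    return -np.max(mag_db[f >= stop_edge])

# Two hypothetical responses (dB): the lobes of one fall on the notches of the other.
f = np.arange(1000) / 1000
h1 = np.where(f >= 0.6, -30.0 - 10.0 * np.abs(np.sin(40 * np.pi * f)), 0.0)
h2 = np.where(f >= 0.6, -30.0 - 10.0 * np.abs(np.cos(40 * np.pi * f)), 0.0)

r_sum = stopband_rejection(h1) + stopband_rejection(h2)   # sum of individual criteria
r_cascade = stopband_rejection(h1 + h2)                   # cascading adds magnitudes in dB
```
\end{verbatim}

With these synthetic curves the cascade rejects about 70~dB while the sum of the individual criteria only predicts about 60~dB, so a design validated with the sum also satisfies the requirement once cascaded.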

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/cascaded_criterion}
\caption{{\color{red}Transfer function of individual filters and after cascading} the two filters,
{\color{red}demonstrating that the selected criterion of maximum rejection in the stopband (horizontal
lines) is met. Notice that the cascaded filter has better rejection than summing the stopband
maximum of each individual filter.}
}
\label{fig:sum_rejection}
\end{figure}

% r2.6
{\color{red}
Finally, in our case, we consider that the input signal is fully known. The
resolution of the input data stream is fixed and remains the same for all experiments
in this paper.}

Based on this analysis, we address the estimate of resource consumption (called
% r2.11
silicon area -- in the case of FPGAs this means processing cells) as a function of
filter characteristics. As a reminder, we do not aim at matching actual hardware
configuration but consider an arbitrary silicon area occupied by each processing function,
and will assess after synthesis how well this arbitrary unit matches the actual
hardware resources provided by FPGA manufacturers. The sum of individual processing
unit areas is constrained by a total silicon area representative of FPGA global resources.
Formally, variable $a_i$ is the area taken by filter~$i$
(in arbitrary units). Variable $r_i$ is the rejection of filter~$i$ (in dB).
Constant $\mathcal{A}$ is the total available area. We model our problem as follows:

\begin{align}
\text{Maximize } & \sum_{i=1}^n r_i \notag \\
\sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\
a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\
r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\
\pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\
\pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\
\pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\
\pi_1^- &= \Pi^I \label{eq:init}
\end{align}

Equation~\ref{eq:area} states that the total area taken by the filters must
not exceed the available area. Equation~\ref{eq:areadef} gives the definition of
the area used by a filter, considered as the area of the FIR since the Shifter is
assumed not to require significant resources. We consider that the FIR needs $C_i$ registers of size
$\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the
input data with the coefficients. Equation~\ref{eq:rejectiondef} gives the
definition of the rejection of the filter thanks to the tabulated function~$F$ that we defined
previously. The Shifter does not introduce negative rejection as we will explain later,
so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the
relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add
$\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes
$\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of
a filter is the same as the input number of bits of the next filter.
Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative
rejection. Indeed, the results of the FIR can be right shifted without compromising
the quality of the rejection up to a threshold. Each bit of the output data
increases the maximum rejection level by 6~dB. We add one to take the sign bit
into account. If equation~\ref{eq:maxshift} were not present, the Shifter could
shift too much and introduce some noise in the output data. Each supplementary
shift bit would cause an additional 6~dB rise of the rejection level. An equivalent formulation is:
$\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right)$.
Finally, equation~\ref{eq:init} gives the number of bits of the global input.
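As a worked illustration of equations~\ref{eq:bits}, \ref{eq:inout} and \ref{eq:maxshift}, the short Python sketch below propagates the data width through a cascade and returns the largest shift allowed at each stage. The function name \texttt{max\_shifts}, the choice of always applying the largest admissible shift and the numerical values in the usage note are assumptions of this example.

\begin{verbatim}
```python
import math

def max_shifts(input_bits, stages):
    """Largest shift pi^S allowed at each stage.

    stages: list of (coefficient_bits, rejection_dB), one entry per stage.
    """
    shifts = []
    width = input_bits          # pi_1^- = Pi^I
    needed = 1.0                # the leading 1 accounts for the sign bit
    for coef_bits, rejection_db in stages:
        needed += 1 + rejection_db / 6         # running sum over stages 1..i
        full = width + coef_bits               # FIR output width before shifting
        if full < needed:
            raise ValueError("stage output too narrow for the requested rejection")
        shift = full - math.ceil(needed)       # largest pi^S keeping pi^+ >= needed
        shifts.append(shift)
        width = full - shift                   # pi_i^+ becomes pi_{i+1}^-
    return shifts
```
\end{verbatim}

For a 16-bit input and two stages with 8-bit coefficients and 30~dB rejection each, \texttt{max\_shifts(16, [(8, 30), (8, 30)])} returns \texttt{[17, 2]}: the first stage must keep at least $1 + (1 + 30/6) = 7$~bits out of 24, and the second at least 13~bits out of 15.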

{\color{red}
This model is non-linear since we multiply some variables with other variables,
and it is even non-quadratic, as the cost function $F$ does not have a known
linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations.
This parameter $p$ is defined by the user, and represents the number of different
sets of coefficients generated (recall that we use the \texttt{firls} and \texttt{fir1}
functions from GNU Octave) based on the targeted filter characteristics and implementation
assumptions (estimated number of bits defining the coefficients). Hence, $C_{ij}$ and
$\pi_{ij}^C$ become constants and
we define $1 \leq j \leq p$ so that the function $F$ can be estimated (Look Up Table)
for each configuration thanks to the rejection criterion. We also define the binary
variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
and 0 otherwise. The new equations are as follows:
}

\begin{align}
a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
\pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
\sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
\end{align}

Equations~\ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} respectively replace
equations~\ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.
Equation~\ref{eq:config} states that for each stage, at most one configuration is chosen.

{\color{red}
However the problem remains quadratic at this stage since constraint~\ref{eq:areadef2}
multiplies
$\delta_{ij}$ and $\pi_i^-$. Since $\delta_{ij}$ is a binary variable, we can
linearize this multiplication if we can bound $\pi_i^-$. As $\pi_i^-$ is the data size,
we define $0 < \pi_i^- \leq 128$, where the maximum data size of 128~bits is an estimate
based on hardware characteristics.
The Gurobi (\url{www.gurobi.com}) optimization software used to solve this quadratic
model is able to linearize the model provided as is. This model
has $O(np)$ variables and $O(n)$ constraints.}
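For reference, one standard way to linearize such a product of a binary variable and a bounded variable is recalled here. It is given as a sketch and is not necessarily the reformulation applied internally by the solver. The auxiliary variable $y_{ij}$, standing for the product $\delta_{ij} \pi_i^-$, and the constant $M = 128$, the upper bound on $\pi_i^-$, are notations introduced only for this illustration:
\begin{align*}
y_{ij} & \leq M \, \delta_{ij}, & y_{ij} & \leq \pi_i^-, \\
y_{ij} & \geq \pi_i^- - M \, (1 - \delta_{ij}), & y_{ij} & \geq 0.
\end{align*}
When $\delta_{ij} = 0$, the first and last inequalities force $y_{ij} = 0$. When $\delta_{ij} = 1$, the second and third inequalities force $y_{ij} = \pi_i^-$. The area of stage~$i$ can then be written $a_i = \sum_{j=1}^p C_{ij} \left(\pi_{ij}^C \delta_{ij} + y_{ij}\right)$, which is linear in the variables.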

% This model is non-linear and even non-quadratic, as $F$ does not have a known
% linear or quadratic expression. We introduce $p$ FIR configurations
% $(C_{ij}, \pi_{ij}^C), 1 \leq j \leq p$ that are constants.
% % r2.12
% This variable must be defined by the user, it represent the number of different
% set of coefficients generated (for memory, we use \texttt{firls} and \texttt{fir1}
% functions from GNU Octave).
% We define binary
% variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
% and 0 otherwise. The new equations are as follows:
%
% \begin{align}
% a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
% r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
% \pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
% \sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
% \end{align}
%
% Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace
% respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.
% Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most.
%
% % r2.13
% This modified model is quadratic since we multiply two variables in the
% equation~\ref{eq:areadef2} ($\delta_{ij}$ by $\pi_{ij}^-$) but it can be linearised if necessary.
% The Gurobi
% (\url{www.gurobi.com}) optimization software is used to solve this quadratic
% model, and since Gurobi is able to linearize, the model is left as is. This model
% has $O(np)$ variables and $O(n)$ constraints.

Two problems will be addressed using the workflow described in the next section: on the one
hand maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary
silicon area (section~\ref{sec:fixed_area}) and on the other hand the dual problem of minimizing the silicon area
for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the
objective function is replaced with:
\begin{align}
\text{Minimize } & \sum_{i=1}^n a_i \notag
\end{align}
We adapt the constraints of the quadratic program by replacing equation~\ref{eq:area}
with equation~\ref{eq:rejection_min} where $\mathcal{R}$ is the minimal
rejection required.
\begin{align}
\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}
\end{align}
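The two objectives can be illustrated on a toy instance with the Python sketch below. It replaces the quadratic solver by an exhaustive enumeration, which is only practical for very small $n$ and $p$, and it ignores the Shifter by assuming the same input width at every stage. The lookup table \texttt{F} holds made-up rejection values, not measured ones, and all function names are hypothetical.

\begin{verbatim}
```python
from itertools import product

# Hypothetical lookup table: (number of taps C, coefficient bits pi^C) -> rejection (dB).
F = {(11, 6): 20.0, (21, 8): 32.0, (31, 10): 45.0}

def stage_area(config, input_bits):
    """Area of one stage: C * (pi^C + pi^-), in arbitrary units."""
    taps, coef_bits = config
    return taps * (coef_bits + input_bits)

def enumerate_cascades(n_stages, input_bits):
    """Yield (configs, total area, total rejection); None marks an unused stage."""
    for configs in product([None, *F], repeat=n_stages):
        used = [c for c in configs if c is not None]
        area = sum(stage_area(c, input_bits) for c in used)
        yield configs, area, sum(F[c] for c in used)

def max_rejection(n_stages, input_bits, max_area):
    """First problem: maximize the summed rejection under an area budget."""
    feasible = [s for s in enumerate_cascades(n_stages, input_bits) if s[1] <= max_area]
    return max(feasible, key=lambda s: s[2])

def min_area(n_stages, input_bits, min_rejection):
    """Dual problem: minimize the area for a required summed rejection."""
    feasible = [s for s in enumerate_cascades(n_stages, input_bits) if s[2] >= min_rejection]
    return min(feasible, key=lambda s: s[1])
```
\end{verbatim}

With two stages and 16-bit data, an area budget of 800 selects the $(11, 6)$ and $(21, 8)$ configurations for 52~dB and an area of 746, while a 40~dB requirement is met at the lowest area, 484, by two $(11, 6)$ stages.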

\section{Design workflow}
\label{sec:workflow}

In this section, we describe the workflow used to compute all the results presented in sections~\ref{sec:fixed_area}
and \ref{sec:fixed_rej}. Figure~\ref{fig:workflow} shows the global workflow and the different steps involved
in the computation of the results.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=0.75cm and 2cm]
\node[draw,minimum size=1cm] (Solver) { Filter Solver } ;
\node (Start) [left= 3cm of Solver] { } ;
\node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;
\node (Input) [above= of TCL] { } ;
\node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;
\node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;
\node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;
\node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;
\node (Results) [left= of Postproc] { } ;
\draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;
\draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;
\draw[->] (Solver) edge node [below] { (1a) } (TCL) ;
\draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;
\draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;
\draw[->,dashed] (Bitstream) -- (Deploy) ;
\draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;
\draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;
\draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;
\draw[->] (Postproc) -- (Results) ;
\end{tikzpicture}
\caption{Design workflow from the input parameters to the results {\color{red} allowing for
a fully automated optimal solution search.}}
\label{fig:workflow}
\end{figure}

527

535

The filter solver is a C++ program that takes as input the maximum area
$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,
the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates
the quadratic programs and uses the Gurobi solver to estimate the optimal results.
It then produces two scripts: a TCL script ((1a) in Figure~\ref{fig:workflow})
and a deploy script ((1b) in Figure~\ref{fig:workflow}).

The TCL script describes the whole digital processing chain from the beginning
(the raw signal data) to the end (the filtered data) in a language compatible
with proprietary synthesis software, namely Vivado for Xilinx and Quartus for
Intel/Altera. The raw input data are generated by a 20-bit Pseudo Random Number (PRN)
generator inside the FPGA, and $\Pi^I$ is fixed at 16~bits.
The script then builds each stage of the chain with a generic FIR task that
comes from a skeleton library. The generic FIR is configurable
in both the number of coefficients and the size of the coefficients. The coefficients
themselves are not stored in the script.
As the signal is processed in real time, the output signal is stored as
consecutive bursts of data for post-processing, mainly to assess the consistency of the
implemented FIR cascade transfer function with the design criteria and the expected
transfer function.

The TCL script is used by Vivado to produce the FPGA bitstream ((2) in Figure~\ref{fig:workflow}).
We use the 2018.2 version of Xilinx Vivado and we execute the synthesized
bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series
FPGA (xc7z010clg400-1) and two LTC2145 14-bit 125~MS/s ADCs, loaded with 50~$\Omega$ resistors to
provide a broadband noise source.
The board runs the Linux kernel and surrounding environment produced from the
Buildroot framework available at \url{https://github.com/trabucayre/redpitaya/}: configuring
the Zynq FPGA, feeding the FIR with the set of coefficients, executing the simulation and
fetching the results are automated.

The deploy script uploads the bitstream to the board ((3) in
Figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers and
configures the coefficients of the FIR filters. It then waits for the results
and retrieves the data to the main computer ((4) in Figure~\ref{fig:workflow}).

Finally, an Octave post-processing script computes the final results from
the output data ((5) in Figure~\ref{fig:workflow}).
The results are normalized so that the Power Spectral Density (PSD) starts at zero
and the different configurations can be compared.
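
As an illustration of this normalization, and assuming that ``starts at zero''
means that each spectrum is referenced to its own value at the lowest frequency
bin $f_0$ (the symbols $S_k$, $\tilde{S}_k$ and $f_0$ are introduced here for
illustration only), the normalized PSD of configuration $k$, expressed in dB,
may be written as
\begin{equation*}
\tilde{S}_k(f) = 10\log_{10} S_k(f) - 10\log_{10} S_k(f_0),
\end{equation*}
so that $\tilde{S}_k(f_0) = 0$~dB for every configuration. Provided that the
lowest bins lie in the passband, the stopband level of $\tilde{S}_k$ can then
be compared directly from one configuration to another.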

\section{Maximizing the rejection at fixed silicon area}
\label{sec:fixed_area}
This section presents the output of the filter solver, {\em i.e.} the computed
configurations for each stage, the computed rejection and the computed silicon area.
Such results help to understand the choices made by the solver to compute its solutions.

The experimental setup is composed of three cases. The raw input is generated
by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.
The total silicon area $\mathcal{A}$ is then fixed to either 500, 1000 or 1500
arbitrary units. Hence, the three cases are named MAX/500, MAX/1000 and MAX/1500.
The number of configurations $p$ is 1827, with $C_i$ ranging from 3 to 60 and $\pi^C$
ranging from 2 to 22. In each case, the quadratic program was able to return a
result for up to five stages ($n = 5$) in the cascaded filter.

Tables~\ref{tbl:gurobi_max_500}, \ref{tbl:gurobi_max_1000} and \ref{tbl:gurobi_max_1500} show
the results obtained by the filter solver for MAX/500, MAX/1000 and MAX/1500, respectively.

\renewcommand{\arraystretch}{1.4}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}
\label{tbl:gurobi_max_500}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\
2 & (3, 3, 15) & (31, 9, 0) & - & - & - & 58~dB & 460 \\
3 & (3, 3, 15) & (27, 9, 0) & (5, 3, 0) & - & - & 66~dB & 488 \\
4 & (3, 3, 15) & (19, 7, 0) & (11, 5, 0) & (3, 3, 0) & - & 74~dB & 499 \\
5 & (3, 3, 15) & (23, 8, 0) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & 78~dB & 489 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}
\label{tbl:gurobi_max_1000}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\
2 & (3, 3, 15) & (51, 14, 0) & - & - & - & 87~dB & 975 \\
3 & (3, 3, 15) & (35, 11, 0) & (19, 7, 0) & - & - & 99~dB & 1000 \\
4 & (3, 4, 16) & (27, 8, 0) & (19, 7, 1) & (11, 5, 0) & - & 103~dB & 998 \\
5 & (3, 3, 15) & (31, 9, 0) & (19, 7, 0) & (3, 3, 1) & (3, 3, 0) & 111~dB & 984 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}
\label{tbl:gurobi_max_1500}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\
2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 103~dB & 1489 \\
3 & (3, 3, 15) & (35, 11, 0) & (35, 11, 0) & - & - & 122~dB & 1492 \\
4 & (3, 3, 15) & (27, 8, 0) & (19, 7, 0) & (27, 9, 0) & - & 129~dB & 1498 \\
5 & (3, 3, 15) & (23, 9, 2) & (27, 9, 0) & (19, 7, 0) & (3, 3, 0) & 136~dB & 1499 \\
\hline
\end{tabular}
}
\end{table}

\renewcommand{\arraystretch}{1}

From these tables, we can first state that the more stages are used to define
the cascaded FIR filters, the better the rejection. This was an expected result, as it has
been previously observed that many small filters are better than
a single large filter \cite{lim_1988, lim_1996, young_1992}, despite such conclusions
being seldom used in practice due to the lack of tools for identifying individual filter
coefficients in the cascaded approach.

Second, the larger the silicon area, the better the rejection. This was also an
expected result, as more area allows a filter of better quality, with more coefficients
or more bits per coefficient.

Third, we observe that the first stage can have a larger shift than the other
stages. This is explained by the fact that the solver tries to use just enough
bits for the computed rejection after each stage. In the first stage, a
balance between a strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}
gives the relation between both values.
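
To give an order of magnitude, and assuming the usual rule of thumb that one
bit of a binary word carries $20\log_{10} 2 \approx 6.02$~dB of dynamic range
(an assumption made here for illustration; Equation~\ref{eq:maxshift} remains
the relation actually used by the solver), representing a signal filtered with
a rejection of $R$~dB without being limited by quantization calls for about
\begin{equation*}
b(R) = \left\lceil \frac{R}{20\log_{10} 2} \right\rceil \approx \left\lceil \frac{R}{6.02} \right\rceil
\end{equation*}
bits, where $b$ is a notation introduced for this example only. For instance,
$b(32) = 6$ and $b(58) = 10$ for the rejections obtained with $n = 1$ and
$n = 2$ in the MAX/500 case.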

Finally, we note that the solver consumes nearly all of the allotted silicon area,
between 92\% and 100\% in every case.

The following graphs present the rejection for real data on the FPGA. In all the following
figures, the solid lines represent the actual rejection of the filtered
data on the FPGA as measured experimentally, and the dashed lines are the noise levels
given by the quadratic solver. The configurations are those computed by the solver
and listed in the tables above.

Figures~\ref{fig:max_500_result}, \ref{fig:max_1000_result} and \ref{fig:max_1500_result} show
the rejection of the different configurations for MAX/500, MAX/1000 and MAX/1500, respectively.

% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/max_500}
% \caption{Signal spectrum for MAX/500}
% \label{fig:max_500_result}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/max_1000}
% \caption{Signal spectrum for MAX/1000}
% \label{fig:max_1000_result}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/max_1500}
% \caption{Signal spectrum for MAX/1500}
% \label{fig:max_1500_result}
% \end{figure}

% r2.14 and r2.15 and r2.16
\begin{figure}
\centering
\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/max_500}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).}
\label{fig:max_500_result}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/max_1000}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).}
\label{fig:max_1000_result}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/max_1500}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).}
\label{fig:max_1500_result}
\end{subfigure}
\caption{\color{red}Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing
rejection for a given resource allocation.
The filter shape constraint (passband and stopband) is shown as thick
horizontal lines on each chart.}
\end{figure}

In all cases, we observe that the actual rejection is close to the rejection computed by the solver.

We compare the actual silicon resources given by Vivado to the
resources in arbitrary units.
The goal is to check that our arbitrary units of silicon area model the real
resources on the FPGA well enough. In particular, we want to verify that, for a given
number of arbitrary units, the actual silicon resources do not depend on the
number of stages $n$. Most significantly, our approach aims
at remaining far enough from the practical logic gate implementation used by
various vendors to remain platform independent and be portable from one
architecture to another.

Table~\ref{tbl:resources_usage} shows the resource usage in the case of MAX/500, MAX/1000 and
MAX/1500, \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000
and 1500 arbitrary units. We have taken care to extract solely the resources used by
the FIR filters and to exclude additional processing blocks, including FIFO and Programmable
Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication.

\begin{table}[h!tb]
\caption{Resource occupation {\color{red}following synthesis of the solutions found for
the problem of maximizing rejection for a given resource allocation}. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
\label{tbl:resources_usage}
\centering
\begin{tabular}{|c|c|ccc|c|}
\hline
$n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline
& LUT & 249 & 453 & 627 & \emph{17600} \\
1 & BRAM & 1 & 1 & 1 & \emph{120} \\
& DSP & 21 & 37 & 47 & \emph{80} \\ \hline
& LUT & 2374 & 5494 & 691 & \emph{17600} \\
2 & BRAM & 2 & 2 & 2 & \emph{120} \\
& DSP & 0 & 0 & 70 & \emph{80} \\ \hline
& LUT & 2443 & 3304 & 3521 & \emph{17600} \\
3 & BRAM & 3 & 3 & 3 & \emph{120} \\
& DSP & 0 & 19 & 35 & \emph{80} \\ \hline
& LUT & 2634 & 3753 & 2557 & \emph{17600} \\
4 & BRAM & 4 & 4 & 4 & \emph{120} \\
& DSP & 0 & 19 & 46 & \emph{80} \\ \hline
& LUT & 2423 & 3047 & 2847 & \emph{17600} \\
5 & BRAM & 5 & 5 & 5 & \emph{120} \\
& DSP & 0 & 22 & 46 & \emph{80} \\ \hline
\end{tabular}
\end{table}

In some cases, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that,
when the filter coefficients are small enough, or when the input size is small
enough, Vivado optimizes resource consumption by selecting multiplexers to
implement the multiplications instead of a DSP. In this case, it is quite difficult
to compare the whole silicon budget.

However, a rough estimation can be made with a simple equivalence: looking at
the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,
we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon
area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs,
1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond
to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary
units map well to actual hardware resources. The relatively small differences can probably be explained
by the optimizations done by Vivado based on the detailed map of available processing resources.
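
The equivalence can be made explicit with a short worked example. Writing
$N_{\mathrm{LUT}}$ and $N_{\mathrm{DSP}}$ for the LUT and DSP counts of
Table~\ref{tbl:resources_usage}, and taking the rough factor of 100~LUTs per DSP
as an assumption rather than a measured value, the LUT-equivalent area
$N_{\mathrm{eq}}$ (a quantity introduced here for illustration only) is
\begin{equation*}
N_{\mathrm{eq}} = N_{\mathrm{LUT}} + 100\, N_{\mathrm{DSP}}.
\end{equation*}
For $n = 3$ this gives
\begin{align*}
\text{MAX/500:} \quad & 2443 + 100 \times 0 = 2443, \\
\text{MAX/1000:} \quad & 3304 + 100 \times 19 = 5204, \\
\text{MAX/1500:} \quad & 3521 + 100 \times 35 = 7021,
\end{align*}
which is in line with the 2500, 5000 and 7300~LUT figures quoted above.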

We now present the computation time needed to solve the quadratic problem.
For each case, the filter solver software is executed on an Intel(R) Xeon(R) CPU E5606
clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve
the quadratic problem. Table~\ref{tbl:area_time} shows the time needed to solve the quadratic
problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units.

\begin{table}[h!tb]
\caption{Time needed to solve the quadratic program with Gurobi}
\label{tbl:area_time}
\centering
\begin{tabular}{|c|c|c|c|}\hline
$n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline
1 & 0.1~s & 0.1~s & 0.3~s \\
2 & 1.1~s & 2.2~s & 12~s \\
3 & 17~s & 137~s ($\approx$ 2~min) & 275~s ($\approx$ 4~min) \\
4 & 52~s & 5448~s ($\approx$ 90~min) & 5505~s ($\approx$ 17~h) \\
5 & 286~s ($\approx$ 4~min) & 4119~s ($\approx$ 68~min) & 235479~s ($\approx$ 3~days) \\\hline
\end{tabular}
\end{table}

As expected, the computation time seems to rise exponentially with the number of stages. % TODO: exponential?
When the area is limited, the design exploration space is more limited and the solver is able to
find an optimal solution faster.
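
As a rough check of this trend, one can compute the average growth factor per
additional stage between $n = 1$ and $n = 5$ from Table~\ref{tbl:area_time}.
The notations $T_n$ (solving time with $n$ stages) and $g$ are introduced here
for illustration, and a geometric mean over so few points is only indicative:
\begin{align*}
g &= \left(\frac{T_5}{T_1}\right)^{1/4}, \\
g_{\text{MAX/500}} &= \left(\frac{286}{0.1}\right)^{1/4} \approx 7.3, \\
g_{\text{MAX/1000}} &= \left(\frac{4119}{0.1}\right)^{1/4} \approx 14, \\
g_{\text{MAX/1500}} &= \left(\frac{235479}{0.3}\right)^{1/4} \approx 30.
\end{align*}
Each additional stage thus multiplies the solving time by roughly one order of
magnitude on average, which is compatible with an exponential growth, although
the MAX/1000 timings are not monotonic between $n = 4$ and $n = 5$.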

\subsection{Minimizing resource occupation at fixed rejection}\label{sec:fixed_rej}

This section presents the results of the complementary quadratic program aimed at
minimizing the area occupation for a targeted rejection level.

The experimental setup is composed of four cases. The raw input is the same

807

822

The experimental setup is composed of four cases. The raw input is the same

as in the previous section, from a PRN generator, which fixes the input data size $\Pi^I$.

808

823

as in the previous section, from a PRN generator, which fixes the input data size $\Pi^I$.

Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60, 80 or 100~dB.

809

824

Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60, 80 or 100~dB.

Hence, the three cases have been named: MIN/40, MIN/60, MIN/80 and MIN/100.

810

825

Hence, the three cases have been named: MIN/40, MIN/60, MIN/80 and MIN/100.

The number of configurations $p$ is the same as previous section.

811

826

The number of configurations $p$ is the same as previous section.

Tables~\ref{tbl:gurobi_min_40}, \ref{tbl:gurobi_min_60}, \ref{tbl:gurobi_min_80}
and~\ref{tbl:gurobi_min_100} show the results obtained by the filter solver for
MIN/40, MIN/60, MIN/80 and MIN/100, respectively.


\renewcommand{\arraystretch}{1.4}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40}
\label{tbl:gurobi_min_40}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\
2 & (3, 2, 14) & (19, 7, 0) & - & - & - & 40~dB & 263 \\
3 & (3, 3, 15) & (11, 5, 0) & (3, 3, 0) & - & - & 41~dB & 192 \\
4 & (3, 3, 15) & (3, 3, 0) & (3, 3, 0) & (3, 3, 0) & - & 42~dB & 147 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60}
\label{tbl:gurobi_min_60}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\
2 & (3, 3, 15) & (35, 10, 0) & - & - & - & 60~dB & 547 \\
3 & (3, 3, 15) & (27, 8, 0) & (3, 3, 0) & - & - & 62~dB & 426 \\
4 & (3, 2, 14) & (11, 5, 1) & (11, 5, 0) & (3, 3, 0) & - & 60~dB & 344 \\
5 & (3, 2, 14) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & (3, 3, 0) & 60~dB & 279 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80}
\label{tbl:gurobi_min_80}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\
2 & (3, 3, 15) & (47, 14, 0) & - & - & - & 80~dB & 903 \\
3 & (3, 3, 15) & (23, 9, 0) & (19, 7, 0) & - & - & 80~dB & 698 \\
4 & (3, 3, 15) & (27, 9, 0) & (7, 7, 4) & (3, 3, 0) & - & 80~dB & 605 \\
5 & (3, 2, 14) & (27, 8, 0) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & 81~dB & 534 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/100}
\label{tbl:gurobi_min_100}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & - & - & - & - & - & - & - \\
2 & (15, 7, 17) & (51, 14, 0) & - & - & - & 100~dB & 1365 \\
3 & (3, 3, 15) & (27, 9, 0) & (27, 9, 0) & - & - & 100~dB & 1002 \\
4 & (3, 3, 15) & (31, 9, 0) & (19, 7, 0) & (3, 3, 0) & - & 101~dB & 909 \\
5 & (3, 3, 15) & (23, 8, 1) & (19, 7, 0) & (3, 3, 0) & (3, 3, 0) & 101~dB & 810 \\
\hline
\end{tabular}
}
\end{table}
\renewcommand{\arraystretch}{1}

From these tables, we can first state that almost all configurations reach the targeted rejection
level, or even exceed it, thanks to our underestimate of the cascade rejection as the sum of the
individual filter rejections. The only exception is the monolithic case ($n = 1$) in
MIN/100: no single monolithic filter is found that reaches a 100~dB rejection.
Furthermore, the area of the monolithic filter is about twice that of the two cascaded filters
(1131 and 1760 arbitrary units vs. 547 and 903 arbitrary units for 60 and 80~dB rejection,
respectively). More generally, the more filters are cascaded, the lower the occupied area.
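
The ratios behind this comparison follow directly from the tabulated areas. The last two lines
extend the comparison to the deepest cascades listed in the tables:
\begin{align*}
1131 / 547 &\approx 2.07 && \text{(MIN/60, $n = 1$ vs.\ $n = 2$)}, \\
1760 / 903 &\approx 1.95 && \text{(MIN/80, $n = 1$ vs.\ $n = 2$)}, \\
1131 / 279 &\approx 4.05 && \text{(MIN/60, $n = 1$ vs.\ $n = 5$)}, \\
1760 / 534 &\approx 3.30 && \text{(MIN/80, $n = 1$ vs.\ $n = 5$)}.
\end{align*}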

As in the previous section, the solver always chooses a small filter as the first
filter stage, and the second one is often the biggest filter. This choice can be explained
as in the previous section: in the first stage the solver uses just enough bits not to degrade
the input signal, and in the second stage it selects a better filter to improve the rejection
without having too many bits in the output data.
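
As an illustration, consider the $n = 3$ solution of MIN/40, $(3, 3, 15)$, $(11, 5, 0)$, $(3, 3, 0)$.
The sketch below assumes that the output size of stage $i$ is
$\pi_i^+ = \pi_i^- + \pi_i^C - \pi_i^S$, that its area is $a_i = C_i (\pi_i^C + \pi_i^-)$, and that
the input is $\Pi^I = 16$ bits wide. The symbols $\pi_i^-$ and $\pi_i^+$ (input and output data
sizes of stage $i$), these two relations and the 16-bit input are assumptions inferred here from the
tabulated areas, not restated from this section:
\begin{align*}
\pi_1^+ &= 16 + 3 - 15 = 4, & a_1 &= 3 \times (3 + 16) = 57, \\
\pi_2^+ &= 4 + 5 - 0 = 9, & a_2 &= 11 \times (5 + 4) = 99, \\
 & & a_3 &= 3 \times (3 + 9) = 36,
\end{align*}
so that $a_1 + a_2 + a_3 = 192$, which matches the tabulated area. Under these assumptions, the
small first stage narrows the data to 4 bits at a cost of 57 units, so the larger second filter
operates on narrow data and remains inexpensive.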

For the specific case of MIN/40 with $n = 5$, the solver has determined that the optimal
number of filters is 4, so it did not choose any configuration for the last filter. Hence this
solution is equivalent to the result for $n = 4$.

The following graphs present the rejection for real data on the FPGA. In all the following
figures, the solid line represents the actual rejection of the filtered
data on the FPGA, as measured experimentally, and the dashed line is the noise level
given by the quadratic solver.

Figures~\ref{fig:min_40}, \ref{fig:min_60}, \ref{fig:min_80} and~\ref{fig:min_100} show the
rejection of the different configurations in the case of MIN/40, MIN/60, MIN/80 and MIN/100,
respectively.

% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_40}
% \caption{Signal spectrum for MIN/40}
% \label{fig:min_40}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_60}
% \caption{Signal spectrum for MIN/60}
% \label{fig:min_60}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_80}
% \caption{Signal spectrum for MIN/80}
% \label{fig:min_80}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_100}
% \caption{Signal spectrum for MIN/100}
% \label{fig:min_100}
% \end{figure}

% r2.14 and r2.15 and r2.16
\begin{figure}
\centering
\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_40}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MIN/40 problem of minimizing resource allocation for reaching a 40~dB rejection.}
\label{fig:min_40}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_60}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MIN/60 problem of minimizing resource allocation for reaching a 60~dB rejection.}
\label{fig:min_60}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_80}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MIN/80 problem of minimizing resource allocation for reaching an 80~dB rejection.}
\label{fig:min_80}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_100}
\caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
the MIN/100 problem of minimizing resource allocation for reaching a 100~dB rejection.}
\label{fig:min_100}
\end{subfigure}
\caption{\color{red}Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a
given rejection while minimizing resource allocation. The filter shape constraint (passband and
stopband) is shown as thick
horizontal lines on each chart.}
\end{figure}

We observe that all rejections given by the quadratic solver are close to the experimentally
measured rejection. All curves confirm that the constraint to reach the target rejection is
respected with both monolithic (except in MIN/100, which has no monolithic solution) and cascaded filters.

Table~\ref{tbl:resources_usage_comp} shows the resource usage in the case of MIN/40, MIN/60,
MIN/80 and MIN/100, \emph{i.e.} when the target rejection is fixed to 40, 60, 80 and 100~dB. We
have taken care to extract solely the resources used by
the FIR filters and to exclude additional processing blocks, including the FIFO and the PL to
PS communication.

\renewcommand{\arraystretch}{1.2}
\begin{table}
\caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
\label{tbl:resources_usage_comp}
\centering
{\scalefont{0.90}
\begin{tabular}{|c|c|cccc|c|}
\hline
$n$ & & MIN/40 & MIN/60 & MIN/80 & MIN/100 & \emph{Zynq-7010} \\ \hline\hline
 & LUT & 343 & 334 & 772 & - & \emph{17600} \\
1 & BRAM & 1 & 1 & 1 & - & \emph{120} \\
 & DSP & 27 & 39 & 55 & - & \emph{80} \\ \hline
 & LUT & 1252 & 2862 & 5099 & 640 & \emph{17600} \\
2 & BRAM & 2 & 2 & 2 & 2 & \emph{120} \\
 & DSP & 0 & 0 & 0 & 66 & \emph{80} \\ \hline
 & LUT & 891 & 2148 & 2023 & 2448 & \emph{17600} \\
3 & BRAM & 3 & 3 & 3 & 3 & \emph{120} \\
 & DSP & 0 & 0 & 19 & 27 & \emph{80} \\ \hline
 & LUT & 662 & 1729 & 2451 & 2893 & \emph{17600} \\
4 & BRAM & 4 & 4 & 4 & 4 & \emph{120} \\
 & DSP & 0 & 0 & 7 & 19 & \emph{80} \\ \hline
 & LUT & - & 1259 & 2602 & 2505 & \emph{17600} \\
5 & BRAM & - & 5 & 5 & 5 & \emph{120} \\
 & DSP & - & 0 & 0 & 19 & \emph{80} \\ \hline
\end{tabular}
