% merge "maximize rejection for a given area" vs. "minimize area for a given rejection"
% demonstrate how quantization pushes noise towards the high frequencies => 6 dB of
% rejection per bit, and a loss if fewer bits than rejection/6
% develop the linear program including the bit shift
% insist that previously the design was synthesizable but not implementable, whereas now we
% implement it and demonstrate that it runs
% gwen: why is the FIR now implementable when it was not, even on the zedboard -> new FIR?
% Gwen: can we build a real phase noise test bench with this FIR, i.e. add ADC, NCO and mixer
% (zedboard or redpit)

% label schema: check that "argue for the FIR cascade" is done

\documentclass[a4paper,journal]{IEEEtran/IEEEtran}
\usepackage{graphicx,color,hyperref}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{algorithm2e}
\usepackage{url,balance}
\usepackage[normalem]{ulem}
\usepackage{tikz}
\usetikzlibrary{positioning,fit}
\usepackage{multirow}
\usepackage{scalefnt}
\usepackage{caption}
\usepackage{subcaption}

% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}
\textheight=26cm
\setlength{\footskip}{30pt}
\pagenumbering{gobble}
\begin{document}
\title{Filter optimization for real time digital processing of radiofrequency signals: application
to oscillator metrology}

\author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2},
G. Goavec-M\'erou\IEEEauthorrefmark{1},
P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}\\
\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }\\
\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\
Email: \{pyb2,jmfriedt\}@femto-st.fr}
}
\maketitle
\thispagestyle{plain}
\pagestyle{plain}
\newtheorem{definition}{Definition}

\begin{abstract}
Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to
radiofrequency signal processing. Applied to oscillator characterization in the context
of ultrastable clocks, stringent filtering requirements are defined by spurious signal or
noise rejection needs. Since real time radiofrequency processing must be performed in a
Field Programmable Gate Array to meet timing constraints, we investigate optimization strategies
to design filters meeting rejection characteristics while limiting the hardware resources
required and keeping timing constraints within the targeted measurement bandwidths. The
presented technique is applicable to scheduling any sequence of processing blocks characterized
by a throughput, resource occupation and performance tabulated as a function of configuration
characteristics, as is the case for filters with their coefficients and resolution yielding
rejection and number of multipliers.
\end{abstract}

\begin{IEEEkeywords}
Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter
\end{IEEEkeywords}

\section{Digital signal processing of ultrastable clock signals}

Analog oscillator phase noise characterization is classically performed by downconverting
the radiofrequency signal to baseband using a saturated mixer,
followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to the carrier. In
a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by
multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}.

\begin{figure}[h!tb]
\begin{center}
\includegraphics[width=.8\linewidth]{images/schema}
\end{center}
\caption{Fully digital oscillator phase noise characterization: the Device Under Test
(DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and
downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals
and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite
Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays
the spectral characteristics of the phase fluctuations.}
\label{schema}
\end{figure}

As with the analog mixer,
the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as
well as the generation of the frequency sum signal in addition to the frequency difference.
These unwanted spectral components must be rejected before decimating the data stream
for the phase noise spectral characterization \cite{andrich2018high}. The filters introduced between the
downconverter
and the decimation processing blocks are core components of an oscillator characterization
system, and must reject out-of-band signals below the targeted phase noise, typically
below $-170$~dBc/Hz for the ultrastable oscillators we aim at characterizing. The filter blocks will
use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency
datastream: optimizing the performance of the filter while reducing the needed resources is
hence tackled in a systematic approach using optimization techniques. Most significantly, we
tackle the issue by cascading multiple Finite Impulse Response (FIR) filters with
a tunable number of coefficients and a tunable number of bits representing the coefficients and the
data being processed.

\section{Finite impulse response filter}

We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined
by a set of weights $b_k$ applied to the inputs $x_n$ through a convolution to generate the
outputs $y_n$
\begin{align}
y_n=\sum_{k=0}^N b_k x_{n-k}
\label{eq:fir_equation}
\end{align}

As opposed to an implementation on a general purpose processor in which word size is defined by the
processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since
not only the coefficient values and number of taps must be defined, but also the number of bits
defining the coefficients and the sample size. For this reason, and because we consider pipeline
processing (as opposed to First-In, First-Out FIFO memory batch processing) of radiofrequency
signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but
the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language
(VHDL) level.
{\color{red}Since latency is not an issue in an open-loop phase noise characterization instrument,
the large
number of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter,
is not considered an issue as it would be in a closed-loop system.} % r2.4

The coefficients are classically expressed as floating point values. However, this binary
number representation is not efficient for fast arithmetic computation by an FPGA. Instead,
we choose to quantize these floating point values into integer values. This quantization
will result in some precision loss.
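As an order of magnitude for this precision loss, and assuming (a convention not stated above) signed
coefficients rounded to the nearest integer on $p$~bits, with $p$ introduced here only for illustration,
the largest coefficient maps to $2^{p-1}-1$ and the rounding error is at most half a least significant bit:
\begin{align}
\frac{1/2}{2^{p-1}-1} \simeq 2^{-p}, \qquad 20\log_{10}\left(2^{-p}\right) \simeq -6.02\,p\;\text{dB}
\end{align}
Under this assumption each additional bit lowers the coefficient rounding error by about 6~dB, and for $p=6$
any coefficient smaller than $1/62\simeq 0.016$ times the largest one is rounded to zero.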

\begin{figure}[h!tb]
\includegraphics[width=\linewidth]{images/zero_values}
\caption{Impact of the quantization resolution of the coefficients: the quantization is
set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting
the 30~first and 30~last coefficients out of the initial 128~band-pass
filter coefficients to 0 (red dots).}
\label{float_vs_int}
\end{figure}

The tradeoff between quantization resolution and number of coefficients when considering
integer operations is not trivial. As an illustration of the
interplay between the number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits
a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon
quantization on 6~bit integers, 60 of the 128~coefficients in the beginning and end of the
taps become null, {\color{red}making the large number of coefficients irrelevant: processing
resources % r1.1
are hence saved by shrinking the filter length.} This tradeoff, aimed at minimizing resources
to reach a given rejection level or maximizing out of band rejection for a given computational
resource, will drive the investigation on cascading filters designed with varying tap resolution
and tap length, as will be shown in the next section. Indeed, our development strategy closely
follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards}
in which basic blocks are defined and characterized before being assembled \cite{hide}
in a complete processing chain. In our case, assembling the filter blocks is a simpler block
combination process since we assume a single value to be processed and a single value to be
generated at each clock cycle. The FIR filters are not considered to decimate in the
current implementation: for now, the decimation is assumed to be located after the FIR cascade.

\section{Methodology description}

Our objective is to develop a new methodology applicable to any Digital Signal Processing (DSP)
chain obtained by assembling basic processing blocks, with hardware and manufacturer independence.
Achieving such a target requires defining an abstract model to represent some basic properties
of DSP blocks such as performance (i.e. rejection or ripples in the bandpass for filters) and
resource occupation. These abstract properties, not necessarily related to the detailed hardware
implementation of a given platform, will feed a scheduler solver aimed at assembling the optimum
target, whether in terms of maximizing performance for a given arbitrary resource occupation, or
minimizing resource occupation for a given performance. In our approach, the solution of the
solver is then synthesized using the dedicated tool provided by each platform manufacturer
to assess the validity of our abstract resource occupation indicator, and the result of running
the DSP chain on the FPGA allows for assessing the performance of the scheduler. We emphasize
that all solutions found by the solver are synthesized and executed on hardware at the end
of the analysis.

In this demonstration, we focus on only two operations: filtering and shifting the number of
bits needed to represent the data along the processing chain.
We have chosen these basic operations because shifting and filtering have already been studied
in the literature \cite{lim_1996, lim_1988, young_1992, smith_1998}, providing a framework for
assessing our results. Furthermore, filtering is a core step in any radiofrequency frontend
requiring pipelined processing at full bandwidth for the earliest steps, including for
time and frequency transfer or characterization \cite{carolina1,carolina2,rsi}.

Addressing only two operations allows for demonstrating the methodology but should not be
considered as a limitation of the framework which can be extended to assembling any number
of skeleton blocks as long as performance and resource occupation can be determined. {\color{red}
Hence,
in this paper we will apply our methodology on simple DSP chains: a white noise input signal % r1.2
is generated using a Pseudo-Random Number (PRN) generator or by sampling a wideband (125~MS/s)
14-bit Analog to Digital Converter (ADC) loaded by a 50~$\Omega$ resistor.} Once samples have been
digitized at a rate of 125~MS/s, filtering is applied to qualify the processing block performance --
practically meeting the radiofrequency frontend requirement of noise and bandwidth reduction
by filtering and decimating. Finally, bursts of filtered samples are stored for post-processing,
allowing us either to assess filter rejection for a given resource usage, or to validate the rejection
when implementing a solution minimizing resource occupation.

{\color{red}
The first step of our approach is to model the DSP chain. Since we aim at only optimizing % r1.3
the filtering part of the signal processing chain, we have not included the PRN generator or the
ADC in the model: the input data size and rate are considered fixed and defined by the hardware.
The filtering can be done in two ways, either by considering a single monolithic FIR filter
requiring many coefficients to reach the targeted noise rejection ratio, or by
cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter.}

After each filter we leave the possibility of shifting the filtered data to consume
fewer resources. Hence in the case of cascaded filters, we define a stage as a filter
and a shifter (the shift could be omitted if we do not need to divide the filtered data).

\subsection{Model of a FIR filter}

A cascade of filters is composed of $n$ FIR stages. In stage $i$ ($1 \leq i \leq n$)
the FIR has $C_i$ coefficients and each coefficient is an integer value with $\pi^C_i$
bits while the filtered data are shifted by $\pi^S_i$ bits. We also define $\pi^-_i$ as
the size of input data and $\pi^+_i$ as the size of output data. Figure~\ref{fig:fir_stage}
shows a filtering stage.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=2cm]
\node[draw,minimum size=1.3cm] (FIR) { $C_i, \pi_i^C$ } ;
\node[draw,minimum size=1.3cm] (Shift) [right of=FIR, ] { $\pi_i^S$ } ;
\node (Start) [left of=FIR] { } ;
\node (End) [right of=Shift] { } ;

\node[draw,fit=(FIR) (Shift)] (Filter) { } ;

\draw[->] (Start) edge node [above] { $\pi_i^-$ } (FIR) ;
\draw[->] (FIR) -- (Shift) ;
\draw[->] (Shift) edge node [above] { $\pi_i^+$ } (End) ;
\end{tikzpicture}
\caption{A single filter stage is composed of a FIR (on the left) and a shifter (on the right).}
\label{fig:fir_stage}
\end{figure}
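
One possible bookkeeping of these data sizes, given here as an assumption rather than as the model
retained in this work, takes the product of a $\pi_i^-$-bit sample with a $\pi_i^C$-bit coefficient
to occupy $\pi_i^-+\pi_i^C$ bits, neglects the growth due to accumulating the $C_i$ products, and
removes $\pi_i^S$ bits with the shifter:
\begin{align}
\pi_i^+ = \pi_i^- + \pi_i^C - \pi_i^S, \qquad \pi_{i+1}^- = \pi_i^+
\end{align}
For example, with hypothetical values $\pi_i^-=14$, $\pi_i^C=8$ and $\pi_i^S=6$, the stage would deliver
samples of size $\pi_i^+=16$~bits to the next stage.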

FIR $i$ has been characterized through numerical simulation as able to reject $F(C_i, \pi_i^C)$ dB.
This rejection has been computed using GNU Octave software FIR coefficient design functions
(\texttt{firls} and \texttt{fir1}).
For each configuration $(C_i, \pi_i^C)$, we first create a FIR with floating point coefficients and a given number $C_i$ of coefficients.
Then, the floating point coefficients are discretized into integers. In order to ensure that the coefficients are effectively coded on $\pi_i^C$~bits,
the coefficients are normalized by their absolute maximum before being scaled to integer coefficients.
At least one coefficient is coded on $\pi_i^C$~bits, and in practice only $b_{C_i/2}$ is coded on $\pi_i^C$~bits while the others are coded on far fewer bits.
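The normalization and scaling step can be sketched as follows, assuming (the rounding rule and the scale factor are not specified above) signed coefficients rounded to the nearest integer:
\begin{align}
\tilde{b}_k = \operatorname{round}\left( \left(2^{\pi_i^C-1}-1\right) \frac{b_k}{\max_j |b_j|} \right)
\end{align}
where $\tilde{b}_k$ denotes the integer coefficient, a notation introduced here for illustration. Under this assumption the coefficient of largest magnitude maps to $\pm\left(2^{\pi_i^C-1}-1\right)$ and uses all $\pi_i^C$~bits including the sign, whereas a coefficient whose normalized magnitude is close to $2^{-m}$ needs about $m$ fewer bits.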

With these coefficients, the \texttt{freqz} function is used to estimate the magnitude of the filter
transfer function.
Comparing the performance between FIRs however requires defining a unique criterion. As shown in figure~\ref{fig:fir_mag},
the FIR magnitude exhibits two parts: we focus here on the transition width and the rejection rather than on the
bandpass ripples as emphasized in \cite{lim_1988,lim_1996}. {\color{red}Throughout this demonstration,
we arbitrarily set a bandpass of 40\% of the Nyquist frequency and a bandstop from 60\%
of the Nyquist frequency to the end of the band, as would be typically selected to prevent
aliasing before decimating the dataflow by 2. The method is however generalized to any filter
shape as long as it is defined from the initial modelling steps: Fig. \ref{fig:rejection_pyramid}
as described below is indeed unique for each filter shape.}

\begin{figure}
\begin{center}
\scalebox{0.8}{
\centering
\begin{tikzpicture}[scale=0.3]
\draw[<->] (0,15) -- (0,0) -- (21,0) ;
\draw[thick] (0,12) -- (8,12) -- (20,0) ;
\draw (0,14) node [left] { $P$ } ;
\draw (20,0) node [below] { $f$ } ;
\draw[>=latex,<->] (0,14) -- (8,14) ;
\draw (4,14) node [above] { passband } node [below] { $40\%$ } ;
\draw[>=latex,<->] (8,14) -- (12,14) ;
\draw (10,14) node [above] { transition } node [below] { $20\%$ } ;
\draw[>=latex,<->] (12,14) -- (20,14) ;
\draw (16,14) node [above] { stopband } node [below] { $40\%$ } ;
\draw[>=latex,<->] (16,12) -- (16,8) ;
\draw (16,10) node [right] { rejection } ;
\draw[dashed] (8,-1) -- (8,14) ;
\draw[dashed] (12,-1) -- (12,14) ;
\draw[dashed] (8,12) -- (16,12) ;
\draw[dashed] (12,8) -- (16,8) ;
\end{tikzpicture}
}
\end{center}
\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:
the passband is considered to occupy the initial 40\% of the Nyquist frequency range,
the stopband the last 40\%, allowing 20\% transition width.}
\label{fig:fir_mag}
\end{figure}

In the transition band, the behavior of the filter is left free: we only {\color{red}define} the passband and the stopband characteristics.
% r2.7
% Our initial criterion considered the mean value of the stopband rejection, as shown in figure~\ref{fig:mean_criterion}. This criterion
% yields unacceptable results since notches overestimate the rejection capability of the filter. Furthermore, the losses within
% the passband are not considered and might be excessive for excessively wide transitions widths introduced for filters with few coefficients.
Our criterion to compute the filter rejection considers
% r2.8 and r2.2 r2.3
the maximum magnitude within the stopband, from which the {\color{red}sum of the absolute values
within the passband is subtracted to avoid filters with excessive ripples, normalized to the
bin width to remain consistent with the passband criterion (dBc/Hz units in all cases)}. With this
criterion, we meet the expected rejection capability of low-pass filters as shown in figure~\ref{fig:custom_criterion}.
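
One possible formalization of this criterion is sketched below. The notation is introduced here for illustration only and is an assumption rather than a definition taken from the rest of the paper: $|H(f_k)|_{\mathrm{dB}}$ is the magnitude returned by \texttt{freqz} at frequency bin $f_k$, $\mathcal{S}$ and $\mathcal{P}$ are the sets of stopband and passband bins, and the rejection is counted as a positive number of dB:
\begin{align*}
F = -\max_{k \in \mathcal{S}} |H(f_k)|_{\mathrm{dB}} - \frac{1}{|\mathcal{P}|} \sum_{k \in \mathcal{P}} \Bigl| |H(f_k)|_{\mathrm{dB}} \Bigr|
\end{align*}
Under this reading, a filter whose worst stopband level is $-60$~dB and whose passband deviates on average by $1$~dB from unity gain would score $F = 60 - 1 = 59$~dB, so that passband ripples are penalized rather than ignored.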

% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/colored_mean_criterion}
% \caption{Mean stopband rejection criterion comparison between monolithic filter and cascaded filters}
% \label{fig:mean_criterion}
% \end{figure}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/colored_custom_criterion}
\caption{Comparison between a monolithic filter and cascaded filters using the custom criterion
(maximum rejection in the stopband minus the mean of the absolute value of the passband rejection)}
\label{fig:custom_criterion}
\end{figure}

With this criterion, which is used in the remainder of this paper, we are able to automatically generate multiple sets of FIR taps
and estimate their rejection. Figure~\ref{fig:rejection_pyramid} shows the
rejection as a function of the number of coefficients and the number of bits representing these coefficients.
The pyramid-shaped curve exhibits optimum configuration sets at the vertex where both edges meet.
Indeed, for a given number of coefficients, increasing the number of bits beyond the edge does not improve the rejection.
Conversely, for a given number of bits, increasing the number of coefficients does not improve
the rejection. Hence the best coefficient sets lie on the vertex of the pyramid.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/rejection_pyramid}
\caption{Rejection as a function of number of coefficients and number of bits}
\label{fig:rejection_pyramid}
\end{figure}

Although we have an efficient criterion to estimate the rejection of one set of coefficients (taps),
a difficulty arises when we cascade filters and estimate the criterion as the sum of two or more individual criteria.
If the FIR filter coefficients are the same in all stages, we have:
$$F_{total} = F_1 + F_2$$
However, selecting two different sets of coefficients yields a more complex situation in which
the previous relation no longer holds, as illustrated in figure~\ref{fig:sum_rejection}. The red and blue curves
are two different filters whose maxima and notches are not located at the same frequency offsets.
Hence, when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved
with respect to a basic sum of the rejection criteria shown as the dotted yellow line.
% r2.9
Thus, estimating the rejection of filter cascades is more complex than taking the sum of the rejection
criteria of each filter. However, since this sum underestimates the rejection capability of the cascade,
% r2.10
this bound is considered a conservative and acceptable criterion for deciding on the suitability
of the filter cascade to meet design criteria.
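
The conservative nature of this estimate can be checked with a short derivation, given here as a sketch that ignores the passband term of the criterion. The symbols $H_1$, $H_2$ (transfer functions of the two stages) and $\mathcal{S}$ (stopband) are notation introduced for illustration only. Cascading two filters multiplies their transfer functions, so that for every frequency $f \in \mathcal{S}$:
\begin{align*}
|H_1(f)\,H_2(f)| & \leq \max_{\mathcal{S}} |H_1| \times \max_{\mathcal{S}} |H_2| \\
20 \log_{10} \max_{\mathcal{S}} |H_1 H_2| & \leq 20 \log_{10} \max_{\mathcal{S}} |H_1| + 20 \log_{10} \max_{\mathcal{S}} |H_2|
\end{align*}
Equality holds when both maxima are reached at the same frequency, which is the case for identical stages. As a hypothetical numerical example, two stages with worst stopband levels of $-40$~dB and $-30$~dB yield a cascade whose stopband level is at most $-70$~dB, and lower whenever the lobes of one filter face the notches of the other.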

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/cascaded_criterion}
\caption{Rejection of two cascaded filters}
\label{fig:sum_rejection}
\end{figure}

% r2.6
{\color{red}
Finally, in our case, we consider that the input signal is fully known: the
resolution of the input data stream is fixed and remains the same for all experiments
in this paper.}

Based on this analysis, we address the estimate of resource consumption (called
% r2.11
silicon area; in the case of FPGAs this means processing cells) as a function of
filter characteristics. As a reminder, we do not aim at matching an actual hardware
configuration but consider an arbitrary silicon area occupied by each processing function,
and will assess after synthesis how well this arbitrary unit matches the actual
hardware resources provided by FPGA manufacturers. The sum of individual processing
unit areas is constrained by a total silicon area representative of FPGA global resources.
Formally, variable $a_i$ is the area taken by filter~$i$
(in arbitrary units). Variable $r_i$ is the rejection of filter~$i$ (in dB).
Constant $\mathcal{A}$ is the total available area. We model our problem as follows:

\begin{align}
\text{Maximize } & \sum_{i=1}^n r_i \notag \\
\sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\
a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\
r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\
\pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\
\pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\
\pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\
\pi_1^- &= \Pi^I \label{eq:init}
\end{align}

Equation~\ref{eq:area} states that the total area taken by the filters must not
exceed the available area. Equation~\ref{eq:areadef} defines
the area used by a filter, taken as the area of the FIR since the Shifter is
assumed not to require significant resources. We consider that the FIR needs $C_i$ registers of size
$\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the
input data with the coefficients. Equation~\ref{eq:rejectiondef} defines
the rejection of the filter through the tabulated function~$F$ that we defined
previously. The Shifter does not introduce negative rejection, as we will explain later,
so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the
relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add
$\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes
$\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of
a filter is the same as the input number of bits of the next filter.
Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative
rejection. Indeed, the results of the FIR can be right-shifted without compromising
the quality of the rejection up to a threshold. Each bit of the output data
increases the maximum rejection level by 6~dB. We add one to take the sign bit
into account. Without equation~\ref{eq:maxshift}, the Shifter could
shift too much and introduce noise in the output data: each additional
shift bit would degrade the rejection by a further 6~dB. A strictly equivalent formulation is:
$\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right)$.
Finally, equation~\ref{eq:init} gives the number of bits of the global input.
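
As an illustration of this bookkeeping, consider a single stage ($n = 1$) with hypothetical values that are not taken from the experiments of this paper: an input of $\Pi^I = 16$~bits, $C_1 = 15$ coefficients coded on $\pi_1^C = 8$~bits, and a tabulated rejection $r_1 = F(15, 8) = 30$~dB. The 6~dB per bit figure follows from $20 \log_{10} 2 \approx 6.02$~dB. The constraints then read:
\begin{align*}
a_1 & = 15 \times (8 + 16) = 360 \\
\pi_1^+ & \geq 1 + \left(1 + \frac{30}{6}\right) = 7 \\
\pi_1^S & \leq 16 + 8 - 7 = 17
\end{align*}
In this example the 24-bit FIR output may be right-shifted by up to 17~bits, keeping at least 7~output bits, before the truncation starts degrading the 30~dB rejection.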

{\color{red}
This model is non-linear since variables are multiplied with each other,
and it is not even quadratic, as the cost function $F$ does not have a known
linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations.
The parameter $p$ is defined by the user and represents the number of different
sets of coefficients generated (recall that we use the \texttt{firls} and \texttt{fir1}
functions from GNU Octave) based on the targeted filter characteristics and implementation
assumptions (estimated number of bits defining the coefficients). Hence, for $1 \leq j \leq p$,
$C_{ij}$ and $\pi_{ij}^C$ become constants,
so that the function $F$ can be tabulated (look-up table)
for each configuration using the rejection criterion. We also define the binary
variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
and 0 otherwise. The new equations are as follows:
}

\begin{align}
a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
\pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
\sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
\end{align}

Equations~\ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace
equations~\ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}, respectively.
Equation~\ref{eq:config} states that at most one configuration is chosen for each stage.

{\color{red}
However, the problem remains quadratic at this stage since constraint~\ref{eq:areadef2}
multiplies
$\delta_{ij}$ by $\pi_i^-$. Since $\delta_{ij}$ is a binary variable, this
product can be linearized provided that $\pi_i^-$ is bounded. As $\pi_i^-$ is the data size,
we set $0 < \pi_i^- \leq 128$, the maximum data size, which is
estimated from hardware characteristics.
The Gurobi (\url{www.gurobi.com}) optimization software used to solve this quadratic
model is able to linearize it, so the model is provided as is. This model
has $O(np)$ variables and $O(n)$ constraints.}
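
For reference, a standard way to linearize the product of a binary variable with a bounded variable is sketched below; it is not necessarily the reformulation applied internally by Gurobi. The auxiliary variable $y_{ij}$ and the constant $M = 128$ (the bound on $\pi_i^-$ stated above) are notation introduced here for illustration only. The product $\delta_{ij} \times \pi_i^-$ is replaced with $y_{ij}$ subject to:
\begin{align*}
y_{ij} & \leq M \delta_{ij}, &
y_{ij} & \leq \pi_i^-, \\
y_{ij} & \geq \pi_i^- - M (1 - \delta_{ij}), &
y_{ij} & \geq 0
\end{align*}
When $\delta_{ij} = 0$ these constraints force $y_{ij} = 0$, and when $\delta_{ij} = 1$ they force $y_{ij} = \pi_i^-$, so that the area becomes the linear expression $a_i = \sum_{j=1}^p C_{ij} \left(\pi_{ij}^C \delta_{ij} + y_{ij}\right)$.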

% This model is non-linear and even non-quadratic, as $F$ does not have a known
% linear or quadratic expression. We introduce $p$ FIR configurations
% $(C_{ij}, \pi_{ij}^C), 1 \leq j \leq p$ that are constants.
% % r2.12
% This variable must be defined by the user, it represent the number of different
% set of coefficients generated (for memory, we use \texttt{firls} and \texttt{fir1}
% functions from GNU Octave).
% We define binary
% variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
% and 0 otherwise. The new equations are as follows:
%
% \begin{align}
% a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
% r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
% \pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
% \sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
% \end{align}
%
% Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace
% respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.
% Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most.
%
% % r2.13
% This modified model is quadratic since we multiply two variables in the
% equation~\ref{eq:areadef2} ($\delta_{ij}$ by $\pi_{ij}^-$) but it can be linearised if necessary.
% The Gurobi
% (\url{www.gurobi.com}) optimization software is used to solve this quadratic
% model, and since Gurobi is able to linearize, the model is left as is. This model
% has $O(np)$ variables and $O(n)$ constraints.

Two problems will be addressed using the workflow described in the next section: on the one
hand, maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary
silicon area (section~\ref{sec:fixed_area}) and, on the other hand, the dual problem of minimizing the silicon area
for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the
objective function is replaced with:
\begin{align}
\text{Minimize } & \sum_{i=1}^n a_i \notag
\end{align}
We adapt the constraints of the quadratic program by replacing equation~\ref{eq:area}
with equation~\ref{eq:rejection_min}, where $\mathcal{R}$ is the minimal
rejection required.

\begin{align}
\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}
\end{align}

\section{Design workflow}
\label{sec:workflow}

In this section, we describe the workflow to compute all the results presented in sections~\ref{sec:fixed_area}
and \ref{sec:fixed_rej}. Figure~\ref{fig:workflow} shows the global workflow and the different steps involved
in the computation of the results.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=0.75cm and 2cm]
\node[draw,minimum size=1cm] (Solver) { Filter Solver } ;
\node (Start) [left= 3cm of Solver] { } ;
\node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;
\node (Input) [above= of TCL] { } ;
\node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;
\node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;
\node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;
\node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;
\node (Results) [left= of Postproc] { } ;

\draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;
\draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;
\draw[->] (Solver) edge node [below] { (1a) } (TCL) ;
\draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;
\draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;
\draw[->,dashed] (Bitstream) -- (Deploy) ;
\draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;
\draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;
\draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;
\draw[->] (Postproc) -- (Results) ;
\end{tikzpicture}
\caption{Design workflow from the input parameters to the results}
\label{fig:workflow}
\end{figure}

523

527

The filter solver is a C++ program that takes as input the maximum area
$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,
the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates
the quadratic programs and uses the Gurobi solver to estimate the optimal results.
It then produces two scripts: a TCL script ((1a) in figure~\ref{fig:workflow})
and a deploy script ((1b) in figure~\ref{fig:workflow}).

The TCL script describes the whole digital processing chain from the beginning
(the raw signal data) to the end (the filtered data) in a language compatible
with proprietary synthesis software, namely Vivado for Xilinx and Quartus for
Intel/Altera. The raw input data are generated by a 20-bit Pseudo Random Number (PRN)
generator inside the FPGA and $\Pi^I$ is fixed at 16~bits.
Then the script builds each stage of the chain with a generic FIR task that
comes from a skeleton library. The generic FIR is configurable
in both the number of coefficients and the size of the coefficients. The coefficients
themselves are not stored in the script.
As the signal is processed in real-time, the output signal is stored as
consecutive bursts of data for post-processing, mainly to assess the consistency of the
implemented FIR cascade transfer function with the design criteria and the expected
transfer function.

The TCL script is used by Vivado to produce the FPGA bitstream ((2) in figure~\ref{fig:workflow}).
We use the 2018.2 version of Xilinx Vivado and we execute the synthesized
bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series
FPGA (xc7z010clg400-1) and two LTC2145 14-bit 125~MS/s ADCs, loaded with 50~$\Omega$ resistors to
provide a broadband noise source.
The board runs the Linux kernel and surrounding environment produced from the
Buildroot framework available at \url{https://github.com/trabucayre/redpitaya/}: configuring
the Zynq FPGA, feeding the FIR with the set of coefficients, executing the simulation and
fetching the results are automated.

The deploy script uploads the bitstream to the board ((3) in
figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers,
and configures the coefficients of the FIR filters. It then waits for the results
and retrieves the data to the main computer ((4) in figure~\ref{fig:workflow}).

Finally, an Octave post-processing script computes the final results from
the output data ((5) in figure~\ref{fig:workflow}).
The results are normalized so that the Power Spectral Density (PSD) starts at zero
and the different configurations can be compared.
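As an illustration of this normalization, and assuming that it amounts to subtracting
the value of the lowest frequency bin from a PSD expressed in dB (the post-processing
script itself is not reproduced here, and the symbols $S$, $f_0$ and $\tilde{S}$ are
introduced only for this sketch), the normalized spectrum can be written as
\begin{equation*}
\tilde{S}(f) = 10\log_{10} S(f) - 10\log_{10} S(f_0),
\end{equation*}
so that $\tilde{S}(f_0) = 0$~dB for every configuration. Under the further assumption
that $f_0$ lies in the passband, the stopband level of $\tilde{S}$ can then be read
directly as a rejection relative to the passband.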

\section{Maximizing the rejection at fixed silicon area}
\label{sec:fixed_area}
This section presents the output of the filter solver, {\em i.e.} the computed
configurations for each stage, the computed rejection and the computed silicon area.
These results help in understanding the choices made by the solver to compute its solutions.

The experimental setup is composed of three cases. The raw input is generated
by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.
The total silicon area $\mathcal{A}$ is then fixed to either 500, 1000 or 1500
arbitrary units. Hence, the three cases are named MAX/500, MAX/1000 and MAX/1500.
The number of configurations $p$ is 1827, with $C_i$ ranging from 3 to 60 and $\pi^C$
ranging from 2 to 22. In each case, the quadratic program has been able to give a
result for up to five stages ($n = 5$) in the cascaded filter.

Tables~\ref{tbl:gurobi_max_500}, \ref{tbl:gurobi_max_1000} and \ref{tbl:gurobi_max_1500}
show the results obtained by the filter solver for MAX/500, MAX/1000 and MAX/1500, respectively.

\renewcommand{\arraystretch}{1.4}


\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}
\label{tbl:gurobi_max_500}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\
2 & (3, 3, 15) & (31, 9, 0) & - & - & - & 58~dB & 460 \\
3 & (3, 3, 15) & (27, 9, 0) & (5, 3, 0) & - & - & 66~dB & 488 \\
4 & (3, 3, 15) & (19, 7, 0) & (11, 5, 0) & (3, 3, 0) & - & 74~dB & 499 \\
5 & (3, 3, 15) & (23, 8, 0) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & 78~dB & 489 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}
\label{tbl:gurobi_max_1000}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\
2 & (3, 3, 15) & (51, 14, 0) & - & - & - & 87~dB & 975 \\
3 & (3, 3, 15) & (35, 11, 0) & (19, 7, 0) & - & - & 99~dB & 1000 \\
4 & (3, 4, 16) & (27, 8, 0) & (19, 7, 1) & (11, 5, 0) & - & 103~dB & 998 \\
5 & (3, 3, 15) & (31, 9, 0) & (19, 7, 0) & (3, 3, 1) & (3, 3, 0) & 111~dB & 984 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}
\label{tbl:gurobi_max_1500}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\
2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 103~dB & 1489 \\
3 & (3, 3, 15) & (35, 11, 0) & (35, 11, 0) & - & - & 122~dB & 1492 \\
4 & (3, 3, 15) & (27, 8, 0) & (19, 7, 0) & (27, 9, 0) & - & 129~dB & 1498 \\
5 & (3, 3, 15) & (23, 9, 2) & (27, 9, 0) & (19, 7, 0) & (3, 3, 0) & 136~dB & 1499 \\
\hline
\end{tabular}
}
\end{table}

\renewcommand{\arraystretch}{1}


From these tables, we can first state that the more stages are used to define
the cascaded FIR filters, the better the rejection. This was an expected result as it has
been previously observed that many small filters are better than
a single large filter \cite{lim_1988, lim_1996, young_1992}, despite such conclusions
being rarely applied in practice due to the lack of tools for identifying individual filter
coefficients in the cascaded approach.
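To make this trend concrete, the rejections listed in Tables~\ref{tbl:gurobi_max_500},
\ref{tbl:gurobi_max_1000} and \ref{tbl:gurobi_max_1500} can be read as successive gains
obtained each time one stage is added. The values below are simple differences of the
tabulated rejections, not additional measurements:
\begin{align*}
\text{MAX/500:}  &\quad 32 \xrightarrow{+26} 58 \xrightarrow{+8} 66 \xrightarrow{+8} 74 \xrightarrow{+4} 78~\text{dB},\\
\text{MAX/1000:} &\quad 56 \xrightarrow{+31} 87 \xrightarrow{+12} 99 \xrightarrow{+4} 103 \xrightarrow{+8} 111~\text{dB},\\
\text{MAX/1500:} &\quad 71 \xrightarrow{+32} 103 \xrightarrow{+19} 122 \xrightarrow{+7} 129 \xrightarrow{+7} 136~\text{dB}.
\end{align*}
In these three cases, the largest gain (26 to 32~dB) comes from moving from one to two
stages, and each further stage adds between 4 and 19~dB.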

Second, the larger the silicon area, the better the rejection. This was also an
expected result as more area means a filter of better quality with more coefficients
or more bits per coefficient.

Then, we also observe that the first stage can have a larger shift than the other
stages. This is explained by the fact that the solver tries to use just enough
bits for the computed rejection after each stage. In the first stage, a
balance between a strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}
gives the relation between both values.

Finally, we note that the solver consumes nearly all of the given silicon area:
at least 92\% of the budget is used in every case.

The following graphs present the rejection for real data on the FPGA. In all the following
figures, the solid lines represent the actual rejection of the filtered
data on the FPGA as measured experimentally and the dashed lines are the noise levels
given by the quadratic solver. The configurations are those computed in the previous section.

Figures~\ref{fig:max_500_result}, \ref{fig:max_1000_result} and \ref{fig:max_1500_result}
show the rejection of the different configurations in the case of MAX/500, MAX/1000 and MAX/1500, respectively.

% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/max_500}
% \caption{Signal spectrum for MAX/500}
% \label{fig:max_500_result}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/max_1000}
% \caption{Signal spectrum for MAX/1000}
% \label{fig:max_1000_result}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/max_1500}
% \caption{Signal spectrum for MAX/1500}
% \label{fig:max_1500_result}
% \end{figure}

% r2.14 and r2.15 and r2.16
\begin{figure}
\centering
\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/max_500}
\caption{Signal spectrum for MAX/500}
\label{fig:max_500_result}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/max_1000}
\caption{Signal spectrum for MAX/1000}
\label{fig:max_1000_result}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/max_1500}
\caption{Signal spectrum for MAX/1500}
\label{fig:max_1500_result}
\end{subfigure}
\caption{Signal spectrum of each experimental configuration: MAX/500, MAX/1000 and MAX/1500}
\end{figure}

In all cases, we observe that the actual rejection is close to the rejection computed by the solver.


We compare the actual silicon resources given by Vivado to the
resources in arbitrary units.
The goal is to check that our arbitrary units of silicon area model the real
resources on the FPGA well enough. In particular, we want to verify that, for a given
number of arbitrary units, the actual silicon resources do not depend on the
number of stages $n$. Most significantly, our approach aims
at remaining far enough from the practical logic gate implementation used by
various vendors to remain platform independent and be portable from one
architecture to another.

Table~\ref{tbl:resources_usage} shows the resource usage in the case of MAX/500, MAX/1000 and
MAX/1500, \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000
and 1500 arbitrary units. We have taken care to extract solely the resources used by
the FIR filters and remove additional processing blocks including FIFO and Programmable
Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication.

\begin{table}[h!tb]
\caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
\label{tbl:resources_usage}
\centering
\begin{tabular}{|c|c|ccc|c|}
\hline
$n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline
 & LUT & 249 & 453 & 627 & \emph{17600} \\
1 & BRAM & 1 & 1 & 1 & \emph{120} \\
 & DSP & 21 & 37 & 47 & \emph{80} \\ \hline
 & LUT & 2374 & 5494 & 691 & \emph{17600} \\
2 & BRAM & 2 & 2 & 2 & \emph{120} \\
 & DSP & 0 & 0 & 70 & \emph{80} \\ \hline
 & LUT & 2443 & 3304 & 3521 & \emph{17600} \\
3 & BRAM & 3 & 3 & 3 & \emph{120} \\
 & DSP & 0 & 19 & 35 & \emph{80} \\ \hline
 & LUT & 2634 & 3753 & 2557 & \emph{17600} \\
4 & BRAM & 4 & 4 & 4 & \emph{120} \\
 & DSP & 0 & 19 & 46 & \emph{80} \\ \hline
 & LUT & 2423 & 3047 & 2847 & \emph{17600} \\
5 & BRAM & 5 & 5 & 5 & \emph{120} \\
 & DSP & 0 & 22 & 46 & \emph{80} \\ \hline
\end{tabular}
\end{table}

In some cases, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that,
when the filter coefficients are small enough, or when the input size is small
enough, Vivado optimizes resource consumption by selecting multiplexers to
implement the multiplications instead of a DSP. In this case, it is quite difficult
to compare the whole silicon budget.

However, a rough estimation can be made with a simple equivalence: looking at
the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,
we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon
area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs,
1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond
to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary
units map well to actual hardware resources. The relatively small differences can probably be explained
by the optimizations done by Vivado based on the detailed map of available processing resources.
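This equivalence can be checked against the figures of Table~\ref{tbl:resources_usage}.
As a sketch, introducing the notation $\mathrm{LUT}_{\mathrm{eq}}$ only for this illustration
and assuming that the additional LUTs between $n = 1$ and $n = 2$ in MAX/500 are entirely
due to the 21~DSPs being replaced, we get
\begin{equation*}
\frac{2374 - 249}{21} \approx 101~\text{LUTs per DSP},
\end{equation*}
which motivates the rounded value of 100. Defining
$\mathrm{LUT}_{\mathrm{eq}} = \mathrm{LUT} + 100 \times \mathrm{DSP}$, the $n = 3$ rows yield
\begin{align*}
\text{MAX/500:}  &\quad 2443 + 100 \times 0 = 2443,\\
\text{MAX/1000:} &\quad 3304 + 100 \times 19 = 5204,\\
\text{MAX/1500:} &\quad 3521 + 100 \times 35 = 7021,
\end{align*}
which is of the same order as the 2500, 5000 and 7300~LUTs associated with
500, 1000 and 1500 arbitrary units.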

We now present the computation time needed to solve the quadratic problem.
For each case, the filter solver software is executed on an Intel(R) Xeon(R) CPU E5606
clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve
the quadratic problem. Table~\ref{tbl:area_time} shows the time needed to solve the quadratic
problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units.

\begin{table}[h!tb]
\caption{Time needed to solve the quadratic program with Gurobi}
\label{tbl:area_time}
\centering
\begin{tabular}{|c|c|c|c|}\hline
$n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline
1 & 0.1~s & 0.1~s & 0.3~s \\
2 & 1.1~s & 2.2~s & 12~s \\
3 & 17~s & 137~s ($\approx$ 2~min) & 275~s ($\approx$ 4~min) \\
4 & 52~s & 5448~s ($\approx$ 90~min) & 5505~s ($\approx$ 17~h) \\
5 & 286~s ($\approx$ 4~min) & 4119~s ($\approx$ 68~min) & 235479~s ($\approx$ 3~days) \\\hline
\end{tabular}
\end{table}

As expected, the computation time seems to rise exponentially with the number of stages. % TODO: exponential?
When the area is limited, the design exploration space is smaller and the solver is able to
find an optimal solution faster.
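One way to motivate this trend, offered as a rough upper bound rather than as a description
of how Gurobi actually explores the problem: if each of the $n$ stages can independently
take any of the $p = 1827$ configurations, the number of candidate cascades is
\begin{equation*}
p^n = 1827^n, \qquad \text{e.g.}\quad 1827^2 \approx 3.3 \times 10^{6}, \quad 1827^5 \approx 2.0 \times 10^{16},
\end{equation*}
which grows exponentially with $n$. A tighter area constraint excludes a larger part of this
space, which is consistent with the shorter solving times observed for MAX/500.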

\subsection{Minimizing resource occupation at fixed rejection}\label{sec:fixed_rej}


This section presents the results of the complementary quadratic program aimed at
minimizing the area occupation for a targeted rejection level.

The experimental setup is composed of four cases. The raw input is the same
as in the previous section, from a PRN generator, which fixes the input data size $\Pi^I$.
The targeted rejection $\mathcal{R}$ is then fixed to either 40, 60, 80 or 100~dB.
Hence, the four cases are named MIN/40, MIN/60, MIN/80 and MIN/100.
The number of configurations $p$ is the same as in the previous section.

Tables~\ref{tbl:gurobi_min_40}, \ref{tbl:gurobi_min_60}, \ref{tbl:gurobi_min_80} and \ref{tbl:gurobi_min_100}
show the results obtained by the filter solver for MIN/40, MIN/60, MIN/80 and MIN/100, respectively.

\renewcommand{\arraystretch}{1.4}


\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40}
\label{tbl:gurobi_min_40}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\
2 & (3, 2, 14) & (19, 7, 0) & - & - & - & 40~dB & 263 \\
3 & (3, 3, 15) & (11, 5, 0) & (3, 3, 0) & - & - & 41~dB & 192 \\
4 & (3, 3, 15) & (3, 3, 0) & (3, 3, 0) & (3, 3, 0) & - & 42~dB & 147 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60}
\label{tbl:gurobi_min_60}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\
2 & (3, 3, 15) & (35, 10, 0) & - & - & - & 60~dB & 547 \\
3 & (3, 3, 15) & (27, 8, 0) & (3, 3, 0) & - & - & 62~dB & 426 \\
4 & (3, 2, 14) & (11, 5, 1) & (11, 5, 0) & (3, 3, 0) & - & 60~dB & 344 \\
5 & (3, 2, 14) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & (3, 3, 0) & 60~dB & 279 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80}
\label{tbl:gurobi_min_80}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\
2 & (3, 3, 15) & (47, 14, 0) & - & - & - & 80~dB & 903 \\
3 & (3, 3, 15) & (23, 9, 0) & (19, 7, 0) & - & - & 80~dB & 698 \\
4 & (3, 3, 15) & (27, 9, 0) & (7, 7, 4) & (3, 3, 0) & - & 80~dB & 605 \\
5 & (3, 2, 14) & (27, 8, 0) & (3, 3, 1) & (3, 3, 0) & (3, 3, 0) & 81~dB & 534 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/100}
\label{tbl:gurobi_min_100}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & - & - & - & - & - & - & - \\
2 & (15, 7, 17) & (51, 14, 0) & - & - & - & 100~dB & 1365 \\
3 & (3, 3, 15) & (27, 9, 0) & (27, 9, 0) & - & - & 100~dB & 1002 \\
4 & (3, 3, 15) & (31, 9, 0) & (19, 7, 0) & (3, 3, 0) & - & 101~dB & 909 \\
5 & (3, 3, 15) & (23, 8, 1) & (19, 7, 0) & (3, 3, 0) & (3, 3, 0) & 101~dB & 810 \\
\hline
\end{tabular}
}
\end{table}
\renewcommand{\arraystretch}{1}

From these tables, we can first state that almost all configurations reach the targeted rejection
level, or even exceed it, thanks to our underestimate of the cascade rejection as the sum of the
individual filter rejections. The only exception is the monolithic case ($n = 1$) in
MIN/100: no solution is found for a single monolithic filter to reach a 100~dB rejection.
Furthermore, the area of the monolithic filter is about twice that of the two cascaded filters
(1131 and 1760 arbitrary units vs. 547 and 903 arbitrary units for 60 and 80~dB rejection,
respectively). More generally, the more filters are cascaded, the lower the occupied area.
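
As an illustrative check of the areas quoted above, the MIN/60 values can be recomputed by hand. This is only a sketch under stated assumptions: the area model $a_i = C_i (\pi_i^C + \pi_i^-)$, the propagation rule $\pi_{i+1}^- = \pi_i^- + \pi_i^C - \pi_i^S$ for the data size $\pi_i^-$ entering stage $i$ (a notation assumed here), and a 16-bit input ($\Pi^I = 16$, a value not stated in this section and chosen because it reproduces these two rows of Table~\ref{tbl:gurobi_min_60}):
\begin{align*}
n = 1: &\quad 39 \times (13 + 16) = 1131, \\
n = 2: &\quad 3 \times (3 + 16) + 35 \times (10 + 4) = 57 + 490 = 547,
\end{align*}
where the second stage of the cascade receives $16 + 3 - 15 = 4$ bits. The resulting area ratios are $547 / 1131 \approx 0.48$ for MIN/60 and $903 / 1760 \approx 0.51$ for MIN/80, which is what ``about twice'' refers to.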

As in the previous section, the solver always chooses a small filter as the first
filter stage, and the second stage is often the biggest filter. This choice can be explained
as in the previous section: the solver uses just enough bits in the first stage not to degrade the input
signal, and then selects a better filter for the second stage to improve rejection without
having too many bits in the output data.
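
To make this explanation more concrete, the data sizes can be traced through one tabulated cascade. The sketch below assumes the propagation rule $\pi_{i+1}^- = \pi_i^- + \pi_i^C - \pi_i^S$, the area model $a_i = C_i (\pi_i^C + \pi_i^-)$ and a 16-bit input ($\Pi^I = 16$). The notation $\pi_i^-$ and the 16-bit value are assumptions made for this illustration, retained because they reproduce the tabulated area of this particular row. For MIN/100 with $n = 5$:
\begin{align*}
\pi_1^- &= 16, & a_1 &= 3 \times (3 + 16) = 57, \\
\pi_2^- &= 16 + 3 - 15 = 4, & a_2 &= 23 \times (8 + 4) = 276, \\
\pi_3^- &= 4 + 8 - 1 = 11, & a_3 &= 19 \times (7 + 11) = 342, \\
\pi_4^- &= 11 + 7 - 0 = 18, & a_4 &= 3 \times (3 + 18) = 63, \\
\pi_5^- &= 18 + 3 - 0 = 21, & a_5 &= 3 \times (3 + 21) = 72,
\end{align*}
for a total of $57 + 276 + 342 + 63 + 72 = 810$ arbitrary units, as listed in Table~\ref{tbl:gurobi_min_100}. Under these assumptions, the 15-bit shift of the small first stage lets the largest filter of the cascade operate on 4-bit data only, which keeps its area low.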

For the specific case of MIN/40 with $n = 5$, the solver has determined that the optimal
number of filters is 4, so it did not choose any configuration for the last filter. Hence this
solution is equivalent to the result for $n = 4$.

The following graphs present the rejection for real data on the FPGA. In all the following
figures, the solid line represents the actual rejection of the filtered
data on the FPGA as measured experimentally, and the dashed line is the noise level
given by the quadratic solver.

Figures~\ref{fig:min_40}, \ref{fig:min_60}, \ref{fig:min_80} and \ref{fig:min_100} show the
rejection of the different configurations for MIN/40, MIN/60, MIN/80 and MIN/100, respectively.

% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_40}
% \caption{Signal spectrum for MIN/40}
% \label{fig:min_40}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_60}
% \caption{Signal spectrum for MIN/60}
% \label{fig:min_60}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_80}
% \caption{Signal spectrum for MIN/80}
% \label{fig:min_80}
% \end{figure}
%
% \begin{figure}
% \centering
% \includegraphics[width=\linewidth]{images/min_100}
% \caption{Signal spectrum for MIN/100}
% \label{fig:min_100}
% \end{figure}

% r2.14 and r2.15 and r2.16
\begin{figure}
\centering
\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/min_40}
\caption{Signal spectrum for MIN/40}
\label{fig:min_40}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/min_60}
\caption{Signal spectrum for MIN/60}
\label{fig:min_60}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/min_80}
\caption{Signal spectrum for MIN/80}
\label{fig:min_80}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/min_100}
\caption{Signal spectrum for MIN/100}
\label{fig:min_100}
\end{subfigure}
\caption{Signal spectra for the experimental configurations MIN/40, MIN/60, MIN/80 and MIN/100}
\end{figure}

We observe that all rejections given by the quadratic solver are close to the experimentally
measured rejection. All curves show that the constraint to reach the target rejection is
respected by both the monolithic (except in MIN/100, which has no monolithic solution) and the cascaded filters.

Table~\ref{tbl:resources_usage_comp} shows the resource usage in the case of MIN/40, MIN/60,
MIN/80 and MIN/100, \emph{i.e.} when the target rejection is fixed to 40, 60, 80 and 100~dB. We
have taken care to extract solely the resources used by
the FIR filters and to exclude additional processing blocks, including the FIFO and the PL to
PS communication.

\renewcommand{\arraystretch}{1.2}
\begin{table}
\caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
\label{tbl:resources_usage_comp}
\centering
{\scalefont{0.90}
\begin{tabular}{|c|c|cccc|c|}
\hline
$n$ & & MIN/40 & MIN/60 & MIN/80 & MIN/100 & \emph{Zynq-7010} \\ \hline\hline
& LUT & 343 & 334 & 772 & - & \emph{17600} \\
1 & BRAM & 1 & 1 & 1 & - & \emph{120} \\
& DSP & 27 & 39 & 55 & - & \emph{80} \\ \hline
& LUT & 1252 & 2862 & 5099 & 640 & \emph{17600} \\
2 & BRAM & 2 & 2 & 2 & 2 & \emph{120} \\
& DSP & 0 & 0 & 0 & 66 & \emph{80} \\ \hline
& LUT & 891 & 2148 & 2023 & 2448 & \emph{17600} \\
3 & BRAM & 3 & 3 & 3 & 3 & \emph{120} \\
& DSP & 0 & 0 & 19 & 27 & \emph{80} \\ \hline
& LUT & 662 & 1729 & 2451 & 2893 & \emph{17600} \\
4 & BRAM & 4 & 4 & 4 & 4 & \emph{120} \\
& DSP & 0 & 0 & 7 & 19 & \emph{80} \\ \hline
& LUT & - & 1259 & 2602 & 2505 & \emph{17600} \\
5 & BRAM & - & 5 & 5 & 5 & \emph{120} \\
& DSP & - & 0 & 0 & 19 & \emph{80} \\ \hline
\end{tabular}
}
\end{table}
\renewcommand{\arraystretch}{1}

If we keep the previous estimate of the cost of one DSP in terms of LUTs (1 DSP $\approx$ 100 LUT),
the real resource consumption decreases as a function of the number of stages in the cascaded
filter, in agreement with the solution given by the quadratic solver. Indeed, the consumption
always decreases, even if the difference between the monolithic filter and the two cascaded
filters is less than expected.
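
As a sketch of this conversion, the MIN/80 column of Table~\ref{tbl:resources_usage_comp} can be expressed in LUT equivalents, writing $L_{\mathrm{eq}} = \mathrm{LUT} + 100 \times \mathrm{DSP}$ (a notation introduced here for illustration only, with BRAM left out of the count):
\begin{align*}
n = 1: &\quad L_{\mathrm{eq}} = 772 + 100 \times 55 = 6272, \\
n = 2: &\quad L_{\mathrm{eq}} = 5099 + 100 \times 0 = 5099, \\
n = 3: &\quad L_{\mathrm{eq}} = 2023 + 100 \times 19 = 3923, \\
n = 4: &\quad L_{\mathrm{eq}} = 2451 + 100 \times 7 = 3151, \\
n = 5: &\quad L_{\mathrm{eq}} = 2602 + 100 \times 0 = 2602.
\end{align*}
The sequence is strictly decreasing, but the ratio between the two-stage cascade and the monolithic filter is $5099 / 6272 \approx 0.81$, whereas the areas predicted by the solver give $903 / 1760 \approx 0.51$ for the same case. This estimate depends entirely on the assumed exchange rate of 100 LUT per DSP.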

Finally, table~\ref{tbl:area_time_comp} shows the computation time to solve
