
\documentclass[a4paper,journal]{IEEEtran/IEEEtran}
\usepackage{graphicx,color,hyperref}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{algorithm2e}
\usepackage{url,balance}
\usepackage[normalem]{ulem}
\usepackage{tikz}
\usetikzlibrary{positioning,fit}
\usepackage{multirow}
\usepackage{scalefnt}
\usepackage{caption}
\usepackage{subcaption}


\hyphenation{op-tical net-works semi-conduc-tor}
\textheight=26cm
\setlength{\footskip}{30pt}
\pagenumbering{gobble}
\begin{document}

\title{Filter optimization for real time digital processing of radiofrequency signals: application
to oscillator metrology}

\author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2},
G. Goavec-M\'erou\IEEEauthorrefmark{1},
P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}\\
\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }\\
\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\
Email: \{pyb2,jmfriedt\}@femto-st.fr}
}

\maketitle
\thispagestyle{plain}
\pagestyle{plain}
\newtheorem{definition}{Definition}

\begin{abstract}
Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to
radiofrequency signal processing. Applied to oscillator characterization in the context
of ultrastable clocks, stringent filtering requirements are defined by spurious signal or
noise rejection needs. Since real-time radiofrequency processing must be performed in a
Field Programmable Gate Array (FPGA) to meet timing constraints, we investigate optimization strategies
to design filters meeting rejection characteristics while limiting the hardware resources
required and keeping timing constraints within the targeted measurement bandwidths. The
presented technique is applicable to scheduling any sequence of processing blocks characterized
by a throughput, resource occupation and performance tabulated as a function of configuration
characteristics, as is the case for filters with their coefficients and resolution yielding
rejection and number of multipliers.
\end{abstract}

\begin{IEEEkeywords}
Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter
\end{IEEEkeywords}

\section{Digital signal processing of ultrastable clock signals}

Analog oscillator phase noise characterization is classically performed by downconverting
the radiofrequency signal to baseband using a saturated mixer,
followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to carrier. In
a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by
multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}.

\begin{figure}[h!tb]
\begin{center}
\includegraphics[width=.8\linewidth]{images/schema}
\end{center}
\caption{Fully digital oscillator phase noise characterization: the Device Under Test
(DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and
downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals
and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite
Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays
the spectral characteristics of the phase fluctuations.}
\label{schema}
\end{figure}

As with the analog mixer,
the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as
well as the generation of the frequency sum signal in addition to the frequency difference.
These unwanted spectral components must be rejected before decimating the data stream
for the phase noise spectral characterization \cite{andrich2018high}. The filters introduced between the
downconverter
and the decimation processing blocks are core components of an oscillator characterization
system, and must reject out-of-band signals below the targeted phase noise, typically
below $-170$~dBc/Hz for the ultrastable oscillators we aim at characterizing. The filter blocks will
use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency
datastream: optimizing the performance of the filter while reducing the needed resources is
hence tackled in a systematic approach using optimization techniques. Most significantly, we
tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with
a tunable number of coefficients and a tunable number of bits representing the coefficients and the
data being processed.

\section{Finite impulse response filter}

We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined
by a set of weights $b_k$ applied to the inputs $x_n$ through a convolution to generate the
outputs $y_n$
\begin{align}
y_n=\sum_{k=0}^N b_k x_{n-k}
\label{eq:fir_equation}
\end{align}
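For later reference, the magnitude response associated with Eq.~\eqref{eq:fir_equation} can be written explicitly. The expression below is the standard discrete-time Fourier transform of the weights; the symbols $H$ and $f_s$ (sampling rate) are notations introduced here for illustration only:
\begin{align*}
H(f)=\sum_{k=0}^{N} b_k\, e^{-2j\pi k f/f_s}, \qquad |H(f)|_{\mathrm{dB}} = 20\log_{10}|H(f)| .
\end{align*}
Evaluating $|H(f)|_{\mathrm{dB}}$ over the band to be suppressed is what quantifies the rejection provided by a given set of weights.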

As opposed to an implementation on a general purpose processor in which word size is defined by the
processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since
not only the coefficient values and number of taps must be defined, but also the number of bits
defining the coefficients and the sample size. For this reason, and because we consider pipeline
processing (as opposed to First-In, First-Out (FIFO) memory batch processing) of radiofrequency
signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but
the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language
(VHDL) level.
Since latency is not an issue in an open-loop phase noise characterization instrument,
the large
number of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter,
is not considered an issue as it would be in a closed-loop system.

The coefficients are classically expressed as floating point values. However, this binary
number representation is not efficient for fast arithmetic computation by an FPGA. Instead,
we choose to quantize these floating point values into integer values. This quantization
will result in some precision loss.

\begin{figure}[h!tb]
\includegraphics[width=\linewidth]{images/zero_values}
\caption{Impact of the quantization resolution of the coefficients: the quantization is
set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting
the 30~first and 30~last coefficients out of the initial 128~band-pass
filter coefficients to 0 (red dots).}
\label{float_vs_int}
\end{figure}

The tradeoff between quantization resolution and number of coefficients when considering
integer operations is not trivial. As an illustration of the
relation between number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits
a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon
quantization to 6-bit integers, 60 of the 128~coefficients at the beginning and end of the
impulse response become null, making the large number of coefficients irrelevant: processing
resources
are hence saved by shrinking the filter length. This tradeoff, aimed at minimizing resources
to reach a given rejection level or at maximizing out-of-band rejection for a given computational
resource, will drive the investigation on cascading filters designed with varying tap resolution
and tap length, as will be shown in the next section. Indeed, our development strategy closely
follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards}
in which basic blocks are defined and characterized before being assembled \cite{hide}
in a complete processing chain. In our case, assembling the filter blocks is a simpler block
combination process since we assume a single value to be processed and a single value to be
generated at each clock cycle. The FIR filters do not decimate in the
current implementation: for now, the decimation is assumed to be located after the FIR cascade.

\section{Methodology description}

Our objective is to develop a new methodology applicable to any Digital Signal Processing (DSP)
chain obtained by assembling basic processing blocks, with hardware and manufacturer independence.
Achieving such a target requires defining an abstract model to represent some basic properties
of DSP blocks such as performance (e.g. rejection or ripples in the passband for filters) and
resource occupation. These abstract properties, not necessarily related to the detailed hardware
implementation of a given platform, will feed a scheduler solver aimed at assembling the optimum
target, whether in terms of maximizing performance for a given arbitrary resource occupation, or
minimizing resource occupation for a given performance. In our approach, the solution of the
solver is then synthesized using the dedicated tool provided by each platform manufacturer
to assess the validity of our abstract resource occupation indicator, and the result of running
the DSP chain on the FPGA allows for assessing the performance of the scheduler. We emphasize
that all solutions found by the solver are synthesized and executed on hardware at the end
of the analysis.

In this demonstration, we focus on only two operations: filtering, and shifting to reduce the number of
bits needed to represent the data along the processing chain.
We have chosen these basic operations because shifting and filtering have already been studied
in the literature \cite{lim_1996, lim_1988, young_1992, smith_1998}, providing a framework for
assessing our results. Furthermore, filtering is a core step in any radiofrequency frontend
requiring pipelined processing at full bandwidth for the earliest steps, including for
time and frequency transfer or characterization \cite{carolina1,carolina2,rsi}.

Addressing only two operations allows for demonstrating the methodology but should not be
considered as a limitation of the framework, which can be extended to assembling any number
of skeleton blocks as long as performance and resource occupation can be determined.
Hence,
in this paper we will apply our methodology to simple DSP chains: a white noise input signal
is generated using a Pseudo-Random Number (PRN) generator or by sampling with a wideband (125~MS/s)
14-bit Analog to Digital Converter (ADC) loaded by a 50~$\Omega$ resistor. Once samples have been
digitized at a rate of 125~MS/s, filtering is applied to qualify the processing block performance,
in practice meeting the radiofrequency frontend requirement of noise and bandwidth reduction
by filtering and decimating. Finally, bursts of filtered samples are stored for post-processing,
allowing us either to assess filter rejection for a given resource usage, or to validate the rejection
when implementing a solution minimizing resource occupation.

The first step of our approach is to model the DSP chain. Since we aim at optimizing only
the filtering part of the signal processing chain, we have not included the PRN generator or the
ADC in the model: the input data size and rate are considered fixed and defined by the hardware.
The filtering can be done in two ways, either by considering a single monolithic FIR filter
requiring many coefficients to reach the targeted noise rejection ratio, or by
cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter.
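The benefit of cascading can be made explicit with a short, generic derivation. The symbols $m$ (number of cascaded filters), $H_j$ (transfer function of the $j$-th filter) and $R_j$ (its worst-case stopband rejection in dB, relative to a unit-gain passband) are notations assumed here for illustration, and the rounding introduced between filters is ignored:
\begin{align*}
H(f)=\prod_{j=1}^{m} H_j(f) \quad\Rightarrow\quad |H(f)|_{\mathrm{dB}}=\sum_{j=1}^{m} |H_j(f)|_{\mathrm{dB}} .
\end{align*}
At any stopband frequency the attenuation of the cascade is the sum of the individual attenuations, each of which is at least $R_j$, so the cascade rejects at least $\sum_j R_j$~dB. Under these assumptions, two 40~dB filters in series provide at least 80~dB of rejection.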

After each filter we leave the possibility of shifting the filtered data to consume
fewer resources. Hence in the case of cascaded filters, we define a stage as a filter
and a shifter (the shift could be omitted if we do not need to divide the filtered data).

\subsection{Model of a FIR filter}

A cascade of filters is composed of $n$ FIR stages. In stage $i$ ($1 \leq i \leq n$)
the FIR has $C_i$ coefficients and each coefficient is an integer value with $\pi^C_i$
bits while the filtered data are shifted by $\pi^S_i$ bits. We also define $\pi^-_i$ as
the size of input data and $\pi^+_i$ as the size of output data. Figure~\ref{fig:fir_stage}
shows a filtering stage.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=2cm]
\node[draw,minimum size=1.3cm] (FIR) { $C_i, \pi_i^C$ } ;
\node[draw,minimum size=1.3cm] (Shift) [right of=FIR, ] { $\pi_i^S$ } ;
\node (Start) [left of=FIR] { } ;
\node (End) [right of=Shift] { } ;

\node[draw,fit=(FIR) (Shift)] (Filter) { } ;

\draw[->] (Start) edge node [above] { $\pi_i^-$ } (FIR) ;
\draw[->] (FIR) -- (Shift) ;
\draw[->] (Shift) edge node [above] { $\pi_i^+$ } (End) ;
\end{tikzpicture}
\caption{A single filter is composed of a FIR (on the left) and a Shifter (on the right)}
\label{fig:fir_stage}
\end{figure}
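To see why the shifter of each stage matters, the worst-case width of the data leaving the FIR can be estimated. The bound below is the usual two's complement accumulator sizing argument, given here as a sketch under stated assumptions and not necessarily the sizing rule retained in the model of this paper: each product of a $\pi_i^-$-bit sample by a $\pi_i^C$-bit coefficient fits in $\pi_i^- + \pi_i^C$ bits, and accumulating $C_i$ such products adds at most $\lceil \log_2 C_i \rceil$ bits, so that after dropping $\pi_i^S$ bits the output width need not exceed
\begin{align*}
\pi_i^- + \pi_i^C + \lceil \log_2 C_i \rceil - \pi_i^S .
\end{align*}
As a hypothetical example, 14-bit samples filtered by 32 coefficients of 8~bits may require up to $14+8+5=27$~bits, which a shift of $\pi_i^S=11$~bits brings down to 16~bits at the input of the next stage.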

FIR $i$ has been characterized through numerical simulation as able to reject $F(C_i, \pi_i^C)$ dB.
This rejection has been computed using GNU Octave software FIR coefficient design functions
(\texttt{firls} and \texttt{fir1}).
For each configuration $(C_i, \pi_i^C)$, we first create a FIR with floating point coefficients and a given number $C_i$ of coefficients.
Then, the floating point coefficients are discretized into integers. In order to ensure that the coefficients are effectively coded on $\pi_i^C$~bits,
the coefficients are normalized by their absolute maximum before being scaled to integer coefficients.
At least one coefficient is coded on $\pi_i^C$~bits, and in practice only $b_{C_i/2}$ is coded on $\pi_i^C$~bits while the others are coded on far fewer bits.
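The normalization and scaling step can be written compactly. In this sketch, $\tilde b_k$ denotes the integer coefficient, and rounding to the nearest integer on a signed $\pi_i^C$-bit range is assumed; neither this notation nor the rounding mode is specified in the text:
\begin{align*}
\tilde b_k = \mathrm{round}\left( \left(2^{\pi_i^C-1}-1\right) \frac{b_k}{\max_j |b_j|} \right) .
\end{align*}
With $\pi_i^C=6$, the scale factor is $2^5-1=31$, so any coefficient whose magnitude is below $1/62\simeq 1.6\%$ of the largest one is rounded to zero. Under these assumptions, this is the mechanism by which the tails of a long impulse response vanish in Fig.~\ref{float_vs_int}.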

With these coefficients, the \texttt{freqz} function is used to estimate the magnitude of the filter
transfer function.
Comparing the performance between FIRs requires however defining a unique criterion. As shown in figure~\ref{fig:fir_mag},
the FIR magnitude exhibits two parts: we focus here on the transition width and the rejection rather than on the
passband ripples as emphasized in \cite{lim_1988,lim_1996}. Throughout this demonstration,
we arbitrarily set a passband of 40\% of the Nyquist frequency and a stopband from 60\%
of the Nyquist frequency to the end of the band, as would be typically selected to prevent
aliasing before decimating the dataflow by 2. The method however generalizes to any filter
shape as long as it is defined from the initial modeling steps: Fig. \ref{fig:rejection_pyramid}
as described below is indeed unique for each filter shape.
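The choice of these band edges can be checked with a short numerical example. Taking the 125~MS/s sampling rate quoted earlier as an illustrative assumption for the filter input rate $f_s$, the Nyquist frequency is $f_s/2=62.5$~MHz, so that the passband spans 0 to 25~MHz and the stopband 37.5 to 62.5~MHz. After decimation by 2, the Nyquist frequency becomes 31.25~MHz and any component at $f>31.25$~MHz is folded to $f'$ with
\begin{align*}
f' = \frac{f_s}{2} - f, \qquad [37.5,\,62.5]~\mathrm{MHz} \mapsto [0,\,25]~\mathrm{MHz}, \qquad [31.25,\,37.5]~\mathrm{MHz} \mapsto [25,\,31.25]~\mathrm{MHz} .
\end{align*}
The stopband thus folds exactly onto the passband, where it arrives attenuated by the filter rejection, while the unconstrained upper part of the transition band only folds onto the lower part of the transition band.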

\begin{figure}

242

\begin{figure}

\begin{center}

243

\begin{center}

\scalebox{0.8}{

244

\scalebox{0.8}{

\centering

245

\centering

\begin{tikzpicture}[scale=0.3]

246

\begin{tikzpicture}[scale=0.3]

\draw[<->] (0,15) -- (0,0) -- (21,0) ;

247

\draw[<->] (0,15) -- (0,0) -- (21,0) ;

\draw[thick] (0,12) -- (8,12) -- (20,0) ;

248

\draw[thick] (0,12) -- (8,12) -- (20,0) ;

249

\draw (0,14) node [left] { $P$ } ;

250

\draw (0,14) node [left] { $P$ } ;

\draw (20,0) node [below] { $f$ } ;

251

\draw (20,0) node [below] { $f$ } ;

252

\draw[>=latex,<->] (0,14) -- (8,14) ;

253

\draw[>=latex,<->] (0,14) -- (8,14) ;

\draw (4,14) node [above] { passband } node [below] { $40\%$ } ;

254

\draw (4,14) node [above] { passband } node [below] { $40\%$ } ;

255

\draw[>=latex,<->] (8,14) -- (12,14) ;

256

\draw[>=latex,<->] (8,14) -- (12,14) ;

\draw (10,14) node [above] { transition } node [below] { $20\%$ } ;

257

\draw (10,14) node [above] { transition } node [below] { $20\%$ } ;

258

\draw[>=latex,<->] (12,14) -- (20,14) ;

259

\draw[>=latex,<->] (12,14) -- (20,14) ;

\draw (16,14) node [above] { stopband } node [below] { $40\%$ } ;

260

\draw (16,14) node [above] { stopband } node [below] { $40\%$ } ;

261

\draw[>=latex,<->] (16,12) -- (16,8) ;

262

\draw[>=latex,<->] (16,12) -- (16,8) ;

\draw (16,10) node [right] { rejection } ;

263

\draw (16,10) node [right] { rejection } ;

264

\draw[dashed] (8,-1) -- (8,14) ;

265

\draw[dashed] (8,-1) -- (8,14) ;

\draw[dashed] (12,-1) -- (12,14) ;

266

\draw[dashed] (12,-1) -- (12,14) ;

267

\draw[dashed] (8,12) -- (16,12) ;

268

\draw[dashed] (8,12) -- (16,12) ;

\draw[dashed] (12,8) -- (16,8) ;

269

\draw[dashed] (12,8) -- (16,8) ;

270

\end{tikzpicture}

271

\end{tikzpicture}

}

272

}

\end{center}

273

\end{center}

\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:

274

\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:

the passband is considered to occupy the initial 40\% of the Nyquist frequency range,

275

the passband is considered to occupy the initial 40\% of the Nyquist frequency range,

the stopband the last 40\%, allowing 20\% transition width.}

276

the stopband the last 40\%, allowing 20\% transition width.}

\label{fig:fir_mag}

277

\label{fig:fir_mag}

\end{figure}

278

\end{figure}

In the transition band, the behavior of the filter is left free: we only define the passband and the stopband characteristics.
The criteria initially considered include the mean value of the stopband rejection, which yields unacceptable results since notches
overestimate the rejection capability of the filter.
An intermediate criterion considered the maximal rejection within the stopband, from which the sum of the absolute values
within the passband is subtracted to avoid filters with excessive ripples, normalized to the
bin width to remain consistent with the passband criterion (dBc/Hz units in all cases).
In this case, cascading too many filters with individual excessive ($>$ 1~dB) passband ripples
led to unacceptable ($>$ 10~dB) final ripple levels, especially close to the transition band.
Hence, the final criterion considers the minimal rejection in the stopband, from which
the maximal amplitude in the passband (maximum value minus the minimum value) is subtracted, with
a 1~dB threshold on the latter quantity above which the filter is discarded.
With this
criterion, we meet the expected rejection capability of low pass filters as shown in figure~\ref{fig:custom_criterion}.
The best filter has a correct rejection estimation and the worst filter
is discarded based on the excessive passband ripple criterion.
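
The final criterion can be sketched numerically as follows. This is only an illustration under stated assumptions, not the code used to produce the figures: the function names (\texttt{hamming\_lowpass}, \texttt{magnitude\_db}, \texttt{rejection\_criterion}) are hypothetical, the test filter is a generic Hamming-windowed sinc rather than a \texttt{firls} or \texttt{fir1} design, and the rejection is assumed to be measured relative to the highest passband level.

\begin{verbatim}
```python
import cmath
import math


def hamming_lowpass(n_taps, cutoff):
    """Windowed-sinc low-pass taps; cutoff is normalized to Nyquist (0..1)."""
    mid = (n_taps - 1) / 2
    taps = []
    for k in range(n_taps):
        x = cutoff * (k - mid)
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (n_taps - 1))
        taps.append(cutoff * sinc * window)
    return taps


def magnitude_db(taps, f_norm):
    """Magnitude in dB of the FIR response at f_norm (1.0 = Nyquist)."""
    h = sum(c * cmath.exp(-1j * math.pi * f_norm * k)
            for k, c in enumerate(taps))
    return 20.0 * math.log10(max(abs(h), 1e-12))  # floor avoids log(0) on notches


def rejection_criterion(taps, n_points=512, pass_end=0.4, stop_start=0.6,
                        max_ripple_db=1.0):
    """Minimal stopband rejection minus passband ripple, in dB.

    Returns None when the passband ripple exceeds max_ripple_db,
    i.e. when the filter is discarded.
    """
    freqs = [i / (n_points - 1) for i in range(n_points)]
    passband = [magnitude_db(taps, f) for f in freqs if f <= pass_end]
    stopband = [magnitude_db(taps, f) for f in freqs if f >= stop_start]
    ripple = max(passband) - min(passband)
    if ripple > max_ripple_db:
        return None
    # Assumption: rejection is referenced to the highest passband level.
    min_rejection = max(passband) - max(stopband)
    return min_rejection - ripple


if __name__ == "__main__":
    print(rejection_criterion(hamming_lowpass(51, 0.5)))
    print(rejection_criterion([0.5, 0.5]))  # two-tap average: discarded
```
\end{verbatim}

The 40\%, 20\%, 40\% band split matches figure~\ref{fig:fir_mag}; the two-tap moving average is rejected because its passband droops by more than 1~dB.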

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/custom_criterion}
\caption{Selected filter qualification criterion computed as the minimal rejection in the stopband
minus the maximal ripple amplitude in the passband, with a 1~dB threshold above which the filter is discarded:
comparison between monolithic filter (blue, rejected in this case) and cascaded filters (red).}
\label{fig:custom_criterion}
\end{figure}

Thanks to the latter criterion, which will be used in the remainder of this paper, we are able to automatically generate multiple FIR taps
and estimate their rejection. Figure~\ref{fig:rejection_pyramid} exhibits the
rejection as a function of the number of coefficients and the number of bits representing these coefficients.
The curve shaped as a pyramid exhibits optimum configuration sets at the vertex where both edges meet.
Indeed, for a given number of coefficients, increasing the number of bits over the edge will not improve the rejection.
Conversely, for a given number of bits, increasing the number of coefficients will not improve
the rejection. Hence the best coefficient sets are on the vertex of the pyramid. Notice that the word length
and number of coefficients do not start at 1: filters with too few coefficients or too small a tap word size are rejected
by the excessive ripple constraint of the criterion. Hence, the size of the pyramid is significantly reduced by discarding
these filters and so is the solution search space.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/rejection_pyramid}
\caption{Filter rejection as a function of number of coefficients and number of bits:
this lookup table will be used to identify which filter parameters -- number of bits
representing coefficients and number of coefficients -- best match the targeted transfer function. Filters
with fewer than 10~taps or with coefficients coded on fewer than 5~bits are discarded due to excessive
ripples in the passband.}
\label{fig:rejection_pyramid}
\end{figure}

Although we have an efficient criterion to estimate the rejection of one set of coefficients (taps),
a problem arises when we cascade filters and estimate the criterion as a sum of two or more individual criteria.
If the FIR filter coefficients are the same between the stages, we have:
$$F_{total} = F_1 + F_2$$
But selecting two different sets of coefficients yields a more complex situation in which
the previous relation is no longer valid, as illustrated in figure~\ref{fig:sum_rejection}. The red and blue curves
are two different filters with maxima and notches not located at the same frequency offsets.
Hence when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved
with respect to a basic sum of the rejection criteria shown as the dotted yellow line.
Thus, estimating the rejection of filter cascades is more complex than taking the sum of all the rejection
criteria of each filter. However, since the individual filter rejection sum underestimates the rejection capability of the cascade,
this lower bound is considered as a conservative and acceptable criterion for deciding on the suitability
of the filter cascade to meet design criteria.
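
The conservative nature of this estimate can be made explicit. Writing $S_i = \max_{f \in \text{stopband}} |H_i(f)|_{\mathrm{dB}}$ for the worst stopband level of filter~$i$ (a notation introduced here for illustration only), cascading multiplies the transfer functions and hence adds their magnitudes in dB, so that
\begin{equation*}
\max_{f \in \text{stopband}} \left( |H_1(f)|_{\mathrm{dB}} + |H_2(f)|_{\mathrm{dB}} \right) \leq S_1 + S_2 ,
\end{equation*}
with equality only when both filters reach their worst stopband level at the same frequency. As a hypothetical example, two different filters each leaking at most $-40$~dB in the stopband yield a cascade leaking at most $-80$~dB, and strictly less whenever their stopband maxima do not coincide.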

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/cascaded_criterion}
\caption{Transfer function of individual filters and after cascading the two filters,
demonstrating that the selected criterion, the worst-case rejection in the stopband (horizontal
lines), is met. Notice that the cascaded filter has better rejection than the sum of the stopband
maxima of the individual filters.}
\label{fig:sum_rejection}
\end{figure}

Finally, in our case, we consider that the input signal is fully known. The
resolution of the input data stream is fixed and identical for all experiments
in this paper.

Based on this analysis, we address the estimate of resource consumption (called
silicon area -- in the case of FPGAs this means processing cells) as a function of
filter characteristics. As a reminder, we do not aim at matching actual hardware
configuration but consider an arbitrary silicon area occupied by each processing function,
and will assess after synthesis how well this arbitrary unit matches the actual
hardware resources provided by FPGA manufacturers. The sum of individual processing
unit areas is constrained by a total silicon area representative of FPGA global resources.
Formally, variable $a_i$ is the area taken by filter~$i$
(in arbitrary units). Variable $r_i$ is the rejection of filter~$i$ (in dB).
Constant $\mathcal{A}$ is the total available area. We model our problem as follows:

\begin{align}
\text{Maximize } & \sum_{i=1}^n r_i \notag \\
\sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\
a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\
r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\
\pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\
\pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\
\pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\
\pi_1^- &= \Pi^I \label{eq:init}
\end{align}

Equation~\ref{eq:area} states that the total area taken by the filters must not
exceed the available area. Equation~\ref{eq:areadef} gives the definition of
the area used by a filter, considered as the area of the FIR since the Shifter is
assumed not to require significant resources. We consider that the FIR needs $C_i$ registers of size
$\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the
input data with the coefficients. Equation~\ref{eq:rejectiondef} gives the
definition of the rejection of the filter thanks to the tabulated function~$F$ that we defined
previously. The Shifter does not introduce negative rejection as we will explain later,
so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the
relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add
$\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes
$\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of
a filter is the same as the input number of bits of the next filter.
Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative
rejection. Indeed, the results of the FIR can be right shifted without compromising
the quality of the rejection up to a threshold. Each bit of the output data
increases the maximum rejection level by 6~dB. We add one to take the sign bit
into account. If equation~\ref{eq:maxshift} were not present, the Shifter could
shift too much and introduce some noise in the output data. Each supplementary
shift bit would cause an additional 6~dB rejection rise. A totally equivalent equation is:
$\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right)$.
Finally, equation~\ref{eq:init} gives the number of bits of the global input.
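
As a purely illustrative check of constraints~\ref{eq:areadef}, \ref{eq:bits}, \ref{eq:inout} and \ref{eq:maxshift}, consider a hypothetical two-stage cascade. The numerical values are assumptions chosen to keep the arithmetic simple and are not taken from the measurements: $\Pi^I = 16$, $(C_1, \pi_1^C, r_1) = (32, 8, 36~\text{dB})$ and $(C_2, \pi_2^C, r_2) = (16, 6, 24~\text{dB})$. Then
\begin{align*}
a_1 &= 32 \times (8 + 16) = 768, \\
\pi_1^+ &\geq 1 + \left(1 + \tfrac{36}{6}\right) = 8 \quad\Rightarrow\quad \pi_1^S \leq 16 + 8 - 8 = 16, \\
a_2 &= 16 \times (6 + 8) = 224, \\
\pi_2^+ &\geq 1 + \left(1 + \tfrac{36}{6}\right) + \left(1 + \tfrac{24}{6}\right) = 13 \quad\Rightarrow\quad \pi_2^S \leq 8 + 6 - 13 = 1,
\end{align*}
where the last two lines assume that the first Shifter removes its maximum of 16~bits, so that $\pi_2^- = \pi_1^+ = 8$. Under these assumptions the cascade occupies $a_1 + a_2 = 992$ area units for a summed rejection of 60~dB.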

This model is non-linear since some variables are multiplied with one another,
and it is even non-quadratic, as the cost function $F$ does not have a known
linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations.
This parameter $p$ is defined by the user, and represents the number of different
sets of coefficients generated (remember, we use the \texttt{firls} and \texttt{fir1}
functions from GNU Octave) based on the targeted filter characteristics and implementation
assumptions (estimated number of bits defining the coefficients). Hence, $C_{ij}$ and
$\pi_{ij}^C$ become constants and
we define $1 \leq j \leq p$ so that the function $F$ can be estimated (Look Up Table)
for each configuration thanks to the rejection criterion. We also define the binary
variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
and 0 otherwise. The new equations are as follows:

\begin{align}
a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
\pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
\sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
\end{align}

Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace
respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.
Equation~\ref{eq:config} states that for each stage, at most one configuration is chosen.
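
To make the structure of the tabulated model concrete, the sketch below solves a toy instance by exhaustive enumeration. It is not the solver used in this work and does not call Gurobi: the function name \texttt{solve\_fixed\_area}, the three configurations and the area budget are hypothetical, and for simplicity every stage is assumed to take exactly one configuration, whereas equation~\ref{eq:config} also allows none.

\begin{verbatim}
```python
def solve_fixed_area(configs, n_stages, total_area, input_bits):
    """Exhaustive search maximizing the summed rejection under an area budget.

    configs: list of (C, pi_C, rejection_dB) rows, i.e. the lookup table F.
    Returns (rejection, area, ((config index, shift), ...)) or None.
    Ties on rejection are broken in favour of the smaller area.
    """
    best = None

    def explore(stage, bits_in, area, rejection, bound, choice):
        nonlocal best
        if stage == n_stages:
            if best is None or (rejection, -area) > (best[0], -best[1]):
                best = (rejection, area, tuple(choice))
            return
        for j, (n_coeff, coeff_bits, rej) in enumerate(configs):
            stage_area = n_coeff * (coeff_bits + bits_in)  # area definition
            if area + stage_area > total_area:              # area budget
                continue
            new_bound = bound + 1 + rej / 6                 # running sum over stages
            full_bits = bits_in + coeff_bits                # FIR output width
            shift = 0
            # Output width must stay above 1 + sum_k (1 + r_k / 6).
            while full_bits - shift >= 1 + new_bound:
                explore(stage + 1, full_bits - shift, area + stage_area,
                        rejection + rej, new_bound, choice + [(j, shift)])
                shift += 1

    explore(0, input_bits, 0, 0.0, 0.0, [])
    return best


if __name__ == "__main__":
    # Hypothetical lookup table: (coefficients, coefficient bits, rejection in dB).
    table = [(16, 6, 24), (32, 8, 36), (64, 12, 60)]
    print(solve_fixed_area(table, n_stages=2, total_area=1500, input_bits=16))
```
\end{verbatim}

Enumeration scales exponentially with the number of stages, which is why the model is handed to a dedicated solver in practice.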

The problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2}
we multiply
$\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable we can
linearize this multiplication. The following formula shows how to linearize
this situation in the general case with $y$ a binary variable and $x$ a real variable ($0 \leq x \leq X^{max}$):
\begin{equation*}
m = x \times y \implies
\left \{
\begin{split}
m & \geq 0 \\
m & \leq y \times X^{max} \\
m & \leq x \\
m & \geq x - (1 - y) \times X^{max}
\end{split}
\right .
\end{equation*}
So if we bound $\pi_i^-$ from above by 128~bits, the maximum data size assumed
from hardware characteristics,
the Gurobi (\url{www.gurobi.com}) optimization software is able to linearize
the quadratic problem for us, so the model is left as is. This model
has $O(np)$ variables and $O(n)$ constraints.
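
The four inequalities of the linearization can be verified exhaustively for integer word lengths. The short check below is an illustrative sketch with hypothetical function names; it confirms that, for $y \in \{0, 1\}$ and $0 \leq x \leq X^{max}$, the constraints leave $m = x \times y$ as the only admissible value.

\begin{verbatim}
```python
def linearized_product_range(x, y, x_max):
    """Interval [low, high] left for m by the four linear constraints."""
    low = max(0, x - (1 - y) * x_max)   # m >= 0 and m >= x - (1 - y) * x_max
    high = min(y * x_max, x)            # m <= y * x_max and m <= x
    return low, high


def check_linearization(x_max=128):
    """True if the constraints pin m to x * y for every admissible (x, y)."""
    for y in (0, 1):
        for x in range(x_max + 1):
            low, high = linearized_product_range(x, y, x_max)
            if not (low == high == x * y):
                return False
    return True


if __name__ == "__main__":
    print(check_linearization())
```
\end{verbatim}

The default of 128 mirrors the upper bound placed on $\pi_i^-$ above.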

Two problems will be addressed using the workflow described in the next section: on the one
hand maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary
silicon area (section~\ref{sec:fixed_area}) and on the other hand the dual problem of minimizing the silicon area
for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the
objective function is replaced with:
\begin{align}
\text{Minimize } & \sum_{i=1}^n a_i \notag
\end{align}
We adapt the constraints of the quadratic program by replacing equation~\ref{eq:area}
with equation~\ref{eq:rejection_min}, where $\mathcal{R}$ is the minimal
rejection required.
\begin{align}
\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}
\end{align}

\section{Design workflow}
\label{sec:workflow}

In this section, we describe the workflow to compute all the results presented in sections~\ref{sec:fixed_area}
and \ref{sec:fixed_rej}. Figure~\ref{fig:workflow} shows the global workflow and the different steps involved
in the computation of the results.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=0.75cm and 2cm]
\node[draw,minimum size=1cm] (Solver) { Filter Solver } ;
\node (Start) [left= 3cm of Solver] { } ;
\node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;
\node (Input) [above= of TCL] { } ;
\node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;
\node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;
\node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;
\node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;
\node (Results) [left= of Postproc] { } ;

\draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;
\draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;
\draw[->] (Solver) edge node [below] { (1a) } (TCL) ;
\draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;
\draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;
\draw[->,dashed] (Bitstream) -- (Deploy) ;
\draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;
\draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;
\draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;
\draw[->] (Postproc) -- (Results) ;
\end{tikzpicture}
\caption{Design workflow from the input parameters to the results allowing for
a fully automated optimal solution search.}
\label{fig:workflow}
\end{figure}

The filter solver is a C++ program that takes as input the maximum area
$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,
the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates
the quadratic programs and uses the Gurobi solver to estimate the optimal results.
Then it produces two scripts: a TCL script ((1a) on figure~\ref{fig:workflow})
and a deploy script ((1b) on figure~\ref{fig:workflow}).

The TCL script describes the whole digital processing chain from the beginning
(the raw signal data) to the end (the filtered data) in a language compatible
with proprietary synthesis software, namely Vivado for Xilinx and Quartus for
Intel/Altera. The raw input data are generated by a 20-bit Pseudo Random Number (PRN)
generator inside the FPGA and $\Pi^I$ is fixed at 16~bits.
Then the script builds each stage of the chain with a generic FIR task that
comes from a skeleton library. The generic FIR is highly configurable
through the number of coefficients and the size of the coefficients. The coefficients
themselves are not stored in the script.
As the signal is processed in real-time, the output signal is stored as
consecutive bursts of data for post-processing, mainly assessing the consistency of the
implemented FIR cascade transfer function with the design criteria and the expected
transfer function.

The TCL script is used by Vivado to produce the FPGA bitstream ((2) on figure~\ref{fig:workflow}).
We use the 2018.2 version of Xilinx Vivado and we execute the synthesized
bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series
FPGA (xc7z010clg400-1) and two LTC2145 14-bit 125~MS/s ADCs, loaded with 50~$\Omega$ resistors to
provide a broadband noise source.
The board runs the Linux kernel and surrounding environment produced from the
Buildroot framework available at \url{https://github.com/trabucayre/redpitaya/}: configuring
the Zynq FPGA, feeding the FIR with the set of coefficients, executing the simulation and
fetching the results are automated.

The deploy script uploads the bitstream to the board ((3) on
figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers and
configures the coefficients of the FIR filters. It then waits for the results
and retrieves the data to the main computer ((4) on figure~\ref{fig:workflow}).

Finally, an Octave post-processing script computes the final results from
the output data ((5) on figure~\ref{fig:workflow}).
The results are normalized so that the Power Spectral Density (PSD) starts at zero
and the different configurations can be compared.
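
One possible reading of this normalization, stated here as an assumption since the
post-processing script itself is not reproduced, is that each PSD expressed in dB is offset
by its value at the lowest frequency bin. The symbols $S$, $\tilde{S}_{\mathrm{dB}}$ and $f_k$
are introduced here for illustration only:
\begin{equation*}
\tilde{S}_{\mathrm{dB}}(f_k) = 10 \log_{10} S(f_k) - 10 \log_{10} S(f_0), \qquad k = 0, 1, \dots
\end{equation*}
With such a convention $\tilde{S}_{\mathrm{dB}}(f_0) = 0$~dB for every configuration, so that all
curves share the same reference level and can be compared directly.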

\section{Maximizing the rejection at fixed silicon area}
\label{sec:fixed_area}
This section presents the output of the filter solver, {\em i.e.} the computed
configurations for each stage, the computed rejection and the computed silicon area.
Such results help in understanding the choices made by the solver to compute its solutions.

The experimental setup is composed of three cases. The raw input is generated
by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.
Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500
arbitrary units. Hence, the three cases have been named MAX/500, MAX/1000 and MAX/1500.
The number of configurations $p$ is 1133, with $C_i$ ranging from 3 to 60 and $\pi^C$
ranging from 2 to 22. In each case, the quadratic program has been able to give a
result for up to five stages ($n = 5$) in the cascaded filter.

Tables~\ref{tbl:gurobi_max_500}, \ref{tbl:gurobi_max_1000} and~\ref{tbl:gurobi_max_1500}
show the results obtained by the filter solver for MAX/500, MAX/1000 and MAX/1500, respectively.

\renewcommand{\arraystretch}{1.4}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}
\label{tbl:gurobi_max_500}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\
2 & (3, 5, 18) & (33, 10, 0) & - & - & - & 48~dB & 492 \\
3 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\
4 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\
5 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}
\label{tbl:gurobi_max_1000}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\
2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\
3 & (3, 13, 26) & (31, 9, 1) & (27, 9, 0) & - & - & 92~dB & 999 \\
4 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\
5 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}
\label{tbl:gurobi_max_1500}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\
2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 102~dB & 1489 \\
3 & (15, 9, 18) & (31, 8, 0) & (27, 9, 0) & - & - & 116~dB & 1488 \\
4 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\
5 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\
\hline
\end{tabular}
}
\end{table}

\renewcommand{\arraystretch}{1}

By analyzing these tables, we can first state that we reach an optimal solution
for each case: $n = 3$ for MAX/500, and $n = 4$ for MAX/1000 and MAX/1500. Beyond these
values, the solver returns the same configuration and leaves the additional stages unused. Moreover,
the cascaded filters always exhibit better performance than the monolithic solution.
This was an expected result as it has
been previously observed that many small filters are better than
a single large filter \cite{lim_1988, lim_1996, young_1992}, despite such conclusions
being hardly used in practice due to the lack of tools for identifying individual filter
coefficients in the cascaded approach.

Second, the larger the silicon area, the better the rejection. This was also an
expected result as more area allows a filter of better quality, with more coefficients
or more bits per coefficient.

Then, we also observe that the first stage can have a larger shift than the other
stages. This is explained by the fact that the solver tries to use just enough
bits for the computed rejection after each stage. In the first stage, a
balance between a strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}
gives the relation between both values.

Finally, we note that the solver consumes all the given silicon area.
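
As a plausibility check of this observation, the tabulated areas can be recomputed by hand.
The sketch below assumes that stage $i$ costs $a_i = C_i \times (\pi_i^C + \pi_i^-)$ arbitrary units,
where $\pi_i^-$ is the data size at the input of stage $i$, and that the data size propagates
as $\pi_{i+1}^- = \pi_i^- + \pi_i^C - \pi_i^S$ with $\pi_1^- = \Pi^I = 16$~bits. The notation $a_i$ and
$\pi_i^-$ is introduced here, and this cost model is an assumption, although it does reproduce the
areas listed in Tables~\ref{tbl:gurobi_max_500}, \ref{tbl:gurobi_max_1000}
and~\ref{tbl:gurobi_max_1500}. For the $n = 3$ solution of MAX/500:
\begin{align*}
\pi_1^- &= 16, & a_1 &= 3 \times (5 + 16) = 63, \\
\pi_2^- &= 16 + 5 - 18 = 3, & a_2 &= 19 \times (7 + 3) = 190, \\
\pi_3^- &= 3 + 7 - 1 = 9, & a_3 &= 15 \times (7 + 9) = 240,
\end{align*}
so that $a_1 + a_2 + a_3 = 493$, i.e. 98.6\,\% of the 500 allowed units. The same computation
gives 994 and 1500 for the $n = 4$ solutions of MAX/1000 and MAX/1500.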

The following graphs present the rejection for real data on the FPGA. In all the following
figures, the solid lines represent the actual rejection of the filtered
data on the FPGA as measured experimentally and the dashed lines are the noise levels
given by the quadratic solver. The configurations are those listed in the tables above.

Figures~\ref{fig:max_500_result}, \ref{fig:max_1000_result} and~\ref{fig:max_1500_result}
show the rejection of the different configurations for MAX/500, MAX/1000 and MAX/1500, respectively.

\begin{figure}
\centering
\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/max_500}
\caption{Filter transfer functions for varying number of cascaded filters solving
the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).}
\label{fig:max_500_result}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/max_1000}
\caption{Filter transfer functions for varying number of cascaded filters solving
the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).}
\label{fig:max_1000_result}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=\linewidth]{images/max_1500}
\caption{Filter transfer functions for varying number of cascaded filters solving
the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).}
\label{fig:max_1500_result}
\end{subfigure}
\caption{Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing
rejection for a given resource allocation.
The filter shape constraint (passband and stopband) is shown as thick
horizontal lines on each chart.}
\end{figure}

In all cases, we observe that the actual rejection is close to the rejection computed by the solver.

We compare the actual silicon resources given by Vivado to the
resources in arbitrary units.
The goal is to check that our arbitrary units of silicon area model well enough
the real resources on the FPGA. In particular, we want to verify that, for a given
number of arbitrary units, the actual silicon resources do not depend on the
number of stages $n$. Most significantly, our approach aims
at remaining far enough from the practical logic gate implementation used by
various vendors to remain platform independent and be portable from one
architecture to another.

Table~\ref{tbl:resources_usage} shows the resource usage in the case of MAX/500, MAX/1000 and
MAX/1500, \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000
and 1500 arbitrary units. We have taken care to extract solely the resources used by
the FIR filters and remove additional processing blocks including FIFO and Programmable
Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication.

\begin{table}[h!tb]
\caption{Resource occupation following synthesis of the solutions found for
the problem of maximizing rejection for a given resource allocation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
\label{tbl:resources_usage}
\centering
\begin{tabular}{|c|c|ccc|c|}
\hline
$n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline
& LUT & 249 & 453 & 627 & \emph{17600} \\
1 & BRAM & 1 & 1 & 1 & \emph{120} \\
& DSP & 21 & 37 & 47 & \emph{80} \\ \hline
& LUT & 2253 & 474 & 691 & \emph{17600} \\
2 & BRAM & 2 & 2 & 2 & \emph{120} \\
& DSP & 0 & 50 & 70 & \emph{80} \\ \hline
& LUT & 1329 & 2006 & 3158 & \emph{17600} \\
3 & BRAM & 3 & 3 & 3 & \emph{120} \\
& DSP & 15 & 30 & 42 & \emph{80} \\ \hline
& LUT & 1329 & 1600 & 2260 & \emph{17600} \\
4 & BRAM & 3 & 4 & 4 & \emph{120} \\
& DSP & 15 & 38 & 49 & \emph{80} \\ \hline
& LUT & 1329 & 1600 & 2260 & \emph{17600} \\
5 & BRAM & 3 & 4 & 4 & \emph{120} \\
& DSP & 15 & 38 & 49 & \emph{80} \\ \hline
\end{tabular}
\end{table}

In the case $n = 2$ for MAX/500, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that,
when the filter coefficients are small enough, or when the input size is small
enough, Vivado optimizes resource consumption by selecting multiplexers to
implement the multiplications instead of a DSP. In this case, it is quite difficult
to compare the whole silicon budget.

However, a rough estimation can be made with a simple equivalence: looking at
the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,
we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon
area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs,
1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond
to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary
units map well to actual hardware resources. The relatively small differences can probably be explained
by the optimizations done by Vivado based on the detailed map of available processing resources.
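
The equivalence above can be applied to individual entries of Table~\ref{tbl:resources_usage}
as a sanity check. The expression below is only a sketch: the symbols $L_{\mathrm{eq}}$,
$N_{\mathrm{LUT}}$ and $N_{\mathrm{DSP}}$ are introduced here, the factor of 100 is the rough
estimate quoted above, and BRAM usage is ignored:
\begin{align*}
L_{\mathrm{eq}} &= N_{\mathrm{LUT}} + 100 \times N_{\mathrm{DSP}}, \\
\text{MAX/500},\ n = 3: \quad L_{\mathrm{eq}} &= 1329 + 100 \times 15 = 2829, \\
\text{MAX/1000},\ n = 4: \quad L_{\mathrm{eq}} &= 1600 + 100 \times 38 = 5400, \\
\text{MAX/1500},\ n = 4: \quad L_{\mathrm{eq}} &= 2260 + 100 \times 49 = 7160.
\end{align*}
These values lie within about 15\,\% of the 2500, 5000 and 7300~LUTs quoted above, which is
consistent with an agreement in order of magnitude rather than an exact mapping.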

We now present the computation time needed to solve the quadratic problem.
For each case, the filter solver software is executed on an Intel(R) Xeon(R) CPU E5606
clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve
the quadratic problem. Table~\ref{tbl:area_time} shows the time needed to solve the quadratic
problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units.

\begin{table}[h!tb]
\caption{Time needed to solve the quadratic program with Gurobi}
\label{tbl:area_time}
\centering
\begin{tabular}{|c|c|c|c|}\hline
$n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline
1 & 0.01~s & 0.02~s & 0.03~s \\
2 & 0.1~s & 1~s & 2~s \\
3 & 5~s & 27~s & 351~s ($\approx$ 6~min) \\
4 & 4~s & 141~s ($\approx$ 2~min) & 1134~s ($\approx$ 19~min) \\
5 & 6~s & 630~s ($\approx$ 10~min) & 49400~s ($\approx$ 14~h) \\\hline
\end{tabular}
\end{table}

As expected, the computation time seems to rise exponentially with the number of stages.
When the area is limited, the design exploration space is smaller and the solver is able to
find an optimal solution faster.
We also notice that the solution with $n$ greater than the optimal value
takes more time to be found than the optimal one. This can be explained since the search space is
larger and we need more time to ensure that the previous solution (from the
smaller value of $n$) still remains the optimal solution.
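
To give a rough feel for this growth, the MAX/1500 column of Table~\ref{tbl:area_time} can be
summarized by an average multiplicative factor per added stage. This is only a
back-of-the-envelope estimate from five data points, not a fitted complexity model, and the
symbols $t_n$ (solving time with $n$ stages) and $g$ are introduced here:
\begin{equation*}
g = \left( \frac{t_5}{t_1} \right)^{1/4} = \left( \frac{49400}{0.03} \right)^{1/4} \approx 36 .
\end{equation*}
The successive ratios $t_{n+1}/t_n$ are about 67, 176, 3.2 and 44, so the growth is far from
regular even though its overall trend is compatible with an exponential increase.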

\subsection{Minimizing resource occupation at fixed rejection}
\label{sec:fixed_rej}

This section presents the results of the complementary quadratic program aimed at
minimizing the area occupation for a targeted rejection level.

The experimental setup is composed of four cases. The raw input is the same
as in the previous section, from a PRN generator, which fixes the input data size $\Pi^I$.
Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60, 80 or 100~dB.
Hence, the four cases have been named MIN/40, MIN/60, MIN/80 and MIN/100.
The number of configurations $p$ is the same as in the previous section.

Tables~\ref{tbl:gurobi_min_40}, \ref{tbl:gurobi_min_60}, \ref{tbl:gurobi_min_80}
and~\ref{tbl:gurobi_min_100} show the results obtained by the filter solver for
MIN/40, MIN/60, MIN/80 and MIN/100, respectively.

\renewcommand{\arraystretch}{1.4}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40}
\label{tbl:gurobi_min_40}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\
2 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
3 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
4 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
5 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60}
\label{tbl:gurobi_min_60}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\
2 & (15, 6, 16) & (23, 9, 0) & - & - & - & 60~dB & 675 \\
3 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\
4 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\
5 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80}
\label{tbl:gurobi_min_80}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\
2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\
3 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\
4 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\
5 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\
\hline
\end{tabular}
}
\end{table}

\begin{table}[h!tb]
\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/100}
\label{tbl:gurobi_min_100}
\centering
{\scalefont{0.77}
\begin{tabular}{|c|ccccc|c|c|}
\hline
$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\
\hline
1 & - & - & - & - & - & - & - \\
2 & (27, 9, 15) & (35, 11, 0) & - & - & - & 100~dB & 1410 \\
3 & (3, 5, 18) & (35, 11, 1) & (27, 9, 0) & - & - & 100~dB & 1147 \\
4 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\
5 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\
\hline
\end{tabular}
}
\end{table}
\renewcommand{\arraystretch}{1}

From these tables, we can first state that almost all configurations reach the targeted rejection
level, or even exceed it, thanks to our underestimate of the cascade rejection as the sum of the
individual filter rejections. The only exception is the monolithic case ($n = 1$) in
MIN/100: no solution is found for a single monolithic filter to reach a 100~dB rejection.
Furthermore, the area of the monolithic filter is nearly twice that of the two cascaded filters
(1131 vs. 675 arbitrary units for 60~dB rejection and 1760 vs. 990 arbitrary units for 80~dB
rejection). More generally, the more filters are cascaded, the lower the occupied area.
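
As a quick check of this comparison, the ratio between the area of the monolithic solution
($n = 1$) and that of the two-stage solution ($n = 2$) can be computed directly from the areas
listed in the MIN/60 and MIN/80 tables. The figures below are simple arithmetic on the tabulated
model areas, not additional measurements:
\begin{align*}
\text{MIN/60:}\quad & 1131 / 675 \approx 1.68, \\
\text{MIN/80:}\quad & 1760 / 990 \approx 1.78.
\end{align*}
In terms of the area model, cascading two filters therefore saves about 40\,\% (MIN/60) and
44\,\% (MIN/80) of the area of the monolithic filter.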

As in the previous section, the solver always chooses a small filter as the first
filter stage, and the second stage is often the largest filter. This choice can be explained
as in the previous section: the solver uses just enough bits in the first stage not to degrade
the input signal, and selects a better filter in the second stage to improve rejection without
having too many bits in the output data.

For each case, we found an optimal solution with $n < 5$: $n = 2$ for MIN/40,
$n = 3$ for MIN/60 and MIN/80, and $n = 4$ for MIN/100. In all cases, the solutions
obtained when $n$ exceeds this optimal value remain identical to the optimal one.

The following graphs present the rejection obtained on real data processed by the FPGA. In all
the following figures, the solid line represents the actual rejection of the filtered
data on the FPGA as measured experimentally, and the dashed line is the noise level
given by the quadratic solver.

Figures~\ref{fig:min_40}, \ref{fig:min_60}, \ref{fig:min_80} and \ref{fig:min_100} show the
rejection of the different configurations in the case of MIN/40, MIN/60, MIN/80 and MIN/100
respectively.

\begin{figure}
\centering
\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_40}
\caption{Filter transfer functions for varying number of cascaded filters solving
the MIN/40 problem of minimizing resource allocation for reaching a 40~dB rejection.}
\label{fig:min_40}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_60}
\caption{Filter transfer functions for varying number of cascaded filters solving
the MIN/60 problem of minimizing resource allocation for reaching a 60~dB rejection.}
\label{fig:min_60}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_80}
\caption{Filter transfer functions for varying number of cascaded filters solving
the MIN/80 problem of minimizing resource allocation for reaching an 80~dB rejection.}
\label{fig:min_80}
\end{subfigure}

\begin{subfigure}{\linewidth}
\includegraphics[width=.91\linewidth]{images/min_100}
\caption{Filter transfer functions for varying number of cascaded filters solving
the MIN/100 problem of minimizing resource allocation for reaching a 100~dB rejection.}
\label{fig:min_100}
\end{subfigure}
\caption{Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a
given rejection while minimizing resource allocation. The filter shape constraint (bandpass and
bandstop) is shown as thick
horizontal lines on each chart.}
\end{figure}

We observe that all rejections given by the quadratic solver are close to the experimentally
measured rejection. All curves show that the target rejection constraint is
met by both the monolithic filter (except in MIN/100, which has no monolithic solution) and the cascaded filters.

Table~\ref{tbl:resources_usage_comp} shows the resource usage in the case of MIN/40, MIN/60,
MIN/80 and MIN/100, \emph{i.e.} when the target rejection is fixed to 40, 60, 80 and 100~dB. We
have taken care to extract solely the resources used by
the FIR filters and to exclude additional processing blocks, including the FIFO and PL to
PS communication.

\renewcommand{\arraystretch}{1.2}
\begin{table}
\caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
\label{tbl:resources_usage_comp}
\centering
{\scalefont{0.90}
\begin{tabular}{|c|c|cccc|c|}
\hline
$n$ & & MIN/40 & MIN/60 & MIN/80 & MIN/100 & \emph{Zynq 7010} \\ \hline\hline
& LUT & 343 & 334 & 772 & - & \emph{17600} \\
1 & BRAM & 1 & 1 & 1 & - & \emph{120} \\
& DSP & 27 & 39 & 55 & - & \emph{80} \\ \hline
& LUT & 1664 & 2329 & 474 & 620 & \emph{17600} \\
2 & BRAM & 2 & 2 & 2 & 2 & \emph{120} \\
& DSP & 0 & 15 & 50 & 62 & \emph{80} \\ \hline
& LUT & 1664 & 3114 & 1884 & 2873 & \emph{17600} \\
3 & BRAM & 2 & 3 & 3 & 3 & \emph{120} \\
& DSP & 0 & 0 & 22 & 27 & \emph{80} \\ \hline
& LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\
4 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\
& DSP & 0 & 15 & 19 & 19 & \emph{80} \\ \hline
& LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\
5 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\
& DSP & 0 & 0 & 19 & 19 & \emph{80} \\ \hline
\end{tabular}
}
\end{table}
\renewcommand{\arraystretch}{1}

If we keep the previous estimate of the cost of one DSP in terms of LUTs (1 DSP $\approx$ 100 LUT),
the real resource consumption decreases as a function of the number of stages in the cascaded
filter, in agreement with the solution given by the quadratic solver. Indeed, the
consumption always decreases, even if the difference between the monolithic filter and the
two cascaded filters is smaller than expected.
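
To illustrate the comparison between the monolithic ($n = 1$) and two-stage ($n = 2$)
implementations, a LUT-equivalent cost can be estimated from
Table~\ref{tbl:resources_usage_comp} as $\mathrm{LUT} + 100 \times \mathrm{DSP}$. This is only
a rough sketch: it relies on the 1 DSP $\approx$ 100 LUT approximation and, as an additional
assumption, ignores BRAM usage. Each line gives the cost for $n = 1$ followed by the cost for $n = 2$:
\begin{align*}
\text{MIN/40:}\quad & 343 + 100 \times 27 = 3043 \;\to\; 1664 + 100 \times 0 = 1664, \\
\text{MIN/60:}\quad & 334 + 100 \times 39 = 4234 \;\to\; 2329 + 100 \times 15 = 3829, \\
\text{MIN/80:}\quad & 772 + 100 \times 55 = 6272 \;\to\; 474 + 100 \times 50 = 5474.
\end{align*}
Under these assumptions, the estimated cost drops by about 45\,\%, 10\,\% and 13\,\%
respectively, whereas the model areas of the MIN/60 and MIN/80 tables (1131 vs. 675 and
1760 vs. 990 arbitrary units) suggest a reduction of about 40 to 44\,\%.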

Finally, Table~\ref{tbl:area_time_comp} shows the computation time needed to solve
the quadratic program.

\renewcommand{\arraystretch}{1.2}

\begin{table}[h!tb]


\begin{table}[h!tb]

\caption{Time to solve the quadratic program with Gurobi}