% jfriedt / IFCS2018 article
\documentclass[a4paper,journal]{IEEEtran/IEEEtran}
\usepackage{graphicx,color,hyperref}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{algorithm2e}
\usepackage{url,balance}
\usepackage[normalem]{ulem}
\usepackage{tikz}
\usetikzlibrary{positioning,fit}
\usepackage{multirow}
\usepackage{scalefnt}
\usepackage{caption}
\usepackage{subcaption}

\hyphenation{op-tical net-works semi-conduc-tor}
\textheight=26cm
\setlength{\footskip}{30pt}
\pagenumbering{gobble}

\begin{document}
\title{Filter optimization for real time digital processing of radiofrequency signals: application
to oscillator metrology}

\author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2},
G. Goavec-M\'erou\IEEEauthorrefmark{1},
P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}\\
\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }\\
\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\
Email: \{pyb2,jmfriedt\}@femto-st.fr}
}
\maketitle
\thispagestyle{plain}
\pagestyle{plain}
\newtheorem{definition}{Definition}

\begin{abstract}
Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to
radiofrequency signal processing. When SDR is applied to oscillator characterization in the context
of ultrastable clocks, stringent filtering requirements are set by spurious signal or
noise rejection needs. Since real time radiofrequency processing must be performed in a
Field Programmable Gate Array to meet timing constraints, we investigate optimization strategies
to design filters meeting rejection characteristics while limiting the hardware resources
required and keeping timing constraints within the targeted measurement bandwidths. The
presented technique is applicable to scheduling any sequence of processing blocks characterized
by a throughput, resource occupation and performance tabulated as a function of configuration
characteristics, as is the case for filters with their coefficients and resolution yielding
rejection and number of multipliers.
\end{abstract}

\begin{IEEEkeywords}
Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter
\end{IEEEkeywords}

\section{Digital signal processing of ultrastable clock signals}

Analog oscillator phase noise characterization is classically performed by downconverting
the radiofrequency signal to baseband using a saturated mixer,
followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to the carrier. In
a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by
multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}.

\begin{figure}[h!tb]
\begin{center}
\includegraphics[width=.8\linewidth]{images/schema}
\end{center}
\caption{Fully digital oscillator phase noise characterization: the Device Under Test
(DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and
downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals
and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite
Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays
the spectral characteristics of the phase fluctuations.}
\label{schema}
\end{figure}

As with the analog mixer,
the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as
well as the generation of the frequency sum signal in addition to the frequency difference.
These unwanted spectral components must be rejected before decimating the data stream
for the phase noise spectral characterization \cite{andrich2018high}. The filters inserted between the
downconverter
and the decimation processing blocks are core components of an oscillator characterization
system, and must reject out-of-band signals below the targeted phase noise, typically in the
sub $-170$~dBc/Hz range for the ultrastable oscillators we aim at characterizing. The filter blocks will
use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency
datastream: optimizing the performance of the filter while reducing the needed resources is
hence tackled in a systematic approach using optimization techniques. Most significantly, we
tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with
a tunable number of coefficients and a tunable number of bits representing the coefficients and the
data being processed.

\section{Finite impulse response filter}

We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined
by a set of weights $b_k$ applied to the inputs $x_n$ through a convolution to generate the
outputs $y_n$
\begin{align}
y_n=\sum_{k=0}^N b_k x_{n-k}
\label{eq:fir_equation}
\end{align}
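As a minimal worked illustration of Eq.~\eqref{eq:fir_equation} (a textbook case chosen here for clarity, not one of the filters designed in this work), consider $N=1$ with $b_0=b_1=1/2$. The output is the average of the last two samples, and substituting $x_n=e^{j2\pi n f/f_s}$, with $f_s$ the sampling rate (a symbol introduced for this illustration), gives the magnitude of the transfer function:
\begin{align*}
y_n &= \tfrac{1}{2}\left(x_n + x_{n-1}\right), \\
|H(f)| &= \left|\tfrac{1}{2}\left(1+e^{-j2\pi f/f_s}\right)\right| = \left|\cos\left(\pi f/f_s\right)\right| .
\end{align*}
The gain is 1 at $f=0$ and falls to 0 at the Nyquist frequency $f_s/2$: even this shortest filter is a low-pass, although its transition band is far too wide to be of practical use.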

As opposed to an implementation on a general purpose processor in which word size is defined by the
processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since
not only the coefficient values and number of taps must be defined, but also the number of bits
defining the coefficients and the sample size. For this reason, and because we consider pipeline
processing (as opposed to First-In, First-Out (FIFO) memory batch processing) of radiofrequency
signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but
the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language
(VHDL) level.
Since latency is not an issue in an open-loop phase noise characterization instrument,
the large
number of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter,
is not considered an issue as it would be in a closed-loop system.

The coefficients are classically expressed as floating point values. However, this binary
number representation is not efficient for fast arithmetic computation by an FPGA. Instead,
we choose to quantize these floating point values into integer values. This quantization
will result in some precision loss.

\begin{figure}[h!tb]
\includegraphics[width=\linewidth]{images/zero_values}
\caption{Impact of the quantization resolution of the coefficients: the quantization is
set to 6~bits (the horizontal black lines indicate $\pm$1 least significant bit), which sets
the first 30 and last 30 coefficients out of the initial 128~band-pass
filter coefficients to 0 (red dots).}
\label{float_vs_int}
\end{figure}

The tradeoff between quantization resolution and number of coefficients when considering
integer operations is not trivial. As an illustration of the
relation between the number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits
a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon
quantization on 6~bit integers, 60 of the 128~coefficients at the beginning and end of the
taps become null, making the large number of coefficients irrelevant: processing
resources
are hence saved by shrinking the filter length. This tradeoff, aimed at minimizing resources
to reach a given rejection level or at maximizing out-of-band rejection for a given computational
resource, will drive the investigation on cascading filters designed with varying tap resolution
and tap length, as will be shown in the next section. Indeed, our development strategy closely
follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards}
in which basic blocks are defined and characterized before being assembled \cite{hide}
in a complete processing chain. In our case, assembling the filter blocks is a simpler block
combination process since we assume a single value to be processed and a single value to be
generated at each clock cycle. The FIR filters do not decimate in the
current implementation: for now, the decimation is assumed to be located after the FIR
cascade.

\section{Methodology description}

Our objective is to develop a new methodology applicable to any Digital Signal Processing (DSP)
chain obtained by assembling basic processing blocks, with hardware and manufacturer independence.
Achieving such a target requires defining an abstract model to represent some basic properties
of DSP blocks such as performance (e.g. rejection or passband ripples for filters) and
resource occupation. These abstract properties, not necessarily related to the detailed hardware
implementation of a given platform, will feed a scheduler solver aimed at assembling the optimum
target, whether in terms of maximizing performance for a given arbitrary resource occupation, or
minimizing resource occupation for a given performance. In our approach, the solution of the
solver is then synthesized using the dedicated tool provided by each platform manufacturer
to assess the validity of our abstract resource occupation indicator, and the result of running
the DSP chain on the FPGA allows for assessing the performance of the scheduler. We emphasize
that all solutions found by the solver are synthesized and executed on hardware at the end
of the analysis.

In this demonstration, we focus on only two operations: filtering, and shifting to adjust the number of
bits needed to represent the data along the processing chain.
We have chosen these basic operations because shifting and filtering have already been studied
in the literature \cite{lim_1996, lim_1988, young_1992, smith_1998}, providing a framework for
assessing our results. Furthermore, filtering is a core step in any radiofrequency frontend
requiring pipelined processing at full bandwidth for the earliest steps, including for
time and frequency transfer or characterization \cite{carolina1,carolina2,rsi}.

Addressing only two operations allows for demonstrating the methodology but should not be
considered as a limitation of the framework, which can be extended to assembling any number
of skeleton blocks as long as performance and resource occupation can be determined.
Hence,
in this paper we will apply our methodology to simple DSP chains: a white noise input signal
is generated using a Pseudo-Random Number (PRN) generator or acquired with a wideband (125~MS/s)
14-bit Analog to Digital Converter (ADC) whose input is loaded by a 50~$\Omega$ resistor. Once samples have been
digitized at a rate of 125~MS/s, filtering is applied to qualify the processing block performance,
in practice meeting the radiofrequency frontend requirement of noise and bandwidth reduction
by filtering and decimating. Finally, bursts of filtered samples are stored for post-processing,
allowing us either to assess the filter rejection for a given resource usage, or to validate the rejection
when implementing a solution minimizing resource occupation.

The first step of our approach is to model the DSP chain. Since we aim at only optimizing
the filtering part of the signal processing chain, we have not included the PRN generator or the
ADC in the model: the input data size and rate are considered fixed and defined by the hardware.
The filtering can be done in two ways, either by considering a single monolithic FIR filter
requiring many coefficients to reach the targeted noise rejection ratio, or by
cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter.
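
The interest of the cascade can be made explicit with a standard property of linear time-invariant filters, recalled here as a sketch and not as the performance model used by the solver. Writing $H_i(f)$ for the transfer function of stage $i$ of an $n$-stage cascade (a notation introduced for this illustration only), the cascade response is the product of the stage responses, so that gains expressed in dB add:
\begin{align*}
H(f) &= \prod_{i=1}^{n} H_i(f), &
20\log_{10}|H(f)| &= \sum_{i=1}^{n} 20\log_{10}|H_i(f)| .
\end{align*}
For instance, two stages that each attenuate a given stopband frequency by 40~dB attenuate it by 80~dB once cascaded, assuming ideal arithmetic, i.e. neglecting the quantization and truncation effects between the stages.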

After each filter we leave the possibility of shifting the filtered data to consume
fewer resources. Hence, in the case of cascaded filters, we define a stage as a filter
and a shifter (the shift could be omitted if we do not need to divide the filtered data).

\subsection{Model of a FIR filter}

A cascade of filters is composed of $n$ FIR stages. In stage $i$ ($1 \leq i \leq n$)
the FIR has $C_i$ coefficients and each coefficient is an integer value with $\pi^C_i$
bits while the filtered data are shifted by $\pi^S_i$ bits. We also define $\pi^-_i$ as
the size of input data and $\pi^+_i$ as the size of output data. Figure~\ref{fig:fir_stage}
shows a filtering stage.

\begin{figure}
\centering
\begin{tikzpicture}[node distance=2cm]
\node[draw,minimum size=1.3cm] (FIR) { $C_i, \pi_i^C$ } ;
\node[draw,minimum size=1.3cm] (Shift) [right of=FIR, ] { $\pi_i^S$ } ;
\node (Start) [left of=FIR] { } ;
\node (End) [right of=Shift] { } ;

\node[draw,fit=(FIR) (Shift)] (Filter) { } ;

\draw[->] (Start) edge node [above] { $\pi_i^-$ } (FIR) ;
\draw[->] (FIR) -- (Shift) ;
\draw[->] (Shift) edge node [above] { $\pi_i^+$ } (End) ;
\end{tikzpicture}
\caption{A single filter is composed of a FIR (on the left) and a Shifter (on the right)}
\label{fig:fir_stage}
\end{figure}

FIR $i$ has been characterized through numerical simulation as able to reject $F(C_i, \pi_i^C)$ dB.
This rejection has been computed using GNU Octave software FIR coefficient design functions
(\texttt{firls} and \texttt{fir1}).
For each configuration $(C_i, \pi_i^C)$, we first create a FIR with floating point coefficients and a given number $C_i$ of coefficients.
Then, the floating point coefficients are discretized into integers. In order to ensure that the coefficients are effectively coded on $\pi_i^C$~bits,
the coefficients are normalized by their absolute maximum before being scaled to integer coefficients.
At least one coefficient is coded on $\pi_i^C$~bits, and in practice only $b_{C_i/2}$ is coded on $\pi_i^C$~bits while the others are coded on far fewer bits.
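
One possible way of writing this normalization and scaling step is sketched below; the round-to-nearest rule and the signed scale factor $2^{\pi_i^C-1}-1$ are assumptions made for the illustration, since the exact rounding convention is not specified here. With $b_k$ the floating point coefficients and $\tilde b_k$ (a symbol introduced for this sketch) the integer ones,
\begin{align*}
\tilde b_k = \operatorname{round}\!\left(\frac{b_k}{\max_j |b_j|}\,\left(2^{\pi_i^C-1}-1\right)\right) .
\end{align*}
Under these assumptions and for $\pi_i^C=6$, the scale factor is $2^5-1=31$: the largest coefficient maps to $\pm 31$, while any coefficient whose magnitude is below $1/62\approx 1.6\%$ of the maximum rounds to 0. This is the mechanism by which the outer taps of Fig.~\ref{float_vs_int} vanish; with truncation toward zero instead of rounding, the threshold would be $1/31$ of the maximum.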

With these coefficients, the \texttt{freqz} function is used to estimate the magnitude of the filter
transfer function.
Comparing the performance between FIRs requires however defining a unique criterion. As shown in figure~\ref{fig:fir_mag},
the FIR magnitude is described by several characteristics: we focus here on the transition width and the rejection rather than on the
passband ripples as emphasized in \cite{lim_1988,lim_1996}. Throughout this demonstration,
we arbitrarily set a passband of 40\% of the Nyquist frequency and a stopband from 60\%
of the Nyquist frequency to the end of the band, as would be typically selected to prevent
aliasing before decimating the dataflow by 2. The method however generalizes to any filter
shape as long as it is defined from the initial modeling steps: Fig. \ref{fig:rejection_pyramid}
as described below is indeed specific to each filter shape.
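
The choice of these band edges can be checked with a short folding argument, given here as a sketch using the notation $f_N$ for the Nyquist frequency before decimation (a symbol introduced for this illustration). After decimating by 2 the sampling rate becomes $f_N$, so that a component at frequency $f$ in the upper half band aliases to $f_N-f$:
\begin{align*}
f\in[0.6\,f_N,\;f_N] &\;\longmapsto\; f_N-f\in[0,\;0.4\,f_N],\\
f\in[0.5\,f_N,\;0.6\,f_N] &\;\longmapsto\; f_N-f\in[0.4\,f_N,\;0.5\,f_N].
\end{align*}
The stopband therefore folds exactly onto the passband, which is why it must be rejected, whereas the upper part of the transition band only folds onto the lower part of the transition band, outside the band of interest.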

\begin{figure}
\begin{center}
\scalebox{0.8}{
\centering
\begin{tikzpicture}[scale=0.3]
\draw[<->] (0,15) -- (0,0) -- (21,0) ;
\draw[thick] (0,12) -- (8,12) -- (20,0) ;

\draw (0,14) node [left] { $P$ } ;
\draw (20,0) node [below] { $f$ } ;

\draw[>=latex,<->] (0,14) -- (8,14) ;
\draw (4,14) node [above] { passband } node [below] { $40\%$ } ;

\draw[>=latex,<->] (8,14) -- (12,14) ;
\draw (10,14) node [above] { transition } node [below] { $20\%$ } ;

\draw[>=latex,<->] (12,14) -- (20,14) ;
\draw (16,14) node [above] { stopband } node [below] { $40\%$ } ;

\draw[>=latex,<->] (16,12) -- (16,8) ;
\draw (16,10) node [right] { rejection } ;

\draw[dashed] (8,-1) -- (8,14) ;
\draw[dashed] (12,-1) -- (12,14) ;

\draw[dashed] (8,12) -- (16,12) ;
\draw[dashed] (12,8) -- (16,8) ;

\end{tikzpicture}
}
\end{center}
\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:
the passband is considered to occupy the initial 40\% of the Nyquist frequency range,
the stopband the last 40\%, allowing 20\% transition width.}
\label{fig:fir_mag}
\end{figure}

278

266

\draw[dashed] (12,-1) -- (12,14) ;

279

267

In the transition band, the behavior of the filter is left free, we only define the passband and the stopband characteristics.

280

268

\draw[dashed] (8,12) -- (16,12) ;

Initial considered criteria include the mean value of the stopband rejection which yields unacceptable results since notches

281

269

\draw[dashed] (12,8) -- (16,8) ;

overestimate the rejection capability of the filter.

282

270

An intermediate criterion considered the maximal rejection within the stopband, to which the sum of the absolute values

283

271

\end{tikzpicture}

within the passband is subtracted to avoid filters with excessive ripples, normalized to the

284

272

}

bin width to remain consistent with the passband criterion (dBc/Hz units in all cases).

285

273

\end{center}

In this case, cascading too many filters with individual excessive ($>$ 1~dB) passband ripples

286

274

\caption{Shape of the filter transmitted power $P$ as a function of frequency $f$:

led to unacceptable ($>$ 10~dB) final ripple levels, especially close to the transition band.

287

275

the passband is considered to occupy the initial 40\% of the Nyquist frequency range,

Hence, the final criterion considers the minimal rejection in the stopband to which the

288

276

the stopband the last 40\%, allowing 20\% transition width.}

the maximal amplitude in the passband (maximum value minus the minimum value) is substracted, with

289

277

\label{fig:fir_mag}

a 1~dB threshold on the latter quantity over which the filter is discarded.

290

278

\end{figure}

With this

291

279

criterion, we meet the expected rejection capability of low pass filters as shown in figure~\ref{fig:custom_criterion}.

292

280

In the transition band, the behavior of the filter is left free, we only define the passband and the stopband characteristics.

The best filter has a correct rejection estimation and the worst filter

293

is discarded based on the excessive passband ripple criterion.
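
The final criterion can be summarized as follows, with notations introduced here for illustration only: let $H(f)$ be the filter transfer function magnitude in dB, normalized to its passband level, $\mathcal{P}$ the passband and $\mathcal{S}$ the stopband. Defining the passband ripple amplitude as $\Delta = \max_{f \in \mathcal{P}} H(f) - \min_{f \in \mathcal{P}} H(f)$, a sketch of the criterion reads
\begin{equation*}
F = \left\{
\begin{array}{ll}
-\max_{f \in \mathcal{S}} H(f) - \Delta & \text{if } \Delta \leq 1~\text{dB} \\
\text{filter discarded} & \text{otherwise}
\end{array}
\right.
\end{equation*}
where $-\max_{f \in \mathcal{S}} H(f)$ is the minimal rejection in the stopband. As a numerical illustration with arbitrary values, a filter whose worst stopband level is $-42$~dB and whose passband ripple amplitude is 0.4~dB yields $F = 42 - 0.4 = 41.6$~dB.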


\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/custom_criterion}
\caption{Selected filter qualification criterion computed as the minimum rejection in the stopband
minus the maximal ripple amplitude in the passband, with a 1~dB threshold above which the filter is discarded:
comparison between monolithic filter (blue, rejected in this case) and cascaded filters (red).}
\label{fig:custom_criterion}
\end{figure}

Thanks to the latter criterion, which is used in the remainder of this paper, we are able to automatically generate multiple sets of FIR taps
and estimate their rejection. Figure~\ref{fig:rejection_pyramid} exhibits the
rejection as a function of the number of coefficients and the number of bits representing these coefficients.
The pyramid-shaped curve exhibits optimum configuration sets at the vertex where both edges meet.
Indeed for a given number of coefficients, increasing the number of bits over the edge will not improve the rejection.
Conversely, for a given number of bits, increasing the number of coefficients will not improve
the rejection. Hence the best coefficient sets are on the vertex of the pyramid. Notice that the word length
and number of coefficients do not start at 1: filters with too few coefficients or too small a tap word size are rejected
by the excessive ripple constraint of the criterion. Hence, the size of the pyramid is significantly reduced by discarding
these filters and so is the solution search space.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/rejection_pyramid}
\caption{Filter rejection as a function of number of coefficients and number of bits:
this lookup table will be used to identify which filter parameters -- number of bits
representing coefficients and number of coefficients -- best match the targeted transfer function. Filters
with fewer than 10~taps or with coefficients coded on fewer than 5~bits are discarded due to excessive
ripples in the passband.}
\label{fig:rejection_pyramid}
\end{figure}

Although we have an efficient criterion to estimate the rejection of one set of coefficients (taps),
a difficulty arises when cascading filters and estimating the criterion as the sum of two or more individual criteria.
If the FIR filter coefficients are the same between the stages, we have:
\begin{equation*}
F_{total} = F_1 + F_2
\end{equation*}
But selecting two different sets of coefficients yields a more complex situation in which
the previous relation is no longer valid, as illustrated in figure~\ref{fig:sum_rejection}. The red and blue curves
represent two different filters whose maxima and notches are not located at the same frequency offsets.
Hence when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved
with respect to a basic sum of the rejection criteria shown as the dotted yellow line.
Thus, estimating the rejection of filter cascades is more complex than taking the sum of all the rejection
criteria of each filter. However since the individual filter rejection sum underestimates the rejection capability of the cascade,
this lower bound on the cascade rejection is considered as a conservative and acceptable criterion for deciding on the suitability
of the filter cascade to meet design criteria.
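
This bound can be made explicit, with notations introduced here for illustration only: since the transfer function of a cascade is the product of the individual transfer functions, the magnitudes $H_1(f)$ and $H_2(f)$ expressed in dB add up. Over the stopband $\mathcal{S}$,
\begin{equation*}
\max_{f \in \mathcal{S}} \left(H_1(f) + H_2(f)\right) \leq \max_{f \in \mathcal{S}} H_1(f) + \max_{f \in \mathcal{S}} H_2(f),
\end{equation*}
with equality only if both filters reach their stopband maximum at the same frequency, as is the case when the two stages share the same coefficients. The worst-case stopband level of the cascade is hence never higher than the sum of the individual worst-case levels, so that the stopband rejection of the cascade is at least the sum of the individual stopband rejections.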


\begin{figure}
\centering
\includegraphics[width=\linewidth]{images/cascaded_criterion}
\caption{Transfer function of individual filters and after cascading the two filters,
demonstrating that the selected criterion of minimum rejection in the stopband (horizontal
lines) is met. Notice that the cascaded filter has better rejection than summing the stopband
maximum of each individual filter.
}
\label{fig:sum_rejection}
\end{figure}

Finally, in our case, we consider that the input signal is fully known. The
resolution of the input data stream is fixed and remains the same for all experiments
in this paper.

Based on this analysis, we address the estimate of resource consumption (called
silicon area -- in the case of FPGAs this means processing cells) as a function of
filter characteristics. As a reminder, we do not aim at matching actual hardware
configuration but consider an arbitrary silicon area occupied by each processing function,
and will assess after synthesis the adequacy of this arbitrary unit with respect to actual
hardware resources provided by FPGA manufacturers. The sum of individual processing
unit areas is constrained by a total silicon area representative of FPGA global resources.
Formally, variable $a_i$ is the area taken by filter~$i$
(in arbitrary units). Variable $r_i$ is the rejection of filter~$i$ (in dB).
Constant $\mathcal{A}$ is the total available area. We model our problem as follows:

\begin{align}
\text{Maximize } & \sum_{i=1}^n r_i \notag \\
\sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\
a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\
r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\
\pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\
\pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\
\pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\
\pi_1^- &= \Pi^I \label{eq:init}
\end{align}

Equation~\ref{eq:area} states that the total area taken by the filters must not
exceed the available area. Equation~\ref{eq:areadef} gives the definition of
the area used by a filter, considered as the area of the FIR since the Shifter is
assumed not to require significant resources. We consider that the FIR needs $C_i$ registers of size
$\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the
input data with the coefficients. Equation~\ref{eq:rejectiondef} gives the
definition of the rejection of the filter thanks to the tabulated function~$F$ that we defined
previously. The Shifter does not introduce negative rejection as we will explain later,
so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the
relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add
$\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes
$\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of
a filter is the same as the input number of bits of the next filter.
Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative
rejection. Indeed, the results of the FIR can be right shifted without compromising
the quality of the rejection up to a threshold. Each bit of the output data
increases the maximum rejection level by 6~dB. We add one to take the sign bit
into account. If equation~\ref{eq:maxshift} was not present, the Shifter could
shift too much and introduce some noise in the output data. Each supplementary
shift bit would cause an additional 6~dB rejection rise. A totally equivalent equation is:
$\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right)$.
Finally, equation~\ref{eq:init} gives the number of bits of the global input.
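
As a numerical illustration with arbitrarily chosen values (not taken from the results of this paper), consider a single stage with $C_1 = 20$ coefficients coded on $\pi_1^C = 8$~bits, fed with $\pi_1^- = \Pi^I = 16$-bit data and providing $r_1 = 30$~dB rejection:
\begin{align*}
a_1 & = 20 \times (8 + 16) = 480 \text{ arbitrary units}, \\
\pi_1^+ & \geq 1 + \left(1 + \frac{30}{6}\right) = 7 \text{ bits}, \\
\pi_1^S & \leq 16 + 8 - 7 = 17 \text{ bits}.
\end{align*}
The Shifter may thus discard up to 17 of the 24~bits at the FIR output, and this stage alone would fit within an available area $\mathcal{A} = 500$.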


This model is non-linear since we multiply some variable with another variable
and it is even non-quadratic, as the cost function $F$ does not have a known
linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations.
This parameter $p$ is defined by the user, and represents the number of different
sets of coefficients generated (remember, we use \texttt{firls} and \texttt{fir1}
functions from GNU Octave) based on the targeted filter characteristics and implementation
assumptions (estimated number of bits defining the coefficients). Hence, $C_{ij}$ and
$\pi_{ij}^C$ become constants and
we define $1 \leq j \leq p$ so that the function $F$ can be estimated (as a lookup table)
for each configuration thanks to the rejection criterion. We also define the binary
variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
and 0 otherwise. The new equations are as follows:

\begin{align}
a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
\pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
\sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
\end{align}

Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace
respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.
Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most.

The problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2}
we multiply
$\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable we can
linearize this multiplication. The following formula shows how to linearize
this situation in the general case with $y$ a binary variable and $x$ a real variable ($0 \leq x \leq X^{max}$):
\begin{equation*}
  m = x \times y \implies
  \left \{
  \begin{split}
    m & \geq 0 \\
    m & \leq y \times X^{max} \\
    m & \leq x \\
    m & \geq x - (1 - y) \times X^{max}
  \end{split}
  \right .
\end{equation*}
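The equivalence can be checked by considering both values of the binary variable:
\begin{align*}
y = 0 & : \quad 0 \leq m \leq 0 \implies m = 0, \\
y = 1 & : \quad x \leq m \leq x \implies m = x,
\end{align*}
the two remaining inequalities being automatically satisfied in each case since $0 \leq x \leq X^{max}$. In constraint~\ref{eq:areadef2}, $y$ stands for $\delta_{ij}$ and $x$ for $\pi_i^-$.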

So if we bound $\pi_i^-$ from above by 128~bits, a maximum data size
assumed from hardware characteristics,
the Gurobi (\url{www.gurobi.com}) optimization software is able to linearize
the quadratic problem for us, so the model is left as is. This model
has $O(np)$ variables and $O(n)$ constraints.

Two problems will be addressed using the workflow described in the next section: on the one
hand maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary
silicon area (section~\ref{sec:fixed_area}) and on the other hand the dual problem of minimizing the silicon area
for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the
objective function is replaced with:
\begin{align}
\text{Minimize } & \sum_{i=1}^n a_i \notag
\end{align}
We adapt the constraints of the quadratic program by replacing equation~\ref{eq:area}
with equation~\ref{eq:rejection_min}, where $\mathcal{R}$ is the minimal
rejection required.

\begin{align}
\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}
\end{align}


\section{Design workflow}
\label{sec:workflow}

In this section, we describe the workflow to compute all the results presented in sections~\ref{sec:fixed_area}
and \ref{sec:fixed_rej}. Figure~\ref{fig:workflow} shows the global workflow and the different steps involved
in the computation of the results.

\begin{figure}
  \centering
  \begin{tikzpicture}[node distance=0.75cm and 2cm]
    \node[draw,minimum size=1cm] (Solver) { Filter Solver } ;
    \node (Start) [left= 3cm of Solver] { } ;
    \node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;
    \node (Input) [above= of TCL] { } ;
    \node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;
    \node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;
    \node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;
    \node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;
    \node (Results) [left= of Postproc] { } ;

    \draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;
    \draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;
    \draw[->] (Solver) edge node [below] { (1a) } (TCL) ;
    \draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;
    \draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;
    \draw[->,dashed] (Bitstream) -- (Deploy) ;
    \draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;
    \draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;
    \draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;
    \draw[->] (Postproc) -- (Results) ;
  \end{tikzpicture}
  \caption{Design workflow from the input parameters to the results allowing for
  a fully automated optimal solution search.}
  \label{fig:workflow}
\end{figure}


The filter solver is a C++ program that takes as input the maximum area
$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,
the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates
the quadratic programs and uses the Gurobi solver to estimate the optimal results.
Then it produces two scripts: a TCL script ((1a) on figure~\ref{fig:workflow})
and a deploy script ((1b) on figure~\ref{fig:workflow}).

The TCL script describes the whole digital processing chain from the beginning
(the raw signal data) to the end (the filtered data) in a language compatible
with proprietary synthesis software, namely Vivado for Xilinx and Quartus for
Intel/Altera. The raw input data are generated by a 20-bit Pseudo Random Number (PRN)
generator inside the FPGA, and $\Pi^I$ is fixed at 16~bits.
Then the script builds each stage of the chain with a generic FIR task that
comes from a skeleton library. The generic FIR is highly configurable
with the number of coefficients and the size of the coefficients. The coefficients
themselves are not stored in the script.
As the signal is processed in real-time, the output signal is stored as
consecutive bursts of data for post-processing, mainly assessing the consistency of the
implemented FIR cascade transfer function with the design criteria and the expected
transfer function.

The TCL script is used by Vivado to produce the FPGA bitstream ((2) on figure~\ref{fig:workflow}).
We use the 2018.2 version of Xilinx Vivado and we execute the synthesized
bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series
FPGA (xc7z010clg400-1) and two LTC2145 14-bit 125~MS/s ADCs, loaded with 50~$\Omega$ resistors to
provide a broadband noise source.
The board runs the Linux kernel and surrounding environment produced from the
Buildroot framework available at \url{https://github.com/trabucayre/redpitaya/}: configuring
the Zynq FPGA, feeding the FIR with the set of coefficients, executing the simulation and
fetching the results is automated.

The deploy script uploads the bitstream to the board ((3) on
figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers and
configures the coefficients of the FIR filters. It then waits for the results
and retrieves the data to the main computer ((4) on figure~\ref{fig:workflow}).

Finally, an Octave post-processing script computes the final results thanks to
the output data ((5) on figure~\ref{fig:workflow}).
The results are normalized so that the Power Spectral Density (PSD) starts at zero
and the different configurations can be compared.

539

458

\section{Maximizing the rejection at fixed silicon area}

540

459

\begin{align}

\label{sec:fixed_area}

541

460

\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}

This section presents the output of the filter solver {\em i.e.} the computed

542

461

\end{align}

configurations for each stage, the computed rejection and the computed silicon area.

543

462

Such results allow for understanding the choices made by the solver to compute its solutions.

544

463

\section{Design workflow}

545

464

\label{sec:workflow}

The experimental setup is composed of three cases. The raw input is generated

546

465

by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.

547

466

In this section, we describe the workflow to compute all the results presented in sections~\ref{sec:fixed_area}

Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500

548

467

and \ref{sec:fixed_rej}. Figure~\ref{fig:workflow} shows the global workflow and the different steps involved

arbitrary units. Hence, the three cases have been named: MAX/500, MAX/1000, MAX/1500.

549

468

in the computation of the results.

The number of configurations $p$ is 1133, with $C_i$ ranging from 3 to 60 and $\pi^C$

550

469

ranging from 2 to 22. In each case, the quadratic program has been able to give a

551

470

\begin{figure}

result up to five stages ($n = 5$) in the cascaded filter.

552

471

\centering

553

472

\begin{tikzpicture}[node distance=0.75cm and 2cm]

Table~\ref{tbl:gurobi_max_500} shows the results obtained by the filter solver for MAX/500.

554

473

\node[draw,minimum size=1cm] (Solver) { Filter Solver } ;

Table~\ref{tbl:gurobi_max_1000} shows the results obtained by the filter solver for MAX/1000.

555

474

\node (Start) [left= 3cm of Solver] { } ;

Table~\ref{tbl:gurobi_max_1500} shows the results obtained by the filter solver for MAX/1500.

556

475

\node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;

557

476

\node (Input) [above= of TCL] { } ;

\renewcommand{\arraystretch}{1.4}

558

477

\node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;

559

478

\node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;

\begin{table}

560

479

\node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}

561

480

\node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;

\label{tbl:gurobi_max_500}

562

481

\node (Results) [left= of Postproc] { } ;

\centering

563

482

{\scalefont{0.77}

564

483

\draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;

\begin{tabular}{|c|ccccc|c|c|}

565

484

\draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;

\hline

566

485

\draw[->] (Solver) edge node [below] { (1a) } (TCL) ;

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

567

486

\draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;

\hline

568

487

\draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;

1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\

569

488

\draw[->,dashed] (Bitstream) -- (Deploy) ;

2 & (3, 5, 18) & (33, 10, 0) & - & - & - & 48~dB & 492 \\

570

489

\draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;

3 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\

571

490

\draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;

4 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\

572

491

\draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;

5 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\

573

492

\draw[->] (Postproc) -- (Results) ;

\hline

574

493

\end{tikzpicture}

\end{tabular}

575

494

\caption{Design workflow from the input parameters to the results allowing for

}

576

495

a fully automated optimal solution search.}

\end{table}

577

496

\label{fig:workflow}

578

497

\end{figure}

\begin{table}

579

498

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}

580

499

The filter solver is a C++ program that takes as input the maximum area

\label{tbl:gurobi_max_1000}

581

500

$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,

\centering

582

501

the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates

{\scalefont{0.77}

583

502

the quadratic programs and uses the Gurobi solver to estimate the optimal results.

\begin{tabular}{|c|ccccc|c|c|}

584

503

Then it produces two scripts: a TCL script ((1a) on figure~\ref{fig:workflow})

\hline

585

504

and a deploy script ((1b) on figure~\ref{fig:workflow}).

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

586

505

\hline

587

506

The TCL script describes the whole digital processing chain from the beginning

1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\

588

507

(the raw signal data) to the end (the filtered data) in a language compatible

2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\

589

508

with proprietary synthesis software, namely Vivado for Xilinx and Quartus for

3 & (3, 13, 26) & (31, 9, 1) & (27, 9, 0) & - & - & 92~dB & 999 \\

590

509

Intel/Altera. The raw input data is generated from a 20-bit Pseudo Random Number (PRN)

4 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\

591

510

generator inside the FPGA and $\Pi^I$ is fixed at 16~bits.

5 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\

592

511

Then the script builds each stage of the chain with a generic FIR task that

\hline

593

512

comes from a skeleton library. The generic FIR is highly configurable

\end{tabular}

594

513

with the number of coefficients and the size of the coefficients. The coefficients

}

595

514

themselves are not stored in the script.

\end{table}

596

515

As the signal is processed in real-time, the output signal is stored as

597

516

consecutive bursts of data for post-processing, mainly assessing the consistency of the

\begin{table}

598

517

implemented FIR cascade transfer function with the design criteria and the expected

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}

599

518

transfer function.

\label{tbl:gurobi_max_1500}

600

519

\centering

601

520

The TCL script is used by Vivado to produce the FPGA bitstream ((2) on figure~\ref{fig:workflow}).

{\scalefont{0.77}

602

521

We use the 2018.2 version of Xilinx Vivado and we execute the synthesized

\begin{tabular}{|c|ccccc|c|c|}

603

522

bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series

\hline

604

523

FPGA (xc7z010clg400-1) and two LTC2145 14-bit 125~MS/s ADC, loaded with 50~$\Omega$ resistors to

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

605

524

provide a broadband noise source.

\hline

606

525

The board runs the Linux kernel and surrounding environment produced from the

1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\

607

526

Buildroot framework available at \url{https://github.com/trabucayre/redpitaya/}: configuring

2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 102~dB & 1489 \\

608

527

the Zynq FPGA, feeding the FIR with the set of coefficients, executing the simulation and

3 & (15, 9, 18) & (31, 8, 0) & (27, 9, 0) & - & - & 116~dB & 1488 \\

609

528

fetching the results is automated.

4 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\

610

529

5 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\

611

530

The deploy script uploads the bitstream to the board ((3) on

\hline

612

531

figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers,

\end{tabular}

613

532

configures the coefficients of the FIR filters. It then waits for the results

}

614

533

and retrieves the data to the main computer ((4) on figure~\ref{fig:workflow}).

\end{table}

615

534

616

535

Finally, an Octave post-processing script computes the final results thanks to

\renewcommand{\arraystretch}{1}

617

536

the output data ((5) on figure~\ref{fig:workflow}).

618

537

The results are normalized so that the Power Spectrum Density (PSD) starts at zero

By analyzing these tables, we can first state that we reach an optimal solution

619

538

and the different configurations can be compared.

for each case: $n = 3$ for MAX/500, and $n = 4$ for MAX/1000 and MAX/1500. Moreover,

620

539

the cascaded filters always exhibit better performance than the monolithic solution.

621

540

\section{Maximizing the rejection at fixed silicon area}

It was an expected result as it has

622

541

\label{sec:fixed_area}

been previously observed that many small filters are better than

623

542

This section presents the output of the filter solver {\em i.e.} the computed

a single large filter \cite{lim_1988, lim_1996, young_1992}, despite such conclusions

624

543

configurations for each stage, the computed rejection and the computed silicon area.

being hardly used in practice due to the lack of tools for identifying individual filter

625

544

Such results allow for understanding the choices made by the solver to compute its solutions.

coefficients in the cascaded approach.

626

545

627

546

The experimental setup is composed of three cases. The raw input is generated

Second, the larger the silicon area, the better the rejection. This was also an

628

547

by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.

expected result as more area means a filter of better quality with more coefficients

629

548

Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500

or more bits per coefficient.

630

549

arbitrary units. Hence, the three cases have been named: MAX/500, MAX/1000, MAX/1500.

631

550

The number of configurations $p$ is 1133, with $C_i$ ranging from 3 to 60 and $\pi^C$

Then, we also observe that the first stage can have a larger shift than the other

632

551

ranging from 2 to 22. In each case, the quadratic program has been able to give a

stages. This is explained by the fact that the solver tries to use just enough

633

552

result up to five stages ($n = 5$) in the cascaded filter.

bits for the computed rejection after each stage. In the first stage, a

634

553

balance between strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}

635

554

Table~\ref{tbl:gurobi_max_500} shows the results obtained by the filter solver for MAX/500.

gives the relation between both values.

636

555

Table~\ref{tbl:gurobi_max_1000} shows the results obtained by the filter solver for MAX/1000.

637

556

Table~\ref{tbl:gurobi_max_1500} shows the results obtained by the filter solver for MAX/1500.

Finally, we note that the solver consumes nearly all of the allotted silicon area.
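The observations above can be cross-checked numerically. The following Python sketch recomputes, from the values reported in Tables~\ref{tbl:gurobi_max_500}, \ref{tbl:gurobi_max_1000} and \ref{tbl:gurobi_max_1500}, the smallest fraction of the area budget used and the rejection gain of the best cascade over the monolithic ($n = 1$) filter. The names \texttt{results} and \texttt{summarize} are illustrative assumptions and are not part of the filter solver.

```python
# Values copied from the MAX/500, MAX/1000 and MAX/1500 tables:
# budget -> list of (n, rejection in dB, area in arbitrary units).
# Rows that merely repeat a smaller-n solution are omitted.
results = {
    500:  [(1, 32, 483), (2, 48, 492), (3, 56, 493)],
    1000: [(1, 56, 999), (2, 80, 990), (3, 92, 999), (4, 98, 994)],
    1500: [(1, 71, 1457), (2, 102, 1489), (3, 116, 1488), (4, 125, 1500)],
}


def summarize(budget, rows):
    """Return (smallest area usage ratio, gain in dB of the best cascade over n = 1)."""
    usage = min(area / budget for _, _, area in rows)
    monolithic = rows[0][1]  # rejection of the single-stage filter
    best = max(rejection for _, rejection, _ in rows)
    return usage, best - monolithic


summary = {budget: summarize(budget, rows) for budget, rows in results.items()}
for budget, (usage, gain) in summary.items():
    print(f"MAX/{budget}: at least {100 * usage:.1f}% of the budget used, "
          f"cascade gain {gain} dB")
```

Under these tabulated values, every solution uses more than 96\% of its budget, and the gain of cascading grows with the available area.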

638

557

639

558

\renewcommand{\arraystretch}{1.4}

The following graphs present the rejection for real data on the FPGA. In all the following

640

559

figures, the solid line represents the actual rejection of the filtered

641

560

\begin{table}

data on the FPGA as measured experimentally and the dashed lines are the noise levels

642

561

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}

given by the quadratic solver. The configurations are those computed in the previous section.

643

562

\label{tbl:gurobi_max_500}

644

563

\centering

Figure~\ref{fig:max_500_result} shows the rejection of the different configurations in the case of MAX/500.

645

564

{\scalefont{0.77}

Figure~\ref{fig:max_1000_result} shows the rejection of the different configurations in the case of MAX/1000.

646

Figure~\ref{fig:max_1500_result} shows the rejection of the different configurations in the case of MAX/1500.

647

565

\begin{tabular}{|c|ccccc|c|c|}

648

566

\hline

\begin{figure}

649

567

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

\centering

650

568

\hline

\begin{subfigure}{\linewidth}

651

569

1 & (21, 7, 0) & - & - & - & - & 32~dB & 483 \\

\includegraphics[width=\linewidth]{images/max_500}

652

570

2 & (3, 5, 18) & (33, 10, 0) & - & - & - & 48~dB & 492 \\

\caption{Filter transfer functions for varying number of cascaded filters solving

653

571

3 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\

the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).}

654

572

4 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\

\label{fig:max_500_result}

655

573

5 & (3, 5, 18) & (19, 7, 1) & (15, 7, 0) & - & - & 56~dB & 493 \\

\end{subfigure}

656

574

\hline

657

575

\end{tabular}

\begin{subfigure}{\linewidth}

658

576

}

\includegraphics[width=\linewidth]{images/max_1000}

659

577

\end{table}

\caption{Filter transfer functions for varying number of cascaded filters solving

660

578

the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).}

661

579

\begin{table}

\label{fig:max_1000_result}

662

580

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}

\end{subfigure}

663

581

\label{tbl:gurobi_max_1000}

664

582

\centering

\begin{subfigure}{\linewidth}

665

583

{\scalefont{0.77}

\includegraphics[width=\linewidth]{images/max_1500}

666

584

\begin{tabular}{|c|ccccc|c|c|}

\caption{Filter transfer functions for varying number of cascaded filters solving

667

585

\hline

the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).}

668

586

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

\label{fig:max_1500_result}

669

587

\hline

\end{subfigure}

670

588

1 & (37, 11, 0) & - & - & - & - & 56~dB & 999 \\

\caption{Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing

671

589

2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\

rejection for a given resource allocation.

672

590

3 & (3, 13, 26) & (31, 9, 1) & (27, 9, 0) & - & - & 92~dB & 999 \\

The filter shape constraint (bandpass and bandstop) is shown as thick

673

591

4 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\

horizontal lines on each chart.}

674

592

5 & (3, 5, 18) & (19, 7, 1) & (19, 7, 0) & (19, 7, 0) & - & 98~dB & 994 \\

\end{figure}

675

593

\hline

676

594

\end{tabular}

In all cases, we observe that the actual rejection is close to the rejection computed by the solver.

677

595

}

678

596

\end{table}

We compare the actual silicon resources given by Vivado to the

679

597

resources in arbitrary units.

680

598

\begin{table}

The goal is to check that our arbitrary units of silicon area model well enough

681

599

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}

the real resources on the FPGA. In particular, we want to verify that, for a given

682

600

\label{tbl:gurobi_max_1500}

number of arbitrary units, the actual silicon resources do not depend on the

683

601

\centering

number of stages $n$. Most significantly, our approach aims

684

602

{\scalefont{0.77}

at remaining far enough from the practical logic gate implementation used by

685

603

\begin{tabular}{|c|ccccc|c|c|}

various vendors to remain platform independent and be portable from one

686

604

\hline

architecture to another.

687

605

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

688

606

\hline

Table~\ref{tbl:resources_usage} shows the resources usage in the case of MAX/500, MAX/1000 and

689

607

1 & (47, 15, 0) & - & - & - & - & 71~dB & 1457 \\

MAX/1500 \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000

690

608

2 & (19, 6, 15) & (51, 14, 0) & - & - & - & 102~dB & 1489 \\

and 1500 arbitrary units. We have taken care to extract solely the resources used by

691

609

3 & (15, 9, 18) & (31, 8, 0) & (27, 9, 0) & - & - & 116~dB & 1488 \\

the FIR filters and remove additional processing blocks including FIFO and Programmable

692

610

4 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\

Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication.

693

611

5 & (3, 9, 22) & (31, 9, 1) & (27, 9, 0) & (19, 7, 0) & - & 125~dB & 1500 \\

694

612

\hline

\begin{table}[h!tb]

695

613

\end{tabular}

\caption{Resource occupation following synthesis of the solutions found for

696

614

}

the problem of maximizing rejection for a given resource allocation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}

697

615

\end{table}

\label{tbl:resources_usage}

698

616

\centering

699

617

\renewcommand{\arraystretch}{1}

\begin{tabular}{|c|c|ccc|c|}

700

618

\hline

701

619

By analyzing these tables, we can first state that we reach an optimal solution

$n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline

702

& LUT & 249 & 453 & 627 & \emph{17600} \\

703

1 & BRAM & 1 & 1 & 1 & \emph{120} \\

704

620

for each case: $n = 3$ for MAX/500, and $n = 4$ for MAX/1000 and MAX/1500. Moreover,

& DSP & 21 & 37 & 47 & \emph{80} \\ \hline

705

621

the cascaded filters always exhibit better performance than the monolithic solution.

& LUT & 2253 & 474 & 691 & \emph{17600} \\

706

622

It was an expected result as it has

2 & BRAM & 2 & 2 & 2 & \emph{120} \\

707

623

been previously observed that many small filters are better than

& DSP & 0 & 50 & 70 & \emph{80} \\ \hline

708

624

a single large filter \cite{lim_1988, lim_1996, young_1992}, despite such conclusions

& LUT & 1329 & 2006 & 3158 & \emph{17600} \\

709

625

being hardly used in practice due to the lack of tools for identifying individual filter

3 & BRAM & 3 & 3 & 3 & \emph{120} \\

710

626

coefficients in the cascaded approach.

& DSP & 15 & 30 & 42 & \emph{80} \\ \hline

711

627

& LUT & 1329 & 1600 & 2260 & \emph{17600} \\

712

628

Second, the larger the silicon area, the better the rejection. This was also an

4 & BRAM & 3 & 4 & 4 & \emph{120} \\

713

629

expected result as more area means a filter of better quality with more coefficients

& DSP & 15 & 38 & 49 & \emph{80} \\ \hline

714

630

or more bits per coefficient.

& LUT & 1329 & 1600 & 2260 & \emph{17600} \\

715

631

5 & BRAM & 3 & 4 & 4 & \emph{120} \\

716

632

Then, we also observe that the first stage can have a larger shift than the other

& DSP & 15 & 38 & 49 & \emph{80} \\ \hline

717

633

stages. This is explained by the fact that the solver tries to use just enough

\end{tabular}

718

634

bits for the computed rejection after each stage. In the first stage, a

\end{table}

719

635

balance between strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}

720

636

gives the relation between both values.

In the case $n = 2$ for MAX/500, Vivado replaces the DSPs with Look-Up Tables (LUTs). We assume that,

721

637

when the filter coefficients are small enough, or when the input size is small

722

638

Finally, we note that the solver consumes nearly all of the allotted silicon area.

enough, Vivado optimizes resource consumption by selecting multiplexers to

723

639

implement the multiplications instead of a DSP. In this case, it is quite difficult

724

640

The following graphs present the rejection for real data on the FPGA. In all the following

to compare the whole silicon budget.

725

641

figures, the solid line represents the actual rejection of the filtered

726

642

data on the FPGA as measured experimentally and the dashed lines are the noise levels

However, a rough estimation can be made with a simple equivalence: looking at

727

643

given by the quadratic solver. The configurations are those computed in the previous section.

the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,

728

644

we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon

729

645

Figure~\ref{fig:max_500_result} shows the rejection of the different configurations in the case of MAX/500.

area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs,

730

646

Figure~\ref{fig:max_1000_result} shows the rejection of the different configurations in the case of MAX/1000.

1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond

731

647

Figure~\ref{fig:max_1500_result} shows the rejection of the different configurations in the case of MAX/1500.

to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary

732

648

unit map well to actual hardware resources. The relatively small differences can probably be explained

733

by the optimizations done by Vivado based on the detailed map of available processing resources.
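The equivalence discussed above can be applied to every cell of Table~\ref{tbl:resources_usage}. The following Python sketch converts each (LUT, DSP) pair into LUT equivalents, assuming the rough figure of 100~LUTs per DSP stated in the text. The constant \texttt{LUT\_PER\_DSP} and the helper \texttt{lut\_equivalent} are names introduced here for illustration only, and BRAM usage is ignored.

```python
LUT_PER_DSP = 100  # rough equivalence taken from the text, not a vendor figure

# (LUT, DSP) counts for each number of stages n, copied from the resource table.
usage = {
    "MAX/500":  {1: (249, 21), 2: (2253, 0), 3: (1329, 15), 4: (1329, 15), 5: (1329, 15)},
    "MAX/1000": {1: (453, 37), 2: (474, 50), 3: (2006, 30), 4: (1600, 38), 5: (1600, 38)},
    "MAX/1500": {1: (627, 47), 2: (691, 70), 3: (3158, 42), 4: (2260, 49), 5: (2260, 49)},
}


def lut_equivalent(lut, dsp, lut_per_dsp=LUT_PER_DSP):
    """Express a (LUT, DSP) resource pair as a single LUT-equivalent count."""
    return lut + lut_per_dsp * dsp


equivalents = {
    case: {n: lut_equivalent(lut, dsp) for n, (lut, dsp) in rows.items()}
    for case, rows in usage.items()
}
for case, rows in equivalents.items():
    mean = sum(rows.values()) / len(rows)
    print(f"{case}: {rows} (mean {mean:.0f} LUT equivalents)")
```

With this assumption, the LUT-equivalent counts within each column stay within the same order of magnitude across $n$, which is the property the comparison is meant to check.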

We now present the computation time needed to solve the quadratic problem.
For each case, the filter solver software is executed on an Intel(R) Xeon(R) CPU E5606
clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve
the quadratic problem. Table~\ref{tbl:area_time} shows the time needed to solve the quadratic
problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units.

\begin{table}[h!tb]
\caption{Time needed to solve the quadratic program with Gurobi}
\label{tbl:area_time}
\centering
\begin{tabular}{|c|c|c|c|}\hline
$n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline
1 & 0.01~s & 0.02~s & 0.03~s \\
2 & 0.1~s & 1~s & 2~s \\
3 & 5~s & 27~s & 351~s ($\approx$ 6~min) \\
4 & 4~s & 141~s ($\approx$ 3~min) & 1134~s ($\approx$ 18~min) \\
5 & 6~s & 630~s ($\approx$ 10~min) & 49400~s ($\approx$ 13~h) \\\hline
\end{tabular}
\end{table}

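To quantify how the solving times of Table~\ref{tbl:area_time} grow with the number of stages, the following Python sketch fits a straight line to $\log_{10}(\mathrm{time})$ versus $n$ by ordinary least squares. This is only an illustrative post-processing of the tabulated values, not part of the solver, and the function name \texttt{log\_slope} is an assumption introduced here.

```python
import math

# Solving times in seconds for n = 1..5, copied from the timing table.
times = {
    "MAX/500":  [0.01, 0.1, 5, 4, 6],
    "MAX/1000": [0.02, 1, 27, 141, 630],
    "MAX/1500": [0.03, 2, 351, 1134, 49400],
}


def log_slope(ts):
    """Least-squares slope of log10(time) versus n = 1..len(ts)."""
    ns = range(1, len(ts) + 1)
    ys = [math.log10(t) for t in ts]
    n_mean = sum(ns) / len(ts)
    y_mean = sum(ys) / len(ys)
    num = sum((n - n_mean) * (y - y_mean) for n, y in zip(ns, ys))
    den = sum((n - n_mean) ** 2 for n in ns)
    return num / den


slopes = {case: log_slope(ts) for case, ts in times.items()}
for case, slope in slopes.items():
    # A slope s means the time is multiplied by about 10**s per added stage.
    print(f"{case}: slope {slope:.2f} decades per stage "
          f"(factor {10 ** slope:.0f} per stage)")
```

A positive and roughly constant slope on this logarithmic scale is what exponential growth looks like. With only five points per case, the fit is indicative rather than conclusive.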

649

\begin{figure}

As expected, the computation time seems to rise exponentially with the number of stages.

756

650

\centering

When the area is limited, the design exploration space is more limited and the solver is able to

757

651

\begin{subfigure}{\linewidth}

find an optimal solution faster.

758

652

\includegraphics[width=\linewidth]{images/max_500}

We also notice that the solution with $n$ greater than the optimal value

759

653

\caption{Filter transfer functions for varying number of cascaded filters solving

takes more time to be found than the optimal one. This can be explained since the search space is

760

654

the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).}

larger and we need more time to ensure that the previous solution (from the

761

655

\label{fig:max_500_result}

smaller value of $n$) still remains the optimal solution.

762

656

\end{subfigure}

763

657

\subsection{Minimizing resource occupation at fixed rejection}

764

658

\begin{subfigure}{\linewidth}

\label{sec:fixed_rej}

765

659

\includegraphics[width=\linewidth]{images/max_1000}

766

660

\caption{Filter transfer functions for varying number of cascaded filters solving

This section presents the results of the complementary quadratic program aimed at

767

661

the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).}

minimizing the area occupation for a targeted rejection level.

768

662

\label{fig:max_1000_result}

769

663

\end{subfigure}

The experimental setup is composed of four cases. The raw input is the same

770

664

as in the previous section, from a PRN generator, which fixes the input data size $\Pi^I$.

771

665

\begin{subfigure}{\linewidth}

Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60, 80 or 100~dB.

772

666

\includegraphics[width=\linewidth]{images/max_1500}

Hence, the four cases have been named: MIN/40, MIN/60, MIN/80 and MIN/100.

773

667

\caption{Filter transfer functions for varying number of cascaded filters solving

The number of configurations $p$ is the same as in the previous section.

774

668

the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).}

775

669

\label{fig:max_1500_result}

Table~\ref{tbl:gurobi_min_40} shows the results obtained by the filter solver for MIN/40.

776

670

\end{subfigure}

Table~\ref{tbl:gurobi_min_60} shows the results obtained by the filter solver for MIN/60.

777

671

\caption{Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing

Table~\ref{tbl:gurobi_min_80} shows the results obtained by the filter solver for MIN/80.

778

672

rejection for a given resource allocation.

Table~\ref{tbl:gurobi_min_100} shows the results obtained by the filter solver for MIN/100.

779

673

The filter shape constraint (bandpass and bandstop) is shown as thick

780

674

horizontal lines on each chart.}

\renewcommand{\arraystretch}{1.4}

781

675

\end{figure}

782

676

\begin{table}[h!tb]

783

677

In all cases, we observe that the actual rejection is close to the rejection computed by the solver.

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40}

784

678

\label{tbl:gurobi_min_40}

785

679

We compare the actual silicon resources given by Vivado to the

\centering

786

680

resources in arbitrary units.

{\scalefont{0.77}

787

681

The goal is to check that our arbitrary units of silicon area model well enough

\begin{tabular}{|c|ccccc|c|c|}

788

682

the real resources on the FPGA. In particular, we want to verify that, for a given

\hline

789

683

number of arbitrary units, the actual silicon resources do not depend on the

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

790

684

number of stages $n$. Most significantly, our approach aims

\hline

791

685

at remaining far enough from the practical logic gate implementation used by

1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\

792

686

various vendors to remain platform independent and be portable from one

2 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\

793

687

architecture to another.

3 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\

794

688

4 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\

795

689

Table~\ref{tbl:resources_usage} shows the resources usage in the case of MAX/500, MAX/1000 and

5 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\

796

690

MAX/1500 \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000

\hline

797

691

and 1500 arbitrary units. We have taken care to extract solely the resources used by

\end{tabular}

798

692

the FIR filters and remove additional processing blocks including FIFO and Programmable

}

799

693

Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication.

\end{table}

800

694

801

695

\begin{table}[h!tb]

802

696

\caption{Resource occupation following synthesis of the solutions found for

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60}

803

697

the problem of maximizing rejection for a given resource allocation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}

\label{tbl:gurobi_min_60}

804

698

\label{tbl:resources_usage}

\centering

805

{\scalefont{0.77}

806

699

\centering

\begin{tabular}{|c|ccccc|c|c|}

807

700

\begin{tabular}{|c|c|ccc|c|}

\hline

808

701

\hline

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

809

702

$n$ & & MAX/500 & MAX/1000 & MAX/1500 & \emph{Zynq 7010} \\ \hline\hline

\hline

810

703

& LUT & 249 & 453 & 627 & \emph{17600} \\

1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\

811

704

1 & BRAM & 1 & 1 & 1 & \emph{120} \\

2 & (15, 6, 16) & (23, 9, 0) & - & - & - & 60~dB & 675 \\

812

705

& DSP & 21 & 37 & 47 & \emph{80} \\ \hline

3 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\

813

706

& LUT & 2253 & 474 & 691 & \emph{17600} \\

4 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\

814

707

2 & BRAM & 2 & 2 & 2 & \emph{120} \\

5 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\

815

708

& DSP & 0 & 50 & 70 & \emph{80} \\ \hline

\hline

816

709

& LUT & 1329 & 2006 & 3158 & \emph{17600} \\

\end{tabular}

817

710

3 & BRAM & 3 & 3 & 3 & \emph{120} \\

}

818

711

& DSP & 15 & 30 & 42 & \emph{80} \\ \hline

\end{table}

819

712

& LUT & 1329 & 1600 & 2260 & \emph{17600} \\

820

713

4 & BRAM & 3 & 4 & 4 & \emph{120} \\

\begin{table}[h!tb]

821

714

& DSP & 15 & 38 & 49 & \emph{80} \\ \hline

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80}

822

715

& LUT & 1329 & 1600 & 2260 & \emph{17600} \\

\label{tbl:gurobi_min_80}

823

716

5 & BRAM & 3 & 4 & 4 & \emph{120} \\

\centering

824

717

& DSP & 15 & 38 & 49 & \emph{80} \\ \hline

{\scalefont{0.77}

825

718

\end{tabular}

\begin{tabular}{|c|ccccc|c|c|}

826

719

\end{table}

\hline

827

720

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

828

721

In the case $n = 2$ for MAX/500, Vivado replaces the DSPs with Look-Up Tables (LUTs). We assume that,

\hline

829

722

when the filter coefficients are small enough, or when the input size is small

1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\

830

723

enough, Vivado optimizes resource consumption by selecting multiplexers to

2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\

831

724

implement the multiplications instead of a DSP. In this case, it is quite difficult

3 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\

832

725

to compare the whole silicon budget.

4 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\

833

726

5 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\

834

727

However, a rough estimation can be made with a simple equivalence: looking at

\hline

835

728

the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,

\end{tabular}

836

729

we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon

}

837

730

area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs,

\end{table}

838

731

1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond

839

732

to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary

\begin{table}[h!tb]

840

733

unit map well to actual hardware resources. The relatively small differences can probably be explained

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/100}

841

734

by the optimizations done by Vivado based on the detailed map of available processing resources.

\label{tbl:gurobi_min_100}

842

735

\centering

843

736

We now present the computation time needed to solve the quadratic problem.

{\scalefont{0.77}

844

737

For each case, the filter solver software is executed on an Intel(R) Xeon(R) CPU E5606

\begin{tabular}{|c|ccccc|c|c|}

845

738

clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve

\hline

846

739

the quadratic problem. Table~\ref{tbl:area_time} shows the time needed to solve the quadratic

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

847

740

problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units.

\hline

848

741

1 & - & - & - & - & - & - & - \\

849

742

\begin{table}[h!tb]

2 & (27, 9, 15) & (35, 11, 0) & - & - & - & 100~dB & 1410 \\

850

743

\caption{Time needed to solve the quadratic program with Gurobi}

3 & (3, 5, 18) & (35, 11, 1) & (27, 9, 0) & - & - & 100~dB & 1147 \\

851

744

\label{tbl:area_time}

4 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\

852

745

\centering

5 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\

853

\hline

854

746

\begin{tabular}{|c|c|c|c|}\hline

\end{tabular}

855

747

$n$ & Time (MAX/500) & Time (MAX/1000) & Time (MAX/1500) \\\hline\hline

}

856

748

1 & 0.01~s & 0.02~s & 0.03~s \\

\end{table}

857

749

2 & 0.1~s & 1~s & 2~s \\

\renewcommand{\arraystretch}{1}

858

750

3 & 5~s & 27~s & 351~s ($\approx$ 6~min) \\

859

751

4 & 4~s & 141~s ($\approx$ 3~min) & 1134~s ($\approx$ 18~min) \\

From these tables, we can first state that almost all configurations reach the targeted rejection

860

752

5 & 6~s & 630~s ($\approx$ 10~min) & 49400~s ($\approx$ 13~h) \\\hline

level or even better thanks to our underestimate of the cascade rejection as the sum of the

861

753

\end{tabular}

individual filter rejections. The only exception is for the monolithic case ($n = 1$) in

862

754

\end{table}

MIN/100: no solution is found for a single monolithic filter to reach a 100~dB rejection.

863

755

Furthermore, the area of the monolithic filter is twice as big as the two cascaded filters

864

756

As expected, the computation time seems to rise exponentially with the number of stages.

(675 and 1131 arbitrary units vs. 990 and 1760 arbitrary units for 60 and 80~dB rejection

865

757

When the area is limited, the design exploration space is more limited and the solver is able to

respectively). More generally, the more filters are cascaded, the lower the occupied area.

866

758

find an optimal solution faster.

867

759

We also notice that the solution with $n$ greater than the optimal value

As in the previous section, the solver always chooses a small filter as the first

868

760

takes more time to be found than the optimal one. This can be explained since the search space is

filter stage and the second one is often the biggest filter. This choice can be explained

869

761

larger and we need more time to ensure that the previous solution (from the

as in the previous section, with the solver using just enough bits not to degrade the input

870

762

smaller value of $n$) still remains the optimal solution.

signal and in the second filter selecting a better filter to improve rejection without

871

763

having too many bits in the output data.
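
As a rough check of the growth of the solving time, and assuming a purely geometric progression with the number of stages (an assumption made here for illustration only, the measured times being irregular), the MAX/1500 column of Table~\ref{tbl:area_time} gives an average multiplication factor per additional stage of
\begin{equation*}
\left(\frac{49400\,\mathrm{s}}{0.03\,\mathrm{s}}\right)^{1/4} \approx 36,
\end{equation*}
whereas the same estimate on the MAX/500 column yields $(6/0.01)^{1/4} \approx 5$.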

872

764

\subsection{Minimizing resource occupation at fixed rejection}

765

\label{sec:fixed_rej}

873

766

For each case, we found an optimal solution with $n < 5$: for MIN/40 $n=2$,

874

767

This section presents the results of the complementary quadratic program aimed at

for MIN/60 and MIN/80 $n = 3$ and for MIN/100 $n = 4$. In all cases, the solutions

875

768

minimizing the area occupation for a targeted rejection level.

when $n$ is greater than this optimal $n$ remain identical to the optimal one.
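
The area saving obtained by splitting the monolithic filter into two stages can be checked directly from the areas listed in Tables~\ref{tbl:gurobi_min_40}, \ref{tbl:gurobi_min_60} and \ref{tbl:gurobi_min_80}. The ratio $A_1/A_2$, the area of the $n = 1$ solution divided by the area of the $n = 2$ solution, is a notation introduced here for illustration only:
\begin{align*}
\text{MIN/40:}\quad & A_1/A_2 = 648/360 = 1.80,\\
\text{MIN/60:}\quad & A_1/A_2 = 1131/675 \approx 1.68,\\
\text{MIN/80:}\quad & A_1/A_2 = 1760/990 \approx 1.78.
\end{align*}
The monolithic filter is thus between 1.7 and 1.8 times larger than its two-stage counterpart.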

876

769

877

770

The experimental setup is composed of four cases. The raw input is the same

The following graphs present the rejection for real data on the FPGA. In all the following

878

771

as in the previous section, from a PRN generator, which fixes the input data size $\Pi^I$.

figures, the solid line represents the actual rejection of the filtered

879

772

Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60, 80 or 100~dB.

data on the FPGA as measured experimentally and the dashed line is the noise level

880

773

Hence, the four cases have been named: MIN/40, MIN/60, MIN/80 and MIN/100.

given by the quadratic solver.

881

774

The number of configurations $p$ is the same as in the previous section.

882

775

Figure~\ref{fig:min_40} shows the rejection of the different configurations in the case of MIN/40.

883

776

Table~\ref{tbl:gurobi_min_40} shows the results obtained by the filter solver for MIN/40.

Figure~\ref{fig:min_60} shows the rejection of the different configurations in the case of MIN/60.

884

777

Table~\ref{tbl:gurobi_min_60} shows the results obtained by the filter solver for MIN/60.

Figure~\ref{fig:min_80} shows the rejection of the different configurations in the case of MIN/80.

885

778

Table~\ref{tbl:gurobi_min_80} shows the results obtained by the filter solver for MIN/80.

Figure~\ref{fig:min_100} shows the rejection of the different configurations in the case of MIN/100.

886

779

Table~\ref{tbl:gurobi_min_100} shows the results obtained by the filter solver for MIN/100.

887

780

\begin{figure}

888

781

\renewcommand{\arraystretch}{1.4}

\centering

889

782

\begin{subfigure}{\linewidth}

890

783

\begin{table}[h!tb]

\includegraphics[width=.91\linewidth]{images/min_40}

891

784

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40}

\caption{Filter transfer functions for varying number of cascaded filters solving

892

785

\label{tbl:gurobi_min_40}

the MIN/40 problem of minimizing resource allocation for reaching a 40~dB rejection.}

893

786

\centering

\label{fig:min_40}

894

787

{\scalefont{0.77}

\end{subfigure}

895

788

\begin{tabular}{|c|ccccc|c|c|}

896

789

\hline

\begin{subfigure}{\linewidth}

897

790

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

\includegraphics[width=.91\linewidth]{images/min_60}

898

791

\hline

\caption{Filter transfer functions for varying number of cascaded filters solving

899

792

1 & (27, 8, 0) & - & - & - & - & 41~dB & 648 \\

the MIN/60 problem of minimizing resource allocation for reaching a 60~dB rejection.}

900

793

2 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\

\label{fig:min_60}

901

794

3 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\

\end{subfigure}

902

795

4 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\

903

796

5 & (3, 5, 18) & (27, 8, 0) & - & - & - & 42~dB & 360 \\

\begin{subfigure}{\linewidth}

904

797

\hline

\includegraphics[width=.91\linewidth]{images/min_80}

905

798

\end{tabular}

\caption{Filter transfer functions for varying number of cascaded filters solving

906

799

}

the MIN/80 problem of minimizing resource allocation for reaching an 80~dB rejection.}

907

800

\end{table}

\label{fig:min_80}

908

801

\end{subfigure}

909

802

\begin{table}[h!tb]

910

803

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60}

\begin{subfigure}{\linewidth}

911

804

\label{tbl:gurobi_min_60}

\includegraphics[width=.91\linewidth]{images/min_100}

912

805

\centering

\caption{Filter transfer functions for varying number of cascaded filters solving

913

806

{\scalefont{0.77}

the MIN/100 problem of minimizing resource allocation for reaching a 100~dB rejection.}

914

807

\begin{tabular}{|c|ccccc|c|c|}

\label{fig:min_100}

915

808

\hline

\end{subfigure}

916

809

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

\caption{Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a

917

810

\hline

given rejection while minimizing resource allocation. The filter shape constraint (bandpass and

918

811

1 & (39, 13, 0) & - & - & - & - & 60~dB & 1131 \\

bandstop) is shown as thick

919

812

2 & (15, 6, 16) & (23, 9, 0) & - & - & - & 60~dB & 675 \\

horizontal lines on each chart.}

920

813

3 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\

\end{figure}

921

814

4 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\

922

815

5 & (3, 5, 18) & (15, 6, 2) & (23, 8, 0) & - & - & 60~dB & 543 \\

We observe that all rejections given by the quadratic solver are close to the experimentally

923

816

\hline

measured rejection. All curves prove that the constraint to reach the target rejection is

924

817

\end{tabular}

respected with both monolithic (except in MIN/100, which has no monolithic solution) and cascaded filters.

925

818

}

926

819

\end{table}

Table~\ref{tbl:resources_usage_comp} shows the resource usage in the case of MIN/40, MIN/60,

927

820

MIN/80 and MIN/100 \emph{i.e.} when the target rejection is fixed to 40, 60, 80 and 100~dB. We

928

821

\begin{table}[h!tb]

have taken care to extract solely the resources used by

929

822

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80}

the FIR filters and remove additional processing blocks including FIFO and PL to

930

823

\label{tbl:gurobi_min_80}

PS communication.

931

824

\centering

932

825

{\scalefont{0.77}

\renewcommand{\arraystretch}{1.2}

933

826

\begin{tabular}{|c|ccccc|c|c|}

\begin{table}

934

827

\hline

\caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}

935

828

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

\label{tbl:resources_usage_comp}

936

829

\hline

\centering

937

830

1 & (55, 16, 0) & - & - & - & - & 81~dB & 1760 \\

{\scalefont{0.90}

938

831

2 & (15, 8, 17) & (35, 11, 0) & - & - & - & 80~dB & 990 \\

\begin{tabular}{|c|c|cccc|c|}

939

832

3 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\

\hline

940

833

4 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\

$n$ & & MIN/40 & MIN/60 & MIN/80 & MIN/100 & \emph{Zynq 7010} \\ \hline\hline

941

834

5 & (3, 7, 20) & (31, 9, 1) & (19, 7, 0) & - & - & 80~dB & 783 \\

& LUT & 343 & 334 & 772 & - & \emph{17600} \\

942

835

\hline

1 & BRAM & 1 & 1 & 1 & - & \emph{120} \\

943

836

\end{tabular}

& DSP & 27 & 39 & 55 & - & \emph{80} \\ \hline

944

837

}

& LUT & 1664 & 2329 & 474 & 620 & \emph{17600} \\

945

838

\end{table}

2 & BRAM & 2 & 2 & 2 & 2 & \emph{120} \\

946

839

& DSP & 0 & 15 & 50 & 62 & \emph{80} \\ \hline

947

840

\begin{table}[h!tb]

& LUT & 1664 & 3114 & 1884 & 2873 & \emph{17600} \\

948

841

\caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/100}

3 & BRAM & 2 & 3 & 3 & 3 & \emph{120} \\

949

842

\label{tbl:gurobi_min_100}

& DSP & 0 & 0 & 22 & 27 & \emph{80} \\ \hline

950

843

\centering

& LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\

951

844

{\scalefont{0.77}

4 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\

952

845

\begin{tabular}{|c|ccccc|c|c|}

& DSP & 0 & 15 & 19 & 19 & \emph{80} \\ \hline

953

846

\hline

& LUT & 1664 & 3114 & 2570 & 4318 & \emph{17600} \\

954

847

$n$ & $i = 1$ & $i = 2$ & $i = 3$ & $i = 4$ & $i = 5$ & Rejection & Area \\

5 & BRAM & 2 & 3 & 4 & 4 & \emph{120} \\

955

848

\hline

& DSP & 0 & 0 & 19 & 19 & \emph{80} \\ \hline

956

849

1 & - & - & - & - & - & - & - \\

\end{tabular}

957

850

2 & (27, 9, 15) & (35, 11, 0) & - & - & - & 100~dB & 1410 \\

}

958

851

3 & (3, 5, 18) & (35, 11, 1) & (27, 9, 0) & - & - & 100~dB & 1147 \\

\end{table}

959

852

4 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\

\renewcommand{\arraystretch}{1}

960

853

5 & (3, 5, 18) & (15, 6, 2) & (27, 9, 0) & (19, 7, 0) & - & 100~dB & 1067 \\

961

854

\hline

If we keep the previous estimate of the cost of one DSP in terms of LUTs (1 DSP $\approx$ 100 LUTs),

962

855

\end{tabular}

the real resource consumption decreases as a function of the number of stages in the cascaded

963

856

}

filter according

964

857

\end{table}

to the solution given by the quadratic solver. Indeed, we always have a decreasing

965

858

\renewcommand{\arraystretch}{1}

consumption even if the difference between the monolithic and the two cascaded

966

859

filters is less than expected.
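
As an illustration, this rough equivalence can be applied to the MIN/60 column of Table~\ref{tbl:resources_usage_comp}, with $A_{\mathrm{eq}} = N_{\mathrm{LUT}} + 100\,N_{\mathrm{DSP}}$ a notation introduced here only for this estimate:
\begin{align*}
n = 1:\quad & A_{\mathrm{eq}} \approx 334 + 100 \times 39 = 4234~\text{LUTs},\\
n = 2:\quad & A_{\mathrm{eq}} \approx 2329 + 100 \times 15 = 3829~\text{LUTs},\\
n = 3:\quad & A_{\mathrm{eq}} \approx 3114 + 100 \times 0 = 3114~\text{LUTs}.
\end{align*}
The equivalent area decreases from $n = 1$ to $n = 3$, but the gain between one and two stages (about 10\%) is smaller than the 40\% reduction of the solver areas (from 1131 to 675 arbitrary units).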

967

860

From these tables, we can first state that almost all configurations reach the targeted rejection

968

861

level or even better thanks to our underestimate of the cascade rejection as the sum of the

Finally, Table~\ref{tbl:area_time_comp} shows the computation time to solve

969

862

individual filter rejections. The only exception is for the monolithic case ($n = 1$) in

the quadratic program.

970

863

MIN/100: no solution is found for a single monolithic filter to reach a 100~dB rejection.

971

864

Furthermore, the area of the monolithic filter is twice as big as the two cascaded filters

\renewcommand{\arraystretch}{1.2}

972

865

(675 and 1131 arbitrary units vs. 990 and 1760 arbitrary units for 60 and 80~dB rejection

\begin{table}[h!tb]

973

866

respectively). More generally, the more filters are cascaded, the lower the occupied area.

\caption{Time to solve the quadratic program with Gurobi}

974

867

\label{tbl:area_time_comp}

975

868

As in the previous section, the solver always chooses a small filter as the first

\centering

976

869

filter stage and the second one is often the biggest filter. This choice can be explained

{\scalefont{0.90}

977

870

as in the previous section, with the solver using just enough bits not to degrade the input

\begin{tabular}{|c|c|c|c|c|}\hline

978

871

signal and in the second filter selecting a better filter to improve rejection without

$n$ & Time (MIN/40) & Time (MIN/60) & Time (MIN/80) & Time (MIN/100) \\\hline\hline

979

872

having too many bits in the output data.

1 & 0.04~s & 0.01~s & 0.01~s & - \\

980

873

2 & 2.7~s & 2.4~s & 2.4~s & 0.8~s \\

981

874

For each case, we found an optimal solution with $n < 5$: for MIN/40 $n=2$,

3 & 4.6~s & 7~s & 7~s & 18~s \\

982

875

for MIN/60 and MIN/80 $n = 3$ and for MIN/100 $n = 4$. In all cases, the solutions

4 & 3~s & 22~s & 70~s & 220~s ($\approx$ 3~min) \\

983

876

when $n$ is greater than this optimal $n$ remain identical to the optimal one.

5 & 5~s & 122~s & 200~s & 384~s ($\approx$ 5~min) \\\hline

984

\end{tabular}

985

}

986

\end{table}

987

877

\renewcommand{\arraystretch}{1}

988

878

The following graphs present the rejection for real data on the FPGA. In all the following

989

879

figures, the solid line represents the actual rejection of the filtered

The time needed to solve this configuration is significantly shorter than the time

990

880

data on the FPGA as measured experimentally and the dashed line is the noise level

needed in the previous section. Indeed, the worst time in this case is only 5~minutes,

991

881

given by the quadratic solver.

compared to 13~hours in the previous section: this problem is more easily solved than the

992

882

previous one.

993

883

Figure~\ref{fig:min_40} shows the rejection of the different configurations in the case of MIN/40.

994

884

Figure~\ref{fig:min_60} shows the rejection of the different configurations in the case of MIN/60.

To conclude, we compare our monolithic filters with the FIR Compiler provided by

995

885

Figure~\ref{fig:min_80} shows the rejection of the different configurations in the case of MIN/80.

Xilinx in the Vivado software suite (v.2018.2). For each experiment we use the

996

886

Figure~\ref{fig:min_100} shows the rejection of the different configurations in the case of MIN/100.

same coefficient set and we compare the resource consumption, having checked that

997

887

the transfer functions are indeed the same with both implementations.

998

Table~\ref{tbl:xilinx_resources} exhibits the results.

999

The FIR Compiler never uses BRAM while our filter implementation uses one block. This difference

1000

is explained by our wish to have a dynamically reconfigurable FIR filter whose

1001

coefficients can be updated from the processing system without having to update the FPGA design.

1002

With the FIR compiler, the coefficients are defined during the FPGA design so that

1003

changing coefficients requires generating a new design. The difference in LUT consumption

1004

is also attributed to the reconfigurability logic. However, the DSP consumption, the scarcest

1005

resource, is the same between the Xilinx FIR Compiler and

1006

our FIR block: we hence conclude that our solutions are as good as the Xilinx implementation.
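
The LUT overhead of the reconfigurable implementation can be quantified from Table~\ref{tbl:xilinx_resources} as the ratio of the LUT count of our FIR block to that of the FIR Compiler (a simple ratio computed here for illustration):
\begin{align*}
\text{MAX/500:}\quad & 249/177 \approx 1.41,\\
\text{MAX/1000:}\quad & 453/306 \approx 1.48,\\
\text{MAX/1500:}\quad & 627/418 = 1.50,
\end{align*}
i.e. 40 to 50\% more LUTs for an identical number of DSPs.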

1007

1008

\renewcommand{\arraystretch}{1.2}

1009

\begin{table}

1010

\centering

1011

\caption{Resource consumption compared between the FIR Compiler from Xilinx and our FIR block}

1012

\label{tbl:xilinx_resources}

1013

\begin{tabular}{|c|c|c|c|c|c|c|}

1014

\hline

1015

\multirow{2}{*}{} & \multicolumn{3}{c|}{Xilinx} & \multicolumn{3}{c|}{Our FIR block} \\ \cline{2-7}

1016

& LUT & BRAM & DSP & LUT & BRAM & DSP \\ \hline

1017

MAX/500 & 177 & 0 & 21 & 249 & 1 & 21 \\ \hline

1018

MAX/1000 & 306 & 0 & 37 & 453 & 1 & 37 \\ \hline

1019

MAX/1500 & 418 & 0 & 47 & 627 & 1 & 47 \\ \hline

1020
