jfriedt / IFCS2018 article

1

-% merge "maximize rejection at a given area" vs. "minimize area at a given rejection"
-% demonstrate how quantization pushes noise toward high frequencies => 6 dB of
-% rejection per bit, and a loss if fewer bits than rejection/6
-% develop the linear program including the bit shift
-% stress that previously the design was synthesizable but not implementable, whereas now we
-% implement it and demonstrate that it runs
-% gwen: why is the FIR now implementable when it was not, even on the zedboard -> new FIR?
-% Gwen: can we build a real phase noise test bench with this FIR, i.e. add ADC, NCO and mixer
-% (zedboard or redpit)

10

-

11

-\documentclass[a4paper,transaction]{IEEEtran/IEEEtran}

12

-\usepackage{graphicx,color,hyperref}

13

-\usepackage{amsfonts}

14

-\usepackage{amsthm}

15

-\usepackage{amssymb}

16

-\usepackage{amsmath}

17

-\usepackage{algorithm2e}

18

-\usepackage{url,balance}

19

-\usepackage[normalem]{ulem}

20

-% correct bad hyphenation here

21

-\hyphenation{op-tical net-works semi-conduc-tor}

22

-\textheight=26cm

23

-\setlength{\footskip}{30pt}

24

-\pagenumbering{gobble}

25

-\begin{document}

26

-\title{Filter optimization for real time digital processing of radiofrequency signals: application

27

-to oscillator metrology}

28

-

29

-\author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2},

30

-G. Goavec-M\'erou\IEEEauthorrefmark{1},

31

-P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}}

32

-\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France }

33

-\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\

34

-Email: \{pyb2,jmfriedt\}@femto-st.fr}

35

-}

36

-\maketitle

37

-\thispagestyle{plain}

38

-\pagestyle{plain}

39

-\newtheorem{definition}{Definition}

40

-

41

-\begin{abstract}

42

-Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to

43

-radiofrequency signal processing. Applied to oscillator characterization in the context

44

-of ultrastable clocks, stringent filtering requirements are defined by spurious signal or

45

-noise rejection needs. Since real time radiofrequency processing must be performed in a

46

-Field Programmable Gate Array to meet timing constraints, we investigate optimization strategies

47

-to design filters meeting rejection characteristics while limiting the hardware resources

48

-required and keeping timing constraints within the targeted measurement bandwidths.

49

-\end{abstract}

50

-

51

-\begin{IEEEkeywords}

52

-Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter

53

-\end{IEEEkeywords}

54

-

55

-\section{Digital signal processing of ultrastable clock signals}

56

-

57

-Analog oscillator phase noise characterization is classically performed by downconverting

58

-the radiofrequency signal using a saturated mixer to bring the radiofrequency signal to baseband,

59

-followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to carrier. In

60

-a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by

61

-multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}.

62

-

63

-\begin{figure}[h!tb]

64

-\begin{center}

65

-\includegraphics[width=.8\linewidth]{schema}

66

-\end{center}

67

-\caption{Fully digital oscillator phase noise characterization: the Device Under Test

68

-(DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and

69

-downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals

70

-and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite

71

-Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays

72

-the spectral characteristics of the phase fluctuations.}

73

-% JMF: justify the FIR cascade

74

-\label{schema}

75

-\end{figure}

76

-

77

-As with the analog mixer,

78

-the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as

79

-well as the generation of the frequency sum signal in addition to the frequency difference.

80

-These unwanted spectral characteristics must be rejected before decimating the data stream

81

-for the phase noise spectral characterization \cite{andrich2018high}. The characteristics introduced between the

82

-downconverter

83

-and the decimation processing blocks are core characteristics of an oscillator characterization

84

-system, and must reject out-of-band signals below the targeted phase noise -- typically in the

85

-sub -170~dBc/Hz range for the ultrastable oscillators we aim at characterizing. The filter blocks will

86

-use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency

87

-datastream: optimizing the performance of the filter while reducing the needed resources is

88

-hence tackled in a systematic approach using optimization techniques. Most significantly, we

89

-tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with

90

-tunable number of coefficients and tunable number of bits representing the coefficients and the

91

-data being processed.

92

-

93

-\section{Finite impulse response filter}

94

-

95

-We select FIR filters for their unconditional stability and ease of design. An FIR filter is defined

96

-by a set of weights $b_k$ applied to the inputs $x_k$ through a convolution to generate the

97

-outputs $y_k$

98

-$$y_n=\sum_{k=0}^N b_k x_{n-k}$$
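The convolution above can be sketched in a few lines of Python. This is only an illustrative software model, not the VHDL implementation discussed in this paper; the function name \texttt{fir\_filter} is hypothetical, and samples before the start of the record are assumed to be zero.

```python
def fir_filter(taps, samples):
    """Direct-form FIR: y[n] = sum_k b[k] * x[n-k], assuming x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, b in enumerate(taps):
            if n - k >= 0:  # skip samples that precede the record
                acc += b * samples[n - k]
        out.append(acc)
    return out


# Feeding a unit impulse returns the coefficients themselves (the impulse response).
impulse_response = fir_filter([1, 2, 3], [1, 0, 0, 0])
```

With integer taps and integer samples the accumulator stays an integer, which mirrors the fixed-point arithmetic targeted on the FPGA.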

99

-

100

-As opposed to an implementation on a general purpose processor in which word size is defined by the

101

-processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since

102

-not only the coefficient values and number of taps must be defined, but also the number of bits

103

-defining the coefficients and the sample size. For this reason, and because we consider pipeline

104

-processing (as opposed to First-In, First-Out FIFO memory batch processing) of radiofrequency

105

-signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but

106

-the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language (VHDL) level.

107

-Since latency is not an issue in an open-loop phase noise characterization instrument, the large

108

-number of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter,

109

-is not considered an issue, as it would be in a closed-loop system.

110

-

111

-The coefficients are classically expressed as floating point values. However, this binary

112

-number representation is not efficient for fast arithmetic computation by an FPGA. Instead,

113

-we choose to quantize these floating point values into integer values. This quantization

114

-will result in some precision loss.

115

-

116

-%As illustrated in Fig. \ref{float_vs_int}, we see that we aren't

117

-%need too coefficients or too sample size. If we have lot of coefficients but a small sample size,

118

-%the first and last are equal to zero. But if we have too sample size for few coefficients that not improve the quality.

119

-

120

-% JMF: I understand neither the last sentence above nor the figure below
-% AH: basically I meant that taking too few bits with too many coefficients produces your figure (much better made than mine)
-% and that conversely, too many bits for too few coefficients gains nothing; I will try to rephrase it

123

-

124

-%\begin{figure}[h!tb]

125

-%\includegraphics[width=\linewidth]{images/float-vs-integer.pdf}

126

-%\caption{Impact of the quantization resolution of the coefficients}

127

-%\label{float_vs_int}

128

-%\end{figure}

129

-

130

-\begin{figure}[h!tb]

131

-\includegraphics[width=\linewidth]{images/demo_filtre}

132

-\caption{Impact of the quantization resolution of the coefficients: the quantization is

133

-set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting

134

-the 30~first and 30~last coefficients out of the initial 128~band-pass

135

-filter coefficients to 0 (red dots).}

136

-\label{float_vs_int}

137

-\end{figure}

138

-

139

-The tradeoff between quantization resolution and number of coefficients when considering

140

-integer operations is not trivial. As an illustration of the issue related to the

141

-relation between number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits

142

-a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon

143

-quantization on 6~bit integers, 60 of the 128~coefficients in the beginning and end of the

144

-taps become null, making the large number of coefficients irrelevant and making it possible to save

145

-processing resource by shrinking the filter length. This tradeoff aimed at minimizing resources

146

-to reach a given rejection level, or maximizing out of band rejection for a given computational

147

-resource, will drive the investigation on cascading filters designed with varying tap resolution

148

-and tap length, as will be shown in the next section. Indeed, our development strategy closely

149

-follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards}

150

-in which basic blocks are defined and characterized before being assembled \cite{hide}

151

-in a complete processing chain. In our case, assembling the filter blocks is a simpler block

152

-combination process since we assume a single value to be processed and a single value to be

153

-generated at each clock cycle. The FIR filters will not be considered to decimate in the

154

-current implementation: the decimation is assumed to be located after the FIR cascade at the

155

-moment.
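The effect of coarse quantization on a long filter can be reproduced with a short Python sketch. It is a hedged illustration only: it assumes a Hamming-windowed sinc low-pass design (the figure uses a band-pass filter whose design method is not given here), and the helper names \texttt{hamming\_lowpass}, \texttt{quantize} and \texttt{trim\_zero\_taps} are hypothetical.

```python
import math


def hamming_lowpass(num_taps, cutoff):
    """Windowed-sinc low-pass taps; cutoff in cycles per sample (0.5 = Nyquist)."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = 2.0 * cutoff * (n - mid)
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(2.0 * cutoff * sinc * window)
    return taps


def quantize(taps, bits):
    """Scale the largest tap to the signed full scale and round to integers."""
    full_scale = 2 ** (bits - 1) - 1
    peak = max(abs(t) for t in taps)
    return [round(t / peak * full_scale) for t in taps]


def trim_zero_taps(q):
    """Drop the leading and trailing taps that quantization rounded to zero."""
    first = next(i for i, v in enumerate(q) if v != 0)
    last = len(q) - next(i for i, v in enumerate(reversed(q)) if v != 0)
    return q[first:last]


taps = hamming_lowpass(128, 0.25)   # cutoff at half the Nyquist frequency
q = quantize(taps, 6)               # 6-bit signed coefficients
trimmed = trim_zero_taps(q)         # shorter filter with the same integer taps
```

Comparing \texttt{len(q)} with \texttt{len(trimmed)} shows how many of the 128 taps carry no information once quantized, which is the resource saving discussed above.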

156

-

157

-\section{Filter optimization}

158

-

159

-A basic approach for implementing the FIR filter is to compute the transfer function of

160

-a monolithic filter: this single filter defines all coefficients with the same resolution

161

-(number of bits) and processes data represented with their own resolution. Meeting the

162

-filter shape requires a large number of coefficients, limited by resources of the FPGA since

163

-this filter must process data stream at the radiofrequency sampling rate after the mixer.

164

-

165

-An optimization problem \cite{leung2004handbook} aims at improving one or many

166

-performance criteria within a constrained resource environment. Amongst the tools

167

-developed to meet this aim, Mixed-Integer Linear Programming (MILP) provides the framework to

168

-formally define the stated problem and search for an optimal use of available

169

-resources \cite{yu2007design, kodek1980design}.

170

-

171

-First we need to ensure that our problem is a real optimization problem. When

172

-designing a processing function in the FPGA, we aim at meeting some requirement such as

173

-the throughput, the computation time or the noise rejection. However, due to limited

174

-resources to design the process like BRAM (high performance RAM), DSP (Digital Signal Processor)

175

-or LUT (Look Up Table), a tradeoff must be generally searched between performance and available

176

-computational resources: optimizing some criteria within finite, limited

177

-resources indeed matches the definition of a classical optimization problem.

178

-

179

-Specifically the degrees of freedom when addressing the problem of replacing the single monolithic

180

-FIR with a cascade of optimized filters are the number of coefficients $N_i$ of each filter $i$,

181

-the number of bits $C_i$ representing the coefficients and the number of bits $D_i$ needed to represent

182

-the data $x_k$ fed to each filter as provided by the acquisition or previous processing stage.

183

-Because each FIR in the chain is fed the output of the previous stage,

184

-the optimization of the complete processing chain within a constrained resource environment is not

185

-trivial. The resource occupation of a FIR filter is considered as $C_i \times N_i$ which aims

186

-at approximating the number of bits needed in a worst case condition to represent the output of the

187

-FIR. Indeed, the number of bits generated by the $i$th FIR is $C_i+D_i+\log_2(N_i)$, but the

188

-$\log$ function is avoided for its incompatibility with a linear programming description, and

189

-the simple product is approximated as the number of gates needed to perform the calculation. Such an

190

-occupied area estimate assumes that the number of gates scales as the number of bits and the number

191

-of coefficients, but does not account for the detailed implementation of the hardware. Indeed,

192

-various FPGA implementations will provide different hardware functionalities, and we shall consider

193

-at the end of the design a synthesis step using vendor software to assess the validity of the solution

194

-found. As an example of the limitation linked to the lack of detailed hardware consideration, Block Random

195

-Access Memory (BRAM) used to store filter coefficients are not shared amongst filters, and multiplications

196

-are most efficiently implemented by using DSP blocks whose input word

197

-size is finite. DSPs are a scarce resource to be saved in a practical implementation. Keeping a high

198

-abstraction on the resource occupation is nevertheless selected in the following discussion in order

199

-to leave enough degrees of freedom in the problem to try and find original solutions: too many

200

-constraints in the initial statement of the problem leave little room for finding an optimal solution.

201

-

202

-\begin{figure}[h!tb]

203

-\begin{center}

204

-\includegraphics[width=.5\linewidth]{schema2}

205

-\caption{Shape of the filter transmitted power $P$ as a function of frequency:

206

-the passband BP is considered to occupy the initial

207

-40\% of the Nyquist frequency range, the stopband the last 40\%, allowing 20\% transition

208

-width.}

209

-\label{rejection-shape}

210

-\end{center}

211

-\end{figure}

212

-

213

-Following these considerations, the model is expressed as:

214

-\begin{align}

215

- \begin{cases}

216

- \mathcal{R}_i &= \mathcal{F}(N_i, C_i)\\

217

- \mathcal{A}_i &= N_i \times C_i\\

218

- \Delta_i &= \Delta_{i-1} + \mathcal{R}_i

219

- \end{cases}

220

- \label{model-FIR}

221

-\end{align}

222

-In system \ref{model-FIR}, $\mathcal{R}_i$ represents the dependence of the stopband rejection on $N_i$ and $C_i$, $\mathcal{A}_i$

223

-is a theoretical area occupation of the processing block on the FPGA as discussed earlier, and $\Delta_i$ is the total rejection for the current stage $i$.

224

-Since the function $\mathcal{F}$ cannot be explicitly expressed, we run simulations to determine the rejection depending

225

-on $N_i$ and $C_i$. However, selecting the right filter requires a clear definition of the rejection criterion. Selecting an

226

-incorrect criterion will lead the linear program solver to produce a solution which might not meet the user requirements.

227

-Hence, amongst various criteria including the mean or median value of the FIR response in the stopband as will

228

-be illustrated later (section \ref{median}), we have designed

229

-a criterion aimed at avoiding ripples in the passband and considering the maximum of the FIR spectral response in the stopband

230

-(Fig. \ref{rejection-shape}). The passband criterion is defined as the sum of the absolute values of the spectral response

231

-in the passband, reminiscent of a standard deviation of the spectral response: this criterion must be minimized to avoid

232

-ripples in the passband. The stopband transfer function maximum must also be minimized in order to improve the filter

233

-rejection capability. Weighing these two criteria allows designing the linear program to be solved.
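The two criteria can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used to produce the results of this paper: the response is normalized to its DC gain and expressed in dB, the passband criterion is taken as the sum of the absolute dB deviations over the first 40\% of the frequency range, and the stopband criterion as the worst-case (maximum) gain over the last 40\%. The names \texttt{response\_db} and \texttt{criteria} are hypothetical.

```python
import cmath
import math


def response_db(taps, num_points=256):
    """Gain in dB, relative to the DC gain, on a grid from 0 to the Nyquist frequency."""
    dc = abs(sum(taps))
    gains = []
    for m in range(num_points):
        w = math.pi * m / (num_points - 1)
        h = sum(b * cmath.exp(-1j * w * k) for k, b in enumerate(taps))
        # Clamp to avoid log10(0) at exact zeros of the transfer function.
        gains.append(20.0 * math.log10(max(abs(h) / dc, 1e-12)))
    return gains


def criteria(taps, num_points=256):
    """Return (passband ripple, stopband rejection), both in dB."""
    gains = response_db(taps, num_points)
    edge_pass = int(0.4 * num_points)   # passband: first 40% of the range
    edge_stop = int(0.6 * num_points)   # stopband: last 40% of the range
    ripple = sum(abs(g) for g in gains[:edge_pass])
    rejection = -max(gains[edge_stop:])
    return ripple, rejection


ripple, rejection = criteria([0.25, 0.5, 0.25])
```

Using the maximum rather than the mean or median in the stopband means a single poorly rejected frequency is enough to penalize a candidate filter.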

234

-

235

-\begin{figure}[h!tb]

236

-\includegraphics[width=\linewidth]{images/noise-rejection.pdf}

237

-\caption{Rejection as a function of number of coefficients and number of bits.}

238

-\label{noise-rejection}

239

-\end{figure}

240

-

241

-{\bf ARTHUR: regenerate a correct pyramid}

242

-

243

-The objective function maximizes the noise rejection ($\max(\Delta_{i_{\max}})$) while keeping resource

244

-occupation below a user-defined threshold, or as will be discussed here, aims at minimizing the area

245

-needed to reach a given rejection ($\min(S_q)$ in the forthcoming discussion, Eqs. \ref{cstr_size}

246

-and \ref{cstr_rejection}). The MILP solver is allowed to choose the number of successive

247

-filters, within an upper bound. The last problem is to model the noise rejection. Since filter

248

-noise rejection capability is not modeled with linear equations, a look-up-table is generated

249

-for multiple filter configurations in which the $C_i$, $D_i$ and $N_i$ parameters are varied: for each

250

-one of these conditions, the low-pass filter rejection is stored as computed by the frequency response

251

-of the digital filter (Fig. \ref{noise-rejection}). Various rejection criteria have been investigated,

252

-including mean value of the stopband response, median value of the stopband response, or as finally

253

-selected, maximum value in the stopband. An intuitive analysis of the chart of Fig. \ref{noise-rejection}

254

-hints at an optimum

255

-set of tap length and number of bits for representing the coefficients along the line of the pyramidal

256

-shaped rejection capability function.

257

-

258

-Linear program formalism for solving the problem is well documented: an objective function is

259

-defined which is linearly dependent on the parameters to be optimized. Constraints are expressed

260

-as linear equations and solved using one of the available solvers, in our case GLPK\cite{glpk}.

261

-With the notations used in the description of system \ref{model-FIR}, we have defined the linear problem as:

262

-\paragraph{Variables}

263

-\begin{align*}

264

-x_{i,j} \in \lbrace 0,1 \rbrace & \text{ $i$ is a given filter} \\

265

-& \text{ $j$ is the stage} \\

266

-& \text{ If $x_{i,j}$ is equal to 1, the filter is selected} \\

267

-\end{align*}

268

-\paragraph{Constants}

269

-\begin{align*}

270

-\mathcal{F} = \lbrace F_1 ... F_p \rbrace & \text{ All possible filters}\\

271

-& \text{ $p$ is the number of different filters} \\

272

-% N(i) & \text{ % Constant to let the

273

-% number of coefficients %} \\ & \text{

274

-% for filter $i$}\\

275

-% C(i) & \text{ % Constant to let the

276

-% number of bits of %}\\ & \text{

277

-% each coefficient for filter $i$}\\

278

-\mathcal{S}_{\max} & \text{ Total space available inside the FPGA}

279

-\end{align*}

280

-\paragraph{Constraints}

281

-\begin{align}

282

-1 \leq i \leq p & \nonumber\\

283

-1 \leq j \leq q & \text{ $q$ is the maximum number of filter stages} \nonumber \\

284

-\forall j, \mathlarger{\sum_{i}} x_{i,j} \leq 1 & \text{ At most one filter per stage} \nonumber\\

285

-\mathcal{S}_0 = 0 & \text{ initial occupation} \nonumber\\

286

-\forall j, \mathcal{S}_j = \mathcal{S}_{j-1} + \mathlarger{\sum_i (x_{i,j} \times \mathcal{A}_i)} \label{cstr_size} \\

287

-\mathcal{S}_j \leq \mathcal{S}_{\max}\nonumber \\

288

-\mathcal{N}_0 = 0 & \text{ initial rejection}\nonumber\\

289

-\forall j, \mathcal{N}_j = \mathcal{N}_{j-1} + \mathlarger{\sum_i (x_{i,j} \times \mathcal{R}_i)} \label{cstr_rejection} \\

290

-\mathcal{N}_q \geqslant 160 & \text{ a user-defined bound}\nonumber\\

291

-& \text{ (e.g. 160~dB here)}\nonumber\\\nonumber

292

-\end{align}

293

-\paragraph{Goal}

294

-\begin{align*}

295

-\min \mathcal{S}_q

296

-\end{align*}

297

-

298

-Constraint \ref{cstr_size} states that the occupation at stage $j$ is the occupation at the
-previous stage plus the area of the filter selected for stage $j$ (possibly none, in which
-case the occupation is unchanged). Constraint \ref{cstr_rejection} expresses the same
-recurrence for the rejection: the rejection at stage $j$ is the rejection at the previous
-stage plus the rejection of the selected filter.
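The structure of this selection problem can be illustrated on a toy instance. The sketch below is not the GLPK model: it replaces the MILP solver with an exhaustive search, which is only practical for a handful of candidates, and the candidate table (number of taps, coefficient bits, rejection in dB) is invented for the example rather than taken from the simulated look-up table.

```python
from itertools import combinations_with_replacement

# Hypothetical candidates: (taps N_i, coefficient bits C_i, rejection R_i in dB).
CANDIDATES = [(8, 4, 15.0), (16, 8, 40.0), (32, 12, 70.0), (64, 16, 95.0)]


def area(candidate):
    """Area model of the text: number of taps times coefficient bits."""
    taps, bits, _ = candidate
    return taps * bits


def best_cascade(candidates, target_db, max_stages):
    """Minimize the total area subject to a total rejection of at least target_db."""
    best = None
    for stages in range(1, max_stages + 1):
        for combo in combinations_with_replacement(candidates, stages):
            if sum(c[2] for c in combo) < target_db:
                continue  # rejection constraint not met
            total = sum(area(c) for c in combo)
            if best is None or total < best[0]:
                best = (total, combo)
    return best


total_area, cascade = best_cascade(CANDIDATES, target_db=160.0, max_stages=5)
# With this invented table: four (16, 8) filters, total area 512.
```

Since areas and rejections simply add up in this model, the order of the stages does not matter, which is why unordered combinations are sufficient here.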

303

-

304

-\subsection{Low passband ripple and maximum rejection criteria}

305

-

306

-The MILP solver provides a solution to the problem by selecting a series of small FIR with

307

-increasing number of bits representing data and coefficients as well as an increasing number

308

-of coefficients, instead of a single monolithic filter.

309

-

310

-\begin{figure}[h!tb]

311

-% \includegraphics[width=\linewidth]{images/compare-fir.pdf}

312

-\includegraphics[width=\linewidth]{images/fir-mono-vs-fir-series-noise-fixe-jmf-light.pdf}

313

-\caption{Comparison of the rejection capability between a series of FIR and a monolithic FIR

314

-with a cutoff frequency set at half the Nyquist frequency.}

315

-\label{compare-fir}

316

-\end{figure}

317

-

318

-Fig. \ref{compare-fir} exhibits the

319

-performance comparison between one solution and a monolithic FIR when selecting a cutoff

320

-frequency of half the Nyquist frequency: a series of 5 FIR and a series of 10 FIR with the

321

-same space usage are provided as selected by the MILP solver. The FIR cascade provides better

322

-rejection than the monolithic FIR at the expense of a lower cutoff frequency which remains to

323

-be tuned or compensated for.

324

-

325

-

326

-The resource occupation when synthesizing such FIR on a Xilinx FPGA is summarized in Tab. \ref{t1}.

327

-We have considered a set of resources representative of the hardware platform we work on,

328

-Avnet's Zedboard featuring a Xilinx XC7Z020-CLG484-1 Zynq System on Chip (SoC). The results reported in

329

-Tab. \ref{t1} emphasize that implementing the monolithic single FIR is impossible due to

330

-the insufficient hardware resources (exhausted LUT resources), while the FIR cascading 5 or 10

331

-filters fit in the available resources. However, in all cases the DSP resources are fully

332

-used: while the design can be synthesized using Xilinx proprietary Vivado 2016.2 software,

333

-implementing the design fails due to the excessive resource usage preventing routing the signals

334

-on the FPGA. Such results emphasize on the one hand the improvement prospect of the optimization

335

-procedure by finding non-trivial solutions matching resource constraints, but on the other

336

-hand also illustrate the limitation of a model with an abstraction layer that does not account

337

-for the detailed architecture of the hardware.

338

-

339

-\begin{table}[h!tb]

340

-\caption{Resource occupation on a Xilinx Zynq-7000 series FPGA when synthesizing the FIR cascade

341

-identified as optimal by the MILP solver within a finite resource criterion. The last line refers

342

-to available resources on a Zynq-7020 as found on the Zedboard.}

343

-\begin{center}

344

-\begin{tabular}{|c|cccc|}\hline

345

-FIR & BlockRAM & LookUpTables & DSP & rejection (dB)\\\hline\hline

346

-1 (monolithic) & 1 & 76183 & 220 & -162 \\

347

-5 & 5 & 18597 & 220 & -160 \\

348

-10 & 8 & 24729 & 220 & -161 \\\hline\hline

349

-\textbf{Zynq 7020} & \textbf{420} & \textbf{53200} & \textbf{220} & \\\hline

350

-%\begin{tabular}{|c|ccccc|}\hline

351

-%FIR & BRAM36 & BRAM18 & LUT & DSP & rejection (dB)\\\hline\hline

352

-%1 (monolithic) & 1 & 0 & {\color{Red}76183} & 220 & -162 \\

353

-%5 & 0 & 5 & {\color{Green}18597} & 220 & -160 \\

354

-%10 & 0 & 8 & {\color{Green}24729} & 220 & -161 \\\hline\hline

355

-%\textbf{Zynq 7020} & \textbf{140} & \textbf{280} & \textbf{53200} & \textbf{220} & \\\hline

356

-\end{tabular}

357

-\end{center}

358

-%\vspace{-0.7cm}

359

-\label{t1}

360

-\end{table}

361

-

362

-\subsection{Alternate criteria}\label{median}

363

-

364

-Fig. \ref{compare-fir} provides FIR solutions matching well the targeted transfer

365

-function, namely low ripple in the passband defined as the first 40\% of the frequency

366

-range and maximum rejection of 160~dB in the last 40\% stopband. We illustrate now, for

367

-demonstrating the need to properly select the optimization criterion, two cases of poor

368

-filter shapes obtained by selecting the mean value and median value of the rejection,

369

-with no consideration for the ripples in the passband. The results of the optimizations,

370

-in these cases, are shown in Figs. \ref{compare-mean} and \ref{compare-median}.

371

-

372

-\begin{figure}[h!tb]

373

-\includegraphics[width=\linewidth]{images/fir-mono-vs-fir-series-noise-fixe-mean-light.pdf}

374

-\caption{Comparison of the rejection capability between a series of FIR and a monolithic FIR

375

-with a cutoff frequency set at half the Nyquist frequency, when the mean stopband rejection is used as the optimization criterion.}

376

-\label{compare-mean}

377

-\end{figure}

378

-

379

-In the case of the mean value criterion (Fig. \ref{compare-mean}), the solution is not

380

-acceptable since the notch at the end of the transition band compensates for some unacceptable

381

-rise in the rejection close to the Nyquist frequency. Applying such a filter might yield excessive

382

-high frequency spurious components to be aliased at low frequency when decimating the signal.

383

-Similarly, the lack of criterion on the passband shape induces a shape with poor flatness and

384

-a slowly decaying transfer function that starts attenuating spectral components well before the

385

-transition band starts. Such issues are partly alleviated by replacing a mean rejection value with

386

-a median rejection value (Fig. \ref{compare-median}) but solutions remain unacceptable for

387

-the reasons stated previously and much poorer than those found with the maximum rejection criterion

388

-selected earlier (Fig. \ref{compare-fir}).

389

-

390

-\begin{figure}[h!tb]

391

-\includegraphics[width=\linewidth]{images/fir-mono-vs-fir-series-noise-fixe-median-light.pdf}

392

-\caption{Comparison of the rejection capability between a series of FIR and a monolithic FIR

393

-with a cutoff frequency set at half the Nyquist frequency, when the median stopband rejection is used as the optimization criterion.}

394

-\label{compare-median}

395

-\end{figure}

396

-

397

-\section{Filter coefficient selection}

398

-

399

-The coefficients of a single monolithic filter are computed as the impulse response

400

-of the filter transfer function, and practically approximated by a multitude of methods

401

-including least square optimization (Matlab's {\tt firls} function), Hamming or Kaiser windowing

402

-(Matlab's {\tt fir1} function).

403

-

404

-\begin{figure}[h!tb]

405

-\includegraphics[width=\linewidth]{images/fir1-vs-firls}

406

-\caption{Evolution of the rejection capability of least-square optimized filters and Hamming

407

-FIR filters as a function of the number of coefficients, for floating point numbers and 8-bit

408

-encoded integers.}

409

-\label{2}

410

-\end{figure}

411

-

412

-Cascading filters opens a new optimization opportunity by

413

-selecting various coefficient sets depending on the number of coefficients. Fig. \ref{2}

414

-illustrates that for a number of coefficients ranging from 8 to 47, {\tt fir1} provides a better

415

-rejection than {\tt firls}: since the linear solver increases the number of coefficients along

416

-the processing chain, the type of selected filter also changes depending on the number of coefficients

417

-and evolves along the processing chain.

418

-

419

-\section{Conclusion}

420

-

421

-We address the optimization problem of designing a low-pass filter chain in a Field Programmable Gate

422

-Array for improved noise rejection within constrained resource occupation, as needed for

423

-real time processing of radiofrequency signals when characterizing spectral phase noise

424

-characteristics of stable oscillators. The flexibility of the digital approach makes the result

425

-best suited for closing the loop and using the measurement output in a feedback loop for

426

-controlling clocks, e.g. in a quartz-stabilized high performance clock whose long term behavior

427

-is controlled by a non-piezoelectric resonator (sapphire resonator, microwave or optical

428

-atomic transition).

429

-

430

-\section*{Acknowledgement}

431

-

432

-This work is supported by the ANR Programme d'Investissement d'Avenir in

433

-progress at the Time and Frequency Departments of the FEMTO-ST Institute

434

-(Oscillator IMP, First-TF and Refimeve+), and by R\'egion de Franche-Comt\'e.

435

-The authors would like to thank E. Rubiola, F. Vernotte, and G. Cabodevila

436

-for support and fruitful discussions.

437

-

438

-\bibliographystyle{IEEEtran}

439

-\balance

440

-\bibliography{references,biblio}

441

-\end{document}

442

-

443

 \section{Scheduling context}
 In this part, we define terms belonging to the field of scheduling
 and show that the subject addressed here is very close to a scheduling problem. We can
 therefore go beyond the work reviewed previously and attempt scheduling
 and optimization approaches.

448

-

449

 \subsection{Vocabulary definition}
 First of all, we must define what an optimization problem is. Two definitions
 are important. The first is proposed by Legrand and Robert in their book \cite{def1-ordo}:
 \begin{definition}
 \label{def-ordo1}
 A schedule of a task system $G\ =\ (V,\ E,\ w)$ is a function $\sigma$:
 $V \rightarrow \mathbb{N}$ such that $\sigma(u) + w(u) \leq \sigma(v)$ for every edge $(u,\ v) \in E$.
 \end{definition}

457

-

458

 Put more simply, the set $V$ represents the tasks to execute, the set $E$ represents the dependencies
 between tasks, and $w$ gives the execution time of each task. The function $\sigma$ therefore gives the start time of
 each task. The definition states that if a task $v$ depends on a task $u$, then
 the start date of $v$ is greater than or equal to the start of the execution of task $u$ plus its
 execution time.

463

-

464

 Another important definition, proposed by Leung et al. \cite{def2-ordo}, is:
 \begin{definition}
 \label{def-ordo2}
 Scheduling deals with the allocation of scarce resources to activities with
 the objective of optimizing one or more performance criteria.
 \end{definition}

470

-

471

 This definition is more generic, but it is of greater interest to us than definition \ref{def-ordo1}.
 Indeed, the only part of that first definition that matters to us is the respect of task precedence.
 In practice, the start dates are not really of interest to us.

 On the other hand, definition \ref{def-ordo2} will be at the heart of the project. To be convinced of this,
 we must first define which type of scheduling problem we are dealing with and which
 methods can be applied.

478

-

479

 Scheduling problems can be classified into different categories:
 \begin{itemize}
 \item Independent tasks: in this category of problems, the tasks are completely independent
 of one another. This is not the best fit for our case.
 \item Task graph: definition \ref{def-ordo1} describes this category. Most of the time,
 the tasks are represented by a DAG. This category is very close to our case since we must also execute
 tasks that have a number of dependencies. One could even say that in some cases
 we have anti-trees, i.e. a multitude of input tasks converging toward a
 final task.
 \item Workflow: this category is a subcategory of task graphs in the sense that
 it is a task graph repeated many times. This is exactly the type of problem
 we address here.
 \end{itemize}

-
- Of course, this list is not exhaustive, and many other classifications and sub-classifications
- of these problems exist. We have only covered the most common categories here.

-
- Another point to define is the optimization criterion. Here again, there are many
- possible criteria. We describe the main ones:
- \begin{itemize}
- \item Total completion time (makespan): this is one of the most common optimization
- criteria. The goal is to minimize the end time of the last task of the set of
- tasks to execute, i.e. to find the schedule that
- finishes execution as early as possible.
- \item Flowtime: this is the sum of the completion times of all tasks,
- which is to be minimized.
- \item Throughput: this criterion aims to maximize the rate at which data are processed.
- \end{itemize}
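
As an illustration of the first two criteria, the Python sketch below computes the makespan and the flowtime of a given schedule. It assumes the usual convention that the flowtime is the sum of the completion times $\sigma(t) + w(t)$; the function names and the numerical values are hypothetical.

```python
def makespan(w, sigma):
    """End time of the last task to finish."""
    return max(sigma[t] + w[t] for t in w)


def flowtime(w, sigma):
    """Sum of the completion times of all tasks (assumed convention)."""
    return sum(sigma[t] + w[t] for t in w)


# Hypothetical schedule of three tasks; times in arbitrary units.
w = {"a": 2, "b": 3, "c": 1}
sigma = {"a": 0, "b": 2, "c": 5}

# The completion times are 2, 5 and 6.
print(makespan(w, sigma))
print(flowtime(w, sigma))
```

For this schedule the makespan is 6 and the flowtime is 13; a different choice of $\sigma$ can improve one criterion without improving the other.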

-
- In addition, several optimization criteria may be needed at once; this is then a
- multi-criteria optimization. This makes the problem considerably harder, because the best solution for one
- criterion can be very poor for another. In that case, the goal is to find a solution that
- offers the best trade-off between all criteria.

-
- \subsection{Problem formalization}
- \label{formalisation}
- Now that we have introduced the scheduling vocabulary, we can try to characterize
- our problem formally. To do so, we take the constraints stated in Section \ref{def-contraintes}
- and formalize them as precisely as possible.
-
- As stated earlier, a task is a processing block. Each task $i$ has a set of parameters
- that we call $\mathcal{P}_{i}$. This set $\mathcal{P}_i$ is specific to each task and varies from one
- task to another. We will come back later to the parameters that can make up this set.

-
- In addition to this set $\mathcal{P}_i$, every task has a number of common parameters:
- \begin{itemize}
- \item Task duration: as stated earlier, on an FPGA time is counted in clock cycles.
- Moreover, the blocks are always busy; some can even read an input and return a result on every clock cycle.
- The duration of a task therefore cannot be the time between the input of one datum and the output of another. We define the
- duration as the processing time of one datum, i.e. the difference between the time at which a datum leaves the block
- and the time at which it entered. We call this duration $\delta_i$. % I should name it w as in def2
- \item Precision: the precision of a datum is the number of significant bits it contains. Precision can
- change as processing proceeds. We therefore call $\pi_i^-$ the input precision of a task $i$ and $\pi_i^+$ its output precision.
- \item Input (resp. output) stream frequency: this is the frequency at which data arrive (resp. leave).
- Frequencies vary from task to task: some blocks slow the stream down, which is why we distinguish the input
- stream frequency from the output stream frequency. We call the input stream frequency $f_i^-$ and the output stream frequency $f_i^+$.
- \item Input (resp. output) data quantity: this is the quantity of data that the block expects to process (resp.
- is able to produce). Tasks may have to process large volumes of data and output only part of them. Here
- again, we must distinguish input from output. We call the quantity of incoming data $q_i^-$
- and the quantity of outgoing data $q_i^+$ for a task $i$.
- \item Input (resp. output) throughput: this parameter is the data rate that the task is able to process or that it
- delivers at its output. It simply combines the two previous parameters. We define the input throughput of
- task $i$ as $d_i^-\ =\ q_i^-\ *\ f_i^-$ and the output throughput as $d_i^+\ =\ q_i^+\ *\ f_i^+$.
- \item Task size: since the area available in an FPGA is limited, this parameter expresses the area that the task occupies on the chip.
- We call this size $\mathcal{A}_i$.
- \item Predecessors and successors of a task: these give the tasks required before task $i$ can be processed
- as well as the tasks that depend on it. These sets are denoted $\Gamma _i ^-$ and $ \Gamma _i ^+$ \\
- %TODO Is this really a parameter?
- \end{itemize}
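
The common parameters listed above map naturally onto a small record type. The Python sketch below is one possible encoding, given only as an illustration: the class name \texttt{Task}, the field names and the units (bytes per sample, hertz, arbitrary area units) are assumptions, and the task-specific set $\mathcal{P}_i$ is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    delta: int      # duration delta_i, in clock cycles
    pi_in: int      # input precision pi_i^-, in bits
    pi_out: int     # output precision pi_i^+, in bits
    f_in: float     # input stream frequency f_i^-, in Hz (assumed unit)
    f_out: float    # output stream frequency f_i^+, in Hz (assumed unit)
    q_in: float     # input data quantity q_i^-, in bytes per sample (assumed unit)
    q_out: float    # output data quantity q_i^+, in bytes per sample (assumed unit)
    area: float     # size A_i, in arbitrary area units
    preds: set = field(default_factory=set)  # Gamma_i^-
    succs: set = field(default_factory=set)  # Gamma_i^+

    @property
    def d_in(self):
        """Input throughput d_i^- = q_i^- * f_i^-."""
        return self.q_in * self.f_in

    @property
    def d_out(self):
        """Output throughput d_i^+ = q_i^+ * f_i^+."""
        return self.q_out * self.f_out


# Hypothetical pass-through block: 16-bit samples (2 bytes) at 100 MHz.
block = Task(delta=1, pi_in=16, pi_out=16, f_in=100e6, f_out=100e6,
             q_in=2, q_out=2, area=10)
print(block.d_in, block.d_out)
```

Deriving the throughputs from $q$ and $f$ instead of storing them keeps the relation $d = q * f$ true by construction.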

-
- These common parameters are strongly tied to the elements of $\mathcal{P}_i$. Here are some examples of relations
- that we have identified:
- \begin{itemize}
- \item $ \delta _i \ = \ \mathcal{F}_{\delta}(\pi_i^-,\ \pi_i^+,\ d_i^-,\ d_i^+,\ \mathcal{P}_i) $ gives the execution time
- of the task as a function of the desired precision, the throughput and the internal parameters.
- \item $ \pi _i ^+ \ = \ \mathcal{F}_{p}(\pi_i^-,\ \mathcal{P}_i) $: the function $\mathcal{F}_p$ gives the output precision from the input precision
- and the internal parameters of the task.
- \item $d_i^+\ =\ \mathcal{F}_d(d_i^-, \mathcal{P}_i)$: the function $\mathcal{F}_d$ gives the output throughput of the task as a function of the input
- throughput and the internal parameters of the task.
- \item $\mathcal{A}_i\ =\ \mathcal{F}_A(\pi_i^-,\ \pi_i^+,\ d_i^-,\ d_i^+, \mathcal{P}_i)$ gives the size of the task as a function of the same quantities.
- \end{itemize}
- For now, we are not able to give a general definition of these functions. However,
- on a few simple examples (cf. \ref{def-contraintes}), we are able to evaluate them.

-
- Now that we have introduced all the necessary notation, we can state the constraints of our problem. Let
- $G(V,\ E)$ be a DAG; for every edge $(i, j)\ \in\ E$ the following relations hold:
-
- \paragraph{Precision constraint:}
- This inequality expresses the precision constraint from one task to the next:
- \begin{align*}
- \pi _i ^+ \geq \pi _j ^-
- \end{align*}
-
- \paragraph{Throughput constraint:}
- This equation expresses the throughput constraint from one task to the next:
- \begin{align*}
- d _i ^+ = q _j ^- * (f_i + (1 / s_j) ) & \text{ where } s_j \text{ is a positive delay value of the task}
- \end{align*}

-
- \paragraph{Synchronization constraint:}
- This constraint requires that, if at some point in the processing the DAG splits into several parallel branches
- that later merge again, the sum of the latencies along each branch be the same.
- More formally, if there are several disjoint paths from task $s$ to task $f$, then:
- \begin{align*}
- \forall \text{ path } \mathcal{C}_1(s, \ldots, f),\
- \forall \text{ path } \mathcal{C}_2(s, \ldots, f)
- \text{ such that } \mathcal{C}_1 \neq \mathcal{C}_2
- \Rightarrow
- \sum _{i \in \mathcal{C}_1} \delta_i = \sum _{i \in \mathcal{C}_2} \delta_i
- \end{align*}
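
The synchronization constraint can be tested by enumerating the paths between the fork $s$ and the join $f$ and comparing their total latencies. The Python sketch below does this by plain recursion, which is adequate for small DAGs only; the graph, the latencies and the function names are hypothetical.

```python
def all_paths(succ, s, f):
    """List every path from s to f in a DAG given as a successor dictionary."""
    if s == f:
        return [[f]]
    return [[s] + rest for nxt in succ.get(s, []) for rest in all_paths(succ, nxt, f)]


def is_synchronized(succ, delta, s, f):
    """True if all paths from s to f have the same total latency."""
    latencies = {sum(delta[t] for t in path) for path in all_paths(succ, s, f)}
    return len(latencies) <= 1


# Hypothetical fork/join: s -> a -> f and s -> b -> c -> f, latencies in clock cycles.
succ = {"s": ["a", "b"], "a": ["f"], "b": ["c"], "c": ["f"]}
balanced = {"s": 1, "a": 4, "b": 2, "c": 2, "f": 1}
unbalanced = {"s": 1, "a": 3, "b": 2, "c": 2, "f": 1}

print(is_synchronized(succ, balanced, "s", "f"))
print(is_synchronized(succ, unbalanced, "s", "f"))
```

In the unbalanced case the branch through $a$ is one clock cycle shorter than the other branch, so the constraint is violated.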

-
- \paragraph{Area constraint:}
- This inequality expresses the area constraint of the FPGA. The maximum area of the FPGA chip is called $\mathcal{A}_{FPGA}$:
- \begin{align*}
- \sum _{i \in V} \mathcal{A}_i \leq \mathcal{A}_{FPGA}
- \end{align*}
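
The precision and area constraints are simple enough to be verified directly on a candidate design. The Python sketch below is a minimal illustration; the block names, bit widths and areas are invented for the example and carry no particular unit.

```python
def precision_ok(edges, pi_in, pi_out):
    """Check pi_i^+ >= pi_j^- for every edge (i, j)."""
    return all(pi_out[i] >= pi_in[j] for i, j in edges)


def area_ok(area, area_fpga):
    """Check that the sum of the task sizes fits in the FPGA."""
    return sum(area.values()) <= area_fpga


# Hypothetical three-block chain; all values are illustrative.
edges = [("adc", "fir"), ("fir", "shift")]
pi_out = {"adc": 16, "fir": 28, "shift": 16}
pi_in = {"adc": 16, "fir": 16, "shift": 28}
area = {"adc": 100, "fir": 800, "shift": 20}

print(precision_ok(edges, pi_in, pi_out))
print(area_ok(area, area_fpga=1000))
print(area_ok(area, area_fpga=900))
```

The same design fits in a chip of 1000 area units but not in one of 900, since the three blocks add up to 920.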

-
- \subsection{Modeling examples}
- \label{exemples-modeles}
- We now take a few simple processing blocks to illustrate our model.
- For all our examples, we assume an input throughput of 200~MB/s with a precision of 16 bits.
-
- Let us first consider a decimation block. The purpose of this block is to slow down the stream by keeping
- only some of the data, at regular intervals. This interval is called the decimation factor and is denoted $N$.

-
- According to our model, we therefore have:
- \begin{itemize}
- \item $N \in \mathcal{P}_i$
- %TODO N or 1?
- \item $\delta _i = N$ clock cycles
- \item $\pi _i ^+ = \pi _i ^- = 16$ bits
- \item $f _i ^+ = f _i ^-$
- \item $q _i ^+ = q _i ^- / N$
- \item $d _i ^+ = q _i ^+ * f _i ^+ = d _i ^- / N$
- \item $|\Gamma _i ^+| = |\Gamma _i ^-| = 1$\\
- %TODO I do not know how to find the size...
- \end{itemize}
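
To make the decimator model concrete, the Python sketch below applies it to the 200~MB/s, 16-bit input stream. It assumes that this stream is made of 2-byte samples at 100~MHz and that the output throughput follows $d_i^+ = q_i^+ * f_i^+$; the decimation factor $N = 8$ and the function name are arbitrary choices for the example.

```python
def decimator(q_in, f_in, n):
    """Apply the decimator model: q is divided by n, f is unchanged, d = q * f."""
    q_out = q_in / n
    f_out = f_in
    d_out = q_out * f_out
    return q_out, f_out, d_out


# Assumed input stream: 2 bytes per sample at 100 MHz, i.e. 200 MB/s.
q_in, f_in = 2.0, 100e6
q_out, f_out, d_out = decimator(q_in, f_in, n=8)

print(q_in * f_in)  # input throughput in bytes per second
print(d_out)        # output throughput in bytes per second
```

With these values the output throughput is the input throughput divided by $N$, i.e. 25~MB/s.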

-
- Another interesting example is the splitter. This is again a very
- simple block, which duplicates a stream. The number of outputs to create can be specified; we denote this parameter
- %TODO not very inspired...
- $X$. Our model gives:
- \begin{itemize}
- \item $X \in \mathcal{P}_i$
- \item $\delta _i = 1$ clock cycle
- \item $\pi _i ^+ = \pi _i ^- = 16$ bits
- \item $f _i ^+ = f _i ^-$
- \item $q _i ^+ = q _i ^-$
- \item $d _i ^+ = d _i ^-$
- \item $|\Gamma _i ^-| = 1$
- \item $|\Gamma _i ^+| = X$\\
- \end{itemize}

-
- The next example is the shifter. This block reduces the number of bits of the
- data in order to speed up processing in the following blocks. The number of bits to shift can be specified;
- we denote this parameter $S$. Our model gives:
- \begin{itemize}
- \item $S \in \mathcal{P}_i$
- \item $\delta _i = 1$ clock cycle
- \item $\pi _i ^+ = \pi _i ^- - S$
- \item $f _i ^+ = f _i ^-$
- \item $q _i ^+ = q _i ^-$
- \item $d _i ^+ = d _i ^-$
- \item $|\Gamma _i ^+| = |\Gamma _i ^-| = 1$\\
- \end{itemize}

-
- We finish with a slightly more complex example, the decimating filter (FIR). This block
- has many internal parameters. We can define a number of stages $E$, which is the number
- of iterations to perform before the processing stops. To perform its filtering, the block must be given a set
- of coefficients $C$, and these coefficients have their own precision $\pi _C$. Finally, the last
- parameter is the decimation factor $N$. Applying our model gives:
- \begin{itemize}
- \item $E \in \mathcal{P}_i$
- \item $C \in \mathcal{P}_i$
- \item $\pi _C \in \mathcal{P}_i$
- \item $N \in \mathcal{P}_i$
- \item $\delta _i = E * |C| * q_i^-$ clock cycles %Too simplistic
- \item $\pi _i ^+ = \pi _i ^- + \pi _C$
- \item $f _i ^+ = f _i ^-$
- \item $q _i ^+ = q _i ^- / N$
- \item $d _i ^+ = q _i ^+ * f _i ^+ = d _i ^- / N$
- \item $|\Gamma _i ^+| = |\Gamma _i ^-| = 1$\\
- \end{itemize}
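
The output precision of the FIR can be illustrated on its elementary operation, the product of one sample by one coefficient. The Python sketch below assumes signed two's-complement values, for which such a product needs $\pi_i^- + \pi_C$ bits; the 12-bit coefficient width and the function name are hypothetical, and the extra bits required when the $|C|$ products are accumulated are not modeled.

```python
def product_width(pi_data, pi_coeff):
    """Signed width needed to hold the product of two signed values."""
    return pi_data + pi_coeff


# 16-bit samples as in the examples above; 12-bit coefficients are a hypothetical choice.
pi_data, pi_coeff = 16, 12

# Worst case: the most negative sample times the most negative coefficient.
worst = (-(2 ** (pi_data - 1))) * (-(2 ** (pi_coeff - 1)))

# Magnitude bits of the worst-case product, plus one sign bit.
needed = worst.bit_length() + 1

print(needed, product_width(pi_data, pi_coeff))
```

The worst-case product requires exactly the width given by the sum of the two precisions, so one bit fewer would overflow.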

-
- These examples are only provisional models; to assess how well they perform, they will have to be
- checked against simulations.
-
-
-Although the articles on skeletons, \cite{gwen-cogen}, \cite{skeleton} and \cite{hide}, gave us hints towards a possible
- model, they were still too focused on the spatial optimization of the blocks. We therefore drew on this work
- to propose our model, while abstracting away low-level optimizations.

	1	+% merge max rejection at given area vs. minimizing area at given rejection
	2	+% demonstrate how quantization pushes noise towards high frequencies => 6 dB of
	3	+% rejection per bit, and loss if fewer bits than rejection/6
	4	+% develop the linear program, including the bit shift
	5	+% stress that previously the design was synthesizable but not implementable, whereas now we
	6	+% implement it and demonstrate that it runs
	7	+% gwen: why is the FIR now implementable when it was not, even on the zedboard -> new FIR?
	8	+% Gwen: can we build a real phase-noise test bench with this FIR, i.e. add ADC, NCO and mixer
	9	+% (zedboard or redpit)
	10	+
	11	+% add "correct" pyramid
	12	+% figure label: check that "argue for the FIR cascade" is done
	13	+
1	14	\documentclass[a4paper,conference]{IEEEtran/IEEEtran}
2	15	\usepackage{graphicx,color,hyperref}
3	16	\usepackage{amsfonts}


IFCS article cleanup