Commit cbca8b45674b2f43778f1e606d4cf6044b214699
1 parent
b0ed3be3ee
Exists in
master
article template
Showing 1 changed file with 669 additions and 0 deletions
ifcs2018_article.tex
File was created | 1 | % merge max rejection at given area vs. minimize area at given rejection | ||
2 | % demonstrate how quantization pushes noise toward high frequencies => 6 dB of | |||
3 | % rejection per bit and loss if fewer bits than rejection/6 | |||
4 | % develop the linear program including the bit shift | |||
5 | % emphasize that previously the design was synthesizable but not implementable, whereas now we | |||
6 | % implement it and demonstrate that it runs | |||
7 | % gwen: why is the FIR now implementable when it was not, even on the zedboard -> new FIR? | |||
8 | % Gwen: can we build a real phase noise test bench with this FIR, i.e. add ADC, NCO and mixer | |||
9 | % (zedboard or redpit) | |||
10 | ||||
11 | \documentclass[a4paper,transaction]{IEEEtran/IEEEtran} | |||
12 | \usepackage{graphicx,color,hyperref} | |||
13 | \usepackage{amsfonts} | |||
14 | \usepackage{amsthm} | |||
15 | \usepackage{amssymb} | |||
16 | \usepackage{amsmath} | |||
17 | \usepackage{algorithm2e} | |||
18 | \usepackage{url,balance,relsize} | |||
19 | \usepackage[normalem]{ulem} | |||
20 | % correct bad hyphenation here | |||
21 | \hyphenation{op-tical net-works semi-conduc-tor} | |||
22 | \textheight=26cm | |||
23 | \setlength{\footskip}{30pt} | |||
24 | \pagenumbering{gobble} | |||
25 | \begin{document} | |||
26 | \title{Filter optimization for real time digital processing of radiofrequency signals: application | |||
27 | to oscillator metrology} | |||
28 | ||||
29 | \author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2}, | |||
30 | G. Goavec-M\'erou\IEEEauthorrefmark{1}, | |||
31 | P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}} | |||
32 | \IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France } | |||
33 | \IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\ | |||
34 | Email: \{pyb2,jmfriedt\}@femto-st.fr} | |||
35 | } | |||
36 | \maketitle | |||
37 | \thispagestyle{plain} | |||
38 | \pagestyle{plain} | |||
39 | \newtheorem{definition}{Definition} | |||
40 | ||||
41 | \begin{abstract} | |||
42 | Software Defined Radio (SDR) provides stability, flexibility and reconfigurability to | |||
43 | radiofrequency signal processing. Applied to oscillator characterization in the context | |||
44 | of ultrastable clocks, stringent filtering requirements are defined by spurious signal or | |||
45 | noise rejection needs. Since real time radiofrequency processing must be performed in a | |||
46 | Field Programmable Gate Array to meet timing constraints, we investigate optimization strategies | |||
47 | to design filters meeting rejection characteristics while limiting the hardware resources | |||
48 | required and keeping timing constraints within the targeted measurement bandwidths. | |||
49 | \end{abstract} | |||
50 | ||||
51 | \begin{IEEEkeywords} | |||
52 | Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter | |||
53 | \end{IEEEkeywords} | |||
54 | ||||
55 | \section{Digital signal processing of ultrastable clock signals} | |||
56 | ||||
57 | Analog oscillator phase noise characterization is classically performed by downconverting | |||
58 | the radiofrequency signal using a saturated mixer to bring the radiofrequency signal to baseband, | |||
59 | followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to carrier. In | |||
60 | a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by | |||
61 | multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}. | |||
62 | ||||
63 | \begin{figure}[h!tb] | |||
64 | \begin{center} | |||
65 | \includegraphics[width=.8\linewidth]{schema} | |||
66 | \end{center} | |||
67 | \caption{Fully digital oscillator phase noise characterization: the Device Under Test | |||
68 | (DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and | |||
69 | downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals | |||
70 | and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite | |||
71 | Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays | |||
72 | the spectral characteristics of the phase fluctuations.} | |||
73 | % JMF: argue the case for the FIR cascade | |||
74 | \label{schema} | |||
75 | \end{figure} | |||
76 | ||||
77 | As with the analog mixer, | |||
78 | the non-linear behavior of the downconverter introduces noise or spurious signal aliasing as | |||
79 | well as the generation of the frequency sum signal in addition to the frequency difference. | |||
80 | These unwanted spectral characteristics must be rejected before decimating the data stream | |||
81 | for the phase noise spectral characterization \cite{andrich2018high}. The filters introduced between the | |||
82 | downconverter | |||
83 | and the decimation processing blocks are core components of an oscillator characterization | |||
84 | system, and must reject out-of-band signals below the targeted phase noise -- typically in the | |||
85 | sub -170~dBc/Hz range for the ultrastable oscillators we aim at characterizing. The filter blocks will | |||
86 | use most resources of the Field Programmable Gate Array (FPGA) used to process the radiofrequency | |||
87 | datastream: optimizing the performance of the filter while reducing the needed resources is | |||
88 | hence tackled in a systematic approach using optimization techniques. Most significantly, we | |||
89 | tackle the issue by attempting to cascade multiple Finite Impulse Response (FIR) filters with | |||
90 | tunable number of coefficients and tunable number of bits representing the coefficients and the | |||
91 | data being processed. | |||
92 | ||||
93 | \section{Finite impulse response filter} | |||
94 | ||||
95 | We select FIR filters for their unconditional stability and ease of design. A FIR filter is defined | |||
96 | by a set of weights $b_k$ applied to the inputs $x_n$ through a convolution to generate the | |||
97 | outputs $y_n$ | |||
98 | $$y_n=\sum_{k=0}^N b_k x_{n-k}$$ | |||
99 | ||||
100 | As opposed to an implementation on a general purpose processor in which word size is defined by the | |||
101 | processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since | |||
102 | not only the coefficient values and number of taps must be defined, but also the number of bits | |||
103 | defining the coefficients and the sample size. For this reason, and because we consider pipeline | |||
104 | processing (as opposed to First-In, First-Out FIFO memory batch processing) of radiofrequency | |||
105 | signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but | |||
106 | the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language (VHDL) level. | |||
107 | Since latency is not an issue in an open-loop phase noise characterization instrument, the large | |||
108 | number of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter, | |||
109 | is not considered an issue as it would be in a closed-loop system. | |||
110 | ||||
111 | The coefficients are classically expressed as floating point values. However, this binary | |||
112 | number representation is not efficient for fast arithmetic computation by an FPGA. Instead, | |||
113 | we choose to quantize these floating point values into integer values. This quantization | |||
114 | will result in some precision loss. | |||
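The effect of coefficient quantization can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the Hamming-windowed sinc design, the helper names \texttt{lowpass\_taps} and \texttt{quantize}, and the full-scale rounding rule are hypothetical choices, not the filter design or the quantization scheme used for the figures of this paper.

```python
import numpy as np

def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass (assumed design); cutoff is a fraction of Nyquist."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return h / np.sum(h)  # normalize to unit DC gain

def quantize(taps, bits):
    """Scale the largest tap to the signed full-scale value, then round to integers."""
    scale = (2 ** (bits - 1) - 1) / np.max(np.abs(taps))
    return np.round(taps * scale).astype(int)

taps = lowpass_taps(128, 0.5)          # 128 floating point coefficients
q6 = quantize(taps, 6)                 # 6-bit integer coefficients
q16 = quantize(taps, 16)               # 16-bit integer coefficients
zeros6 = np.count_nonzero(q6 == 0)     # taps lost to rounding with 6 bits
zeros16 = np.count_nonzero(q16 == 0)   # taps lost to rounding with 16 bits
print(zeros6, zeros16)
```

With a coarse resolution the small outer taps round to zero, so a long filter quantized on few bits behaves as a shorter one.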
115 | ||||
116 | %As illustrated in Fig. \ref{float_vs_int}, we see that we aren't | |||
117 | %need too coefficients or too sample size. If we have lot of coefficients but a small sample size, | |||
118 | %the first and last are equal to zero. But if we have too sample size for few coefficients that not improve the quality. | |||
119 | ||||
120 | % JMF I do not understand the last sentence above nor the figure below | |||
121 | % AH basically I meant that taking too few bits with too many coefficients leads to your figure (much better made than mine) | |||
122 | % and conversely, with too many bits on too few coefficients we gain nothing; I will try to rephrase it | |||
123 | ||||
124 | %\begin{figure}[h!tb] | |||
125 | %\includegraphics[width=\linewidth]{images/float-vs-integer.pdf} | |||
126 | %\caption{Impact of the quantization resolution of the coefficients} | |||
127 | %\label{float_vs_int} | |||
128 | %\end{figure} | |||
129 | ||||
130 | \begin{figure}[h!tb] | |||
131 | \includegraphics[width=\linewidth]{images/demo_filtre} | |||
132 | \caption{Impact of the quantization resolution of the coefficients: the quantization is | |||
133 | set to 6~bits -- with the horizontal black lines indicating $\pm$1 least significant bit -- setting | |||
134 | the 30~first and 30~last coefficients out of the initial 128~band-pass | |||
135 | filter coefficients to 0 (red dots).} | |||
136 | \label{float_vs_int} | |||
137 | \end{figure} | |||
138 | ||||
139 | The tradeoff between quantization resolution and number of coefficients when considering | |||
140 | integer operations is not trivial. As an illustration of the issue related to the | |||
141 | relation between number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits | |||
142 | a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon | |||
143 | quantization on 6~bit integers, 60 of the 128~coefficients in the beginning and end of the | |||
144 | taps become null, making the large number of coefficients irrelevant and allowing processing | |||
145 | resources to be saved by shrinking the filter length. This tradeoff aimed at minimizing resources | |||
146 | to reach a given rejection level, or maximizing out of band rejection for a given computational | |||
147 | resource, will drive the investigation on cascading filters designed with varying tap resolution | |||
148 | and tap length, as will be shown in the next section. Indeed, our development strategy closely | |||
149 | follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards} | |||
150 | in which basic blocks are defined and characterized before being assembled \cite{hide} | |||
151 | in a complete processing chain. In our case, assembling the filter blocks is a simpler block | |||
152 | combination process since we assume a single value to be processed and a single value to be | |||
153 | generated at each clock cycle. The FIR filters will not be considered to decimate in the | |||
154 | current implementation: the decimation is assumed to be located after the FIR cascade at the | |||
155 | moment. | |||
156 | ||||
157 | \section{Filter optimization} | |||
158 | ||||
159 | A basic approach for implementing the FIR filter is to compute the transfer function of | |||
160 | a monolithic filter: this single filter defines all coefficients with the same resolution | |||
161 | (number of bits) and processes data represented with their own resolution. Meeting the | |||
162 | filter shape requires a large number of coefficients, limited by resources of the FPGA since | |||
163 | this filter must process data stream at the radiofrequency sampling rate after the mixer. | |||
164 | ||||
165 | An optimization problem \cite{leung2004handbook} aims at improving one or many | |||
166 | performance criteria within a constrained resource environment. Amongst the tools | |||
167 | developed to meet this aim, Mixed-Integer Linear Programming (MILP) provides the framework to | |||
168 | formally define the stated problem and search for an optimal use of available | |||
169 | resources \cite{yu2007design, kodek1980design}. | |||
170 | ||||
171 | First we need to ensure that our problem is a real optimization problem. When | |||
172 | designing a processing function in the FPGA, we aim at meeting some requirement such as | |||
173 | the throughput, the computation time or the noise rejection. However, due to limited | |||
174 | resources to design the process like BRAM (high performance RAM), DSP (Digital Signal Processor) | |||
175 | or LUT (Look Up Table), a tradeoff must generally be found between performance and available | |||
176 | computational resources: optimizing some criteria within finite, limited | |||
177 | resources indeed matches the definition of a classical optimization problem. | |||
178 | ||||
179 | Specifically the degrees of freedom when addressing the problem of replacing the single monolithic | |||
180 | FIR with a cascade of optimized filters are the number of coefficients $N_i$ of each filter $i$, | |||
181 | the number of bits $C_i$ representing the coefficients and the number of bits $D_i$ needed to represent | |||
182 | the data $x_k$ fed to each filter as provided by the acquisition or previous processing stage. | |||
183 | Because each FIR in the chain is fed the output of the previous stage, | |||
184 | the optimization of the complete processing chain within a constrained resource environment is not | |||
185 | trivial. The resource occupation of a FIR filter is considered as $C_i \times N_i$ which aims | |||
186 | at approximating the number of bits needed in a worst case condition to represent the output of the | |||
187 | FIR. Indeed, the number of bits generated by the $i$th FIR is $C_i+D_i+\lceil\log_2(N_i)\rceil$, but the | |||
188 | $\log$ function is avoided for its incompatibility with a linear programming description, and | |||
189 | the simple product is taken as an approximation of the number of gates needed to perform the calculation. Such an | |||
190 | occupied area estimate assumes that the number of gates scales as the number of bits and the number | |||
191 | of coefficients, but does not account for the detailed implementation of the hardware. Indeed, | |||
192 | various FPGA implementations will provide different hardware functionalities, and we shall consider | |||
193 | at the end of the design a synthesis step using vendor software to assess the validity of the solution | |||
194 | found. As an example of the limitation linked to the lack of detailed hardware consideration, Block Random | |||
195 | Access Memory (BRAM) used to store filter coefficients are not shared amongst filters, and multiplications | |||
196 | are most efficiently implemented by using DSP blocks whose input word | |||
197 | size is finite. DSPs are a scarce resource to be saved in a practical implementation. Keeping a high | |||
198 | abstraction on the resource occupation is nevertheless selected in the following discussion in order | |||
199 | to leave enough degrees of freedom in the problem to try and find original solutions: too many | |||
200 | constraints in the initial statement of the problem leave little room for finding an optimal solution. | |||
201 | ||||
202 | \begin{figure}[h!tb] | |||
203 | \begin{center} | |||
204 | \includegraphics[width=.5\linewidth]{schema2} | |||
205 | \caption{Shape of the filter transmitted power $P$ as a function of frequency: | |||
206 | the passband BP is considered to occupy the initial | |||
207 | 40\% of the Nyquist frequency range, the stopband the last 40\%, allowing 20\% transition | |||
208 | width.} | |||
209 | \label{rejection-shape} | |||
210 | \end{center} | |||
211 | \end{figure} | |||
212 | ||||
213 | Following these considerations, the model is expressed as: | |||
214 | \begin{align} | |||
215 | \begin{cases} | |||
216 | \mathcal{R}_i &= \mathcal{F}(N_i, C_i)\\ | |||
217 | \mathcal{A}_i &= N_i \times C_i\\ | |||
218 | \Delta_i &= \Delta _{i-1} + \mathcal{P}_i | |||
219 | \end{cases} | |||
220 | \label{model-FIR} | |||
221 | \end{align} | |||
222 | To explain the system \ref{model-FIR}, $\mathcal{R}_i$ represents the stopband rejection dependence with $N_i$ and $C_i$, $\mathcal{A}_i$ | |||
223 | is a theoretical area occupation of the processing block on the FPGA as discussed earlier, and $\Delta_i$ is the total rejection for the current stage $i$. | |||
224 | Since the function $\mathcal{F}$ cannot be explicitly expressed, we run simulations to determine the rejection depending | |||
225 | on $N_i$ and $C_i$. However, selecting the right filter requires a clear definition of the rejection criterion. Selecting an | |||
226 | incorrect criterion will lead the linear program solver to produce a solution which might not meet the user requirements. | |||
227 | Hence, amongst various criteria including the mean or median value of the FIR response in the stopband as will | |||
228 | be illustrated later (section \ref{median}), we have designed | |||
229 | a criterion aimed at avoiding ripples in the passband and considering the maximum of the FIR spectral response in the stopband | |||
230 | (Fig. \ref{rejection-shape}). The passband criterion is defined as the sum of the absolute values of the spectral response | |||
231 | in the passband, reminiscent of a standard deviation of the spectral response: this criterion must be minimized to avoid | |||
232 | ripples in the passband. The stopband transfer function maximum must also be minimized in order to improve the filter | |||
233 | rejection capability. Weighting these two criteria allows designing the linear program to be solved. | |||
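The two criteria can be evaluated from the frequency response of a candidate filter. The Python sketch below is one possible reading of them, not the code used to produce the results of this paper: the passband term is taken here as the mean absolute deviation from unit gain over the first 40\% of the band, the stopband term as the maximum of the response over the last 40\%, and the test filters come from a hypothetical windowed-sinc helper.

```python
import numpy as np

def windowed_sinc(num_taps, cutoff=0.5):
    """Hypothetical test filter: Hamming-windowed sinc, cutoff as a fraction of Nyquist."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return h / np.sum(h)

def criteria(taps, n_fft=4096):
    """Return (passband ripple, stopband rejection in dB) of an FIR filter."""
    mag = np.abs(np.fft.rfft(taps, n_fft))
    mag = mag / mag[0]                        # normalize to the DC gain
    freq = np.linspace(0.0, 1.0, mag.size)    # 1.0 is the Nyquist frequency
    passband = mag[freq <= 0.4]               # first 40% of the band
    stopband = mag[freq >= 0.6]               # last 40% of the band
    # Assumed passband criterion: mean absolute deviation from unit gain.
    ripple = np.sum(np.abs(passband - 1.0)) / passband.size
    # Stopband criterion: worst-case (maximum) response, expressed in dB.
    rejection_db = -20.0 * np.log10(np.max(stopband))
    return ripple, rejection_db

ripple_short, rej_short = criteria(windowed_sinc(15))
ripple_long, rej_long = criteria(windowed_sinc(55))
print(rej_short, rej_long)
```

Running such a function over a grid of tap counts and coefficient resolutions is one way to tabulate the rejection $\mathcal{F}(N_i, C_i)$ that cannot be expressed in closed form.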
234 | ||||
235 | \begin{figure}[h!tb] | |||
236 | \includegraphics[width=\linewidth]{images/noise-rejection.pdf} | |||
237 | \caption{Rejection as a function of number of coefficients and number of bits.} | |||
238 | \label{noise-rejection} | |||
239 | \end{figure} | |||
240 | ||||
241 | {\bf ARTHUR: regenerate a correct pyramid} | |||
242 | ||||
243 | The objective function maximizes the noise rejection ($\max(\Delta_{i_{\max}})$) while keeping resource | |||
244 | occupation below a user-defined threshold, or as will be discussed here, aims at minimizing the area | |||
245 | needed to reach a given rejection ($\min(S_q)$ in the forthcoming discussion, Eqs. \ref{cstr_size} | |||
246 | and \ref{cstr_rejection}). The MILP solver is allowed to choose the number of successive | |||
247 | filters, within an upper bound. The last problem is to model the noise rejection. Since filter | |||
248 | noise rejection capability is not modeled with linear equations, a look-up-table is generated | |||
249 | for multiple filter configurations in which the $C_i$, $D_i$ and $N_i$ parameters are varied: for each | |||
250 | one of these conditions, the low-pass filter rejection is stored as computed by the frequency response | |||
251 | of the digital filter (Fig. \ref{noise-rejection}). Various rejection criteria have been investigated, | |||
252 | including mean value of the stopband response, median value of the stopband response, or as finally | |||
253 | selected, maximum value in the stopband. An intuitive analysis of the chart of Fig. \ref{noise-rejection} | |||
254 | hints at an optimum | |||
255 | set of tap length and number of bits for representing the coefficients along the line of the pyramidal | |||
256 | shaped rejection capability function. | |||
257 | ||||
258 | Linear program formalism for solving the problem is well documented: an objective function is | |||
259 | defined which is linearly dependent on the parameters to be optimized. Constraints are expressed | |||
260 | as linear equations and solved using one of the available solvers, in our case GLPK\cite{glpk}. | |||
261 | With the notations used in the description of system \ref{model-FIR}, we have defined the linear problem as: | |||
262 | \paragraph{Variables} | |||
263 | \begin{align*} | |||
264 | x_{i,j} \in \lbrace 0,1 \rbrace & \text{ $i$ is a given filter} \\ | |||
265 | & \text{ $j$ is the stage} \\ | |||
266 | & \text{ If $x_{i,j}$ is equal to 1, the filter is selected} \\ | |||
267 | \end{align*} | |||
268 | \paragraph{Constants} | |||
269 | \begin{align*} | |||
270 | \mathcal{F} = \lbrace F_1 ... F_p \rbrace & \text{ All possible filters}\\ | |||
271 | & \text{ $p$ is the number of different filters} \\ | |||
272 | % N(i) & \text{ % Constant to let the | |||
273 | % number of coefficients %} \\ & \text{ | |||
274 | % for filter $i$}\\ | |||
275 | % C(i) & \text{ % Constant to let the | |||
276 | % number of bits of %}\\ & \text{ | |||
277 | % each coefficient for filter $i$}\\ | |||
278 | \mathcal{S}_{\max} & \text{ Total space available inside the FPGA} | |||
279 | \end{align*} | |||
280 | \paragraph{Constraints} | |||
281 | \begin{align} | |||
282 | 1 \leq i \leq p & \nonumber\\ | |||
283 | 1 \leq j \leq q & \text{ $q$ is the maximum number of filter stages} \nonumber \\ | |||
284 | \forall j, \mathlarger{\sum_{i}} x_{i,j} \leq 1 & \text{ At most one filter per stage} \nonumber\\ | |||
285 | \mathcal{S}_0 = 0 & \text{ initial occupation} \nonumber\\ | |||
286 | \forall j, \mathcal{S}_j = \mathcal{S}_{j-1} + \mathlarger{\sum_i (x_{i,j} \times \mathcal{A}_i)} \label{cstr_size} \\ | |||
287 | \mathcal{S}_j \leq \mathcal{S}_{\max}\nonumber \\ | |||
288 | \mathcal{N}_0 = 0 & \text{ initial rejection}\nonumber\\ | |||
289 | \forall j, \mathcal{N}_j = \mathcal{N}_{j-1} + \mathlarger{\sum_i (x_{i,j} \times \mathcal{R}_i)} \label{cstr_rejection} \\ | |||
290 | \mathcal{N}_q \geqslant 160 & \text{ a user-defined bound}\nonumber\\ | |||
291 | & \text{ (e.g. 160~dB here)}\nonumber\\\nonumber | |||
292 | \end{align} | |||
293 | \paragraph{Goal} | |||
294 | \begin{align*} | |||
295 | \min \mathcal{S}_q | |||
296 | \end{align*} | |||
297 | ||||
298 | Constraint \ref{cstr_size} states that the occupation at stage $j$ is the sum of | |||
299 | the occupation of the previous stages and the occupation of the filter selected at this stage (it is possible | |||
300 | that no filter is selected for this stage). Similarly, constraint \ref{cstr_rejection} | |||
301 | states that the rejection at stage $j$ is the rejection accumulated over the previous stages | |||
302 | plus the rejection of the selected filter. | |||
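The structure of this selection problem can be illustrated on a toy instance. The Python sketch below replaces the MILP solver by an exhaustive search, which is only tractable for a handful of candidates, and the candidate table (number of taps, coefficient bits, rejection in dB) holds hypothetical values chosen for the example, not measured rejections.

```python
from itertools import combinations_with_replacement

# Hypothetical candidates: (taps N, coefficient bits C, stopband rejection in dB).
CANDIDATES = [(8, 4, 12.0), (16, 6, 25.0), (32, 8, 41.0), (64, 12, 66.0)]

def cheapest_cascade(candidates, target_db, max_stages):
    """Exhaustive stand-in for the MILP: minimize the summed area N*C of the
    selected filters while the summed rejection reaches target_db."""
    best = None
    for stages in range(1, max_stages + 1):
        for combo in combinations_with_replacement(candidates, stages):
            rejection = sum(r for _, _, r in combo)   # rejections add up in dB
            area = sum(n * c for n, c, _ in combo)    # area model A_i = N_i * C_i
            if rejection >= target_db and (best is None or area < best[0]):
                best = (area, combo)
    return best

best_area, best_combo = cheapest_cascade(CANDIDATES, 160.0, 5)
print(best_area, best_combo)
```

On this toy table the search prefers several small filters over the largest candidate, in line with the cascade approach, but the outcome depends entirely on the assumed rejection values.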
303 | ||||
304 | \subsection{Low bandpass ripple and maximum rejection criteria} | |||
305 | ||||
306 | The MILP solver provides a solution to the problem by selecting a series of small FIR with | |||
307 | increasing number of bits representing data and coefficients as well as an increasing number | |||
308 | of coefficients, instead of a single monolithic filter. | |||
309 | ||||
310 | \begin{figure}[h!tb] | |||
311 | % \includegraphics[width=\linewidth]{images/compare-fir.pdf} | |||
312 | \includegraphics[width=\linewidth]{images/fir-mono-vs-fir-series-noise-fixe-jmf-light.pdf} | |||
313 | \caption{Comparison of the rejection capability between a series of FIR and a monolithic FIR | |||
314 | with a cutoff frequency set at half the Nyquist frequency.} | |||
315 | \label{compare-fir} | |||
316 | \end{figure} | |||
317 | ||||
318 | Fig. \ref{compare-fir} exhibits the | |||
319 | performance comparison between one solution and a monolithic FIR when selecting a cutoff | |||
320 | frequency of half the Nyquist frequency: a series of 5 FIR and a series of 10 FIR with the | |||
321 | same space usage are provided as selected by the MILP solver. The FIR cascade provides better | |||
322 | rejection than the monolithic FIR at the expense of a lower cutoff frequency which remains to | |||
323 | be tuned or compensated for. | |||
324 | ||||
325 | ||||
326 | The resource occupation when synthesizing such FIR on a Xilinx FPGA is summarized in Tab. \ref{t1}. | |||
327 | We have considered a set of resources representative of the hardware platform we work on, | |||
328 | Avnet's Zedboard featuring a Xilinx XC7Z020-CLG484-1 Zynq System on Chip (SoC). The results reported in | |||
329 | Tab. \ref{t1} emphasize that implementing the monolithic single FIR is impossible due to | |||
330 | the insufficient hardware resources (exhausted LUT resources), while the FIR cascading 5 or 10 | |||
331 | filters fit in the available resources. However, in all cases the DSP resources are fully | |||
332 | used: while the design can be synthesized using Xilinx proprietary Vivado 2016.2 software, | |||
333 | implementing the design fails due to the excessive resource usage preventing routing the signals | |||
334 | on the FPGA. Such results emphasize on the one hand the improvement prospect of the optimization | |||
335 | procedure by finding non-trivial solutions matching resource constraints, but on the other | |||
336 | hand also illustrate the limitation of a model with an abstraction layer that does not account | |||
337 | for the detailed architecture of the hardware. | |||
338 | ||||
339 | \begin{table}[h!tb] | |||
340 | \caption{Resource occupation on a Xilinx Zynq-7000 series FPGA when synthesizing the FIR cascade | |||
341 | identified as optimal by the MILP solver within a finite resource criterion. The last line refers | |||
342 | to available resources on a Zynq-7020 as found on the Zedboard.} | |||
343 | \begin{center} | |||
344 | \begin{tabular}{|c|cccc|}\hline | |||
345 | FIR & BlockRAM & LookUpTables & DSP & rejection (dB)\\\hline\hline | |||
346 | 1 (monolithic) & 1 & 76183 & 220 & -162 \\ | |||
347 | 5 & 5 & 18597 & 220 & -160 \\ | |||
348 | 10 & 8 & 24729 & 220 & -161 \\\hline\hline | |||
349 | \textbf{Zynq 7020} & \textbf{420} & \textbf{53200} & \textbf{220} & \\\hline | |||
350 | %\begin{tabular}{|c|ccccc|}\hline | |||
351 | %FIR & BRAM36 & BRAM18 & LUT & DSP & rejection (dB)\\\hline\hline | |||
352 | %1 (monolithic) & 1 & 0 & {\color{Red}76183} & 220 & -162 \\ | |||
353 | %5 & 0 & 5 & {\color{Green}18597} & 220 & -160 \\ | |||
354 | %10 & 0 & 8 & {\color{Green}24729} & 220 & -161 \\\hline\hline | |||
355 | %\textbf{Zynq 7020} & \textbf{140} & \textbf{280} & \textbf{53200} & \textbf{220} & \\\hline | |||
356 | \end{tabular} | |||
357 | \end{center} | |||
358 | %\vspace{-0.7cm} | |||
359 | \label{t1} | |||
360 | \end{table} | |||
361 | ||||
362 | \subsection{Alternate criteria}\label{median} | |||
363 | ||||
364 | Fig. \ref{compare-fir} provides FIR solutions matching well the targeted transfer | |||
365 | function, namely low ripple in the passband defined as the first 40\% of the frequency | |||
366 | range and maximum rejection of 160~dB in the last 40\% stopband. To demonstrate | |||
367 | the need to properly select the optimization criterion, we now illustrate two cases of poor | |||
368 | filter shapes obtained by selecting the mean value and median value of the rejection, | |||
369 | with no consideration for the ripples in the passband. The results of the optimizations, | |||
370 | in these cases, are shown in Figs. \ref{compare-mean} and \ref{compare-median}. | |||
371 | ||||
372 | \begin{figure}[h!tb] | |||
373 | \includegraphics[width=\linewidth]{images/fir-mono-vs-fir-series-noise-fixe-mean-light.pdf} | |||
374 | \caption{Comparison of the rejection capability between a series of FIR and a monolithic FIR | |||
375 | with a cutoff frequency set at half the Nyquist frequency, using the mean stopband rejection as criterion.} | |||
376 | \label{compare-mean} | |||
377 | \end{figure} | |||
378 | ||||
379 | In the case of the mean value criterion (Fig. \ref{compare-mean}), the solution is not | |||
380 | acceptable since the notch at the end of the transition band compensates for some unacceptable | |||
381 | rise in the rejection close to the Nyquist frequency. Applying such a filter might yield excessive | |||
382 | high frequency spurious components to be aliased at low frequency when decimating the signal. | |||
383 | Similarly, the lack of criterion on the passband shape induces a shape with poor flatness and | |||
384 | a slowly decaying transfer function starting to attenuate spectral components well before the | |||
385 | transition band starts. Such issues are partly alleviated by replacing a mean rejection value with | |||
386 | a median rejection value (Fig. \ref{compare-median}) but solutions remain unacceptable for | |||
387 | the reasons stated previously and much poorer than those found with the maximum rejection criterion | |||
388 | selected earlier (Fig. \ref{compare-fir}). | |||
389 | ||||
390 | \begin{figure}[h!tb] | |||
391 | \includegraphics[width=\linewidth]{images/fir-mono-vs-fir-series-noise-fixe-median-light.pdf} | |||
392 | \caption{Comparison of the rejection capability between a series of FIR and a monolithic FIR | |||
393 | with a cutoff frequency set at half the Nyquist frequency, using the median stopband rejection as criterion.} | |||
394 | \label{compare-median} | |||
395 | \end{figure} | |||
396 | ||||
397 | \section{Filter coefficient selection} | |||
398 | ||||
399 | The coefficients of a single monolithic filter are computed as the impulse response | |||
400 | of the filter transfer function, and practically approximated by a multitude of methods | |||
401 | including least square optimization (Matlab's {\tt firls} function), Hamming or Kaiser windowing | |||
402 | (Matlab's {\tt fir1} function). | |||
403 | ||||
404 | \begin{figure}[h!tb] | |||
405 | \includegraphics[width=\linewidth]{images/fir1-vs-firls} | |||
406 | \caption{Evolution of the rejection capability of least-square optimized filters and Hamming | |||
407 | FIR filters as a function of the number of coefficients, for floating point numbers and 8-bit | |||
408 | encoded integers.} | |||
409 | \label{2} | |||
410 | \end{figure} | |||
411 | ||||
412 | Cascading filters opens a new optimization opportunity by | |||
413 | selecting various coefficient sets depending on the number of coefficients. Fig. \ref{2} | |||
414 | illustrates that for a number of coefficients ranging from 8 to 47, {\tt fir1} provides a better | |||
415 | rejection than {\tt firls}: since the linear solver increases the number of coefficients along | |||
416 | the processing chain, the type of selected filter also changes depending on the number of coefficients | |||
417 | and evolves along the processing chain. | |||
418 | ||||
419 | \section{Conclusion} | |||
420 | ||||
421 | We address the optimization problem of designing a low-pass filter chain in a Field Programmable Gate | |||
422 | Array for improved noise rejection within constrained resource occupation, as needed for | |||
423 | real time processing of radiofrequency signal when characterizing spectral phase noise | |||
424 | characteristics of stable oscillators. The flexibility of the digital approach makes the result | |||
425 | best suited for closing the loop and using the measurement output in a feedback loop for | |||
426 | controlling clocks, e.g. in a quartz-stabilized high performance clock whose long term behavior | |||
427 | is controlled by a non-piezoelectric resonator (sapphire resonator, microwave or optical | |||
428 | atomic transition). | |||
429 | ||||
430 | \section*{Acknowledgement} | |||
431 | ||||
432 | This work is supported by the ANR Programme d'Investissement d'Avenir in | |||
433 | progress at the Time and Frequency Departments of the FEMTO-ST Institute | |||
434 | (Oscillator IMP, First-TF and Refimeve+), and by R\'egion de Franche-Comt\'e. | |||
435 | The authors would like to thank E. Rubiola, F. Vernotte, and G. Cabodevila | |||
436 | for support and fruitful discussions. | |||
437 | ||||
438 | \bibliographystyle{IEEEtran} | |||
439 | \balance | |||
440 | \bibliography{references,biblio} | |||
441 | \end{document} | |||
442 | ||||
443 | \section{Scheduling context} | |||
444 | In this part, we give definitions of terms related to the field of scheduling | |||
445 | and we will see that the subject addressed is very close to a scheduling problem. As a result, | |||
446 | we will be able to go further than the work seen previously and we will attempt scheduling | |||
447 | and optimization approaches. | |||
448 | ||||
449 | \subsection{Vocabulary} | |||
450 | First of all, we must define what a scheduling problem is. Two definitions | |||
451 | are important. The first is given by Legrand and Robert in their book \cite{def1-ordo}: | |||
452 | \begin{definition} | |||
453 | \label{def-ordo1} | |||
454 | A schedule of a task system $G\ =\ (V,\ E,\ w)$ is a function $\sigma$: | |||
455 | $V \rightarrow \mathbb{N}$ such that $\sigma(u) + w(u) \leq \sigma(v)$ for every edge $(u,\ v) \in E$. | |||
456 | \end{definition} | |||
457 | ||||
458 | Put more simply, the set $V$ represents the tasks to execute, the set $E$ represents the dependencies | |||
459 | between tasks, and $w$ gives the execution time of each task. The function $\sigma$ thus gives the start time of | |||
460 | each task. The definition states that if a task $v$ depends on a task $u$, then | |||
461 | the start time of $v$ is greater than or equal to the start time of $u$ plus the | |||
462 | execution time of $u$. | |||
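
As an illustration of this definition, consider a hypothetical three-task system (the tasks, durations and start times below are assumptions chosen for the example, not values taken from our design): $V = \{a, b, c\}$, $E = \{(a, b), (a, c)\}$, $w(a) = 2$, $w(b) = 3$ and $w(c) = 1$. The function $\sigma(a) = 0$, $\sigma(b) = 2$, $\sigma(c) = 4$ is a valid schedule because
\begin{align*}
\sigma(a) + w(a) = 2 \leq \sigma(b) = 2, \qquad \sigma(a) + w(a) = 2 \leq \sigma(c) = 4,
\end{align*}
whereas choosing $\sigma(b) = 1$ would violate the constraint on edge $(a, b)$, since $0 + 2 > 1$.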
463 | ||||
464 | Another important definition, proposed by Leung et al. \cite{def2-ordo}, is: | |||
465 | \begin{definition} | |||
466 | \label{def-ordo2} | |||
467 | Scheduling deals with the allocation of scarce resources to activities with | |||
468 | the objective of optimizing one or more performance criteria. | |||
469 | \end{definition} | |||
470 | ||||
471 | This definition is more generic, but it is more relevant to us than Definition~\ref{def-ordo1}. | |||
472 | Indeed, the only part of that first definition that matters to us is the respect of task precedence. | |||
473 | In practice, the start times are not really of interest to us. | |||
474 | ||||
475 | Definition~\ref{def-ordo2}, on the other hand, is at the heart of the project. To be convinced of this, | |||
476 | we must first define which type of scheduling problem we are dealing with and which | |||
477 | methods can be applied. | |||
478 | ||||
479 | Scheduling problems can be classified into several categories: | |||
480 | \begin{itemize} | |||
481 | \item Independent tasks: in this category of problems, the tasks are completely independent | |||
482 | of one another. This is not the best fit for our case. | |||
483 | \item Task graph: Definition~\ref{def-ordo1} describes this category. Most of the time, | |||
484 | the tasks are represented by a directed acyclic graph (DAG). This category is very close to our case since we must also execute | |||
485 | tasks that have a number of dependencies. In some cases | |||
486 | we even have anti-trees, i.e. a multitude of input tasks that converge towards a single | |||
487 | final task. | |||
488 | \item Workflow: this category is a subcategory of task graphs, in the sense that | |||
489 | it consists of a task graph repeated many times. This is exactly the type of problem | |||
490 | that we address here. | |||
491 | \end{itemize} | |||
492 | ||||
493 | Of course, this list is not exhaustive, and many other classifications and sub-classifications | |||
494 | of these problems exist. We have only mentioned the most common categories here. | |||
495 | ||||
496 | Another point to define is the optimization criterion. Here again, there are many | |||
497 | possible criteria. We therefore discuss the main ones: | |||
498 | \begin{itemize} | |||
499 | \item Total completion time (makespan): this is one of the most common optimization | |||
500 | criteria. The aim is to minimize the end date of the last task among the set of | |||
501 | tasks to execute. The challenge of this optimization is thus to find the optimal schedule, which allows | |||
502 | execution to finish as early as possible. | |||
503 | \item Sum of completion times (flowtime): this consists in summing the completion times of all the tasks | |||
504 | and optimizing the result. | |||
505 | \item Throughput: this criterion aims to maximize the data processing throughput. | |||
506 | \end{itemize} | |||
507 | ||||
508 | In addition, several optimization criteria may be needed, in which case we speak of multi-criteria | |||
509 | optimization. Of course, this makes the problem all the more complex, because the best solution for one | |||
510 | criterion may be very poor for another. In that case, the goal is to find a solution that | |||
511 | offers the best trade-off between all the criteria. | |||
512 | ||||
513 | \subsection{Problem formalization} | |||
514 | \label{formalisation} | |||
515 | Now that we have introduced the scheduling vocabulary, we can attempt to characterize | |||
516 | our problem formally. To do so, we take the constraints stated in Section~\ref{def-contraintes} | |||
517 | and formalize them as precisely as possible. | |||
518 | ||||
519 | As stated above, a task is a processing block. Each task $i$ has a set of parameters | |||
520 | that we denote $\mathcal{P}_{i}$. This set $\mathcal{P}_i$ is specific to each task and varies from one | |||
521 | task to another. We will come back later to the parameters that may make up this set. | |||
522 | ||||
523 | Besides this set $\mathcal{P}_i$, every task has the following common parameters: | |||
524 | \begin{itemize} | |||
525 | \item Task duration: as stated earlier, in an FPGA time is counted in clock cycles. | |||
526 | Moreover, the blocks are always active, and some can even read an input and output a result at every clock cycle. | |||
527 | The duration of a task therefore cannot be the time elapsed between the input of one datum and the output of another. We define the | |||
528 | duration as the processing time of one datum, i.e. the difference between the date at which the datum leaves the block | |||
529 | and the date at which it entered. We denote this duration $\delta_i$. % I should name it w as in def2 | |||
530 | \item Precision: the precision of a datum is the number of significant bits it holds. Precision may indeed vary | |||
531 | along the processing chain. We therefore denote the input precision of a task $i$ by $\pi_i^-$ and its output precision by $\pi_i^+$. | |||
532 | \item Input (or output) stream frequency: this is the rate at which data arrive (resp. leave). | |||
533 | These frequencies vary between tasks. Some blocks slow the stream down, which is why we distinguish the input | |||
534 | stream frequency from the output one. We denote the input stream frequency $f_i^-$ and the output frequency $f_i^+$. | |||
535 | \item Input (or output) data quantity: this is the amount of data that the block expects to process (resp. | |||
536 | is able to produce). Tasks may have to process large volumes of data and output only part of them. Here | |||
537 | again, input and output must be distinguished. We denote the quantity of incoming data $q_i^-$ | |||
538 | and the quantity of outgoing data $q_i^+$ for a task $i$. | |||
539 | \item Input (or output) throughput: this parameter is the data throughput that the task is able to process or that it | |||
540 | delivers at its output. It is simply derived from the two previous parameters. We define the input throughput of | |||
541 | task $i$ as $d_i^-\ =\ q_i^-\ \times\ f_i^-$ and the output throughput as $d_i^+\ =\ q_i^+\ \times\ f_i^+$. | |||
542 | \item Task size: since the area available in an FPGA is limited, this parameter expresses the area that the task occupies within the FPGA. | |||
543 | We denote this size $\mathcal{A}_i$. | |||
544 | \item Predecessors and successors of a task: these give the tasks required before task $i$ can be | |||
545 | processed, as well as the tasks that depend on it. These sets are denoted $\Gamma _i ^-$ and $ \Gamma _i ^+$. | |||
546 | %TODO Is this really a parameter? | |||
547 | \end{itemize} | |||
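
To make the throughput definition concrete, consider a hypothetical decimating block. All numerical values, and the unit of $q_i$ (taken here as data words per sample), are assumptions made for illustration only. With $q_i^- = 2$ words arriving at $f_i^- = 100$~MHz and $q_i^+ = 1$ word leaving at $f_i^+ = 10$~MHz,
\begin{align*}
d_i^- &= q_i^- \times f_i^- = 2 \times 100~\text{MHz} = 200~\text{Mwords/s}, \\
d_i^+ &= q_i^+ \times f_i^+ = 1 \times 10~\text{MHz} = 10~\text{Mwords/s},
\end{align*}
so that this block would reduce the throughput by a factor of 20.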
548 | ||||
549 | These common parameters are strongly tied to the elements of $\mathcal{P}_i$. Here are a few examples of relations | |||
550 | that we have identified: | |||
551 | \begin{itemize} | |||
552 | \item $ \delta _i \ = \ \mathcal{F}_{\delta}(\pi_i^-,\ \pi_i^+,\ d_i^-,\ d_i^+,\ \mathcal{P}_i) $ gives the execution time | |||
553 | of the task as a function of the desired precision, the throughput and the internal parameters. | |||
554 | \item $ \pi _i ^+ \ = \ \mathcal{F}_{p}(\pi_i^-,\ \mathcal{P}_i) $: the function $\mathcal{F}_p$ gives the output precision as a function of the input precision | |||
555 | and the internal parameters of the task. | |||
556 | \item $d_i^+\ =\ \mathcal{F}_d(d_i^-, \mathcal{P}_i)$: the function $\mathcal{F}_d$ gives the output throughput of the task as a function of its input | |||
557 | throughput and the internal variables of the task. | |||
558 | \item $\mathcal{A}_i\ =\ \mathcal{F}_A(\pi_i^-,\ \pi_i^+,\ d_i^-,\ d_i^+, \mathcal{P}_i)$ gives the size of the task from the same quantities. | |||
559 | \end{itemize} | |||
560 | For the time being, we are unable to give a general definition of these functions. However, | |||
561 | on a few simple examples (cf. \ref{def-contraintes}), we manage to evaluate them. | |||
562 | ||||
563 | Now that all the useful notation has been introduced, we state the constraints of our problem. Given | |||
564 | a DAG $G(V,\ E)$, the following relations hold for every edge $(i, j)\ \in\ E$: | |||
565 | ||||
566 | \paragraph{Precision constraint:} | |||
567 | This inequality expresses the precision constraint from one task to the next: | |||
568 | \begin{align*} | |||
569 | \pi _i ^+ \geq \pi _j ^- | |||
570 | \end{align*} | |||
571 | ||||
572 | \paragraph{Throughput constraint:} | |||
573 | This equation expresses the throughput constraint from one task to the next: | |||
574 | \begin{align*} | |||
575 | d _i ^+ = q _j ^- \times (f_i + (1 / s_j) ) & \text{ where } s_j \text{ is a positive delay value of the task} | |||
576 | \end{align*} | |||
577 | ||||
578 | \paragraph{Synchronization constraint:} | |||
579 | This constraint requires that if, at some point in the processing, the DAG splits into several parallel branches | |||
580 | that merge again later, the sum of the latencies along each branch must be the same. | |||
581 | More formally, if several disjoint paths lead from task $s$ to task $f$, then: | |||
582 | \begin{align*} | |||
583 | \forall \text{ path } \mathcal{C}_1(s, \ldots, f), | |||
584 | \forall \text{ path } \mathcal{C}_2(s, \ldots, f) | |||
585 | \text{ such that } \mathcal{C}_1 \neq \mathcal{C}_2 | |||
586 | \Rightarrow | |||
587 | \sum _{i \in \mathcal{C}_1} \delta_i = \sum _{i \in \mathcal{C}_2} \delta_i | |||
588 | \end{align*} | |||
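
As a sketch of how this constraint can be checked, take a hypothetical graph with two paths $\mathcal{C}_1 = (s, a, f)$ and $\mathcal{C}_2 = (s, b, c, f)$, with latencies assumed for the example to be $\delta_s = 1$, $\delta_a = 5$, $\delta_b = 2$, $\delta_c = 3$ and $\delta_f = 1$ clock cycles:
\begin{align*}
\sum_{i \in \mathcal{C}_1} \delta_i = 1 + 5 + 1 = 7, \qquad
\sum_{i \in \mathcal{C}_2} \delta_i = 1 + 2 + 3 + 1 = 7,
\end{align*}
so the two branches are synchronized. If instead $\delta_a = 4$, the first sum would be $6 \neq 7$, and one possible remedy (an assumption on our part, not a result established here) would be to add one clock cycle of delay on the first branch.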
589 | ||||
590 | \paragraph{Area constraint:} | |||
591 | This inequality expresses the area constraint in the FPGA. The maximum size of the FPGA chip is denoted $\mathcal{A}_{FPGA}$: | |||