Commit e3580faaed9916c30a80a4611d49775015d4a080
1 parent
3ca9d7dfc3
Exists in
master
proofreading/morning corrections
Showing 1 changed file with 83 additions and 123 deletions
ifcs2018_proceeding.tex
... | ... | @@ -7,7 +7,6 @@ |
7 | 7 | \usepackage{algorithm2e} |
8 | 8 | \usepackage{url} |
9 | 9 | \usepackage[normalem]{ulem} |
10 | -\graphicspath{{/home/jmfriedt/gpr/170324_avalanche/}{/home/jmfriedt/gpr/1705_homemade/}} | |
11 | 10 | % correct bad hyphenation here |
12 | 11 | \hyphenation{op-tical net-works semi-conduc-tor} |
13 | 12 | \textheight=26cm |
... | ... | @@ -91,7 +90,7 @@ |
91 | 90 | processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since |
92 | 91 | not only the coefficient values and number of taps must be defined, but also the number of bits |
93 | 92 | defining the coefficients and the sample size. For this reason, and because we consider pipeline |
94 | -processing (as opposed to First-In, First-Out memory batch processing) of radiofrequency | |
93 | +processing (as opposed to First-In, First-Out (FIFO) memory batch processing) of radiofrequency | |
95 | 94 | signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but |
96 | 95 | the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language (VHDL) level. |
97 | 96 | Since latency is not an issue in an open-loop phase noise characterization instrument, the large |
... | ... | @@ -117,6 +116,14 @@ |
117 | 116 | %\label{float_vs_int} |
118 | 117 | %\end{figure} |
119 | 118 | |
119 | +\begin{figure}[h!tb] | |
120 | +\includegraphics[width=\linewidth]{images/demo_filtre} | |
121 | +\caption{Impact of the quantization resolution of the coefficients: the quantization is | |
122 | +set to 6~bits, setting the 30~first and 30~last coefficients out of the initial 128~band-pass | |
123 | +filter coefficients to 0.} | |
124 | +\label{float_vs_int} | |
125 | +\end{figure} | |
126 | + | |
120 | 127 | The tradeoff between quantization resolution and number of coefficients when considering |
121 | 128 | integer operations is not trivial. As an illustration of the issue related to the |
122 | 129 | relation between number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits |
123 | 130 | |
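The quantization effect described in the caption of Fig. \ref{float_vs_int} can be reproduced qualitatively with a minimal sketch. The assumptions are ours, not the authors': a Hamming-windowed sinc low-pass stands in for the band-pass filter of the figure, the cutoff is arbitrary, and the helper names (`windowed_sinc_lowpass`, `quantize`, `leading_zeros`) are hypothetical. The number of taps that vanish depends on these choices and need not match the 30 reported in the caption.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sampling rate."""
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = 2.0 * cutoff * (n - center)
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(2.0 * cutoff * sinc * window)
    return taps

def quantize(taps, bits):
    """Scale the largest tap to the signed full-scale value, then round to integers."""
    full_scale = 2 ** (bits - 1) - 1
    peak = max(abs(t) for t in taps)
    return [round(t / peak * full_scale) for t in taps]

def leading_zeros(values):
    """Count the consecutive zero entries at the start of a sequence."""
    count = 0
    for v in values:
        if v != 0:
            break
        count += 1
    return count

# Assumed example: 128 taps, cutoff at a quarter of the sampling rate, 6-bit coefficients.
taps = windowed_sinc_lowpass(128, 0.25)
q = quantize(taps, 6)
# Taps rounded to zero at both ends no longer contribute to the filter response.
print(leading_zeros(q), leading_zeros(q[::-1]))
```

Taps that round to zero still occupy a multiplier if the filter length is left unchanged, which is the resource argument made in the surrounding text.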
... | ... | @@ -126,16 +133,15 @@ |
126 | 133 | processing resource by shrinking the filter length. This tradeoff aimed at minimizing resources |
127 | 134 | to reach a given rejection level, or maximizing out of band rejection for a given computational |
128 | 135 | resource, will drive the investigation on cascading filters designed with varying tap resolution |
129 | -and tap length, as will be shown in the next section. | |
136 | +and tap length, as will be shown in the next section. Indeed, our development strategy closely | |
137 | +follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards} | |
138 | +in which basic blocks are defined and characterized before being assembled \cite{hide} | |
139 | +in a complete processing chain. In our case, assembling the filter blocks is a simpler block | |
140 | +combination process since we assume a single value to be processed and a single value to be | |
141 | +generated at each clock cycle. The FIR filters will not be considered to decimate in the | |
142 | +current implementation: the decimation is assumed to be located after the FIR cascade at the | |
143 | +moment. | |
130 | 144 | |
131 | -\begin{figure}[h!tb] | |
132 | -\includegraphics[width=\linewidth]{images/demo_filtre} | |
133 | -\caption{Impact of the quantization resolution of the coefficients: the quantization is | |
134 | -set to 6~bits, setting the 30~first and 30~last coefficients out of the initial 128~band-pass | |
135 | -filter coefficients to 0.} | |
136 | -\label{float_vs_int} | |
137 | -\end{figure} | |
138 | - | |
139 | 145 | \section{Filter optimization} |
140 | 146 | |
141 | 147 | A basic approach for implementing the FIR filter is to compute the transfer function of |
142 | 148 | |
... | ... | @@ -163,13 +169,30 @@ |
163 | 169 | the data fed to the filter. Because each FIR in the chain is fed the output of the previous stage, |
164 | 170 | the optimization of the complete processing chain within a constrained resource environment is not |
165 | 171 | trivial. The resource occupation of a FIR filter is considered as $D_i+C_i \times N_i)$ which is |
166 | -the number of bits needed in a worst case condition to represent the output of the FIR. | |
167 | -Unfortunately this representation is not sufficient to represent the real occupation inside FPGA. | |
168 | -In fact the FPGA have some BRAM block on which the coefficients are stored and each BRAM are not | |
169 | -share between different filters. Moreover the multiplication need DSP to be | |
170 | -perform. Those DSP are in limited quantity so in the future we shall to consider this. | |
172 | +the number of bits needed in a worst case condition to represent the output of the FIR. Such an | |
173 | +occupied area estimate assumes that the number of gates scales as the number of bits and the number | |
174 | +of coefficients, but does not account for the detailed implementation of the hardware. Indeed, | |
175 | +various FPGA implementations will provide different hardware functionalities, and we shall consider | |
176 | +at the end of the design a synthesis step using vendor software to assess the validity of the solution | |
177 | +found. As an example of the limitation linked to the lack of detailed hardware consideration, Block Random | |
178 | +Access Memory (BRAM) used to store filter coefficients are not shared amongst filters, and multiplications | |
179 | +are most efficiently implemented by using Digital Signal Processing (DSP) blocks whose input word | |
180 | +size is finite. DSPs are a scarce resource to be saved in a practical implementation. Keeping a high | |
181 | +abstraction on the resource occupation is nevertheless selected in the following discussion in order | |
182 | +to leave enough degrees of freedom in the problem to try and find original solutions: too many | |
183 | +constraints in the initial statement of the problem leave little room for finding an optimal solution. | |
171 | 184 | |
172 | -At the moment our model can be express like this : | |
185 | +\begin{figure}[h!tb] | |
186 | +\begin{center} | |
187 | +\includegraphics[width=.5\linewidth]{schema2} | |
188 | +\caption{Shape of the filter: the passband is considered to occupy the initial | |
189 | +40\% of the Nyquist frequency range, the stopband the last 40\%, allowing 20\% transition | |
190 | +width.} | |
191 | +\label{rejection-shape} | |
192 | +\end{center} | |
193 | +\end{figure} | |
194 | + | |
195 | +Following these considerations, the model is expressed as: | |
173 | 196 | \begin{align} |
174 | 197 | \begin{cases} |
175 | 198 | \mathcal{R}_i &= \mathcal{F}(N_i, C_i)\\ |
176 | 199 | |
... | ... | @@ -178,22 +201,19 @@ |
178 | 201 | \end{cases} |
179 | 202 | \label{model-FIR} |
180 | 203 | \end{align} |
181 | -To explain the system \ref{model-FIR}, $\mathcal{R}_i$ represent the rejection of depending on $N_i$ and $C_i$, $\mathcal{A}$ | |
182 | -is just theoretical occupation and $\Delta_i$ is the total rejection for the current stage $i$. At this moment | |
183 | -we are not able to express the function $\mathcal{F}$ so we are run some simulations to determine the rejection noise depending | |
184 | -on $N_i$ and $C_i$. But to choose the right filter we must define clearly the rejection criterion. If we take incorrect criterion | |
185 | -the linear program will produce a wrong solution. So we define a criterion to avoid ripple on passband and just keep | |
186 | -the maximum of rejection on the stopband (see the figure \ref{rejection-shape}). Thank to this system, we can able to design our linear program. | |
204 | +In system \ref{model-FIR}, $\mathcal{R}_i$ represents the rejection of the filter at stage $i$, which depends on $N_i$ and $C_i$, $\mathcal{A}$ | |
205 | +is a theoretical area occupation of the processing block on the FPGA, and $\Delta_i$ is the total rejection for the current stage $i$. | |
206 | +Since the function $\mathcal{F}$ cannot be explicitly expressed, we run simulations to determine the rejection depending | |
207 | +on $N_i$ and $C_i$. However, selecting the right filter requires a clear definition of the rejection criterion. Selecting an | |
208 | +incorrect criterion will lead the linear program solver to produce a solution which might not meet the user requirements. | |
209 | +Hence, amongst various criteria including the mean or median value of the FIR response in the stopband, we have designed | |
210 | +a criterion aimed at avoiding ripples in the passband and considering the maximum of the FIR spectral response in the stopband | |
211 | +(Fig. \ref{rejection-shape}). The passband criterion is defined as the sum of the absolute values of the spectral response | |
212 | +in the passband, reminiscent of a standard deviation of the spectral response: this criterion must be minimized to avoid | |
213 | +ripples in the passband. The stopband transfer function maximum must also be minimized in order to improve the filter | |
214 | +rejection capability. Weighing these two criteria allows designing the linear program to be solved. | |
187 | 215 | |
188 | 216 | \begin{figure}[h!tb] |
189 | -\begin{center} | |
190 | -\includegraphics[width=.5\linewidth]{schema2} | |
191 | -\caption{Shape of rejection} | |
192 | -\label{rejection-shape} | |
193 | -\end{center} | |
194 | -\end{figure} | |
195 | - | |
196 | -\begin{figure}[h!tb] | |
197 | 217 | \includegraphics[width=\linewidth]{images/noise-rejection.pdf} |
198 | 218 | \caption{Rejection as a function of number of coefficients and number of bits} |
199 | 219 | \label{noise-rejection} |
... | ... | @@ -202,7 +222,7 @@ |
202 | 222 | The objective function maximizes the noise rejection ($\max(\Delta_{i_{\max}})$) while keeping resource occupation below |
203 | 223 | a user-defined threshold. The MILP solver is allowed to choose the number of successive |
204 | 224 | filters, within an upper bound. The last problem is to model the noise rejection. Since filter |
205 | -noise rejection capability is not modeled with linear equation, a look-up-table is generated | |
225 | +noise rejection capability is not modeled with linear equations, a look-up-table is generated | |
206 | 226 | for multiple filter configurations in which the $C_i$, $D_i$ and $N_i$ parameters are varied: for each |
207 | 227 | one of these conditions, the low-pass filter rejection defined as the mean power between |
208 | 228 | half the Nyquist frequency and the Nyquist frequency is stored as computed by the frequency response |
... | ... | @@ -214,7 +234,7 @@ |
214 | 234 | With the notation introduced in system \ref{model-FIR}, the linear problem is defined as follows: |
215 | 235 | \paragraph{Variables} |
216 | 236 | \begin{align*} |
217 | -x_{i,j} \in \lbrace 0,1 \rbrace & \text{ $i$ is a specific filter} \ | |
237 | +x_{i,j} \in \lbrace 0,1 \rbrace & \text{ $i$ is a given filter} \ | |
218 | 238 | & \text{ $j$ is the stage} \\ |
219 | 239 | & \text{ If $x_{i,j}$ is equal to 1, the filter is selected} \\ |
220 | 240 | \end{align*} |
221 | 241 | |
... | ... | @@ -222,24 +242,27 @@ |
222 | 242 | \begin{align*} |
223 | 243 | \mathcal{F} = \lbrace F_1 ... F_p \rbrace & \text{ All possible filters}\\ |
224 | 244 | & \text{ $p$ is the number of different filters} \\ |
225 | -C(i) & \text{ Constant to let the number of coefficients}\\ | |
226 | -& \text{ for the filter $i$}\\ | |
227 | -\pi_C(i) & \text{ Constant to let the number of bits of}\\ | |
228 | -& \text{ each coefficient for the filter $i$}\\ | |
229 | -\mathcal{A}_{\max} & \text{ Max space available inside the FPGA} | |
245 | +C(i) & \text{ % Constant to let the | |
246 | +number of coefficients %} \\ & \text{ | |
247 | +for filter $i$}\\ | |
248 | +\pi_C(i) & \text{ % Constant to let the | |
249 | +number of bits of %}\\ & \text{ | |
250 | +each coefficient for filter $i$}\\ | |
251 | +\mathcal{A}_{\max} & \text{ Total space available inside the FPGA} | |
230 | 252 | \end{align*} |
231 | 253 | \paragraph{Constraints} |
232 | -\begin{align*} | |
233 | -1 \leq i \leq p & \\ | |
234 | -1 \leq j \leq q & \text{ $q$ is the max of filter stage} \\ | |
235 | -\forall j, \mathlarger{\sum_{i}} x_{i,j} = 1 & \text{ At most one filter by stage} \\ | |
236 | -\mathcal{S}_0 = 0 & \text{ initial occupation}\\ | |
237 | -\forall j, \mathcal{S}_j = \mathcal{S}_{j-1} + \forall i, x_{i,j} \times \mathcal{A}_i \\%\label{cstr_size} | |
238 | -\mathcal{S} \leq \mathcal{S}_{\max} \\ | |
239 | -\mathcal{N}_0 = 0 & \text{ initial rejection}\\ | |
240 | -\forall j, \mathcal{N}_j = \mathcal{N}_{j-1} + \forall i, x_{i,j} \times \mathcal{R}_i \\%\label{cstr_rejection} | |
241 | -\mathcal{N}_q \geqslant 160 & \text{ an user's bound}\\ | |
242 | -\end{align*} | |
254 | +\begin{align} | |
255 | +1 \leq i \leq p & \nonumber\\ | |
256 | +1 \leq j \leq q & \text{ $q$ is the maximum number of filter stages} \nonumber \\ | |
257 | +\forall j, \mathlarger{\sum_{i}} x_{i,j} = 1 & \text{ At most one filter per stage} \nonumber\\ | |
258 | +\mathcal{S}_0 = 0 & \text{ initial occupation} \nonumber\\ | |
259 | +\forall j, \mathcal{S}_j = \mathcal{S}_{j-1} + \mathlarger{\sum_{i}} x_{i,j} \times \mathcal{A}_i \label{cstr_size} \\ | |
260 | +\mathcal{S} \leq \mathcal{S}_{\max}\nonumber \\ | |
261 | +\mathcal{N}_0 = 0 & \text{ initial rejection}\nonumber\\ | |
262 | +\forall j, \mathcal{N}_j = \mathcal{N}_{j-1} + \mathlarger{\sum_{i}} x_{i,j} \times \mathcal{R}_i \label{cstr_rejection} \\ | |
263 | +\mathcal{N}_q \geqslant 160 & \text{ a user-defined bound}\nonumber\\ | |
264 | +& \text{ (e.g. 160~dB here)}\nonumber\\\nonumber | |
265 | +\end{align} | |
243 | 266 | \paragraph{Goal} |
244 | 267 | \begin{align*} |
245 | 268 | \min \mathcal{S}_q |
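The selection problem stated above (one filter per stage, accumulated area $\mathcal{S}_j$, accumulated rejection $\mathcal{N}_j$, minimize $\mathcal{S}_q$ subject to a rejection bound) can be illustrated with a small exhaustive search. This is only a sketch under stated assumptions: it replaces the MILP solver with brute force, the candidate table and its numbers are hypothetical placeholders rather than values from the paper, and a zero-cost `bypass` entry is added so that a stage can be left unused.

```python
from itertools import product

# Hypothetical candidates: (name, area A_i in arbitrary units, rejection R_i in dB).
# Illustrative placeholders only; real values would come from the simulated look-up table.
CANDIDATES = [
    ("bypass", 0, 0.0),        # pass-through, lets a stage stay unused
    ("fir_small", 120, 35.0),
    ("fir_medium", 300, 62.0),
    ("fir_large", 700, 95.0),
]

def best_cascade(candidates, stages, min_rejection_db):
    """Pick exactly one candidate per stage, minimizing the summed area
    while the summed rejection reaches min_rejection_db.
    Returns (area, rejection, names) or None if no combination qualifies."""
    best = None
    for combo in product(candidates, repeat=stages):
        area = sum(c[1] for c in combo)        # S_q: accumulated occupation
        rejection = sum(c[2] for c in combo)   # N_q: accumulated rejection
        if rejection < min_rejection_db:
            continue
        if best is None or area < best[0]:
            best = (area, rejection, [c[0] for c in combo])
    return best

# Three stages and the 160 dB bound used as an example in the constraints.
result = best_cascade(CANDIDATES, stages=3, min_rejection_db=160.0)
print(result)
```

With this made-up table, three medium filters are cheaper than any combination involving the large filter, which illustrates why a cascade can beat a monolithic design. The search cost grows as $p^q$, so a solver remains necessary for realistic numbers of candidates and stages.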
... | ... | @@ -293,12 +316,7 @@ |
293 | 316 | The coefficients of a single monolithic filter are computed as the impulse response |
294 | 317 | of the filter transfer function, and practically approximated by a multitude of methods |
295 | 318 | including least square optimization (Matlab's {\tt firls} function), Hamming or Kaiser windowing |
296 | -(Matlab's {\tt fir1} function). Cascading filters opens a new optimization opportunity by | |
297 | -selecting various coefficient sets depending on the number of coefficients. Fig. \ref{2} | |
298 | -illustrates that for a number of coefficients ranging from 8 to 47, {\tt fir1} provides a better | |
299 | -rejection than {\tt firls}: since the linear solver increases the number of coefficients along | |
300 | -the processing chain, the type of selected filter also changes depending on the number of coefficients | |
301 | -and evolves along the processing chain. | |
319 | +(Matlab's {\tt fir1} function). | |
302 | 320 | |
303 | 321 | \begin{figure}[h!tb] |
304 | 322 | \includegraphics[width=\linewidth]{images/fir1-vs-firls} |
... | ... | @@ -308,6 +326,13 @@ |
308 | 326 | \label{2} |
309 | 327 | \end{figure} |
310 | 328 | |
329 | +Cascading filters opens a new optimization opportunity by | |
330 | +selecting various coefficient sets depending on the number of coefficients. Fig. \ref{2} | |
331 | +illustrates that for a number of coefficients ranging from 8 to 47, {\tt fir1} provides a better | |
332 | +rejection than {\tt firls}: since the linear solver increases the number of coefficients along | |
333 | +the processing chain, the type of selected filter also changes depending on the number of coefficients | |
334 | +and evolves along the processing chain. | |
335 | + | |
311 | 336 | \section{Conclusion} |
312 | 337 | |
313 | 338 | We address the optimization problem of designing a low-pass filter chain in a Field Programmable Gate |
314 | 339 | |
... | ... | @@ -327,70 +352,10 @@ |
327 | 352 | The authors would like to thank E. Rubiola, F. Vernotte, G. Cabodevila for support and |
328 | 353 | fruitful discussions. |
329 | 354 | |
355 | +\bibliographystyle{IEEEtran} | |
356 | +\bibliography{references,biblio} | |
357 | +\end{document} | |
330 | 358 | |
331 | -XXX | |
332 | - | |
333 | - \subsubsection{Constraints} | |
334 | - | |
335 | - In references \cite{zhuo2007scalable, olariu1993computing, pan1999improved}, the authors | |
336 | - all propose hardware-only optimizations. These articles focus on hardware optimizations, | |
337 | - whereas our objective is to find a mathematical formalization of an FPGA. | |
338 | - | |
339 | - A last approach we studied is the use of \emph{skeletons}. D. Crookes and A. Benkrid | |
340 | - discussed this method extensively in their articles \cite{crookes1998environment, crookes2000design, benkrid2002towards}. | |
341 | - The key idea is that they build highly optimized, parameterizable components. When they | |
342 | - start a new development, they reuse the blocks already built. | |
343 | - | |
344 | - These blocks represent a computation step (decimation, filtering, modulation, | |
345 | - demodulation, etc.). Taking the FIR as an example, the values of the coefficients | |
346 | - used for the convolution product, as well as their number, are made parameterizable. The decimation factor is | |
347 | - also parameterizable. | |
348 | - | |
349 | - Much development time is thus saved since proven, optimized components are reused. | |
350 | - Moreover, over successive projects, a library of components is built up, | |
351 | - allowing a complete chain to be assembled very easily. | |
352 | - | |
353 | - K. Benkrid, S. Belkacemi and A. Benkrid, in their article~\cite{hide}, characterize | |
354 | - these blocks in Prolog to build a descriptive language for assembling the blocks | |
355 | - optimally. Starting from this description, they directly generate the VHDL code. | |
356 | - | |
357 | - \begin{itemize} | |
358 | - \item the latency of the block is the time, in clock cycles, between a datum entering the block | |
359 | - and the same datum leaving the block. | |
360 | - \item the input rate is the number of data per clock cycle that the block is able | |
361 | - to process. | |
362 | - \item the output rate is the number of data output per clock cycle. | |
363 | - \end{itemize} | |
364 | - | |
365 | - With this information, the software is able to provide an optimal implementation of a problem | |
366 | - submitted to it. The problem is defined not only by an expected result but also by | |
367 | - throughput and/or precision constraints. | |
368 | - | |
369 | - Second, we also looked at articles on scheduling. | |
370 | - In particular, we read documents dealing with micro-factories. | |
371 | - | |
372 | - Micro-factories are somewhat similar to FPGAs in that the tasks to perform and their | |
373 | - characteristics are known in advance. We will therefore draw on | |
374 | - their model to try to build ours. | |
375 | - | |
376 | - In his thesis, A. Dobrila \cite{these-alex} addresses a fault-tolerance problem | |
377 | - in the context of micro-factories. FPGAs are not concerned, however, since | |
378 | - if the component fails, all processing is halted. This thesis nevertheless | |
379 | - provided us with an example of problem formalization. | |
380 | - | |
381 | - Finally, we read the thesis of M. Coqblin \cite{these-mathias}, which also deals with | |
382 | - micro-factories. M. Coqblin's work mainly concerns a reconfigurable processing | |
383 | - chain, and accounts for the overhead incurred by reconfiguring a machine. | |
384 | - This is not fully applicable in our context since an | |
385 | - FPGA chip, once programmed, cannot reconfigure part of its processing | |
386 | - chain. Here again, we had an example of problem formalization. | |
387 | - | |
388 | - To conclude, we have seen two approaches from two different fields. The first is the | |
389 | - electronics point of view, which mainly focuses on hardware or algorithmic optimizations. | |
390 | - The second is the computer science point of view: the models are very generic and are not | |
391 | - suited to FPGAs. The rest of this report will therefore focus on finding a compromise | |
392 | - between these two points of view. | |
393 | - | |
394 | 359 | \section{Scheduling context} |
395 | 360 | In this part, we give definitions of terms related to the field of scheduling |
396 | 361 | and we show that the subject addressed is very close to a scheduling problem. As a result |
... | ... | @@ -461,7 +426,6 @@ |
461 | 426 | of the criteria may be very poor for another criterion. In this case, the goal is to find a solution that achieves |
462 | 427 | the best compromise between all the criteria. |
463 | 428 | |
464 | - | |
465 | 429 | \subsection{Problem formalization} |
466 | 430 | \label{formalisation} |
467 | 431 | Now that we have introduced the vocabulary related to scheduling, we can attempt to characterize |
... | ... | @@ -619,8 +583,4 @@ |
619 | 583 | Although the articles on skeletons, \cite{gwen-cogen}, \cite{skeleton} and \cite{hide}, gave us hints about a possible |
620 | 584 | model, they were still too focused on the spatial optimization of the blocks. We therefore drew on this work |
621 | 585 | to propose our model, abstracting away low-level optimizations. |
622 | - | |
623 | -\bibliographystyle{IEEEtran} | |
624 | -\bibliography{references,biblio} | |
625 | -\end{document} |