Commit 7c951bd35e5df9b0ee5b1c9c8683eaa0ea9478e6

Authored by Arthur HUGEAT
1 parent 4d905253d9
Exists in master

Typo + text in black.

Showing 1 changed file with 51 additions and 59 deletions

ifcs2018_journal.tex
... ... @@ -120,10 +120,10 @@
120 120 signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but
121 121 the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language
122 122 (VHDL) level.
123   -{\color{red}Since latency is not an issue in a openloop phase noise characterization instrument,
  123 +Since latency is not an issue in a openloop phase noise characterization instrument,
124 124 the large
125 125 numbre of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter,
126   -is not considered as an issue as would be in a closed loop system.} % r2.4
  126 +is not considered as an issue as would be in a closed loop system.
127 127  
128 128 The coefficients are classically expressed as floating point values. However, this binary
129 129 number representation is not efficient for fast arithmetic computation by an FPGA. Instead,
... ... @@ -144,9 +144,9 @@
144 144 relation between number of fiter taps and quantization, Fig. \ref{float_vs_int} exhibits
145 145 a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon
146 146 quantization on 6~bit integers, 60 of the 128~coefficients in the beginning and end of the
147   -taps become null, {\color{red}making the large number of coefficients irrelevant: processing
148   -resources % r1.1
149   -are hence saved by shrinking the filter length.} This tradeoff aimed at minimizing resources
  147 +taps become null, making the large number of coefficients irrelevant: processing
  148 +resources
  149 +are hence saved by shrinking the filter length. This tradeoff aimed at minimizing resources
150 150 to reach a given rejection level, or maximizing out of band rejection for a given computational
151 151 resource, will drive the investigation on cascading filters designed with varying tap resolution
152 152 and tap length, as will be shown in the next section. Indeed, our development strategy closely
153 153  
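Illustration (not part of the commit): the hunk above states that quantizing a 128-coefficient bandpass FIR to 6-bit integers nulls many taps at both ends, so the filter can be shortened. The sketch below reproduces the effect with a windowed-sinc design. The band edges (`f_low`, `f_high`), the Hamming window and the scaling rule are assumptions chosen for the example, not the paper's filter, so the number of null taps will not necessarily match the 60 quoted in the text.

```python
import math

def bandpass_taps(num_taps=128, f_low=0.1, f_high=0.2):
    """Windowed-sinc bandpass FIR. Band edges are in cycles/sample (hypothetical values)."""
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        m = n - center
        if m == 0.0:
            ideal = 2.0 * (f_high - f_low)
        else:
            ideal = (math.sin(2 * math.pi * f_high * m)
                     - math.sin(2 * math.pi * f_low * m)) / (math.pi * m)
        # Hamming window (an assumption; the paper does not name its window here)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def quantize(taps, bits):
    """Scale so the largest tap fills the signed range, then round to integers."""
    full_scale = 2 ** (bits - 1) - 1
    scale = full_scale / max(abs(t) for t in taps)
    return [round(t * scale) for t in taps]

def trim_null_taps(int_taps):
    """Drop the leading and trailing zero coefficients."""
    first = next(i for i, t in enumerate(int_taps) if t != 0)
    last = max(i for i, t in enumerate(int_taps) if t != 0)
    return int_taps[first:last + 1]

float_taps = bandpass_taps()
int_taps = quantize(float_taps, bits=6)
trimmed = trim_null_taps(int_taps)
print(len(int_taps), "taps,", int_taps.count(0), "null after quantization,",
      len(trimmed), "kept after trimming")
```

Trimming only removes zeros at the two ends, which is the case the text describes; zeros inside the impulse response are kept so that the tap positions are preserved.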
... ... @@ -163,11 +163,11 @@
163 163 Our objective is to develop a new methodology applicable to any Digital Signal Processing (DSP)
164 164 chain obtained by assembling basic processing blocks, with hardware and manufacturer independence.
165 165 Achieving such a target requires defining an abstract model to represent some basic properties
166   -of DSP blocks such as perfomance (i.e. rejection or ripples in the bandpass for filters) and
  166 +of DSP blocks such as performance (i.e. rejection or ripples in the bandpass for filters) and
167 167 resource occupation. These abstract properties, not necessarily related to the detailed hardware
168 168 implementation of a given platform, will feed a scheduler solver aimed at assembling the optimum
169 169 target, whether in terms of maximizing performance for a given arbitrary resource occupation, or
170   -minimizing resource occupation for a given perfomance. In our approach, the solution of the
  170 +minimizing resource occupation for a given performance. In our approach, the solution of the
171 171 solver is then synthesized using the dedicated tool provided by each platform manufacturer
172 172 to assess the validity of our abstract resource occupation indicator, and the result of running
173 173 the DSP chain on the FPGA allows for assessing the performance of the scheduler. We emphasize
174 174  
175 175  
176 176  
177 177  
... ... @@ -184,24 +184,23 @@
184 184  
185 185 Addressing only two operations allows for demonstrating the methodology but should not be
186 186 considered as a limitation of the framework which can be extended to assembling any number
187   -of skeleton blocks as long as perfomance and resource occupation can be determined. {\color{red}
  187 +of skeleton blocks as long as performance and resource occupation can be determined.
188 188 Hence,
189   -in this paper we will apply our methodology on simple DSP chains: a white noise input signal % r1.2
  189 +in this paper we will apply our methodology on simple DSP chains: a white noise input signal
190 190 is generated using a Pseudo-Random Number (PRN) generator or by sampling a wideband (125~MS/s)
191   -14-bit Analog to Digital Converter (ADC) loaded by a 50~$\Omega$ resistor.} Once samples have been
  191 +14-bit Analog to Digital Converter (ADC) loaded by a 50~$\Omega$ resistor. Once samples have been
192 192 digitized at a rate of 125~MS/s, filtering is applied to qualify the processing block performance --
193 193 practically meeting the radiofrequency frontend requirement of noise and bandwidth reduction
194 194 by filtering and decimating. Finally, bursts of filtered samples are stored for post-processing,
195 195 allowing to assess either filter rejection for a given resource usage, or validating the rejection
196 196 when implementing a solution minimizing resource occupation.
197 197  
198   -{\color{red}
199   -The first step of our approach is to model the DSP chain. Since we aim at only optimizing % r1.3
  198 +The first step of our approach is to model the DSP chain. Since we aim at only optimizing
200 199 the filtering part of the signal processing chain, we have not included the PRN generator or the
201 200 ADC in the model: the input data size and rate are considered fixed and defined by the hardware.
202 201 The filtering can be done in two ways, either by considering a single monolithic FIR filter
203 202 requiring many coefficients to reach the targeted noise rejection ratio, or by
204   -cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter.}
  203 +cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter.
205 204  
206 205 After each filter we leave the possibility of shifting the filtered data to consume
207 206 less resources. Hence in the case of cascaded filter, we define a stage as a filter
208 207  
... ... @@ -245,12 +244,12 @@
245 244 transfer function.
246 245 Comparing the performance between FIRs requires however defining a unique criterion. As shown in figure~\ref{fig:fir_mag},
247 246 the FIR magnitude exhibits two parts: we focus here on the transitions width and the rejection rather than on the
248   -bandpass ripples as emphasized in \cite{lim_1988,lim_1996}. {\color{red}Throughout this demonstration,
  247 +bandpass ripples as emphasized in \cite{lim_1988,lim_1996}. Throughout this demonstration,
249 248 we arbitrarily set a bandpass of 40\% of the Nyquist frequency and a bandstop from 60\%
250 249 of the Nyquist frequency to the end of the band, as would be typically selected to prevent
251 250 aliasing before decimating the dataflow by 2. The method is however generalized to any filter
252   -shape as long as it is defined from the initial modelling steps: Fig. \ref{fig:rejection_pyramid}
253   -as described below is indeed unique for each filter shape.}
  251 +shape as long as it is defined from the initial modeling steps: Fig. \ref{fig:rejection_pyramid}
  252 +as described below is indeed unique for each filter shape.
254 253  
255 254 \begin{figure}
256 255 \begin{center}
257 256  
258 257  
259 258  
... ... @@ -290,17 +289,17 @@
290 289 \label{fig:fir_mag}
291 290 \end{figure}
292 291  
293   -In the transition band, the behavior of the filter is left free, we only {\color{red}define} the passband and the stopband characteristics.
  292 +In the transition band, the behavior of the filter is left free, we only define the passband and the stopband characteristics.
294 293 % r2.7
295   -{\color{red}Initial considered criteria include the mean value of the stopband rejection which yields unacceptable results since notches
296   -overestimate the rejection capability of the filter.}
  294 +Initial considered criteria include the mean value of the stopband rejection which yields unacceptable results since notches
  295 +overestimate the rejection capability of the filter.
297 296 % Furthermore, the losses within
298 297 % the passband are not considered and might be excessive for excessively wide transitions widths introduced for filters with few coefficients.
299 298 Our final criterion to compute the filter rejection considers
300 299 % r2.8 et r2.2 r2.3
301   -the {\color{red}minimal} rejection within the stopband, to which the {\color{red}sum of the absolute values
  300 +the minimal rejection within the stopband, to which the sum of the absolute values
302 301 within the passband is subtracted to avoid filters with excessive ripples, normalized to the
303   -bin width to remain consistent with the passband criterion (dBc/Hz units in all cases)}. With this
  302 +bin width to remain consistent with the passband criterion (dBc/Hz units in all cases). With this
304 303 criterion, we meet the expected rejection capability of low pass filters as shown in figure~\ref{fig:custom_criterion}.
305 304  
306 305 % \begin{figure}
... ... @@ -313,8 +312,8 @@
313 312 \begin{figure}
314 313 \centering
315 314 \includegraphics[width=\linewidth]{images/colored_custom_criterion}
316   -\caption{Custom criterion (maximum rejection in the stopband minus the {\color{red} sum of the
317   -absolute values of the passband rejection normalized to the bandwidth})
  315 +\caption{Custom criterion (maximum rejection in the stopband minus the sum of the
  316 +absolute values of the passband rejection normalized to the bandwidth)
318 317 comparison between monolithic filter and cascaded filters}
319 318 \label{fig:custom_criterion}
320 319 \end{figure}
... ... @@ -330,9 +329,9 @@
330 329 \begin{figure}
331 330 \centering
332 331 \includegraphics[width=\linewidth]{images/rejection_pyramid}
333   -\caption{{\color{red}{Filter}} rejection as a function of number of coefficients and number of bits
334   -{\color{red}: this lookup table will be used to identify which filter parameters -- number of bits
335   -representing coefficients and number of coefficients -- best match the targeted transfer function.}}
  332 +\caption{Filter rejection as a function of number of coefficients and number of bits
  333 +: this lookup table will be used to identify which filter parameters -- number of bits
  334 +representing coefficients and number of coefficients -- best match the targeted transfer function.}
336 335 \label{fig:rejection_pyramid}
337 336 \end{figure}
338 337  
339 338  
340 339  
341 340  
342 341  
343 342  
344 343  
... ... @@ -346,32 +345,30 @@
346 345 Hence when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved
347 346 with respect to a basic sum of the rejection criteria shown as a the dotted yellow line.
348 347 % r2.9
349   -Thus, estimating the rejection of filter cascades is more complex than {\color{red}taking} the sum of all the rejection
350   -criteria of each filter. However since the {\color{red}individual filter rejection} sum underestimates the rejection capability of the cascade,
  348 +Thus, estimating the rejection of filter cascades is more complex than taking the sum of all the rejection
  349 +criteria of each filter. However since the individual filter rejection sum underestimates the rejection capability of the cascade,
351 350 % r2.10
352   -this upper bound is considered as a {\color{red}conservative} and acceptable criterion for deciding on the suitability
  351 +this upper bound is considered as a conservative and acceptable criterion for deciding on the suitability
353 352 of the filter cascade to meet design criteria.
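Illustration (not part of the commit): the claim that summing individual rejections underestimates the cascade rejection can be written as a short inequality. The notation below is an assumption introduced for this sketch: $R_k(f) = -20\log_{10}|H_k(f)|$ is the rejection of filter $k$ in dB, $B_s$ is the stopband, and only the stopband term of the criterion is considered.

```latex
% Cascading multiplies transfer functions, so rejections in dB add:
%   H(f) = H_1(f) H_2(f)  =>  R(f) = R_1(f) + R_2(f).
\begin{equation*}
\min_{f \in B_s} \bigl( R_1(f) + R_2(f) \bigr)
\;\geq\;
\min_{f \in B_s} R_1(f) + \min_{f \in B_s} R_2(f)
\end{equation*}
% Equality holds only if both filters reach their worst-case rejection
% at the same frequency; otherwise the cascade does strictly better.
```

Under these assumptions the sum of the individual criteria is a lower bound on the stopband rejection of the cascade, which is why using it as a design criterion is conservative.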
354 353  
355 354 \begin{figure}
356 355 \centering
357 356 \includegraphics[width=\linewidth]{images/cascaded_criterion}
358   -\caption{{\color{red}Transfer function of individual filters and after cascading} the two filters,
359   -{\color{red}demonstrating that the selected criterion of maximum rejection in the bandstop (horizontal
  357 +\caption{Transfer function of individual filters and after cascading the two filters,
  358 +demonstrating that the selected criterion of maximum rejection in the bandstop (horizontal
360 359 lines) is met. Notice that the cascaded filter has better rejection than summing the bandstop
361   -maximum of each individual filter.}
  360 +maximum of each individual filter.
362 361 }
363 362 \label{fig:sum_rejection}
364 363 \end{figure}
365 364  
366   -% r2.6
367   -{\color{red}
368 365 Finally in our case, we consider that the input signal are fully known. The
369 366 resolution of the input data stream are fixed and still the same for all experiments
370   -in this paper.}
  367 +in this paper.
371 368  
372 369 Based on this analysis, we address the estimate of resource consumption (called
373 370 % r2.11
374   -silicon area -- in the case of FPGAs {\color{red}this means} processing cells) as a function of
  371 +silicon area -- in the case of FPGAs this means processing cells) as a function of
375 372 filter characteristics. As a reminder, we do not aim at matching actual hardware
376 373 configuration but consider an arbitrary silicon area occupied by each processing function,
377 374 and will assess after synthesis the adequation of this arbitrary unit with actual
... ... @@ -415,7 +412,6 @@
415 412 $\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_j}{6}\right)$.
416 413 Finally, equation~\ref{eq:init} gives the number of bits of the global input.
417 414  
418   -{\color{red}
419 415 This model is non-linear since we multiply some variable with another variable
420 416 and it is even non-quadratic, as the cost function $F$ does not have a known
421 417 linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations.
... ... @@ -437,7 +433,6 @@
437 433 for each configurations thanks to the rejection criterion. We also define the binary
438 434 variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
439 435 and 0 otherwise. The new equations are as follows:
440   -}
441 436  
442 437 \begin{align}
443 438 a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
... ... @@ -450,7 +445,6 @@
450 445 respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.
451 446 Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most.
452 447  
453   -{\color{red}
454 448 % JM: conflict merge
455 449 % However the problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2}
456 450 % we multiply
... ... @@ -464,7 +458,7 @@
464 458 The problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2}
465 459 we multiply
466 460 $\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable we can
467   -linearise linearize this multiplication. The following formula shows how to linearize
  461 +linearize this multiplication. The following formula shows how to linearize
468 462 this situation in general case with $y$ a binary variable and $x$ a real variable ($0 \leq x \leq X^{max}$):
469 463 \begin{equation*}
470 464 m = x \times y \implies
... ... @@ -481,7 +475,7 @@
481 475 assumed on hardware characteristics,
482 476 the Gurobi (\url{www.gurobi.com}) optimization software will be able to linearize
483 477 for us the quadratic problem so the model is left as is. This model
484   -has $O(np)$ variables and $O(n)$ constraints.}
  478 +has $O(np)$ variables and $O(n)$ constraints.
485 479  
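Illustration (not part of the commit): the general linearization formula for $m = x \times y$ is cut off by the diff context above, so the sketch below uses the textbook big-M form as an assumption rather than the paper's exact constraints. With $y$ binary and $0 \leq x \leq X^{max}$, the four constraints $m \leq X^{max} y$, $m \leq x$, $m \geq x - (1 - y) X^{max}$ and $m \geq 0$ leave a single feasible value for $m$, which the code checks by brute force. The function name `product_interval` and the bound `x_max = 64.0` are hypothetical.

```python
def product_interval(x, y, x_max):
    """Feasible interval for m under the four big-M constraints:
    m <= x_max * y, m <= x, m >= x - (1 - y) * x_max, m >= 0."""
    lower = max(0.0, x - (1 - y) * x_max)
    upper = min(x, y * x_max)
    return lower, upper

x_max = 64.0  # hypothetical upper bound on the real variable x
for y in (0, 1):
    for step in range(9):
        x = step * x_max / 8
        lower, upper = product_interval(x, y, x_max)
        # The interval collapses to the single point x * y
        assert lower == upper == x * y
print("linearization is exact on the sampled grid")
```

In the model, $y$ plays the role of $\delta_{ij}$ and $x$ the role of $\pi_i^-$; a solver that accepts the quadratic form directly, as the text says of Gurobi, makes this reformulation unnecessary in practice.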
486 480 % This model is non-linear and even non-quadratic, as $F$ does not have a known
487 481 % linear or quadratic expression. We introduce $p$ FIR configurations
... ... @@ -515,7 +509,7 @@
515 509  
516 510 Two problems will be addressed using the workflow described in the next section: on the one
517 511 hand maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary
518   -silcon area (section~\ref{sec:fixed_area}) and on the second hand the dual problem of minimizing the silicon area
  512 +silicon area (section~\ref{sec:fixed_area}) and on the second hand the dual problem of minimizing the silicon area
519 513 for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the
520 514 objective function is replaced with:
521 515 \begin{align}
... ... @@ -560,8 +554,8 @@
560 554 \draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;
561 555 \draw[->] (Postproc) -- (Results) ;
562 556 \end{tikzpicture}
563   - \caption{Design workflow from the input parameters to the results {\color{red} allowing for
564   -a fully automated optimal solution search.}}
  557 + \caption{Design workflow from the input parameters to the results allowing for
  558 +a fully automated optimal solution search.}
565 559 \label{fig:workflow}
566 560 \end{figure}
567 561  
568 562  
569 563  
570 564  
... ... @@ -739,25 +733,25 @@
739 733 \centering
740 734 \begin{subfigure}{\linewidth}
741 735 \includegraphics[width=\linewidth]{images/max_500}
742   - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
  736 + \caption{Filter transfer functions for varying number of cascaded filters solving
743 737 the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).}
744 738 \label{fig:max_500_result}
745 739 \end{subfigure}
746 740  
747 741 \begin{subfigure}{\linewidth}
748 742 \includegraphics[width=\linewidth]{images/max_1000}
749   - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
  743 + \caption{Filter transfer functions for varying number of cascaded filters solving
750 744 the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).}
751 745 \label{fig:max_1000_result}
752 746 \end{subfigure}
753 747  
754 748 \begin{subfigure}{\linewidth}
755 749 \includegraphics[width=\linewidth]{images/max_1500}
756   - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
  750 + \caption{Filter transfer functions for varying number of cascaded filters solving
757 751 the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).}
758 752 \label{fig:max_1500_result}
759 753 \end{subfigure}
760   - \caption{\color{red}Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing
  754 + \caption{Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing
761 755 rejection for a given resource allocation.
762 756 The filter shape constraint (bandpass and bandstop) is shown as thick
763 757 horizontal lines on each chart.}
... ... @@ -782,8 +776,8 @@
782 776 Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication.
783 777  
784 778 \begin{table}[h!tb]
785   - \caption{Resource occupation {\color{red}following synthesis of the solutions found for
786   -the problem of maximizing rejection for a given resource allocation}. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
  779 + \caption{Resource occupation following synthesis of the solutions found for
  780 +the problem of maximizing rejection for a given resource allocation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
787 781 \label{tbl:resources_usage}
788 782 \centering
789 783 \begin{tabular}{|c|c|ccc|c|}
... ... @@ -816,7 +810,7 @@
816 810 However, a rough estimation can be made with a simple equivalence: looking at
817 811 the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,
818 812 we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon
819   -area use. With this equivalence, our 500 arbitraty units correspond to 2500 LUTs,
  813 +area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs,
820 814 1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond
821 815 to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary
822 816 unit map well to actual hardware resources. The relatively small differences can probably be explained
... ... @@ -944,7 +938,7 @@
944 938 level or even better thanks to our underestimate of the cascade rejection as the sum of the
945 939 individual filter rejection. The only exception is for the monolithic case ($n = 1$) in
946 940 MIN/100: no solution is found for a single monolithic filter reach a 100~dB rejection.
947   -Futhermore, the area of the monolithic filter is twice as big as the two cascaded filters
  941 +Furthermore, the area of the monolithic filter is twice as big as the two cascaded filters
948 942 (1131 and 1760 arbitrary units v.s 547 and 903 arbitrary units for 60 and 80~dB rejection
949 943 respectively). More generally, the more filters are cascaded, the lower the occupied area.
950 944  
951 945  
952 946  
953 947  
954 948  
... ... @@ -1001,32 +995,32 @@
1001 995 \centering
1002 996 \begin{subfigure}{\linewidth}
1003 997 \includegraphics[width=.91\linewidth]{images/min_40}
1004   - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
  998 + \caption{Filter transfer functions for varying number of cascaded filters solving
1005 999 the MIN/40 problem of minimizing resource allocation for reaching a 40~dB rejection.}
1006 1000 \label{fig:min_40}
1007 1001 \end{subfigure}
1008 1002  
1009 1003 \begin{subfigure}{\linewidth}
1010 1004 \includegraphics[width=.91\linewidth]{images/min_60}
1011   - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
  1005 + \caption{Filter transfer functions for varying number of cascaded filters solving
1012 1006 the MIN/60 problem of minimizing resource allocation for reaching a 60~dB rejection.}
1013 1007 \label{fig:min_60}
1014 1008 \end{subfigure}
1015 1009  
1016 1010 \begin{subfigure}{\linewidth}
1017 1011 \includegraphics[width=.91\linewidth]{images/min_80}
1018   - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
  1012 + \caption{Filter transfer functions for varying number of cascaded filters solving
1019 1013 the MIN/80 problem of minimizing resource allocation for reaching a 80~dB rejection.}
1020 1014 \label{fig:min_80}
1021 1015 \end{subfigure}
1022 1016  
1023 1017 \begin{subfigure}{\linewidth}
1024 1018 \includegraphics[width=.91\linewidth]{images/min_100}
1025   - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving
  1019 + \caption{Filter transfer functions for varying number of cascaded filters solving
1026 1020 the MIN/100 problem of minimizing resource allocation for reaching a 100~dB rejection.}
1027 1021 \label{fig:min_100}
1028 1022 \end{subfigure}
1029   - \caption{\color{red}Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a
  1023 + \caption{Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a
1030 1024 given rejection while minimizing resource allocation. The filter shape constraint (bandpass and
1031 1025 bandstop) is shown as thick
1032 1026 horizontal lines on each chart.}
... ... @@ -1104,7 +1098,6 @@
1104 1098 compared to 3~days in the previous section: this problem is more easily solved than the
1105 1099 previous one.
1106 1100  
1107   -{\color{red} % r1.4
1108 1101 To conclude, we compare our monolithic filters with the FIR Compiler provided by
1109 1102 Xilinx in the Vivado software suite (v.2018.2). For each experiment we use the
1110 1103 same coefficient set and we compare the resource consumption, having checked that
... ... @@ -1137,7 +1130,6 @@
1137 1130 \end{tabular}
1138 1131 \end{table}
1139 1132 \renewcommand{\arraystretch}{1}
1140   -}
1141 1133  
1142 1134 \section{Conclusion}
1143 1135