Commit 27f5f41088d7ee4bb8c9bad2091378acee58cd0e
1 parent: b0ed3be3ee
Exists in: master
Extended article.
Showing 3 changed files with 679 additions and 4 deletions
.gitignore
Makefile
| ... | ... | @@ -4,7 +4,7 @@ |
| 4 | 4 | BIB = bibtex |
| 5 | 5 | TARGET = ifcs2018 |
| 6 | 6 | |
| 7 | -all: $(TARGET)_abstract $(TARGET)_poster $(TARGET)_proceeding | |
| 7 | +all: $(TARGET)_abstract $(TARGET)_poster $(TARGET)_proceeding $(TARGET)_journal | |
| 8 | 8 | |
| 9 | 9 | view: $(TARGET) |
| 10 | 10 | evince $(TARGET).pdf |
| 11 | 11 | |
| 12 | 12 | |
| 13 | 13 | |
| ... | ... | @@ -28,11 +28,18 @@ |
| 28 | 28 | $(TEX) $@.tex |
| 29 | 29 | $(TEX) $@.tex |
| 30 | 30 | |
| 31 | +$(TARGET)_journal: $(TARGET)_journal.tex references.bib biblio.bib | |
| 32 | + $(TEX) $@.tex | |
| 33 | + $(BIB) $@ | |
| 34 | + $(TEX) $@.tex | |
| 35 | + $(TEX) $@.tex | |
| 36 | + | |
| 31 | 37 | clean: |
| 32 | - rm -f $(TARGET).aux $(TARGET).log $(TARGET).out $(TARGET).bbl $(TARGET).blg | |
| 33 | - rm -f $(TARGET)_proceeding.aux $(TARGET)_proceeding.log $(TARGET)_proceeding.out $(TARGET)_proceeding.bbl $(TARGET)_proceeding.blg | |
| 38 | + rm -f $(TARGET)_abstract.aux $(TARGET)_abstract.log $(TARGET)_abstract.out $(TARGET)_abstract.bbl $(TARGET)_abstract.blg | |
| 34 | 39 | rm -f $(TARGET)_poster.aux $(TARGET)_poster.log $(TARGET)_poster.out |
| 40 | + rm -f $(TARGET)_proceeding.aux $(TARGET)_proceeding.log $(TARGET)_proceeding.out $(TARGET)_proceeding.bbl $(TARGET)_proceeding.blg | |
| 41 | + rm -f $(TARGET)_journal.aux $(TARGET)_journal.log $(TARGET)_journal.out $(TARGET)_journal.bbl $(TARGET)_journal.blg | |
| 35 | 42 | |
| 36 | 43 | mrproper: clean |
| 37 | - rm -f $(TARGET)_abstract.pdf $(TARGET)_proceeding.pdf $(TARGET)_poster.pdf | |
| 44 | + rm -f $(TARGET)_abstract.pdf $(TARGET)_proceeding.pdf $(TARGET)_poster.pdf $(TARGET)_journal.pdf |
ifcs2018_journal.tex
| 1 | +% JMF: revise the abstract: we had included the Red Pitaya's Zynq7010 there, showing | |
| 2 | +% how to optimize performance for a finite area. Here too we ended up in the case where | |
| 3 | +% the single-FIR solution simply could not be synthesized => merge the two | |
| 4 | +% contributions for the TUFFC paper | |
| 5 | + | |
| 6 | +\documentclass[a4paper,conference]{IEEEtran/IEEEtran} | |
| 7 | +\usepackage{graphicx,color,hyperref} | |
| 8 | +\usepackage{amsfonts} | |
| 9 | +\usepackage{amsthm} | |
| 10 | +\usepackage{amssymb} | |
| 11 | +\usepackage{amsmath} | |
| 12 | +\usepackage{algorithm2e} | |
| 13 | +\usepackage{url,balance,relsize} | |
| 14 | +\usepackage[normalem]{ulem} | |
| 15 | +% correct bad hyphenation here | |
| 16 | +\hyphenation{op-tical net-works semi-conduc-tor} | |
| 17 | +\textheight=26cm | |
| 18 | +\setlength{\footskip}{30pt} | |
| 19 | +\pagenumbering{gobble} | |
| 20 | +\begin{document} | |
| 21 | +\title{Filter optimization for real time digital processing of radiofrequency signals: application | |
| 22 | +to oscillator metrology} | |
| 23 | + | |
| 24 | +\author{\IEEEauthorblockN{A. Hugeat\IEEEauthorrefmark{1}\IEEEauthorrefmark{2}, J. Bernard\IEEEauthorrefmark{2}, | |
| 25 | +G. Goavec-M\'erou\IEEEauthorrefmark{1}, | |
| 26 | +P.-Y. Bourgeois\IEEEauthorrefmark{1}, J.-M. Friedt\IEEEauthorrefmark{1}} | |
| 27 | +\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, Time \& Frequency department, Besan\c con, France } | |
| 28 | +\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Computer Science department DISC, Besan\c con, France \\ | |
| 29 | +Email: \{pyb2,jmfriedt\}@femto-st.fr} | |
| 30 | +} | |
| 31 | +\maketitle | |
| 32 | +\thispagestyle{plain} | |
| 33 | +\pagestyle{plain} | |
| 34 | +\newtheorem{definition}{Definition} | |
| 35 | + | |
| 36 | +\begin{abstract} | |
| 37 | +Software Defined Radio (SDR) brings stability, flexibility and reconfigurability to | |
| 38 | +radiofrequency signal processing. When SDR is applied to oscillator characterization in the | |
| 39 | +context of ultrastable clocks, stringent filtering requirements are set by spurious signal and | |
| 40 | +noise rejection needs. Since real time radiofrequency processing must be performed in a | |
| 41 | +Field Programmable Gate Array (FPGA) to meet timing constraints, we investigate optimization strategies | |
| 42 | +to design filters meeting rejection characteristics while limiting the hardware resources | |
| 43 | +required and keeping timing constraints within the targeted measurement bandwidths. | |
| 44 | +\end{abstract} | |
| 45 | + | |
| 46 | +\begin{IEEEkeywords} | |
| 47 | +Software Defined Radio, Mixed-Integer Linear Programming, Finite Impulse Response filter | |
| 48 | +\end{IEEEkeywords} | |
| 49 | + | |
| 50 | +\section{Digital signal processing of ultrastable clock signals} | |
| 51 | + | |
| 52 | +Analog oscillator phase noise characterization is classically performed by downconverting | |
| 53 | +the radiofrequency signal to baseband with a saturated mixer, | |
| 54 | +followed by a Fourier analysis of the beat signal to analyze phase fluctuations close to the carrier. In | |
| 55 | +a fully digital approach, the radiofrequency signal is digitized and numerically downconverted by | |
| 56 | +multiplying the samples with a local numerically controlled oscillator (Fig. \ref{schema}) \cite{rsi}. | |
| 57 | + | |
| 58 | +\begin{figure}[h!tb] | |
| 59 | +\begin{center} | |
| 60 | +\includegraphics[width=.8\linewidth]{images/schema} | |
| 61 | +\end{center} | |
| 62 | +\caption{Fully digital oscillator phase noise characterization: the Device Under Test | |
| 63 | +(DUT) signal is sampled by the radiofrequency grade Analog to Digital Converter (ADC) and | |
| 64 | +downconverted by mixing with a Numerically Controlled Oscillator (NCO). Unwanted signals | |
| 65 | +and noise aliases are rejected by a Low Pass Filter (LPF) implemented as a cascade of Finite | |
| 66 | +Impulse Response (FIR) filters. The signal is then decimated before a Fourier analysis displays | |
| 67 | +the spectral characteristics of the phase fluctuations.} | |
| 68 | +\label{schema} | |
| 69 | +\end{figure} | |
| 70 | + | |
| 71 | +As with the analog mixer, | |
| 72 | +the non-linear behavior of the downconverter introduces noise or spurious signal aliasing, as | |
| 73 | +well as the frequency sum signal in addition to the frequency difference. | |
| 74 | +These unwanted spectral components must be rejected before decimating the data stream | |
| 75 | +for the phase noise spectral characterization \cite{andrich2018high}. The filters inserted between the | |
| 76 | +downconverter | |
| 77 | +and the decimation processing blocks are core components of an oscillator characterization | |
| 78 | +system, and must reject out-of-band signals below the targeted phase noise, typically | |
| 79 | +below -170~dBc/Hz for the ultrastable oscillators we aim at characterizing. The filter blocks | |
| 80 | +use most resources of the Field Programmable Gate Array (FPGA) processing the radiofrequency | |
| 81 | +datastream: optimizing the performance of the filter while reducing the needed resources is | |
| 82 | +hence tackled in a systematic approach using optimization techniques. Most significantly, we | |
| 83 | +tackle the issue by cascading multiple Finite Impulse Response (FIR) filters with | |
| 84 | +a tunable number of coefficients and a tunable number of bits representing the coefficients and the | |
| 85 | +data being processed. | |
| 86 | + | |
| 87 | +\section{Finite impulse response filter} | |
| 88 | + | |
| 89 | +We select FIR filters for their unconditional stability and ease of design. An $N$-tap FIR filter is defined | |
| 90 | +by a set of weights $b_k$ applied to the inputs $x_k$ through a convolution to generate the | |
| 91 | +outputs $y_k$ | |
| 92 | +$$y_n=\sum_{k=0}^{N-1} b_k x_{n-k}$$ | |
| 93 | + | |
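A minimal Python sketch of the convolution above, given only as an illustration (the paper's filters are written in VHDL): the hypothetical helper `fir_direct` evaluates the N-tap sum on integer samples, producing one output per input sample as in a streaming pipeline, and is cross-checked against NumPy's `convolve`.

```python
import numpy as np

def fir_direct(b, x):
    """Direct-form FIR: y[n] = sum_{k=0}^{N-1} b[k] * x[n-k], with x[m] = 0 for m < 0."""
    y = []
    for n in range(len(x)):                  # one output per input sample
        acc = 0                              # Python int: never overflows
        for k in range(len(b)):
            if n - k >= 0:
                acc += int(b[k]) * int(x[n - k])
        y.append(acc)
    return y

b = [1, 2, 3]          # hypothetical integer (quantized) coefficients
x = [4, 0, -1, 2]      # hypothetical integer samples
y = fir_direct(b, x)
assert y == [int(v) for v in np.convolve(b, x)[: len(x)]]   # causal part of full convolution
print(y)  # prints [4, 8, 11, 0]
```

Because Python integers never overflow, this models an accumulator sized for the worst-case word growth; on an FPGA that width has to be chosen explicitly.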
| 94 | +As opposed to an implementation on a general purpose processor in which the word size is defined by the | |
| 95 | +processor architecture, implementing such a filter on an FPGA offers more degrees of freedom since | |
| 96 | +not only the coefficient values and number of taps must be defined, but also the number of bits | |
| 97 | +defining the coefficients and the sample size. For this reason, and because we consider pipeline | |
| 98 | +processing (as opposed to batch processing through First-In, First-Out (FIFO) memories) of radiofrequency | |
| 99 | +signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered: | |
| 100 | +the problem is tackled at the Very High Speed Integrated Circuit Hardware Description Language (VHDL) level. | |
| 101 | +Since latency is not an issue in an open-loop phase noise characterization instrument, the large | |
| 102 | +number of taps of the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter, | |
| 103 | +is not a concern as it would be in a closed loop system. | |
| 104 | + | |
| 105 | +The coefficients are classically expressed as floating point values. However, this binary | |
| 106 | +number representation is not efficient for fast arithmetic computation by an FPGA. Instead, | |
| 107 | +we choose to quantize these floating point values into integer values. This quantization | |
| 108 | +results in some precision loss. | |
| 109 | + | |
| 110 | +%As illustrated in Fig. \ref{float_vs_int}, we do not | |
| 111 | +%need too many coefficients or too large a sample size. If we have many coefficients but a small sample size, | |
| 112 | +%the first and last ones are equal to zero. But too large a sample size for few coefficients does not improve the quality. | |
| 113 | + | |
| 114 | +% JMF I understand neither the last sentence above nor the figure below | |
| 115 | +% AH basically I meant that taking too few bits with too many coeffs yields your figure (much better made than mine) | |
| 116 | +% and that conversely, with too many bits on too few coeffs we gain nothing; I will try to rephrase it | |
| 117 | + | |
| 118 | +%\begin{figure}[h!tb] | |
| 119 | +%\includegraphics[width=\linewidth]{images/float-vs-integer.pdf} | |
| 120 | +%\caption{Impact of the quantization resolution of the coefficients} | |
| 121 | +%\label{float_vs_int} | |
| 122 | +%\end{figure} | |
| 123 | + | |
| 124 | +\begin{figure}[h!tb] | |
| 125 | +\includegraphics[width=\linewidth]{images/demo_filtre} | |
| 126 | +\caption{Impact of the quantization resolution of the coefficients: quantizing on 6~bits | |
| 127 | +(the horizontal black lines indicate $\pm$1 least significant bit) sets | |
| 128 | +the first 30 and last 30 coefficients of the initial 128-coefficient band-pass | |
| 129 | +filter to 0 (red dots).} | |
| 130 | +\label{float_vs_int} | |
| 131 | +\end{figure} | |
| 132 | + | |
| 133 | +The tradeoff between quantization resolution and number of coefficients when considering | |
| 134 | +integer operations is not trivial. As an illustration of the relation | |
| 135 | +between the number of filter taps and quantization, Fig. \ref{float_vs_int} exhibits | |
| 136 | +a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). Upon | |
| 137 | +quantization on 6~bit integers, 60 of the 128~coefficients at the beginning and end of the | |
| 138 | +filter become null, making the large number of coefficients irrelevant: processing | |
| 139 | +resources can be saved by shrinking the filter length. This tradeoff, aimed at minimizing resources | |
| 140 | +to reach a given rejection level or maximizing out-of-band rejection for a given computational | |
| 141 | +resource, drives the investigation of cascading filters designed with varying tap resolution | |
| 142 | +and tap length, as shown in the next section. Our development strategy closely | |
| 143 | +follows the skeleton approach \cite{crookes1998environment, crookes2000design, benkrid2002towards} | |
| 144 | +in which basic blocks are defined and characterized before being assembled \cite{hide} | |
| 145 | +in a complete processing chain. In our case, assembling the filter blocks is a simpler block | |
| 146 | +combination process since we assume a single value to be processed and a single value to be | |
| 147 | +generated at each clock cycle. In the current implementation, the FIR filters | |
| 148 | +do not decimate: the decimation is assumed to be located after | |
| 149 | +the FIR cascade. | |
| 150 | + | |
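The tail-zeroing effect can be reproduced with a short sketch. It assumes a rounding scheme in which the largest coefficient maps to the largest positive C-bit integer, 2^(C-1) - 1 (the paper does not state its scaling), and uses SciPy's `firwin` with hypothetical band edges as a stand-in for the 128-tap band-pass design of the figure, so the number of null taps will not necessarily be 60. `quantize` and `trim_zero_taps` are hypothetical helper names.

```python
import numpy as np
from scipy.signal import firwin

def quantize(b, bits):
    """Round coefficients to signed integers, scaling max|b| to 2**(bits-1) - 1."""
    b = np.asarray(b, dtype=float)
    full_scale = 2 ** (bits - 1) - 1
    return np.round(b / np.max(np.abs(b)) * full_scale).astype(int)

def trim_zero_taps(q):
    """Drop leading and trailing null taps, which cost resources but contribute nothing."""
    nonzero = np.flatnonzero(q)
    if nonzero.size == 0:
        return q[:0]
    return q[nonzero[0] : nonzero[-1] + 1]

# Hypothetical 128-tap band-pass design (band edges relative to Nyquist = 1)
b = firwin(128, [0.2, 0.4], pass_zero=False)
q = quantize(b, bits=6)
short = trim_zero_taps(q)
print(f"{len(q) - len(short)} of {len(q)} taps are null after 6-bit quantization")
```

Taps smaller than half a least significant bit round to zero; trimming them from both ends shortens the filter without changing its magnitude response (only the latency changes).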
| 151 | +\section{Filter optimization} | |
| 152 | + | |
| 153 | +A basic approach for implementing the FIR filter is to compute the transfer function of | |
| 154 | +a monolithic filter: this single filter defines all coefficients with the same resolution | |
| 155 | +(number of bits) and processes data represented with their own resolution. Meeting the | |
| 156 | +filter shape requires a large number of coefficients, which is limited by the FPGA resources since | |
| 157 | +this filter must process the data stream at the radiofrequency sampling rate after the mixer. | |
| 158 | + | |
| 159 | +An optimization problem \cite{leung2004handbook} aims at improving one or many | |
| 160 | +performance criteria within a constrained resource environment. Amongst the tools | |
| 161 | +developed to meet this aim, Mixed-Integer Linear Programming (MILP) provides the framework to | |
| 162 | +formally define the stated problem and search for an optimal use of available | |
| 163 | +resources \cite{yu2007design, kodek1980design}. | |
| 164 | + | |
| 165 | +First we need to ensure that our problem is a real optimization problem. When | |
| 166 | +designing a processing function in the FPGA, we aim at meeting requirements such as | |
| 167 | +throughput, computation time or noise rejection. However, due to the limited | |
| 168 | +resources available to implement the process, such as Block RAM (BRAM), Digital Signal Processing (DSP) blocks | |
| 169 | +or Look-Up Tables (LUT), a tradeoff must generally be sought between performance and available | |
| 170 | +computational resources: optimizing some criteria within finite, limited | |
| 171 | +resources indeed matches the definition of a classical optimization problem. | |
| 172 | + | |
| 173 | +Specifically, the degrees of freedom when addressing the problem of replacing the single monolithic | |
| 174 | +FIR with a cascade of optimized filters are the number of coefficients $N_i$ of each filter $i$, | |
| 175 | +the number of bits $C_i$ representing the coefficients and the number of bits $D_i$ needed to represent | |
| 176 | +the data $x_k$ fed to each filter as provided by the acquisition or previous processing stage. | |
| 177 | +Because each FIR in the chain is fed the output of the previous stage, | |
| 178 | +the optimization of the complete processing chain within a constrained resource environment is not | |
| 179 | +trivial. The resource occupation of a FIR filter is estimated as $C_i \times N_i$. In a worst | |
| 180 | +case condition, the output of the $i$th FIR requires $C_i+D_i+\lceil\log_2(N_i)\rceil$ bits, but the | |
| 181 | +$\log$ function is avoided for its incompatibility with a linear programming description: | |
| 182 | +the simple product is instead used as an approximation of the number of gates needed to perform | |
| 183 | +the calculation. Such an occupied area estimate assumes that the number of gates scales as | |
| 184 | +the number of bits and the number of coefficients, but does not account for the detailed | |
| 185 | +implementation of the hardware. Indeed, various FPGA architectures provide different hardware | |
| 186 | +functionalities, so a final synthesis step using vendor software is needed to assess the | |
| 187 | +validity of the solution found. As an example of the limitation linked to the lack of detailed | |
| 188 | +hardware consideration, Block Random Access Memories (BRAM) used to store filter coefficients | |
| 189 | +are not shared amongst filters, and multiplications are most efficiently implemented by using | |
| 190 | +DSP blocks whose input word size is finite. DSPs are a scarce resource to be saved in a practical | |
| 191 | +implementation. Keeping a high level of abstraction for the resource occupation is nevertheless | |
| 192 | +chosen in the following discussion in order to leave enough degrees of freedom in the problem | |
| 193 | +to find original solutions: too many constraints in the initial statement of the problem leave | |
| 194 | +little room for finding an optimal solution. | |
| 195 | + | |
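For reference, the standard worst-case word growth of a FIR can be written down directly: a C-bit by D-bit two's-complement product needs at most C + D bits, and accumulating N such products adds ceil(log2 N) bits. The sketch below (hypothetical helper names and parameter triples) prints this width next to the linear area proxy N × C used in the model.

```python
import math

def fir_output_bits(c_bits: int, d_bits: int, n_taps: int) -> int:
    """Worst-case two's-complement width of y_n = sum_k b_k x_{n-k}."""
    product_bits = c_bits + d_bits               # one C-bit by D-bit product
    growth_bits = math.ceil(math.log2(n_taps))   # extra bits from summing N products
    return product_bits + growth_bits

def area_proxy(n_taps: int, c_bits: int) -> int:
    """Linear area estimate A_i = N_i * C_i used in the optimization model."""
    return n_taps * c_bits

# Hypothetical (N, C, D) triples
for n, c, d in [(16, 8, 14), (25, 16, 14), (128, 6, 14)]:
    print(f"N={n:3d} C={c:2d} D={d}: output {fir_output_bits(c, d, n)} bits, "
          f"area proxy {area_proxy(n, c)}")
```

The three triples give output widths of 26, 35 and 27 bits against area proxies of 128, 400 and 768: the logarithmic term grows slowly with N, which the linear product does not capture.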
| 196 | +\begin{figure}[h!tb] | |
| 197 | +\begin{center} | |
| 198 | +\includegraphics[width=.5\linewidth]{schema2} | |
| 199 | +\caption{Shape of the filter transmitted power $P$ as a function of frequency: | |
| 200 | +the passband BP is considered to occupy the first | |
| 201 | +40\% of the Nyquist frequency range and the stopband the last 40\%, leaving a 20\% transition | |
| 202 | +width.} | |
| 203 | +\label{rejection-shape} | |
| 204 | +\end{center} | |
| 205 | +\end{figure} | |
| 206 | + | |
| 207 | +Following these considerations, the model is expressed as: | |
| 208 | +\begin{align} | |
| 209 | + \begin{cases} | |
| 210 | + \mathcal{R}_i = \mathcal{F}(N_i, C_i)\\ | |
| 211 | + \mathcal{A}_i = N_i \times C_i\\ | |
| 212 | + \Delta_i = \Delta_{i-1} + \mathcal{R}_i | |
| 213 | + \end{cases} | |
| 214 | + \label{model-FIR} | |
| 215 | +\end{align} | |
| 216 | +In system (\ref{model-FIR}), $\mathcal{R}_i$ represents the stopband rejection as a function of $N_i$ and $C_i$, $\mathcal{A}_i$ | |
| 217 | +is a theoretical area occupation of the processing block on the FPGA as discussed earlier, and $\Delta_i$ is the total rejection at the current stage $i$. | |
| 218 | +Since the function $\mathcal{F}$ cannot be explicitly expressed, we run simulations to determine the rejection depending | |
| 219 | +on $N_i$ and $C_i$. However, selecting the right filter requires a clear definition of the rejection criterion: selecting an | |
| 220 | +incorrect criterion will lead the linear program solver to produce a solution which might not meet the user requirements. | |
| 221 | +Hence, amongst various criteria including the mean or median value of the FIR response in the stopband, as will | |
| 222 | +be illustrated later (Section~\ref{median}), we have designed | |
| 223 | +a criterion aimed at avoiding ripples in the passband and considering the maximum of the FIR spectral response in the stopband | |
| 224 | +(Fig. \ref{rejection-shape}). The passband criterion is defined as the sum of the absolute values of the spectral response | |
| 225 | +in the passband, reminiscent of a standard deviation of the spectral response: this criterion must be minimized to avoid | |
| 226 | +ripples in the passband. The maximum of the stopband transfer function must also be minimized in order to improve the filter | |
| 227 | +rejection capability. Weighting these two criteria allows designing the linear program to be solved. | |
| 228 | + | |
| 229 | +\begin{figure}[h!tb] | |
| 230 | +\includegraphics[width=\linewidth]{images/noise-rejection.pdf} | |
| 231 | +\caption{Rejection as a function of the number of coefficients and the number of bits} | |
| 232 | +\label{noise-rejection} | |
| 233 | +\end{figure} | |
| 234 | + | |
| 235 | +The objective function either maximizes the noise rejection ($\max(\Delta_{i_{\max}})$) while keeping resource | |
| 236 | +occupation below a user-defined threshold or, as discussed here, minimizes the area | |
| 237 | +needed to reach a given rejection ($\min(\mathcal{S}_q)$ in the forthcoming discussion, Eqs. (\ref{cstr_size}) | |
| 238 | +and (\ref{cstr_rejection})). The MILP solver is allowed to choose the number of successive | |
| 239 | +filters, within an upper bound. The last problem is to model the noise rejection. Since filter | |
| 240 | +noise rejection capability cannot be modeled with linear equations, a look-up table is generated | |
| 241 | +for multiple filter configurations in which the $C_i$, $D_i$ and $N_i$ parameters are varied: for each | |
| 242 | +one of these conditions, the low-pass filter rejection is stored as computed from the frequency response | |
| 243 | +of the digital filter (Fig. \ref{noise-rejection}). Various rejection criteria have been investigated, | |
| 244 | +including the mean value of the stopband response, the median value of the stopband response, or, as finally | |
| 245 | +selected, the maximum value in the stopband. An intuitive analysis of the chart of Fig. \ref{noise-rejection} | |
| 246 | +hints at an optimum | |
| 247 | +set of tap length and number of bits representing the coefficients along the line of the pyramidal | |
| 248 | +shaped rejection capability function. | |
| 249 | + | |
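A sketch of how such a look-up table could be filled, under stated assumptions: SciPy's `firwin` (Hamming window, cutoff at half the Nyquist frequency) stands in for the filter design, coefficients are quantized by scaling the largest tap to 2^(C-1) - 1, and rejection follows the maximum-in-stopband criterion over the last 40% of the band. The passband term is interpreted here as the sum of absolute dB deviations from unity gain over the first 40%, which is one reading of the criterion rather than the authors' exact definition. The function name `criteria` and the (N, C) grid are hypothetical, and D is not varied.

```python
import numpy as np
from scipy.signal import firwin, freqz

def criteria(n_taps, c_bits, n_freqs=4096):
    """Return (max stopband response in dB, passband ripple sum in dB) for one filter.

    Assumptions: Hamming-window low-pass design with cutoff at half the Nyquist
    frequency, passband = first 40 % and stopband = last 40 % of [0, Nyquist].
    """
    b = firwin(n_taps, 0.5)                               # cutoff = Nyquist / 2
    full_scale = 2 ** (c_bits - 1) - 1
    q = np.round(b / np.max(np.abs(b)) * full_scale)      # integer-valued taps
    w, h = freqz(q, worN=n_freqs)                         # w in [0, pi)
    mag = np.abs(h) / abs(np.sum(q))                      # normalize DC gain to 1
    mag_db = 20 * np.log10(np.maximum(mag, 1e-12))        # guard against log(0)
    passband = w <= 0.4 * np.pi
    stopband = w >= 0.6 * np.pi
    return mag_db[stopband].max(), np.abs(mag_db[passband]).sum()

# Hypothetical grid of (N, C) values; the real table also varies D.
table = {(n, c): criteria(n, c) for n in (9, 17, 33, 65) for c in (4, 8, 12, 16)}
for (n, c), (stop_db, ripple) in sorted(table.items()):
    print(f"N={n:3d} C={c:2d} stopband max={stop_db:7.1f} dB ripple sum={ripple:9.1f}")
```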
| 250 | +The linear programming formalism for solving the problem is well documented: an objective function is | |
| 251 | +defined which depends linearly on the parameters to be optimized. Constraints are expressed | |
| 252 | +as linear equations and inequalities, and the problem is solved using one of the available solvers, in our case GLPK~\cite{glpk}. | |
| 253 | +With the notations used in the description of system (\ref{model-FIR}), we have defined the linear problem as: | |
| 254 | +\paragraph{Variables} | |
| 255 | +\begin{align*} | |
| 256 | +x_{i,j} \in \lbrace 0,1 \rbrace & \text{ $i$ is a given filter} \\ | |
| 257 | +& \text{ $j$ is the stage} \\ | |
| 258 | +& \text{ If $x_{i,j}$ is equal to 1, filter $i$ is selected at stage $j$} \\ | |
| 259 | +\end{align*} | |
| 260 | +\paragraph{Constants} | |
| 261 | +\begin{align*} | |
| 262 | +\mathcal{F} = \lbrace F_1, \ldots, F_p \rbrace & \text{ All possible filters}\\ | |
| 263 | +& \text{ $p$ is the number of different filters} \\ | |
| 264 | +% N(i) & \text{ % Constant to let the | |
| 265 | +% number of coefficients %} \\ & \text{ | |
| 266 | +% for filter $i$}\\ | |
| 267 | +% C(i) & \text{ % Constant to let the | |
| 268 | +% number of bits of %}\\ & \text{ | |
| 269 | +% each coefficient for filter $i$}\\ | |
| 270 | +\mathcal{S}_{\max} & \text{ Total space available inside the FPGA} | |
| 271 | +\end{align*} | |
| 272 | +\paragraph{Constraints} | |
| 273 | +\begin{align} | |
| 274 | +1 \leq i \leq p & \nonumber\\ | |
| 275 | +1 \leq j \leq q & \text{ $q$ is the maximum number of filter stages} \nonumber \\ | |
| 276 | +\forall j, \mathlarger{\sum_{i}} x_{i,j} \leq 1 & \text{ At most one filter per stage} \nonumber\\ | |
| 277 | +\mathcal{S}_0 = 0 & \text{ initial occupation} \nonumber\\ | |
| 278 | +\forall j, \mathcal{S}_j = \mathcal{S}_{j-1} + \mathlarger{\sum_i (x_{i,j} \times \mathcal{A}_i)} \label{cstr_size} \\ | |
| 279 | +\mathcal{S}_j \leq \mathcal{S}_{\max}\nonumber \\ | |
| 280 | +\mathcal{N}_0 = 0 & \text{ initial rejection}\nonumber\\ | |
| 281 | +\forall j, \mathcal{N}_j = \mathcal{N}_{j-1} + \mathlarger{\sum_i (x_{i,j} \times \mathcal{R}_i)} \label{cstr_rejection} \\ | |
| 282 | +\mathcal{N}_q \geqslant 160 & \text{ a user-defined bound}\nonumber\\ | |
| 283 | +& \text{ (e.g. 160~dB here)}\nonumber\\\nonumber | |
| 284 | +\end{align} | |
| 285 | +\paragraph{Goal} | |
| 286 | +\begin{align*} | |
| 287 | +\min \mathcal{S}_q | |
| 288 | +\end{align*} | |
| 289 | + | |
| 290 | +Constraint (\ref{cstr_size}) means that the occupation at the current stage $j$ is | |
| 291 | +the occupation at the previous stage plus the occupation of the filter selected for this stage (it is possible | |
| 292 | +that no filter is selected for this stage). Similarly, constraint (\ref{cstr_rejection}) | |
| 293 | +states that the rejection at stage $j$ is the rejection at the previous stage | |
| 294 | +plus the rejection of the selected filter. | |
| 295 | + | |
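The formulation above can be prototyped without GLPK. This minimal sketch swaps in SciPy's `scipy.optimize.milp` (HiGHS backend, SciPy 1.9 or later) as the solver: the binary variables x_{i,j} are flattened into one vector, each stage picks one entry, the total area is bounded by S_max and the total rejection is bounded from below. Entry 0 of the hypothetical filter table is a pass-through with zero area and zero rejection, so requiring exactly one entry per stage is equivalent to allowing at most one real filter per stage. Since areas are non-negative, S_j never decreases with j, so bounding S_q also bounds every intermediate S_j and only totals need constraining. The (area, rejection) pairs are made-up toy values, not results from the paper.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Hypothetical filter table: area A_i = N_i * C_i and rejection R_i (dB).
# Toy values for illustration only. Entry 0 is the "no filter" option.
AREA = np.array([0, 8 * 6, 16 * 8, 32 * 10, 64 * 14], dtype=float)
REJECTION = np.array([0.0, 30.0, 45.0, 70.0, 110.0])

def solve_cascade(area, rejection, q=5, s_max=2000.0, target_db=160.0):
    """Minimize S_q subject to N_q >= target_db and S_q <= s_max."""
    p = len(area)
    n_vars = p * q                           # x[i, j] stored at index i * q + j
    cost = np.repeat(area, q)                # contribution of x[i, j] to S_q
    gain = np.repeat(rejection, q)           # contribution of x[i, j] to N_q

    one_per_stage = np.zeros((q, n_vars))
    for j in range(q):
        one_per_stage[j, j::q] = 1.0         # sum_i x[i, j] = 1

    constraints = [
        LinearConstraint(one_per_stage, lb=1, ub=1),
        LinearConstraint(cost[np.newaxis, :], lb=0, ub=s_max),
        LinearConstraint(gain[np.newaxis, :], lb=target_db, ub=np.inf),
    ]
    res = milp(cost, constraints=constraints,
               integrality=np.ones(n_vars, dtype=int), bounds=Bounds(0, 1))
    if not res.success:
        return None                          # infeasible within s_max
    x = np.round(res.x).reshape(p, q)
    chosen = [int(np.argmax(x[:, j])) for j in range(q)]   # filter index per stage
    return res.fun, chosen

print(solve_cascade(AREA, REJECTION))
```

With these toy values the cheapest cascade reaching 160 dB uses four copies of filter 1 and one of filter 2 (area 320); the order of stages is not constrained here, unlike the increasing resolutions reported in the paper.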
| 296 | +\subsection{Low passband ripple and maximum rejection criteria} | |
| 297 | + | |
| 298 | +The MILP solver provides a solution to the problem by selecting a series of small FIRs with | |
| 299 | +an increasing number of bits representing data and coefficients as well as an increasing number | |
| 300 | +of coefficients, instead of a single monolithic filter. | |
| 301 | + | |
| 302 | +\begin{figure}[h!tb] | |
| 303 | +% \includegraphics[width=\linewidth]{images/compare-fir.pdf} | |
| 304 | +\includegraphics[width=\linewidth]{images/fir-mono-vs-fir-series-noise-fixe-jmf-light.pdf} | |
| 305 | +\caption{Comparison of the rejection capability between a series of FIR and a monolithic FIR | |
| 306 | +with a cutoff frequency set at half the Nyquist frequency.} | |
| 307 | +\label{compare-fir} | |
| 308 | +\end{figure} | |
| 309 | + | |
| 310 | +Fig. \ref{compare-fir} compares the | |
| 311 | +performance of the solutions selected by the MILP solver with a monolithic FIR when selecting a cutoff | |
| 312 | +frequency of half the Nyquist frequency: a cascade of 5 FIRs and a cascade of 10 FIRs with the | |
| 313 | +same space usage are shown. The FIR cascade provides better | |
| 314 | +rejection than the monolithic FIR at the expense of a lower cutoff frequency, which remains to | |
| 315 | +be tuned or compensated for. | |
| 316 | + | |
| 317 | + | |
| 318 | +The resource occupation when synthesizing such FIR cascades on a Xilinx FPGA is summarized in Table \ref{t1}. | |
| 319 | +We have considered a set of resources representative of the hardware platform we work on, | |
| 320 | +Avnet's Zedboard featuring a Xilinx XC7Z020-CLG484-1 Zynq System on Chip (SoC). The results reported in | |
| 321 | +Table \ref{t1} emphasize that implementing the monolithic single FIR is impossible due to | |
| 322 | +insufficient hardware resources (exhausted LUT resources), while the cascades of 5 or 10 | |
| 323 | +filters fit in the available resources. However, in all cases the DSP resources are fully | |
| 324 | +used: while the design can be synthesized using Xilinx proprietary Vivado 2016.2 software, | |
| 325 | +implementing the design fails due to the excessive resource usage preventing the routing of the signals | |
| 326 | +on the FPGA. Such results emphasize, on the one hand, the improvement prospects of the optimization | |
| 327 | +procedure, which finds non-trivial solutions matching resource constraints, but on the other | |
| 328 | +hand also illustrate the limitation of a model with an abstraction layer that does not account | |
| 329 | +for the detailed architecture of the hardware. | |
| 330 | + | |
| 331 | +\begin{table}[h!tb] | |
| 332 | +\caption{Resource occupation on a Xilinx Zynq-7000 series FPGA when synthesizing the FIR cascade | |
| 333 | +identified as optimal by the MILP solver under a finite resource constraint. The last line refers | |
| 334 | +to available resources on a Zynq-7020 as found on the Zedboard.} | |
| 335 | +\begin{center} | |
| 336 | +\begin{tabular}{|c|cccc|}\hline | |
| 337 | +Number of FIRs & BRAM & LUT & DSP & Rejection (dB)\\\hline\hline | |
| 338 | +1 (monolithic) & 1 & 76183 & 220 & -162 \\ | |
| 339 | +5 & 5 & 18597 & 220 & -160 \\ | |
| 340 | +10 & 8 & 24729 & 220 & -161 \\\hline\hline | |
| 341 | +\textbf{Zynq 7020} & \textbf{420} & \textbf{53200} & \textbf{220} & \\\hline | |
| 342 | +%\begin{tabular}{|c|ccccc|}\hline | |
| 343 | +%FIR & BRAM36 & BRAM18 & LUT & DSP & rejection (dB)\\\hline\hline | |
| 344 | +%1 (monolithic) & 1 & 0 & {\color{Red}76183} & 220 & -162 \\ | |
| 345 | +%5 & 0 & 5 & {\color{Green}18597} & 220 & -160 \\ | |
| 346 | +%10 & 0 & 8 & {\color{Green}24729} & 220 & -161 \\\hline\hline | |
| 347 | +%\textbf{Zynq 7020} & \textbf{140} & \textbf{280} & \textbf{53200} & \textbf{220} & \\\hline | |
| 348 | +\end{tabular} | |
| 349 | +\end{center} | |
| 350 | +%\vspace{-0.7cm} | |
| 351 | +\label{t1} | |
| 352 | +\end{table} | |
| 353 | + | |
| 354 | +\subsection{Alternate criteria}\label{median} | |
| 355 | + | |
| 356 | +Fig. \ref{compare-fir} provides FIR solutions that match the targeted transfer | |
| 357 | +function well, namely low ripple in the passband defined as the first 40\% of the frequency | |
| 358 | +range and a rejection of 160~dB in the stopband occupying the last 40\%. To demonstrate | |
| 359 | +the need to properly select the optimization criterion, we now illustrate two cases of poor | |
| 360 | +filter shapes obtained by selecting the mean value and the median value of the rejection, | |
| 361 | +with no consideration for the ripples in the passband. The results of these optimizations | |
| 362 | +are shown in Figs. \ref{compare-mean} and \ref{compare-median}. | |
| 363 | + | |
| 364 | +\begin{figure}[h!tb] | |
| 365 | +\includegraphics[width=\linewidth]{images/fir-mono-vs-fir-series-noise-fixe-mean-light.pdf} | |
| 366 | +\caption{Comparison of the rejection capability between a series of FIR and a monolithic FIR | |
| 367 | +with a cutoff frequency set at half the Nyquist frequency, using the mean stopband rejection criterion.} | |
| 368 | +\label{compare-mean} | |
| 369 | +\end{figure} | |
| 370 | + | |
| 371 | +In the case of the mean value criterion (Fig. \ref{compare-mean}), the solution is not | |
| 372 | +acceptable since the notch at the end of the transition band compensates for an unacceptable | |
| 373 | +rise of the transfer function close to the Nyquist frequency. Applying such a filter might cause excessive | |
| 374 | +high frequency spurious components to be aliased at low frequency when decimating the signal. | |
| 375 | +Similarly, the lack of a criterion on the passband shape induces poor flatness and | |
| 376 | +a slowly decaying transfer function starting to attenuate spectral components well before the | |
| 377 | +transition band starts. Such issues are partly alleviated by replacing the mean rejection value with | |
| 378 | +the median rejection value (Fig. \ref{compare-median}), but solutions remain unacceptable for | |
| 379 | +the reasons stated previously and much poorer than those found with the maximum rejection criterion | |
| 380 | +selected earlier (Fig. \ref{compare-fir}). | |
| 381 | + | |
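A toy numeric illustration of why averaging criteria can be misled, using made-up dB values rather than data from the paper: a stopband response with a deep notch near the transition band and a rise toward Nyquist is compared with a flat -150 dB response under the three criteria discussed above.

```python
import numpy as np

# Made-up stopband responses in dB, ordered from the stopband edge to Nyquist
notched = np.array([-220.0, -190.0, -160.0, -130.0, -100.0])  # deep notch, rising tail
flat = np.full(5, -150.0)                                     # uniform -150 dB floor

for name, resp in (("notched", notched), ("flat", flat)):
    # Lower is better for all three criteria (less transmitted power)
    print(f"{name:8s} mean={resp.mean():7.1f} median={np.median(resp):7.1f} max={resp.max():7.1f}")
```

The mean and median both rank the notched response ahead (-160 dB against -150 dB), whereas the maximum ranks it behind (-100 dB against -150 dB), exposing the leak close to the Nyquist frequency.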
| 382 | +\begin{figure}[h!tb] | |
| 383 | +\includegraphics[width=\linewidth]{images/fir-mono-vs-fir-series-noise-fixe-median-light.pdf} | |
| 384 | +\caption{Comparison of the rejection capability between a series of FIR and a monolithic FIR | |
| 385 | +with a cutoff frequency set at half the Nyquist frequency, using the median stopband rejection criterion.} | |
| 386 | +\label{compare-median} | |
| 387 | +\end{figure} | |
| 388 | + | |
| 389 | +\section{Filter coefficient selection} | |
| 390 | + | |
| 391 | +The coefficients of a single monolithic filter are computed as the impulse response | |
| 392 | +of the filter transfer function, and practically approximated by a multitude of methods | |
| 393 | +including least squares optimization (Matlab's {\tt firls} function) and Hamming or Kaiser windowing | |
| 394 | +(Matlab's {\tt fir1} function). | |
| 395 | + | |
| 396 | +\begin{figure}[h!tb] | |
| 397 | +\includegraphics[width=\linewidth]{images/fir1-vs-firls} | |
| 398 | +\caption{Evolution of the rejection capability of least-squares optimized filters and Hamming | |
| 399 | +FIR filters as a function of the number of coefficients, for floating point numbers and 8-bit | |
| 400 | +encoded integers.} | |
| 401 | +\label{2} | |
| 402 | +\end{figure} | |
| 403 | + | |
| 404 | +Cascading filters opens a new optimization opportunity by | |
| 405 | +selecting various coefficient sets depending on the number of coefficients. Fig. \ref{2} | |
| 406 | +illustrates that for a number of coefficients ranging from 8 to 47, {\tt fir1} provides better | |
| 407 | +rejection than {\tt firls}: since the linear solver increases the number of coefficients along | |
| 408 | +the processing chain, the type of selected filter design also changes with the number of coefficients | |
| 409 | +and evolves along the processing chain. | |
| 410 | + | |
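A comparable study can be sketched in Python with SciPy, assuming `scipy.signal.firwin` (Hamming window, the same default as Matlab's fir1) as the analogue of fir1 and `scipy.signal.firls` as the analogue of Matlab's firls. These are not identical implementations and the band layout (passband up to 40% of Nyquist, stopband from 60%) is an assumption, so the numbers will differ from the figure. `firls` requires an odd number of taps, so only odd lengths are evaluated; the 8-bit columns reuse the max-tap scaling assumption.

```python
import numpy as np
from scipy.signal import firls, firwin, freqz

def stopband_max_db(b, bits=None, stop_edge=0.6):
    """Max response (dB, unity DC gain) above stop_edge * Nyquist; optional quantization."""
    b = np.asarray(b, dtype=float)
    if bits is not None:
        b = np.round(b / np.max(np.abs(b)) * (2 ** (bits - 1) - 1))
    w, h = freqz(b, worN=4096)
    mag = np.abs(h[w >= stop_edge * np.pi]) / abs(np.sum(b))
    return float(20 * np.log10(np.maximum(mag, 1e-12)).max())

print("taps  window  lsq  window(8b)  lsq(8b)")
for n_taps in range(9, 48, 6):                  # odd lengths only: firls requires them
    window_fir = firwin(n_taps, 0.5, window="hamming")             # fir1-like
    lsq_fir = firls(n_taps, [0, 0.4, 0.6, 1], [1, 1, 0, 0], fs=2)   # firls-like
    print(n_taps, *(round(stopband_max_db(b, bits), 1)
                    for bits in (None, 8) for b in (window_fir, lsq_fir)))
```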
| 411 | +\section{Conclusion} | |
| 412 | + | |
| 413 | +We address the optimization problem of designing a low-pass filter chain in a Field Programmable Gate | |
| 414 | +Array for improved noise rejection under constrained resource occupation, as needed for | |
| 415 | +real-time processing of radiofrequency signals when characterizing the spectral phase noise | |
| 416 | +of stable oscillators. The flexibility of the digital approach makes the result | |
| 417 | +well suited for closing the loop, i.e., using the measurement output in a feedback loop | |
| 418 | +controlling clocks, e.g. a quartz-stabilized high-performance clock whose long-term behavior | |
| 419 | +is controlled by a non-piezoelectric resonator (sapphire resonator, microwave or optical | |
| 420 | +atomic transition). | |
| 421 | + | |
| 422 | +\section*{Acknowledgement} | |
| 423 | + | |
| 424 | +This work is supported by the ANR Programme d'Investissement d'Avenir in | |
| 425 | +progress at the Time and Frequency Departments of the FEMTO-ST Institute | |
| 426 | +(Oscillator IMP, First-TF and Refimeve+), and by R\'egion de Franche-Comt\'e. | |
| 427 | +The authors would like to thank E. Rubiola, F. Vernotte, and G. Cabodevila | |
| 428 | +for support and fruitful discussions. | |
| 429 | + | |
| 430 | +\bibliographystyle{IEEEtran} | |
| 431 | +\balance | |
| 432 | +\bibliography{references,biblio} | |
| 433 | +\end{document} | |
| 434 | + | |
| 435 | + \section{Scheduling context} | |
| 436 | + In this part, we give definitions of terms related to the field of scheduling, | |
| 437 | + and we show that the problem studied here is very close to a scheduling problem. Consequently, | |
| 438 | + we can go further than the previously reviewed work and attempt scheduling | |
| 439 | + and optimization approaches. | |
| 440 | + | |
| 441 | + \subsection{Vocabulary} | |
| 442 | + First of all, we must define what a scheduling problem is. Two definitions | |
| 443 | + are important here. The first one is proposed by Legrand and Robert in their book \cite{def1-ordo}: | |
| 444 | + \begin{definition} | |
| 445 | + \label{def-ordo1} | |
| 446 | + A schedule of a task system $G\ =\ (V,\ E,\ w)$ is a function $\sigma$: | |
| 447 | + $V \rightarrow \mathbb{N}$ such that $\sigma(u) + w(u) \leq \sigma(v)$ for every edge $(u,\ v) \in E$. | |
| 448 | + \end{definition} | |
| 449 | + | |
| 450 | + Put more simply, the set $V$ represents the tasks to execute, the set $E$ represents the dependencies | |
| 451 | + between tasks, and $w$ the execution times of the tasks. The function $\sigma$ therefore gives the start time of | |
| 452 | + each task. The definition states that if a task $v$ depends on a task $u$, then | |
| 453 | + the start time of $v$ is greater than or equal to the start time of $u$ plus its | |
| 454 | + execution time. | |
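This definition can be checked mechanically. The Python sketch below (hypothetical helper names) verifies that a start-time function $\sigma$ respects every edge, and builds the earliest such schedule by visiting the tasks in topological order (Kahn's algorithm), assuming unlimited resources so that a task starts as soon as all its predecessors have finished.

```python
from collections import defaultdict, deque


def is_valid_schedule(edges, w, sigma):
    """Check sigma(u) + w(u) <= sigma(v) for every edge (u, v)."""
    return all(sigma[u] + w[u] <= sigma[v] for u, v in edges)


def asap_schedule(tasks, edges, w):
    """Earliest start times: each task starts when all its predecessors finish."""
    succs = defaultdict(list)
    indeg = {t: 0 for t in tasks}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    sigma = {t: 0 for t in tasks}
    ready = deque(t for t in tasks if indeg[t] == 0)
    done = 0
    while ready:                      # Kahn's topological traversal
        u = ready.popleft()
        done += 1
        for v in succs[u]:
            sigma[v] = max(sigma[v], sigma[u] + w[u])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if done != len(tasks):
        raise ValueError("the task graph contains a cycle")
    return sigma


tasks = ["a", "b", "c"]
edges = [("a", "c"), ("b", "c")]      # c needs the results of a and b
w = {"a": 2, "b": 3, "c": 1}
sigma = asap_schedule(tasks, edges, w)
print(sigma, is_valid_schedule(edges, w, sigma))
```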
| 455 | + | |
| 456 | + Another important definition, proposed by Leung et al. \cite{def2-ordo}, is: | |
| 457 | + \begin{definition} | |
| 458 | + \label{def-ordo2} | |
| 459 | + Scheduling deals with the allocation of scarce resources to activities with | |
| 460 | + the objective of optimizing one or more performance criteria. | |
| 461 | + \end{definition} | |
| 462 | + | |
| 463 | + This definition is more generic, but it interests us more than Definition \ref{def-ordo1}. | |
| 464 | + Indeed, the only part of the first definition that matters to us is that task precedence is respected; | |
| 465 | + in practice, the start times themselves are of little interest to us. | |
| 466 | + | |
| 467 | + Definition \ref{def-ordo2}, on the other hand, will be at the heart of this project. To see why, | |
| 468 | + we must first determine which type of scheduling problem we are dealing with and which | |
| 469 | + methods can be applied. | |
| 470 | + | |
| 471 | + Scheduling problems can be classified into different categories: | |
| 472 | + \begin{itemize} | |
| 473 | + \item Independent tasks: in this category of problems, the tasks are completely independent | |
| 474 | + of one another. This is not the best fit for our case. | |
| 475 | + \item Task graph: Definition \ref{def-ordo1} describes this category. Most of the time, | |
| 476 | + tasks are represented by a DAG. This category is very close to our case, since we also have to execute | |
| 477 | + tasks with a number of dependencies. We can even say that in some cases | |
| 478 | + we have in-trees, i.e., many input tasks converging towards a single | |
| 479 | + final task. | |
| 480 | + \item Workflow: this category is a subcategory of task graphs, in the sense that | |
| 481 | + it is a task graph that is repeated many times. This is exactly the type of problem | |
| 482 | + we address here. | |
| 483 | + \end{itemize} | |
| 484 | + | |
| 485 | + Of course, this list is not exhaustive, and many other classifications and sub-classifications | |
| 486 | + of these problems exist. We have only mentioned the most common categories here. | |
| 487 | + | |
| 488 | + Another point to define is the optimization criterion. Here again, there is a large number of | |
| 489 | + possible criteria; we describe the main ones: | |
| 490 | + \begin{itemize} | |
| 491 | + \item Total completion time (makespan): this is one of the most common optimization criteria. | |
| 492 | + It consists in minimizing the completion time of the last task among all the | |
| 493 | + tasks to execute. The goal of this optimization is thus to find the optimal schedule that allows | |
| 494 | + execution to finish as early as possible. | |
| 495 | + \item Sum of completion times (flowtime): the completion times of all tasks are summed, | |
| 496 | + and this sum is minimized. | |
| 497 | + \item Throughput: this criterion aims at maximizing the data processing throughput. | |
| 498 | + \end{itemize} | |
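For a given schedule, the first two criteria follow directly from the completion times $\sigma(t) + w(t)$. The sketch below assumes all tasks are released at time 0; the throughput function further assumes a pipelined workflow with one dedicated resource per task and no replication, so that the slowest task sets the period. Function names are hypothetical.

```python
def makespan(sigma, w):
    """Completion time of the last task: max over tasks of sigma(t) + w(t)."""
    return max(sigma[t] + w[t] for t in sigma)


def flowtime(sigma, w):
    """Sum of completion times (all tasks assumed released at time 0)."""
    return sum(sigma[t] + w[t] for t in sigma)


def pipeline_throughput(w):
    """Steady-state items per time unit of a repeated workflow, assuming one
    dedicated resource per task and no replication (slowest task = period)."""
    return 1.0 / max(w.values())


w = {"a": 2, "b": 3, "c": 1}
sigma = {"a": 0, "b": 0, "c": 3}
print(makespan(sigma, w), flowtime(sigma, w), pipeline_throughput(w))
```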
| 499 | + | |
| 500 | + In addition, several optimization criteria may be needed at once; this is called multi-criteria | |
| 501 | + optimization. Of course, this makes the problem even more complex, since the best solution for one | |
| 502 | + criterion may be very poor for another. In this case, the goal is to find a solution that | |
| 503 | + achieves the best trade-off between all criteria. | |
| 504 | + | |
| 505 | + \subsection{Problem formalization} | |
| 506 | + \label{formalisation} | |
| 507 | + Now that we have introduced the scheduling vocabulary, we can try to characterize | |
| 508 | + our problem formally. We take up the constraints stated in Section \ref{def-contraintes} | |
| 509 | + and try to formalize them as precisely as possible. | |
| 510 | + | |
| 511 | + As stated above, a task is a processing block. Each task $i$ has a set of parameters | |
| 512 | + that we denote $\mathcal{P}_{i}$. This set $\mathcal{P}_i$ is specific to each task and varies from one | |
| 513 | + task to another. We will come back later to the parameters that can make up this set. | |
| 514 | + | |
| 515 | + Besides this set $\mathcal{P}_i$, each task has common parameters: | |
| 516 | + \begin{itemize} | |
| 517 | + \item Task duration: as mentioned before, in an FPGA time is counted in clock cycles. | |
| 518 | + Moreover, the blocks are always active, and some of them can even read an input and output a result at every clock cycle. | |
| 519 | + Hence the duration of a task cannot be the time between the input of one datum and the output of another. We define the | |
| 520 | + duration as the processing time of a single datum, i.e., the difference between the output time of a datum | |
| 521 | + and its input time. We denote this duration $\delta_i$. % I should name it w, as in definition def-ordo1 | |
| 522 | + \item Precision: the precision of a datum is its number of significant bits. Indeed, along the processing chain, | |
| 523 | + precisions may vary. We thus denote the input precision of a task $i$ as $\pi_i^-$ and its output precision as $\pi_i^+$. | |
| 524 | + \item Input (or output) flow frequency: this frequency is the rate at which data arrive (resp. leave). | |
| 525 | + Frequencies vary from task to task. Indeed, some blocks slow down the flow, which is why we distinguish the | |
| 526 | + input flow frequency from the output frequency. We denote the input flow frequency $f_i^-$ and the output frequency $f_i^+$. | |
| 527 | + \item Input (or output) data quantity: this is the amount of data the block expects to process (resp. | |
| 528 | + can produce). Tasks may have to process large volumes of data and output only part of them. Here | |
| 529 | + again, input and output must be distinguished. We denote the incoming data quantity $q_i^-$ | |
| 530 | + and the outgoing data quantity $q_i^+$ for a task $i$. | |
| 531 | + \item Input (or output) throughput: this parameter is the data throughput that the task can process or that it | |
| 532 | + delivers at its output. It is simply the product of the two previous parameters. We thus define the input throughput of | |
| 533 | + task $i$ as $d_i^-\ =\ q_i^-\ *\ f_i^-$ and the output throughput as $d_i^+\ =\ q_i^+\ *\ f_i^+$. | |
| 534 | + \item Task area: since the area of an FPGA is limited, this parameter expresses the space the task occupies on the chip. | |
| 535 | + We denote this area $\mathcal{A}_i$. | |
| 536 | + \item Predecessors and successors of a task: they give the tasks required before task $i$ can be processed, | |
| 537 | + as well as the tasks that depend on it. These sets are denoted $\Gamma _i ^-$ and $ \Gamma _i ^+$. \\ | |
| 538 | + %TODO Is this really a parameter? | |
| 539 | + \end{itemize} | |
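The common parameters can be grouped into a single record per task. The following Python dataclass is only an illustrative encoding (the class and field names are ours); the throughputs are derived from $q$ and $f$ as defined above rather than stored, and the example values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Common parameters of a task i (illustrative field names)."""
    delta: int                 # processing time of one datum, in clock cycles
    pi_in: int                 # input precision pi_i^- (bits)
    pi_out: int                # output precision pi_i^+ (bits)
    f_in: float                # input flow frequency f_i^-
    f_out: float               # output flow frequency f_i^+
    q_in: float                # input data quantity q_i^-
    q_out: float               # output data quantity q_i^+
    area: float = 0.0          # A_i, area occupied on the FPGA
    preds: set = field(default_factory=set)     # Gamma_i^-
    succs: set = field(default_factory=set)     # Gamma_i^+
    params: dict = field(default_factory=dict)  # task-specific set P_i

    @property
    def d_in(self):
        return self.q_in * self.f_in     # d_i^- = q_i^- * f_i^-

    @property
    def d_out(self):
        return self.q_out * self.f_out   # d_i^+ = q_i^+ * f_i^+


dec = Task(delta=4, pi_in=16, pi_out=16, f_in=100e6, f_out=100e6,
           q_in=1.0, q_out=0.25, params={"N": 4})
print(dec.d_in, dec.d_out)
```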
| 540 | + | |
| 541 | + These common parameters are strongly linked to the elements of $\mathcal{P}_i$. Here are some of the relations | |
| 542 | + we have identified: | |
| 543 | + \begin{itemize} | |
| 544 | + \item $ \delta _i \ = \ \mathcal{F}_{\delta}(\pi_i^-,\ \pi_i^+,\ d_i^-,\ d_i^+,\ \mathcal{P}_i) $ gives the duration | |
| 545 | + of the task as a function of the required precision, the throughput, and the internal parameters. | |
| 546 | + \item $ \pi _i ^+ \ = \ \mathcal{F}_{p}(\pi_i^-,\ \mathcal{P}_i) $: the function $\mathcal{F}_p$ gives the output precision as a function of the input precision | |
| 547 | + and of the internal parameters of the task. | |
| 548 | + \item $d_i^+\ =\ \mathcal{F}_d(d_i^-, \mathcal{P}_i)$: the function $\mathcal{F}_d$ gives the output throughput of the task as a function of the input | |
| 549 | + throughput and of the internal variables of the task. | |
| 550 | + \item $\mathcal{A}_i\ =\ \mathcal{F}_A(\pi_i^-,\ \pi_i^+,\ d_i^-,\ d_i^+, \mathcal{P}_i)$: the function $\mathcal{F}_A$ gives the area of the task. | |
| 551 | + \end{itemize} | |
| 552 | + For now, we are not able to give a general definition of these functions. However, | |
| 553 | + on a few simple examples (cf. \ref{def-contraintes}), we can provide an estimate of these functions. | |
| 554 | + | |
| 555 | + Now that all useful notations have been introduced, we state the constraints of our problem. Given | |
| 556 | + a DAG $G(V,\ E)$, the following relations hold for every edge $(i, j)\ \in\ E$: | |
| 557 | + | |
| 558 | + \paragraph{Precision constraint:} | |
| 559 | + This inequality expresses the precision constraint from one task to the next: | |
| 560 | + \begin{align*} | |
| 561 | + \pi _i ^+ \geq \pi _j ^- | |
| 562 | + \end{align*} | |
| 563 | + | |
| 564 | + \paragraph{Throughput constraint:} | |
| 565 | + This equation expresses the throughput constraint from one task to the next: | |
| 566 | + \begin{align*} | |
| 567 | + d _i ^+ = q _j ^- * (f_i + (1 / s_j) ) & \text{ where } s_j \text{ is a positive delay value of task } j | |
| 568 | + \end{align*} | |
| 569 | + | |
| 570 | + \paragraph{Synchronization constraint:} | |
| 571 | + This constraint requires that, if at some point of the processing the DAG splits into several parallel branches | |
| 572 | + that join again later, the sum of the latencies be the same on every branch. | |
| 573 | + More formally, if there exist several disjoint paths from task $s$ to task $f$, then: | |
| 574 | + \begin{align*} | |
| 575 | + \forall \text{ path } \mathcal{C}_1 = (s, \dots, f), | |
| 576 | + \forall \text{ path } \mathcal{C}_2 = (s, \dots, f) | |
| 577 | + \text{ such that } \mathcal{C}_1 \neq \mathcal{C}_2 | |
| 578 | + \Rightarrow | |
| 579 | + \sum_{i \in \mathcal{C}_1} \delta_i = \sum_{i \in \mathcal{C}_2} \delta_i | |
| 580 | + \end{align*} | |
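The synchronization constraint can be tested by enumerating the paths between the split task $s$ and the join task $f$ and comparing their latency sums. This depth-first sketch (hypothetical names, invented latencies) is exponential in the number of paths, which is acceptable only for small graphs; for two branches, the difference between the largest and smallest sums is the number of delay cycles that would have to be added on the shorter branch.

```python
def path_latencies(succs, delta, s, f):
    """Sum of delta_i along every path from s to f in a DAG (endpoints included)."""
    sums = []

    def walk(node, acc):
        acc += delta[node]
        if node == f:
            sums.append(acc)
            return
        for nxt in succs.get(node, ()):
            walk(nxt, acc)

    walk(s, 0)
    return sums


def is_synchronized(succs, delta, s, f):
    """True when all s -> f paths have the same total latency."""
    return len(set(path_latencies(succs, delta, s, f))) <= 1


succs = {"s": ["a", "b"], "a": ["f"], "b": ["f"]}
delta = {"s": 1, "a": 3, "b": 2, "f": 1}
lat = path_latencies(succs, delta, "s", "f")
print(lat, is_synchronized(succs, delta, "s", "f"))
print("delay to add on the shorter branch:", max(lat) - min(lat))
```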
| 581 | + | |
| 582 | + \paragraph{Area constraint:} | |
| 583 | + This inequality expresses the area constraint of the FPGA. The maximum area of the FPGA chip is denoted $\mathcal{A}_{FPGA}$: | |
| 584 | + \begin{align*} | |
| 585 | + \sum_{i \in V} \mathcal{A}_i \leq \mathcal{A}_{FPGA} | |
| 586 | + \end{align*} | |
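The precision and area constraints are simple to check once each task has been annotated. In the sketch below, the function names, task names, bit widths and area figures (in arbitrary units) are invented for illustration.

```python
def precision_violations(edges, pi_out, pi_in):
    """Edges (i, j) breaking the constraint pi_i^+ >= pi_j^-."""
    return [(i, j) for i, j in edges if pi_out[i] < pi_in[j]]


def fits_on_fpga(area, a_fpga):
    """Area constraint: the sum of A_i over all tasks must not exceed A_FPGA."""
    return sum(area.values()) <= a_fpga


edges = [("shift", "fir")]
pi_out = {"shift": 12}   # the shifter delivers 12 bits
pi_in = {"fir": 16}      # the filter expects 16 bits
print(precision_violations(edges, pi_out, pi_in))
print(fits_on_fpga({"shift": 10, "fir": 300}, a_fpga=400))
```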
| 587 | + | |
| 588 | + \subsection{Modeling examples} | |
| 589 | + \label{exemples-modeles} | |
| 590 | + We now take a few simple processing blocks to illustrate our model. | |
| 591 | + For all our examples, we assume an input throughput of 200~MB/s with a precision of 16 bits. | |
| 592 | + | |
| 593 | + Let us first consider a decimation block. The purpose of this block is to slow down the flow by keeping | |
| 594 | + only some of the data, at regular intervals. This interval is called the decimation factor and is denoted $N$. | |
| 595 | + | |
| 596 | + Hence, according to our model: | |
| 597 | + \begin{itemize} | |
| 598 | + \item $N \in \mathcal{P}_i$ | |
| 599 | + %TODO N or 1? | |
| 600 | + \item $\delta _i = N$ clock cycles | |
| 601 | + \item $\pi _i ^+ = \pi _i ^- = 16$ bits | |
| 602 | + \item $f _i ^+ = f _i ^-$ | |
| 603 | + \item $q _i ^+ = q _i ^- / N$ | |
| 604 | + \item $d _i ^+ = q _i ^+ * f _i ^+ = d _i ^- / N$ | |
| 605 | + \item $|\Gamma _i ^+| = |\Gamma _i ^-| = 1$\\ | |
| 606 | + %TODO I do not know how to determine the area... | |
| 607 | + \end{itemize} | |
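The decimation model can be turned into a small transfer function on streams. The sketch below uses the definition $d = q * f$ and, to match the 200~MB/s input, assumes 16-bit (2-byte) samples arriving once per period at 100~MHz; this frequency is an assumption, as are the names {\tt Stream} and {\tt decimator}.

```python
from typing import NamedTuple, Tuple


class Stream(NamedTuple):
    precision: int    # pi, number of significant bits
    f: float          # flow frequency (Hz)
    q: float          # data quantity per period

    @property
    def d(self):
        return self.q * self.f    # throughput d = q * f


def decimator(s: Stream, n: int) -> Tuple[Stream, int]:
    """Keep one datum out of n; return the output stream and delta_i (clock cycles)."""
    if n < 1:
        raise ValueError("decimation factor must be >= 1")
    out = s._replace(q=s.q / n)   # q+ = q- / N; precision and f unchanged
    return out, n                 # delta_i = N, as in the model (see its TODO)


# Assumed input: 16-bit samples at 100 MHz, i.e. 2 bytes x 100e6/s = 200 MB/s
src = Stream(precision=16, f=100e6, q=1.0)
out, delta = decimator(src, 4)
print(out.precision, out.d, delta)
```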
| 608 | + | |
| 609 | + Another interesting example is the splitter. It is also a very | |
| 610 | + simple block, which duplicates a flow. The number of outputs to create can be given as a parameter, denoted | |
| 611 | + %TODO not very inspired... | |
| 612 | + $X$. Our model gives: | |
| 613 | + \begin{itemize} | |
| 614 | + \item $X \in \mathcal{P}_i$ | |
| 615 | + \item $\delta _i = 1$ clock cycle | |
| 616 | + \item $\pi _i ^+ = \pi _i ^- = 16$ bits | |
| 617 | + \item $f _i ^+ = f _i ^-$ | |
| 618 | + \item $q _i ^+ = q _i ^-$ | |
| 619 | + \item $d _i ^+ = d _i ^-$ | |
| 620 | + \item $|\Gamma _i ^-| = 1$ | |
| 621 | + \item $|\Gamma _i ^+| = X$\\ | |
| 622 | + \end{itemize} | |
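Under the same conventions, a splitter simply returns copies of its input stream. The {\tt Stream} record is redefined here so that the sketch is self-contained; all names are hypothetical.

```python
from typing import List, NamedTuple


class Stream(NamedTuple):
    precision: int    # pi, number of significant bits
    f: float          # flow frequency (Hz)
    q: float          # data quantity per period


def splitter(s: Stream, x: int) -> List[Stream]:
    """Duplicate one input stream into x outputs. Each output keeps pi, f and q
    (hence d); the block itself takes delta_i = 1 clock cycle."""
    if x < 1:
        raise ValueError("a splitter needs at least one output")
    return [s] * x


outs = splitter(Stream(16, 100e6, 1.0), 3)
print(len(outs), outs[0])
```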
| 623 | + | |
| 624 | + The next example is the shifter. This block reduces the number of bits of the | |
| 625 | + data in order to speed up processing in the following blocks. The number of bits to shift can be given as a parameter, | |
| 626 | + denoted $S$. Our model gives: | |
| 627 | + \begin{itemize} | |
| 628 | + \item $S \in \mathcal{P}_i$ | |
| 629 | + \item $\delta _i = 1$ clock cycle | |
| 630 | + \item $\pi _i ^+ = \pi _i ^- - S$ | |
| 631 | + \item $f _i ^+ = f _i ^-$ | |
| 632 | + \item $q _i ^+ = q _i ^-$ | |
| 633 | + \item $d _i ^+ = d _i ^-$ | |
| 634 | + \item $|\Gamma _i ^+| = |\Gamma _i ^-| = 1$\\ | |
| 635 | + \end{itemize} | |
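At the bit level, the shifter can be sketched as an arithmetic right shift that drops the $S$ least significant bits; in Python, {\tt >>} on a negative integer rounds towards minus infinity, like a two's complement arithmetic shift. The helper names are hypothetical.

```python
def shift_sample(value: int, s: int) -> int:
    """Drop the s least significant bits of a signed sample (arithmetic right shift)."""
    return value >> s


def shifter_precision(pi_in: int, s: int) -> int:
    """Output precision pi_i^+ = pi_i^- - S of the shifter model."""
    if not 0 <= s < pi_in:
        raise ValueError("shift must be in [0, pi_in)")
    return pi_in - s


# A 16-bit sample shifted by 4 fits in 12 signed bits
print(shifter_precision(16, 4), shift_sample(-32768, 4), shift_sample(1000, 4))
```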
| 636 | + | |
| 637 | + We consider a last, slightly more complex example: the decimating FIR filter. This block | |
| 638 | + has many internal parameters. A number of stages $E$ can be defined, representing the number | |
| 639 | + of iterations to perform before processing stops. To perform the filtering, the block must be given a set | |
| 640 | + of coefficients $C$, and these coefficients consequently have their own precision $\pi _C$. Finally, the last | |
| 641 | + parameter is the decimation factor $N$. Applying our model, we obtain: | |
| 642 | + \begin{itemize} | |
| 643 | + \item $E \in \mathcal{P}_i$ | |
| 644 | + \item $C \in \mathcal{P}_i$ | |
| 645 | + \item $\pi _C \in \mathcal{P}_i$ | |
| 646 | + \item $N \in \mathcal{P}_i$ | |
| 647 | + \item $\delta _i = E * |C| * q_i^-$ clock cycles %Too simplistic | |
| 648 | + \item $\pi _i ^+ = \pi _i ^- + \pi _C + \lceil \log_2 |C| \rceil$ | |
| 649 | + \item $f _i ^+ = f _i ^-$ | |
| 650 | + \item $q _i ^+ = q _i ^- / N$ | |
| 651 | + \item $d _i ^+ = q _i ^+ * f _i ^+ = d _i ^- / N$ | |
| 652 | + \item $|\Gamma _i ^+| = |\Gamma _i ^-| = 1$\\ | |
| 653 | + \end{itemize} | |
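A common way to size a full-precision FIR output is to add the coefficient width to the input width, plus $\lceil \log_2 |C| \rceil$ guard bits for the accumulation of $|C|$ products. The sketch below checks this bound on a worst-case input with a plain integer decimating FIR (convolution, then one output kept out of $N$); the stage count $E$ is not modeled, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def fir_output_bits(pi_in: int, pi_c: int, n_coeffs: int) -> int:
    """Signed width able to hold any sum of n_coeffs products of pi_in x pi_c bits."""
    return pi_in + pi_c + math.ceil(math.log2(n_coeffs))


def decimating_fir(x: Sequence[int], c: Sequence[int], n: int) -> List[int]:
    """Integer direct-form FIR keeping one output out of n (only those are computed)."""
    y = []
    for m in range(0, len(x), n):
        acc = 0
        for k, ck in enumerate(c):
            if m - k >= 0:            # samples before the start of x are taken as 0
                acc += ck * x[m - k]
        y.append(acc)
    return y


# Worst case: most negative 16-bit input and 8-bit coefficients, 16 taps, N = 4
x = [-2 ** 15] * 64
c = [-2 ** 7] * 16
y = decimating_fir(x, c, 4)
bits = fir_output_bits(16, 8, len(c))
print(bits, max(abs(v) for v in y), 2 ** (bits - 1))
```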
| 654 | + | |
| 655 | + These examples are only provisional models; to validate them, they will have to be | |
| 656 | + compared against simulations. | |
| 657 | + | |
| 658 | + | |
| 659 | +Although the articles on skeletons, \cite{gwen-cogen}, \cite{skeleton} and \cite{hide}, gave us hints towards a possible | |
| 660 | + model, they were still too focused on the spatial optimization of the blocks. We therefore drew on this work | |
| 661 | + to propose our model, while abstracting away low-level optimizations. | |