From 7c951bd35e5df9b0ee5b1c9c8683eaa0ea9478e6 Mon Sep 17 00:00:00 2001 From: Arthur HUGEAT Date: Tue, 10 Sep 2019 10:33:04 +0200 Subject: [PATCH] Typo + text in black. --- ifcs2018_journal.tex | 110 ++++++++++++++++++++++++--------------------------- 1 file changed, 51 insertions(+), 59 deletions(-) diff --git a/ifcs2018_journal.tex b/ifcs2018_journal.tex index 926c294..1c45949 100644 --- a/ifcs2018_journal.tex +++ b/ifcs2018_journal.tex @@ -120,10 +120,10 @@ processing (as opposed to First-In, First-Out FIFO memory batch processing) of r signals, High Level Synthesis (HLS) languages \cite{kasbah2008multigrid} are not considered but the problem is tackled at the Very-high-speed-integrated-circuit Hardware Description Language (VHDL) level. -{\color{red}Since latency is not an issue in a openloop phase noise characterization instrument, +Since latency is not an issue in an open-loop phase noise characterization instrument, the large numbre of taps in the FIR, as opposed to the shorter Infinite Impulse Response (IIR) filter, -is not considered as an issue as would be in a closed loop system.} % r2.4 +is not considered an issue, as it would be in a closed-loop system. The coefficients are classically expressed as floating point values. However, this binary number representation is not efficient for fast arithmetic computation by an FPGA. Instead, @@ -144,9 +144,9 @@ integer operations is not trivial. As an illustration of the issue related to th relation between number of fiter taps and quantization, Fig. \ref{float_vs_int} exhibits a 128-coefficient FIR bandpass filter designed using floating point numbers (blue). 
Upon quantization on 6~bit integers, 60 of the 128~coefficients in the beginning and end of the -taps become null, {\color{red}making the large number of coefficients irrelevant: processing -resources % r1.1 -are hence saved by shrinking the filter length.} This tradeoff aimed at minimizing resources +taps become null, making the large number of coefficients irrelevant: processing +resources +are hence saved by shrinking the filter length. This tradeoff aimed at minimizing resources to reach a given rejection level, or maximizing out of band rejection for a given computational resource, will drive the investigation on cascading filters designed with varying tap resolution and tap length, as will be shown in the next section. Indeed, our development strategy closely @@ -163,11 +163,11 @@ moment. Our objective is to develop a new methodology applicable to any Digital Signal Processing (DSP) chain obtained by assembling basic processing blocks, with hardware and manufacturer independence. Achieving such a target requires defining an abstract model to represent some basic properties -of DSP blocks such as perfomance (i.e. rejection or ripples in the bandpass for filters) and +of DSP blocks such as performance (i.e. rejection or ripples in the bandpass for filters) and resource occupation. These abstract properties, not necessarily related to the detailed hardware implementation of a given platform, will feed a scheduler solver aimed at assembling the optimum target, whether in terms of maximizing performance for a given arbitrary resource occupation, or -minimizing resource occupation for a given perfomance. In our approach, the solution of the +minimizing resource occupation for a given performance. 
In our approach, the solution of the solver is then synthesized using the dedicated tool provided by each platform manufacturer to assess the validity of our abstract resource occupation indicator, and the result of running the DSP chain on the FPGA allows for assessing the performance of the scheduler. We emphasize @@ -184,24 +184,23 @@ time and frequency transfer or characterization \cite{carolina1,carolina2,rsi}. Addressing only two operations allows for demonstrating the methodology but should not be considered as a limitation of the framework which can be extended to assembling any number -of skeleton blocks as long as perfomance and resource occupation can be determined. {\color{red} +of skeleton blocks as long as performance and resource occupation can be determined. Hence, -in this paper we will apply our methodology on simple DSP chains: a white noise input signal % r1.2 +in this paper we will apply our methodology on simple DSP chains: a white noise input signal is generated using a Pseudo-Random Number (PRN) generator or by sampling a wideband (125~MS/s) -14-bit Analog to Digital Converter (ADC) loaded by a 50~$\Omega$ resistor.} Once samples have been +14-bit Analog to Digital Converter (ADC) loaded by a 50~$\Omega$ resistor. Once samples have been digitized at a rate of 125~MS/s, filtering is applied to qualify the processing block performance -- practically meeting the radiofrequency frontend requirement of noise and bandwidth reduction by filtering and decimating. Finally, bursts of filtered samples are stored for post-processing, allowing to assess either filter rejection for a given resource usage, or validating the rejection when implementing a solution minimizing resource occupation. -{\color{red} -The first step of our approach is to model the DSP chain. Since we aim at only optimizing % r1.3 +The first step of our approach is to model the DSP chain. 
Since we aim at only optimizing the filtering part of the signal processing chain, we have not included the PRN generator or the ADC in the model: the input data size and rate are considered fixed and defined by the hardware. The filtering can be done in two ways, either by considering a single monolithic FIR filter requiring many coefficients to reach the targeted noise rejection ratio, or by -cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter.} +cascading multiple FIR filters, each with fewer coefficients than found in the monolithic filter. After each filter we leave the possibility of shifting the filtered data to consume less resources. Hence in the case of cascaded filter, we define a stage as a filter @@ -245,12 +244,12 @@ With these coefficients, the \texttt{freqz} function is used to estimate the mag transfer function. Comparing the performance between FIRs requires however defining a unique criterion. As shown in figure~\ref{fig:fir_mag}, the FIR magnitude exhibits two parts: we focus here on the transitions width and the rejection rather than on the -bandpass ripples as emphasized in \cite{lim_1988,lim_1996}. {\color{red}Throughout this demonstration, +bandpass ripples as emphasized in \cite{lim_1988,lim_1996}. Throughout this demonstration, we arbitrarily set a bandpass of 40\% of the Nyquist frequency and a bandstop from 60\% of the Nyquist frequency to the end of the band, as would be typically selected to prevent aliasing before decimating the dataflow by 2. The method is however generalized to any filter -shape as long as it is defined from the initial modelling steps: Fig. \ref{fig:rejection_pyramid} -as described below is indeed unique for each filter shape.} +shape as long as it is defined from the initial modeling steps: Fig. \ref{fig:rejection_pyramid} +as described below is indeed unique for each filter shape. 
\begin{figure} \begin{center} @@ -290,17 +289,17 @@ the stopband the last 40\%, allowing 20\% transition width.} \label{fig:fir_mag} \end{figure} -In the transition band, the behavior of the filter is left free, we only {\color{red}define} the passband and the stopband characteristics. +In the transition band, the behavior of the filter is left free: we only define the passband and the stopband characteristics. % r2.7 -{\color{red}Initial considered criteria include the mean value of the stopband rejection which yields unacceptable results since notches -overestimate the rejection capability of the filter.} +Initially considered criteria include the mean value of the stopband rejection, which yields unacceptable results since notches +overestimate the rejection capability of the filter. % Furthermore, the losses within % the passband are not considered and might be excessive for excessively wide transitions widths introduced for filters with few coefficients. Our final criterion to compute the filter rejection considers % r2.8 et r2.2 r2.3 -the {\color{red}minimal} rejection within the stopband, to which the {\color{red}sum of the absolute values +the minimal rejection within the stopband, from which the sum of the absolute values within the passband is subtracted to avoid filters with excessive ripples, normalized to the -bin width to remain consistent with the passband criterion (dBc/Hz units in all cases)}. With this +bin width to remain consistent with the passband criterion (dBc/Hz units in all cases). With this criterion, we meet the expected rejection capability of low pass filters as shown in figure~\ref{fig:custom_criterion}. 
% \begin{figure} @@ -313,8 +312,8 @@ criterion, we meet the expected rejection capability of low pass filters as show \begin{figure} \centering \includegraphics[width=\linewidth]{images/colored_custom_criterion} -\caption{Custom criterion (maximum rejection in the stopband minus the {\color{red} sum of the -absolute values of the passband rejection normalized to the bandwidth}) +\caption{Custom criterion (maximum rejection in the stopband minus the sum of the +absolute values of the passband rejection normalized to the bandwidth) comparison between monolithic filter and cascaded filters} \label{fig:custom_criterion} \end{figure} @@ -330,9 +329,9 @@ the rejection. Hence the best coefficient set are on the vertex of the pyramid. \begin{figure} \centering \includegraphics[width=\linewidth]{images/rejection_pyramid} -\caption{{\color{red}{Filter}} rejection as a function of number of coefficients and number of bits -{\color{red}: this lookup table will be used to identify which filter parameters -- number of bits -representing coefficients and number of coefficients -- best match the targeted transfer function.}} +\caption{Filter rejection as a function of number of coefficients and number of bits +: this lookup table will be used to identify which filter parameters -- number of bits +representing coefficients and number of coefficients -- best match the targeted transfer function.} \label{fig:rejection_pyramid} \end{figure} @@ -346,32 +345,30 @@ are two different filters with maximums and notches not located at the same freq Hence when summing the transfer functions, the resulting rejection shown as the dashed yellow line is improved with respect to a basic sum of the rejection criteria shown as a the dotted yellow line. % r2.9 -Thus, estimating the rejection of filter cascades is more complex than {\color{red}taking} the sum of all the rejection -criteria of each filter. 
However since the {\color{red}individual filter rejection} sum underestimates the rejection capability of the cascade, +Thus, estimating the rejection of filter cascades is more complex than taking the sum of all the rejection +criteria of each filter. However since the individual filter rejection sum underestimates the rejection capability of the cascade, % r2.10 -this upper bound is considered as a {\color{red}conservative} and acceptable criterion for deciding on the suitability +this upper bound is considered as a conservative and acceptable criterion for deciding on the suitability of the filter cascade to meet design criteria. \begin{figure} \centering \includegraphics[width=\linewidth]{images/cascaded_criterion} -\caption{{\color{red}Transfer function of individual filters and after cascading} the two filters, -{\color{red}demonstrating that the selected criterion of maximum rejection in the bandstop (horizontal +\caption{Transfer function of individual filters and after cascading the two filters, +demonstrating that the selected criterion of maximum rejection in the bandstop (horizontal lines) is met. Notice that the cascaded filter has better rejection than summing the bandstop -maximum of each individual filter.} +maximum of each individual filter. } \label{fig:sum_rejection} \end{figure} -% r2.6 -{\color{red} Finally in our case, we consider that the input signal are fully known. The resolution of the input data stream are fixed and still the same for all experiments -in this paper.} +in this paper. Based on this analysis, we address the estimate of resource consumption (called % r2.11 -silicon area -- in the case of FPGAs {\color{red}this means} processing cells) as a function of +silicon area -- in the case of FPGAs this means processing cells) as a function of filter characteristics. 
As a reminder, we do not aim at matching actual hardware configuration but consider an arbitrary silicon area occupied by each processing function, and will assess after synthesis the adequation of this arbitrary unit with actual @@ -415,7 +412,6 @@ shift bit would cause an additional 6~dB rejection rise. A totally equivalent eq $\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_j}{6}\right)$. Finally, equation~\ref{eq:init} gives the number of bits of the global input. -{\color{red} This model is non-linear since we multiply some variable with another variable and it is even non-quadratic, as the cost function $F$ does not have a known linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations. @@ -437,7 +433,6 @@ we define $1 \leq j \leq p$ so that the function $F$ can be estimated (Look Up T for each configurations thanks to the rejection criterion. We also define the binary variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$ and 0 otherwise. The new equations are as follows: -} \begin{align} a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\ @@ -450,7 +445,6 @@ Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} replace respectively equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}. Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most. -{\color{red} % JM: conflict merge % However the problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2} % we multiply @@ -464,7 +458,7 @@ Equation~\ref{eq:config} states that for each stage, a single configuration is c The problem remains quadratic at this stage since in the constraint~\ref{eq:areadef2} we multiply $\delta_{ij}$ and $\pi_i^-$. However, since $\delta_{ij}$ is a binary variable we can -linearise linearize this multiplication. 
The following formula shows how to linearize +linearize this multiplication. The following formula shows how to linearize this situation in general case with $y$ a binary variable and $x$ a real variable ($0 \leq x \leq X^{max}$): \begin{equation*} m = x \times y \implies @@ -481,7 +475,7 @@ So if we bound up $\pi_i^-$ by 128~bits which is the maximum data size whose est assumed on hardware characteristics, the Gurobi (\url{www.gurobi.com}) optimization software will be able to linearize for us the quadratic problem so the model is left as is. This model -has $O(np)$ variables and $O(n)$ constraints.} +has $O(np)$ variables and $O(n)$ constraints. % This model is non-linear and even non-quadratic, as $F$ does not have a known % linear or quadratic expression. We introduce $p$ FIR configurations @@ -515,7 +509,7 @@ has $O(np)$ variables and $O(n)$ constraints.} Two problems will be addressed using the workflow described in the next section: on the one hand maximizing the rejection capability of a set of cascaded filters occupying a fixed arbitrary -silcon area (section~\ref{sec:fixed_area}) and on the second hand the dual problem of minimizing the silicon area +silicon area (section~\ref{sec:fixed_area}) and on the other hand the dual problem of minimizing the silicon area for a fixed rejection criterion (section~\ref{sec:fixed_rej}). In the latter case, the objective function is replaced with: \begin{align} @@ -560,8 +554,8 @@ in the computation of the results. 
\draw[->] (Deploy) edge node [left] { (5) } (Postproc) ; \draw[->] (Postproc) -- (Results) ; \end{tikzpicture} - \caption{Design workflow from the input parameters to the results {\color{red} allowing for -a fully automated optimal solution search.}} + \caption{Design workflow from the input parameters to the results allowing for +a fully automated optimal solution search.} \label{fig:workflow} \end{figure} @@ -739,25 +733,25 @@ Figure~\ref{fig:max_1500_result} shows the rejection of the different configurat \centering \begin{subfigure}{\linewidth} \includegraphics[width=\linewidth]{images/max_500} - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving + \caption{Filter transfer functions for varying number of cascaded filters solving the MAX/500 problem of maximizing rejection for a given resource allocation (500~arbitrary units).} \label{fig:max_500_result} \end{subfigure} \begin{subfigure}{\linewidth} \includegraphics[width=\linewidth]{images/max_1000} - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving + \caption{Filter transfer functions for varying number of cascaded filters solving the MAX/1000 problem of maximizing rejection for a given resource allocation (1000~arbitrary units).} \label{fig:max_1000_result} \end{subfigure} \begin{subfigure}{\linewidth} \includegraphics[width=\linewidth]{images/max_1500} - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving + \caption{Filter transfer functions for varying number of cascaded filters solving the MAX/1500 problem of maximizing rejection for a given resource allocation (1500~arbitrary units).} \label{fig:max_1500_result} \end{subfigure} - \caption{\color{red}Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing + \caption{Solutions for the MAX/500, MAX/1000 and MAX/1500 problems of maximizing rejection for a given resource allocation. 
The filter shape constraint (bandpass and bandstop) is shown as thick horizontal lines on each chart.} @@ -782,8 +776,8 @@ the FIR filters and remove additional processing blocks including FIFO and Progr Logic (PL -- FPGA) to Processing System (PS -- general purpose processor) communication. \begin{table}[h!tb] - \caption{Resource occupation {\color{red}following synthesis of the solutions found for -the problem of maximizing rejection for a given resource allocation}. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.} + \caption{Resource occupation following synthesis of the solutions found for +the problem of maximizing rejection for a given resource allocation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.} \label{tbl:resources_usage} \centering \begin{tabular}{|c|c|ccc|c|} @@ -816,7 +810,7 @@ to compare the whole silicon budget. However, a rough estimation can be made with a simple equivalence: looking at the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$, we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon -area use. With this equivalence, our 500 arbitraty units correspond to 2500 LUTs, +area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs, 1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary unit map well to actual hardware resources. The relatively small differences can probably be explained @@ -944,7 +938,7 @@ From these tables, we can first state that almost all configurations reach the t level or even better thanks to our underestimate of the cascade rejection as the sum of the individual filter rejection. The only exception is for the monolithic case ($n = 1$) in MIN/100: no solution is found for a single monolithic filter reach a 100~dB rejection. 
-Futhermore, the area of the monolithic filter is twice as big as the two cascaded filters +Furthermore, the area of the monolithic filter is twice as big as the two cascaded filters (1131 and 1760 arbitrary units v.s 547 and 903 arbitrary units for 60 and 80~dB rejection respectively). More generally, the more filters are cascaded, the lower the occupied area. @@ -1001,32 +995,32 @@ Figure~\ref{fig:min_100} shows the rejection of the different configurations in \centering \begin{subfigure}{\linewidth} \includegraphics[width=.91\linewidth]{images/min_40} - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving + \caption{Filter transfer functions for varying number of cascaded filters solving the MIN/40 problem of minimizing resource allocation for reaching a 40~dB rejection.} \label{fig:min_40} \end{subfigure} \begin{subfigure}{\linewidth} \includegraphics[width=.91\linewidth]{images/min_60} - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving + \caption{Filter transfer functions for varying number of cascaded filters solving the MIN/60 problem of minimizing resource allocation for reaching a 60~dB rejection.} \label{fig:min_60} \end{subfigure} \begin{subfigure}{\linewidth} \includegraphics[width=.91\linewidth]{images/min_80} - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving + \caption{Filter transfer functions for varying number of cascaded filters solving the MIN/80 problem of minimizing resource allocation for reaching a 80~dB rejection.} \label{fig:min_80} \end{subfigure} \begin{subfigure}{\linewidth} \includegraphics[width=.91\linewidth]{images/min_100} - \caption{\color{red}Filter transfer functions for varying number of cascaded filters solving + \caption{Filter transfer functions for varying number of cascaded filters solving the MIN/100 problem of minimizing resource allocation for reaching a 100~dB rejection.} \label{fig:min_100} 
\end{subfigure} - \caption{\color{red}Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a + \caption{Solutions for the MIN/40, MIN/60, MIN/80 and MIN/100 problems of reaching a given rejection while minimizing resource allocation. The filter shape constraint (bandpass and bandstop) is shown as thick horizontal lines on each chart.} @@ -1104,7 +1098,6 @@ needed in the previous section. Indeed the worst time in this case is only 17~mi compared to 3~days in the previous section: this problem is more easily solved than the previous one. -{\color{red} % r1.4 To conclude, we compare our monolithic filters with the FIR Compiler provided by Xilinx in the Vivado software suite (v.2018.2). For each experiment we use the same coefficient set and we compare the resource consumption, having checked that @@ -1137,7 +1130,6 @@ MIN/80 & 482 & 0 & 55 & 772 & 1 & 55 \end{tabular} \end{table} \renewcommand{\arraystretch}{1} -} \section{Conclusion} -- 2.16.4
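
Note on the linearization step touched by this patch: the text multiplies the binary variable $\delta_{ij}$ by the bounded variable $\pi_i^-$ and states that such a product can be linearized. The sketch below is not taken from ifcs2018_journal.tex. It assumes the standard big-M constraints for $m = x \times y$ with $y$ binary and $0 \leq x \leq X^{max}$, namely $m \leq X^{max} y$, $m \leq x$, $m \geq x - (1 - y) X^{max}$ and $m \geq 0$, which may differ from the formula elided by the hunk boundary. The function name `product_bounds` is hypothetical; the value 128 is the bound on $\pi_i^-$ quoted in the text.

```python
def product_bounds(x, y, x_max):
    """Feasible interval [lo, hi] for m under the assumed big-M constraints.

    Constraints (standard textbook form, an assumption here):
        m <= x_max * y
        m <= x
        m >= x - (1 - y) * x_max
        m >= 0
    """
    lo = max(0.0, x - (1 - y) * x_max)
    hi = min(x, y * x_max)
    return lo, hi


# Brute-force check: for every binary y and every sampled x in [0, x_max],
# the interval collapses to the single point m = x * y.
X_MAX = 128.0  # bound on the data size in bits, as quoted in the text
for y in (0, 1):
    for x in range(0, 129, 16):
        lo, hi = product_bounds(float(x), y, X_MAX)
        assert lo == hi == float(x) * y
```

Because the four linear inequalities leave a single feasible value for $m$ whenever $y$ is 0 or 1, a mixed-integer solver can replace the product by them without changing the optimum, which is consistent with the remark that the model can be left as is once $\pi_i^-$ is bounded.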