Add first draft for the journal article.

Arthur HUGEAT
1 parent 842e804be4
Showing 11 changed files with 570 additions and 120 deletions
ifcs2018_journal.tex
images/max_1000.pdf
images/max_1500.pdf
images/max_500.pdf
images/max_rejection/prn_1000.pdf
images/max_rejection/prn_2000.pdf
images/max_rejection/prn_500.pdf
images/min_40.pdf
images/min_60.pdf
images/min_80.pdf
references.bib
@@ -142,9 +142,9 @@
 and for any hardware platform (Altera, Xilinx...). To do this we have defined an
 abstract model to represent some basic operations of DSP.
  
-For the moment, we are focused on only two operations: the filtering and the shift of data.
+For the moment, we are focused on only two operations: the filtering and the shifting of data.
 We have chosen these basic operations because shifting and filtering have already been studied in
-lot of works {\color{red} mettre les nouvelles référence ici} hence it will be easier
+many works \cite{lim_1996, lim_1988, young_1992, smith_1998}; hence it will be easier
 to check and validate our results.
  
 However having only two operations is insufficient to work with complex DSP but
@@ -283,169 +283,568 @@
 \label{fig:sum_rejection}
 \end{figure}
  
+Finally, we can describe our abstract model with the following expressions:
+\begin{align}
+\text{Maximize } & \sum_{i=1}^n r_i  \notag \\
+\sum_{i=1}^n a_i & \leq \mathcal{A} & \label{eq:area} \\
+a_i & = C_i \times (\pi_i^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef} \\
+r_i & = F(C_i, \pi_i^C), & \forall i \in [1, n] \label{eq:rejectiondef} \\
+\pi_i^+ & = \pi_i^- + \pi_i^C - \pi_i^S, & \forall i \in [1, n] \label{eq:bits} \\
+\pi_{i - 1}^+ & = \pi_i^-, & \forall i \in [2, n] \label{eq:inout} \\
+\pi_i^+ & \geq 1 + \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right), & \forall i \in [1, n] \label{eq:maxshift} \\
+\pi_1^- &= \Pi^I \label{eq:init}
+\end{align}
+
+{\color{red} I know the idea is not to talk about the linear program, but
+it still seems indispensable to me. At worst, I will try to revisit this if we
+are really short of space.}
+
+Equation~\ref{eq:area} states that the total area taken by the filters must not
+exceed the available area. Equation~\ref{eq:areadef} gives the definition of
+the area for a filter. More precisely, it is the area of the FIR as the Shifter
+does not need any circuitry. We consider that the FIR needs $C_i$ registers of size
+$\pi_i^C + \pi_i^-$~bits to store the results of the multiplications of the
+input data and the coefficients. Equation~\ref{eq:rejectiondef} gives the
+definition of the rejection of the filter using the function~$F$ that we defined
+previously. The Shifter does not introduce negative rejection as we explain later,
+so the rejection only comes from the FIR. Equation~\ref{eq:bits} states the
+relation between $\pi_i^+$ and $\pi_i^-$. The multiplications in the FIR add
+$\pi_i^C$ bits as most coefficients are close to zero, and the Shifter removes
+$\pi_i^S$ bits. Equation~\ref{eq:inout} states that the output number of bits of
+a filter is the same as the input number of bits of the next filter.
+Equation~\ref{eq:maxshift} ensures that the Shifter does not introduce negative
+rejection. Indeed, the results of the FIR can be right-shifted without compromising
+the quality of the rejection up to a threshold. Each bit of the output data
+increases the maximum rejection level by 6~dB. We add one to take the sign bit
+into account. If equation~\ref{eq:maxshift} were not present, the Shifter could
+shift too much and introduce some noise in the output data. Each supplementary
+shift bit would cause 6~dB of noise. An equivalent formulation is:
+$\pi_i^S \leq \pi_i^- + \pi_i^C - 1 - \sum_{k=1}^{i} \left(1 + \frac{r_k}{6}\right)$.
+Finally, equation~\ref{eq:init} gives the number of bits of the global input.
+
+This model is non-linear and even non-quadratic, as $F$ does not have a known
+linear or quadratic expression. We introduce $p$ FIR configurations
+ $(C_{ij}, \pi_{ij}^C), 1 \leq j \leq p$ that are constants. We define the binary
+ variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
+ and 0 otherwise. The new equations are as follows:
+
+\begin{align}
+a_i & = \sum_{j=1}^p \delta_{ij} \times C_{ij} \times (\pi_{ij}^C + \pi_i^-), & \forall i \in [1, n] \label{eq:areadef2} \\
+r_i & = \sum_{j=1}^p \delta_{ij} \times F(C_{ij}, \pi_{ij}^C), & \forall i \in [1, n] \label{eq:rejectiondef2} \\
+\pi_i^+ & = \pi_i^- + \left(\sum_{j=1}^p \delta_{ij} \pi_{ij}^C\right) - \pi_i^S, & \forall i \in [1, n] \label{eq:bits2} \\
+\sum_{j=1}^p \delta_{ij} & \leq 1, & \forall i \in [1, n] \label{eq:config}
+\end{align}
+
+Equations \ref{eq:areadef2}, \ref{eq:rejectiondef2} and \ref{eq:bits2} respectively
+replace equations \ref{eq:areadef}, \ref{eq:rejectiondef} and \ref{eq:bits}.
+Equation~\ref{eq:config} states that, for each stage, at most one configuration is chosen.
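+
+As a brief aside on why the reformulated model is quadratic rather than linear,
+equation~\ref{eq:areadef2} can be expanded as follows (a sketch that only
+rearranges the terms already defined above):
+\begin{align*}
+a_i = \sum_{j=1}^p \delta_{ij} \, C_{ij} \, \pi_{ij}^C + \sum_{j=1}^p C_{ij} \, \delta_{ij} \, \pi_i^-, \qquad \forall i \in [1, n]
+\end{align*}
+The first sum is linear in $\delta_{ij}$ since $C_{ij}$ and $\pi_{ij}^C$ are
+constants. The second sum contains the products $\delta_{ij} \pi_i^-$, which for
+$i \geq 2$ involve two decision variables, as $\pi_i^-$ then depends on the
+configurations and shifts of the previous stages through
+equations~\ref{eq:bits2} and~\ref{eq:inout}. Equations~\ref{eq:rejectiondef2}
+and~\ref{eq:bits2} remain linear because $F(C_{ij}, \pi_{ij}^C)$ is a constant
+for each configuration.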
+
+Section~\ref{sec:fixed_area} shows the results for this quadratic program, while section~\ref{sec:fixed_rej}
+presents the results for the complementary problem. In that case we want to
+minimize the occupied area for a targeted rejection level. Hence we have replaced
+the objective function with:
+\begin{align}
+\text{Minimize } & \sum_{i=1}^n a_i  \notag
+\end{align}
+We adapt the constraints of the quadratic program by replacing equation~\ref{eq:area}
+with equation~\ref{eq:rejection_min}, where $\mathcal{R}$ is the minimal
+rejection required.
+
+\begin{align}
+\sum_{i=1}^n r_i & \geq \mathcal{R} & \label{eq:rejection_min}
+\end{align}
+
+\section{Design workflow}
+\label{sec:workflow}
+
+In this section, we describe the workflow to compute all the results presented in section~\ref{sec:fixed_area}.
+Figure~\ref{fig:workflow} shows the global workflow and the different steps involved in the computations of the results.
+
+\begin{figure}
+  \centering
+  \begin{tikzpicture}[node distance=0.75cm and 2cm]
+    \node[draw,minimum size=1cm] (Solver) { Filter Solver } ;
+    \node (Start) [left= 3cm of Solver] { } ;
+    \node[draw,minimum size=1cm] (TCL) [right= of Solver] { TCL Script } ;
+    \node (Input) [above= of TCL] { } ;
+    \node[draw,minimum size=1cm] (Deploy) [below= of Solver] { Deploy Script } ;
+    \node[draw,minimum size=1cm] (Bitstream) [below= of TCL] { Bitstream } ;
+    \node[draw,minimum size=1cm,rounded corners] (Board) [below right= of Deploy] { Board } ;
+    \node[draw,minimum size=1cm] (Postproc) [below= of Deploy] { Post-Processing } ;
+    \node (Results) [left= of Postproc] { } ;
+
+    \draw[->] (Start) edge node [above] { $\mathcal{A}, n, \Pi^I$ } node [below] { $(C_{ij}, \pi_{ij}^C), F$ } (Solver) ;
+    \draw[->] (Input) edge node [left] { ADC or PRN } (TCL) ;
+    \draw[->] (Solver) edge node [below] { (1a) } (TCL) ;
+    \draw[->] (Solver) edge node [right] { (1b) } (Deploy) ;
+    \draw[->] (TCL) edge node [left] { (2) } (Bitstream) ;
+    \draw[->,dashed] (Bitstream) -- (Deploy) ;
+    \draw[->] (Deploy) to[out=-30,in=120] node [above] { (3) } (Board) ;
+    \draw[->] (Board) to[out=150,in=-60] node [below] { (4) } (Deploy) ;
+    \draw[->] (Deploy) edge node [left] { (5) } (Postproc) ;
+    \draw[->] (Postproc) -- (Results) ;
+  \end{tikzpicture}
+  \caption{Design workflow from the input parameters to the results}
+  \label{fig:workflow}
+\end{figure}
+
+The filter solver is a C++ program that takes as input the maximum area
+$\mathcal{A}$, the number of stages $n$, the size of the input signal $\Pi^I$,
+the FIR configurations $(C_{ij}, \pi_{ij}^C)$ and the function $F$. It creates
+the quadratic programs and uses the Gurobi solver to get the optimal results.
+Then it produces two scripts: a TCL script ((1a) on figure~\ref{fig:workflow})
+and a deploy script ((1b) on figure~\ref{fig:workflow}).
+
+The TCL script describes the whole digital processing chain from the beginning
+(the raw signal data) to the end (the filtered data).
+The raw input data are generated by a Pseudo Random Number (PRN)
+generator inside the FPGA, and $\Pi^I$ is fixed at 16~bits.
+Then the script builds each stage of the chain with a generic FIR task that
+comes from a skeleton library. The generic FIR is highly configurable
+with the number of coefficients and the size of the coefficients. The coefficients
+themselves are not stored in the script.
+Although the signal is processed in real time, the output signal is stored as
+consecutive bursts of data.
+
+The TCL script is used by Vivado to produce the FPGA bitstream ((2) on figure~\ref{fig:workflow}).
+We use the 2018.2 version of Xilinx Vivado and we execute the synthesized
+bitstream on a Redpitaya board fitted with a Xilinx Zynq-7010 series
+FPGA (xc7z010clg400-1) and two 125~MS/s ADCs.
+The board works with a Buildroot Linux image. We have developed some tools and
+drivers to flash and communicate with the FPGA. They are used to automate the
+whole workflow inside the board: load the filter coefficients and retrieve the
+computed data.
+
+The deploy script uploads the bitstream to the board ((3) on
+figure~\ref{fig:workflow}), flashes the FPGA, loads the different drivers and
+configures the coefficients of the FIR filters. It then waits for the results
+and retrieves the data to the main computer ((4) on figure~\ref{fig:workflow}).
+
+Finally, an Octave post-processing script computes the final results thanks to
+the output data ((5) on figure~\ref{fig:workflow}).
+The results are normalized so that the Power Spectral Density (PSD) starts at zero
+and the different configurations can be compared.
+
+To compute the results in section~\ref{sec:fixed_rej}, we have only adapted
+the quadratic program; the rest of the workflow is unchanged.
+
 \section{Experiments with fixed area space}
+\label{sec:fixed_area}
+This section presents the output of the filter solver, {\em i.e.} the computed
+configurations for each stage, the computed rejection and the computed silicon area.
+This helps to understand the choices made by the solver to compute its solutions.
  
+The experimental setup is composed of three cases. The raw input is generated
+by a Pseudo Random Number (PRN) generator, which fixes the input data size $\Pi^I$.
+Then the total silicon area $\mathcal{A}$ has been fixed to either 500, 1000 or 1500
+arbitrary units. Hence, the three cases have been named: MAX/500, MAX/1000, MAX/1500.
+The number of configurations $p$ is 1827, with $C_i$ ranging from 3 to 60 and $\pi^C$
+ranging from 2 to 22. In each case, the quadratic program has been able to give a
+result up to five stages ($n = 5$) in the cascaded filter.
+
+Tables~\ref{tbl:gurobi_max_500}, \ref{tbl:gurobi_max_1000} and~\ref{tbl:gurobi_max_1500}
+show the results obtained by the filter solver for MAX/500, MAX/1000 and MAX/1500, respectively.
+
+\renewcommand{\arraystretch}{1.4}
+
+\begin{table}
+  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/500}
+  \label{tbl:gurobi_max_500}
+  \centering
+    {\scalefont{0.77}
+      \begin{tabular}{|c|ccccc|c|c|}
+        \hline
+         $n$  & $i = 1$     & $i = 2$     & $i = 3$     & $i = 4$     & $i = 5$     & Rejection       & Area  \\
+        \hline
+            1 & (21, 7, 0)  & -           & -           & -           & -           & 32~dB           & 483   \\
+            2 & (3, 3, 15)  & (31, 9, 0)  & -           & -           & -           & 58~dB           & 460   \\
+            3 & (3, 3, 15)  & (27, 9, 0)  & (5, 3, 0)   & -           & -           & 66~dB           & 488   \\
+            4 & (3, 3, 15)  & (19, 7, 0)  & (11, 5, 0)  & (3, 3, 0)   & -           & 74~dB           & 499   \\
+            5 & (3, 3, 15)  & (23, 8, 0)  & (3, 3, 1)   & (3, 3, 0)   & (3, 3, 0)   & 78~dB           & 489   \\
+        \hline
+      \end{tabular}
+    }
+\end{table}
+
+\begin{table}
+  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1000}
+  \label{tbl:gurobi_max_1000}
+  \centering
+    {\scalefont{0.77}
+      \begin{tabular}{|c|ccccc|c|c|}
+        \hline
+         $n$  & $i = 1$     & $i = 2$     & $i = 3$     & $i = 4$     & $i = 5$     & Rejection       & Area \\
+        \hline
+            1 & (37, 11, 0) & -           & -           & -           & -           & 56~dB           & 999  \\
+            2 & (3, 3, 15)  & (51, 14, 0) & -           & -           & -           & 87~dB           & 975  \\
+            3 & (3, 3, 15)  & (35, 11, 0) & (19, 7, 0)  & -           & -           & 99~dB           & 1000 \\
+            4 & (3, 4, 16)  & (27, 8, 0)  & (19, 7, 1)  & (11, 5, 0)  & -           & 103~dB          & 998  \\
+            5 & (3, 3, 15)  & (31, 9, 0)  & (19, 7, 0)  & (3, 3, 1)   & (3, 3, 0)   & 111~dB          & 984  \\
+        \hline
+      \end{tabular}
+    }
+\end{table}
+
+\begin{table}
+  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MAX/1500}
+  \label{tbl:gurobi_max_1500}
+  \centering
+    {\scalefont{0.77}
+      \begin{tabular}{|c|ccccc|c|c|}
+        \hline
+         $n$  & $i = 1$     & $i = 2$     & $i = 3$     & $i = 4$     & $i = 5$     & Rejection       & Area  \\
+        \hline
+            1 & (47, 15, 0) & -           & -           & -           & -           & 71~dB           & 1457  \\
+            2 & (19, 6, 15) & (51, 14, 0) & -           & -           & -           & 103~dB          & 1489  \\
+            3 & (3, 3, 15)  & (35, 11, 0) & (35, 11, 0) & -           & -           & 122~dB          & 1492  \\
+            4 & (3, 3, 15)  & (27, 8, 0)  & (19, 7, 0)  & (27, 9, 0)  & -           & 129~dB          & 1498  \\
+            5 & (3, 3, 15)  & (23, 9, 2)  & (27, 9, 0)  & (19, 7, 0)  & (3, 3, 0)   & 136~dB          & 1499  \\
+        \hline
+      \end{tabular}
+    }
+\end{table}
+
+\renewcommand{\arraystretch}{1}
+
+From these tables, we can first state that the more stages are used to define
+the cascaded FIR filters, the better the rejection. This was an expected result as it has
+been previously observed that many small filters are better than
+a single large filter \cite{lim_1988, lim_1996, young_1992}, although this conclusion
+is rarely applied in practice due to the lack of tools for identifying individual filter
+coefficients in the cascaded approach.
+
+Second, the larger the silicon area, the better the rejection. This was also an
+expected result as more area means a filter of better quality (more coefficients
+or more bits per coefficient).
+
+Then, we also observe that the first stage can have a larger shift than the other
+stages. This is explained by the fact that the solver tries to use just enough
+bits for the computed rejection after each stage. In the first stage, a
+balance between a strong rejection and a low number of bits is targeted. Equation~\ref{eq:maxshift}
+gives the relation between both values.
+
+Finally, we note that the solver consumes nearly all of the given silicon area.
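+
+As an illustrative consistency check (a sketch using only the values of
+Table~\ref{tbl:gurobi_max_500}, the input size $\Pi^I = 16$~bits given in
+section~\ref{sec:workflow} and the rounded rejection), the case MAX/500 with
+$n = 2$ can be recomputed by hand from equations~\ref{eq:areadef},
+\ref{eq:bits}, \ref{eq:inout} and~\ref{eq:init}:
+\begin{align*}
+\pi_1^- &= 16, & a_1 &= 3 \times (3 + 16) = 57, & \pi_1^+ &= 16 + 3 - 15 = 4, \\
+\pi_2^- &= 4, & a_2 &= 31 \times (9 + 4) = 403, & \pi_2^+ &= 4 + 9 - 0 = 13.
+\end{align*}
+The total area is $a_1 + a_2 = 460$, which matches the table. The right-hand
+side of equation~\ref{eq:maxshift} for $i = 2$ is $1 + 2 + 58 / 6 \approx 12.7$,
+so the 13~output bits are just sufficient for the 58~dB of rejection.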
+
+The following graphs present the rejection for real data on the FPGA. In all following
+figures, the solid line represents the actual rejection of the filtered
+data on the FPGA as measured experimentally and the dashed line is the noise level
+given by the quadratic solver. The configurations are those computed in the previous section.
+
+Figures~\ref{fig:max_500_result}, \ref{fig:max_1000_result} and~\ref{fig:max_1500_result}
+show the rejection of the different configurations in the cases of MAX/500, MAX/1000 and MAX/1500, respectively.
+
 \begin{figure}
 \centering
-\includegraphics[width=\linewidth]{images/max_rejection/prn_500}
-\caption{Experimental results for design with PRN as data input and 500 a.u. as max arbitrary space}
-\label{fig:prn_500}
+\includegraphics[width=\linewidth]{images/max_500}
+\caption{Signal spectrum for MAX/500}
+\label{fig:max_500_result}
 \end{figure}
  
 \begin{figure}
 \centering
-\includegraphics[width=\linewidth]{images/max_rejection/prn_1000}
-\caption{Experimental results for design with PRN as data input and 1000 a.u. as max arbitrary space}
-\label{fig:prn_1000}
+\includegraphics[width=\linewidth]{images/max_1000}
+\caption{Signal spectrum for MAX/1000}
+\label{fig:max_1000_result}
 \end{figure}
  
 \begin{figure}
 \centering
-\includegraphics[width=\linewidth]{images/max_rejection/prn_2000}
-\caption{Experimental results for design with PRN as data input and 2000 a.u. as max arbitrary space}
-\label{fig:prn_2000}
+\includegraphics[width=\linewidth]{images/max_1500}
+\caption{Signal spectrum for MAX/1500}
+\label{fig:max_1500_result}
 \end{figure}
  
-\begin{table}
-\centering
-\begin{tabular}{|c|c|ccc|c|c|}
-\hline
-\multicolumn{2}{|c|}{\multirow{2}{*}{Stage}}  & \multicolumn{3}{c|}{Stage}  & \multirow{2}{*}{Rejection} & \multirow{2}{*}{Area} \\ \cline{3-5}
-\multicolumn{2}{|c|}{}                        & i = 1 & i = 2 & i = 3       &                            &                       \\ \hline
-      & C                                     & 19    & -     & -           &                            &                       \\
-n = 1 & $pi^C$                                & 7     & -     & -           & 33 dB                      & 437 a.u.              \\
-      & $pi^S$                                & 0     & -     & -           &                            &                       \\ \hline
-      & C                                     & 11    & 19    & -           &                            &                       \\
-n = 2 & $pi^C$                                & 5     & 7     & -           & 53 dB                      & 478 a.u.              \\
-      & $pi^S$                                & 16    & 0     & -           &                            &                       \\ \hline
-      & C                                     & 9     & 15    & 11          &                            &                       \\
-n = 3 & $pi^C$                                & 4     & 6     & 5           & 57 dB                      & 499 a.u.              \\
-      & $pi^S$                                & 16    & 3     & 0           &                            &                       \\ \hline
-\end{tabular}
-\caption{Solver results for design with PRN as data input and 500 a.u. as max arbitrary space}
-\label{tbl:prn_500}
-\end{table}
+In all cases, we observe that the actual rejection is close to the rejection computed by the solver.
  
+We compare the actual silicon resources given by Vivado to the
+resources in arbitrary units.
+The goal is to check that our arbitrary units of silicon area model well enough
+the real resources on the FPGA. In particular, we want to verify that, for a given
+number of arbitrary units, the actual silicon resources do not depend on the
+number of stages $n$. Most significantly, our approach aims
+at remaining far enough from the practical logic gate implementation used by
+various vendors to remain platform independent and be portable from one
+architecture to another.
+
+Table~\ref{tbl:resources_usage} shows the resource usage in the case of MAX/500, MAX/1000 and
+MAX/1500 \emph{i.e.} when the maximum allowed silicon area is fixed to 500, 1000
+and 1500 arbitrary units. We have taken care to extract solely the resources used by
+the FIR filters and remove additional processing blocks including FIFO and PL to
+PS communication.
+
 \begin{table}
-\centering
-{\scalefont{0.85}
-\begin{tabular}{|c|c|ccccc|c|c|}
-\hline
-\multicolumn{2}{|c|}{\multirow{2}{*}{Stage}}  & \multicolumn{5}{c|}{Stage}            & \multirow{2}{*}{Rejection} & \multirow{2}{*}{Area} \\ \cline{3-7}
-\multicolumn{2}{|c|}{}                        & i = 1 & i = 2 & i = 3 & i = 4 & i = 5 &                            &                       \\ \hline
-      & C                                     & 37    & -     & -     & -     & -     &                            &                       \\
-n = 1 & $pi^C$                                & 11    & -     & -     & -     & -     & 56 dB                      & 999 a.u.              \\
-      & $pi^S$                                & 0     & -     & -     & -     & -     &                            &                       \\ \hline
-      & C                                     & 11    & 39    & -     & -     & -     &                            &                       \\
-n = 2 & $pi^C$                                & 5     & 13    & -     & -     & -     & 82 dB                      & 972 a.u.              \\
-      & $pi^S$                                & 16    & 0     & -     & -     & -     &                            &                       \\ \hline
-      & C                                     & 9     & 31    & 19    & -     & -     &                            &                       \\
-n = 3 & $pi^C$                                & 7     & 8     & 7     & -     & -     & 93 dB                      & 990 a.u.              \\
-      & $pi^S$                                & 19    & 2     & 0     & -     & -     &                            &                       \\ \hline
-      & C                                     & 9     & 19    & 17    & 11    & -     &                            &                       \\
-n = 4 & $pi^C$                                & 4     & 7     & 7     & 5     & -     & 99 dB                      & 992 a.u.              \\
-      & $pi^S$                                & 16    & 3     & 3     & 0     & -     &                            &                       \\ \hline
-      & C                                     & 9     & 15    & 11    & 11    & 11    &                            &                       \\
-n = 5 & $pi^C$                                & 4     & 7     & 5     & 5     & 5     & 99 dB                      & 998 a.u.              \\
-      & $pi^S$                                & 16    & 3     & 2     & 1     & 1     &                            &                       \\ \hline
-\end{tabular}
-}
-\caption{Solver results for design with PRN as data input and 1000 a.u. as max arbitrary space}
-\label{tbl:prn_1000}
+  \caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
+  \label{tbl:resources_usage}
+  \centering
+      \begin{tabular}{|c|c|ccc|c|}
+        \hline
+        $n$ &          & MAX/500  & MAX/1000 & MAX/1500 & \emph{Zynq 7010}         \\ \hline\hline
+            & LUT      & 249      & 453      & 627      & \emph{17600}             \\
+        1   & BRAM     & 1        & 1        & 1        & \emph{120}               \\
+            & DSP      & 21       & 37       & 47       & \emph{80}                \\ \hline
+            & LUT      & 2374     & 5494     & 691      & \emph{17600}             \\
+        2   & BRAM     & 2        & 2        & 2        & \emph{120}               \\
+            & DSP      & 0        & 0        & 70       & \emph{80}                \\ \hline
+            & LUT      & 2443     & 3304     & 3521     & \emph{17600}             \\
+        3   & BRAM     & 3        & 3        & 3        & \emph{120}               \\
+            & DSP      & 0        & 19       & 35       & \emph{80}                \\ \hline
+            & LUT      & 2634     & 3753     & 2557     & \emph{17600}             \\
+        4   & BRAM     & 4        & 4        & 4        & \emph{120}               \\
+            & DSP      & 0        & 19       & 46       & \emph{80}                \\ \hline
+            & LUT      & 2423     & 3047     & 2847     & \emph{17600}             \\
+        5   & BRAM     & 5        & 5        & 5        & \emph{120}               \\
+            & DSP      & 0        & 22       & 46       & \emph{80}                \\ \hline
+      \end{tabular}
 \end{table}
  
+In some cases, Vivado replaces the DSPs by Look Up Tables (LUTs). We assume that,
+when the filter coefficients are small enough, or when the input size is small
+enough, Vivado optimizes resource consumption by selecting multiplexers to
+implement the multiplications instead of a DSP. In this case, it is quite difficult
+to compare the whole silicon budget.
+
+However, a rough estimation can be made with a simple equivalence. Looking at
+the first column (MAX/500), where the number of LUTs is quite stable for $n \geq 2$,
+we can deduce that a DSP is roughly equivalent to 100~LUTs in terms of silicon
+area use. With this equivalence, our 500 arbitrary units correspond to 2500 LUTs,
+1000 arbitrary units correspond to 5000 LUTs and 1500 arbitrary units correspond
+to 7300 LUTs. The conclusion is that the orders of magnitude of our arbitrary
+unit are quite good. The relatively small differences can probably be explained
+by the optimizations done by Vivado based on the detailed map of available processing resources.
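+
+To make this equivalence explicit, one possible way to write it (a sketch in
+which $N_{\text{LUT}}$, $N_{\text{DSP}}$ and $N_{\text{eq}}$ are notations
+introduced here for the LUT count, the DSP count and the LUT-equivalent count of
+Table~\ref{tbl:resources_usage}, and 100 is the rough factor deduced above) is:
+\begin{align*}
+N_{\text{eq}} &= N_{\text{LUT}} + 100 \times N_{\text{DSP}} \\
+\text{MAX/500, } n = 1: \quad N_{\text{eq}} &= 249 + 100 \times 21 = 2349 \\
+\text{MAX/1000, } n = 3: \quad N_{\text{eq}} &= 3304 + 100 \times 19 = 5204 \\
+\text{MAX/1500, } n = 3: \quad N_{\text{eq}} &= 3521 + 100 \times 35 = 7021
+\end{align*}
+For these three entries, the LUT-equivalent counts stay within about 6\% of the
+2500, 5000 and 7300~LUTs quoted above. Other entries of the table deviate more,
+which is consistent with the equivalence being only a rough estimate.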
+
+We now present the computation time needed to solve the quadratic problem.
+For each case, the filter solver software is executed on an Intel(R) Xeon(R) CPU E5606
+clocked at 2.13~GHz. The CPU has 8 cores that are used by Gurobi to solve
+the quadratic problem.
+
+Table~\ref{tbl:area_time} shows the time needed to solve the quadratic
+problem when the maximal area is fixed to 500, 1000 and 1500 arbitrary units.
+
 \begin{table}
+\caption{Time to solve the quadratic program with Gurobi}
+\label{tbl:area_time}
 \centering
-{\scalefont{0.85}
-\begin{tabular}{|c|c|ccccc|c|c|}
-\hline
-\multicolumn{2}{|c|}{\multirow{2}{*}{Stage}}  & \multicolumn{5}{c|}{Stage}            & \multirow{2}{*}{Rejection} & \multirow{2}{*}{Area} \\ \cline{3-7}
-\multicolumn{2}{|c|}{}                        & i = 1 & i = 2 & i = 3 & i = 4 & i = 5 &                            &                       \\ \hline
-      & C                                     & 39    & -     & -     & -     & -     &                            &                       \\
-n = 1 & $pi^C$                                & 13    & -     & -     & -     & -     & 61 dB                      & 1131 a.u.             \\
-      & $pi^S$                                & 0     & -     & -     & -     & -     &                            &                       \\ \hline
-      & C                                     & 37    & 39    & -     & -     & -     &                            &                       \\
-n = 2 & $pi^C$                                & 11    & 13    & -     & -     & -     & 117 dB                     & 1974 a.u.             \\
-      & $pi^S$                                & 17    & 0     & -     & -     & -     &                            &                       \\ \hline
-      & C                                     & 15    & 35    & 35    & -     & -     &                            &                       \\
-n = 3 & $pi^C$                                & 9     & 11    & 11    & -     & -     & 138 dB                     & 1985 a.u.             \\
-      & $pi^S$                                & 19    & 3     & 0     & -     & -     &                            &                       \\ \hline
-      & C                                     & 11    & 27    & 27    & 23    & -     &                            &                       \\
-n = 4 & $pi^C$                                & 5     & 9     & 9     & 9     & -     & 148 dB                     & 1993 a.u.             \\
-      & $pi^S$                                & 16    & 3     & 2     & 0     & -     &                            &                       \\ \hline
-      & C                                     & 11    & 27    & 31    & 11    & 11    &                            &                       \\
-n = 5 & $pi^C$                                & 5     & 9     & 8     & 5     & 5     & 153 dB                     & 2000 a.u.             \\
-      & $pi^S$                                & 16    & 3     & 1     & 0     & 1     &                            &                       \\ \hline
+\begin{tabular}{|c|c|c|c|}\hline
+$n$ & Time (MAX/500)          & Time (MAX/1000)             & Time (MAX/1500)              \\\hline\hline
+1   & 0.1~s                   & 0.1~s                       & 0.3~s                        \\
+2   & 1.1~s                   & 2.2~s                       & 12~s                         \\
+3   & 17~s                    & 137~s  ($\approx$ 2~min)    & 275~s ($\approx$ 4~min)      \\
+4   & 52~s                    & 5448~s ($\approx$ 90~min)   & 5505~s ($\approx$ 17~h)      \\
+5   & 286~s ($\approx$ 4~min) & 4119~s ($\approx$ 68~min)   & 235479~s ($\approx$ 3~days)  \\\hline
 \end{tabular}
-}
-\caption{Solver results for design with PRN as data input and 2000 a.u. as max arbitrary space}
-\label{tbl:prn_2000}
 \end{table}
  
+As expected, the computation time seems to rise exponentially with the number of stages. % TODO: exponential?
+When the area is limited, the design exploration space is more limited and the solver is able to
+find an optimal solution faster. On the contrary, in the case of MAX/1500 with
+5~stages, we were not able to obtain a result after 40~hours of computation so we decided to stop.
+
+\section{Experiments with fixed rejection target}
+\label{sec:fixed_rej}
+This section presents the results of the complementary quadratic program, in which we
+minimize the occupied area for a targeted rejection level.
+
+The experimental setup is also composed of three cases. The raw input is the same
+as in the previous section, a PRN generator, which fixes the input data size $\Pi^I$.
+Then the targeted rejection $\mathcal{R}$ has been fixed to either 40, 60 or 80~dB.
+Hence, the three cases have been named: MIN/40, MIN/60, MIN/80.
+The number of configurations $p$ is the same as in the previous section.
+
+Tables~\ref{tbl:gurobi_min_40}, \ref{tbl:gurobi_min_60} and~\ref{tbl:gurobi_min_80}
+show the results obtained by the filter solver for MIN/40, MIN/60 and MIN/80, respectively.
+
+\renewcommand{\arraystretch}{1.4}
+
 \begin{table}
-\centering
-\begin{tabular}{|c|c|c|c|c|}\hline
-Input  & Stages & Computation time        & Vivado time      &  Redpitaya time  \\\hline\hline
-       & 1      & 0.02~s                  & $\approx$ 20 min & $\approx$ 1 min  \\
-PRN    & 2      & 1.70~s                  & $\approx$ 20 min & $\approx$ 1 min  \\
-       & 3      & 19~s                    & $\approx$ 20 min & $\approx$ 1 min  \\\hline
-\end{tabular}
-\caption{Time to compute and deploy the designs for PRN 500}
-\label{tbl:time_prn_500}
+  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/40}
+  \label{tbl:gurobi_min_40}
+  \centering
+    {\scalefont{0.77}
+      \begin{tabular}{|c|ccccc|c|c|}
+        \hline
+         $n$  & $i = 1$     & $i = 2$     & $i = 3$     & $i = 4$     & $i = 5$     & Rejection       & Area  \\
+        \hline
+            1 & (27, 8, 0)  & -           & -           & -           & -           & 41~dB           & 648   \\
+            2 & (3, 2, 14)  & (19, 7, 0)  & -           & -           & -           & 40~dB           & 263   \\
+            3 & (3, 3, 15)  & (11, 5, 0)  & (3, 3, 0)   & -           & -           & 41~dB           & 192   \\
+            4 & (3, 3, 15)  & (3, 3, 0)   & (3, 3, 0)   & (3, 3, 0)   & -           & 42~dB           & 147   \\
+        \hline
+      \end{tabular}
+    }
 \end{table}
  
 \begin{table}
-\centering
-\begin{tabular}{|c|c|c|c|c|}\hline
-Input  & Stages & Computation time        & Vivado time      &  Redpitaya time  \\\hline\hline
-       & 1      & 0.07~s                  & $\approx$ 20 min & $\approx$ 1 min  \\
-       & 2      & 1.31~s                  & $\approx$ 20 min & $\approx$ 1 min  \\
-PRN    & 3      & 119~s ($\approx$ 2~min) & $\approx$ 20 min & $\approx$ 1 min  \\
-       & 4      & 270~s ($\approx$ 5~min) & $\approx$ 20 min & $\approx$ 1 min  \\
-       & 5      & 5998~s ($\approx$ 2~h)  & $\approx$ 20 min & $\approx$ 1 min  \\\hline
-\end{tabular}
-\caption{Time to compute and deploy the designs for PRN 1000}
-\label{tbl:time_prn_1000}
+  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/60}
+  \label{tbl:gurobi_min_60}
+  \centering
+    {\scalefont{0.77}
+      \begin{tabular}{|c|ccccc|c|c|}
+        \hline
+         $n$  & $i = 1$     & $i = 2$     & $i = 3$     & $i = 4$     & $i = 5$     & Rejection       & Area \\
+        \hline
+            1 & (39, 13, 0) & -           & -           & -           & -           & 60~dB           & 1131 \\
+            2 & (3, 3, 15)  & (35, 10, 0) & -           & -           & -           & 60~dB           & 547  \\
+            3 & (3, 3, 15)  & (27, 8, 0)  & (3, 3, 0)   & -           & -           & 62~dB           & 426  \\
+            4 & (3, 2, 14)  & (11, 5, 1)  & (11, 5, 0)  & (3, 3, 0)   & -           & 60~dB           & 344  \\
+            5 & (3, 2, 14)  & (3, 3, 1)   & (3, 3, 0)   & (3, 3, 0)   & (3, 3, 0)   & 60~dB           & 279  \\
+        \hline
+      \end{tabular}
+    }
 \end{table}
  
 \begin{table}
-\centering
-\begin{tabular}{|c|c|c|c|c|}\hline
-Input  & Stages & Computation time          & Vivado time      &  Redpitaya time  \\\hline\hline
-       & 1      & 0.07~s                    & $\approx$ 20 min & $\approx$ 1 min  \\
-       & 2      & 0.75~s                    & $\approx$ 20 min & $\approx$ 1 min  \\
-PRN    & 3      & 36~s                      & -                & -                \\
-       & 4      & 14500~s ($\approx$ 4~h)   & $\approx$ 20 min & $\approx$ 1 min  \\
-       & 5      & 74237~s ($\approx$ 20~h)  & $\approx$ 20 min & $\approx$ 1 min  \\\hline
-\end{tabular}
-\caption{Time to compute and deploy the designs for PRN 2000}
-\label{tbl:time_prn_2000}
+  \caption{Configurations $(C_i, \pi_i^C, \pi_i^S)$, rejections and areas (in arbitrary units) for MIN/80}
+  \label{tbl:gurobi_min_80}
+  \centering
+    {\scalefont{0.77}
+      \begin{tabular}{|c|ccccc|c|c|}
+        \hline
+         $n$  & $i = 1$     & $i = 2$     & $i = 3$     & $i = 4$     & $i = 5$     & Rejection       & Area  \\
+        \hline
+            1 & (55, 16, 0) & -           & -           & -           & -           & 81~dB           & 1760  \\
+            2 & (3, 3, 15)  & (47, 14, 0) & -           & -           & -           & 80~dB           & 903   \\
+            3 & (3, 3, 15)  & (23, 9, 0)  & (19, 7, 0)  & -           & -           & 80~dB           & 698   \\
+            4 & (3, 3, 15)  & (27, 9, 0)  & (7, 7, 4)   & (3, 3, 0)   & -           & 80~dB           & 605   \\
+            5 & (3, 2, 14)  & (27, 8, 0)  & (3, 3, 1)   & (3, 3, 0)   & (3, 3, 0)   & 81~dB           & 534   \\
+        \hline
+      \end{tabular}
+    }
 \end{table}
+\renewcommand{\arraystretch}{1}
  
-\section{Experiments with fixed rejection target}
+These tables first show that every configuration reaches the target rejection
+level and that the occupied area, in arbitrary units, decreases as the number of stages increases.
+Furthermore, the area of the monolithic filter is roughly twice that of the two-stage cascade.
+More generally, the more filters in the cascade, the lower the occupied area.
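+
+As an illustration, the areas reported for MIN/40 can be recomputed by hand from
+\eqref{eq:areadef}, \eqref{eq:bits} and \eqref{eq:inout}. We assume here an input width
+$\Pi^I = 16$~bits, a value inferred from the tabulated areas rather than stated in this section:
+\begin{align*}
+n = 1: \quad & a_1 = 27 \times (8 + 16) = 648 \\
+n = 2: \quad & a_1 = 3 \times (2 + 16) = 54, \quad \pi_1^+ = 16 + 2 - 14 = 4, \\
+             & a_2 = 19 \times (7 + 4) = 209, \quad a_1 + a_2 = 263
+\end{align*}
+Under this assumption, both totals match Table~\ref{tbl:gurobi_min_40}.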
  
+As in the previous section, the solver always chooses a small filter as the first
+stage, and the second stage is often the largest filter. This choice can be explained
+in the same way as in the previous section: the solver uses just enough bits in the first stage to avoid degrading the input
+signal, and in the second stage it can select a better filter to improve the rejection without
+producing too many bits in the output data.
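+
+The effect of this small first stage on the data width can be sketched with \eqref{eq:bits}
+for the MIN/60 solution with $n = 2$, again assuming an input width $\Pi^I = 16$~bits
+(a value inferred from the tabulated areas, not stated in this section):
+\begin{align*}
+\pi_1^+ & = 16 + 3 - 15 = 4, & a_1 & = 3 \times (3 + 16) = 57, \\
+\pi_2^- & = \pi_1^+ = 4,     & a_2 & = 35 \times (10 + 4) = 490.
+\end{align*}
+The total $a_1 + a_2 = 547$ matches Table~\ref{tbl:gurobi_min_60}. For comparison, applying
+\eqref{eq:areadef} to the same second stage fed directly with 16-bit data would give
+$35 \times (10 + 16) = 910$, which suggests why reducing the width early is attractive in this model.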
+
+In the specific case of MIN/40 with $n = 5$, the solver has determined that the optimal
+number of filters is 4, so it does not select any configuration for the last filter. Hence this
+solution is equivalent to the result for $n = 4$.
+
+The following graphs present the rejection for real data on the FPGA. In all the following
+figures, the solid line represents the actual rejection of the filtered
+data on the FPGA as measured experimentally, and the dashed line is the noise level
+given by the quadratic solver.
+
+Figure~\ref{fig:min_40} shows the rejection of the different configurations in the case of MIN/40.
+Figure~\ref{fig:min_60} shows the rejection of the different configurations in the case of MIN/60.
+Figure~\ref{fig:min_80} shows the rejection of the different configurations in the case of MIN/80.
+
 \begin{figure}
 \centering
-\includegraphics[width=\linewidth]{images/min_area/prn_50}
-\caption{Results for design with PRN as data input and 50 dB as aimed rejection level}
-\label{fig:prn_500}
+\includegraphics[width=\linewidth]{images/min_40}
+\caption{Signal spectrum for MIN/40}
+\label{fig:min_40}
 \end{figure}
  
 \begin{figure}
 \centering
-\includegraphics[width=\linewidth]{images/min_area/prn_100}
-\caption{Results for design with PRN as data input and 50 dB as aimed rejection level}
-\label{fig:prn_100}
+\includegraphics[width=\linewidth]{images/min_60}
+\caption{Signal spectrum for MIN/60}
+\label{fig:min_60}
 \end{figure}
  
 \begin{figure}
 \centering
-\includegraphics[width=\linewidth]{images/min_area/prn_150}
-\caption{Results for design with PRN as data input and 2000 a.u. as max arbitrary space}
-\label{fig:prn_150}
+\includegraphics[width=\linewidth]{images/min_80}
+\caption{Signal spectrum for MIN/80}
+\label{fig:min_80}
 \end{figure}
  
+We observe that all rejections given by the quadratic solver are close to the measured
+rejection. All curves show that the target rejection constraint is
+met for both the monolithic filter and the cascaded filters.
+
+Table~\ref{tbl:resources_usage_comp} shows the resource usage in the case of MIN/40, MIN/60 and
+MIN/80, \emph{i.e.} when the target rejection is fixed to 40, 60 and 80~dB. We
+have taken care to extract solely the resources used by
+the FIR filters and to exclude additional processing blocks, including the FIFO and the PL to
+PS communication.
+
+\begin{table}
+  \caption{Resource occupation. The last column refers to available resources on a Zynq-7010 as found on the Redpitaya.}
+  \label{tbl:resources_usage_comp}
+  \centering
+      \begin{tabular}{|c|c|ccc|c|}
+        \hline
+        $n$ &          & MIN/40   & MIN/60   & MIN/80   & \emph{Zynq 7010}         \\ \hline\hline
+            & LUT      & 343      & 334      & 772      & \emph{17600}             \\
+        1   & BRAM     & 1        & 1        & 1        & \emph{120}               \\
+            & DSP      & 27       & 39       & 55       & \emph{80}                \\ \hline
+            & LUT      & 1252     & 2862     & 5099     & \emph{17600}             \\
+        2   & BRAM     & 2        & 2        & 2        & \emph{120}               \\
+            & DSP      & 0        & 0        & 0        & \emph{80}                \\ \hline
+            & LUT      & 891      & 2148     & 2023     & \emph{17600}             \\
+        3   & BRAM     & 3        & 3        & 3        & \emph{120}               \\
+            & DSP      & 0        & 0        & 19       & \emph{80}                \\ \hline
+            & LUT      & 662      & 1729     & 2451     & \emph{17600}             \\
+        4   & BRAM     & 4        & 4        & 4        & \emph{120}               \\
+            & DSP      & 0        & 0        & 7        & \emph{80}                \\ \hline
+            & LUT      & -        & 1259     & 2602     & \emph{17600}             \\
+        5   & BRAM     & -        & 5        & 5        & \emph{120}               \\
+            & DSP      & -        & 0        & 0        & \emph{80}                \\ \hline
+      \end{tabular}
+\end{table}
+
+If we keep the previous estimate of the cost of one DSP in terms of LUTs (1 DSP $\approx$ 100 LUT),
+the real resource consumption decreases as the number of filter stages increases, in agreement
+with the solution given by the quadratic solver. Indeed, the consumption always
+decreases, even if the difference between the monolithic filter and the two cascaded
+filters is smaller than expected.
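+
+To make this comparison concrete, the LUT and DSP counts of the MIN/80 column of
+Table~\ref{tbl:resources_usage_comp} can be merged into a single LUT-equivalent figure
+$L_n = \mathrm{LUT}_n + 100 \times \mathrm{DSP}_n$. The symbol $L_n$ is introduced here only for
+illustration, and the factor 100 is the rough estimate quoted above:
+\begin{align*}
+L_1 & = 772 + 100 \times 55 = 6272 \\
+L_2 & = 5099 + 100 \times 0 = 5099 \\
+L_3 & = 2023 + 100 \times 19 = 3923 \\
+L_4 & = 2451 + 100 \times 7 = 3151 \\
+L_5 & = 2602 + 100 \times 0 = 2602
+\end{align*}
+With this estimate, the equivalent cost decreases at every additional stage, although the step
+from $n = 1$ to $n = 2$ is far from a factor of two.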
+
+Finally, Table~\ref{tbl:area_time_comp} shows the computation time needed to solve
+the quadratic program.
+
+\begin{table}
+\caption{Time to solve the quadratic program with Gurobi}
+\label{tbl:area_time_comp}
+\centering
+\begin{tabular}{|c|c|c|c|}\hline
+$n$ & Time (MIN/40)           & Time (MIN/60)               & Time (MIN/80)  \\\hline\hline
+1   & 0.07~s                  & 0.02~s                      & 0.01~s         \\
+2   & 7.8~s                   & 16~s                        & 14~s           \\
+3   & 4.7~s                   & 14~s                        & 28~s           \\
+4   & 39~s                    & 20~s                        & 193~s          \\
+5   & 126~s                   & 12~s                        & 170~s          \\\hline
+\end{tabular}
+\end{table}
+
+The time needed to solve these configurations is substantially shorter than the time
+needed in the previous section. Indeed, the worst case here is only about 3~minutes,
+compared with 3~days in the previous section. This problem is therefore easier
+to solve than the previous one.
+
 \section{Conclusion}
+
+In this paper, we have proposed a new approach to working with a cascade of FIR filters inside an FPGA.
+This method aims to be hardware independent and focuses on a high level of abstraction.
+We have modeled the FIR filter operation and the impact of the data shift. With this model,
+we have created a quadratic program to select the optimal FIR coefficient set to reject a
+maximum of noise. In our experiments we have deliberately chosen some common tools
+to design the filter coefficients, but any other method can be used.
+
+Our experimental results are very promising in providing a rational approach to selecting
+the coefficients of each FIR filter in the context of a performance target for a chain of
+such filters. The FPGA design that is produced automatically by our
+workflow is able to filter an input signal as expected, which validates our model and our approach.
+The quadratic program can easily be changed to adapt it to another problem.
+
+A perspective is to model the decimators and add them to the processing chain in order to obtain a classical
+FIR filter and decimator chain. The impact of the decimator is not trivial, especially in terms of silicon
+area for the subsequent stages, since some hardware optimizations can be applied in
+this case.
+
+The software used to demonstrate the concepts developed in this paper is based on the
+CPU-FPGA co-design framework available at \url{https://github.com/oscimp/oscimpDigital}.
  
 \section*{Acknowledgement}
  
@@ -10,7 +10,7 @@
 }
  
 @article{kodek1980design,
-  title={Design of optimal finite wordlength {FIR} digital filters using integer 
+  title={Design of optimal finite wordlength {FIR} digital filters using integer
 programming techniques},
   author={Kodek, Dusan},
   journal={IEEE Transactions on Acoustics, Speech, and Signal Processing},
@@ -43,4 +43,56 @@
   year={2016},
   publisher={AIP Publishing}
 }
+
+@inproceedings{lim_1996,
+author={Y.-C. Lim and R. Yang and B. Liu},
+booktitle={1996 IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World. ISCAS 96},
+title={The design of cascaded FIR filters},
+year={1996},
+volume={2},
+number={},
+pages={181-184 vol.2},
+keywords={cascade networks;digital filters;FIR filters;filtering theory;linear programming;frequency response;cascaded FIR filters;stopband response;minimum attenuation requirement;passband ripple magnitude;linear-programming technique;FIR filter design;filter optimisation;Finite impulse response filter;IIR filters;Passband;Frequency;Signal sampling;Band pass filters;Digital filters;Attenuation;Image sampling;Linear programming},
+doi={10.1109/ISCAS.1996.540382},
+ISSN={},
+month={May},}
+
+@article{lim_1988,
+author={Y. C. {Lim} and B. {Liu}},
+journal={IEEE Transactions on Acoustics, Speech, and Signal Processing},
+title={Design of cascade form FIR filters with discrete valued coefficients},
+year={1988},
+volume={36},
+number={11},
+pages={1735-1739},
+keywords={cascade networks;digital filters;filtering and prediction theory;iterative equalisation strategy;cascade form FIR filters;discrete valued coefficients;peak ripple;prototype filter;roundoff noise property;Finite impulse response filter;Low pass filters;Band pass filters;Passband;Prototypes;Frequency;Digital filters;Digital arithmetic;Design optimization;Sampling methods},
+doi={10.1109/29.9010},
+ISSN={0096-3518},
+month={Nov},}
+
+@inproceedings{young_1992,
+author={C. {Young} and D. L. {Jones}},
+booktitle={[Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing},
+title={Improvement in finite wordlength FIR digital filter design by cascading},
+year={1992},
+volume={5},
+number={},
+pages={109-112 vol.5},
+keywords={approximation theory;digital filters;integer programming;series (mathematics);finite wordlength filter;quantization;FIR digital filter design;finite impulse response;digital systems;finite wordlength coefficients;cascaded subfilters;stopband suppression;Taylor series approximation;linear integer program;passband deviation;Finite impulse response filter;Digital filters;Linear programming;Passband;Quantization;Frequency response;Digital systems;Taylor series;Minimax techniques;Design optimization},
+doi={10.1109/ICASSP.1992.226646},
+ISSN={1520-6149},
+month={March},}
+
+@article{smith_1998,
+author={L. M. {Smith}},
+journal={IEEE Transactions on Signal Processing},
+title={Decomposition of FIR digital filters for realization via the cascade connection of subfilters},
+year={1998},
+volume={46},
+number={6},
+pages={1681-1684},
+keywords={FIR filters;digital filters;cascade networks;Z transforms;transfer functions;frequency response;Newton-Raphson method;convergence of numerical methods;search problems;poles and zeros;FIR digital filters;subfilters cascade connection;even-order linear-phase FIR filters;filter decomposition;fourth-order subfilters;second-order subfilters;roots;z-domain filter transfer function;complex z plane;impulse response symmetry;unit circle;perimeter;complex values;real values;impulse response coefficients;root-finding algorithm;Newton-Raphson method;2D search;Cauchy-Riemann relations;convergence speed;frequency response characteristics;Finite impulse response filter;Digital filters;Polynomials;Programmable logic arrays;Transfer functions;Testing;Frequency response;Application specific integrated circuits;Nonlinear filters;Passband},
+doi={10.1109/78.678490},
+ISSN={1053-587X},
+month={June},}