proofreading

jfriedt
1 parent 56f7c40c96
Showing 1 changed file with 37 additions and 31 deletions
ifcs2018_journal.tex
@@ -174,7 +174,7 @@
 that all solutions found by the solver are synthesized and executed on hardware at the end
 of the analysis.
  
-In this demonstration , we focus on only two operations: filtering and shifting the number of
+In this demonstration, we focus on only two operations: filtering and shifting the number of
 bits needed to represent the data along the processing chain.
 We have chosen these basic operations because shifting and the filtering have already been studied
 in the literature \cite{lim_1996, lim_1988, young_1992, smith_1998} providing a framework for
@@ -298,7 +298,8 @@
 Our criterion to compute the filter rejection considers
 % r2.8 et r2.2 r2.3
 the maximum magnitude within the stopband, to which the {\color{red}sum of the absolute values
-within the passband is subtracted to avoid filters with excessive ripples}. With this
+within the passband is subtracted to avoid filters with excessive ripples, normalized to the
+bin width to remain consistent with the passband criterion (dBc/Hz units in all cases)}. With this
 criterion, we meet the expected rejection capability of low pass filters as shown in figure~\ref{fig:custom_criterion}.
  
 % \begin{figure}
@@ -355,9 +356,10 @@
 \end{figure}
  
 % r2.6
-Finally in our case, we consider that the input signal are fully known. So the
-resolution of the data stream are fixed and still the same for all experiments
-in this paper.
+{\color{red}
+Finally, in our case, we consider that the input signals are fully known. The
+resolution of the input data stream is fixed and remains the same for all experiments
+in this paper.}
  
 Based on this analysis, we address the estimate of resource consumption (called
 % r2.11
  
@@ -407,13 +409,15 @@
  
 {\color{red}
 This model is non-linear since we multiply some variable with another variable
-and it is even non-quadratic, as $F$ does not have a known
+and it is even non-quadratic, as the cost function $F$ does not have a known
 linear or quadratic expression. To linearize this problem, we introduce $p$ FIR configurations.
-This variable must be defined by the user, it represent the number of different
-set of coefficients generated (for memory, we use \texttt{firls} and \texttt{fir1}
-functions from GNU Octave). So $C_{ij}$ and $\pi_{ij}^C$ become constant and
-we defined $1 \leq j \leq p$ and the function $F$ can be estimate for each configurations
-thanks our rejection criterion. We also defined binary
+This variable $p$ is defined by the user, and represents the number of different
+sets of coefficients generated (recall that we use the \texttt{firls} and \texttt{fir1}
+functions from GNU Octave) based on the targeted filter characteristics and implementation
+assumptions (estimated number of bits defining the coefficients). Hence, $C_{ij}$ and
+$\pi_{ij}^C$ become constants and
+we define $1 \leq j \leq p$ so that the function $F$ can be estimated (look-up table)
+for each configuration thanks to the rejection criterion. We also define the binary
 variable $\delta_{ij}$ that has value 1 if stage~$i$ is in configuration~$j$
 and 0 otherwise. The new equations are as follows:
 }
@@ -430,15 +434,15 @@
 Equation~\ref{eq:config} states that for each stage, a single configuration is chosen at most.
  
 {\color{red}
-However the problem still quadratic since in the constraint~\ref{eq:areadef2} we multiply
-$\delta_{ij}$ and $\pi_i^-$. But like $\delta_{ij}$ is a binary variable we can
-linearise this multiplication if we can bound $\pi_i^-$. As $\pi_i^-$ is the data size
-we define $0 < \pi_i^- \leq 128$ which is the maximal data size that we can process.
-}
-Moreover the Gurobi
-(\url{www.gurobi.com}) optimization software is used to solve this quadratic
-model, and since Gurobi is able to linearize, the model is left as is. This model
-has $O(np)$ variables and $O(n)$ constraints.
+However, the problem remains quadratic at this stage since in constraint~\ref{eq:areadef2}
+we multiply
+$\delta_{ij}$ and $\pi_i^-$. Since $\delta_{ij}$ is a binary variable, we can
+linearize this multiplication if we can bound $\pi_i^-$. As $\pi_i^-$ is the data size,
+we define $0 < \pi_i^- \leq 128$, which is the maximum data size, estimated
+from hardware characteristics.
+The Gurobi (\url{www.gurobi.com}) optimization software used to solve this quadratic
+model is able to linearize the model provided as is. This model
+has $O(np)$ variables and $O(n)$ constraints.}
  
 % This model is non-linear and even non-quadratic, as $F$ does not have a known
 % linear or quadratic expression. We introduce $p$ FIR configurations
@@ -1047,17 +1051,19 @@
 previous one.
  
 {\color{red} % r1.4
-To conclude we have compared our monolithic filters with the FIR Compiler form
-Xilinx. For each experimentation we use the same coefficient set and we compare the
-resources consumption. The table~\ref{tbl:xilinx_resources} exhibits the results.
-The FIR Compiler never use BRAM while our filter use one block. This difference
-can be explain be our wish to have a reconfigurable FIR filter. In our case, we can
-configure the coefficients set without to have to change the FPGA design. With
-the FIR compiler, the coefficients set are given during the FPGA design conception
-so we have to change the coefficients, we need to regenerate the design. The
-difference with the LUT consumption is also related to the reconfigurability
-logic. However the DSP consumption, the most restricted resource, are the same between the FIR compiler end
-our FIR block. Our solutions are as good as the Xilinx implementation.
+To conclude, we compare our monolithic filters with the FIR Compiler provided by
+Xilinx in the Vivado software suite (v.2018.2). For each experiment we use the 
+same coefficient set and we compare the resource consumption, having checked that
+the transfer functions are indeed the same with both implementations. 
+Table~\ref{tbl:xilinx_resources} exhibits the results.
+The FIR Compiler never uses BRAM while our filter implementation uses one block. This difference
+is explained by our wish to have a dynamically reconfigurable FIR filter whose
+coefficients can be updated from the processing system without having to update the FPGA design.
+With the FIR Compiler, the coefficients are defined during the FPGA design so that
+changing coefficients requires generating a new design. The difference in LUT consumption
+is also attributed to the reconfigurability logic. However, the DSP consumption, the scarcest
+resource, is the same between the Xilinx FIR Compiler and
+our FIR block: we hence conclude that our solutions are as good as the Xilinx implementation.
  
 \renewcommand{\arraystretch}{1.2}
 \begin{table}
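
Note on the hunk at line 434: the text states that the product of the binary variable $\delta_{ij}$ and the bounded data size $\pi_i^-$ can be linearized, without showing how. A minimal sketch of the standard linearization of a binary-times-bounded-variable product is given here for reference. The auxiliary variable $z_{ij}$ and the constant $M$ are names introduced for illustration only and do not appear in the paper; $M = 128$ is taken from the bound $0 < \pi_i^- \leq 128$ stated in the diff. This is not necessarily the reformulation Gurobi applies internally.

% Hypothetical auxiliary variable z_{ij} standing for the product \delta_{ij}\,\pi_i^-,
% with M = 128 the upper bound on \pi_i^- (assumption taken from the text).
\begin{align*}
z_{ij} &\leq M\,\delta_{ij}, \\
z_{ij} &\leq \pi_i^-, \\
z_{ij} &\geq \pi_i^- - M\,(1 - \delta_{ij}), \\
z_{ij} &\geq 0 .
\end{align*}

When $\delta_{ij} = 0$, the first and last inequalities force $z_{ij} = 0$; when $\delta_{ij} = 1$, the second and third force $z_{ij} = \pi_i^-$. In both cases $z_{ij} = \delta_{ij}\,\pi_i^-$, so replacing the product by $z_{ij}$ would leave only linear constraints, at the cost of one extra variable and four extra inequalities per pair $(i,j)$ in this explicit form.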