We wish to specify a digital circuit that will compute
where the are constants and the
are inputs to the circuit.
Two's complement fixed point representation is assumed throughout.
In typical applications the quantities of interest will be scaled by some constant factor: to represent real-world quantities in the case of the As or unitless fractions in the case of the Cs. Since these scaling factors flow through the calculation in an obvious way it is no loss of generality to regard all quantities as integers throughout the exposition of the algorithm. Any desired power-of-two scaling may be applied to the result by re-designating the weight of the output bits. Scaling other than by a power of two requires a combination of output scaling and prescaling of the coefficients.
We are concerned here with synchronous pipelined digital logic systems. A synchronous digital circuit is one in which all computations occur in discrete steps called clock periods. In a pipelined circuit, intermediate results are stored internally and the ultimate result emerges some number of clock periods after the inputs are accepted. Synchronous circuit elements store or latch their outputs between clocks; this storage is sufficient for the construction of a pipeline.
We assume that the vector of n data values
is one of a series of such vectors
arriving at the inputs of the circuit at intervals. One measure of the
performance of a circuit is the minimum period between arrivals.
Another measure is the time required to produce the sum once the
vector has been accepted. The latter quantity is known as the
latency of the circuit while the former is the reciprocal of its
throughput. The number of logic elements dedicated to the
circuit is the third metric; one desires high throughput, low latency,
and an economy of logic gates.
Latency and throughput are influenced by both the logical design of the sum-of-products circuit and by the speed of the digital circuit elements used to realize the design. The logic family determines the minimum clock period for a simple synchronous logic element. The circuit topology determines the latency and throughput in terms of the clock period.
The clock period, however, is not independent of the topology. A design that uses more complicated logical structures will have a longer minimum clock period due to electrical effects. Carry propagation networks are a classic example of the kind of logic that adversely affects clock period.
Our goal, then, is to design a sum-of-products circuit that operates without long-range carry propagation and with acceptable efficiency.