We wish to specify a digital circuit that will compute

where the are constants and the are inputs to the circuit. Two's complement fixed point representation is assumed throughout.

In typical applications
the quantities of interest will be scaled by some constant
factor: to represent real-world quantities in the case of the *A*s or
unitless fractions in the case of the *C*s. Since these scaling factors
flow through the calculation in an obvious way it is no loss
of generality to regard all quantities as integers throughout the
exposition of the algorithm. Any desired power-of-two scaling may
be applied to the result by re-designating the weight of the output bits.
Scaling other than by a power of two requires a combination of output
scaling and prescaling of the coefficients.

We are concerned here with synchronous pipelined digital logic systems.
A synchronous digital circuit is one in which all computations occur
in discrete steps called clock periods. In a pipelined
circuit, intermediate results are stored internally and the ultimate result
emerges some number of clock periods after the inputs are accepted.
Synchronous circuit elements store or *latch* their outputs
between clocks; this storage is sufficient for the construction of a pipeline.

We assume that the vector of *n* data values
is one of a series of such vectors
arriving at the inputs of the circuit at intervals. One measure of the
performance of a circuit is the minimum period between arrivals.
Another measure is the time required to produce the sum once the
vector has been accepted. The latter quantity is known as the
*latency* of the circuit while the former is the reciprocal of its
*throughput.* The number of logic elements dedicated to the
circuit is the third metric; one desires high throughput, low latency,
and an economy of logic *gates*.

Latency and throughput are influenced by both the logical design of the
sum-of-products circuit and by the speed of the digital circuit elements
used to realize the design. The logic family determines the minimum
clock period for a simple *synchronous logic element*.
The *circuit topology* determines the latency and throughput in terms
of the clock period.

The clock period, however, is not independent of the topology. A design that uses more complicated logical structures will have a longer minimum clock period due to electrical effects. Carry propagation networks are a classic example of the kind of logic that adversely affects clock period.

Our goal, then, is to design a sum-of-products circuit that operates without long-range carry propagation and with acceptable efficiency.

Mon Dec 11 17:02:42 CST 2000