In a pipelined circuit a sequence of synchronous logic
*stages* operates on the data, passing intermediate results
from one to the next. Such circuits are carefully engineered for a particular
combination of hardware platform characteristics and required
performance, and can be quite efficient.

Consider a series of computations, each having the form of (1). The coefficients are constant within the series, but each sum is assumed to have an independent sequence of data values.

The sum is represented implicitly by the vector x of data values.
Suppose that a set of *m* functions f_1, ..., f_m exists so that
the composition f_m(f_{m-1}(... f_1(x) ...)) evaluates to the sum.
Clearly each intermediate form f_h(... f_1(x) ...), 1 <= h <= m,
is also a representation of the sum.

Suppose the set of functions is chosen so that each can be realized efficiently in hardware. The relative value assigned to space (chip area) and time efficiency is application dependent and will generally determine the number of stages. If these stages are simply composed, they will compute the sum from the input vector with a delay equal to the sum of the propagation delays of the individual stages.
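The direct (unlatched) composition can be sketched as follows. The stage functions and coefficients here are hypothetical stand-ins, assuming each stage adds one term of a weighted sum; the state threaded through the stages is a (partial_sum, input_vector) pair.

```python
from functools import reduce

def compose(stages):
    """Compose stage functions left to right: returns f_m o ... o f_1."""
    return lambda state: reduce(lambda acc, f: f(acc), stages, state)

# Illustration with assumed coefficients a = (2, 3): stage h adds the
# h-th term a[h] * x[h] to the running partial sum.
a = (2, 3)
stages = [lambda s, h=h: (s[0] + a[h] * s[1][h], s[1]) for h in range(len(a))]
g = compose(stages)

result, _ = g((0, (1, 1)))   # 2*1 + 3*1 == 5
```

In hardware terms, `g` corresponds to a single combinational path through all *m* stages, so its delay is the sum of the individual stage delays.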

If, on the other hand, the intermediate results are latched then the
delay will be at least *m* times the longest of the stage delays.
The resulting pipeline still computes the correct value, but one
must be careful to account for the delay.
If at (discrete) time *t* the stored result
of stage *h* is r_h(t), then the input to stage *h* is r_{h-1}(t),
and at time *t*+1 stage *m* will output r_m(t+1) = f_m(r_{m-1}(t)).
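A minimal simulation of the latched pipeline makes the *m*-cycle delay visible. The class and stage behavior are illustrative assumptions (stages that accumulate a weighted sum with constant coefficients), not the author's design; each register models the latch after one stage.

```python
class Pipeline:
    """Toy m-stage pipeline computing sum(a[h] * x[h]) for constant a."""

    def __init__(self, coeffs):
        self.a = coeffs
        self.m = len(coeffs)
        self.regs = [None] * self.m  # latched intermediate results

    def clock(self, x):
        """Advance one clock period: accept input vector x and return
        the completed sum started m clocks earlier (None while filling)."""
        out = self.regs[-1]
        # Stage h consumes the latch of stage h-1 (scan high to low so
        # each latch is read before it is overwritten).
        for h in range(self.m - 1, 0, -1):
            prev = self.regs[h - 1]
            if prev is not None:
                s, vec = prev
                self.regs[h] = (s + self.a[h] * vec[h], vec)
            else:
                self.regs[h] = None
        # Stage 0 starts a new computation every clock.
        self.regs[0] = (self.a[0] * x[0], x)
        return out[0] if out is not None else None

p = Pipeline([2, 3])          # computes 2*x[0] + 3*x[1]
outputs = [p.clock(x) for x in [(1, 1), (2, 2), (3, 3)]]
# outputs == [None, None, 5]: the sum for (1, 1) emerges m = 2 clocks later
```

Note that a new input vector is accepted every clock even though each result takes *m* clocks to emerge; this is exactly the delay accounting discussed above.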
The canonical notational mechanism for reasoning about pipelines
is the time/space diagram; see [?] for a careful treatment in
the intuitive style.

If the set of pipeline stage functions is chosen well, the sum of the
stage areas (hardware cost) will be small and the stage delays
will be balanced.
The pipeline as a whole completes one computation per clock period,
and each result corresponds to the input vector presented *m* clock
periods earlier. The application will generally determine the minimum
acceptable rate of computation and the maximum allowable latency.
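With assumed example numbers (a hypothetical pipeline of m = 5 stages and a 4 ns clock period set by the slowest stage), the rate/latency trade-off works out as:

```python
# Hypothetical figures for illustration only.
m = 5                      # number of pipeline stages
clock_period_ns = 4.0      # clock period, fixed by the slowest stage

throughput = 1.0 / clock_period_ns   # completed sums per ns: 0.25
latency_ns = m * clock_period_ns     # delay from input to result: 20 ns
```

Adding stages can shorten the clock period (raising throughput) while lengthening the latency, which is why the application's rate and latency requirements jointly constrain *m*.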

Mon Dec 11 17:02:42 CST 2000