In a pipelined circuit a sequence of synchronous logic stages operates on the data, passing intermediate results from one stage to the next. Such circuits are carefully engineered for a particular combination of hardware-platform characteristics and required performance, and can be quite efficient.
Consider a series of computations, each having the form of (1). The coefficients are constant within the series, but each sum is assumed to have an independent sequence of data values.
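As a concrete sketch, suppose (1) is a weighted sum $y = \sum_i a_i x_i$ (an assumption for illustration; the coefficient values below are ours, not the paper's). The coefficients stay fixed while each computation in the series supplies a fresh data vector:

```python
# Hypothetical instance of the sum in (1): y = a_1*x_1 + ... + a_n*x_n.
# The coefficients A are constant across the series of computations;
# only the data vector x changes from one sum to the next.
A = [2, -1, 3, 5]  # illustrative constant coefficients

def weighted_sum(x):
    """Compute one sum of the series for the data vector x."""
    return sum(a * xi for a, xi in zip(A, x))

print(weighted_sum([1, 1, 1, 1]))  # 2 - 1 + 3 + 5 = 9
print(weighted_sum([1, 0, 1, 0]))  # 2 + 3 = 5
```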
The sum $y$ is represented implicitly by the vector $x = (x_1, \ldots, x_n)$ of data values. Suppose that a set of $m$ functions $f_1, \ldots, f_m$ exists so that $y = f_m(f_{m-1}(\cdots f_1(x) \cdots))$. Clearly each intermediate form $f_h(\cdots f_1(x) \cdots)$ is also a representation of $y$.
Suppose the set of functions is chosen so that each can be realized efficiently in hardware. The relative value assigned to space (chip area) and to time efficiency is application dependent and will generally determine the number of stages. If these stages are simply composed, they will compute $y$ from $x$ with a delay equal to the sum of the propagation delays of the individual stages.
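One hypothetical decomposition (the stage functions and coefficients here are our own illustration, not the paper's): each $f_h$ folds one coefficient term into a running partial sum carried alongside the data vector, and simple composition of the stages recovers $y$.

```python
# Illustrative decomposition of a weighted sum into m = 4 stage
# functions f_1..f_m.  Each stage's state is (partial_sum, x); stage h
# adds the h-th term A[h]*x[h] to the partial sum.
A = [2, -1, 3, 5]  # assumed constant coefficients

def make_stage(h):
    def f(state):
        acc, x = state
        return acc + A[h] * x[h], x
    return f

stages = [make_stage(h) for h in range(len(A))]

def compose(x):
    # Direct composition: total delay is the sum of the stage delays.
    state = (0, x)
    for f in stages:
        state = f(state)
    return state[0]

print(compose([1, 1, 1, 1]))  # 9, the same value as the direct sum
```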
If, on the other hand, the intermediate results are latched, then the delay will be at least $m$ times the longest of the stage delays. The resulting pipeline still computes the correct value, but one must be careful to account for the delay. If at (discrete) time $t$ the stored result from stage $h$ is $r_h(t)$, then the input to stage $h$ is $r_{h-1}(t)$, and at time $t+1$ stage $h$ will output $r_h(t+1) = f_h(r_{h-1}(t))$; the value emerging from stage $m$ thus corresponds to the input vector presented $m$ periods earlier. The canonical notational mechanism for reasoning about pipelines is the time/space diagram; see [?] for a careful treatment in the intuitive style.
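A minimal register-level sketch of this latching behavior, under the same assumed weighted-sum stage functions as before (names and coefficients are ours): at each clock tick every stage computes from the value its predecessor latched on the previous tick, so after the pipeline fills it delivers one result per tick, each corresponding to an input vector presented roughly $m$ ticks earlier.

```python
# Cycle-accurate toy model of the latched pipeline.  regs[h] holds the
# result latched at the output of stage h; at each tick, stage h reads
# its predecessor's latched value r_{h-1}(t) and produces r_h(t+1).
A = [2, -1, 3, 5]  # assumed constant coefficients
m = len(A)

def stage(h, state):
    acc, x = state
    return acc + A[h] * x[h], x

def run(inputs, ticks):
    regs = [None] * m          # pipeline registers r_1..r_m (None = empty)
    outputs = []
    for t in range(ticks):
        new_regs = [None] * m
        for h in range(m - 1, 0, -1):
            if regs[h - 1] is not None:
                new_regs[h] = stage(h, regs[h - 1])
        if t < len(inputs):    # stage 1 consumes a fresh input vector
            new_regs[0] = stage(0, (0, inputs[t]))
        regs = new_regs
        if regs[m - 1] is not None:
            outputs.append(regs[m - 1][0])
    return outputs

# After an initial fill delay, one finished sum emerges per tick.
print(run([[1, 1, 1, 1], [1, 0, 1, 0]], ticks=6))  # → [9, 5]
```

Note that each tick's delay is set by the slowest single stage rather than by the whole composed chain, which is the source of the pipeline's throughput advantage.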
If the set of pipeline stage functions is chosen well, the sum of the stage areas (hardware cost) will be small and the stage delays will be balanced. The pipeline as a whole completes one computation per clock period, each result corresponding to the input vector presented $m$ clock periods earlier. The application will generally determine the minimum required rate of computation and the maximum allowable latency.