The algorithms considered so far are all serial in the sense that they perform only one sum-of-products calculation at a time. Such an algorithm can be extended beyond its intrinsic performance limits by running multiple copies of it in parallel.

If a series of computations is needed at an aggregate rate
*m* times faster than the available circuit can perform them, the
successive problems can be parceled out to *m* identical instances of
the algorithm in round-robin fashion. As each instance completes its
computation, the results are reassembled in their original order for
downstream consumption.
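
The dispatch and reassembly steps can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `sum_of_products` and `round_robin` are hypothetical, the plain multiply-and-add stands in for whichever serial algorithm is being replicated, and the *m* instances are simulated one after another rather than run concurrently as they would be in hardware.

```python
from typing import List, Sequence


def sum_of_products(coeffs: Sequence[float], data: Sequence[float]) -> float:
    """Stand-in for one serial instance: computes a single sum of products."""
    return sum(c * x for c, x in zip(coeffs, data))


def round_robin(coeffs: Sequence[float],
                problems: Sequence[Sequence[float]],
                m: int) -> List[float]:
    """Model m identical instances fed in round-robin order (hypothetical helper)."""
    # Distribute: problem k is handed to instance k mod m.
    queues: List[List[Sequence[float]]] = [[] for _ in range(m)]
    for k, problem in enumerate(problems):
        queues[k % m].append(problem)

    # Each instance works through its own queue independently of the others.
    per_instance = [[sum_of_products(coeffs, p) for p in q] for q in queues]

    # Reassemble: result k is output number k // m of instance k mod m.
    return [per_instance[k % m][k // m] for k in range(len(problems))]


if __name__ == "__main__":
    coeffs = [1, 2, 3]
    problems = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 0, 1]]
    print(round_robin(coeffs, problems, m=3))
```

Each instance sees only every *m*-th problem, so it has *m* times as long to finish each one, while the reassembled output stream still appears at the full aggregate rate.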

This scheme has the advantage that very little engineering is required
to scale a working algorithm up to a higher data rate. On the other hand,
its hardware cost can be no lower than *m* times that of the original circuit.
It is also worth noting that, in practice, the data values in
successive computations are often closely related. A round-robin
circuit generally cannot exploit this structure fully, because
consecutive problems are sent to different instances.

Mon Dec 11 17:02:42 CST 2000