The algorithms considered so far are all serial, in the sense that each handles only one sum-of-products calculation at a time. Such an algorithm can be pushed beyond its intrinsic performance limit by running multiple copies of it in parallel.
If a series of computations is needed at an aggregate rate m times faster than the available circuit can perform them, the successive problems can be parceled out in round-robin fashion to m identical instances of the algorithm. As each instance completes its computation, the results are reassembled in their original order for downstream consumption.
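A minimal Python sketch of the round-robin dispatch and reassembly, assuming each problem is a plain dot product of a fixed coefficient vector with a data vector. The names `sum_of_products`, `instances_needed` and `round_robin` are hypothetical, latencies and arrival intervals are in arbitrary cycles, and the "parallel" instances are simulated sequentially: only the dispatch and reassembly order is modeled, not timing.

```python
from collections import deque
from typing import Deque, List, Sequence


def sum_of_products(coeffs: Sequence[int], data: Sequence[int]) -> int:
    """One serial computation: the single-instance algorithm (hypothetical model)."""
    return sum(c * x for c, x in zip(coeffs, data))


def instances_needed(latency: int, interval: int) -> int:
    """Smallest m such that m instances keep up with one new problem every
    `interval` cycles when each instance needs `latency` cycles per problem."""
    return -(-latency // interval)  # integer ceiling division


def round_robin(problems: Sequence[Sequence[int]],
                coeffs: Sequence[int], m: int) -> List[int]:
    """Dispatch problems to m identical instances in turn, then reassemble."""
    # One output FIFO per instance; instance k receives problems k, k+m, k+2m, ...
    fifos: List[Deque[int]] = [deque() for _ in range(m)]
    for i, data in enumerate(problems):
        fifos[i % m].append(sum_of_products(coeffs, data))
    # Reassembly: read the FIFOs in the same rotating order used for dispatch,
    # which restores the original sequence of results.
    return [fifos[i % m].popleft() for i in range(len(problems))]


coeffs = [1, 2, 3]
problems = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 2, 2]]
print(round_robin(problems, coeffs, m=2))        # [1, 2, 3, 6, 12]
print(instances_needed(latency=8, interval=3))   # 3
```

In this sketch the FIFOs are drained in the same order they were filled, so reassembly needs no sequence numbers; that simplicity is the main appeal of identical instances with a fixed rotation.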
This scheme has the advantage that very little engineering is required to scale a working algorithm up to a higher data rate. On the other hand, it can do no better than m times the original hardware requirement, since every instance is a complete copy of the original circuit. It is also worth noting that, in practice, the data values in successive computations are often closely related. Because each instance sees only every m-th problem, a round-robin circuit is generally unable to exploit this structure fully.
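To illustrate the last point, assume (as an example not taken from the text above) that each sum-of-products is one output of an n-tap FIR filter, so successive problems read sliding windows of the input. The sketch below, with hypothetical helper names, counts the input samples that two consecutive problems on the same instance have in common: a lone serial instance sees windows one sample apart and shares n-1 samples, while a round-robin instance sees windows m samples apart and shares only max(n-m, 0).

```python
from typing import Set


def window_indices(start: int, n: int) -> Set[int]:
    """Indices of the input samples read by the n-tap window beginning at `start`."""
    return set(range(start, start + n))


def shared_between_consecutive(n: int, step: int) -> int:
    """Input samples shared by two n-sample windows whose starts are `step` apart."""
    return len(window_indices(0, n) & window_indices(step, n))


n = 8
# A single serial instance handles every window in turn (step 1).
print(shared_between_consecutive(n, 1))  # 7
# With m = 4 round-robin instances, one instance's consecutive windows are 4 apart.
print(shared_between_consecutive(n, 4))  # 4
# Once m >= n, the windows handled by one instance no longer overlap at all.
print(shared_between_consecutive(n, 8))  # 0
```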