next up previous
Next: Unstructured Feedback in a Up: Modeling Background Previous: Horizontal Partitioning and the

Conventional Iterative Sum of Products

Practical serial-k sum-of-products circuits are based on the fastest available adder circuits in the target implementation technology. Because the tex2html_wrap_inline691 data are spread out over several clock periods at the inputs, the results must be accumulated at some stage. Since the adders take only two binary numbers as inputs, a wide sum-of-products will require a tree of adders.

The accumulation could be done at the top (wide end) of the tree:

displaymath855

but there would be p+q-2 bits in each term entering the summation tree (i loop). In the preferred circuit

  equation200

the adder inputs are only q-1 bits wide.

The conventional circuit can be modeled in the following way. The input is formally a vector containing tex2html_wrap_inline691 and k; in practice, k is implicit. The first stage of the pipeline is the AND gate multiplication:

displaymath856

Intermediate stages add pairs of elements:

displaymath857

In an optimized implementation tex2html_wrap_inline881 would likely be subsumed by the tex2html_wrap_inline883 hardware. If the number of inputs is not a power of two some values will be copied from input to output (delayed) in one or more stages.

Each bit in the output of an adder is a function of all the input bits having the same or lower weight -- the radius is the entire input vector. Without dedicating an extravagant amount of hardware to the problem, computing an m-bit wide sum requires time proportionalgif to tex2html_wrap_inline887 approximately.

Each rank in the summation tree requires adders one bit wider than those in the previous rank. Sign extention is accomplished by an arrangement logically equivalent to wiring the leftmost bit of each input vector to two inputs. The critical path in such circuits is that the sign bit cannot be duplicated until its value is known.

After tex2html_wrap_inline889 stages all the terms have been combined into a single tex2html_wrap_inline891 -bit integer and the input to the final stage is the vector tex2html_wrap_inline893 where

displaymath858

corresponds to the parenthesized expression in (8). The final stage has the following structure:

  equation211

with tex2html_wrap_inline895 for k<0.

The peculiar form of tex2html_wrap_inline899 is chosen to correspond to an efficient hardware design. It can be implemented with an adder only one bit wider than that of the final summation stage, plus a shift register for the low-order bits. A more straightforward computation, such as

displaymath859

would require an adder q-2 bits wider.

The conventional circuit is easy to specify and understand. Establishing its correctness is straightforward. When the data are narrow enough, available adder circuits can implement the conventional circuit with a sufficently high clock rate and there is no need for a more complex solution.


next up previous
Next: Unstructured Feedback in a Up: Modeling Background Previous: Horizontal Partitioning and the

Stephen W. Nuchia
Mon Dec 11 17:02:42 CST 2000