6.004-9, Case study: multipliers
MIT OpenCourseWarehttp://ocw.mit.edu For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 6.004 Computation Structures Spring 2009 L09 -Multipliers 1 6.004 – Spring 20093/5/09Cost/Performance Tradeoffs: a case study Digital Systems Architecture 1.01 modified 2/23/09 10:44 Lab #3 due tonight! L09 -Multipliers 2 6.004 – Spring 20093/5/09Binary Multiplication Engineering Principle:Exploit STRUCTURE in problem.ababxn bits n bits 2n bits since (2n-1)2 < 22nEASY PROBLEM: design combinational circuit to multiply tiny (1-, 2-, 3-bit) operands... HARD PROBLEM: design circuit to multiply BIG (32-bit, 64-bit) numbers We can make bigmultipliers out of little ones! L09 -Multipliers 3 6.004 – Spring 20093/5/09Making a 2n-bit multiplier using n-bit multipliers Given n-bit multipliers: Synthesize 2n-bit multipliers: xabaHaLbHbLaLbLaLbHaHbLaHbHaxb=abn bits n bits 2n bits xab2n bits 2n bits ab4n bits L09 -Multipliers 4 6.004 – Spring 20093/5/09Our Basis: n=1: minimalist starting point Multiplying two 1-bit numbers is pretty simple: axb=ab0Of course, we could start with optimized combinational multipliers for larger operands; e.g. 2a1a02b1b04c3c2c1c02-bitMultiplierthe logic gets more complex, but some optimizationsare possible... Our induction step: 2n-bit by 2n-bit multiplication: 1. Divide multiplicands into n-bit pieces 2. Form 2n-bit partial products, using n-bit by n-bit multipliers.3. Align appropriately 4. Add. Induction: we can use the same structuring principle to build a 4n-bit multiplier from our newly-constructed 2n-bit ones... 6.004 – Spring 20093/5/09L09 -Multipliers 5 xa • baHaLbHbLaLbLaLbHaHbLREGROUP partial products -2 additions rather than 3! aHbHBrick Wall view of partial products Making 4n-bit multipliers from n-bit ones: 2 “induction steps”6.004 – Spring 20093/5/09L09 -Multipliers 6 b 3 2 1 0xbbb 3 2 1 0aaaaa0b2a0b3a1b2a1b3a0b0a0b1a1b0a1b1a2b2a2b3a3b2a3b3a2b0a2b1a3b0a3b1Multiplier Cookbook: Chapter 1 Step 1: Form (& arrange) Given problem: Partial Products: Subassemblies: • Partial Products • Adders Step 2: Sum 6.004 – Spring 20093/5/09L09 -Multipliers 7 3 2 1a 0aaab 3 2 1 0xbbbMULTADDa0b2a0b3a1b2a1b3a0b0a0b1a1b0a1b1a2b2a2b3a3b2a3b3a2b0a2b1a3b0a3b1Performance/Cost Analysis 2Partial Products:n=2(n)Things to Add:2n-2=(n)Adder Width:2n=(n)2Hardware Cost:?=(n)Latency: (n2) ?? 6.004 – Spring 20093/5/09L09 -Multipliers 8 Example:2n+2n+3 = 2(n)since 2n2 (n+2n+3)2 2n"almost always"(...) implies both inequalities;O(...)implies only the second. "Order Of" notation: such that for all but finitely manyg(n) = O(f(n)) "g(n) is of order f(n)" g(n) = (f(n)) g(n) =(f(n)) if there exist C2C1>> 0, integral n0c1•f(n)g(n)c2•f(n)L09 -Multipliers 9 6.004 – Spring 20093/5/09Observations: Hmmm.(n2)partialproducts.(n2)fulladders.MULTADDa0b2a0b3a1b2a1b3a0b0a0b1a1b0a1b1a2b2a2b3a3b2a3b3a2b0a2b1a3b0a3b1L09 -Multipliers 10 6.004 – Spring 20093/5/09Repackaging Function Engineering Principle #2: Put the Solution where the Problem is. How about n2 blocks, each doing a little multiplication and a littleaddition? 2 1ab 0 1ab 1 1ab 0 0ab 1 0ab 2 0ab 3 0ab 3 1ab 0 2ab 1 2ab 0 3ab 1 3ab 2 2ab 3 2ab 2 3ab 3 3abMULTADD(n2)partialproducts.(n2)fulladders.L09 -Multipliers 11 6.004 – Spring 20093/5/09Goal: Array of Identical Multiplier Cells Single "brick" of brick-wall array... • Forms partial product • Adds to accumulating sum along with carryS’k+1S’kSk+1SkCk+2CkbiajAiBi(A+B)iCiCi+1FA Necessary Component: Full Adder Takes 2 addend bits plus carry bit. Produces sum and carry output bits. CASCADE to form an n-bit adder. 2 1ab 0 1ab 1 1ab 0 0ab 1 0ab 2 0ab 3 0ab 3 1ab 0 2ab 1 2ab 0 3ab 1 3ab 2 2ab 3 2ab 2 3ab 3 3abb3b2b1b0a3a2a1a0L09 -Multipliers 12 6.004 – Spring 20093/5/09Design of 1-bit multiplier "Brick": 2 1ab 0 1ab 1 1ab 0 0ab 1 0ab 2 0ab 3 0ab 3 1ab 0 2ab 1 2ab 0 3ab 1 3ab 2 2ab 3 2ab 2 3ab 3 3abb3b2b1b0a3a2a1a0Array Layout: •operand bits bused diagonally •Carry bits propagate right-to-left •Sum bits propagate down Brick design: •AND gate forms 1x1 product •2-bit sum propagates from top to bottom•Carry propagates to left S’k+1S’kSk+1SkCk+2CkbiajFA FA 0Wastes some gates… but consider (say) optimized 4x4-bit brick! L09 -Multipliers 13 6.004 – Spring 20093/5/09Latency revisited Here’s our combinational multiplier:b3a0 2 1ab 0 1ab 1 1ab 0 0ab 1 0ab 2 0ab 3 0ab 3 1ab 0 2ab 1 2ab 0 3ab 1 3ab 2 2ab 3 2ab 2 3ab 3 3aba1a2a3b2b1b0What’s its propagation delay? Naive (but valid) bound: •O(n) additions •O(n) time for each addition •Hence O(n2) time required On closer inspection: •Propagation only toward left, bottom•Hence longest path bounded by length + width of array: O(n+n) = O(n)! L09 -Multipliers 14 6.004 – Spring 20093/5/09Multiplier Cookbook: Chapter 2 (n2)Hardware for n by n bits: Latency: Throughput: (n)(1/n)Combinational Multiplier:b3a0 2 1ab 0 1ab 1 1ab 0 0ab 1 0ab 2 0ab 3 0ab 3 1ab 0 2ab 1 2ab 0 3ab 1 3ab 2 2ab 3 2ab 2 3ab 3 3aba1a2a3b2b1b0+Note: lots of tricks are available to make a faster combinational multiplier… L09 -Multipliers 15 6.004 – Spring 20093/5/09Combinational Multiplier: best bang for the buck? Suppose we have LOTS of multiplications. Can we do better from a cost/performance standpoint? PIPELININGL09 -Multipliers 16 6.004 – Spring 20093/5/09The Pipelining Bandwagon... where do I get on? WE HAVE: • Pipeline rules -"well formed pipelines" • Plenty of registers • Demand for higher throughput. What do we do? Where do we define stages? b3a0 2 1ab 0 1ab 1 1ab 0 0ab 1 0ab 2 0ab 3 0ab 3 1ab 0 2ab 1 2ab 0 3ab 1 3ab 2 2ab 3 2ab 2 3ab 3 3aba1a2a3b2b1b0L09 -Multipliers 17 6.004 – Spring 20093/5/09Stupid Pipeline Tricks (n)Stages:Clock Period: (n2)Hardware cost for n by n bits: (n)Latency: (n2)Throughput: ( 1/n ) 0 3ab 0 2ab 1 3ab 0 1ab 1 2ab 2 3ab 3 0ab 2 0ab 3 1ab 1 0ab 2 1ab 3 2ab 0 0ab 1 1ab 2 2ab 3 3abgotta break that long carry chain! L09 -Multipliers 18 6.004 – Spring 20093/5/09Even Stupider Pipeline Tricks 0 1ab 0 0ab 1 0ab 0 2ab 0 3ab 1 1ab 1 2ab 2 1ab 2 0ab 3 0ab 1 3ab 2 2ab 2 3ab 3 1ab 3 2ab 3 3abBack to basics: what’s the point of pipelining, anyhow? WORSE idea:• Doesn’t break long combinational paths • NOT a well-formed pipeline... ... different register counts on alternative paths ... data crosses stage boundaries in both directions! L09 -Multipliers 19 6.004 – Spring 20093/5/09Breaking O(n) combinational paths GOAL:(n) stages;(1) clock period! 0 1ab 0 0ab 0 2ab 0 3ab 1 1ab 1 0ab 1 2ab 1 3ab 2 1ab 2 0ab 2 2ab 2 3ab 3 0ab 3 1ab 3 2ab 3 3aba0a1a2a3b3b2b1b0LONG PATHS go down, to left: •Break array into diagonal slices •Segment every long combinational path L09 -Multipliers 20 6.004 – Spring 20093/5/09Multiplier Cookbook: Chapter 3 • Well-formed pipeline (careful!) • Constant (high!) throughput, independently of operand size. 0 1ab 0 0ab 0 2ab 0 3ab 1 1ab 1 0ab 1 2ab 1 3ab 2 1ab 2 0ab 2 2ab 2 3ab 3 0ab 3 1ab 3 2ab 3 3aba0a1a2a3b3b2b1b0Stages:Clock Period: (n2)Hardware cost for n by n bits: (n)Latency: Throughput: (1)(1)(n)... but suppose we don’t need the throughput? L09 -Multipliers 21 6.004 – Spring 20093/5/09Moving down the cost curve... Suppose we have INFREQUENT multiplications... pipelining doesn’t help us. Can we do better from a cost/performance standpoint? Hmmm, do I really need all these extras? a0a1a2a3b3b2b1b0 0 1ab 0 0ab 0 2ab 0 3ab 1 1ab 1 0ab 1 2ab 1 3ab 2 1ab 2 0ab 2 2ab 2 3ab 3 0ab 3 1ab 3 2ab 3 3aba0a1a2a3b3b2b1b0 3aba0a1a2a3b3b2b1b0 3aba0a1a2a3b3b2b1b0 3aba0a1a2a3b3b2b1b0 3abL09 -Multipliers 22 6.004 – Spring 20093/5/09Multiplier Cookbook: Chapter 4Stages:(1)(constant!) Clock Period: Hardware cost for n by n bits: (n)Latency: Throughput: 1(n)(1/n)aib3b2b1b0 i 0ab i 1ab i 2ab i 3abSequential Multiplier: •Re-uses a single n-bit “slice” to emulate each pipeline stage •a operand entered serially •Lots of details to be filled in... L09 -Multipliers 23 6.004 – Spring 20093/5/09(Ridiculous?)Extremes Dept... Cost minimization: how far can we go? aib3b2b1b0 i 0ab i 1ab i 2ab i 3ab 0abSuppose we want to minimize hardware (at any cost)… •Consider bit-serial!•Form and add 1-bit partial product per clock •Reuse single “brick” for each bit bj of slice; •Re-use slice for each bit of a operand L09 -Multipliers 24 6.004 – Spring 20093/5/09aib3b2b1b0Multiplier Cookbook: Chapter 5 Latency: Throughput: Stages:(1)Clock Period: Hardware cost for n by n bits: (1n)(1)+?(n2)(1/n2) i 3ab i 2ab i 1ab i 0abBit Serial multiplier: •Re-uses a single brick to emulate an n-bit slice •both operands entered serially •O(n2) clock cycles required •Needs additional storage (typically from existing registers) (constant) L09 -Multipliers 25 6.004 – Spring 20093/5/09Summary: Latency (n)(n)(n)(n2)Thruput (1/n)(1)(1/n)(1/n2)$(n2)(n2)(n)(1)*Scheme: Combinational N-pipe Slice-serial Bit-serial Lots more multiplier technology: fast adders, Booth Encoding, column compression, ...
Description
This is a case study of binary multiplication.Here it is explained about 2n-bit multiplier, Combinational multiplier and discussed various performence issues.
“Prof. Steve Ward, 6.004-9, Case study: multipliers 6.004 Computation Structures, Electrical Engineering and Computer Science , Engineering, Massachusetts Institute of Technology: MIT Open Course Ware,http://ocw.mit.edu (08-08-2011). License: Creative Commons BY-NC-SA: http://ocw.mit.edu/terms/#cc".
Presentation Transcript
Your Facebook Friends on WizIQ