6.004-15,Multilevel memories; locality, performance, caches,
MIT OpenCourseWarehttp://ocw.mit.edu For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 6.004 Computation Structures Spring 2009 L15 – Memory Hierarchy 1 6.004 – Spring 20094/2/09The Memory Hierarchy Lab #5 due tonight L15 – Memory Hierarchy 2 6.004 – Spring 20094/2/09What we want in a memory PCINSTMADDRMDATA BETA MEMORY Capacity Latency Cost Register 100’s of bits 20 ps $$$$SRAM100’s Kbytes 1 ns $$$DRAM100’s Mbytes 40 ns $Hard disk* 100’s Gbytes 10 ms ¢Want 1’s Gbytes 1 ns cheap * non-volatile ADDRDOUTADDRDIN/DOUTL15 – Memory Hierarchy 3 6.004 – Spring 20094/2/09SRAM Memory Cell 6-T SRAM Cell word line N bitbitaccess FETs staticbistablestorage element word line N+1 There are two bit-lines per column: one supplies the bit, the other it’s complement. On a Read Cycle: A single word line is activated (driven to “1”), and the access transistors enable the selected cells, and their complements, onto the bit lines. 0 1 1Good, but slow 0 Slow and almost 1 Strong 1 Strong 0 Doesn’t this violate our staticdiscipline? Writes are similar to reads, except the bit-lines are driven with the desired value of the cell. The writing has to “overpower” the original contents of the memory cell. L15 – Memory Hierarchy 4 6.004 – Spring 20094/2/09Multiport SRAMs (a.k.a. Register Files) One can increase the number of SRAM ports by adding access transistors. By carefully sizing the inverter pair, so that one is strong and the other weak, we can assure that our WRITE bus will only fight with the weaker one, and the READs are driven by the stronger one -thus minimizing both access and write times. write read0 read1 PU = 2 /1 PD = 4 /1 PU = 2 /2 PD = 2 /3 4/15 /1 2 /1 2 /1 wd rd1 rd0 This transistor isolates the storage node so that it won’t flip unintentionally. L15 – Memory Hierarchy 5 6.004 – Spring 20094/2/091-T Dynamic Ram word line bitaccess FET C in storage capacitor determined by: C =Admore area better dielectric thinner film 1-T DRAM Cell VREFExplicit storage capacitor Six transistors/cell may not sound like much, but they can add up quickly. What is the fewest number of transistors that can be used to store a bit? TiN top electrode (VREF)Ta2O5 dielectric W bottomelectrode poly word line access fet Can’t we get rid of the explicit cap? L15 – Memory Hierarchy 6 6.004 – Spring 20094/2/09Tricks for increasing throughputRow Address Decoder Col. 1Col. 2Col. 3Col. 2MRow 1 Row 2 Row 2NColumn Multiplexer/Shifter MNMultiplexed Address (row first, then column) bit lines word lines memory cell (one bit) Dt1t2t3t4The first thing that should pop into you mind when asked to speed up throughput… PIPELININGSynchronous DRAM (SDRAM)Clock Data outDouble-clocked Synchronous DRAM (DDRAM)but, alas, not latency L15 – Memory Hierarchy 7 6.004 – Spring 20094/2/09 Hard Disk Drives Typical high-end drive: •Average latency = 4 ms •Average seek time = 9 ms •Transfer rate = 20M bytes/sec •Capacity = 100-500G byte •Cost = ~$1/Gbyte L15 – Memory Hierarchy 8 6.004 – Spring 20094/2/09Quantity vs Quality… Your memory system can be • BIG and SLOW... or • SMALL and FAST. 10-810-3100 .01 1100 10 .1 10-6TAPE DISKDRAMSRAMAccess Time .001 $/MBWe’ve explored a range of circuit-design trade-offs.Is there an ARCHITECTURAL solution to this DILEMMA? Diagrams of a hard disk, showing sector, shaft, and track.Figure by MIT OpenCourseWare.L15 – Memory Hierarchy 9 6.004 – Spring 20094/2/09Best of Both Worlds What we WANT: A BIG, FAST memory! We’d like to have a memory system that • PERFORMS like 1 GBytes of SRAM; but • COSTS like 1 GBytes of slow memory. SURPRISE: We can (nearly) get our wish! KEY: Use a hierarchy of memory technologies: CPUSRAMMAINMEML15 – Memory Hierarchy 10 6.004 – Spring 20094/2/09Key IDEA •Keep the most often-used data in a small, fast SRAM (often local to CPU chip) •Refer to Main Memory only rarely, for remaining data. •The reason this strategy works: LOCALITY Locality of Reference: Reference to location X at time t implies that reference to location X+X at time t+t becomes more probable as X and t approach zero. L15 – Memory Hierarchy 11 6.004 – Spring 20094/2/09tMemory Reference Patternstimeaddressdatastackprogram|S|tS is the set of locations accessed during t.Working set: a set S which changes slowly wrt access time. Working set size, |S| L15 – Memory Hierarchy 12 6.004 – Spring 20094/2/09Exploiting the Memory Hierarchy Approach 1 (Cray, others): Expose Hierarchy •Registers, Main Memory, Disk each available as storage alternatives; • Tell programmers: “Use them cleverly” Approach 2: Hide Hierarchy •Programming model: SINGLE kind of memory, single address space. • Machine AUTOMATICALLY assigns locations to fast or slow memory, depending on usage patterns.CPUSRAMMAINMEMCPUSmall StaticDynamicRAMHARDDISKCACHE “MAIN MEMORY” “SWAP SPACE” X?L15 – Memory Hierarchy 13 6.004 – Spring 20094/2/09The Cache Idea: Program-Transparent Memory Hierarchy Cache contains TEMPORARY COPIES of selected main memory locations... eg. Mem[100] = 37 GOALS:1)Improve the average access time 2)Transparency (compatibility, programming ease) 1.0(1.0-)CPU"CACHE" DYNAMIC RAM"MAINMEMORY" 100 37 (1-)HIT RATIO:Fraction of refs found in CACHE. MISS RATIO: Remaining references.Challenge: make the hit ratio as high as possible. mcmccavet)(t)tt)((tt+=++=11L15 – Memory Hierarchy 14 6.004 – Spring 20094/2/09How High of a Hit Ratio? Suppose we can easily build an on-chip static memory with a 4 nS access time, but the fastest dynamic memories that we can buy for main memory have an average access time of 40 nS. How high of a hit rate do we need to sustain an average access time of 5 nS? =1tavetctm=15440=97.5%L15 – Memory Hierarchy 15 6.004 – Spring 20094/2/09The Cache Principle Find “Bitdiddle, Ben” 5-Minute Access Time: 5-Second Access Time: ALGORITHM: Look nearby for the requested information first, if it’s not there, check secondary storage L15 – Memory Hierarchy 16 6.004 – Spring 20094/2/09Basic Cache Algorithm ON REFERENCE TO Mem[X]: Look for X among cache tags... HIT:X = TAG(i) , for some cache line i •READ:return DATA(i) •WRITE:change DATA(i); Start Write to Mem(X) MISS:X not found in TAG of any cache line •REPLACEMENT SELECTION: Select some line k to hold Mem[X] (Allocation) •READ:Read Mem[X] Set TAG(k)=X, DATA(K)=Mem[X] •WRITE:Start Write to Mem(X) Set TAG(k)=X, DATA(K)= new Mem[X]MAINMEMORYCPU(1!)Tag Data ABMem[A]Mem[B]QUESTION: How do we “search” the cache? Clip art person.Figure by MIT OpenCourseWare.L15 – Memory Hierarchy 17 6.004 – Spring 20094/2/09Associativity: Parallel Lookup Find “Bitdiddle, Ben” Clip art person.Nope, “Smith” Nope, “Jones” Nope, “Bitwit” HERE IT IS! L15 – Memory Hierarchy 18 6.004 – Spring 20094/2/09Fully-Associative Cache TAG Data= ? TAG Data= ? TAG Data= ? Incoming Address HITData Out The extreme in associativity: All comparisons made in parallel Any data item could be located in any cache location L15 – Memory Hierarchy 19 6.004 – Spring 20094/2/09Direct-Mapped Cache (non-associative)Find “Bitdiddle, Ben” NO Parallelism: Look in JUST ONE place, determined by parameters of incoming request (address bits) ... can use ordinary RAM as table YZABNeed: Address Mapping Function! Maps incoming BIG address to small CACHE address… tells us which singlecache location to use Direct Mapped: just use a subset of incoming address bits! Collision when several addresses map to same cache line. L15 – Memory Hierarchy 20 6.004 – Spring 20094/2/09The Problem with Collisions Find “Bitwit” Find “Bituminous” Find “Bitdiddle” Nope, I’ve got “BITWIT” under “B” PROBLEM:Contention among B’s.... each competes for same cache line! -CAN’T cache both “Bitdiddle” & “Bitwit” ... Suppose B’s tend to come at once? YZABBETTER IDEA: File by LAST letter!Figure by MIT OpenCourseWare.Figure by MIT OpenCourseWare.Figure by MIT OpenCourseWare.L15 – Memory Hierarchy 21 6.004 – Spring 20094/2/09Optimizing for Locality:selecting on statistically independent bits AYZFind “Bitdiddle” Here’s BITDIDDLE,under E Find “Bitwit” L15 – Memory Hierarchy 22 6.004 – Spring 20094/2/09Direct Mapped Cache Low-cost extreme:Single comparator Use ordinary (fast) static RAM for cache tags & data: Incoming Address KT=?HITData Out DISADVANTAGE: COLLISIONS QUESTION:Why not use HIGH-order bits as Cache Index? K-bit Cache Index D-bit data word T Upper-address bits Tag Data K x (T + D)-bit static RAM L15 – Memory Hierarchy 23 6.004 – Spring 20094/2/09Contention, Death, and Taxes... AYZFind “Bitdiddle” Find “Bittwiddle”Nope, I’ve got “BITTWIDDLE” under “E”; I’ll replace it. Nope, I’ve got “BITDIDDLE” under “E”; I’ll replace it. LESSON: In a non-associative cache, SOME pairs of addresses must compete for cache lines... ... if working set includes such pairs, we get THRASHING and poor performance. L15 – Memory Hierarchy 24 6.004 – Spring 20094/2/09Loop B: Pgm at 1024, dataat2048: … but not here! Loop A: Pgm at 1024, dataat 37: Works GREAT here… Direct-Mapped Cache Contention Assume 1024-line direct-mapped cache, 1 word/line. Consider tight loop, at steady state: (assume WORD, not BYTE, addressing) Memory Address 1024 37 1025 38 1026 39 1024 ... 1024 2048 1025 2049 1026 2050 1024 ... Cache Line037 138 239 00011220Hit/Miss HIT HIT HIT HIT HIT HIT HIT MISS MISS MISS MISS MISS MISS MISS We need some associativity, But not full associativity… Next lecture! Figure by MIT OpenCourseWare.Clip art person.Here’s BITWIT, under T LESSON: Choose CACHE LINE from independent parts of request to MINIMIZE CONFLICT given locality patterns... IN CACHE: Select line by LOW ORDER address bits! Does this ELIMINATE contention? Figure by MIT OpenCourseWare.
Description
Here it is explained about Hierearchy of memory , Different types of RAMs such as Static RAM and dynamic RAM. Cache memory and differnet methods for representing Cache . Here it is also discussed about various cache algorithems.
“Prof. Steve Ward, 6.004-15, Multilevel memories; locality, performance, caches, 6.004 Computation Structures, Electrical Engineering and Computer Science , Engineering, Massachusetts Institute of Technology: MIT Open Course Ware,http://ocw.mit.edu (08-08-2011).License: Creative Commons BY-NC-SA: http://ocw.mit.edu/terms/#cc".
Presentation Transcript
Your Facebook Friends on WizIQ