
Memory System

Basic Concepts

The maximum size of the memory that can be used in any computer is determined by the addressing scheme.
16-bit addresses: 2^16 = 64K memory locations
Most modern computers are byte addressable.
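The relation between address width and the number of addressable locations can be sketched as follows (a minimal illustration assuming byte-addressable memory; `addressable_locations` is a hypothetical helper, not a real API):

```python
# Number of addressable locations for a given address width.
def addressable_locations(address_bits):
    return 2 ** address_bits

print(addressable_locations(16))  # 65536 locations (64K)
print(addressable_locations(32))  # 4294967296 locations (4G)
```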
 
Big-endian & Little-endian Assignment
 
Traditional Architecture


                             Figure: Connection of the memory to the processor.
Some Basic Concepts
"Block transfer" – bulk data transfer

Memory access time

Memory cycle time

RAM – any location can be accessed for a Read or Write operation in some fixed amount of time that is independent of the location’s address.

Cache memory

Virtual memory, memory management unit

Internal Organization of Memory Chips

                                Figure: Organization of bit cells in a memory chip.

Semiconductor RAM Memories


A Memory Chip

                                    Figure: Organization of a 1K × 1 memory chip.


 

Static Memories
The circuits are capable of retaining their state as long as power is applied.

 
                         Figure: A static RAM cell.

CMOS cell: low power consumption
 

  Figure: An example of a CMOS memory cell.
 

Asynchronous DRAMs
  • Static RAMs are fast, but they occupy more chip area and are more expensive. Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state indefinitely and must be refreshed periodically.
 
      Figure: A single-transistor dynamic memory cell.

A Dynamic Memory Chip

 

                     Figure: Internal organization of a 2M × 8 dynamic memory chip.
Synchronous DRAMs
 
The operations of SDRAM are controlled by a clock signal.
 

Figure: Synchronous DRAM.


Synchronous DRAMs

 
                              Figure: Burst read of length 4 in an SDRAM.


 
No CAS pulses are needed in burst operation.

Refresh circuits are included (refresh period: 64 ms).

Clock frequency > 100 MHz

Intel PC100 and PC133
Synchronous DRAMs
  
The choice of a RAM chip for a given application depends on several factors: cost, speed, power consumption, and size.
SRAMs are faster and more expensive, with smaller capacity.
DRAMs are slower and cheaper, with larger capacity.
Which one for cache and main memory, respectively? 
Refresh overhead – suppose an SDRAM whose cells are organized in 8K rows, and 4 clock cycles are needed to access each row; then it takes 8192 × 4 = 32,768 cycles to refresh all rows. If the clock rate is 133 MHz, this takes 32,768 / (133 × 10^6) ≈ 246 × 10^-6 seconds. With a typical refresh period of 64 ms, the refresh overhead is 0.246 / 64 ≈ 0.0038, i.e. less than 0.4% of the total time available for accessing the memory.
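The refresh-overhead arithmetic can be checked with a short script, using the values from the example (8K rows, 4 cycles per row, 133 MHz clock, 64 ms refresh period):

```python
# Refresh-overhead estimate from the example figures.
rows = 8192             # 8K rows
cycles_per_row = 4      # clock cycles needed to refresh one row
clock_hz = 133e6        # 133 MHz clock
refresh_period = 64e-3  # refresh all rows every 64 ms

refresh_time = rows * cycles_per_row / clock_hz  # seconds spent refreshing
overhead = refresh_time / refresh_period         # fraction of time lost

print(f"{refresh_time * 1e6:.0f} us, {overhead * 100:.2f}% overhead")  # 246 us, 0.38% overhead
```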
 
               Memory Controller
 
                                                      Figure: Use of a memory controller.



Read-Only Memory

 

                                                     Figure: A ROM cell.

Volatile / non-volatile memory

ROM

PROM: programmable ROM

EPROM: erasable, reprogrammable ROM

EEPROM: can be programmed and erased electrically
Flash Memory
Similar to EEPROM 
Difference: it is only possible to write an entire block of cells at a time, not a single cell 
Low power 
Use in portable equipment 
Implementation of such modules
Flash cards

Flash drives
 




Fastest access is to the data held in processor registers. Registers are at the top of the memory hierarchy.

Relatively small amount of memory that can be implemented on the processor  chip. This is processor cache.

Two levels of cache. Level 1 (L1) cache is on the processor chip. Level 2 (L2)  cache is in between main memory and processor. 

Next level is main memory, implemented as SIMMs. Much larger, but much slower than cache memory.

Next level is magnetic disks. Huge amount of inexpensive storage.
Cache Memory  
 


The Cache memory stores a reasonable number of blocks at a given time but this number is small compared to the total number of blocks available in Main  Memory.

The cache control hardware decides which block should be removed to create space for the new block that contains the referenced word. 

The collection of rules for making this decision is called the replacement algorithm. 

The cache control circuit determines whether the requested word currently exists in the cache.

If the data is in the cache it is called a Read or Write hit

If the data is not present in the cache, then a Read miss or Write miss occurs
Cache Memories
 
Effectiveness of cache is based on a property of computer programs called locality of reference.

Most of a program's time is spent in loops or in procedures that are called repeatedly; the remainder of the program is accessed infrequently.

Temporal locality – a recently executed instruction is likely to be executed again soon.

Spatial locality – instructions in close proximity to a recently executed instruction (with respect to address) are also likely to be executed soon.
Cache Block – a set of contiguous address locations (cache block = cache line) 

Conceptual Operation of Cache
 
Memory control circuitry is designed to take advantage of locality of reference.
Temporal
Whenever an item of information (instruction or data) is first needed, it should be brought into the cache, where it will hopefully remain until it is needed again.
Spatial
Instead of fetching just one item from the main memory to the cache, it is useful to fetch several items that reside at adjacent addresses as well.
A set of contiguous addresses is called a block (cache block or cache line).
Write through Protocol 
Cache and main memory are updated simultaneously 
Write Back Protocol 
Update on the cache and mark it with an associated flag bit (dirty or modified bit) 
Main memory is updated later, when the block containing this marked word is to be removed from cache to make room for a new block.

 Write Protocols


Write through 
Simpler, but results in unnecessary Write operations in main memory when a cache word is updated several times during its cache residency.  
Write-back  
Can result in unnecessary Write operations because, when a cache block is written back to the memory, all words of the block are written back, even if only a single word has changed while the block was in the cache.
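A toy model, purely illustrative, can contrast how many main-memory word writes each policy generates for a word updated repeatedly while resident in the cache; `memory_writes` and its parameters are hypothetical, and the write-back figure assumes a 16-word block written back once at eviction:

```python
# Count main-memory word writes under each write policy (illustrative sketch).
def memory_writes(policy, updates_while_resident, words_per_block=16):
    if policy == "write-through":
        # every update of the cached word also writes main memory
        return updates_while_resident
    if policy == "write-back":
        # one whole-block write-back at eviction, regardless of update count
        return words_per_block
    raise ValueError(policy)

print(memory_writes("write-through", 100))  # 100 word writes
print(memory_writes("write-back", 100))     # 16 word writes (the whole block)
```

The crossover is visible in the numbers: write-through loses when a word is updated many times during its residency, write-back loses when only one word of a large block was dirtied.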

 Mapping Algorithms


Processor does not need to know explicitly that there is a cache. 
Based on Read/Write operations, the cache control circuitry determines whether the requested word currently exists in the cache (a hit). 
If information is in cache for a read, main memory is not involved.  For write operations, system can either use write-through protocol or write-back protocol

Mapping Functions

Specification of correspondence between the main memory blocks and those in cache. 
Hit or Miss
Write through Protocol

Write back protocol (uses dirty bit)

Read miss

Load through or early restart on read miss

Write Miss
 
 Read Protocols

Read miss 
Addressed word is not in cache 
Block of words containing requested word is written from main memory to cache. 
After entire block is written to cache, particular word is forwarded to processor. 
Or word may be sent to processor as soon as it is read from main memory (load-through or early-restart)
reduces processor’s wait time but requires more complex circuitry.
 
Write Miss

If addressed word is not in cache for a write operation, write miss occurs. 
write-through 
information is written directly into main memory. 
Write-back 
block containing word is brought into cache, then the desired word in the cache is overwritten with the new information.


Mapping Function  

Direct Mapping

Associative Mapping

Set-Associative Mapping

Direct Mapping
 

Block j of the main memory maps onto block j modulo 128 of the cache.

More than one memory block is mapped onto  the same position in the cache.

Resolve the contention by allowing new block to replace the old block

Memory address is divided into three fields:

    - Low order 4 bits determine one of the 16 words in a block.

    - When a new block is brought into the cache, the next 7 bits determine which cache block this new block is placed in.

    - High order 5 bits determine which of the possible 32 blocks is currently present in the cache. These are tag bits.

Simple to implement but not very flexible
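The three-field split described above (4 word bits, 7 block bits, 5 tag bits in a 16-bit address) can be sketched as follows; `split_direct` is a hypothetical helper:

```python
# Direct-mapped address split: 5 tag bits | 7 cache-block bits | 4 word bits.
def split_direct(address):
    word  = address & 0xF          # low-order 4 bits: word within the block
    block = (address >> 4) & 0x7F  # next 7 bits: cache block (j mod 128)
    tag   = address >> 11          # high-order 5 bits: tag
    return tag, block, word

tag, block, word = split_direct(0xABCD)
print(tag, block, word)  # 21 60 13
```

Note that the cache-block field equals the memory block number modulo 128, which is exactly the mapping rule stated above.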

Associative Mapping
 
Main memory block can be placed into any cache position.

Memory address is divided into two fields:

    - Low order 4 bits identify the word within a block.

    - High-order 12 bits (tag bits) identify a memory block when it is resident in the cache.

Flexible, and uses cache space efficiently.

Replacement algorithms can be used to replace an existing block in the cache when the cache is full.



Cost is higher than direct-mapped cache
 

Set-Associative Mapping
 
Blocks of cache are grouped into sets.

Mapping function allows a block of the main memory to reside in any block of a specific set.

Memory address is divided into three fields:

      - 6-bit field determines the set number.

      - High-order 6 bits are compared to the tag fields of the two blocks in the set.

Set-associative mapping is a combination of direct and associative mapping.

Number of blocks per set is a design parameter.

     - One extreme is to have all the blocks in one set, requiring no set bits (fully associative mapping).

     - The other extreme is to have one block per set, which is the same as direct mapping. 
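Under the field sizes above (4 word bits, 6 set bits, 6 tag bits in a 16-bit address, two blocks per set), the address split can be sketched as follows; `split_set_assoc` is a hypothetical helper:

```python
# Two-way set-associative split: 6 tag bits | 6 set bits | 4 word bits
# (64 sets of 2 blocks each).
def split_set_assoc(address):
    word      = address & 0xF          # low-order 4 bits: word within block
    set_index = (address >> 4) & 0x3F  # next 6 bits: set number
    tag       = address >> 10          # high-order 6 bits: tag
    return tag, set_index, word

print(split_set_assoc(0xABCD))  # (42, 60, 13)
```

Shrinking the set field to zero bits gives fully associative mapping; growing it to cover the whole block field gives direct mapping, matching the two extremes listed above.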

Replacement Algorithms

Difficult to determine which block to evict 
Least Recently Used (LRU) block 
The cache controller tracks references to all blocks as computation proceeds. 
Tracking counters are incremented or cleared as hits and misses occur
For Associative & Set-Associative Cache
Which location should be emptied when the cache is full and a miss occurs? 
  First In First Out (FIFO) 
  Least Recently Used (LRU)  
Distinguish an Empty location from a Full one
  Valid Bit

FIFO- Replacement Algorithms 



LRU-Replacement Algorithms 
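A minimal software sketch of LRU replacement for a small fully associative cache (real controllers use small hardware counters per block, as noted above; `LRUCache` is a hypothetical illustration):

```python
from collections import OrderedDict

# LRU replacement sketch: the least recently referenced block is evicted
# when the cache is full and a miss occurs.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # keys: block numbers, most recent last

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: mark most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # miss + full: evict LRU block
        self.blocks[block] = True
        return "miss"

c = LRUCache(2)
print([c.access(b) for b in [1, 2, 1, 3, 2]])  # ['miss', 'miss', 'hit', 'miss', 'miss']
```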


Consecutive Words in a Module

 
 Figure: Consecutive words in a module
Consecutive words are placed in a module.

High-order k bits of a memory address determine the module.

Low-order m bits of a memory address determine the word within a module.

When a block of words is transferred from main memory to cache, only one module is busy at a time.


 
Figure : Consecutive words in consecutive modules
Consecutive Words in Consecutive Modules
Consecutive words are located in consecutive modules.

Consecutive addresses can be located in consecutive modules.

While transferring a block of data, several memory modules can be kept busy at the same time.
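The two layouts can be contrasted in a short sketch, assuming 2^k modules of 2^m words each (both helper functions are hypothetical):

```python
# Module selection for the two memory layouts, with 2**k modules
# of 2**m words each.
def module_high_order(address, m_bits):
    # consecutive words in a module: high-order bits pick the module
    return address >> m_bits, address & ((1 << m_bits) - 1)

def module_low_order(address, k_bits):
    # consecutive words in consecutive modules (interleaving):
    # low-order k bits pick the module
    return address & ((1 << k_bits) - 1), address >> k_bits

# With 4 modules (k=2) of 16 words (m=4), addresses 0..3 land in:
print([module_low_order(a, 2)[0] for a in range(4)])   # [0, 1, 2, 3]
print([module_high_order(a, 4)[0] for a in range(4)])  # [0, 0, 0, 0]
```

The interleaved layout spreads consecutive addresses across modules, which is why several modules can be kept busy during a block transfer.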
Virtual Memory

The techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual memory.

The binary addresses that the processor issues, for either instructions or data, are called virtual (logical) addresses.



When the desired data are in the main memory, they are fetched / accessed immediately.



If the data are not in the main memory, the MMU causes the operating system to bring the data into memory from the disk.



Assume that program and data are composed of fixed-length units called pages.
A page consists of a block of words that occupy contiguous locations in the main memory.
Page is a basic unit of information that is transferred between secondary storage and main memory.
Size of a page commonly ranges from 2K to 16K bytes.  
Pages should not be too small, because the access time of a secondary storage device is much longer than that of the main memory.
Pages should not be too large, else a large portion of the page may not be used, and it will occupy valuable space in the main memory.
Each virtual or logical address generated by a processor is interpreted as a virtual page number (high-order bits) plus an offset (low-order bits) that specifies the location of a particular byte within that page.
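This split can be sketched for an assumed 4K-byte page size (12 offset bits); `split_virtual` is a hypothetical helper:

```python
# Virtual address = virtual page number (high-order bits) + offset (low-order bits).
PAGE_BITS = 12  # assumed 4K-byte pages

def split_virtual(addr):
    page_number = addr >> PAGE_BITS             # high-order bits: virtual page
    offset = addr & ((1 << PAGE_BITS) - 1)      # low-order bits: byte in page
    return page_number, offset

print(split_virtual(0x12345))  # (18, 837), i.e. page 0x12, offset 0x345
```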
Information about the main memory location of each page is kept in the page table.
   -Main memory address where the page is stored.
   -Current status of the page.
The area of the main memory that can hold a page is called a page frame.
Starting address of the page table is kept in a page table base register.
 
 
Virtual address from processor

 

Associative-mapped TLB

High-order bits of the virtual address generated by the processor select the virtual page.



These bits are compared to the virtual page numbers in the TLB.

If there is a match, a hit occurs and the corresponding address of the page frame is read.



If there is no match, a miss occurs and the page table within the main memory must be consulted.

Set-associative mapped TLBs are found in commercial processors.


