The Impact of Cache Organization in Optimizing Microprocessor Power Consumption

Cache: Concept and Structure

This section outlines what a cache is and how it is organized.

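Basic concept
A cache is a small, high-speed memory that sits between the central processing unit and main memory in the storage hierarchy of a computer system.

Together with main memory, it forms a single level of storage.

The scheduling and transfer of information between the cache and main memory is carried out automatically by hardware.

Some machines have second-level and even third-level caches; each level is slower and larger than the one before it.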

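Structure
The cache is a level of memory located between main memory and the CPU. It is built from static RAM (SRAM) chips; its capacity is relatively small, but it is much faster than main memory, approaching the speed of the CPU.

It consists of three main parts:
Cache storage array: holds the instruction and data blocks brought in from main memory.

Address translation unit: maintains a directory table that maps main-memory addresses to cache addresses.

Replacement unit: when the cache is full, replaces data blocks according to a given policy and updates the address translation unit.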

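Introduction to Cache

Cache has come up several times already. It is an appealing piece of technology, and its purpose and operating principle are worth understanding in detail.

A cache is a high-speed buffer placed between the CPU and main memory, or between main memory and disk; its job is to resolve the mismatch in read/write speed between parts of the system. The buffer between the CPU and main memory is also called the RAM cache, while the buffer between main memory and the disk drive is called the disk cache. This discussion concerns the former, which is what "cache" usually refers to.

So how does a cache work? The CPU computes far faster than main memory can be read or written, so the CPU spends a long time waiting on memory accesses, which lowers overall system performance. To resolve this speed mismatch, SRAM (static RAM), which is faster than main memory, is inserted between the CPU and main memory. The SRAM holds a copy of data from main memory (the technical term is an "image"), so the CPU can read and write data by accessing the SRAM directly. Because SRAM speed is comparable to CPU speed, the wait for reads and writes is greatly shortened and the overall speed of the system improves.

If SRAM is so fast, why not use it as main memory? Because SRAM is built with a semiconductor process similar to the CPU's and is extremely expensive, so it is used as main memory only where performance matters and price does not. This is where the cache comes in: it keeps the data the CPU has used, along with results, so that the CPU checks the cache first on its next access and looks elsewhere only if the data is not there, which speeds up execution.

A cache consists of two basic parts: tag memory and data memory. The tag memory stores the cache's control bits and block address tags. The control bits manage cache read and write operations, while the block address tags record the address of each block in the cache. Each tag contains the block address mapped from main memory and corresponds to one block of data in the cache, and that block of data is held in the cache's data memory.

When the CPU reads data, it first sends the physical address over the address bus to the cache, where it is compared against the block address tags. If one matches, the data is already in the cache (a situation known as a "hit"), and the corresponding data is simply sent from the cache to the CPU over the data bus.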

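5-1 Memory Systems: Cache_v1.0

Computer Architecture
Principles that the levels of the hierarchy should satisfy
Coherence principle

The same piece of information held at different levels of the memory hierarchy should have the same value.

Inclusion principle

Information held at an inner level must also be contained in the level outside it, but not the other way round; that is, everything in an inner level of memory is a copy of part of the information in the adjacent outer level.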
Beijing Information Science and Technology University
The "cache / main memory" and "main memory / secondary storage" levels
Main-memory block address: tag | index
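Direct mapping

Direct mapping: each block of main memory can map to exactly one fixed block in the cache.

Characteristics of direct mapping:

A main-memory block can correspond only to one fixed cache block; the scheme is simple, but cache utilization is low. The tag is short and the comparison circuitry is cheap: if main memory has 2^m blocks and the cache has 2^c blocks, the tag needs only m - c bits, and a cache access requires just one comparison. Direct mapping has the lowest space utilization and the highest conflict probability, but it is the simplest to implement.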
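As a rough illustration of the m - c tag width described above, the following C sketch splits a main-memory block address into a tag and an index for a direct-mapped cache. The sizes (M_BITS = 16, C_BITS = 6) and the function names are assumptions chosen for the example; they do not come from the slides.

```c
#include <stdint.h>

/* Illustrative sizes (assumptions): main memory has 2^M_BITS blocks,
   the cache has 2^C_BITS blocks. */
#define M_BITS 16u
#define C_BITS 6u

/* Number of tag bits kept per cache block: m - c. */
unsigned tag_bits(void) {
    return M_BITS - C_BITS;
}

/* Index: the low c bits of the block address select the single cache
   block that this memory block is allowed to occupy. */
uint32_t block_index(uint32_t block_addr) {
    return block_addr & ((1u << C_BITS) - 1u);
}

/* Tag: the remaining high m - c bits, stored with the block and
   compared once on every access. */
uint32_t block_tag(uint32_t block_addr) {
    return (block_addr >> C_BITS) & ((1u << (M_BITS - C_BITS)) - 1u);
}
```

With these sizes, block addresses 0x1234 and 0x0034 share index 0x34 but carry different tags (0x48 and 0x00), so they compete for the same cache block; this is the conflict behaviour that gives direct mapping its low utilization.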
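The memory hierarchy in modern computers

Exploiting the principle of program locality:

Provide as much storage as possible at the lowest cost, and provide high-speed access using the fastest technology.
[Figure: levels of the hierarchy: Processor (Control), Second-Level Cache (SRAM), Main Memory (DRAM), Secondary Storage (Disk)]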
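Cache basics

Cache: a small but fast memory placed between the relatively large, slow main memory and the high-speed processor.

Basic operation:

The cache and main memory are divided into blocks of equal size (a block is also called a line or a slot).
The cache consists of a block directory table and fast memory.
For a main-memory address, a tag and an index are generated according to the mapping rule.
The tag and index are used to look up the specific cache block.
Not found (miss): a block of data is fetched from main memory (if the cache has no free space, an existing block must be replaced), and the requested part is passed to the processor.
Found (hit): the data is read from the cache; for a write, consistency with main memory must be maintained (write policy).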
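The lookup steps listed above can be sketched in C as a toy direct-mapped cache that only tracks tags and counts hits and misses. The cache size (NUM_BLOCKS = 8), the type name DirectCache, and the function names are assumptions made for this illustration; no data is actually stored.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 8u   /* number of cache blocks (assumed) */

typedef struct {
    bool     valid[NUM_BLOCKS];  /* directory: is the slot occupied? */
    uint32_t tag[NUM_BLOCKS];    /* directory: which block is in the slot */
    unsigned hits;
    unsigned misses;
} DirectCache;

void cache_init(DirectCache *c) {
    memset(c, 0, sizeof *c);
}

/* Look up one main-memory block address. Returns true on a hit.
   On a miss the block is "fetched" by installing its tag, replacing
   whatever block previously occupied that slot. */
bool cache_access(DirectCache *c, uint32_t block_addr) {
    uint32_t index = block_addr % NUM_BLOCKS;  /* mapping rule: index */
    uint32_t tag   = block_addr / NUM_BLOCKS;  /* mapping rule: tag   */

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;
    c->tag[index]   = tag;
    c->misses++;
    return false;
}

/* Fraction of accesses that hit; 0.0 if nothing was accessed yet. */
double hit_rate(const DirectCache *c) {
    unsigned total = c->hits + c->misses;
    return total ? (double)c->hits / total : 0.0;
}
```

Accessing blocks 3, 3, 11, 3 in that order yields one hit and three misses: blocks 3 and 11 map to the same slot, so each evicts the other.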

cache

L2 SRAM Double Buffering Example
Code
for (i = 0; i < (DATASIZE / BUFSIZE) - 2; i += 2) {
    /* ----------------------------------------------------- */
    /* InBuffA -> OutBuffA Processing                        */
    /* ----------------------------------------------------- */
    CACHE_InvL2(InBuffB, BUFSIZE, CACHE_WAIT);
    <DMA_transfer(peripheral, InBuffB, BUFSIZE)>
    CACHE_wbL2(OutBuffB, BUFSIZE, CACHE_WAIT);
    <DMA_transfer(OutBuffB, peripheral, BUFSIZE)>
    process(InBuffA, OutBuffA, BUFSIZE);

    /* ----------------------------------------------------- */
    /* InBuffB -> OutBuffB Processing                        */
    /* ----------------------------------------------------- */
    CACHE_InvL2(InBuffA, BUFSIZE, CACHE_WAIT);
    <DMA_transfer(peripheral, InBuffA, BUFSIZE)>
    CACHE_wbL2(OutBuffA, BUFSIZE, CACHE_WAIT);
    <DMA_transfer(OutBuffA, peripheral, BUFSIZE)>
    process(InBuffB, OutBuffB, BUFSIZE);
}

Using Cache

Contents
1. Definition and purpose of cache
2. Types of cache
3. How cache works
4. Advantages and limitations of cache
5. Application areas of cache
Main text
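I. Definition and purpose of cache
A cache is a storage technique used mainly to speed up data access. Its main purpose is to reduce data read time and speed up data processing, and so improve the operating efficiency of the system.

II. Types of cache
Caches fall into the following main categories:
1. CPU cache: located inside the CPU; stores the data and instructions the CPU accesses and speeds up access to them.
2. Memory cache: located between memory and the hard disk; stores frequently accessed data to reduce read/write latency between memory and disk.
3. Disk cache: located inside the hard disk; stores data read from the disk to speed up reads.
4. Network cache: located at network nodes; stores frequently accessed network data to reduce network latency.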

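III. How cache works
Caches rely mainly on the principle of locality: over a given period, a program accesses some data more often than other data.

A cache speeds up data access in the following way:
1. Hit: when the requested data is in the cache, it is read directly from the cache, with no need to access other storage devices.
2. Miss: when the requested data is not in the cache, the cache reads it from another storage device and keeps a copy for the next access.

IV. Advantages and limitations of cache
The main advantage of a cache is faster data access: read times fall and the system runs more efficiently.

Caches also have limitations, however, such as cache misses and cache coherence problems.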

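Code Cache Utilization

Code cache utilization is the degree to which the code cache is effectively used while a program runs.

The code cache is the region in a computer system that holds compiled code for execution. It can improve execution performance and reduce compilation time and overhead.

Code cache utilization can be calculated as:
code cache utilization = used code cache space / total code cache space
where the used code cache space is the amount of code cache currently holding compiled code, and the total code cache space is the maximum capacity of the code cache.

Higher utilization means the code cache is being used more fully, and program execution performance is generally better. Low utilization, by contrast, may lead to frequent compilation, which lowers execution efficiency.

Optimizing code cache utilization is therefore an important strategy for improving program performance.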

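How Cache Works

1. Overview
A cache is a high-speed buffer memory in a computer system, used to speed up data access. It sits between main memory and the CPU and holds the most frequently used data and instructions. By keeping the most frequently accessed data in fast storage, the cache lets the CPU reach that data sooner, improving overall system performance.

2. Cache structure
A cache is usually organized in multiple levels, such as L1, L2, and L3. Each level has a different size and access speed: the closer a level is to the CPU, the smaller and faster it is. In general, L1 is the smallest and fastest, and L3 is the largest and slowest.

3. Cache operation
When the CPU needs data, it first checks the L1 cache. If the data is found there, the CPU reads it directly from the cache, which greatly speeds up the access. If the data is not in L1, the CPU checks the larger L2 cache, and so on, until the data is found or the last cache level has been checked. If the data is found at any cache level, it is loaded into the levels closer to the CPU and read from there. If it is found in none of the caches, the CPU reads it from main memory and loads it into the L1 cache for future accesses.

4. Cache hits and misses
When the CPU finds the data it needs in the cache, this is called a "hit"; when the data is not in the cache, it is a "miss". The hit rate is an important measure of cache performance. A high hit rate means most data can be read from the cache, which improves system performance. A high miss rate means the cache cannot meet the CPU's demands, so data is frequently read from main memory and system performance drops.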

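5. Cache replacement policy
When the cache is full and new data must be loaded, some existing data has to be replaced. Common replacement policies include least recently used (LRU) and random replacement. LRU replaces the data that has gone unaccessed for the longest time, making room for the new data.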
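A minimal C sketch of LRU replacement follows, assuming a small fully associative cache in which each slot records the logical time of its last access. The capacity (WAYS = 4), the timestamp approach, and all names are assumptions for illustration; real hardware typically approximates LRU with cheaper bookkeeping.

```c
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4u   /* number of slots in the cache (assumed) */

typedef struct {
    bool     valid[WAYS];
    uint32_t block[WAYS];      /* block address held in each slot */
    uint64_t last_used[WAYS];  /* logical time of the most recent access */
    uint64_t clock;            /* incremented on every access */
} LruCache;

void lru_init(LruCache *c) {
    c->clock = 0;
    for (unsigned i = 0; i < WAYS; i++) {
        c->valid[i] = false;
        c->block[i] = 0;
        c->last_used[i] = 0;
    }
}

/* Access one block. Returns true on a hit. On a miss the block is placed
   in an empty slot if one exists, otherwise in the least recently used
   slot. *evicted receives the replaced block address, or -1 if none. */
bool lru_access(LruCache *c, uint32_t block_addr, int64_t *evicted) {
    unsigned victim = 0;
    bool found_empty = false;

    *evicted = -1;
    c->clock++;

    for (unsigned i = 0; i < WAYS; i++) {
        if (c->valid[i] && c->block[i] == block_addr) {
            c->last_used[i] = c->clock;   /* hit: refresh recency */
            return true;
        }
    }

    /* Miss: prefer an empty slot, else the slot with the oldest access. */
    for (unsigned i = 0; i < WAYS; i++) {
        if (!c->valid[i]) {
            victim = i;
            found_empty = true;
            break;
        }
        if (c->last_used[i] < c->last_used[victim]) {
            victim = i;
        }
    }

    if (!found_empty) {
        *evicted = (int64_t)c->block[victim];
    }
    c->valid[victim] = true;
    c->block[victim] = block_addr;
    c->last_used[victim] = c->clock;
    return false;
}
```

For the access sequence 1, 2, 3, 4, 1, 5, the final access evicts block 2 rather than block 1, because block 1 was touched again after block 2.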

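6. Cache write policy
There are two cache write policies: write-back and write-through. Under write-back, when the CPU modifies data in the cache, only the cached copy is updated; the data is not written to main memory immediately, but only when the cache block is replaced.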
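The write-back behaviour described above can be sketched in C with a dirty bit. To keep the example short it assumes a cache with a single one-word line, allocation on write misses, and a 16-word array standing in for main memory; these choices and all names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS 16u   /* size of the stand-in main memory (assumed) */

typedef struct {
    bool     valid;
    bool     dirty;   /* cached copy differs from main memory */
    uint32_t addr;    /* word address currently cached */
    uint32_t data;
} CacheLine;

typedef struct {
    uint32_t  memory[MEM_WORDS];
    CacheLine line;         /* single-line cache, for brevity */
    unsigned  mem_writes;   /* number of writes that reached memory */
} System;

void sys_init(System *s) {
    memset(s, 0, sizeof *s);
}

/* Make sure addr is in the line. If a different, dirty word is cached,
   write it back to memory before it is replaced. */
static void fill(System *s, uint32_t addr) {
    if (s->line.valid && s->line.addr == addr) {
        return;
    }
    if (s->line.valid && s->line.dirty) {
        s->memory[s->line.addr] = s->line.data;   /* write back on eviction */
        s->mem_writes++;
    }
    s->line.valid = true;
    s->line.dirty = false;
    s->line.addr  = addr;
    s->line.data  = s->memory[addr];
}

/* CPU write: only the cached copy changes; memory becomes stale.
   The caller must keep addr below MEM_WORDS. */
void cpu_write(System *s, uint32_t addr, uint32_t value) {
    fill(s, addr);
    s->line.data  = value;
    s->line.dirty = true;
}

/* CPU read: served from the cache after a fill if needed. */
uint32_t cpu_read(System *s, uint32_t addr) {
    fill(s, addr);
    return s->line.data;
}
```

Two successive writes to address 3 leave main memory untouched; memory is updated once, with the latest value, only when an access to another address forces the line out.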

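The Chinese Term for Cache

The Chinese term for "cache" is 缓存 (huǎncún).

Cache originally meant a high-speed memory that can be accessed faster than ordinary random-access memory (RAM). Unlike system main memory, it usually does not use DRAM technology but the more expensive and faster SRAM technology.

Caching is one of the key factors that allows all modern computer systems to achieve high performance.

Characteristics:
A cache usually holds a copy of part of the contents of main memory: the data and program code most recently used by the CPU.

A cache is effective because programs' memory accesses exhibit locality in both time and space; that is, within a given time slice, most programs repeatedly access one particular region.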

The Impact of Cache Organization in Optimizing Microprocessor Power Consumption

N. Mohamed, N. Botros, and W. Zhang
Department of Electrical and Computer Engineering
Southern Illinois University, Carbondale, IL 62901-6603

ABSTRACT
In recent years, power consumption has become as important a design concern as silicon area and performance in modern computer system design. Several factors have contributed to this trend; perhaps the most visible has been the remarkable success and growth of personal digital assistants (PDAs), cellular phones, pagers, and similar devices. This work explores the impact of feature-size shrinkage and cache configuration on optimizing power consumption in modern processors. It studies two commercially known RISC microprocessors, the StrongARM and the Alpha 21064. The latter is the ancestor of the former and is therefore used here as the baseline in our analysis. Our analysis shows quantitatively that large power savings are obtained when the technology is scaled down and the cache is well organized.

Keywords: deep submicron, channel length, device threshold, cache associativity

1) INTRODUCTION
The StrongARM microprocessor has a fascinating story: its power consumption was cut drastically, from 26 watts, the level of its predecessor the Alpha 21064 [1], to less than 0.5 watts. Its designers achieved this by aggressively pursuing techniques across the design hierarchy, from the device level all the way up to the system level [2]. Since it has been reported that the cache consumes well above 40% of total power, we chose to explore quantitatively how influential the organization of the system cache has been. Moreover, we studied the role that technology continues to play toward this goal as the semiconductor industry advances into the deep submicron (DSM) era, and we examined the trade-offs between power consumption and performance that these factors introduce.

Section 2 presents the basic analytical power models used by our power estimation tool, CACTI [3]. Section 3 explores process scaling, and Section 4 the impact of the cache system. Section 5 presents the experimental results, and Section 6 concludes our work.

2) POWER MODEL
To derive analytical power consumption models, consider the simple CMOS inverter shown in Figure 1. Its symmetrical structure, full logic swing, and high noise margins make it a reasonable representative of most CMOS circuits. Consider the precharge phase of the circuit, when Vout rises from low to high in response to a high-to-low transition of the input voltage Vin. Assuming that the two transistors are never on simultaneously, the inverter reduces to the equivalent circuit shown in Figure 2, and the total energy E drawn from the power supply is given by

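$$E = \int_0^\infty i_{V_{dd}}(t)\, V_{dd}\, dt = V_{dd} \int_0^\infty C_L \frac{dv_{out}}{dt}\, dt = C_L V_{dd} \int_0^{V_{dd}} dv_{out} = C_L V_{dd}^2 \qquad (1)$$

Figure 1: The CMOS inverter

The energy Ec stored in CL is given by

$$E_c = \int_0^\infty i_{V_{dd}}(t)\, v_{out}\, dt = \int_0^\infty C_L \frac{dv_{out}}{dt}\, v_{out}\, dt = C_L \int_0^{V_{dd}} v_{out}\, dv_{out} = \tfrac{1}{2} C_L V_{dd}^2 \qquad (2)$$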

Thus, during this phase, half of the drawn energy has been dissipated in the PMOS device while the other half has been stored in the load capacitor CL.

Figure 2: Inverter equivalent circuit (precharge phase)
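As a numerical check on Equations (1) and (2), the C sketch below charges the load capacitor from 0 V to Vdd and accumulates the energy drawn, stored, and dissipated. It assumes the conducting PMOS device can be modelled as a fixed resistance R and integrates with a simple forward-Euler step; the resistor model, the step size, and all names are assumptions for illustration and are not taken from the paper.

```c
typedef struct {
    double drawn;       /* energy taken from the supply, in joules */
    double stored;      /* energy left on the load capacitor, in joules */
    double dissipated;  /* energy lost in the pull-up resistance, in joules */
} ChargeEnergy;

/* Charge C_L from 0 V toward V_dd through a resistance r that stands in
   for the conducting PMOS device (a modelling assumption). */
ChargeEnergy simulate_charge(double vdd, double cl, double r) {
    const double tau   = r * cl;        /* RC time constant */
    const double dt    = tau / 10000.0; /* integration step */
    const long   steps = 200000;        /* 20 time constants in total */
    ChargeEnergy e = {0.0, 0.0, 0.0};
    double vout = 0.0;

    for (long k = 0; k < steps; k++) {
        double i = (vdd - vout) / r;    /* supply current at this step */
        e.drawn      += i * vdd * dt;   /* integrand of Eq. (1) */
        e.dissipated += i * i * r * dt; /* loss in the resistance */
        vout         += i * dt / cl;    /* i = C_L * dv_out/dt */
    }
    e.stored = 0.5 * cl * vout * vout;  /* right-hand side of Eq. (2) */
    return e;
}
```

For hypothetical values such as Vdd = 3.3 V and CL = 1 pF, the drawn energy comes out close to CL Vdd^2 (about 10.89 pJ), with roughly half stored and half dissipated. The split does not depend on the value chosen for R, which is why the resistance does not appear in Equations (1) and (2).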

Now consider the discharge phase, when the output voltage Vout drops back to low in response to the input voltage rising to high; the equivalent circuit is depicted in Figure 3. The energy En dissipated by the nMOS device is equal to the energy that was stored in the load capacitance:

$$E_n = \tfrac{1}{2} C_L V_{dd}^2 \qquad (3)$$

It follows that the total energy Ed dissipated in both devices is equal to the energy drawn from the power supply:

$$E_d = \int_0^\infty i_{V_{dd}}(t)\, V_{dd}\, dt = V_{dd} \int_0^\infty C_L \frac{dv_{out}}{dt}\, dt = C_L V_{dd}^2 \qquad (4)$$

Figure 3: Inverter equivalent circuit (discharge phase)

If the inverter switches between these two phases (precharge and discharge) at a rate of f times per second, the equivalent power dissipation Pd is

$$P_d = C_L V_{dd}^2 f \qquad (5)$$

If we further assume that the inverter is embedded in a larger integrated circuit, as is normally the case, and that the probability that it switches in any given cycle is ρ, then a more precise power dissipation metric is

$$P_d = \rho\, C_L V_{dd}^2 f \qquad (6)$$

This power component is known as the dynamic power consumption [3] and normally accounts for most of the power consumption in CMOS circuits. Studies have shown that, for a short period of time Ts/c, as the input voltage Vin approaches or leaves the Vdd/2 level, both the nMOS and pMOS devices conduct simultaneously, causing the instantaneous current to spike to Is/c. This, in turn, consumes a portion of the delivered energy and can be computed as