The non-critical buffer: Using load latency tolerance to improve data cache efficiency
HP HP-UX Manual

The instruction at 0x6747803 referenced memory at 0x00000034. The memory could not be read.

1. Introduction

1.1 Overview

This article examines the error "The instruction at 0x6747803 referenced memory at 0x00000034. The memory could not be read," and why the read operation fails. Through case studies and practical discussion, we take a closer look at the problem and survey the state of technical research on it. Finally, we summarize existing findings and outline possible directions for future work.
1.2 Structure

This article is organized into five parts. Part 1 is this introduction, which outlines the purpose, structure, and focus of the article. Part 2 examines the memory at 0x00000034 referenced by the instruction at 0x6747803: we explain what the instruction at 0x6747803 is, describe the characteristics of this memory region, and discuss why it cannot be read. Part 3 analyzes and applies the topic through real cases: we present two concrete case studies, examine them in depth, and discuss the practical limitations of the techniques involved. Part 4 surveys recent research progress and technical trends, presenting the latest results and analyzing the challenges the field may face. Part 5 concludes, summarizing existing findings and proposing new directions together with a feasibility assessment.
1.3 Purpose

The purpose of this article is to study in depth the memory at 0x00000034 referenced by the instruction at 0x6747803 and the problems surrounding it. By examining its characteristics, real cases, and technical trends, we aim to improve understanding of the problem and provide a reference for future research. We also hope to draw readers' attention to developments and challenges in this area and to suggest new approaches to solving related problems.
2. The Memory at 0x00000034 Referenced by the Instruction at 0x6747803

2.1 What is the instruction at 0x6747803?

In computer programming, an instruction is a command that performs an operation or completes a specific task. Here, "the instruction at 0x6747803" refers to a specific machine-level instruction: in the Windows error message, 0x6747803 is the address of the instruction that was executing when the fault occurred. This instruction may have a particular function or role in the program.

2.2 Characteristics of the memory at 0x00000034

Memory is where a computer stores data and program code.
Complete Reference of MySQL Error Codes and Error Messages

0101 The exclusive semaphore is owned by another process.
0102 The semaphore is set and cannot be closed.
0103 The semaphore cannot be set again.
0104 Cannot request exclusive semaphores at interrupt time.
0105 The previous ownership of this semaphore has ended.
0106 Insert the diskette for drive %1.
0107 The program stopped because the alternate diskette was not inserted.
0108 The disk is in use or locked by another process.
0109 The pipe has been ended.
0110 The system cannot open the device or file specified.
0111 The file name is too long.
0112 There is not enough space on the disk.
0113 No more internal file identifiers are available.
0114 The target internal file identifier is incorrect.
0117 The IOCTL call made by the application program is not correct.
0118 The verify-on-write switch parameter value is not correct.
0119 The system does not support the command requested.
0120 This function is not supported on this system.
0121 The semaphore timeout period has expired.
0123 The file name, directory name, or volume label syntax is incorrect.
0124 The system call level is not correct.
0125 The disk has no volume label.
0126 The specified module could not be found.
0127 The specified procedure could not be found.
0128 There are no child processes to wait for.
0129 The %1 application cannot be run in Win32 mode.
0130 Attempt to use a file handle to an open disk partition for an operation other than raw disk I/O.
0131 An attempt was made to move the file pointer before the beginning of the file.
0132 The file pointer cannot be set on the specified device or file.
0133 A JOIN or SUBST command cannot be used for a drive that contains previously joined drives.
0134 An attempt was made to use a JOIN or SUBST command on a drive that has already been joined.
0135 An attempt was made to use a JOIN or SUBST command on a drive that has already been substituted.
0136 The system tried to delete the JOIN of a drive that is not joined.
0137 The system tried to delete the substitution of a drive that is not substituted.
0138 The system tried to join a drive to a directory on a joined drive.
0139 The system tried to substitute a drive to a directory on a substituted drive.
0140 The system tried to join a drive to a directory on a substituted drive.
0141 The system tried to substitute a drive to a directory on a joined drive.
0142 The system cannot perform a JOIN or SUBST at this time.
STM32 Firmware Library User Manual (Chinese Translation)

Because the firmware library is generic and covers the features of every peripheral, the size and execution speed of application code built on it may not be optimal. For most applications it can be used as-is; for applications with strict code-size or execution-speed requirements, the library drivers can instead serve as a reference for how to configure the peripherals, and be tailored to actual needs.
1.3.1 Variables
1.3.2 Boolean type
1.3.3 Flag status type
1.3.4 Functional state type
"nmi detected please consult" — A Reply

What is an NMI? An NMI (Non-Maskable Interrupt) is a special type of interrupt used to notify the processor that a serious system fault has occurred and that it must immediately stop its current execution and perform special handling. Unlike maskable interrupts, an NMI cannot be masked or disabled: it is designed for emergencies that the processor's normal interrupt handling must not be able to block or ignore.

What causes an NMI? NMI interrupts are usually triggered by hardware faults, memory errors, bus faults, or other system problems that cannot be ignored. For example, when a computer system encounters a failing power supply, memory read/write errors, or CPU overheating, the hardware raises an NMI to notify the processor of the problem. Because an NMI cannot be masked, the processor responds to it immediately even if it is stuck or busy.
How is an NMI handled? When an NMI occurs, the processor immediately stops its current execution and jumps to a designated interrupt handler. This handler is usually provided by the operating system or the hardware vendor and deals with the emergency. It typically performs the following steps:
1. Save processor state: first, the handler saves the processor's current state, including the general-purpose registers, program counter, and flags register, so that normal execution can be resumed after the NMI has been handled.
2. Handle the emergency: next, the handler performs the action appropriate to the specific cause of the NMI. For example, if the NMI was caused by a hardware fault, the handler may try to reinitialize the faulty device or apply other corrective measures.
3. Resume normal execution: once the emergency has been dealt with, the handler restores the processor's normal execution state. It restores the register values saved earlier and sets the program counter back to its previous position, so that the processor can continue the interrupted program or task.
Why can't an NMI be masked? An NMI is designed so that the processor cannot mask or ignore it because of its urgency and importance. Since an NMI usually signals a serious system fault, ignoring it could lead to worse problems, damaged hardware, or lost data. The processor must therefore respond to the NMI and carry out the appropriate emergency handling so that the system can return to normal operation as quickly as possible.

What are NMIs used for? NMI interrupts are widely used to improve the reliability and availability of computer systems.
USP Pharmacopeia <791> pH (United States Pharmacopeia, pH)

791 pH

For compendial purposes, pH is defined as the value given by a suitable, properly standardized, potentiometric instrument (pH meter) capable of reproducing pH values to 0.02 pH unit using an indicator electrode sensitive to hydrogen-ion activity, the glass electrode, and a suitable reference electrode. The instrument should be capable of sensing the potential across the electrode pair and, for pH standardization purposes, applying an adjustable potential to the circuit by manipulation of "standardization," "zero," "asymmetry," or "calibration" control, and should be able to control the change in millivolts per unit change in pH reading through a "temperature" and/or "slope" control. Measurements are made at 25 ± 2°, unless otherwise specified in the individual monograph or herein.

The pH scale is defined by the equation:

pH = pHs + (E − Es) / k

in which E and Es are the measured potentials where the galvanic cell contains the solution under test, represented by pH, and the appropriate Buffer Solution for Standardization, represented by pHs, respectively. The value of k is the change in potential per unit change in pH and is theoretically [0.05916 + 0.000198(t − 25)] volts at any temperature t.

It should be emphasized that the definitions of pH, the pH scale, and the values assigned to the Buffer Solutions for Standardization are for the purpose of establishing a practical, operational system so that results may be compared between laboratories. The pH values thus measured do not correspond exactly to those obtained by the definition, pH = −log aH+. So long as the solution being measured is sufficiently similar in composition to the buffer used for standardization, the operational pH corresponds fairly closely to the theoretical pH.
Although no claim is made with respect to the suitability of the system for measuring hydrogen-ion activity or concentration, the values obtained are closely related to the activity of the hydrogen ion in aqueous solutions. Where a pH meter is standardized by use of an aqueous buffer and then used to measure the "pH" of a nonaqueous solution or suspension, the ionization constant of the acid or base, the dielectric constant of the medium, the liquid-junction potential (which may give rise to errors of approximately 1 pH unit), and the hydrogen-ion response of the glass electrode are all changed. For these reasons, the values so obtained with solutions that are only partially aqueous in character can be regarded only as apparent pH values.

BUFFER SOLUTIONS FOR STANDARDIZATION OF THE PH METER

Buffer Solutions for Standardization are to be prepared as directed in the accompanying table.* Buffer salts of requisite purity can be obtained from the National Institute of Standards and Technology. Solutions may be stored in hard glass or polyethylene bottles fitted with a tight closure or carbon dioxide-absorbing tube (soda lime). Fresh solutions should be prepared at intervals not to exceed 3 months, using carbon dioxide-free water. The table indicates the pH of the buffer solutions as a function of temperature. The instructions presented here are for the preparation of solutions having the designated molal (m) concentrations. For convenience, and to facilitate their preparation, however, instructions are given in terms of dilution to a 1000-mL volume rather than specifying the use of 1000 g of solvent, which is the basis of the molality system of solution concentration.
The indicated quantities cannot be computed simply without additional information.

pH Values of Buffer Solutions for Standardization

Temperature, °C | Potassium Tetraoxalate, 0.05 m | Potassium Biphthalate, 0.05 m | Equimolal Phosphate, 0.05 m | Sodium Tetraborate, 0.01 m | Calcium Hydroxide, Saturated at 25°
10 | 1.67 | 4.00 | 6.92 | 9.33 | 13.00
15 | 1.67 | 4.00 | 6.90 | 9.28 | 12.81
20 | 1.68 | 4.00 | 6.88 | 9.23 | 12.63
25 | 1.68 | 4.01 | 6.86 | 9.18 | 12.45
30 | 1.68 | 4.02 | 6.85 | 9.14 | 12.29
35 | 1.69 | 4.02 | 6.84 | 9.10 | 12.13
40 | 1.69 | 4.04 | 6.84 | 9.07 | 11.98
45 | 1.70 | 4.05 | 6.83 | 9.04 | 11.84
50 | 1.71 | 4.06 | 6.83 | 9.01 | 11.71
55 | 1.72 | 4.08 | 6.83 | 8.99 | 11.57
60 | 1.72 | 4.09 | 6.84 | 8.96 | 11.45

Potassium Tetraoxalate, 0.05 m — Dissolve 12.61 g of KH3(C2O4)2·2H2O in water to make 1000 mL.
Potassium Biphthalate, 0.05 m — Dissolve 10.12 g of KHC8H4O4, previously dried at 110° for 1 hour, in water to make 1000 mL.
Equimolal Phosphate, 0.05 m — Dissolve 3.53 g of Na2HPO4 and 3.39 g of KH2PO4, each previously dried at 120° for 2 hours, in water to make 1000 mL.
Sodium Tetraborate, 0.01 m — Dissolve 3.80 g of Na2B4O7·10H2O in water to make 1000 mL. Protect from absorption of carbon dioxide.
Calcium Hydroxide, saturated at 25° — Shake an excess of calcium hydroxide with water, and decant at 25° before use. Protect from absorption of carbon dioxide.

Because of variations in the nature and operation of the available pH meters, it is not practicable to give universally applicable directions for the potentiometric determination of pH. The general principles to be followed in carrying out the instructions provided for each instrument by its manufacturer are set forth in the following paragraphs. Examine the electrodes and, if present, the salt bridge prior to use.
If necessary, replenish the salt bridge solution, and observe other precautions indicated by the instrument or electrode manufacturer.

To standardize the pH meter, select two Buffer Solutions for Standardization whose difference in pH does not exceed 4 units and such that the expected pH of the material under test falls between them. Fill the cell with one of the Buffer Solutions for Standardization at the temperature at which the test material is to be measured. Set the "temperature" control at the temperature of the solution, and adjust the calibration control to make the observed pH value identical with that tabulated. Rinse the electrodes and cell with several portions of the second Buffer Solution for Standardization, then fill the cell with it, at the same temperature as the material to be measured. The pH of the second buffer solution should be within ±0.07 pH unit of the tabulated value. If a larger deviation is noted, examine the electrodes and, if they are faulty, replace them. Adjust the "slope" or "temperature" control to make the observed pH value identical with that tabulated. Repeat the standardization until both Buffer Solutions for Standardization give observed pH values within 0.02 pH unit of the tabulated value without further adjustment of the controls. When the system is functioning satisfactorily, rinse the electrodes and cell several times with a few portions of the test material, fill the cell with the test material, and read the pH value. Use carbon dioxide-free water (see Water in the section Reagents, Indicators, and Solutions) for solution or dilution of test material in pH determinations.
In all pH measurements, allow a sufficient time for stabilization. Where approximate pH values suffice, indicators and test papers (see Indicators and Indicator Test Papers, in the section Reagents, Indicators, and Solutions) may be suitable. For a discussion of buffers, and for the composition of standard buffer solutions called for in compendial tests and assays, see Buffer Solutions in the section Reagents, Indicators, and Solutions.

* Commercially available buffer solutions for pH meter standardization, standardized by methods traceable to the National Institute of Standards and Technology (NIST), labeled with a pH value accurate to 0.01 pH unit, may be used. For standardization solutions having a pH lower than 4, a labeled accuracy of 0.02 is acceptable. Solutions prepared from ACS reagent grade materials or other suitable materials, in the stated quantities, may be used provided the pH of the resultant solution is the same as that of the solution prepared from the NIST certified material.
sae_j2534-1_2004

SURFACE VEHICLE RECOMMENDED PRACTICE

SAE Technical Standards Board Rules provide that: "This report is published by SAE to advance the state of technical and engineering sciences. The use of this report is entirely voluntary, and its applicability and suitability for any particular use, including any patent infringement arising therefrom, is the sole responsibility of the user." SAE reviews each technical report at least every five years, at which time it may be reaffirmed, revised, or cancelled. SAE invites your written comments and suggestions.

Copyright © 2004 SAE International. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of SAE.

TO PLACE A DOCUMENT ORDER: Tel: 877-606-7323 (inside USA and Canada); Tel: 724-776-4970 (outside USA)

SAE J2534-1 Revised DEC2004

TABLE OF CONTENTS
1. Scope
2. References
2.1 Applicable Documents
2.1.1 SAE Publications
2.1.2 ISO Documents
3. Definitions
4. Acronyms
5. Pass-Thru Concept
6. Pass-Thru System Requirements
6.1 PC Requirements
6.2 Software Requirements and Assumptions
6.3 Connection to PC
6.4 Connection to Vehicle
6.5 Communication Protocols
6.5.1 ISO 9141
6.5.2 ISO 14230-4 (KWP2000)
6.5.3 SAE J1850 41.6 kbps PWM (Pulse Width Modulation)
6.5.4 SAE J1850 10.4 kbps VPW (Variable Pulse Width)
6.5.5 CAN
6.5.6 ISO 15765-4 (CAN)
6.5.7 SAE J2610 DaimlerChrysler SCI
6.6 Simultaneous Communication on Multiple Protocols
6.7 Programmable Power Supply
6.8 Pin Usage
6.9 Data Buffering
6.10 Error Recovery
6.10.1 Device Not Connected
6.10.2 Bus Errors
7. Win32 Application Programming Interface
7.1 API Functions – Overview
7.2 API Functions – Detailed Information
7.2.1 PassThruOpen
7.2.1.1 C/C++ Prototype
7.2.1.2 Parameters
7.2.1.3 Return Values
7.2.2 PassThruClose
7.2.2.1 C/C++ Prototype
7.2.2.2 Parameters
7.2.2.3 Return Values
7.2.3 PassThruConnect
7.2.3.1 C/C++ Prototype
7.2.3.2 Parameters
7.2.3.3 Flag Values
7.2.3.4 Protocol ID Values
7.2.3.5 Return Values
7.2.4 PassThruDisconnect
7.2.4.1 C/C++ Prototype
7.2.4.2 Parameters
7.2.4.3 Return Values
7.2.5 PassThruReadMsgs
7.2.5.1 C/C++ Prototype
7.2.5.2 Parameters
7.2.5.3 Return Values
7.2.6 PassThruWriteMsgs
7.2.6.1 C/C++ Prototype
7.2.6.2 Parameters
7.2.6.3 Return Values
7.2.7 PassThruStartPeriodicMsg
7.2.7.1 C/C++ Prototype
7.2.7.2 Parameters
7.2.7.3 Return Values
7.2.8 PassThruStopPeriodicMsg
7.2.8.1 C/C++ Prototype
7.2.8.2 Parameters
7.2.8.3 Return Values
7.2.9 PassThruStartMsgFilter
7.2.9.1 C/C++ Prototype
7.2.9.2 Parameters
7.2.9.3 Filter Types
7.2.9.4 Return Values
7.2.10 PassThruStopMsgFilter
7.2.10.1 C/C++ Prototype
7.2.10.2 Parameters
7.2.10.3 Return Values
7.2.11 PassThruSetProgrammingVoltage
7.2.11.1 C/C++ Prototype
7.2.11.2 Parameters
7.2.11.3 Voltage Values
7.2.11.4 Return Values
7.2.12 PassThruReadVersion
7.2.12.1 C/C++ Prototype
7.2.12.2 Parameters
7.2.12.3 Return Values
7.2.13 PassThruGetLastError
7.2.13.1 C/C++ Prototype
7.2.13.2 Parameters
7.2.13.3 Return Values
7.2.14 PassThruIoctl
7.2.14.1 C/C++ Prototype
7.2.14.2 Parameters
7.2.14.3 Ioctl ID Values
7.2.14.4 Return Values
7.3 IOCTL Section
7.3.1 GET_CONFIG
7.3.2 SET_CONFIG
7.3.3 READ_VBATT
7.3.4 READ_PROG_VOLTAGE
7.3.5 FIVE_BAUD_INIT
7.3.6 FAST_INIT
7.3.7 CLEAR_TX_BUFFER
7.3.8 CLEAR_RX_BUFFER
7.3.9 CLEAR_PERIODIC_MSGS
7.3.10 CLEAR_MSG_FILTERS
7.3.11 CLEAR_FUNCT_MSG_LOOKUP_TABLE
7.3.12 ADD_TO_FUNCT_MSG_LOOKUP_TABLE
7.3.13 DELETE_FROM_FUNCT_MSG_LOOKUP_TABLE
8. Message Structure
8.1 C/C++ Definition
8.2 Elements
8.3 Message Data Formats
8.4 Format Checks for Messages Passed to the API
8.5 Conventions for Returning Messages from the API
8.6 Conventions for Returning Indications from the API
8.7 Message Flag and Status Definitions
8.7.1 RxStatus
8.7.2 RxStatus Bits for Messaging Status and Error Indication
8.7.3 TxFlags
9. DLL Installation and Registry
9.1 Naming of Files
9.2 Win32 Registry
9.2.1 User Application Interaction with the Registry
9.2.2 Attaching to the DLL from an Application
9.2.2.1 Export Library Definition File
10. Return Value Error Codes
11. Notes
11.1 Marginal Indicia
Appendix A General ISO 15765-2 Flow Control Example
A.1 Flow Control Overview
A.1.1 Examples Overview
A.2 Transmitting a Segmented Message
A.2.1 Conversation Setup
A.2.2 Data Transmission
A.2.3 Verification
A.3 Transmitting an Unsegmented Message
A.3.1 Data Transmission
A.3.2 Verification
A.4 Receiving a Segmented Message
A.4.1 Conversation Setup
A.4.2 Reception Notification
A.4.3 Data Reception
A.5 Receiving an Unsegmented Message

1. Scope

This SAE Recommended Practice provides the framework to allow reprogramming software applications from all vehicle manufacturers the flexibility to work with multiple vehicle data link interface tools from multiple tool suppliers. This system enables each vehicle manufacturer to control the programming sequence for the electronic control units (ECUs) in their vehicles, but allows a single set of programming hardware and vehicle interface to be used to program modules for all vehicle manufacturers.

This document does not limit the hardware possibilities for the connection between the PC used for the software application and the tool (e.g., RS-232, RS-485, USB, Ethernet…). Tool suppliers are free to choose the hardware interface appropriate for their tool. The goal of this document is to ensure that reprogramming software from any vehicle manufacturer is compatible with hardware supplied by any tool manufacturer.

U.S.
Environmental Protection Agency (EPA) and the California Air Resources Board (ARB) "OBD service information" regulations include requirements for reprogramming emission-related control modules in vehicles for all manufacturers by the aftermarket repair industry. This document is intended to conform to those regulations for 2004 and later model year vehicles. For some vehicles, this interface can also be used to reprogram emission-related control modules in vehicles prior to the 2004 model year, and for non-emission-related control modules. For other vehicles, this usage may require additional manufacturer-specific capabilities to be added to a fully compliant interface. A second part to this document, SAE J2534-2, is planned to include expanded capabilities that tool suppliers can optionally include in an interface to allow programming of these additional non-mandated vehicle applications. In addition to reprogramming capability, this interface is planned for use in OBD compliance testing as defined in SAE J1699-3. SAE J2534-1 includes some capabilities that are not required for Pass-Thru Programming, but which enable use of this interface for those other purposes without placing a significant burden on the interface manufacturers.

Additional requirements for future model years may require revision of this document, most notably the inclusion of SAE J1939 for some heavy-duty vehicles. This document will be reviewed for possible revision after those regulations are finalized and requirements are better understood. Possible revisions include SAE J1939-specific software and an alternate vehicle connector, but the basic hardware of an SAE J2534 interface device is expected to remain unchanged.

2. References

2.1 Applicable Publications

The following publications form a part of this specification to the extent specified herein.
Unless otherwise indicated, the latest version of SAE publications shall apply.

2.1.1 SAE Publications

Available from SAE, 400 Commonwealth Drive, Warrendale, PA 15096-0001.

SAE J1850—Class B Data Communications Network Interface
SAE J1939—Truck and Bus Control and Communications Network (Multiple Parts Apply)
SAE J1962—Diagnostic Connector
SAE J2610—DaimlerChrysler Information Report for Serial Data Communication Interface (SCI)

2.1.2 ISO Documents

Available from ANSI, 25 West 43rd Street, New York, NY 10036-8002.

ISO 7637-1:1990—Road vehicles—Electrical disturbance by conduction and coupling—Part 1: Passenger cars and light commercial vehicles with nominal 12 V supply voltage
ISO 9141:1989—Road vehicles—Diagnostic systems—Requirements for interchange of digital information
ISO 9141-2:1994—Road vehicles—Diagnostic systems—CARB requirements for interchange of digital information
ISO 11898:1993—Road vehicles—Interchange of digital information—Controller area network (CAN) for high-speed communication
ISO 14230-4:2000—Road vehicles—Diagnostic systems—Keyword protocol 2000—Part 4: Requirements for emission-related systems
ISO/FDIS 15765-2—Road vehicles—Diagnostics on controller area networks (CAN)—Network layer services
ISO/FDIS 15765-4—Road vehicles—Diagnostics on controller area networks (CAN)—Requirements for emission-related systems

3. Definitions

3.1 Registry

A mechanism within Win32 operating systems to handle hardware and software configuration information.

4. Acronyms

API Application Programming Interface
ASCII American Standard Code for Information Interchange
CAN Controller Area Network
CRC Cyclic Redundancy Check
DLL Dynamic Link Library
ECU Electronic Control Unit
IFR In-Frame Response
IOCTL Input / Output Control
KWP Keyword Protocol
OEM Original Equipment Manufacturer
PC Personal Computer
PWM Pulse Width Modulation
SCI Serial Communications Interface
SCP Standard Corporate Protocol
USB Universal Serial Bus
VPW Variable Pulse Width

5. Pass-Thru Concept

Programming application software supplied by the vehicle manufacturer will run on a commonly available generic PC. This application must have complete knowledge of the programming requirements for the control module to be programmed and will control the programming event. This includes the user interface, selection criteria for downloadable software and calibration files, the actual software and calibration data to be downloaded, the security mechanism to control access to the programming capability, and the actual programming steps and sequence required to program each individual control module in the vehicle. If additional procedures must be followed after the reprogramming event, such as clearing Diagnostic Trouble Codes (DTCs), writing part numbers or variant coding information to the control module, or running additional setup procedures, the vehicle manufacturer must either include this in the PC application or include the necessary steps in the service information that references reprogramming.

This document defines the following two interfaces for the SAE J2534 pass-thru device:
a. Application programming interface (API) between the programming application running on a PC and a software device driver for the pass-thru device
b. Hardware interface between the pass-thru device and the vehicle

The manufacturer of an SAE J2534 pass-thru device shall supply connections to both the PC and the vehicle.
In addition to the hardware, the interface manufacturer shall supply device driver software, and a Windows installation and setup application that will install the manufacturer's SAE J2534 DLL and other required files, and also update the Windows Registry. The interface between the PC and the pass-thru device can be any technology chosen by the tool manufacturer, including RS-232, RS-485, USB, Ethernet, or any other current or future technology, including wireless technologies.All programming applications shall utilize the common SAE J2534 API as the interface to the pass-thru device driver. The API contains a set of routines that may be used by the programming application to control the pass-thru device, and to control the communications between the pass-thru device and the vehicle. The pass-thru device will not interpret the message content, allowing any message strategy and message structure to be used that is understood by both the programming application and the ECU being programmed. Also, because the message will not be interpreted, the contents of the message cannot be used to control the operation of the interface. For example, if a message is sent to the ECU to go to high speed, a specific instruction must also be sent to the interface to go to high speed.The OEM programming application does not need to know the hardware connected to the PC, which gives the tool manufacturers the flexibility to use any commonly available interface to the PC. The pass-thru device does not need any knowledge of the vehicle or control module being programmed. This will allow all programming applications to work with all pass-thru devices to enable programming of all control modules for all vehicle manufacturers.The interface will not handle the tester present messages automatically. 
The OEM application is responsible for handling tester present messages.

6.3 Connection to PC

The interface between the PC and the pass-thru device shall be determined by the manufacturer of the pass-thru device. This can be RS-232, USB, Ethernet, IEEE 1394, Bluetooth, or any other connection that allows the pass-thru device to meet all other requirements of this document, including timing requirements. The tool manufacturer is also required to include the device driver that supports this connection, so that the actual interface used is transparent to both the PC programming application and the vehicle.

6.4 Connection to Vehicle

The interface between the pass-thru device and the vehicle shall be an SAE J1962 connector for serial data communications. The maximum cable length between the pass-thru device and the vehicle is five (5) meters. The interface shall include an insulated banana jack that accepts a standard 0.175" diameter banana plug as the auxiliary pin for connection of programming voltage to a vehicle-specific connector on the vehicle.

If powered from the vehicle, the interface shall:
a. operate normally within a vehicle battery voltage range of 8.0 to 18.0 volts D.C.,
b. survive a vehicle battery voltage of up to 24.0 volts D.C. for at least 10 minutes,
c. survive, without damage to the interface, a reverse vehicle battery voltage of up to 24.0 volts D.C. for at least 10 minutes.

6.5 Communication Protocols

The following communication protocols shall be supported:

6.5.1 ISO 9141

The following specifications clarify and, if in conflict with ISO 9141, override any related specifications in ISO 9141:
a. The maximum sink current to be supported by the interface is 100 mA.
b. The range for all tests performed relative to ISO 7637-1 is –1.0 to +40.0 V.
c. The default bus idle period, before the interface shall transmit an address, shall be 300 ms.
d. Support the following baud rate with ±0.5% tolerance: 10400.
e. Support the following baud rate with ±1% tolerance: 10000.
f. Support the following baud rates with ±2% tolerance: 4800, 9600, 9615, 9800, 10870, 11905, 12500, 13158, 13889, 14706, 15625, and 19200.
g. Support other baud rates if the interface is capable of supporting the requested value within ±2%.
h. The baud rate shall be set by the application, not determined by the SAE J2534 interface. The interface is not required to support baud rate detection based on the synchronization byte.
i. Support odd and even parity in addition to the default of no parity, with seven or eight data bits. Always one start bit and one stop bit.
j. Support timer values that are less than or greater than those specified in ISO 9141 (see Figure 30 in Section 7.3.2).
k. Support the ability to disable automatic ISO 9141-2 / ISO 14230 checksum verification by the interface, to allow vehicle manufacturer specific error detection.
l. If the ISO 9141 checksum is verified by the interface, and the checksum is incorrect, the message will be discarded.
m. Support both ISO 9141 5-baud initialization and ISO 14230 fast initialization.
n. The interface shall not adjust timer parameters based on keyword values.

6.5.2 ISO 14230-4 (KWP2000)

The ISO 14230 protocol has the same specifications as the ISO 9141 protocol as outlined in the previous section. In addition, the following specifications clarify and, if in conflict with ISO 14230, override any related specifications in ISO 14230:
a. The pass-thru interface will not automatically handle tester present messages. The application needs to handle tester present messages when required.
b. The pass-thru interface will not perform any special handling for the $78 response code. Any message received with a $78 response code will be passed from the interface to the application.
The application is required to handle any special timing requirements based on receipt of this response code, including stopping any periodic messages.6.5.3SAE J185041.6 KBPS PWM(P ULSE W IDTH M ODULATION)The following additional features of SAE J1850 must be supported by the pass-thru device:a. Capable of 41.6 kbps and high speed mode of 83.3 kbps.b. Recommend Ford approved SAE J1850PWM (SCP) physical layer6.5.4SAE J185010.4 KBPS VPW(V ARIABLE P ULSE W IDTH)The following additional features of SAE J1850 must be supported by the pass-thru device:a. Capable of 10.4 kbps and high speed mode of 41.6 kbpsb. 4128 byte block transferc. Return to normal speed after a break indication6.5.5CANThe following features of ISO 11898 (CAN) must be supported by the pass-thru device:a. 125, 250, and 500 kbpsb. 11 and 29 bit identifiersc. Support for 80% ± 2% and 68.5% ± 2% bit sample pointd. Allow raw C AN messages. This protocol can be used to handle any custom C AN messagingprotocol, including custom flow control mechanisms.6.5.6ISO15765-4(CAN)The following features of ISO 15765-4 must be supported by the pass-thru device:a. 125, 250, and 500 kbpsb. 11 and 29 bit identifiersc. Support for 80% ± 2% bit sample pointd. To maintain acceptable programming times, the transport layer flow control function, as defined inISO 15765-2, must be incorporated in the pass-thru device (see Appendix A). If the application does not use the ISO 15765-2 transport layer flow control functionality, the CAN protocol will allow for any custom transport layer.e. Receive a multi-frame message with an ISO15765_BS of 0 and an ISO15765_STMIN of 0, asdefined in ISO 15765-2.f. No single frame or multi-frame messages can be received without matching a flow control filter. Nomulti-frame messages can be transmitted without matching a flow control filter.g. 
Periodic messages will not be suspended during transmission or reception of a multi-frame segmented message.

6.5.7 SAE J2610 DaimlerChrysler SCI

Reference the SAE J2610 Information Report for a description of the SCI protocol.

When in the half-duplex mode (when SCI_MODE of TxFlags is set to {1} Half-Duplex), every data byte sent is expected to be "echoed" by the controller. The next data byte shall not be sent until the echo byte has been received and verified. If the echoed byte received doesn't match the transmitted byte, or if after a period of T1 no response was received, the transmission will be terminated. Matching echoed bytes will not be placed in the receive message queue.

6.6 Simultaneous Communication On Multiple Protocols

The pass-thru device must be capable of supporting simultaneous communication on multiple protocols during a single programming event. Figure 2 indicates which combinations of protocols shall be supported. If SCI (SAE J2610) communication is not required during the programming event, the interface shall be capable of supporting one of the protocols from data link set 1, data link set 2, and data link set 3. If SCI (SAE J2610) communication is required during the programming event, the interface shall be capable of supporting one of the SCI protocols and one protocol from data link set 1.

6.9 Data Buffering

The interface/API shall be capable of receiving 8 simultaneous messages. For ISO 15765 these can be multi-frame messages. The interface/API shall be capable of buffering a maximum length (4128 byte) transmit message and a maximum length (4128 byte) receive message.

6.10 Error Recovery

6.10.1 Device Not Connected

If the DLL returns ERR_DEVICE_NOT_CONNECTED from any function, that error shall continue to be returned by all functions, even if the device is reconnected.
An application can recover from this error condition by closing the device (with PassThruClose) and re-opening the device (with PassThruOpen, getting a new device ID).

6.10.2 Bus Errors

All devices shall handle bus errors in a consistent manner. There are two error strategies: Retry and Drop.

The Retry strategy will keep trying to send a packet until successful or stopped by the application. If loopback is on and the message is successfully sent after some number of retries, only one copy of the message shall be placed in the receive queue. Even if the hardware does not support retries, the firmware/software must retry the transmission. If the error condition persists, a blocking write will wait the specified timeout and return ERR_TIMEOUT. The DLL must return the number of successfully transmitted messages in pNumMsgs. The DLL shall not count the message being retried in pNumMsgs. After returning from the function, the device does not stop the retries. The only functions that will stop the retries are PassThruDisconnect (on that protocol), PassThruClose, or PassThruIoctl (with an IoctlID of CLEAR_TX_BUFFER).

Devices shall use the Retry strategy in the following scenarios:

• All CAN errors, such as bus off, lack of acknowledgement, loss of arbitration, and no connection (lack of terminating resistor)
• SAE J1850 PWM or SAE J1850 VPW bus fault (bus stuck passive) or loss of arbitration (bus stuck active)

The Drop strategy will delete a message from the queue. The message can be dropped immediately on noticing an error or at the end of the transmission. PassThruWriteMsgs shall treat dropped messages the same as successfully transmitted messages.
However, if loopback is on, the message shall not be placed in the receive queue.

Devices shall use the Drop strategy in the following scenarios:

• If characters are echoed improperly in SCI
• Corrupted ISO 9141 or ISO 14230 transmission
• SAE J1850 PWM lack of acknowledgement (Exception: The device must try sending the message 3 times before dropping)

7.2.5.1 C / C++ Prototype

extern "C" long WINAPI PassThruReadMsgs(unsigned long ChannelID,
                                        PASSTHRU_MSG *pMsg,
                                        unsigned long *pNumMsgs,
                                        unsigned long Timeout)

7.2.5.2 Parameters

ChannelID  The channel ID assigned by the PassThruConnect function.
pMsg       Pointer to message structure(s).
pNumMsgs   Pointer to the location where the number of messages to read is specified. On return from the function this location will contain the actual number of messages read.
Timeout    Read timeout (in milliseconds). If a value of 0 is specified, the function retrieves up to pNumMsgs messages and returns immediately. Otherwise, the API will not return until the Timeout has expired, an error has occurred, or the desired number of messages have been read. If the number of messages requested have been read, the function shall not return ERR_TIMEOUT, even if the timeout value is zero.

When using the ISO 15765-4 protocol, only SingleFrame messages can be transmitted without a matching flow control filter. Also, PCI bytes are transparently added by the API. See PassThruStartMsgFilter and Appendix A for a discussion of flow control filters.

7.2.6.1 C / C++ Prototype

extern "C" long WINAPI PassThruWriteMsgs(unsigned long ChannelID,
                                         PASSTHRU_MSG *pMsg,
                                         unsigned long *pNumMsgs,
                                         unsigned long Timeout)

7.2.6.2 Parameters

ChannelID  The channel ID assigned by the PassThruConnect function.
pMsg       Pointer to message structure(s).
pNumMsgs   Pointer to the location where the number of messages to write is specified.
On return, this location will contain the actual number of messages that were transmitted (when Timeout is non-zero) or placed in the transmit queue (when Timeout is zero).
Timeout    Write timeout (in milliseconds). When a value of 0 is specified, the function queues as many of the specified messages as possible and returns immediately. When a value greater than 0 is specified, the function will block until the Timeout has expired, an error has occurred, or the desired number of messages have been transmitted on the vehicle network. Even if the device can buffer only one packet at a time, this function shall be able to send an arbitrary number of packets if a Timeout value is supplied. Since the function returns early if all the messages have been sent, there is normally no penalty for having a large timeout (several seconds). If the number of messages requested have been written, the function shall not return ERR_TIMEOUT, even if the timeout value is zero. When an ERR_TIMEOUT is returned, only the number of messages that were sent on the vehicle network is known; the number of messages queued is unknown. Application writers should avoid this ambiguity by using a Timeout value large enough to work on slow devices and networks with arbitration delays.
Methods for Handling CPU Cache Inconsistency

CPU cache inconsistency refers to the situation where the data in the CPU cache no longer matches the data in main memory, which can lead to program errors and inaccurate data.
To address CPU cache inconsistency, the following methods can be used:

1. Memory barriers: A memory barrier is a special instruction used to ensure that the data in the CPU cache stays consistent with the data in memory. Memory barriers come in three forms: read barriers, write barriers, and full barriers. By inserting the appropriate barrier instructions, the CPU can be forced to flush its cache and synchronize with memory.
2. Cache flushing: Cache flushing is an operation that writes the data held in the CPU cache back to memory. It can be performed with dedicated instructions or system calls. When data consistency must be guaranteed, a cache flush can be executed before and after the critical code section.
3. Writing to memory: After shared data is modified, the data can be written directly to memory rather than only updated in the CPU cache. This ensures that other CPU cores or devices can read the latest value.
4. Atomic operations: An atomic operation is an uninterruptible operation that guarantees that reads and writes of shared data happen atomically. Atomic operations avoid race conditions and data inconsistency when multiple threads access shared data at the same time.
5. Synchronization mechanisms: Synchronization primitives such as mutexes, semaphores, and condition variables guarantee the ordering and consistency of multi-threaded access to shared data. They prevent multiple threads from simultaneously accessing and modifying shared data, thereby avoiding cache inconsistency problems.
6. Cache coherence protocols: A cache coherence protocol is a hardware-level mechanism that keeps the caches of multiple CPU cores or devices consistent with one another. With a cache coherence protocol in place, cache inconsistency between cores is handled automatically, with no manual intervention required.
When writing programs, cache inconsistency should be avoided in the first place by relying on synchronization mechanisms and atomic operations to guarantee data consistency. Where it must be handled explicitly, an appropriate method should be chosen based on the specific hardware and platform. Tools and debuggers can also help detect and resolve cache inconsistency problems.
The Non-Critical Buffer: Using Load Latency Tolerance to Improve Data Cache Efficiency

Brian R. Fisk and R. Iris Bahar
Division of Engineering, Brown University, Providence, RI 02912
Email: brf@, iris@

Abstract

Data cache performance is critical to overall processor performance as the latency gap between CPU core and main memory increases. Studies have shown that some loads have latency demands that allow them to be serviced from slower portions of memory, thus allowing more critical data to be kept in higher levels of the cache. We provide a strategy for identifying this latency-tolerant data at runtime and, using simple heuristics, keep it out of the main cache and place it instead in a small, parallel, associative buffer. Using such a "Non-Critical Buffer" dramatically improves the hit rate for more critical data, and leads to a performance improvement comparable to or better than other traditional cache improvement schemes. IPC improvements of over 4% are seen for some benchmarks.

1. Introduction

The performance increase of today's high-end microprocessors is due to many factors, among them the use of speculative, out-of-order execution with highly accurate branch prediction. Branch prediction has increased instruction-level parallelism (ILP) by allowing programs to speculatively execute beyond control boundaries, while out-of-order execution has increased ILP by allowing more flexibility in instruction issue and execution. The combination of these techniques has increased processor performance in part by hiding the memory latency penalty in the case of a cache miss; instructions without data dependencies on the cache miss instruction may execute while the miss is being serviced, thus sustaining higher processor throughput.

Hiding memory latency has become particularly critical in the past few years as CPU performance has been increasing faster than memory access technologies have been improving. This large and increasing gap between the CPU and memory means that a larger number of in-

Similarly, if a data access is
initiated far enough in advance, it may be serviced by lower levels of the memory hierarchy without affecting performance. Relegating this "latency-tolerant" data to slower portions of memory mitigates the problem of having a limited sized first-level cache by reserving precious cache entries for the less tolerant and/or more frequently accessed data.

High-speed caches are often direct-mapped, despite the fact that direct-mapped caches often suffer from conflict misses. To alleviate this problem, a small associative buffer may be used in parallel with the first-level data cache (such as the victim cache [10] or non-temporal buffer [11] cited earlier). These strategies often use this cache as a "trash buffer" for data that is deemed less useful than some other competing data, but may still be required by the processor at some point. In this work, we propose using this buffer to hold data for non-critical, latency-tolerant loads while leaving the more critical data in the high-speed main cache. The non-critical data is identified during execution, when the data access misses in the first-level cache, so that the fill data may be prevented from being written into the main cache.

In this study we use a cycle-level simulator of an 8-issue, speculative, out-of-order processor to evaluate the effectiveness of our Non-Critical Buffer (NCB) scheme compared to other more traditional caching strategies. We make the following contributions:

• We propose various strategies for detecting non-critical data in real time and develop policies to keep this data out of the main first-level data cache.
• We show that using the Non-Critical Buffer results in a performance improvement that is usually better than using traditional caching schemes.
• We show that when the Non-Critical Buffer is used, overall first-level cache miss rates may actually increase while overall performance remains the same or improves. This lends support to the idea that the cache is being used more efficiently.

2. Background and prior work

2.1. Associative caches

As mentioned in Section 1, cache performance may be improved by the use of a buffer alongside the first-level caches [10, 9, 11, 14, 8, 3]. The buffer is a small cache (e.g., between 8-16 entries) located between the first level and second level caches. The buffer may be used to hold specific data (e.g. non-temporal or speculative data), or may be used for general data (e.g. "victim" data). One side-effect of the buffer is that it may prevent useful data from being displaced by less useful data, thereby reducing "cache pollution" effects.

Figure 1 shows the design of the memory hierarchy when using a buffer alongside the first-level data cache. Also included in the figure is a representation of the five main stages found in a speculative out-of-order execution processor. Note that the instruction cache access occurs in the fetch stage while the data cache access occurs in the issue stage for a load operation and the commit stage for a store operation. An instruction may spend more than one cycle in any of these five stages depending on the type of instruction, data dependencies, and cache hit outcome.

Figure 1. Memory hierarchy design using a buffer alongside the first-level data cache.

In the case of the victim cache [10], on a main cache miss, the victim cache is accessed. If the address hits in the victim cache, then the data is returned to the CPU, and the block is simultaneously swapped with an appropriate block in the main cache. If the address also misses in the victim cache, the data is retrieved from the second-level cache (L2) and placed in the main first-level data cache. The displaced block in the main cache is then placed in the victim cache; this victim cache block is written back to L2 if dirty and then discarded. Jouppi showed that a small victim buffer provided performance comparable to a 2-way set-associative cache. However, his analysis did not use a full simulation model to
measure impact on processor performance, nor did it account for the impact of the swap tying up the caches.

Other work explores cache bypassing techniques to reduce pollution, where the L1 cache is bypassed on some load misses. In [14], Tyson proposed a method for selectively allocating space in the first-level data cache (DL1) based on the characteristics of individual load instructions. They showed that there was a marked reduction in memory bandwidth requirements due to a reduction in DL1 pollution. A more rigorous experimental method was used in subsequent papers on bypassing for cache pollution to show overall performance improvement. Johnson [9] used a full system simulator/emulator to measure the system-level effects of bypassing the DL1 with the aid of a buffer. The buffer stored data which was deemed "short term" in terms of its temporal characteristics. Johnson showed up to a 10% improvement in overall system performance. Similarly, Rivers and Davidson implemented a small, fully-associative "non-temporal streaming (NTS) cache" in parallel with the DL1 [11]. This cache was used for blocks with a history of non-temporal behavior, keeping often-reused data in the regular DL1 cache. The NTS cache usually provided a 2-3% performance improvement. John and Subramanian [8] used a different strategy to determine locality in their annex cache, putting all new fills (due to load misses) into the annex cache and promoting them to the main cache upon reads. Finally, the work of Bahar et al. proposed using a buffer to hold "speculative data" that was determined to have a high probability of being from a mis-speculated path [3]. The main cache was then targeted to hold only those blocks of data determined to be non-speculative, thus reducing pollution in the main cache.

An alternative to using an associative cache is to use hashing functions on the index bits. This technique is used in the hash-rehash cache [1], the column-associative cache [2], and the skewed-associative cache [12].

2.2. Measuring latency tolerance

The main inspiration for this project comes from the research of Srinivasan and Lebeck [13], who showed that a large portion of loads do not need to be serviced immediately, and some may be delayed up to 32 cycles or more before they are needed by other instructions. They also showed that up to 36% of loads miss in the L1 cache, even though they have shorter latency requirements than L2 access times. Furthermore, up to 37% of loads are serviced in the first-level cache, although they have enough latency tolerance to be satisfied by lower levels of the memory hierarchy. There is an implication here that, if these latency-tolerant instructions were instead stored in a separate "non-critical" buffer, then more critical instructions will remain in the cache, perhaps providing a solution to each of these problems.

To quantify load latency tolerance, Srinivasan and Lebeck equipped their simulated processor with "rollback" capabilities in order to arbitrarily complete loads when they were needed. Loads were allowed to remain outstanding until their simulator determined that a load result was needed by another instruction. At this point, the state of the processor was rolled back, the load was allowed to complete at the required time, and execution resumed. This allowed the authors to determine just how long any particular load could be allowed to remain outstanding.

A large portion of their research was devoted to determining when a load should be completed. For instance, they discovered that if a branch is (directly or indirectly) dependent on the load then this load should be completed as soon as possible. Alternatively, if overall processor performance is degrading (such as the number of functional units used, or the number of new instructions ready for issue per cycle), then there are probably several instructions dependent on an outstanding load, which needs to be completed.

Obviously these rollback facilities would be impossible to implement in a real processor. Instead, our experiments attempt
to use their observations in implementing a new cache configuration scheme that adapts to load latency tolerance (or alternatively, to data "criticality"). We do not attempt to determine a load's latency tolerance ahead of time. Instead we try to determine its criticality over the course of any cache miss that occurs. We have two methods of measuring and adapting to load data criticality:

1. Keeping track of the overall performance of the processor, and
2. Counting the number of dependencies added to the load's dependency chain over the course of the miss.

In the first method, we issue loads as usual, and if they miss in the cache, while the miss is being serviced we measure performance degradation by monitoring issue rate or functional unit usage. If processor performance degradation is determined, the load is marked critical and placed in the main cache when the fill is received from lower-level memory (allowing for fast access of this data the next time it's requested). If the load is determined not to be critical, it is placed in the Non-Critical Buffer (NCB). We make the assumption that the next access to the data will be made with a similar processor state. If the data is still in the NCB at the next request, then the fast access may be taken advantage of. However, since the data is (theoretically) latency-tolerant, if it has been replaced in the NCB, little harm should be done.

Our second strategy for measuring criticality involves following the dependency graph for each load while it is outstanding. We track the number of dependencies on the load's dependency graph over the course of its time in the Load/Store Queue (LSQ) as well as over the course of the miss. A load is considered critical if more than a given number of instructions are attached to the load during the time the miss is being serviced. We consider changes in the dependency chain only during the time of the miss since we only have control of the cache fill strategy and not over the LSQ.
This strategy tends to perform better than the first strategy outlined above, but would require more hardware to implement. More details on the use and implementation of the Non-Critical Buffer are given in Section 3.

3. Experimental methodology

This section presents our experimental environment. First, the CPU simulator will be briefly introduced, and then we will describe the Non-Critical Buffer (NCB) in more detail: how it is accessed as part of the memory hierarchy and the various schemes we used to determine what is latency-tolerant data (and therefore earmarked for the NCB).

3.1. Full simulator model

We use an extension of the SimpleScalar [5] tool suite. SimpleScalar is an execution-driven simulator that uses binaries compiled to a MIPS-like target. SimpleScalar can accurately model a high-performance, dynamically-scheduled, multi-issue processor. We use an extended version of the simulator that more accurately models all of the memory hierarchy, implementing non-blocking caches and more precise modeling of cache fills.

The model implements out-of-order execution using a Register Update Unit (RUU) and a Load/Store Queue (LSQ). The RUU acts as a combined instruction window, array of reservation stations, and reorder buffer. The LSQ holds all pending load and store instructions until they are ready to be sent to the data cache. Among other things, the LSQ must check for load/store dependencies and prevent loads from executing before stores in certain cases.

The simulated processor features five pipeline stages:

• Fetch: Fetch new instructions from the instruction cache and prepare them for decoding.
• Dispatch: Decode instructions and allocate RUU and LSQ entries.
• Issue/Execute: Insert ready instructions into a ready queue, and issue instructions from the queue to available functional units.
• Writeback: Forward results back to dependent instructions in the RUU.
• Commit: Commit results to the register file in program order, and free RUU and LSQ entries.

Table 1 shows the configuration of the processor modeled. Note that first-level caches are on-chip, while the unified second-level cache is off-chip (thus having a much higher latency). In addition we have a 16-entry fully-associative buffer associated with each first-level data cache. Note that the ALU resources listed in Table 1 may incur different latency and occupancy values depending on the type of operation being performed by the unit. Although an 8KB L1 cache may seem small by today's standards for a high-performance processor, the SPEC95 benchmarks we use for our experiments tend to use a small data set. Therefore, we use a smaller cache to obtain reasonable hit/miss rates. Our baseline processor featured mostly unlimited resources, aside from the memory subsystem, in order to try and isolate the effects of the cache on overall performance.

Our simulations are executed on SPECint95 and SPECfp95 benchmarks [6]; they were compiled using a retargeted version of the GNU gcc compiler, with full optimization. This compiler generates SimpleScalar machine instructions. Since we are executing a full model on a very detailed simulator, the benchmarks take several hours to complete; due to time constraints we feed the simulator with a small set of inputs. Integer benchmarks (except for go) are executed to completion. Benchmark go and the floating point benchmarks are simulated only for the first 600 million instructions decoded.

Since we want to delay the decision to write fill data to the main data cache or the NCB until after the effects of the miss are known, a method of keeping track of the outstanding misses is needed. To provide this service, each of the caches may be equipped with a set of "Miss Status Holding Registers" (MSHRs). Upon a miss, the MSHRs are updated with the address that missed. On each cycle, the MSHRs are checked to see if a fill has been received from lower-level memory. If so, a cache is chosen (NCB or first-level data cache) using the desired fill strategy and the fill is placed in the appropriate block of that cache. Note that it is
now possible for an access to hit a block that has a miss outstanding. While these accesses have a latency greater than the cache access time, in our experiments this is counted as a hit for compatibility with the original SimpleScalar code.

3.2. Non-swapping victim cache

In order to compare the NCB performance to established caching strategies, we also implemented a variant of Jouppi's victim cache with some slight changes. First, we access both the main and victim cache in parallel rather than sequentially. Second, our victim buffer is non-swapping, meaning that hits in the victim buffer are not promoted back to the main cache. Prior work [4] has shown that a non-swapping victim buffer performs as well as or better than a swapping victim cache, since the caches are never tied up for extra cycles in order to swap the data. In [10], the data cache of a single-issue processor was considered, where a memory access occurs approximately one out of every four cycles; thus the victim cache had ample time to perform swapping. A modern 4-issue processor has an average of one data memory access per cycle, and tying up the caches to swap is detrimental to performance. In addition, this scheme should be simpler to implement in hardware than one that implements swapping.

Table 1. Machine configuration and processor resources.

L1 Icache:     8KB direct; 32B line; 1 cycle
L1 Dcache:     8KB direct; 32B line; 1 cycle
L2 Cache:      256KB 4-way; 64B line; 12 cycle
Memory:        128 bit-wide; 20 cycles on hit, 50 cycles on page miss
Branch Pred.:  2k gshare + 2k bimodal + 2k meta
BTB:           1024 entry 4-way set assoc.
RAS:           32 entry queue
ITLB:          32 entry fully assoc.
DTLB:          64 entry fully assoc.

3.3. Implementation of the non-critical buffer

Similar to a victim cache, our Non-Critical Buffer is a small (16-entry), fully-associative buffer that is accessed in parallel with the main first-level data cache. Access time of the NCB is the same as the main cache (i.e. 1 cycle). Unlike the victim cache, the NCB uses an active fill mechanism to dynamically determine whether to place
new fills into the main cache ("critical data") or the NCB ("non-critical data") during program execution.

Our first scheme for using the NCB tracks processor performance over the past few cycles and uses this information for determining criticality. To record the processor performance, a simple shift register is used. A particular performance metric is chosen, such as functional unit utilization or instruction issue rate. In this context "issue rate" represents the number of new instructions ready to be issued to functional units. On each cycle, if the processor is "busy" enough (for example, more than some user-defined threshold of functional units are being used), then a value of 1 is shifted into the register. Upon a cache miss, when a decision needs to be made about where to place the fill, the number of 1's in the register is counted and, if it is greater than another user-defined threshold, the data is placed in the NCB. In our experiments we varied the threshold for shifting a 1 into the register, as well as the history length and the number of 1's required in the shift register for a load to be considered critical.

Our second scheme for using the NCB tracks the number of instructions added to the instruction window that are dependent on a given memory operation. The number of data dependencies is referenced at two different points over the lifetime of a memory operation: immediately before a cache access and during a cache miss. These statistics can be tabulated separately for hits and misses. In a hardware implementation, this scheme might involve a counter for each LSQ entry that is incremented with each new dependency, as well as an identical counter as part of each MSHR (for counting during misses). The dependency-based NCB scheme uses the cache miss dependency information to determine criticality: if the number of dependencies added is greater than a user-defined threshold, the data is deemed critical and placed in the main cache. Otherwise, the data is placed in the NCB.
In addition, Srinivasan and Lebeck [13] showed that having a branch dependent on a load should mark the data as highly latency intolerant, or very critical. If this occurs, the data is marked critical regardless of the number of dependencies added during a cache miss (since it is possible that the branch was added before the actual cache access).

4. Experimental results

4.1. Base case

In this section we will describe the base case we used for our initial experiments and the baseline results. All other experiments are compared to this one.

Table 2. Base case results: Results using an 8KB direct-mapped first-level data cache.

Table 3. Traditional Techniques: The %IPC and miss-rate columns give percent change from the base case; negative percentages indicate a decrease in IPC or an increase in cache miss rate. The remaining columns give raw changes.

Table 2 reports IPC, the DL1 miss rate, the average LSQ occupancy, the average number of dependencies added during a cache miss, and the average number of dependencies added during a load's time in the LSQ on a cache hit. We see from this table that the benchmarks vary widely in cache miss rate, LSQ occupancy and the average number of dependencies. Of particular note is the relatively high number of dependencies added during the time the memory access is in the LSQ (prior to the actual cache access), compared to the average number added during a cache miss. Note that the former represents only dependencies added during the time in the LSQ, prior to a cache hit. These values give an indication of the overhead of making a memory access, and indicate that, in general, many dependencies are added to a load's dependency chain before the actual cache access is even attempted. Our caching technique has no control over this LSQ behavior, but it still shows an improvement in performance. Further
enhancement may better address this LSQ problem for an even greater performance improvement.

4.2. Traditional techniques

Traditional approaches for improving processor performance (particularly IPC) aim to reduce the number of cache misses. Aside from increasing the size of the cache (which is not examined here), associativity may be added to the cache or a traditional victim buffer may be used. These techniques often offer good performance improvements by reducing first-level cache miss rates. As these methods have been actually implemented in industry designs for some time now, any new caching strategy should at least be comparable with these tried-and-true methods.

Table 3 presents results obtained by either changing the cache associativity to a 2-way associative cache, or by adding a 16-entry, fully-associative, non-swapping victim buffer to the base case configuration. Each cache retains its one cycle access latency. As expected, IPC improves across all benchmarks, varying from a 0.5% improvement for applu to 2.5% for compress.

Overall DL1 miss rates also improve, by up to 46% for go using a 2-way associative cache and up to 37% using a victim cache (note that, for the purpose of computing DL1 miss rate with an extra associative buffer, a "miss" occurs when data misses in both the main cache as well as the associative buffer). Since a load or store remains in the LSQ until the memory access is completely serviced, it is not surprising that the average LSQ occupancy decreases with improved cache hit rate. Changes in dependencies are relatively small because these strategies do not attempt to directly address any dependency issues.

4.3. Non-critical buffer

The Non-Critical Buffer is used as part of an active cache management strategy that does not necessarily attempt to reduce cache miss rate, but instead makes sure that the most critical data is retained in the cache. Less critical data may miss more often, resulting in higher overall miss rates, but the performance gains from keeping the critical data
in the cache outweigh these penalties.Table4shows results when using an NCB with pro-cessor performance history as the criticality metric.We used several different configurations in our experiments;the best-performing configuration is shown in the table.Ta-ble5presents results obtained when using the dependency-counting heuristic for measuring criticality.Several conclusions can be immediately drawn from these results.First,both NCB schemes are improving IPC values over the base case.IPC improvements are better than or comparable to improvements seen when using a2-way associative or victim cache,and substantially better in the case of compress.In general,using the dependency count-ing scheme provides better performance improvements than the history scheme.The best results(shown in Table5)were obtained for a criticality threshold of1or more dependen-cies added during a miss.That is,if any instructions are dependent on the load,it should be considered critical.WeTable4.NCB with History Measurement:Percent improvement compared to the base case when using a Non-Critical Buffer and using recent performance history as a criticality metric.History length=6cycles,Busy Threshold=6,Critical threshold=3or fewer.Test%IPC%IPCapplu0.535-11.558-0.0020.556-5.8630.116-0.605-0.617-0.604-0.599 go 1.42028.744-0.009 1.42028.945-0.016-0.141-0.493-0.132-0.462 li 1.00718.2770.068 1.06618.5380.073-0.0790.001-0.089-0.021Table 5.NCB with Dependency Measure-ment:Percent improvement compared to thebase case when using a Non-Critical Bufferand using number of dependencies addedduring a cache miss as a criticality metric.Threshold for criticality is one or more de-pendencies added during a miss.0.347-0.385compress 4.413 2.554 2.186-0.440-0.177ijpeg 1.671-20.238-0.164-0.341-0.186mgrid 1.594-16.633-0.065stantially reduced in most cases.Coupled with significant cache miss rate improvements,this may suggest that the his-tory algorithm is too random,and the NCB is effectively only adding 
associativity to the cache. The algorithm can only count 1's in the shift register, and cannot distinguish between different patterns of 1's and 0's. This may be an issue because "111000" has different implications for performance trends than "101010". However, the performance history strategy is almost certainly easier to implement in hardware than tracking dependencies, as it simply consists of a shift register and logic to count the number of high values in the register. The dependency method would require more complex hardware to count new dependencies added during a cache miss. However, the clearer performance gains using this strategy may make it worthwhile.

5. Conclusion

This paper explores several ways to exploit load latency tolerance in a dynamically scheduled processor. We show that dependency information during a cache miss can be used to determine a load's criticality. A small, associative Non-Critical Buffer in parallel with the main data cache may be used as an insurance policy against a future cache miss if the data is deemed non-critical, rather than bypassing the data cache. Using the NCB results in a performance improvement comparable to or better than traditional caching techniques.

A simple shift register containing a local history of processor performance does not appear to be the best way to determine criticality. Counting the number of instructions dependent on a load is a better metric, and using this information as a criticality measurement with the NCB scheme may lead to more (but less critical) cache misses. The shift register history method may not perform as well, but is attractive due to its comparatively low hardware implementation cost. A more intelligent method for analyzing data in the history register may lead to improved results.

There are several areas open to future work. First, more research needs to be done to investigate the growth of the dependency chain during an instruction's lifetime in the LSQ, prior to cache access, and how it may be affecting
this NCB technique. Second, there may be further improvements to the dependency-based NCB scheme to increase performance. Finally, while the SimpleScalar simulator features a very reasonable approximation of a memory subsystem, implementing the NCB scheme in a simulator with a more robust memory model could yield substantially different results.

Acknowledgements: The authors would like to thank Dirk Grunwald for his generosity in donating spare CPU cycles for many of our simulations. We would also like to thank the anonymous reviewers for their valuable comments.

References

[1] A. Agarwal, J. Hennessy, and M. Horowitz. Cache performance of operating systems and multiprogramming workloads. ACM Transactions on Computer Systems, 6:393–431, November 1988.
[2] A. Agarwal and S. D. Pudar. Column-associative caches: A technique for reducing the miss rate of direct-mapped caches. In 26th Annual International Symposium on Microarchitecture, pages 179–190. IEEE/ACM, December 1993.
[3] R. I. Bahar, G. Albera, and S. Manne. Using confidence to reduce energy consumption in high-performance microprocessors. In International Symposium on Low Power Electronics and Design, pages 64–69. IEEE/ACM, August 1998.
[4] R. I. Bahar, D. Grunwald, and B. Calder. A comparison of software code reordering and victim buffers. In Computer Architecture News. ACM SIGARCH, March 1999.
[5] D. C. Burger, T. M. Austin, and S. Bennett. Evaluating future microprocessors: the SimpleScalar toolset. Technical Report 1308, University of Wisconsin–Madison, Computer Sciences Department, July 1996.
[6] J. Gee, M. Hill, D. Pnevmatikatos, and A. J. Smith. Cache performance of the SPEC benchmark suite. IEEE Micro, 13(4):17–27, August 1993.
[7] L. Gwennap. Digital 21264 sets new standard. In Microprocessor Report. MicroDesign Resources, October 1996. /semiconductor/microrep/digital2.htm.
[8] L. K. John and S. A. Design and performance evaluation of a cache assist to implement selective caching. In International Conference on Computer Design, pages 510–518. IEEE, October 1997.
[9] T. L. Johnson and W.-m. W. Hwu. Run-time adaptive cache hierarchy
management via reference analysis. In 24th Annual International Symposium on Computer Architecture, pages 315–326. IEEE/ACM, June 1997.
[10] N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In 17th Annual International Symposium on Computer Architecture. IEEE/ACM, June 1990.
[11] J. A. Rivers and E. S. Davidson. Reducing conflicts in direct-mapped caches with a temporality-based design. In International Conference on Parallel Processing, pages 154–163, August 1996.
[12] A. Seznec. A case for two-way skewed-associative caches. In 20th Annual International Symposium on Computer Architecture, pages 169–178. IEEE/ACM, May 1993.
[13] S. T. Srinivasan and A. R. Lebeck. Load latency tolerance in dynamically scheduled processors. In 31st Annual International Symposium on Microarchitecture. IEEE/ACM, December 1998.
[14] G. Tyson, M. Farrens, J. Matthews, and A. R. Pleszkun. Managing data caches using selective cache line replacement. Journal of Parallel Programming, 25(3):213–242, June 1997.