Error Recovery in Critical Infrastructure Systems


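RHEL6 Boot Failure Troubleshooting

Posted 2013-05-11 17:08:39 | Category: Linux recovery

RHEL6 system troubleshooting

When a system fails, the machine cannot work properly, let alone provide services. Knowing how to troubleshoot the system is exactly what is needed at that point.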

While learning and experimenting, working through these failures also gives a deeper understanding of the boot process.

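Break the system as follows:

1. Destroy the GRUB boot code (the first 446 bytes of the MBR hold the boot code; the partition table that follows them is left intact):
   dd if=/dev/zero of=/dev/sda bs=1 count=446
2. Destroy the /boot directory:
   rm -rf /boot/*
3. Destroy /etc/fstab:
   rm -rf /etc/fstab
4. Destroy /etc/inittab, /etc/rc.d/rc.sysinit and /bin/mount (any missing file, or any file whose version or content does not match, falls into this category):
   rm -rf /etc/inittab /etc/rc.d/rc.sysinit
   cp /bin/ping /bin/mount

Repair: boot from the installation disc or over the network into rescue mode. Because fstab has been deleted, rescue mode reports that it cannot find a Linux partition and cannot detect one automatically. Confirm the prompt to drop to a shell, then run fdisk -l to list the partitions. The boot partition and the swap partition should be identifiable, and the output also shows whether logical volumes are in use.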

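If logical volumes are not in use, the original root filesystem is on an ordinary partition. Use blkid or e2label to read the labels and infer which partition is root. If there are no labels, or the labels are inconclusive, mount the partitions one at a time (on mount points you create yourself) and inspect their contents to identify the root partition.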

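If logical volumes are in use and the steps above did not turn up the root partition, the root filesystem is probably on a logical volume. In that case, activate the volume groups (lvm vgchange -ay), then mount the logical volumes one at a time and inspect their contents to identify the root filesystem.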
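The "mount each candidate and inspect its contents" step can be sketched as a small check. This is an illustrative Python sketch only; in a real rescue shell the same inspection would normally be done by hand with ls. The marker list, the threshold, and the names looks_like_root and demo are assumptions made for this example, not part of the original procedure.

```python
import os
import tempfile

# Directories a Linux root filesystem normally contains. This list is an
# illustrative assumption, not something the procedure above prescribes.
ROOT_MARKERS = ("etc", "bin", "sbin", "usr", "var", "lib")


def looks_like_root(mount_point, markers=ROOT_MARKERS, min_hits=4):
    """Return True if enough typical top-level directories exist."""
    hits = sum(os.path.isdir(os.path.join(mount_point, name)) for name in markers)
    return hits >= min_hits


def demo():
    # Two throwaway directory trees stand in for two mounted partitions.
    with tempfile.TemporaryDirectory() as root_like, \
            tempfile.TemporaryDirectory() as boot_like:
        for name in ("etc", "bin", "sbin", "usr", "var"):
            os.mkdir(os.path.join(root_like, name))
        os.mkdir(os.path.join(boot_like, "grub"))
        return looks_like_root(root_like), looks_like_root(boot_like)
```

A heuristic like this only narrows the candidates; the final decision still rests on looking at the mounted contents.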

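Once the root partition is found, deal with /etc/fstab by writing a new one by hand. Take care that every entry is correct. Entries in the RHEL6 fstab no longer mount by device name; use the UUID instead.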
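As a rough illustration of writing a UUID-based entry, the sketch below turns one line of blkid output into an fstab line. It is written in Python for clarity only. The sample UUID is made up, and the helper names parse_blkid_line and fstab_entry, as well as the default options and the dump/pass values, are assumptions for this example.

```python
import re


def parse_blkid_line(line):
    """Split one line of blkid output into (device, {KEY: value})."""
    device, _, rest = line.partition(":")
    return device.strip(), dict(re.findall(r'(\w+)="([^"]*)"', rest))


def fstab_entry(blkid_line, mount_point, options="defaults", dump=1, passno=1):
    """Build an fstab line that mounts by UUID rather than by device name."""
    _, tags = parse_blkid_line(blkid_line)
    return "UUID={} {} {} {} {} {}".format(
        tags["UUID"], mount_point, tags["TYPE"], options, dump, passno)


# Made-up sample in the usual blkid output shape; not taken from a real system.
sample = '/dev/sda2: UUID="0f3c1a52-7d7e-4c1b-9a55-2b6c1f0e8a11" TYPE="ext4"'
root_line = fstab_entry(sample, "/")
```

Non-root filesystems and swap would normally use different values in the last two fields, so each generated line should be checked before it goes into /etc/fstab.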

After editing fstab, type exit and choose reboot to restart the machine.

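Using Oracle BMR (Block Media Recovery) to Recover Data Quickly

Using BMR to repair corrupt data blocks quickly

Oracle provides several ways to detect and repair corrupt data blocks in a database, and BMR is one of them. Others include the ANALYZE statement, the dbv command, and DBMS_REPAIR.

The DBMS_REPAIR package works only on corruption at the transaction layer and the data layer (that is, logically corrupt blocks). A physically corrupt block is already marked as corrupt when it is read into the buffer cache, and DBMS_REPAIR ignores every block marked that way.

To repair physically corrupt data blocks quickly, use BMR.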

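The main benefit of block media recovery is a lower mean time to recover (MTTR), because the smallest recoverable unit of media recovery shrinks from a data file to a block.

If only a small number of blocks in the database are known to need media recovery, the most efficient approach is a selective restore that recovers only those blocks.

The technique also improves data availability during media recovery: the data file stays online, and only the blocks being recovered are inaccessible.

BMR requires that the protected data has been backed up beforehand.

Running BMR only takes a BLOCKRECOVER command. For example, if block 385 of data file 4 is corrupt, repair it as follows:

RMAN> BLOCKRECOVER DATAFILE 4 BLOCK 385;

The following example walks through using BMR.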

1. Set up the test environment

First create a test table T1 with the following structure:

SQL> desc t1
Name    Null?    Type
------- -------- ------------
COL1             NUMBER
COL2             CHAR(1000)

COL2 is declared as CHAR(1000) only so that each row takes more space, which lets a small number of rows fill several data blocks.

Populate the test table with some data:

SQL> insert into t1 select rownum,rownum from dual connect by rownum<=20;

The statement above uses a hierarchical query.
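To make the effect of that insert concrete, the sketch below mimics in Python what the hierarchical query generates and how the second value ends up in a CHAR(1000) column. It is an illustration, not Oracle code. The function names are hypothetical, and the padding step assumes the usual blank-padding behaviour of a fixed-length CHAR column.

```python
def connect_by_rownum(limit):
    """Mimic `select rownum, rownum from dual connect by rownum <= limit`."""
    return [(n, n) for n in range(1, limit + 1)]


def as_t1_row(col1, col2, width=1000):
    # COL2 is CHAR(1000): the number is converted to text and blank-padded
    # to the declared width (assumed fixed-length CHAR semantics).
    return col1, str(col2).ljust(width)


rows = [as_t1_row(*pair) for pair in connect_by_rownum(20)]
padded_chars = sum(len(col2) for _, col2 in rows)  # 20 rows x 1000 characters
```

Twenty rows of about 1000 characters each come to roughly 20,000 characters of column data, which is why so few rows are enough to spread across several data blocks.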

The Simplest Fix for "grldr is missing" (reply)

Title: The Simplest Way to Fix "grldr is missing"

Introduction: Computers are an indispensable part of our lives, yet sometimes we run into frustrating errors.

One common error is "grldr is missing".

Fortunately, fixing it is not too difficult.

This article explains the simplest approach step by step to help you fix the "grldr is missing" error.

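Step 1: Identify the problem

Before starting the repair, first make sure that "grldr is missing" really is the root cause of the problem.

The error usually appears when the computer starts, accompanied by a message such as "Error: grldr is missing" or "grldr not found".

If you see a message like this, the grldr file (the GRUB4DOS boot loader file) is indeed missing or corrupt, and you can move on to the next step.

Step 2: Back up important data

Before fixing the "grldr is missing" error, we strongly recommend backing up the important data on the computer.

A repair can sometimes cause data loss, so it is important that your files and documents are backed up.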

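Step 3: Use the Windows installation disc

One of the simplest fixes is to use a Windows installation disc to repair the "grldr is missing" error.

First, insert the Windows installation disc and restart the computer.

As the computer powers on, press the appropriate key to enter the BIOS setup and make sure the boot order starts with the CD/DVD drive.

Once the computer has booted and loaded the files from the Windows installation disc, a welcome screen appears.

On that screen, choose "Repair your computer".

The next screen shows several available repair options.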

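China Telecom 5G Collaborative Optimization Exam Question Bank (with Answers)

Single-choice questions

1. Which statement about BWP application scenarios is correct?
A. All of the options are correct
B. The UE switches between large and small BWPs to save power
C. A UE with small-bandwidth capability can access a large-bandwidth network
D. Different BWPs are configured with different numerologies to carry different services
Answer: A

2. In the specification, what is the maximum frequency-domain bandwidth supported by a single 5G NR millimeter-wave carrier?
A. 200M  B. 400M  C. 800M  D. 1000M
Answer: B

3. The 5G system performs QoS management with ( ) as the finest granularity.
A. E-RAB bearer  B. PDU Session  C. QoS flow  D. None of the above
Answer: C

4. Which signal does 5G use to assist demodulation of downlink data?
A. DMRS  B. PT-RS  C. SS  D. CSI-RS
Answer: A

5. What transmission bandwidth is required for 5G single-site verification?
A. 500M  B. 800M  C. 900M  D. 2G
Answer: B

6. Which statement about 5G NR slot formats is correct?
A. With SCS = 60 kHz, Periodic = 0.625 ms can be configured
B. In a cell-specific single-period configuration, only one switching point is supported within one configuration period
C. Changes to the DL/UL allocation are made in units of slots
Answer: B

7. Which waveform does PUSCH support in 5G?
A. DFT-S-OFDM  B. DFT-a-OFDM  C. DFT-OFDM  D. S-OFDM
Answer: A

8. The frame structure chosen by China Telecom is ( ).
A. 2 ms single period  B. 2.5 ms single period  C. 2.5 ms dual period  D. 5 ms single period
Answer: C

9. Which parameter indicates whether PHR type 2 is reported for the SpCell?
A. phr-Type2SpCell  B. phr-Type2OtherCell  C. phr-ModeOtherCG  D. dualConnectivityPHR
Answer: A

10. Which of the following is not one of the new service types supported by 5G?
A. eMBB  B. URLLC  C. eMTC  D. mMTC
Answer: C

11. When do you expect 5G to reach large-scale commercial use in China?
A. 2018 to 2019  B. 2020 to 2022  C. 2023 to 2025  D. 2025 to 2030
Answer: B

12. In general, below what NR base-station RSRP level does a user watching 1080P video start to see buffering and stuttering?
A. -112 dBm  B. -107 dBm  C. -102 dBm  D. -117 dBm
Answer: D

13. An IT service desk is a:
A. Process  B. Piece of equipment  C. Function  D. Job title
Answer: C

14. The event that triggers SN addition is:
A. A2  B. A3  C. B1  D. B2
Answer: C

15. Which of the following SSB measurement quantities can only be obtained in connected state?
A. SS-RSRP  B. SS-RSRQ  C. SS-SINR  D. SINR
Answer: C

16. Which docker image is used to apply and query configuration data?
A. oambs  B. nfoam  C. brs  D. ccm
Answer: B

17. In 5G, what is the maximum bandwidth supported in the sub-6 GHz band?
A. 60 MHz  B. 80 MHz  C. 100 MHz  D. 200 MHz
Answer: C

18. The interface between an eLTE eNB and a gNB is called the ( ) interface.
A. X1  B. X2  C. Xn  D. Xx
Answer: C

19. The Short TTI subcarrier spacing is:
A. 110 kHz  B. 120 kHz  C. 130 kHz  D. 140 kHz
Answer: B

20. Which module in the NR core network handles session management?
A. AMF  B. SMF  C. UDM  D. PCF
Answer: B

21. The frame structure chosen by China Mobile is ( ).
A. 2 ms single period  B. 2.5 ms single period  C. 2.5 ms dual period  D. 5 ms single period
Answer: D

22. For ZXRAN outdoor macro sites, what pole diameter is required for rooftop antenna mounting?
A. 60 mm to 120 mm  B. 40 mm to 60 mm  C. 20 mm to 40 mm  D. 10 mm to 20 mm
Answer: A

23. Which statement about the self-contained frame is wrong?
A. The same subframe contains DL, UL and GP
B. The same subframe contains DL data and the corresponding HARQ feedback
C. Using self-contained frames lowers the hardware requirements on the transmitter and receiver
D. The same subframe carries the UL scheduling information and the corresponding data
Answer: C

24. Which of the following is an LPWAN technology?
A. LTE  B. EVDO  C. CDMA  D. NB-IoT
Answer: D

25. What is the priority order for adjusting 5G antenna downtilt?
A. Adjust mechanical downtilt -> adjust adjustable electrical downtilt -> preset electrical downtilt
B. Preset electrical downtilt -> adjust mechanical downtilt -> adjust adjustable electrical downtilt
C. Adjust adjustable electrical downtilt -> preset electrical downtilt -> adjust mechanical downtilt
D. Preset electrical downtilt -> adjust adjustable electrical downtilt -> adjust mechanical downtilt
Answer: D

26. The 5G radio access technology features (5G RAT features) are delivered in phases, phase 1 and phase 2. Which release is 5G phase 2?
A. R13  B. R14  C. R15  D. R16
Answer: D

27. What is a possible cause of clock synchronization failure on the 5G NR network management server?
A. The network connection is abnormal
B. The databases on the active and standby boards are inconsistent
C. The number of EMS cells exceeds the limit
D. The SBCX standby board is not in place
Answer: A

28. Which of the following is not a correct item when analyzing 5G NR handover optimization problems?
A. Whether neighbor cells are missing from the configuration
B. Whether coverage at the test point is reasonable
C. Whether the cell uplink suffers interference
D. Whether a back-end query shows any users
Answer: D

29. In the 5G system, how many REGs does one CCE contain?
A. 2  B. 6  C. 4  D. 8
Answer: B

30. In an NR network, different PRACH sequence formats correspond to different cell radii. What is the maximum supported cell radius, in km?
A. 110  B. 89  C. 78  D. 100
Answer: D

31. The specification defines a 5G base station architecture in which CU and DU can be deployed separately, split between ( ).
A. RRC and PDCP  B. PDCP and RLC  C. RLC and MAC  D. MAC and PHY
Answer: B

32. Which of the following correctly describes the priority order for AAU tilt adjustment?
A. Adjust adjustable electrical downtilt -> adjust mechanical downtilt -> digital downtilt -> design a reasonable preset electrical downtilt
B. Design a reasonable preset electrical downtilt -> adjust adjustable electrical downtilt -> adjust mechanical downtilt -> digital downtilt
C. Adjust adjustable electrical downtilt -> adjust mechanical downtilt -> design a reasonable preset electrical downtilt -> digital downtilt
D. Adjust adjustable electrical downtilt -> design a reasonable preset electrical downtilt -> adjust mechanical downtilt -> digital downtilt
Answer: B

33. The main difference between a 5G base station and a 4G base station lies in the:
A. RRU  B. BBU  C. CPRI  D. Interface
Answer: B

34. How many different DCI format sizes per slot does a UE monitor at most?
A. 2  B. 3  C. 4  D. 5
Answer: C

35. For a UE that supports only FR1 and has been configured with TRS in connected state, which QCL relation may exist between TRS and SSB?
A. QCL-TypeA  B. QCL-TypeB  C. QCL-TypeC  D. QCL-TypeD
Answer: C

36. In a 5G network, backhaul carries the traffic between ( ) and ( ).
A. DU, CU  B. AAU, DU  C. CU, core network  D. AAU, CU
Answer: C

37. Which of the following is not a 5G channel or signal?
A. PDSCH  B. PUSCH  C. PDCCH  D. PCFICH
Answer: D

38. The 5G vision is:
A. Everything is possible
B. High rate, high reliability
C. Internet of everything
D. Information at will, everything within reach
Answer: D

39. Which statement about the measurement gap is wrong?
A. Under EN-DC, the network can configure a per-UE measurement gap or a per-FR measurement gap;
B. Under EN-DC, the LTE serving cell and the NR serving cell (FR1) both belong to the per-FR1 measurement gap;
C. Under EN-DC, gap4 to gap11 can be used for UEs that support the per-FR1 measurement gap;
D. Under EN-DC, for a UE that supports the per-UE measurement gap, gap0 to gap11 can be used if the gap serves both NR and non-NR neighbor measurements.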

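InitializeCriticalSectionEx (reply)

InitializeCriticalSectionEx is a Windows API function commonly used in multithreaded programming to protect critical sections.

A critical section is a piece of code that accesses a shared resource in a multithreaded environment. Protecting it prevents the data races and inconsistencies that arise when several threads access the resource at the same time.

In multithreaded programs, concurrent access to shared resources can lead to problems such as race conditions, deadlock and starvation.

A critical-section protection mechanism is needed to deal with these problems.

The mechanism locks the critical section so that only one thread can be inside it at any time.

InitializeCriticalSectionEx initializes a critical section object that the caller has allocated.

The steps below show how to use InitializeCriticalSectionEx.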

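Step 1: include the header.

Before using InitializeCriticalSectionEx, include Windows.h, which declares the structures and functions related to critical sections.

#include <Windows.h>

Step 2: define a critical section object.

The object must exist before InitializeCriticalSectionEx is called.

It can be a global variable or a local variable, depending on the situation.

CRITICAL_SECTION g_criticalSection;

Step 3: initialize the critical section object with InitializeCriticalSectionEx.

The prototype of InitializeCriticalSectionEx is:

BOOL WINAPI InitializeCriticalSectionEx(
    _Out_ LPCRITICAL_SECTION lpCriticalSection,
    _In_ DWORD dwSpinCount,
    _In_ DWORD Flags);

lpCriticalSection is a pointer to the critical section object. dwSpinCount is the spin count: on multiprocessor systems, when the critical section is held by another thread, the calling thread spins this many times trying to acquire it before it falls back to waiting. Flags specifies initialization flags.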
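As an illustration of the pattern this function supports, here is a minimal sketch in Python, with threading.Lock standing in for the CRITICAL_SECTION object. It is an analogue, not the Win32 API: acquiring the lock corresponds to entering the critical section and releasing it to leaving. The names counter, counter_lock, worker and run are hypothetical.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # plays the role of the critical section object


def worker(iterations):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(iterations):
        with counter_lock:   # "enter"; released automatically on block exit
            counter += 1     # the protected update of the shared resource


def run(num_threads=4, iterations=10000):
    """Start the workers, wait for them, and return the final counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter
```

Because every increment happens while the lock is held, run(4, 1000) returns exactly 4000, which is the guarantee a critical section is meant to provide.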

critical memory error

Critical Memory Error

Introduction

A critical memory error is a serious issue that can occur in computer systems when there is a failure in the memory subsystem. This error can have severe consequences, leading to system crashes, data loss, or even compromising system security. In this article, we will explore this issue in detail, understanding its causes, impacts, and potential solutions.

Causes of Critical Memory Errors

1. Hardware Issues
• Faulty RAM: Physical defects in the memory module can lead to critical memory errors.
• Overheating: Excessive heat can affect memory chips, causing errors.
• Power Supply Problems: Insufficient power supply or voltage fluctuations can impact memory stability.

2. Software Bugs
• Memory Leaks: Poorly written programs may not release memory properly, leading to memory leaks.
• Buffer Overflows: When a program writes more data to a memory buffer than it can hold, it can cause critical memory errors.
• System Incompatibility: If a program is not compatible with the underlying operating system or hardware, it may trigger memory errors.

3. Malware and Security Attacks
• Viruses and malware can corrupt system memory or exploit vulnerabilities, causing critical errors.
• Denial-of-Service (DoS) Attacks: An attacker floods the system with requests, overwhelming memory and causing errors.
• Injection Attacks: Unauthorized code injection can manipulate memory and lead to critical errors.

Impacts of Critical Memory Errors

1. System Instability
• Crashes and Freezes: Memory errors can cause the system to crash or freeze, resulting in data loss and productivity disruption.
• Blue Screen of Death (BSOD): Critical errors can trigger the infamous BSOD, making the system unusable.

2. Data Corruption and Loss
• Files and Documents: Critical memory errors can corrupt or erase important files, leading to data loss.
• Applications: Errors during program execution can cause data corruption within the application itself.

3. Security Risks
• Malicious Access: Memory errors can be exploited by attackers to gain unauthorized access to sensitive information or control over the system.
• System Compromise: In severe cases, memory errors can compromise the entire system's security, allowing attackers to install backdoors or obtain full control.

Preventing and Resolving Critical Memory Errors

1. Regular System Maintenance
• Update Software and Drivers: Keeping the operating system, applications, and drivers up to date can help mitigate memory errors caused by software bugs.
• Check Hardware Health: Regularly monitor the hardware components, including RAM, for any signs of failure or overheating.
• Use Stable Power Supply: Ensure a stable power supply to avoid voltage fluctuations that can lead to memory errors.

2. Implement Security Measures
• Use Reliable Security Software: Install and regularly update antivirus and antimalware software to detect and prevent malicious attacks that can compromise memory.
• Firewall Protection: Employ a firewall to block unauthorized access attempts and protect against DoS attacks.
• Secure Coding Practices: Develop and deploy software using secure coding practices to minimize the risk of memory-related vulnerabilities.

3. Perform Memory Tests
• Memory Diagnostic Tools: Utilize built-in memory diagnostic tools, such as Windows Memory Diagnostic, to detect and fix memory-related issues.
• Third-Party Memory Testing Software: Use specialized memory testing software for more thorough analysis and problem detection.

4. Debugging Techniques
• Debugging Tools: Employ debugging tools to analyze memory-related errors during the development phase and ensure early detection and resolution.
• Log Analysis: Review system logs to identify patterns or warnings related to memory errors, facilitating timely troubleshooting.

Conclusion

A critical memory error is a significant issue that can have severe consequences for computer systems. By understanding the causes and impacts of these errors, preventive measures can be taken to minimize the risk. Regular system maintenance, implementing security measures, and performing memory tests are crucial steps toward preventing and resolving critical memory errors. By following these best practices, individuals and organizations can maintain system stability, protect data integrity, and ensure the security of their computer systems.

sae_j2534-1_2004

SURFACE VEHICLE RECOMMENDED PRACTICE

SAE Technical Standards Board Rules provide that: "This report is published by SAE to advance the state of technical and engineering sciences. The use of this report is entirely voluntary, and its applicability and suitability for any particular use, including any patent infringement arising therefrom, is the sole responsibility of the user."

SAE reviews each technical report at least every five years at which time it may be reaffirmed, revised, or cancelled. SAE invites your written comments and suggestions.

Copyright © 2004 SAE International
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of SAE.

TO PLACE A DOCUMENT ORDER: Tel: 877-606-7323 (inside USA and Canada), Tel: 724-776-4970 (outside USA)

SAE J2534-1 Revised DEC2004

TABLE OF CONTENTS

1. Scope
2. References
2.1 Applicable Documents
2.1.1 SAE Publications
2.1.2 ISO Documents
3. Definitions
4. Acronyms
5. Pass-Thru Concept
6. Pass-Thru System Requirements
6.1 PC Requirements
6.2 Software Requirements and Assumptions
6.3 Connection to PC
6.4 Connection to Vehicle
6.5 Communication Protocols
6.5.1 ISO 9141
6.5.2 ISO 14230-4 (KWP2000)
6.5.3 SAE J1850 41.6 kbps PWM (Pulse Width Modulation)
6.5.4 SAE J1850 10.4 kbps VPW (Variable Pulse Width)
6.5.5 CAN
6.5.6 ISO 15765-4 (CAN)
6.5.7 SAE J2610 DaimlerChrysler SCI
6.6 Simultaneous Communication on Multiple Protocols
6.7 Programmable Power Supply
6.8 Pin Usage
6.9 Data Buffering
6.10 Error Recovery
6.10.1 Device Not Connected
6.10.2 Bus Errors
7. Win32 Application Programming Interface
7.1 API Functions - Overview
7.2 API Functions - Detailed Information
7.2.1 PassThruOpen
7.2.1.1 C/C++ Prototype
7.2.1.2 Parameters
7.2.1.3 Return Values
7.2.2 PassThruClose
7.2.2.1 C/C++ Prototype
7.2.2.2 Parameters
7.2.2.3 Return Values
7.2.3 PassThruConnect
7.2.3.1 C/C++ Prototype
7.2.3.2 Parameters
7.2.3.3 Flag Values
7.2.3.4 Protocol ID Values
7.2.3.5 Return Values
7.2.4 PassThruDisconnect
7.2.4.1 C/C++ Prototype
7.2.4.2 Parameters
7.2.4.3 Return Values
7.2.5 PassThruReadMsgs
7.2.5.1 C/C++ Prototype
7.2.5.2 Parameters
7.2.5.3 Return Values
7.2.6 PassThruWriteMsgs
7.2.6.1 C/C++ Prototype
7.2.6.2 Parameters
7.2.6.3 Return Values
7.2.7 PassThruStartPeriodicMsg
7.2.7.1 C/C++ Prototype
7.2.7.2 Parameters
7.2.7.3 Return Values
7.2.8 PassThruStopPeriodicMsg
7.2.8.1 C/C++ Prototype
7.2.8.2 Parameters
7.2.8.3 Return Values
7.2.9 PassThruStartMsgFilter
7.2.9.1 C/C++ Prototype
7.2.9.2 Parameters
7.2.9.3 Filter Types
7.2.9.4 Return Values
7.2.10 PassThruStopMsgFilter
7.2.10.1 C/C++ Prototype
7.2.10.2 Parameters
7.2.10.3 Return Values
7.2.11 PassThruSetProgrammingVoltage
7.2.11.1 C/C++ Prototype
7.2.11.2 Parameters
7.2.11.3 Voltage Values
7.2.11.4 Return Values
7.2.12 PassThruReadVersion
7.2.12.1 C/C++ Prototype
7.2.12.2 Parameters
7.2.12.3 Return Values
7.2.13 PassThruGetLastError
7.2.13.1 C/C++ Prototype
7.2.13.2 Parameters
7.2.13.3 Return Values
7.2.14 PassThruIoctl
7.2.14.1 C/C++ Prototype
7.2.14.2 Parameters
7.2.14.3 Ioctl ID Values
7.2.14.4 Return Values
7.3 IOCTL Section
7.3.1 GET_CONFIG
7.3.2 SET_CONFIG
7.3.3 READ_VBATT
7.3.4 READ_PROG_VOLTAGE
7.3.5 FIVE_BAUD_INIT
7.3.6 FAST_INIT
7.3.7 CLEAR_TX_BUFFER
7.3.8 CLEAR_RX_BUFFER
7.3.9 CLEAR_PERIODIC_MSGS
7.3.10 CLEAR_MSG_FILTERS
7.3.11 CLEAR_FUNCT_MSG_LOOKUP_TABLE
7.3.12 ADD_TO_FUNCT_MSG_LOOKUP_TABLE
7.3.13 DELETE_FROM_FUNCT_MSG_LOOKUP_TABLE
8. Message Structure
8.1 C/C++ Definition
8.2 Elements
8.3 Message Data Formats
8.4 Format Checks for Messages Passed to the API
8.5 Conventions for Returning Messages from the API
8.6 Conventions for Returning Indications from the API
8.7 Message Flag and Status Definitions
8.7.1 RxStatus
8.7.2 RxStatus Bits for Messaging Status and Error Indication
8.7.3 TxFlags
9. DLL Installation and Registry
9.1 Naming of Files
9.2 Win32 Registry
9.2.1 User Application Interaction with the Registry
9.2.2 Attaching to the DLL from an application
9.2.2.1 Export Library Definition File
10. Return Value Error Codes
11. Notes
11.1 Marginal Indicia
Appendix A General ISO 15765-2 Flow Control Example
A.1 Flow Control Overview
A.1.1 Examples Overview
A.2 Transmitting a Segmented Message
A.2.1 Conversation Setup
A.2.2 Data Transmission
A.2.3 Verification
A.3 Transmitting an Unsegmented Message
A.3.1 Data Transmission
A.3.2 Verification
A.4 Receiving a Segmented Message
A.4.1 Conversation Setup
A.4.2 Reception Notification
A.4.3 Data Reception
A.5 Receiving Unsegmented Messages

1. Scope

This SAE Recommended Practice provides the framework to allow reprogramming software applications from all vehicle manufacturers the flexibility to work with multiple vehicle data link interface tools from multiple tool suppliers. This system enables each vehicle manufacturer to control the programming sequence for electronic control units (ECUs) in their vehicles, but allows a single set of programming hardware and vehicle interface to be used to program modules for all vehicle manufacturers.

This document does not limit the hardware possibilities for the connection between the PC used for the software application and the tool (e.g., RS-232, RS-485, USB, Ethernet…). Tool suppliers are free to choose the hardware interface appropriate for their tool. The goal of this document is to ensure that reprogramming software from any vehicle manufacturer is compatible with hardware supplied by any tool manufacturer.

U.S.
Environmental Protection Agency (EPA) and the California Air Resources Board (ARB) "OBD service information" regulations include requirements for reprogramming emission-related control modules in vehicles for all manufacturers by the aftermarket repair industry. This document is intended to conform to those regulations for 2004 and later model year vehicles. For some vehicles, this interface can also be used to reprogram emission-related control modules in vehicles prior to the 2004 model year, and for non-emission related control modules. For other vehicles, this usage may require additional manufacturer specific capabilities to be added to a fully compliant interface. A second part to this document, SAE J2534-2, is planned to include expanded capabilities that tool suppliers can optionally include in an interface to allow programming of these additional non-mandated vehicle applications. In addition to reprogramming capability, this interface is planned for use in OBD compliance testing as defined in SAE J1699-3. SAE J2534-1 includes some capabilities that are not required for Pass-Thru Programming, but which enable use of this interface for those other purposes without placing a significant burden on the interface manufacturers.

Additional requirements for future model years may require revision of this document, most notably the inclusion of SAE J1939 for some heavy-duty vehicles. This document will be reviewed for possible revision after those regulations are finalized and requirements are better understood. Possible revisions include SAE J1939 specific software and an alternate vehicle connector, but the basic hardware of an SAE J2534 interface device is expected to remain unchanged.

2. References

2.1 Applicable Publications

The following publications form a part of this specification to the extent specified herein.
Unless otherwise indicated, the latest version of SAE publications shall apply.

2.1.1 SAE Publications

Available from SAE, 400 Commonwealth Drive, Warrendale, PA 15096-0001.

SAE J1850 - Class B Data Communications Network Interface
SAE J1939 - Truck and Bus Control and Communications Network (Multiple Parts Apply)
SAE J1962 - Diagnostic Connector
SAE J2610 - DaimlerChrysler Information Report for Serial Data Communication Interface (SCI)

2.1.2 ISO Documents

Available from ANSI, 25 West 43rd Street, New York, NY 10036-8002.

ISO 7637-1:1990 - Road vehicles - Electrical disturbance by conduction and coupling - Part 1: Passenger cars and light commercial vehicles with nominal 12 V supply voltage
ISO 9141:1989 - Road vehicles - Diagnostic systems - Requirements for interchange of digital information
ISO 9141-2:1994 - Road vehicles - Diagnostic systems - CARB requirements for interchange of digital information
ISO 11898:1993 - Road vehicles - Interchange of digital information - Controller area network (CAN) for high speed communication
ISO 14230-4:2000 - Road vehicles - Diagnostic systems - Keyword protocol 2000 - Part 4: Requirements for emission-related systems
ISO/FDIS 15765-2 - Road vehicles - Diagnostics on controller area networks (CAN) - Network layer services
ISO/FDIS 15765-4 - Road vehicles - Diagnostics on controller area networks (CAN) - Requirements for emission-related systems

3. Definitions

3.1 Registry

A mechanism within Win32 operating systems to handle hardware and software configuration information.

4. Acronyms

API     Application Programming Interface
ASCII   American Standard Code for Information Interchange
CAN     Controller Area Network
CRC     Cyclic Redundancy Check
DLL     Dynamic Link Library
ECU     Electronic Control Unit
IFR     In-Frame Response
IOCTL   Input / Output Control
KWP     Keyword Protocol
OEM     Original Equipment Manufacturer
PC      Personal Computer
PWM     Pulse Width Modulation
SCI     Serial Communications Interface
SCP     Standard Corporate Protocol
USB     Universal Serial Bus
VPW     Variable Pulse Width

5. Pass-Thru Concept

Programming application software supplied by the vehicle manufacturer will run on a commonly available generic PC. This application must have complete knowledge of the programming requirements for the control module to be programmed and will control the programming event. This includes the user interface, selection criteria for downloadable software and calibration files, the actual software and calibration data to be downloaded, the security mechanism to control access to the programming capability, and the actual programming steps and sequence required to program each individual control module in the vehicle. If additional procedures must be followed after the reprogramming event, such as clearing Diagnostic Trouble Codes (DTC), writing part numbers or variant coding information to the control module, or running additional setup procedures, the vehicle manufacturer must either include this in the PC application or include the necessary steps in the service information that references reprogramming.

This document defines the following two interfaces for the SAE J2534 pass-thru device:
a. Application program interface (API) between the programming application running on a PC and a software device driver for the pass-thru device
b. Hardware interface between the pass-thru device and the vehicle

The manufacturer of an SAE J2534 pass-thru device shall supply connections to both the PC and the vehicle.
In addition to the hardware, the interface manufacturer shall supply device driver software, and a Windows installation and setup application that will install the manufacturer's SAE J2534 DLL and other required files, and also update the Windows Registry. The interface between the PC and the pass-thru device can be any technology chosen by the tool manufacturer, including RS-232, RS-485, USB, Ethernet, or any other current or future technology, including wireless technologies.

All programming applications shall utilize the common SAE J2534 API as the interface to the pass-thru device driver. The API contains a set of routines that may be used by the programming application to control the pass-thru device, and to control the communications between the pass-thru device and the vehicle. The pass-thru device will not interpret the message content, allowing any message strategy and message structure to be used that is understood by both the programming application and the ECU being programmed. Also, because the message will not be interpreted, the contents of the message cannot be used to control the operation of the interface. For example, if a message is sent to the ECU to go to high speed, a specific instruction must also be sent to the interface to go to high speed.

The OEM programming application does not need to know the hardware connected to the PC, which gives the tool manufacturers the flexibility to use any commonly available interface to the PC. The pass-thru device does not need any knowledge of the vehicle or control module being programmed. This will allow all programming applications to work with all pass-thru devices to enable programming of all control modules for all vehicle manufacturers.

The interface will not handle the tester present messages automatically.
The OEM application is responsible to handle tester present messages.

6.3 Connection to PC

The interface between the PC and the pass-thru device shall be determined by the manufacturer of the pass-thru device. This can be RS-232, USB, Ethernet, IEEE1394, Bluetooth or any other connection that allows the pass-thru device to meet all other requirements of this document, including timing requirements. The tool manufacturer is also required to include the device driver that supports this connection so that the actual interface used is transparent to both the PC programming application and the vehicle.

6.4 Connection to Vehicle

The interface between the pass-thru device and the vehicle shall be an SAE J1962 connector for serial data communications. The maximum cable length between the pass-thru device and the vehicle is five (5) meters. The interface shall include an insulated banana jack that accepts a standard 0.175" diameter banana plug as the auxiliary pin for connection of programming voltage to a vehicle specific connector on the vehicle.

If powered from the vehicle, the interface shall:
a. operate normally within a vehicle battery voltage range of 8.0 to 18.0 volts D.C.,
b. survive a vehicle battery voltage of up to 24.0 volts D.C. for at least 10 minutes,
c. survive, without damage to the interface, a reverse vehicle battery voltage of up to 24.0 volts D.C. for at least 10 minutes.

6.5 Communication Protocols

The following communication protocols shall be supported:

6.5.1 ISO 9141

The following specifications clarify and, if in conflict with ISO 9141, override any related specifications in ISO 9141:
a. The maximum sink current to be supported by the interface is 100 mA.
b. The range for all tests performed relative to ISO 7637-1 is –1.0 to +40.0 V.
c. The default bus idle period before the interface shall transmit an address, shall be 300 ms.
d. Support following baud rate with ±0.5% tolerance: 10400.
e. Support following baud rate with ±1% tolerance: 10000.
f. Support following baud rates with ±2% tolerance: 4800, 9600, 9615, 9800, 10870, 11905, 12500, 13158, 13889, 14706, 15625, and 19200.
g. Support other baud rates if the interface is capable of supporting the requested value within ±2%.
h. The baud rate shall be set by the application, not determined by the SAE J2534 interface. The interface is not required to support baud rate detection based on the synchronization byte.
i. Support odd and even parity in addition to the default of no parity, with seven or eight data bits. Always one start bit and one stop bit.
j. Support for timer values that are less than or greater than those specified in ISO 9141 (see Figure 30 in Section 7.3.2).
k. Support ability to disable automatic ISO 9141-2 / ISO 14230 checksum verification by the interface to allow vehicle manufacturer specific error detection.
l. If the ISO 9141 checksum is verified by the interface, and the checksum is incorrect, the message will be discarded.
m. Support both ISO 9141 5-baud initialization and ISO 14230 fast initialization.
n. Interface shall not adjust timer parameters based on keyword values.

6.5.2 ISO 14230-4 (KWP2000)

The ISO 14230 protocol has the same specifications as the ISO 9141 protocol as outlined in the previous section. In addition, the following specifications clarify and, if in conflict with ISO 14230, override any related specifications in ISO 14230:
a. The pass-thru interface will not automatically handle tester present messages. The application needs to handle tester present messages when required.
b. The pass-thru interface will not perform any special handling for the $78 response code. Any message received with a $78 response code will be passed from the interface to the application.
The application is required to handle any special timing requirements based on receipt of this response code, including stopping any periodic messages.

6.5.3 SAE J1850 41.6 kbps PWM (Pulse Width Modulation)

The following additional features of SAE J1850 must be supported by the pass-thru device:
a. Capable of 41.6 kbps and high speed mode of 83.3 kbps.
b. Recommend Ford approved SAE J1850 PWM (SCP) physical layer

6.5.4 SAE J1850 10.4 kbps VPW (Variable Pulse Width)

The following additional features of SAE J1850 must be supported by the pass-thru device:
a. Capable of 10.4 kbps and high speed mode of 41.6 kbps
b. 4128 byte block transfer
c. Return to normal speed after a break indication

6.5.5 CAN

The following features of ISO 11898 (CAN) must be supported by the pass-thru device:
a. 125, 250, and 500 kbps
b. 11 and 29 bit identifiers
c. Support for 80% ± 2% and 68.5% ± 2% bit sample point
d. Allow raw CAN messages. This protocol can be used to handle any custom CAN messaging protocol, including custom flow control mechanisms.

6.5.6 ISO 15765-4 (CAN)

The following features of ISO 15765-4 must be supported by the pass-thru device:
a. 125, 250, and 500 kbps
b. 11 and 29 bit identifiers
c. Support for 80% ± 2% bit sample point
d. To maintain acceptable programming times, the transport layer flow control function, as defined in ISO 15765-2, must be incorporated in the pass-thru device (see Appendix A). If the application does not use the ISO 15765-2 transport layer flow control functionality, the CAN protocol will allow for any custom transport layer.
e. Receive a multi-frame message with an ISO15765_BS of 0 and an ISO15765_STMIN of 0, as defined in ISO 15765-2.
f. No single frame or multi-frame messages can be received without matching a flow control filter. No multi-frame messages can be transmitted without matching a flow control filter.
g. Periodic messages will not be suspended during transmission or reception of a multi-frame segmented message.

6.5.7 SAE J2610 DaimlerChrysler SCI

Reference the SAE J2610 Information Report for a description of the SCI protocol.

When in the half-duplex mode (when SCI_MODE of TxFlags is set to {1} Half-Duplex), every data byte sent is expected to be "echoed" by the controller. The next data byte shall not be sent until the echo byte has been received and verified. If the echoed byte received doesn't match the transmitted byte, or if after a period of T1 no response was received, the transmission will be terminated. Matching echoed bytes will not be placed in the receive message queue.

6.6 Simultaneous Communication on Multiple Protocols

The pass-thru device must be capable of supporting simultaneous communication on multiple protocols during a single programming event. Figure 2 indicates which combinations of protocols shall be supported. If SCI (SAE J2610) communication is not required during the programming event, the interface shall be capable of supporting one of the protocols from data link set 1, data link set 2, and data link set 3. If SCI (SAE J2610) communication is required during the programming event, the interface shall be capable of supporting one of the SCI protocols and one protocol from data link set 1.

6.9 Data Buffering

The interface/API shall be capable of receiving 8 simultaneous messages. For ISO 15765 these can be multi-frame messages. The interface/API shall be capable of buffering a maximum length (4128 byte) transmit message and a maximum length (4128 byte) receive message.

6.10 Error Recovery

6.10.1 Device Not Connected

If the DLL returns ERR_DEVICE_NOT_CONNECTED from any function, that error shall continue to be returned by all functions, even if the device is reconnected.
An application can recover from this error condition by closing the device (with PassThruC lose) and re-opening the device (with PassThruOpen, getting a new device ID).6.10.2B US E RRORSAll devices shall handle bus errors in a consistent manner. There are two error strategies: Retry and Drop.The Retry strategy will keep trying to send a packet until successful or stopped by the application. If loopback is on and the message is successfully sent after some number of retries, only one copy of the message shall be placed in the receive queue. Even if the hardware does not support retries, the firmware/software must retry the transmission. If the error condition persists, a blocking write will wait the specified timeout and return ERR_TIMEOUT. The DLL must return the number of successfully transmitted messages in pNumMsgs. The DLL shall not count the message being retried in pNumMsgs. After returning from the function, the device does not stop the retries. The only functions that will stop the retries are PassThruDisconnect (on that protocol), PassThruC lose, or PassThruIoctl (with an IoctllD of CLEAR_TX_BUFFER).Devices shall use the Retry strategy in the following scenarios:•All CAN errors, such as bus off, lack of acknowledgement, loss of arbitration, and no connection (lack of terminating resistor)•SAE J1850PWM or SAE J1850VPW bus fault (bus stuck passive) or loss of arbitration (bus stuck active)The Drop strategy will delete a message from the queue. The message can be dropped immediately on noticing an error or at the end of the transmission. PassThruWriteMsg shall treat dropped messages the same as successfully transmitted messages. 
However, if loopback is on, the message shall not be placed in the receive queue.Devices shall use the Drop strategy in the following scenarios:•If characters are echoed improperly in SCI•Corrupted ISO 9141 or ISO 14230 transmission•SAE J1850PWM lack of acknowledgement (Exception: The device must try sending the message 3 times before dropping)7.2.5.1 C / C++ Prototypeextern “C” long WINAPI PassThruReadMsgs(unsigned long ChannelID,*pMsg,PASSTHRU_MSGunsigned long *pNumMsgs,unsigned long Timeout)7.2.5.2ParametersChannelID The channel ID assigned by the PassThruConnect function.pMsg Pointer to message structure(s).pNumMsgs Pointer to location where number of messages to read is specified. On return from the function this location will contain the actual number of messages read.Timeout Read timeout (in milliseconds). If a value of 0 is specified the function retrieves up to pNumMsgs messages and returns immediately. Otherwise, the API will not return untilthe Timeout has expired, an error has occurred, or the desired number of messageshave been read. If the number of messages requested have been read, the functionshall not return ERR_TIMEOUT, even if the timeout value is zero.When using the ISO 15765-4 protocol, only SingleFrame messages can be transmitted without a matching flow control filter. Also, P I bytes are transparently added by the API. See PassThruStartMsgFilter and Appendix A for a discussion of flow control filters.7.2.6.1 C / C++ Prototypeextern “C” long WINAPI PassThruWriteMsgs(u nsigned long ChannelID,*pMsg,PASSTHRU_MSGunsigned long *pNumMsgs,unsigned long Timeout)7.2.6.2ParametersChannelID The channel ID assigned by the PassThruConnect function.pMsg Pointer to message structure(s).pNumMsgs Pointer to the location where number of messages to write is specified. 
On return will contain the actual number of messages that were transmitted (when Timeout is non-zero) or placed in the transmit queue (when Timeout is zero).Timeout Write timeout (in milliseconds). When a value of 0 is specified, the function queues as many of the specified messages as possible and returns immediately. When a valuegreater than 0 is specified, the function will block until the Timeout has expired, an errorhas occurred, or the desired number of messages have been transmitted on the vehiclenetwork. Even if the device can buffer only one packet at a time, this function shall beable to send an arbitrary number of packets if a Timeout value is supplied. Since thefunction returns early if all the messages have been sent, there is normally no penalty forhaving a large timeout (several seconds). If the number of messages requested havebeen written, the function shall not return ERR_TIMEOUT, even if the timeout value iszero.W hen an ERR_TIMEOUT is returned, only the number of messages that were sent onthe vehicle network is known. The number of messages queued is unknown. Applicationwriters should avoid this ambiguity by using a Timeout value large enough to work onslow devices and networks with arbitration delays.。
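
The Retry and Drop rules in 6.10.2, together with the pNumMsgs accounting described for PassThruWriteMsgs, can be illustrated with a small simulation. The following is a minimal Python sketch under stated assumptions, not an implementation of the pass-thru DLL: simulate_write, Strategy, bus_ok, attempt_budget and attempts_before_drop are hypothetical names introduced here, status codes are represented by their symbolic names as strings, and a budget of transmission attempts stands in for the millisecond Timeout. It does not model the device continuing its retries after the call returns, nor the queue-and-return behaviour of a zero Timeout.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Strategy(Enum):
    RETRY = "retry"  # e.g. CAN errors, J1850 bus fault or loss of arbitration
    DROP = "drop"    # e.g. bad SCI echo, corrupted ISO 9141 / ISO 14230 frame


def simulate_write(
    msgs: List[str],
    bus_ok: Callable[[str, int], bool],
    strategy: Strategy,
    attempt_budget: int,
    loopback: bool = True,
    attempts_before_drop: int = 1,
) -> Tuple[str, int, List[str]]:
    """Hypothetical model of a blocking write.

    bus_ok(msg, attempt) says whether the given attempt (1-based) at sending
    msg succeeds. attempt_budget is an assumed stand-in for the Timeout.
    Returns (status name, count reported in pNumMsgs, receive queue).
    """
    sent = 0
    rx_queue: List[str] = []
    remaining = attempt_budget
    for msg in msgs:
        attempt = 0
        while True:
            if remaining == 0:
                # The message still being retried is not counted.
                return "ERR_TIMEOUT", sent, rx_queue
            remaining -= 1
            attempt += 1
            if bus_ok(msg, attempt):
                sent += 1
                if loopback:
                    # Exactly one loopback copy, however many retries it took.
                    rx_queue.append(msg)
                break
            if strategy is Strategy.DROP and attempt >= attempts_before_drop:
                # A dropped message counts as transmitted but is never
                # placed in the receive queue.
                sent += 1
                break
    return "STATUS_NOERROR", sent, rx_queue


def never_b(msg: str, attempt: int) -> bool:
    """Bus model in which message "b" can never be sent."""
    return msg != "b"


if __name__ == "__main__":
    print(simulate_write(["a", "b", "c"], never_b, Strategy.RETRY, 5))
    print(simulate_write(["a", "b", "c"], never_b, Strategy.DROP, 5))
```

Under the Retry strategy the second message exhausts the budget, so the call reports a timeout with only the first message counted. Under the Drop strategy all three messages are counted, but the dropped one never appears in the loopback queue. Setting attempts_before_drop to 3 models the SAE J1850 PWM lack-of-acknowledgement exception.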

3GPP TS 36.331 V13.2.0 (2016-06)

3GPP TS 36.331 V13.2.0 (2016-06)
Technical Specification

3rd Generation Partnership Project;
Technical Specification Group Radio Access Network;
Evolved Universal Terrestrial Radio Access (E-UTRA);
Radio Resource Control (RRC);
Protocol specification
(Release 13)

The present document has been developed within the 3rd Generation Partnership Project (3GPP™) and may be further elaborated for the purposes of 3GPP. The present document has not been subject to any approval process by the 3GPP Organizational Partners and shall not be implemented. This Specification is provided for future development work within 3GPP only. The Organizational Partners accept no liability for any use of this Specification. Specifications and reports for implementation of the 3GPP™ system should be obtained via the 3GPP Organizational Partners' Publications Offices.

Keywords
UMTS, radio

3GPP
Postal address
3GPP support office address
650 Route des Lucioles - Sophia Antipolis
Valbonne - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Internet

Copyright Notification
No part may be reproduced except as authorized by written permission. The copyright and the foregoing restriction extend to reproduction in all media.
© 2016, 3GPP Organizational Partners (ARIB, ATIS, CCSA, ETSI, TSDSI, TTA, TTC). All rights reserved.
UMTS™ is a Trade Mark of ETSI registered for the benefit of its members
3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners
LTE™ is a Trade Mark of ETSI currently being registered for the benefit of its Members and of the 3GPP Organizational Partners
GSM® and the GSM logo are registered and owned by the GSM Association
Bluetooth® is a Trade Mark of the Bluetooth SIG registered for the benefit of its members

Contents
Foreword (18)
1 Scope (19)
2 References (19)
3 Definitions, symbols and abbreviations (22)
3.1 Definitions (22)
3.2 Abbreviations (24)
4 General (27)
4.1 Introduction (27)
4.2 Architecture (28)
4.2.1 UE states and state transitions including inter RAT 
(28)4.2.2Signalling radio bearers (29)4.3Services (30)4.3.1Services provided to upper layers (30)4.3.2Services expected from lower layers (30)4.4Functions (30)5Procedures (32)5.1General (32)5.1.1Introduction (32)5.1.2General requirements (32)5.2System information (33)5.2.1Introduction (33)5.2.1.1General (33)5.2.1.2Scheduling (34)5.2.1.2a Scheduling for NB-IoT (34)5.2.1.3System information validity and notification of changes (35)5.2.1.4Indication of ETWS notification (36)5.2.1.5Indication of CMAS notification (37)5.2.1.6Notification of EAB parameters change (37)5.2.1.7Access Barring parameters change in NB-IoT (37)5.2.2System information acquisition (38)5.2.2.1General (38)5.2.2.2Initiation (38)5.2.2.3System information required by the UE (38)5.2.2.4System information acquisition by the UE (39)5.2.2.5Essential system information missing (42)5.2.2.6Actions upon reception of the MasterInformationBlock message (42)5.2.2.7Actions upon reception of the SystemInformationBlockType1 message (42)5.2.2.8Actions upon reception of SystemInformation messages (44)5.2.2.9Actions upon reception of SystemInformationBlockType2 (44)5.2.2.10Actions upon reception of SystemInformationBlockType3 (45)5.2.2.11Actions upon reception of SystemInformationBlockType4 (45)5.2.2.12Actions upon reception of SystemInformationBlockType5 (45)5.2.2.13Actions upon reception of SystemInformationBlockType6 (45)5.2.2.14Actions upon reception of SystemInformationBlockType7 (45)5.2.2.15Actions upon reception of SystemInformationBlockType8 (45)5.2.2.16Actions upon reception of SystemInformationBlockType9 (46)5.2.2.17Actions upon reception of SystemInformationBlockType10 (46)5.2.2.18Actions upon reception of SystemInformationBlockType11 (46)5.2.2.19Actions upon reception of SystemInformationBlockType12 (47)5.2.2.20Actions upon reception of SystemInformationBlockType13 (48)5.2.2.21Actions upon reception of SystemInformationBlockType14 (48)5.2.2.22Actions upon reception of SystemInformationBlockType15 
(48)5.2.2.23Actions upon reception of SystemInformationBlockType16 (48)5.2.2.24Actions upon reception of SystemInformationBlockType17 (48)5.2.2.25Actions upon reception of SystemInformationBlockType18 (48)5.2.2.26Actions upon reception of SystemInformationBlockType19 (49)5.2.3Acquisition of an SI message (49)5.2.3a Acquisition of an SI message by BL UE or UE in CE or a NB-IoT UE (50)5.3Connection control (50)5.3.1Introduction (50)5.3.1.1RRC connection control (50)5.3.1.2Security (52)5.3.1.2a RN security (53)5.3.1.3Connected mode mobility (53)5.3.1.4Connection control in NB-IoT (54)5.3.2Paging (55)5.3.2.1General (55)5.3.2.2Initiation (55)5.3.2.3Reception of the Paging message by the UE (55)5.3.3RRC connection establishment (56)5.3.3.1General (56)5.3.3.1a Conditions for establishing RRC Connection for sidelink communication/ discovery (58)5.3.3.2Initiation (59)5.3.3.3Actions related to transmission of RRCConnectionRequest message (63)5.3.3.3a Actions related to transmission of RRCConnectionResumeRequest message (64)5.3.3.4Reception of the RRCConnectionSetup by the UE (64)5.3.3.4a Reception of the RRCConnectionResume by the UE (66)5.3.3.5Cell re-selection while T300, T302, T303, T305, T306, or T308 is running (68)5.3.3.6T300 expiry (68)5.3.3.7T302, T303, T305, T306, or T308 expiry or stop (69)5.3.3.8Reception of the RRCConnectionReject by the UE (70)5.3.3.9Abortion of RRC connection establishment (71)5.3.3.10Handling of SSAC related parameters (71)5.3.3.11Access barring check (72)5.3.3.12EAB check (73)5.3.3.13Access barring check for ACDC (73)5.3.3.14Access Barring check for NB-IoT (74)5.3.4Initial security activation (75)5.3.4.1General (75)5.3.4.2Initiation (76)5.3.4.3Reception of the SecurityModeCommand by the UE (76)5.3.5RRC connection reconfiguration (77)5.3.5.1General (77)5.3.5.2Initiation (77)5.3.5.3Reception of an RRCConnectionReconfiguration not including the mobilityControlInfo by theUE (77)5.3.5.4Reception of an RRCConnectionReconfiguration including the 
mobilityControlInfo by the UE(handover) (79)5.3.5.5Reconfiguration failure (83)5.3.5.6T304 expiry (handover failure) (83)5.3.5.7Void (84)5.3.5.7a T307 expiry (SCG change failure) (84)5.3.5.8Radio Configuration involving full configuration option (84)5.3.6Counter check (86)5.3.6.1General (86)5.3.6.2Initiation (86)5.3.6.3Reception of the CounterCheck message by the UE (86)5.3.7RRC connection re-establishment (87)5.3.7.1General (87)5.3.7.2Initiation (87)5.3.7.3Actions following cell selection while T311 is running (88)5.3.7.4Actions related to transmission of RRCConnectionReestablishmentRequest message (89)5.3.7.5Reception of the RRCConnectionReestablishment by the UE (89)5.3.7.6T311 expiry (91)5.3.7.7T301 expiry or selected cell no longer suitable (91)5.3.7.8Reception of RRCConnectionReestablishmentReject by the UE (91)5.3.8RRC connection release (92)5.3.8.1General (92)5.3.8.2Initiation (92)5.3.8.3Reception of the RRCConnectionRelease by the UE (92)5.3.8.4T320 expiry (93)5.3.9RRC connection release requested by upper layers (93)5.3.9.1General (93)5.3.9.2Initiation (93)5.3.10Radio resource configuration (93)5.3.10.0General (93)5.3.10.1SRB addition/ modification (94)5.3.10.2DRB release (95)5.3.10.3DRB addition/ modification (95)5.3.10.3a1DC specific DRB addition or reconfiguration (96)5.3.10.3a2LWA specific DRB addition or reconfiguration (98)5.3.10.3a3LWIP specific DRB addition or reconfiguration (98)5.3.10.3a SCell release (99)5.3.10.3b SCell addition/ modification (99)5.3.10.3c PSCell addition or modification (99)5.3.10.4MAC main reconfiguration (99)5.3.10.5Semi-persistent scheduling reconfiguration (100)5.3.10.6Physical channel reconfiguration (100)5.3.10.7Radio Link Failure Timers and Constants reconfiguration (101)5.3.10.8Time domain measurement resource restriction for serving cell (101)5.3.10.9Other configuration (102)5.3.10.10SCG reconfiguration (103)5.3.10.11SCG dedicated resource configuration (104)5.3.10.12Reconfiguration SCG or split DRB by 
drb-ToAddModList (105)5.3.10.13Neighbour cell information reconfiguration (105)5.3.10.14Void (105)5.3.10.15Sidelink dedicated configuration (105)5.3.10.16T370 expiry (106)5.3.11Radio link failure related actions (107)5.3.11.1Detection of physical layer problems in RRC_CONNECTED (107)5.3.11.2Recovery of physical layer problems (107)5.3.11.3Detection of radio link failure (107)5.3.12UE actions upon leaving RRC_CONNECTED (109)5.3.13UE actions upon PUCCH/ SRS release request (110)5.3.14Proximity indication (110)5.3.14.1General (110)5.3.14.2Initiation (111)5.3.14.3Actions related to transmission of ProximityIndication message (111)5.3.15Void (111)5.4Inter-RAT mobility (111)5.4.1Introduction (111)5.4.2Handover to E-UTRA (112)5.4.2.1General (112)5.4.2.2Initiation (112)5.4.2.3Reception of the RRCConnectionReconfiguration by the UE (112)5.4.2.4Reconfiguration failure (114)5.4.2.5T304 expiry (handover to E-UTRA failure) (114)5.4.3Mobility from E-UTRA (114)5.4.3.1General (114)5.4.3.2Initiation (115)5.4.3.3Reception of the MobilityFromEUTRACommand by the UE (115)5.4.3.4Successful completion of the mobility from E-UTRA (116)5.4.3.5Mobility from E-UTRA failure (117)5.4.4Handover from E-UTRA preparation request (CDMA2000) (117)5.4.4.1General (117)5.4.4.2Initiation (118)5.4.4.3Reception of the HandoverFromEUTRAPreparationRequest by the UE (118)5.4.5UL handover preparation transfer (CDMA2000) (118)5.4.5.1General (118)5.4.5.2Initiation (118)5.4.5.3Actions related to transmission of the ULHandoverPreparationTransfer message (119)5.4.5.4Failure to deliver the ULHandoverPreparationTransfer message (119)5.4.6Inter-RAT cell change order to E-UTRAN (119)5.4.6.1General (119)5.4.6.2Initiation (119)5.4.6.3UE fails to complete an inter-RAT cell change order (119)5.5Measurements (120)5.5.1Introduction (120)5.5.2Measurement configuration (121)5.5.2.1General (121)5.5.2.2Measurement identity removal (122)5.5.2.2a Measurement identity autonomous removal (122)5.5.2.3Measurement identity addition/ 
modification (123)5.5.2.4Measurement object removal (124)5.5.2.5Measurement object addition/ modification (124)5.5.2.6Reporting configuration removal (126)5.5.2.7Reporting configuration addition/ modification (127)5.5.2.8Quantity configuration (127)5.5.2.9Measurement gap configuration (127)5.5.2.10Discovery signals measurement timing configuration (128)5.5.2.11RSSI measurement timing configuration (128)5.5.3Performing measurements (128)5.5.3.1General (128)5.5.3.2Layer 3 filtering (131)5.5.4Measurement report triggering (131)5.5.4.1General (131)5.5.4.2Event A1 (Serving becomes better than threshold) (135)5.5.4.3Event A2 (Serving becomes worse than threshold) (136)5.5.4.4Event A3 (Neighbour becomes offset better than PCell/ PSCell) (136)5.5.4.5Event A4 (Neighbour becomes better than threshold) (137)5.5.4.6Event A5 (PCell/ PSCell becomes worse than threshold1 and neighbour becomes better thanthreshold2) (138)5.5.4.6a Event A6 (Neighbour becomes offset better than SCell) (139)5.5.4.7Event B1 (Inter RAT neighbour becomes better than threshold) (139)5.5.4.8Event B2 (PCell becomes worse than threshold1 and inter RAT neighbour becomes better thanthreshold2) (140)5.5.4.9Event C1 (CSI-RS resource becomes better than threshold) (141)5.5.4.10Event C2 (CSI-RS resource becomes offset better than reference CSI-RS resource) (141)5.5.4.11Event W1 (WLAN becomes better than a threshold) (142)5.5.4.12Event W2 (All WLAN inside WLAN mobility set becomes worse than threshold1 and a WLANoutside WLAN mobility set becomes better than threshold2) (142)5.5.4.13Event W3 (All WLAN inside WLAN mobility set becomes worse than a threshold) (143)5.5.5Measurement reporting (144)5.5.6Measurement related actions (148)5.5.6.1Actions upon handover and re-establishment (148)5.5.6.2Speed dependant scaling of measurement related parameters (149)5.5.7Inter-frequency RSTD measurement indication (149)5.5.7.1General (149)5.5.7.2Initiation (150)5.5.7.3Actions related to transmission of 
InterFreqRSTDMeasurementIndication message (150)5.6Other (150)5.6.0General (150)5.6.1DL information transfer (151)5.6.1.1General (151)5.6.1.2Initiation (151)5.6.1.3Reception of the DLInformationTransfer by the UE (151)5.6.2UL information transfer (151)5.6.2.1General (151)5.6.2.2Initiation (151)5.6.2.3Actions related to transmission of ULInformationTransfer message (152)5.6.2.4Failure to deliver ULInformationTransfer message (152)5.6.3UE capability transfer (152)5.6.3.1General (152)5.6.3.2Initiation (153)5.6.3.3Reception of the UECapabilityEnquiry by the UE (153)5.6.4CSFB to 1x Parameter transfer (157)5.6.4.1General (157)5.6.4.2Initiation (157)5.6.4.3Actions related to transmission of CSFBParametersRequestCDMA2000 message (157)5.6.4.4Reception of the CSFBParametersResponseCDMA2000 message (157)5.6.5UE Information (158)5.6.5.1General (158)5.6.5.2Initiation (158)5.6.5.3Reception of the UEInformationRequest message (158)5.6.6 Logged Measurement Configuration (159)5.6.6.1General (159)5.6.6.2Initiation (160)5.6.6.3Reception of the LoggedMeasurementConfiguration by the UE (160)5.6.6.4T330 expiry (160)5.6.7 Release of Logged Measurement Configuration (160)5.6.7.1General (160)5.6.7.2Initiation (160)5.6.8 Measurements logging (161)5.6.8.1General (161)5.6.8.2Initiation (161)5.6.9In-device coexistence indication (163)5.6.9.1General (163)5.6.9.2Initiation (164)5.6.9.3Actions related to transmission of InDeviceCoexIndication message (164)5.6.10UE Assistance Information (165)5.6.10.1General (165)5.6.10.2Initiation (166)5.6.10.3Actions related to transmission of UEAssistanceInformation message (166)5.6.11 Mobility history information (166)5.6.11.1General (166)5.6.11.2Initiation (166)5.6.12RAN-assisted WLAN interworking (167)5.6.12.1General (167)5.6.12.2Dedicated WLAN offload configuration (167)5.6.12.3WLAN offload RAN evaluation (167)5.6.12.4T350 expiry or stop (167)5.6.12.5Cell selection/ re-selection while T350 is running (168)5.6.13SCG failure information (168)5.6.13.1General 
(168)5.6.13.2Initiation (168)5.6.13.3Actions related to transmission of SCGFailureInformation message (168)5.6.14LTE-WLAN Aggregation (169)5.6.14.1Introduction (169)5.6.14.2Reception of LWA configuration (169)5.6.14.3Release of LWA configuration (170)5.6.15WLAN connection management (170)5.6.15.1Introduction (170)5.6.15.2WLAN connection status reporting (170)5.6.15.2.1General (170)5.6.15.2.2Initiation (171)5.6.15.2.3Actions related to transmission of WLANConnectionStatusReport message (171)5.6.15.3T351 Expiry (WLAN connection attempt timeout) (171)5.6.15.4WLAN status monitoring (171)5.6.16RAN controlled LTE-WLAN interworking (172)5.6.16.1General (172)5.6.16.2WLAN traffic steering command (172)5.6.17LTE-WLAN aggregation with IPsec tunnel (173)5.6.17.1General (173)5.7Generic error handling (174)5.7.1General (174)5.7.2ASN.1 violation or encoding error (174)5.7.3Field set to a not comprehended value (174)5.7.4Mandatory field missing (174)5.7.5Not comprehended field (176)5.8MBMS (176)5.8.1Introduction (176)5.8.1.1General (176)5.8.1.2Scheduling (176)5.8.1.3MCCH information validity and notification of changes (176)5.8.2MCCH information acquisition (178)5.8.2.1General (178)5.8.2.2Initiation (178)5.8.2.3MCCH information acquisition by the UE (178)5.8.2.4Actions upon reception of the MBSFNAreaConfiguration message (178)5.8.2.5Actions upon reception of the MBMSCountingRequest message (179)5.8.3MBMS PTM radio bearer configuration (179)5.8.3.1General (179)5.8.3.2Initiation (179)5.8.3.3MRB establishment (179)5.8.3.4MRB release (179)5.8.4MBMS Counting Procedure (179)5.8.4.1General (179)5.8.4.2Initiation (180)5.8.4.3Reception of the MBMSCountingRequest message by the UE (180)5.8.5MBMS interest indication (181)5.8.5.1General (181)5.8.5.2Initiation (181)5.8.5.3Determine MBMS frequencies of interest (182)5.8.5.4Actions related to transmission of MBMSInterestIndication message (183)5.8a SC-PTM (183)5.8a.1Introduction (183)5.8a.1.1General (183)5.8a.1.2SC-MCCH scheduling 
(183)5.8a.1.3SC-MCCH information validity and notification of changes (183)5.8a.1.4Procedures (184)5.8a.2SC-MCCH information acquisition (184)5.8a.2.1General (184)5.8a.2.2Initiation (184)5.8a.2.3SC-MCCH information acquisition by the UE (184)5.8a.2.4Actions upon reception of the SCPTMConfiguration message (185)5.8a.3SC-PTM radio bearer configuration (185)5.8a.3.1General (185)5.8a.3.2Initiation (185)5.8a.3.3SC-MRB establishment (185)5.8a.3.4SC-MRB release (185)5.9RN procedures (186)5.9.1RN reconfiguration (186)5.9.1.1General (186)5.9.1.2Initiation (186)5.9.1.3Reception of the RNReconfiguration by the RN (186)5.10Sidelink (186)5.10.1Introduction (186)5.10.1a Conditions for sidelink communication operation (187)5.10.2Sidelink UE information (188)5.10.2.1General (188)5.10.2.2Initiation (189)5.10.2.3Actions related to transmission of SidelinkUEInformation message (193)5.10.3Sidelink communication monitoring (195)5.10.6Sidelink discovery announcement (198)5.10.6a Sidelink discovery announcement pool selection (201)5.10.6b Sidelink discovery announcement reference carrier selection (201)5.10.7Sidelink synchronisation information transmission (202)5.10.7.1General (202)5.10.7.2Initiation (203)5.10.7.3Transmission of SLSS (204)5.10.7.4Transmission of MasterInformationBlock-SL message (205)5.10.7.5Void (206)5.10.8Sidelink synchronisation reference (206)5.10.8.1General (206)5.10.8.2Selection and reselection of synchronisation reference UE (SyncRef UE) (206)5.10.9Sidelink common control information (207)5.10.9.1General (207)5.10.9.2Actions related to reception of MasterInformationBlock-SL message (207)5.10.10Sidelink relay UE operation (207)5.10.10.1General (207)5.10.10.2AS-conditions for relay related sidelink communication transmission by sidelink relay UE (207)5.10.10.3AS-conditions for relay PS related sidelink discovery transmission by sidelink relay UE (208)5.10.10.4Sidelink relay UE threshold conditions (208)5.10.11Sidelink remote UE operation (208)5.10.11.1General 
(208)5.10.11.2AS-conditions for relay related sidelink communication transmission by sidelink remote UE (208)5.10.11.3AS-conditions for relay PS related sidelink discovery transmission by sidelink remote UE (209)5.10.11.4Selection and reselection of sidelink relay UE (209)5.10.11.5Sidelink remote UE threshold conditions (210)6Protocol data units, formats and parameters (tabular & ASN.1) (210)6.1General (210)6.2RRC messages (212)6.2.1General message structure (212)–EUTRA-RRC-Definitions (212)–BCCH-BCH-Message (212)–BCCH-DL-SCH-Message (212)–BCCH-DL-SCH-Message-BR (213)–MCCH-Message (213)–PCCH-Message (213)–DL-CCCH-Message (214)–DL-DCCH-Message (214)–UL-CCCH-Message (214)–UL-DCCH-Message (215)–SC-MCCH-Message (215)6.2.2Message definitions (216)–CounterCheck (216)–CounterCheckResponse (217)–CSFBParametersRequestCDMA2000 (217)–CSFBParametersResponseCDMA2000 (218)–DLInformationTransfer (218)–HandoverFromEUTRAPreparationRequest (CDMA2000) (219)–InDeviceCoexIndication (220)–InterFreqRSTDMeasurementIndication (222)–LoggedMeasurementConfiguration (223)–MasterInformationBlock (225)–MBMSCountingRequest (226)–MBMSCountingResponse (226)–MBMSInterestIndication (227)–MBSFNAreaConfiguration (228)–MeasurementReport (228)–MobilityFromEUTRACommand (229)–Paging (232)–ProximityIndication (233)–RNReconfiguration (234)–RNReconfigurationComplete (234)–RRCConnectionReconfiguration (235)–RRCConnectionReconfigurationComplete (240)–RRCConnectionReestablishment (241)–RRCConnectionReestablishmentComplete (241)–RRCConnectionReestablishmentReject (242)–RRCConnectionReestablishmentRequest (243)–RRCConnectionReject (243)–RRCConnectionRelease (244)–RRCConnectionResume (248)–RRCConnectionResumeComplete (249)–RRCConnectionResumeRequest (250)–RRCConnectionRequest (250)–RRCConnectionSetup (251)–RRCConnectionSetupComplete (252)–SCGFailureInformation (253)–SCPTMConfiguration (254)–SecurityModeCommand (255)–SecurityModeComplete (255)–SecurityModeFailure (256)–SidelinkUEInformation (256)–SystemInformation 
(258)–SystemInformationBlockType1 (259)–UEAssistanceInformation (264)–UECapabilityEnquiry (265)–UECapabilityInformation (266)–UEInformationRequest (267)–UEInformationResponse (267)–ULHandoverPreparationTransfer (CDMA2000) (273)–ULInformationTransfer (274)–WLANConnectionStatusReport (274)6.3RRC information elements (275)6.3.1System information blocks (275)–SystemInformationBlockType2 (275)–SystemInformationBlockType3 (279)–SystemInformationBlockType4 (282)–SystemInformationBlockType5 (283)–SystemInformationBlockType6 (287)–SystemInformationBlockType7 (289)–SystemInformationBlockType8 (290)–SystemInformationBlockType9 (295)–SystemInformationBlockType10 (295)–SystemInformationBlockType11 (296)–SystemInformationBlockType12 (297)–SystemInformationBlockType13 (297)–SystemInformationBlockType14 (298)–SystemInformationBlockType15 (298)–SystemInformationBlockType16 (299)–SystemInformationBlockType17 (300)–SystemInformationBlockType18 (301)–SystemInformationBlockType19 (301)–SystemInformationBlockType20 (304)6.3.2Radio resource control information elements (304)–AntennaInfo (304)–AntennaInfoUL (306)–CQI-ReportConfig (307)–CQI-ReportPeriodicProcExtId (314)–CrossCarrierSchedulingConfig (314)–CSI-IM-Config (315)–CSI-IM-ConfigId (315)–CSI-RS-Config (317)–CSI-RS-ConfigEMIMO (318)–CSI-RS-ConfigNZP (319)–CSI-RS-ConfigNZPId (320)–CSI-RS-ConfigZP (321)–CSI-RS-ConfigZPId (321)–DMRS-Config (321)–DRB-Identity (322)–EPDCCH-Config (322)–EIMTA-MainConfig (324)–LogicalChannelConfig (325)–LWA-Configuration (326)–LWIP-Configuration (326)–RCLWI-Configuration (327)–MAC-MainConfig (327)–P-C-AndCBSR (332)–PDCCH-ConfigSCell (333)–PDCP-Config (334)–PDSCH-Config (337)–PDSCH-RE-MappingQCL-ConfigId (339)–PHICH-Config (339)–PhysicalConfigDedicated (339)–P-Max (344)–PRACH-Config (344)–PresenceAntennaPort1 (346)–PUCCH-Config (347)–PUSCH-Config (351)–RACH-ConfigCommon (355)–RACH-ConfigDedicated (357)–RadioResourceConfigCommon (358)–RadioResourceConfigDedicated (362)–RLC-Config (367)–RLF-TimersAndConstants 
(369)–RN-SubframeConfig (370)–SchedulingRequestConfig (371)–SoundingRS-UL-Config (372)–SPS-Config (375)–TDD-Config (376)–TimeAlignmentTimer (377)–TPC-PDCCH-Config (377)–TunnelConfigLWIP (378)–UplinkPowerControl (379)–WLAN-Id-List (382)–WLAN-MobilityConfig (382)6.3.3Security control information elements (382)–NextHopChainingCount (382)–SecurityAlgorithmConfig (383)–ShortMAC-I (383)6.3.4Mobility control information elements (383)–AdditionalSpectrumEmission (383)–ARFCN-ValueCDMA2000 (383)–ARFCN-ValueEUTRA (384)–ARFCN-ValueGERAN (384)–ARFCN-ValueUTRA (384)–BandclassCDMA2000 (384)–BandIndicatorGERAN (385)–CarrierFreqCDMA2000 (385)–CarrierFreqGERAN (385)–CellIndexList (387)–CellReselectionPriority (387)–CellSelectionInfoCE (387)–CellReselectionSubPriority (388)–CSFB-RegistrationParam1XRTT (388)–CellGlobalIdEUTRA (389)–CellGlobalIdUTRA (389)–CellGlobalIdGERAN (390)–CellGlobalIdCDMA2000 (390)–CellSelectionInfoNFreq (391)–CSG-Identity (391)–FreqBandIndicator (391)–MobilityControlInfo (391)–MobilityParametersCDMA2000 (1xRTT) (393)–MobilityStateParameters (394)–MultiBandInfoList (394)–NS-PmaxList (394)–PhysCellId (395)–PhysCellIdRange (395)–PhysCellIdRangeUTRA-FDDList (395)–PhysCellIdCDMA2000 (396)–PhysCellIdGERAN (396)–PhysCellIdUTRA-FDD (396)–PhysCellIdUTRA-TDD (396)–PLMN-Identity (397)–PLMN-IdentityList3 (397)–PreRegistrationInfoHRPD (397)–Q-QualMin (398)–Q-RxLevMin (398)–Q-OffsetRange (398)–Q-OffsetRangeInterRAT (399)–ReselectionThreshold (399)–ReselectionThresholdQ (399)–SCellIndex (399)–ServCellIndex (400)–SpeedStateScaleFactors (400)–SystemInfoListGERAN (400)–SystemTimeInfoCDMA2000 (401)–TrackingAreaCode (401)–T-Reselection (402)–T-ReselectionEUTRA-CE (402)6.3.5Measurement information elements (402)–AllowedMeasBandwidth (402)–CSI-RSRP-Range (402)–Hysteresis (402)–LocationInfo (403)–MBSFN-RSRQ-Range (403)–MeasConfig (404)–MeasDS-Config (405)–MeasGapConfig (406)–MeasId (407)–MeasIdToAddModList (407)–MeasObjectCDMA2000 (408)–MeasObjectEUTRA (408)–MeasObjectGERAN 
(412)–MeasObjectId (412)–MeasObjectToAddModList (412)–MeasObjectUTRA (413)–ReportConfigEUTRA (422)–ReportConfigId (425)–ReportConfigInterRAT (425)–ReportConfigToAddModList (428)–ReportInterval (429)–RSRP-Range (429)–RSRQ-Range (430)–RSRQ-Type (430)–RS-SINR-Range (430)–RSSI-Range-r13 (431)–TimeToTrigger (431)–UL-DelayConfig (431)–WLAN-CarrierInfo (431)–WLAN-RSSI-Range (432)–WLAN-Status (432)6.3.6Other information elements (433)–AbsoluteTimeInfo (433)–AreaConfiguration (433)–C-RNTI (433)–DedicatedInfoCDMA2000 (434)–DedicatedInfoNAS (434)–FilterCoefficient (434)–LoggingDuration (434)–LoggingInterval (435)–MeasSubframePattern (435)–MMEC (435)–NeighCellConfig (435)–OtherConfig (436)–RAND-CDMA2000 (1xRTT) (437)–RAT-Type (437)–ResumeIdentity (437)–RRC-TransactionIdentifier (438)–S-TMSI (438)–TraceReference (438)–UE-CapabilityRAT-ContainerList (438)–UE-EUTRA-Capability (439)–UE-RadioPagingInfo (469)–UE-TimersAndConstants (469)–VisitedCellInfoList (470)–WLAN-OffloadConfig (470)6.3.7MBMS information elements (472)–MBMS-NotificationConfig (472)–MBMS-ServiceList (473)–MBSFN-AreaId (473)–MBSFN-AreaInfoList (473)–MBSFN-SubframeConfig (474)–PMCH-InfoList (475)6.3.7a SC-PTM information elements (476)–SC-MTCH-InfoList (476)–SCPTM-NeighbourCellList (478)6.3.8Sidelink information elements (478)–SL-CommConfig (478)–SL-CommResourcePool (479)–SL-CP-Len (480)–SL-DiscConfig (481)–SL-DiscResourcePool (483)–SL-DiscTxPowerInfo (485)–SL-GapConfig (485)。


Error Recovery in Critical Infrastructure SystemsJohn C. Knight, Matthew C. Elder, Xing DuDepartment of Computer ScienceUniversity of VirginiaCharlottesville, VA{knight, elder, xd2a}@AbstractCritical infrastructure applications provide services upon which society depends heavily;such applications require survivability in the face of faults that might cause a loss of service.These applications are themselves dependent on distributed information systems for all aspects of their operation and so survivability of the information systems is an important issue.Fault tolerance is a key mechanism by which survivability can be achieved in these information systems.Much of the literature on fault-tolerant distributed systems focuses on local error recovery by masking the effects of faults.We describe a direction for error recov-ery in the face of catastrophic faults where the effects of the faults cannot be masked using available resources.The goal is to provide continued service that is either an alternate or degraded service by reconﬁguring the system rather than masking faults.We outline the requirements for a reconﬁgurable system architecture and present an error recovery system that enables systematic structuring of error recovery speciﬁcations and implementations. 1. 
IntroductionThe provision of dependable service in infrastructure applications such as electric power generation and control,banking andﬁnancial systems,telecommunications,and transporta-tion systems has become a major national concern[34],[35].Society has become so depen-dent on such services that the loss of any of them would have serious consequences.Such services are often referred to as critical infrastructure applications.As has occurred in many domains,sophisticated information systems have been intro-duced into critical infrastructure applications as the cost of all forms of computing hardware has dropped and the availability of sophisticated software has increased.This has led to dra-matic efﬁciency improvements and service enhancements but,along with these beneﬁts,a signiﬁcant vulnerability has been introduced:the provision of service is now completely dependent in many cases on the correct operation of computerized information systems.Fail-ure of an information system upon which a critical infrastructure application depends will often eliminate service quickly and completely.The dependability of the information sys-tems,which we refer to as critical information systems,has therefore become a major con-cern.Dependability has many facets—reliability,availability and safety,and so on[24]—and critical infrastructure applications have a variety of dependability requirements.In most cases,very high availability is important,but reliability and safety arise in such systems as transportation control,and security is becoming an increasingly signiﬁcant dependability property in all application domains.An important dependability requirement of critical infrastructure applications is that, under predeﬁned adverse circumstances that preclude the provision of entirely normal service to the user,such systems must provide predeﬁned forms of alternate service.The necessary service might be a degraded form of normal service,a different service,or some combination.Adverse 
circumstances might be widespread environmental damage, equipment failures, software failures, sophisticated malicious attacks, and so on. This particular dependability requirement is referred to as survivability. In the terminology of fault tolerance, survivability can be thought of as requiring very specific (and usually elaborate) error recovery after a fault.

This paper is about the mechanisms that are needed within the applications themselves to provide error recovery, i.e., state restoration and continued service, under circumstances where the application has been subjected to extreme damage. Application systems must be designed to permit effective state restoration and appropriate continued service, and we address the issues of application system design in this context.

We are concerned with faults that affect either large parts of the system or the entire system and which cannot be masked. Thus, faults such as the failure of a single processor or a single communications link are not within the scope of this paper. Faults such as these can be tolerated by local redundancy and their effects can be totally masked at reasonable cost. Faults of interest include widespread physical damage in which substantial resources are lost, and coordinated security attacks in which multiple attacks occur in a short time period. We assume that error detection and damage assessment are taken care of by some other mechanism such as a control-system architecture [45].

The outline of the paper is as follows. In the next section we summarize briefly the functionality and characteristics of critical infrastructure applications and their associated information systems, and then present a detailed example of survivability. In section 4, we discuss the role of fault tolerance and the requirements for an approach to survivability. In section 5, we present some related work, and in section 6 we define directions for a solution approach. Finally, we present our conclusions.

2.
Critical infrastructure applications

Some background material about critical infrastructure applications is helpful in understanding the technical approaches that might be developed to realize survivability. Detailed descriptions of four applications are available elsewhere [20]. In this section we summarize three applications very briefly and then outline a set of important characteristics that tend to be present in information systems supporting critical infrastructure applications. Finally in this section, we discuss characteristics of future critical infrastructure applications that impact approaches to survivability that might be developed.

2.1. Applications

The nation’s banking and finance systems provide a very wide range of services: check clearing, ATM service, credit and debit card processing, securities and commodities markets, electronic funds transfers, foreign currency transfers, and so on. These services are implemented by complex, interconnected, networked information systems.

The most fundamental financial service is the payment system. The payment system is the mechanism by which value is transferred from one account to another. Transfers might be for relatively small amounts, as occur with personal checks, all the way up to very large amounts, typical of commercial transactions. For a variety of practical and legal reasons, individual banks do not communicate directly with each other to transfer funds. Rather, most funds are transferred in large blocks by either the Federal Reserve or by an Automated Clearing House (ACH).

The freight-rail transport system, another critical infrastructure application, moves large amounts of raw materials, manufactured goods, fuels, and food. Although it is not responsible for moving everything in any one of these categories, loss of freight-rail transportation would be devastating. Management of the freight-rail system uses computers extensively for a variety of purposes. For example, every freight car in North America is tracked electronically as it moves
and databases are maintained of car and locomotive locations. Tracking is achieved using track-side equipment that communicates in real time with computers maintaining the database.

An especially important application that is being used increasingly in the freight-rail system is just-in-time delivery. Train movements are scheduled so that, for example, raw materials arrive at a manufacturing plant just as they are required. This type of service necessitates analysis of demands and resources over a wide area, often nationally, if optimal choices are to be made.

The generation and distribution of electric power, the third critical infrastructure application we consider, is accomplished by a wide variety of generating, switching, and transmission equipment that is owned and operated by a number of different utility companies. However, all of this equipment is interconnected, and control is exercised over the equipment using a system that is rapidly becoming a single national network. The control mechanisms within a region are responsible for managing the equipment in that region, and interconnection of area control mechanisms is responsible for arranging and managing power transfers between regions. The complexity of the control mechanisms in the power generation industry is being affected considerably by industry deregulation.

2.2. Application system characteristics

The architecture of the information systems upon which critical infrastructure applications rely is tailored very substantially to the services of the industries which they serve and influenced inevitably by cost-benefit trade-offs. For example, the systems are typically distributed over a very wide area with large numbers of nodes located at sites dictated by the application. Beyond this, however, there are a number of characteristics that these applications possess in whole or in part which are important in constraining the ways by which these applications approach error recovery.
These characteristics are as follows:

• Heterogeneous nodes. Despite the large number of nodes in many of these systems, a small number of nodes are often far more critical to the functionality of the system than the remainder. This occurs because critical parts of the system’s functionality are implemented on just one or a small number of nodes. Heterogeneity extends also to the hardware platforms, operating systems, application software, and even authoritative domains.
• Stylized communication structures. In a number of circumstances, critical infrastructure applications use dedicated, point-to-point links rather than fully-interconnected networks. Reasons for this approach include meeting application performance requirements, better security, and no requirement for full connectivity.
• Composite functionality. The service supplied to an end user is often attained by composing different functionality at different nodes. Thus entirely different programs running on different nodes provide different services, and complete service can only be obtained when several subsystems cooperate and operate in some predefined sequence. This is quite unlike more familiar applications such as mail servers routing mail through the Internet.
• Performance requirements. Some critical infrastructure applications, such as the financial payment system, have soft real-time constraints and throughput requirements (checks have to be cleared and there are lots of checks) while others, such as parts of many transportation systems and many energy control systems, have hard real-time constraints. In some systems, performance requirements change with time as load or functionality changes: over a period of hours in financial systems or over a period of days or months in transportation systems, for example.
• Extensive databases. Infrastructure applications are all about data. Many employ several very extensive databases with different databases being located at different nodes and with most databases handling very large numbers
of transactions.
• COTS and legacy components. For all the usual reasons, critical infrastructure applications utilize COTS components including hardware, operating systems, network protocols, database systems, and applications. In addition, these systems contain legacy components: custom-built software that has evolved with the system over many years.

2.3. Future system characteristics

The characteristics listed in the previous section are important, and most are likely to remain so in systems of the future. But the rate of introduction of new technology into these systems and the introduction of entirely new types of application is rapid, and these suggest that error recovery techniques must take into account the likely characteristics of future systems also. We hypothesize that the following will be important architectural aspects of future infrastructure systems:

• Very large number of nodes. The number of nodes in infrastructure networks is likely to increase dramatically as enhancements are made in functionality, performance, and user access. The effect of this on error recovery is considerable. In particular, it suggests that error recovery will have to be regional in the sense that different parts of the network will require different recovery strategies. It also suggests that the implementation effort involved in error recovery will be substantial because there are likely to be many regions and there will be many different anticipated faults, each of which might require different treatment.
• Extensive, low-level redundancy. As the cost of hardware continues to drop, more redundancy will be built into low-level components of systems. Examples include mirrored disks and redundant server groups. This will simplify error recovery in the case of low-level faults; however, catastrophic errors will still require sophisticated recovery strategies.
• Packet-switched networks. For many reasons, the Internet is becoming the network technology of choice in the construction of new systems, in spite of
its inherent drawbacks (e.g., poor security and performance guarantees). However, the transition to packet-switched networks, whether it be the current Internet or virtual private networks implemented over some incarnation of the Internet, seems inevitable and impacts solution approaches for error recovery.

3. Survivability: an example

In this section, we present an example of survivability requirements to illustrate the extent, scope, and complexity of the error recovery that might well be needed in a typical critical infrastructure application. We use as an example application a highly simplified version of the national financial payment system. It is important to note that, inevitably, most of the details of the payment system are missing from this example and simplifications have been made since we seek only to illustrate certain points. The interested reader is referred to the text by Summers for comprehensive details of the payment system [46]. In addition, the faults and continued service requirements in this example are entirely hypothetical and designed for illustration only, but characteristic of strategies that might be employed for error recovery.

3.1.
System architecture and application functionality

The information system that implements a major part of the payment system is very roughly a hierarchic, tree-like network as illustrated in Figure 1. At the top level is a central facility operated by the Federal Reserve. At the second level are the twelve regional Federal Reserve banks. At the third level of the tree are the approximately 9,500 commercial banks that are members of the Federal Reserve. Finally, at the lowest level of the hierarchy are the remaining 16,500 or so banks and their branch banks [20].

Processing a retail payment (an individual check) in this system proceeds roughly as follows. At the lowest level, nodes simply accept checks, create an electronic description of the relevant information, and forward the details to the next level in the hierarchy. At the next level, payments to different banks are collected together in a batch and the details forwarded to the Federal Reserve system. Periodically (typically once a day) the Federal Reserve moves funds between accounts that it maintains for commercial banks and then funds are disbursed down through the system to individual user accounts. Large commercial payments originate electronically and are handled individually as they are presented.

Extensive amounts of data are maintained throughout this system. User account information is maintained at central facilities by retail banks, and this information provides all the expected user services together with check authentication (as occurs when a check is scanned at a retail outlet). The Federal Reserve maintains accounts for all its member banks together with detailed logs and status information about payment activity. But, of course, this is just a small part of the data maintained by the banking and financial system. Vast amounts of data are also needed for all the other financial services, and the databases are used in combination and in different ways by different services.

3.2.
Survivability requirements

A complete survivability specification must document precisely all of the faults that the system is required to handle and, for each, document the prescribed system response. Hypothetical examples of the possible faults and their high-level responses for our simplified version of the payments system are shown in Table 1. We include in the table faults ranging from the loss of a single leaf node to the loss of a critical node and its backup facilities.

For purposes of illustration, we examine one particular fault in more detail to see what error recovery actions are needed. The fault we use for illustration is the loss of the top-level node of the financial payment system: the Federal Reserve system’s main data center and its backup facility. Using our highly simplified architecture of the payment system in this example, we assume that this node consists of a single processing entity with a single backup that maintains mirror image databases. The actual Federal Reserve system uses a much more sophisticated backup system. The survivability requirements for this fault are the following:

• Fault: Federal Reserve main processing center failure (common-mode software failure, propagation of corrupt data, terrorism).
• On failure: Complete suspension of all payment services. Entire financial network informed (member banks, other financial organizations, foreign banks, government agencies). Previously identified Federal Reserve regional bank designated as temporary replacement for Federal facilities. All services terminated at replacement facility, minimal payment service started at replacement facility (e.g., payment service for federal agencies only).
All communication redirected for major client nodes.
• On repair: Payment system restarted by resuming applications in sequence and resuming service to member banks in sequence within an application. Minimal service on replacement facility terminated.

For this particular fault, we assume that all processing ceases immediately. This is actually the most benign fault that the system could experience at the top-level node. More serious faults that could occur include undetected hardware failure in which data was lost, a software fault that corrupted primary and backup databases, and an operational failure in which primary data was lost.

State restoration in this case involves establishing a consistent state at all of the clients connected to the Federal Reserve system. This requires determining the transaction requests that have been sent to the Federal Reserve but not processed. Since this is a standard database issue, we do not consider it further. We note, however, that notification or detection of failure by the clients is an essential element of error recovery.

Table 1. Survivability requirements summary

The provision of continued service is more complex in this fault scenario. Given the loss of the most critical node in a network with a tree topology, the network is now effectively partitioned. One part of error recovery will involve re-establishing connectivity between the partitioned subtrees, each of which has a Federal Reserve regional bank at its top node. There are a number of alternatives for re-establishing connectivity:

• Promote one regional bank to be the new root node in the tree and have all other regional banks establish links to it. (This requires 11 new links to be established.)
• Establish links between each pair of regional banks, resulting in a fully-interconnected 12-node network. (This requires 66 new links to be established.)
• Establish links to provide some other topology; for example, connect the 12 regional banks with a ring topology.
(This requires 12 new links to be established.)

In practice, it is unlikely that any one of the above three alternatives would be used on its own. A combination using different strategies in different locations is the most likely approach, and it is very possible that even in this case parts of the network would remain unconnected.

Once whatever connectivity is possible has been reestablished, a major reorganization throughout the network would be required. First, the new root of the tree will have to suspend most, and probably all, of its normal service activities. Second, this node will have to prepare the copies of the databases that are needed for payment processing and that it would have had to maintain during normal processing if it were to function as an emergency backup. Third, the entire set of clients will have to be informed of the change, of the level of service that will be provided, and of when this will occur. These clients will have to take their own actions including eliminating many services, reducing others, and perhaps starting certain emergency services. Fourth, the new root node will have to initiate payment applications up to the limit of the processing and communications capacity that is available. The available capacity will be greatly reduced and the services that the system could provide once operational would be far less than would normally be the case. Which customers get what service would have to have been defined ahead of time as part of the survivability specification.

Much more complex but probably more useful recovery scenarios are possible for critical services if a restricted form of service is acceptable. For example, one of the services of the Federal Reserve is maintenance of member bank accounts. Maintenance of member bank accounts could be taken over by the temporary replacement node but, because of reduced facilities, services would be severely reduced. This function could be distributed following a catastrophic failure, however.
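As a brief check on the three reconnection alternatives listed earlier in this section, the link counts quoted there (11, 66, and 12) follow directly from the number of regional banks. The sketch below is illustrative only: the function names and the constant `REGIONAL_BANKS` are our own labels, not part of any payment-system software, and it assumes that every link is new and bidirectional.

```python
def star_links(n: int) -> int:
    """Links needed when one of n nodes is promoted to root and the rest attach to it."""
    return max(n - 1, 0)


def full_mesh_links(n: int) -> int:
    """Links needed to connect every pair of n nodes: n choose 2."""
    return n * (n - 1) // 2


def ring_links(n: int) -> int:
    """Links needed to arrange n nodes in a ring (a ring proper needs at least 3 nodes)."""
    return n if n >= 3 else max(n - 1, 0)


REGIONAL_BANKS = 12  # the twelve regional Federal Reserve banks in the example

if __name__ == "__main__":
    for name, count in [
        ("promote one regional bank to root", star_links(REGIONAL_BANKS)),
        ("fully interconnected network", full_mesh_links(REGIONAL_BANKS)),
        ("ring topology", ring_links(REGIONAL_BANKS)),
    ]:
        print(f"{name}: {count} new links")
```

The counts grow linearly for the star and ring alternatives but quadratically for the full mesh, which is one reason a mixed strategy is more plausible than full interconnection as the number of nodes increases.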
We consider two possible solution strategies:

• Each member bank maintains only its own account details, and only sends a batch message to be processed if it has the funds to cover the deposit. (This approach distributes responsibility for maintaining positive balances.)
• Each regional bank maintains an account balance for every other regional bank with which it exchanges batch messages. (This approach requires more resources, but allows more value to be transferred throughout the system.)

The details of the fault scenario we have described in this example are possible from the computer science perspective, as are many others. What the banking community requires in practice depends upon the many details and priorities that exist within that domain and will probably be far more elaborate than our example. However, our example does illustrate many of the issues that have to be considered in application error recovery and shows how complex this process is.

An important aspect of survivability that is omitted from this example is the need to cope with multiple sequential faults. It will be the case in many circumstances that a situation gets worse over time. For example, a terrorist attack on the physical equipment of a critical information system might proceed in a series of stages. The attack might be detected initially during an early stage (although it is highly unlikely that the detection process would be able to diagnose a cause in this case) and the system would then take appropriate action. Subsequent failures of physical equipment would have to be dealt with by a system that had already been reconfigured to deal with the initial attack. This complicates the provision of error recovery immensely.

4. Survivability and fault tolerance

4.1.
The role of fault tolerance

In general, survivability is a requirement or a set of related requirements that a system must meet. As illustrated by the example above, the requirements can be quite complex and will often involve many different aspects of the application. For different faults that a system might experience, the requirements that have to be met might be quite different from each other and require entirely different actions by the application. In particular, for critical infrastructure applications, it will often be the case that the effects of a fault leave the system with greatly reduced resources (processing services, communications capacity, etc.) and substantial changes in the service provided to the user will be necessary.

Fault tolerance is one of the mechanisms by which dependability (and thus survivability) can be obtained. It is not the only mechanism of course since fault elimination and fault removal are often alternate (and complementary) possibilities. In the case of critical infrastructure applications, however, the manifestation of faults during operation is inevitable. Such systems cannot be protected against all environmental damage, terrorist acts, operational mistakes, software defects, and so on. Thus, adding the ability to tolerate certain types of fault is the only practical approach to achieving survivability.

Tolerating a fault does not necessarily mean masking its effects, however. The essential meaning of survivability is the ability of a system to deal with more “serious” faults by providing a prescribed service that is not the same as the normal service. The catastrophic faults to which a system of interest must respond are those that are not masked, i.e., by design there is insufficient redundancy for the system to be able to handle the faults transparently. If it were intended that the system mask such faults, as will be the case for many faults, then the system would do so. Catastrophic faults that are not masked in any given system are not necessarily
unanticipated. Rather, a conscious decision is made in favor of an approach to error recovery other than masking because the cost of handling faults transparently is redundancy. And redundancy is expensive. Some redundancy is necessary even if a fault is not to be masked, and, in addition, redundancy is necessary for the detection of errors. However, replicating all the elements of a critical information system so that all faults of interest can be masked is prohibitively expensive.

Since we are concerned with faults whose effects will not be masked, fault tolerance in general requires actions by the application. The particular actions required in any given system are application-specific but they will require such functionality as stopping some services, starting others, and modifying yet others. In order to make such changes, the application must be prepared to make the changes and so must be designed with this in mind.

For purposes of analysis, we view an application executing on a distributed system as a concurrent program with one or more processes running on each node and with processes communicating via a protocol that operates over network links. Although there is no shared memory in the conventional sense, it is very likely that there are shared files. This view is useful because it permits existing work on fault tolerance in concurrent systems to act as a starting point from which to develop application error recovery mechanisms in the sense that we desire.

4.2.
Requirements for a solution

The general requirements that have to be met by any realistic approach to application error recovery derive from the characteristics of the applications and the need to tolerate serious faults that cannot be masked. More precisely, we identify the following solution requirements:

• Very large networks running sophisticated applications must be supported. The scale of critical infrastructure applications necessitates a solution approach that scales to networks with thousands of nodes. Similarly, the size of the application software presents significant performance challenges that must be met.
• Resuming normal processing following the repair of a fault must be supported. The problem of dealing with the effects of a fault is really only half of the problem. The systems of interest typically have very high availability requirements and will be repaired while in operation. Thus, resuming prescribed levels of service after repair must be part of any viable solution.
• There must be minimal application re-write required. For many reasons (economic, political, technical, and otherwise) it is nearly impossible to re-write the software within critical information systems from scratch. Critical information systems will most likely have to evolve to provide improved survivability; however, the extensive use of COTS and legacy software currently in these systems complicates revision of the software. Revision to the applications must be kept to a minimum, but the application will have to make provision for support of new error recovery services.
• Error recovery must be performed securely. Security attacks against these systems are a major threat. It must be the case that any new error recovery services that are provided do not introduce an additional security vulnerability that can be exploited to perpetrate further damage to the system.
• Link failure and partitioning are errors that must be handled. The stylized connection structure of most critical infrastructure applications
dictates that, by default, not every node will be able to communicate with every other node. This introduces the possibility that a link failure will partition the network. Because the service provided is composed of functionality provided by multiple application nodes, error recovery must include strategies for circumventing link failure and network partitions.
• There are highly structured requirements for continued service. The requirements for continued service will not be homogeneous; in fact, they will be far from it. Different nodes and different node classes will be required to implement different services. Similarly,