IDK Users_Guide


IDK User’s Guide

Revision History

Overview

The Integration Development Kit (IDK) is a toolkit that enables the creation and execution of platform test assertions and provides the utility functions needed to exercise the supported test features. This BETA release provides the functionality to execute RAS test assertions and inject errors.

The idk_core resides on the System Under Test (SUT). The IDK client connects to the core over a TCP/IP connection. The IDK client requires Python 2.7 and a network connection to the SUT to execute. The IDK client contains descriptions for a specific SUT platform type (Skylake, for example).

Features

• Remote scripting support

• Platform Acceptance Tests

o Tests that run on the SUT prior to connection to the Controller to verify that the SUT hardware and OS are supported

• RAS assertion tests:

o Set of ACPI and Corrupted Data Correction test assertions

• EINJ error injection

• Detailed failure log

Requirements

System Under Test:

• Haswell, Broadwell, or Purley platform

• One of the supported operating systems:

o Windows 2008R2 – 2012R2

o RHEL 6.5 – 7.2

o SLES 11.2 – 12.1

o Ubuntu 14.04 LTS – 15.10

o Solaris 11

o VMware ESXi 5, 6

o FreeBSD 10.1 – 10.2

• Valid TCP/IP network and port 1861 opened on the firewall

Controller:

• Windows 7 or newer

• Python 2.7.7

Installation:

System Under Test:

1. On Windows, install the IDK_Core.msi installer file.

2. On other operating systems, unzip the correct .tgz file and run ./install.sh as root.

3. Configure the firewall to open port 1861 for incoming connections.

Controller:

1. Install Python 2.7.

2. Configure the firewall to open port 1861 for outgoing connections.

3. Copy the IDK_Client to a folder on the controller laptop.
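Once the firewall rules are in place, it can be useful to confirm from the controller that port 1861 on the SUT is reachable before starting the client. The following is a minimal sketch that uses only the Python standard library; the helper name port_is_open is hypothetical and is not part of the IDK.

```python
import socket

IDK_PORT = 1861  # port number named in this guide


def port_is_open(host, port=IDK_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: treat all as "not reachable".
        return False
    conn.close()
    return True
```

Calling port_is_open with the SUT's address returns False if the port is blocked or if idk_core is not listening yet, so a False result alone does not tell the two cases apart.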

Execution:

System Under Test:

1. Change into the installation directory (default is c:\IDK_Core)

2. Launch idk_core.exe with the /listen command option:

Description of command line options for System Under Test:

Notes:

The idk_core normally runs a series of platform acceptance tests (PAT) at startup. These tests check for the correct OS version and for access to the correct hardware resources and device drivers. If any of the PATs fail, idk_core will stop running. The /no_pat and /pat_fail_ignore options allow additional control and debug capability for this mode of operation.

Client:

Execution is controlled by the IDK client on the controller (the host machine, such as a network-connected laptop). The IDK client provides descriptions for a target. To get access to the Haswell platform, import the haswell package and make a connection to it.

Once this is done, you can access the functions defined in the API for the platform. For example, run all the tests.

The following sample scripts are provided:

• sample_core.py

o python sample_core.py

Verifies the low-level access to the SUT hardware registers.

• sample_RAS_assertions.py

o python sample_test_assertions.py

Executes all the defined test assertions for the platform. Results are written to a timestamped .csv file.

Platform can be haswell, broadwell, or skylake.

• sample_xlate_all.py

o python sample_xlate_all.py

Reverse translates each populated dimm rank.

Platform can be haswell or broadwell.

• purley_xlate_all.py

o python purley_xlate_all.py

Reverse translates each populated dimm rank.

Platform can be skylake.

• sample_inject_all.py

o python sample_inject_all.py

Injects an error into each dimm rank via the EINJ interface.

Platform can be broadwell or haswell.

• purley_inject_all.py

o python purley_inject_all.py

Injects an error into each dimm rank via the EINJ interface.

Platform can be skylake.

IDK Library Interface

Hardware Interface:

Each “Platform” interface contains a “Hardware” interface. The hw interface is the low-level hardware I/O layer that interfaces directly with the idk_core on the SUT.

The following functions are available in the “hardware” module. Please see the sample file “sample_core.py”.

Example: Use the hardware interface to read MSR 0x179 on CPU socket 0

Each platform interface contains a list of relevant ACPI tables, such as EINJ. You can get a handle to the platform einj table from the “acpi_tables” dictionary in the platform. The following functions are available for EINJ:

Example: Get a handle to the EINJ table, and use it to inject a memory error

Usage Note:

When injecting memory fatal errors into a low memory region, there is a chance that the fatal error will be caught by pre-fetch. This can cause an unexpected system hang (instead of machine-check). Please select a memory address higher than 0x1000 when injecting memory fatal errors to achieve the expected result.
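A script can guard against the low-memory case described in the usage note before it injects a fatal error. The following is a minimal sketch; the helper name and constant are hypothetical and not part of the IDK, and only the 0x1000 threshold comes from the note above.

```python
LOW_MEMORY_LIMIT = 0x1000  # threshold taken from the usage note


def is_safe_fatal_injection_address(address):
    """Return True if the address lies above the low-memory region."""
    return address > LOW_MEMORY_LIMIT
```

The comparison is strict because the note asks for an address higher than 0x1000.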

The “Platform” contains a list of named registers in the member “register”. This is a dictionary that is indexed by the register name. These register names are defined in the platform.xml file. Each register has the following available functions.

Example: Get the “CPUBUSNO” register, refresh it against CPU socket 0, then display the value of the CPUBUSNO_1 field.

Logging Interface

Each platform contains a logging interface. When the platform object (haswell, broadwell, skylake) connects to the SUT, it creates a log file. If no log file name is supplied, it creates a timestamped file name, e.g. 2015-12-15_122007_IDK_skylake.log
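The default file name shown above appears to follow a date, time, and platform pattern. The following sketch reproduces that pattern with the standard library; the function name default_log_name is hypothetical, and the format string is inferred from the single example in this guide rather than taken from the IDK source.

```python
from datetime import datetime


def default_log_name(platform_name, now=None):
    """Build a name such as 2015-12-15_122007_IDK_skylake.log."""
    if now is None:
        now = datetime.now()
    # Date as YYYY-MM-DD, then time as HHMMSS, then the platform name.
    return "%s_IDK_%s.log" % (now.strftime("%Y-%m-%d_%H%M%S"), platform_name)
```

A script that wants to locate the newest log for a platform could match on the "_IDK_skylake.log" suffix and sort by name, since this layout sorts chronologically.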

The following types of log messages are supported:

DEBUG5: Lowest level of debug info.

DEBUG4: Low-level information, e.g. hardware access info with register address info.

DEBUG3: Medium-level info, e.g. register-level access with registers noted by name.

DEBUG2: High-level debug info, e.g. platform-level initialization messages.

DEBUG1: Highest level of debug info, e.g. script-level debug messages.

INFO: General info messages intended for the user to see.

WARN: A non-fatal error occurred in the script; also used for notification of deprecated features.

ERROR: A fatal error occurred and the requested action could not be completed, e.g. a register is inaccessible.

CRITICAL: A fatal error occurred and the SUT is no longer accessible.

When the platform is connected, messages at INFO and higher are displayed on the console, and messages at DEBUG1 and higher are written to the log file. These messages can be written from user script files with the following functions:

platform.debug(level, msg ):

Log a message at debug levels 1 - 5

platform.info(msg):

Log a message at the info level

platform.warning(msg):

Log a message at the warning level

platform.error(msg):

Log a message at the error level

platform.critical(msg):

Log a message at the critical level

The logging levels can be accessed with the following functions:

platform.setLoggingLevel(level):

Sets both the file and the console to log at the specified level

platform.setFileLoggingLevel(level):

Sets the file to log at the specified level

platform.setConsoleLoggingLevel(level):

Sets the console to log at specified level

platform.getLoggingLevel(self):

Gets a dictionary with both the file and console logging levels

platform.getFileLoggingLevel(self):

Gets the file logging level

platform.getConsoleLoggingLevel(self):

Gets the console logging level
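The level ordering and the separate console and file thresholds described in this section can be mimicked with Python's standard logging module. The sketch below is only an illustration of the behaviour, not the IDK implementation: the numeric values chosen for DEBUG1 to DEBUG5 and the make_logger helper are assumptions, and two in-memory streams stand in for the console and the log file.

```python
import logging

# Hypothetical numeric values. Only the ordering (DEBUG5 lowest, DEBUG1
# highest, all below INFO) is taken from this guide.
DEBUG_LEVELS = {5: 1, 4: 3, 3: 5, 2: 7, 1: 9}

for number, value in DEBUG_LEVELS.items():
    logging.addLevelName(value, "DEBUG%d" % number)


def make_logger(name, console_stream, file_stream):
    """Create a logger with separate console and file thresholds."""
    logger = logging.getLogger(name)
    logger.setLevel(1)        # pass everything on; the handlers filter
    logger.propagate = False  # keep messages away from the root logger

    console = logging.StreamHandler(console_stream)
    console.setLevel(logging.INFO)       # console: INFO and higher

    logfile = logging.StreamHandler(file_stream)
    logfile.setLevel(DEBUG_LEVELS[1])    # file: DEBUG1 and higher

    for handler in (console, logfile):
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```

With this setup a DEBUG2 message reaches neither stream, a DEBUG1 message reaches only the file stream, and an INFO message reaches both, which matches the defaults described above. Changing a handler's level corresponds to what the set...LoggingLevel functions are described as doing.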

Broadwell/Haswell Platform

Platform Information Functions:

GetErrorStatus(self)

Get a dictionary of error, overflow, indexed by dimm location

ShowDimmPop(self)

Print the dimm population in human readable format

ShowErrors(self)

Display the contents of the platform correrrcnt registers

mem_populated_ch_ranks(self)

Get a set containing the channel ranks that are populated

Each set element is a tuple in the form: (cpu, imc, chan, rank)

mem_populated_chan(self)

Get a set containing just the memory channels that have dimms installed

Each set element is a tuple in the form: (cpu, imc, chan)

mem_populated_dimms(self)

Get a set containing the dimm slots that have dimms installed

Each set element is a tuple in the form: (cpu, imc, chan, dimm)

mem_populated_mc(self)

Get a set containing just the memory controllers that have dimms installed

Each set element is a tuple in the form: (cpu, imc)

mem_populated_ranks(self)

Get a set containing the dimm ranks that are populated

Each set element is a tuple in the form: (cpu, imc, chan, dimm, rank)

mem_populated_sockets(self)

Get a set containing just the CPU sockets that have dimms installed
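The tuple forms listed above nest inside one another, so the coarser sets can be derived from the rank-level set. The sketch below shows that relationship on a small made-up sample; the data is illustrative and not read from hardware, and representing sockets as bare integers is an assumption, since this guide does not state the element form returned by mem_populated_sockets.

```python
# Made-up sample in the (cpu, imc, chan, dimm, rank) form.
ranks = {
    (0, 0, 0, 0, 0), (0, 0, 0, 0, 1),  # one dual-rank dimm on socket 0
    (0, 1, 2, 0, 0),
    (1, 0, 1, 1, 0),
}

# Drop trailing fields to obtain each coarser view.
dimms = {(cpu, imc, chan, dimm) for cpu, imc, chan, dimm, _ in ranks}
channels = {(cpu, imc, chan) for cpu, imc, chan, _, _ in ranks}
controllers = {(cpu, imc) for cpu, imc, _, _, _ in ranks}
sockets = {cpu for cpu, _, _, _, _ in ranks}
```

Because the results are sets, the two ranks of the dual-rank dimm collapse into a single entry at the dimm level and above.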

Test Assertion Functions:

RunTest(self, test_name, verbose=1, dependency='warn')

Run Tests

Args:

testlist: List of testcases, SubTests, or TestGroups to run

NOTE: currently only the keyword "ALL" is implemented

verbose: if 1, display each subtest message as it occurs

else, don't print messages to the console

dependency: ignore: don't check dependencies at all, just run

warn: Warn of a dependency failing, but run anyway

run: If the dependency hasn't been run, run it before running the test

NOTE: Dependency check is not currently implemented

RunTests(self, *testlist)

Run Tests

Args:

testlist: List of testcases, SubTests to run

Error Injection Functions:

Ondie memory injection call graph:

mem_arm_ondie(self, PA=None, retries=3, target_channel='primary', persistent=False)

Arm the error injection register for the specified target physical address.

PA = Physical address as returned by AT_reverse_translate

retries = number of retries

target_channel = the channel in the PA to arm, “primary” or “secondary”

persistent = inject continuously if True, single shot if False

Returns: 1 if successful, 0 if not

mem_inject_ondie(self, SA=None, PA=None, error_type='ECC_1', retries=3, target_channel='primary')

Inject a memory error at the specified address.

PA = Physical address as returned by AT_reverse_translate

SA = 64 bit system address. Provide PA or SA, but not both

error_type = “ECC_1” or “ECC_2” for single or double bit errors

retries = number of retries

target_channel = inject to “primary” or “secondary” memory channel

mem_plant_ondie(self, PA=None, error_type='ECC_1', retries=3, target_channel='primary')

Plant a memory error at the specified address, but do not consume it.

PA = Physical address as returned by AT_reverse_translate

error_type = “ECC_1” or “ECC_2” for single or double bit errors

retries = number of retries

target_channel = inject to “primary” or “secondary” memory channel

mem_setup_ondie(self, PA=None, error_type='ECC_1', target_channel='primary')

Setup the injection registers for an injection, but do not plant or arm

PA = Physical address as returned by AT_reverse_translate

error_type = “ECC_1” or “ECC_2” for single or double bit errors

target_channel = inject to “primary” or “secondary” memory channel

pcie_inject_ondie(self, type, socket, device, function)

Inject a PCIe error of the specified type to the specified address

ShowPatrolScrub(self)

Show the current location of the patrol scrub cursor

pscrub_enable(self, enable=None)

Enable or disable the patrol scrubber

IN: enable. True or False to enable or disable

pscrub_set_address(self, addr_info)

Set the patrol scrubber for the targeted PA to the SA

Note: The patrol scrubber register appears to be unwritable for HSW-
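The argument rules stated for mem_inject_ondie earlier in this section (PA or SA but not both, error_type of ECC_1 or ECC_2, target_channel of primary or secondary) can be checked in a script before the call is made. The following is a minimal sketch; check_inject_args is a hypothetical helper and not part of the IDK, and rejecting the case where neither SA nor PA is given is an assumption.

```python
ERROR_TYPES = ("ECC_1", "ECC_2")
TARGET_CHANNELS = ("primary", "secondary")


def check_inject_args(SA=None, PA=None, error_type="ECC_1",
                      target_channel="primary"):
    """Raise ValueError if the arguments break the documented rules."""
    # Exactly one of SA and PA must be supplied (assumption: neither is
    # also treated as an error).
    if (SA is None) == (PA is None):
        raise ValueError("provide exactly one of SA or PA")
    if error_type not in ERROR_TYPES:
        raise ValueError("error_type must be one of %r" % (ERROR_TYPES,))
    if target_channel not in TARGET_CHANNELS:
        raise ValueError("target_channel must be one of %r" % (TARGET_CHANNELS,))
    return True
```

Checking on the controller side gives an immediate, readable error instead of relying on how the platform call reacts to inconsistent arguments.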

Address Translation Functions:

AT_Init(self, verbose=False)

Load all of the address translations ranges

AT_forward_translate(self, address)

Forward address translation

IN: 64 bit System Address

OUT: addr_info dictionary with socket, imc, channel, dimm, and rank information

AT_reverse_translate(self, addr_info)

Reverse address translation. Perform reverse RIR, TAD, and SAD on a PA to determine the System Address

IN: addr_info A physical address dictionary. Must contain ["socket"], ["imc"], ["channel"], ["dimm"] and ["rank"] fields

OUT: A physical address dictionary, will have “address” field containing the resulting system address, and additional addressing info

* Please note. These functions no longer use the “outfile” parameter as of version 1.85, since the output is also written to the logfile.

AT_show_rir_ranges(self)

Display the loaded RIR ranges

AT_show_sad_ranges(self)

Display the loaded SAD ranges

AT_show_tad_ranges(self)

Display the loaded TAD ranges


make_PA(self, skt=0, imc=0, ch=0, dimm=0, rank=0)

Make a sparse PA from the supplied tuple. This can be passed to AT_reverse_translate to fill out the rest of the PA fields

EX.
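The shape of a sparse PA can be pictured as a dictionary that carries only the five location fields that AT_reverse_translate requires. The sketch below is a stand-in for illustration, not the IDK's make_PA: the functions sparse_pa and missing_fields are hypothetical, and the key names are taken from the required fields listed for AT_reverse_translate.

```python
REQUIRED_FIELDS = ("socket", "imc", "channel", "dimm", "rank")


def sparse_pa(skt=0, imc=0, ch=0, dimm=0, rank=0):
    """Stand-in for make_PA: a dictionary with only the location fields."""
    return {"socket": skt, "imc": imc, "channel": ch,
            "dimm": dimm, "rank": rank}


def missing_fields(addr_info):
    """List the required fields that addr_info does not contain."""
    return [name for name in REQUIRED_FIELDS if name not in addr_info]
```

A dictionary for which missing_fields returns an empty list has the fields that reverse translation is documented to need; the translation itself would then add the “address” field and the additional addressing info.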

Skylake / Cannonlake Platforms:

The API is the same as above, with these exceptions:

Error Injection Functions:

Ondie memory injection call graph:

ond.mem_arm_ondie(self, PA=None, retries=3, target_channel='primary', persistent=False)

Arm the error injection register for the specified target physical address.

PA = Physical address as returned by AT_reverse_translate

retries = number of retries

target_channel = the channel in the PA to arm, “primary” or “secondary”

persistent = inject continuously if True, single shot if False

Returns: 1 if successful, 0 if not

ond.mem_inject_ondie(self, SA=None, PA=None, error_type='ECC_1', retries=3, target_channel='primary')

Inject a memory error at the specified address.

PA = Physical address as returned by AT_reverse_translate

SA = 64 bit system address. Provide PA or SA, but not both

error_type = “ECC_1” or “ECC_2” for single or double bit errors

retries = number of retries

target_channel = inject to “primary” or “secondary” memory channel

ond.mem_plant_ondie(self, PA=None, error_type='ECC_1', retries=3, target_channel='primary')

Plant a memory error at the specified address, but do not consume it.

PA = Physical address as returned by AT_reverse_translate

error_type = “ECC_1” or “ECC_2” for single or double bit errors

retries = number of retries

target_channel = inject to “primary” or “secondary” memory channel

ond.mem_setup_ondie(self, PA=None, error_type='ECC_1', target_channel='primary')

Setup the injection registers for an injection, but do not plant or arm

PA = Physical address as returned by AT_reverse_translate

error_type = “ECC_1” or “ECC_2” for single or double bit errors

target_channel = inject to “primary” or “secondary” memory channel

ond.mem_setup_ondie_fields(self, PA=None, target_channel="primary", addr_mask_hi=0x00, addr_mask_lo=0x00, dev0=0x00, dev1=0x00, dev0_chunk=0x00, dev1_chunk=0x00, dev0_xor=0x00, dev1_xor=0x00)

Setup the ondie injection fields directly, allowing fine control over the device selection and xor bits.

Address Translation Functions:

at.Init(self, verbose=False)

Load all of the address translations ranges

at.forward_translate(self, address)

Forward address translation

IN: 64 bit System Address

OUT: addr_info dictionary with socket, imc, channel, dimm, and rank information

at.reverse_translate(self, addr_info)

Reverse address translation. Perform reverse RIR, TAD, and SAD on a PA to determine the System Address

IN: addr_info A physical address dictionary. Must contain ["socket"], ["imc"], ["channel"], ["dimm"] and ["rank"] fields

at.show_rir_ranges(self)

Display the loaded RIR ranges

at.show_sad_ranges(self)

Display the loaded SAD ranges

at.show_tad_ranges(self)

Display the loaded TAD ranges
