IDK User’s Guide
Revision History
Overview
The Integration Development Kit (IDK) is a toolkit for creating and executing platform test assertions. It also provides the utility functions needed to exercise the supported test features. This BETA release provides the functionality to execute RAS test assertions and inject errors.
The idk_core resides on the System Under Test (SUT). The IDK client connects to the core over a TCP/IP connection. The IDK client requires Python 2.7 and a network connection to the SUT. The IDK client contains descriptions for a specific SUT platform type (Skylake, for example).
Features
• Remote scripting support
• Platform Acceptance Tests
o Tests that run on the SUT prior to connection to the Controller to verify that the SUT hardware and OS are supported
• RAS assertion tests:
o Set of ACPI and Corrupted Data Correction test assertions
• EINJ error injection
• Detailed failure log
Requirements
System Under Test:
• Haswell, Broadwell, or Purley platform
• One of the supported operating systems:
o Windows 2008R2 – 2012R2
o RHEL 6.5 – 7.2
o SLES 11.2 – 12.1
o Ubuntu 14.04 LTS – 15.10
o Solaris 11
o VMware ESXi 5, 6
o FreeBSD 10.1 – 10.2
• Valid TCP/IP network and port 1861 opened on the firewall
Controller:
• Windows 7 or newer
• Python 2.7.7
Installation:
System Under Test:
1. On Windows, run the IDK_Core.msi installer.
2. On other operating systems, unzip the correct .tgz file and run ./install.sh as root.
3. Configure the firewall to open port 1861 for incoming connections.
Controller:
1. Install Python 2.7.
2. Configure the firewall to open port 1861 for outgoing connections.
3. Copy the IDK_Client to a folder on the controller laptop.
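Before running the client, it can help to confirm that the firewall rules above actually let the controller reach the SUT. The following is a minimal sketch that uses only the Python standard library, not the IDK client. The helper name port_is_open and the timeout value are assumptions made for illustration.

```python
import socket

IDK_PORT = 1861  # port named in the requirements above


def port_is_open(host, port=IDK_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper: it only checks reachability and does not speak
    the idk_core protocol.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, filtered by a firewall, unreachable, or timed out.
        return False
    conn.close()
    return True
```

Calling port_is_open with the address of the SUT returns False if idk_core is not listening yet or if a firewall blocks port 1861 in either direction.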
Execution:
System Under Test:
1. Change into the installation directory (default is c:\IDK_Core).
2. Launch idk_core.exe with the /listen command option:
Description of command line options for System Under Test:
Notes:
The idk_core normally runs a series of platform acceptance tests (PAT) at startup. These tests check for the correct OS version and for access to the correct hardware resources and device drivers. If any PAT fails, idk_core stops running. The /no_pat and /pat_fail_ignore options allow additional control and debug capability for this mode of operation.
Client:
Execution is controlled by the IDK client on the controller (the host machine, such as a network-connected laptop). The IDK client provides descriptions for a target. To access the Haswell platform, import the haswell package and make a connection to it.
Once this is done, you can access the functions defined in the API for the platform, for example to run all the tests.
The following sample scripts are provided:
• sample_core.py
o python sample_core.py
Verifies the low-level access to the SUT hardware registers.
• sample_RAS_assertions.py
o python sample_test_assertions.py
Executes all the defined test assertions for the platform. Results are written to a timestamped .csv file.
Platform can be haswell, broadwell, or skylake.
• sample_xlate_all.py
o python sample_xlate_all.py
Reverse translates each populated DIMM rank.
Platform can be haswell or broadwell.
• purley_xlate_all.py
o python purley_xlate_all.py
Reverse translates each populated DIMM rank.
Platform can be skylake.
• sample_inject_all.py
o python sample_inject_all.py
Injects an error into each DIMM rank via the EINJ interface.
Platform can be broadwell or haswell.
• purley_inject_all.py
o python purley_inject_all.py
Injects an error into each DIMM rank via the EINJ interface.
Platform can be skylake.
IDK Library Interface
Hardware Interface:
Each “Platform” interface contains a “Hardware” interface. The hardware interface is the low-level hardware I/O layer that interfaces directly with the idk_core on the SUT.
The following functions are available in the “hardware” module. Please see the sample file “sample_core.py”.
Example: Use the hardware interface to read msr 0x179 on cpu socket 0
Each platform interface contains a list of relevant ACPI tables, such as EINJ. You can get a handle to the platform einj table from the “acpi_tables” dictionary in the platform. The following functions are available for EINJ:
Example: Get a handle to the EINJ table, and use it to inject a memory error
Usage Note:
When injecting memory fatal errors into a low memory region, there is a chance that the fatal error will be caught by pre-fetch. This can cause an unexpected system hang (instead of machine-check). Please select a memory address higher than 0x1000 when injecting memory fatal errors to achieve the expected result.
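The usage note above can be enforced in a script before any fatal injection is attempted. This is a minimal sketch. The helper name is hypothetical and is not part of the IDK API. The only value taken from this guide is the 0x1000 threshold.

```python
MIN_FATAL_INJECT_ADDR = 0x1000  # threshold taken from the usage note above


def is_safe_fatal_inject_address(address):
    """Return True if the address is higher than the low-memory threshold.

    Hypothetical guard; it does not talk to the SUT.
    """
    return address > MIN_FATAL_INJECT_ADDR
```

A script could call this guard on each candidate address and skip those for which it returns False.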
The “Platform” contains a list of named registers in the member “register”. This is a dictionary that is indexed by the register name. These register names are defined in the platform.xml file. Each register has the following available functions.
Example: Get the “CPUBUSNO” register, refresh it against CPU socket 0, then display the value of the CPUBUSNO_1 field.
Logging Interface
Each platform contains a logging interface. When the platform object (haswell, broadwell, or skylake) connects to the SUT, it creates a log file. If no log file name is supplied, it creates a timestamped file name, for example 2015-12-15_122007_IDK_skylake.log.
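The default file name in the example above looks like a date, a time, the literal IDK, and the platform name. The sketch below reproduces that pattern with the standard library. The function name and the exact format string are assumptions inferred from the single example, not taken from the IDK source.

```python
from datetime import datetime


def default_log_name(platform_name, when=None):
    """Build a name such as 2015-12-15_122007_IDK_skylake.log.

    Hypothetical helper; the format is inferred from the example above.
    """
    when = when or datetime.now()
    stamp = when.strftime("%Y-%m-%d_%H%M%S")
    return "%s_IDK_%s.log" % (stamp, platform_name)
```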
The following types of log messages are supported:
DEBUG5: Lowest level of debug info.
DEBUG4: Low-level information, e.g. hardware access info with register address info.
DEBUG3: Medium-level info, e.g. register-level access with registers noted by name.
DEBUG2: High-level debug info, e.g. platform-level initialization messages.
DEBUG1: Highest level of debug info, e.g. script-level debug messages.
INFO: General info messages intended for the user to see.
WARN: A non-fatal error occurred in the script, or notification of deprecated features.
ERROR: A fatal error occurred and the requested action could not be completed, e.g. a register is inaccessible.
CRITICAL: A fatal error occurred and the SUT is no longer accessible.
When the platform is connected, INFO and higher messages are displayed on the console. DEBUG1 and higher messages are written to the log file. These messages can be written from user script files with the following functions:
platform.debug(level, msg):
Log a message at debug levels 1 - 5
platform.info(msg):
Log a message at the info level
platform.warning(msg):
Log a message at the warning level
platform.error(msg):
Log a message at the error level
platform.critical(msg):
Log a message at the critical level
The logging levels can be accessed with the following functions:
platform.setLoggingLevel(level):
Sets both the file and the console to log at the specified level
platform.setFileLoggingLevel(level):
Sets the file to log at the specified level
platform.setConsoleLoggingLevel(level):
Sets the console to log at the specified level
platform.getLoggingLevel(self):
Get a dictionary with both file and console logging levels
platform.getFileLoggingLevel(self):
Get the file logging level
platform.getConsoleLoggingLevel(self):
Get the console logging level
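The level ordering and the separate console and file thresholds described in this section can be modeled with Python's standard logging module. The sketch below is not the IDK logger. The numeric values for DEBUG1 to DEBUG5, the logger name, and the ListStream class are all assumptions chosen so that DEBUG1 ranks just below INFO and DEBUG5 is the lowest level.

```python
import logging

# Assumed numeric values: DEBUG1 is the highest debug level, DEBUG5 the lowest.
# All of them sit below logging.INFO (20).
DEBUG_LEVELS = {1: 15, 2: 14, 3: 13, 4: 12, 5: 11}
for number, value in DEBUG_LEVELS.items():
    logging.addLevelName(value, "DEBUG%d" % number)


class ListStream(object):
    """Tiny in-memory stream standing in for the console or the log file."""

    def __init__(self):
        self.lines = []

    def write(self, text):
        self.lines.append(text)

    def flush(self):
        pass


def make_logger(console_stream, file_stream):
    """Create a logger with the default thresholds described above."""
    logger = logging.getLogger("idk_sketch")  # hypothetical logger name
    logger.handlers = []
    logger.propagate = False
    logger.setLevel(min(DEBUG_LEVELS.values()))
    console = logging.StreamHandler(console_stream)
    console.setLevel(logging.INFO)       # console: INFO and higher
    logfile = logging.StreamHandler(file_stream)
    logfile.setLevel(DEBUG_LEVELS[1])    # file: DEBUG1 and higher
    formatter = logging.Formatter("%(levelname)s: %(message)s")
    for handler in (console, logfile):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


console_out, file_out = ListStream(), ListStream()
log = make_logger(console_out, file_out)
log.log(DEBUG_LEVELS[1], "script-level debug message")  # file only
log.log(DEBUG_LEVELS[4], "hardware access detail")      # dropped by both
log.info("visible to the user")                         # console and file
```

Lowering the level on one handler, as setFileLoggingLevel and setConsoleLoggingLevel presumably do, would change what that destination receives without affecting the other.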
Broadwell/Haswell Platform
Platform Information Functions:
GetErrorStatus(self)
Get a dictionary of error, overflow, indexed by dimm location
ShowDimmPop(self)
Print the dimm population in human readable format
ShowErrors(self)
Display the contents of the platform correrrcnt registers
mem_populated_ch_ranks(self)
Get a set containing the channel ranks that are populated
Each set element is a tuple in the form: (cpu, imc, chan, rank)
mem_populated_chan(self)
Get a set containing just the memory channels that have dimms installed
Each set element is a tuple in the form: (cpu, imc, chan)
mem_populated_dimms(self)
Get a set containing the dimm slots that have dimms installed
Each set element is a tuple in the form: (cpu, imc, chan, dimm)
mem_populated_mc(self)
Get a set containing just the memory controllers that have dimms installed
Each set element is a tuple in the form: (cpu, imc)
mem_populated_ranks(self)
Get a set containing the dimm ranks that are populated
Each set element is a tuple in the form: (cpu, imc, chan, dimm, rank)
mem_populated_sockets(self)
Get a set containing just the CPU sockets that have dimms installed
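The population sets above are nested views of the same topology: each coarser set can be derived from the rank-level tuples by dropping trailing fields. The sketch below illustrates that relationship with made-up data. It does not call the IDK. The sample population is hypothetical, and the element type of the socket set is an assumption, since the guide does not state it.

```python
# Hypothetical population: each tuple is (cpu, imc, chan, dimm, rank),
# the form documented for mem_populated_ranks.
ranks = {
    (0, 0, 0, 0, 0), (0, 0, 0, 0, 1),
    (0, 1, 2, 0, 0),
    (1, 0, 1, 1, 0),
}

# Drop trailing fields to get the coarser views.
dimms = {t[:4] for t in ranks}        # (cpu, imc, chan, dimm)
channels = {t[:3] for t in ranks}     # (cpu, imc, chan)
controllers = {t[:2] for t in ranks}  # (cpu, imc)
sockets = {t[0] for t in ranks}       # assumed: plain socket numbers
```

Because these are ordinary Python sets, a script can iterate over them in sorted order or intersect them with a list of targets to test.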
Test Assertion Functions:
RunTest(self, test_name, verbose=1, dependency='warn')
Run Tests
Args:
testlist: List of testcases, SubTests, or TestGroups to run
NOTE: currently only the keyword "ALL" is implemented
verbose: if 1, display each subtest message as it occurs; otherwise, don't print messages to the console
dependency:
ignore: don't check dependencies at all, just run
warn: warn of a dependency failing, but run anyway
run: if the dependency hasn't been run, run it before running the test
NOTE: Dependency check is not currently implemented
RunTests(self, *testlist)
Run Tests
Args:
testlist: List of testcases or SubTests to run
Error Injection Functions:
Ondie memory injection call graph:
mem_arm_ondie(self, PA=None, retries=3, target_channel='primary', persistent=False)
Arm the error injection register for the specified target physical address.
PA = Physical address as returned by AT_reverse_translate
retries = number of retries
target_channel = the channel in the PA to arm, “primary” or “secondary”
persistent = inject continuously if True, single shot if False
Returns: 1 if successful, 0 if not
mem_inject_ondie(self, SA=None, PA=None, error_type='ECC_1', retries=3, target_channel='primary')
Inject a memory error at the specified address.
PA = Physical address as returned by AT_reverse_translate
SA = 64 bit system address. Provide PA or SA, but not both
error_type = “ECC_1” or “ECC_2” for single or double bit errors
retries = number of retries
target_channel = inject to “primary” or “secondary” memory channel
mem_plant_ondie(self, PA=None, error_type='ECC_1', retries=3, target_channel='primary')
Plant a memory error at the specified address, but do not consume it.
PA = Physical address as returned by AT_reverse_translate
error_type = “ECC_1” or “ECC_2” for single or double bit errors
retries = number of retries
target_channel = inject to “primary” or “secondary” memory channel
mem_setup_ondie(self, PA=None, error_type='ECC_1', target_channel='primary')
Set up the injection registers for an injection, but do not plant or arm
PA = Physical address as returned by AT_reverse_translate
error_type = “ECC_1” or “ECC_2” for single or double bit errors
target_channel = inject to “primary” or “secondary” memory channel
pcie_inject_ondie(self, type, socket, device, function)
Inject a PCIe error of the specified type at the specified address
ShowPatrolScrub(self)
Show the current location of the patrol scrub cursor
pscrub_enable(self, enable=None)
Enable or disable the patrol scrubber
IN: enable. True or False to enable or disable
pscrub_set_address(self, addr_info)
Set the patrol scrubber for the targeted PA to the SA
Note: The patrol scrubber register appears to be unwritable for HSW-
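The argument rules stated for mem_inject_ondie in this section (PA or SA but not both, error_type of “ECC_1” or “ECC_2”, target_channel of “primary” or “secondary”) can be checked in a script before calling into the platform. The helper below is a hypothetical sketch and is not part of the IDK. How the IDK itself reacts to invalid arguments is not documented here.

```python
VALID_ERROR_TYPES = ("ECC_1", "ECC_2")
VALID_CHANNELS = ("primary", "secondary")


def check_inject_args(SA=None, PA=None, error_type="ECC_1",
                      target_channel="primary"):
    """Raise ValueError if the argument combination breaks the stated rules.

    Hypothetical pre-check; it never touches the SUT.
    """
    # Exactly one of SA and PA must be supplied.
    if (SA is None) == (PA is None):
        raise ValueError("provide PA or SA, but not both")
    if error_type not in VALID_ERROR_TYPES:
        raise ValueError("unknown error_type: %r" % (error_type,))
    if target_channel not in VALID_CHANNELS:
        raise ValueError("unknown target_channel: %r" % (target_channel,))
    return True
```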
Address Translation Functions:
AT_Init(self, verbose=False)
Load all of the address translations ranges
AT_forward_translate(self, address)
Forward address translation
IN: 64 bit System Address
OUT:
addr_info dictionary with socket, imc, channel, dimm, and rank information
AT_reverse_translate(self, addr_info)
Reverse address translation. Perform reverse RIR, TAD, and SAD on a PA to determine the System Address
IN: addr_info A physical address dictionary. Must contain ["socket"] ["imc"] ["channel"] ["dimm"] and ["rank"] fields
OUT: A physical address dictionary. Will have an “address” field containing the resulting system address, and additional addressing info
* Please note. These functions no longer use the “outfile” parameter as of version 1.85, since the output is also written to the logfile.
AT_show_rir_ranges(self)
Display the loaded RIR ranges
AT_show_sad_ranges(self)
Display the loaded SAD ranges
AT_show_tad_ranges(self)
Display the loaded TAD ranges
make_PA(self, skt=0, imc=0, ch=0, dimm=0, rank=0)
Make a sparse PA from the supplied tuple. This can be passed to AT_reverse_translate to fill out the rest of the PA fields
EX.
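As a rough illustration of what a sparse PA might look like, the sketch below builds a dictionary with the five fields that AT_reverse_translate is documented to require. It is a stand-in written for this guide, not the IDK implementation of make_PA. The function names are hypothetical, and the real make_PA may return additional fields.

```python
# Field names documented as required by AT_reverse_translate.
REQUIRED_FIELDS = ("socket", "imc", "channel", "dimm", "rank")


def make_sparse_pa(skt=0, imc=0, ch=0, dimm=0, rank=0):
    """Hypothetical stand-in for make_PA: return a sparse PA dictionary."""
    return {"socket": skt, "imc": imc, "channel": ch,
            "dimm": dimm, "rank": rank}


def has_required_fields(addr_info):
    """Return True if addr_info carries every field reverse translation needs."""
    return all(name in addr_info for name in REQUIRED_FIELDS)
```

With a connected platform object, the dictionary returned by the real make_PA would be passed to AT_reverse_translate, which fills in the “address” field.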
Skylake / Cannonlake Platforms:
The API is the same as above, with these exceptions:
Error Injection Functions:
Ondie memory injection call graph:
ond.mem_arm_ondie(self, PA=None, retries=3, target_channel='primary', persistent=False)
Arm the error injection register for the specified target physical address.
PA = Physical address as returned by AT_reverse_translate
retries = number of retries
target_channel = the channel in the PA to arm, “primary” or “secondary”
persistent = inject continuously if True, single shot if False
Returns: 1 if successful, 0 if not
ond.mem_inject_ondie(self, SA=None, PA=None, error_type='ECC_1', retries=3, target_channel='primary')
Inject a memory error at the specified address.
PA = Physical address as returned by AT_reverse_translate
SA = 64 bit system address. Provide PA or SA, but not both
error_type = “ECC_1” or “ECC_2” for single or double bit errors
retries = number of retries
target_channel = inject to “primary” or “secondary” memory channel
ond.mem_plant_ondie(self, PA=None, error_type='ECC_1', retries=3, target_channel='primary')
Plant a memory error at the specified address, but do not consume it.
PA = Physical address as returned by AT_reverse_translate
error_type = “ECC_1” or “ECC_2” for single or double bit errors
retries = number of retries
target_channel = inject to “primary” or “secondary” memory channel
ond.mem_setup_ondie(self, PA=None, error_type='ECC_1', target_channel='primary')
Set up the injection registers for an injection, but do not plant or arm
PA = Physical address as returned by AT_reverse_translate
error_type = “ECC_1” or “ECC_2” for single or double bit errors
target_channel = inject to “primary” or “secondary” memory channel
ond.mem_setup_ondie_fields(self, PA=None, target_channel="primary", addr_mask_hi=0x00, addr_mask_lo=0x00, dev0=0x00, dev1=0x00, dev0_chunk=0x00, dev1_chunk=0x00, dev0_xor=0x00, dev1_xor=0x00)
Set up the ondie injection fields directly, allowing fine control over the device selection and xor bits.
Address Translation Functions:
at.Init(self, verbose=False)
Load all of the address translations ranges
at.forward_translate(self, address)
Forward address translation
IN: 64 bit System Address
OUT:
addr_info dictionary with socket, imc, channel, dimm, and rank information
at.reverse_translate(self, addr_info)
Reverse address translation. Perform reverse RIR, TAD, and SAD on a PA to determine the System Address
IN: addr_info A physical address dictionary. Must contain ["socket"] ["imc"] ["channel"] ["dimm"] and ["rank"] fields
at.show_rir_ranges(self)
Display the loaded RIR ranges
at.show_sad_ranges(self)
Display the loaded SAD ranges
at.show_tad_ranges(self)
Display the loaded TAD ranges