Inject Error

Injects an error or clears a previously injected error on one or more persistent memory modules for testing purposes.

Note: Error Injection is disabled by default in the BIOS, and may not be available on all platforms. Consult your server documentation or vendor before proceeding.

ipmctl set [OPTIONS] -dimm (DimmIDs) [PROPERTIES]

Targets

  • -dimm (DimmIDs): Injects or clears an error on a specified module by supplying the DIMM target and one or more comma-separated DimmIDs. The default is to inject the error on all manageable modules.

Properties

This command supports setting or clearing one type of error at a time.

  • Clear=1: Clears a previously injected error. This property must be combined with one of the other properties indicating the previously injected error to clear.

  • Temperature: Injects an artificial media temperature in degrees Celsius into the module. The value takes effect immediately: the firmware stops reading the actual media temperature of the device and uses the injected value instead. The firmware that monitors the module temperature then reacts as it would to a real reading and takes the necessary precautions to preserve the module, which may result in an alert or a log entry.

    Note: The injected temperature value will remain until the next reboot or it is cleared. The media temperature is an artificial temperature and will not cause harm to the part, although firmware actions due to improper temperature injections may cause adverse effects on the module. If the Critical Shutdown Temperature or higher is passed in, this may cause the module firmware to perform a shutdown in order to preserve the part and data. The temperature value will be ignored on clear.

  • Poison: The physical address to poison.

    Note: The address must be 256 byte aligned (e.g., 0x10000000, 0x10000100, 0x10000200...).

    Poison is not possible for any address in the PM region if the PM region is locked. Injected poison errors are only triggered on a subsequent read of the poisoned address, in which case an error log will be generated by the firmware. No alerts will be sent.

    This command can also be used to clear non-injected poison errors. The data will be zeroed after clearing. Error injection does not need to be enabled before requesting to clear poison errors.

    The caller is responsible for keeping a list of injected poison errors in order to properly clear the injected errors afterwards. Simply disabling injection does not clear injected poison errors. Injected poison errors are persistent across power cycles and system resets.

  • PoisonType: The type of memory to poison. One of:

    • PatrolScrub: Injects a poison error at the specified address simulating an error found during a patrol scrub operation, which is indifferent to how the memory is currently allocated. This is the default.

    • MemoryMode: Injects a poison error at the specified address currently allocated in Memory Mode.

    • AppDirect: Injects a poison error at the specified address currently allocated as App Direct.

      Note: If the address to poison is not currently allocated as the specified memory type, an error is returned.

  • PackageSparing=1: Triggers an artificial package sparing. If package sparing is enabled and the module still has spares remaining, this will cause the firmware to report that there are no spares remaining.

  • PercentageRemaining: Injects an artificial module life remaining percentage into the persistent memory module. This will cause the firmware to take appropriate action based on the value and if necessary generate an error log and an alert and update the health status.

  • FatalMediaError=1: Injects an artificial fatal media error, which will cause the firmware to generate an error log and an alert.

    Note: When a fatal media error is injected, the BSR Media Disabled status bit will be set, indicating a media error. Use the disable trigger input parameter (Clear=1) to clear the injected fatal error.

    Note: Injecting a fatal media error is unsupported on Windows. Please contact Microsoft for assistance in performing this action.

  • DirtyShutdown=1: Injects an ADR failure, which will result in a dirty shutdown upon reboot.
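
The 256-byte alignment rule for Poison addresses can be checked before a command is issued. The following is a minimal sketch only: the function names `is_poison_aligned` and `align_down_256` are hypothetical helpers and are not part of ipmctl. An address is 256-byte aligned when its low 8 bits are all zero.

```shell
#!/bin/sh
# Hypothetical helper (not part of ipmctl): succeeds when the address is a
# multiple of 256, that is, when its low 8 bits are all zero.
# $1 = address, decimal or 0x-prefixed hexadecimal
is_poison_aligned() {
    [ $(( $1 & 0xFF )) -eq 0 ]
}

# Hypothetical helper: rounds an address down to the previous 256-byte
# boundary and prints it in hexadecimal.
align_down_256() {
    printf '0x%x\n' $(( $1 & ~0xFF ))
}
```

For example, `is_poison_aligned 0x10000100` succeeds, while `is_poison_aligned 0x10000101` fails and `align_down_256 0x10000123` prints the nearest lower aligned address.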

Examples

Set the media temperature on all manageable modules to 100 degrees Celsius.

$ sudo ipmctl set -dimm Temperature=100

Clear the injected media temperature on all manageable modules. The value of Temperature is ignored.

$ sudo ipmctl set -dimm Temperature=1 Clear=1
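
Because a value at or above the Critical Shutdown Temperature may cause the module firmware to shut down, a test script might guard the injected value. The sketch below is an illustration under assumptions: `temperature_inject_cmd` is a hypothetical helper, and the limit must be supplied by the caller because the actual threshold is module-specific and is not known to this sketch. It only prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical guard (not part of ipmctl): prints the injection command only
# when the requested temperature is below a caller-supplied limit.
# $1 = temperature to inject (integer, degrees Celsius)
# $2 = limit chosen by the caller (assumption; consult the module
#      documentation for the real Critical Shutdown Temperature)
temperature_inject_cmd() {
    temp=$1
    limit=$2
    if [ "$temp" -ge "$limit" ]; then
        echo "refusing: $temp C is at or above the limit of $limit C" >&2
        return 1
    fi
    printf 'ipmctl set -dimm Temperature=%s\n' "$temp"
}
```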

Poison address 0x10000200 on module 0x0001.

$ sudo ipmctl set -dimm 0x0001 Poison=0x10000200

Clear the injected poison of address 0x10000200 on module 0x0001.

$ sudo ipmctl set -dimm 0x0001 Poison=0x10000200 Clear=1
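
Since the caller is responsible for tracking injected poison errors, and they persist across power cycles, one possible approach is to log each injection and generate the matching clear commands from that log. This is a minimal sketch: the log file name and the functions `record_poison` and `print_poison_clear_cmds` are hypothetical and are not provided by ipmctl. The commands are printed for review, not executed.

```shell
#!/bin/sh
# Hypothetical log file; one "DimmID Address" pair per line.
POISON_LOG="${POISON_LOG:-./injected_poison.log}"

# Record an injection so that it can be cleared later.
# $1 = DimmID, $2 = poisoned physical address
record_poison() {
    printf '%s %s\n' "$1" "$2" >> "$POISON_LOG"
}

# Print (do not run) one clear command per recorded injection.
print_poison_clear_cmds() {
    [ -f "$POISON_LOG" ] || return 0
    while read -r dimm addr; do
        [ -n "$dimm" ] || continue
        printf 'ipmctl set -dimm %s Poison=%s Clear=1\n' "$dimm" "$addr"
    done < "$POISON_LOG"
}
```

A script would call `record_poison 0x0001 0x10000200` right after a successful injection, and later review the output of `print_poison_clear_cmds` before running those commands with the required privileges.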

Trigger an artificial package sparing on all manageable modules.

$ sudo ipmctl set -dimm PackageSparing=1

Trigger an artificial package sparing on module 0x0001.

$ sudo ipmctl set -dimm 0x0001 PackageSparing=1

Set the life remaining percentage on all manageable modules to 10%.

$ sudo ipmctl set -dimm PercentageRemaining=10

Set the life remaining percentage on module 0x0001 to 10%.

$ sudo ipmctl set -dimm 0x0001 PercentageRemaining=10

Clear the injected remaining life percentage on all manageable modules. The value of PercentageRemaining is irrelevant.

$ sudo ipmctl set -dimm PercentageRemaining=10 Clear=1

Trigger an artificial Asynchronous DRAM Refresh (ADR) failure on all manageable modules, which will result in a dirty shutdown on each module on the next reboot.

$ sudo ipmctl set -dimm DirtyShutdown=1

Trigger an artificial Asynchronous DRAM Refresh (ADR) failure on module 0x0001, which will result in a dirty shutdown on that module on the next reboot.

$ sudo ipmctl set -dimm 0x0001 DirtyShutdown=1

Simulate a Fatal Media Error on PMem module 0x2001.

WARNING: Injecting a Fatal Media Error may cause the host to crash. Linux hosts may enter emergency mode if there are PMem mount points in /etc/fstab. Additional recovery activities may be required.

Clearing the fault will not cause any data loss.

$ sudo ipmctl set -dimm 0x2001 FatalMediaError=1

Clear the Fatal Media Error on PMem module 0x2001.

$ sudo ipmctl set -dimm 0x2001 FatalMediaError=1 Clear=1
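
Given the warning about PMem mount points in /etc/fstab, it may help to look for such entries before injecting a Fatal Media Error on a Linux host. The following is a rough sketch under assumptions: `list_pmem_fstab_entries` is a hypothetical helper, and it only recognizes entries whose device field starts with /dev/pmem. Entries that refer to the device by UUID or label are not detected.

```shell
#!/bin/sh
# Hypothetical pre-check (not part of ipmctl): prints the device and mount
# point of every fstab entry whose device field starts with /dev/pmem.
# Commented-out lines are skipped because their first field starts with "#".
# $1 = path to the fstab file (defaults to /etc/fstab)
list_pmem_fstab_entries() {
    awk '$1 ~ /^\/dev\/pmem/ { print $1, $2 }' "${1:-/etc/fstab}"
}
```

If the function prints any entries, review them before injecting the error, since the host may enter emergency mode on the next boot.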
