PMEM and BLK Modes
The NFIT (NVDIMM Firmware Interface Table) specification defined by ACPI v6.0 standardizes not only the description of Persistent Memory (PMEM) and Block (BLK) modes, but also the platform message-passing entry points for control and configuration.
The Linux LIBNVDIMM subsystem provides support for three types of NVDIMMs: PMEM, BLK, and NVDIMM devices that support both. PMEM (Persistent Memory) devices allow byte-addressable access. BLK (block) devices provide sector atomicity, like traditional storage devices. The third type can simultaneously support both PMEM and BLK mode access. These three modes of operation are described by the "NVDIMM Firmware Interface Table" (NFIT) in ACPI v6.0 or later. While the LIBNVDIMM implementation is generic and supports pre-NFIT platforms, it was guided by the superset of capabilities needed to support this ACPI v6.0 definition for NVDIMM resources. The bulk of the kernel implementation is in place to handle the case where the DIMM Physical Address (DPA) accessible via PMEM is aliased with DPA accessible via BLK. When that occurs, a LABEL is needed to reserve DPA for exclusive access via one mode at a time.
For each NVDIMM access method (PMEM, BLK), LIBNVDIMM provides a block device driver:
- 1. PMEM (nd_pmem.ko): Drives a system-physical-address range. This range is contiguous in system memory and may be interleaved (striped by the hardware memory controller) across multiple DIMMs. When the range is interleaved, the platform may optionally provide details of which DIMMs participate in the interleave. Note that while LIBNVDIMM describes system-physical-address ranges that may alias with BLK access as ND_NAMESPACE_PMEM ranges and those without an alias as ND_NAMESPACE_IO ranges, the nd_pmem driver makes no distinction between them. The different device types are an implementation detail that userspace can exploit to implement policies like "only interface with address ranges from certain DIMMs". When aliasing is present and a DIMM lacks a label, no block device can be created by default, because userspace must first make at least one allocation of DPA to the PMEM range. In contrast, ND_NAMESPACE_IO ranges, once registered, can be immediately attached to nd_pmem.
- 2. BLK (nd_blk.ko): This driver performs I/O using a set of platform-defined apertures. A set of apertures accesses just one DIMM. Multiple windows (apertures) allow multiple concurrent accesses, much like tagged command queuing, and would likely be used by different threads or different CPUs. The NFIT specification defines a standard format for a BLK-aperture, but the spec also allows for vendor-specific layouts, and non-NFIT BLK implementations may have other designs for BLK I/O. For this reason, "nd_blk" calls back into platform-specific code to perform the I/O. One such implementation is defined in the "Driver Writer's Guide" and "DSM Interface Example".
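The attach rule described for the PMEM driver can be illustrated with a small model. This is only a sketch: the names ND_NAMESPACE_IO and ND_NAMESPACE_PMEM come from the text above, but the `PmemRange` class, its `labeled_dpa_bytes` field, and the `can_attach_by_default` function are hypothetical and do not correspond to any kernel or userspace interface.

```python
from dataclasses import dataclass
from enum import Enum


class NamespaceType(Enum):
    # Names mirror the device types mentioned in the text; the enum itself
    # is a hypothetical model, not a kernel interface.
    ND_NAMESPACE_IO = "io"      # system-physical-address range with no BLK alias
    ND_NAMESPACE_PMEM = "pmem"  # range that may alias with BLK access


@dataclass
class PmemRange:
    ns_type: NamespaceType
    # Hypothetical field: bytes of DPA that a label has allocated to this range.
    labeled_dpa_bytes: int = 0


def can_attach_by_default(rng: PmemRange) -> bool:
    """Return True if the range could be handed to nd_pmem without further setup."""
    if rng.ns_type is NamespaceType.ND_NAMESPACE_IO:
        # No aliasing: usable as soon as the range is registered.
        return True
    # Aliased range: userspace must have allocated some DPA to PMEM first.
    return rng.labeled_dpa_bytes > 0
```

In this model an ND_NAMESPACE_IO range is always attachable, while an ND_NAMESPACE_PMEM range becomes attachable only after at least one DPA allocation has been recorded for it.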
While PMEM provides direct byte-addressable CPU load/store access to NVDIMM storage, it does not provide the best system RAS (reliability, availability, and serviceability) model. Access to a corrupted system-physical-address causes a CPU exception, while access to a corrupted address through a BLK-aperture causes that block window to raise an error status in a register. The latter is more aligned with the standard error model that host-bus-adapter attached disks present. Also, if an administrator ever wants to replace a memory module, it is easier to service a system at DIMM module boundaries. Compare this to PMEM, where data could be interleaved in an opaque, hardware-specific manner across several DIMMs.
BLK-apertures solve these RAS problems, but their presence is also the major contributing factor to the complexity of the ND subsystem. They complicate the implementation because PMEM and BLK alias in DPA space. Any given DIMM's DPA-range may contribute to one or more system-physical-address sets of interleaved DIMMs, and may also be accessed in its entirety through its BLK-aperture. Accessing a DPA through a system-physical-address while simultaneously accessing the same DPA through a BLK-aperture has undefined results. For this reason, DIMMs with this dual-interface configuration include a DSM function to store and retrieve a LABEL. The LABEL effectively partitions the DPA-space into exclusive system-physical-address and BLK-aperture accessible regions. For simplicity, a DIMM is allowed one PMEM "region" for each interleave set in which it is a member. The remaining DPA space can be carved into an arbitrary number of BLK devices with discontiguous extents.
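The partitioning rules a LABEL enforces can be sketched as a consistency check over DPA extents. This is an illustrative model only: the `Extent` class, its fields, and `check_label` are hypothetical, and the sketch does not reflect the on-media label format. It assumes that every extent must lie within the DIMM's capacity, that no two extents may overlap, and that each interleave set owns at most one PMEM extent on the DIMM, while a BLK device may own several discontiguous extents.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    mode: str    # "pmem" or "blk"
    owner: str   # hypothetical id: interleave set for pmem, device name for blk
    start: int   # first DPA byte of the extent
    length: int  # size in bytes

    @property
    def end(self) -> int:
        return self.start + self.length


def check_label(capacity: int, extents: List[Extent]) -> List[str]:
    """Return a list of problems; an empty list means the partitioning is consistent."""
    errors = []

    # Every extent must fit inside the DIMM's DPA space.
    for e in extents:
        if e.length <= 0 or e.start < 0 or e.end > capacity:
            errors.append(f"{e.owner}: extent outside DIMM capacity")

    # At most one PMEM extent per interleave set.
    seen_sets = set()
    for e in extents:
        if e.mode == "pmem":
            if e.owner in seen_sets:
                errors.append(f"{e.owner}: more than one PMEM extent")
            seen_sets.add(e.owner)

    # No DPA may be reachable through two extents. Walk in start order and
    # compare each extent with the one that reaches furthest so far.
    furthest = None
    for cur in sorted(extents, key=lambda e: e.start):
        if furthest is not None and cur.start < furthest.end:
            errors.append(f"{furthest.owner} overlaps {cur.owner}")
        if furthest is None or cur.end > furthest.end:
            furthest = cur

    return errors
```

For example, a 128-byte DIMM with a PMEM extent at DPA 0..63 and two BLK extents at 64..79 and 96..127 passes the check, whereas a BLK extent starting at DPA 32 would be reported as overlapping the PMEM extent.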