Regions, Atomic Sectors, and DAX

One of the few reasons to allow multiple BLK namespaces per REGION is so that each BLK-namespace can be configured with a Block Translation Table (BTT) with unique atomic sector sizes. While a PMEM device can host a BTT, the LABEL specification does not provide for a sector size to be specified for a PMEM namespace. This is due to the expectation that the primary usage model for PMEM is via DAX, and the BTT is incompatible with DAX. However, for the cases where an application or filesystem still needs atomic sector update guarantees, it can register a BTT on a PMEM device or partition.

Example NVDIMM Platform

Figure 1 shows an example platform with four NVDIMMs, two integrated memory controllers (IMC), and a single CPU .

                             (a)               (b)           DIMM   BLK-REGION
          +-------------------+--------+--------+--------+
+------+  |       pm0.0       | blk2.0 | pm1.0  | blk2.1 |    0      region2
| imc0 +--+- - - region0- - - +--------+        +--------+
+--+---+  |       pm0.0       | blk3.0 | pm1.0  | blk3.1 |    1      region3
   |      +-------------------+--------v        v--------+
+--+---+                               |                 |
| cpu0 |                                     region1
+--+---+                               |                 |
   |      +----------------------------^        ^--------+
+--+---+  |           blk4.0           | pm1.0  | blk4.0 |    2      region4
| imc1 +--+----------------------------|        +--------+
+------+  |           blk5.0           | pm1.0  | blk5.0 |    3      region5
          +----------------------------+--------+--------+

Figure 1: Example sysfs layout

Each unique interface (BLK or PMEM) to DPA space is identified by a region device with a dynamically assigned id (REGION0 - REGION5).

  1. The first portion of DIMM0 and DIMM1 is interleaved as REGION0. A

    single PMEM namespace is created in the REGION0-SPA-range that spans most

    of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that

    interleaved system-physical-address range is reclaimed as BLK-aperture

    accessed space starting at DPA-offset (a) into each DIMM. In that

    reclaimed space, we create two BLK-aperture "namespaces" from REGION2 and

    REGION3 where "blk2.0" and "blk3.0" are just human readable names that

    could be set to any user-desired name in the LABEL.

  2. In the last portion of DIMM0 and DIMM1 we have an interleaved system-physical-address range, REGION1, that spans those two DIMMs as well as DIMM2 and DIMM3. Some of REGION1 is allocated to a PMEM namespace named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and "blk5.0".

  3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1 interleaved system-physical-address range (i.e. the DPA address past offset (b) are also included in the "blk4.0" and "blk5.0" namespaces. Note, that this example shows that BLK-aperture namespaces don't need to

    be contiguous in DPA-space. This bus is provided by the kernel under the device/sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and the nfit_test.ko module is loaded. This not only test LIBNVDIMM but the acpi_nfit.ko driver as well.

Last updated