# MinIO MemKV RELEASE.2026-05-26T21-39-33Z

Released: 2026-05-26

A targeted patch on top of `RELEASE.2026-05-26T08-06-11Z` that fixes a
multi-second first-read stall (~2.4 s on multi-GPU systems) on
CUDA-equipped hosts. The fix restores expected RDMA read throughput on
read-heavy workloads. The improvement is most visible on short benchmark
runs and on any client that does not warm up with writes before reads.
This release makes no protocol, configuration, or hardware-compatibility
changes.

---

## Downloads

### Server Binary

| Platform | Architecture | Download |
| -------- | ------------ | -------- |
| Linux    | amd64        | [memkv](https://dl.min.io/aistor/memkv/release/linux-amd64/memkv) |
| Linux    | arm64        | [memkv](https://dl.min.io/aistor/memkv/release/linux-arm64/memkv) |

### NIXL Plugin (for Dynamo / KVBM integrations)

| Platform | Architecture | Download |
| -------- | ------------ | -------- |
| Linux    | amd64        | [libplugin_MEMKV.so](https://dl.min.io/aistor/memkv/release/linux-amd64/libplugin_MEMKV.so) |
| Linux    | arm64        | [libplugin_MEMKV.so](https://dl.min.io/aistor/memkv/release/linux-arm64/libplugin_MEMKV.so) |

### LD_PRELOAD Shim (for MLPerf-Storage kvcache workloads)

| Platform | Architecture | Download |
| -------- | ------------ | -------- |
| Linux    | amd64        | [libmemkv_preload.so](https://dl.min.io/aistor/memkv/release/linux-amd64/libmemkv_preload.so) |
| Linux    | arm64        | [libmemkv_preload.so](https://dl.min.io/aistor/memkv/release/linux-arm64/libmemkv_preload.so) |

### Packages

`.deb`, `.rpm`, and `.apk` packages bundle the server, both `.so` sidecars (the NIXL plugin and the LD_PRELOAD shim), and the LMCache and SGLang Python wheels into a single per-architecture install.

| Format | Architecture | Download |
| ------ | ------------ | -------- |
| DEB    | amd64        | [memkv\_20260526213933.0.0_amd64.deb](https://dl.min.io/aistor/memkv/release/linux-amd64/memkv_20260526213933.0.0_amd64.deb) |
| DEB    | arm64        | [memkv\_20260526213933.0.0_arm64.deb](https://dl.min.io/aistor/memkv/release/linux-arm64/memkv_20260526213933.0.0_arm64.deb) |
| RPM    | amd64        | [memkv-20260526213933.0.0-1.x86_64.rpm](https://dl.min.io/aistor/memkv/release/linux-amd64/memkv-20260526213933.0.0-1.x86_64.rpm) |
| RPM    | arm64        | [memkv-20260526213933.0.0-1.aarch64.rpm](https://dl.min.io/aistor/memkv/release/linux-arm64/memkv-20260526213933.0.0-1.aarch64.rpm) |
| APK    | amd64        | [memkv\_20260526213933.0.0_x86_64.apk](https://dl.min.io/aistor/memkv/release/linux-amd64/memkv_20260526213933.0.0_x86_64.apk) |
| APK    | arm64        | [memkv\_20260526213933.0.0_aarch64.apk](https://dl.min.io/aistor/memkv/release/linux-arm64/memkv_20260526213933.0.0_aarch64.apk) |

After installing the deb/rpm, the Python plugin wheels land at `/usr/share/memkv/wheels/`:

```bash
pip install /usr/share/memkv/wheels/memkv_lmcache-*.whl
pip install /usr/share/memkv/wheels/memkv_sglang-*.whl
```

The NIXL plugin is auto-symlinked to `/opt/nvidia/nvda_nixl/lib/plugins/` when that directory exists (postinstall hook).

### Container Image

```bash
docker pull quay.io/minio/memkv:RELEASE.2026-05-26T21-39-33Z
docker pull quay.io/minio/memkv:latest
```

The container image ships the server and the NIXL plugin (under `/usr/local/lib/plugins/`). The LD_PRELOAD shim and Python wheels are not included in the container image; install the deb/rpm for those.

### Verification

Each binary is signed with both minisign (preferred) and GPG; sha256sums are published alongside.

```bash
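# minisign (by default it looks for the signature in memkv.minisig next to the binary)
minisign -Vm memkv -P RWTx5Zr1tiHQLwG9keckT0c45M3AGeHD6IvimQHpyRywVWGbP1aVSGav

# sha256 (run from the directory that holds the downloaded file)
sha256sum -c memkv.sha256sum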
```
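
For automation that fetches the artifacts from Python rather than a shell, the checksum step can be reproduced with the standard library. This is a minimal sketch, assuming the published checksum file uses the usual `sha256sum` line format (`<hex digest>  <file name>`); the helper names `sha256_of` and `verify_against_sumfile` are hypothetical and not part of MemKV. It does not replace the minisign signature check.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_sumfile(artifact: Path, sumfile: Path) -> bool:
    """Return True only if sumfile lists artifact's name with a matching digest.

    Assumes sha256sum-style lines: "<hex digest>  <name>" or "<hex digest> *<name>".
    """
    for line in sumfile.read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected, name = parts
        # sha256sum marks binary mode with a leading '*' on the file name.
        if Path(name.strip().lstrip("*")).name == artifact.name:
            return expected.lower() == sha256_of(artifact)
    return False  # artifact not listed in the checksum file
```

A caller would pass the downloaded binary and its checksum file, for example `verify_against_sumfile(Path("memkv"), Path("memkv.sha256sum"))`, and abort the install when the result is `False`.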

---

## Changes since RELEASE.2026-05-26T08-06-11Z

### Bug Fixes

- **Client RDMA: eliminate ~2.4 s first-read stall on CUDA hosts.** On
  hosts with the NVIDIA driver installed, the client's memory-region
  registration deferred the actual `ibv_reg_mr` call (and the CUDA
  classify step that decides whether a buffer is host or device memory)
  to the first hot-path operation. That classify path lazily loads
  libcuda and triggers `cuInit`, which the NVIDIA driver serializes
  process-wide and which takes ~2.4 s on multi-GPU systems. Workloads
  that warmed up with writes absorbed the cost during warmup; read-only
  workloads (and short benchmark runs) paid it on every thread's first
  read. The client now performs the classify step, MR registration, and
  one-shot libcuda load at setup time, before any RDMA op is timed.
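
The difference between deferring a one-time initialization to the first hot-path call and performing it at setup can be illustrated with a toy model. The sketch below is not MemKV's client code: `Client`, `first_read_latencies`, and `INIT_COST_S` are hypothetical names, and a short `time.sleep` held under a lock stands in for the serialized `cuInit` call.

```python
import threading
import time
from typing import List

# Hypothetical stand-in for the one-time driver initialization cost.
# The release note cites ~2.4 s; a shorter sleep keeps the demo quick.
INIT_COST_S = 0.2


class Client:
    """Toy client; illustrative only, not MemKV's API."""

    def __init__(self, eager_init: bool) -> None:
        self._lock = threading.Lock()  # serializes the one-time init
        self._ready = False
        if eager_init:
            self._ensure_ready()  # pay the cost at setup, before timing starts

    def _ensure_ready(self) -> None:
        with self._lock:
            if not self._ready:
                time.sleep(INIT_COST_S)  # simulated expensive init
                self._ready = True

    def read(self) -> float:
        """Simulate one timed read and return its latency in seconds."""
        start = time.perf_counter()
        self._ensure_ready()  # cheap once initialization has happened
        return time.perf_counter() - start


def first_read_latencies(eager_init: bool, threads: int = 4) -> List[float]:
    """Measure the first read issued by each thread against one shared client."""
    client = Client(eager_init)
    results = [0.0] * threads

    def worker(i: int) -> None:
        results[i] = client.read()

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results


if __name__ == "__main__":
    print("lazy :", [f"{x * 1e3:.1f} ms" for x in first_read_latencies(False)])
    print("eager:", [f"{x * 1e3:.1f} ms" for x in first_read_latencies(True)])
```

In the lazy case, any thread whose first read arrives before initialization finishes waits on the shared lock, so the stall shows up inside the timed reads. In the eager case the constructor absorbs the cost and the timed reads see none of it.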

### Performance

- **4 KiB read throughput restored on read-only workloads.** With the
  `cuInit` stall removed, 4 KiB random reads on a 64-thread client
  recover from 0.77 Gbps (2.65 ms mean latency) to 5.78 Gbps (282 µs
  mean latency). 16 MiB reads return to the 96.8 GiB/s (774 Gbps)
  headline figure with no tail outliers. Workloads that previously
  masked the stall with a write-heavy warmup are unaffected.
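
A back-of-the-envelope model helps show why short runs are hit hardest by a one-time stall. The sketch below is a simplification, not the benchmark's methodology: it assumes each thread pays the stall exactly once on its first read and that every read otherwise completes at the steady-state latency. The ~2.4 s stall and 282 µs latency come from the notes above; the function name and the reads-per-thread counts are hypothetical.

```python
def mean_read_latency(stall_s: float, steady_s: float, reads_per_thread: int) -> float:
    """Mean per-read latency when one read per thread also absorbs a one-time stall.

    Model: total time = stall_s + steady_s * reads_per_thread, so the mean is
    steady_s + stall_s / reads_per_thread.
    """
    if reads_per_thread < 1:
        raise ValueError("reads_per_thread must be at least 1")
    return (stall_s + steady_s * reads_per_thread) / reads_per_thread


if __name__ == "__main__":
    STALL_S = 2.4       # one-time stall cited in the release note
    STEADY_S = 282e-6   # post-fix mean latency for 4 KiB reads
    # Hypothetical run lengths, chosen only to show the trend.
    for n in (100, 1_000, 10_000, 100_000):
        mean_ms = mean_read_latency(STALL_S, STEADY_S, n) * 1e3
        print(f"{n:>7} reads/thread -> mean {mean_ms:.3f} ms")
```

Under this model the stall's contribution to the mean shrinks in proportion to the number of reads per thread, which is consistent with long-running workloads barely noticing it while short benchmark runs are dominated by it.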

---

## Documentation

- Hosted docs: <https://docs.min.io/memkv/>
- Embedded docs (in the binary): `memkv doc` serves the same site locally.

## Support

- Security disclosures: security@min.io