NVIDIA VRAM as Linux Swap for Bigger AI Models

New GitHub tool lets you use NVIDIA GPU VRAM as swap space on Linux, helping developers run larger local AI models on hybrid laptops with soldered RAM.

Project Overview

The nbd-vram repository by c0deJedi was shared on Hacker News with instructions for exposing NVIDIA GPU VRAM as a swap device on Linux. The tool targets hybrid graphics laptops where the discrete RTX card sits idle while system RAM fills. It allocates up to 7 GB of VRAM on an RTX 3070 Laptop GPU and registers it through the Network Block Device (NBD) protocol. Combined with existing zram and SSD swap layers, the result roughly triples addressable memory.

Technical Implementation

The daemon allocates memory directly through the CUDA driver API rather than attempting BAR1 pinning. It then exposes the buffer as an NBD target over a Unix socket. The kernel nbd module attaches to that socket and presents /dev/nbdX, which the administrator can mark as a swap device with standard swapon calls.
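To make the NBD step more concrete, here is a minimal sketch of how a user-space server could answer one read or write request from the kernel nbd driver. It is not the project's code: the header layout and magic numbers follow the public NBD protocol, while `handle_request` is a hypothetical helper and a plain `bytearray` stands in for the CUDA allocation, so the sketch runs without a GPU.

```python
import struct

# Constants from the NBD protocol's transmission phase.
NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ = 0
NBD_CMD_WRITE = 1
NBD_EINVAL = 22

# Request: magic, command flags, type, handle, offset, length (28 bytes, big-endian).
REQUEST = struct.Struct(">IHHQQI")
# Simple reply: magic, error, handle (16 bytes, big-endian).
REPLY = struct.Struct(">IIQ")


def handle_request(header: bytes, payload: bytes, store: bytearray) -> bytes:
    """Serve one NBD read or write against `store` and return the reply bytes.

    `store` is a stand-in for the VRAM buffer (an assumption for illustration).
    """
    magic, _flags, cmd, handle, offset, length = REQUEST.unpack(header)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("not an NBD request")
    if offset + length > len(store):
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, NBD_EINVAL, handle)
    if cmd == NBD_CMD_WRITE:
        if len(payload) != length:
            return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, NBD_EINVAL, handle)
        # A real daemon would issue a host-to-device copy here.
        store[offset:offset + length] = payload
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, handle)
    if cmd == NBD_CMD_READ:
        # A real daemon would issue a device-to-host copy here.
        data = bytes(store[offset:offset + length])
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, handle) + data
    # Other commands (flush, trim, disconnect) are left out of this sketch.
    return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, NBD_EINVAL, handle)
```

The reply echoes the request's handle so the kernel can match answers to outstanding requests; a read reply carries the requested bytes right after the 16-byte header.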

Data moves from the kernel swap subsystem to the nbd driver, across the Unix socket to the daemon, and then into or out of VRAM through cuMemcpyHtoD (swap-out) and cuMemcpyDtoH (swap-in) calls. No custom kernel module is required, so the setup survives driver and kernel updates without recompilation. The overflow priority is explicit: system RAM first, then VRAM over PCIe, then compressed zram, then SSD.
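The tier order can be checked from /proc/swaps, because the kernel fills swap areas with a higher priority value first. The following is a minimal sketch of that check: `parse_swaps` is a hypothetical helper, and the device names, sizes, and priority values in `SAMPLE` are illustrative assumptions rather than the author's configuration.

```python
from typing import NamedTuple


class SwapArea(NamedTuple):
    name: str
    size_kib: int
    used_kib: int
    priority: int


def parse_swaps(text: str) -> list[SwapArea]:
    """Parse /proc/swaps content and return areas in the order the kernel fills them."""
    areas = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 5:
            continue
        name, _kind, size, used, priority = fields
        areas.append(SwapArea(name, int(size), int(used), int(priority)))
    # Higher priority values are used first.
    return sorted(areas, key=lambda area: area.priority, reverse=True)


# Hypothetical layout: VRAM-backed nbd device, then zram, then an SSD partition.
SAMPLE = """\
Filename                Type        Size        Used    Priority
/dev/zram0              partition   16777212    0       50
/dev/nvme0n1p3          partition   8388604     0       -2
/dev/nbd0               partition   7340032     0       100
"""

for area in parse_swaps(SAMPLE):
    print(f"{area.name}: priority {area.priority}, {area.size_kib // 1024} MiB")
```

On a live system the same function can be fed `open("/proc/swaps").read()` to confirm that the VRAM-backed device sits where intended in the order.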

The author rejected the nvidia_p2p_get_pages_persistent route after confirming it returns EINVAL on consumer GeForce silicon. Only datacenter SKUs expose the necessary RM-level interfaces.

Running Larger Models on Constrained Hardware

Developers training or fine-tuning LLMs on laptops often hit the 16 GB RAM ceiling before GPU memory becomes the bottleneck. Mapping VRAM as high-priority swap lets the process keep model weights and optimizer states resident longer. On the tested AMD + RTX 3070 configuration the combined tiers reached roughly 46 GB of addressable space.

The approach still incurs PCIe round-trips on every page fault that lands in VRAM. For inference workloads with predictable access patterns this cost remains acceptable; for training loops with random access the latency shows up as reduced throughput. Users must also leave the NVIDIA driver loaded and the card powered, which prevents full runtime power gating on some laptop platforms.

Setup and Stability Notes

Installation follows the provided Makefile and udev rules that create the socket and manage the daemon. A power-check script prevents allocation when the dGPU is unavailable. The author reports stable operation under kernel 6.17 and driver 580.159.03 on Pop!_OS, with 7 GB successfully swapped before SSD fallback.
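The article does not describe what the power-check script actually tests. One plausible approach, sketched here purely as an assumption, is to read the PCI runtime power state that Linux exposes in sysfs and skip allocation unless the device reports "active". The function name `dgpu_is_active` and the PCI address in the usage line are hypothetical.

```python
from pathlib import Path


def dgpu_is_active(pci_address: str, sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """Return True only if the PCI device reports runtime power state "active".

    A missing or unreadable status file is treated as "not available".
    """
    status_file = Path(sysfs_root) / pci_address / "power" / "runtime_status"
    try:
        return status_file.read_text().strip() == "active"
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical address; the real one can be found with lspci.
    print(dgpu_is_active("0000:01:00.0"))
```

Treating read errors as "unavailable" keeps the check conservative: the daemon would decline to allocate rather than wake or probe a card that may be powered down.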

Because the block device sits behind a user-space daemon, a crash in the nbd-vram process takes the swap target away immediately. Administrators who prioritize resilience may therefore prefer to place it after zram in the priority list, so that critical compressed pages remain available; note that this reverses the RAM, VRAM, zram, SSD order described earlier. No production stability data exists yet beyond the single laptop test case.

FAQs

Does this require a custom kernel module? No. The daemon uses only the user-space CUDA driver API and the in-tree nbd kernel driver, although the stock NVIDIA driver must stay loaded.

Can the VRAM swap device survive a driver update? Yes. Because no symbols are taken from the NVIDIA kernel module, the binary continues to function after driver upgrades.

What happens if the GPU is needed for display? The tool is intended for hybrid laptops where the integrated GPU drives the screen. On systems where the NVIDIA card is the primary display output, allocation will fail or cause visual corruption.
