Zero-Residue Migration

Flexium's zero-residue migration ensures that when a training job migrates from one GPU to another, no memory is left behind on the source GPU.

Driver requirement: GPU migration requires NVIDIA driver version 580 or later.

How It Works

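Traditional approaches (like model.to(device)) leave memory behind on the source GPU: PyTorch's caching allocator keeps freed blocks reserved for reuse instead of returning them to the driver, and the process's CUDA context stays resident on the device. Flexium instead uses driver-level migration, which provides: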

Full GPU Reclamation - The source GPU is immediately available for other workloads
No Memory Fragmentation - Clean memory state on both source and target GPUs
Seamless Continuation - Training resumes exactly where it left off
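
The caching-allocator residue described above can be observed directly with PyTorch's own memory counters. The following is a minimal sketch, not part of Flexium: the function name residue_after_move is hypothetical, and it assumes a machine with PyTorch installed and at least two CUDA devices (it returns None otherwise).

```python
def residue_after_move(nbytes=256 * 1024 * 1024):
    """Report memory PyTorch still holds on cuda:0 after moving a tensor to cuda:1.

    Hypothetical helper for illustration only. Returns None when PyTorch
    or two CUDA devices are not available.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available() or torch.cuda.device_count() < 2:
        return None

    src, dst = torch.device("cuda:0"), torch.device("cuda:1")

    # Allocate on the source GPU, then move to the target GPU.
    x = torch.empty(nbytes, dtype=torch.uint8, device=src)
    x = x.to(dst)  # rebinding x drops the last reference to the source copy
    torch.cuda.synchronize()

    return {
        # Bytes occupied by live tensors on the source GPU.
        "allocated_bytes": torch.cuda.memory_allocated(src),
        # Bytes the caching allocator still holds on the source GPU.
        "reserved_bytes": torch.cuda.memory_reserved(src),
    }


if __name__ == "__main__":
    print(residue_after_move())
```

On a typical setup, allocated_bytes drops once the tensor has moved while reserved_bytes stays high until torch.cuda.empty_cache() is called. These counters do not include the CUDA context itself, which also remains on the source GPU and is only visible through tools such as nvidia-smi.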

Zero-residue migration is automatic once Flexium is initialized:

import flexium
flexium.init()

# Your training code here
# Migration happens when triggered via dashboard
train_model()

Or with explicit scope control:

import flexium.auto

with flexium.auto.run():
    train_model()

You can verify zero-residue behavior by monitoring GPU memory:

# Before migration
nvidia-smi

# After migration: the source GPU should show no memory used by the training process
nvidia-smi
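
The same check can be scripted rather than read off by eye. The sketch below is an illustration, not a Flexium feature: the helper names parse_compute_apps and process_memory_on_gpu are hypothetical, and it relies only on nvidia-smi's --query-compute-apps CSV output. A process that no longer appears on a GPU is treated as holding 0 MiB there.

```python
import subprocess

# Per-process GPU memory as CSV rows: gpu_uuid, pid, used_memory (MiB)
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=gpu_uuid,pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'gpu_uuid, pid, used_memory' rows into {(gpu_uuid, pid): MiB}.

    The value is None when nvidia-smi does not report a number.
    """
    usage = {}
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3 or not fields[1].isdigit():
            continue  # skip blank or malformed rows
        gpu_uuid, pid, mem = fields
        usage[(gpu_uuid, int(pid))] = int(mem) if mem.isdigit() else None
    return usage


def process_memory_on_gpu(pid, gpu_uuid, csv_text=None):
    """Return MiB that `pid` holds on `gpu_uuid`; 0 if the process is not listed."""
    if csv_text is None:
        csv_text = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
    return parse_compute_apps(csv_text).get((gpu_uuid, pid), 0)
```

To use it, record the training process's PID and the source GPU's UUID (listed by nvidia-smi -L) before triggering the migration, then call process_memory_on_gpu(pid, source_uuid) afterwards and confirm that it returns 0.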