Installation Guide¶
This guide covers all installation methods and requirements for Flexium.AI.
System Requirements¶
Hardware¶
| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA GPU with CUDA support | NVIDIA A100/H100 or consumer RTX 30xx/40xx |
| RAM | 8 GB | 16+ GB |
| Storage | 1 GB for flexium | SSD recommended |
Software¶
| Requirement | Version | Notes |
|---|---|---|
| Operating System | Linux x86_64 | Ubuntu 20.04+, RHEL 8+, Debian 10+ |
| Python | 3.8 - 3.12 | 3.10+ recommended |
| NVIDIA Driver | 580+ | Required for zero-residue migration |
| CUDA | 12.4+ | Required for driver 580+ |
| PyTorch | 2.0+ | With CUDA support |
Driver 580+ Required
Zero-residue migration requires NVIDIA driver version 580 or higher. Earlier drivers do not support the necessary migration features.
Verify Driver Version¶

```bash
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

If your driver is older than 580, you'll need to update:

```bash
# Ubuntu/Debian
sudo apt update
sudo apt install nvidia-driver-580

# Or download from NVIDIA website:
# https://www.nvidia.com/Download/index.aspx
```
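The driver check above reduces to comparing the major component of the version string (e.g. `580.65.06`) against 580. A minimal sketch — the helper name is illustrative and not part of the flexium API; in practice the version string comes from `nvidia-smi` or `pynvml.nvmlSystemGetDriverVersion()`:

```python
# Minimal sketch: check whether an NVIDIA driver version string meets the
# 580 floor required for zero-residue migration. The helper name is
# illustrative, not part of the flexium API.

def driver_supports_migration(version: str, minimum: int = 580) -> bool:
    """Return True if the driver's major version is at least `minimum`."""
    major = int(version.split(".")[0])
    return major >= minimum

print(driver_supports_migration("580.65.06"))  # True
print(driver_supports_migration("550.54.14"))  # False
```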
Installation Methods¶
Method 1: From PyPI (Recommended)¶

```bash
pip install flexium
```
Method 2: From Source¶
```bash
# Clone the repository
git clone https://github.com/flexiumai/flexium.git
cd flexium

# Install in development mode
pip install -e .

# Or install with all extras
pip install -e ".[all]"
```
Method 3: From GitHub Release¶
```bash
pip install https://github.com/flexiumai/flexium/releases/download/v0.1.1/flexium-0.1.1-py3-none-any.whl
```
PyTorch Installation¶
Flexium requires PyTorch with CUDA 12.4+ support. Install PyTorch before installing flexium.
For CUDA 12.4+¶

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
```
For Latest PyTorch¶
Visit pytorch.org/get-started to get the install command for your system. Make sure to select CUDA 12.4 or higher.
Verify PyTorch CUDA¶
```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"
```
Expected output:

```
PyTorch: 2.x.x+cu124
CUDA available: True
CUDA version: 12.4
```
Dependencies¶
Core Dependencies (Auto-installed)¶
| Package | Version | Purpose |
|---|---|---|
| `python-socketio[client]` | >=5.0.0 | WebSocket communication |
| `pynvml` | >=11.0.0 | GPU monitoring |
| `flask` | >=2.0.0 | Web dashboard |
Development Dependencies¶
| Package | Purpose |
|---|---|
| `pytest` | Testing |
| `pytest-cov` | Coverage |
| `mypy` | Type checking |
| `ruff` | Linting |
Environment Setup¶
Option 1: Virtual Environment (Recommended)¶
```bash
# Create virtual environment
python -m venv flexium-env
source flexium-env/bin/activate

# Install PyTorch with CUDA 12.4
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

# Install flexium
pip install flexium
```
Option 2: Conda Environment¶
```bash
# Create conda environment
conda create -n flexium python=3.10
conda activate flexium

# Install PyTorch with CUDA 12.4
conda install pytorch torchvision pytorch-cuda=12.4 -c pytorch -c nvidia

# Install flexium
pip install flexium
```
Option 3: System-wide Installation¶

Not recommended; prefer a virtual environment. If you must install into the system interpreter:

```bash
pip install flexium
```
Configuration¶
Config File (Recommended)¶
Create ~/.flexiumrc:
```yaml
# Server address with workspace
server: app.flexium.ai/myworkspace

# Default device
device: cuda:0

# Heartbeat interval (seconds)
heartbeat_interval: 3.0
```
Environment Variables¶
| Variable | Description | Default |
|---|---|---|
| `FLEXIUM_SERVER` | Server address with workspace (`host:port/workspace`) | None (local mode) |
| `GPU_DEVICE` | Default GPU device | `cuda:0` |
| `FLEXIUM_LOG_LEVEL` | Log level | `INFO` |
| `FLEXIUM_DEBUG` | Enable debug mode | `false` |
Example:
```bash
# Format: host:port/workspace
export FLEXIUM_SERVER="app.flexium.ai/myworkspace"
export GPU_DEVICE=cuda:0
export FLEXIUM_LOG_LEVEL=DEBUG
```
URL Format
The FLEXIUM_SERVER variable uses a token-in-path format: host:port/workspace. This routes your training jobs to the correct workspace orchestrator.
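Splitting a `FLEXIUM_SERVER` value into its parts is straightforward. A minimal sketch — `parse_server` is a hypothetical helper for illustration, not the flexium API:

```python
# Minimal sketch: split a FLEXIUM_SERVER value of the form
# host[:port]/workspace into its parts. Hypothetical helper,
# not part of the flexium API.

def parse_server(value: str) -> dict:
    """Split 'host:port/workspace' into host, port, and workspace."""
    address, _, workspace = value.partition("/")
    host, _, port = address.partition(":")
    return {
        "host": host,
        "port": int(port) if port else None,  # port is optional
        "workspace": workspace or None,
    }

print(parse_server("app.flexium.ai/myworkspace"))
# {'host': 'app.flexium.ai', 'port': None, 'workspace': 'myworkspace'}
```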
Project-Local Config¶
Create .flexiumrc in your project directory (takes precedence over ~/.flexiumrc):

```yaml
# Project-specific overrides (values are examples)
device: cuda:1
heartbeat_interval: 1.0
```
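The precedence rule can be sketched as a simple layered merge, where later layers win. This is illustrative, assuming built-in defaults matching the table above, and is not the flexium implementation:

```python
# Minimal sketch of layered config resolution: built-in defaults are
# overridden by ~/.flexiumrc, which is overridden by a project-local
# .flexiumrc. Illustrative only, not the flexium implementation.

DEFAULTS = {"device": "cuda:0", "heartbeat_interval": 3.0}

def resolve_config(user: dict, project: dict) -> dict:
    """Later layers win: defaults < user config < project config."""
    merged = dict(DEFAULTS)
    merged.update(user)    # values from ~/.flexiumrc
    merged.update(project) # values from ./.flexiumrc
    return merged

user_cfg = {"server": "app.flexium.ai/myworkspace"}
project_cfg = {"heartbeat_interval": 1.0}
print(resolve_config(user_cfg, project_cfg))
```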
Verification¶
Step 1: Check Installation¶
```bash
# Verify flexium is installed
python -c "import flexium; print(f'Flexium version: {flexium.__version__}')"

# Verify module loads
python -c "import flexium.auto; print('OK')"
```
Step 2: Check GPU Access¶
```bash
python -c "
import torch
import pynvml

pynvml.nvmlInit()
device_count = pynvml.nvmlDeviceGetCount()
print(f'GPUs detected: {device_count}')
for i in range(device_count):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    print(f'  GPU {i}: {name}')
pynvml.nvmlShutdown()
"
```
Step 3: Test Server Connection¶
Step 4: Test Training Integration¶
```bash
# Create test script
cat > test_flexium.py << 'EOF'
import flexium.auto
import torch

with flexium.auto.run():
    x = torch.zeros(100, 100).cuda()
    print(f"Tensor on: {x.device}")
    print("Flexium integration working!")
EOF

# Run test
FLEXIUM_SERVER="app.flexium.ai/myworkspace" python test_flexium.py
```
Troubleshooting Installation¶
"CUDA not available"¶
```bash
# Check NVIDIA driver
nvidia-smi

# Check PyTorch CUDA
python -c "import torch; print(torch.cuda.is_available())"
```
Solutions:
1. Install NVIDIA driver: `sudo apt install nvidia-driver-580`
2. Reinstall PyTorch with CUDA 12.4: `pip install torch --index-url https://download.pytorch.org/whl/cu124`
"Module 'flexium' not found"¶

Make sure flexium is installed in the currently active environment:

```bash
# Check which Python is active
which python

# Install into the active environment
pip install flexium
```
"python-socketio installation fails"¶
```bash
# Install build dependencies
sudo apt install build-essential python3-dev

# Then install flexium
pip install flexium
```
"pynvml fails to initialize"¶
This usually means the NVIDIA driver is not loaded:

```bash
# Confirm the driver responds
nvidia-smi

# Check that the kernel module is loaded
lsmod | grep nvidia
```
"Permission denied" errors¶
```bash
# Add user to video group
sudo usermod -aG video $USER

# Log out and back in, or use newgrp
newgrp video
```
Next Steps¶
- Getting Started - Quick start guide
- Architecture - How flexium works
- API Reference - Complete API docs
- Troubleshooting - Common issues