This guide covers installation of Inferno AI on all supported platforms.
Choose your preferred installation method based on your platform and use case.
The easiest way to install Inferno on macOS is with Homebrew:
# Install Inferno via Homebrew
brew install inferno
# Verify installation
inferno --version
# Check GPU support (Apple Silicon)
inferno info --gpu
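Homebrew also handles updates; assuming the formula keeps the same name, you can upgrade Inferno later with:
# Upgrade to the latest release
brew upgrade inferno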
Alternatively, download the pre-built binary for macOS:
# Download the latest release
curl -LO https://github.com/ringo380/inferno/releases/latest/download/inferno-macos-universal.tar.gz
# Extract the archive
tar xzf inferno-macos-universal.tar.gz
# Move to a directory in your PATH
sudo mv inferno /usr/local/bin/
# Make executable
sudo chmod +x /usr/local/bin/inferno
# Verify installation
inferno --version
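If macOS Gatekeeper blocks the downloaded binary as unverified, you can clear the quarantine attribute. This step is an assumption and may not be needed for signed releases; only do it if you trust the download:
# Remove the quarantine attribute macOS sets on downloaded files
xattr -d com.apple.quarantine /usr/local/bin/inferno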
Metal GPU acceleration is automatically enabled on Apple Silicon Macs (M1/M2/M3/M4):
# Verify Metal support
inferno info --gpu
# Run with GPU acceleration (automatic)
inferno run --model llama-2-7b-chat --prompt "Hello!"
# Download the .deb package
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-linux-amd64.deb
# Install the package
sudo dpkg -i inferno-linux-amd64.deb
# Fix any dependency issues
sudo apt-get install -f
# Verify installation
inferno --version
# Download the binary
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-linux-x86_64.tar.gz
# Extract
tar xzf inferno-linux-x86_64.tar.gz
# Move to system path
sudo mv inferno /usr/local/bin/
# Make executable
sudo chmod +x /usr/local/bin/inferno
# Verify installation
inferno --version
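If you want Inferno to run as a background service on Linux, a minimal systemd unit is one option. The sketch below is an assumption, not an official unit file: the serve --port 8080 command mirrors the Docker examples later in this guide, and the paths assume the install location above.
# /etc/systemd/system/inferno.service (hypothetical unit file)
[Unit]
Description=Inferno AI inference server
After=network.target

[Service]
ExecStart=/usr/local/bin/inferno serve --port 8080
Restart=on-failure
# Consider running as a dedicated, unprivileged user:
# User=inferno

[Install]
WantedBy=multi-user.target
# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable --now inferno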
To use NVIDIA GPUs with CUDA:
# Install CUDA Toolkit (version 11.x or 12.x)
# Follow instructions at: https://developer.nvidia.com/cuda-downloads
# Verify CUDA installation
nvidia-smi
# Inferno will automatically detect and use CUDA
inferno info --gpu
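nvidia-smi confirms the driver is working; if you installed the full CUDA Toolkit, you can also check the compiler version (assuming nvcc is on your PATH):
# Check the installed CUDA Toolkit version
nvcc --version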
To use AMD GPUs with ROCm:
# Install ROCm (Ubuntu/Debian)
# Follow instructions at: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
# Verify ROCm installation
rocm-smi
# Inferno will automatically detect and use ROCm
inferno info --gpu
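Similarly, rocminfo (installed with ROCm) lists the detected GPU agents if you want more detail than rocm-smi:
# List ROCm GPU agents and their gfx targets
rocminfo | grep -i gfx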
# Install using winget (Windows 10+)
winget install Inferno.InfernoAI
# Verify installation
inferno --version
Alternatively, download the inferno-windows-x64.msi installer from the releases page, run it, and then verify the installation from a new terminal:
inferno --version
# Download the Windows binary
Invoke-WebRequest -Uri "https://github.com/ringo380/inferno/releases/latest/download/inferno-windows-x64.zip" -OutFile "inferno.zip"
# Extract the archive
Expand-Archive -Path "inferno.zip" -DestinationPath "C:\Program Files\Inferno"
# Add to PATH (run as Administrator)
$machinePath = [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("Path", "$machinePath;C:\Program Files\Inferno", [EnvironmentVariableTarget]::Machine)
# Also update the current session so the command works immediately
$env:Path += ";C:\Program Files\Inferno"
# Verify installation
inferno --version
# Download and install CUDA Toolkit from NVIDIA
# https://developer.nvidia.com/cuda-downloads
# Verify installation
nvidia-smi
# Inferno will automatically detect CUDA
inferno info --gpu
Docker is the recommended method for consistent deployment across platforms.
# Pull the latest Inferno image
docker pull ringo380/inferno:latest
# Verify installation
docker run --rm ringo380/inferno:latest inferno --version
# Create directories for models and cache
mkdir -p ~/inferno-data/models
mkdir -p ~/inferno-data/cache
# Run with volume mounts
docker run -d \
--name inferno \
-p 8080:8080 \
-v ~/inferno-data/models:/data/models \
-v ~/inferno-data/cache:/data/cache \
ringo380/inferno:latest serve --port 8080
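Once the container is up, you can confirm the server is answering on the mapped port; this uses the same /v1/models route shown in the verification section below:
# Check that the API is reachable from the host
curl http://localhost:8080/v1/models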
NVIDIA GPU:
# Install NVIDIA Container Toolkit
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
# Run with GPU support
docker run -d \
--name inferno \
--gpus all \
-p 8080:8080 \
-v ~/inferno-data/models:/data/models \
ringo380/inferno:latest serve --port 8080
AMD GPU:
# Run with ROCm support
docker run -d \
--name inferno \
--device=/dev/kfd \
--device=/dev/dri \
-p 8080:8080 \
-v ~/inferno-data/models:/data/models \
ringo380/inferno:latest serve --port 8080
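For either GPU variant, you can check that the device is actually visible inside the container (assuming the container name inferno used above):
# Inspect GPU detection from inside the running container
docker exec inferno inferno info --gpu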
Create a docker-compose.yml file:
version: '3.8'

services:
  inferno:
    image: ringo380/inferno:latest
    container_name: inferno
    ports:
      - "8080:8080"
    volumes:
      - ./inferno-data/models:/data/models
      - ./inferno-data/cache:/data/cache
    command: serve --port 8080
    restart: unless-stopped
    # For NVIDIA GPU support:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]
Run with:
docker-compose up -d
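To confirm the Compose service started cleanly:
# Show service status
docker-compose ps
# Follow the server logs
docker-compose logs -f inferno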
For developers or advanced users who want to build from source:
# Install Rust (all platforms)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Source the Rust environment
source $HOME/.cargo/env
# Verify Rust installation
rustc --version
cargo --version
# Clone the repository
git clone https://github.com/ringo380/inferno.git
cd inferno
# Build in release mode
cargo build --release
# The binary will be at: target/release/inferno
# Move to system path
sudo cp target/release/inferno /usr/local/bin/
# Verify installation
inferno --version
# Build with CUDA support only
cargo build --release --features cuda
# Build with ROCm support only
cargo build --release --features rocm
# Build with all GPU backends
cargo build --release --features cuda,rocm,metal
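After a feature build, the same GPU check used elsewhere in this guide shows whether the expected backend was compiled in:
# Run the freshly built binary directly from the target directory
./target/release/inferno info --gpu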
After installation, verify everything is working:
# Check version
inferno --version
# View system information
inferno info
# Check GPU support
inferno info --gpu
# List available commands
inferno --help
Expected output from inferno info:
Inferno v0.7.0
Platform: darwin-arm64
GPU: Metal (Apple M3 Pro)
# Download a small model to test
inferno models download llama-2-7b-chat
# Test with a simple prompt
inferno run --model llama-2-7b-chat --prompt "Hello, world!"
# Start the server
inferno serve --port 8080
# Test the API
curl http://localhost:8080/v1/models
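The /v1/models route suggests an OpenAI-compatible API; if that convention holds (an assumption here, not confirmed by this guide), a chat request might look like:
# Hypothetical chat completion request (OpenAI-style endpoint assumed)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'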
If you get “command not found” after installation:
# Check if binary is in PATH
which inferno
# If not found, add to PATH (Linux/macOS)
export PATH="$PATH:/usr/local/bin"
# Make permanent by adding to ~/.bashrc or ~/.zshrc
echo 'export PATH="$PATH:/usr/local/bin"' >> ~/.bashrc
source ~/.bashrc
# Make sure the binary is executable
sudo chmod +x /usr/local/bin/inferno
If GPU acceleration is not detected:
# Check GPU information
inferno info --gpu
# For NVIDIA, verify CUDA installation
nvidia-smi
# For AMD, verify ROCm installation
rocm-smi
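If nvidia-smi or rocm-smi is itself missing or reports no devices, confirm the GPU is visible to the operating system before reinstalling drivers:
# List GPUs known to the OS (Linux)
lspci | grep -iE 'vga|3d|display'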
For more troubleshooting help, see the Troubleshooting Guide.
Ready to run your first inference? Head to the Quick Start Guide →