Installation Guide

This guide covers installation of Inferno AI on all supported platforms.

Installation Methods

Choose your preferred installation method based on your platform and use case.


macOS Installation

Method 1: Homebrew

The easiest way to install Inferno on macOS is with Homebrew:

# Install Inferno via Homebrew
brew install inferno
 
# Verify installation
inferno --version
 
# Check GPU support (Apple Silicon)
inferno info --gpu
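
Homebrew also handles upgrades, so keeping Inferno current later is a one-liner:

# Upgrade Inferno to the newest release
brew upgrade inferno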

Method 2: Binary Download

Download the pre-built binary for macOS:

# Download the latest release
curl -LO https://github.com/ringo380/inferno/releases/latest/download/inferno-macos-universal.tar.gz
 
# Extract the archive
tar xzf inferno-macos-universal.tar.gz
 
# Move to a directory in your PATH
sudo mv inferno /usr/local/bin/
 
# Make executable
sudo chmod +x /usr/local/bin/inferno
 
# Verify installation
inferno --version
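
If macOS Gatekeeper refuses to run the downloaded binary ("cannot be opened because the developer cannot be verified"), clearing the quarantine attribute that macOS sets on downloaded files usually resolves it:

# Remove the quarantine attribute from the downloaded binary
sudo xattr -d com.apple.quarantine /usr/local/bin/inferno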

Apple Silicon GPU Support

Metal GPU acceleration is automatically enabled on Apple Silicon Macs (M1/M2/M3/M4):

# Verify Metal support
inferno info --gpu
 
# Run with GPU acceleration (automatic)
inferno run --model llama-2-7b-chat --prompt "Hello!"

Linux Installation

Method 1: Package Manager (Ubuntu/Debian)

# Download the .deb package
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-linux-amd64.deb
 
# Install the package
sudo dpkg -i inferno-linux-amd64.deb
 
# Fix any dependency issues
sudo apt-get install -f
 
# Verify installation
inferno --version
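
To remove Inferno later, uninstall it through the package manager (this assumes the package is registered as inferno; check with dpkg -l if unsure):

# Uninstall the package
sudo apt-get remove inferno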

Method 2: Binary Installation

# Download the binary
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-linux-x86_64.tar.gz
 
# Extract
tar xzf inferno-linux-x86_64.tar.gz
 
# Move to system path
sudo mv inferno /usr/local/bin/
 
# Make executable
sudo chmod +x /usr/local/bin/inferno
 
# Verify installation
inferno --version
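
If you don't have root access, a per-user install works just as well; ~/.local/bin is on the default PATH in most modern distributions (verify with echo $PATH):

# Alternative: install for the current user only, no sudo needed
mkdir -p ~/.local/bin
mv inferno ~/.local/bin/
chmod +x ~/.local/bin/inferno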

NVIDIA GPU Support (CUDA)

To use NVIDIA GPUs with CUDA:

# Install CUDA Toolkit (version 11.x or 12.x)
# Follow instructions at: https://developer.nvidia.com/cuda-downloads
 
# Verify CUDA installation
nvidia-smi
 
# Inferno will automatically detect and use CUDA
inferno info --gpu
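
On multi-GPU machines you can pin the process to a specific device with CUDA_VISIBLE_DEVICES, which the CUDA runtime honors for any CUDA application (whether Inferno also exposes its own device-selection flag is not covered here):

# Restrict Inferno to the first GPU only
CUDA_VISIBLE_DEVICES=0 inferno info --gpu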

AMD GPU Support (ROCm)

To use AMD GPUs with ROCm:

# Install ROCm (Ubuntu/Debian)
# Follow instructions at: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
 
# Verify ROCm installation
rocm-smi
 
# Inferno will automatically detect and use ROCm
inferno info --gpu
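
ROCm offers an analogous environment variable; assuming Inferno's ROCm backend goes through HIP, HIP_VISIBLE_DEVICES controls which GPUs the process can see:

# Restrict Inferno to the first AMD GPU only
HIP_VISIBLE_DEVICES=0 inferno info --gpu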

Windows Installation

Method 1: Windows Package Manager (winget)

# Install using winget (Windows 10+)
winget install Inferno.InfernoAI
 
# Verify installation
inferno --version
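
winget can also upgrade the package in place later:

# Upgrade to the newest release
winget upgrade Inferno.InfernoAI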

Method 2: MSI Installer

  1. Download the MSI installer from the GitHub Releases page.
  2. Run the installer: inferno-windows-x64.msi
  3. Follow the installation wizard.
  4. Open a new terminal and verify:

inferno --version

Method 3: Binary Download

# Download the Windows binary
Invoke-WebRequest -Uri "https://github.com/ringo380/inferno/releases/latest/download/inferno-windows-x64.zip" -OutFile "inferno.zip"
 
# Extract the archive
Expand-Archive -Path "inferno.zip" -DestinationPath "C:\Program Files\Inferno"
 
# Add to PATH for the current session
$env:Path += ";C:\Program Files\Inferno"

# Persist machine-wide (run as Administrator). Append to the existing machine
# PATH rather than writing the combined session PATH back, which would copy
# user-level entries into the machine-level value.
$machinePath = [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("Path", "$machinePath;C:\Program Files\Inferno", [EnvironmentVariableTarget]::Machine)
 
# Verify installation
inferno --version

NVIDIA GPU Support (Windows)

# Download and install CUDA Toolkit from NVIDIA
# https://developer.nvidia.com/cuda-downloads
 
# Verify installation
nvidia-smi
 
# Inferno will automatically detect CUDA
inferno info --gpu

Docker Installation

Docker is the recommended method for consistent deployment across platforms.

Basic Docker Setup

# Pull the latest Inferno image
docker pull ringo380/inferno:latest
 
# Verify installation
docker run --rm ringo380/inferno:latest inferno --version

Run with Persistent Storage

# Create directories for models and cache
mkdir -p ~/inferno-data/models
mkdir -p ~/inferno-data/cache
 
# Run with volume mounts
docker run -d \
  --name inferno \
  -p 8080:8080 \
  -v ~/inferno-data/models:/data/models \
  -v ~/inferno-data/cache:/data/cache \
  ringo380/inferno:latest serve --port 8080
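
Since the container runs detached (-d), it's worth confirming it started cleanly and watching the startup logs:

# Confirm the container is running
docker ps --filter name=inferno

# Follow the server logs (Ctrl+C to stop following)
docker logs -f inferno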

Docker with GPU Support

NVIDIA GPU:

# Install NVIDIA Container Toolkit
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
 
# Run with GPU support
docker run -d \
  --name inferno \
  --gpus all \
  -p 8080:8080 \
  -v ~/inferno-data/models:/data/models \
  ringo380/inferno:latest serve --port 8080
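
If the container starts but no GPU shows up, first confirm that Docker itself can reach the GPU, independent of Inferno (the CUDA base image tag below is only an example; any locally available tag works):

# Sanity check: run nvidia-smi inside a throwaway CUDA container
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi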

AMD GPU:

# Run with ROCm support
docker run -d \
  --name inferno \
  --device=/dev/kfd \
  --device=/dev/dri \
  -p 8080:8080 \
  -v ~/inferno-data/models:/data/models \
  ringo380/inferno:latest serve --port 8080

Docker Compose

Create a docker-compose.yml file:

version: '3.8'
 
services:
  inferno:
    image: ringo380/inferno:latest
    container_name: inferno
    ports:
      - "8080:8080"
    volumes:
      - ./inferno-data/models:/data/models
      - ./inferno-data/cache:/data/cache
    command: serve --port 8080
    restart: unless-stopped
    # For NVIDIA GPU support:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

Run with:

docker-compose up -d
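
Then confirm the service came up and responds:

# Check service status and tail the logs
docker-compose ps
docker-compose logs -f inferno

# Hit the API once the server is up
curl http://localhost:8080/v1/models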

Build from Source

For developers or advanced users who want to build from source:

Prerequisites

# Install Rust (all platforms)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 
# Source the Rust environment
source $HOME/.cargo/env
 
# Verify Rust installation
rustc --version
cargo --version

Build Process

# Clone the repository
git clone https://github.com/ringo380/inferno.git
cd inferno
 
# Build in release mode
cargo build --release
 
# The binary will be at: target/release/inferno
# Move to system path
sudo cp target/release/inferno /usr/local/bin/
 
# Verify installation
inferno --version

Build with Specific Features

# Build with CUDA support only
cargo build --release --features cuda
 
# Build with ROCm support only
cargo build --release --features rocm
 
# Build with all GPU backends
cargo build --release --features cuda,rocm,metal
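
As an alternative to copying the binary into /usr/local/bin manually, cargo can build and install it into ~/.cargo/bin, which rustup already puts on your PATH:

# Build and install in one step
cargo install --path .

# With GPU features enabled
cargo install --path . --features cuda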

Verify Installation

After installation, verify everything is working:

# Check version
inferno --version
 
# View system information
inferno info
 
# Check GPU support
inferno info --gpu
 
# List available commands
inferno --help

Expected output:

Inferno v0.7.0
Platform: darwin-arm64
GPU: Metal (Apple M3 Pro)

Post-Installation Steps

1. Download Your First Model

# Download a small model to test
inferno models download llama-2-7b-chat
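
To confirm the download completed, list the locally available models (this assumes a list subcommand alongside download; run inferno models --help to see what's actually available):

# List downloaded models
inferno models list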

2. Run a Test Inference

# Test with a simple prompt
inferno run --model llama-2-7b-chat --prompt "Hello, world!"

3. Start the API Server

# Start the server
inferno serve --port 8080
 
# Test the API
curl http://localhost:8080/v1/models
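
If the server exposes an OpenAI-compatible completions endpoint (an assumption here; the path and payload below are illustrative, so check the API reference for the real schema), a first inference request over HTTP might look like:

# Illustrative chat request; endpoint and fields assume OpenAI compatibility
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-2-7b-chat", "messages": [{"role": "user", "content": "Hello!"}]}'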

Troubleshooting

Command Not Found

If you get “command not found” after installation:

# Check if binary is in PATH
which inferno
 
# If not found, add to PATH (Linux/macOS)
export PATH="$PATH:/usr/local/bin"
 
# Make permanent by adding to ~/.bashrc or ~/.zshrc
echo 'export PATH="$PATH:/usr/local/bin"' >> ~/.bashrc
source ~/.bashrc

Permission Denied

# Make sure the binary is executable
sudo chmod +x /usr/local/bin/inferno

GPU Not Detected

# Check GPU information
inferno info --gpu
 
# For NVIDIA, verify CUDA installation
nvidia-smi
 
# For AMD, verify ROCm installation
rocm-smi

For more troubleshooting help, see the Troubleshooting Guide.


Next Steps

Ready to run your first inference? Head to the Quick Start Guide.