Getting Started

This guide will help you install Inferno and run your first AI inference.

Installation

Choose the installation method for your platform:

macOS

The easiest way to install on macOS is via Homebrew:

brew install inferno
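Homebrew also manages updates, so moving to a newer release later uses the standard upgrade command:

brew upgrade inferno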

Alternatively, download the desktop app from the downloads page.

Linux

Download the latest binary from GitHub Releases:

# Download and extract (replace VERSION with the latest release tag)
curl -LO https://github.com/ringo380/inferno/releases/download/VERSION/inferno-linux-x64.tar.gz
tar -xzf inferno-linux-x64.tar.gz
 
# Move to your PATH
sudo mv inferno /usr/local/bin/

Or install via Cargo:

cargo install inferno
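Installing via Cargo builds Inferno from source and therefore needs a Rust toolchain. If you don't have one, rustup is the standard way to install it:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh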

Windows

Download the installer from GitHub Releases or install via Cargo:

cargo install inferno

Docker

Pull the image from GitHub Container Registry:

docker pull ghcr.io/ringo380/inferno:latest
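You can then run Inferno directly from the container. A minimal sketch, assuming the image's entrypoint is the inferno binary (so arguments are passed straight through to it) and using the serve command covered later in this guide; the port mapping is illustrative:

docker run --rm -p 8080:8080 ghcr.io/ringo380/inferno:latest serve --port 8080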

Verify Installation

Check that Inferno is installed correctly:

inferno --version
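This should print the installed version string. If your shell reports that the command is not found, confirm the binary's location is on your PATH:

which inferno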

Your First Inference

Let’s run a simple inference to make sure everything works:

inferno run --model llama2 --prompt "Hello, world!"

This command will:

  1. Check whether the llama2 model is available locally
  2. Download it if it isn't
  3. Run the inference
  4. Display the result

List Available Models

See what models are available:

inferno models list

Starting the API Server

To use Inferno with OpenAI-compatible clients, start the API server:

inferno serve --port 8080

The server will start at http://localhost:8080 with OpenAI-compatible endpoints.
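Because the endpoints follow the OpenAI API shape, most OpenAI SDKs and clients can be pointed at the server by overriding their base URL. For example, the official OpenAI SDKs read the OPENAI_BASE_URL environment variable (whether Inferno expects an API key is not documented here; local servers often accept any placeholder value):

export OPENAI_BASE_URL=http://localhost:8080/v1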

Test the API

Once the server is running, test it with curl:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
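Assuming the response follows the standard OpenAI chat completions schema (a reasonable expectation given the compatibility claim above, but not verified here), you can extract just the generated text with jq:

curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' | jq -r '.choices[0].message.content'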

Next Steps