
Quick Start Tutorial

Get Inferno up and running in under 10 minutes with this hands-on tutorial.

What You’ll Build

By the end of this tutorial, you’ll have:

- Inferno installed with a local model downloaded
- An OpenAI-compatible API server running on your machine
- A working Python application talking to your local model

Time Required: ~10 minutes


Prerequisites

- macOS or Linux with Homebrew installed
- About 5 GB of free disk space (the model download is ~4 GB)
- Python 3 and pip (for Step 6)
- An internet connection

Step 1: Install Inferno (2 minutes)

Install Inferno with Homebrew (macOS or Linux):

brew install inferno
inferno --version

Expected output: Inferno v0.7.0


Step 2: Download a Model (3 minutes)

Download Llama 2 7B Chat (~4GB):

inferno models download llama-2-7b-chat

This will download the model to ~/.local/share/inferno/models/.

Verify download:

inferno models list

Step 3: Run Your First Inference (1 minute)

Test the model with a simple prompt:

inferno run \
  --model llama-2-7b-chat \
  --prompt "Explain artificial intelligence in one sentence"

Example output (the exact wording will vary between runs):

Artificial intelligence is the development of computer systems
that can perform tasks that typically require human intelligence.

Step 4: Start the API Server (1 minute)

Start the OpenAI-compatible API server:

inferno serve --port 8080

Expected output:

Inferno API Server v0.7.0
Listening on http://127.0.0.1:8080
Ready to accept requests!

Leave this server running; the next steps use a second terminal.


Step 5: Test the API (1 minute)

In a new terminal, test the API with curl:

# List models
curl http://localhost:8080/v1/models
 
# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
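The same chat completion can be issued from Python without any SDK. Here is a minimal sketch using only the standard library; the helper names are ours, and it assumes the server from Step 4 is listening on port 8080:

```python
import json
import urllib.request

INFERNO_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model, user_message):
    """Build the JSON body for an OpenAI-style chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def post_chat(url, body):
    """POST the request and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (with the server from Step 4 running):
#   body = build_chat_request("llama-2-7b-chat", "Hello!")
#   reply = post_chat(INFERNO_URL, body)
#   print(reply["choices"][0]["message"]["content"])
```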

Step 6: Build a Python App (2 minutes)

Install the OpenAI SDK:

pip install openai

Create app.py:

from openai import OpenAI
 
# Connect to Inferno
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)
 
# Simple chat
response = client.chat.completions.create(
    model="llama-2-7b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Python?"}
    ]
)
 
print(response.choices[0].message.content)

Run it:

python app.py

Success! You’ve built a working AI-powered application.


Next Steps

Enhance Your Application

  1. Add Streaming:
stream = client.chat.completions.create(
    model="llama-2-7b-chat",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)
 
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
  2. Try Different Models:
inferno models download mistral-7b-instruct-v0.2
inferno models download codellama-7b
  3. Optimize Performance:
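The streaming loop above can be factored into a small helper that prints chunks as they arrive and also returns the assembled reply (the helper name is ours, not part of any SDK):

```python
def collect_stream(stream):
    """Print streamed chunks as they arrive and return the full reply."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:                      # skip empty keep-alive chunks
            print(delta, end="", flush=True)
            parts.append(delta)
    print()                            # final newline once the stream ends
    return "".join(parts)

# Usage:
#   reply = collect_stream(client.chat.completions.create(..., stream=True))
```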

Continue Learning


Troubleshooting

Model download fails:

# Check internet connection
ping huggingface.co
 
# Try alternative source
inferno models download llama-2-7b-chat --source alternative

Server won’t start:

# Check if port is in use
lsof -i :8080
 
# Use different port
inferno serve --port 8081
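If you would rather not hunt for an open port by hand, you can ask the OS for a free one and pass it to inferno serve. A small sketch (the helper is ours, not part of Inferno):

```python
import socket

def find_free_port():
    """Bind to port 0 so the OS assigns an unused TCP port, then return it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

print(f"inferno serve --port {find_free_port()}")
```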

Import errors in Python:

# Make sure OpenAI SDK is installed
pip install --upgrade openai
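Import errors usually mean the SDK was installed for a different interpreter than the one running your script. A quick way to check from Python itself (the helper name is ours):

```python
import importlib.util
import sys

def is_installed(package_name):
    """Return True if the package is importable by this interpreter."""
    return importlib.util.find_spec(package_name) is not None

print(sys.executable)  # which Python is actually running
if not is_installed("openai"):
    print("openai not found here - run: pip install --upgrade openai")
```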

For more help, see the Troubleshooting Guide.


Summary

Congratulations! In just 10 minutes, you’ve:

- Installed Inferno and downloaded a local model
- Run your first inference from the command line
- Started an OpenAI-compatible API server
- Built and tested a Python application against it

You’re now ready to build AI-powered applications with Inferno!