
Quick Start Tutorial

Get Inferno up and running in under 10 minutes with this hands-on tutorial.

What You’ll Build

By the end of this tutorial, you’ll have:

- Inferno installed with a local model downloaded
- An OpenAI-compatible API server running on your machine
- A working Python application talking to your local model

Time Required: ~10 minutes


Prerequisites

- macOS or Linux with Homebrew installed
- About 5 GB of free disk space (the model download is ~4 GB)
- Python 3 and pip (for Step 6)
- An internet connection

Step 1: Install Inferno (2 minutes)

Install Inferno with Homebrew (macOS or Linux):

brew install inferno
inferno --version

Expected output: Inferno v0.7.0


Step 2: Download a Model (3 minutes)

Download Llama 2 7B Chat (~4GB):

inferno models download llama-2-7b-chat

This will download the model to ~/.local/share/inferno/models/.

Verify download:

inferno models list

Step 3: Run Your First Inference (1 minute)

Test the model with a simple prompt:

inferno run \
  --model llama-2-7b-chat \
  --prompt "Explain artificial intelligence in one sentence"

Example output (the exact wording will vary between runs):

Artificial intelligence is the development of computer systems
that can perform tasks that typically require human intelligence.

Step 4: Start the API Server (1 minute)

Start the OpenAI-compatible API server:

inferno serve --port 8080

Expected output:

Inferno API Server v0.7.0
Listening on http://127.0.0.1:8080
Ready to accept requests!

Leave this server running; the next steps use a second terminal.


Step 5: Test the API (1 minute)

In a new terminal, test the API with curl:

# List models
curl http://localhost:8080/v1/models
 
# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
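The same chat completion can be issued from Python without any SDK. Here is a minimal sketch using only the standard library; the helper names are ours, and it assumes the server from Step 4 is listening on port 8080:

```python
import json
import urllib.request

INFERNO_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model, user_message):
    """Build the JSON body for an OpenAI-style chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def post_chat(url, body):
    """POST the request and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (with the server from Step 4 running):
#   body = build_chat_request("llama-2-7b-chat", "Hello!")
#   reply = post_chat(INFERNO_URL, body)
#   print(reply["choices"][0]["message"]["content"])
```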

Step 6: Build a Python App (2 minutes)

Install the OpenAI SDK:

pip install openai

Create app.py:

from openai import OpenAI
 
# Connect to Inferno
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)
 
# Simple chat
response = client.chat.completions.create(
    model="llama-2-7b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Python?"}
    ]
)
 
print(response.choices[0].message.content)

Run it:

python app.py

Success! You’ve built a working AI-powered application.


Next Steps

Enhance Your Application

  1. Add Streaming:
stream = client.chat.completions.create(
    model="llama-2-7b-chat",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)
 
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
  2. Try Different Models:
inferno models download mistral-7b-instruct-v0.2
inferno models download codellama-7b
  3. Optimize Performance:
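The streaming loop above can be factored into a small helper that prints chunks as they arrive and also returns the assembled reply (the helper name is ours, not part of any SDK):

```python
def collect_stream(stream):
    """Print streamed chunks as they arrive and return the full reply."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:                      # skip empty keep-alive chunks
            print(delta, end="", flush=True)
            parts.append(delta)
    print()                            # final newline once the stream ends
    return "".join(parts)

# Usage:
#   reply = collect_stream(client.chat.completions.create(..., stream=True))
```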

Continue Learning


Troubleshooting

Model download fails:

# Check internet connection
ping huggingface.co
 
# Try alternative source
inferno models download llama-2-7b-chat --source alternative

Server won’t start:

# Check if port is in use
lsof -i :8080
 
# Use different port
inferno serve --port 8081
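If you would rather not hunt for an open port by hand, you can ask the OS for a free one and pass it to inferno serve. A small sketch (the helper is ours, not part of Inferno):

```python
import socket

def find_free_port():
    """Bind to port 0 so the OS assigns an unused TCP port, then return it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

print(f"inferno serve --port {find_free_port()}")
```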

Import errors in Python:

# Make sure OpenAI SDK is installed
pip install --upgrade openai
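Import errors usually mean the SDK was installed for a different interpreter than the one running your script. A quick way to check from Python itself (the helper name is ours):

```python
import importlib.util
import sys

def is_installed(package_name):
    """Return True if the package is importable by this interpreter."""
    return importlib.util.find_spec(package_name) is not None

print(sys.executable)  # which Python is actually running
if not is_installed("openai"):
    print("openai not found here - run: pip install --upgrade openai")
```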

For more help, see the Troubleshooting Guide.


Summary

Congratulations! In just 10 minutes, you’ve:

- Installed Inferno and downloaded a local model
- Run your first inference from the command line
- Started an OpenAI-compatible API server
- Built and tested a Python application against it

You’re now ready to build AI-powered applications with Inferno!