Get Inferno up and running in under 10 minutes with this hands-on tutorial.
By the end of this tutorial, you’ll have:
- Inferno installed and verified
- A local model downloaded and tested
- An OpenAI-compatible API server running
- A working Python application that calls it
Time Required: ~10 minutes
Choose your platform. With Homebrew (macOS or Linux):
brew install inferno
inferno --version
Expected output: Inferno v0.7.0
Download Llama 2 7B Chat (~4GB):
inferno models download llama-2-7b-chat
This will download the model to ~/.local/share/inferno/models/.
Verify download:
inferno models list
Test the model with a simple prompt:
inferno run \
  --model llama-2-7b-chat \
  --prompt "Explain artificial intelligence in one sentence"
Expected output:
Artificial intelligence is the development of computer systems
that can perform tasks that typically require human intelligence.
Start the OpenAI-compatible API server:
inferno serve --port 8080
Expected output:
Inferno API Server v0.7.0
Listening on http://127.0.0.1:8080
Ready to accept requests!
Leave this terminal running; you’ll use a second terminal for the next steps.
In a new terminal, test the API with curl:
# List models
curl http://localhost:8080/v1/models
# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
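The same request can be sent without curl using only Python’s standard library. This is a minimal sketch: `build_chat_request` and `send_chat` are illustrative helpers (not part of Inferno), and it assumes the server from the previous step is listening on localhost:8080.

```python
import json
import urllib.request

def build_chat_request(model, user_message):
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def send_chat(payload, base_url="http://localhost:8080/v1"):
    """POST the payload to the chat completions endpoint, return decoded JSON."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the server running, you could then do:
#   reply = send_chat(build_chat_request("llama-2-7b-chat", "Hello!"))
#   print(reply["choices"][0]["message"]["content"])
```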
Install the OpenAI SDK:
pip install openai
Create app.py:
from openai import OpenAI

# Connect to Inferno
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

# Simple chat
response = client.chat.completions.create(
    model="llama-2-7b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Python?"}
    ]
)

print(response.choices[0].message.content)
Run it:
python app.py
Success! You’ve built a working AI-powered application.
To watch the response arrive token by token, enable streaming:
stream = client.chat.completions.create(
    model="llama-2-7b-chat",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # finish with a newline
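For the curious: OpenAI-compatible servers conventionally stream over server-sent events, one `data: {json}` line per chunk, ending with `data: [DONE]`. Whether Inferno uses exactly this wire format is an assumption here; the SDK hides it either way, but a parser for that shape fits in a few lines:

```python
import json

def parse_sse_line(line):
    """Extract the delta text from one 'data: {...}' stream line.

    Returns None for blank lines, the [DONE] sentinel, or chunks
    that carry no content (e.g. role-only deltas).
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# Example chunk, shaped like the objects the SDK loop above consumes:
sample = 'data: {"choices": [{"delta": {"content": "Hello"}}]}'
print(parse_sse_line(sample))  # -> Hello
```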
Want to try other models? Download them the same way:
inferno models download mistral-7b-instruct-v0.2
inferno models download codellama-7b
Model download fails:
# Check internet connection
ping huggingface.co
# Try alternative source
inferno models download llama-2-7b-chat --source alternative
Server won’t start:
# Check if port is in use
lsof -i :8080
# Use different port
inferno serve --port 8081
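If lsof isn’t available on your system, the same port check can be done from Python’s standard socket module; a small sketch (`port_in_use` is an illustrative helper, not an Inferno command):

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

print(port_in_use(8080))  # True if the Inferno server is running
```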
Import errors in Python:
# Make sure OpenAI SDK is installed
pip install --upgrade openai
For more help, see the Troubleshooting Guide.
Congratulations! In just 10 minutes, you’ve:
- Installed Inferno and verified it works
- Downloaded and run a local model
- Served it through an OpenAI-compatible API
- Built and run a Python application on top of that API
You’re now ready to build AI-powered applications with Inferno!