Complete technical reference for Inferno AI including API endpoints, CLI commands, system architecture, and changelog.
Complete OpenAI-compatible API documentation with request/response examples.
Comprehensive command-line interface documentation for all 45+ commands.
Deep dive into Inferno’s system architecture and design decisions.
Version history, release notes, and migration guides.
# Chat completion
POST /v1/chat/completions
# List models
GET /v1/models
# Embeddings
POST /v1/embeddings
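As a rough illustration of how the three endpoints above map onto HTTP requests, the following Python sketch builds (but does not send) a request for each one using only the standard library. The base URL is an assumption taken from the port used in the serve example in this reference, and the embeddings body follows the OpenAI convention of "model" plus "input", which this page does not spell out. The helper name build_request is hypothetical.

```python
import json
import urllib.request

# Assumed base URL; adjust to wherever the server is listening.
BASE_URL = "http://localhost:8080"

# Method and path for each endpoint listed above.
ENDPOINTS = {
    "chat": ("POST", "/v1/chat/completions"),
    "models": ("GET", "/v1/models"),
    "embeddings": ("POST", "/v1/embeddings"),
}


def build_request(name, body=None, base_url=BASE_URL):
    """Return an unsent urllib Request for one of the endpoints above."""
    method, path = ENDPOINTS[name]
    data = None
    headers = {}
    if body is not None:
        # JSON bodies are sent as UTF-8 bytes with a matching content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


# "MODEL" and "text" are placeholders, as in the CLI quick reference.
# The "input" field is assumed from the OpenAI embeddings convention.
embeddings_request = build_request("embeddings", {"model": "MODEL", "input": "text"})
models_request = build_request("models")
```

With a server running, passing either object to urllib.request.urlopen would send it; the sketch stops short of that so it can be read and run without one.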
# Run inference
inferno run --model MODEL --prompt "text"
# Start server
inferno serve --port 8080
# Manage models
inferno models list
inferno models download MODEL
# Chat completion with an API key
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{...}'
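The header layout in the request above can be reproduced in a client roughly as follows. This is a minimal sketch: the environment variable name INFERNO_API_KEY and the helper auth_headers are hypothetical, chosen only for illustration, and the sketch assumes the Authorization header can simply be left out when no key is configured.

```python
import os


def auth_headers(api_key=None):
    """Build request headers, adding a bearer token only when a key is available."""
    # INFERNO_API_KEY is a hypothetical variable name, not one documented here.
    key = api_key or os.environ.get("INFERNO_API_KEY")
    headers = {"Content-Type": "application/json"}
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers
```

Reading the key from the environment keeps it out of source files and shell history; passing it explicitly is convenient in tests.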
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2-7b-chat",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
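The JSON body in the request above can be built programmatically, and the reply pulled out of the response, along these lines. Because the API is described as OpenAI-compatible, the sketch assumes the response follows the OpenAI chat completion shape (choices, then message, then content); the sample response is illustrative only and was not captured from a real server. The function names are hypothetical.

```python
import json


def chat_payload(model, user_text):
    """Build the request body for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }


def first_reply(response):
    """Extract the assistant text from an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


payload = chat_payload("llama-2-7b-chat", "Hello!")
print(json.dumps(payload, indent=2))

# Illustrative response in the assumed OpenAI shape, not real server output.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hi there!"},
            "finish_reason": "stop",
        }
    ]
}
print(first_reply(sample_response))
```

Multi-turn conversations would append further entries to the messages list, alternating roles, before sending the payload.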
# Version and info
inferno --version
inferno info
# Models
inferno models download llama-2-7b-chat
inferno models list
# Run inference
inferno run --model llama-2-7b-chat --prompt "Hello"
# Start server
inferno serve
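For scripting, the commands above can be driven from Python with the standard subprocess module. The sketch below uses only the subcommands and flags that appear in this reference (run with --model and --prompt, serve with --port). It assumes that the inferno executable is on PATH and that run writes its result to standard output, neither of which is confirmed here. All function names are hypothetical.

```python
import shutil
import subprocess


def run_args(model, prompt):
    """Argument list for `inferno run`, using the flags shown above."""
    return ["inferno", "run", "--model", model, "--prompt", prompt]


def serve_args(port=None):
    """Argument list for `inferno serve`, with an optional --port."""
    args = ["inferno", "serve"]
    if port is not None:
        args += ["--port", str(port)]
    return args


def run_inference(model, prompt):
    """Run a single prompt and return whatever the CLI prints to stdout."""
    if shutil.which("inferno") is None:
        raise FileNotFoundError("inferno executable not found on PATH")
    # Passing a list (not a shell string) avoids quoting problems in the prompt.
    result = subprocess.run(
        run_args(model, prompt), capture_output=True, text=True, check=True
    )
    return result.stdout
```

A call such as run_inference("llama-2-7b-chat", "Hello") mirrors the run example above, and check=True turns a non-zero exit status into a CalledProcessError rather than a silent failure.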