Complete reference for the Inferno command-line interface.
These flags work with all commands:
--help, -h: Show help information
--version, -v: Show version information
--verbose: Enable verbose logging
--quiet: Suppress non-error output
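As a small illustration, the global flags can be mixed with a subcommand's own options (assuming the same placement as the --verbose download example later in this reference):
# Verbose logging while running inference
inferno run --model llama2 --prompt "Explain AI" --verbose
# Suppress everything except errors while listing models
inferno models list --quiet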
inferno run
Run inference with a model.
inferno run --model MODEL_NAME --prompt "Your prompt"
Options:
--model, -m (required): Model name or path
--prompt, -p (required): Input prompt text
--temperature, -t: Sampling temperature (0.0 to 2.0, default: 0.7)
--max-tokens: Maximum tokens to generate (default: 512)
--top-p: Top-p sampling parameter (default: 0.9)
--seed: Random seed for reproducible results
--stream: Stream output tokens as they’re generated
Examples:
# Basic inference
inferno run --model llama2 --prompt "Explain AI"
# With custom parameters
inferno run --model llama2 --prompt "Write a poem" --temperature 1.2 --max-tokens 200
# Streaming output
inferno run --model llama2 --prompt "Tell me a story" --stream
inferno serve
Start the API server.
inferno serve [OPTIONS]
Options:
--port, -p: Port to listen on (default: 8080)
--host: Host address to bind (default: 127.0.0.1)
--workers: Number of worker threads (default: CPU cores)
--cors: Enable CORS with specified origins
--api-key: Require API key authentication
Examples:
# Start server on default port
inferno serve
# Custom port and host
inferno serve --port 3000 --host 0.0.0.0
# With API key auth
inferno serve --api-key your-secret-key
# Enable CORS
inferno serve --cors "*"
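These options can be combined; a sketch of a more locked-down deployment using only the flags above (the origin is a placeholder):
# Bind to all interfaces, use 8 workers, require an API key, and restrict CORS to one origin
inferno serve --host 0.0.0.0 --port 8080 --workers 8 --api-key your-secret-key --cors "https://app.example.com"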
inferno models
Manage AI models.
inferno models list
Lists all available and installed models.
inferno models download MODEL_NAME
Download a model from the registry.
Examples:
# Download a specific model
inferno models download llama2-7b
# Download with progress
inferno models download --verbose llama3-8b
inferno models remove MODEL_NAME
Remove an installed model.
inferno models info MODEL_NAME
Show detailed information about a model.
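Putting the subcommands together, a typical model-management workflow might look like this (llama2-7b is the example model used above):
# Inspect what's available, fetch a model, check its details, and clean up
inferno models list
inferno models download llama2-7b
inferno models info llama2-7b
inferno models remove llama2-7b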
inferno batch
Run batch inference on multiple inputs.
inferno batch --model MODEL --input FILE --output FILE
Options:
--model, -m: Model to use
--input, -i: Input file (JSONL format)
--output, -o: Output file for results
--parallel: Number of parallel inferences (default: 1)
Input Format (JSONL):
{"prompt": "First prompt", "id": "1"}
{"prompt": "Second prompt", "id": "2"}
Example:
inferno batch --model llama2 --input prompts.jsonl --output results.jsonl --parallel 4
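As a sketch, an input file in the format shown above can be written inline before launching the job (the filenames are placeholders):
# Build a small JSONL input file and run a parallel batch job
cat > prompts.jsonl <<'EOF'
{"prompt": "First prompt", "id": "1"}
{"prompt": "Second prompt", "id": "2"}
EOF
inferno batch --model llama2 --input prompts.jsonl --output results.jsonl --parallel 2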
inferno config
Manage configuration.
inferno config [SUBCOMMAND]
Subcommands:
show: Display current configuration
set KEY VALUE: Set a configuration value
get KEY: Get a configuration value
reset: Reset to default configuration
Examples:
# Show config
inferno config show
# Set default model
inferno config set default_model llama2
# Get a value
inferno config get api_port
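Set and get can be paired to change a value and confirm it took effect (9090 is just an illustrative port):
# Change the API port and verify
inferno config set api_port 9090
inferno config get api_port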
Configure Inferno using environment variables:
INFERNO_HOME: Directory for models and config (default: ~/.inferno)
INFERNO_PORT: Default API server port
INFERNO_LOG_LEVEL: Logging level (debug, info, warn, error)
INFERNO_CACHE_DIR: Cache directory for temporary files
Example:
export INFERNO_HOME=/data/inferno
export INFERNO_LOG_LEVEL=debug
inferno serve
Create ~/.inferno/config.toml for persistent configuration:
[server]
port = 8080
host = "127.0.0.1"
workers = 4
[inference]
default_model = "llama2"
temperature = 0.7
max_tokens = 512
[gpu]
enabled = true
device = "auto" # auto, cuda, metal, rocm, intel
Inferno exits with the following status codes:
0: Success
1: General error
2: Invalid arguments
3: Model not found
4: Network error
5: GPU error
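A short sketch of how a script might branch on these codes (the command itself is just an illustration):
# React to the documented exit codes
inferno run --model llama2 --prompt "health check"
status=$?
case "$status" in
  0) echo "ok" ;;
  3) echo "model not found - try: inferno models download llama2" ;;
  4) echo "network error - check connectivity and retry" ;;
  *) echo "inferno failed with exit code $status" ;;
esac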