Complete reference for the Inferno command-line interface.
These flags work with all commands:
--help, -h: Show help information
--version, -v: Show version information
--verbose: Enable verbose logging
--quiet: Suppress non-error output
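As a small illustration, the global flags can be mixed with a subcommand's own options (assuming the same placement as the --verbose download example later in this reference):
# Verbose logging while running inference
inferno run --model llama2 --prompt "Explain AI" --verbose
# Suppress everything except errors while listing models
inferno models list --quiet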
inferno run
Run inference with a model.
inferno run --model MODEL_NAME --prompt "Your prompt"
Options:
--model, -m (required): Model name or path
--prompt, -p (required): Input prompt text
--temperature, -t: Sampling temperature (0.0 to 2.0, default: 0.7)
--max-tokens: Maximum tokens to generate (default: 512)
--top-p: Top-p sampling parameter (default: 0.9)
--seed: Random seed for reproducible results
--stream: Stream output tokens as they’re generated
Examples:
# Basic inference
inferno run --model llama2 --prompt "Explain AI"
# With custom parameters
inferno run --model llama2 --prompt "Write a poem" --temperature 1.2 --max-tokens 200
# Streaming output
inferno run --model llama2 --prompt "Tell me a story" --stream
inferno serve
Start the API server.
inferno serve [OPTIONS]
Options:
--port, -p: Port to listen on (default: 8080)
--host: Host address to bind (default: 127.0.0.1)
--workers: Number of worker threads (default: CPU cores)
--cors: Enable CORS with specified origins
--api-key: Require API key authentication
Examples:
# Start server on default port
inferno serve
# Custom port and host
inferno serve --port 3000 --host 0.0.0.0
# With API key auth
inferno serve --api-key your-secret-key
# Enable CORS
inferno serve --cors "*"
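These options can be combined; a sketch of a more locked-down deployment using only the flags above (the origin is a placeholder):
# Bind to all interfaces, use 8 workers, require an API key, and restrict CORS to one origin
inferno serve --host 0.0.0.0 --port 8080 --workers 8 --api-key your-secret-key --cors "https://app.example.com"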
inferno models
Manage AI models.
inferno models list
Lists all available and installed models.
inferno models download MODEL_NAME
Download a model from the registry.
Examples:
# Download a specific model
inferno models download llama2-7b
# Download with progress
inferno models download --verbose llama3-8b
inferno models remove MODEL_NAME
Remove an installed model.
inferno models info MODEL_NAME
Show detailed information about a model.
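Putting the subcommands together, a typical model-management workflow might look like this (llama2-7b is the example model used above):
# Inspect what's available, fetch a model, check its details, and clean up
inferno models list
inferno models download llama2-7b
inferno models info llama2-7b
inferno models remove llama2-7b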
inferno batch
Run batch inference on multiple inputs.
inferno batch --model MODEL --input FILE --output FILE
Options:
--model, -m: Model to use
--input, -i: Input file (JSONL format)
--output, -o: Output file for results
--parallel: Number of parallel inferences (default: 1)
Input Format (JSONL):
{"prompt": "First prompt", "id": "1"}
{"prompt": "Second prompt", "id": "2"}
Example:
inferno batch --model llama2 --input prompts.jsonl --output results.jsonl --parallel 4
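As a sketch, an input file in the format shown above can be written inline before launching the job (the filenames are placeholders):
# Build a small JSONL input file and run a parallel batch job
cat > prompts.jsonl <<'EOF'
{"prompt": "First prompt", "id": "1"}
{"prompt": "Second prompt", "id": "2"}
EOF
inferno batch --model llama2 --input prompts.jsonl --output results.jsonl --parallel 2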
inferno config
Manage configuration.
inferno config [SUBCOMMAND]
Subcommands:
show: Display current configuration
set KEY VALUE: Set a configuration value
get KEY: Get a configuration value
reset: Reset to default configuration
Examples:
# Show config
inferno config show
# Set default model
inferno config set default_model llama2
# Get a value
inferno config get api_port
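Set and get can be paired to change a value and confirm it took effect (9090 is just an illustrative port):
# Change the API port and verify
inferno config set api_port 9090
inferno config get api_port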
Configure Inferno using environment variables:
INFERNO_HOME: Directory for models and config (default: ~/.inferno)
INFERNO_PORT: Default API server port
INFERNO_LOG_LEVEL: Logging level (debug, info, warn, error)
INFERNO_CACHE_DIR: Cache directory for temporary files
Example:
export INFERNO_HOME=/data/inferno
export INFERNO_LOG_LEVEL=debug
inferno serve
Create ~/.inferno/config.toml for persistent configuration:
[server]
port = 8080
host = "127.0.0.1"
workers = 4
[inference]
default_model = "llama2"
temperature = 0.7
max_tokens = 512
[gpu]
enabled = true
device = "auto" # auto, cuda, metal, rocm, intel
Inferno exits with the following status codes:
0: Success
1: General error
2: Invalid arguments
3: Model not found
4: Network error
5: GPU error
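A short sketch of how a script might branch on these codes (the command itself is just an illustration):
# React to the documented exit codes
inferno run --model llama2 --prompt "health check"
status=$?
case "$status" in
  0) echo "ok" ;;
  3) echo "model not found - try: inferno models download llama2" ;;
  4) echo "network error - check connectivity and retry" ;;
  *) echo "inferno failed with exit code $status" ;;
esac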