Common issues and solutions for Inferno AI.
Before diving into specific issues, run these diagnostic commands:
# Check Inferno version
inferno --version
# View system information
inferno info
# Check GPU status
inferno info --gpu
# Validate configuration
inferno config validate
# View logs
tail -f ~/.local/share/inferno/logs/inferno.log
Problem: inferno: command not found after installation
Solutions:
# 1. Check if binary exists
which inferno
ls -l /usr/local/bin/inferno
# 2. Add to PATH (Linux/macOS)
export PATH="$PATH:/usr/local/bin"
echo 'export PATH="$PATH:/usr/local/bin"' >> ~/.bashrc
source ~/.bashrc
# 3. Make executable
sudo chmod +x /usr/local/bin/inferno
# 4. Reinstall
brew reinstall inferno # macOS
Windows:
# Add to PATH (run as Administrator)
$env:Path += ";C:\Program Files\Inferno"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [EnvironmentVariableTarget]::Machine)
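The Linux/macOS PATH fix above can be guarded so repeated runs do not keep appending duplicate entries to your shell profile. A minimal bash sketch (the /usr/local/bin location is the install path assumed throughout this guide):

```shell
#!/usr/bin/env bash
# Append a directory to PATH only if it is not already present,
# so re-running the fix does not grow PATH or ~/.bashrc.
add_to_path() {
  local dir="$1"
  case ":$PATH:" in
    *":$dir:"*) echo "already on PATH: $dir" ;;
    *) export PATH="$PATH:$dir"
       echo "added: $dir" ;;
  esac
}

add_to_path /usr/local/bin
```

Put the `add_to_path` call in ~/.bashrc instead of a raw `export` line to keep the profile idempotent.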
Problem: Permission denied when running commands
Solution:
# Make binary executable
sudo chmod +x /usr/local/bin/inferno
# Fix ownership
sudo chown -R $USER:$USER ~/.local/share/inferno
# Or run with sudo (not recommended for regular use)
sudo inferno --version
Problem: Error: Address already in use (port 8080)
Solutions:
# 1. Find process using port
lsof -i :8080 # macOS/Linux
netstat -ano | findstr :8080 # Windows
# 2. Kill the process
kill -9 <PID> # macOS/Linux
taskkill /PID <PID> /F # Windows
# 3. Use a different port
inferno serve --port 8081
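Instead of picking an alternative port by hand, you can scan for the first free one. A sketch using bash's /dev/tcp redirection (bash-specific; a connect that succeeds means something is already listening):

```shell
#!/usr/bin/env bash
# Scan upward from a base port and print the first port with no
# listener, suitable for passing to `inferno serve --port`.
find_free_port() {
  local port
  for port in $(seq "${1:-8080}" "${2:-8180}"); do
    # The subshell connect succeeds only if the port is in use.
    if ! (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "$port"
      return 0
    fi
  done
  return 1
}

find_free_port 8080
```

Usage: `inferno serve --port "$(find_free_port 8080)"`.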
Problem: GPU not being used despite being available
Diagnostic:
# Check GPU status
inferno info --gpu
# Verify CUDA (NVIDIA)
nvidia-smi
# Verify ROCm (AMD)
rocm-smi
# Verify Metal (Apple Silicon)
system_profiler SPDisplaysDataType
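The vendor checks above can be wrapped into one probe that reports which backend is likely available. This is only a wrapper around the same CLIs; it prints "none" when no tool is installed or no device responds:

```shell
#!/usr/bin/env bash
# Report which GPU stack the vendor tools can see: cuda, rocm,
# metal (Apple Silicon), or none.
detect_gpu_backend() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo cuda
  elif command -v rocm-smi >/dev/null 2>&1 && rocm-smi >/dev/null 2>&1; then
    echo rocm
  elif [ "$(uname -s)" = Darwin ] && [ "$(uname -m)" = arm64 ]; then
    echo metal
  else
    echo none
  fi
}

detect_gpu_backend
```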
Solutions:
NVIDIA GPU:
# Install/update CUDA toolkit
# https://developer.nvidia.com/cuda-downloads
# Verify CUDA installation
nvcc --version
# Force CUDA backend
inferno serve --gpu-backend cuda
# Check CUDA libraries
ldconfig -p | grep cuda
AMD GPU:
# Install ROCm
# Follow: https://rocmdocs.amd.com/en/latest/
# Verify ROCm
rocm-smi
# Force ROCm backend
inferno serve --gpu-backend rocm
Apple Silicon:
# Metal should work automatically
# If not, update macOS
softwareupdate -l
# Check for Metal support
inferno info --gpu
Problem: CUDA out of memory or similar GPU memory errors
Solutions:
# 1. Use a smaller model
inferno run --model llama-2-7b-chat # Instead of 70B
# 2. Reduce GPU layers
inferno serve --gpu-layers 20 # Instead of -1 (all)
# 3. Enable memory mapping
inferno serve --mmap
# 4. Clear GPU cache
sudo nvidia-smi --gpu-reset # NVIDIA (requires root; GPU must be idle)
# 5. Reduce batch size
inferno serve --batch-size 256 # Lower than default
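A rough way to choose between the options above is to estimate the memory the model weights alone need: roughly parameters × bits-per-weight / 8, with KV cache and activations adding more on top, so treat it as a floor. A quick sketch (awk, since the math needs floats):

```shell
#!/usr/bin/env bash
# Estimate weight memory in GB: params (billions) * bits / 8.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 7 4    # 7B model, 4-bit quantized  -> 3.5
weights_gb 7 16   # 7B model, fp16             -> 14.0
weights_gb 70 4   # 70B model, 4-bit quantized -> 35.0
```

If the estimate exceeds your VRAM, pick a smaller or more aggressively quantized model, or offload fewer layers.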
Problem: GPU inference is slower than expected
Solutions:
# 1. Ensure GPU is being used
inferno info --gpu # Should show GPU backend
# 2. Offload all layers to GPU
inferno serve --gpu-layers -1
# 3. Increase batch size
inferno serve --batch-size 512
# 4. Check GPU utilization
nvidia-smi dmon # NVIDIA
watch -n 1 rocm-smi # AMD
# 5. Update GPU drivers
Problem: Cannot download models
Solutions:
# 1. Check internet connection
ping huggingface.co
# 2. Download manually
wget https://huggingface.co/...model.gguf
inferno run --model-path ./model.gguf
# 3. Check disk space
df -h
# 4. Use alternative source
inferno models download --source huggingface MODEL_NAME
# 5. Retry with verbose output
inferno models download MODEL_NAME --verbose
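For flaky connections, the manual-download route above can be made resumable. A sketch using curl (the URL is a placeholder; `-C -` continues a partial file, `--retry` re-attempts transient failures):

```shell
#!/usr/bin/env bash
# Resumable model download with retries: re-running the command
# picks up where a previous attempt left off.
fetch_model() {
  local url="$1" out="$2"
  curl -fL --retry 3 --retry-delay 2 -C - -o "$out" "$url"
}

# fetch_model "https://huggingface.co/.../model.gguf" ./model.gguf
# inferno run --model-path ./model.gguf
```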
Problem: Error loading model or model fails to load
Solutions:
# 1. Verify model file integrity
inferno models info MODEL_NAME
# 2. Check model format
file /path/to/model.gguf # Should show GGUF format
# 3. Re-download model
inferno models remove MODEL_NAME
inferno models download MODEL_NAME
# 4. Check available memory
free -h # Linux
vm_stat # macOS
systeminfo | findstr Memory # Windows
# 5. Use memory-mapped loading
inferno run --model MODEL --mmap
Problem: Model exists but Inferno can’t find it
Solutions:
# 1. List available models
inferno models list
# 2. Check models directory
ls -la ~/.local/share/inferno/models/
# 3. Use absolute path
inferno run --model-path /full/path/to/model.gguf
# 4. Set models directory
export INFERNO_MODELS_DIR=/path/to/models
inferno serve
# 5. Verify configuration
inferno config show models
Problem: Inference is slower than expected
Solutions:
# 1. Enable GPU acceleration
inferno serve --gpu-layers -1
# 2. Use quantized models
inferno models download llama-2-7b-q4 # 4-bit quantized
# 3. Optimize thread count
inferno serve --threads $(nproc) # Linux; use $(sysctl -n hw.ncpu) on macOS
# 4. Enable memory mapping
inferno serve --mmap
# 5. Increase batch size
inferno serve --batch-size 512
# 6. Preload models
inferno models preload MODEL_NAME
Problem: Inferno uses too much RAM
Solutions:
# 1. Use smaller/quantized models
inferno models download MODEL-q4 # 4-bit quantization
# 2. Enable memory mapping
inferno serve --mmap
# 3. Offload to GPU
inferno serve --gpu-layers -1
# 4. Reduce context size
inferno serve --context-size 2048 # Instead of 4096
# 5. Monitor memory usage
inferno info --memory
Problem: Excessive CPU usage
Solutions:
# 1. Reduce thread count
inferno serve --threads 4
# 2. Offload to GPU
inferno serve --gpu-layers -1
# 3. Lower batch size
inferno serve --batch-size 128
# 4. Check for background processes
top # Linux/macOS
tasklist # Windows
Problem: API server fails to start
Solutions:
# 1. Check if port is in use
lsof -i :8080
# 2. Use different port
inferno serve --port 8081
# 3. Check logs
tail -f ~/.local/share/inferno/logs/inferno.log
# 4. Verify configuration
inferno config validate
# 5. Start with minimal config
inferno serve --host 127.0.0.1 --port 8080
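After starting with a minimal config, a small poll loop confirms the server actually came up rather than eyeballing the logs. A sketch (the /health path matches the endpoint used elsewhere in this guide):

```shell
#!/usr/bin/env bash
# Poll a health endpoint until it answers or the deadline passes.
# Returns 0 once healthy, 1 on timeout.
wait_for_health() {
  local url="$1" deadline=$(( $(date +%s) + ${2:-30} ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      echo "healthy"
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# wait_for_health http://localhost:8080/health 30
```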
Problem: Cannot connect to API server
Solutions:
# 1. Verify server is running
ps aux | grep inferno
# 2. Check if listening on correct interface
netstat -tuln | grep 8080
# 3. Use correct URL
curl http://localhost:8080/health # If this fails, try 127.0.0.1 instead of localhost
# 4. Check firewall
sudo ufw status # Linux
netsh advfirewall show allprofiles # Windows
# 5. Bind to all interfaces
inferno serve --host 0.0.0.0
Problem: API requests timeout
Solutions:
# 1. Increase server timeout
inferno serve --timeout 600 # 10 minutes
# 2. Reduce max_tokens in request
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "...", "max_tokens": 512}'
# 3. Use streaming
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "...", "stream": true}'
# 4. Check system resources
top
nvidia-smi # If using GPU
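For reference, a filled-out version of the abbreviated requests above, assuming the OpenAI-compatible endpoint shown in this guide (the model name is a placeholder for one you have installed). Building the payload separately lets you inspect it before sending:

```shell
#!/usr/bin/env bash
# Complete request body for the chat completions endpoint; printed
# here so it can be checked before the curl call is issued.
PAYLOAD='{
  "model": "llama-2-7b-chat",
  "messages": [{"role": "user", "content": "Say hello"}],
  "max_tokens": 64,
  "stream": false
}'
echo "$PAYLOAD"

# curl -X POST http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```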
Problem: Authentication errors
Solutions:
# 1. Check if auth is enabled
inferno config show security
# 2. Use correct API key
curl -H "Authorization: Bearer YOUR_KEY" ...
# 3. Disable auth for testing
inferno serve --no-auth
# 4. Generate new API key
inferno auth create-key
Problem: Docker container fails to start
Solutions:
# 1. Check container logs
docker logs inferno
# 2. Verify image
docker images | grep inferno
# 3. Check port binding
docker ps -a
# 4. Remove and recreate
docker rm -f inferno
docker run -d --name inferno -p 8080:8080 ringo380/inferno:latest
# 5. Check resource limits
docker stats inferno
Problem: GPU not accessible from container
Solutions:
# 1. Install NVIDIA Container Toolkit
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
# 2. Run with GPU support
docker run --gpus all -p 8080:8080 ringo380/inferno:latest
# 3. Verify GPU in container
docker exec -it inferno nvidia-smi
# 4. Use correct runtime
docker run --runtime=nvidia --gpus all ...
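For a persistent setup, the `docker run --gpus all` invocation above can be expressed declaratively. A Compose sketch using the same image tag; the deploy.resources device reservation is the Compose-spec equivalent of `--gpus all`:

```yaml
# compose.yaml - GPU-enabled service, equivalent to the
# `docker run --gpus all` command above.
services:
  inferno:
    image: ringo380/inferno:latest
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start it with `docker compose up -d`; the NVIDIA Container Toolkit must still be installed on the host.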
Apple Silicon Metal Issues:
# Update macOS
softwareupdate -l
sudo softwareupdate -ia
# Reinstall Xcode Command Line Tools
xcode-select --install
# Check Metal support
system_profiler SPDisplaysDataType | grep Metal
Library Not Found:
# Install missing libraries
sudo apt-get install libgomp1 libopenblas-dev # Debian/Ubuntu
sudo yum install libgomp openblas-devel # CentOS/RHEL
# Update library cache
sudo ldconfig
DLL Missing:
# Install Visual C++ Redistributable
# https://aka.ms/vs/17/release/vc_redist.x64.exe
# Install CUDA (for NVIDIA)
# https://developer.nvidia.com/cuda-downloads
# Verbose logging
inferno serve --log-level debug
# Trace logging (very verbose)
inferno serve --log-level trace
# Log to file
inferno serve --log-file /var/log/inferno/debug.log
# Real-time logs
tail -f ~/.local/share/inferno/logs/inferno.log
# Last 100 lines
tail -n 100 ~/.local/share/inferno/logs/inferno.log
# Search logs
grep "ERROR" ~/.local/share/inferno/logs/inferno.log
If you’re still stuck, include the following when asking for help:
# System information
inferno info
# GPU information
inferno info --gpu
# Configuration
inferno config show
# Recent logs
tail -n 50 ~/.local/share/inferno/logs/inferno.log
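The report items above can be gathered into a single file to attach to an issue. A sketch that guards each command with `command -v` so it still produces a usable report when inferno or the log file is missing:

```shell
#!/usr/bin/env bash
# Collect system info, inferno diagnostics, and recent logs into
# one report file for bug reports.
collect_report() {
  local out="${1:-inferno-report.txt}"
  {
    echo "== system =="
    uname -a
    echo "== inferno =="
    if command -v inferno >/dev/null 2>&1; then
      inferno info
      inferno info --gpu
      inferno config show
    else
      echo "inferno not on PATH"
    fi
    echo "== logs =="
    local log="$HOME/.local/share/inferno/logs/inferno.log"
    if [ -f "$log" ]; then tail -n 50 "$log"; else echo "no log file"; fi
  } > "$out" 2>&1
  echo "wrote $out"
}

collect_report
```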
# Clear model cache
inferno cache clear
# Clear all cache
rm -rf ~/.local/share/inferno/cache/*
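Before deleting the cache wholesale, it can help to see what is actually taking up space. A sketch against the default cache location used in this guide:

```shell
#!/usr/bin/env bash
# Show per-subdirectory cache usage, largest first, so you can
# delete selectively instead of clearing everything.
cache_usage() {
  local dir="${1:-$HOME/.local/share/inferno/cache}"
  if [ -d "$dir" ]; then
    du -sh "$dir"/* 2>/dev/null | sort -rh
  else
    echo "no cache at $dir"
  fi
}

cache_usage
```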
# macOS (Homebrew)
brew upgrade inferno
# Linux (package manager)
sudo apt-get update && sudo apt-get install --only-upgrade inferno
# Docker
docker pull ringo380/inferno:latest