Common issues and solutions for Inferno AI.
Before diving into specific issues, run these diagnostic commands:
# Check Inferno version
inferno --version
# View system information
inferno info
# Check GPU status
inferno info --gpu
# Validate configuration
inferno config validate
# View logs
tail -f ~/.local/share/inferno/logs/inferno.log
Problem: inferno: command not found after installation
Solutions:
# 1. Check if binary exists
which inferno
ls -l /usr/local/bin/inferno
# 2. Add to PATH (Linux/macOS)
export PATH="$PATH:/usr/local/bin"
echo 'export PATH="$PATH:/usr/local/bin"' >> ~/.bashrc
source ~/.bashrc
# 3. Make executable
sudo chmod +x /usr/local/bin/inferno
# 4. Reinstall
brew reinstall inferno # macOS
Windows:
# Add to PATH (run as Administrator)
$env:Path += ";C:\Program Files\Inferno"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [EnvironmentVariableTarget]::Machine)
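The Linux/macOS PATH fix above can be guarded so repeated runs do not keep appending duplicate entries to your shell profile. A minimal bash sketch (the /usr/local/bin location is the install path assumed throughout this guide):

```shell
#!/usr/bin/env bash
# Append a directory to PATH only if it is not already present,
# so re-running the fix does not grow PATH or ~/.bashrc.
add_to_path() {
  local dir="$1"
  case ":$PATH:" in
    *":$dir:"*) echo "already on PATH: $dir" ;;
    *) export PATH="$PATH:$dir"
       echo "added: $dir" ;;
  esac
}

add_to_path /usr/local/bin
```

Put the `add_to_path` call in ~/.bashrc instead of a raw `export` line to keep the profile idempotent.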
Problem: Permission denied when running commands
Solution:
# Make binary executable
sudo chmod +x /usr/local/bin/inferno
# Fix ownership
sudo chown -R $USER:$USER ~/.local/share/inferno
# Or run with sudo (not recommended for regular use)
sudo inferno --version
Problem: Error: Address already in use (port 8080)
Solutions:
# 1. Find process using port
lsof -i :8080 # macOS/Linux
netstat -ano | findstr :8080 # Windows
# 2. Kill the process
kill -9 <PID> # macOS/Linux
taskkill /PID <PID> /F # Windows
# 3. Use a different port
inferno serve --port 8081
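Instead of picking an alternative port by hand, you can scan for the first free one. A sketch using bash's /dev/tcp redirection (bash-specific; a connect that succeeds means something is already listening):

```shell
#!/usr/bin/env bash
# Scan upward from a base port and print the first port with no
# listener, suitable for passing to `inferno serve --port`.
find_free_port() {
  local port
  for port in $(seq "${1:-8080}" "${2:-8180}"); do
    # The subshell connect succeeds only if the port is in use.
    if ! (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "$port"
      return 0
    fi
  done
  return 1
}

find_free_port 8080
```

Usage: `inferno serve --port "$(find_free_port 8080)"`.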
Problem: GPU not being used despite being available
Diagnostic:
# Check GPU status
inferno info --gpu
# Verify CUDA (NVIDIA)
nvidia-smi
# Verify ROCm (AMD)
rocm-smi
# Verify Metal (Apple Silicon)
system_profiler SPDisplaysDataType
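The vendor checks above can be wrapped into one probe that reports which backend is likely available. This is only a wrapper around the same CLIs; it prints "none" when no tool is installed or no device responds:

```shell
#!/usr/bin/env bash
# Report which GPU stack the vendor tools can see: cuda, rocm,
# metal (Apple Silicon), or none.
detect_gpu_backend() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo cuda
  elif command -v rocm-smi >/dev/null 2>&1 && rocm-smi >/dev/null 2>&1; then
    echo rocm
  elif [ "$(uname -s)" = Darwin ] && [ "$(uname -m)" = arm64 ]; then
    echo metal
  else
    echo none
  fi
}

detect_gpu_backend
```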
Solutions:
NVIDIA GPU:
# Install/update CUDA toolkit
# https://developer.nvidia.com/cuda-downloads
# Verify CUDA installation
nvcc --version
# Force CUDA backend
inferno serve --gpu-backend cuda
# Check CUDA libraries
ldconfig -p | grep cuda
AMD GPU:
# Install ROCm
# Follow: https://rocmdocs.amd.com/en/latest/
# Verify ROCm
rocm-smi
# Force ROCm backend
inferno serve --gpu-backend rocm
Apple Silicon:
# Metal should work automatically
# If not, update macOS
softwareupdate -l
# Check for Metal support
inferno info --gpu
Problem: CUDA out of memory or similar GPU memory errors
Solutions:
# 1. Use a smaller model
inferno run --model llama-2-7b-chat # Instead of 70B
# 2. Reduce GPU layers
inferno serve --gpu-layers 20 # Instead of -1 (all)
# 3. Enable memory mapping
inferno serve --mmap
# 4. Clear GPU cache
sudo nvidia-smi --gpu-reset # NVIDIA (requires root; GPU must be idle)
# 5. Reduce batch size
inferno serve --batch-size 256 # Lower than default
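A rough way to choose between the options above is to estimate the memory the model weights alone need: roughly parameters × bits-per-weight / 8, with KV cache and activations adding more on top, so treat it as a floor. A quick sketch (awk, since the math needs floats):

```shell
#!/usr/bin/env bash
# Estimate weight memory in GB: params (billions) * bits / 8.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 7 4    # 7B model, 4-bit quantized  -> 3.5
weights_gb 7 16   # 7B model, fp16             -> 14.0
weights_gb 70 4   # 70B model, 4-bit quantized -> 35.0
```

If the estimate exceeds your VRAM, pick a smaller or more aggressively quantized model, or offload fewer layers.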
Problem: GPU inference is slower than expected
Solutions:
# 1. Ensure GPU is being used
inferno info --gpu # Should show GPU backend
# 2. Offload all layers to GPU
inferno serve --gpu-layers -1
# 3. Increase batch size
inferno serve --batch-size 512
# 4. Check GPU utilization
nvidia-smi dmon # NVIDIA
watch -n 1 rocm-smi # AMD
# 5. Update GPU drivers
Problem: Cannot download models
Solutions:
# 1. Check internet connection
ping huggingface.co
# 2. Download manually
wget https://huggingface.co/...model.gguf
inferno run --model-path ./model.gguf
# 3. Check disk space
df -h
# 4. Use alternative source
inferno models download --source huggingface MODEL_NAME
# 5. Retry with verbose output
inferno models download MODEL_NAME --verbose
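For flaky connections, the manual-download route above can be made resumable. A sketch using curl (the URL is a placeholder; `-C -` continues a partial file, `--retry` re-attempts transient failures):

```shell
#!/usr/bin/env bash
# Resumable model download with retries: re-running the command
# picks up where a previous attempt left off.
fetch_model() {
  local url="$1" out="$2"
  curl -fL --retry 3 --retry-delay 2 -C - -o "$out" "$url"
}

# fetch_model "https://huggingface.co/.../model.gguf" ./model.gguf
# inferno run --model-path ./model.gguf
```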
Problem: Error loading model or model fails to load
Solutions:
# 1. Verify model file integrity
inferno models info MODEL_NAME
# 2. Check model format
file /path/to/model.gguf # Should show GGUF format
# 3. Re-download model
inferno models remove MODEL_NAME
inferno models download MODEL_NAME
# 4. Check available memory
free -h # Linux
vm_stat # macOS
systeminfo | findstr Memory # Windows
# 5. Use memory-mapped loading
inferno run --model MODEL --mmap
Problem: Model exists but Inferno can’t find it
Solutions:
# 1. List available models
inferno models list
# 2. Check models directory
ls -la ~/.local/share/inferno/models/
# 3. Use absolute path
inferno run --model-path /full/path/to/model.gguf
# 4. Set models directory
export INFERNO_MODELS_DIR=/path/to/models
inferno serve
# 5. Verify configuration
inferno config show models
Problem: Inference is slower than expected
Solutions:
# 1. Enable GPU acceleration
inferno serve --gpu-layers -1
# 2. Use quantized models
inferno models download llama-2-7b-q4 # 4-bit quantized
# 3. Optimize thread count
inferno serve --threads $(nproc) # Linux; use $(sysctl -n hw.ncpu) on macOS
# 4. Enable memory mapping
inferno serve --mmap
# 5. Increase batch size
inferno serve --batch-size 512
# 6. Preload models
inferno models preload MODEL_NAME
Problem: Inferno uses too much RAM
Solutions:
# 1. Use smaller/quantized models
inferno models download MODEL-q4 # 4-bit quantization
# 2. Enable memory mapping
inferno serve --mmap
# 3. Offload to GPU
inferno serve --gpu-layers -1
# 4. Reduce context size
inferno serve --context-size 2048 # Instead of 4096
# 5. Monitor memory usage
inferno info --memory
Problem: Excessive CPU usage
Solutions:
# 1. Reduce thread count
inferno serve --threads 4
# 2. Offload to GPU
inferno serve --gpu-layers -1
# 3. Lower batch size
inferno serve --batch-size 128
# 4. Check for background processes
top # Linux/macOS
tasklist # Windows
Problem: API server fails to start
Solutions:
# 1. Check if port is in use
lsof -i :8080
# 2. Use different port
inferno serve --port 8081
# 3. Check logs
tail -f ~/.local/share/inferno/logs/inferno.log
# 4. Verify configuration
inferno config validate
# 5. Start with minimal config
inferno serve --host 127.0.0.1 --port 8080
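After starting with a minimal config, a small poll loop confirms the server actually came up rather than eyeballing the logs. A sketch (the /health path matches the endpoint used elsewhere in this guide):

```shell
#!/usr/bin/env bash
# Poll a health endpoint until it answers or the deadline passes.
# Returns 0 once healthy, 1 on timeout.
wait_for_health() {
  local url="$1" deadline=$(( $(date +%s) + ${2:-30} ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      echo "healthy"
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# wait_for_health http://localhost:8080/health 30
```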
Problem: Cannot connect to API server
Solutions:
# 1. Verify server is running
ps aux | grep inferno
# 2. Check if listening on correct interface
netstat -tuln | grep 8080
# 3. Use correct URL
curl http://localhost:8080/health # If this fails, try 127.0.0.1 instead of localhost
# 4. Check firewall
sudo ufw status # Linux
netsh advfirewall show allprofiles # Windows
# 5. Bind to all interfaces
inferno serve --host 0.0.0.0
Problem: API requests timeout
Solutions:
# 1. Increase server timeout
inferno serve --timeout 600 # 10 minutes
# 2. Reduce max_tokens in request
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "...", "max_tokens": 512}'
# 3. Use streaming
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "...", "stream": true}'
# 4. Check system resources
top
nvidia-smi # If using GPU
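For reference, a filled-out version of the abbreviated requests above, assuming the OpenAI-compatible endpoint shown in this guide (the model name is a placeholder for one you have installed). Building the payload separately lets you inspect it before sending:

```shell
#!/usr/bin/env bash
# Complete request body for the chat completions endpoint; printed
# here so it can be checked before the curl call is issued.
PAYLOAD='{
  "model": "llama-2-7b-chat",
  "messages": [{"role": "user", "content": "Say hello"}],
  "max_tokens": 64,
  "stream": false
}'
echo "$PAYLOAD"

# curl -X POST http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```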
Problem: Authentication errors
Solutions:
# 1. Check if auth is enabled
inferno config show security
# 2. Use correct API key
curl -H "Authorization: Bearer YOUR_KEY" ...
# 3. Disable auth for testing
inferno serve --no-auth
# 4. Generate new API key
inferno auth create-key
Problem: Docker container fails to start
Solutions:
# 1. Check container logs
docker logs inferno
# 2. Verify image
docker images | grep inferno
# 3. Check port binding
docker ps -a
# 4. Remove and recreate
docker rm -f inferno
docker run -d --name inferno -p 8080:8080 ringo380/inferno:latest
# 5. Check resource limits
docker stats inferno
Problem: GPU not accessible from container
Solutions:
# 1. Install NVIDIA Container Toolkit
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
# 2. Run with GPU support
docker run --gpus all -p 8080:8080 ringo380/inferno:latest
# 3. Verify GPU in container
docker exec -it inferno nvidia-smi
# 4. Use correct runtime
docker run --runtime=nvidia --gpus all ...
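For a persistent setup, the `docker run --gpus all` invocation above can be expressed declaratively. A Compose sketch using the same image tag; the deploy.resources device reservation is the Compose-spec equivalent of `--gpus all`:

```yaml
# compose.yaml - GPU-enabled service, equivalent to the
# `docker run --gpus all` command above.
services:
  inferno:
    image: ringo380/inferno:latest
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start it with `docker compose up -d`; the NVIDIA Container Toolkit must still be installed on the host.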
Apple Silicon Metal Issues:
# Update macOS
softwareupdate -l
sudo softwareupdate -ia
# Reinstall Xcode Command Line Tools
xcode-select --install
# Check Metal support
system_profiler SPDisplaysDataType | grep Metal
Library Not Found:
# Install missing libraries
sudo apt-get install libgomp1 libopenblas-dev # Debian/Ubuntu
sudo yum install libgomp openblas-devel # CentOS/RHEL
# Update library cache
sudo ldconfig
DLL Missing:
# Install Visual C++ Redistributable
# https://aka.ms/vs/17/release/vc_redist.x64.exe
# Install CUDA (for NVIDIA)
# https://developer.nvidia.com/cuda-downloads
# Verbose logging
inferno serve --log-level debug
# Trace logging (very verbose)
inferno serve --log-level trace
# Log to file
inferno serve --log-file /var/log/inferno/debug.log
# Real-time logs
tail -f ~/.local/share/inferno/logs/inferno.log
# Last 100 lines
tail -n 100 ~/.local/share/inferno/logs/inferno.log
# Search logs
grep "ERROR" ~/.local/share/inferno/logs/inferno.log
If you’re still stuck, include the following when asking for help:
# System information
inferno info
# GPU information
inferno info --gpu
# Configuration
inferno config show
# Recent logs
tail -n 50 ~/.local/share/inferno/logs/inferno.log
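The report items above can be gathered into a single file to attach to an issue. A sketch that guards each command with `command -v` so it still produces a usable report when inferno or the log file is missing:

```shell
#!/usr/bin/env bash
# Collect system info, inferno diagnostics, and recent logs into
# one report file for bug reports.
collect_report() {
  local out="${1:-inferno-report.txt}"
  {
    echo "== system =="
    uname -a
    echo "== inferno =="
    if command -v inferno >/dev/null 2>&1; then
      inferno info
      inferno info --gpu
      inferno config show
    else
      echo "inferno not on PATH"
    fi
    echo "== logs =="
    local log="$HOME/.local/share/inferno/logs/inferno.log"
    if [ -f "$log" ]; then tail -n 50 "$log"; else echo "no log file"; fi
  } > "$out" 2>&1
  echo "wrote $out"
}

collect_report
```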
# Clear model cache
inferno cache clear
# Clear all cache
rm -rf ~/.local/share/inferno/cache/*
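Before deleting the cache wholesale, it can help to see what is actually taking up space. A sketch against the default cache location used in this guide:

```shell
#!/usr/bin/env bash
# Show per-subdirectory cache usage, largest first, so you can
# delete selectively instead of clearing everything.
cache_usage() {
  local dir="${1:-$HOME/.local/share/inferno/cache}"
  if [ -d "$dir" ]; then
    du -sh "$dir"/* 2>/dev/null | sort -rh
  else
    echo "no cache at $dir"
  fi
}

cache_usage
```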
# macOS (Homebrew)
brew upgrade inferno
# Linux (package manager)
sudo apt-get update && sudo apt-get install --only-upgrade inferno
# Docker
docker pull ringo380/inferno:latest