Welcome to the comprehensive documentation for Inferno AI, a high-performance, privacy-first AI inference server that runs 100% locally on your machine.
## What is Inferno?
Inferno is a powerful AI inference server designed for local model execution with enterprise-grade features. It provides:
- **Complete Privacy**: All inference happens on your device; your data never leaves your machine.
- **Exceptional Performance**: GPU acceleration with Metal (13x faster on Apple Silicon), CUDA, and ROCm support.
- **OpenAI Compatible**: A drop-in replacement for the OpenAI API that works with existing tools (see the sketch after this list).
- **Multi-Format Support**: Production-ready GGUF support, with ONNX support in active development.
- **Cross-Platform**: Available for macOS, Linux, Windows, and Docker.
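Because the server speaks the OpenAI API, existing clients typically only need a new base URL. Here is a minimal sketch using the official `openai` Python package; the port (`8080`), the `/v1` path, and the model name `my-local-model` are assumptions for illustration, not confirmed Inferno defaults, so check your own configuration for the actual values.

```python
# Minimal sketch: pointing the official OpenAI Python client at a local
# OpenAI-compatible server. The base URL, port, and model name below are
# illustrative assumptions, not confirmed Inferno defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local Inferno endpoint
    api_key="not-needed-locally",         # local servers typically ignore this
)

response = client.chat.completions.create(
    model="my-local-model",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Hello from a fully local model!"}],
)
print(response.choices[0].message.content)
```

Swapping the base URL like this is the standard pattern for any OpenAI-compatible server, so the same approach should carry over to other tools and SDKs built on the OpenAI API.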