Inferno AI Documentation

Welcome to the comprehensive documentation for Inferno AI - a high-performance, privacy-first AI inference server that runs 100% locally on your machine.

What is Inferno?

Inferno is a powerful AI inference server designed for local AI model execution with enterprise-grade features. It provides a privacy-first architecture for fully offline inference, GPU acceleration, an OpenAI-compatible API, and support for multiple model formats.

Documentation Sections

Getting Started

New to Inferno? Start here for installation guides, quick start tutorials, and initial configuration.

Guides

In-depth guides for common workflows, production deployment, and best practices.

Tutorials

Step-by-step tutorials for specific use cases and advanced features.

Reference

Complete technical reference for APIs, CLI commands, and architecture.

Examples

Practical code examples and integration guides for popular frameworks.

Troubleshooting

Common issues, solutions, and debugging techniques.



Version Information

Current version: v0.7.0

Inferno is in active development. We encourage you to report any issues you encounter and contribute to the project!


Key Features at a Glance

Privacy-First Architecture

Run AI models completely offline. Perfect for sensitive data, confidential work, or when you value your privacy.

GPU Acceleration

Offload inference to your GPU when compatible hardware is available for significantly faster generation than CPU-only execution.

OpenAI-Compatible API

Works seamlessly with existing tools and libraries. Simply change the base URL:

from openai import OpenAI

# Point the client at the local Inferno server; no real API key is required.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

# Request a chat completion from a locally loaded model.
response = client.chat.completions.create(
    model="llama-2-7b-chat",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
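
Because Inferno follows the OpenAI wire format, standard SDK features such as streaming should also work against the same endpoint. The sketch below assumes the same local server and model name as above; adjust them to match your setup.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Stream tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="llama-2-7b-chat",
    messages=[{"role": "user", "content": "Write a haiku about local AI."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a small delta of the assistant's message.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()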

Model Format Support


Getting Help

If you run into problems, start with the Troubleshooting section for common issues and debugging techniques.

Ready to get started? Head to the Getting Started section.