Developer Documentation

API Reference

Everything you need to start running open models via the Hoonify AI API. OpenAI-compatible. No infrastructure required.

Quick Start

Get from zero to your first API response in under 5 minutes. Hoonify AI is fully OpenAI-compatible — no new SDK to learn.

1. Get your API key: Sign up for early API access.
2. Install the OpenAI SDK: Run pip install openai or npm install openai. Hoonify AI speaks the OpenAI protocol; no additional packages needed.
3. Point to Hoonify AI: Set base_url to https://api.hoonify.ai/v1. That's the only change required.
quickstart.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="qwen-3",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Authentication

All API requests must include your API key as a Bearer token in the Authorization header.

Authorization: Bearer YOUR_API_KEY

Key format

API keys begin with hoo_ followed by a random string. Store keys in environment variables — never hardcode them in source.
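The guidance above can be sketched with the standard library. HOONIFY_AI_KEY-style naming is not prescribed anywhere; the variable name HOONIFY_API_KEY and the mask_key helper below are assumptions for illustration.

```python
import os

# Read the key from the environment rather than hardcoding it in source.
# HOONIFY_API_KEY is an assumed variable name -- use whatever your deployment sets.
api_key = os.environ.get("HOONIFY_API_KEY", "")

def mask_key(key: str) -> str:
    """Show only the hoo_ prefix when logging; never print the full key."""
    return key[:4] + "..." if key else "(unset)"
```

When logging or debugging, pass keys through a masking helper like this so a full key never lands in log output.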

Keep your API key secure. If compromised, rotate it immediately from your dashboard. Hoonify AI will never ask for your key over email or chat.

Your First Request

The chat endpoint is POST /v1/chat/completions. It accepts the same shape as the OpenAI Chat API.

Endpoint

POST https://api.hoonify.ai/v1/chat/completions
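For illustration, here is roughly the raw HTTP request the SDK sends, built with only the standard library. This is a sketch, not a replacement for the SDK, which also handles retries and error parsing.

```python
import json
import urllib.request

def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the POST /v1/chat/completions request by hand (stdlib only)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://api.hoonify.ai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Send with: urllib.request.urlopen(build_request("YOUR_API_KEY", "qwen-3", "Hello!"))
```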

Response shape

response.json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "qwen-3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22
  }
}
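The fields most programs need can be pulled straight out of this shape. As a sketch, parsing the sample body above with the standard library:

```python
import json

# The sample response body from above.
sample = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "qwen-3",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 12, "total_tokens": 22}
}
"""

data = json.loads(sample)
content = data["choices"][0]["message"]["content"]  # the assistant's reply
finish = data["choices"][0]["finish_reason"]        # "stop", "length", etc.
total_tokens = data["usage"]["total_tokens"]        # useful for cost tracking
```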

Models

Retrieve available models programmatically or browse the full catalog with pricing at /models.

List models

GET https://api.hoonify.ai/v1/models
list_models.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

models = client.models.list()
for model in models.data:
    print(model.id)

Available model IDs

Model ID              Provider   Context
qwen-3                Qwen       128K
deepseek-r1           DeepSeek   128K
llama-4-scout         Meta       10M
qwen2-5-72b           Qwen       128K
deepseek-v3           DeepSeek   128K
llama-3-3-70b         Meta       128K
qwq-32b               Qwen       32K
gemma-2-27b           Google     8K
mistral-7b-instruct   Mistral    32K
mistral-large         Mistral    128K

Chat Completions

The Chat Completions endpoint supports the full OpenAI Chat API parameter set. Pass a messages array with one or more turns, optionally prefixed with a system message.

Request parameters

Parameter           Type             Required   Description
model               string           required   Model ID to use. See /v1/models for the full list.
messages            array            required   Array of message objects with role and content fields.
stream              boolean          optional   Stream responses as SSE. Default: false.
temperature         number           optional   Sampling temperature 0–2. Higher = more random. Default: 1.
max_tokens          integer          optional   Maximum tokens to generate. Model context limit applies.
top_p               number           optional   Nucleus sampling probability mass. Default: 1.
frequency_penalty   number           optional   Penalty for token frequency. Range: -2 to 2. Default: 0.
presence_penalty    number           optional   Penalty for token presence. Range: -2 to 2. Default: 0.
stop                string | array   optional   Stop sequences; the model halts before generating them.
user                string           optional   End-user identifier for abuse monitoring.

Message roles

Each object in messages takes a role of "system", "user", or "assistant". The content field is a string for text, or an array for multimodal inputs on vision models.
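As a sketch, here is a well-formed messages array with each role, plus the array form of content used for multimodal input. The image_url part shape follows the OpenAI convention; confirm support against the specific vision model you use.

```python
# Text-only conversation: content is a plain string per message.
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Summarize this image."},
    {"role": "assistant", "content": "Please attach the image."},
]

# Multimodal input on a vision model: content is an array of typed parts.
# This follows the OpenAI shape; availability varies by model.
vision_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this picture?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}
```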

Streaming

Set stream: true to receive responses as server-sent events (SSE). Each event contains a JSON delta with a partial completion. The stream ends with data: [DONE].

streaming.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="qwen-3",
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Event format

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"!"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
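If you consume the stream without the SDK, each line can be parsed with a few lines of standard-library code. A minimal sketch:

```python
import json

def parse_sse_line(line: str):
    """Return the JSON payload of one 'data:' line, or None for [DONE] and blanks."""
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    return json.loads(payload)

def delta_text(event: dict) -> str:
    """Extract the partial completion text from a parsed event, if any."""
    return event["choices"][0]["delta"].get("content", "")
```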

Rate Limits

Rate limits are enforced per API key and vary by plan. Current limits are included in response headers on every request.

Header                           Description
x-ratelimit-limit-requests       Maximum requests per minute for your tier
x-ratelimit-remaining-requests   Requests remaining in current window
x-ratelimit-reset-requests       ISO 8601 timestamp when window resets
x-ratelimit-limit-tokens         Maximum tokens per minute for your tier
x-ratelimit-remaining-tokens     Tokens remaining in current window
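Since x-ratelimit-reset-requests is an ISO 8601 timestamp, a client can compute how long to pause before retrying. A standard-library sketch (check the actual header values on a real response):

```python
from datetime import datetime, timezone

def seconds_until_reset(reset_iso: str) -> float:
    """Seconds until the window resets, given the x-ratelimit-reset-requests value."""
    # fromisoformat() in older Python versions does not accept a trailing "Z".
    reset = datetime.fromisoformat(reset_iso.replace("Z", "+00:00"))
    return max(0.0, (reset - datetime.now(timezone.utc)).total_seconds())
```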

For higher limits in production, contact us about dedicated inference with reserved capacity.

Error Handling

All errors return a JSON body with an error object containing a message field. HTTP status codes follow standard REST conventions.

Status   Error                  Description
400      Bad Request            Malformed request body or missing required parameters.
401      Unauthorized           Invalid or missing API key.
403      Forbidden              Key valid but lacks permission for this operation.
422      Unprocessable Entity   Request structure valid but parameters are semantically invalid.
429      Rate Limited           Too many requests. Use exponential backoff and retry.
500      Server Error           Unexpected server error. Retry with backoff. Contact support if persistent.
503      Service Unavailable    Model temporarily unavailable. Retry after a short delay.

Error response shape

{
  "error": {
    "message": "Invalid API key provided.",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

Retry strategy

For 429 and 5xx errors, use exponential backoff: wait 1 s, 2 s, 4 s before retrying. Most transient errors resolve within three attempts.
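The retry schedule above can be sketched as a generic helper. In real code, catch the SDK's specific rate-limit and server-error exceptions rather than bare Exception; this sketch stays library-agnostic.

```python
import time

def with_backoff(call, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call `call`, retrying on failure with delays of 1 s, 2 s, 4 s between attempts."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

Injecting `sleep` as a parameter keeps the helper testable and lets callers swap in async-friendly waits.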

Code Examples

Practical patterns for the most common use cases.

Multi-turn conversation

Build conversation history by appending each assistant reply to messages before the next request.

multi_turn.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "What is a transformer model?"},
]

response = client.chat.completions.create(model="qwen-3", messages=messages)
reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": reply})

# Continue the conversation
messages.append({"role": "user", "content": "How is attention used in LLMs?"})
response = client.chat.completions.create(model="qwen-3", messages=messages)
print(response.choices[0].message.content)

System prompt

Add a "system" role message at the start of messages to set the model's persona or constraints for the entire session. All models in the catalog support system prompts.

Ready to build?
Sign up for early access when we launch.