Developer Documentation

API Reference

Everything you need to start running open models via the Hoonify AI API. OpenAI-compatible. No infrastructure required.

Quick Start

Get from zero to your first API response in under 5 minutes. Hoonify AI is fully OpenAI-compatible — no new SDK to learn.

1. Get your API key: Sign up for early API access.
2. Install the OpenAI SDK: Run pip install openai or npm install openai. Hoonify AI speaks the OpenAI protocol; no additional packages needed.
3. Point to Hoonify AI: Set base_url to https://api.hoonify.ai/v1. That's the only change required.
quickstart.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="qwen-3",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Authentication

All API requests must include your API key as a Bearer token in the Authorization header.

Authorization: Bearer YOUR_API_KEY

Key format

API keys begin with hoo_ followed by a random string. Store keys in environment variables — never hardcode them in source.
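The guidance above can be sketched with the standard library. HOONIFY_AI_KEY-style naming is not prescribed anywhere; the variable name HOONIFY_API_KEY and the mask_key helper below are assumptions for illustration.

```python
import os

# Read the key from the environment rather than hardcoding it in source.
# HOONIFY_API_KEY is an assumed variable name -- use whatever your deployment sets.
api_key = os.environ.get("HOONIFY_API_KEY", "")

def mask_key(key: str) -> str:
    """Show only the hoo_ prefix when logging; never print the full key."""
    return key[:4] + "..." if key else "(unset)"
```

When logging or debugging, pass keys through a masking helper like this so a full key never lands in log output.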

Keep your API key secure. If compromised, rotate it immediately from your dashboard. Hoonify AI will never ask for your key over email or chat.

Your First Request

The chat endpoint is POST /v1/chat/completions. It accepts the same shape as the OpenAI Chat API.

Endpoint

POST https://api.hoonify.ai/v1/chat/completions
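For illustration, here is roughly the raw HTTP request the SDK sends, built with only the standard library. This is a sketch, not a replacement for the SDK, which also handles retries and error parsing.

```python
import json
import urllib.request

def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the POST /v1/chat/completions request by hand (stdlib only)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://api.hoonify.ai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Send with: urllib.request.urlopen(build_request("YOUR_API_KEY", "qwen-3", "Hello!"))
```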

Response shape

response.json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "qwen-3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22
  }
}
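The fields most programs need can be pulled straight out of this shape. As a sketch, parsing the sample body above with the standard library:

```python
import json

# The sample response body from above.
sample = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "qwen-3",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 12, "total_tokens": 22}
}
"""

data = json.loads(sample)
content = data["choices"][0]["message"]["content"]  # the assistant's reply
finish = data["choices"][0]["finish_reason"]        # "stop", "length", etc.
total_tokens = data["usage"]["total_tokens"]        # useful for cost tracking
```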

Models

Retrieve available models programmatically or browse the full catalog with pricing at /models.

List models

GET https://api.hoonify.ai/v1/models
list_models.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

models = client.models.list()
for model in models.data:
    print(model.id)

Available model IDs

Model ID              Provider   Context
qwen-3                Qwen       128K
deepseek-r1           DeepSeek   128K
llama-4-scout         Meta       10M
qwen2-5-72b           Qwen       128K
deepseek-v3           DeepSeek   128K
llama-3-3-70b         Meta       128K
qwq-32b               Qwen       32K
gemma-2-27b           Google     8K
mistral-7b-instruct   Mistral    32K
mistral-large         Mistral    128K

Chat Completions

The Chat Completions endpoint supports the full OpenAI Chat API parameter set. Pass a messages array with one or more turns, optionally prefixed with a system message.

Request parameters

Parameter           Type             Required   Description
model               string           required   Model ID to use. See /v1/models for the full list.
messages            array            required   Array of message objects with role and content fields.
stream              boolean          optional   Stream responses as SSE. Default: false.
temperature         number           optional   Sampling temperature 0–2. Higher = more random. Default: 1.
max_tokens          integer          optional   Maximum tokens to generate. Model context limit applies.
top_p               number           optional   Nucleus sampling probability mass. Default: 1.
frequency_penalty   number           optional   Penalty for token frequency. Range: -2 to 2. Default: 0.
presence_penalty    number           optional   Penalty for token presence. Range: -2 to 2. Default: 0.
stop                string | array   optional   Stop sequences; the model halts before generating them.
user                string           optional   End-user identifier for abuse monitoring.

Message roles

Each object in messages takes a role of "system", "user", or "assistant". The content field is a string for text, or an array for multimodal inputs on vision models.
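As a sketch, here is a well-formed messages array with each role, plus the array form of content used for multimodal input. The image_url part shape follows the OpenAI convention; confirm support against the specific vision model you use.

```python
# Text-only conversation: content is a plain string per message.
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Summarize this image."},
    {"role": "assistant", "content": "Please attach the image."},
]

# Multimodal input on a vision model: content is an array of typed parts.
# This follows the OpenAI shape; availability varies by model.
vision_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this picture?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}
```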

Streaming

Set stream: true to receive responses as server-sent events (SSE). Each event contains a JSON delta with a partial completion. The stream ends with data: [DONE].

streaming.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="qwen-3",
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Event format

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"!"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
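If you consume the stream without the SDK, each line can be parsed with a few lines of standard-library code. A minimal sketch:

```python
import json

def parse_sse_line(line: str):
    """Return the JSON payload of one 'data:' line, or None for [DONE] and blanks."""
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    return json.loads(payload)

def delta_text(event: dict) -> str:
    """Extract the partial completion text from a parsed event, if any."""
    return event["choices"][0]["delta"].get("content", "")
```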

Rate Limits

Rate limits are enforced per API key and vary by plan. Current limits are included in response headers on every request.

Header                           Description
x-ratelimit-limit-requests       Maximum requests per minute for your tier
x-ratelimit-remaining-requests   Requests remaining in current window
x-ratelimit-reset-requests       ISO 8601 timestamp when window resets
x-ratelimit-limit-tokens         Maximum tokens per minute for your tier
x-ratelimit-remaining-tokens     Tokens remaining in current window
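Since x-ratelimit-reset-requests is an ISO 8601 timestamp, a client can compute how long to pause before retrying. A standard-library sketch (check the actual header values on a real response):

```python
from datetime import datetime, timezone

def seconds_until_reset(reset_iso: str) -> float:
    """Seconds until the window resets, given the x-ratelimit-reset-requests value."""
    # fromisoformat() in older Python versions does not accept a trailing "Z".
    reset = datetime.fromisoformat(reset_iso.replace("Z", "+00:00"))
    return max(0.0, (reset - datetime.now(timezone.utc)).total_seconds())
```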

For higher limits in production, contact us about dedicated inference with reserved capacity.

Error Handling

All errors return a JSON body with an error object containing a message field. HTTP status codes follow standard REST conventions.

Status   Error                  Description
400      Bad Request            Malformed request body or missing required parameters.
401      Unauthorized           Invalid or missing API key.
403      Forbidden              Key valid but lacks permission for this operation.
422      Unprocessable Entity   Request structure valid but parameters are semantically invalid.
429      Rate Limited           Too many requests. Use exponential backoff and retry.
500      Server Error           Unexpected server error. Retry with backoff. Contact support if persistent.
503      Service Unavailable    Model temporarily unavailable. Retry after a short delay.

Error response shape

{
  "error": {
    "message": "Invalid API key provided.",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

Retry strategy

For 429 and 5xx errors, use exponential backoff: wait 1 s, 2 s, 4 s before retrying. Most transient errors resolve within three attempts.
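The retry schedule above can be sketched as a generic helper. In real code, catch the SDK's specific rate-limit and server-error exceptions rather than bare Exception; this sketch stays library-agnostic.

```python
import time

def with_backoff(call, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call `call`, retrying on failure with delays of 1 s, 2 s, 4 s between attempts."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

Injecting `sleep` as a parameter keeps the helper testable and lets callers swap in async-friendly waits.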

Code Examples

Practical patterns for the most common use cases.

Multi-turn conversation

Build conversation history by appending each assistant reply to messages before the next request.

multi_turn.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "What is a transformer model?"},
]

response = client.chat.completions.create(model="qwen-3", messages=messages)
reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": reply})

# Continue the conversation
messages.append({"role": "user", "content": "How is attention used in LLMs?"})
response = client.chat.completions.create(model="qwen-3", messages=messages)
print(response.choices[0].message.content)

System prompt

Add a "system" role message at the start of messages to set the model's persona or constraints for the entire session. All models in the catalog support system prompts.

Ready to build?
Sign up for early access when we launch.