Quick Start
Get from zero to your first API response in under 5 minutes. Hoonify AI is fully OpenAI-compatible — no new SDK to learn.
Install the SDK: pip install openai or npm install openai. Hoonify AI speaks the OpenAI protocol — no additional packages needed.
Then set base_url to https://api.hoonify.ai/v1. That's the only change required.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="qwen-3",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```
Authentication
All API requests must include your API key as a Bearer token in the Authorization header.
Key format
API keys begin with hoo_ followed by a random string. Store keys in environment variables — never hardcode them in source.
Keep your API key secure. If compromised, rotate it immediately from your dashboard. Hoonify AI will never ask for your key over email or chat.
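Reading the key from an environment variable might look like the sketch below. The variable name HOONIFY_API_KEY is illustrative, and the prefix check simply follows the hoo_ format described above:

```python
import os

def load_api_key(env_var: str = "HOONIFY_API_KEY") -> str:
    """Read the API key from the environment; fail fast if it's missing."""
    key = os.environ.get(env_var, "")
    if not key.startswith("hoo_"):
        raise RuntimeError(f"Set {env_var} to a Hoonify key (keys begin with 'hoo_')")
    return key
```

Failing fast at startup beats discovering a missing key on the first request in production.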
Your First Request
The chat endpoint accepts the same request shape as the OpenAI Chat API.
Endpoint
POST /v1/chat/completions
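If you want to see the wire format without the SDK, the same request can be assembled by hand. A minimal standard-library sketch; the payload shape mirrors the OpenAI protocol described above:

```python
import json
import urllib.request

def build_request(api_key: str, messages: list, model: str = "qwen-3") -> urllib.request.Request:
    """Build a POST /v1/chat/completions request with Bearer auth."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://api.hoonify.ai/v1/chat/completions",
        data=body,  # presence of data makes this a POST
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Sending it (requires a real key):
# with urllib.request.urlopen(build_request(key, [{"role": "user", "content": "Hi"}])) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```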
Response shape
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "qwen-3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22
  }
}
```

Models
Retrieve available models programmatically or browse the full catalog with pricing at /models.
List models
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

models = client.models.list()
for model in models.data:
    print(model.id)
```
Available model IDs
| Model ID | Provider | Context |
|---|---|---|
| qwen-3 | Qwen | 128K |
| deepseek-r1 | DeepSeek | 128K |
| llama-4-scout | Meta | 10M |
| qwen2-5-72b | Qwen | 128K |
| deepseek-v3 | DeepSeek | 128K |
| llama-3-3-70b | Meta | 128K |
| qwq-32b | Qwen | 32K |
| gemma-2-27b | Google | 8K |
| mistral-7b-instruct | Mistral | 32K |
| mistral-large | Mistral | 128K |
Chat Completions
The Chat Completions endpoint supports the full OpenAI Chat API parameter set. Pass a messages array with one or more turns, optionally prefixed with a system message.
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | required | Model ID to use. See /v1/models for the full list. |
| messages | array | required | Array of message objects with role and content fields. |
| stream | boolean | optional | Stream responses as SSE. Default: false. |
| temperature | number | optional | Sampling temperature 0–2. Higher = more random. Default: 1. |
| max_tokens | integer | optional | Maximum tokens to generate. Model context limit applies. |
| top_p | number | optional | Nucleus sampling probability mass. Default: 1. |
| frequency_penalty | number | optional | Penalty for token frequency. Range: -2 to 2. Default: 0. |
| presence_penalty | number | optional | Penalty for token presence. Range: -2 to 2. Default: 0. |
| stop | string or array | optional | Stop sequences — model halts before generating them. |
| user | string | optional | End-user identifier for abuse monitoring. |
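As a sketch of how the optional parameters combine, here is an illustrative request payload. Every value below is an example, not a recommended setting; the dict is passed straight through, e.g. client.chat.completions.create(**params):

```python
# Illustrative payload exercising the optional parameters above.
params = {
    "model": "qwen-3",
    "messages": [
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Name three sorting algorithms."},
    ],
    "temperature": 0.3,        # 0-2; lower = more deterministic
    "top_p": 0.9,              # nucleus sampling probability mass
    "max_tokens": 200,         # cap on generated tokens
    "frequency_penalty": 0.5,  # -2 to 2; discourages repetition
    "stop": ["\n\n"],          # halt before a blank line
}
```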
Message roles
Each object in messages takes a role of "system", "user", or "assistant". The content field is a string for text, or an array for multimodal inputs on vision models.
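For example, a multi-turn text conversation and a multimodal message might look like this. The parts-array format is assumed to mirror OpenAI's vision format, and the URL is a placeholder:

```python
# A text conversation: each turn is {"role": ..., "content": ...}.
messages = [
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is backpropagation?"},
    {"role": "assistant", "content": "It computes gradients layer by layer."},
    {"role": "user", "content": "Why does it need the chain rule?"},
]

# On vision models, content may instead be an array of typed parts
# (format assumed to mirror OpenAI's; check the model's docs):
vision_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}
```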
Streaming
Set stream: true to receive responses as server-sent events (SSE). Each event contains a JSON delta with a partial completion. The stream ends with data: [DONE].
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="qwen-3",
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
Event format
```
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"!"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
```

Rate Limits
Rate limits are enforced per API key and vary by plan. Current limits are included in response headers on every request.
| Header | Description |
|---|---|
| x-ratelimit-limit-requests | Maximum requests per minute for your tier |
| x-ratelimit-remaining-requests | Requests remaining in current window |
| x-ratelimit-reset-requests | ISO 8601 timestamp when window resets |
| x-ratelimit-limit-tokens | Maximum tokens per minute for your tier |
| x-ratelimit-remaining-tokens | Tokens remaining in current window |
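One way to use these headers is to pause only when the request budget is exhausted. A minimal sketch; header names follow the table above, and the reset timestamp is assumed to arrive in a form datetime.fromisoformat accepts (an offset form such as +00:00):

```python
from datetime import datetime, timezone

def seconds_until_reset(headers: dict) -> float:
    """Return how long to wait before the next request, given the
    rate-limit response headers; 0.0 while request budget remains."""
    remaining = int(headers.get("x-ratelimit-remaining-requests", "1"))
    if remaining > 0:
        return 0.0
    reset = datetime.fromisoformat(headers["x-ratelimit-reset-requests"])
    return max(0.0, (reset - datetime.now(timezone.utc)).total_seconds())
```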
For higher limits in production, contact us about dedicated inference with reserved capacity.
Error Handling
All errors return a JSON body with an error object containing a message field. HTTP status codes follow standard REST conventions.
Error response shape
```json
{
  "error": {
    "message": "Invalid API key provided.",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
```

Retry strategy
For 429 and 5xx errors, use exponential backoff: wait 1 s, 2 s, 4 s before retrying. Most transient errors resolve within three attempts.
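The schedule above can be sketched as a small retry wrapper. Here `call` stands in for any zero-argument request function; in real code you would catch only the SDK's rate-limit and server-error exceptions rather than bare Exception:

```python
import time

def with_backoff(call, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry `call` with exponential backoff: the delay doubles each
    attempt (1 s, 2 s, ... with defaults). Re-raises after the final try."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # narrow this to retryable errors in real code
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```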
Code Examples
Practical patterns for the most common use cases.
Multi-turn conversation
Build conversation history by appending each assistant reply to messages before the next request.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hoonify.ai/v1",
    api_key="YOUR_API_KEY"
)

messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "What is a transformer model?"},
]

response = client.chat.completions.create(model="qwen-3", messages=messages)
reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": reply})

# Continue the conversation
messages.append({"role": "user", "content": "How is attention used in LLMs?"})
response = client.chat.completions.create(model="qwen-3", messages=messages)
print(response.choices[0].message.content)
```
System prompt
Add a "system" role message at the start of messages to set the model's persona or constraints for the entire session. All models in the catalog support system prompts.