Model Catalog

Open Models, Ready to Run

OpenAI-compatible inference for the latest open models. Swap your base URL and go.

Browse 9 open models from 6 providers
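Because the API is OpenAI-compatible, any model id from the cards below drops into a standard `/chat/completions` request. The sketch below builds such a request with only the standard library; the base URL shown is a placeholder (substitute the endpoint from your Hoonify dashboard), and the request is constructed but not sent.

```python
import json
import urllib.request

# Placeholder endpoint -- substitute the base URL from your Hoonify dashboard.
BASE_URL = "https://api.hoonify.example/v1"
API_KEY = "YOUR_API_KEY"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Any catalog model id drops in unchanged:
req = build_chat_request("deepseek-v3-2", "Summarize MoE routing in two sentences.")
# urllib.request.urlopen(req)  # sends it; the response follows the OpenAI schema
```

Switching models is a one-string change to the `model` field; the rest of the request stays identical across every entry in the catalog.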

DeepSeek R1 0528

Just launched
Text · Reasoning · Code
Provider
DeepSeek
Context
128K
Parameters
671B MoE

Updated R1 with deeper reasoning and algorithmic optimizations. Performance approaches o3 and Gemini 2.5 Pro on math, coding, and logic benchmarks.


Input

$0.55 / 1M tokens

Output

$2.19 / 1M tokens

model="deepseek-r1-0528"

DeepSeek V3.2

Just launched
Text · Chat · Reasoning · Code
Provider
DeepSeek
Context
128K
Parameters
685B MoE

Balances computational efficiency with strong reasoning and agentic performance via Sparse Attention and a scalable reinforcement learning framework.


Input

$0.27 / 1M tokens

Output

$1.10 / 1M tokens

model="deepseek-v3-2"

GLM 4.7

Text · Chat · Code · Reasoning
Provider
GLM
Context
128K
Parameters
~9B

Code-focused model covering core coding, vibe coding, tool use, and complex reasoning. Designed as a developer-facing coding partner.


Input

$0.10 / 1M tokens

Output

$0.25 / 1M tokens

model="glm-4-7"

GLM 5

Just launched
Text · Chat · Code · Reasoning
Provider
GLM
Context
128K
Parameters
744B MoE

744B MoE (40B active) targeting complex systems engineering and long-horizon agentic tasks. Integrates DeepSeek Sparse Attention for reduced deployment cost.


Input

$0.40 / 1M tokens

Output

$1.50 / 1M tokens

model="glm-5"

GPT-OSS-120b

Just launched
Text · Chat · Reasoning · Code
Provider
OpenAI
Context
128K
Parameters
120B

OpenAI's open-weight 120B model designed for powerful reasoning, agentic tasks, and versatile developer use cases.


Input

$0.80 / 1M tokens

Output

$2.40 / 1M tokens

model="gpt-oss-120b"

Kimi K2.5

Multimodal · Chat · Reasoning
Provider
Kimi
Context
128K
Parameters
~72B

Native multimodal agentic model trained on ~15T mixed visual and text tokens. Integrates vision, language, and advanced agentic capabilities with instant and thinking modes.


Input

$0.20 / 1M tokens

Output

$0.60 / 1M tokens

model="kimi-k2-5"
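For multimodal models like Kimi K2.5, a reasonable assumption is that image inputs follow the OpenAI-style "content parts" message format, where a user message carries a list of text and image parts instead of a plain string. A minimal sketch of that payload shape (the image URL is illustrative):

```python
import json

# Assumes the endpoint accepts OpenAI-style multimodal content parts;
# check the Hoonify docs for the supported image formats and size limits.
payload = {
    "model": "kimi-k2-5",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
}
body = json.dumps(payload)  # send as the POST body of /chat/completions
```

Text-only models in the catalog reject image parts, so keep the plain-string `content` form for those.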

Llama 4 Maverick 17B 128E

Multimodal · Chat · Reasoning
Provider
Meta
Context
128K
Parameters
17B active × 128 experts

Natively multimodal MoE model with 128 experts offering industry-leading text and image understanding. Part of Meta's next-generation Llama 4 series.


Input

$0.20 / 1M tokens

Output

$0.65 / 1M tokens

model="llama-4-maverick-17b-128e"

Llama 3.3 70B

Text · Chat · Code
Provider
Meta
Context
128K
Parameters
70B

Instruction-tuned 70B model optimized for multilingual dialogue, outperforming many open and closed chat models on industry benchmarks.


Input

$0.12 / 1M tokens

Output

$0.30 / 1M tokens

model="llama-3-3-70b"

MiniMax M2.5

Text · Chat · Code · Reasoning
Provider
MiniMax
Context
128K
Parameters

RL-trained across hundreds of thousands of real-world environments. Achieves 80.2% on SWE-bench Verified; state-of-the-art in coding, agentic tool use, and office tasks.


Input

$0.30 / 1M tokens

Output

$1.00 / 1M tokens

model="minimax-m2-5"

Early Access

Be first in line.

Sign up for early API access. Don't see a model you need? Request it and we'll prioritize based on demand.

Frequently Asked Questions

Everything you need to know about the Hoonify AI model catalog and API.