AI API Token Cost Calculator
Estimate the cost of using AI language model APIs (OpenAI, Anthropic, Google) based on token count and model selection.
Compare GPT-4o, Claude, and Gemini pricing.
How AI API Pricing Works
Most AI language model APIs (OpenAI, Anthropic, Google) charge per token — a unit roughly equal to 4 characters or 0.75 words in English. Pricing is usually quoted per 1 million tokens and is split between input tokens (your prompt) and output tokens (the model’s response).
Key Concepts
- Input tokens — everything you send to the model: system prompt, conversation history, user message
- Output tokens — the model’s response. Output is typically priced several times higher than input.
- Context window — the maximum tokens (input + output) the model can process at once
The Formula
Cost per request = (Input tokens × Input price + Output tokens × Output price) ÷ 1,000,000
Daily cost = Cost per request × Requests per day
Monthly cost = Daily cost × 30
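The formula above can be sketched directly in Python. The token counts and rates in the example are placeholders; substitute your own workload and your model's current prices.

```python
def monthly_cost(input_tokens, output_tokens, input_price, output_price,
                 requests_per_day, days=30):
    """Estimate monthly API spend from per-request token counts.

    input_price and output_price are USD per 1 million tokens.
    """
    per_request = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_day * days

# Example: 1,500 input + 500 output tokens per request at GPT-4o rates
# ($2.50 in / $10.00 out per 1M tokens), 1,000 requests per day
cost = monthly_cost(1_500, 500, 2.50, 10.00, requests_per_day=1_000)
print(f"${cost:,.2f}/month")  # → $262.50/month
```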
Token Estimation Rules of Thumb
- 1,000 tokens ≈ 750 words ≈ about 1.5 pages of English text
- A typical short question with context: 500–2,000 input tokens
- A typical detailed answer: 200–1,000 output tokens
- Long documents: roughly 100 tokens per 75 words, scaling linearly with length
- Code usually consumes more tokens per word than plain English, since symbols and whitespace tokenize inefficiently
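The rules of thumb above can be turned into a crude estimator using the ~4 characters per token heuristic. This is only a first approximation; each provider's real tokenizer will give different counts, especially for code.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate via the ~4 characters/token heuristic.
    Provider tokenizers differ, so treat this as a ballpark figure only."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("How many tokens is this sentence?"))  # → 8
```

For budgeting, use the provider's own tokenizer tool to calibrate; this sketch is just for quick back-of-envelope math.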
Model Pricing (approximate, early 2025)
Prices change frequently — always check the official provider pricing pages before budgeting.
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.80 | $4.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
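With the table above in a dictionary, the same workload can be priced across every model at once. The rates are copied from the table (approximate, early 2025) and the 1,000/300 token workload is illustrative.

```python
# Prices in USD per 1M tokens (input, output), from the table above
PRICES = {
    "GPT-4o":            (2.50, 10.00),
    "GPT-4o mini":       (0.15, 0.60),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3.5 Haiku":  (0.80, 4.00),
    "Gemini 1.5 Flash":  (0.075, 0.30),
    "Gemini 1.5 Pro":    (1.25, 5.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of a single request for the given model."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Same workload on every model: 1,000 input + 300 output tokens,
# cheapest first
for model in sorted(PRICES, key=lambda m: request_cost(m, 1_000, 300)):
    print(f"{model:<18} ${request_cost(model, 1_000, 300):.6f}/request")
```

At this workload the cheapest and most expensive models differ by more than 40x, which is why model selection dominates most API budgets.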
Cost Optimization Tips
- Use smaller/faster models for simple tasks (classification, extraction, summarization)
- Reserve large models (GPT-4o, Claude 3.5 Sonnet) for complex reasoning tasks
- Cache repeated system prompts where supported (Anthropic’s prompt caching can cut the cost of cached input by up to 90%)
- Batch requests when real-time responses are not required (batch APIs are often 50% cheaper)
- Minimize conversation history in prompts — only include what is necessary for context
- Monitor token counts during development using the tokenizer tools each provider offers
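To illustrate the caching tip above, here is a sketch comparing per-request input cost with and without a cached system prompt. The 90% discount is illustrative; actual discounts (and cache-write surcharges) vary by provider.

```python
def caching_comparison(system_tokens, user_tokens, input_price,
                       cache_discount=0.9):
    """Per-request input cost (USD) without and with prompt caching.
    cache_discount applies only to the repeated system prompt; the
    discount value is an assumption, not a quoted provider rate."""
    naive = (system_tokens + user_tokens) * input_price / 1_000_000
    cached = (system_tokens * (1 - cache_discount)
              + user_tokens) * input_price / 1_000_000
    return naive, cached

# A 4,000-token system prompt reused on every request, at $3.00/1M input
naive, cached = caching_comparison(4_000, 200, 3.00)
print(f"input cost per request: ${naive:.6f} naive vs ${cached:.6f} cached")
```

The bigger the shared system prompt relative to the per-request message, the larger the savings, which is why caching matters most for agents and RAG pipelines with long fixed instructions.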