Customer support chatbots are one of the most common LLM use cases — and one of the most expensive when you pick the wrong model. Here is the real cost breakdown for running a support bot in 2026, with per-conversation, per-customer, and per-month numbers across every major model.
For a typical 5-message support exchange (the median conversation length), here is the token usage per turn:
| Turn | Input tokens | Output tokens |
|---|---|---|
| 1 | 1,280 (system + RAG + user) | 250 |
| 2 | 1,830 (+ history) | 250 |
| 3 | 2,380 (+ history) | 250 |
| 4 | 2,930 (+ history) | 250 |
| 5 | 3,480 (+ history) | 250 |
Total per conversation: ~11,900 input tokens, 1,250 output tokens.
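The turn table above follows a simple pattern: a fixed first-turn payload plus ~550 tokens of accumulated history per additional turn. A minimal sketch (constant names are illustrative, taken from the table):

```python
# Token model from the turn table: turn-1 input is system prompt + RAG
# context + user message; each later turn adds ~550 tokens of history.
SYSTEM_RAG_USER = 1_280
HISTORY_GROWTH = 550
OUTPUT_PER_TURN = 250

def conversation_tokens(turns: int) -> tuple[int, int]:
    """Return (total input tokens, total output tokens) for an n-turn chat."""
    input_tokens = sum(SYSTEM_RAG_USER + HISTORY_GROWTH * t for t in range(turns))
    return input_tokens, OUTPUT_PER_TURN * turns

print(conversation_tokens(5))  # → (11900, 1250)
```

Note the quadratic shape: because history is resent every turn, input tokens grow with the square of conversation length, which is why long chats get expensive fast.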
Plug your real conversation shape into the calculator.
Open AI Cost Calculator →

| Model | Per conversation | Per 1,000 conversations |
|---|---|---|
| Gemini 2.0 Flash | $0.00169 | $1.69 |
| GPT-4o mini | $0.00254 | $2.54 |
| Claude Haiku 3.5 | $0.01452 | $14.52 |
| GPT-4o | $0.04225 | $42.25 |
| Claude Sonnet 4 | $0.05445 | $54.45 |
| Claude Opus 4 | $0.27225 | $272.25 |
The cheap tier costs about 1/100th of Claude Opus 4. For high-volume support, the choice is obvious — but quality matters too.
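The per-conversation figures fall straight out of per-million-token list prices. A sketch that reproduces the table, assuming the list prices shown in the dictionary (these are my assumptions, chosen to match the table; check current provider pricing before relying on them):

```python
# USD per million tokens (input, output) -- assumed list prices that
# reproduce the per-conversation table above.
PRICES = {
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Haiku 3.5": (0.80, 4.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
}

def conversation_cost(model: str, input_tokens: int = 11_900,
                      output_tokens: int = 1_250) -> float:
    """Cost in USD of one conversation at the article's median token shape."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

for model in PRICES:
    print(f"{model}: ${conversation_cost(model):.5f}")
```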
| Volume | GPT-4o mini | GPT-4o | Claude Sonnet 4 |
|---|---|---|---|
| 100 conversations/day (3K/mo) | $7.62 | $126.75 | $163.35 |
| 500/day (15K/mo) | $38.10 | $633.75 | $816.75 |
| 2,000/day (60K/mo) | $152.40 | $2,535.00 | $3,267.00 |
| 10,000/day (300K/mo) | $762.00 | $12,675.00 | $16,335.00 |
At 10,000 daily conversations, switching from GPT-4o to GPT-4o mini saves $11,913/month — enough to fund another engineer. The quality loss for typical tier-1 support is usually negligible.
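The monthly figures are just per-conversation cost times volume. A quick check of the savings claim, using the per-conversation costs from the first table:

```python
def monthly_cost(per_conversation: float, conversations_per_day: int,
                 days: int = 30) -> float:
    """Monthly bill at a given daily conversation volume."""
    return per_conversation * conversations_per_day * days

# Savings at 10,000 conversations/day from switching GPT-4o -> GPT-4o mini:
savings = monthly_cost(0.04225, 10_000) - monthly_cost(0.00254, 10_000)
print(f"${savings:,.0f}/month")  # → $11,913/month
```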
If you sell support automation to other companies, your cost per active customer depends on conversation volume per customer per month:
| Customer profile | Conversations/mo | Cost on GPT-4o mini | Cost on GPT-4o |
|---|---|---|---|
| Light (small business) | 5 | $0.013 | $0.21 |
| Standard (mid-market) | 25 | $0.064 | $1.06 |
| Heavy (enterprise dept) | 150 | $0.38 | $6.34 |
| Whale (high-volume support) | 800 | $2.03 | $33.80 |
For a $99/month per-customer SaaS, the GPT-4o mini cost is invisible. The GPT-4o cost is also fine for most customers: $33.80 on a whale is still a small fraction of $99. On Claude Opus, the math breaks: a single whale would cost about $218/month to serve (800 × $0.27225), more than the subscription itself.
A common production pattern is routing: handle most conversations with the cheap model and escalate only the hard ones to a premium model. The blended cost is typically ~1.5x the cheap model alone, not 10x. You get most of the quality benefit at a fraction of the all-premium cost.
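The ~1.5x blended figure can be sanity-checked with a one-line model. The 3% escalation rate below is an illustrative assumption, not a measured number:

```python
def blended_cost(cheap: float, premium: float, escalation_rate: float) -> float:
    """Average per-conversation cost when a fraction of traffic escalates
    from the cheap model to the premium model."""
    return (1 - escalation_rate) * cheap + escalation_rate * premium

# GPT-4o mini with ~3% of conversations escalated to GPT-4o:
blend = blended_cost(cheap=0.00254, premium=0.04225, escalation_rate=0.03)
print(f"{blend / 0.00254:.2f}x the cheap model alone")  # ~1.5x
```

At this ratio of prices (premium ≈ 17x cheap), every percentage point of escalation adds roughly 0.16x to the multiplier, so keeping the escalation trigger strict matters.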
1. Unbounded conversation history. If you never truncate, every message in a long conversation includes all prior messages. A 30-turn conversation can use 30,000+ input tokens for the last message alone. Always cap history at 10-20 messages, or summarize older turns.
2. Retrieving too many chunks. RAG systems default to 5-10 chunks per query. For chatbots, 3-5 is usually enough. Cutting chunks in half cuts input cost in half.
3. Verbose responses. Set max_tokens to 300 or 400 for chatbots. Without a cap, models will sometimes write essays. The cap makes cost predictable and responses more readable.
The bottom line for a 2026 customer support chatbot: start on a cheap-tier model (GPT-4o mini or Gemini 2.0 Flash), cap history and max_tokens, and escalate only the hard conversations to a premium model.
Run the numbers in the AI Cost Calculator with the chatbot preset to see what your specific volume costs.
Project your support chatbot bill across every major model.
Open AI Cost Calculator →