Document summarization is one of the cheapest LLM workloads. The output is short, the prompts are simple, and modern cheap models do it nearly as well as flagships. If your bill for summarization is over $100/month and you're not summarizing thousands of documents per day, you're using the wrong model.
A typical document summary has this shape:
The document text dominates. A 5-page document is roughly 2,500 words, or about 3,200 tokens. A 20-page document is about 12,700 tokens, and a 100-page report about 63,200.
| Doc size | Input tokens | GPT-4o mini | GPT-4o | Claude Sonnet 4 | Claude Opus 4 |
|---|---|---|---|---|---|
| 1 page (~500w) | 700 | $0.000165 | $0.00275 | $0.0033 | $0.01650 |
| 5 pages (~2,500w) | 3,200 | $0.00075 | $0.0125 | $0.015 | $0.075 |
| 20 pages (~10K w) | 12,700 | $0.00255 | $0.0425 | $0.051 | $0.255 |
| 50 pages (~25K w) | 31,700 | $0.0063 | $0.105 | $0.126 | $0.630 |
| 100 pages (~50K w) | 63,200 | $0.0126 | $0.210 | $0.252 | $1.260 |
| Book (~100K w) | 125,700 | $0.0252 | $0.420 | $0.504 | $2.520 |
Even a 100-page report on GPT-4o mini costs about 1.3 cents. Summarizing 1,000 documents of that size per day costs about $12.60/day, or roughly $378/month. The same workload on Claude Opus 4 costs $1,260/day, or $37,800/month: 100x more.
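The projection from per-document cost to a monthly bill is simple multiplication. Here is a minimal sketch that uses the 100-page row of the table above; the function name `monthly_cost` and the 30-day month are assumptions made for illustration.

```python
def monthly_cost(cost_per_doc: float, docs_per_day: int, days: int = 30) -> float:
    """Project a per-document cost (USD) to a monthly bill, assuming a 30-day month."""
    return cost_per_doc * docs_per_day * days


# Per-document costs for a 100-page report, copied from the table above.
COST_100_PAGES = {
    "GPT-4o mini": 0.0126,
    "GPT-4o": 0.210,
    "Claude Sonnet 4": 0.252,
    "Claude Opus 4": 1.260,
}

if __name__ == "__main__":
    for model, cost in COST_100_PAGES.items():
        print(f"{model}: ${monthly_cost(cost, 1000):,.2f}/month at 1,000 docs/day")
```

Swap in the row that matches your document size and your own daily volume.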
Most cheap models have 128K-200K context windows. Gemini 2.5 Pro has 1M. For documents longer than your model's context window, you need a strategy:
Strategy 1: Map-reduce. Split the document into chunks of 8K-15K tokens. Summarize each chunk independently. Concatenate the summaries. Run a final summarization pass over the combined summaries. Cost: roughly 1.3-1.5x the single-pass approach. Quality: usually slightly worse than single-pass, but works on any document length.
Strategy 2: Long-context single pass. Use Gemini 2.5 Pro (1M tokens) or Claude Sonnet 4 (200K tokens). Pass the entire document in one call. Quality: best, because the model sees everything at once. Cost: high. A 200K-token input on Sonnet 4 is about $0.60 per document for input alone.
Strategy 3: Extract first, then summarize. Use a cheap model to extract just the important sections (intro, conclusions, headers, key paragraphs). Then summarize only those. Cost: lowest, typically 40-60% of single-pass. Quality: good if your extraction prompt is well-tuned.
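As an illustration of the map-reduce strategy above, here is a minimal sketch. It is not tied to any provider: `summarize` is a hypothetical callable you would back with your own model call, and the helper names `split_into_chunks` and `map_reduce_summarize` are made up for this example. Chunking by word count is an assumption standing in for real token counting.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into chunks of at most max_words words (a rough proxy for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],  # hypothetical: wraps your model call
    max_words: int = 8000,            # assumption: keeps chunks near the 8K-15K token range
) -> str:
    chunks = split_into_chunks(text, max_words)
    if len(chunks) <= 1:
        # Short document: a single pass is enough.
        return summarize(text)
    # Map: summarize each chunk independently.
    partials = [summarize(chunk) for chunk in chunks]
    # Reduce: one final pass over the concatenated chunk summaries.
    return summarize("\n\n".join(partials))


if __name__ == "__main__":
    # Stand-in summarizer for a dry run: keeps the first 5 words of its input.
    def fake_summarize(t: str) -> str:
        return " ".join(t.split()[:5])

    print(map_reduce_summarize("word " * 50, fake_summarize, max_words=20))
```

The sketch runs a single reduce pass. If the combined chunk summaries could themselves exceed the context window, the reduce step would need to be applied again.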
For a 50-page (31,700 token) document on different models and strategies:
| Strategy | GPT-4o mini | GPT-4o | Claude Sonnet 4 |
|---|---|---|---|
| Single pass | $0.0063 | $0.105 | $0.126 |
| Map-reduce (3 chunks) | $0.0089 | $0.149 | $0.179 |
| Extract + summarize | $0.0034 | $0.063 | $0.076 |
Extract-and-summarize is usually the cheapest approach, at the price of tuning an extraction prompt. Single-pass is the simplest and gives the best quality when the document fits in context.
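To see how each strategy compares with single pass, divide each row of the table above by the single-pass row. A small sketch follows; the dictionary keys and the function name `relative_to_single_pass` are chosen here for illustration, and the dollar figures are copied from the table.

```python
from typing import Dict

# Cost per 50-page document (USD), copied from the strategy table above.
STRATEGY_COST: Dict[str, Dict[str, float]] = {
    "GPT-4o mini": {"single": 0.0063, "map_reduce": 0.0089, "extract": 0.0034},
    "GPT-4o": {"single": 0.105, "map_reduce": 0.149, "extract": 0.063},
    "Claude Sonnet 4": {"single": 0.126, "map_reduce": 0.179, "extract": 0.076},
}


def relative_to_single_pass(costs: Dict[str, float]) -> Dict[str, float]:
    """Express each strategy's cost as a multiple of the single-pass cost."""
    base = costs["single"]
    return {name: round(cost / base, 2) for name, cost in costs.items()}


if __name__ == "__main__":
    for model, costs in STRATEGY_COST.items():
        print(model, relative_to_single_pass(costs))
```

On these figures, map-reduce comes out at roughly 1.4x single pass and extract-and-summarize at roughly 0.54-0.6x.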
The honest truth: for routine summarization, cheap models are nearly indistinguishable from premium models. Where differences appear:
For news article summaries, blog post summaries, meeting notes, or short reports, the cheap tier is fine. Spend the savings on a better RAG retriever or a faster vector DB.
Use the AI Cost Calculator with the "Document Summary" preset to model your real workload.