What is the cheapest LLM for document summarization?

For most documents under 10 pages, Gemini 2.0 Flash and GPT-4o mini cost less than $0.002 per document. Gemini 2.5 Flash with its 1M token context can handle very long documents (book-length) for under $0.05 each.

How much does it cost to summarize 10,000 documents?

Assume an average 5-page document uses about 3,000 input tokens and 500 output tokens. Across 10,000 documents, GPT-4o mini costs $7.50, GPT-4o costs $125, Claude Sonnet 4 costs $165, and Claude Opus 4 costs $825. The cheap tier is essentially free at this volume.

Which LLM produces the best summaries?

For pure quality, Claude Sonnet 4 and GPT-4.1 produce the most consistent, well-structured summaries. For quality-per-dollar, Gemini 2.5 Flash and GPT-4o mini win. The quality gap between cheap and premium for summarization is smaller than for reasoning or code.

How do I summarize documents longer than the context window?

Three approaches: (1) chunk the document, summarize each chunk, then summarize the summaries (map-reduce); (2) use Gemini 2.5 Pro's 2M-token context window for very long documents; (3) extract the most important sections first (intro, conclusions, headers) and summarize only those.

Cheapest LLM for Document Summarization in 2026 — Real Cost Comparison

Last updated: April 2026 | 7 min read | AI Tools

Document summarization is one of the cheapest LLM workloads. The output is short, the prompts are simple, and modern cheap models do it nearly as well as flagships. If your bill for summarization is over $100/month and you're not summarizing thousands of documents per day, you're using the wrong model.

What a Summarization Job Actually Costs

A typical document summary has this shape:

Input: System prompt (200 tokens) + document text (varies)
Output: Summary (typically 100-500 tokens)

The document text dominates. At roughly 500 words per page, a 5-page document is about 3,000 tokens, a 20-page document is about 12,500, and a 100-page report is about 63,000.
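As a rough sketch of this sizing arithmetic, the snippet below estimates input tokens from a page count. The 500 words per page and the 200-token system prompt come from this article; the 1.25 tokens-per-word ratio is an assumption inferred from the cost table in the next section, and real ratios vary by tokenizer and language.

```python
WORDS_PER_PAGE = 500        # from the article's "1 page (~500w)"
SYSTEM_PROMPT_TOKENS = 200  # from the article's job shape
TOKENS_PER_WORD = 1.25      # assumption: depends on tokenizer and language


def estimate_input_tokens(pages: int) -> int:
    """Estimate total input tokens (system prompt + document) for a page count."""
    words = pages * WORDS_PER_PAGE
    return SYSTEM_PROMPT_TOKENS + round(words * TOKENS_PER_WORD)


if __name__ == "__main__":
    for pages in (1, 5, 20, 100):
        print(pages, "pages ->", estimate_input_tokens(pages), "input tokens")
```

For exact counts, run your documents through the tokenizer of the model you plan to use rather than relying on a ratio.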

Cost Per Document by Size

Doc size	Input tokens	GPT-4o mini	GPT-4o	Claude Sonnet 4	Claude Opus 4
1 page (~500w)	700	$0.000165	$0.00275	$0.0033	$0.01650
5 pages (~2,500w)	3,200	$0.00075	$0.0125	$0.015	$0.075
20 pages (~10K w)	12,700	$0.00255	$0.0425	$0.051	$0.255
50 pages (~25K w)	31,700	$0.0063	$0.105	$0.126	$0.630
100 pages (~50K w)	63,200	$0.0126	$0.210	$0.252	$1.260
Book (~100K w)	125,700	$0.0252	$0.420	$0.504	$2.520

Even a 100-page report on GPT-4o mini costs about 1.3 cents. Summarizing 1,000 such reports per day costs about $12.60/day, or roughly $378/month. The same workload on Claude Opus 4 costs $1,260/day, or $37,800/month: 100x more.
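The per-document and monthly figures can be recomputed with a few lines of Python. This is a minimal sketch: the per-million-token prices below are assumptions (they reproduce the 1-page row for GPT-4o mini and GPT-4o when the summary is 100 tokens), the model keys are just labels, and you should check each provider's current pricing page before relying on them.

```python
# Assumed USD prices per 1M tokens: (input, output). Verify before use.
PRICES_PER_MILLION = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}


def cost_per_document(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one summarization call."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def monthly_cost(per_document: float, docs_per_day: int, days: int = 30) -> float:
    """Project a per-document cost to a monthly bill."""
    return per_document * docs_per_day * days


if __name__ == "__main__":
    one_page = cost_per_document("gpt-4o-mini", input_tokens=700, output_tokens=100)
    print(f"1-page doc on gpt-4o-mini: ${one_page:.6f}")
    print(f"1,000 such docs/day for a month: ${monthly_cost(one_page, 1000):.2f}")
```

Keeping prices in one table makes it easy to add a model or update a rate without touching the arithmetic.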


How to Summarize Very Long Documents

Most cheap models have 128K-200K context windows. Gemini 2.5 Pro has 2M. For documents longer than your model's context window, you need a strategy:

Strategy 1 — Map-reduce. Split the document into chunks of 8K-15K tokens. Summarize each chunk independently. Concatenate the summaries. Run a final summarization pass over the combined summaries. Cost: roughly 1.3-1.5x the single-pass approach. Quality: usually slightly worse than single-pass, but works on any document length.
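The map-reduce steps can be sketched as follows. This is an illustration, not a production pipeline: `summarize` is a hypothetical callable that wraps whatever model API you use, and chunks are sized by word count as a stand-in for tokens (8,000 words is roughly 10K tokens under the assumption of about 1.25 tokens per word).

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 8000) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def map_reduce_summary(
    text: str,
    summarize: Callable[[str], str],  # hypothetical: your model call goes here
    max_words: int = 8000,
) -> str:
    """Summarize each chunk, then summarize the concatenated chunk summaries."""
    chunks = split_into_chunks(text, max_words)
    if len(chunks) <= 1:
        return summarize(text)  # short enough for a single pass
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    combined = "\n\n".join(partial_summaries)
    return summarize(combined)  # reduce step
```

If the combined summaries are themselves too long for one call, the reduce step can be applied recursively; this sketch does not do that.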

Strategy 2 — Long-context single pass. Use Gemini 2.5 Pro (2M tokens) or Claude Sonnet 4 (200K tokens). Pass the entire document in one call. Quality: best, because the model sees everything in one pass. Cost: high — a 200K input on Sonnet 4 is $0.60 per document.

Strategy 3 — Extract first, then summarize. Use a cheap model to extract just the important sections (intro, conclusions, headers, key paragraphs). Then summarize only those. Cost: lowest — typically 40-60% of single-pass. Quality: good if your extraction prompt is well-tuned.
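A minimal sketch of the extract-first flow, assuming two hypothetical callables: `cheap_model` for the extraction pass and `summary_model` for the final summary. Both prompts are illustrative wording, not tested templates, and would need tuning on your own documents.

```python
from typing import Callable

# Illustrative prompts (assumptions, not from the article).
EXTRACT_PROMPT = (
    "Copy verbatim the introduction, conclusions, section headings, and key "
    "paragraphs from the document below. Do not paraphrase.\n\n"
)
SUMMARY_PROMPT = "Summarize the following excerpts from a longer document.\n\n"


def extract_then_summarize(
    document: str,
    cheap_model: Callable[[str], str],    # hypothetical wrapper around a cheap model
    summary_model: Callable[[str], str],  # hypothetical wrapper around the summarizer
) -> str:
    """Extract the important sections first, then summarize only those."""
    excerpts = cheap_model(EXTRACT_PROMPT + document)
    if not excerpts.strip():
        # Extraction returned nothing usable: fall back to the full document.
        excerpts = document
    return summary_model(SUMMARY_PROMPT + excerpts)
```

The savings depend on the summarizer being more expensive per token than the extractor, since the extraction pass still reads the whole document.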

Cost-Per-Document Across Strategies

For a 50-page (31,700 token) document on different models and strategies:

Strategy	GPT-4o mini	GPT-4o	Claude Sonnet 4
Single pass	$0.0063	$0.105	$0.126
Map-reduce (3 chunks)	$0.0089	$0.149	$0.179
Extract + summarize	$0.0034	$0.063	$0.076

Extract-and-summarize is usually the cheapest approach for long documents. Single pass is the simplest option when the document fits in context, and it stays cheaper than map-reduce.
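To see where the map-reduce overhead comes from, here is a small cost model. It is a sketch under stated assumptions: every call repeats a 200-token system prompt, each chunk produces its own intermediate summary, and the reduce call reads all of those summaries. The article does not state the per-chunk output sizes behind its table, so this will not reproduce those figures exactly.

```python
def call_cost(input_tokens: float, output_tokens: float,
              in_price: float, out_price: float) -> float:
    """USD cost of one call, with prices given per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def single_pass_cost(doc_tokens: int, output_tokens: int,
                     in_price: float, out_price: float,
                     system_tokens: int = 200) -> float:
    return call_cost(system_tokens + doc_tokens, output_tokens, in_price, out_price)


def map_reduce_cost(doc_tokens: int, n_chunks: int, chunk_summary_tokens: int,
                    output_tokens: int, in_price: float, out_price: float,
                    system_tokens: int = 200) -> float:
    chunk_tokens = doc_tokens / n_chunks
    # Map: one call per chunk, each paying the system prompt and its own output.
    map_total = n_chunks * call_cost(
        system_tokens + chunk_tokens, chunk_summary_tokens, in_price, out_price
    )
    # Reduce: one call over the concatenated chunk summaries.
    reduce_total = call_cost(
        system_tokens + n_chunks * chunk_summary_tokens, output_tokens, in_price, out_price
    )
    return map_total + reduce_total


if __name__ == "__main__":
    # Assumed prices per 1M tokens (input, output); verify against current pricing.
    single = single_pass_cost(31_500, 500, 0.15, 0.60)
    chunked = map_reduce_cost(31_500, 3, 500, 500, 0.15, 0.60)
    print(f"single pass: ${single:.5f}, map-reduce: ${chunked:.5f}, "
          f"ratio: {chunked / single:.2f}x")
```

The extra cost is almost entirely the intermediate summaries, which are paid for once as output and again as input to the reduce call.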

Quality Differences That Actually Show Up

The honest truth: for routine summarization, cheap models are nearly indistinguishable from premium models. Where differences appear:

Long documents (50+ pages): Premium models maintain coherence better and forget less of the early content
Technical documents: Claude Sonnet 4 and GPT-4.1 handle jargon and equations more accurately
Multilingual content: Gemini 2.5 Pro tends to handle non-English content better than the others
Structured output (sections, bullets, headings): GPT-4.1 and Claude Sonnet 4 follow structure prompts more reliably

For news article summaries, blog post summaries, meeting notes, or short reports, the cheap tier is fine. Spend the savings on a better RAG retriever or a faster vector DB.

The Recommended Stack for Summarization

Default model: GPT-4o mini for documents under 100 pages, Gemini 2.5 Flash for longer
Long documents: Gemini 2.5 Flash (1M context) or Gemini 2.5 Pro (2M context)
Output cap: 500 tokens unless you specifically need a long summary
Prompt structure: "Summarize the following document in N bullet points. Focus on [specific aspect]. Output format: markdown."
Quality check: if the summary is too short, too generic, or misses key points, rerun that document on GPT-4.1 or Claude Sonnet 4
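The stack above could be wired together roughly like this. Treat it as a sketch: the model names are plain labels rather than verified API identifiers, the 30-word minimum is a hypothetical threshold, and the length check only catches the "too short" case, so generic or incomplete summaries still need a human or model-based review.

```python
def pick_model(pages: int) -> str:
    """Default routing from the list above: cheap model under 100 pages."""
    return "gpt-4o-mini" if pages < 100 else "gemini-2.5-flash"


def build_prompt(document: str, bullets: int, focus: str) -> str:
    """Fill in the article's prompt template and append the document."""
    return (
        f"Summarize the following document in {bullets} bullet points. "
        f"Focus on {focus}. Output format: markdown.\n\n{document}"
    )


def needs_escalation(summary: str, min_words: int = 30) -> bool:
    """Flag summaries that are suspiciously short (hypothetical threshold)."""
    return len(summary.split()) < min_words


if __name__ == "__main__":
    print(pick_model(12))
    print(build_prompt("<document text>", bullets=5, focus="financial risks"))
    print(needs_escalation("Too short to be useful."))
```

Pair this with an output cap of 500 tokens in the API call itself, since the prompt alone does not enforce a length limit.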

Use the AI Cost Calculator with the "Document Summary" preset to model your real workload.

