The budget tier—defined as $0.25 to $1.00 per million input tokens—is where smart developers maximize value. These models deliver roughly 90-96% of frontier performance (72-78% on SWE-bench versus ~81% for the top frontier model) at 5-20% of the cost. Whether you’re prototyping, preprocessing data at scale, or running hobby projects, this tier offers the best price-performance ratio in AI.

Who this is for: Hobbyists, indie developers, data pipeline builders, and anyone running high-volume workflows where token costs compound quickly.

The bottom line: You can process 1 million input tokens for less than a dollar. With Gemini 3 Flash’s free input tier, that cost drops to zero.


Quick Comparison Table

| Model | Input/1M | Output/1M | Context | SWE-bench | Key Advantage |
|---|---|---|---|---|---|
| GPT-5 mini | $0.25 | $2.00 | 128K | ~72% | Cheapest OpenAI option, reliable ecosystem |
| Gemini 3 Flash | FREE* | $3.00 | 1M | 78.0% | Free inputs, largest context, best value |
| Kimi k2.5 | $0.60 | $2.50 | 256K | 76.8% | Open source, vision capable, agent swarm |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | ~75% | Fastest responses, Anthropic reliability |

* Google Gemini 3 Flash offers FREE input tokens at standard tier. Output still charged.

Performance context: These models trail Claude Opus 4.5 (~80.9% SWE-bench) by only 3-9 percentage points while costing 5-20x less on input tokens (and Gemini 3 Flash's standard-tier input is free outright).
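The comparison table lends itself to a back-of-the-envelope calculator. Here is a minimal sketch using the list prices quoted above (the model keys are informal labels, and prices may drift, so verify against each provider's pricing page):

```python
# List prices from the table above, USD per 1M tokens (standard, non-batch).
PRICES = {
    "gpt-5-mini":       {"input": 0.25, "output": 2.00},
    "gemini-3-flash":   {"input": 0.00, "output": 3.00},  # free standard-tier input
    "kimi-k2.5":        {"input": 0.60, "output": 2.50},  # cache-miss rate
    "claude-haiku-4.5": {"input": 1.00, "output": 5.00},
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single workload at standard rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: 500K input + 10K output (the RAG scenario later in this article).
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 500_000, 10_000):.3f}")
```

Swapping in your own monthly volumes makes the tier differences concrete before you commit to a provider.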


Individual Model Deep-Dives

GPT-5 mini: The OpenAI Gateway

Pricing: $0.25/1M input | $2.00/1M output | Batch: 50% discount

OpenAI’s entry-level model is the cheapest way to access the OpenAI ecosystem. At $0.25 per million input tokens, it’s the lowest list price in this tier.

Strengths:

  • Ecosystem integration: Native support in OpenAI SDK, GPT-4-compatible function calling
  • Batch pricing: 50% discount on asynchronous workloads drops input to $0.125/1M
  • Predictable behavior: Conservative training reduces hallucinations on simple tasks
  • Tool use: Reliable function calling for agent workflows

Weaknesses:

  • Smallest context: 128K tokens limits large codebase analysis
  • Performance gap: ~6 points behind Gemini 3 Flash on SWE-bench (72% vs 78%)
  • No free tier: Unlike Gemini, every token costs money
  • Rate limits: Aggressive TPM limits on lower-tier API keys

Best for: Teams already invested in OpenAI tooling, simple classification tasks, and applications where 128K context is sufficient.
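Since tool use is one of GPT-5 mini's main draws, here is a sketch of a request body in OpenAI's documented function-calling format. The `get_weather` function and its parameters are purely illustrative, not a real API:

```python
# Illustrative tool definition in the OpenAI function-calling schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body you would pass to the chat completions endpoint.
request_body = {
    "model": "gpt-5-mini",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [get_weather_tool],
}
```

Because the schema matches what GPT-4-era models already accept, existing agent code should need little more than a model-name swap.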


Gemini 3 Flash: The Value Champion

Pricing: FREE/1M input | $3.00/1M output | Batch: $1.25/1M out (input remains free)

Google’s Gemini 3 Flash redefines budget pricing with completely free input tokens. For workflows heavy on context (RAG, document analysis, codebase processing), this is unbeatable.

Strengths:

  • Free inputs: Zero cost for prompts, documents, and context
  • Massive context: 1 million token window fits entire repositories
  • Top-tier performance: 78.0% SWE-bench (highest in budget tier)
  • Batch savings: 58% discount on asynchronous jobs
  • Context caching: $1/hour storage + $0.075/1M cached input

Weaknesses:

  • Output costs: $3/1M output is higher than GPT-5 mini’s $2/1M
  • Output limits: 8,192 token generation cap
  • Ecosystem: Smaller third-party tooling vs OpenAI
  • Data concerns: Free tier data used for training (disable with billing)

Best for: High-input workflows, document processing, RAG systems, and any scenario where you read more than you write. See the full Gemini 3 Flash analysis for detailed benchmarks.


Kimi k2.5: The Multimodal Workhorse

Pricing: $0.60/1M input (cache miss) | $0.10/1M (cache hit) | $2.50/1M output

Moonshot AI’s open-source model brings unique capabilities to the budget tier: native multimodal processing and parallel agent execution.

Strengths:

  • Vision-to-code: Generate working UIs from mockups, screenshots, and videos
  • Agent swarm: Orchestrate up to 100 sub-agents for parallel tasks
  • Open source: Self-hostable, inspectable weights
  • Cache efficiency: $0.10/1M on cache hits (83% savings)
  • Large context: 256K tokens (2x GPT-5 mini)

Weaknesses:

  • Input costs: $0.60/1M is highest in tier (though cache hits help)
  • Ecosystem maturity: Smaller community than OpenAI/Anthropic
  • Latency: “Thinking” mode adds response time
  • Documentation: English resources less comprehensive

Best for: Visual workflows (UI generation from designs), parallel research tasks, and teams wanting open-source flexibility. See the Kimi k2.5 deep-dive for vision capabilities.
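Kimi's cache-hit pricing means your effective input rate depends on how much of each prompt is repeated. A simple blended-rate sketch, using the cache-hit and cache-miss prices quoted above:

```python
def effective_input_price(hit_rate: float,
                          hit_price: float = 0.10,
                          miss_price: float = 0.60) -> float:
    """Blended $/1M input tokens for Kimi k2.5 at a given prompt-cache hit rate."""
    return hit_rate * hit_price + (1 - hit_rate) * miss_price

# A chatbot resending a large fixed system prompt can see high hit rates:
print(effective_input_price(0.0))  # all misses: $0.60/1M
print(effective_input_price(0.8))  # ~$0.20/1M, cheaper than GPT-5 mini's list price
```

At hit rates above roughly 70%, Kimi's blended input price drops below GPT-5 mini's $0.25/1M, which changes the tier's cost ranking for repetitive workloads.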


Claude Haiku 4.5: The Reliable Speedster

Pricing: $1.00/1M input | $5.00/1M output | Batch: 50% discount

Anthropic’s budget offering trades some cost efficiency for reliability and speed. At $1/1M input, it’s the priciest in this tier but still 5x cheaper than Claude Opus 4.5.

Strengths:

  • Speed: Fastest response times in Claude family
  • Reliability: Anthropic’s safety training reduces problematic outputs
  • 200K context: Larger than GPT-5 mini, competitive with Kimi
  • Batch discount: 50% off asynchronous workloads
  • Enterprise trust: SOC 2 compliance, established vendor

Weaknesses:

  • Highest cost: $1/1M input, $5/1M output (the highest output price in the tier)
  • Performance: ~75% SWE-bench trails Gemini 3 Flash and Kimi k2.5
  • No free inputs: Unlike Gemini, you pay for everything
  • Rate limits: Stricter than OpenAI on entry tiers

Best for: Production applications needing Anthropic’s safety standards, fast classification tasks, and teams already using Claude Code or Claude Max.


Use Case Recommendations

Prototyping & Experimentation

Winner: Gemini 3 Flash

Free inputs mean you can iterate endlessly without cost anxiety. Test prompts, explore edge cases, and validate approaches before scaling. The 1M context lets you throw entire codebases or documents at the model.

Runner-up: Kimi k2.5 (via Kilo Code’s free week)

High-Volume Data Preprocessing

Winner: Gemini 3 Flash (Batch)

For ETL pipelines, document classification, or data enrichment:

  • Free inputs on standard tier
  • Batch output at $1.25/1M (58% savings)
  • 1M context reduces chunking complexity

Cost example: Processing 10M input tokens + 500K output tokens in batch mode costs $0.625 (free input + 500K × $1.25/1M output), vs $50+ with premium models.

UI Generation from Visuals

Winner: Kimi k2.5

Native vision encoder processes mockups, screenshots, and videos to generate functional code. The agent swarm can parallelize component generation across complex designs.

Runner-up: GPT-5 mini for simple text-to-UI tasks without visual input

Production API (Cost-Sensitive)

Winner: GPT-5 mini

Most predictable pricing, mature SDK, and broad tooling support. The 50% batch discount makes it competitive for asynchronous workloads.

Runner-up: Claude Haiku 4.5 if Anthropic’s safety standards are required

Codebase Analysis & RAG

Winner: Gemini 3 Flash

1M context eliminates chunking. Free inputs make RAG preprocessing nearly free. 78% SWE-bench means it understands code well enough for most analysis tasks.

Cost comparison for 500K input + 10K output:

  • Gemini 3 Flash: $0.03
  • Kimi k2.5: $0.33 (11x more)
  • Claude Opus 4.5: $2.75 (92x more)
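Before sending a codebase, it helps to estimate whether it fits a model's window at all. A minimal sketch using the common rule of thumb of roughly 4 characters per token (a heuristic, not an exact tokenizer count):

```python
def estimate_tokens(num_chars: int) -> int:
    """Very rough heuristic: ~4 characters per token for English text and code."""
    return num_chars // 4

def fits_context(num_chars: int, context_window: int) -> bool:
    """Does a blob of this size fit a model's context window (estimated)?"""
    return estimate_tokens(num_chars) <= context_window

# A ~3 MB codebase is roughly 750K tokens:
print(fits_context(3_000_000, 1_000_000))  # Gemini 3 Flash's 1M window: True
print(fits_context(3_000_000, 128_000))    # GPT-5 mini's 128K window: False
```

For precise counts, use the provider's own token-counting endpoint or tokenizer library; the heuristic is only good enough for a go/no-go check.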

Value Analysis: Real-World Scenarios

Scenario A: Daily Prototyping (30 days)

Usage: 50K input + 5K output per day

| Model | Daily Cost | Monthly Cost | Annual Cost |
|---|---|---|---|
| Gemini 3 Flash | $0.015 | $0.45 | $5.40 |
| GPT-5 mini | $0.0225 | $0.68 | $8.10 |
| Kimi k2.5 | $0.0425 | $1.28 | $15.30 |
| Claude Haiku 4.5 | $0.075 | $2.25 | $27.00 |

Takeaway: Even at hobby scale, Gemini’s free inputs create meaningful savings. The difference between Flash and Haiku pays for a coffee every month.

Scenario B: High-Volume Preprocessing (1B tokens/month)

Usage: 900M input + 100M output tokens

| Model | Input Cost | Output Cost | Total | Savings vs Haiku (batch) |
|---|---|---|---|---|
| Gemini 3 Flash (batch) | $0 | $125 | $125 | 82% |
| GPT-5 mini (batch) | $112.50 | $100 | $212.50 | 70% |
| Kimi k2.5 | $540 | $250 | $790 | — |
| Claude Haiku 4.5 (batch) | $450 | $250 | $700 | — |

Takeaway: At scale, Gemini 3 Flash’s free inputs dominate. Even GPT-5 mini’s aggressive batch pricing can’t match zero-cost inputs.

Scenario C: Mixed API Workload (Startup)

Usage: 100B input + 20B output tokens per month, split between synchronous and batch traffic

| Model | All-Sync Cost | All-Batch Cost | 50/50 Mix |
|---|---|---|---|
| Gemini 3 Flash | $60K | $25K | $42.5K |
| GPT-5 mini | $65K | $32.5K | $48.8K |
| Kimi k2.5 | $110K | n/a | $110K |
| Claude Haiku 4.5 | $200K | $100K | $150K |

Takeaway: At this volume, a 50/50 sync/batch mix on Gemini 3 Flash runs roughly $100K less per month than the same mix on Claude Haiku 4.5, well over $1M in annual savings.


Decision Flowchart

Start: What's your primary constraint?
│
├─► Cost is everything ──► Gemini 3 Flash (free inputs)
│
├─► Need vision/multimodal ──► Kimi k2.5
│
├─► OpenAI ecosystem lock-in ──► GPT-5 mini
│
├─► Need Anthropic safety/reliability ──► Claude Haiku 4.5
│
└─► Balanced value ──► Gemini 3 Flash (best performance + free inputs)

Quick Decision Matrix

| If you need… | Choose | Why |
|---|---|---|
| Lowest possible cost | Gemini 3 Flash | Free inputs |
| Best coding performance | Gemini 3 Flash | 78% SWE-bench |
| Vision-to-code | Kimi k2.5 | Native multimodal |
| OpenAI compatibility | GPT-5 mini | Ecosystem match |
| Fastest responses | Claude Haiku 4.5 | Speed-optimized |
| Largest context | Gemini 3 Flash | 1M tokens |
| Self-hosting | Kimi k2.5 | Open source |
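The matrix above reduces to a lookup. A minimal sketch, with the constraint labels as informal shorthand invented here:

```python
# This article's recommendations, keyed by primary constraint.
BUDGET_PICKS = {
    "lowest_cost": "Gemini 3 Flash",
    "coding_performance": "Gemini 3 Flash",
    "vision_to_code": "Kimi k2.5",
    "openai_compatibility": "GPT-5 mini",
    "fastest_responses": "Claude Haiku 4.5",
    "largest_context": "Gemini 3 Flash",
    "self_hosting": "Kimi k2.5",
}

def pick_model(constraint: str) -> str:
    # Default to Gemini 3 Flash, this article's balanced-value pick.
    return BUDGET_PICKS.get(constraint, "Gemini 3 Flash")

print(pick_model("vision_to_code"))
```

Encoding the decision this way makes it easy to keep model routing in config rather than scattered through application code.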

Special Pricing Features

Batch Discounts

OpenAI, Google, and Anthropic offer batch processing discounts for asynchronous workloads (Kimi k2.5 discounts cache hits instead of offering a batch tier):

| Model | Batch Discount | Effective Input | Effective Output |
|---|---|---|---|
| GPT-5 mini | 50% | $0.125/1M | $1.00/1M |
| Gemini 3 Flash | 58% (output) | FREE/1M | $1.25/1M |
| Claude Haiku 4.5 | 50% | $0.50/1M | $2.50/1M |

When to use batch: Data preprocessing, overnight jobs, non-urgent analysis, training data generation.
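Batch jobs are submitted as a JSONL file, one request per line. The sketch below builds one in OpenAI's documented Batch API format for GPT-5 mini; the classification prompt is illustrative, and other providers use their own batch file formats, so check each vendor's docs before reusing this shape:

```python
import json

# Documents to classify asynchronously (illustrative).
docs = ["First document to classify.", "Second document to classify."]

lines = []
for i, doc in enumerate(docs):
    # One JSONL record per request, per the OpenAI Batch API format.
    lines.append(json.dumps({
        "custom_id": f"doc-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-mini",
            "messages": [
                {"role": "system", "content": "Classify the document as SPAM or HAM."},
                {"role": "user", "content": doc},
            ],
        },
    }))

with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))
```

You then upload the file with `purpose="batch"` and create a batch job against it; the 50% discount applies automatically to requests processed this way.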

Rate Limits (Entry Tier)

| Model | Requests/Min | Tokens/Min | Notes |
|---|---|---|---|
| GPT-5 mini | 3,500 | 200K | Increases with usage tier |
| Gemini 3 Flash | 1,000 | 4M | Free tier: 15/min |
| Kimi k2.5 | 100 | 100K | Varies by plan |
| Claude Haiku 4.5 | 50 | 50K | Strictest entry limits |

Free Access Options

You don’t need to pay to try these models:

| Model | Free Path | Limitations |
|---|---|---|
| GPT-5 mini | OpenAI free tier | $5 credit, 3 months |
| Gemini 3 Flash | Google AI Studio | 15 req/min, data used for training |
| Kimi k2.5 | Kilo Code | 1 week unlimited |
| Kimi k2.5 | OpenCode Zen | Daily rate limits |
| Claude Haiku 4.5 | Claude Free | Web only, rate limited |

Summary

The budget tier ($0.25-$1.00/1M input) offers exceptional value for developers willing to trade the last few percentage points of performance for massive cost savings.

Our recommendation:

  1. Default choice: Gemini 3 Flash. Free inputs + best performance + largest context = unbeatable value.

  2. For visual workflows: Kimi k2.5. The only budget option with native vision capabilities and agent swarms.

  3. For OpenAI ecosystems: GPT-5 mini. Cheapest way to stay in the OpenAI tooling chain.

  4. For Anthropic reliability: Claude Haiku 4.5. When safety and speed matter more than cost.

The math is clear: You can get roughly 90-96% of frontier performance for 5-20% of the cost. For most prototyping, preprocessing, and production workloads, budget tier models are the smart choice.



Last updated: 2026-01-30. Pricing subject to change. Verify current rates on provider websites before committing to large workloads.