# LLM Models

Language model guides, benchmarks, and selection advice: the underlying AI models that power coding tools and agentic platforms.
## What Are LLM Models?

Large Language Models (LLMs) are the underlying AI systems that power coding tools and agentic platforms. This section covers model capabilities, pricing, benchmarks, and selection guidance, not the tools that use them.
## Available Model Guides
### Claude Opus 4.5

Anthropic's flagship reasoning model, with the highest SWE-bench score covered here at 80.9%.

### Kimi k2.5

Moonshot AI's high-performing model with free access options. 76.8% on SWE-bench.

### Gemini 3 Flash

Google's 1M-context-window model with free input tokens. 78.0% on SWE-bench. (For a rough way to size a codebase against these context windows, see the sketch after these guides.)

### GLM 4.7

Zhipu AI's model, available through OpenCode and other platforms. 73.8% on SWE-bench, with free access via OpenCode Zen and Z.AI.
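Context windows like Gemini 3 Flash's 1M tokens or Kimi k2.5's 256K are easier to evaluate against a concrete codebase. Here is a minimal sketch, assuming the common rough heuristic of ~4 characters per token; real tokenizers vary by language and code style, so treat the output as an order-of-magnitude estimate, not a precise count:

```python
import os

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary


def estimate_repo_tokens(root: str, exts: tuple[str, ...] = (".py", ".ts", ".md")) -> int:
    """Walk a source tree and estimate its size in tokens (chars / 4 heuristic)."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                try:
                    with open(os.path.join(dirpath, name), encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue
    return total_chars // CHARS_PER_TOKEN


# Compare a repo estimate against the context windows mentioned in this section.
tokens = estimate_repo_tokens(".")
for model, window in [("Gemini 3 Flash", 1_000_000), ("Kimi k2.5", 256_000)]:
    verdict = "fits in" if tokens <= window else "exceeds"
    print(f"{model}: repo ~{tokens:,} tokens, {verdict} the {window:,}-token window")
```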
## Quick Selection Guide

| Model | SWE-bench | Price per 1M tokens (input/output) | Best For |
|---|---|---|---|
| Claude Opus 4.5 | 80.9% | $5 / $25 | Maximum reasoning, safety-critical work |
| Gemini 3 Flash | 78.0% | Free / $0.15 | Large context, cost-sensitive work |
| Kimi k2.5 | 76.8% | $3 / $3 | Best value, free access available |
| GLM 4.7 | 73.8% | Free tier (OpenCode Zen, Z.AI) | Thinking-mode visibility, budget usage |
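To make the per-million-token prices concrete, here is a minimal cost sketch using the table's figures. It models Gemini 3 Flash's free input tier as $0; the numbers are illustrative, not live pricing:

```python
# Per-1M-token prices (input, output) in USD, taken from the table above.
PRICES = {
    "claude-opus-4.5": (5.00, 25.00),
    "gemini-3-flash": (0.00, 0.15),
    "kimi-k2.5": (3.00, 3.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a workload with the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price


# Example workload: an agent session reading 2M tokens and writing 200K.
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 2_000_000, 200_000):.2f}")
```

For read-heavy agent workloads the input price dominates: the example prints $15.00 for Claude Opus 4.5, $0.03 for Gemini 3 Flash, and $6.60 for Kimi k2.5.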
See Budget Tier, Mid-Range, and Premium comparisons for detailed analysis.
## Related Categories
- Coding Tools — Tools that use these models for coding assistance
- Agentic Tools — Autonomous platforms powered by these models
- Compare — Model-to-model comparisons
## Latest Updates

- 2026-02-01 | **Gemini 3 Flash: 78% SWE-bench, $0.50/1M Input, 8x Cheaper Than Claude Opus**
  Google's Gemini 3 Flash delivers frontier coding performance with $0.50/1M input tokens, 65K output limit, and 1M context. Full benchmarks, pricing breakdown, and when to choose it over Claude Opus 4.5 or Kimi k2.5.
- 2026-02-01 | **GLM 4.7: Thinking Mode, 73.8% SWE-bench, Free via OpenCode & Z.AI**
  Z.AI's GLM 4.7 delivers 73.8% SWE-bench coding performance with unique thinking mode visibility. Free via OpenCode Zen and Z.AI tier, with $3/month coding plan for heavy usage.
- 2026-02-01 | **Kimi k2.5: Capabilities, Benchmarks & Free Access**
  Moonshot AI's Kimi k2.5 model: 76.8% SWE-bench score, 256K context window, multimodal capabilities, and how to access it free through Kilo Code and OpenCode Zen.
- 2026-01-30 | **Claude Opus 4.5: 80.9% SWE-bench, Pricing & When to Use**
  Anthropic's flagship reasoning model delivers the highest verified SWE-bench score at 80.9%. Complete pricing breakdown ($5/1M input), subscription vs API analysis, and comparison to Kimi k2.5 and Gemini 3 Flash.