Three Years in the Vibe Coding Trenches: A 2023-2026 Retrospective

Digital Archaeology

The posts are gone. Not just from this site—from the Wayback Machine, from Google Cache, from the digital record entirely. In April 2023, I published what I called “the 20-minute blog”: an experiment in using GPT-4 to generate a complete blog post from prompt to publish in under half an hour. It worked. It also marked the beginning of what we’d later call the “vibe coding” era.

The titles survive in old sitemarks:

The 20-Minute Blog (April 4, 2023)
The Fortune-Telling Cyborg: How AI is Revolutionizing Divination (April 4, 2023)
The Top 10 AI Technologies to Watch in 2023 (April 4, 2023)
10 Game-Changing AI Chatbot Plugins That Will Change Your Life Forever (April 5, 2023)

I deleted them sometime in late 2023, embarrassed by their shallowness. Now, three years later, I wish I’d kept them—not as content, but as artifacts. This is my attempt at reconstruction and reckoning.

The 2023 Context

GPT-4 Had Just Dropped

March 2023. GPT-4’s release felt like a phase transition. The jump from GPT-3.5 to GPT-4 was qualitatively different from previous improvements. GPT-3.5 was clever but brittle. GPT-4 could sustain coherence across longer contexts, handle more nuanced instructions, and—crucially—seem to understand what you wanted even when you expressed it poorly.

This created the conditions for “vibe coding”: the practice of describing what you wanted in natural language and letting the model figure out the implementation. The term didn’t exist yet. We just knew something had changed.

The Prompt Engineering Gold Rush

Everyone was a prompt engineer in April 2023. Twitter threads promised “10 prompts that will 10x your productivity.” Substack newsletters analyzed chain-of-thought techniques like they were discovering the structure of DNA. Courses sold for $500 teaching “advanced prompting.”

The 20-Minute Blog post was my contribution to this economy: a proof-of-concept that you could automate content creation entirely. The process was:

Prompt GPT-4 with a topic and tone
Generate three title options
Select one, generate outline
Generate sections
Light editing (mostly formatting)
Publish

Total time: 18 minutes.

What the Posts Actually Said

From memory and the titles, I can reconstruct the arguments:

The 20-Minute Blog argued that content creation was being democratized. The bottleneck wasn’t writing skill anymore—it was having something to say. The post probably made some hand-waving claims about “human-AI collaboration” and “augmented creativity.”

The Fortune-Telling Cyborg was stranger. It explored using LLMs for divination—treating them as oracles, not assistants. This was during the brief window when people were genuinely experimenting with AI as a tool for spiritual/irrational practices. The post likely walked a line between skepticism and genuine curiosity.

Top 10 AI Technologies was exactly what you’d expect: a listicle generated from GPT-4’s training data about itself and its competitors. Probably mentioned LangChain, Auto-GPT, and various wrapper startups that died within six months.

10 Game-Changing Plugins covered the ChatGPT plugin ecosystem—which OpenAI effectively abandoned by mid-2024 in favor of GPTs, which they then also deprioritized.

The Critique: What We Got Wrong

1. Hallucination Wasn’t a Bug, It Was a Feature (We Thought)

In 2023, we treated hallucinations as creative noise. The Fortune-Telling Cyborg post probably celebrated the model’s ability to generate plausible-sounding but unverified information as “intuition” or “pattern matching.”

Reality: Hallucination is a fundamental alignment problem. Three years later, it’s still the primary blocker for autonomous AI agents in production. The difference is we now build verification layers instead of pretending the problem is creative expression.

2. Speed Was the Wrong Metric

The 20-Minute Blog optimized for time-to-publish. This was backwards. The scarce resource was never typing speed—it was insight. By automating the easy part (word generation), we made the hard part (thinking) harder by flooding ourselves with plausible-sounding garbage.

The 2026 view: Good AI-assisted content takes longer than pure human writing because you’re verifying claims, checking sources, and iterating on structure. The AI doesn’t save you writing time; it saves you typing time while adding verification burden.

3. The Wrapper Collapse

Those “game-changing plugins”? Most were thin wrappers around API calls. The Top 10 post probably hyped startups that added a web UI to GPT-4 and called it a product. By 2024, OpenAI had absorbed most of these use cases directly. By 2025, the model capabilities had advanced past what the wrappers provided.

Lesson: Betting on AI wrappers is betting against foundation model progress. This shaped how we think about agentic tools today—real value is in orchestration, evaluation, and safety infrastructure, not UI polish.

4. Prompt Engineering Wasn’t Engineering

Those “advanced prompting techniques” were mostly just… asking clearly. Chain-of-thought prompting works because it forces you to articulate your reasoning, not because of magical incantations. The prompt engineering gold rush was mostly consultants selling common sense at consultant prices.

2026 insight: The real “prompt engineering” is threat modeling. How do you structure your system so that malicious inputs can’t cause harmful outputs? How do you validate that the model’s reasoning is actually happening? This is security work, not optimization work.

What Actually Worked

Not everything from the vibe coding era was wrong.

1. Natural Language as Interface

The core insight held: describing intent in natural language is more efficient than specifying implementation for many tasks. The error was thinking this meant we didn’t need to understand the implementation at all.

Modern agentic tools (OpenClaw, Claude Code) preserve natural language interfaces but add verification, rollback, and explicit tool definitions. The vibe is still there; the blind trust is gone.

2. Rapid Prototyping

The 20-Minute Blog workflow—generate, iterate, publish—transferred successfully to code. Modern vibe coding (the term did stick) is about rapid exploration: generate ten variations, evaluate, keep the best. This works for architecture exploration, UI mockups, and data pipeline design.

The difference is we now evaluate before shipping, not after.

3. AI as Thought Partner

The Fortune-Telling Cyborg accidentally touched on something real. LLMs are good at lateral thinking—suggesting connections you wouldn’t have made. The error was treating them as oracles instead of sparring partners.

Current best practice: use AI for divergence (generating options), humans for convergence (selecting and validating). The 2023 approach skipped the convergence step.

The Evolution: 2023 to 2026

Model Capabilities

2023	2026
GPT-4 (8K context)	Claude 3.5 Sonnet, o3, DeepSeek-R1 (200K-1M context)
~20% code execution accuracy	~85% code execution accuracy
No tool use	Native tool use, computer use, browser automation
Hallucinations: unfixable	Hallucinations: managed with verification
Single-turn optimization	Multi-turn agentic workflows

Content Quality Expectations

2023: Any AI-generated content was impressive. The novelty carried the work.

2024: AI content required human editing. “AI-assisted” became the standard.

2025: Verification became mandatory. Claims needed sources. Hallucinations were unacceptable.

2026: AI-generated content is assumed. The differentiator is evaluation rigor and security analysis. Anyone can generate text; few can validate it under adversarial conditions.

The Pivot This Site Made

The 2023 posts were content about AI, generated by AI. Meta without meaning.

The 2026 posts are analysis of AI: security audits, verification reports, implementation guides. The AI assists the research, but the claims are verified against primary sources. The content is about what AI systems actually do, not what they promise.

This mirrors the broader shift in the field: from demo culture to production culture.

Lessons for the Next Three Years

1. Verification Beats Generation

The scarce skill in 2026 isn’t prompting; it’s evaluation. Can you verify that an AI’s output is correct, secure, and aligned with intent? This requires understanding the domain well enough to spot errors—a harder bar than generating plausible text.

2. Safety Is Not a Feature

The 2023 approach treated safety as something to add later. “Let’s build the thing, then make it safe.” Three years of jailbreaks, prompt injections, and autonomous agent failures have taught us: safety is architectural, not cosmetic.

This is why modern coverage focuses on isolation, sandboxing, and least-privilege access. You can’t vibe-code your way out of a security failure.

3. The Wrapper Problem Persists

Every month, a new AI wrapper promises to automate some knowledge work. Every year, foundation models absorb the capability directly. The 2023 plugin ecosystem, 2024’s GPTs, 2025’s “AI employees”—same pattern.

The durable value is in the hard stuff: evaluation infrastructure, security boundaries, human-AI interaction design. Not the thin layer on top.

4. Speed Still Matters, Differently

The 20-Minute Blog was fast to write and slow to read (because it was bad). Good AI-assisted work is slow to write and fast to read—because the time goes into verification and structure, not typing.

Optimize for reader time, not writer time.

The Archaeological Method

I can’t recover those 2023 posts. But I can tell you what they represented: genuine excitement about a new capability, coupled with naivety about its limitations. They were products of a moment when we thought the hard problems were solved and the remaining work was integration.

The hard problems weren’t solved. They’d barely been identified.

In 2026, we’re still identifying them. The difference is we now have three years of failure modes to learn from. The vibe coding era didn’t end; it grew up. The vibes are still there—we’re just wearing safety equipment now.

Appendix: How I’d Do It Today

If I were writing the 20-Minute Blog post today, the workflow would be unrecognizable. Not slower—differently structured:

2026: The 4-Hour Verified Post

Phase	Time	Activity	AI Role
Research	60 min	Identify primary sources, claims, counterarguments	Generate search queries, summarize sources
Verification	45 min	Check claims against sources, identify conflicts	Flag statements needing citation
Structure	30 min	Outline with explicit evidence chains	Suggest organizational patterns
Drafting	60 min	Write with inline source references	Expand bullet points, suggest transitions
Review	45 min	Check for hallucinations, verify quotes	Generate review checklist
Security check	30 min	Ensure no leaked info, safe examples	—
Publishing	30 min	Final formatting, OG images, tags	Generate metadata suggestions
Total	~4.5 hours

Key Differences from 2023

Source-First, Not Prompt-First

2023: Start with a prompt, see what the AI generates, publish. 2026: Start with sources, extract claims, verify AI understands them correctly.

The AI assists the research, not replaces it.

Explicit Verification Steps

Every claim gets checked:

Statistics → Original study or official source
Quotes → Verified transcript or primary source
Technical claims → Documentation or reproducible test
Predictions → Labeled as speculation

Hallucination Resistance Built-In

Instead of hoping the AI doesn’t hallucinate:

Key claims flagged for manual verification
AI-generated content marked as draft until reviewed
Sources archived (Wayback Machine) before citing
Confidence levels explicitly stated

Security Considerations

Before publishing anything about tools or infrastructure:

Are the examples safe to replicate?
Could someone follow these instructions and expose themselves to risk?
Are we inadvertently advertising vulnerable configurations?

This is the focus of the current site: /risks/openclaw/architecture-risk/

The Same, But Different

The Fortune-Telling Cyborg post today would be a technical analysis of:

How LLMs generate convincing but ungrounded predictions
The psychology of AI “intuition” and anthropomorphism
Security risks of treating AI outputs as authoritative
Verification methodologies for AI-assisted research

It would cite actual studies on AI hallucination, user trust, and decision-making under uncertainty. It would probably take a week to write properly. It would be worth reading.

On Speed

The 20-minute workflow wasn’t wrong about efficiency. It was wrong about where the efficiency gains come from. AI doesn’t make you write faster; it makes you:

Explore more ideas before committing (divergence)
Express rough thoughts in polished prose (translation)
Identify gaps in your reasoning (verification)
Format and structure consistently (production)

The time saved on these tasks gets reinvested in research and verification. The post takes longer, but it’s actually correct.

On Vibe Coding Today

I still vibe code. Every project on this site starts with Claude or Claude Code generating a rough structure. The difference is what happens after:

Vibe → Generate initial exploration
Verify → Check against sources, security constraints
Iterate → Tighten claims, add evidence
Ship → Only after explicit review

The 2023 error was stopping at step 1. The 2026 workflow recognizes that step 2 is where the value gets created.

/posts/20-minute-blog-archaeological-update/ — Canonical update of the original April 4, 2023 experiment
/posts/vibe-coding-april-2023-lost-posts/ — Reconstruction of the deleted April 2023 cluster
/posts/vibe-coding-trenches-lessons-2023-2026/ — Synthesis playbook for legacy 2023 traffic
/risks/openclaw/architecture-risk/ — What happens when agents have too much access
/verify/openclaw-claims/ — Verification methodology for AI tool claims
/implement/openclaw/yolo-safely/ — Secure deployment practices
/posts/openclaw-security-reality-2026/ — Current security analysis of agentic tools
/verify/vibe-coding-archive-evidence/ — Wayback capture inventory for the 2023 legacy URLs and 2026 replacements

Revision note

Production-ready as the long-form anchor. Revisit after April 2026 anniversary updates to append measured deltas.