Blume.codes vision - Blume Blog

Your coding agent got dramatically smarter this year. The workflow around it didn't.

If you've handed real work to an agent, you know the feeling. The model can touch a hundred files, run your tools, open a PR — and then forget the one convention you've explained four times. The power is real. So is the mismatch.

This is a post about where that mismatch comes from, what we're shipping first to attack it (Blume Sidecar), and why we think the endgame is bigger than developer tooling.

The curve everyone's climbing

Stage	Name	What it looks like	What changes
0	No AI	Manual coding; AI isn't part of the workflow	Throughput is bounded by human implementation
1	Inline	Autocomplete + small edits inside the editor	People write code faster, but the workflow is the same
2	Multi-file	Agents refactor and implement scoped tasks across files	Delegation becomes practical for "contained" work
3	Agentic	Larger tasks are delegated; humans constantly steer and review	Speed increases, but steering loops and loss of control dominate
4	Agentic-native codebases	Repos are structured for agents: context, rules, hooks, ignore patterns, toolchains	Less steering; fewer preventable mistakes; faster delegation
5	Agentic CI	Permission gates, self-testing, agent reviewers with high-quality review rules	Agent work becomes safe to test, review, and merge without losing control
6	Code factory	End-to-end loops: plan → implement → test → PR → fix review → merge → deploy	Throughput shifts from writing code to supervising reliable delivery
7	Agentic-native organization	Company-wide intent and context stay coherent across teams, tools, and time	The org scales AI-native execution without drifting into folklore

Most teams are scattered across these stages. That's fine. The interesting part is where almost everyone gets stuck: the jump from agentic development (Stage 3) to an agentic-native codebase (Stage 4).

That transition is where it gets messy — and where current tooling abandons you.

Stage 3 is fighting a smart amnesiac

Push past small, safe tasks and the same patterns show up every time:

You repeat yourself — conventions, constraints, what not to touch — session after session.
The agent fills in blanks it shouldn't, because it has no idea about your legacy edge cases.
Misalignment shows up as token spend and wasted attention: the agent works harder to land in the same place.
Run several agents in parallel and you lose the thread of what changed, where, and why.
You try to fix it with rules and scripts. They sprawl, rot, and one stale rule quietly degrades output for weeks.

Here's the line we keep coming back to:

Most "agent failures" are context failures your workflow has no way to detect or repair.

The model isn't the bottleneck. The bottleneck is that the truth about your project — its decisions, constraints, and intent — lives nowhere durable and nowhere enforceable.

Intent leaks out of every session

Every serious agent session produces genuinely valuable information: decisions ("this way, not that way"), constraints ("never touch these files"), patterns ("we validate input here"), domain rules ("this edge case behaves like X").

And then it evaporates — into chat transcripts, PR threads, review comments, Slack, a meeting nobody wrote down. None of it becomes an artifact that survives the next session, the next teammate, the next harness, the next model.

So every session starts with a silent re-onboarding. Your agent is smart. Why does it keep forgetting the basics? Because you've given it nothing to remember with. The knowledge that should be canonical is folklore instead.

The bet: if you can measure steering, you can fix it

We think the most valuable signal in agentic development is the thing everyone treats as friction — your repeated steering.

Repeating yourself isn't just annoying. It's evidence. Evidence of missing context, of a rule that should exist, of a constraint that should be enforceable, of a team decision that should be canonical.

Harvest that intent, structure it into one canonical model of your domain, and push it back out as enforceable context across every harness — and "babysitting an agent" turns into "delegating to one." More importantly, you finally get something you've never had here: a way to tell whether a change to your setup helped or hurt, instead of guessing.

Blume Sidecar: start where it actually hurts

We're launching with something deliberately small and immediately useful.

Sidecar is a narrow, always-on desktop companion that sits next to you while you work. It's for people running multiple agents across different harnesses who want to stay in control. It does three things:

Shows you every agent — what's running, what finished, what's waiting on you.
Surfaces the hidden context — the rules, skills, hooks, and ignore patterns that quietly shape behavior.
Flags drift, with permission — when what you're saying in chat contradicts what your setup implies, it proposes a fix and waits for your yes.

We're opinionated about exactly one thing: auto-improvement you can't trust is just automated damage. So Sidecar starts with transparency and small, approved changes — not magic.

From sidecar to source of truth

Sidecar is the wedge. The system behind it is the point.

Next comes a local domain model — a single source of truth for intent — plus the analytics to tell you whether a change improved or degraded agent outcomes, and the machinery to turn repeated steering into durable rules that stay current. This is where an agentic-native codebase stops being a craft project and becomes a discipline.

And it's where the problem gets genuinely hard: not one person and one agent, but many people, many harnesses, shifting requirements, and institutional knowledge trapped in conversations.

The bigger version: a company brain

Solve it in engineering and it generalizes. The long-term shape is a system that captures intent across the company — not just code — keeps decisions auditable, catches misalignment before it becomes execution, and hands every internal agent curated, current context for its job. Humans stay in the loop without re-explaining themselves forever.

Companies built by humans expressing intent, and agents systematizing and executing it. The interface stops being prompting. The interface becomes alignment.

Why now

Two things are compounding: the models keep getting stronger, and teams keep trying to delegate bigger scopes to them. The universal failure mode is the same everywhere — more prompts, more docs, more rules, and no garbage collection, no measurement, no canonical truth. So the system drifts.

If your agent keeps getting it wrong, your intent isn't stored anywhere it can act on. That's the gap we're closing.

Sidecar is step one: a practical tool for people already living on the agentic frontier. What we're actually building is the layer that makes agents reliable — across tools, across time, and eventually across the whole organization.

If you've already run into this, you're early. You're exactly who we built it for.