A Week In AI — Issue #6
The Harness Is the Edge
Look at the announcements from Microsoft, Anthropic, OpenAI, and Google over the last six months and tell me you can still describe a meaningful structural difference between their stacks.
Microsoft Foundry, AWS Bedrock, OpenAI, Google’s Gemini Enterprise & Antigravity, and Anthropic… they’re all providing Models, Agent runtimes, evaluation frameworks, and the ability to secure workloads. The names, marketing, and packaging are different, but the stack is the same.
This is what convergence looks like, and it’s been creeping up on the industry quietly while everyone was focused on the next model release.
In the last issue, I wrote about the agentic world being in its “settling phase,” and convergence at the platform layer is exactly what you’d expect to see in that phase. The big ideas got worked out. Everyone implemented them. Everyone’s roadmaps started rhyming. We’re now in the part of the cycle where the question isn’t “which platform has the better primitives,” because they all have the same primitives. The question now is “what do you actually do with them?” This is why I think the most important shift in mindset right now is this: stop chasing the new nifty tool and start asking what your stack truly needs to perform well in production.
That’s a bigger shift than it sounds. AI is getting to a point where we can start thinking about it like it’s “boring”. We can start thinking about the most important piece of the puzzle, which is implementing AI in production and getting it to run alongside current workflows, tooling, and platforms.
Shifting Gears: Where The Edge Goes Now
Aside from the above, this is the claim I’ve been thinking about this week: the next wave of real production wins isn’t going to come from a better Model. It’s going to come from a better Agent harness.
I believe we’re closer to the LLM plateau than most people are admitting. The hosted frontier Models will keep getting incrementally better. They’ll get faster, cheaper, and modestly smarter. But the era of one Model release rewriting your roadmap? I think that era is closing. The reason is that the Models have already been trained on all of the “data” and information that exists in the world, and that’s A LOT… but think about it: how much new information shows up over the course of 3-6 months? A fair amount, but nowhere near as much as “the entire world’s data and information for thousands of years”.
What isn’t plateauing is the harness. The harness is wide open and that’s where I’d be investing your team’s attention right now.
What will keep driving big changes on the Model side is domain-specific Models and Small Language Models. Alongside Agent Harnesses, Models geared toward a specific purpose and specialty will be key moving forward.
Stack vs Harness
I’ve been throwing these two terms around interchangeably in some conversations and that’s a mistake. They’re related, but they’re not the same thing, and the distinction matters.
Your Agent Stack is your full, end-to-end production suite: the Models, the Gateway, the registry, the MCP layer, the identity provider, the policy engine, the observability, the deployment pipeline, and the Agent runtime. It’s the infrastructure your Agents run on. Luckily, the industry has largely converged on the patterns and components that define what good looks like.
Your Agent Harness is different. The harness is everything you put together to make a specific Agent actually perform as expected. The prompts, tool definitions, context retrieval, memory strategy, evaluation suite, guardrails, MCP Server tools, and Agent Skills are what make up the harness. The LLM is the brains of the operation, and the Agent Harness is the set of tools and instructions that shape how that brain acts.
The stack is what everyone is going to have. The harness is what differentiates the organizations that get real value out of Agents from the organizations that don’t.
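To make the distinction concrete, here is a minimal sketch in Python. Every name in it (AgentStack, AgentHarness, lookup_order, fake_agent, the example URL and Model name) is hypothetical and made up for illustration; it is not any vendor's API, and it covers only a few of the harness pieces listed above (prompt, tools, guardrails, evals). The point is just that the stack is one shared object, while each Agent gets its own harness.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class AgentStack:
    """Shared infrastructure (hypothetical fields): the same for every Agent."""
    model: str
    gateway_url: str
    identity_provider: str


@dataclass
class AgentHarness:
    """Per-Agent pieces (hypothetical fields): what differs from Agent to Agent."""
    name: str
    system_prompt: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    guardrails: list[Callable[[str], bool]] = field(default_factory=list)
    # Each eval case is (input prompt, substring expected in the output).
    eval_cases: list[tuple[str, str]] = field(default_factory=list)

    def allowed(self, text: str) -> bool:
        """True only if every guardrail check accepts the text."""
        return all(check(text) for check in self.guardrails)

    def run_evals(self, agent_fn: Callable[[str], str]) -> float:
        """Fraction of eval cases whose output contains the expected substring."""
        if not self.eval_cases:
            return 0.0
        passed = sum(expected in agent_fn(prompt)
                     for prompt, expected in self.eval_cases)
        return passed / len(self.eval_cases)


# One stack, shared by every Agent (placeholder values).
stack = AgentStack(
    model="some-hosted-model",
    gateway_url="https://gateway.example.internal",
    identity_provider="example-idp",
)


def lookup_order(order_id: str) -> str:
    """Toy tool: a real one would call an order system."""
    return f"order {order_id}: shipped"


# One harness per Agent: this is where the differentiation lives.
support = AgentHarness(
    name="support-agent",
    system_prompt="You answer order-status questions. Never reveal payment details.",
    tools={"lookup_order": lookup_order},
    guardrails=[lambda text: "card number" not in text.lower()],
    eval_cases=[("Where is order 42?", "shipped")],
)


def fake_agent(prompt: str) -> str:
    """Stand-in for a real Model call: pulls the digits out and uses the tool."""
    order_id = "".join(ch for ch in prompt if ch.isdigit())
    return support.tools["lookup_order"](order_id)


score = support.run_evals(fake_agent)
```

Swapping the Model name in the stack changes nothing in the harness, and a second Agent would reuse the same stack with a different prompt, tool set, guardrails, and eval cases.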
What This Looks Like In Practice
If I’m an architect or platform engineer right now, the question I’d be asking my team is: where are we spending our time?
If most of the engineering hours are going into “evaluating the new tool that came out this week,” that’s understandable, because we should all still be curious and interested in what’s going on. We do, however, need to spend bandwidth figuring out how it all works in production as well.
If, on the other hand, a lot of hours are going into “how do we structure the harness for our specific Agents, our specific workflows, our specific organizational patterns,” that’s where the leverage is.
The stack is becoming a commodity. The harness isn’t, and won’t be anytime soon. That’s a good place to spend your energy.
More next week.
Michael
