A Week In AI: Issue #7
Identity Is the Unit of Policy Now
Here’s a policy example I was working with this week:
jwt.act.sub == "research-agent" && mcp.tool.name in ["search", "fetch_doc"]
That expression says something the AI space took a few months to figure out: the unit of policy isn’t the network anymore, it isn’t the user, and it isn’t even the service. It’s the Agent identity.
If the actor in the token is the research-agent, it can call search and fetch_doc. Anything else? Denied. Different Agent? Different tool list. Same MCP Server, different scoped capabilities, all expressed against identity claims.
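To make that concrete, here’s a minimal sketch of the same decision in Python. It is not how any particular Gateway implements policy; the allowlist table, the function name, and the billing-agent entry are assumptions for illustration. Only research-agent, search, and fetch_doc come from the expression above.

```python
from typing import Dict, Set

# Hypothetical per-Agent allowlists. Only "research-agent", "search", and
# "fetch_doc" come from the policy expression; "billing-agent" is made up.
AGENT_TOOL_ALLOWLIST: Dict[str, Set[str]] = {
    "research-agent": {"search", "fetch_doc"},
    "billing-agent": {"get_invoice"},
}


def is_tool_call_allowed(claims: dict, tool_name: str) -> bool:
    """Allow the call only if the token's actor (act.sub) is scoped to the tool."""
    actor = claims.get("act")
    if not isinstance(actor, dict):
        return False  # no Agent identity in the token: deny by default
    allowed_tools = AGENT_TOOL_ALLOWLIST.get(actor.get("sub"), set())
    return tool_name in allowed_tools


research_claims = {"sub": "alice@example.com", "act": {"sub": "research-agent"}}
print(is_tool_call_allowed(research_claims, "search"))       # in the allowlist
print(is_tool_call_allowed(research_claims, "get_invoice"))  # scoped to another Agent
```

The point of the sketch is the lookup key: the decision is keyed on the Agent in the token, not on the user or the network path.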
This is a real shift, and most of the conversations I had this week were some version of it. They included things like:
MCP tool scoping: which tools are exposed, and which aren’t.
Tool selection that depends on which Agent is asking.
Agent isolation enforced at the policy layer, not at the network layer.
The patterns are all converging on identity as the primary axis.
The plumbing that makes this work is either OBO (on-behalf-of) or another method of obtaining the Agent’s identity (e.g., grabbing the Agent identity from kagent via agentgateway). This allows the Agent’s request to carry both the original user identity AND the Agent’s identity, separately, all the way through to the MCP Server. Without that, you can’t write the policy expression above, as there’s no jwt.act.sub to evaluate against. Identity-aware policy is what you build on top of it.
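Here’s a rough sketch of what “both identities, separately” looks like inside a token. The nested act claim follows the shape defined in the OAuth 2.0 Token Exchange spec (RFC 8693); the example values and the helper names are assumptions, and the decode step deliberately skips signature verification, so treat it as an illustration rather than something to run in a Gateway.

```python
import base64
import json
from typing import Optional, Tuple


def b64url_encode(obj: dict) -> str:
    """Helper for building the example token below (illustration only)."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_payload_unverified(token: str) -> dict:
    """Decode the JWT payload segment WITHOUT checking the signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore padding
    return json.loads(base64.urlsafe_b64decode(padded))


def split_identities(claims: dict) -> Tuple[str, Optional[str]]:
    """Return (user identity, Agent identity); the Agent is None without `act`."""
    user_id = claims["sub"]
    actor = claims.get("act")
    agent_id = actor.get("sub") if isinstance(actor, dict) else None
    return user_id, agent_id


# Hypothetical claims: the human is `sub`, the Agent acting for them is `act.sub`.
example_claims = {"sub": "alice@example.com", "act": {"sub": "research-agent"}}
example_token = ".".join(
    [b64url_encode({"alg": "none"}), b64url_encode(example_claims), ""]
)

user_id, agent_id = split_identities(decode_payload_unverified(example_token))
print(user_id, agent_id)
```

If the token only carries the user, agent_id comes back as None, which is exactly the case where an expression on jwt.act.sub has nothing to evaluate.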
A few issues back I wrote that security urgency was reshaping the order in which people build. This is the next layer of that story. Now that organizations have decided to put a Gateway in the path, the immediate next question is “what do we enforce there, and against what?” The answer they’re landing on is identity. Specifically, the Agent’s identity as a first-class detail, not just the human’s.
If you’re designing your Agent stack right now, give policy expressiveness more weight than you think you should. The question to ask isn’t “can I block tools?” because every Gateway can block tools. The question is “can I block tools based on which Agent is calling, what context it’s in, what claims its token carries, and what action it’s trying to take?” That’s where the real production requirements are going.
Hiding the Model
Intent-driven model routing, or “Auto-mode” (whatever your organization calls it), is the pattern where the user or the Agent doesn’t pick a Model; the Gateway does, based on the shape of the request. A simple lookup goes to a small, fast Model. A reasoning-heavy task goes to a frontier Model. A high-context summary goes to whatever’s best at long context that day. The user sees an Agent. They never see “this Model vs that Model.” This can also go under the “failover” pattern. For example, I hit a token limit, so I automatically get pushed to a lower-tier frontier Model (e.g., Sonnet) instead of being able to use Opus.
The common thread is the same thing the harness conversation pointed at in the last Newsletter. Model selection is moving from the application layer down into the infrastructure layer. The Agent doesn’t need to know and the user doesn’t need to know. The Gateway picks the right Model for the request based on policy, limits, and the task at hand.
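A minimal sketch of that routing decision, assuming the Gateway already knows three things about a request: its size, whether it needs heavy reasoning, and how much premium token budget the caller has left. The model names, the threshold, and the RoutingRequest fields are all hypothetical; real routers will use richer signals.

```python
from dataclasses import dataclass

LONG_CONTEXT_TOKENS = 100_000  # assumed cutoff for "high-context" requests


@dataclass
class RoutingRequest:
    prompt_tokens: int        # size of the incoming request
    needs_reasoning: bool     # assumed to be classified upstream
    premium_tokens_left: int  # remaining budget on the top-tier Model


def pick_model(req: RoutingRequest) -> str:
    """Pick a Model tier from the shape of the request, with budget failover."""
    if req.prompt_tokens >= LONG_CONTEXT_TOKENS:
        return "long-context-model"
    if req.needs_reasoning:
        if req.prompt_tokens > req.premium_tokens_left:
            return "mid-tier-model"  # failover: premium budget is exhausted
        return "frontier-model"
    return "small-fast-model"


print(pick_model(RoutingRequest(200, False, 50_000)))
print(pick_model(RoutingRequest(4_000, True, 1_000)))
```

The caller never passes a Model name in; the only inputs are the request shape and the limits, which is the “hiding” part.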
Bring Your Own Key (BYOK)
Copilot, whether it’s via GitHub, the Copilot SDK, or the CLI client (like claude or codex), has been popping up quite a bit within enterprise environments. This makes sense as the majority of large enterprises are running Microsoft-based software, and Copilot is finding its way in through various means (e.g., through the O365 suite). For the customers who are using Copilot, BYOK comes up often. The concept of BYOK is that you can use Copilot, but instead of using a Microsoft 365 license, you use your API key from Anthropic, OpenAI, Gemini, etc., which makes sense as these same organizations may also have enterprise licenses with said LLM providers.
BYOK allows you to use Copilot, but with your own API key from whichever provider you’d like.
Code Mode
The basic idea: instead of using an MCP Server and its tools, you interact with a standard API described in the OpenAPI spec format. The Model writes code and the code calls the tools. It’s a different mental model for tool use, and in a lot of cases, it’s dramatically more token-efficient than the standard MCP Server tool-calling pattern, where an Agent ingests all of the tool definitions up front and burns a massive number of tokens on tools it never calls.
This connects to the progressive disclosure work from Newsletter issue #2, as it’s the “same family” of optimizations but a different mechanism. Both are about not paying tokens for tool definitions you aren’t using, and both can produce double-digit-percentage cost reductions on real workloads. If you have a high-volume Agent workload and you haven’t measured what Code Mode would do to your token bill, you’re probably leaving money on the table.
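If you want a starting point for that measurement, here’s a back-of-the-envelope sketch. It compares the size of 50 made-up tool definitions against a single made-up “run code” entry point. Everything here is an assumption: the tool shapes, the run_code name, and especially the four-characters-per-token heuristic, which is crude. Swap in your real tool list and your provider’s tokenizer before trusting any number it prints.

```python
import json

CHARS_PER_TOKEN = 4  # crude heuristic, NOT a real tokenizer


def estimate_tokens(payload) -> int:
    """Rough token estimate from the JSON-serialized size of a payload."""
    return len(json.dumps(payload)) // CHARS_PER_TOKEN


def make_tool_definition(index: int) -> dict:
    """Build a hypothetical tool definition, used only to size the payload."""
    return {
        "name": f"tool_{index}",
        "description": "Hypothetical tool used only to size the payload.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }


# Standard pattern: every tool definition is sent to the Model up front.
all_tool_definitions = [make_tool_definition(i) for i in range(50)]

# Code Mode (as assumed here): one entry point that runs Model-written code.
code_mode_definitions = [
    {
        "name": "run_code",
        "description": "Hypothetical entry point: run code that calls the API.",
        "inputSchema": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    }
]

upfront_tokens = estimate_tokens(all_tool_definitions)
code_mode_tokens = estimate_tokens(code_mode_definitions)
saved_per_request = upfront_tokens - code_mode_tokens
print(upfront_tokens, code_mode_tokens, saved_per_request)
```

Multiply the per-request difference by your request volume to get a first-order sense of whether a proper measurement is worth the effort.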
Aside from the token savings, the reality is that sometimes you don’t want to hook up an MCP Server to an Agent to utilize tools. Sometimes, you just want your Agent to have the ability to call an API, but still be able to observe and secure that traffic like you can with MCP servers.
Related but distinct: GitHub Copilot now exposes itself as an MCP Server, which means other Agents can use Copilot as a tool. That’s an interesting inversion. Copilot started as a coding assistant for humans. It’s becoming a tool that other Agents call as part of larger workflows.
Quick Notes
A stateless MCP variant is moving through the spec process. Stateless transport is a real ergonomic improvement for a lot of production scenarios and is easier to scale, easier to load-balance, and easier to deploy behind standard infrastructure. Worth tracking once it lands. You can learn more here: https://blog.modelcontextprotocol.io/posts/2026-07-28-release-candidate/
Agent Substrate is an isolation/sandbox platform that sits on top of k8s. It runs Agent-like workloads called “actors”. A really neat implementation detail is that it uses its own control plane for users to interact with (via gRPC), not the k8s control plane. Instead, it incorporates Kubernetes for what it’s best at: orchestration and resources.
On observability: this is top of mind for every organization, and luckily, agentgateway exposes a ton of data points (metrics/traces/logs) that can be ingested into any monitoring and observability platform. Prometheus/Grafana, Datadog, Azure Monitor, AppDynamics, New Relic, it doesn’t matter. The key is using OTel exporters/ingestors. Observability isn’t the bottleneck for Agent production deployments.
More next week.
Michael

