If you’ve watched the last twelve months of agent demos, you’ve seen the same shape over and over: the model drafts something impressive, hands it back to the user, and stops. The user copies, pastes, switches windows, logs into the third-party tool, pastes again, clicks publish. The agent did the visible work. The user did the boring, repeatable, stateful work.
That gap has a name now. It’s the execution trust layer.
The Model Context Protocol (Anthropic, November 2024) gave us the syntax of agent-tool communication — tools/list, tools/call, JSON-RPC over stdio or HTTP. That was the necessary first move. What it deliberately didn’t do — and shouldn’t have, because protocols are not products — is define the semantics of safe agent action. Who can approve a publish? How does idempotency work under retry? Where does the audit trail live? When does measurement fire and what does it write back? MCP left those questions open. They are also the questions every business cares about the moment an agent touches the real world.
The execution trust layer is the answer to those questions. It’s the semantic layer on top of MCP’s syntactic layer. It doesn’t replace MCP. It runs on top of it and makes the agent’s tool calls safe to execute.
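To make the syntax-versus-semantics split concrete, here is a minimal sketch of a tools/call request as it travels over the wire. The JSON-RPC 2.0 envelope and the name/arguments params follow MCP; the tool name publish_post and its arguments are hypothetical, invented for illustration.

```python
import json


def build_tools_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
wire = build_tools_call(1, "publish_post", {"channel": "linkedin", "body": "We shipped."})
decoded = json.loads(wire)

# The envelope says which tool to call and with what arguments.
# Nothing in it says who approved the call or whether it is a retry.
print(decoded["method"])
print(sorted(decoded["params"]))
```

The message is complete as far as the protocol is concerned. Approval, retry safety, and audit all have to be supplied by whatever sits behind the tool.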
The vocabulary that’s been missing
A stateless LLM can do a surprising amount of business work. It can write a launch post that fits LinkedIn’s 200-character hook, draft a 4-email sequence with the right send-day cadence, score a piece of copy against brand voice, generate three Product Hunt variants. The writing is solved.
What’s not solved is the stateful work the model is structurally incapable of doing inside a single call:
- Holding OAuth tokens across sessions. When the agent picks LinkedIn as a channel, somebody needs to remember the refresh token. The model can’t.
- Enforcing “no publish without explicit human consent” server-side. Approval can’t live in the prompt: the prompt is replayable. It has to live in a database column that flips from awaiting_approval to approved via a signed transition.
- Polling measurement APIs 24 hours after publish. The model isn’t running 24 hours later. Something stateful has to wake up, hit GA4, pull engagement, and write the result somewhere the next launch can read.
- Persisting tenant-scoped memory across runs. Brand voice samples, repo facts, channel performance, claim-risk flags — they outlive any conversation. They belong to the workspace, not the context window.
- Guaranteeing idempotency under retry. Same idempotency key, same result. A flaky network on retry shouldn’t double-publish. The retry-safe behavior lives in the server’s state, not the agent’s memory.
- Joining every action back to its run lineage. Six weeks later, when somebody asks “who approved that post and on what basis?”, the audit trail has to answer. The model has no idea what it said last Tuesday.
We call this set the agentDependency. It’s a load-bearing field in every launch response from ChiefLab’s MCP, and we put it on the homepage on purpose. It’s not marketing — it’s the inventory of work the model cannot do alone.
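The idempotency item is the easiest of the six to show in code. What follows is a sketch under stated assumptions, not ChiefLab's implementation: the IdempotentExecutor class, the key format, and the publish stand-in are all hypothetical, and an in-memory dict stands in for what would be durable server-side storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class IdempotentExecutor:
    """Caches the result of each side effect under its idempotency key."""

    _results: Dict[str, str] = field(default_factory=dict)

    def execute(self, idempotency_key: str, side_effect: Callable[[], str]) -> str:
        # A key we have seen before returns the stored result without re-running.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = side_effect()
        self._results[idempotency_key] = result
        return result


published = []


def publish() -> str:
    # Stand-in for a real connector call (hypothetical).
    published.append("launch post")
    return f"post-{len(published)}"


executor = IdempotentExecutor()
first = executor.execute("run-42:linkedin", publish)
retry = executor.execute("run-42:linkedin", publish)  # e.g. after a network timeout

print(first, retry, len(published))
```

The retry returns the first result and the connector runs once. A production version would also need to persist the keys and handle two requests racing on the same key, which this sketch does not attempt.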
Why this matters now
Every MCP server eventually has to answer the same question from a sceptical buyer: “Why is this better than just calling the model directly?” The polite answer is “we have a nice tool wrapper.” The honest answer is the six items above.
The execution trust layer isn’t a value-add. It’s the part of the system that lets the model touch the real world safely. Without it, you have a chatbot. With it, you have agent infrastructure — the same shape as Stripe (payments infrastructure), Vercel (deployment infrastructure), Cloudflare (network infrastructure). Each of those is the stateful, server-side piece that a stateless caller delegates to.
The Stripe analogy is precise. Your LLM is the developer writing the code. The execution trust layer is paymentIntents.create: the server-side primitive that holds the card token, enforces idempotency, persists the receipt, and reports the chargeback two weeks later. Nobody asks Stripe “why can’t I just call the Visa API directly?” because the answer is obvious: you can, and then you have to build a state machine, an audit log, an idempotency cache, a webhook reconciler, and SOC 2 compliance. Or you can call Stripe.
What an operator is
In our language, an operator is a server-side tool that performs one category of business work and conforms to the execution contract. The contract is a six-stage lifecycle: prepare → review → approve → execute → measure → remember. Every operator implements all six.
The lifecycle is the canonical answer to the agent-versus-infrastructure question:
- Prepare: the operator gathers context (repo evidence, tenant brand, connector readiness) and returns drafts + staged actions. No external side effects.
- Review: the calling agent surfaces drafts inline in chat (renderInChat). The human reads the actual content, not metadata.
- Approve: the human flips an action’s status. Two paths: in-chat (chieflab_approve_action) or web (signed reviewUrl).
- Execute: only approved actions fire. Connectors run server-side with idempotency. The model is not in the loop here.
- Measure: 24 hours later, the operator pulls metrics from the relevant connectors. Tied to the originating runId.
- Remember: measurement + human signals (rejections, edits) flow back into the per-tenant brain. The next launch grounds against the result.
A “tool with MCP” implements steps 1 and maybe 4. The execution trust layer implements all six. That’s the load-bearing difference.
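The approve and execute stages can be sketched as a small state machine. This is an illustration under assumptions rather than the spec's definition: only the awaiting_approval and approved statuses appear in the text above, so the rejected and executed statuses, the StagedAction fields, and the transition table are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    REJECTED = "rejected"   # assumed status, not named in the text
    EXECUTED = "executed"   # assumed status, not named in the text


# Allowed transitions; anything else is refused.
ALLOWED = {
    Status.AWAITING_APPROVAL: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.EXECUTED},
    Status.REJECTED: set(),
    Status.EXECUTED: set(),
}


@dataclass
class StagedAction:
    run_id: str
    channel: str
    draft: str
    status: Status = Status.AWAITING_APPROVAL

    def transition(self, new_status: Status) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        self.status = new_status


def execute(action: StagedAction) -> str:
    # transition() raises unless the action is currently approved.
    action.transition(Status.EXECUTED)
    return f"published to {action.channel} for {action.run_id}"


action = StagedAction(run_id="run-42", channel="linkedin", draft="We shipped.")
try:
    execute(action)  # not approved yet, so this is refused
    blocked = False
except ValueError:
    blocked = True

action.transition(Status.APPROVED)  # the human approves
receipt = execute(action)
print(blocked, receipt)
```

The point of the design is that the check lives in the server's transition table, so no wording in a prompt can move an action from staged to executed.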
The vocabulary, in one paragraph
An operator follows the execution contract by staging actions as awaiting_approval, surfacing them via renderInChat to the calling agent (with reviewUrl as a fallback for phone or multi-person approval), waiting for the approval state machine to flip status, executing through connectors server-side under idempotency keys, and persisting the result + human signals to the per-tenant brain so the next run grounds against compounding state. The agentDependency list — OAuth tokens, approval state, signed reviewUrls, idempotency, run lineage, cross-launch memory — is what a stateless LLM cannot replicate inside a context window.
If a tool you’re evaluating doesn’t fit that paragraph, it’s a chatbot. If it does, it’s a piece of agent infrastructure.
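One way a signed reviewUrl could work is an HMAC over the action ID and an expiry time. This is a sketch of that general technique, not a description of how ChiefLab signs its URLs: the query parameter names, the secret, the example.invalid host, and all three functions are assumptions made for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical; a real server would load this from a secret store


def sign(action_id: str, expires_at: int) -> str:
    payload = f"{action_id}:{expires_at}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def build_review_url(base: str, action_id: str, expires_at: int) -> str:
    query = urlencode(
        {"action": action_id, "exp": expires_at, "sig": sign(action_id, expires_at)}
    )
    return f"{base}?{query}"


def verify_review_url(url: str, now: int) -> bool:
    params = parse_qs(urlparse(url).query)
    try:
        action_id = params["action"][0]
        expires_at = int(params["exp"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(signature, sign(action_id, expires_at))


url = build_review_url("https://example.invalid/review", "act-7", expires_at=1_700_000_000)
tampered = url.replace("act-7", "act-8")

print(verify_review_url(url, now=1_699_999_000))
print(verify_review_url(tampered, now=1_699_999_000))
```

A link that has been edited or has expired fails verification, which is what lets the approval travel to a phone or a second reviewer without trusting the client.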
Read the spec, build an operator
ChiefLab is the reference implementation of the contract. We ship six operators today — chieflab-launch, chieflab-post, chieflab-email, chieflab-measure, chieflab-brain, chieflab-connect — and the spec at chieflab.io/spec/v0.1 is MIT-licensed. If you’re building an operator (in any category — not just GTM), follow the same lifecycle, list yourself in .well-known/mcp.json, and the vocabulary travels.
The execution trust layer isn’t a product. It’s a category. We just happen to think the GTM operator is the right place to start.