Deep Engineering #48: Erik Wilde on Agent-Ready APIs, Widespread MCP Adoption, and the OpenAPI Standards That Matter
On the abstraction level problem, the limits of linting, and why investing in your API foundation matters more than chasing the current delivery protocol
Building Reliable AI Agents with Java and LangChain4J
A hands-on workshop covering how to build production-grade AI agents using Java and LangChain4J.
🗓️ Friday, June 13 · 10:00 AM – 1:30 PM ET · Online
2 for 1 deal is live. Use code DEEPENG50 for 50% off.
✍️ From the editor’s desk,
Welcome to the 48th issue of Deep Engineering!
Google announced Managed Agents in the Gemini API two days ago at Google I/O, making it possible to spin up an agent that can reason, use tools, execute code, and browse the web with a single API call. The infrastructure work that previously required teams to build and manage sandboxes, scaffolding, and execution environments is being abstracted away. The capability is in public preview and Google is clear that outputs should be reviewed before use in sensitive workflows, but the direction is unmistakable: deploying agents is getting significantly easier.
What is not getting easier at the same pace is making the APIs those agents will call worth calling. APIs designed for human developers, who can tolerate ambiguous descriptions, infer intent from sparse documentation, and navigate hundreds of operations to find the right one, do not work the same way for agents. Agents are less reliable at resolving ambiguous API semantics, choosing among many overlapping operations, and safely composing actions without machine-readable contracts and guardrails.
Erik Wilde, Head of Enterprise Strategy at Jentic and OpenAPI Ambassador at the OpenAPI Initiative, has spent considerable time working on that gap. We spoke with Wilde about what agent-ready actually means in practice, why MCP will not fix a poorly designed API foundation, and what platform engineers should start planning for today.
The expert insights in today's issue are based on our recent live interview with Wilde, and you can read or watch the full Q&A here.
Let’s get started.
Featured Newsletter: Machine Learning at Scale.
Are you a SWE looking to upskill into ML systems? Get high quality ML system design content delivered to your inbox. Learn how to design and scale Machine Learning Systems.
Subscribe to Machine Learning at Scale
🧠 Expert Insights
Your APIs Are Not Ready for Agents, and MCP Will Not Fix That
by Saqib Jan with Erik Wilde
The conversation about AI agents in enterprise software increasingly centers on how agents will consume APIs and what it actually takes to make that consumption work reliably. Most organisations have taken the shortcut. They have built an MCP server, pointed it at their existing API landscape, and told themselves the agent problem is solved. Erik Wilde, Head of Enterprise Strategy at Jentic and OpenAPI Ambassador at the OpenAPI Initiative, thinks that is the wrong bet, and his reasoning is specific enough to be useful.
“Whatever you invest in better APIs becomes useful for everybody,” Wilde affirms. “If you invest specifically in MCP, that investment is effectively scoped to LLM consumers.” The point is not that MCP is useless. It is that MCP is a delivery mechanism, and delivery mechanisms change. The API foundation underneath it does not change nearly as quickly, and if that foundation is poorly designed for the agents that will eventually consume it, no amount of tooling stacked on top of it will compensate. The organisations that will be in the strongest position in two years are the ones investing in the foundation now, not the ones chasing the current delivery protocol.
The abstraction level problem
The clearest way to understand what makes an API agent-ready is to look at a concrete example, and Wilde offered one in our interview that makes the problem immediately clear. The GitHub REST API currently has around 1,100 operations. That is not unreasonable for a product as complex as GitHub. A developer can navigate 1,100 operations because they bring context, experience, and the ability to read documentation and infer intent. They know roughly what they are looking for and they can work toward it even when the path is not obvious.
An agent does not work that way. “For an agent to work directly with that GitHub API is pretty complex,” Wilde points out, “because a lot of those operations need to be combined in a certain way to result in the workflows that you really want to accomplish on GitHub.” The agent has to figure out not just what each individual operation does but how they compose, in what order, under what conditions, and with what dependencies. With 1,100 operations, the combinatorial space of possible workflows is enormous, and agents navigating it without guidance will produce unreliable results.
Now look at the GitHub MCP server, which has around 70 tools. Each of those tools represents a higher-level workflow, something a developer might actually want to accomplish on GitHub rather than a low-level operation that contributes to that accomplishment. The reduction from 1,100 to 70 is not a loss of capability. It is a gain in usability for the specific class of consumer that is trying to get things done rather than explore a surface. “What I would say,” Wilde argues on this point, “is that if you had a genuinely agent-friendly GitHub API, it might also just have around 70 operations.” The MCP server is not adding something new. It is providing the abstraction level that the underlying API should have provided in the first place.
This is the abstraction level problem, and it is the most important design question for engineering teams building API infrastructure that agents will consume. The APIs that were designed for developer flexibility, with many fine-grained operations that compose in powerful ways, are exactly the wrong shape for agents that need to accomplish specific goals reliably. The discipline of designing for agents is the discipline of asking what a consumer actually wants to accomplish and surfacing that at the API level, rather than exposing every atomic capability and leaving the composition to the consumer.
What agent-ready actually means
The properties that follow from the abstraction level insight are consistent and actionable. An API designed for agent consumption should not be too fine-grained, and its descriptions should be intent-based and written at a level that is meaningful for a language model rather than just technically accurate for a developer who already knows the domain. It should have examples, ideally multiple examples per operation rather than one, because examples are one of the most reliable ways for a model to understand what an operation actually does in practice. Its error messages should be meaningful enough that an agent encountering a failure has enough information to understand what happened and what it might do next.
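As a rough illustration of those properties, the sketch below models a single OpenAPI operation as a Python dictionary and reports which of them are missing. The `createRefund` operation, its wording, and the `readiness_gaps` helper are hypothetical, invented for this example; this is not Wilde's scoring mechanism, only a minimal check under those assumptions.

```python
# Hypothetical OpenAPI operation, written as a plain dict for illustration.
operation = {
    "operationId": "createRefund",
    # Intent-based: says when a consumer would reach for this operation.
    "description": (
        "Give a customer their money back for a paid order, "
        "either in full or for selected items."
    ),
    "requestBody": {
        "content": {
            "application/json": {
                # Several named examples rather than a single one.
                "examples": {
                    "fullRefund": {"value": {"orderId": "o-1", "mode": "full"}},
                    "partialRefund": {
                        "value": {"orderId": "o-1", "mode": "partial", "items": ["i-7"]}
                    },
                }
            }
        }
    },
    "responses": {
        "201": {"description": "Refund created."},
        # An error that tells the consumer what happened and what to do next.
        "409": {
            "description": (
                "The order was already refunded in full. "
                "Do not retry; fetch the existing refund instead."
            )
        },
    },
}


def readiness_gaps(op):
    """Return the agent-readiness properties this operation is missing."""
    gaps = []
    if not op.get("description", "").strip():
        gaps.append("missing description")

    # Count request examples across all media types.
    content = op.get("requestBody", {}).get("content", {})
    example_count = sum(len(media.get("examples", {})) for media in content.values())
    if example_count < 2:
        gaps.append("fewer than two request examples")

    # Look for at least one 4xx/5xx response that carries a description.
    responses = op.get("responses", {})
    error_codes = [code for code in responses if code.startswith(("4", "5"))]
    if not any(responses[code].get("description", "").strip() for code in error_codes):
        gaps.append("no described error response")
    return gaps


print(readiness_gaps(operation))
print(readiness_gaps({"operationId": "bareOperation"}))
```

A check like this only confirms that the pieces are present. Whether the description really conveys intent is a separate question.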
“If an AI agent looks at a poorly described API and cannot figure out how it works, it will just move on to the next one,” Wilde notes. “It has less context. It has less experience. It does not really know as well as an actual developer what to do.” This is the practical consequence of the abstraction level problem at the description level. A developer reading a sparse API description can fill in the gaps from domain knowledge and engineering experience. An agent cannot do that reliably, and the result is not a helpful error or a clarifying question. It is a silent failure or a wrong action.
Wilde and his team have built a scoring mechanism for API readiness that makes these dimensions concrete. The scoring combines standard linting, which runs tools like Spectral and Redocly to check structural conditions, with LLM-based checks that evaluate whether descriptions are written in a way that is genuinely useful for an agent rather than just present. The distinction matters because a description that exists and passes a structural check may still be useless for an agent if it describes what an operation does technically without explaining what a consumer would use it to accomplish. “These descriptions need to represent intent,” Wilde highlights. “What is the intent of somebody who would use this operation?”
Linting is necessary but not sufficient
Linting has become standard practice in well-run API programs, and Wilde endorses it as a first line of defense. The popular tools are capable and in some cases open source, and the practice of defining shared rule sets that teams can discuss, extend, and maintain in version control is genuinely useful. But in our conversation he was clear that linting alone does not get you to agent-ready, and teams that treat it as the complete solution are leaving the most important problems unaddressed.
The structural checks that linting tools perform are exactly that: structural. They can tell you whether a description field exists and whether it meets a minimum length requirement, but they cannot tell you whether the description is written in a way that helps an agent understand what the operation is for. They can flag a missing example but cannot evaluate whether the examples present give a model enough signal to use the operation correctly in a novel context. The gap between what linting checks and what agent readiness requires is the gap between structure and meaning, and closing it requires evaluation mechanisms that go beyond pattern matching on OpenAPI descriptions.
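To make that limit concrete, here is a minimal sketch of a structural description check. The `lint_description` function and the 40-character threshold are assumptions made up for this example, not a rule from Spectral or Redocly. Both descriptions pass, although only the second tells a consumer why they would call the operation.

```python
def lint_description(operation, min_length=40):
    """Structural check only: is there a description, and is it long enough?"""
    description = operation.get("description", "")
    if not description:
        return ["description is missing"]
    if len(description) < min_length:
        return [f"description shorter than {min_length} characters"]
    return []


# Technically accurate, but says nothing about why a consumer would use it.
technical = {
    "description": "Performs a POST to the refunds collection and returns the created resource."
}

# Intent-based: states the goal the operation serves.
intent_based = {
    "description": "Give a customer their money back for a paid order, in full or in part."
}

for name, op in [("technical", technical), ("intent_based", intent_based)]:
    # Both produce an empty problem list: the check cannot tell them apart.
    print(name, lint_description(op))
```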
Wilde also makes an important point about rule set governance that is worth taking seriously. “I am not a big fan of just reusing existing rule sets,” he contends. “I would always say start owning this, build up your own in a collaborative fashion.” The Zalando and Adidas rule sets that circulate in the API community are useful references, but they were built for specific contexts and specific quality standards. Adopting them wholesale means inheriting decisions that were made for a different organisation’s constraints. The value of a rule set comes not just from the rules it contains but from the process by which those rules were agreed upon, which is a process that builds shared understanding of what good API design actually means in a particular context.
MCP is a delivery mechanism, not a foundation
MCP has been growing fast. It is now under the Linux Foundation, major model providers support it, and a growing number of enterprise vendors are shipping MCP servers as a standard part of their product offering. For engineers deciding where to invest, it looks like an obvious answer to the question of how to make APIs accessible to agents.
Wilde’s skepticism is not about MCP’s current momentum. It is about what MCP is and what it is not. “MCP is the current delivery mechanism,” he says. “You need a delivery mechanism, but I would not build too many things that are MCP-specific.” At Jentic, the team supports MCP because it is what the market expects right now, but they have deliberately avoided deep investment in MCP-specific infrastructure. If MCP were replaced by something else, the transition would be straightforward because the underlying work, making APIs well-described, well-structured, and semantically rich, would carry over entirely. That work is not MCP-dependent. It is foundational.
The risk for teams that invert this priority is real. Building an MCP server on top of a poorly designed API landscape means the MCP server inherits all of those same problems. Operations that are too fine-grained stay too fine-grained, descriptions that lack intent stay unreadable to a model, and error messages that tell a human nothing tell an agent even less. The wrapper changes the protocol by which those problems reach the agent, not the problems themselves.
Open standards outlast any delivery protocol
One of the clearest threads in Wilde’s thinking is the value of building on open standards rather than specific tools or protocols. This is not an abstract preference for openness. It is a practical argument about optionality. Teams that build their API practices on OpenAPI, Arazzo, and Overlays are building on specifications that are independent of any vendor, any model provider, and any current delivery protocol including MCP. When the next delivery mechanism arrives, or when the current tooling landscape shifts, the foundation remains.
Arazzo is worth understanding in this context. It is a workflow language published by the OpenAPI Initiative that allows you to describe sequences of API interactions in a standardised format. If accomplishing a particular goal requires calling five endpoints in a specific order with specific dependencies, Arazzo is the language for expressing that. For agents, which struggle with exactly this kind of multi-step composition, a well-constructed Arazzo workflow is one of the most useful things an API producer can provide. “Figuring out multi-step workflows is one of the hardest things for agents to do right now,” Wilde says, “and Arazzo is genuinely good at describing those. We just need to make it discoverable.”
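For a sense of what such a workflow looks like, the sketch below mirrors the shape of an Arazzo document as a Python dictionary (in practice it would be a YAML or JSON file). The refund workflow, operation IDs, and URL are hypothetical. The `dangling_references` helper is likewise an assumption for illustration: it checks that each step only uses outputs produced by an earlier step, assuming steps run in the order listed with no `onSuccess` or `onFailure` jumps.

```python
import re

# Matches runtime expressions such as $steps.findOrder.outputs.orderId
STEP_REF = re.compile(r"\$steps\.([\w-]+)\.outputs\.([\w-]+)")

# Hypothetical two-step workflow: look up an order, then refund it.
arazzo_doc = {
    "arazzo": "1.0.0",
    "info": {"title": "Refund workflows", "version": "1.0.0"},
    "sourceDescriptions": [
        {"name": "shopApi", "url": "https://example.com/openapi.yaml", "type": "openapi"}
    ],
    "workflows": [
        {
            "workflowId": "refundOrder",
            "summary": "Refund a customer for an order they name by number.",
            "steps": [
                {
                    "stepId": "findOrder",
                    "operationId": "searchOrders",
                    "parameters": [
                        {"name": "number", "in": "query", "value": "$inputs.orderNumber"}
                    ],
                    "successCriteria": [{"condition": "$statusCode == 200"}],
                    "outputs": {"orderId": "$response.body#/0/id"},
                },
                {
                    "stepId": "issueRefund",
                    "operationId": "createRefund",
                    "parameters": [
                        {
                            "name": "orderId",
                            "in": "path",
                            "value": "$steps.findOrder.outputs.orderId",
                        }
                    ],
                    "successCriteria": [{"condition": "$statusCode == 201"}],
                },
            ],
        }
    ],
}


def dangling_references(workflow):
    """List step-output references that no earlier step has produced."""
    produced = {}  # stepId -> output names, for steps already visited
    problems = []
    for step in workflow["steps"]:
        for param in step.get("parameters", []):
            value = param.get("value")
            if not isinstance(value, str):
                continue
            for step_id, output in STEP_REF.findall(value):
                if output not in produced.get(step_id, set()):
                    problems.append(
                        f"{step['stepId']} uses {step_id}.{output} before it is produced"
                    )
        produced[step["stepId"]] = set(step.get("outputs", {}))
    return problems


print(dangling_references(arazzo_doc["workflows"][0]))
```

The point of the format is that ordering and data dependencies are written down once by the API producer, so an agent does not have to rediscover them from individual operation descriptions.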
Overlay, the third specification from the OpenAPI Initiative, provides a way to express changes to an OpenAPI description in a standardised diff format. “We use overlays,” Wilde shares, “to deliver improvement suggestions alongside API scores. When the scoring mechanism identifies that an API is not well-designed for AI consumption, it also produces an Overlay that shows exactly what would need to change to improve it.” That makes the gap between current state and agent-ready state concrete and actionable rather than a list of abstract recommendations.
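The sketch below shows roughly what applying such a suggestion could look like. The overlay content and the `apply_overlay` helper are hypothetical, and the helper is deliberately simplified: it handles only `update` actions and only plain targets such as `$.paths['/refunds'].post`, whereas real Overlay targets are full JSONPath expressions and actions can also remove nodes.

```python
import copy
import re

# One path segment: either .name or ['name']
SEGMENT = re.compile(r"\.([A-Za-z_][\w-]*)|\['([^']+)'\]")


def resolve(document, target):
    """Follow a simple path like $.paths['/refunds'].post (no wildcards or filters)."""
    rest = target[1:]
    matches = list(SEGMENT.finditer(rest))
    if not target.startswith("$") or "".join(m.group(0) for m in matches) != rest:
        raise ValueError(f"unsupported target: {target}")
    node = document
    for match in matches:
        node = node[match.group(1) or match.group(2)]
    return node


def merge(node, update):
    """Merge update into node, descending into nested objects."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(node.get(key), dict):
            merge(node[key], value)
        else:
            node[key] = copy.deepcopy(value)


def apply_overlay(document, overlay):
    """Return a copy of the document with every update action applied."""
    result = copy.deepcopy(document)
    for action in overlay["actions"]:
        merge(resolve(result, action["target"]), action["update"])
    return result


# Hypothetical API description with a sparse operation description.
api = {
    "openapi": "3.1.0",
    "paths": {
        "/refunds": {
            "post": {"operationId": "createRefund", "description": "Creates a refund."}
        }
    },
}

# Hypothetical improvement suggestion expressed as an Overlay-style document.
overlay = {
    "overlay": "1.0.0",
    "info": {"title": "Agent-readiness suggestions", "version": "1.0.0"},
    "actions": [
        {
            "target": "$.paths['/refunds'].post",
            "update": {
                "description": "Give a customer their money back for a paid order."
            },
        }
    ],
}

improved = apply_overlay(api, overlay)
print(improved["paths"]["/refunds"]["post"]["description"])
```

Because the suggestion lives in a separate document, the original description stays untouched until the team decides to accept the change.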
The APIs you design today will still be running in two years
The practical implication of everything Wilde argues is a specific recommendation about timing. API landscapes change slowly. Whatever is designed or changed today will likely remain largely unchanged for one to three years. Agents are arriving in enterprise contexts incrementally but consistently. The customer support and HR agents that are already deployed broadly are the early wave, and the business agents with genuine decision-making authority are behind them.
“API landscapes evolve slowly,” Wilde says. “Whatever you design or change today, you will probably have around for a year or two or three before you touch it again.” The teams that start building API readiness for agents now are the teams whose infrastructure will be in the right shape when agents with more capability and more authority arrive. The teams that wait for agents to become mainstream before improving their APIs will find themselves doing expensive remediation work on a landscape that is already in production and already depended upon.
The recommendation is not to stop shipping features or to redesign everything at once. It is to make agent readiness a standard consideration in the decisions that are already being made. When writing a new operation, write the description for an agent as well as for a developer. When adding examples, add enough that a model can generalise. When defining error responses, add enough context that a consumer without domain knowledge can understand what happened. These are not large investments per decision and they compound over time into an API landscape that agents can actually use.
Wilde closes with a message for platform teams. “All the platform people out there who are building API platforms or doing platform engineering,” he says, “think about how all of this will change if you have more and more agentic actors and consumers in your organisation, and start planning for that today, even if you can say that right now you do not have it this much and it is going to be another year or two. It is going to arrive.”
In case you missed it
Here’s the full Q&A with the interview video featuring Erik Wilde.
🛠️ Tool of the Week
Spectral — open-source JSON and YAML linter with built-in support for OpenAPI, Arazzo, and AsyncAPI
Validates OpenAPI v3.1, v3.0, v2.0, Arazzo v1.0, and AsyncAPI v2.x out of the box.
Supports fully custom rule sets, letting teams build and own their own governance standards.
Integrates with VS Code, JetBrains, GitHub Actions, and Azure API Center for shift-left linting.
📎 Tech Briefs
Google Managed Agents now in Gemini API - A single API call now provisions an ephemeral Linux sandbox with code execution, web browsing, and tool use built in.
Kyverno 1.18 released post-CNCF graduation - First post-graduation release patches two SSRF CVEs and adds cleanup policy support to the Kubernetes policy engine.
OpenAI and Dell bring Codex to on-premises enterprise - The partnership makes the Codex coding agent available in hybrid and air-gapped enterprise environments for the first time.
A2A protocol underpins Google’s full agent stack - Agents built at any abstraction level can be called as sub-agents across the entire Google Cloud agent platform.
43% of AI-generated code fails in production - Survey of 200 SRE leaders finds teams average three production redeploy cycles to verify a single AI-suggested fix.
That’s all for today. Thank you so very much for reading this issue of Deep Engineering.
We’ll be back next week with more expert-led content.
Stay awesome,
Saqib Jan
Editor-in-Chief, Deep Engineering
If your company is interested in reaching an audience of senior developers, software engineers, and technical decision-makers, you may want to advertise with us.






