Building Agent-Ready APIs in Production with Erik Wilde
On OpenAPI 3.2, agent-ready APIs, and why MCP might not be the answer you think it is
Erik Wilde has spent more than 12 years working on APIs in every form, from communication protocols to enterprise API platforms, governance frameworks, and now the question of what it takes for APIs to actually work for AI agents. He holds degrees in computer science from TU Berlin and a PhD from ETH Zurich, has contributed to multiple open standards, and is an OpenAPI Ambassador at the OpenAPI Initiative. He currently works at Jentic, where he focuses on making API landscapes usable for the next generation of agentic consumers.
Erik joined a Deep Engineering Live interview session to talk about OpenAPI 3.2, what agent-ready APIs actually look like, and why he is more skeptical about MCP than most people expect.
A note on format: this session was recorded live as part of the Deep Engineering Live Interview Series. The transcript below has been lightly edited for clarity and readability. Audience members joined the conversation and asked questions directly during the session.
Q. Tell us about your background and how you ended up working on APIs.
I have been working on APIs, in some shape or form, all of my life. I started with communication systems and protocols and then moved into the API space proper about 12 years ago. I have mostly worked for companies that sell enterprise software in that space, typically API gateways and API platforms: the kind of tooling large companies need when they have many digital capabilities, many of them exposed through APIs. More and more, companies have realized that the better you maintain, manage, extend, and govern that API estate, the easier it becomes to develop new applications and to unlock potential that already exists within the company but takes a little digging to get to.
Then about a year ago I met the two founders of Jentic, and they described to me what they were building. Very briefly, what they want to do is build a platform where agents can use APIs, because oftentimes the APIs that exist might not be the ideal ones for agents, and you also might want to control those agents a little more because you might not be confident they always do the right thing. We all know that AI has a tendency to sometimes have surprising ideas. I really liked that idea, so I decided to join. I have been at Jentic now for just over half a year and it has been a great experience. I still talk about APIs because in the end, without APIs, there is simply no AI.
Q. OpenAPI 3.2 shipped last September. What changes have the highest operational impact for engineering teams, and which are mostly nice to have?
3.2 is a maintenance release. It is backwards compatible and does not change things dramatically. What it is not, and I want to start there, is AI-focused. That is what we are planning for the next version, 3.3, where we really want to think more aggressively about what it would take to make OpenAPI specifically more AI-friendly.
That said, even in 3.2, some of the improvements are more meaningful than they might first appear. The tag system has been extended so that tags, which you use to group and annotate operations, are now a hierarchical space rather than a flat one. You can have tags and subtags and so forth. That is something people had always wanted to do. The reason it matters for AI is that anything that makes an API description semantically richer, anything that allows descriptions to carry more meaning, is valuable for agents. So thinking about how you describe your APIs not just as technical endpoints but as semantic services, with rich schemas, descriptions at every level, and well-defined error messages, is where I think the real operational value lies right now.
At Jentic we have released a scoring mechanism for APIs so you can find out whether your API is AI-friendly or not. A lot of what that scoring looks at is the kinds of things that have always been good API design practice: put in more descriptions, include examples, make your error messages clear and actionable. The difference now is that where a human developer might look at a poorly described API and figure it out from experience and context, an agent that cannot figure out how an API works will simply move on to the next one. It has less context and less tolerance for ambiguity. And because the APIs you design now will probably be around for a couple of years, it is worth starting to think about this new class of consumers today.
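A rough sketch of the kind of check such a scoring tool might run, written in Python against an OpenAPI description already loaded into a dict. The two heuristics (at least one response example per operation, at least one documented 4xx or 5xx response) are illustrative assumptions, not Jentic's actual scoring rules.

```python
from typing import Any

# Path Item keys that hold operations (OpenAPI 3.2 adds "query").
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace", "query"}


def readiness_findings(spec: dict[str, Any]) -> list[str]:
    """Flag operations lacking examples or error responses (hypothetical heuristics)."""
    findings = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            where = f"{method.upper()} {path}"
            responses = op.get("responses", {})
            has_example = any(
                "example" in media or "examples" in media
                for resp in responses.values()
                for media in resp.get("content", {}).values()
            )
            if not has_example:
                findings.append(f"{where}: no response examples")
            if not any(str(code).startswith(("4", "5")) for code in responses):
                findings.append(f"{where}: no documented error responses")
    return findings


spec = {
    "paths": {
        "/orders/{id}": {
            "get": {
                "responses": {
                    "200": {"content": {"application/json": {"example": {"id": "o-1"}}}},
                    "404": {"description": "No order exists with this id."},
                },
            },
            "delete": {"responses": {"204": {"description": "Order deleted."}}},
        }
    }
}
print(readiness_findings(spec))
```

Here the GET operation passes both checks, while the DELETE operation is flagged twice. A human developer would likely shrug at those gaps; an agent has nothing else to go on.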
Q. Streaming is also now explicitly supported in 3.2. When teams document streaming, what details separate readable from implementable and testable?
Streaming is something people have always done. I think it has just become so much more visible because that is how all the AI APIs work. When you use a chatbot and you watch the response appear word by word, that is streaming in action. And what 3.2 does is give you a slightly more explicit way to document that in OpenAPI. That is actually a very common pattern with OpenAPI improvements over the years. It is not that something entirely new is added. It is more that people can now formally document something they have been doing all along, but that was not well covered by the specification.
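As a hedged illustration, here is one way a Server-Sent Events response might be documented with the `itemSchema` field that OpenAPI 3.2 adds to the Media Type Object, paired with a small Python harness that parses a `text/event-stream` body and checks each event against that description. The payload shape (a JSON object with a `token` field) is hypothetical, and the parser is a simplified version of the WHATWG event-stream rules (no `retry` handling, no BOM stripping). Documenting the shape of each event, not only the media type, is what makes a check like this possible.

```python
import json

# OpenAPI 3.2 media type fragment, as a Python dict: itemSchema describes each SSE event.
STREAM_MEDIA = {
    "itemSchema": {
        "type": "object",
        "required": ["data"],
        "properties": {
            "event": {"type": "string"},
            "id": {"type": "string"},
            "data": {
                "type": "string",
                "contentMediaType": "application/json",
                "contentSchema": {"type": "object", "required": ["token"]},  # hypothetical payload
            },
        },
    }
}


def parse_sse(text: str) -> list[dict[str, str]]:
    """Split a text/event-stream body into events (simplified WHATWG parsing)."""
    events, current = [], {}
    for line in text.splitlines():
        if line == "":  # a blank line dispatches the pending event
            if current:
                events.append(current)
            current = {}
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "data" and "data" in current:
                current["data"] += "\n" + value  # multiple data lines are joined
            else:
                current[field] = value
    return events  # an event without a terminating blank line is discarded


def check_events(events: list[dict[str, str]], media: dict) -> list[str]:
    """Compare parsed events with the documented itemSchema and payload schema."""
    item = media["itemSchema"]
    payload_required = item["properties"]["data"]["contentSchema"]["required"]
    problems = []
    for i, event in enumerate(events):
        problems += [f"event {i}: missing '{k}'" for k in item["required"] if k not in event]
        if "data" not in event:
            continue
        try:
            payload = json.loads(event["data"])
        except json.JSONDecodeError:
            problems.append(f"event {i}: data is not JSON")
            continue
        if not isinstance(payload, dict):
            problems.append(f"event {i}: data is not a JSON object")
            continue
        problems += [f"event {i}: data missing '{k}'" for k in payload_required if k not in payload]
    return problems


body = (
    'event: delta\ndata: {"token": "Hel"}\n\n'
    ": keep-alive\n\n"
    'event: delta\ndata: {"token": "lo"}\n\n'
    "event: done\ndata: {}\n\n"
)
# Flags the final "done" event, whose empty payload has no token.
print(check_events(parse_sse(body), STREAM_MEDIA))
```

In a real API the terminal event would probably get its own schema; the point is that a documented per-event shape turns a readable description into a testable one.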
Webhooks are another good example of this. Webhooks have been popular for a long time. I was surprised when somebody gave me a statistic saying that around 60 percent of the 100 most popular APIs use webhooks. That is a remarkably high number, but it makes sense because webhooks are a convenient pattern. You do something with an API, and at some point the API can call you back and say this process is finished, go and fetch your results. People had been doing that for a long time, but it was never explicitly supported in OpenAPI. And then at some point the specification simply gets extended to cover what practitioners are already doing. That is what makes it more complete over time.
Q. The 3.2 tag structure now supports nesting. How do you use tags as information architecture for large API catalogs, and how do you govern that taxonomy across teams?
That is a good and very demanding question, because it goes well beyond OpenAPI and into whether you have a data dictionary or some general framework for how things get named in your organization. Organizations always have a hard time with this because it is hard to agree on terms, and it is hard to make sure that everyone understands which terms exist, what they mean, and when to use them. Tags are no different. They give you a way to assign meaning to things in your OpenAPI description, but the meaning itself is entirely up to you.
Until now tags were relatively minor things. The typical pattern was to say here are all the operations about customers, here are all the operations about products, and so on, and documentation tools would then group things by tag. With the hierarchical tag structure in 3.2, you could go much further. You could have a hierarchy of unlimited depth if you want, where each thing in your API is linked to some kind of data dictionary or ontology. I have not seen people doing that yet, but I am pretty sure they will start.
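A small sketch of how tooling might resolve the 3.2 hierarchy. In OpenAPI 3.2 the Tag Object gains a `parent` field naming another tag (alongside optional `summary` and `kind` fields); the Python function below turns those parent links into full paths and rejects unknown parents and cycles. The tag names are made up for illustration.

```python
TAGS = [
    {"name": "customers", "summary": "Customers", "kind": "nav"},
    {"name": "customer-addresses", "parent": "customers", "kind": "nav"},
    {"name": "billing", "kind": "nav"},
    {"name": "invoices", "parent": "billing"},
]


def tag_paths(tags: list[dict]) -> dict[str, str]:
    """Map each tag name to its full path, e.g. 'billing/invoices'."""
    by_name = {tag["name"]: tag for tag in tags}
    paths = {}
    for tag in tags:
        chain, seen, current = [], set(), tag
        while current is not None:
            name = current["name"]
            if name in seen:
                raise ValueError(f"tag cycle involving {name!r}")
            seen.add(name)
            chain.append(name)
            parent = current.get("parent")
            if parent is not None and parent not in by_name:
                raise ValueError(f"tag {name!r} has unknown parent {parent!r}")
            current = by_name[parent] if parent is not None else None
        paths[tag["name"]] = "/".join(reversed(chain))
    return paths


print(tag_paths(TAGS))
```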
That said, my recommendation would be not to go crazy building a complex standalone tag taxonomy inside OpenAPI. If you start introducing complex terminology with different hierarchies and groupings, you probably also need to align that with every other place in your organization where things get tagged, whether that is databases, document stores, or wherever you manage information. So check what your general information architecture looks like. What dictionaries or terminologies are already established? Then think about how you map those into the OpenAPI tag model rather than inventing a whole new taxonomy that lives only in your API descriptions.
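One hedged way to act on this advice is a governance check that compares tag names against terms exported from an existing data dictionary and suggests the closest established term. `ORG_TERMS` is a hypothetical stand-in for that export; the fuzzy matching uses Python's standard `difflib.get_close_matches`.

```python
import difflib

# Hypothetical export of terms already established in the organization's data dictionary.
ORG_TERMS = {"customer", "customer-address", "invoice", "payment", "product"}


def unaligned_tags(tag_names: list[str], vocabulary: set[str] = ORG_TERMS) -> dict[str, list[str]]:
    """Report tags that are not established terms, with the closest existing terms."""
    report = {}
    for name in tag_names:
        if name not in vocabulary:
            report[name] = difflib.get_close_matches(name, sorted(vocabulary), n=2, cutoff=0.6)
    return report


# "customers" gets suggestions led by "customer"; "billing" has no close term yet.
print(unaligned_tags(["customer", "customers", "billing"]))
```

A tag with no close match, like `billing` here, is a prompt to go back to whoever owns the dictionary rather than to coin a new term inside the API description.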
Q. On linting as a quality gate: how do you design a rule set taxonomy that maps cleanly to real ownership, the way platform teams and product teams each have different responsibilities?
What linting is being used for right now is governance and a level of automation. The goal is that when people start designing or changing APIs they get quick feedback on whether they are following guidelines or not. A good number of organizations publish their rule sets openly on GitHub. I have a collection of around 30 or 40 publicly accessible ones. The Zalando ones are popular because they have been around for a while. Adidas has some solid ones. There are also some published by government and e-government initiatives. So there are plenty of references.
Linting is useful, but it has real limitations. The popular tools, whether Spectral, Vacuum, or Redocly, all work in a similar way. You have rules that apply to certain parts of your OpenAPI description, and they check for structural conditions. Something like: this operation must have a description, and the description must be at least 20 characters long. It is really a structural check. And that is useful. I would absolutely recommend doing it.
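The 20-character rule mentioned above, sketched as a plain Python function so the structural nature of the check is visible. Tools such as Spectral express the same idea declaratively (a JSONPath selector plus a check function); this version is only an illustration, not any tool's implementation.

```python
from typing import Iterator

# Path Item keys that hold operations (OpenAPI 3.2 adds "query").
METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace", "query"}


def operations(spec: dict) -> Iterator[tuple[str, dict]]:
    """Yield ("METHOD /path", operation) pairs for every operation in the description."""
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method in METHODS:
                yield f"{method.upper()} {path}", op


def description_min_length(spec: dict, min_len: int = 20) -> list[str]:
    """Structural rule: each operation needs a description of at least min_len characters."""
    return [
        f"{where}: description missing or shorter than {min_len} characters"
        for where, op in operations(spec)
        if len(op.get("description", "").strip()) < min_len
    ]


spec = {
    "paths": {
        "/orders": {
            "get": {"description": "List orders for the authenticated customer."},
            "post": {"description": "Create order."},
        }
    }
}
print(description_min_length(spec))
```

The rule only counts characters: a 25-character description that says nothing useful would pass, which is exactly the limit of purely structural checks.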
What I am not a big fan of is just reusing existing rule sets wholesale. I would always say start owning this, build up your own in a collaborative fashion. Have a GitHub repository somewhere where developers can propose and discuss new rules, argue for whether a guideline is worth following, and then get it merged into your shared rule set once there is enough agreement. You might also have different rules for different stages of the API lifecycle. Some rules are so important that every code check-in has to follow them. Others might only apply to APIs you expose to external partners, where you want higher quality standards. So you end up with rule sets that are tuned by the consumer type or the lifecycle stage, or both.
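A minimal sketch of rule sets tuned by lifecycle stage and consumer type. The rule IDs, stage names (`commit`, `release`), and audiences are hypothetical; in practice this mapping would live in the shared, reviewed repository described above rather than in application code.

```python
# Hypothetical rule catalog: the lifecycle stages and audiences each rule applies to.
RULE_SCOPES = {
    "operation-description": {
        "stages": {"commit", "release"},
        "audiences": {"internal", "partner", "public"},
    },
    "response-examples": {"stages": {"release"}, "audiences": {"partner", "public"}},
    "problem-details-errors": {"stages": {"release"}, "audiences": {"public"}},
}


def active_rules(stage: str, audience: str) -> set[str]:
    """Select the rules to enforce for one API at one lifecycle stage."""
    return {
        rule_id
        for rule_id, scope in RULE_SCOPES.items()
        if stage in scope["stages"] and audience in scope["audiences"]
    }


print(sorted(active_rules("commit", "internal")))  # only the baseline rule
print(sorted(active_rules("release", "partner")))  # baseline plus partner-facing rules
```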
But as I said, linting has limits. At Jentic we use Spectral and Redocly as part of our API scoring checks, but we also have a good number of LLM-based checks, because if you are scoring APIs for AI readiness, what matters is not just whether a description field exists but whether it is written in a way that is actually useful for an agent. Those are the kinds of checks that typical linting tools cannot do because they operate at the structural level. So linting is a solid and by now fairly standard first line of defense, but also look a little beyond it.
Q. How do you set severity levels like error, warning, and informational, and what is an exception policy that avoids lint fatigue without lowering the floor?
Severity levels really should be what you would expect. If something is non-negotiable and needs to be fixed before anything moves forward, that is an error. There is no discussion. Then you have warnings, where the message is that this is not great but it is acceptable, though you should consider fixing it. It gives the developer a signal without blocking them. And then informational messages, which honestly I am not sure are that interesting for developers to act on directly. What I have seen done a couple of times is that informational-level messages are not really meant for developers to read at all. They are intended for downstream tooling. The linter surfaces an observation that is then picked up by some other tool in the pipeline. So the informational channel becomes a way for the linter to communicate with tooling downstream rather than with the developer.
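A sketch of how a CI step might apply that policy: errors fail the build, errors and warnings are shown to the developer, and informational findings are serialized as JSON for downstream tools. The severity names follow Spectral's `error`/`warn`/`info` levels; the `Finding` type and the JSON hand-off are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    rule: str
    severity: str  # "error", "warn", or "info"
    message: str


def triage(findings: list[Finding]) -> tuple[int, list[str], str]:
    """Route findings: errors block, errors and warnings reach the developer, info goes to tooling."""
    blocking = [f for f in findings if f.severity == "error"]
    for_developer = [f"{f.rule}: {f.message}" for f in findings if f.severity in ("error", "warn")]
    for_tooling = json.dumps([asdict(f) for f in findings if f.severity == "info"])
    exit_code = 1 if blocking else 0  # non-zero fails the pipeline step
    return exit_code, for_developer, for_tooling


findings = [
    Finding("operation-description", "error", "POST /orders: description too short"),
    Finding("response-examples", "warn", "GET /orders: no examples"),
    Finding("tag-count", "info", "12 tags in use"),
]
exit_code, for_developer, for_tooling = triage(findings)
print(exit_code, for_developer, for_tooling)
```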
Q. On large specs with tens of thousands of lines, linting performance and PR feedback loops become real constraints. What repository or spec structuring patterns reduce friction without fragmenting the contract?
What you probably want is to avoid always linting the whole thing. Large specifications are never in one file. They are assembled from a whole bunch of sources, schemas, references, and components from various places. So it makes much more sense to have your checks in place at those individual source locations rather than only at the assembled specification level. Instead of linting the full spec at the end of every pipeline run, start linting when you make changes to the schemas and the smaller pieces that feed into the overall description.
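A sketch of the selection logic for linting close to the change. It assumes the list of changed files comes from something like `git diff --name-only origin/main...HEAD`, that source fragments live under a hypothetical `api/` directory, and that you maintain an index of which assembled bundles reference which fragments.

```python
from pathlib import PurePosixPath

SPEC_SUFFIXES = {".yaml", ".yml", ".json"}


def fragments_to_lint(changed: list[str], source_root: str = "api/") -> list[str]:
    """Keep only changed OpenAPI source fragments under the (hypothetical) source root."""
    return sorted(
        path for path in changed
        if path.startswith(source_root) and PurePosixPath(path).suffix in SPEC_SUFFIXES
    )


def affected_bundles(fragments: list[str], includes: dict[str, set[str]]) -> list[str]:
    """Return assembled specs that reference any changed fragment (hypothetical index)."""
    changed = set(fragments)
    return sorted(bundle for bundle, refs in includes.items() if refs & changed)


# e.g. the output of: git diff --name-only origin/main...HEAD
changed = ["api/schemas/order.yaml", "README.md", "api/paths/orders.yml", "src/app.py"]
includes = {
    "orders-api": {"api/schemas/order.yaml", "api/paths/orders.yml"},
    "billing-api": {"api/schemas/invoice.yaml"},
}
fragments = fragments_to_lint(changed)
print(fragments, affected_bundles(fragments, includes))
```

Linting the two changed fragments keeps feedback fast, and only `orders-api` needs a full bundle check in this example.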
If you do that with a reasonable level of discipline, you avoid the compounding effect where you finally lint the big spec and get hit with hundreds of errors you have been quietly accumulating. Do not treat linting as the last step. Do it as early as possible, as close to where the change is actually happening as you can. That is the pattern that keeps the feedback loops short and the debt manageable.
Q. There is a proposal for OpenAPI 3.3. What are you personally most interested in seeing there?
For me, because of where I work right now, the big issue is how we could improve OpenAPI specifically with a focus on AI. We have not done that so far in any serious way. There are a whole bunch of discussions within the OpenAPI Initiative around how that could be done.
Some of it is about semantics. Some of it is about making clearer when and how long an API is actually going to be around, which is something agents care about in ways that human developers traditionally have not. Agents always use an API at runtime. They discover it, decide it looks like a good API to use, and then need to figure out what it does, what it does not do, what its side effects and constraints are. All of that could be surfaced in a much more accessible way through the API description itself rather than sitting only in human-facing documentation.
One idea I find genuinely interesting is the relationship between OpenAPI and Arazzo. Arazzo is a workflow language, published by the OpenAPI Initiative, that lets you orchestrate sequences of OpenAPI interactions. You can say: to accomplish this goal, call this endpoint, then that one, then that one. It is a simple orchestration language layered on top of OpenAPI. What would be really cool is if an OpenAPI description could link to an Arazzo workflow and say, if you use this operation, it actually makes the most sense as part of this workflow you can find over there. Figuring out multi-step workflows is one of the hardest things for agents to do right now, and Arazzo is genuinely good at describing those. We just need to make it discoverable. So that is one of the directions I would love to see 3.3 move in.
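A sketch of the discoverability idea: build a reverse index from an Arazzo document so each OpenAPI operation can point to the workflows it takes part in. The Arazzo fields used (`workflows`, `workflowId`, `steps`, `stepId`, `operationId`) come from Arazzo 1.0; the `x-workflows` extension written back onto the OpenAPI operation is hypothetical, since OpenAPI has no such field today.

```python
from collections import defaultdict

# Minimal Arazzo 1.0 document, as a Python dict (operation IDs are made up).
ARAZZO = {
    "arazzo": "1.0.1",
    "info": {"title": "Shop workflows", "version": "1.0.0"},
    "sourceDescriptions": [{"name": "shop", "url": "./openapi.yaml", "type": "openapi"}],
    "workflows": [
        {
            "workflowId": "place-order",
            "steps": [
                {"stepId": "find-product", "operationId": "searchProducts"},
                {"stepId": "create-cart", "operationId": "createCart"},
                {"stepId": "checkout", "operationId": "checkoutCart"},
            ],
        },
        {
            "workflowId": "reorder",
            "steps": [
                {"stepId": "last-order", "operationId": "getLastOrder"},
                {"stepId": "checkout", "operationId": "checkoutCart"},
            ],
        },
    ],
}


def workflows_by_operation(arazzo: dict) -> dict[str, list[str]]:
    """Reverse index: operationId -> IDs of workflows that include it as a step."""
    index = defaultdict(list)
    for workflow in arazzo.get("workflows", []):
        for step in workflow.get("steps", []):
            op_id = step.get("operationId")
            if op_id and workflow["workflowId"] not in index[op_id]:
                index[op_id].append(workflow["workflowId"])
    return dict(index)


def annotate(openapi: dict, index: dict[str, list[str]]) -> None:
    """Write the hypothetical x-workflows extension onto matching OpenAPI operations."""
    for item in openapi.get("paths", {}).values():
        for op in item.values():
            if isinstance(op, dict) and op.get("operationId") in index:
                op["x-workflows"] = index[op["operationId"]]


openapi = {"paths": {"/carts/{id}/checkout": {"post": {"operationId": "checkoutCart"}}}}
annotate(openapi, workflows_by_operation(ARAZZO))
print(openapi["paths"]["/carts/{id}/checkout"]["post"]["x-workflows"])
```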
And as a reminder, the OpenAPI Initiative is open source and open to everyone. You do not need to be a member, you do not need to pay anything. The discussions happen primarily on Slack. If you have ideas or questions, just come and join. It is a very active and welcoming community. Check out openapis.org, and note that the S matters.
Q. With MCP consolidating under the Linux Foundation’s Agentic AI Foundation, what is the minimum governance surface an enterprise needs before agents can use tools broadly?
I am still a little skeptical about MCP, honestly. I may very well be wrong, but what I would really encourage everyone to do is first think about your API estate and really invest in your APIs, rather than obsessing too much over MCP specifically. Whatever you invest in better APIs becomes useful for everyone. Developers can use it, agents can use it, partners can use it. If you invest specifically in MCP, that investment is effectively scoped to LLM consumers. And that may sometimes make sense, but it is important to keep in mind that the API landscape is the foundational layer you will be working with long term, and MCP may or may not stick around.
At Jentic we do support MCP because at this point you have to, but we are not deeply invested in MCP itself. If MCP went away and something else came along, that would not be a significant problem for us. We think of what we do as delivering capabilities to agents, and MCP is the current delivery mechanism. You need a delivery mechanism, but I would not build too many things that are MCP-specific. That would be my personal view.
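To illustrate keeping MCP as a thin delivery layer, here is a sketch that derives an MCP-style tool descriptor (`name`, `description`, `inputSchema`) from an OpenAPI operation, so the API description stays the source of truth. It deliberately ignores `$ref` resolution, parameter name collisions, and non-JSON request bodies, and the `body` property name is an assumption.

```python
def to_mcp_tool(operation_id: str, op: dict) -> dict:
    """Project one OpenAPI operation onto an MCP tool definition (simplified)."""
    properties, required = {}, []
    for param in op.get("parameters", []):
        schema = dict(param.get("schema", {}))  # copy so the OpenAPI dict is not mutated
        if "description" in param:
            schema["description"] = param["description"]
        properties[param["name"]] = schema
        if param.get("required"):
            required.append(param["name"])
    request_body = op.get("requestBody", {})
    body_schema = request_body.get("content", {}).get("application/json", {}).get("schema")
    if body_schema is not None:
        properties["body"] = body_schema  # hypothetical name for the JSON request body
        if request_body.get("required"):
            required.append("body")
    return {
        "name": operation_id,
        "description": op.get("description") or op.get("summary", ""),
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


get_order = {
    "summary": "Get an order",
    "description": "Fetch one order, including line items and shipping status.",
    "parameters": [
        {"name": "orderId", "in": "path", "required": True,
         "description": "Order identifier.", "schema": {"type": "string"}},
    ],
}
print(to_mcp_tool("getOrder", get_order))
```

If MCP were replaced by another delivery mechanism, only this projection would change; the investment in the operation's description carries over.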
Q. From an audience member: what makes an API truly agent-ready in production compared to a standard REST API?
One of the things I like to use as an illustration is the GitHub API. The current GitHub REST API, version 3, has around 1,100 operations. GitHub is a complex product and there is a lot you can do with it, so 1,100 operations is not unreasonable. But for an agent to work directly with that API is quite complex, because a large number of those operations need to be combined in a certain way to produce the workflows that you actually want to accomplish on GitHub.
Now compare that to the GitHub MCP server, which has around 70 tools. Way fewer, and they are much higher level. They represent entire workflows, entire things you might want to do on GitHub, rather than the more atomic operations you find in the native API. What I would argue is that if you had a genuinely agent-friendly GitHub API, it might also just have around 70 operations. Not 1,100. Right now those 70 are available through MCP because that is what GitHub decided to build, and that is fine, but the point is that if you have an agent that wants to get things done, it will be significantly happier with 70 well-described higher-level operations than with 1,100 lower-level ones.
The properties that make an API agent-ready follow from that. It should not be too fine-grained. The descriptions should be written at a level that is meaningful for an LLM, which means intent-based and human-readable, not just technical. It should have examples, and ideally multiple examples rather than just one. Error messages should be meaningful and actionable, giving the agent enough information to understand what happened and what it might do next. And if you make those improvements, you almost certainly also improve the developer experience as a side effect, so it is not a speculative investment.
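For the error-message point, one existing standard is Problem Details for HTTP APIs (RFC 9457, media type `application/problem+json`), which defines the members `type`, `title`, `status`, `detail`, and `instance` and allows extension members. The Python sketch below builds such a body; the problem type URI and the extension members (`available`, `requested`, `retryable`) are hypothetical examples of the actionable detail an agent could use.

```python
import json


def problem_details(status: int, type_uri: str, title: str, detail: str, **extensions) -> str:
    """Serialize an RFC 9457 problem details object (media type application/problem+json)."""
    body = {"type": type_uri, "title": title, "status": status, "detail": detail}
    body.update(extensions)  # RFC 9457 allows extension members alongside the standard ones
    return json.dumps(body)


# Hypothetical out-of-stock error: the extension members tell an agent what it can do next.
error_body = problem_details(
    409,
    "https://example.com/problems/out-of-stock",
    "Item out of stock",
    "Only 2 units of SKU 1234 are available; 5 were requested.",
    available=2,
    requested=5,
    retryable=False,
)
print(error_body)
```

An agent receiving this can retry with a smaller quantity instead of giving up, which is the difference between an error that is merely correct and one that is actionable.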
Q. On API deprecation and sunsetting: how should agents handle the lifecycle signal that an API they depend on is entering a sunset cycle?
Deprecation and sunsetting are genuinely important to me. I have written some small standards for how an API can actually surface that information at runtime. And I think we will see more and more of these runtime mechanisms being built out, because agents consume APIs at runtime by design. They discover an API, start using it, and then ideally they should also be able to discover that the API is only going to be available for another two weeks. At that point, a well-designed agent might alert someone, or start looking for a replacement, or whatever the right behavior is for that situation. What exactly to do about it is a separate design question. But as a consumer of an API, this is information that is relevant, and if we can surface it at runtime, consumers can react at runtime. That feels like an obviously good thing to pursue.
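As one example of a runtime lifecycle signal, the Sunset HTTP response header field (RFC 8594) carries an HTTP-date after which a resource is likely to become unresponsive, and the Deprecation header field (RFC 9745) carries a structured-field date such as `@1735689600`. The sketch below reads both in Python; the 30-day threshold and the action names are hypothetical policy choices, and header names are assumed to arrive with exactly this capitalization.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def lifecycle_signal(headers: dict[str, str], now: datetime) -> tuple[Optional[int], bool]:
    """Return (whole days until sunset or None, whether deprecation has taken effect)."""
    days_left = None
    if "Sunset" in headers:
        days_left = (parsedate_to_datetime(headers["Sunset"]) - now).days
    deprecated = False
    deprecation = headers.get("Deprecation", "")
    if deprecation.startswith("@"):  # structured-field date: seconds since the Unix epoch
        since = datetime.fromtimestamp(int(deprecation[1:]), tz=timezone.utc)
        deprecated = since <= now  # a future date announces an upcoming deprecation
    return days_left, deprecated


def next_action(days_left: Optional[int], deprecated: bool, threshold_days: int = 30) -> str:
    """Hypothetical agent policy for reacting to lifecycle signals."""
    if days_left is not None and days_left <= threshold_days:
        return "alert-owner-and-find-replacement"
    if deprecated:
        return "log-and-plan-migration"
    return "continue"


now = datetime(2025, 12, 17, tzinfo=timezone.utc)
headers = {"Sunset": "Wed, 31 Dec 2025 23:59:59 GMT", "Deprecation": "@1735689600"}
print(next_action(*lifecycle_signal(headers, now)))
```

What the agent actually does with the signal remains the separate design question mentioned above; the sketch only shows that the information is machine-readable at runtime.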
Q. On request and response schema design: how do you design schemas so that an LLM can reliably choose the correct operation, handle partial failures, and avoid duplicating side effects?
Schema design becomes part of the general question of how you design OpenAPI for AI consumption. You want descriptions in your schemas, not just in your operations, so that an LLM can understand what individual fields actually mean rather than just their names and types. Names that carry meaning help too. Parameters named X, Y, and Z are much harder for an agent to reason about than parameters with names that reflect their actual intent.
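A sketch of a schema-level check along these lines: it walks an object schema and reports properties without descriptions or with single-letter names. The length cutoff is an arbitrary assumption, and the walker ignores `$ref`, arrays, and composition keywords such as `allOf`.

```python
def weak_properties(schema: dict, path: str = "#") -> list[str]:
    """List object properties that lack a description or have a single-letter name."""
    issues = []
    for name, prop in schema.get("properties", {}).items():
        where = f"{path}/properties/{name}"
        if not prop.get("description"):
            issues.append(f"{where}: no description")
        if len(name) < 2:  # arbitrary cutoff: flags names such as x, y, z
            issues.append(f"{where}: name too short to convey intent")
        if prop.get("type") == "object":
            issues.extend(weak_properties(prop, where))  # recurse into nested objects
    return issues


ORDER = {
    "type": "object",
    "properties": {
        "x": {"type": "number"},
        "deliveryDate": {
            "type": "string",
            "format": "date",
            "description": "Date the order is expected to arrive.",
        },
        "address": {
            "type": "object",
            "description": "Shipping address.",
            "properties": {"zip": {"type": "string"}},
        },
    },
}
print(weak_properties(ORDER))
```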
Beyond that, I think we are going to see interesting evolution in how APIs handle the granularity of what they return. Right now the standard REST model is relatively static: here is a request schema, here is a response schema. But if you are working with agents that are trying to minimise token usage and context pollution, there is a real case for APIs that can return only the fields that were actually asked for. GraphQL has a nice built-in capability for this, which is one of the things that makes it interesting for agentic use cases. REST does not have that natively, but you could layer something on top. We will see how that evolves, but it is one of the more interesting design questions in this space right now.
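A sketch of field selection layered on top of REST. JSON:API defines a similar "sparse fieldsets" convention; the plain `fields` query parameter and dotted names assumed here (for example `?fields=id,customer.name`) are a hypothetical simplification, not any particular standard.

```python
def project(resource: dict, fields: list[str]) -> dict:
    """Keep only the requested fields; dotted names select nested fields."""
    wanted: dict[str, list[str]] = {}
    for field in fields:
        head, _, rest = field.partition(".")
        wanted.setdefault(head, []).append(rest)
    result = {}
    for head, rests in wanted.items():
        if head not in resource:
            continue  # unknown fields are ignored here; rejecting them is another option
        value = resource[head]
        if "" in rests or not isinstance(value, dict):
            result[head] = value  # whole field requested, or nothing to descend into
        else:
            result[head] = project(value, rests)
    return result


order = {
    "id": "o-1",
    "status": "shipped",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [{"sku": "1234", "quantity": 2}],
}
# e.g. GET /orders/o-1?fields=id,customer.name
print(project(order, "id,customer.name".split(",")))
```

For an agent, the smaller response means fewer tokens spent on fields it never asked about.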
Q. What workflow patterns show up repeatedly when enterprises actually start working with Jentic, and what makes them stable as APIs underneath them change?
One example we were not expecting, which is always a good sign when you start talking to real enterprises, is the partner integration scenario. If you have a relatively complex API that you expose to partners, that is a large engineering effort for each of those partners. They have to understand the whole API even if they only need a small part of it.
What we now actively pursue, because it keeps proving useful, is creating specific workflows for specific partners. You say, this partner only wants to do these particular things, so they get a set of workflows built on top of the API that match their actual use cases. They do not need to understand the full API surface. They just need to understand the workflows that were created specifically for them.
And the stability point is interesting. As long as you develop your APIs in a backwards compatible way, those workflows remain stable even as the underlying APIs change. As a workflow user you do not even need to know that the APIs underneath now do additional things. You just keep invoking the same workflows and they continue to work. The moment you break a backwards compatible API is the moment you also break the workflows depending on it. So the discipline of backwards compatibility pays off at every layer.
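A conservative sketch of a backwards-compatibility check that could guard those workflows in CI. It compares two versions of one object schema and, assuming the schema is used in both requests and responses, treats removed properties, changed types, and newly required properties as breaking. Real diff tools cover far more cases (enums, formats, `$ref`, parameters).

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Flag changes to an object schema that could break existing consumers."""
    problems = []
    old_props, new_props = old.get("properties", {}), new.get("properties", {})
    for name, old_prop in old_props.items():
        if name not in new_props:
            problems.append(f"removed property '{name}'")
        elif old_prop.get("type") != new_props[name].get("type"):
            problems.append(f"changed type of '{name}'")
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    problems += [f"newly required property '{name}'" for name in sorted(newly_required)]
    return problems


v1 = {"properties": {"id": {"type": "string"}, "total": {"type": "number"}}, "required": ["id"]}
v2 = {"properties": {**v1["properties"], "note": {"type": "string"}}, "required": ["id"]}
v3 = {
    "properties": {"id": {"type": "string"}, "total": {"type": "string"}, "note": {"type": "string"}},
    "required": ["id", "note"],
}
print(breaking_changes(v1, v2))  # adding an optional property is compatible
print(breaking_changes(v1, v3))
```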
Q. Looking ahead six months, what should a senior engineer or platform engineer watch closely in standards, tooling, or governance for agent-facing APIs?
What I would recommend, starting from tomorrow morning, is to begin thinking about agents in your planning even if you do not have them yet. And I acknowledge that the term agent has become fairly meaningless at this point. Everything seems to be called an agent now. But what I do see when talking with organisations is that certain types of agents are already getting real use, customer support agents and some HR agents being the most common. These are agents that are useful across industries, and you can mostly buy them, hook them up to your documentation, and they work.
What you see much less of right now, despite all the talk, is what I would call real business agents in production, where a piece of software can sense things, take action, and make decisions. Agents that actually have agency. And I believe we will see more and more of these, not necessarily all at once, but incrementally. You trust them with a little more next year, and a little more the year after.
Because of that, I would highly recommend making the AI readiness of your APIs part of your standard practice now. API landscapes evolve slowly. Whatever you design or change today will probably be around for a year or two or three before you touch it again. So ask yourself whether your linting and your design practices are optimising only for developer experience, or whether they are also starting to account for agent experience. The good news is that optimising for agent experience tends to improve developer experience as a side effect. You are not making a speculative bet. You are making something better for everyone while also preparing for what is coming. If you work on API platforms or in platform engineering, start thinking now about how your API landscape will need to evolve as you have more and more agentic consumers. Because it is going to arrive. That is at least my personal view.
Erik Wilde is Head of Enterprise Strategy at Jentic and an OpenAPI Ambassador at the OpenAPI Initiative. He is the creator of the Getting APIs to Work channel on YouTube. This interview was conducted by Saqib Jan, Editor-in-Chief of Deep Engineering.