Deep Engineering Specials: Enterprise AI has an API problem

The next enterprise AI bottleneck is not model capability. It is whether agents can discover, understand, and safely use the systems they need to act on

Jun 02, 2026

If you have 30,000 APIs, you probably have 300,000 endpoints across your organization. While that sounds like a problem of scale, it is actually one of design.

With agents discovering and calling APIs at runtime rather than developers hardcoding them at build time, that design problem has become one of the most urgent infrastructure questions in enterprise engineering.

This month’s special issue digs into how APIs built for developers need to become discoverable, understandable, governed, and safe runtime capabilities for agents, with commentary from Erik Wilde, Head of Enterprise Strategy at Jentic; Nandita Giri, Senior Software Engineer at Microsoft; Rohan Gupta, Principal Product Manager at Harness; and Mayank Bhola, Co-Founder and Head of Products at TestMu AI.

Let’s get started.

Special issue — June 2026

Your APIs were built for developers, not agents

“You don’t just look for APIs when you’re writing an app. You kind of look for APIs every time you solve a problem.” — Erik Wilde, Head of Enterprise Strategy at Jentic and OpenAPI Ambassador

Most large enterprises have no idea how many APIs they have. Ask them, and the honest answer usually falls between a guess and a shrug. What they do know is that the number is large, the endpoint count is larger still, and the documentation ranges from incomplete to missing. For years that was manageable because the people consuming those APIs could compensate for the gaps.

For the better part of two decades, engineering teams designed APIs for a specific kind of consumer, a developer sitting at a keyboard, reading documentation, and making deliberate decisions about which endpoints to call and in what order. That consumer had context, experience, and the judgment to fill in the gaps that a poorly written spec inevitably left open. The API did not need to be perfect because the developer compensated for its imperfections at design time, before a single line of integration code was written.

That assumption breaks down when the consumer is an agent, says Erik Wilde, Head of Enterprise Strategy at Jentic and OpenAPI Ambassador. Agents do not read between the lines, compensate for ambiguous parameter names, or infer the intent behind a generic error response. They act on what the contract says, at runtime, every time they need to solve a problem, and the gap between what most enterprise APIs offer and what agents actually need is where many AI projects begin to fail before they deliver measurable value.

Anthropic’s 2026 State of AI Agents Report found that 46% of engineering teams cite integration with existing systems as their primary challenge when deploying agents, placing it above model capability, prompt quality, and every other factor on the list. The bottleneck is infrastructure, and that infrastructure depends on APIs that were never designed for this kind of consumer.

Masterclass: Building AI-Ready APIs with Agent Skills

Join OpenAPI Ambassadors Erik Wilde and Frank Kilcommins for a hands-on masterclass on building AI-ready APIs with agent skills, covering OpenAPI, Overlay, Arazzo, semantic discovery, deterministic workflows, and governance guardrails for agent-driven integrations.

🗓️ July 1, 2026 · 10:30 AM – 1:30 PM ET · Online

Use code DEEPENG50 for 50% off.

Agents discover APIs at runtime, developers do not

In our live interview, Wilde gave one of the clearest framings for how agent consumption differs from developer consumption. Developers search for APIs when building an application, make a decision, and hardcode the integration so it stays consistent for the lifetime of the application. Agents search for APIs at runtime, every time they encounter a problem they need to solve, against a catalog that may contain hundreds of thousands of options across a large enterprise.

That changes the API problem from documentation quality to runtime selection. The consumer is no longer a skilled person who can fill in the gaps of a poorly written spec. It is a machine that acts on exactly what the contract says, nothing more and nothing less, and it does so without the accumulated context that a developer brings to the integration process.

Wilde illustrated the scale of this problem by sharing a recent experience with a car manufacturer operating roughly 50,000 APIs and 500,000 endpoints across the organization. The point is not that this number is exceptional. For a large enterprise with decades of accumulated systems and services, it is closer to the normal condition than most teams would like to admit. What changes with agents is the cost of that normal condition. When the consumer needs to find the right capability at runtime, the selection problem alone can make the API landscape effectively unusable without a serious restructuring of how capabilities are described, organized, and exposed.
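The sketch below shows why description quality decides runtime selection. It is a toy: the capability names are hypothetical, and word overlap stands in for the semantic search a real catalog would use. The selector has nothing but each entry’s description to go on, so an entry described only as “Returns data.” is effectively invisible, and at 500,000 endpoints there is no developer to notice.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """One entry in a hypothetical enterprise capability catalog."""
    name: str
    description: str


def tokenize(text: str) -> set[str]:
    # Lowercased word set with simple punctuation stripped.
    return {w.strip(".,:;()").lower() for w in text.split() if w.strip(".,:;()")}


def select_capability(task: str, catalog: list[Capability]) -> Capability | None:
    """Return the entry whose description shares the most words with the task.

    Word overlap is a stand-in for semantic search; ties go to the earlier entry.
    """
    task_words = tokenize(task)
    best, best_score = None, 0
    for cap in catalog:
        score = len(task_words & tokenize(cap.description))
        if score > best_score:
            best, best_score = cap, score
    return best


catalog = [
    Capability("getOrd", "Returns data."),  # vague: nothing for the selector to match
    Capability("refundOrder", "Issue a refund for a customer order that was already paid."),
    Capability("cancelOrder", "Cancel a customer order before it ships."),
]

print(select_capability("refund the customer for order 1042", catalog).name)  # prints: refundOrder
```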

Agents cannot compensate for spec drift

“Think of the API as a contract with a very literal, very curious machine.” — Nandita Giri, Senior Software Engineer at Microsoft

Nandita Giri, Senior Software Engineer at Microsoft with prior engineering experience at Meta and Amazon, works across agentic AI and automation. The pattern she observes across organizations working to become AI-ready is consistent and predictable: teams invest in producing a good OpenAPI specification at launch, treat it as a first-class deliverable at release, and then watch the specification and the actual API behavior silently diverge over the following months as the code evolves faster than the documentation does.

For developer-facing APIs, this drift is a manageable nuisance because developers notice the discrepancy, ask questions in Teams, Slack, or a GitHub issue, and someone eventually updates the documentation before the next consumer runs into the same problem. But for agent-facing APIs, spec drift is not a nuisance. It is a silent failure mode that is exceptionally difficult to trace because agents have no mechanism for noticing the discrepancy between what the spec says and how the API actually behaves. They act on what the spec says, encounter failures they cannot interpret without the surrounding context a developer would have, and either produce incorrect results or abandon the task entirely without surfacing a meaningful error to the system that called them.

The only way to stop that drift from compounding, Giri argues, is to treat the specification as a first-class part of the release process on every change, with CI pipelines that validate spec fidelity against actual runtime behavior before deployment proceeds, not as a quarterly audit task but as a gate that blocks release when the spec and the actual behavior have diverged.
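Giri does not prescribe a specific tool, so the following is only a minimal sketch of such a gate. It assumes a staging run has already captured real response bodies and that the spec’s response schema has been loaded into a dict; the schema handling covers a small, hypothetical subset of JSON Schema, where a real pipeline would use a full validator or contract-testing tool. Any divergence produces a non-zero exit code that a CI step can use to block the release.

```python
import sys

# Simplified slice of JSON Schema: maps a declared "type" to Python types.
TYPE_MAP = {"string": str, "integer": int, "boolean": bool, "number": (int, float)}


def find_drift(schema: dict, response: dict) -> list[str]:
    """Compare one observed response body with the object schema declared in the spec."""
    problems = []
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in response:
            problems.append(f"required field '{field}' missing from response")
    for field, value in response.items():
        if field not in props:
            problems.append(f"undocumented field '{field}' in response")
            continue
        declared = props[field].get("type")
        expected = TYPE_MAP.get(declared)
        # bool is a subclass of int in Python, so only accept it where the spec says boolean.
        wrong_type = expected is not None and (
            not isinstance(value, expected)
            or (isinstance(value, bool) and declared != "boolean")
        )
        if wrong_type:
            problems.append(f"field '{field}' is {type(value).__name__}, spec says {declared}")
        allowed = props[field].get("enum")
        if allowed is not None and value not in allowed:
            problems.append(f"field '{field}' value {value!r} not in documented enum {allowed}")
    return problems


def gate(schema: dict, samples: list[dict]) -> int:
    """Return a process exit code; a non-zero code should fail the release step."""
    problems = [p for sample in samples for p in find_drift(schema, sample)]
    for p in problems:
        print(f"SPEC DRIFT: {p}", file=sys.stderr)
    return 1 if problems else 0


# Hypothetical spec for an order resource, and a response captured in staging
# after the code drifted: status became a string and a new field appeared.
order_schema = {
    "required": ["id", "status"],
    "properties": {
        "id": {"type": "string"},
        "status": {"type": "integer", "enum": [1, 2, 3]},
    },
}
observed = [{"id": "A-17", "status": "shipped", "eta": "2026-06-09"}]

exit_code = gate(order_schema, observed)
print("release blocked" if exit_code else "release allowed")
# In a CI job, end with sys.exit(exit_code) so the pipeline step fails on drift.
```

Run against the drifted sample, the gate reports three problems (wrong type for `status`, a value outside the documented enum, and the undocumented `eta` field) and blocks the release.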

Giri is equally specific about what good specifications actually require for agent consumption, and her examples are concrete enough to apply immediately. A field called status that returns values 1, 2, and 3 is useless to an agent unless the spec also documents that 1 means New, 2 means In Progress, and 3 means Completed, because the agent has no way to infer that mapping from the field name or the values themselves. An endpoint that documents only that it returns a 400 error for bad input, without specifying which input combinations trigger that response, leaves an agent unable to prevalidate its requests or recover gracefully when the error occurs. A rate limit that appears only in external documentation and not in the spec itself is invisible to any agent that has not been specifically trained on that external documentation. These are not edge cases that organizations can deprioritize. They are the normal state of most enterprise API specifications, and they are a primary reason why agents fail in ways that produce poor results without surfacing a clear explanation of what went wrong.
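All three of Giri’s examples can be expressed inside the contract itself. The fragment below is a hypothetical OpenAPI 3.1 excerpt written as a Python dict so it can be exercised in code; the `/orders/{id}/cancel` path, the 500-character limit, and the 60-requests-per-minute limit are invented for illustration. It uses `oneOf` with `const` (valid JSON Schema 2020-12 inside OpenAPI 3.1) to attach a meaning to each status value, states which inputs produce a 400, and declares the 429 response and its `Retry-After` header in the spec rather than in an external guide.

```python
# Hypothetical OpenAPI 3.1 fragment, written as a Python dict so it can be checked in code.
ORDER_SPEC = {
    "components": {"schemas": {"OrderStatus": {
        "description": "Lifecycle state of an order.",
        # oneOf + const gives each raw value its own documented meaning.
        "oneOf": [
            {"const": 1, "title": "New", "description": "Created, not yet picked up by fulfilment."},
            {"const": 2, "title": "In Progress", "description": "Being fulfilled; can still be cancelled."},
            {"const": 3, "title": "Completed", "description": "Delivered; cancellation no longer allowed."},
        ],
    }}},
    "paths": {"/orders/{id}/cancel": {"post": {
        "responses": {
            # Which inputs trigger the 400, so an agent can prevalidate its request.
            "400": {"description": "Returned when the order status is 3 (Completed) "
                                   "or when 'reason' is missing or longer than 500 characters."},
            # The rate limit lives in the contract, not only in an external guide.
            "429": {
                "description": "Rate limit exceeded: at most 60 requests per minute per client.",
                "headers": {"Retry-After": {
                    "description": "Seconds to wait before retrying.",
                    "schema": {"type": "integer"},
                }},
            },
        },
    }}},
}


def status_meaning(spec: dict, value: int) -> str:
    """Resolve a raw status code to the meaning documented in the spec."""
    for variant in spec["components"]["schemas"]["OrderStatus"]["oneOf"]:
        if variant["const"] == value:
            return variant["title"]
    raise ValueError(f"status {value} is not documented in the spec")


print(status_meaning(ORDER_SPEC, 2))  # prints: In Progress
```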

A related gap shows up in the tooling API producers rely on. Standard linting tools check structure, including whether a description field exists, whether it meets a minimum length, and whether required parameters are present. That structural check is genuinely useful as a first line of defense, but it cannot evaluate whether a description is written in a way that helps an agent understand what the operation is actually for.

A field that passes every linting rule can still be useless to an agent if it describes what the endpoint does technically without explaining the intent a consumer would bring to it. Descriptions need to represent intent, including what somebody would use the operation for, what constraints apply, and how the agent should reason about the result. The gap between a description that passes a linting check and a description that an agent can act on reliably is the gap that most teams are not yet closing, and closing it requires evaluation mechanisms that go beyond pattern matching on the specification itself.
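The gap is easy to demonstrate. The sketch below is a hypothetical linter limited to the structural checks described above (presence, minimum length, a declared `required` flag on each parameter). Both descriptions pass it, although only the second tells an agent when to use the operation, what constraint applies, and what the operation does not do.

```python
def structural_lint(operation: dict, min_length: int = 20) -> list[str]:
    """Presence and length checks only, the kind a typical spec linter performs."""
    errors = []
    desc = operation.get("description", "")
    if not desc:
        errors.append("description missing")
    elif len(desc) < min_length:
        errors.append(f"description shorter than {min_length} characters")
    for param in operation.get("parameters", []):
        if "required" not in param:
            errors.append(f"parameter '{param.get('name')}' does not declare 'required'")
    return errors


# Hypothetical operations: one describes mechanics, one describes intent.
technical_only = {
    "description": "Executes a POST against the order state machine and persists the transition.",
    "parameters": [{"name": "id", "in": "path", "required": True}],
}
intent_bearing = {
    "description": ("Cancel a customer's order before it is fulfilled. Use when the customer "
                    "asks to cancel. Fails with 400 once the order is Completed; the refund is "
                    "issued separately by the refund operation."),
    "parameters": [{"name": "id", "in": "path", "required": True}],
}

# Both pass: the linter cannot tell which one an agent could actually act on.
print(structural_lint(technical_only), structural_lint(intent_bearing))  # prints: [] []
```

Separating the two takes a judgment about meaning rather than shape, for example testing whether an agent given only the description chooses the operation for the right tasks.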

Cross-service inconsistency breaks agent workflows


“It’s not just about connecting the dots for AI agents. It’s about making sure they understand what those dots mean.” — Rohan Gupta, Principal Product Manager at Harness

Rohan Gupta, Principal Product Manager at Harness, approaches the same problem from the perspective of an organization managing APIs across many teams and many services. His concern extends beyond the quality of any individual specification to the consistency of API design across the entire landscape. When agents operate in enterprise environments, they rarely interact with a single service in isolation. They move through workflows that cross multiple services, passing data and decisions from one system to another, and every inconsistency between how different teams have designed their APIs adds friction at the exact points where agents need to reason about how to connect things together.

Gupta’s view is that API specifications must be well-annotated and thoroughly documented so that agents can understand and execute the tasks they are given with accuracy and clarity, and that the design sloppiness which developers could historically compensate for becomes a structural blocker when the consumer is a machine reading a schema as its only source of truth. Missing descriptions, vague parameter names, inconsistent error handling patterns, and exposed implementation quirks that make no sense outside the context of the original development team all force agents into guesswork, and agents that guess tend to fail in ways that are difficult to reproduce and harder to debug than the original error would have been.
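One way to make cross-service consistency checkable is to agree on a single error shape and test every spec against it. The sketch below assumes a house rule that every service’s error schema include the members defined by RFC 9457 (Problem Details for HTTP APIs). The RFC itself makes those members optional, so requiring all five is a hypothetical convention, and the three service schemas are invented.

```python
# Members defined by RFC 9457, Problem Details for HTTP APIs.
PROBLEM_DETAILS_MEMBERS = {"type", "title", "status", "detail", "instance"}


def error_shape_report(error_schemas: dict[str, dict]) -> dict[str, set[str]]:
    """Map each service to the problem-details members its error schema lacks."""
    report = {}
    for service, schema in error_schemas.items():
        missing = PROBLEM_DETAILS_MEMBERS - set(schema.get("properties", {}))
        if missing:
            report[service] = missing
    return report


# Hypothetical error-response schemas pulled from three teams' specs.
error_schemas = {
    "orders": {"properties": {m: {} for m in PROBLEM_DETAILS_MEMBERS}},
    "billing": {"properties": {"errorCode": {}, "msg": {}}},
    "inventory": {"properties": {"type": {}, "title": {}, "status": {}}},
}

for service, missing in sorted(error_shape_report(error_schemas).items()):
    print(service, sorted(missing))
```

Run against these schemas, the report flags `billing` for all five members and `inventory` for `detail` and `instance`, which are exactly the points where an agent moving between services would have to guess how to read an error.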

The governance problem becomes harder at the cross-service level. If one service in an agent’s workflow provides ambiguous or outdated information, the agent can be misled into triggering actions on a completely separate system in ways that no individual team would have anticipated or authorized. Lifecycle management for APIs that agents consume cannot focus only on backward compatibility within a single service anymore. It has to account for the cross-platform consistency and auditability of changes across every service in every workflow that agents are permitted to traverse, which is a meaningful expansion of what API governance has historically required.
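Auditability across workflows starts with knowing which workflows a change can reach. As a minimal sketch, assuming a hypothetical registry that records the services each agent-permitted workflow traverses, a contract change to one service produces a record listing every workflow that needs review, not only the owning team’s direct consumers.

```python
# Hypothetical registry: services each agent-permitted workflow traverses, in order.
WORKFLOWS = {
    "return-item": ["orders", "inventory", "billing"],
    "update-address": ["customers", "orders"],
    "restock-alert": ["inventory", "notifications"],
}


def affected_workflows(changed_service: str, workflows: dict[str, list[str]]) -> list[str]:
    """Workflows that must be re-reviewed when one service's contract changes."""
    return sorted(name for name, services in workflows.items() if changed_service in services)


def change_record(changed_service: str, summary: str) -> dict:
    """An auditable record tying a contract change to every workflow it can reach."""
    return {
        "service": changed_service,
        "change": summary,
        "workflows_to_review": affected_workflows(changed_service, WORKFLOWS),
    }


print(change_record("inventory", "renamed field 'qty' to 'quantityOnHand'"))
```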

APIs that work for developers fail agents in production

“If you can’t explain why your agent made a decision, you’re not ready to go live.” — Mayank Bhola, Co-Founder and Head of Products at TestMu AI

Mayank Bhola, Co-Founder and Head of Products at TestMu AI, has a practitioner’s view of where the failure patterns actually surface when organizations move from building agentic systems in development to running them in production. The pattern he observes is consistent across teams and organizations. APIs that worked reliably for developer consumption fail at meaningful rates when agents start calling them, and the root cause is almost always constraints and rules that were documented in external guides or tribal knowledge rather than encoded explicitly in the specification itself, leaving agents with no mechanism for knowing those rules exist until they violate them and encounter a failure they cannot interpret.

The fix Bhola advocates for is not simply better documentation, because better documentation that lives outside the machine-readable contract is still invisible to agents. It requires rethinking how APIs surface information about their own behavior, making all constraints explicit within the spec itself and building API surfaces that are structured to reduce the cognitive overhead agents face when trying to understand what an endpoint does, when to call it, and what the consequences of calling it incorrectly might be. For organizations with established API landscapes, he recommends maintaining two parallel layers, with a legacy developer API preserving backward compatibility for existing integrations and an AI-optimized layer built on top of it that flattens nested data structures, makes all constraints and relationships explicit, and exposes capabilities at a level of abstraction that agents can act on without needing to combine multiple lower-level calls to accomplish a single business task.
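Bhola’s two-layer approach leaves the legacy API untouched and adds a translation layer above it. The sketch below is hypothetical throughout (the field names, the cryptic `st` code, the cancellation rule): it flattens a nested legacy response and republishes it with descriptive names, decoded values, and the applicable constraint stated in the payload instead of in a separate guide.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical legacy response: meaning is spread across nested objects and short codes.
legacy_order = {
    "id": "A-17",
    "meta": {"st": 2, "flags": {"canc": True}},
    "cust": {"addr": {"country": "DE"}},
}


def agent_view(order: dict) -> dict:
    """AI-facing layer: flat fields, descriptive names, constraints stated outright."""
    flat = flatten(order)
    status_names = {1: "New", 2: "In Progress", 3: "Completed"}
    return {
        "order_id": flat["id"],
        "status": status_names[flat["meta.st"]],
        "cancellable": flat["meta.flags.canc"],
        "cancellation_rule": "Only orders that are not Completed can be cancelled.",
        "shipping_country": flat["cust.addr.country"],
    }


print(agent_view(legacy_order))
```

Existing integrations keep calling the legacy shape; only agents are pointed at `agent_view`.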

Bhola believes the industry’s biggest blind spot is assuming that successful API consumption automatically leads to reliable agent behavior. In practice, many failures emerge after the API call succeeds. The agent selects the wrong tool, misinterprets context, follows an invalid reasoning path, or takes an action that technically satisfies the request but violates business intent. This is why validation infrastructure must be designed before deployment rather than after incidents occur.

Testing agentic systems requires teams to evaluate decision quality, tool selection accuracy, reasoning traceability, and behavioral consistency under changing conditions. The goal Bhola highlights is not just to verify outputs, but to understand whether the agent arrived at those outputs for the right reasons.
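A harness for these properties can start small. The sketch below is an assumption about how one might score an agent, not a description of TestMu AI’s tooling: it runs each labelled task several times, measures tool-selection accuracy and run-to-run consistency, and counts decisions that came back without a recorded rationale. The stub agent and tool names are hypothetical.

```python
# Hypothetical labelled cases: the task and the tool a correct agent should choose.
CASES = [
    ("customer wants money back for order 1042", "refundOrder"),
    ("customer asks to stop order 2001 before it ships", "cancelOrder"),
    ("where is order 3300", "getOrderStatus"),
]


def evaluate(agent, cases, runs: int = 5) -> dict:
    """Score tool-selection accuracy, consistency, and traceability.

    `agent(task)` is assumed to return (tool_name, trace), where trace is the
    agent's recorded reasoning; an empty trace counts as an untraceable decision.
    """
    correct = untraceable = consistent = 0
    for task, expected in cases:
        choices = []
        for _ in range(runs):
            tool, trace = agent(task)
            choices.append(tool)
            correct += tool == expected
            untraceable += not trace
        consistent += len(set(choices)) == 1  # same tool on every run
    return {
        "tool_selection_accuracy": correct / (len(cases) * runs),
        "consistency": consistent / len(cases),
        "untraceable_decisions": untraceable,
    }


def keyword_agent(task: str):
    """Stand-in agent for the harness; a real one would call a model."""
    if "money back" in task:
        return "refundOrder", "task asks for a refund"
    if "stop" in task:
        return "cancelOrder", "task asks to stop the order"
    return "getOrderStatus", ""  # right tool, but no recorded reason


print(evaluate(keyword_agent, CASES))
```

The stub picks the right tool every time yet still produces five untraceable decisions, which illustrates Bhola’s point that correct outputs alone do not show the agent reached them for the right reasons.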

Too many endpoints, not enough intent

The structural problem underneath all of this is that most enterprise APIs are too fine-grained for agents to use reliably, even when every individual specification is perfectly written and maintained. As Wilde frames it, accomplishing anything meaningful often requires combining many different endpoints in a specific order that encodes implicit business logic which is obvious to a developer who understands the domain but entirely opaque to an agent that has only the API contracts to work from.

When doing something meaningful requires chaining thirty endpoints in the right sequence, agents get confused about how to combine them, improvise in ways that produce incorrect results, or make errors partway through the sequence that cascade into larger failures that are difficult to unwind. Wilde’s position is that AI readiness requires reducing the number of endpoints agents are exposed to and aligning the remaining APIs more closely with business intent, so that a workflow that needs to accomplish a task ideally requires only a single tool call rather than orchestrating many lower-level calls in the correct order. The solution he and his colleagues at Jentic are working toward is a workflow layer that sits above the existing fine-grained API landscape, exposing business-level capabilities designed for runtime discovery and agent consumption rather than for developer integration at build time.
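The mechanics of a business-level capability can be sketched without reference to any particular product. The example below is not Jentic’s workflow layer; it is a generic saga-style pattern, assuming each low-level call has a compensating call. The agent sees one tool, `return_item`, and a failure partway through the sequence is unwound inside the capability instead of cascading. Endpoint names are hypothetical and the calls are no-op stand-ins.

```python
class StepFailed(Exception):
    """Raised by a step when its underlying API call fails."""


def run_workflow(steps):
    """Run (name, action, compensate) steps in order.

    If a step raises StepFailed, undo the completed steps in reverse order and
    report failure, so a mid-sequence error does not leave partial side effects.
    """
    completed, log = [], []
    for name, action, compensate in steps:
        try:
            action()
        except StepFailed as exc:
            log.append(f"failed {name}: {exc}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"undid {done_name}")
            return False, log
        completed.append((name, compensate))
        log.append(f"ok {name}")
    return True, log


def return_item(order_id: str, label_service_up: bool = True):
    """One agent-facing capability in place of three low-level calls.

    The lambdas are no-op stand-ins; real steps would call the (hypothetical)
    endpoints with order_id and need compensations that are safe to retry.
    """
    def create_label():
        if not label_service_up:
            raise StepFailed(f"label service unavailable for order {order_id}")

    steps = [
        ("reserveInventory", lambda: None, lambda: None),
        ("issueRefund", lambda: None, lambda: None),
        ("createReturnLabel", create_label, lambda: None),
    ]
    return run_workflow(steps)


ok, log = return_item("A-17", label_service_up=False)
print(ok)
print(log)
```

With the label service down, the capability reports failure and undoes the refund and the inventory reservation in reverse order. Arazzo, the OpenAPI Initiative’s specification for describing sequences of API calls, is one way to publish such workflows in machine-readable form.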

This pattern already shows up in enterprise partner integrations. Organizations with complex APIs that they expose to partners face a specific version of the fine-grained problem, where a partner integrating with a large API surface has to understand the full landscape even when they only need a small part of it, and the engineering effort of that integration is significant enough to slow or block adoption entirely.

The solution Wilde describes is building purpose-built workflows for specific partners, so that a partner only needs to understand the workflows that were designed for their particular use cases rather than navigating the full API surface independently. The underlying APIs do not change. What changes is the layer of business-level capabilities that sits above them, designed for a specific consumer’s needs rather than for maximum flexibility across all possible consumers. The benefit for agents is the same as the benefit for partners, with fewer options to navigate, clearer intent at each step, and a much lower chance of combining things incorrectly.
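Scoping by consumer can be as simple as an allowlist over a workflow catalog. A minimal sketch with hypothetical workflow and consumer names: each consumer discovers only the workflows designed for it, while the underlying APIs stay unchanged.

```python
# Hypothetical workflow catalog built on top of unchanged underlying APIs.
CAPABILITY_WORKFLOWS = {
    "return-item": "Accept a return, refund the customer, and issue a shipping label.",
    "check-order-status": "Report where an order is and when it should arrive.",
    "bulk-price-update": "Change list prices for a set of SKUs.",
}

# Which workflows each consumer was designed for; everything else stays hidden.
CONSUMER_SCOPES = {
    "logistics-partner": {"check-order-status"},
    "support-agent": {"return-item", "check-order-status"},
}


def visible_catalog(consumer: str) -> dict[str, str]:
    """The only capabilities a given partner or agent needs to discover."""
    allowed = CONSUMER_SCOPES.get(consumer, set())
    return {name: desc for name, desc in CAPABILITY_WORKFLOWS.items() if name in allowed}


print(sorted(visible_catalog("support-agent")))  # prints: ['check-order-status', 'return-item']
```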

Wilde makes explicit why this approach is worth pursuing beyond its value for agents. Any developer who currently has to call fifteen underlying APIs to accomplish a task that should conceptually be a single operation would also benefit from a better-designed capability API on top of those services. The investment in agent readiness is an investment in the overall quality and usability of the API landscape, and the returns compound across every consumer of those APIs, whether that consumer is a human developer or an autonomous agent running at runtime.

The API layer is where the next two years are decided

Wilde’s view of API lifecycle management is the right closing frame for this issue. Agents do not consume APIs the way developers do. They discover capabilities at runtime, decide whether a tool looks useful in the moment, and need machine-readable signals about what the API does, what constraints apply, what side effects it may trigger, and whether it is safe to keep using.

That changes how organizations need to think about versioning, deprecation, and governance. The old model assumes that a developer reads the documentation, notices a migration notice, and updates an integration on a schedule the team can manage. Agent-facing APIs need more of that information to be visible at runtime. If an API is being deprecated, if a capability is nearing sunset, or if a safer replacement exists, the consuming system needs a way to discover that signal before it makes a decision.
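HTTP already has standard signals for part of this. The `Sunset` header (RFC 8594) carries the date a resource is expected to stop responding, the `Deprecation` header (RFC 9745) marks a resource as deprecated, and a `Link` header with the `successor-version` relation (RFC 5829) can point to a replacement. The sketch below assumes those header formats and applies a hypothetical stop/switch/warn policy that an agent runtime could check before it keeps calling an API; the URL is a placeholder.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def lifecycle_signal(headers: dict[str, str], now: datetime) -> str:
    """Decide whether an agent should keep calling an API, from its response headers.

    Assumes Sunset as an HTTP-date (RFC 8594) and Deprecation as '@<unix seconds>'
    (RFC 9745). The stop/switch/warn policy itself is hypothetical.
    """
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive

    if "sunset" in h:
        sunset_at = parsedate_to_datetime(h["sunset"])
        if sunset_at.tzinfo is None:  # '-0000' dates parse as naive; treat them as UTC
            sunset_at = sunset_at.replace(tzinfo=timezone.utc)
        if sunset_at <= now:
            return "stop: sunset date has passed"

    deprecation = h.get("deprecation", "")
    if deprecation.startswith("@"):
        deprecated_at = datetime.fromtimestamp(int(deprecation[1:]), tz=timezone.utc)
        if deprecated_at <= now:
            if 'rel="successor-version"' in h.get("link", ""):
                return "switch: deprecated, successor advertised"
            return "warn: deprecated, no successor advertised"
    return "ok"


now = datetime(2026, 6, 2, tzinfo=timezone.utc)
headers = {
    "Deprecation": "@1767225600",  # 2026-01-01T00:00:00Z
    "Sunset": "Thu, 31 Dec 2026 23:59:59 GMT",
    "Link": '<https://api.example.com/v2/orders>; rel="successor-version"',
}
print(lifecycle_signal(headers, now))  # prints: switch: deprecated, successor advertised
```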

This is where API lifecycle management needs to move, and organizations that invest in the governance structures to support it now will be better positioned than those that wait for the pressure to become unavoidable. The agents are already in production, and the limiting factors are no longer model capability alone but integration, security, and operational scalability, which means the API layer is where the most consequential infrastructure work of the next two years will happen for most engineering organizations.

The design assumption that breaks enterprise APIs for agents, namely that the consumer has context, judgment, and the ability to fill in gaps, is present in every other infrastructure layer that agents call at runtime. Wilde’s framing brings the issue back to a practical rule: agents should not be used to compensate for infrastructure that fails to express intent, constraints, lifecycle state, or safe operating boundaries. The teams that build on infrastructure designed to make those signals explicit will ship more reliable agentic systems than those still working around infrastructure that was never designed for this kind of consumer.

Thank you for reading this special issue of Deep Engineering on why the API layer has become the most consequential infrastructure problem in enterprise AI.

We’ll be back on Thursday with more expert-led content, and next month, on the first Tuesday of July, with another special issue.

Keep building,
Saqib Jan
Editor-in-Chief, Deep Engineering

