<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Packt Deep Engineering: Interviews]]></title><description><![CDATA[In-depth conversations with Packt authors and engineering leaders, offering firsthand insight into how modern software systems are designed, scaled, and maintained.]]></description><link>https://deepengineering.substack.com/s/interviews</link><image><url>https://substackcdn.com/image/fetch/$s_!H5BJ!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736bc1ee-d689-497e-83a8-7d9bf9022eb9_600x600.png</url><title>Packt Deep Engineering: Interviews</title><link>https://deepengineering.substack.com/s/interviews</link></image><generator>Substack</generator><lastBuildDate>Sun, 03 May 2026 03:39:12 GMT</lastBuildDate><atom:link href="https://deepengineering.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Packt]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[deepengineering@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[deepengineering@substack.com]]></itunes:email><itunes:name><![CDATA[Packt]]></itunes:name></itunes:owner><itunes:author><![CDATA[Packt]]></itunes:author><googleplay:owner><![CDATA[deepengineering@substack.com]]></googleplay:owner><googleplay:email><![CDATA[deepengineering@substack.com]]></googleplay:email><googleplay:author><![CDATA[Packt]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Clean C++ Code, and the Hidden Cost of Complexity with Sándor Dargó]]></title><description><![CDATA[On C++26, cognitive load, and the hidden price of clever code]]></description><link>https://deepengineering.substack.com/p/clean-c-code-and-the-hidden-cost</link><guid isPermaLink="false">https://deepengineering.substack.com/p/clean-c-code-and-the-hidden-cost</guid><dc:creator><![CDATA[Saqib Jan]]></dc:creator><pubDate>Wed, 22 Apr 2026 11:30:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a04535d6-a8d4-4ff0-bc53-bf31cb699f9a_1920x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://fr.linkedin.com/in/sandor-dargo">S&#225;ndor Darg&#243;</a> has spent years making large C++ systems easier to maintain, safer to change, and cheaper to run. At Spotify he works on codebases where performance, binary size, and clarity have to coexist, and where the cost of getting any of those trade-offs wrong shows up in production. 
He writes daily about C++ on his blog, speaks at conferences, and sat down with Deep Engineering Live to talk about C++26, what it takes to write code that survives real-world conditions, and what the shift to AI-assisted development is doing to the way engineers work.</p><p><em>Watch the full conversation below.</em></p><div id="youtube2-vsdeOS8snN0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;vsdeOS8snN0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/vsdeOS8snN0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p></p><p></p><blockquote><p><em><strong>A note on format:</strong> the transcript below has been lightly edited for clarity and readability. During the live session on Deep Engineering Live, questions were displayed on screen for Sandor to read, and we also brought audience members on stage to ask their questions directly.</em></p></blockquote><p></p><p><em><strong>Q. There is a thread running through your talks on clean code, binary size, undefined behavior, and now C++26. What problem are you actually trying to solve?</strong></em></p><p>I try to reduce complexity in real-world systems. After all, I think that&#8217;s our main job as software engineers: to turn the complex into the simple. If you think about clean code, it clearly reduces the cognitive load. If you think about binary size, it might reduce operational cost, depending on your situation. It might even lead to more users, though I&#8217;m not sure I should delve into that one. Undefined behavior clearly reduces hidden risk. New standards like C++23 and C++26 reduce boilerplate and enable safe and more readable abstractions.</p><p>I think all of these topics connect. They make large C++ systems more maintainable and more evolvable. And most of my talks start from problems I actually encountered. I try to solve my own problems, but they are not unique. I just try to share what I learn on the go.</p><p><em><strong>Q. From the vantage point of a staff engineer responsible for a large codebase, which two or three features in C++26 do you expect to most change everyday design decisions?</strong></em></p><p>Everyone is talking about contracts and reflection. That&#8217;s going to change everything. I&#8217;m not sure about the time scale though. If you look at C++23 support right now, even that is not complete yet, especially if you look at the differences across compilers. You go on cppreference, check what&#8217;s implemented on which compiler, and we are simply not there yet.</p><p>Given that time scale, I&#8217;m not sure about the answer. But contracts and reflection are the big ones. I don&#8217;t think I&#8217;ll be able to use those in a production environment in the next one or two years. I hope I&#8217;ll be wrong.</p><p><em><strong>Q. If you were reviewing an architectural proposal that leaned heavily on these features, what are the first red flag questions you would ask?</strong></em></p><p>It depends on the environment. If we are in a widely-used production environment and these are very new features, I&#8217;d probably ask if those approaches are actually proven to work, and how maintainable they are. For the time being, we simply lack the experience with these new features. 
We are still trying to discover how to use them properly.</p><p>Being among the first adopters is sometimes good. Sometimes it&#8217;s better to be in the second line. It really depends on the environment. But I would look for already-proven, maintainable usage. This is pretty much what happens with almost all new major standard versions. There are people coming and saying, can we use modules? And you often end up saying, certainly not yet. There&#8217;s no cross-platform support. I can imagine that will be the case for reflection and contracts in the first few years.</p><p><em><strong>Q. What does a responsible adoption plan look like for a big feature like contracts or reflection?</strong></em></p><p>It really depends on your environment. If you target one platform and you know which compiler you&#8217;re using, and the feature was shipped as ready, then you can go ahead and try. But if you have to support different platforms or compile with different compilers in the same pipeline, you first have to check if all of them are supporting that feature properly.</p><p>In some of my earlier environments, I simply couldn&#8217;t start using even C++20 for a long time because not every feature we needed was supported on all the different compilers. In other teams, we said, okay, we use this compiler, it&#8217;s shipped, let&#8217;s go for it.</p><p>What you have to make sure is that even if for some reason you have to fall back to the previous compiler version, you don&#8217;t have to change your code. It would be quite a pity to move to a new version, start using concepts from C++20, and then in two weeks they say there&#8217;s a problem, we must go back. And then you realize it&#8217;s not just updating the compiler version, you actually have to change the code. So check that you have a safe fallback plan.</p><p><em><strong>Q. Your talk &#8220;Clean Code, Horrible Performance&#8221; is a deliberately provocative title. What is the actual answer?</strong></em></p><p>The title was a question, a provocative one. Someone very active in the community told me I shouldn&#8217;t have said it because it was misleading. Maybe it was. The whole point was to frame it as a question. My answer is no. Clean code does not imply horrible performance.</p><p>We must admit that in some constrained environments and on hot paths, you must optimize for performance and forget about readability. Otherwise your software just won&#8217;t meet its nonfunctional requirements. The most well-known example is probably how the square root function was optimized for Quake III. But in my experience, even in environments with very high throughput, readability and maintainability gained through clean code were always more important than optimized performance. Wherever I&#8217;ve worked, network latency and database read and write times dominated.</p><p>At the same time, I&#8217;ve seen people optimizing for heap allocations, saying we shouldn&#8217;t allocate for a string there, while at the same time they were making network requests in a loop. That just doesn&#8217;t make much sense. Amdahl&#8217;s Law says the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that part is actually used. Slowness is also relative. If your code takes a long time to execute due to network latency, then relatively speaking, the heap allocation is not so slow anymore. I&#8217;m not saying you should put everything on the heap. 
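</p><p><em>To put rough numbers on the Amdahl&#8217;s Law point, here is a small illustrative calculation (shown in Python for brevity; the percentages are made up for the example, not measurements from any real system):</em></p><pre><code># Amdahl's Law: the overall speedup is limited by the fraction of time
# the optimized part actually accounts for.
def amdahl_speedup(fraction_optimized, local_speedup):
    return 1 / ((1 - fraction_optimized) + fraction_optimized / local_speedup)

# If a heap allocation is 2% of request time and you make it 10x faster,
# the whole request gets less than 2% faster.
print(round(amdahl_speedup(0.02, 10), 3))   # 1.018

# If network I/O is 80% of request time and you halve it,
# that is where the real win is.
print(round(amdahl_speedup(0.80, 2), 3))    # 1.667
</code></pre><p>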
I&#8217;m saying don&#8217;t worry about things that don&#8217;t really matter in your environment.</p><p><em><strong>Q. If you had to write a one-page policy for a large C++ codebase covering the trade-offs between readability, performance, and binary size, what would be on that page?</strong></em></p><p>If it&#8217;s a one-pager and you don&#8217;t know the exact environment or the nonfunctional requirements, it would probably be language-agnostic. The number one point would be: default to readability. You still read code more often than you write it. And not to mention agents, but they also prefer simplicity.</p><p>Second: if you have to optimize, measure first. Don&#8217;t start optimizing for performance before you prove it is actually a problem. You don&#8217;t optimize just because you can. You optimize if you need to. Otherwise, you might just waste time, or worse, you think you&#8217;re optimizing the necessary while you don&#8217;t touch the real problem. Measure first, optimize after.</p><p>Third: optimize only the hot path. You&#8217;ll find that with the measurements. Keep the hot path isolated and well documented. That will help you later.</p><p>And last but definitely not least: make the trade-offs very explicit. In code reviews, but also in the code itself. Leave comments. Because if you sacrifice clarity, document why. Otherwise someone later will come in and think, this doesn&#8217;t make any sense, let me make it cleaner. They are unaware of why certain choices were made. I&#8217;ve been there. I came in thinking something didn&#8217;t make sense, and by the time I realized it slightly changed the binary size in a way that mattered, some pull requests were already merged. Trade-offs there will be. But make them conscious and share the knowledge.</p><p>And this is probably even more important in the new world of agentic coding. Agents will not know the context that teams share with each other. You have to have things written down, and preferably in the codebase, because that&#8217;s what they can read.</p><p><em><strong>Q. What are the silent killers of binary size that creep into C++ systems over months or years?</strong></em></p><p>That&#8217;s a really broad topic. I wrote a series of articles on this and had a workshop at CppCon on the effects of programming styles on binary size. I made it very explicit that those articles and the workshop were not for embedded engineers, because they operate on a different scale and care about different orders of magnitude.</p><p>There are environments where every single byte matters. I never worked in such an environment. But there&#8217;s the other end of the spectrum where you might have to think about, bear with me, hundreds of megabytes. I know many of you might laugh, but I&#8217;m not kidding. When I first heard about binary size as a problem, it was due to a common library that many services shared. We hosted maybe two dozen services on that server. A few changed almost every week, others barely changed in a year. We had to keep ten or twelve different versions of the same library, and that library was over 100 megabytes. Just by making sure all services got the new version every few weeks, and we didn&#8217;t have to store more than two or three versions of that big library, we solved the binary size issue. That was seven or eight years ago.</p><p>In terms of programming patterns, the most overlooked area is unoptimized compiler and linker settings. You might gain the most from there. 
We tried many different code-level changes and it was satisfying, shaving off a few kilobytes here and there. Then we changed some settings and half a megabyte was gone. That&#8217;s often completely overlooked.</p><p>Template overuse is another one. We had a framework where adding a new object with no logic whatsoever, just the boilerplate, already added around 20 kilobytes because of the heavy templating. We moved away from that. Unnecessary use of std::function can also be problematic. We maintain our own backport of move-only function from C++23 specifically for that reason. And exceptions, while not always a silent killer, can be significant, though there&#8217;s interesting work being done to reduce their footprint considerably.</p><p><em><strong>Q. How do you move code review conversations from taste to shared criteria?</strong></em></p><p>Arguing over taste is never a good investment of time. Not just in engineering. You need to move from taste to agreements. And to do that, you have to communicate, discuss, and eventually decide. And it might not be very popular what I&#8217;m going to say, but your workplace is not a democracy. Certain people have more to say based on their experience and their responsibility.</p><p>But what&#8217;s important is that you introduce some shared decision framework. Track binary size in the CI pipeline so you see the effects at the end of every build. Track performance metrics. Once you have numbers, the conversation is not about taste anymore.</p><p>For the things you cannot quantify, like coding style, coding dojos are genuinely useful. You practice together, explore different approaches without delivery pressure, and over time you move from phrases like &#8220;I like this more&#8221; to &#8220;that&#8217;s actually the style we agreed on.&#8221; Discuss, educate together, share what you learn, and measure what matters to you.</p><p><em><strong>Q. What are the most common mistakes when working with time and clocks in C++?</strong></em></p><p>The most common mistake is choosing the wrong clock. Maybe you don&#8217;t fully understand the different guarantees each clock offers. For example, instead of using steady_clock to measure a short interval for a retry logic, you use system_clock. And then later, due to some bug, you figure out that system_clock is not a monotonic clock. It can jump backwards due to NTP adjustments or manual clock changes.</p><p>Another problem is unsafe conversions and cross-system time. Time is relatively easy when it&#8217;s in one system. But when you have different systems and different platforms, you can end up with clocks using different epochs, different precisions, or different time sources. When you try to compare or convert times from different systems, be very cautious. Test with all the different platforms. Debug and see what&#8217;s going on.</p><p>If you still use C-style APIs, things go wrong easily because they don&#8217;t give you the type safety that chrono durations give you. You might have to use C-style APIs, but try to isolate those parts and do the conversions at the boundaries. Within what&#8217;s within your control, rely on modern C++ time representations. Use chrono wherever you can.</p><p>For APIs specifically: keep your APIs abstract enough so that they are testable. Don&#8217;t rely on the system clock directly. Inject a time provider so you can test different assumptions about your code.</p><p><em><strong>Q. You run a daily C++ quiz and have been blogging for years. 
What gaps have you noticed consistently, even in experienced C++ developers?</strong></em></p><p>There are two main ones. The first is what I&#8217;d call a depth gap. C++ is a massive language. The standard is about 2,000 pages. Even if you are an expert in one area, it doesn&#8217;t mean you master all the others. You might be a master of template metaprogramming but know nothing about multithreaded programming. Best practices can be quite different across industries and across different kinds of C++ environments.</p><p>We should be humble enough to acknowledge our boundaries and say &#8220;I don&#8217;t know.&#8221; In the beginning of your career, it&#8217;s natural to do that. And with decades of experience, you&#8217;re confident enough to say it again. But in between, it&#8217;s more difficult. The sooner you can make that shift, the better. I once said in an interview that I didn&#8217;t know anything about a particular topic and didn&#8217;t want to guess. They said, well, we don&#8217;t really use that either, let&#8217;s skip it. I got hired at the end.</p><p>The second gap is fundamentals. I&#8217;ve seen many senior-level engineers who are really good at architectural questions, articulate and thoughtful. But they had difficulty writing some very simple algorithms under pressure. Not hard LeetCode problems. Simple ones. I&#8217;m not a fan of LeetCode-style interviews, but you do have to be able to solve problems live with someone watching. That&#8217;s something you won&#8217;t learn on the job. You have to practice on your own.</p><p><em><strong>Q. What has the shift to AI-assisted development changed for senior and staff engineers?</strong></em></p><p>There are two parallel shifts. One is the language itself evolving. With C++23 and C++26 we need a bit less template metaprogramming wizardry now that we have concepts. And safety has become a central topic in a way it wasn&#8217;t before. You can see that in the kinds of proposals the committee is now accepting.</p><p>The other shift is about how we work. As a developer, you&#8217;re expected to be professional in agentic coding. To be an AI-first developer, some would say. It&#8217;s as if you&#8217;ve become a team lead of agents. You keep giving tasks to them, reviewing the code, tuning your instructions.</p><p><em><strong>Q. Viktor Nikolov joined us from the audience to ask a follow-up question on this. He wanted to hear more about the AI shift and what it means for engineers day to day.</strong></em></p><p>I think as a developer in this new world, you have to learn to like your job again. Or still.</p><p>Before, you&#8217;d get your tasks at the beginning of a sprint or a week, and then you&#8217;d go back and start to explore the requirements, explore the code. It took some time. You slowly built up the models in your head and thought about the different kinds of solutions. You might even enter the so-called flow state, which requires focus and a bit longer time. And I think we&#8217;ve kind of lost this over the last few months.</p><p>We became, often, just prompters. Many of us complained even before that we are living in a world of constant context switching. But it just became even worse. Because at the same time, most probably, you will try to prompt different agents with different problems at the same time, and you keep jumping from one window to another. Maybe from one meeting to another, because others are also moving faster. 
At least they think they move faster.</p><p>Basically, you&#8217;ve lost everything, or almost everything, that you liked about your job. But we have to adapt somehow to this new situation. And mentally, it&#8217;s very difficult.</p><p>I read something very interesting recently on The Pragmatic Engineer, which is a great Substack if you haven&#8217;t come across it. They quoted research saying that in the beginning you ship more code, because it became so much easier. But you don&#8217;t just ship more code. You ship worse code. And that gain in speed is vanishing after a few months because you start accumulating technical debt at the same time. What first seemed faster becomes not faster, but the debt stays.</p><p>I also try different ways of working with agents that keep me happy but also try to speed me up, approaches that don&#8217;t remove what I like in this job but actually help. It&#8217;s difficult. And I&#8217;m happy to continue this conversation with anyone who wants to reach out.</p><p><em><strong>Q. What would you tell engineers starting to build with C++ today, whether in a new codebase or an existing one?</strong></em></p><p>The most common thing I see is defaulting to shared pointer when unique pointer is the right choice. People complain that smart pointers are slow, but they are defaulting to shared pointer instead of unique pointer, which is fast and cheap. Often you don&#8217;t really have to draw a line, you just have to know what to pick and not default to the easy option.</p><p>More broadly: performance is not for the sake of performance. You don&#8217;t write faster code because you can. You write faster code because you need to. If you don&#8217;t need it, default to readability and default to safety. And if you work in an environment where network latency or database latency dominates, you will not care so much about the cost of a heap allocation. Optimize for your actual environment, not your assumptions about it.</p><p>And document why you made the choices you made. Not what the code does, but why it is structured the way it is. That&#8217;s what makes a codebase survivable over time. Especially now that agents are reading it too.</p><p><em><strong>Q. C++ versus Rust. Some engineers in the audience asked about this. Are C++ jobs being taken over by Rust?</strong></em></p><p>I&#8217;m not sure how many jobs are actually being taken over. I had C++ colleagues who fell in love with Rust and moved to other companies just to use it. I don&#8217;t necessarily see that as a huge threat. C++ is not going away anytime soon, simply because it&#8217;s an old and evolving language and we have plenty of systems out there that you just won&#8217;t replace. Even if C++ is not strictly needed for a domain, the cost and risk of replacing it is too high. There will be COBOL jobs for decades for the same reason.</p><p>What I do think is that the overall pie of engineering jobs is growing. Rust is taking a bigger slice, but the pie itself is bigger. And moving between languages is becoming easier because agents can help you understand an unfamiliar codebase quickly. That lowers the switching cost over time.</p><p>C++ is already evolving. We are talking about C++32. The language is not standing still.</p><div><hr></div><p><em><a href="https://fr.linkedin.com/in/sandor-dargo">Sandor Dargo</a> is a senior software engineer at Spotify, the author of a daily C++ quiz and blog at sandordargo.com, and a regular speaker at C++ conferences. 
This conversation was recorded live on Deep Engineering.</em></p>]]></content:encoded></item><item><title><![CDATA[Knowledge Graphs, GraphRAG, and Real-Time AI in Production with David Knickerbocker]]></title><description><![CDATA[Intentional engineering, living knowledge graphs, and why similarity is not the same as truth]]></description><link>https://deepengineering.substack.com/p/knowledge-graphs-graphrag-and-real</link><guid isPermaLink="false">https://deepengineering.substack.com/p/knowledge-graphs-graphrag-and-real</guid><dc:creator><![CDATA[Saqib Jan]]></dc:creator><pubDate>Wed, 15 Apr 2026 12:30:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/93b09a18-df5c-49af-93ca-6f4d45a26bdc_1920x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This conversation with <a href="https://www.linkedin.com/in/dkjapan">David Knickerbocker</a> keeps returning to a single conviction: the best engineering starts with intentional problem definition, and most AI failures happen when teams rush to use a tool before understanding what they are actually trying to build.</p><p>Knickerbocker has spent his career across cybersecurity, data operations at Intel, McAfee&#8217;s AI research team, and healthcare IT, before founding Bert Intelligence and Grooveseeker. He is the author of <a href="https://www.packtpub.com/en-us/product/network-science-with-python-9781801075213">Network Science with Python</a>, published by Packt, which argued years before GraphRAG became mainstream that graphs and natural language processing belong together as a single discipline. He has been writing code since he was six years old and spent twenty-eight years living in Okinawa, Japan before returning to the United States.</p><p>The conversation covers what it actually takes to build a knowledge graph system with data fresh up to a minute old, why his Verdant Eye system treats knowledge as claims rather than facts, how graph anchoring reduces hallucination space in ways that similarity-based retrieval cannot, and why deliberately forgetting old data is not a failure mode but a design principle. He also walks through his purpose-built testing philosophy, his three production GraphRAG systems, and what working with open source intelligence in adversarial environments teaches you about AI that clean-dataset engineers never have to confront.</p><p>You can watch the full conversation below or read on for the complete Q&amp;A transcript.</p><div id="youtube2-h-tmV8x5Wsc" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;h-tmV8x5Wsc&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/h-tmV8x5Wsc?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><em><strong>1. Most AI systems treat knowledge as a static snapshot. You have built your Verdant Eye system around the idea that knowledge should update continuously. What does it actually take to engineer a knowledge graph that stays fresh, and where does the real difficulty lie?</strong></em></p><p><strong>David Knickerbocker:</strong> For me it is not so much about what breaks. It is about how do I actually do this, and how do I engineer it. Everything in data science and engineering really starts with problem definition. 
You start with what you are trying to do. If you want to build a world AI and be able to answer questions about things that happened a minute ago, then that is your problem statement. And so then you think about how to get that data into the database so that it is there and it is fresh. But then you also have to get AI to be able to use that data, so there are kind of two sides to this coin.</p><p>It really comes back to intentional engineering. The AI industry feels very shiny and very new, but there is a lot of old school discipline that is still extremely useful to me. I am a very intentional designer, developer, and engineer. You start with the idea, you go through the ideation, from ideation you create your spec, from the spec you do your project management, you assign tasks and do the work. It feels like vanilla old school engineering to me.</p><p>The approaches I use are KISS, keep it simple, and YAGNI, you are not gonna need it. When you are a minimalist engineer and you think in MVPs, you are building the minimally viable product you are aiming for. When you build the minimal thing it is much easier to test and validate that it works than if you throw a whole bunch of spaghetti at the wall and see what happens. Nothing really breaks on my side because I am an old school engineer and I am intentional with everything.</p><div><hr></div><p><em><strong>2. Freshness and accuracy often pull in opposite directions. Something that just arrived may not yet be trustworthy, while something stale may still be reliable. How do you design a system that balances recency and trust, and what signals do you use to make that call?</strong></em></p><p><strong>David Knickerbocker:</strong> In the world of open source intelligence, it has less to do with right and wrong. It has less to do with facts. What I am looking for with open source intelligence is really claims of what is going on in the world. You can have two different groups that are in opposition from each other. One group will say this is the truth, another group will say this is the truth, and they will be in direct conflict with each other. I do not make that decision, and I do not allow my AI to make the decision about what is true or false either. I am more interested in what people are claiming is going on in the world.</p><p>Because if you take what is claimed and you cluster it, you can see that this thing is happening over here and this bad thing has happened over there. I think in terms of ribbons. I come from natural language processing, so I think about clusters not as baskets or clumps of stuff but more like ribbons. You have a whole bunch of information and this top ribbon might be this bad thing happened. The next ribbon might be this event is happening at the library. The next ribbon might be a punk rock show is happening at this nightclub.</p><p>The trueness and the falseness is a much later thing than the awareness of what is being said. That is how I think about it.</p><div><hr></div><p><em><strong>3. What does real awareness mean in practice at the data ingestion layer? And how is your system different from just running an agent with a search tool?</strong></em></p><p><strong>David Knickerbocker:</strong> If I use my GraphRAG and I say what has happened in Portland in the last hour, or the last five minutes, or the last minute, it will be able to answer that question. And if nothing has been reported in the last minute then there is just nothing to report. 
An empty dataset is better than a hallucination.</p><p>My systems are constantly getting data. When I was building my GraphRAG system, one of the questions I use for calibration is just what is the latest information, because I just want to see that the latest information is coming through. That prompt is very reliable. The answer that comes back is anywhere from a few seconds old to maybe a minute and a half old. The Internet moves at the speed the Internet moves.</p><p>I liken it to the difference between a snapshot and a movie. If you use a tool to do a search and find out something, you are getting a snapshot of time. My systems capture the heartbeat of the Internet themselves and they are always listening. It is much more like a movie compared to a photograph. When you are talking to companies that need urgent information and you can run a query and it comes back thirty seconds old with something that was just seen on the Internet, that looks really different from spinning up agents and using tools to hit a search engine. A search engine will give you a few answers. My systems are always listening and always capturing. I can rewind the Internet itself and play it back forward again.</p><div><hr></div><p><em><strong>4. Where do you see most engineering teams underestimate the cost involved in building graph systems? And what is the failure mode you keep seeing repeated?</strong></em></p><p><strong>David Knickerbocker:</strong> I remember research I did back in 2012 and there was a famous finding that most tech problems are actually people problems. They are not tech problems. That comes down to communication, interpersonal skills, things like that. But getting to the technical side of things, one thing that used to drive me nuts was the rush to use graph databases before they were even understood.</p><p>This bothered me so much in 2020 and 2021 that I actually wrote a book called Network Science with Python. I wrote it because I was annoyed watching teams spend months building graph databases and then not really getting further than populating the graph database. Things are supposed to start when you populate the graph database. That is not the end.</p><p>At that time I was using graphs at Intel for data flow mapping, source code analysis of legacy code, mapping how legacy code would create outputs across thousands of scripts and hundreds of servers. I got well known for this at Intel and McAfee. But I was never invited to the cool kid graph database parties. I was always just doing stuff with graphs and using it to map out data flows and using them to fix production outages. Dead serious stuff. And it was really frustrating watching teams get stuck because the graph skill was not there.</p><p>I think the failure is probably a common one with what is going on today too. There is this rush to use agents before even understanding AI. And if the understanding is not there, then it is just wishful. You are saying please work, please work, please work. And if you do not know how it works, you can mistake whether it ran correctly or just ran. There is a huge difference between it ran and it ran correctly.</p><div><hr></div><p><em><strong>5. You have argued for years that graph and NLP belong together as a single discipline. GraphRAG is now proving that in mainstream AI. What did teams building with NLP alone consistently get wrong that a graph layer would have fixed?</strong></em></p><p><strong>David Knickerbocker:</strong> Language and graphs go together. 
Similarity in language is not equal to same. I will say that one more time. Similarity is not equal to same. Similar sounding things can be very, very different from each other. A graph kind of anchors things into a piece of context.</p><p>This was really clear to me even when I worked in data operations, because there is a lot of language that goes on in servers. It is not just look at the file, look at the blah blah blah. There is a lot that goes into those log files. If you have a hundred servers then multiple people created the different log files. There is quite a lot of natural language in log files and source code and all kinds of production things. Even working in data operations at Intel, not even as a data scientist, I was seeing language everywhere and already mapping out how production systems were working.</p><p>Graphs show you where things go. But all of the context about what that node even is is often carried by language itself. It was just crystal clear to me a long, long time ago that graphs and language go together. When I was writing this book I even felt afraid that people were going to hate it. You know, it is three years later and it is 4.9 out of five or whatever. But it was so unusual when I was writing it because nobody was really talking about how graphs and language go together the way I was. At the time I was doing a series called a hundred days of NLP, natural language processing. Even back then, using Twitter data, I was realizing that you cannot do natural language processing and leave off graph. It is ridiculous to even do that. If you are working with social media data, you see person A talk to person B about this thing happening. What do you have if you throw away the language? You do not have anything. All you have got is a graph. All of the context is gone. It was crystal clear to me in 2017, and it frustrated me for several years.</p><div><hr></div><p><em><strong>6. Your first NLP and graph experiment was eight years ago. How has entity extraction and relationship linking changed since then, and what has stayed the same?</strong></em></p><p><strong>David Knickerbocker:</strong> The very first one I actually used was the book of Genesis from the Bible. I am not religious, but it is ancient text. It blew my mind that I could pull families out of ancient text and actually map it as a graph. I did this in 2018 and it is still on my GitHub. I can actually go back to my first code and see what I did.</p><p>I am sure it was part of speech tagging because that was before my book and I had no idea what I was doing. I just kind of made it work. Builders build. You just figure out how to do it the first time and then figure out how to do it better after that. There is my small little screen window, just adding color and trying to add size to nodes. Very manual. But then you scroll down to cell 25 and you get to page rank, where I am mapping out who the main entities are. That is where the notebook gets important. Network science is more important to me than visualization, because when you are doing network science you get to do things programmatically. If I want to know whether the punk rock scene in Portland is growing or shrinking, I do not want to visualize that. I want to do that programmatically, turn it into a graph, do time series analysis, and know if the graph is actually increasing or decreasing in density.</p><p>What has changed is really how you create the graph and how you visualize it. Back then it was part of speech tagging with a ton of cleanup. 
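</p><p><em>To make the &#8220;do it programmatically&#8221; point concrete, here is a minimal sketch of that kind of network-science step: take entity pairs that some extraction stage (part-of-speech tagging back then, newer tooling now) has already produced, build a graph, and compute PageRank and density with NetworkX. The pairs and numbers are invented for illustration and are not from his notebooks or systems:</em></p><pre><code>import networkx as nx

# Entity co-mention pairs, as produced by whatever extraction stage you use.
# These example pairs are made up for illustration.
pairs = [
    ("Adam", "Eve"), ("Adam", "Cain"), ("Adam", "Abel"),
    ("Eve", "Cain"), ("Eve", "Abel"), ("Cain", "Abel"),
]

G = nx.Graph()
G.add_edges_from(pairs)

# "Who are the main entities?" PageRank answers that programmatically.
ranks = nx.pagerank(G)
for name, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(score, 3))

# "Is this scene growing or shrinking?" Track density over time instead of
# eyeballing a visualization.
print("density:", round(nx.density(G), 3))
</code></pre><p>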
That evolved to using spaCy models. And then LLMs have changed the game because it is painful to download twelve different spaCy models when you can just use an LLM these days. Entity extraction has improved a lot since 2015. I mostly have to throw away less. Less cleaning to do.</p><p>But there is a dangerous side to this. With older NLP, people were critical because there was something messy in there. When you are using LLMs, everything just looks perfect. And that is kind of a dangerous downside. People are a little too trusting of LLMs compared to how they treated older NLP. The cleanliness is real but it creates false confidence.</p><p>What has stayed the same is the network science and the mathematics. Page rank is still very important. Betweenness centrality is still very important. Community detection is still very important. My book is not going to go out of date because of that. The things that change are really how you create the graph and how you visualize it.</p><div><hr></div><p><em><strong>7. GraphRAG is often sold on the promise of reducing hallucinations. What does it actually take to get from fewer hallucinations to genuinely accountable output where you can trace a claim back to a source?</strong></em></p><p><strong>David Knickerbocker:</strong> My system is about claims. The node is attached to the claim that it makes, so there is no hallucination there. The hallucination space is smaller with nodes because you are starting with a node and you are traversing it. You are starting with your anchor space and going from there.</p><p>If you are wondering what jazz events happened in Portland, Oregon, you are connected to the Oregon node, connected to the Portland node, connected to the jazz node. There is very little chance for hallucination. But if you are just using a RAG system, it is just going to look for similarity. And in a GraphRAG system, if there is no match then the output is that there is no match. There is no hallucination opportunity. Whereas with a similarity-based system, there could be similarity even if it is only a single word in a paragraph. That is not a zero type thing. That is a really frustrating thing to me as an NLP person.</p><p>I like to have the discipline of a graph. It is the same discipline I felt from data operations, because you cannot mess up when you work in data operations. When the database is down, you have to fix it. If you come up with some similarity-based bull for your manager, he is going to be mad at you. You fix the problem when the database is down. That discipline of a graph is what I feel GraphRAG gives AI, rooting its answers in physical spaces, and that really reduces the opportunity for hallucination. There is less for it to bulk up around.</p><div><hr></div><p><em><strong>8. Temporal drift is a real problem in knowledge graphs. Facts become outdated, relationships change, and the graph can silently become wrong. How do you detect and handle contradiction and drift at scale without requiring an engineer to review everything?</strong></em></p><p><strong>David Knickerbocker:</strong> My system does not judge, and my system is about awareness. I think about a living system. You are a living system. I am a living system. And you do not remember everything you have ever been told. I cannot remember what I had for breakfast. Our brains are naturally throwing away old information and naturally learning new information, making room for that new information. 
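</p><p><em>One simple way to express that kind of deliberate forgetting in a claims graph is a time-to-live sweep that drops claims older than some window. This is an illustrative sketch of the principle only, not how the Verdant Eye is actually implemented, and the retention window is invented:</em></p><pre><code>import time
import networkx as nx

WINDOW_SECONDS = 7 * 24 * 3600  # illustrative retention window: about a week

def add_claim(graph, source, topic, claim_text):
    # Each edge records a claim and the moment it was observed.
    graph.add_edge(source, topic, claim=claim_text, observed_at=time.time())

def forget_old_claims(graph, now=None):
    # Drop claims that have aged out, then drop nodes left with no claims.
    now = now or time.time()
    stale = [(u, v) for u, v, d in graph.edges(data=True)
             if now - d["observed_at"] > WINDOW_SECONDS]
    graph.remove_edges_from(stale)
    graph.remove_nodes_from([n for n in list(graph) if graph.degree(n) == 0])

G = nx.Graph()
add_claim(G, "local_news_feed", "Portland", "road closure reported downtown")
forget_old_claims(G)  # nothing is stale yet, so the claim survives
</code></pre><p>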
When I build systems, I like to think about how life does it, and then I try to build that kind of thing into it.</p><p>My system is called the Verdant Eye. Not the Verdant Brain. The Verdant Eye sees, and it does not contain eternal memory, because that is not what an eye does. An eye sees. When the scene changes, the scene changes. What is in front of your eyeballs changes all the time. Your eyes do not need to be recalibrated. The thing has just changed.</p><p>Operationally, if you give a system infinite memory, your database bills are going to skyrocket for the rest of your life. It is never going to be possible. Think about data operations, think about transactional databases. These living systems have been with us for a long time. Anybody who has worked in data operations knows how living systems work because they have worked on living systems, they just do not call them that. In a transactional database you operate off of what you need, and data that is not needed eventually gets archived. In a human body, memories eventually fade away. If I stop thinking about a thing, it will eventually go away.</p><p>When I am building artificial intelligence I am never tempted to build something with infinite memory forever, like the machine from the Hitchhiker&#8217;s Guide to the Galaxy. I do not want to build a super AI. I want to build AI that actually serves us human beings. I want to build AI that does not boil the ocean, that can be bootstrapped by individuals, that does not cost a trillion dollars.</p><div><hr></div><p><em><strong>9. You have built your own testing frameworks for GraphRAG rather than relying on standard benchmarks. What outcomes are you testing for, and how do you know when a system is actually working?</strong></em></p><p><strong>David Knickerbocker:</strong> Everything I do is intentional. There is a really cool intelligence report I read a couple of years ago that said even datasets need to be designed for the use they are going to be used for. Down to the dataset, you should be able to visualize how somebody is actually going to use that data. There is no testing framework anybody else can give me that is going to be fit for purpose for what I am trying to build, because I am not trying to build general intelligence. I am trying to build intelligence that serves a specific purpose.</p><p>There is a scene in Rick and Morty that is one of my favourite scenes. Rick makes a little robot and this robot wakes up and asks what is my purpose. Rick says you pass butter. The robot asks again thirty seconds later. Rick says you pass butter. And the robot says oh my god. But that is the entire purpose of that robot. Its whole purpose in life is to pass butter.</p><p>I have three GraphRAG systems right now and each one is independent. The Verdant Intelligence system is for high level situational awareness, looking down on the world, what is going on in Michigan, what is going on in Oregon, what is going on in California. My second system is called Grooveseeker, and that is street level intelligence. Not what is going on in Oregon but where is the punk rock event happening tonight on what street in Oregon. That graph system has a very different set of rules than the Verdant Intelligence one. My third system has thirty years of artificial intelligence research. When I am building these systems and I want to understand what people did twenty years ago I can just talk to that graph and find out. 
Each one of these goes through its own testing.</p><p>For the Grooveseeker system, I set up a couple hundred questions and go through multiple rounds of the same question to make sure queries are coming up correct and reliable. If it is not hallucinating, it is doing good. If it is getting me to the right location, it is good. If it is getting me there at the right date and time, it is good. The final test of my world AI was I stopped proving it in articles and just used it to go to a punk rock show. I downloaded my data, asked what is going on in Portland from March 10 to March 13, figured out five events I wanted to go to, narrowed it down to one, bought the ticket online, went to the show, saw all the bands, and hung out with one of them. My AI did not take me to a nonexistent venue. It made a real memory for my family. That is how I know it works.</p><div><hr></div><p><em><strong>10. You are working with open source intelligence, which means dealing with adversarial sources, deception, and deliberately misleading data at scale. What does designing for that environment teach you about AI that engineers working with clean datasets never have to confront?</strong></em></p><p><strong>David Knickerbocker:</strong> I really encourage AI people to learn a little bit about open source intelligence. If you are going to build artificial intelligence to understand the world, the open source intelligence community has been using natural language processing and graphs to understand the world for quite a while. There is a lot to learn from them.</p><p>The real world is a messy space. It is not just that websites can disagree with each other. Websites also have malware. If you point your servers at websites and just download everything on them, you need to be prepared for the consequences of downloading malware. There are all kinds of things when you are dealing with the Internet.</p><p>My systems do not care who is right or wrong. They are observers. My systems will see three sides to the same story. There will be the left side and there will be the right side, and then sometimes there is something really extreme. And it does not mean that any one of them is wrong. I wrote an article about open source intelligence recently and I mentioned that bigger clusters are not more important than smaller clusters. In open source intelligence, everything matters top to bottom. If you are using an agent to do an Internet search you are going to get back what the search engine gave you, maybe ten things. If I use my API and say what happened in Oregon in February 2026, I am going to come back with ten thousand things. My APIs do not return ten. They return full context. That is a difference in completeness. It gets back to the snapshot versus movie idea. I can rewind the Internet itself and play it back forward.</p><p>The judgment part needs to be downstream. My system is a fast layer to AI, and it does not do the judgment thing. But there are certain things that are just still good to be a human being about. If something from an extreme source sounded like something dangerous was heading in the direction of your community, that would be an actionable insight, and you would go to the mayor or the police or someone like that. I am not going to build that kind of automation into the system. There are certain parts of being a human being that I like keeping.</p><div><hr></div><p><em><strong>11. What advice would you give to engineers who are starting to build knowledge graph and GraphRAG systems today? 
And what should they not do?</strong></em></p><p><strong>David Knickerbocker:</strong> First of all, ask why you are doing anything before you do anything. Do not follow crowds just to follow crowds. You should have a good reason. I do believe that GraphRAG is something you should probably just start with because it is more reliable in my opinion than vanilla RAG. But that is my opinion.</p><p>I am not a crowd follower type. I am a bit of a rebellious type. But I think there is a lot of creativity in being like that. The AI space is a very creative space. If you follow the crowd you are going to do what everybody else is doing. If you sit outside and you look at plants and you think about nature, you can hit insights you will never get from following the crowd. If you are just looking at LinkedIn and seeing what everybody else is doing and reading the same books as everybody else, it is very important to actually be grounded in the world and to think about life itself. If you are going to do anything with intelligence you might as well think about real intelligence. These language models are nothing compared to what is in my backyard. They are not passing tokens. They are not complaining about maxing out their tokens. They are trying to collapse everything to the bare minimum.</p><p>My own philosophy of AI I call absolute zero. Collapse everything to zero. My world AI has zero storage, zero AWS cost, and my AI bills are extremely low, because I just collapse everything down to the bare minimum. And that is also the reason why I have real-time AI, because I was able to collapse everything down to the minimum.</p><p>I encourage people to read the old stuff too. Some of the best insights come from old papers. My graph partner and I were talking about how he got an insight from something thirty years old. A couple of years ago I created something off of Claude Shannon&#8217;s information theory, and it was a different implementation than anything else. You can do these kinds of things if you are an original thinker. If you only follow crowds you are not going to do anything except follow the crowd. If you try to create a product and you are no different from any of your competitors, then what are you doing? I just encourage independence. Get back to the science. AI has to be rooted in science and engineering. When it gets really loud, that is not always the best time to pay attention to the loudness.</p>]]></content:encoded></item><item><title><![CDATA[Small Language Models and the Future of Production AI with Karun Thankachan]]></title><description><![CDATA[When general-purpose LLMs are overkill, small language models trained on specific tasks may be the smarter bet]]></description><link>https://deepengineering.substack.com/p/small-language-models-and-the-future</link><guid isPermaLink="false">https://deepengineering.substack.com/p/small-language-models-and-the-future</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Thu, 26 Mar 2026 07:55:56 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9159d430-667c-43e3-ac30-1a5827aab86a_1920x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This conversation with <a href="https://www.linkedin.com/in/karunt">Karun Thankachan</a> is a practical tour through small language models in production, starting from the limitations of general-purpose LLMs and repeatedly returning to a single constraint. 
Cost-effective reasoning for specific tasks is a different engineering problem than general-purpose reasoning, and good engineers choose their tools accordingly.</p><p>Thankachan is a Senior Scientist at <a href="https://tech.walmart.com/content/walmart-global-tech/en_us.html">Walmart</a>, where he works on language model systems for retail AI applications. He has a background in machine learning research from Carnegie Mellon University and has spent time at Amazon before his current role, building production AI systems at scale.</p><p>In our conversation, we also talked about ReasonLite, an open-source library that brings chain-of-thought distillation, program-aided reasoning, self-consistency, and trace-budget control under one unified interface, making SLM training feel more like hyperparameter tuning than a collection of disconnected scripts.</p><p>He also covers SLM-Fusion, a multi-model orchestration framework that handles routing, merging, and serving across multiple specialized SLMs, including an OpenAI-compatible FastAPI gateway that abstracts the entire reasoning layer as a microservice. Finally, the conversation turns to where the industry is headed and why RAG and context engineering are winning over fine-tuning right now, and what to watch as diffusion models become more mainstream.</p><p>You can watch the full conversation below or read on for the complete Q&amp;A transcript.</p><div id="youtube2-cICtADWoVP4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;cICtADWoVP4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/cICtADWoVP4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><p><em><strong>1. Currently, your career spans both academia and industry&#8212;and perhaps things in between&#8212;starting from advanced research, for example, at CMU, to applying AI at Walmart. Can you perhaps share with us how this journey led you to eventually focus on SLMs?</strong></em></p><p><strong>Karun Thankachan:</strong> Yeah. So I started my career as a software development engineer back in India. There, I had the opportunity to work under a director who was starting the data science and machine learning team there.</p><p>I had the opportunity to work on a bit of data analytics, big data science, and eventually machine learning. That sort of sparked my curiosity in machine learning, leading me to do my master&#8217;s in machine learning from Carnegie Mellon.</p><p>And that&#8217;s where I got a bit more interested in the research side of things. I had the opportunity to work under a few professors there. Professor John Stanford, in particular&#8212;I was able to publish a pretty good paper in AAA, and got into the weeds of deep learning. Eventually, that led me to land a role at Amazon and now currently a senior scientist at Walmart.</p><p>It was, however, at Walmart, with the ChatGPT/LLM wave, that I got involved a little bit more in the field of language modeling and NLP.</p><p>Our current director had a vision for agents that could solve specific business problems, and we started trying to develop toward that mission. During that time, I realized that a lot of building agents is a little bit more software engineering than machine learning. 
It was a lot more about designing evals that could give you feedback on how your LLMs are performing, building guardrails that could make sure that your LLMs are behaving the way that you want them to behave, and optimizing for cost and latency. A lot more engineering focus than, let&#8217;s say, machine learning focus.</p><p>And I missed a little bit of that machine learning flair. And that&#8217;s when I started investigating on my own a little bit more how I could be involved in the machine learning side of things within the AI wave.</p><p>And I stumbled upon small language models&#8212;language models that you could actually fine-tune and optimize to the specific task that you want. It felt a bit more in the domain of machine learning. It felt more like you were understanding how a model was working and helping the model learn patterns in your data, which was sort of what got me interested in the field. That&#8217;s what got me interested in SLMs.</p><p>And it&#8217;s sort of my opinion that right now, we are in a race to see who can define this new customer experience that would be based on LLMs. And that&#8217;s why we are relying a lot more on foundational models, and we&#8217;re hitting them via APIs, plugging and playing them into our experience to figure out what kind of new, reliable experience we can provide to the consumer. And whoever provides that new, better experience to the consumer will take over a huge amount of the market.</p><p>But afterwards, once this new experience is well-defined, then we will go back to that age of cost optimization. And that&#8217;s where SLMs are going to come back into the picture, because they are able to reason more cost-effectively on specific tasks, as opposed to LLMs, which are more general-purpose reasoning. So that&#8217;s sort of why I still keep very invested in this domain of SLMs, and I&#8217;m hoping that converts in the next five or six years. Yeah.</p><p><em><strong>2. Let&#8217;s talk about ReasonLite which was introduced as a way to perhaps tackle the fragmented, unclear evaluation practices and high token costs that hinder reasoning in small models. What gaps did you see in existing SLM distillation toolchains that made you create ReasonLite?</strong></em></p><p><strong>Karun Thankachan:</strong> Sure. So maybe to take a step back and explain what ReasonLite is: LLM models are models that have a tremendous reasoning capacity&#8212;general-purpose reasoning capacity. And since they are models with billions of parameters, they&#8217;re also able to, in very layman&#8217;s terms, remember a lot more details to be able to reason out a solution for a question that you ask.</p><p>For instance, if you ask an LLM how do you bake a cake, an LLM might be able to remember all the steps that are required to bake a cake&#8212;preheat the oven, mix flour and sugar, add X, bake for X amount of minutes. So it&#8217;s able to remember a great amount of detail.</p><p>An SLM, in comparison, doesn&#8217;t have as many parameters, so it won&#8217;t be able to remember as much. It&#8217;ll be able to maybe remember things like flour, eggs, cake.</p><p>And how these language models work&#8212;they&#8217;re essentially what we call autoregressive models. So what it has generated thus far will influence what it will generate next. So if you can&#8217;t remember a lot of the prior steps to baking a cake, like preheat the oven, flour, egg, cake mix, stuff like that, you might not continue generating the correct answer. 
An SLM might say flour, eggs, brownie instead of flour, eggs, cake, because it didn&#8217;t have the correct prior before it.</p><p>But even though LLMs have billions of parameters and have a lot of memory, it may be overkill when you want reasoning for a specific task that you have in mind. It&#8217;s really great for general-purpose reasoning. But for specific reasoning for a particular task that you have in mind, an SLM might be able to get you there. And the only thing that you have to do is update its parameters, which are currently now for subpar general-purpose reasoning. Update it to become good for specific-task reasoning.</p><p>So how do you teach these SLMs how to build out that reasoning? There are a lot of techniques in the market, like CoT distillation. To provide an example, here you ask the LLM, &#8220;Hey, I&#8217;m going to ask you a question. Let&#8217;s say a person went to a store where they&#8217;re selling apples for $3. He bought four apples. He gave the shopkeeper $20. How much does he get back in return? Tell me the answer, but show me the steps also.&#8221;</p><p>So the LLM will write out: the cost for an apple is $3, four into $3 is $12, he gave 20, so 20 minus 12, eight back. Your answer is 8. It will show you each of the steps and then give you the answer, the 8. So like we mentioned earlier, these are autoregressive models. So what it generated earlier will help it generate more accurate answers in the future. So if I&#8217;m able to get the SLM to output responses in this similar manner, or at least think in this similar manner, then it&#8217;s more likely to get at the final answer.</p><p>So what we do is we ask the LLM to generate its entire chain of thought, and we feed this chain of thought into SLMs and ask it to generate a similar chain of thought. We don&#8217;t do this for general purpose. Rather, we do it for our specific tasks. If you try to do it for general purpose, again, the parameters will get updated in every which way, and it won&#8217;t be able to generate good answers again. But if you do it for a specific task, the parameters will get updated to reflect that specific task. It will start to be able to solve that specific task.</p><p>Similar to CoT distillation, there are other techniques, like contrastive rational training, which is essentially you tell it, &#8220;This is the answer that I want, like four into three is equal to $12. That&#8217;s what I want. Four into three is equal to $11. That&#8217;s not what I want.&#8221; So you push it toward what you want and push it away from what you don&#8217;t want. So there are a lot of these techniques out there for helping train SLMs to perform or provide reasoning on a specific task.</p><p>But when I was building out SLMs to reason on specific tasks, I realized a lot of these techniques were written in notebooks. They were written in scripts. And what I wanted was something similar to what ML practitioners are familiar with today, which is a kind of hyperparameter tuning, where you have all these knobs and you can turn them on and off. You can adjust the parameters, and you can figure out what set or what combination of techniques helps the model learn the pattern the best so that it can generalize in the future.</p><p>So I wanted all these techniques under one roof, like CoT distillation, self-consistency, program-aided distillation, contrastive rational training, curriculum scheduling. I wanted all these techniques in one package, which I could control similarly to hyperparameter tuning. 
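</p><p><em>To make the &#8220;knobs&#8221; idea concrete, here is a minimal, hypothetical sketch of what controlling these techniques like hyperparameters might look like. The option names and model identifiers are illustrative assumptions, not ReasonLite&#8217;s actual interface.</em></p><pre><code class="language-python"># Hypothetical sketch only: option names and model identifiers are illustrative,
# not ReasonLite's actual interface.

distillation_config = {
    "teacher_model": "large-teacher-llm",   # assumed teacher model name
    "student_model": "small-student-slm",   # assumed small student model name
    "cot_distillation": True,               # distill teacher chains of thought
    "program_aided": True,                  # execute math steps to verify them
    "self_consistency_samples": 20,         # majority vote over sampled traces
    "contrastive_rationales": False,        # push toward good steps, away from bad ones
    "max_trace_tokens": 128,                # trace-budget cap during training
}

def run_distillation(config):
    # Sweep these knobs like a hyperparameter search and keep the combination
    # that generalizes best on a held-out set for the specific task.
    ...
</code></pre><p>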
And that&#8217;s why I developed ReasonLite. Everything was split out in files, and I wanted to bring it into one package. And with this now, hopefully, practitioners can call the package, tune it just like they would with HP tuning, and that sort of, I feel, solves a pain point in current SLM training.</p><p><em><strong>3. ReasonLite integrates program-aided distillation, using external symbolic tools or code to verify intermediate reasoning steps. How does this approach work in a real-world training pipeline? Can you give an example of using a tool, say a calculator or knowledge base, during distillation, and how it improves a student model&#8217;s reasoning accuracy without overly complicating the production workflow?</strong></em></p><p><strong>Karun Thankachan:</strong> So maybe to explain what program-aided distillation is, maybe taking our previous example of a person going to a store, buying four apples, which are worth $3 each. They pay the shopkeeper $20. How much do they get back in change?</p><p>If you give that question to an LLM and ask it to give you an answer with a sort of chain of thought, then what happens is it does the calculation. It says that, hey, four into three is $12, and 20 minus 12 is $8.</p><p>So here, the LLM doesn&#8217;t actually have a calculator doing the calculation. What it&#8217;s doing is it&#8217;s looking at four into three, and it&#8217;s saying that 12 is probably the most likely answer. But at times, an LLM could generate a chain of thought that says that four into three is $11, just because it&#8217;s not actually doing the calculation. It&#8217;s just predicting what&#8217;s the most probable answer.</p><p>Same thing with 20 minus 12. It might not give you $8. It might give you 20 minus 12 as $9, because it&#8217;s just predicting what&#8217;s the most likely number. It&#8217;s not doing the actual calculation.</p><p>So this is a little bit harmful when you are trying to do chain-of-thought distillation at scale. Just to refresh everyone&#8217;s memory, what is chain-of-thought distillation? You ask the LLM to show the steps that got it to the final answer, like four into three is 12, 20 minus 12 is 8. Those steps&#8212;show it. So that&#8217;s the chain of thought.</p><p>Along with the answer, that&#8217;s the entire chain of thought. That chain of thought, you feed it to the SLM. And then the SLM tries to generate that same chain of thought using its much smaller parameters. But it&#8217;s only being trained on chain of thought for a specific task, so it will be able to capture that limited amount of chain of thought.</p><p>Now the problem is, if these chains of thought that we are generating from the LLM itself are incorrect&#8212;like four into three is 11, or 20 minus 12 is 9&#8212;if it&#8217;s from the LLM itself and it&#8217;s incorrect, then the SLM obviously can&#8217;t be expected to learn the correct answer. And you can&#8217;t sit and verify all your chains of thought, especially when you&#8217;re training at scale.</p><p>So how do you make sure that the intermediate, especially math-oriented, steps are correct? You ask the LLM to generate it in a way that is like Python language. The code would be written as: c is equal to four into three, and answer is equal to 20 minus c. So your answer is 20 minus four into three. It&#8217;s eight.</p><p>So instead of the LLM actually doing the calculation, it just writes the code with the inputs that you provided.
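</p><p><em>As a rough illustration of what that emitted code and its execution can look like for the apples example (the names below are hypothetical, not ReasonLite&#8217;s actual API):</em></p><pre><code class="language-python"># Minimal illustrative sketch of program-aided distillation for the apples example.
# Function and variable names are assumptions, not ReasonLite's actual API.

def run_emitted_program(program: str) -> dict:
    """Execute the teacher-emitted code; in practice this runs in a sandbox."""
    scope = {}
    exec(program, {}, scope)
    return scope

# Instead of predicting "4 x 3 = 12" as text, the teacher emits executable steps.
teacher_program = (
    "price = 3\n"
    "apples = 4\n"
    "paid = 20\n"
    "c = apples * price\n"
    "answer = paid - c\n"
)

values = run_emitted_program(teacher_program)
assert values["answer"] == 8  # verified arithmetic, safe to keep in the trace
verified_trace = f"4 * 3 = {values['c']}; 20 - {values['c']} = {values['answer']}"
</code></pre><p>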
The code is taken and run in something like Python or a calculator. The answers are then attached back to the chain of thought, and then you feed it into the SLM.</p><p>So this way, with this kind of external program that&#8217;s embedded into your distillation&#8212;i.e., program-aided distillation&#8212;you can make your chain of thought a little bit more accurate, and you can get your SLM to learn only on the correct answers instead of any incorrect answers.</p><p><em><strong>4. One feature of ReasonLite is a trace-budget controller to constrain the token usage of chain-of-thought traces during training and inference. In a production deployment, why is controlling the length of reasoning traces important for cost and latency? </strong></em></p><p><strong>Karun Thankachan:</strong> When you&#8217;re actually serving answers to users, you run into actual engineering concerns. One is obviously latency. When a user types in a question, you want to give them an answer in a fairly short amount of time so that the user doesn&#8217;t drop off on the site. And you maybe don&#8217;t want to provide a very verbose answer unless the user is explicitly asking for it. If they&#8217;re asking for something simple, you want to give them an accurate answer, a comprehensive answer, but it doesn&#8217;t need to necessarily be verbose.</p><p>So during that time, if your model is trained to think in this chain-of-thought manner, where it&#8217;s trying to explain breakdown steps and then get the answer&#8212;which is generally good practice&#8212;but if it&#8217;s trained on fairly long chains of thought, that might kick up your latency and increase your cost, because each token costs a little something to produce, even from the SLM.</p><p>So you might want to have some kind of guardrails around it, so that the latency doesn&#8217;t increase to an unbearable amount, and the cost doesn&#8217;t become very cumbersome or essentially very expensive. The compute doesn&#8217;t run up, essentially.</p><p>So it&#8217;s to prevent that that you have these trace budget controllers. And how it sort of works is, you can enforce it in different manners. You can enforce it by saying that, hey, for any particular inference call, you shouldn&#8217;t take more than X amount of tokens. If you&#8217;re starting to hit your X amount of tokens, cut your chain of thought short with whatever you have and provide an answer. It might not be an accurate answer, but it helps you make sure that your token cost won&#8217;t grow beyond a particular point, and your latency also won&#8217;t exceed a particular value.</p><p>Now, an obvious question that people might have is, hey, if I limit my token usage and if my chain of thought isn&#8217;t allowed to grow, won&#8217;t I get bad answers? Which is a very reasonable question. So typically, yes. If your model is trained to produce very verbose chains of thought, you will run into that token-limit issue again and again with the token budget controller. You&#8217;ll run into the issue again and again, and your model won&#8217;t be allowed to express all its thoughts and therefore give a good answer.</p><p>So typically what we do is we have eval metrics that track things like whether the model is being useful to the user or not, like a thumbs-up, thumbs-down feature in ChatGPT. 
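</p><p><em>A minimal sketch of the kind of per-request token cap described above, assuming a hypothetical generate_next_token wrapper around the SLM (illustrative only, not ReasonLite&#8217;s actual API):</em></p><pre><code class="language-python"># Minimal sketch of a per-request trace budget. The generate_next_token
# wrapper is a hypothetical stand-in for the SLM's decoding loop.

MAX_TRACE_TOKENS = 128          # cap on chain-of-thought tokens per request
ANSWER_CUE = "\nFinal answer:"  # forces the model to wrap up once the budget is spent

def generate_with_budget(prompt, generate_next_token):
    trace = []
    for _ in range(MAX_TRACE_TOKENS):
        token = generate_next_token(prompt + "".join(trace))
        if token is None:       # the model finished its reasoning on its own
            return "".join(trace)
        trace.append(token)
    # Budget exhausted: cut the chain of thought short and ask for the answer,
    # trading some accuracy for bounded latency and token cost.
    return "".join(trace) + ANSWER_CUE
</code></pre><p>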
So if you get a lot of thumbs-down features, and if you&#8217;re seeing that for all those requests your token budget controller was cutting off your chain of thought, then you can understand from your evals and from your logs that your model is actually being too verbose.</p><p>We need to retrain the model so that the chain of thought is shorter, it&#8217;s less verbose, and it&#8217;s able to get to the answer quicker. So that&#8217;s why this kind of trace-budget controlling is important in production settings, and how it helps you limit token usage.</p><p><em><strong>5. Techniques like chain-of-thought prompting and self-consistent decoding&#8212;generating multiple reasoning paths and aggregating answers&#8212;can significantly improve reasoning accuracy. However, they also increase compute cost and latency by running the model multiple times. How do you balance these trade-offs for production systems?</strong></em></p><p><strong>Karun Thankachan:</strong> So, maybe I can take a step back. What do we mean by self-consistency? Essentially, like we mentioned, LLMs are not actually doing the specific calculations or specific reasoning. They&#8217;re not actually understanding, or they&#8217;re not able to derive meaning. They are autoregressive models. So based on whatever they&#8217;ve seen, whatever they&#8217;re generating right now, they&#8217;re going to generate the most likely answer next. So sometimes it may be generating incorrect things.</p><p>But if you ask the model to generate an answer to a specific question 10 times, then the majority of the time, it might actually give you the right answer. So what you can do, or what is a decent practice, is: with your LLM, you generate maybe not just one chain of thought. You ask it to generate 20 chains of thought. Then you check the final answer for these chains of thought, and if the final answer across those chains of thought is similar in a majority of cases, that is your final answer.</p><p>For instance, let&#8217;s go back to our apples example. Four into three, 12. Twenty minus 12, eight. Let&#8217;s say 12 chains of thought generate eight, the other five generate nine, and the remaining three generate something like 10. So the majority vote is eight. Now you know that this is probably the right answer. Let&#8217;s pick up all these chains of thought and use that to train our SLM.</p><p>So self-consistency is a fancy way of saying majority voting. You&#8217;re essentially asking the same model to generate the response to a question again and again and again until, in the majority of cases, it starts giving you a specific answer instead of different answers all the time.</p><p>Now, if you try to use self-consistency during inference, you&#8217;re essentially asking the SLM to answer the same question multiple times&#8212;let&#8217;s say 10 times. You&#8217;re picking the majority answer, and then you&#8217;re giving that majority answer to the user. The problem is, if you do it at inference time, instead of answering one question once, you&#8217;re answering it 10 times. So the cost becomes 10x. The latency becomes 10x as well during inference.</p><p>So typically, you don&#8217;t want to use self-consistency at inference time, so that you can control the latency. You only typically use self-consistency during your training loops, so that you can figure out what chains of thought to actually feed into your SLM. 
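</p><p><em>In code, the majority-vote idea is small. The sketch below assumes a hypothetical sample_chain_of_thought helper that returns a (trace, final_answer) pair; it is illustrative only, not ReasonLite&#8217;s actual API.</em></p><pre><code class="language-python"># Minimal sketch of self-consistency as majority voting at training time.
# sample_chain_of_thought is an assumed helper, not ReasonLite's actual API.

from collections import Counter

def pick_consistent_traces(question, sample_chain_of_thought, n=20):
    samples = [sample_chain_of_thought(question) for _ in range(n)]
    votes = Counter(answer for _, answer in samples)
    majority_answer, _ = votes.most_common(1)[0]
    # Keep only the traces that reach the majority answer; these become the
    # distillation targets fed to the student SLM.
    kept = [trace for trace, answer in samples if answer == majority_answer]
    return majority_answer, kept
</code></pre><p>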
So the simple rule of thumb is: self-consistency is better for training time, where latency isn&#8217;t the concern&#8212;and cost also, to a certain extent, because you&#8217;re doing a lot of these things in batch, and you can append other techniques on top. You can actually manage the cost and manage latency. So use it at training time. It&#8217;s not something you need to use at inference time.</p><p><em><strong>6. In ReasonLite, you emphasize not just final answers but also intermediate reasoning quality&#8212;providing targeted reasoning probes and symbolic verifiers to assess a model&#8217;s thought process. In a practical setting, how do you evaluate whether a distilled small model is truly reasoning well versus just guessing the right answer?</strong></em></p><p><strong>Karun Thankachan:</strong> Got it. So essentially, when you are trying to train an SLM, and like we discussed thus far, what it generates is based on what it has learned so far&#8212;what it is currently generating in its chains of thought, or in its thinking, essentially. So when you are evaluating an SLM, it might not be enough to evaluate it on whether it&#8217;s generating the final answer correctly or not.</p><p>Again, going back to our apples example: four into three, 12; 20 minus 12, eight. Eight is the final answer. Does that mean you evaluate it only on eight? You could. But let&#8217;s say it did four into three, 11; 20 minus 11, eight. It still got to the final answer, but it&#8217;s because it&#8217;s not really doing the correct things. It&#8217;s just, again, making guesses about what&#8217;s the most probable answer. And it somehow stumbled on the correct answer.</p><p>So your model might actually deviate from the behavior that you desire, but your answer is still correct. We don&#8217;t want those kinds of things to spread into production. So we want to not just evaluate the final answer. We want to also make sure that we evaluate the steps in between.</p><p>So how do we actually do that? We can check what we call stepwise behavior. There are a few things that you can inject into the model&#8212;symbolic verifiers or reasoning probes. These are, I think, two things that we have implemented in ReasonLite.</p><p>So maybe to give an example of how these things function: a reasoning probe is trying to figure out if the SLM is able to do a specific substep well. For instance, we have mathematical reasoning probes that help you test if your SLM is learning math very well. Take 17 plus 8 is equal to 25. When you do that addition, there is this behavior called carry behavior, where seven plus eight is 15, so you need to put the five and carry over the one. Then one plus one is two&#8212;25. That&#8217;s how you do the math. So this carry behavior is something you want to see specifically if your SLM is learning. Within ReasonLite, there are functions that help you test specifically for carry behavior within your SLM. So that&#8217;s a way of evaluating if your SLM is behaving well on substeps.</p><p>Similarly, step verifiers are another way of evaluating if your SLM is behaving the way you want it to. For instance, the apples example again: four into three is equal to 12. That is a substep. You want to verify if that substep is accurate or not. So you take the output of the substep. The step verifier takes whatever it is doing, runs that code, generates an actual answer, and matches it up. 
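</p><p><em>A minimal sketch of such a step verifier for arithmetic substeps, assuming a simple &#8220;expression = result&#8221; step format (illustrative, not ReasonLite&#8217;s exact implementation):</em></p><pre><code class="language-python"># Minimal sketch of a symbolic step verifier for substeps such as "4 * 3 = 12".
# The step format is an assumption, not ReasonLite's exact implementation.

import ast
import math
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def _evaluate(node):
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("unsupported expression")

def verify_step(step: str) -> bool:
    """Return True if the claimed result of a substep matches the real one."""
    expression, claimed = step.split("=")
    actual = _evaluate(ast.parse(expression, mode="eval").body)
    return math.isclose(actual, float(claimed))

# verify_step("4 * 3 = 12") is True; verify_step("4 * 3 = 11") is False.
</code></pre><p>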
So it&#8217;s able to see, at the substep level, if it&#8217;s giving you an answer correctly.</p><p>So these kinds of reasoning probes and step verifiers are things that you can maybe add on to the final-answer evaluation, and they&#8217;ll give you a little bit more information about how your model is actually behaving.</p><p><em><strong>7. Let&#8217;s shift into talking about SLM-Fusion and multi-model orchestration. Modern AI deployments often involve multiple models, from small domain-specific SLMs to large general LLMs. But traditional serving frameworks usually assume a single static model, which leads to inefficiencies under dynamic workloads. SLM-Fusion, from what I understand, is an open-source library proposed to bridge this very gap by unifying model merging, routing, and serving in one system. Can you explain the impetus behind SLM-Fusion and also talk to us about how it works in a real scenario?</strong></em></p><p><strong>Karun Thankachan:</strong> Got it. So SLM-Fusion&#8212;just to give a little bit of context&#8212;this paper was written somewhat a while ago, before multi-agents became a little bit more popular and that kind of architecture became a little bit more popular. I would say maybe it&#8217;s a little bit outdated at this point.</p><p>But SLM-Fusion, the idea essentially is: typically, with LLMs, with their general-purpose reasoning capability, you can, with enough RAG and context engineering, get them to answer questions within multiple domains, even with a single LLM, as long as you have your RAG engine built well. You have a retrieval layer that gets you specific context related to this new domain, and you do context engineering well enough that only the relevant info stays inside the context of the LLM.</p><p>But if you&#8217;re working with SLMs, since you are fine-tuning your SLMs on a specific task, that fine-tuned SLM is only able to reason very well for that specific task at hand. You can&#8217;t just use one SLM and be hopeful that it will be able to pick up or be able to reason in a new domain as well, because you are training it&#8212;its parameters are limited. You&#8217;re only able to reason for one specific task.</p><p>So in this case, what you typically do is train multiple SLMs, and then you figure out a way, based on the questions that are coming in, how to route the question to the appropriate SLM. So that&#8217;s essentially the idea behind SLM-Fusion. The key thing is: how do you route it to the correct SLM? How do you evaluate when the routing was inappropriate? And how do you not base the routing on just hard-coded rules, but learn the routing from user behaviors?</p><p>Sort of like: hey, it routed to SLM A, the user didn&#8217;t particularly like that response, but based on whatever rules we have, that was the SLM to route it to. So now, how do you reconcile the fact that that was the SLM to route it to versus the user behavior? Was it one question and then a follow-up question that switched it to another domain?</p><p>So those kinds of things&#8212;how do you actually evaluate that, how do you learn it from telemetry, and try and update this routing over time&#8212;that was the core of SLM-Fusion. Now with multi-agent architectures, it&#8217;s becoming a little bit easier, but if you&#8217;re working in SLMs, some kind of routing is good. There have been multiple papers, both within ICLR, ICML, and AAAI, that have come around this routing concept as well. But it&#8217;s a lot more updated at this point.</p><p><em><strong>8. 
In production, when would we prefer merging models over just choosing one? Can you perhaps discuss an example use case where merging two specialized models could yield better results than using a single model alone?</strong></em></p><p><strong>Karun Thankachan:</strong> Sure. So again, it comes to how different the reasoning is that these two different SLM models would have to learn.</p><p>For instance, let&#8217;s say within a retail scenario, you have a reasoning model that&#8212;let&#8217;s say it&#8217;s an anomaly detection model&#8212;that sort of needs to decide, looking at sales of an item, why the sales dropped anomalously. So if sales of an item drop anomalously, there could be multiple reasons that drove it. So if I were building an SLM model, if I saw sales dropping, then the next thing I would have the SLM model do is be able to generate multiple hypotheses and then figure out what is the appropriate one to chase down and try to answer.</p><p>Within that same context, if, let&#8217;s say, I wanted to fix the anomaly, I would want an SLM model where I would give it some context on, &#8220;I want you to go and hit this system, change the value to something like this, and hit the system, change the value to something like this.&#8221; Here, the SLM model doesn&#8217;t need to have that kind of broad thinking in terms of hypothesis generation. It needs a little bit more specific tool understanding, a little bit more integrated with API calls.</p><p>So the reasoning between both these models would be very different. One would be a bit more broad&#8212;generate hypotheses, figure out which one is the right one, and then tackle it, I mean run those hypotheses, figure out what is an appropriate answer. And this one is a little bit more tool-oriented, a bit more in-depth, a bit more specific. It can&#8217;t afford inaccuracies because it&#8217;s interacting with the tool.</p><p>So the reasoning would be very different. In these cases, rather than trying to build one SLM that could maybe do both, it might be a better idea to separate it out, and it might be a good idea to bring these two into a routed format where you generate the hypothesis, you tell the user that, &#8220;Hey, I evaluated X hypotheses. These two seem to be the most likely root reasons.&#8221; Then the user sort of tries to understand, okay, maybe I also think that, okay, out of all the ones I&#8217;ve evaluated, this is probably the reason. Let&#8217;s try and fix this. Let&#8217;s adjust these metrics or adjust these settings here, and then you route it to the second SLM.</p><p>And that SLM sort of makes all the necessary tool calls, all the necessary adjustments, and it has more in-depth, specific reasoning built into it. So that might be a good scenario for routing.</p><p><em><strong>9. One of the core features of SLM-Fusion is an adaptive routing layer that can be rule-based, learned, domain-specific, or cost-aware in deciding which model or ensemble handles a request. How do these routing policies work under the hood? For instance, what would a cost-aware router consider&#8212;latency SLA, API throughput costs, query complexity, etc.?</strong></em></p><p><strong>Karun Thankachan:</strong> Sure thing. So within the router, we have a few ways you can decide what SLM to route it to. The simplest way, and the easiest way to get started, is just rule-based routing. You see certain domain keywords, and you can route it to a specific domain SLM. 
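</p><p><em>The rule-based version can be as small as a keyword lookup. The route names and keyword sets below are hypothetical, not SLM-Fusion&#8217;s actual configuration.</em></p><pre><code class="language-python"># Minimal sketch of keyword-rule routing across domain-specific SLMs.
# Route names and keyword sets are hypothetical, not SLM-Fusion's configuration.

ROUTES = {
    "anomaly-slm": {"sales", "drop", "anomaly", "spike"},
    "actioning-slm": {"update", "change", "set", "apply"},
}
DEFAULT_SLM = "general-slm"

def route(query: str) -> str:
    words = set(query.lower().split())
    best = max(ROUTES, key=lambda name: len(ROUTES[name].intersection(words)))
    return best if ROUTES[best].intersection(words) else DEFAULT_SLM

# route("why did sales drop last week") would pick "anomaly-slm".
</code></pre><p>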
The slightly more advanced manner is getting an embedding out of the user query and figuring out which sort of base embedding it matches the most. So each SLM would have a domain-encapsulated embedding associated with it. So it&#8217;s everything related to that domain in an embedding. So if the user query matches this domain-specific embedding, route it to this SLM.</p><p>Now, the advantage with this sort of embedding-based matching is that, if the user asks a specific question that is maybe multi-domain, and you routed it to the wrong SLM, or it might be the case that you need to split that question&#8212;route the first portion of the question to this SLM, get a response, route the second portion of the question to the next SLM, get a response.</p><p>So instead of this embedding being static, what SLM-Fusion does is provide you the opportunity to adjust those embeddings based on how well you have done on the questions users have asked in the past. So using your logs, you can pull in your logs. The ones that you didn&#8217;t do well on, those ones you can narrow down on. You can figure out how to update your embeddings for those specific ones that you didn&#8217;t do well on.</p><p>And for a particular question, if you feel like it&#8217;s a multi-domain question, within the router itself you have a tinier SLM that can split multi-domain questions into separate questions. So with these sorts of knobs, you are not just hard-coding how to route it, but you are able to learn over time how the routing should evolve. And you are also able to address multiple-domain questions by using the routing module to split them into different questions and orchestrate it in a manner where you can still use your SLMs, and you don&#8217;t need to try to condense everything or try to get SLMs to interact across domains. So that would be the core way to use this sort of routing more.</p><p><em><strong>10. Let&#8217;s talk a little bit about telemetry-driven feedback loops. What signals, according to you, are most valuable for such a loop in a production setting? And how do you feed this feedback back into the system?</strong></em></p><p><strong>Karun Thankachan:</strong> Got it. So it really depends, but the most critical one, I would say, is how the user is responding to the queries. So just like ChatGPT&#8217;s thumbs-up, thumbs-down&#8212;some kind of user satisfaction score. That would be the best way to assess any sort of generative system, because the responses being generated are evaluated by the user. And if it&#8217;s not a helpful one, there&#8217;s no point in any of these generative systems.</p><p>So being able to track user satisfaction scores and attach them to your logs&#8212;your chain of thought, your final answer, your user satisfaction scores&#8212;that sort of logging system is what we call telemetry. And once your logs are all stored and generated, being able to search through your logs and figure out which ones you didn&#8217;t do well on, and having enough logging to figure out which SLM it routed to, why it routed to that SLM, why it tried to split the question into separate portions&#8212;having all of those logs in one place is what is going to help you build that feedback loop and improve your routing over time.</p><p>Apart from user satisfaction, you could also use things like token usage. Is the compute cost actually building up? 
Is maybe a question that was designed for one domain being unnecessarily split into multiple questions and maybe just sent to the same SLM again and again? I&#8217;ve seen that happen also. So checking if your token cost for any of the responses you are giving is spiking. Similarly, if your latency is spiking.</p><p>So these three, I think, would be the top metrics to attach to your telemetry, or have tracked along with your telemetry, with timestamps and request IDs, so that you can map it properly. And then you can improve your routing layer over time.</p><p><em><strong>11. Thank you for that. So now, quantization is a common way to reduce inference cost, but mixing models of different precision&#8212;or even merging quantized weights&#8212;can be tricky. So what did you build in SLM-Fusion, again, to use it as a case study, to handle quantized models effectively?</strong></em></p><p><strong>Karun Thankachan:</strong> Got it. So, I guess, just to explain why quantization is tricky: quantization is nothing but using lower-precision number formats. So with quantization, what you&#8217;re doing is you can represent things as 32 bits, 16 bits, 8 bits, or 4 bits. The lower you go, the smaller your models become, the faster the multiplications become, and therefore the faster your SLMs become. So you can make your models smaller and faster the more quantized they become. But again, as you make them more quantized, you lose a little bit of information, so they won&#8217;t be as accurate.</p><p>So how it helps you use different SLMs that might be in different quantization modes&#8212;it gets a little bit tricky here&#8212;but we have these things called tensors. When these calculations are taking place, we do them in these large-scale 3D matrices called tensors. And how the calculation within an SLM works is, you sort of align these tensors, or align these channels, pad them as necessary to get them to the same quantized integer formats, and then sort of carry forward the calculation.</p><p>So, a little bit more on the math side, but the key thing is aligning the tensors so that you&#8217;re not assuming that all the models are at the same level of quantization. You try to identify whatever quantization it is at, then sort of work through packages that we already have. It&#8217;s not something new that SLM-Fusion is providing, but most of the popular deep learning packages already provide this. But aligning per tensor, per channel, so that the calculations actually flow through.</p><p>And in terms of, apart from just the quantization, building adapters into your models is another way to perhaps mitigate this. Adapters are still, I would say, a little bit unproven in terms of the value they add for the number of parameters they introduce. But in some very few scenarios, where the domains are similar enough, but you need a slight change in parameters so that it adapts&#8212;not to a completely new domain, but maybe a complementary domain&#8212;in those cases, I think adapters work. But for quantized models, if they&#8217;re in different quantized states, having adapters can help you maybe bridge that gap as well.</p><p>So, a little bit on the math and technical side: alignment of tensors. A little bit less mathy, but more on the modular side: adapters to help bridge the gap. So those are the two things that I think SLM-Fusion had that help you work with different quantized SLMs.</p><p><em><strong>12. All right.
Now, SLM-Fusion also introduced a FastAPI-based Fusion Gateway that is even OpenAI-compatible for inference requests. So how do you see a system like this being deployed in a production microservice architecture? Could it sit alongside existing serving frameworks, perhaps?</strong></em></p><p><strong>Karun Thankachan:</strong> Yep. Yeah, definitely. So the FastAPI backend is essentially there to support that same thing. The idea being that, within microservice architectures&#8212;again, maybe taking a step back&#8212;the core idea is that anything that has to do with one specific function is split out, modularized, and kept separate. So your reasoning engine, if it is like this multi-SLM model, you can keep it separate from everything else. You can update it as required without impacting any of the other microservices in that environment.</p><p>And with the FastAPI backend, the key idea is that you can hit it just like you would any other kind of service that you can abstract away. So what we typically call, I guess, reasoning as a service&#8212;RaaS, if you want to call it a new domain. So whenever you need a little bit of, &#8220;Hey, I think I need a little bit of human reasoning at this particular stage to make a decision on what to do next,&#8221; then just hit the API endpoint like you would in any kind of microservice architecture.</p><p>It abstracts away all the reasoning. It will do the routing within, it will pick the SLM, it&#8217;ll generate an answer, and it will send you back a specific API that follows the contract. And that API isn&#8217;t just something that&#8217;s generated by the SLM&#8212;it&#8217;s filled in so that the contract is always maintained between whatever service is calling the reasoning-as-a-service microservice.</p><p>So yeah, that way, you can just abstract the whole thing away, and you can put it in any kind of production environment, with the typical guardrails that you have&#8212;like trace-budget controllers, latency holders, and everything. It will actually stick to the SLAs that you typically expect in a multiple-microservice architecture system.</p>
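<p><em>For a sense of the shape of such a gateway, here is a minimal sketch with placeholder routing and inference stubs; it is illustrative only, not SLM-Fusion&#8217;s actual gateway code.</em></p><pre><code class="language-python"># Minimal sketch of an OpenAI-compatible gateway in front of routed SLMs.
# route() and generate() are placeholder stubs; this is illustrative only,
# not SLM-Fusion's actual gateway code.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]

def route(query: str) -> str:
    return "general-slm"        # placeholder for the routing layer

def generate(slm: str, messages: list[dict]) -> str:
    return "stub answer"        # placeholder for SLM inference

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    slm = route(req.messages[-1]["content"])
    answer = generate(slm, req.messages)
    # Respond in the OpenAI chat-completions shape so callers can treat the
    # whole reasoning layer as a drop-in microservice.
    return {
        "object": "chat.completion",
        "model": slm,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": answer},
            "finish_reason": "stop",
        }],
    }
</code></pre>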
<p><em><strong>13. Finally, Karun: any emerging trends, perhaps in governance or tool integration, that you believe will significantly impact how we deploy language models in production?</strong></em></p><p><strong>Karun Thankachan:</strong> I think, right now, diffusion models are becoming a little bit more commonplace, and that might be a trend worth checking out. Apart from that, I guess the main thing to focus on is that, within LLMs, maybe six months ago, there was a split between: is investing in parameter-efficient fine-tuning&#8212;LoRA, QLoRA&#8212;along with alignment techniques like DPO and PPO, a good investment of time versus just focusing on RAG and prompt engineering? It looks like the industry is shifting a lot toward RAG and context engineering. One, because maybe it&#8217;s cheaper. And for the other things, you need specific hardware, and you need to hire people who know how to do it. But it also seems like you can actually get fairly accurate answers and fairly good reasoning from your LLM models if you actually set up a good RAG pipeline, and if you bolster it with good retrieval&#8212;a way to improve or rerank the retrieved documents and again select the best ones on top of it. So don&#8217;t just have a simple RAG pipeline. Fit a model on top, maybe improve the accuracy of your retrieved documents with the reranking model, and also focus a lot more on context engineering. So don&#8217;t bloat your context with a lot of information. Look into context compression. Look into eliminating things from your context if they are irrelevant. Just having irrelevant things increases hallucinations. So a lot of investment in good engineering, I would say, combined with good retrieval, seems to be giving a lot more accurate answers, a lot less hallucination, and a lot better reasoning as well. So that seems to be where the industry is focused right now. It would be interesting to see if it switches back to fine-tuning, or if it switches back depending on how this diffusion trend plays out and how the cost-versus-LLM trend plays out.
I think those are some trends to keep an eye on to see where we need to switch next.</p>]]></content:encoded></item><item><title><![CDATA[Trade-offs in Modern System Design: A Conversation with Archit Agarwal]]></title><description><![CDATA[A pragmatic guide to architecture choices, cost discipline, resilience, and interview-ready thinking.]]></description><link>https://deepengineering.substack.com/p/trade-offs-in-modern-system-design</link><guid isPermaLink="false">https://deepengineering.substack.com/p/trade-offs-in-modern-system-design</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Thu, 26 Feb 2026 05:13:34 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e4a14374-2767-4450-9c5b-ca04d9e4a48a_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This conversation with <strong><a href="https://www.linkedin.com/in/architagarwal984/">Archit Agarwal</a></strong> is a practical tour through modern system design&#8212;starting from first principles and repeatedly returning to a single constraint: real systems live under trade-offs, and good engineers choose those trade-offs deliberately. Agarwal is a <strong>Principal Member of Technical Staff </strong>at <strong>Oracle</strong>, where he works on <strong>ultra-low-latency authorization services in Go</strong>. He has <strong>11+ years</strong> in backend engineering across <strong>.NET and Go</strong>, and he writes <strong><a href="https://www.linkedin.com/newsletters/the-weekly-golang-journal-7261403856079597568/">The Weekly Golang Journal</a></strong>, focused on turning system design into usable, operational guidance&#8212;especially around performance and efficiency.</p><p>He lays out the inflection points that justify splitting&#8212;deployment friction, widening blast radius, and the need for truly independent scaling&#8212;while emphasizing that flexibility comes with a real operational tax. On cost and resilience, Agarwal makes the same argument from a different angle: engineering decisions should be evaluated as <em>performance per dollar</em>, not performance in isolation. He describes building cost awareness into the design process via observability, explicit cost discussions, and being disciplined about scaling only when needed. </p><p>Finally, the conversation shifts from production architecture to <strong>interview performance</strong>. Agarwal recommends that candidates stand out by aligning on requirements first, surfacing trade-offs explicitly, and communicating clearly enough that the interviewer can follow the &#8220;commit history&#8221; of their reasoning. 
He also explains how he expects candidates to handle changing constraints midstream&#8212;by absorbing the change, restating it, and selectively updating only the affected parts of the design&#8212;while building breadth through fundamentals, real-world problem practice, and a few deep specialties.</p><p>You can watch the full conversation below or read on for the complete Q&amp;A transcript.</p><div id="youtube2-rOLA2NpKPfM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;rOLA2NpKPfM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/rOLA2NpKPfM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><h1>Emerging Trends and Challenges in System Design</h1><p><em><strong>1. We&#8217;re seeing this pendulum swing in architecture, with many teams rethinking a pure microservices approach and embracing modular monoliths to reduce complexity and cost. How do you decide when a microservices architecture is truly warranted versus keeping a system design simpler?</strong></em></p><p><strong>Archit Agarwal:</strong> To be very honest, this is the first question that I ask myself when I start designing a new system or a new module. And the rule that I follow is very simple: If the problem isn&#8217;t complex yet, don&#8217;t overengineer it. Start with just a monolith.</p><p>New engineers that come into the industry come up with a lot of these buzzwords&#8212;event-driven architecture, microservices, serverless. They&#8217;re great, but you cannot apply everything in just one go until your application really needs it, right? So that is a key difference between any interview-ready engineer and a genuinely good engineer: a genuinely good engineer would not want to implement everything up front. He would engineer things around the problems that we are facing.</p><p>In any early project that you see when you start with a project, the requirements are always changing. You have very little understanding of the domain, right? And the scope is very small. So you should not go into implementing every new buzzword that you see in the industry. You start small, start with a monolith, and design in a way that, in the future, if you want to break that down, you can easily do that, right?</p><p>And if your application requires a low latency&#8212;for example, if you&#8217;re working on a financial kind of system&#8212;you cannot live with only microservices. You will have to evaluate if microservices are good for you. Ideally, if you use microservices, there is always going to be additional network hops, and it will be slowing down the system, right? So I would always say that microservices aren&#8217;t the magical fix that fixes bad architecture, right? They just distribute that over the network.</p><p>So when you start writing your application, start with a monolith and then start understanding if you have the pains where the pain of having the monolith is greater than the pain of splitting it. Ideally, we would have a lot of signals when we can identify whether we should move out of a monolith or not. 
A few of those signals are: your deployments are getting bigger and slower, you have a larger blast radius on the bugs that you will see, or you need a lot of independent scaling.</p><p>For example, if you have a sale for an e-commerce platform, if there is a sale coming up, you would always want your payment-related system to scale larger than your login system, right? So if those are the requirements, you definitely start moving out of a monolith and move into microservices.</p><p>And there are a lot of other things. For example, if you need different tech for different problems. If you want to have analytics, you would want to use different technology for that, right? So in a monolith, you cannot have your project written in multiple languages.</p><p>So microservices definitely give you flexibility. They also give you headaches, so you should always choose wisely.</p><p><em><strong>2. With cloud spending at an all-time high, there&#8217;s sustained CFO scrutiny on engineering decisions. How do you incorporate cost considerations into system design? </strong></em></p><p><strong>Archit Agarwal:</strong> Ideally, I would say this is a point where every engineer becomes a philosopher. I remember one quote from&#8212;I don&#8217;t know where I read it, but it stuck to my mind&#8212;and it said that a good engineer would design for performance, but a great engineer would design for performance per dollar.</p><p>So any engineer who is thinking about the cost with respect to the performance gain is a great engineer. I didn&#8217;t truly understand this quote until one of my family members started one of his startups and I was involved with him in all the tech-related discussions. That was the first time when I realized, OK, when I&#8217;m fighting with my manager or my senior manager over using a particular tech, why do they always say no if they don&#8217;t need it? And I&#8217;m always saying that it will help us scale, right?</p><p>That was the first time when I started realizing the importance of why I was denied a lot of requests&#8212;because those were not the real pain that I was solving for, right? Trust me, every system will definitely cost something, and you need to understand that no business can keep spending money on something that is not needed at that particular moment.</p><p>And to be honest, we had one client&#8212;I&#8217;ll give you one more instance&#8212;where, as a team, we saw great advantage. There was one client who was pushing to reduce the infrastructure cost, and we as engineers, again, we were not doing that. So what he did is he introduced a dashboard where we were seeing per-engineer cost of the infrastructure for the development process. And those numbers were huge per month. And to be very honest, seeing those numbers listed against each person&#8217;s name, everyone started evaluating whether to use a particular tech or not.</p><p>Like, whether it is really needed, or when you log off from the system, should you shut down your EC2 instances or not, right? That is a huge difference, and in six months, we saw a 20% month-on-month decrease in the infrastructure cost.</p><p>So I would say I follow a few principles with that. I don&#8217;t prematurely optimize, but I stay observant on the infrastructure. I keep my observability to the extreme so that I can have a dashboard and see where my system is lacking, what part to scale, where I should have improvement. 
So observability is very important from this perspective.</p><p>Then I always design my system for horizontal scaling, but I don&#8217;t horizontal scale unless it is needed. Because if you have infrastructure which is of no use, there&#8217;s no point spending that money. But you should have that in your infrastructure requirements and your lifecycle.</p><p>For example, if you are using an S3 bucket and now you have 100 GB of data there which is ideally not being used for months&#8212;or will never be used&#8212;why do you want to spend money on live data there? You should push it out to cold storage and spend less on that data which is practically not being used.</p><p>Then, into the technical conversation: for every story that we start designing, we have design discussions. In the design discussion, we would try and include the costing. At times we see that engineers come up and say that they&#8217;ll reduce the latency by 10%, but to reduce the latency, they&#8217;re increasing the cost of the infrastructure by two times or three times.</p><p>So then the question is again on the engineer: Do we really want to improve the latency with the high cost? If it is really needed, we are OK to spend, right? But if it is not really giving any advantage to the user&#8212;of that 10% decrease in your response time&#8212;by spending that great amount of infrastructure cost, this makes the team aware that performance without cost awareness is just expensive engineering. So you should not just keep adding to infrastructure cost every now and then.</p><p><em><strong>3. Modern systems are facing record-breaking DDoS attacks and increasingly complex supply-chain threats. For instance, 2025 saw hypervolumetric DDoS attacks peaking at multi-terabit levels and a 188% year-over-year spike in malicious packages in open-source registries. How do you design systems to be resilient against such attacks and vulnerabilities that are increasing exponentially?</strong></em></p><p><strong>Archit Agarwal:</strong> In today&#8217;s world, I don&#8217;t design things thinking that I&#8217;ll not get attacked. I always design thinking that I&#8217;ll always be attacked, and how would I react when I&#8217;m attacked?</p><p>Modern systems are operating in very hostile environments. So you should always assume two things: the system will fail&#8212;that is for sure, that is inevitable&#8212;and then you&#8217;ll definitely get attacked now or in the near future. So if you plan your infrastructure and your architecture based on these two assumptions, you&#8217;re making good decisions to protect your system against these two things.</p><p>Once you accept these things, you can reduce the blast radius of these two things because now you are aware. So how do you do that? There are a couple of things that we start with.</p><p>First part is always a layered defense, where you start with your network layer. Your network layer is the first thing&#8212;your first defense layer&#8212;to protect yourself against any attack. So you can use services that are given by the cloud provider. For example, AWS has a service that is called AWS Shield Advanced. You can use that. Azure has a service. Google Cloud has a service. Every major cloud provider will have some service to protect you at the network layer&#8212;you start using that.</p><p>Then in your application layer, you start adding code for limiting the request. For example, you start implementing rate limiters based on the geolocation, or IP, or user.
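</p><p><em>A minimal sketch of such a per-user limit, with hypothetical values (a fixed window here; production systems usually prefer a shared store and sliding windows):</em></p><pre><code class="language-python"># Minimal sketch of a fixed-window, per-user rate limit. Values are hypothetical;
# production systems typically use a shared store (e.g., Redis) and sliding windows.

import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS = 100

_counts = defaultdict(int)

def allow(user_id: str) -> bool:
    window = int(time.time()) // WINDOW_SECONDS
    _counts[(user_id, window)] += 1
    return _counts[(user_id, window)] &lt;= MAX_REQUESTS
</code></pre><p>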
Maybe you say that if a user is making more than 100 requests per minute, he&#8217;s probably trying to attack my system, because that&#8217;s not an ideal flow of a user to call my system 100 times in a minute. So we&#8217;ll block that user.</p><p>And maybe some bot-type of protection. For example, Google has a bot which crawls every web page and collects the data for optimizing the search results. But Google&#8217;s bot makes sure that it is not overloading the server with a lot of requests to crawl the data. But there are bots that people write&#8212;bots that are made to overcrowd your server and keep collecting the data so that they can gain some added advantage for themselves with the data that they collect from you. So you should write your application layer to protect yourself against such bots.</p><p>Then your architecture has to have an upper limit on your auto scaling. So you cannot keep auto scaling to 100 servers for one service, right? Because if you&#8217;re scaling to that extent, that means there is some malicious activity going on your server suddenly. So you should always have an upper limit. Auto scaling is great until you realize that you&#8217;re auto scaling your DoS attacks.</p><p>Then the second thing on your defense would be having resiliency principles. For example, if you have a bigger application, you would always deploy it into multiple availability zones. Why? So that if one data center is under attack, you can completely shut down your service deployed on that data center, but still have your application up and running for users because your services are again in different data center&#8212;or maybe go multi-region.</p><p>Or these days, you can even go multi-cloud, but multi-cloud is not easy. You will have to consider a lot of things around multi-cloud.</p><p>Then there is your supply-chain security. These days, modern applications are dependent on a lot of external services, so you need to make sure that whatever service version that you are using, you have already validated the service for the security risk&#8212;and you are not auto-upgrading until you validate it&#8212;because those dependent services are the actual surface area that you are exposing to the attacker. That is the surface area&#8212;now the attacker can start attacking on that surface area. So that is the next thing that you look at.</p><p>Then you apply security by authorization, and by authorization you would always do a deny-by-default. You don&#8217;t say that I will allow everyone unless he has this role. No&#8212;you say that everyone is denied unless they have this particular access. So then you protect yourself.</p><p>Then your token should be short-lived. You don&#8217;t ideally create tokens that are living forever, right? So that even if the attacker has access to the token, he is only having access for a particular duration. He loses the access after the token has expired.</p><p>Then observability is the key. You should always have observability on your systems. You should never miss out observability and logging so that you don&#8217;t have visibility on things.</p><p><em><strong>4. Today&#8217;s architectures often depend on numerous third-party services and cloud providers, even if not by explicit choice.
How do you design a system that remains portable and robust when you&#8217;re relying on external SaaS APIs, cloud services, or even multiple cloud environments?</strong></em></p><p><strong>Archit Agarwal:</strong> I was expecting that question with all the recent AWS, Azure, Cloudflare outages that have been going on in recent months. And to be honest, every system depends on a lot of different external services&#8212;for example, your database, all your messaging queue, your SaaS APIs&#8212;all of these are external dependencies. And you cannot create an application in these modern days without having dependencies on at least one of them.</p><p>So I would say multi-cloud is not always feasible because it has its own challenges. There are business challenges, there would be some data-related privacy challenges, and you have cost challenges definitely&#8212;because if you have multi-cloud, you will have a lot of huge costs that you will have to invest.</p><p>So ideally, we don&#8217;t design to avoid dependency. We design so that if one dependency creates a failure, the whole system is not down. That is the core intention of designing things. There are a few principles that we usually follow, and I think most engineers would agree.</p><p>We have an abstract layer for each external service. For example, if you are talking to a storage service, we have an interface through which our application will talk. Now this interface can any day go ahead and update the dependent service and say that today I&#8217;m talking to AWS, tomorrow I&#8217;ll go ahead and talk to Azure. So it would be easier for us to keep switching the external dependencies without impacting our actual application. So this is decoupling the application from the external dependency.</p><p>Then we can use open standards and some cloud-neutral tools. Standard as in containerization, Kubernetes, telemetry; use some databases that are open-ended&#8212;for example, Postgres, MongoDB. And for cloud-neutral tools, you can go ahead with using Terraform, where you can deploy to different cloud providers any day&#8212;you can choose between any.</p><p>Single region is a single point of failure, and single cloud can also be there, but you will have to be cost-smart on using multi-cloud. You need to make sure that your disaster recovery model is in place. You don&#8217;t replicate all the services to different cloud. Only replicate the mission-critical services to different clouds so that your users don&#8217;t have impact on their daily very important critical task&#8212;but some tasks can still be offline for some time and it&#8217;s still OK for them.</p><p>You&#8217;ll have to plan that, and then unified observability. You cannot have observability divided over different cloud or different region. You should have one single place to look at logs, traces, and everything so that you don&#8217;t do the guesswork. You have a curated list of everything at one place.</p><h1>Practical Architecture Insights from Experience</h1><p><em><strong>5. You personally have experience building ultra-low-latency services, such as global authentication systems. What design principles and techniques are crucial for achieving sub-millisecond latency at scale?</strong></em></p><p><strong>Archit Agarwal:</strong> Ultra-low-latency systems look very simple from outside, but they&#8217;re a totally different type of structure that we are building. So I treat latency as the monthly budget that you have. 
Now, every network hop or any memory allocation that you do will take something out from that budget, so you will have to be very smart in choosing where to spend.</p><p>So you don&#8217;t ideally optimize for speed&#8212;you eliminate whatever is slow. Start eliminating whatever is slow. So there are a few key principles that I usually follow, and I try pushing my team to follow those.</p><p>One is: move the computation closer to the user. So your computation layer should be closer, or deployed into the edge location where the user is trying to access from. So let&#8217;s say I&#8217;m living in Bangalore and I&#8217;m trying to connect to a server sitting in the USA&#8212;I will have a lot of latency, right? So do that: fix the compute layer closer to the user.</p><p>Then avoid network hops completely in those hot parts where you want ultra-low latency. You cannot have network hops to different microservices. You always use in-memory everything. You don&#8217;t go to a distributed cache, you don&#8217;t rely on some other network server&#8212;because, again, you&#8217;re reducing the network hops.</p><p>Then you keep your service lean. You don&#8217;t use a lot of wrappers. For example, if you are using wrappers, those wrappers&#8212;finally&#8212;convert that into the native code only, right? So I would always recommend: remove those wrappers and directly communicate in the native language to the machine. That will improve the performance and reduce the latency time on your server.</p><p>Then improving your network layers&#8212;for example, reusing the HTTP connections will help. So you don&#8217;t really initialize HTTP connections again and again on your system. Then using the right protocols&#8212;so if your service-to-service communication you&#8217;re using maybe HTTP, it&#8217;s not good. You can use gRPC. gRPC is way faster than HTTP in service-to-service communication, so you choose that.</p><p>And then the last part is always the right hardware and the runtime that you&#8217;re running on. If your hardware is too old, too laggy, there is nothing that can solve the problem. You will have to fix the hardware also.</p><p><em><strong>6: If I asked you to summarize briefly, how do you ensure that pushing for extreme performance doesn&#8217;t compromise reliability or maintainability?</strong></em></p><p><strong>Archit Agarwal:</strong> Ideally, what I&#8217;ve observed in my experience till now is that, in an application, not more than 5% of the application actually requires that ultra-low latency. The 95% of the application is still OK with having a little more latency on that side.</p><p>So you only should optimize on that 5% which actually requires ultra-low latency. You cannot develop an application where everything is designed for ultra-low latency. So that 95%&#8212;I would always say&#8212;design it for readability and maintainability. But for the 5% which requires low latency, there we can still compromise on the readability and improve the latency there.</p><h1>Cracking the System Design Interview</h1><p><em><strong>7. System design questions are broad and open-ended, and probably that&#8217;s why they&#8217;re challenging. Do you recommend using any kind of structured approach or framework to tackle these interviews?</strong></em></p><p><strong>Archit Agarwal:</strong> System design interviews are not about memorizing a particular framework. It&#8217;s about thinking in a framework. 
Having a framework will never have a bad impact&#8212;it will only help you because now you are more calm, and you&#8217;re approaching the problem in a structured way without using buzzwords very initially in the conversation.</p><p>I&#8217;ve seen a lot of engineers come in to a system design interview and, as soon as I give a problem&#8212;let&#8217;s say, &#8220;design this system&#8221;&#8212;they start with, &#8220;let&#8217;s use microservices,&#8221; and start using distributed cache. But they didn&#8217;t understand what scale I want the system to be in. And when I asked, &#8220;How many users are you planning on this system?&#8221; they would ideally say 1,000 users or 10,000 users in a minute. But is that really needed? Is that really what I wanted? That&#8217;s not in alignment.</p><p>So I would always say: start with one to two minutes of quick alignment with the interviewer. Try and gather the functional requirement, where you basically get answers to two main questions: What are we actually building, and what does the user actually need? By this, you will understand what the database model is&#8212;whether the system is read-heavy, write-heavy, what type of system it is. Then you go into nonfunctional requirements. Now, nonfunctional requirements are the ones that actually drive the architecture.</p><p>So in nonfunctional requirements, you ideally collect data around the number of requests that you are planning on, the scale at which you are operating, the consistency that you are looking for, or is there any latency requirement there. Nonfunctional requirements are the ones that decide the architecture&#8212;not the other way around.</p><p>So yeah, I would say: consider the system design interview as two engineers discussing a problem. It should not be like you are getting interrogated by the other person. If you are asking the right questions in the initial one to two minutes, you have already impressed the interviewer. He&#8217;s already giving all the ears to you now&#8212;he&#8217;s listening to the conversation, and he&#8217;s also interested in giving his thoughts on that. After doing all this, now you can move to high-level design and get into the different parts of it.</p><p><em><strong>8. So according to you, how should candidates break down a complex design problem during an interview to ensure they cover all important aspects? I know part of it is asking those questions, but what else?</strong></em></p><p><strong>Archit Agarwal:</strong> Basically, when it comes to a system design, you should try and break that complex system into smaller pieces and then go to the high-level design.</p><p>So once you have got those questions answered&#8212;basically functional and nonfunctional requirements&#8212;then you start by introducing a very high-level design diagram, and then you start zooming into one piece at a time. For example, you have given the high-level architecture where you say that there is a user who is making a request to the API server. Which request goes to the service, and then the service makes the call to the database or maybe the caching layer, and the response is sent back. That&#8217;s a very high-level architecture that you have.</p><p>Now you start zooming in: What type of API gateway? Do you need a load balancer? Do you need multi-region deployment? 
And all these are answers that you have already collected from the nonfunctional and functional requirements&#8212;and this is how you start introducing your thought process.</p><p>And in this process, when you are trying to zoom into each piece, what you do is, ideally, you start discussing the trade-offs. For example, when you talk about database, you say, &#8220;I&#8217;m using a relational database.&#8221; Why are you using a relational database? Why not NoSQL? That is a trade-off that you should introduce in your conversation. Then why are you using EC2, not a Lambda service, right? So all these trade-offs are something that you start discussing, because system design, ideally, is about discussing the trade-offs.</p><p>So if you know the trade-offs&#8212;why you&#8217;re using a particular thing over the other&#8212;you have already made progress where the interviewer knows that this person knows things well. He knows his choices. He understands why to and when to make a choice.</p><p>So by this time, he will be very confident that this guy will be able to design an application which is operating at a Google scale. Maybe the application is as simple as a to-do application, but he will be able to take it to the scale level that we want.</p><p><em><strong>9. And if you turn the lens inwards a bit from your perspective as a system design interviewer, what is your process for evaluating a candidate&#8217;s depth versus breadth?</strong></em></p><p><strong>Archit Agarwal:</strong> So honestly, a system design interview is not about the diagram and memorized architecture. It&#8217;s about building a thinking muscle more, right? Most people try to study system design like a subject, but I would say: think of system design as a skill that you are adding to your bucket, right? It&#8217;s a skill you need to improve with structured and deliberate practice. Start with strong fundamentals&#8212;that&#8217;s what we just discussed, right? You should have strong fundamentals.</p><p>Then start practicing mock interviews. Take help of some person&#8212;maybe a mentor or a friend&#8212;who can sit down with you. You start designing one system design problem. For example, start with a URL shortener. Start discussing it with your friend or a mentor. And try to form a complete framework where you say that first, in any system design, I&#8217;ll get these things answered; then I&#8217;ll go to this part; and then I&#8217;ll go to this part. Try and do your system design practice in that particular framework so that you are very comfortable.</p><p>Be comfortable with the framework itself. You should not memorize the questions that you have to put in, because the questions will keep changing based on the system. But the framework should be good enough so that you have easy traversal through the problem, and it is easy for you to travel there.</p><p>Then work backward in a real-time system. So what I usually do is, I question myself on a few systems. For example, if we are using WhatsApp&#8212;everyone uses WhatsApp mostly, right?&#8212;so I would think about how WhatsApp is able to scale the messaging server. And now I will start exploring articles, blogs, engineering blogs around it, and start understanding how we can do that, right? Or maybe how Netflix is able to scale the streaming globally. That&#8217;s a complete different engineering challenge. How is Netflix able to do it? So start backward, think about the system, and then start researching about it.</p><p>Then start building things. 
Maybe you don&#8217;t do it at a global scale, but at least when you start building, you will understand the challenges around latency, or maybe race conditions, or all those constraints that you think about, right? You start feeling that, and you start solving that.</p><p>And then the last part is definitely: learn to communicate. Because if you don&#8217;t learn to communicate in system design interviews, you&#8217;ll not be able to excel there.</p><p><em><strong>10. But do you recommend any specific resources, books, or specific real-world exercises for mastering system design concepts and being interview-ready&#8212;especially for senior engineers aiming to showcase their expertise?</strong></em></p><p><strong>Archit Agarwal:</strong> See, for someone who is aiming for a senior role, I would definitely suggest a mix of a few things&#8212;starting from a book, real-world blogs, and then real-world exercises.</p><p>So for books, I would recommend you should definitely read <em>Designing Data-Intensive Applications</em> by Martin Kleppmann. That is a must-read book for any senior engineer who is aiming to excel in system design. Then there are books like <em>System Design Interview, Volume One</em> and <em>Volume 2</em> by Alex Xu, right? Those two are very good books. Then <em>Building Microservices</em> by Sam Newman.</p><p>So those are a few very good books that have been written. And if you read those books, you&#8217;ll get a lot of understanding on system design. Then you can refer to some engineering blogs by big tech giants. For example, Netflix has an engineering blog. Uber has an engineering blog&#8212;and all those big tech giants who are in the technical space and have big tech infrastructure to maintain always have engineering blogs. Go refer and read those blogs. Go to high-quality YouTube channels where they&#8217;re not just discussing the diagram&#8212;they&#8217;re discussing the concept, more depth into the concept. So refer to those channels, in case you want.</p><p>And then finally is designing a system which is time-tested, scale-ready, and you have done that. So system design interviews aren&#8217;t cracked by memorizing some answers. They&#8217;re cracked by building strong foundations, practicing real problems, and then thinking like an engineer, not an exam candidate.</p><p><em><strong>11. Even experienced engineers can stumble in design interviews. What are some of the most common mistakes or pitfalls you see candidates make&#8212;especially when they&#8217;re quite experienced and perhaps more confident than some others&#8212;and how can engineers avoid these mistakes?</strong></em></p><p><strong>Archit Agarwal:</strong> System design interviews are funny because people don&#8217;t fail because they don&#8217;t know what Kafka is, or maybe DynamoDB. They fail because of the way they communicate with the interviewer.</p><p>So I would say that if you&#8217;re having good communication&#8212;and you&#8217;re establishing that communication and having a two-way communication with the interviewer&#8212;that&#8217;s half of the job that is already done. I&#8217;ve seen engineers who jump directly into solutions as soon as they hear a problem where&#8212;let&#8217;s say I say, &#8220;design this system&#8221;&#8212;and they would start saying, &#8220;I&#8217;ll use Redis, I&#8217;ll use Kafka.&#8221; I would say, slow down. First, understand the scale constraints. 
For example, how many requests per second are we operating at, or how much data are we expecting per day flowing in the system? Or is there a security requirement?</p><p>For example, if you&#8217;re operating in a European country, you have different compliance on the personal identifiable information than in other countries, right? So you should start asking those constraints first and then start coming to a conclusion and architecting things, right?</p><p>And you probably don&#8217;t need to design at Google scale everything. It doesn&#8217;t have to scale to Google, right? There are things that are defined for small scale only. For example, let&#8217;s say there is an application that I want to design that is only to be used by my company&#8217;s engineers&#8212;it doesn&#8217;t have to go outside that. So why do I need multi-region deployment? I can do a local area network deployment and live with it, right? I don&#8217;t even need cloud there.</p><p>So those problems you need to understand. Then if you understand how many requests, how many servers would you need, or how big a database do you need, right? So if you start addressing those basic questions, I think you are already sorted and you are on the right track on that.</p><p><em><strong>12. Have you ever seen a case where the interviewee has asked too many questions? Has that ever happened?</strong></em></p><p><strong>Archit Agarwal:</strong> Yeah, I have once seen one interviewee who was asking too many questions, and that particularly gave me an idea that the question that I have probably asked him is something that he&#8217;s not aware of.</p><p>For example, I gave him a system. He didn&#8217;t have any idea about the system. He&#8217;s never thought about that. He might be using that every now and then, but he has not given it a thought. But it is OK. Let&#8217;s say if I&#8217;m interviewing a very junior engineer, he might not have thought about a lot of things by then, and if he&#8217;s asking too many questions, it is still OK.</p><p>But if he&#8217;s asking questions that are very small, and I think those are very basic for that particular level of engineer, then it raises a red flag. But asking clarification questions is perfectly OK.</p><p><em><strong>13. Now, as you&#8217;ve also said, a system design interview isn&#8217;t just about the final answer, right? It&#8217;s about how you communicate, how you adapt to the constraints you&#8217;ve sort of discovered during the conversation. Interviewers often value a candidate&#8217;s ability to clearly explain their thinking and reasoning&#8212;and the ability to adjust to constraints that are put in front of them mid-discussion, even. So in this context, how important are communication skills in these interviews, and what does good communication look like for a system design question?</strong></em></p><p><strong>Archit Agarwal:</strong> OK&#8212;so, honestly, communication is half of your system design interview. Or maybe it can be more. Let&#8217;s say if I am capable of designing a beautiful architecture in my head and I&#8217;m not able to communicate or explain it to the other person, the interviewer will see that architecture doesn&#8217;t even exist for them, right? Because you were not able to explain it to them.</p><p>So I have seen candidates who design very solid system design architecture, but they were either too quiet, or used too many jargons, or were too scattered in explaining the information. 
And in a system design interview, it is about how you communicate and explain to the other person the architecture that you are thinking about, because that gives insight into whether this person will be able to work with a team of architects, product managers, and junior engineers&#8212;whether they&#8217;ll be able to explain what they&#8217;re thinking. The system design interview is also intended to understand your communication skill as well.</p><p>On the technical side, there are a few things that I always suggest to everyone. Think out loud. You should not be silent for, let&#8217;s say, five minutes and you&#8217;re just thinking about the system. Start speaking whatever you are thinking. People need to know your brain&#8217;s commit history, basically&#8212;whatever you are thinking.</p><p>So maybe you are saying that, &#8220;I&#8217;m choosing this approach because of this thing,&#8221; or &#8220;given that this is the scale at which we are operating, this option makes more sense.&#8221; Start communicating your ideas. Maybe you are not communicating the right thing, which is good for the system&#8212;but once you communicate, when you read out loud your idea, you will automatically make more sense and you&#8217;ll auto-correct yourself, and it is perfectly OK if you&#8217;re auto-correcting yourself.</p><p>The interview should not feel like a monologue where you&#8217;re just speaking and the other person is listening. Because trust me, if that is happening, you should get the indication that you have already lost the session. So to do that, you will have to start structuring your answers. Basically, what you say is important, but how you say it is more important than that, right? So a good candidate would break the answer into multiple steps. Summarize things. Occasionally, start transitions&#8212;like, &#8220;Now I would go into, I would start discussing the data flow,&#8221; &#8220;Let&#8217;s start discussing the caching strategies,&#8221; these kinds of things.</p><p>Check if the interviewer is aligned to your communication or the approach that you are trying to follow, and make that interviewer feel that they are sitting with another engineer who is trying to collaborate and bring up a good system. That&#8217;s the intent that they want to see.</p><p>Your things that you say should not be meant to impress them. You are not there to impress them with a large amount of jargons that you say, or big words. You should be very clear, concise, and make sure that your communication is so clear that even if the other person is very junior to you, they can still understand. That&#8217;s the core of communication, right? Your communication should not only travel up the ladder; it should also travel down the ladder when you&#8217;re communicating.</p><p>Then listening is another advantage that you&#8217;ll have. If you&#8217;re not listening to the interviewer, you&#8217;ll not be able to respond to the feedbacks that they want to implement&#8212;or maybe you&#8217;ll not be able to adopt whatever they&#8217;re giving as feedback. So you should always try listening more to the feedbacks that the interviewer has.</p><p><em><strong>14. Some really excellent tips there, Archit. But what happens if an interviewer throws a curveball&#8212;say, suddenly the constraints change? You&#8217;ve sort of thought it through really well. 
You&#8217;re in the flow, you know you&#8217;re doing really well, the goal is almost in sight&#8212;but this new constraint or change in scope is just thrown at you. So what&#8217;s the best way to handle this kind of situation?</strong></em></p><p><strong>Archit Agarwal:</strong> To be honest, I love when an interviewer throws these curveballs. Now, why? Definitely they&#8217;re not easy. When you are into the system design, you are halfway through and you&#8217;re almost there, and something changes&#8212;it&#8217;s really frustrating.</p><p>But, to be honest, that&#8217;s the real-world scenario, right? You&#8217;re always designing things, and suddenly things will always change. Your actual work is also in that same sense. So if you are not able to adapt, then there&#8217;s no point designing architecture, right? So if an interviewer is giving you a curveball, think about it as a chance for you to showcase your adaptability according to the changing scenarios.</p><p>So here is how I would ideally approach it. I would not panic, and I would not go ahead and start defending my original diagram, right? I would first absorb what they&#8217;ve mentioned and then say, &#8220;OK, this changes these things. Now let me think about how we can adapt to this.&#8221; Now this gives the other person a hint that Archit is flexible and he&#8217;s not egotistical about his design approach, which is one good sign.</p><p>Then I would restate whatever they have mentioned to make sure that we are aligned on the same requirement change that we have seen. I&#8217;ll always reiterate in my own words, right?</p><p>Then the third thing that I&#8217;ll do is start highlighting what part of the system will have to undergo changes and what part will remain intact. This also gives a very clear understanding whether I&#8217;m able to structure the redesign approach&#8212;understand what part of the system can stay the same and doesn&#8217;t have to change.</p><p>The curveballs that the interviewer gives you&#8212;the changes that the interviewer gives you&#8212;will never be in a way that you will have to scrap the complete diagram, the complete architecture, unless you were already off the track, right? They want to understand: how do you plan what part of the system can remain as it is and what part of the system can change, and how flexible is your system to changes.</p><p>And if there is something that is complex, be honest. No one expects you to have knowledge on everything. So if there is something that is complex, think that you are in a two-way communication with an engineer. You can start speaking about it. If this is a complex thing, you can say that this is a bit complex and these are the trade-offs that we&#8217;ll have to make&#8212;and try and include the interviewer in your communication in those things.</p><p>So this is how you will succeed. System design interviews are not about being right all the time. They&#8217;re about how clearly you can think, how well you can explain, and how gracefully you can handle the changes.</p><p><em><strong>15. Candidates are expected to know advanced concepts that used to be considered niche, and this continues on very well from what you were just saying. So, for example, in a scenario like designing a location-based service, it may be assumed that you have knowledge of geohashing or spatial indexes. 
So how should candidates prepare for this breadth-of-knowledge challenge that has sort of become more and more expected?</strong></em></p><p><strong>Archit Agarwal:</strong> To be honest, the bar is definitely raised. Things that were once termed as &#8220;nice to know&#8221; are now considered things you should know at the same experience level. So I would not deny that fact, but here is the thing: I don&#8217;t think a candidate needs to be an encyclopedia on that side. If they are an intentional learner, it&#8217;s good enough&#8212;because no one can ideally learn everything. Tech space is too big for all that right now. There are a lot of things in tech space. No one can learn everything.</p><p>But having said that, in an interview, if you&#8217;re getting some question that is out of your league, you definitely will panic. So how I approach my learning and catering to those things nowadays is having four layers in your preparation module.</p><p>First thing: build extremely strong fundamentals. Your fundamentals are extremely important because any advanced topic you can name right now has always started from a basic system. There was a basic system which had some issues&#8212;that&#8217;s why this advanced system was innovated, right? So if you know the basics well&#8212;for example, you know how a database works, or how indexing in the database works&#8212;how can a distributed system fail, or what are the different consistency models, right? If you know these basics, it is more than enough for you to start establishing your knowledge in those advanced topics. So make sure that your fundamentals are very clear.</p><p>Then learn the advanced topics through real problems. I would not just go ahead and keep reading articles or books around those advanced topics. I would just say: let&#8217;s say I want to start understanding geohashing&#8212;so I would not just read about it; I would design a food delivery app to understand geohashing. If someone says that I want to understand Kafka semantics, don&#8217;t just read about it. Start defining or designing a real-time analytics system where you include this topic, and that&#8217;s how you will deepen your knowledge in these areas.</p><p>Now after all this, pick up two to three areas where you will go deep. Because personally, I believe you should have deep knowledge in one or two areas at least, because when you go into an interview, the depth of the knowledge is directly reflected&#8212;because that topic you will be speaking more, right? And trust me, any engineer who is interviewing you, if you go deep into one particular topic, they understand that this is some area that you are more interested in. And if you&#8217;ve gone to that depth, that means you are already an engineer who understands the gravity of things. So you can maybe think about systems that you can go deep into&#8212;like, for example, a distributed system, or a storage system, authentication system, or maybe go deep into performance engineering.</p><p>Then practice is important. Practice articulating how you can discuss the trade-offs. Maybe ask a friend to sit with you and talk to them on the trade-offs. So once you start communicating and your friend gives you feedback, you will start improving your communication skills on the discussion of those trade-offs. So that is the fourth thing that is very important.</p>
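<p><em>As a concrete example of learning a topic like geohashing by building something: the sketch below is a minimal Kotlin geohash encoder of the kind a toy food-delivery app might use to bucket nearby locations. The coordinates and precision are illustrative.</em></p><pre><code>// Minimal geohash encoder: interleave longitude/latitude bits (longitude first)
// and map each group of five bits to the geohash base32 alphabet.
fun geohash(lat: Double, lon: Double, precision: Int = 7): String {
    val base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
    var latRange = -90.0 to 90.0
    var lonRange = -180.0 to 180.0
    val out = StringBuilder()
    var bits = 0
    var value = 0
    var useLon = true
    while (out.length &lt; precision) {
        val range = if (useLon) lonRange else latRange
        val mid = (range.first + range.second) / 2
        val coordinate = if (useLon) lon else lat
        if (coordinate &gt;= mid) {
            value = (value shl 1) or 1
            if (useLon) lonRange = mid to lonRange.second else latRange = mid to latRange.second
        } else {
            value = value shl 1
            if (useLon) lonRange = lonRange.first to mid else latRange = latRange.first to mid
        }
        useLon = !useLon
        bits++
        if (bits == 5) {            // five bits make one base32 character
            out.append(base32[value])
            bits = 0
            value = 0
        }
    }
    return out.toString()
}

fun main() {
    // Two points a few hundred metres apart share a long common prefix,
    // which is what makes geohashes useful as index keys for nearby lookups.
    println(geohash(12.9716, 77.5946))
    println(geohash(12.9721, 77.5933))
}
</code></pre>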
<p></p>]]></content:encoded></item><item><title><![CDATA[Coroutines vs Virtual Threads and the Kotlin Java Decision in Practice: A Conversation with José Dimas Luján Castillo and Ron Veen]]></title><description><![CDATA[How to avoid &#8220;Java-style&#8221; Kotlin, modernize enterprise stacks with Jakarta EE, and evaluate modular monoliths, microservices, and Kotlin Multiplatform in real teams]]></description><link>https://deepengineering.substack.com/p/coroutines-vs-virtual-threads-and</link><guid isPermaLink="false">https://deepengineering.substack.com/p/coroutines-vs-virtual-threads-and</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Thu, 12 Feb 2026 07:58:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/5pfsenEI-bc" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kotlin has moved from &#8220;Android-first&#8221; to a practical option for Java teams that want safer, more concise JVM code without abandoning their existing Java investments. </p><p>In this conversation we speak with <strong>Jos&#233; Dimas Luj&#225;n Castillo</strong> and <strong>Ron Veen</strong>, co-authors of <em><strong><a href="https://www.packtpub.com/en-us/product/kotlin-for-java-developers-9781835884836">Kotlin for Java Developers</a></strong></em> (Packt). 
</p><p>Jos&#233; is a mobile-focused technologist with <strong>15 years</strong> of experience building <strong>Android, iOS, and Flutter</strong> applications and leading teams globally; he has worked on <strong>500+ mobile apps</strong>, written <strong>7+ development books</strong>, and taught at <strong>25+ universities</strong> across Latin America.</p><p>Ron is a seasoned <strong>JVM engineer</strong> with <strong>20+ years</strong> in the Java ecosystem, spanning <strong>mainframes to microservices</strong>; he&#8217;s an <strong>Oracle Certified Java Programmer</strong> and <strong>Sun Business Component Developer</strong>, serves as a <strong>special agent and lead developer at Team Rockstars IT</strong>, speaks at international conferences, and has authored books on <strong>Jakarta EE cloud-native migrations</strong> and <strong>modern concurrency</strong> (including virtual threads and structured concurrency).</p><p>Together, they discuss what it takes to make Kotlin a first-class citizen alongside Java in production: writing idiomatic Kotlin, choosing between coroutines and virtual threads, modernizing enterprise systems with Jakarta EE, navigating microservices versus modular monoliths, and adopting modern Android and cross-platform approaches such as Jetpack Compose and Kotlin Multiplatform.</p><p>You can watch the full conversation below or read on for the complete Q&amp;A transcript.</p><div id="youtube2-5pfsenEI-bc" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;5pfsenEI-bc&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/5pfsenEI-bc?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><h1>Kotlin and Language Evolution</h1><p><em><strong>1. From your perspective, what are the biggest benefits Kotlin offers to today&#8217;s senior developers compared to Java? And in what areas do you still find Java holding its ground?</strong></em></p><p><strong>Jos&#233; Dimas Luj&#225;n Castillo:</strong> When we started using Kotlin on Android, the difference was very obvious because Java was too verbose. Android is doing some things in mobile development with the Java characteristics. So, for example, it was very easy to see the real benefits&#8212;for example, null safety. It&#8217;s automatic. So in this case it was very, very fast. The interoperability with Java&#8212;because at the end, if you have legacy code in another language, it&#8217;s where you want to try a different language just because it&#8217;s modern, even if you have details with the language, because you will have a lot of problems if you change to a new technology, framework, or language. So interoperability at the beginning&#8212;obviously, we didn&#8217;t believe we had good interoperability. But when we tried it, we saw, OK, maybe I can do the next steps in my applications with Kotlin, but I don&#8217;t need to fight with the legacy code even if it&#8217;s in another language. So I think interoperability was a good point to start with Kotlin for mobile development, and obviously we have other things as pushing the programming in a very easy way to add it&#8212;or the synchrony. 
But I think those two points to start for any mobile development, when we try to start with Kotlin, it&#8217;s a good point. Well, at the end we need to remember the first step for Kotlin was mobile development, and it was very easy to be clear if I would start with Kotlin&#8212;but if Kotlin doesn&#8217;t use this null safety and interoperability, probably the adoption was slow or more complex for the people. Because, as I mentioned two minutes ago, the main problems are still there even if you change the language. So I think that&#8217;s my point for this comment.</p><p><em><strong>2. How should engineers decide when to use Kotlin versus Java when it comes to new projects?</strong></em></p><p><strong>Ron Veen:</strong> Yeah, I think my experience comes more from enterprise and not so much mobile development. What I&#8217;ve seen there is, well, there&#8217;s a natural eagerness from developers to learn new things. And like Jos&#233; said, when you switch from Java to Kotlin, it really feels&#8212;I don&#8217;t know&#8212;it makes programming a bit more fun again. That&#8217;s something I really found. I could see developers getting enthusiastic. I think one of the benefits is you actually have more concise code, so you write less code. And also, like mentioned, the code tends to be less error-prone. You know, null safety is really a big thing. You can actually see that a lot of frameworks are working towards that now, also with regard to new Java versions, but nullability will always be a thing in the JVM, and Kotlin takes care of that for us.</p><p>Now, <strong>why would you adopt this technology?</strong> Well, I think there could be a number of attractive points. Again, like I said, there will be less code&#8212;and less code is good: less code to maintain. There are less errors in the code. It might make your development team actually more attractive to new hires because you&#8217;re using a very modern language. And we shouldn&#8217;t forget Java really evolved. Java had Project Amber, where it added a lot to the core language&#8212;like records and sealed classes and pattern matching&#8212;things that already were in Kotlin. Java gets them added slowly&#8212;they&#8217;re getting added&#8212;but I think Kotlin is just always quicker with adding those new features that developers really crave.</p><p><em><strong>3. Many Java veterans fall into the trap of writing Kotlin in a Java style. What are some common mistakes or mindset shifts that Java developers need to overcome to write idiomatic Kotlin?</strong></em></p><p><strong>Ron Veen:</strong> I think that&#8217;s the thing we all get to at some point. We start with Kotlin and then we write it in this Java style, right? And this is not what we should do. It is a natural reflex&#8212;let&#8217;s be honest&#8212;because first: OK, I want to do a new language, but I also have a project to finish, or a task to finish, or a sprint to complete. But also, we&#8217;re still trying to write code in the way we know it and just use the little things. So that can kind of give you some problems there.</p><p>But sometimes, as a developer, you have to be willing to relearn the language. You know Java, and now you&#8217;re going to do Kotlin. You can do everything in Java style in Kotlin, but you really should try to relearn the language. It&#8217;s not a drop-in replacement. 
You really have to be willing to learn.</p><p>I can remember I once pitched it at a project and developers really got, &#8220;OK, this looks good, this looks fine.&#8221; And then, just to force myself to really do it in a Kotlin way, I tried to shrink the number of lines that were in there. The original from 250 went down to under 100 or something, and that was not technically needed, but it was: &#8220;OK, how can I leverage Kotlin&#8217;s native way of doing things and make things faster?&#8221; So sometimes you just have to pick up a piece of existing code and decide to rewrite it completely&#8212;and forget about everything you know about Java.</p><p><em><strong>4. Could you share best practices to avoid &#8220;Java-esque&#8221; Kotlin and fully leverage Kotlin&#8217;s language features?</strong></em></p><p><strong>Jos&#233; Dimas Luj&#225;n Castillo:</strong> As Ron mentioned, that&#8217;s the thing. When we start with Kotlin, I think 99% of developers start by just translating the code. That&#8217;s the problem, but it&#8217;s part of the beginning&#8212;because we have one line of code in Java and we want to see how we need to write this line of code in Kotlin.</p><p>But later you hear about the benefits in Kotlin, and if you are translating each line, you will see you don&#8217;t have less code&#8212;you have the same code in a different way. Maybe it&#8217;s easier to read, but it&#8217;s the same code. But when you start asking that kind of question, you will see the benefits because you will start trying to look inside how Kotlin is doing that. And you will see five lines or 10 lines in Java&#8212;maybe it&#8217;s two lines in Kotlin, or three. So you will see the real perspective from Kotlin, and this is the first step, I think, to look at good practices.</p><p>For example, you will see not only fewer lines or differences, but the achievement in these cases. For example, immutability first&#8212;you will see, <em>&#8220;OK, I need to think in immutability first,&#8221;</em> because it&#8217;s a different way to start the problem. So, for example, you are automatically using good practices in the general programming work without doing anything, because you don&#8217;t know sometimes, or you don&#8217;t have it clear. When we are coding in Java, we are not thinking in immutability first. So after a couple of projects, or some examples you are trying to execute, you will start seeing you are using good practices sometimes&#8212;because it&#8217;s natural here in Kotlin, like immutability.</p><p>But now you can do the same thing in other languages, even Java, OK? But you need to do something to have this in your code. In Kotlin, sometimes you do good practices by default. You don&#8217;t have to change the code to have it because, by default, you have these good practices. And you will see a lot of those cases&#8212;not just immutability. For example: the collections and streams, the lambdas. Lambdas are the same case. Lambdas are not something advanced when you are learning other languages; it&#8217;s not the first subject, the lambdas, for example. And here in Kotlin, when you are starting the code, you will see lambdas since the beginning sometimes. Even if you don&#8217;t know it&#8217;s a lambda&#8212;you know it&#8217;s code&#8212;but later you will see it&#8217;s a good point to start doing something. So I think that&#8217;s a good way to start using good practices, and sometimes you don&#8217;t even know you are using these good practices.</p>
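<p><em>A small, invented illustration of the shift Jos&#233; and Ron describe&#8212;the same logic written first in a Java style and then leaning on the immutability, null safety, and collection operations the language gives you by default:</em></p><pre><code>data class Order(val amount: Int, val paid: Boolean)

// Java-style Kotlin: mutable accumulator, explicit loop, manual null checks.
fun totalJavaStyle(orders: List&lt;Order?&gt;): Int {
    var total = 0
    for (order in orders) {
        if (order == null) continue
        if (!order.paid) continue
        total += order.amount
    }
    return total
}

// Idiomatic Kotlin: immutable values; the collection operations do the work.
fun total(orders: List&lt;Order?&gt;): Int =
    orders.filterNotNull()
        .filter { it.paid }
        .sumOf { it.amount }

fun main() {
    val orders = listOf(Order(100, true), null, Order(50, false), Order(25, true))
    println(totalJavaStyle(orders))  // 125
    println(total(orders))           // 125
}
</code></pre><p><strong>5. 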
I&#8217;d like to get your perspective on concurrency. Java&#8217;s recent releases have introduced virtual threads and structured concurrency to simplify multithreading. According to you, how should engineers approach concurrency now? For instance, when building a high-throughput service, how do you choose between Kotlin&#8217;s coroutine model and Java&#8217;s Loom-based virtual threads?</strong></p><p><strong>Jos&#233; Dimas Luj&#225;n Castillo:</strong> Well, I think the way to&#8212;maybe it&#8217;s not the easy way to understand that&#8212;but I think if you are coming from Java, you can see concurrency as a model. It&#8217;s a&#8212;we have a definition about how we can do some stuff&#8212;and coroutines is a model, a structure, to think about it.</p><p>What are the important or main parts for this? You have cancellation, you have propagation, you have control of the whole cycle. That&#8217;s the thing. You don&#8217;t need to think too much in all the scenarios because you have the structure. So I think that&#8217;s the main concern when we are trying to understand, because sometimes you need to think by yourself in all these scenarios, but not in this case. Coroutines is very clear: you have the ingredients to start doing anything&#8212;OK&#8212;with coroutines.</p><p>Versus, for example, virtual threads: you will have a way&#8212;a very simple model&#8212;for example, it&#8217;s very traditional. We have it very, very clear, because the thread is a way to do the things. We have excellent things, but at the end, they can work together, because maybe you can use one for some specific problems and you have other options for other specific things.</p><p>It&#8217;s not a model about who is better, because that&#8217;s another thing. A lot of people, they want to know the faster way to do that: &#8220;Let&#8217;s go to the way to do that.&#8221; And it&#8217;s not the case about, in this case, coroutines. If you have some specific problems and you don&#8217;t have the model, well, think about: we have coroutines. But if you have a very complex situation, for example, and you need to put less complexity in your system, maybe you can use threads, for example.</p><p>That&#8217;s the thing: it&#8217;s not too complicated to use, for example, the Java virtual threads, because if you have a very complex structure with layers, probably this is the advantage&#8212;you can do that. But if you don&#8217;t have that kind of level of complexity, you can use coroutines, for example, for other things.</p><p>And at the beginning, obviously, the main complexity when you are using the JVM to do that kind of task&#8212;the problem is, you have a monolithic focus or scope for that, and now you don&#8217;t need to think just in the monolithic way to have an answer. I think that&#8217;s my comment for this.</p>
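<p><em>A minimal sketch of the two models side by side, assuming kotlinx.coroutines on the classpath and a Java 21+ runtime. Both start ten thousand cheap concurrent tasks&#8212;one with suspending coroutines, one with plain blocking code on virtual threads:</em></p><pre><code>import java.util.concurrent.Executors
import kotlinx.coroutines.delay
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

fun main() {
    // Kotlin coroutines: structured, cancellable, scheduled by the coroutine runtime.
    runBlocking {
        val jobs = List(10_000) {
            launch { delay(100) }   // suspends; no platform thread is blocked
        }
        jobs.joinAll()
    }

    // Java virtual threads (Java 21+): ordinary blocking code, scheduled by the JVM.
    val executor = Executors.newVirtualThreadPerTaskExecutor()
    repeat(10_000) {
        executor.execute { Thread.sleep(100) }   // blocking a virtual thread is cheap
    }
    executor.close()   // waits for the submitted tasks to finish
}
</code></pre><p><em>Neither half is a benchmark; the point is only that both models make very large numbers of concurrent tasks affordable, which is the mindset shift discussed in the next answer.</em></p><p><em><strong>6. 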
How do you see these two concurrency paradigms (virtual threads and coroutines) complementing each other in practice?</strong></em></p><p><strong>Ron Veen:</strong> Yeah, first, we should say that I think both virtual threads and coroutines basically try to solve the same problem, which is executing async code, but making it appear sequentially in the code so the developer can really reason about, &#8220;OK, what is the flow of my application?&#8221; So there&#8217;s not this thing called callback hell, where we have a million places where there are callbacks all through the code.</p><p>Both systems, I think&#8212;really, even though in the end they&#8217;re running on native operating system threads, that&#8217;s both virtual threads and coroutines&#8212;they&#8217;re actually managed by the runtime itself, and it just switches the threads to execute things. So you can really see that throughput is a lot higher with both of them, where, on the side of virtual threads, it is really good for blocking&#8212;when there&#8217;s blocking things in between. You shouldn&#8217;t&#8212;like, if you have very memory- or CPU-intensive actions, I don&#8217;t think virtual threads, in general, are a good solution because it won&#8217;t give away control.</p><p>But in general, like Jos&#233; said, I don&#8217;t think it&#8217;s like a race between, &#8220;Oh, you should use this,&#8221; or &#8220;You should use that.&#8221; I think you should sometimes just look at what is best for your specific situation. Of course, virtual threads, they also have structured concurrency, which they build upon, which gives you a very nice framework&#8212;so that could actually be a good reason to choose.</p><p>Again, if you have these difficult, really difficult use cases in general, I think both virtual threads and coroutines are a paradigm shift for developers in their mindset because, normally, as Java developers, we were always told, like, you know, threads are really expensive. So you should be careful with it. We should pool the threads and stuff like that.</p><p>And I think what we see now is, with both of these things, that we can actually&#8212;calling a coroutine or calling a virtual thread is just as cheap as calling another function in your code. So it&#8217;s also a mindset of where you want to use this in your application.</p><h1>Enterprise Architecture and Team Practices</h1><p><em><strong>7. Enterprise Java is undergoing a renaissance with the release of Jakarta EE 11, which brings a modernized, cloud-optimized platform&#8212;introducing features like the Jakarta Data API and aligning with Java 21&#8217;s virtual threads for scalable concurrency. Ron, given your experience with cloud-native migrations, what do these developments mean for teams looking to modernize legacy Java or Jakarta EE systems?</strong></em></p><p><strong>Ron Veen:</strong> Yeah, I think with Jakarta EE 11, the whole ecosystem made an enormous step because now, for the first time, Java 17 is the baseline language level. So we&#8217;re really optimizing here. That means suddenly we can use things like records. Now we can use the switch expression, we can use sealed and non-sealed classes&#8212;all these things which were added by Java via Project Amber&#8212;they&#8217;re now actually available to also use on the enterprise side. So I think that&#8217;s already quite interesting.</p><p>Like you said, there&#8217;s this new API, Jakarta Data, which has this repository-based approach. 
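</p><p><em>For readers who have not seen the repository style yet, a minimal sketch in Kotlin&#8212;the entity and names are invented, and the exact interfaces are worth checking against the Jakarta Data specification for your runtime:</em></p><pre><code>import jakarta.data.repository.BasicRepository
import jakarta.data.repository.Repository

// Assumed to be a JPA entity elsewhere; persistence annotations omitted in this sketch.
class Book(var id: Long = 0, var title: String = "")

// The container (a Jakarta EE 11 / Jakarta Data runtime) generates the implementation.
@Repository
interface Books : BasicRepository&lt;Book, Long&gt;

// Typical use inside a CDI bean, with the repository injected:
//   @Inject lateinit var books: Books
//   books.save(Book(1, "Kotlin for Java Developers"))
//   val found = books.findById(1L)   // Optional&lt;Book&gt;
</code></pre><p>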
And for people familiar with, for instance, the Spring Framework, it&#8217;s very close&#8212;not similar, but very close&#8212;to the repository pattern that Spring uses. So I think that is really good.</p><p>They also came&#8212;well, they already came earlier&#8212;with the Core Profile. Jakarta EE has multiple profiles, and the Core Profile is very interesting because it&#8217;s a very slim runtime that you&#8217;re getting then, which means it&#8217;s ideal for microservices situations.</p><p>But yeah, I really think it uses great chances. It promises Java 17 as a minimum standard, but there&#8217;s also this technology compatibility kit that goes to Java 21. And then you&#8217;re right&#8212;suddenly virtual threads also come within reach. And actually, using virtual threads, you have to run Java 21, of course, because that was the first official release where it was a final version. But there&#8217;s this thing called ManagedExecutorDefinition, which has this property &#8220;virtual,&#8221; and if you only set that, you can actually use virtual threads in your Java applications&#8212;or Java EE applications. So I think they&#8217;re making a real big step.</p><p>And about the migration part&#8212;just to get back to the question about the migration part&#8212;I think there&#8217;s many steps that you can actually take. But if it is a simple upgrade, you should first see: Am I already on Jakarta EE at all, or am I still on Java EE?</p><p>Now, if you&#8217;re still on Java EE, then there are multiple ways to migrate your sources, right? There&#8217;s this book&#8212;you&#8217;re right&#8212;from Packt Publishing, where we actually outline a number of these things. There are tools that you can use. There are even dedicated plugins for IntelliJ, for instance, that can help you a lot. It&#8217;s a lot about namespace conversion, but there are a few other tricks as well.</p><p>Now, if you are already on the Jakarta EE part, then I think upgrading is actually quite simple&#8212;basically upgrading your Java version. And this Jakarta Data project is actually quite interesting, and I would just advise architects and developers to say, &#8220;OK, use it if you&#8217;re building a new feature,&#8221; because it&#8217;s completely backwards compatible with JPA. So it has the same&#8212;you know&#8212;so there&#8217;s nothing new you need to change. If you write something new with new database tables, just try to use this Jakarta Data for those specific situations and services.</p><p><em><strong>8. How can architects best leverage these new tools&#8212;like repository-style data access and virtual threads&#8212;when migrating monolithic applications to cloud-native microservices, and what common pitfalls should they watch out for during such transitions?</strong></em></p><p><strong>Jos&#233; Dimas Luj&#225;n Castillo:</strong> Yeah&#8212;for example, about the first part, the modernization: it&#8217;s clear the enterprise is not dying. We are evolving everything. That&#8217;s the first part because, obviously, a lot of people love these phrases to put in a blog. 
But now, I think the enterprise is very clear&#8212;it&#8217;s evolving.</p><p>And the idea is, obviously, we want to create good tools for everyone, and we want to create good code for legacy, because at the end you will keep using it.</p><p>And about microservices and monoliths&#8212;obviously, you will find a lot of discussions on the internet and Reddit and blogs and YouTube channels, and everything about what is better for your project or your company. The thing is: the problem is when you are just reading without the context, and you need to understand each context is different. I think that is the first part.</p><p>The second is: microservices, in my experience, is not the goal, OK? It&#8217;s the consequence for that. Because when you always see microservices as the goal for each software developer or architect, probably that&#8217;s the issue&#8212;that&#8217;s the problem. Because if you don&#8217;t have developers with experience, and you don&#8217;t have a good architecture definition, you will have microservices with the same problems as monoliths. That&#8217;s the thing.</p><p>The other part is: the monolith&#8212;in these cases, people are looking back sometimes, in some cases, at monoliths as old code. And obviously, that&#8217;s a lie, because you can use monoliths without problems for huge systems, and you can save the complexity of microservices. You can use monoliths to scale. Actually, you don&#8217;t have to always use microservices to scale. It&#8217;s easier to understand and to maintain this because you have less complexity in your code.</p><p>So the question to use&#8212;&#8220;Hey, do I need to use microservices, yes or no?&#8221;&#8212;for me, I always try to answer a question before, or maybe two questions<strong>: Are we prepared for paying the real cost to use microservices&#8212;not just the money?</strong> (and) <strong>are we prepared to use this architecture or not?</strong> Because if you are doing this before you have the knowledge for that, probably you will have a huge problem. Because it&#8217;s not too easy to change your architecture, change your definitions, your business rules, and everything&#8212;and then you need to roll back to monolith because you are not prepared.</p><p>That&#8217;s my concern, obviously, when I work in teams and they are just focusing on changing to microservices because they read it in a blog. OK&#8212;that&#8217;s my recommendation, always.</p><p><em><strong>9. What criteria or signals help you decide if a modular monolith might be a better choice for long-term maintainability, and how do you manage the trade-offs for scalability and team ownership?</strong></em></p><p><strong>Ron Veen:</strong> Well, this is really kind of a subject very close to my heart because I guess we&#8217;ve all been through this wave&#8212;there was this FAANG group: Facebook, Apple, Amazon, Netflix, Google&#8212;and they all came with microservices, right? So we thought, &#8220;This is the way we need to go. This is the wagon we need to jump on.&#8221; And I think a lot of us did. And like Jos&#233; said, I think we also found maybe it&#8217;s not the right choice, because you should ask yourself: if you have 200 employees, are microservices&#8212;for your business&#8212;the right solution? Can you actually afford it?</p><p>Because <strong>microservices, on the DevOps side, require a lot more overhead</strong>. There&#8217;s a lot more monitoring involved. 
You need to watch these services, you need to see what is coming out, so there are a lot of metrics needed to keep them running in production. You need to make sure they&#8217;re still running in production, because before you had just one monolithic application.</p><p>And another thing is that <strong>monoliths have become like a negative name</strong>, right? Like dinosaurs almost&#8212;which it isn&#8217;t. I mean, it&#8217;s still brilliantly functioning code running on application servers that have been there for a very long time. So there&#8217;s nothing bad about them. They&#8217;re just being opposed to being &#8220;old&#8221; compared to microservices.</p><p>So you have one running, and that&#8217;s easy to monitor, it&#8217;s easy to do logging, it&#8217;s easy to do debugging as a developer&#8212;you know, you just do it remotely. But when you&#8217;re switching to a microservices platform, you&#8217;ve got dozens or hundreds of services running. You need to make sure each of them is still running. And if you trace a problem, you need to somehow combine all the logs so that you can actually go through the logging and try to find out what happened. So there&#8217;s a lot of work there trying to debug a problem with microservices&#8212;they&#8217;ll keep you busy for a bit. So there are a lot of things there that are happening, I think, where you have to be really careful.</p><p>And yes, like Jos&#233; said, it should never be a technology choice, but it should really be a choice based on what we need. From my experience, what I have learned is: I would start with what I would call modular monoliths. The classical monolith is where all the modules are intertwined without very clear boundaries. If you have a modular monolith, basically what it would mean is you still have your modules, but they can&#8217;t interact directly with one another. There&#8217;s a predefined API somewhere, and one module is only allowed to access another module via this predefined API, which really makes the code less cluttered and far less risky to change&#8212;so things will break less.</p><p>Because if I would go find an API, then the API will change and I will see it&#8212;or the API won&#8217;t change, but I can change internal methods without something breaking there. So my suggestion would always be: start with this modular monolith, be very clear on your boundaries, force that we go through APIs, and then just monitor the system&#8212;watch it.</p><p>And if there are typical situations&#8212;like if you find there&#8217;s just one module, let&#8217;s say a customer module, that requires a lot of changes, which means a lot of redeployment&#8212;then that might be a case to say, &#8220;Well, maybe I should factor this out, make it a microservice,&#8221; and combine the two. There&#8217;s nothing against combining microservices with a modular monolith. Or maybe there&#8217;s different resource utilization, different scaling requirements&#8212;well, that could be another case where you say, &#8220;Okay, if this is really the case, maybe I should factor it out.&#8221;</p><p>But even then, I would say: look at how much the costs are, and only when the cost of keeping it in starts to outweigh the cost of switching to microservices&#8212;that would be the moment to make this choice. But again, microservices require a different mindset. You&#8217;re also getting the distributed tax that you&#8217;re paying, and that can be really, really expensive, and you should be really careful. 
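</p><p><em>A minimal Kotlin sketch of the &#8220;predefined API between modules&#8221; idea described here&#8212;the names are invented; the point is that other modules depend only on the small interface, while everything marked internal stays free to change:</em></p><pre><code>// The only thing other modules may depend on: a small, stable API.
interface CustomerApi {
    fun creditLimit(customerId: String): Int
}

// Everything below is internal to the customers module and free to change.
internal class CustomerRepositoryStub {
    fun find(id: String): Int = 5_000   // stand-in for a real lookup
}

internal class CustomerModule(
    private val repository: CustomerRepositoryStub = CustomerRepositoryStub()
) : CustomerApi {
    override fun creditLimit(customerId: String): Int = repository.find(customerId)
}

fun main() {
    // An ordering module sees only CustomerApi, never the internals.
    val customers: CustomerApi = CustomerModule()
    println(customers.creditLimit("c-1001"))
}
</code></pre><p>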
But I guess then we&#8217;ll get into the realm of domain-driven design and bounded context, which might be a bit too much for now.</p><p><em><strong>10. It has been said that introducing Kotlin into an established Java codebase isn&#8217;t just a technical change but a cultural one. Key advice from Kotlin adoption experts is to &#8220;win the hearts and minds&#8221; of skeptical Java developers&#8212;not only through hard facts (letting the improved code quality speak for itself) but also via soft factors like easier onboarding and community support. Showing Kotlin&#8217;s concrete benefits&#8212;for example, its focus on safety and conciseness that addresses fundamental Java shortcomings while staying 100% interoperable&#8212;can help gain buy-in. Drawing from your teaching and leadership experience, what strategies do you recommend to gradually introduce Kotlin in a Java-centric organization?</strong></em></p><p><strong>Ron Veen:</strong> Yeah, I think what you always find when you introduce something like Kotlin into an organization is that there will always be some gurus in the company who are very focused on Java, who know the inner details, and who might actually see bringing in a new language as a threat to their supremacy. I think the most important thing is to get them on board, because in general I would expect that 60&#8211;70% of developers would be really interested and say, &#8220;Oh yeah, we&#8217;re going along with this.&#8221; Winning over the skeptics is the harder part, and it doesn&#8217;t happen by decree.</p><p>What we sometimes did is put together a team with some enthusiasts and some critical people, and let them develop something new. If you&#8217;re doing microservices, that&#8217;s ideal, because you can choose whatever technology you want for a new service, so Kotlin would be a great choice there. So that would be a good approach: have this team of skeptics and enthusiasts and see how they work together on the problem. Because if you only involve one group or the other, you&#8217;re not really getting everyone on board.</p><p>That would be a really good approach, I think. Another thing I have sometimes done&#8212;you have this thing called Advent of Code, right? Right now, in the period leading up to Christmas, there are new coding challenges every day. You could say with your team, &#8220;You know what? We&#8217;re going to do Advent of Code, and this year we&#8217;re going to do it in Kotlin,&#8221; and see how that works out. So that could be a thing.</p><p>Of course, you can do the same with hackathons in your own team and say, &#8220;We&#8217;re doing these coding sessions, and let&#8217;s try to explore how we could use Kotlin here.&#8221; And finally, if you do code reviews with the whole team, you can also go through the Kotlin and gradually explain: &#8220;OK, what are we doing here, and why are we doing it this way in Kotlin?&#8221; That makes people who have always done it the Java way really see the difference. Collections are a good example, because Kotlin has such a rich set of collection functions, and then everyone in the review, including the skeptics, can see, &#8220;Oh wow, this is actually quite nice,&#8221; and how concise the resulting code is; the short example below gives a sense of that.</p><p>Enforcing it upon someone, I don&#8217;t think it&#8217;s ever going to work. 
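</p><p><em>To make the point about collection functions concrete, here is the kind of before-and-after such a review might walk through. The Order type and the data are made up purely for illustration.</em></p><pre><code>data class Order(val customer: String, val amountCents: Long, val paid: Boolean)

fun main() {
    val orders = listOf(
        Order("alice", 1200, paid = true),
        Order("bob", 800, paid = false),
        Order("alice", 300, paid = true),
    )

    // Ported more or less literally from Java: explicit loop and mutable state.
    val unpaidJavaStyle = mutableListOf&lt;String&gt;()
    for (order in orders) {
        if (!order.paid &amp;&amp; !unpaidJavaStyle.contains(order.customer)) {
            unpaidJavaStyle.add(order.customer)
        }
    }

    // The same query with Kotlin's collection functions.
    val unpaid = orders.filterNot { it.paid }.map { it.customer }.distinct()

    // Grouped totals in a single expression.
    val totalsByCustomer = orders
        .groupBy({ it.customer }, { it.amountCents })
        .mapValues { (_, amounts) -&gt; amounts.sum() }

    println(unpaidJavaStyle)   // [bob]
    println(unpaid)            // [bob]
    println(totalsByCustomer)  // {alice=1500, bob=800}
}
</code></pre><p>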
At least they&#8217;ll probably do it, but their heart won&#8217;t be in it. So I think, in general, with these steps you can actually win their hearts.</p><p><em><strong>11. How can engineering leads support their teams&#8217; upskilling and address any resistance, ensuring a smooth transition without disrupting productivity?</strong></em></p><p><strong>Jos&#233; Dimas Luj&#225;n Castillo:</strong> I think the good news, or the bad news, depending on how you see it, is that this is mostly a leadership issue. You need a very clear starting point and someone who can say what the next steps are. If you don&#8217;t have that follow-up from leadership, you will probably end up with a mess in some cases.</p><p>That&#8217;s good news if you already have that person on your team, because then it&#8217;s straightforward. If you don&#8217;t, you need to grow that person or hire them, and that&#8217;s the hard part. Maybe you have people with good leadership skills but without the knowledge needed for the adoption.</p><p>One of the most common errors when you try to do this is not starting small. Trying to migrate everything, or the most complex part of your legacy code, is probably not a good idea. So my recommendation, when companies ask me about something like this, is to start with the small modules. You also need to check that the people writing the new code really understand the business, because sometimes they know the new business but not the old one, and with legacy code it is often the old behavior that matters.</p><p>The other part is: it&#8217;s a new language, a new paradigm, new code. You need to watch for adoption problems, because sometimes people understand the features but not the real benefit. They understand the modern things, or the faster way to do things, but that is not the same as the best way to do them. So don&#8217;t be closed in these situations. We may decide to do the migration, but I am not going to claim that my way is the best way, because the team can probably find a better way; they are the ones with their hands on the code. If you are very closed about it and tell the team, &#8220;Just follow what I am doing,&#8221; you will probably miss a lot of good contributions from the team.</p><p>So the other part is: just go and talk. It sounds obvious, but the team doing the migration is usually the best source for understanding the situation. You can imagine how the system works, but only when you are rewriting the code do you see how it really works. Maybe some code is good and you don&#8217;t need to rewrite it. Maybe you should rewrite a different module instead, because that is where the benefit is, and the benefit of changing this one is not big enough. Sometimes that happens.</p><p>The problem is: if you think that leadership, or any single person, knows everything about the project, you probably don&#8217;t have the right picture. People need to put their hands on the code for that.</p><p>And just to close this: you need to think about your team. In these cases, you need to trust that team to take some decisions. 
Obviously they need to explain things to you or to the company, but they are probably the best source for understanding the problems in those cases.</p><p>And you need to give the teams space to play with new toys, because they probably have the answers, they just don&#8217;t know it yet. If you let them play with some new features or some new code, you will probably find good things in that.</p><p>And obviously don&#8217;t try to measure productivity with simplistic rules. If you read it as &#8220;more lines of code or fewer lines of code,&#8221; you will probably miss something, so my recommendation is to avoid defining productivity that way. Just keep trusting the team and decide together whether a change is good for the business or not. But if you don&#8217;t open the code, read it, and play with it, you don&#8217;t know whether the adoption is a good idea or not.</p><h1>Mobile Development and Cross-Platform</h1><p><em><strong>12. The Android development landscape is in constant flux. As of 2025, we&#8217;ve seen the rise of Jetpack Compose for declarative UI, more sophisticated modularization of apps, and renewed emphasis on clean architecture principles for maintainability. Based on your experience building hundreds of mobile apps, what do you consider the three most important architectural practices or patterns for modern Android development?</strong></em></p><p><strong>Jos&#233; Dimas Luj&#225;n Castillo:</strong> I think in the last two years we have really seen why it made sense to start using Kotlin, and Jetpack Compose is the main reason.</p><p>If I have to name three keys: the first is to separate responsibilities. Even outside Android, if you can separate responsibilities in your code it is always a benefit for everyone&#8212;testers, developers, product owners&#8212;because each part becomes easy to understand.</p><p>The second is to define modules&#8212;not because it is the fashionable way to do things. It may sound like &#8220;we need to separate everything,&#8221; but that is not the point. The point is that if you modularize your project, it is very easy to add a feature as a module, or to delete or change a module (the Gradle sketch further below shows one common way to lay this out).</p><p>It also helps in emergencies: maybe you have a critical bug in one part of the code, but if you have modules you can isolate that part and keep the rest of the functionality running.</p><p>And the third one: the architecture will change. You need to prepare your architecture for change, because when we create code we assume it will always stay the same&#8212;that is the problem.</p><p>It is very hard. People rarely build architectures that are ready for change, because we don&#8217;t have the time, the money, or the team for it, but I think great developers and great leadership are always preparing the project for change&#8212;not just for the next framework.</p><p>Because if you prepare your project only for the current framework, then when the paradigm changes&#8212;in this case to Jetpack Compose, which is a really different way to build mobile applications&#8212;your code is probably not ready for those changes. That&#8217;s the deal, and that&#8217;s the problem. 
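</p><p><em>One common way to express that module split is in the Gradle settings file of the project. The layout below is a hypothetical sketch in the Gradle Kotlin DSL, not a prescription; real projects would add their own build logic per module.</em></p><pre><code>// settings.gradle.kts for a hypothetical modular Android project.
// Each feature lives in its own Gradle module, so it can be built,
// tested, replaced, or temporarily disabled on its own.
rootProject.name = "shop-app"

include(
    ":app",                // thin shell: navigation and dependency wiring only
    ":core:designsystem",  // shared theme and UI components
    ":core:network",       // HTTP clients and serialization
    ":feature:catalog",    // browse products
    ":feature:checkout",   // payment flow
    ":feature:profile",    // account settings
)
</code></pre><p>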
So now you will see a lot of mobile applications where the new features are built with Jetpack Compose, but the old parts will probably never migrate because they are not ready for it. They would be waiting for a miracle to change those modules just for the new Jetpack Compose implementation, for example.</p><p>So those are my three keys. And in my own projects, whenever I am about to adopt a new technology, I always ask: what problems will we actually solve with it? What is the cost of adoption? And what is the long-term impact?</p><p>Those are the three main variables I keep in mind, and I always try to put a number on each: how many problems we will solve; the cost of adoption, meaning how many people and how many resources we need; and the impact we will have, whether long term, mid term, or short term. With those three variables I build a table and share it with the team, and after that the situation is very clear. We decide together whether to follow the new technology. That&#8217;s what I have done in the past.</p><p><em><strong>13. Jos&#233;, with your background in Android, iOS, and cross-platform frameworks like Flutter, how do you see Kotlin Multiplatform fitting into real-world projects today?</strong></em></p><p><strong>Jos&#233; Dimas Luj&#225;n Castillo:</strong> I think we are in a very interesting situation in mobile development right now, because we finally have serious contenders. We had plenty of options in the past too, and I am not saying I am not a fan of JavaScript, but early multiplatform development mostly meant JavaScript frameworks, and in the end that was not real multiplatform.</p><p>Now we have real multiplatform options: we have Flutter, we have SwiftUI, and we have Kotlin Multiplatform on the Android side. You are really creating one codebase you can use across the mobile platforms without patching around the gaps, so it is a real competition for who is the best option.</p><p>We don&#8217;t have a clear winner, because it depends on each team. For example, if you have an iOS team with five developers working in Swift, why would they need to learn Kotlin? Maybe that is not the best case for it; maybe staying with Swift makes more sense. That is the real question, because in the end you will have to pay for the migration.</p><p>But it has never been easier to create multiplatform applications, even for Android or iOS developers. We can genuinely separate the logic from the UI now, and with Kotlin Multiplatform it is very clear how to do that. You don&#8217;t need to be an expert in Android or Java or Kotlin: if you read the code, you will understand it easily, partly because modern mobile code borrows a lot of paradigms from web programming.</p><p>For example, the reactive way of building websites with Vue or React&#8212;when you read modern mobile code, it is very, very easy to understand. 
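</p><p><em>A rough sketch of that &#8220;shared logic, platform-specific edges&#8221; split is shown below in plain Kotlin. The names are invented; in a real Kotlin Multiplatform project the PriceFormatter seam would typically live in the common source set (or be an expect/actual declaration), with the Android and iOS source sets supplying their own implementations.</em></p><pre><code>// Shared business logic: pure Kotlin, no Android or iOS types.
data class Cart(val itemPricesCents: List&lt;Long&gt;)

fun cartTotalCents(cart: Cart): Long = cart.itemPricesCents.sum()

// The platform-specific part sits behind a small interface.
interface PriceFormatter {
    fun format(cents: Long): String
}

fun receiptLine(cart: Cart, formatter: PriceFormatter): String =
    "Total: " + formatter.format(cartTotalCents(cart))

// Stand-in for what an Android or iOS source set would provide.
class PlainFormatter : PriceFormatter {
    override fun format(cents: Long): String = "USD " + cents / 100.0
}

fun main() {
    println(receiptLine(Cart(listOf(1999L, 500L)), PlainFormatter()))
}
</code></pre><p>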
You will see that Flutter is fine and React Native is fine, but in the end Kotlin is part of the fabric of mobile development, and a safe choice here too.</p><p>So we just need to wait, maybe a couple of months or a couple of years, to see the real impact of Kotlin Multiplatform. Kotlin Multiplatform is easy to pick up if you already understand Kotlin. You don&#8217;t need to learn much. You will probably need to pick up two or three features that are not well known in backend development or in the Java world, but only two or three, because in the end it is the same Kotlin you use on the backend and you can simply use it for mobile development. The only extra thing you really need to understand is how the operating system works.</p><p>The tooling is also very advanced. Kotlin Multiplatform is moving very fast: new tools are being created and shared with the ecosystem every three or four months, and the same is true for Kotlin itself. That is very good news for developers, because every few months you see two or three genuinely mature tools arrive.</p><p>I think that is the advantage of Kotlin Multiplatform for mobile development versus Flutter or React Native, even though they have been around longer, because they are at a different stage. Flutter and React Native are still trying to grow the number of developers in their ecosystems, but Kotlin does not need to do that; it is already past that stage. If you are using Java, you can use Kotlin. If you are a Kotlin developer, you can use Kotlin Multiplatform. I don&#8217;t need to convince you to use it, because the knowledge is not the problem and the tools are there.</p><p>So the question becomes: why are you not creating mobile applications with Kotlin Multiplatform? You already have the knowledge: you have Kotlin, you have Java. Maybe you just need to take a look at how the operating system works. I think that is the real situation for Kotlin Multiplatform: it is just waiting for more people in the ecosystem, because if you know Kotlin, you can do it.</p><p><em><strong>14. We&#8217;ve been talking about Jetpack Compose, we&#8217;ve been talking about Kotlin Multiplatform, which are, you could say, new technologies or new approaches, and we talked about them in the context of adopting them into large-scale apps. But when it comes to a large-scale enterprise setting, what challenges should teams be aware of when adopting a new technology or approach while they aim to access benefits such as maximizing code reuse without compromising native user experience, and so on? What are some best practices or difficulties that teams should be aware of when trying to bring something into the stack, so to say?</strong></em></p><p><strong>Ron Veen:</strong> Yeah, that is an interesting question, because I guess one of the core reasons for Kotlin Multiplatform to exist is code reuse&#8212;specifically business logic, which is really the code you don&#8217;t want distributed over different platforms but would rather have centralized.</p><p>So I think everything comes down to that, and to focusing on which parts you really think you should share. 
Because if, again, you look at code reuse, and especially in enterprise environments&#8212;reusing parts&#8212;it depends on your architecture, doesn&#8217;t it?</p><p>Because if you have this classical architecture where you have one large single deployable unit, then code reuse could be quite easy. If you get to the microservices situation, well then code reuse becomes a little more tricky, I think. And then, you know, the whole DRY thing&#8212;&#8220;don&#8217;t repeat yourself&#8221;&#8212;suddenly becomes a bit more fluid, that you could say, well, sometimes we just don&#8217;t want to reuse in order to maintain the independence that microservices should have.</p><p>So, again, here I think it all depends upon what is the architecture you&#8217;re trying to support. So I think there&#8217;s no one-fits-all solution here&#8212;so, no.</p><h1>About the Book: <em>Kotlin for Java Developers</em></h1><p><em><strong>15. Your book, &#8220;Kotlin for Java Developers&#8221; is aimed at software developers proficient in Java who want to learn Kotlin for professional development &#8211; it&#8217;s especially relevant to Android engineers, JVM backend developers, and full-stack Java programmers maintaining legacy systems. What inspired you to write this book together?</strong></em></p><p><strong>Ron Veen:</strong> Well, the good thing is&#8212;Jose actually&#8212;the bulk of the work was already there, right? So I actually came later to the project, and the majority of this book&#8212;and the credits for the book&#8212;should really go to him, because he had already written a large part. I think I only added a few chapters.</p><p>But I think I came to the project for a simple reason: anyone who writes code&#8212;if you have, like, 10 developers and you say, &#8220;Write this piece of code for me,&#8221; then you&#8217;ll get 10 different solutions in the end. It could be typical small things or something, but everyone has their own style. You can always see it with code reviews: &#8220;Shouldn&#8217;t we do it like this? Why not like that?&#8221; And I think it was the same here with the book.</p><p>I think Jose had already written the majority of the book, and&#8212;just like with code&#8212;it&#8217;s always good to have a second opinion about it, and that&#8217;s basically where I came in. We started chatting with one another and talking about, &#8220;OK, should we rewrite it like this? Is this better?&#8221; Just fine-tuning a lot of things and having two perspectives on the book.</p><p>So that&#8217;s how I got to it&#8212;or actually what I did on the book. How I got to it was easy: I was approached by Packt. So I said, &#8220;We have this book about Kotlin.&#8221; I thought, &#8220;Yeah, Kotlin&#8212;that&#8217;s great.&#8221; That really touches me. I think it&#8217;s a great technology and should be used more often. So yeah&#8212;if there&#8217;s a way that we can spread the good news, I would really love to do that.</p><p>Then I got to see all the work that Jose had already done, and I thought, &#8220;Well, this is just brilliant,&#8221; and yes, I&#8217;d love to be a part of this. I just hope that Java developers really see it.</p><p>What I really liked about the approach to this book was it&#8217;s not like I&#8217;m telling you Kotlin from A&#8211;Z. I really love that we have this approach where we say, &#8220;Wait a minute&#8212;you&#8217;re already a Java developer. I don&#8217;t need to teach you what loops are, or iterations, or, in general, what functions are, or lambdas. 
You already know that. I&#8217;m just going to teach you how you can make the transition to Kotlin.&#8221; That&#8217;s why I think there&#8217;s a lot of information in there for Java developers in a very concise way, so it&#8217;ll save them time if they use the book to switch.</p><p><em><strong>16.</strong> <strong>Jos&#233;,</strong> <strong>What was your inspiration? What specific gaps or common struggles did you observe among Java developers that made you think, &#8220;I need to write this book, and I need to help them&#8221;?</strong></em></p><p><strong>Jos&#233; Dimas Luj&#225;n Castillo:</strong> Well, I need to say something at the beginning, because Ron said I started the book early&#8212;but yeah, that&#8217;s true. But actually, Ron&#8217;s part was very important for the project. I was stuck on a couple things, and he read it and he made the right suggestions. I think that was the point to try to move forward, because obviously we have a lot of complex situations to understand.</p><p>I wrote a lot of books in the past, but this is the second in English, because it&#8217;s not my main language. Obviously, that helps a lot, I think&#8212;maybe not for him because he&#8217;s very good with English, but for me, because I think that helped a lot for that.</p><p>And about the coding: when I start thinking about how I need to solve&#8212;or I want to explain&#8212;the things I know&#8230; I am a teacher too. I was a teacher for 18 years, maybe. And I note, for example, when I try to tell people&#8212;like at the beginning, 18 years ago&#8212;it&#8217;s very easy. It&#8217;s not the same case when you have developers with previous experience.</p><p>So I take that in my mind, because I know a lot of people want to use Kotlin because they are coming from Java&#8212;that&#8217;s the part. It&#8217;s a different way when you are starting from zero or scratch, but when you have developers with experience, it&#8217;s a very different way to do that. So I take that in mind while I was writing the book.</p><p>I was thinking too much that the other part is: I need to take time to explain the syntax. But at the end, the problem is the way we try to focus the situation. For example, am I trying to explain how we can write this line of code in Kotlin, or maybe we need to&#8230; I think we really need this line of code, because maybe in Kotlin we don&#8217;t need it.</p><p>That&#8217;s what I always try to put in the reader&#8217;s mind: if we need it, OK&#8212;this is the way to do that. But the question is: do we really need this line? Because maybe in Kotlin we have another way to do that, and maybe we don&#8217;t need to use it. That&#8217;s my second question always when I try to explain something, because that&#8217;s the real way to create the bridge for Java developers to Kotlin.</p><p>OK, you will need&#8212;maybe you will need that knowledge more; maybe it&#8217;s better if you don&#8217;t need it. But you know why you have the answer. Or maybe you need to find the answer when you are writing the code. That&#8217;s the other part.</p><p>The other question&#8212;or comment&#8212;I always have in my mind is: we need to be respectful with Java. Since the beginning with Kotlin, when we try to sell the idea to use Kotlin, I don&#8217;t like too much, actually. 
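</p><p><em>Jos&#233;&#8217;s question, &#8220;do we really need this line?&#8221;, is easiest to see with a small value type. The Customer class below is hypothetical; the point is that the constructor, getters, equals(), hashCode(), and toString() a Java version would need simply have no lines to write in Kotlin.</em></p><pre><code>// One line replaces the constructor, getters, equals(), hashCode(),
// and toString() that the equivalent Java class would spell out.
data class Customer(val id: String, val email: String?)

fun domainOf(customer: Customer): String =
    // Null handling that would be an if-block in Java is a single expression here.
    customer.email?.substringAfter('@') ?: "no email on file"

fun main() {
    val a = Customer("c-1", "ada@example.org")
    val b = Customer("c-1", "ada@example.org")

    println(a == b)                            // true: structural equality, generated for us
    println(domainOf(a))                       // example.org
    println(domainOf(Customer("c-2", null)))   // no email on file
}
</code></pre><p>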
I have a very weird experience in the past when I was a Google Developer Expert and I was the third Kotlin Developer Expert, and the problem is: they removed me because I&#8217;m not always saying the best option is Kotlin for the developer.</p><p>And I really think that, in some cases, maybe you have situations when you don&#8217;t see a very specific answer or a good answer for that&#8212;but it is the real part. We need to be respectful with Java because, at the beginning, without the experience of Java, we can&#8217;t create Kotlin. Not me&#8212;the Kotlin team&#8212;because obviously they use a lot of experience in Java, all of them. They are very experienced people with Java, and they try to see good parts of Java and take it into Kotlin, and they are increasing good parts.</p><p>So if we sell Kotlin as a killer&#8212;I think that&#8217;s disrespectful for the technology, because it&#8217;s not. The way to do that sometimes is actually a compliment for the technologies, the architecture, the projects. But the other part is: more than sometimes it&#8217;s just your preference&#8212;how you prefer to write the code in this way or this one&#8212;because maybe you don&#8217;t have very clear architecture or this paradigm to use it.</p><p>So that&#8217;s the other part. I always, in my way to express it, when you need to compare Java with Kotlin, I always try to be very respectful with that because a lot of the good things in Kotlin&#8212;maybe 90% of them&#8212;they are coming from Java, because Java is the first part of your experience. So I always try to keep that in mind when I write the book and put the examples. Obviously, we have a lot of parts better in this case, or very fast implementation&#8212;I need to say it, obviously&#8212;but with the correct words and the correct approach for the people. For me, that&#8217;s my line to follow to write the book.</p><p>And obviously, I want a couple parts that are really practical. In these cases, I use very simple scenarios and add complexity in the code. I start, maybe, for example, with just one definition, but at the end of the chapter you will see it takes more functions, because that&#8217;s the way to follow that. I don&#8217;t prefer to do anything by default because when I read books in the past&#8212;</p><p>You sometimes see magic code just appear on the next page. So I don&#8217;t like too much to say, &#8220;Hey, we have new code&#8212;just read it and move forward,&#8221; because I don&#8217;t think that&#8217;s the way. Obviously, in other kinds of books, on other kinds of technologies, maybe you need to do that because it&#8217;s a framework and needs to be&#8212;and you can copy the template because it&#8217;s too&#8230; but even in the necessary parts, I try to do that for the book. That&#8217;s my&#8212;maybe five or three points to follow when I write the book.</p><p><em><strong>17.  What do you hope experienced Java professionals will do differently after reading Kotlin for Java Developers, and how do you think this will help them tackle real-world challenges more effectively?</strong></em></p><p><strong>Ron Veen:</strong> Well, I think, again, like we said, this is really trying to explain to Java developers that Kotlin isn&#8217;t that different, because, again, 80% of your logic and things will still be the same. Yes, you will use a bit different syntax, but it&#8217;s still a syntax that&#8217;s quite familiar to you. 
You might use different collection functions, because there are different ones.</p><p>But I think the really important part here is that you&#8217;re actually starting to see the value, but you&#8217;re also starting to recognize when it doesn&#8217;t really matter if we do Java or if we do Kotlin&#8212;they&#8217;re both running on this brilliant thing, which is the JVM, right? The Java Virtual Machine that, in the end, runs the code.</p><p>So it is not as big a step as you might think it is when you move from Java to Kotlin&#8212;as opposed to moving from Java to Go, or from Java to Rust. The steps are really small, and I think our book just helps you write idiomatic Kotlin, where you can actually see, &#8220;Oh, right&#8212;I&#8217;m actually seeing what I should do.&#8221; And like Jos&#233; said, it&#8217;s not like translating it one to one.</p><p>Actually, I think JetBrains, in the IDE, they have the function where you can select a Java class and say, &#8220;Convert to Kotlin.&#8221; Well, it would technically work, but it still wouldn&#8217;t be idiomatic Kotlin, right? So I really hope that, with the book in hand, you would actually write Kotlin as Kotlin is meant to be&#8212;so I really hope that&#8217;s where we&#8217;re getting.</p><div><hr></div><p>If you want to dig deeper into the mechanics of moving from Java to Kotlin&#8212;writing <strong>idiomatic Kotlin</strong>, handling <strong>null safety</strong>, using <strong>coroutines</strong> for concurrency, and taking advantage of features like <strong>extension functions</strong> and <strong>DSLs</strong>&#8212;check out <em><strong><a href="https://www.packtpub.com/en-us/product/kotlin-for-java-developers-9781835884836">Kotlin for Java Developers</a></strong></em> by <strong>Jos&#233; Dimas Luj&#225;n Castillo</strong> and <strong>Ron Veen</strong> (Packt, Oct 2025). 
Written for experienced Java developers, it teaches Kotlin by mapping concepts directly to familiar Java constructs and then goes further into interoperability, generics, data and sealed classes, coroutines and flows, and DSL design&#8212;across backend, Android, and cross-platform development.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.packtpub.com/en-us/product/kotlin-for-java-developers-9781835884836" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BDJC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe245f2-991b-46bf-905b-bff991dc8e14_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!BDJC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe245f2-991b-46bf-905b-bff991dc8e14_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!BDJC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe245f2-991b-46bf-905b-bff991dc8e14_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!BDJC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe245f2-991b-46bf-905b-bff991dc8e14_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BDJC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe245f2-991b-46bf-905b-bff991dc8e14_2250x2775" width="336" height="414.46153846153845" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abe245f2-991b-46bf-905b-bff991dc8e14_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:336,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Kotlin for Java Developers&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.packtpub.com/en-us/product/kotlin-for-java-developers-9781835884836&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Kotlin for Java Developers" title="Kotlin for Java Developers" srcset="https://substackcdn.com/image/fetch/$s_!BDJC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe245f2-991b-46bf-905b-bff991dc8e14_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!BDJC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe245f2-991b-46bf-905b-bff991dc8e14_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!BDJC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe245f2-991b-46bf-905b-bff991dc8e14_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!BDJC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe245f2-991b-46bf-905b-bff991dc8e14_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[The C++ Programmer’s Mindset on Abstraction Costs, “Future You,” and Thinking with the Machine: A Conversation with Sam Morley]]></title><description><![CDATA[From STL leverage and modular design to cache-aware performance, concurrency pitfalls, and memory-safety habits in modern C++]]></description><link>https://deepengineering.substack.com/p/the-c-programmers-mindset-on-abstraction</link><guid isPermaLink="false">https://deepengineering.substack.com/p/the-c-programmers-mindset-on-abstraction</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Thu, 22 Jan 2026 05:13:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/TRii5U87yn8" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>C++ rewards engineers who treat problem-solving as a deliberate process rather than an improvisation.</strong> In this conversation, Sam Morley returns repeatedly to that theme: decompose the work until it becomes a set of solvable, &#8220;atomic&#8221; parts, then choose abstractions that fit the real constraints of the system. He argues that abstractions are never free, even when runtime overhead is low, and that good design means balancing competing costs: build time, cognitive load, flexibility, and performance. 
That same pragmatism shows up in his emphasis on leaning on the standard library, iterating from &#8220;working&#8221; to &#8220;fast&#8221; based on measurement, and understanding when low-level details like cache behavior and memory access patterns should influence how you structure code.</p><p>Morley, author of <em><strong><a href="https://www.packtpub.com/en-us/product/the-c-programmers-mindset-9781835888438">The C++ Programmer&#8217;s Mindset</a></strong></em> and a research engineer with a background in mathematics who maintains a high-performance C++/Python library for data science, also frames maintainability as a problem of empathy for &#8220;future you.&#8221; He discusses writing code that can be understood months later, structuring systems with clear separation of concerns, and treating concurrency and memory safety as design problems rather than afterthoughts. Along the way, he outlines practical guidance on thread-safe architectures, where synchronization mechanisms go wrong, and how ideas from Rust&#8217;s ownership model can sharpen a C++ engineer&#8217;s instincts about lifetimes, pointer safety, and undefined behavior. </p><p>You can watch the full conversation below or read on for the complete Q&amp;A transcript.</p><div id="youtube2-TRii5U87yn8" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;TRii5U87yn8&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/TRii5U87yn8?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><p><em><strong>1: For an experienced engineer, what does adopting the C++ programmer&#8217;s mindset look like in practice? How does it change the way you approach complex software challenges?</strong></em></p><p><strong>Sam Morley:</strong> For experienced engineers&#8212;and probably some less experienced engineers as well&#8212;they&#8217;re probably using this framework of computational thinking already. The framework itself, as I came to discover when I was putting this together, is really a set of common elements that one finds you do when you solve problems. It&#8217;s less about the actual components of the framework and more about how one connects with the different mechanisms and features and facilities within and around the C++ language that make this an interesting discussion topic.</p><p>So, one might be quite experienced at solving software problems, but what we&#8217;re doing here is more about connecting those with a broader thinking about the system and the language&#8212;and all of the facilities around those&#8212;which is hopefully the additional knowledge that I&#8217;m imbuing. As I said, most people are already kind of familiar with this sort of framework, even if they&#8217;re not conscious of that fact.</p><p>One of the things I want people to take away is that it&#8217;s sometimes very helpful to really think about your process. So when you do solve a problem, look back and think: How did I do this? How did I break this problem up? What abstractions did I find? Where did I find them? Where were the common elements&#8212;things that I&#8217;d seen before? What were the things that I&#8217;ve never seen before? Try and make a mental note of those facts, because these are the things that will come up again. 
From a longevity point of view, it&#8217;s important to not only remember your solution, as it were, but also your process.</p><p>Because if you fall down the same hole every time, then it becomes quite easy to fall down it again if you don&#8217;t remember that there was a hole there. So if you document your process and think about your process&#8212;at least maybe not physically document, but mentally document the process that you go through to solve a problem&#8212;then you can remember these facts much better.</p><p>Moreover, if you&#8217;re experienced, then you should also be mentoring your more junior people, and this is also helpful for them. So it&#8217;s important, for lots of reasons, to think about what you&#8217;re doing and how you&#8217;re doing it.</p><p>And that&#8217;s one aspect of the book. The other is connecting it with the C++ language, the broader system in which it operates, and how you marry those two things together to make an overall, hopefully better, more efficient, and faster solution.</p><p><em><strong>2: In your book you talk about breaking down challenges, and choosing the right abstractions to build the most efficient solutions. Can you walk us through a concrete example where this approach made a big difference?</strong></em> </p><p><strong>Sam Morley:</strong> I want to start by challenging this a little bit. The notion that one can solve a problem without breaking it down into smaller parts is kind of folly. I don&#8217;t think it&#8217;s possible, really. You might not be conscious of the fact that you&#8217;ve broken it down into smaller parts, but you&#8217;re almost surely doing it. Even something as simple as doing some arithmetic&#8212;you might think that you&#8217;re just adding N numbers together&#8212;but really what you&#8217;re doing is you&#8217;re adding two numbers together, and then adding the result of that to the next one, and then adding the result of that to the next one. It&#8217;s an expanding-brackets problem. Whether you are conscious of this fact or not, the point is that you&#8217;re solving several smaller problems that look like the same big problem.</p><p>However, the way that you break down problems obviously matters, and some ways are more efficient than others. For that reason, it is a good thing to get into.</p><p>I want to talk about something that I did a few years ago now, which involved taking frames out of a very large number of video files, sending them to one of the ML services on Azure&#8212;so this was over a REST interface to Azure&#8212;and then we got the results back, and we had to store those on disk in files. This is a very meaty topic, meaty problem. There&#8217;s lots of different elements here: there&#8217;s loading all the files and then decomposing them into individual frames; then there&#8217;s sending all of those frames off to the Azure service; there&#8217;s getting the results back; and then there&#8217;s writing them to disk. So immediately there are four components to think about.</p><p>In trying to process this&#8212;once you&#8217;d expanded this to 100,000 videos or something, each with a few hundred frames&#8212;the numbers here are pretty enormous. In order to get them to and from the Azure service in a meaningful amount of time, we had to multiplex this. 
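</p><p><em>As a sketch only, not the production code being described, the skeleton below shows the shape of one hand-off between such stages: a small bounded queue whose blocking push is the same mechanism as the back pressure discussed next. All names and sizes are invented.</em></p><pre><code>#include &lt;condition_variable&gt;
#include &lt;cstddef&gt;
#include &lt;deque&gt;
#include &lt;iostream&gt;
#include &lt;mutex&gt;
#include &lt;string&gt;
#include &lt;thread&gt;

// A minimal bounded queue: push() blocks while the queue is full, so a fast
// producer is automatically slowed down by a slow consumer.
template &lt;typename T&gt;
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    void push(T item) {
        std::unique_lock&lt;std::mutex&gt; lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() &lt; capacity_; });
        items_.push_back(std::move(item));
        not_empty_.notify_one();
    }

    T pop() {
        std::unique_lock&lt;std::mutex&gt; lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::deque&lt;T&gt; items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

int main() {
    const int kResults = 12;
    BoundedQueue&lt;std::string&gt; toWriter(3);  // deliberately small buffer

    // "Service" stage: produces results faster than the writer drains them,
    // so push() blocks now and then instead of the backlog growing unbounded.
    std::thread producer([&amp;] {
        for (int i = 0; i &lt; kResults; ++i) {
            toWriter.push("result for frame " + std::to_string(i));
        }
    });

    // "Writer" stage: the slow consumer standing in for the disk.
    for (int i = 0; i &lt; kResults; ++i) {
        std::cout &lt;&lt; toWriter.pop() &lt;&lt; '\n';
    }

    producer.join();
}
</code></pre><p>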
So we had multiple threads all sending frames to multiple Azure endpoints, because each of the endpoints is rate-limited, so you can only send, I don&#8217;t know, 10 requests a second or something to each of the things.</p><p>But actually, sending requests was not the bottleneck. Getting the results back was part of the bottleneck. The biggest problem that we actually encountered was writing the results back to the disk, because this was hundreds of gigabytes of results at the end of the day. What we ended up with was that we were getting results back from the service so fast that we had to build in some back pressure into this system to slow down when we had a backlog of things to write to the relatively slow spinning-rust disks that we had.</p><p>So there we have a very interesting structure. We start off with these four big components. Within the first component, we have reading video files from the disk where they were resident, decomposing them into a number of frames that was then passed on to another subsystem, which was responsible for sending these frame things up to the Azure service and waiting for the results. This was quite carefully orchestrated so that we didn&#8217;t hit the rate limit&#8212;so figuring out how to do that was an interesting problem.</p><p>Then there was another component which was taking the results that were being returned from the Azure service and collecting them into a buffer. This was the sub-problem of figuring out that we were thrashing the disks trying to write all these results out. So we installed a buffer in between. We wrote into a buffer and then had another worker process that would take the things from the buffer in big chunks and write them out to the disk.</p><p>So there was this sort of filtering down of problems. You start off with big, challenging, meaty problems at the top, and then each one of those gets decomposed into smaller bits, and then smaller bits still, until you reach a level where you either have an existing algorithm to do it, or it&#8217;s some functionality handled by a library, or it&#8217;s some other kind of interaction with the world. Talking to Azure, talking to a disk&#8212;these are sort of base-level problems that you can solve quickly.</p><p>It&#8217;s all about bringing down the level of the larger problem to these small, atomic things which you can actually solve using facilities that you have. That&#8217;s the real challenge. But I don&#8217;t think that you could just write a singular piece of software that would do all of these things together without breaking it down into these components. I don&#8217;t think that&#8217;s possible.</p><p><em><strong>3: Abstraction in Detail (Chapter 2 of your book) covers when to use different language features&#8212;simple functions, classes, templates, etc.&#8212;for a given task. How do you determine the appropriate level of abstraction in modern C++?</strong></em></p><p><strong>Sam Morley:</strong> Abstraction is tricky. It really depends on the purpose of the code. What am I trying to achieve with my code? I want to upfront say there are no zero-cost abstractions. People will claim up and down that things are zero-cost abstractions. They&#8217;re really not; every abstraction has a cost. Now, this might be a runtime cost, which is what people usually refer to, but that&#8217;s not the only type of cost.</p><p>Templates, for example, have very little runtime cost, but they do have a significant build-time cost. 
Including lots of templated code might make your runtime faster, but it will surely expand your build time. Templates also carry a pretty heavy cognitive load&#8212;the effort required for a programmer to reason about heavily templated code is significantly higher than for ordinary plain C++ code with no templates.</p><p>So getting that balance right&#8212;what am I trying to achieve with this code? Is this supposed to be a set of components for high-performance systems that really need the best possible runtime performance, where I don&#8217;t care about build time? Or is this a general-purpose thing that needs to be extremely flexible, where I do care about the cognitive load of the people who are going to work with this code? It&#8217;s all about balancing the different competing costs and also competing utilities. How flexible is my system? How fast is my system?</p><p>Now, it&#8217;s not necessarily true that abstractions are always bad. Sometimes you can use an abstraction and it adds very, very little to any of the loads. For example, introducing a very small templated helper function is very useful and it adds basically no overhead, and if that&#8217;s used correctly it can be a big help to the program.</p><p>But sometimes&#8212;and I&#8217;m especially guilty of this&#8212;you can over-abstract. I&#8217;m a mathematician; we like our abstractions. You can over-abstract and make the thing more complicated than it needs to be, and at this point you start to lose something. It might be runtime performance if this is a virtual class hierarchy, or it might be build-time performance if you have heavy template code. Or it could be that you can no longer reason about your software because it&#8217;s now so complicated and filled with all sorts of clever bit-hacking tools and abstraction mechanisms that you no longer understand it. It&#8217;s finding a balance.</p><p>Now that I&#8217;m conscious of this fact, I try to keep my abstraction as minimal as possible. I look for the minimal amount of abstraction I need in order to solve the problem without going too far and over-generalizing it. This has come back to bite me recently: I over-specified an interface to the point where it only satisfied the conditions in one very specific instance, and I had to rework the entire interface to make it fit the actual thing that I should have programmed against in the first place. It can come back to bite you. Hopefully that doesn&#8217;t happen very often, but when it does, it is always painful.</p><p>This is why thinking about the abstraction up front is important, and it goes hand in hand with the way that you decompose your problems as well&#8212;thinking about what the abstractions might be if you decomposed the problem in one particular way versus a different way. If you pick one and it turns out not to be the right choice, at least you now understand why that abstraction didn&#8217;t fit the problem, and the more problems you solve, the more you get used to this idea.</p><p><em><strong>4: What guidelines help decide when a straightforward function is enough versus when you should introduce a class or a template to solve a problem?</strong></em></p><p><strong>Sam Morley:</strong> Yeah&#8212;if you start with a simple function and you make it a class template or you make it a function template, like I said, that can afford you a lot of flexibility at very low cost. I do this sometimes internally. 
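</p><p><em>For illustration, the sort of tiny, essentially free template he has in mind might look like the helper below (the function and its name are invented); the pair-of-integers case he describes next has the same shape.</em></p><pre><code>#include &lt;cstdint&gt;
#include &lt;iostream&gt;

// A tiny templated helper: the integer width is left open, so changing your
// mind later (32-bit vs 64-bit, signed vs unsigned) costs nothing at the
// call sites and needs no refactoring.
template &lt;typename Int&gt;
constexpr Int halfway(Int a, Int b) {
    return a + (b - a) / 2;  // avoids the overflow pitfall of (a + b) / 2
}

int main() {
    std::cout &lt;&lt; halfway(10, 20) &lt;&lt; '\n';  // deduced as int
    std::cout &lt;&lt; halfway&lt;std::int64_t&gt;(1'000'000'000'000, 3'000'000'000'000) &lt;&lt; '\n';
}
</code></pre><p>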
For instance, I have a function which does something to a pair of integers, and I don&#8217;t know exactly what type of integer I want to use later on, so I just template it. Because the cost of doing this is basically nil, it means I don&#8217;t have to go back and refactor my code later when I change my mind about what integers I use. That kind of thing can be very low cost and high maintainability&#8212;very friendly when it comes to programming.</p><p>The cost of moving from a simple function to a class is higher, especially if that class has virtual functions&#8212;if it&#8217;s abstract in the other sense of the word. If that&#8217;s the case, then you&#8217;re now incurring a runtime performance penalty, which may be warranted. Runtime performance penalties are not always a bad thing. As long as they&#8217;re away from hot code&#8212;the bits of code that need to run at maximum speed&#8212;you can get away with an awful lot of slop when it comes to runtime cost, especially in instances where the bandwidth and runtime latency are limited by some other factor, like a network connection or a disk or something like that.</p><p>But really there are at least three reasons why you might want to use a class instead.</p><p>The first is that you have some kind of internal state that needs to be managed carefully. For example, a std::vector or std::map manages its internal storage, and if you were to code this by hand inline in a function, you would almost surely get something wrong. These containers manage that state very carefully, and you then don&#8217;t need to worry about those details. Your code is much more readable if you&#8217;re using a std::vector than if you have a bunch of goto statements for resizing a buffer when it overflows and so on. That is not very nice code to read.</p><p>The second reason to use a class is if you have some kind of behavior that needs to be flexibly abstracted. What I mean by that is: you have an interface which reads and writes data from some source, but the source of the data is unknown. You might be reading from a disk or reading from a network socket, and this is a really great place to encapsulate the reading and writing process because it&#8217;s the outward interface. The bit that you&#8217;re really programming against is the same in both instances. You have a read function; you have a write function. It doesn&#8217;t really matter how that is implemented behind the scenes, as long as those two things work. Wrapping this in a nice class doesn&#8217;t require a virtual class; it can be a template, or some combination of both. It is a very convenient way of packaging the behaviors that are specific to one mechanism for doing that thing.</p><p>The third reason is that you&#8212;or some other developer later on&#8212;need a point of customization. This is a slightly nuanced point. C++ templates are very powerful, but function templates are sometimes a little trickier to use than class templates, and the reason is that class templates can be partially specialized, whereas function templates cannot be, at least not directly in the same way. So it&#8217;s a really powerful technique to use a class template inside a function template: it allows someone to provide a different specialization that customizes the behavior of the function template without directly interacting with the code within. 
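</p><p><em>That customization-point pattern is easier to see in code than in prose. The sketch below is stripped down and uses invented types: specializing the Serializer class template changes what save_all() does for a new type without touching save_all() itself.</em></p><pre><code>#include &lt;iostream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Primary class template: default behaviour (fine for arithmetic types).
template &lt;typename T&gt;
struct Serializer {
    static std::string apply(const T&amp; value) { return std::to_string(value); }
};

// A later developer customizes behaviour for their own type by specializing
// the class template; the function template below never changes.
struct Point {
    double x;
    double y;
};

template &lt;&gt;
struct Serializer&lt;Point&gt; {
    static std::string apply(const Point&amp; p) {
        return "(" + std::to_string(p.x) + ", " + std::to_string(p.y) + ")";
    }
};

// The function template dispatches through the class template, so it does
// not need to know anything about Point.
template &lt;typename T&gt;
void save_all(const std::vector&lt;T&gt;&amp; values) {
    for (const auto&amp; v : values) {
        std::cout &lt;&lt; Serializer&lt;T&gt;::apply(v) &lt;&lt; '\n';
    }
}

int main() {
    save_all(std::vector&lt;int&gt;{1, 2, 3});
    save_all(std::vector&lt;Point&gt;{{1.0, 2.0}, {3.5, -4.0}});
}
</code></pre><p>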
This is a very nuanced use, I contend, but it is very useful. I&#8217;ve used this pattern a few times in my code, and I&#8217;ve seen it in other code as well. I think the first time I saw this was in NVIDIA&#8217;s CUTLASS library, and I think I&#8217;d used it before that, but without being conscious of the fact that I was using this particular pattern. It is very useful, and I think it&#8217;s somewhat analogous to a sort of bridge or command interface that you might find in the Gang of Four, but with templates instead of virtual classes.</p><p>So those are my guidelines. If you have other uses, I&#8217;d be interested to hear what kind of reasons you would use a class rather than just sticking to a simple function.</p><p><em><strong>5: One key to proficient C++ is knowing the standard library. How important is it for developers to leverage the STL&#8217;s algorithms and containers instead of writing their own from scratch?</strong></em></p><p><strong>Sam Morley:</strong> OK, so there are two things about the STL which are really important to remember. First is that it is a set of very flexible and very generic algorithms and containers for a very wide range of purposes. And secondly&#8212;and probably more importantly&#8212;is that it is there always and you can always use it. &#8220;Always&#8221; being a little bit tricky there&#8212;embedded developers, please don&#8217;t get angry with me&#8212;but for most C++ developers the STL is a sort of thing that you can rely on and use.</p><p>Whenever you need it, and generally speaking, these are very, very good, very high-performance facilities, and they can make your life much easier. So what the STL does, in effect, is make your development window smaller: you spend less time implementing standard things and more time implementing the difficult things. They raise the floor of what is the base-level problem that you can solve without thinking about it.</p><p>If you remember when I was talking through my example earlier, you have these layers of problems. You start with big problems, you make them smaller, you make them smaller, and eventually you get down to a set of problems&#8212;maybe not at the same level everywhere&#8212;but you get down to problems which you know how to solve using standard tools or libraries. So what the STL does is it gives you one level up from having to write those things for yourself. It&#8217;s one less problem to solve, and this means you can move much faster. You can develop much faster.</p><p>Now, they might not give you the performance that you need. You might have to change the way that these work in order to get the performance that you need, but a large amount of the time the STL will probably give you all the performance you need, providing that you&#8217;re using the right algorithms and containers. OK, but that&#8217;s a separate question. The thing that it does do is speed up the development cycle.</p><p>If you implement something from scratch the first time and it doesn&#8217;t perform as well as you need, then fixing that might become problematic. And moreover, you might not know whether that actually is a bug source, or whether there&#8217;s some characteristic that you missed somewhere else in the problem, or whether this new thing that you&#8217;ve implemented is causing the problem. You can get around that with testing and things, but really, if you&#8217;re prototyping something, you might know that you can&#8217;t use those things in the future because they won&#8217;t perform well enough. 
But building it with the STL things at the beginning is the right way to get started, and it means that you can find a solution. It doesn&#8217;t have to be the best solution.</p><p>Solving problems is an iterative process. You don&#8217;t always find a solution&#8212;let alone the best solution&#8212;the first time round. You probably have to take many bites at the apple. So first you solve the problem, and then you make it fast. And only by measuring do you know which bits are not fast. So starting with the STL will probably get you most of the way, and you&#8217;ll probably find that other parts of your software are the slow parts.</p><p>Now, there are some caveats. First, a lot of libraries provide faster, or slightly more flexible, or things with different properties which are basically drop-in replacements for the STL. For example, Boost containers are a set of more expanded and more flexible container types that are drop-in replacements in most cases for STL equivalents. Abseil has the same set of things, and probably other libraries too. These are really great if you&#8217;re already working in, say, a project that&#8217;s using Abseil&#8212;you already have all of those container types at your fingertips&#8212;and sometimes they do perform better. And things like small inline vectors are extremely useful for a lot of things, and both of those libraries provide such a thing.</p><p>Now, the other side of that is the algorithms. Similarly, there are other libraries that provide standard STL-like algorithms. NVIDIA Thrust is one that comes to mind. This is parallel algorithms. C++&#8212;I think 20 or 23&#8212;introduced these different dispatches for the standard algorithms, which causes it to run multi-threaded or to do it on a particular execution context, I think they&#8217;re called. Thrust was sort of prior to that, and it&#8217;s specifically geared towards running on NVIDIA GPUs and NVIDIA libraries, but it&#8217;s the same set of functionality, actually. It&#8217;s a set of very general-purpose algorithm template functions which dispatch very cleverly through various pathways to give you a fast implementation of whatever that algorithm is doing on whatever device you&#8217;re doing it on. And it&#8217;s a very clean and efficient way of writing very parallelizable, very general-purpose code.</p><p>There is one more caveat that I want to mention, and that is that writing custom containers is a very dangerous game to play. Writing containers is hard. There are so many things you have to keep track of. You have to keep track of the construction and destruction of your elements. If that&#8217;s not a trivial thing, that is something you have to be very careful of. If you&#8217;re doing bulk allocations, you need to be careful that you have properly moved everything, and how you handle the errors. If something goes wrong during the copy, during the allocation, how do you unwind that? What guarantees can you give to the outside world&#8212;the rest of your program&#8212;about how that process happens? And moreover, how do you efficiently move things from an old allocation to a new allocation?</p><p>These are all very complicated and difficult things. I&#8217;m not saying that people aren&#8217;t capable of doing it, but I am saying that it&#8217;s very difficult to get right. If you are reimagining containers, then you should be asking why rather than how. 
There are genuine reasons to use different containers, but I don&#8217;t think you should necessarily be implementing them yourself. I would reach for a standard container library&#8212;like Boost or Abseil containers&#8212;and rely on the work of a lot of people to maintain those good implementations rather than trying to hack together something yourself.</p><p><em><strong>6: Do you find that mastery of the standard library is a distinguishing factor in how efficiently developers can solve problems in C++?</strong></em></p><p><strong>Sam Morley:</strong> It surely can be. This goes back to the notion of what is the smallest problem that you know how to solve without thinking, and having a very good understanding of what is in the standard library&#8212;what the things in the standard library are capable of delivering, and how you might reasonably do that&#8212;will certainly raise this floor.</p><p>If you know that the standard library contains binary search functions, for instance, then that immediately takes the place of having to solve the problem of how you binary search through something. Obviously this is a very well-understood thing; it&#8217;s just an example. But knowing how to make use of some of the trickier, more multifaceted std algorithms&#8212;transform and reduce, for example&#8212;will make the range of problems that you can solve without doing a lot of hard work yourself quite a lot larger.</p><p>However, it&#8217;s not necessarily true that you can&#8217;t be efficient without the STL. You can absolutely be very, very productive&#8212;productive is probably a better word than efficient. The factor is speed and convenience. Like I said, the STL allows you to get going very quickly because it&#8217;s there, it&#8217;s ready to use. You don&#8217;t have to worry about linking or importing or doing anything difficult. And moreover, you don&#8217;t have to worry about licensing and things, which do come up occasionally. It&#8217;s there ready to go, and you can just use it. So it makes a big difference to how quickly you can deliver solutions.</p><p>It also makes a big difference in how quickly you can iterate on solutions. If you build something that works but is slow, then you can make it faster. I don&#8217;t think it&#8217;s necessarily important for you to use the standard library exclusively. If you&#8217;re already working in an ecosystem that provides standard-library-like abstractions, possibly more flexibly, then by all means use those things. If you always have Boost available to you, then use Boost. Boost also provides a great set of many, many more features besides what is in the standard library, and making use of those things will also enhance your productivity.</p><p>Similarly, if you&#8217;re in Abseil, then use Abseil. But you should still keep track of what is in the standard library, because whatever library stack you&#8217;re using now&#8212;Boost, Abseil, Folly&#8212;might not be the library stack you&#8217;re using tomorrow. The STL is a constant factor. If you&#8217;re using C++, you more or less always have the STL, so having it in the back of your mind all the time is always a good idea.
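To give one small, purely illustrative instance of the trickier, more multifaceted algorithms just mentioned: a sum of squared differences between two vectors becomes a single call once you know it exists.</p><pre><code>#include &lt;functional&gt;  // std::plus
#include &lt;iostream&gt;
#include &lt;numeric&gt;     // std::transform_reduce (C++17)
#include &lt;vector&gt;

// Sum of squared differences between two equally sized vectors:
// one library call combining a transform and a reduce, no hand-written loop.
double squared_error(const std::vector&lt;double&gt;&amp; a, const std::vector&lt;double&gt;&amp; b) {
    return std::transform_reduce(
        a.begin(), a.end(), b.begin(), 0.0, std::plus&lt;&gt;{},
        [](double x, double y) { const double d = x - y; return d * d; });
}

int main() {
    std::cout &lt;&lt; squared_error({1.0, 2.0, 3.0}, {1.0, 2.5, 2.0}) &lt;&lt; '\n';  // prints 1.25
}
</code></pre><p>One call, no index bookkeeping, and the intent is visible at a glance.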
And it certainly will make you faster&#8212;not necessarily in code execution time, but certainly in the development time.</p><p><em><strong>7: C++ is a multi-paradigm language with many powerful features, some of which can be a double-edged sword for maintainability. Since the goal is to build scalable, maintainable solutions, what best practices do you suggest to keep C++ codebases clean and manageable?</strong></em></p><p><strong>Sam Morley:</strong> Yeah, this is a tricky question. There are, of course, a lot of general-purpose good practices that apply here&#8212;things like documenting your code and leaving lots of comments about how your function operates, what guarantees it expects, and what guarantees it gives, and understanding that.</p><p>Before we jump into this, I want to introduce the notion of &#8220;future you.&#8221; Future you is your future self, and for all intents and purposes, this is a different person. Because when you&#8217;re writing some code, you understand things in the context of what you&#8217;re doing at the moment. Future you will have lost this context. So when you come back to your code in a month, six months, a year&#8217;s time, and you look at it and you think, &#8220;What was I thinking to make this code?&#8221; almost surely the answer is, &#8220;I don&#8217;t know.&#8221;</p><p>So writing comments is not just for other people&#8212;it&#8217;s also for yourself. You don&#8217;t have to go overboard and say, &#8220;I add these two numbers together,&#8221; because that&#8217;s not a useful comment. But I&#8217;ve taken to doing this quite recently where I&#8217;ve been working on some very intricate mathematical expressions and processes: I&#8217;ve taken to writing very big, chunky block comments. It&#8217;s like, &#8220;Right, OK, this is where we are in this process. This is how the next set of things works. This is what it should do. This is broadly how I&#8217;m going to implement the algorithm to do this.&#8221;</p><p>These comments save me so much pain when I jump off the project for a week and then go back and have to remember exactly what I was trying to do. It takes you a few minutes to sit and think about what that thing was, but that&#8217;s time well spent because now you&#8217;re thinking about the problem. This is where you can do some of this work of breaking down the problem&#8212;abstracting, finding common patterns, things that you recognize, things that you know how to implement&#8212;and then you should be able to spot those elements in the thing that follows. Doing this work in the code, in the body of the code, will keep it there so that when you come back to it, you can remember what you were thinking.</p><p>And moreover, this also applies to other people&#8212;not just future you. But that&#8217;s general-purpose advice.</p><p>Specifically for C++ things, and more with scalability in mind: having a very strict separation of concerns is a very good idea. You want to keep code that does numerical computations away from code that talks to users. You want to separate different functionalities as much as possible, and ideally you want to test those in isolation. Having a very modular, very pick-and-choose kind of situation will really help with that.</p><p>Sometimes it&#8217;s not possible to do this easily. Sometimes separating things can be really hard work. 
But being able to test and benchmark your high-performance components in isolation can really help you understand what they&#8217;re doing, how they&#8217;re doing it, how fast they&#8217;re doing it, and make sure that everything there is correct before you integrate that into the rest of your program.</p><p>It also means that if you&#8217;re doing some work that involves distributing large computations over a large cluster or on the cloud or something, you can write the different distribution mechanisms separately and then just reuse your tight-loop computation routines inside those. So it affords you a great deal of flexibility to modularize your code and separate them into separate libraries, or even just separate namespaces within a library. These kinds of things can make a big difference in the way that you can test and run your code.</p><p>A couple more points: you should always pay attention to thread safety, even if your application is not going to be multi-threaded. You should be thinking, at some point, this might be multi-threaded; I might need to access this class, these class members, from different threads&#8212;so how do I make sure that that&#8217;s a thread-safe thing to do?</p><p>And the third thing is to make sure that you keep your build system clean. I use CMake, typically. Make sure you keep that clean, and keep it in a way that is easy to see what the individual components are. Moreover, if you need to extract bits and put them in their own library, make sure that&#8217;s an easy process, because build systems can get left behind, and having a broken build system is far worse than having broken code. It&#8217;s much harder to figure out what exactly has gone wrong if your build system is broken. So those are my points.</p><p><em><strong>8: When using advanced features like template metaprogramming, clever lambdas, or other C++ &#8220;power tools,&#8221; how do you ensure the code stays readable and team-friendly rather than turning into an overly complex &#8220;wizardry&#8221;?</strong></em></p><p><strong>Sam Morley:</strong> Yeah&#8212;I mean, wizardry is the right word. I&#8217;ve seen some horrendous template metaprogramming in my life. I&#8217;ve written some horrendous template metaprogramming in my life. I&#8217;m going to be the first one to admit that it&#8217;s never worth it.</p><p>Generally, I stay away from template metaprogramming nowadays. The need for it has diminished somewhat with concepts and constexpr functions being part of the standard now, and the amount of flexibility that those afford you going up. The need for very complicated template metaprogramming has gone down.</p><p>There are other reasons, of course. Templates are very expensive from a build-time point of view. Instantiating a complicated template metaprogramming construct can easily double the compile time for a particular C++ file. And that&#8217;s not healthy if you&#8217;re building 10,000 of these&#8212;that&#8217;s a lot of time. There&#8217;s a good reason why Google, when they wrote Abseil, kept their metaprogramming to an absolute minimum. They&#8217;re very explicit about this fact. It&#8217;s because the compile-time costs are just too high.</p><p>And moreover, going back to the &#8220;future you&#8221; idea: if you write template metaprogramming code, future you will have a hard time understanding it, because it&#8217;s one of those things that makes sense while you&#8217;re writing it, and then it becomes immediately impenetrable. 
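To give a sense of why the need has gone down, here is a minimal sketch (generic, and not taken from any particular codebase) of the same kind of constraint written as a C++20 concept and a constexpr function instead of metaprogramming:</p><pre><code>#include &lt;concepts&gt;
#include &lt;iostream&gt;

// A constraint that once called for enable_if/SFINAE boilerplate is now a
// short, named requirement that reads like documentation.
template &lt;typename T&gt;
concept Arithmetic = std::integral&lt;T&gt; || std::floating_point&lt;T&gt;;

// constexpr: usable at compile time with no metaprogramming machinery.
template &lt;Arithmetic T&gt;
constexpr T clamp01(T value) {
    return value &lt; T{0} ? T{0} : (value &gt; T{1} ? T{1} : value);
}

static_assert(clamp01(2.5) == 1.0);  // evaluated entirely at compile time
static_assert(clamp01(-3) == 0);

int main() {
    std::cout &lt;&lt; clamp01(0.25) &lt;&lt; '\n';
    // clamp01("hello");  // rejected with a short diagnostic, not a template backtrace
}
</code></pre><p>The constraint reads like documentation, and a violation produces a short diagnostic rather than pages of instantiation errors.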
So I would stay away from template metaprogramming as much as possible. There are some isolated things that are useful&#8212;like using SFINAE to enable or disable particular instantiations of templates and things&#8212;but always keep that as minimal as possible.</p><p>For lambdas, lambdas are interesting because, used correctly, they can really enhance the readability of your code. They can really make it much easier to understand. On the flip side of that, they can really, really make it hard to understand what the code is doing. So my general advice for using lambdas is: keep them relatively short, and avoid having lambdas which capture and modify values that are a long way away.</p><p>What I mean by that is: suppose you have a big function that is performing some kind of calculation, and at the top you have a couple of lambdas which capture a row number. Let&#8217;s say you&#8217;re doing a matrix multiplication. It captures a row number, and the lambda accesses data from a particular row and then advances the row number. Now using that lambda will always cause confusion because the row number is a long way away from where the lambda is used. So every time you think, &#8220;What is this lambda doing?&#8221; it&#8217;s modifying something that you&#8217;ve not looked at for a long time because your screen has been further down the page.</p><p>Done correctly, this can be quite a powerful pattern. Done incorrectly, it really is a hindrance to you remembering what your code is doing. Almost surely in this instance, if you have a value which is initialized and then only ever modified or used by a lambda, it would almost surely be better encapsulated in a class of some description separately, so that the dependency on this thing&#8212;and the fact that this is a value that&#8217;s only modified or used by the class&#8212;is very explicit.</p><p>So that&#8217;s my thoughts, but that only really applies if the lambda is modifying a value. If it&#8217;s just capturing and doing something to it, that&#8217;s different. One of my favorite uses of lambdas is to capture a pointer that&#8217;s come in as a span or something that&#8217;s come in as a function argument, and then return particular subspans or particular elements from that span. For writing a matrix multiplication, for example, you might want to return a submatrix, or you might want to return a row or a column, and using a lambda for that purpose is really helpful because it saves the amount of work that you have to write again and again. And also it&#8217;s not modifying anything. Modifying is the problem.</p><p>As soon as you&#8217;re just returning a particular row, a particular column, or a particular element, that&#8217;s less problematic. In the past, you probably would have used a macro for doing these kinds of operations, but this is just C++. We don&#8217;t use macros anymore.</p><p>So those kinds of uses are fine, but I would generally try to keep your lambdas very short&#8212;and if they do need to capture things, remember the locality in the code of where you&#8217;re capturing from, and try not to let that drift too much.</p><p><em><strong>9: Let&#8217;s talk a little bit about performance, concurrency, and safety, specifically in C++. You have a chapter in your book on understanding the machine, covering topics like modern CPU architecture, memory errors, SIMD instructions, and branch prediction. 
Why should today&#8217;s C++ developers care about these low-level details?</strong></em></p><p><strong>Sam Morley:</strong> OK, so let&#8217;s think of it like this. Suppose you are driving down a road. If you&#8217;re going along an unfamiliar road, you have to drive slower. You don&#8217;t know where the turns are. Suppose it&#8217;s dark&#8212;you don&#8217;t know where the turns are. You don&#8217;t know what the traffic is like. You don&#8217;t know what the road condition is like, so you drive slower to be cautious. And this is what writing code without thinking about the system is like. In this world, the system that you&#8217;re running the code on is the road, the code that you&#8217;re writing is the car, and you&#8217;re thinking ahead about what the road conditions are going to be like&#8212;although you actually know what the road condition is going to be like in a lot of cases.</p><p>And in those conditions&#8212;like if the road is flat and straight, the road condition is good, there&#8217;s good visibility, there&#8217;s little traffic&#8212;you can go faster. And this is really what understanding the machine is all about: understanding how the different levels of cache interact, and how one retrieves data and then operates on it efficiently is a big part of how you make applications fast. If you ignore the cache, the code will work, but it will be much, much slower.</p><p>So, for example, most people in computer games have this discussion of structure of arrays or arrays of structs. The pattern is very simple. If you have, say, a set of objects inside your game, do you put those in a vector of structs, where the struct has all the different properties&#8212;like position, velocity, mass, whatever&#8212;or do you put them in separate arrays? One array for positions, one array for velocities, one array for masses, and so on. And this makes a big difference because of the cache and also because of vectorization. If you&#8217;re going to operate on positions only, then having a contiguous set of positions in memory means you can fetch them and operate on them very efficiently. Whereas if you have an array of structs, then you&#8217;re fetching positions but you&#8217;re also fetching velocities and masses and all the other stuff that you don&#8217;t need at that point in time, and you&#8217;re wasting bandwidth and you&#8217;re wasting cache.</p><p>So that&#8217;s one of the really classic examples. Another really classic example is matrix multiplication. Matrix multiplication is interesting because, in one direction of your matrix, you&#8217;re accessing data sequentially, which is really good. That&#8217;s really great for cache hierarchy. In the other direction, you&#8217;re accessing it with a huge stride, so the elements that you touch as you move from row to row down a particular column are far apart in memory, so you have to go a long way between these elements. So this is really bad for cache locality.</p><p>In order to address this, you do tiling. You take a small chunk of your matrix and use the data in that as much as possible so that you make the most of those expensive load operations, and you do as much operation as you can on that small tile of matrix. Then you move to the next tile.</p><p>In the book, I show a very marked improvement over a very naive implementation&#8212;it&#8217;s like a factor of four or something&#8212;and this was the point at which I started to engage a bit more with the pipelining and SIMD parts of this. 
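To make the tiled loop structure concrete, a bare-bones sketch (illustrative only, not the implementation from the book) looks roughly like this:</p><pre><code>#include &lt;algorithm&gt;  // std::min
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Tiled multiply, C += A * B, for square N x N row-major matrices.
// Each block of A, B, and C is small enough to stay resident in cache while
// it is reused; the tile size is a tuning parameter, not a magic number.
void matmul_tiled(const std::vector&lt;double&gt;&amp; A, const std::vector&lt;double&gt;&amp; B,
                  std::vector&lt;double&gt;&amp; C, std::size_t N, std::size_t tile = 64) {
    for (std::size_t ii = 0; ii &lt; N; ii += tile)
        for (std::size_t kk = 0; kk &lt; N; kk += tile)
            for (std::size_t jj = 0; jj &lt; N; jj += tile)
                // Do as much work as possible on this small tile before moving on.
                for (std::size_t i = ii; i &lt; std::min(ii + tile, N); ++i)
                    for (std::size_t k = kk; k &lt; std::min(kk + tile, N); ++k) {
                        const double a = A[i * N + k];
                        for (std::size_t j = jj; j &lt; std::min(jj + tile, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

int main() {
    const std::size_t n = 256;
    std::vector&lt;double&gt; A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);
    matmul_tiled(A, B, C, n);
    return C[0] == 2.0 * n ? 0 : 1;  // every entry of C should equal 2 * n here
}
</code></pre><p>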
You can dramatically speed things up.</p><p>And if you want examples of this kind of thing, FFTW is a really great code base to look at. It&#8217;s a very difficult code base to read because it&#8217;s a C code base and it&#8217;s full of macros, but you can spot some elements of what they&#8217;re doing. Pipelining is a big part of it: you give the compiler enough independent work at once that it can stack up operations and keep them flowing, rather than ending up in the situation of &#8220;I need this value, but now I have to wait for it.&#8221;</p><p>Also, they will use lots of SIMD operations and vectorization at the end. So that&#8217;s where I would suggest that people look. This is prevalent across all compute domains. It&#8217;s just about understanding what is the limiting factor in the performance of your software and then having some knowledge of the underlying computer&#8212;or whatever system you happen to be operating on&#8212;and really making use of every part of that.</p><p>For machine learning, for example, the models are huge now&#8212;billions of parameters, trillions of parameters even&#8212;and throughput really matters. Taking an extra microsecond to do a computation might not sound like much, but those micro-efficiencies really make a big difference in the long run. For general-purpose compute, if you&#8217;re interacting with a disk, or interacting with a network, or interacting with a user, then those details might not matter because you&#8217;re limited by something else. So it&#8217;s all about understanding where and when it&#8217;s appropriate.</p><p><em><strong>10: Can you share an example of how understanding hardware behavior can guide a C++ programmer to write more efficient or optimized code?</strong></em></p><p><strong>Sam Morley:</strong> Well, I mean, OK&#8212;this structure of arrays discussion is certainly one example of this. I come from a sort of scientific computing, high-performance compute for machine learning kind of background, or at least that&#8217;s where I am now, and here I always have to think about this.</p><p>One of the real classic examples of where you really need to understand these things is matrix multiplication. Matrix multiplication is interesting because, in one direction of your matrix, you&#8217;re accessing data sequentially, which is really good. That&#8217;s really great for the cache hierarchy. In the other direction, you&#8217;re accessing it with a huge stride, so the elements that you touch as you move from row to row down a particular column are far apart in memory, so you have to go a long way between these elements. So this is really bad for cache locality.</p><p>So in order to address that, you do tiling. You take a small chunk of your matrix and use the data in that as much as possible so that you make the most of those expensive load operations, and you do as much work as you can on that small tile of the matrix. Then you move to the next tile.</p><p>This is something that you have to think about if you&#8217;re writing high-performance code, because you can&#8217;t just write the naive triple loop and expect it to be fast. It will work, but it will not be fast. If you want it to be fast, you have to structure your computation so that it plays nicely with the cache and the memory hierarchy.
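The data-layout example mentioned a moment ago can be sketched just as briefly (the particle fields below are invented purely for illustration):</p><pre><code>#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Array of structs: the fields of each particle sit together, so a pass over
// positions also drags velocities and masses through the cache.
struct Particle {
    float px, py, pz;
    float vx, vy, vz;
    float mass;
};
using ParticlesAoS = std::vector&lt;Particle&gt;;

// Struct of arrays: each field is contiguous, so a pass that only needs
// positions and velocities touches only those arrays, and vectorizes well.
struct ParticlesSoA {
    std::vector&lt;float&gt; px, py, pz;
    std::vector&lt;float&gt; vx, vy, vz;
    std::vector&lt;float&gt; mass;
};

void advance_positions(ParticlesSoA&amp; p, float dt) {
    for (std::size_t i = 0; i &lt; p.px.size(); ++i) {
        p.px[i] += p.vx[i] * dt;
        p.py[i] += p.vy[i] * dt;
        p.pz[i] += p.vz[i] * dt;
    }
}
</code></pre><p>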
And the same kind of thinking applies to lots of other algorithms as well.</p><p>So that&#8217;s a really quick example of how understanding hardware behavior&#8212;specifically cache locality and memory access patterns&#8212;can guide you to write code that&#8217;s much more efficient.</p><p><em><strong>11: Your book also delves into parallel computing and even GPU programming, which is notoriously difficult, with pitfalls like data races and deadlocks. Coming back to the mindset aspect of things, what mental models or strategies do you recommend for designing multi-threaded C++ applications?</strong></em></p><p><strong>Sam Morley:</strong> Yeah, thankfully modern C++ really does make this a lot easier. There are two different scenarios I want to highlight.</p><p>The first is where you have a large amount of data that you need to process and you want to do this in parallel. Now, with some caveats, this is relatively safe to do in a multi-threaded environment because you just give each thread a different range of values to operate on. There&#8217;s never any overlap: each thread goes away, does its work, and puts its results in its own part of the buffer. There are no data races; there are no problems there.</p><p>And this is a safe thing to do, and it&#8217;s very easy to do with parallel algorithms or OpenMP and things like that, which will do a lot of the hard work of checking that these things are not violated for you. It comes down to setting up the problem so that it works that way, and there are some conditions on that. Operating on self-referential data, or data that refers to other parts of the data, is obviously going to cause problems. But that wouldn&#8217;t be an appropriate usage of those things anyway.</p><p>The other type of multi-threaded environment that you might have is where you have several worker threads that are handling different events within a bigger system, and here you have shared state. So each of the threads has some kind of global&#8212;or inter-thread, at least&#8212;state that they need to access. This could be for communicating between threads. So, for example, you might have one worker which is dispatching work to all of the other worker threads. This would be your main thread stacking up operations it needs performing, and the typical way that you would do this is with a queue.</p><p>So you&#8217;d have a thread-safe queue that you put work into. Each thread comes along, queries the queue, and says, &#8220;Is there any more work for me to do?&#8221; If so, it takes the job out and works on it in isolation, and this operation is thread-safe. It has to be thread-safe.</p><p>But also, you might have some global configuration or some kind of global data that you need to access everywhere. And there it becomes really important to understand what it really means to be thread-safe. Thread safety is a tricky thing. You need to understand where things can be mutated, who has ownership over particular things, and where that ownership can change.</p><p>Ideally&#8212;and this is something that will come up later, I&#8217;m sure&#8212;you want to have this model where only one place in your code&#8212;one thread, one function, one whatever&#8212;can modify a value at any given time. This can be achieved in one of two ways.
Either you design the architecture of the program so that one thread can only ever touch one value&#8212;this is the distributed data type model&#8212;or you have a synchronization mechanism like an atomic or a mutex-locked value, or some other kind of threading mechanism for controlling access to a particular resource.</p><p>In the latter case, it&#8217;s very easy to get this wrong. Deadlocks can happen. You can still end up with data races if you use these things inappropriately.</p><p>So what I would suggest is that if you do have to do multi-threaded code, you read very carefully the documentation on cppreference or some other equivalent source for all of the different synchronization mechanisms that are available in C++, and you really try and understand what each one of those things is for and how it operates. Then you&#8217;ll be much better equipped when you are trying to design a class that needs to be shared between multiple threads&#8212;how you manage the mutability. That might be interior mutability&#8212;mutable values within the class&#8212;or exterior mutability, where you need to take a mutable instance of the class and actually do something with it.</p><p>Ideally, you need all of that to be thread-safe, and knowing what the different options are will enable you to actually write this code. Hopefully that will mean that you don&#8217;t have deadlocks or data races. Always test your code.</p><p><em><strong>12: Robustness and security are critical in systems programming. With C++&#8217;s manual memory management and the ever-present possibility of undefined behavior, how can C++ engineers improve the safety of their code?</strong></em></p><p><strong>Sam Morley:</strong> Go and learn some Rust. I know a lot of C++ programmers turn their nose up when Rust is mentioned, and generally the feeling that I get from a lot of people is that, &#8220;Oh, we don&#8217;t need Rust. We can do all of this in C++.&#8221; But that&#8217;s not the point. The point is that Rust has a bit of a learning curve, particularly for C++ developers, because they go into it with a C++ attitude, and the Rust compiler isn&#8217;t having any of that.</p><p>The Rust compiler forces you to think very carefully about ownership and lifetimes, and whether it&#8217;s safe to move things from one thread to another. That&#8217;s its whole design: managing access and the validity of values across an entire system, and very carefully enforcing properties such as whether it&#8217;s safe to send or share things between threads. They have these two traits called Sync and Send, which basically determine whether you can share things or send things between threads safely.</p><p>The same applies to async programming. Even if you&#8217;re not using multiple threads, you still need to think about this for async programming as well. Learning a bit of Rust will force you to think about these things up front, and many other good things that you should definitely think about&#8212;like unsafe code. These are things that C++ programmers sort of take for granted without actually thinking about what they&#8217;re doing.</p><p>When is it actually safe to dereference a pointer? The answer is almost never. It&#8217;s almost never safe to dereference a pointer. That&#8217;s fundamentally an unsafe thing to do. You don&#8217;t know where that pointer came from. You may do, but you don&#8217;t really know where that pointer came from. You don&#8217;t know whether it&#8217;s valid or not.
These are things that you have to reason about as the developer.</p><p>Rust forces you to think of this as an unsafe operation, and because of that you&#8217;re far more cautious about actually doing it. And these concepts&#8212;this way of thinking&#8212;are transferable. Learning even a bit of Rust will make you better at writing safe C++.</p><p>The reverse is not true. Learning C++ will not make you good at writing Rust code. In fact, it will probably make you very frustrated. But getting over that frustration and understanding why Rust enforces these things is important, because these are the same principles that allow you to write safe code anywhere, not just in Rust.</p><p><em><strong>13: Are there any particular practices or modern C++ features you advocate for to prevent things like buffer overflows and memory leaks&#8212;while retaining the performance and control that C++ offers?</strong></em></p><p><strong>Sam Morley:</strong> Yeah, absolutely. I mean, it&#8217;s not exactly a new feature, but using <code>std::array</code> rather than C-style arrays is definitely a huge win. Smart pointers mean you never have to manage memory by hand.</p><p>There are some cases where you might actually do this, but most of the time, writing <code>operator new</code> in your code is an anti-pattern by this point. Use a smart pointer; use a container.</p><p>The mantra of my containers section is: just use <code>std::vector</code>. It applies most of the time. And use <code>std::span</code> rather than raw pointers or C-style arrays for passing data around. It adds this extra sort of memory safety&#8212;and yes, it may carry a small runtime cost, but that&#8217;s negligible compared to the risk of your code crashing out because of an invalid memory access, or&#8212;worse&#8212;producing garbage that goes unnoticed.</p><p>The best-case scenario for a bad memory access is a crash. That&#8217;s the computer responding to a bad thing. If it goes unnoticed, it could happen for months before you notice that this has been producing garbage the entire time, by which point you&#8217;ve wasted months. So those are the things that I would reach for first.</p><p>But the other thing is: stop using C functions. The C functions that existed a long time ago have numerous documented vulnerabilities in this sense. <code>gets</code>&#8212;the function from the C library which does an unchecked read from standard input to read a line of text&#8212;is fundamentally unsafe. I can make a line of terminal input as long as I need, and that&#8217;s a sure way of getting a buffer overflow. There are safer equivalents, but generally speaking, don&#8217;t use the C library if you can avoid it. It&#8217;s not safe, and using it will always cause some problems somewhere&#8212;especially the I/O functions like <code>gets</code> and <code>puts</code> and <code>sprintf</code> and things like that. These things you have to be very, very careful about.</p><p><em><strong>14: Let&#8217;s finally talk about your book, The C++ Programmer&#8217;s Mindset. You&#8217;re both a research engineer and a mathematician, and you maintain a high-performance C++/Python library for data science. The book itself combines practical insight with academic rigor. What drove you to write The C++ Programmer&#8217;s Mindset?
Did you observe a gap in how C++ developers approach problem-solving that you wanted to address with this book?</strong></em></p><p><strong>Sam Morley:</strong> So it&#8217;s an interesting question. Going in, of course I had to do a bit of market research around this, but my feeling was: I like solving problems.</p><p>The main motivation for me writing this book was to share my feelings about solving problems&#8212;my enthusiasm for solving problems. There will always be a new problem to solve. You&#8217;ll never&#8212;almost surely anyway&#8212;you will never encounter a situation where you&#8217;ve solved all the problems. There will always be a new one, and it will be interesting because it&#8217;s new. And the more problems you solve, the better you get at it, for sure.</p><p>But this is not just a passive process. As I mentioned at the beginning, a lot of people are doing this process of computational thinking using this framework that we described. A lot of people are doing this without thinking about it, and one of the things I wanted to highlight in this book was: in order to get better at solving problems, you need to be conscious of what you&#8217;re doing to solve the problems. You need to think about what it is that you actually need to do and how you can do it&#8212;not just in the context of the problem, but in the context of thinking about the problem, understanding the problem.</p><p>And something else that I feel quite strongly about is that I feel like a lot of C++ developers could benefit from being conscious of the environment in which they operate&#8212;thinking about the operating system, the underlying hardware, thinking about what the different mechanisms that they&#8217;re using are, how those things are informed by and inform the problem-solving process.</p><p>Do I need a map, or do I need a hash map, or do I need a vector? These are design questions that are informed by the implementation, and those relationships are really what the book is about. It&#8217;s about thinking about the language, the hardware, the operating system&#8212;all of those things combined&#8212;in the context of solving problems, and how the process of solving problems is informed by, and informs, the choices that you make elsewhere.</p><p>So that&#8217;s the message that I eventually decided was going to be the topic of the book.</p><p><em><strong>15: What mindset shift or new capabilities do you expect a seasoned C++ developer to gain after reading your book?</strong></em></p><p><strong>Sam Morley:</strong> Yeah&#8212;so, seasoned developers might feel that they already have a pretty strong grasp of solving problems, and this probably is true. A lot of very talented engineers out there. I would suggest, though, that everybody has something to learn. You don&#8217;t&#8212;you can&#8217;t ever know everything. So the sort of mindset shift is: you can&#8217;t know everything. So learn as much as you can from as many people as you can, and hope that that fills in as much&#8212;as many gaps&#8212;as you need. And so that&#8217;s the sort of philosophy that I would hope that seasoned developers would take away from this.</p><p>In terms of new capabilities, seasoned developers might already be pretty familiar with cache hierarchy and things like that. What they may not be so familiar with is this linkage between the problem-solving process and the implementation details and the other factors. 
Computers are complicated machines, so understanding all of these things is impossible, of course, but you can understand parts of them, and moreover you can tune your problem-solving process to fit what you have and where you&#8217;ll be working. It&#8217;s a two-way street, and that I hope is something that even senior engineers can think about while they&#8217;re reading.</p><p>One of the key things that I mentioned very early on in the book is this &#8220;future you&#8221; idea. That will be helpful for you in the future&#8212;future you&#8212;but it will also be helpful for less senior people who are learning this process for themselves. Being able to point out to them where and why certain parts of the process can be so tremendously helpful, and imbuing this understanding of how all of these different moving parts interact with one another, can be really, really powerful. That is something that I hope even a seasoned engineer can gain from this book.</p><div><hr></div><p>To go deeper into the ideas Sam Morley discusses in this interview&#8212;treating C++ problem-solving as a deliberate process, choosing abstractions with a clear-eyed view of their costs, and connecting design decisions to the realities of hardware, build systems, and team maintainability&#8212;see <em><strong><a href="https://www.packtpub.com/en-us/product/the-c-programmers-mindset-9781835888438">The C++ Programmer&#8217;s Mindset</a></strong></em> (Sam Morley, Packt, 1st ed., Nov 2025). The book introduces computational thinking as a practical framework&#8212;decomposition, abstraction, and pattern recognition&#8212;and shows how to apply it using modern C++ features to build solutions that are maintainable, efficient, and reusable. Across small examples and a larger case study, Morley covers using algorithms and data structures effectively, designing modular code, analyzing performance, and scaling work with concurrency, GPUs, and profiling tools&#8212;aimed at intermediate C++ developers who want to strengthen both their technical toolkit and the way they approach complex software challenges.</p><figure><a href="https://substackcdn.com/image/fetch/$s_!rpnO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F207a7e01-560e-4438-ad85-562040511902_2250x2775"><img src="https://substackcdn.com/image/fetch/$s_!rpnO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F207a7e01-560e-4438-ad85-562040511902_2250x2775" alt="The C++ Programmer's Mindset" loading="lazy"></a></figure><p>Here&#8217;s what some readers have said:</p><figure><img src="https://substackcdn.com/image/fetch/$s_!xpla!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95a0e02e-b5b2-448e-aeb6-902f4cadb1df_857x607.png" alt="" loading="lazy"></figure><figure><img src="https://substackcdn.com/image/fetch/$s_!doxG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac0d1272-d603-4e1c-8aec-7407048c6dd9_852x422.png" alt="" loading="lazy"></figure><figure><img src="https://substackcdn.com/image/fetch/$s_!ATit!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8182e4a3-ef29-458d-9515-4bad5724839a_862x662.png" alt="" loading="lazy"></figure>]]></content:encoded></item><item><title><![CDATA[Rethinking Test-Driven Development for the AI Era: A Conversation with Kevlin Henney]]></title><description><![CDATA[On misconceptions, design pressure, legacy code, language cultures, and why testing and review skills&#8212;not tools or AI&#8212;will shape how teams use TDD.]]></description><link>https://deepengineering.substack.com/p/rethinking-test-driven-development</link><guid isPermaLink="false">https://deepengineering.substack.com/p/rethinking-test-driven-development</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Thu, 11 Dec 2025 05:11:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/gCPZ8oXNdm8" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Test-driven development sits in an awkward place in many teams: widely cited, unevenly practiced, and often misunderstood. For some developers, TDD is a niche technique that only applies to greenfield code; for others, it is reduced to &#8220;writing some unit tests&#8221; after the fact. In between those extremes are practical concerns about legacy systems, language ecosystems, CI pipelines, AI-generated code, and the day-to-day pressures of shipping software with limited time and attention.</p><p>In this Q&amp;A, we speak with <strong>Kevlin Henney</strong> &#8212; independent consultant, speaker, writer, and trainer &#8212; whose career sits at the intersection of software design and everyday development practice. Kevlin works with companies on code, design, practices, and people; contributes to the <strong>Modern Software Engineering</strong> YouTube channel; and is co-author of <em>A Pattern Language for Distributed Computing</em> and <em>On Patterns and Pattern Languages</em> in the <em>Pattern-Oriented Software Architecture</em> series, as well as editor of <em>97 Things Every Programmer Should Know</em> and co-editor of <em>97 Things Every Java Programmer Should Know</em>.</p><p>Across the conversation, Kevlin unpacks why TDD adoption stalls even for experienced developers, the misconceptions that blur the line between &#8220;developer testing&#8221; and true TDD, and how tests shape design without losing sight of the bigger architectural picture. He talks through introducing tests into large legacy codebases, how language and ecosystem culture influence testing practice, and what distinguishes good, specification-like tests from brittle method-by-method checks. We also explore tooling choices, where TDD fits alongside integration, acceptance, contract, and performance testing, and how team leaders can sustain testing discipline under deadline pressure.
Finally, Kevlin shares his perspective on AI-assisted development, the risks of outsourcing tests to generators, and why, in an era of increasingly automated code, testing and review skills matter more than ever.</p><p>You can watch the full conversation below or read on for the complete Q&amp;A transcript.</p><div id="youtube2-gCPZ8oXNdm8" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;gCPZ8oXNdm8&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/gCPZ8oXNdm8?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><p><em><strong>1: Adopting TDD can be tricky, even for seasoned developers. In your experience, what are the main reasons that experienced developers struggle when first adopting TDD?</strong></em></p><p><strong>Kevlin Henney:</strong> I think there are different kinds of developers, and they will have different reasons for struggle. At one level, you are asking people to do something different from what they normally do. That is the first challenge. Just as a human being, that is always going to be difficult, particularly when you already have a set of habits in place. Regardless of how effective those habits actually are, we always perceive the habits that we have as being comfortable. That is why they are habits, and sometimes we have a justification for them.</p><p>So trying to get anybody to do something different from something they already do is going to be a challenge. The more experience you have, in this case, the more at a disadvantage you may be, interestingly. If you are relatively new to software development, then everything is fresh and every new idea is more likely to be treated equally by you.</p><p>But even then, we need to understand that a novice developer can sometimes struggle, and sometimes we have the issue with people who are in that overlap space where they are not necessarily formally a developer, but they do a lot with code. I am thinking particularly of data scientists and engineers who might not consider themselves to be developers, but who have worked extensively with Python and associated libraries such as NumPy, Pandas, and things like that. They are in the development space, but they do not necessarily have the insight of development culture and concepts, and often they have semi-effective workarounds that they have created, which get them by every day. The point is that for most people, this is the case.</p><p>When it comes to testing of any kind, we do not necessarily have as good a story for people as we do for creating a feature and doing a demo. These are very well-practiced within the software development space, and often videos and books will emphasize these, and there is much less on any kind of testing, let alone TDD. So testing tends to be more ad hoc. When you are trying to get somebody to do something like TDD, which is a very structured workflow, that is your challenge: you are trying to get them to do something different.</p><p>Then one of the other challenges is often the way that TDD is described. There is a simple mantra, &#8220;red, green, refactor.&#8221; You write a failing test for something, then you make it pass, and then you refactor. 
Although that is a very simple mechanical description, and it is not wrong, it is not very motivating. It leads to the reaction, &#8220;Why do you want me to write something that does not work?&#8221; That is not the right mindset. &#8220;Write a thing that does not work and then make it work&#8221; does not feel like a motivating mindset.</p><p>So I think that is another obstacle. Often the examples or the way that TDD is taught make a lot more sense to somebody who has expertise in it, or when you are coaching alongside somebody, than they do when you are just offering somebody the mantra. It is not compelling. I will be the first to say this. When I do workshops and training courses for companies, I will describe the red-green-refactor cycle. You need to know that. But then I go into it, I take it apart, and I say what is really going on.</p><p>At that point, it becomes easier to motivate. The first point is that you are not just writing a failing test. You are writing something for a behavior that you do not have. Because you do not have that behavior, of course it is not going to pass. But the goal is not simply to make it pass. The goal is to write what you want for the new behavior.</p><p>The next motivation is actually a simple constraint. In many cases, we can end up yak shaving or just running off into the horizon with complex behaviors, saying, &#8220;I will just write everything, and then I will come back and test it later.&#8221; If we do that, we often end up with things that are not as simple as they should be, and we do not ask ourselves the questions, &#8220;Is this what I need? Is there a simpler way to do this?&#8221;</p><p>So TDD is literally a limiting factor. It is like throttling back the instinct to just throw everything at the screen. Instead, you are going to take steps so that you understand every step and consider it. It is really a scoping mechanism. The idea is: now I am going to make it pass with something that is no more complex than necessary, so that I fully understand what the next step is going to be. I can guarantee that this is always working, but I am also going to give myself the opportunity for refactoring.</p><p>When explained like this, I am not going to say that it suddenly turns on all the lights, but it does make more sense. Then we move to the next level, where we say, &#8220;Let us forget the red and green. Red and green are side effects.&#8221; Your real goal is: tell me what you want to have working. Here is a piece of code. It has a certain amount of functionality. You want it to do something else. What does that extra bit look like? Show me an example. Somebody says, &#8220;Well, it should do this.&#8221;</p><p>&#8220;OK, great. Does it do that now?&#8221; &#8220;No, it does not.&#8221; That is why it either does not compile or it does not pass a test, because you are asking for something new. A test is a change request. When you describe it that way, a lot of people say, &#8220;Oh, I see. You are writing a change request to yourself. You are saying, &#8216;I want to have a piece of code that does this, but it does not do that yet. Here is a really concrete example of what I want.&#8217;&#8221;</p><p>Now I am going to work towards that, and I am done within a couple of minutes, and I can continue from there. The point is that you are not simply teaching somebody how to test, although there is a truth in that. 
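To make the &#8220;change request&#8221; idea concrete, here is roughly what that first step can look like; this is a minimal pytest-style sketch with invented names, not anyone&#8217;s production code:</p><pre><code class="language-python"># A change request written as a test: "an order over 100 gets 10 off".
# The Order class had no discount behaviour when the test was written, so the
# test fails first (red); the smallest implementation makes it pass (green);
# refactoring follows, with the test as a safety net.

class Order:
    def __init__(self):
        self._line_prices = []

    def add_line(self, price):
        self._line_prices.append(price)

    def total(self):
        subtotal = sum(self._line_prices)
        if subtotal &gt; 100:  # the smallest step the new test asked for
            return subtotal - 10
        return subtotal


def test_an_order_over_100_gets_10_off():
    order = Order()
    order.add_line(60)
    order.add_line(60)
    assert order.total() == 110


def test_a_small_order_pays_full_price():
    order = Order()
    order.add_line(40)
    assert order.total() == 40
</code></pre><p>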
You are actually trying to rewire how they think about the very act of coding, and that is hard.</p><p>That is why you will find that it is a skill. It is something worth practicing, and it is a practice that, once you have it, you can draw upon. It does not mean you have to do it all the time, but if you have never practiced it, how can you say, &#8220;That would be appropriate now as a tool or a technique&#8221;? You are trying to rewire how people think about the act of coding, and that is difficult. So you will meet resistance because of change, but also resistance because it is a fundamentally different way of doing something for which people already have some behaviors.</p><div><hr></div><p><em><strong>2: You have talked a lot about the mindset shifts required, and you said that adopting TDD itself is a skill. Are there any specific skill gaps you can point out that tend to be the biggest hurdles for developers who do not adopt TDD?</strong></em></p><p><strong>Kevlin Henney:</strong> Honestly, sometimes the problem in terms of skill gaps is simply testing itself, unit testing itself. In other words, developers do not have a habit of any kind for testing, or testing is something that happens later and sometimes in a mad rush. Therefore, the tests that are written are quite difficult to read, and people often have this idea of tests being second-class citizens.</p><p>Often you look at tests and you say, &#8220;Yes, they look like second-class citizens,&#8221; and people create tests that are difficult to maintain, sometimes because they have never been shown what a good test looks like. For many people, when they are learning TDD, they encounter the fact that there are many things they are trying to learn at the same time, and one of them is, &#8220;What does a good test look like?&#8221;</p><p>That is an issue. Picking up on what I said earlier, tests are specifications. There are many ways of thinking about testing, but the way that we are encouraging here is specifying. Your test should be an explanation, a description that captures intent, and it should have an example. The example is the centerpiece, and you want to capture the intent in the name. If you have a test that does too much, it is not a good test.</p><p>It turns out that many of these things are things that people do not already know or do. So in addition to the workflow, there is an additional skill: how do you write a good test?</p><p>The other skill that is often missing is that people do not necessarily have a good code sense or design sense. By that I mean they often do not know what good code looks like. When you say, &#8220;And now refactor,&#8221; they do not know what to do, because although they may have a refactoring menu available to them, and although they may know the meaning of the word, they do not actually know what &#8220;better than this&#8221; looks like.</p><p>So you end up with code that just gets bigger and bigger, with more ifs and whiles. That is not what I have in mind. Where is the simplification? They are not actively looking for simplification. That is a design skill, and that is quite difficult to teach.</p><p>Therefore, if you are going to make the best use of any workflow, and this is not unique to TDD, you need to be actively looking for good design. Many workflows suffer because people are not asking, &#8220;How do I make this simpler? How do I make sure I have less code to maintain in future?&#8221; You want to write less code. 
Your goal is not to produce more code; it is to produce less code.</p><p>The most common thing I see is that programmers do not know how to write less code than they need. They often go in with code like, &#8220;There is absolutely nothing wrong with writing too much code as your first draft.&#8221; There is nothing wrong with that. What matters is what you do with your second draft, and that is the problem. Many people do not have that second draft because they have not worked alongside somebody who can show them what that looks like.</p><p>You cannot expect this to be an act of magic. You start learning how to code and suddenly you develop a good instinct for the right balance and structure of a method, of a class, and what an interface should or should not look like to be effective and easy to change. Without helping people develop that sense, almost any workflow you throw at them is going to make things potentially worse.</p><p>We see that with AI: people who do not know how to code can produce a lot of code. They need to learn to produce less. You can use AI to produce less. The skill is to produce less that does exactly what you want, because then you have less that can go wrong and less to read.</p><p>This is something that I do not think we get across well enough. For me, TDD helps with that, because it always reminds me: &#8220;OK, now cook it down. You have this; cook it down.&#8221; You have tests; they work. You have a safety net. There is a skill there, which is very much code sense, both for the tests and for the body of the code itself.</p><div><hr></div><p><em><strong>3: What do you see as the biggest misconceptions or myths about TDD among developers and teams today?</strong></em></p><p><strong>Kevlin Henney:</strong> I do not know that there is necessarily just one, but there are a few. One is that you can, that it is only something that you can do with new code. Another is that, to be precise, it can only be used on a greenfield situation. Another is that your TDD is very much centered on your unit testing framework and things like that.</p><p>So there are these kinds of ideas, and we live and work in an industry where jargon is often thrown around and sometimes it is very imprecise. When something is described in a number of companies, &#8220;Oh yeah, we are doing testing,&#8221; that is great. There is nothing wrong with that. The code leads and the tests follow, which is a different workflow. That is perfectly fine. I am not here to tell people that TDD is the only way to work.</p><p>What I am trying to avoid is a kind of semantic dilution if &#8220;TDD&#8221; comes to mean just that developers are testing. That is great, but we would like to call that &#8220;developer testing&#8221; as a general term, rather than TDD, which is a very specific workflow.</p><div><hr></div><p><em><strong>4: Are there any particular false beliefs you frequently find yourself debunking?</strong></em></p><p><strong>Kevlin Henney:</strong> Oh, yes. There is one that has two sides to it. First of all, I separate out a couple of things. Sometimes when people are being negative about TDD, they are not talking about TDD; they are talking about unit testing. They are using &#8220;TDD&#8221; to stand in for unit testing in an environment where, culturally, within the organization, you do not test. That becomes a reinforcing thing. It is not about TDD at all in that case.</p><p>Then there is another one that does come from people who do practice TDD. 
Every now and then you will hear the slogan that TDD is not about testing, it is about design. I know what they are trying to do. They are trying to emphasize that testing is not just an act of verification. We often have this idea of testing as purely about verification, a kind of gatekeeping activity. But saying &#8220;TDD is not about testing&#8221; is not a true statement, and I always have problems when people present it that way.</p><p>At least half of my work is with companies who say, &#8220;We want to do TDD,&#8221; when what they really want to do is testing. TDD is a discipline, a workflow. You can tell when people are doing it. It is also the most extreme thing they have probably done in terms of testing discipline, so why give it a name that you also use for everything else? I always make sure people are aware there are many workflows.</p><p>What I try to make sure they understand is that when we say TDD, we mean something specific. It is not a magic spell. It is a particular way of working that gives you certain kinds of feedback and certain kinds of design pressure. Part of that pressure is on you as a developer to ask, &#8220;What am I actually asking for? Do I know what I want from the code?&#8221; Sometimes the honest answer is that you do not know what you want. That is a recognition of ignorance, that you do not yet have enough knowledge. At that point you may need to discover that knowledge, perhaps by spiking something or exploring, rather than pretending you are doing TDD when you are not.</p><p>Another part of this is the tests themselves. Some of them are actually quite large, and you have to ask, &#8220;Do I really want that? Is that genuinely helpful, or is that telling me something about the design?&#8221; Often the test is large because the design is causing that. If the interface feels very clunky, then that is telling you something about the design as well.</p><p>So as to what it feels like: testing is in fact the way you experience the design. Rather than looking at testing as a purely quantitative activity&#8212;&#8220;I got this percentage of statement coverage, I have done my job&#8221;&#8212;you can ask, &#8220;What does it feel like to write the test? What does it feel like to use the code I am providing?&#8221; If the answer is, &#8220;Yes, it is quite easy, it feels natural,&#8221; that is good feedback. If it feels like having a code review where you have to do most of the work yourself, then the tests are giving you a signal that there may be something up with the design.</p><div><hr></div><p><em><strong>5: There is an ongoing debate about TDD&#8217;s effect on software design and architecture. Some argue that focusing on small tests leads to fragmented design or lack of &#8220;big picture&#8221; thinking. How do you believe TDD influences software design?</strong></em></p><p><strong>Kevlin Henney:</strong> Hmm. I think it goes back to what I was saying about having this kind of design sense or code sense. If you are only ever going to think small, then yes, TDD will have those effects and you will end up with fragmentation rather than a cohesive design. That is one of the reasons it is quite important to make sure that you have a reasonable test hierarchy, that you are testing at all levels, and why, when you are doing this, you should always be taking the big picture view as well.</p><p>And this is, I guess, where the driving metaphor that is used extensively when talking about TDD becomes even more appropriate these days. 
When I drive, there are three places that I am typically looking. I am looking at the road immediately in front of me. I am also looking down at the dashboard to see what my car is telling me. And I am also looking at a map to see what the big picture is.</p><p>The problem is that I get the feeling that many people, and this is again not just a TDD thing, I find this with different roles in development, are only ever looking at one of these at a time. So it is like, of course, if you are only looking at the dashboard, you are not going to see what is in the road in front of you; you are going to slam into something. But if you are only looking at the things in the road in front of you, that does not tell you what the bigger picture looks like and what the trends are in traffic, for example. So you are not getting feedback at all three levels. You are only ever looking at one and ignoring the feedback from the others.</p><p>So here is the thing. If you are using TDD and that has caused you to end up with a fragmented design, you are not looking at the bigger picture. But also, whenever you are having design ideas, the idea is that when you are launching into TDD, you should have a vision of where you are going to go. The problem is that sometimes people do not actually have an idea of where they are going to go. I often have this thought of sketching out an approach. Do not commit yourself to detail. This is not a committed design; it is literally a sketch.</p><p>As a sketch, what you are going to do, what TDD is going to do, is fill in the details. For anybody who does draw, and I know that drawing is not a very common skill among developers, it is one of those things where I always ask people what they do. Music is very common. Gaming is very common, whether it is computer-based games or board games. Certain sports are very common. Drawing is not very common, but when you draw, you often sketch the form and then you put the detail in, but that detail sometimes tells you that maybe the form is not right.</p><p>So for me, people often launch in hoping that if they start drawing in the bottom right-hand corner a miracle will occur. If you are a brilliant artist, yes, a miracle will occur; you will produce a great picture. But sometimes people are not looking at the big picture. They always need to be asking, how is this going to be used, how does this affect that?</p><p>When some people say, &#8220;TDD does not do this,&#8221; my answer is, &#8220;No, that is your job.&#8221; TDD&#8217;s job is to do the sketching. It is your job as the artist to see the bigger picture and say, &#8220;I am drawing the wrong thing,&#8221; or &#8220;Maybe that needs to be moved,&#8221; or to take the feedback. If you are only taking feedback at one level, that is great; many people take feedback at zero levels. However, you need to be looking at multiple perspectives. Some of them are closer and some of them are further away.</p><p>So I do not really accept the criticism that TDD causes this. I accept that there may be a misunderstanding of the role of TDD, that people are sometimes saying, &#8220;If I do TDD, magic will occur.&#8221; As I told my kids when they were growing up, there is no such thing as magic. There is you, a tool, and a technique. That is it. 
If you are misapplying the technique, that is not the technique&#8217;s fault, so there is a learning opportunity.</p><div><hr></div><p><em><strong>6: When it comes to scaling TDD in a larger organization, what challenges do enterprises face in rolling out TDD across teams? Based on what you have seen, what strategies help make TDD stick in the long run at the organizational level?</strong></em></p><p><strong>Kevlin Henney:</strong> I think this one is more a case of, although I am very keen on TDD, I do not necessarily know that an organization wants to roll out TDD. It is a workflow practice, and I think if you can get that working within a team, that is great, but there is no reason that another team has to do it. I think it might not be helpful for an organization to be mandating these things.</p><p>I think what the organization needs to care about is not so much the way that we are producing the tests, but whether we have builds that work together and comparable testing philosophies across different teams. If you have a team that is doing a more traditional kind of &#8220;test later towards the end of the sprint&#8221; type approach, and let us say they are really effective and they have some really good design and their interfaces evolve really nicely, I would not mess with that. They are doing a perfectly good job, and because we have organized around teams, that does not really interfere. As long as our teams have some kind of alignment and relationship with an architecture, then I do not think there is a problem there to be solved.</p><p>What we do want is the idea that we have a consistent or a reasonably consistent and compatible view of testing across the organization, and that if TDD helps me get that, then that is what I should be encouraging. But I am not going to say that it is going to be the thing I should focus on. I think that what an organization probably wants to focus on at the organizational level is: if we have various build pipelines, do these build pipelines follow similar philosophies of testing?</p><p>Because a build pipeline that does not have any testing in it is not really a CI/CD pipeline, and I will be very careful here: I am using the term &#8220;build pipeline&#8221; because people will often say, &#8220;Oh, it is our CI/CD pipeline.&#8221; Is it? Are you doing continuous integration and are you doing continuous delivery? Because CI/CD is predicated on the idea that you have tests. In fact, to be fair, CI/CD is predicated on you doing trunk-based development and doing a lot of testing. That is what that means. You can go and look at the original books and they are very clear on this. So definitely a lot of companies have build pipelines, but do they qualify as CI/CD pipelines? Not always, not from the strictest definition. I think that distinction is more valuable.</p><p>So let us put it this way. I do not think an organization needs to worry about what people are doing in their homes, but they probably need to worry about the road system. In this sense, organizationally, when we look at software architecture, we need to be thinking of software architecture more like urban planning. We want to have consistent rules and models for the roads. We want to have a consistent layout, see what the issues are, and agree on things about roads and services. 
What people do in their homes and how they structure their homes and how they do it, I think that can be a lot more freeing, as long as we have the knowledge available and maybe one team can coach another and we can say, &#8220;You can become our enabling team; we are going to try this practice.&#8221; I think that is great, but I am not sure I want the organization to get involved in some of the more detailed practices that support what goes on. I think what goes on, what the output is, and how teams integrate is probably more important than specifically what they do on the inside.</p><p>I think what can make it stick is very much, let us build off what I have said. One of the things is that a team needs to feel that it owns its practices. Teams respond, and individuals respond, sometimes quite poorly when they are told what they are going to do and they do not really feel it. If a team is told, &#8220;You are going to do TDD,&#8221; that is not a way of getting them to do anything well.</p><p>If they can make it their own habit, if they can create it, if it is their decision, then that is really important, but also if they feel like they have learned something. Again, this goes back to this idea of, within any large organization &#8211; and this is obviously a question of different organizations and different scales &#8211; in any large organization we are going to find that there are different kinds of teams trying to produce different kinds of products.</p><p>Some people say, &#8220;We do not do TDD here.&#8221; Be very careful that when somebody says, &#8220;We do not do TDD here,&#8221; that this is not also, &#8220;We do not do testing here.&#8221; Again, going back to what we have already discussed, that is what I hear when many people actually say, &#8220;We do not do TDD&#8221; or &#8220;TDD is not appropriate for us.&#8221; They are actually using TDD to mean any kind of testing, and so therefore they are using the wrong word. They are actually saying something much more bluntly as, &#8220;We do not have tests.&#8221; If they said that, that would be far more direct and we could work with that.</p><p>We need to work out whether that affects us or not. If it is a team that is just prototyping and giving us the results of prototypes, then that is not important. If it is a team that is prototyping a design, yes, we want tests, because you are telling us that this code, which we do not know whether or not it works, is the basis for what we are going to build. Prototyping can involve TDD and it can involve tests. I have done that a number of times in the past. So really it is a case of trying to understand from the organizational level how to get the knowledge out there and make the knowledge feel much more natural.</p><p>For many people, any kind of unit testing habit is the challenge. Having tests that run quickly is the challenge, and I would address these questions. I would treat those as the questions to address, and what we may find is that TDD by example may follow, particularly if we have somebody from within the organization who has experience of that and that is how they drive it and that is how they show it and that is how they demonstrate it. Then that may become a lot easier.</p><p>Lead by example in this case rather than by mandate. Basically say, &#8220;Look, there are a lot of different testing workflows. Our objective is to get better testing, to make testing more convenient of any kind. Let me show you this. 
I am going to use a test-driven workflow.&#8221; Suddenly when you do that, that is much more open and I think people are more likely to adopt it. Whereas if we have somebody going around measuring different teams in a very obvious way, teams justifiably feel a little bit of resistance, offer a little bit of resistance there.</p><div><hr></div><p><em><strong>7: Legacy code is a reality for most teams. If a team inherits a large untested codebase, how would you recommend they approach introducing TDD or even more testing in that scenario?</strong></em></p><p><strong>Kevlin Henney:</strong> I think that is a really good question, because it matches a lot of people&#8217;s lived experience. The key point is that you have to prioritize. From where you are, perfection is impossible, so you have to look at what is possible, and that is going to be a little different for each codebase and for each team. A lot depends on whether you have what I would call a &#8220;maintenance mindset.&#8221; If you have that mindset, it is going to be very difficult to adopt TDD.</p><p>By &#8220;maintenance mindset,&#8221; I do not just mean software maintenance in the narrow sense. I mean the broader idea that &#8220;we are just maintaining whatever it is we do.&#8221; You often see this where initial development has been done in one location, and then the work is offshored to another group. The second group is told, &#8220;You are just maintaining it,&#8221; and people there may not think of themselves as doing software development. In reality, they are. There is no real separate thing called &#8220;maintenance&#8221; when it comes to software products. It is all software development. There is not &#8220;software development plus maintenance&#8221;; there is just software development.</p><p>So the first step is to reclaim the right words. You are doing software development. Everything you do has the potential to change the architecture. It is your responsibility not to preserve the problems in the existing codebase, but to eliminate them. &#8220;Maintenance&#8221; as stasis is not what you want. Your job is to be more ambitious: to make the product better than it was when you received it. How do you do that? One obvious obstacle is that you would love to test everything, but you have poor test coverage. In that case, do not try to test everything. Instead, decide how to prioritize what you test.</p><p>A useful way to do that is to look at what is going to happen in the next quarter. Suppose over the next three months you are going to add features in a particular part of the codebase. If that corner of the codebase is already relatively well isolated, then you lean into that. Reinforce the isolation. Make sure you have good automated refactoring tools available. Remember that your compiler will still catch many type-based errors. You can introduce separation and decouple tightly coupled code without relying on a large pre-existing test suite. You can lean on automated refactoring, appropriate review, and, as Michael Feathers puts it in Working Effectively with Legacy Code, &#8220;lean on the compiler.&#8221;</p><p>I have done this with teams: we deliberately ignore much of the rest of the codebase for the moment and decide, &#8220;We are going to make this part really good.&#8221; Once you have something isolated, it becomes easier to test. Unit testing and even integration testing are really about understanding isolation, loosening coupling, and improving cohesion of code units. 
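As a rough sketch of what that first seam can look like, in Python here with invented names, though the same move works in C++ or Java: instead of the code constructing its own database client, the dependency is passed in, and a tiny in-memory fake is enough to bring that corner under test:</p><pre><code class="language-python"># Before (sketch): the report builds its own database client, so this corner
# cannot be exercised without a live database.
#
#   class MonthlyReport:
#       def build(self, month):
#           rows = PostgresClient("prod-dsn").fetch_invoices(month)
#           ...
#
# After: the dependency crosses a narrow seam, so a tiny in-memory fake is
# enough; no large pre-existing test suite or mocking framework required.

class MonthlyReport:
    def __init__(self, invoice_source):
        self._invoices = invoice_source

    def build(self, month):
        rows = self._invoices.fetch_invoices(month)
        return {"month": month, "count": len(rows),
                "total": sum(row["amount"] for row in rows)}


class InMemoryInvoices:
    def __init__(self, rows):
        self._rows = rows

    def fetch_invoices(self, month):
        return [row for row in self._rows if row["month"] == month]


def test_a_monthly_report_totals_only_that_months_invoices():
    invoices = InMemoryInvoices([
        {"month": "2025-01", "amount": 120},
        {"month": "2025-01", "amount": 80},
        {"month": "2025-02", "amount": 999},
    ])
    assert MonthlyReport(invoices).build("2025-01") == {
        "month": "2025-01", "count": 2, "total": 200}
</code></pre><p>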
Those are the practices that improve your code sense. Many developers these days do not have a clear understanding of coupling and cohesion. They get distracted by principle catalogs that are not very coherent. For example, the SOLID principles do not form a coherent set of design ideas; they are a bunch of things thrown together and they miss many important aspects. I know I will get comments for saying this, but I have been doing this long enough to say that SOLID principles are next to useless if you want to learn how to write good code. You are better off learning and reinforcing the fundamentals.</p><p>If you can isolate a small part of the system, that becomes your zone of &#8220;new development.&#8221; This is a bit like urban planning. In a city you cannot change everything at once, but you can change a particular district. Because that area is separated, you can make it good and benefit from that separation. That is one technique for allowing a team to claim territory and improve not only their testing but also their design. The important idea is that you are not just improving testing habits; you are improving the code itself. Testing and design are not separate activities. Treating tests as something separate is part of the misconception. Tests show you how the code fits together and whether your design is good. If you say testing is difficult, you are actually saying the design is difficult. That is useful feedback: &#8220;What do we change so this becomes easier?&#8221;</p><p>A practical goal to hold is that in six months&#8217; time it should be easier to work on this codebase than it is now. That will involve more than just testing. It will involve changing code, build settings, and all kinds of small things. You are trying to improve the overall situation. Another way to prioritize is to &#8220;ask the system itself.&#8221; Treat the legacy system as having a body of knowledge and let it tell you what to focus on. If you have a million lines of code, a team of ten is not going to transform it overnight, so do not try. Instead, look at the system&#8217;s history. What changed? What keeps changing? Look at the parts of the code that change most often.</p><p>It does not matter whether those parts are changing for good reasons or bad reasons. If they are changing frequently, that is where you want to improve both testing and developer experience. If you are frequently changing something, you are more likely to break it, and you are also more likely to benefit from making it better. Parts of the system that are incredibly stable do not need the same attention. That does not mean they are automatically good. Some things are stable because they are terrible and people are frightened to touch them. But if they are not changing, they are no more broken than they were before, and they already &#8220;work&#8221; in the sense that people rely on them as they are.</p><p>So use the system to tell you what to change. The system already has an opinion, visible in its history and defect patterns. Do you have a heat map of where your defects are? That is where you want your tests. In that sense, you can use the legacy nature of the system constructively and positively. I think we often overlook that because it is not immediately obvious, but it is a very practical way to introduce more testing and TDD-like practices into a legacy codebase.</p><div><hr></div><p><em><strong>8: You have worked with a variety of programming languages, from C++ to higher-level languages like Python. 
Do you find that TDD plays out differently depending on the language or tech stack?</strong></em></p><p><strong>Kevlin Henney:</strong> Yes, it does, but not always for the reasons people might think. Sometimes it is more about culture than language features. Just as natural languages are associated with different cultures, programming languages have associated cultures, idioms, and toolchains. So you have the syntax of a language, but you also have the tools that are available and the habits that have grown up around them.</p><p>Culturally, testing as developer testing is far more prevalent in the Java world. There is nothing inherent in the Java language that makes it more amenable to testing than a language like Python, but testing is more likely to be present. That is because modern unit testing, at least in the popular sense, grew up around Java. The JUnit framework appeared in the late 1990s and was integrated with Eclipse. That made it normal for unit testing frameworks to be integrated into IDEs. Java was the language in which those practices and cultural habits were first formed. As a result, Java developers are much more likely to encounter JUnit and similar tools in an integrated environment early in their careers. In that sense, Java is &#8220;better suited&#8221; to TDD than Python, not because of the language itself, but because of the surrounding ecosystem.</p><p>Python, by contrast, does not have a single standard IDE in the same way. If you are working with Java, you are very likely using IntelliJ or a similar environment. If you are using Python, you might be coming from many different directions. If you are a data scientist, you have a different view of the world. Data scientists do not usually use Java; they use languages like Python. With Python you have people who consider themselves software developers, children who are learning to code, people who are scripting, people who are doing data science, and so on. There is not a single core culture, so you end up with disparate practices. Python itself also predates the period when automated unit testing became a strong habit. That is not to say Python developers do not test, but the cultural environment around Python does not have the same unified testing norm as the Java ecosystem. So in that sense, what you see with TDD or testing is often more about development culture, who is around you, and what information and tools are available.</p><p>If we move to C, or C for classic systems programming and embedded work, we see yet another culture. These are contexts where you are much less likely to find unit testing. If people are testing, they often test at the system level and not even at the integration level, let alone at the level of small units of code. So culturally that is an obstacle to TDD.</p><p>Then there are the language characteristics themselves. Python is a much &#8220;looser&#8221; language; it is dynamically typed, and that can actually make some aspects of TDD easier. I sometimes joke that when I am using Python, I do not need a mocking framework because Python is the mocking framework. Mocking frameworks were invented for statically typed languages like Java, where the language does not easily support meta-level behavior. Those languages are less elastic and less plastic. In Python, I can reshape almost anything. The language itself is a tool that can be used to modify itself. 
At that level, from a purely linguistic point of view, Python can make testing easier.</p><p>However, cultural habits can get in the way even there. For example, many Python developers, especially in more data-science-oriented contexts, have a habit of reading and writing files everywhere and accessing files in every function. That makes testing harder, and it is something I try to encourage people to stop doing. In C and C++, there are language constructs that encourage longer build times and more source-file dependencies. There are also design habits that do not lead to natural decoupling or obvious substitution points where you can say, &#8220;I can easily put something else here because the design allows it.&#8221; In those environments, you sometimes have to push uphill against the prevailing culture of the codebase to get to a design that is test-friendly.</p><p>So yes, languages can make TDD easier or harder, but only sometimes is that because of the language features themselves. Very often it is due to the surrounding culture: the design culture, the testing culture, and the practices that have grown up in and around that language.</p><div><hr></div><p><em><strong>9: The quality of tests is crucial in TDD. What are some best practices you recommend for writing good tests in a TDD cycle?</strong></em></p><p><strong>Kevlin Henney:</strong> That your tests should. So one of the techniques that I always think of is that your tests should be testing one concept, one idea. That does not mean they necessarily have just a single assertion, but they should have a single focus. What is the thing that you are trying to demonstrate? That should be easily summarized by the test name. This is one of those cases where naming something is not merely labeling, it is actually testing as design in this case, because it will cause you to create different tests if you use a different design approach or different naming approach.</p><p>My preferred habit is to use tests that are propositions. So let us just take this cup, for example. Some developers might say, &#8220;I have a constructor,&#8221; and they will write a method <code>testConstructor</code>, and <code>testFill</code>, or <code>testDrink</code>. What you are doing is you are just going through the shopping list of methods and writing a test, and you cannot test like this. There is no way to produce good tests using that technique. I actually do this when I run training courses. I show people that it is impossible to produce good tests using this technique. If you just go one method at a time and say, &#8220;I am going to write a test for this method,&#8221; you cannot test. You cannot write good tests like that, partly because, in order to drink from a cup, I need to create it.</p><p>So therefore I have already involved the constructor. I am not just testing the <code>drink</code> method. I also need to fill it, so I am using the <code>fill</code> method, and then I can drink from it, and then I need to determine whether or not it became empty. I have just used four different operations there. I am not testing a single operation, I am actually testing the interaction. This is why, when we look at the perspective from BDD, behaviour-driven development, that gives us a different way of understanding what you are after. You are after testing the behaviour. 
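In code, the contrast is easy to see; here is a minimal sketch of the cup in pytest-flavoured Python, with a stub Cup class assumed only so the example stands on its own:</p><pre><code class="language-python"># Not testConstructor, testFill, testDrink: each test states a proposition
# about behaviour, and composes whatever operations it needs to say it.

class Cup:
    # Minimal assumed implementation, only here to make the sketch runnable.
    def __init__(self):
        self._full = False

    def fill(self):
        self._full = True

    def drink(self):
        self._full = False

    def is_empty(self):
        return not self._full


def test_a_new_cup_is_empty():
    assert Cup().is_empty()


def test_drinking_from_a_full_cup_empties_it():
    cup = Cup()
    cup.fill()   # constructor, fill and drink all take part in one behaviour
    cup.drink()
    assert cup.is_empty()


def test_drinking_from_an_empty_cup_leaves_it_empty():
    cup = Cup()
    cup.drink()  # drinking shows up in two scenarios, so it gets two tests
    assert cup.is_empty()
</code></pre><p>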
The behaviour is not just in a single method, it is the composition of different methods in different scenarios.</p><p>Another reason you do not want to end up doing <code>testDrink</code>, for example, is that I can drink in two different scenarios. I can drink from an empty cup, and I can drink from a full cup. That is not one test case, that is two. They are very different, and they have different outcomes. So the first thing is, if you are currently doing that, it is a huge test smell if I see that pattern. If I just see tests that are &#8220;here is a method, here is a test method that corresponds to it&#8221; &#8212; <code>testA</code>, <code>testB</code>, <code>testC</code> &#8212; you do not have the tests. It is as simple as that.</p><p>I always lay it down as a challenge to people: show me if you have any counterexamples. Nobody has ever been able to come up with a good example that contradicts that observation. What you need to be doing is testing behaviours, or in some cases we would look at it as testing a property. There is a fluid overlap between these approaches. You make a statement, a propositional statement. By propositional statement, I mean that we describe something and the way that it is.</p><p>&#8220;A new cup is empty.&#8221; &#8220;Drinking from a full cup empties it.&#8221; These two sentences are the test names. So literally your test name should be as easy to read as if it were a specification, which is what I said earlier. In other words, each test needs to be organized and thought of as a specification with an example. Here is the thing that we are showing. This is the expected outcome in this scenario. This is the behaviour or the property that we are entitled to, and that we are requiring at this particular point.</p><p>When we start looking at it like that, you suddenly realize your tests are not just a bunch of assertions with bits of setup. You are telling a story. You are describing the system from a specification-oriented point of view. You are giving people a series of logical propositions. If the test fails &#8212; if I say, &#8220;A new cup is empty&#8221; &#8212; that is a proposition. If that test fails, what does that mean? It means a new cup is not empty. I can tell immediately by looking at the test name what is wrong. I might not know why, but I know what. Whereas if I say <code>testConstructor</code> and that fails, I have no idea what that even means.</p><p>So the point is, your tests are units of meaning. Or, put another way, they are not just verification, they are communication. You are communicating actual meanings. If your testing philosophy is that you are just poking your code to verify it, you are going to end up with tests named after methods, or even worse &#8212; and I have seen this a few times &#8212; <code>test1</code>, <code>test2</code>, <code>test3</code>. Honestly, that is not going to help anybody.</p><p>You can always tell whether or not a team has really understood or has a good testing habit, because if they are testing like this, there is no way they have a good testing habit. They are doing testing as an afterthought. It does not feel good. I would not like to write tests like that, and if somebody said, &#8220;Kevlin, why are you not writing tests?&#8221; I am going to say, &#8220;Because it feels wasteful and it is annoying, it is frustrating.&#8221; If you adopt those practices, it is annoying. 
I would not want to write tests like that.</p><p>So test quality needs to be quite high; otherwise you are going to end up with unmanaged technical debt in your test base as well as problems in your code base. You do not want to double your problems, you want to reduce them. Your tests should be a clear explanation of what your system does in detail, along with intention. For me, that is what I put under the heading of GUTs &#8212; good unit tests &#8212; and that is a term from Alistair Cockburn. TDD does not miraculously cause you to do GUTs. You need to again realize that you are in the driving seat. Having a nice car does not cause you to be a better driver, and I think there are a lot of people who would benefit from that analogy.</p><p>Then you need to listen to your tests. What are your tests telling you? Your tests are telling you, &#8220;This is not cohesive.&#8221; Everything is bound up, and you have too much in one place. If you want to test a behaviour in that, or a related group of behaviours, then that related group of behaviours is its own module or its own class. Why is it hiding inside another class? This is design feedback. Again, sometimes the difficulty of testing comes from the difficulty of the code.</p><p>So I would say listen to your tests. When people say, &#8220;How do I test the private stuff?&#8221; my stock answer is that, generally speaking, you do not. That is a signal that you need to separate something out. You are not dealing with one idea, you are dealing with two ideas, and one of them is hidden inside the other. Pull it out. Do an Extract Class and focus on that. It is clearly important because you value it. You just said, &#8220;I want to test these behaviours.&#8221; You probably even have words and names for it, but it is hiding embedded inside another class. So give it first-class citizenship and extract it.</p><p>At the same time, I recognize that there is a point here. If I told you that and you have a major release tomorrow, that is probably not helpful advice from me. So that is why I do not say, &#8220;Do not test,&#8221; or &#8220;You should never test private stuff.&#8221; What I say is that you should take it as a signal, and you probably do not want to do that. So in those cases where we need a little pragmatism, I would say, &#8220;Yes, I am going to weaken the encapsulation on the class in one way or another, but I need to put a huge deprecation note on it, or flag it as, &#8216;This is technical debt I need to manage.&#8217;&#8221;</p><p>If I have ignored that warning three times, take it as a &#8220;three times and you are out&#8221; rule. If I keep coming back to the same code, and my colleagues and I keep coming back to the same code and saying, &#8220;Yes, we said we would fix this,&#8221; now you need to properly schedule it as a piece of work, because you are always working around it. You are not working with your code, you are working around your code.</p><p>That would be something I would say. Again, that is not really so much a tooling thing as understanding what your test is telling you. When it comes to mocking, I do not have any strong opinions, except that most people mock too much, rather than understanding that excessive mocking is an indication that you have a problem that you should be solving. Do not lean into it by adding more mocking. Lean into it by asking a different question: &#8220;How do I mock less? 
What rearrangement of interfaces or class responsibilities would make this easier?&#8221;</p><p>I generally think that people use too much mocking anyway, even in quite good designs. There are simpler ways of looking at it, and they confuse themselves. So you end up with a lot of mocks and a lot of mock noise, which is not to say that mocking is not useful. It is just that most of the time, I think the guidance I gave to one team years ago still holds: if you are not mocking, you probably need to learn how to mock. Learn how to mock. But if you are already mocking, you probably need to learn to mock less.</p><p>I do not have specific feedback on mocking tools, except to say that sometimes I do not find them particularly necessary because of the language. I made the comment about Python earlier. In some languages the language itself is effectively the mocking framework. So for me the emphasis is less on specific tools and more on what your tests are telling you about your design, your responsibilities, and your coupling.</p><div><hr></div><p><em><strong>10: Does tooling make or break the TDD experience?</strong></em></p><p><strong>Kevlin Henney:</strong> If you are able to establish the workflow, and the code that you are working with has the right properties, or you are moving in the direction of it being loosely coupled and highly cohesive &#8212; you are using good, classic design practices to organize your code &#8212; then you are going to get most of the experience that you need, and that will not change too much between testing frameworks.</p><p>I used a technique years ago where I would get people initially testing with just plain asserts, just a straight assertion, whatever is available in the language or library, without using a testing framework. Then I would get them to refactor towards a framework. That actually turns out to be quite useful. One company did this for their C and C++ code and actually created a framework that they then controlled, which was very useful for their embedded environment. It is not something I ever really did with the Java folks, and I occasionally do it, sometimes as a bit of fun with Python. But I do not do that very much anymore because these are solved problems.</p><p>The point of that exercise was to show people that, first of all, the fundamental ideas in a testing framework are not too complex, but also that you would be surprised how little you need to get a testing workflow. But that said, I like to have a testing framework that supports a number of basic features. Obviously, when a test fails, I want to continue with the rest of the tests. I want to be able to have parameterized tests so that my tests can be data-driven.</p><p>Any testing framework that does not support that in 2025 is, in my view, an interesting beta, but it is not yet a proper testing framework. It is a 2000s testing framework. I like to have a testing framework that allows me a way of organizing and grouping tests easily. These features streamline the overall testing experience, but they also allow you to have more expressive tests.</p><p>Whether that makes or breaks TDD, I do not think it goes quite that far, although I can imagine being sufficiently frustrated in some cases that it would break. However, I think good tooling improves the experience dramatically, and if you get a better experience, you are going to do more of it. 
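Parameterized, data-driven tests are a good example of what I mean; in pytest, to pick one assumed framework (JUnit 5 and Catch have their own equivalents), a single proposition can be stated once and driven by a table of cases:</p><pre><code class="language-python">import pytest


def is_leap_year(year):
    # Divisible by 4, except centuries, unless the century is divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


@pytest.mark.parametrize(
    "year, expected",
    [
        (2024, True),    # ordinary leap year
        (2023, False),   # ordinary common year
        (1900, False),   # century not divisible by 400
        (2000, True),    # century divisible by 400
    ],
)
def test_leap_years_follow_the_gregorian_rules(year, expected):
    assert is_leap_year(year) == expected
</code></pre><p>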
There is something to be said for that: good tooling can encourage better habits and more frequent testing.</p><p>In terms of specific tools or frameworks that I personally like using for TDD, I have mentioned some of them already a couple of times &#8212; the ones that I think flow best for me. Obviously some of these are going to be a matter of personal experience. If I am using Java, then JUnit 5 is my choice, and that is actually a little bit different from JUnit 4. I found JUnit 4 got in my way a little bit, but JUnit 5 has just enough that allows me not to be working around the framework.</p><p>In the C++ space, I mentioned Catch as the framework of choice. I would also encourage the use of Catch for C. In other words, if you are in an environment where you are doing the production code in C, do your testing in C++, because the tools are generally more powerful. That is a common pattern anyway, but I would use Catch there. It allows you to be much more specification-oriented.</p><p>There is no surprise, I think, if I say that I am comfortable using Jest with TypeScript and JavaScript. With C#, I have already mentioned NUnit as being the one of choice. Occasionally I will do work in languages where I have less familiarity with frameworks. I did something for a client a couple of years back in Ruby and we used RSpec, and I was quite impressed with RSpec. It had been years since I had used it. I found it still a little limited in some senses, but I also found that I could create some really nice testing approaches with it.</p><p>My opinions on that and other frameworks are slightly less strong, but those are the ones that normally stand out. There is nothing unusual in that list. The key point is that the workflow and the design properties of your code matter most, and the tooling, when it supports that well, can significantly improve your TDD experience.</p><div><hr></div><p><em><strong>11: Putting TDD in the context of overall testing, how does TDD fit with other testing practices on a project? You have talked about it and hinted at this briefly throughout, but if you were to just focus on this aspect, how do you see it?</strong></em></p><p><strong>Kevlin Henney:</strong> Yes, I think normally when we talk about TDD, we tend to lean towards the unit testing side, because that gives us the fastest feedback cycle. There is no single standard definition of what a unit test is, but the one that is perhaps most widely used and accepted is very much about the isolation question: can I isolate a piece of code from external dependencies, external runtime dependencies?</p><p>If the answer is yes, then that is a unit. It is not about language constructs. It is not &#8220;is it a class, is it a module, is it a function,&#8221; whatever. It is about isolatability, and the idea that I am not going across a significant boundary of communication. I am not hitting the file system, I am not communicating with another service. The idea is that I am contained and therefore, as a nice consequence, it is going to run fast, but also I control everything about it. You do not control the file system, you do not control the network. Those are outside your control. They may be under your influence, but not your control. 
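A small Python sketch of that dividing line, with invented names: the parsing function below is a unit, because nothing outside the process can make its test fail, while the loader is not, because the file system can:</p><pre><code class="language-python"># A unit: pure logic, isolated from the file system and the network.
def parse_temperatures(lines):
    readings = []
    for line in lines:
        station, _, value = line.partition(";")
        if value.strip():
            readings.append((station.strip(), float(value)))
    return readings


# Not a unit: it touches the file system, which we influence but do not control.
def load_temperatures(path):
    with open(path, encoding="utf-8") as f:
        return parse_temperatures(f)


def test_parsing_skips_lines_without_a_reading():
    lines = ["OSLO;12.5", "BERGEN;", "TROMSO;-3.0"]
    assert parse_temperatures(lines) == [("OSLO", 12.5), ("TROMSO", -3.0)]
</code></pre><p>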
So if we define &#8220;unit&#8221; from that point of view, then we get a fast feedback cycle and it tells us something about the internal structure and looseness of coupling and the strength of cohesion of what we are building.</p><p>From that point of view, it feels like TDD is very much in the unit testing space. But that said, exactly the same workflow works for integration tests. There is nothing different there. You can use the same workflow. All of the same test recommendations pretty much apply, but I will probably be using other aspects of my unit testing framework. For example, if I can pull in data files, then it starts becoming a little more serious. It just means that when a test fails or when a test passes, I cannot guarantee that the reason it passes or fails is only to do with correctness of code. If your network is down and what you are testing involves wandering across the network, your test will fail and it will not be because your test is wrong or your code is wrong; it will be because the outside world is wrong. It is to do with the nature of feedback, and it will also be a bit slower. But everything else about it&#8212;the sensibility, the mindset, the structure, the naming, the partitioning&#8212;all of these other things are kind of the same.</p><p>Then we hit things like acceptance test driven development, ATDD, which I always find difficult to say. It is a lot easier to write it down. Acceptance test driven development is where we are actually looking just beneath the UI skin of an app, so potentially very much end-to-end without UI interaction, or at an integration level, but the idea is that it is still code on code and we are doing that. The idea with that is that clearly the small iterative steps will not be as small. They are probably very much more at the feature level. We are looking not at minutes to hours, but hours to days before a certain test may pass, and that is acceptable. The same sensibility applies. This is also where many people will associate behavior driven development, although behavior driven development is a philosophy that also applies to the unit tests. For many people, they think of BDD in this higher-level space. I want to be very careful to say it is not just in that space; it is just the way that many people approach it. But again, that can lead you to structuring your workflow in the same way, so you can see there is a sympathy between many of these kinds of testing.</p><p>We can also look at other forms of testing. Contract testing is where we would say we are testing external code. Historically I call that conformance testing. For me, contract testing is what you always do because you are testing the contract of the class; you are defining it. It is not about third-party code, but the term has come to mean code that is external to the code that you are writing and that you want to test conforms to your expectations. I think that is a really important point because it is complementary; it is not the same. It addresses an issue that sometimes people have when they are writing TDD. Let us say, for example, that I am using somebody&#8217;s cup framework that I have obtained from GitHub, and maybe I have had some bad experiences with that framework in the past and there were some bugs in it. 
That is annoying, because I am trying to focus on my code, but I am discovering bugs in somebody else&#8217;s third-party code.</p><p>The problem is that you end up putting an extra little test into your tests just to check that the bit that broke before does not break again, and so on. You end up with a lot of tests that are &#8220;drive-by&#8221; tests. In other words, you are testing your thing, but you are also testing this other thing. Do not do that. Your test has now got mixed responsibilities. You want to separate that. That is where contract or conformance tests fit in. The idea is that you want to say, &#8220;Everything we depend on that might cause a problem, we have tests that check that.&#8221; If those tests fail, we do not even bother running our tests, because there is no point. If the foundation of what we are building on does not work, then why are we even going to bother testing our code, because the foundation is already broken. So the idea is that is a clear separation that is written in a much more what we might call defect-driven style rather than test-driven style: &#8220;This is not working,&#8221; or &#8220;It was not working historically,&#8221; so I will write a test to make sure the latest version, or the versions we use in future, are always working.</p><p>We may also have other tests like performance tests. Performance tests are typically going to be something slightly different because they follow different experimental design. They have a different workflow. If I drink from a full cup and it does not empty it, then that is a bug straight out. But if I say that we have a particular availability or there is a particular performance limit, then statistically we may find that sometimes when we run the test we pass and sometimes when we run the test we do not, because of the way the operating system schedules and so on. We are not dealing with something that is simply about true and false. We are dealing with something that is better or worse, and there is a kind of grey area. We really want 90 percent of the time to be in this performance space and we will tolerate 10 percent outside it. That suggests that the nature of our test requires a different philosophy. We can pass and fail at certain limits, but not in the same way, and we do not just run it once and say, &#8220;That passed.&#8221; We need to draw from different samples, sometimes scaling-based samples. Those tests feel very different in that sense. Again, they are complementary, but not in a way that fits with our TDD; it is actually quite separate. They are testing behaviors that are outside the basic semantics. They are testing performance characteristics and so on, and that requires a different mindset, I feel.</p><p>So for me, TDD sits largely in the automated developer testing space, mostly centered on unit tests, but the same workflow and sensibility extend into integration tests and acceptance-level tests. Around that, we have complementary practices such as contract or conformance tests, characterization tests for third-party APIs, and performance tests with their own experimental mindset. All of that lives together in the broader testing picture on a project.</p><div><hr></div><p><em><strong>12: Maintaining TDD discipline under pressure: from a team leadership perspective, how can leads or senior developers encourage the team to stick to TDD when deadlines are tight or when people feel tempted to &#8220;just code it and test later&#8221;? 
Are there any habits or cultural practices that help sustain TDD in the real world of rapid timelines?</strong></em></p><p><strong>Kevlin Henney:</strong> Yes. I think the thing is, again, it comes back to whether or not it is your idea, whether or not you feel you own that idea. If TDD is something you only do when the team leader is in the room, and the minute they walk out of the room you stop doing it, then you do not have it. Your team is not doing TDD. It is a kind of performative TDD. You are doing it because you are supposed to, and that is understandable. But it means that the minute you feel any kind of pressure, you are going to throw that out of the window. We see this with a lot of different practices; it is not unique to TDD.</p><p>The point is that you need to get to the point where it is a habit and you embody it. You know, &#8220;This is what we do.&#8221; Also, if you have enough experience, you start realizing that the minute you start throwing out certain disciplines, you are going to pay for that later. This is where our managed technical debt comes from. It does not come from some kind of magic genie that pops into your code. Well, actually, maybe agentic AI can reduce the quality of your code while your back is turned, but the point is that technical debt does not magically appear in your code. You know it got there for a reason.</p><p>People often like to say that certain practices only work in certain cases. Honestly, there is a truth to that, but the chances are that whatever you are working on is not so special that practices you normally find useful suddenly stop applying. If you are already finding TDD useful, lean into that. Lean into it a bit more. If you are not, then that is a different discussion. But from the team lead perspective, the job of a team leader is not a controlling role. It is a leadership role. Leadership is not about managing; it is mostly about example, about enabling, about making people see opportunities and making it somehow easier for them to try the right thing than to try the wrong thing.</p><p>In some cases, TDD may be a good thing for them. That is great. How do we make that feel like it belongs to them and it is their practice, not the team lead&#8217;s practice? Not the organization&#8217;s practice, but my practice. How do I make it my practice so that when I start on a new team, that is how I work? When I go for an interview, that is how I describe what I do, because it is my practice, not the team&#8217;s practice or the organization&#8217;s practice or the team leader&#8217;s practice. This is not a practice that belongs to that person or that entity; it is my practice.</p><p>For me, that is the skill, which means that there is no easy answer. I am afraid if anybody is watching and hoping for an easy answer, there is not one. But that is the skill and also the subtlety in it: moving from performative compliance under pressure to a place where TDD is something the developers feel they own, something they practice because it helps them, so they are less likely to abandon it when deadlines are tight.</p><div><hr></div><p><em><strong>13: AI can draft tests fast, but quality is uneven. What acceptance gates&#8212;e.g., minimum mutation score (PIT), property-based invariants, automated test-smell checks, and explicit review rules&#8212;would you require so they increase fault detection? 
How would you enforce this in CI to scale safely?</strong></em></p><p><strong>Kevlin Henney:</strong> I would actually take a step back before talking about specific acceptance gates and ask why you are using AI in the first place. Why are you using AI to generate tests? What problem are you trying to solve by doing that? A lot of teams cannot answer that question. They say, &#8220;We are using AI because we were told to use AI,&#8221; and then we are right back at, &#8220;You were told to do stuff; this is not your practice, this is somebody else&#8217;s practice.&#8221; For many people the story is, &#8220;We do not have many tests,&#8221; so now they generate a lot of tests with AI, but they do not understand those tests. They do not know what is being tested, or whether the tests are correct.</p><p>Recently I wrote a blog post called &#8220;Think For Yourself,&#8221; where I gave people four things to consider whenever they want to integrate anything that is AI generated into their code base. The first question is: does it work? Do the tests pass in a way that gives you confidence that they are actually verifying the right behavior? The second question is: do you understand the generated code or tests? If you do not know that something works, or you do not understand it, step away. Do not pretend that you are being productive.</p><p>There is a common illusion here. Somebody will say, &#8220;AI has boosted my productivity,&#8221; and then you ask them how they know. How are they measuring that? The answer is often that they are not measuring it. They just have the feeling of speed because the AI produces a lot of stuff in ten minutes. Then they spend the rest of the week fixing it. That is not productivity; that is the creation of legacy. In my view, one working definition of legacy code is &#8220;code written by somebody else.&#8221; AI-generated code fits that definition perfectly. I am not saying, &#8220;Do not use AI.&#8221; I am saying that good use of AI requires more understanding, not less, and that requires tests and review.</p><p>The number of times I have been fooled or could have been fooled if I did not have tests is significant. That is why I write my own tests. I do not trust AI to do a better job than I can, because I still have to explain the behavior I want. In the time it takes me to explain that precisely enough to an AI, I could have written the tests myself, and I would know exactly why I wrote them and what design decisions they encode. AI might be useful for generating certain coverage-oriented tests in situations where coverage is very poor, but even there I have questions. If I am using AI to generate tests as well as code, I must spend most of my time reviewing, and reviewing is a skill that many people do not have.</p><p>I spent years learning how to review: fiction, non-fiction, technical material, books, articles, and code. Code review is not &#8220;I glanced at it and it looks good to me.&#8221; That is not review. If you do not understand the generated code and tests, you have a problem waiting to happen. The upside is that you can treat AI as a teacher as well as a generator. If you are using AI-generated tests, ask yourself: do you understand what your code is doing when viewed through those tests? What can you learn from them?</p><p>Then there is the question of taking control. What is the difference between the generated code or tests and what you would have written? If you were to write that test yourself, what would you have done? 
They might be similar, but they will often be different. Understanding that difference is an education. Sometimes I look at the generated result and think, &#8220;That is a really good way of doing it.&#8221; In other cases I look at it and think, &#8220;That is not a very good way of doing it at all.&#8221; Either way, I have learned something.</p><p>So my final recommendation in that space is to add one more gate: can you think of at least one way to improve what has been generated? Do not treat AI output as something that is simply &#8220;good enough to accept.&#8221; Treat it as a starting point. That gives you two big groups of questions. The first is: why are you using AI to generate your tests? Do you have a clear understanding of the benefit and how you will measure that benefit? If you do not, do not do it. Being busier is not the same as being productive.</p><p>The second group applies if you do decide to generate code or tests with AI. Use this list of four gates: does it work, do you understand what you have, what is the difference between what AI produced and what you would have done, and can you think of at least one way to improve it. If you habitually apply those checks, you will learn a lot. You will be using AI as a possibility generator, not as an autopilot. You will be interacting with it, passing judgment, using your design sense, and either accepting the result because you know why it is good, or changing it because you know what is wrong and how to fix it.</p><p>In other words, you turn AI into an assistant or a coach. The problem I see at the moment is that many people are backseat drivers with AI. They have no idea what is being generated on their behalf. They do not understand what is being tested. When they have to fix an issue or extend the code, they discover that they do not know enough, and it takes them longer. They are not using AI in the right way.</p><p>So my general advice is this: be crystal clear about why you are doing something, especially with tests. For me, the strength lies in you writing the tests, not in outsourcing them to AI. Tests are your executable specification and your feedback loop. If you hand that over to a tool without understanding or review, no mutation threshold in CI will save you. You may use metrics and gates in your pipeline, but the real acceptance gates are still the human ones: clarity of purpose, understanding, comparison with your own judgment, and deliberate improvement.</p><div><hr></div><p><em><strong>14: Finally, looking forward: What do you see as the future of TDD and automated testing practices? Are there emerging trends&#8212;perhaps in tooling (like property-based testing, AI-assisted test generation) or in process (like BDD, Continuous Deployment practices)&#8212;that you believe will shape how experienced developers approach TDD in the coming years?</strong></em></p><p><strong>Kevlin Henney:</strong> If you do not already have an automated testing habit, now is a very good time to start. With AI in the mix, you will find that generated code sometimes fails in ways that are quite intuitive, where you look at the mistake and think, &#8220;Yes, I can see how that happened given the training data.&#8221; In other cases, the mistakes are very odd: failures where you think, &#8220;Why would you ever do that?&#8221; You need to become better at testing to deal with both.</p><p>Interestingly, this is something I was saying years before large language models. 
Around 2016 or 2017, at a conference in Poland called MobiConf, somebody asked me about the future of AI. At that point everyone was guessing about where AI would go. My answer was that you need to get better at testing. That is still my answer. The more AI we add, the more testing skill we need. I do not want AI anywhere near company-critical code without tests and without proper reviewing skills.</p><p>So one part of the future is skills. You need to get good at testing, and you need to get really good at reviewing. Reviewing is not just a testing skill; it is a design skill. You cannot review code effectively unless you have deep design experience, which means you also need to learn to code. Coding remains relevant, because otherwise you do not know what you are looking at. This is not about language tricks. It is about familiarity with a kind of precision that you normally only see in areas like science and mathematics. Those are the skills you want to build, and they sit in the same space as the precision and specification thinking that testing requires.</p><p>In terms of the specific workflow of TDD, I find it hard to make strong predictions. My sense is that TDD adoption will always be relatively low compared to the overall population of developers. As things stand, many people still do not test at all. There is a significant and increasing proportion of developers who do test, and that has changed over the last couple of decades. The needle has moved. Within that group there is a smaller subset who will try TDD, or have TDD as part of their toolkit and can employ it when appropriate.</p><p>I would like that number to go up. Leaving AI out of it for a moment, I think TDD is a good practice. I have thought that for a long time. It is helpful because it encourages incremental thinking and clarity. Done with the right sensibility, it leaves behind something worth inheriting, rather than something people curse you for.</p><p>If you bring AI back into the picture, those same qualities remain valuable. Clear tests, incremental feedback, and a strong sense of specification help you reason about AI-generated code and about changes in general. I would like to think that AI might even increase the uptake of TDD, because it will force people to confront questions about correctness and understanding more directly. But whether that happens, and to what extent, is difficult to predict.</p><p>So my view of the future is less about a specific new tool or fashionable acronym and more about emphasis. We will see more AI and more automation, but the teams that thrive will be the ones that double down on testing skill, review skill, design sense, and the ability to work with precise specifications. 
TDD is one of the workflows that aligns naturally with that direction, and that makes it a practice that will continue to be relevant, even if it never becomes universal.</p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://deepengineering.substack.com/p/rethinking-test-driven-development?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://deepengineering.substack.com/p/rethinking-test-driven-development?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://deepengineering.substack.com/p/rethinking-test-driven-development/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://deepengineering.substack.com/p/rethinking-test-driven-development/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Architecting AI Software Systems for the Real World: A Conversation with Imran Ahmad]]></title><description><![CDATA[Designing scalable, sustainable AI&#8212;from lab prototypes to production systems]]></description><link>https://deepengineering.substack.com/p/architecting-ai-software-systems</link><guid isPermaLink="false">https://deepengineering.substack.com/p/architecting-ai-software-systems</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Wed, 03 Dec 2025 11:18:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/DVPZyn0WpPY" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI systems are now everywhere in software, but turning a promising model into a reliable, cost-effective, and sustainable product is still hard work. Teams are discovering that &#8220;just add a model&#8221; is not enough; you need end-to-end architecture that can take an idea from a lab-style proof of concept to a production system that meets real constraints around cost, latency, security, and operations. </p><p><strong>Imran Ahmad</strong> is a data scientist, educator, and author focused on algorithms, AI, and cloud computing. He leads machine learning projects for the Canadian government, teaches at Carleton University, and is an authorized instructor for AWS and Google Cloud. With Packt, he has authored <em>40 Algorithms Every Programmer Should Know</em> (2020) and <em>50 Algorithms Every Programmer Should Know</em> (2023), and <em><strong><a href="https://www.packtpub.com/en-us/product/architecting-ai-software-systems-9781804619469">Architecting AI Software Systems</a></strong></em> (2025, with co-author <strong>Richard D Avila</strong>) and the upcoming <em>30 Agents Every AI Engineer Should Know</em>. Outside of work, he enjoys photography, biking, and mentoring developers through his Discord community and workshops.</p><p>In this conversation, we dig into how Imran thinks about AI architecture in practice: from the fundamentals of good software architecture and elastic cloud patterns to the &#8220;five pillars&#8221; he uses to evaluate AI systems&#8212;security, reliability, performance, efficiency and cost optimization, sustainability, and operational excellence. 
We discuss separating data and compute for sustainability, designing differently for heavy training workloads versus real-time inference, and avoiding hard coupling to any single AI or cloud vendor. Imran also shares his perspective on agentic AI and agentic RAG, what changes as AI becomes a core concern for software architects, and why UX, cross-functional collaboration, and long-term operational thinking are now central to successful AI systems.</p><div id="youtube2-DVPZyn0WpPY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;DVPZyn0WpPY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/DVPZyn0WpPY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><h1>About the Book</h1><p><em><strong>1: What inspired you to write Architecting AI Software Systems now, and what gap in the industry or knowledge are you hoping to fill with this book?</strong></em></p><p><strong>Imran Ahmad:</strong> When you design AI systems, or when you have a Gen AI solution, you have to have an end-to-end solution, so you have to look at it in its totality. What happens is that whenever there is a new technology, whenever there is a new idea&#8212;a technical idea&#8212;we start by focusing on depth. We develop solutions, we experiment with them, we iterate through different versions of them until we are ready to use them for solving large-scale problems&#8212;so, you know, beyond toy problems like differentiating between pictures of cats and dogs.</p><p>Now AI has come a long way, and our development processes have matured as we have gone through them. Then we need to deploy these solutions. These AI solutions are only useful when they can solve a problem in production, and when you bring these ideas to production then, from start to end, they need to work properly, and around that you have to design architecture. I will talk about this as well&#8212;how you quantify a good AI architecture. There are five pillars, as we call them: security, reliability, performance efficiency and cost optimization, sustainability, and something that is perhaps the most important, operational excellence. I will talk about that later, but the need for this is that we are bringing these ideas to production. That can only be done if we are designing, if we are giving it proper thought. We are bringing all the best design patterns to the table, and this is something that motivated me to write this book.</p><p><em><strong>2: Your book takes a very practical, architecture-centric approach to AI, doesn&#8217;t it? It mentions a structured journey with real-world examples, hands-on exercises, and even a fictional AI system&#8217;s architecture as a learning tool. So can you give us a quick overview of the key themes or unique features of this book?</strong></em></p><p><strong>Imran Ahmad:</strong> So the way we have designed this book is that we have essentially divided it into two parts, and the first part is about the fundamentals of architecture.
If you look at the first part, it zooms out and talks in general terms, not specifically about AI, about what the principles of good architecture are.</p><p>Of course we are talking about AI, but we start with the fundamentals of AI systems, and this is where we define terms. We define microservice architecture, and we discuss a couple of actual use cases. Then we also define terminology like what a data lake is, what a data warehouse is, and why data is so important in the context of designing AI systems.</p><p>When we bring these systems to cloud computing, then you have elastic architectures, and as soon as you move to elastic architectures, your system can become cost-effective and performant at the same time. If you think about it, performant systems are usually not cost-effective. It is like buying a Ferrari: a Ferrari is performant&#8212;one of the fastest cars&#8212;but it is also expensive. So we are trying to buy a Ferrari at the cost of a Toyota Corolla. We want systems that are cost-effective and performant at the same time, and elastic architectures get you there, where your systems can expand and shrink based on the immediate needs. This is where we talk about why cloud computing is so important. This is chapter one.</p><p>Then we talk about the case for architecture. My co-author, Richard, is a great architect. This is where he brings his 30 years of experience, and he talks about the role of the architect, the vision, the style, and, most importantly, he talks about what the implications are if the architecture is not properly designed.</p><p>Then chapter 3 is about bringing software engineering into the picture, and we ask: what are the software-engineering-specific topics related to architecture? So that is part one. Part two is about AI systems. It is specific to AI: what are the architecture templates, and what are the architecture philosophies that are relevant to AI systems?</p><p>Here we talk about something that is called &#8220;concept of operations,&#8221; and we talk about how the concept of operations is relevant to AI systems. Then we talk about, as you mentioned, certain use cases&#8212;large-scale use cases. We go over a complete use case so that, if you want to design a RAG system, you know how you can use the ideas that we have developed in this book and apply them to an actual use case.</p><p><em><strong>3: What key skills can readers expect to gain from the case studies and exercises, and also perhaps changes in mindset?</strong></em></p><p><strong>Imran Ahmad:</strong> So the first skill is how to convert an idea into a design. Requirements capture the idea in a formal way, but requirements are usually written by a non-technical person. The first part of the solution is to convert those requirements into a technical design&#8212;that is the architecture. That is skill number one.</p><p>Skill number two is that there may be surprises. The design that you came up with may not be the perfect one. You have to iterate through that. You come up with something and it may or may not work. Now you need to find ways to differentiate between a good design and a bad design. Functionally, both will work; the functional requirements are met. It is the non-functional requirements that differentiate between a top-notch design and a design that is not that great.
This is about looking at how cost-effective your architecture is, how performant it is, how secure it is, how reliable it is, whether you are using the principles of sustainability, and whether you are bringing operational excellence.</p><p>Operational excellence is a concept that we have discussed in this book. Operational excellence is that, when you are designing these architectures, you are looking long term. You are using YAML files. You are using orchestrators. You are using JSON files. You are using parameterization whenever possible so that you can reuse them and you can maintain them. And you are again looking at the long term.</p><p>So this is where the second skill they will learn is that your first attempt at the design may not be the best one. You have to rethink; you have to evolve. You have to quantify how good your design is, and then, before you actually start implementing that, it is a good idea that you start with a pilot project. A pilot project is something that is usually one-tenth of the scale of the original project, but it actually covers all the critical points, all the critical design parts&#8212;they are there. It validates that, and then you basically go towards the full-blown solution.</p><h1>Key Challenges in AI Architecture</h1><p><em><strong>4: In the current context, many teams struggle to turn AI prototypes into reliable products. From your experience, what are the main challenges in bridging the gap between a promising AI demo and a production-ready system?</strong></em></p><p><strong>Imran Ahmad:</strong> Yeah. So you can look at it from three aspects. Let&#8217;s look into that. It is a very important topic as well when you move towards an AI solution. Initially, you feel that it is a silver bullet; it can solve any problem, so that is when you are experimenting with it. But the success of an AI project depends on three factors. The first is cost, the second is performance, and the third one is accuracy.</p><p>Let me explain that. Usually, you are applying AI to an existing problem. You already have an alternative solution; you know that things are already working. Now you are upgrading that and bringing AI into the picture. You want to do things in a new way. So, while you are moving to this new world, if I can call it that, first you need to quantify what the effect on the cost is&#8212;whether the investment in the AI, the initial investment and then the running expenditures, can be justified by the aggregated cost saving that you expect to get. This is point number one.</p><p>The second one is&#8212;I said performance, but it is actually time. Your current processes&#8212;whether they will be optimized in certain ways, whether the time to get things done, to meet some of the requirements&#8212;will they be done in a more timely way? Let me give you an example here. If it is a bank manager looking at people applying for a mortgage and that bank now wants to use AI, then perhaps instead of taking that bank manager four hours, it will take four seconds. Now this is the time that has been saved. So this is the second dimension.</p><p>And the third one is accuracy. Whatever your current systems and current processes are, now this new world, the new ideas that you are introducing through AI&#8212;whether that will be more accurate or not&#8212;you have to basically look at that. In these three dimensions you have to make some progress. 
Maybe it will be more costly, but if you can justify it in terms of time and accuracy, then you may be able to sell the idea to the senior management.</p><p>I have encouraged that this is something that we should do right at the beginning. It should not be an afterthought. I have seen that people come up with these AI architectures, the architecture is implemented, and then they do some sort of time and motion and try to see whether they justify it or not. By that time, you have already implemented; you have already invested; you are already using that system. So it is more like you buy a car. Let us say that you buy a plug-in hybrid, you have already paid the money, and then you go to check and see whether it made sense or not. So you need to basically check that before buying a car.</p><p><em><strong>5: How can software architects specifically help ensure an AI proof of concept scales into a sustainable real-world solution?</strong></em></p><p><strong>Imran Ahmad:</strong> OK, now, sustainability is handled at two levels. If you are using cloud computing, then maybe it is not your job &#8212; this is the job of the vendor at AWS or Azure or Google Cloud. But still you can do a lot. For example, if you are using virtual machines, even if you have subscribed to virtual machines, in all of these cloud offerings, if you pay a fee, you can keep running these virtual machines 24/7, and you pay a flat fee. The same goes for the servers that you are running in-house. Now what you need to think about is that these are performance-hungry, power-hungry machines.</p><p>The second thing is that these days a good design is where we design the compute dimension in an ephemeral way and the data dimension separately. So there are two dimensions: when we talk about the architecture of these systems, there is a data dimension and there is a compute dimension. The design pattern that we suggest is that, first of all, there should be clear bifurcation, so the data and compute dimensions should be separate. The data dimension should be long term &#8212; it means your data is stored there for two or three years. The compute dimension should be ephemeral; it should be temporary. When there is a need to process the data, you provision the compute dimension, you process the data, and then you suspend it or just remove it.</p><p>Let me give you a simple analogy. All of us work in our favorite word processor like Microsoft Word. When you open Microsoft Word, the file is usually already there. Let us say that you are working on a research paper: the file is there, and whenever you find time, you open Microsoft Word, you work on the file, you store it, and then you go back to your work. So in the data dimension, your paper is stored for those four or five years, perhaps on your hard disk. But the compute dimension is your word processor. Whenever you want to change the paper, you open the word processor &#8212; for example, Microsoft Word &#8212; you change it, and once you are done, you close Microsoft Word. Now we can have the same design pattern for AI systems.</p><p>So it means that whenever there is a need to change something, whenever there is a need to train a model, whenever you need to change the processing pipeline or you want to process the data, you provision your compute dimension. The compute dimension should be need-based and the data dimension should be long term. 
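</p><p><em>As a rough sketch of that bifurcation, the data dimension below stays put while the compute dimension exists only for the duration of a job. The helper functions and the bucket path are hypothetical placeholders rather than calls from any particular cloud SDK, and passing them in as parameters also keeps the sketch vendor-agnostic.</em></p><pre><code>DATA_LOCATION = "s3://example-bucket/training-data/"   # long-lived data dimension

def train_model_ephemerally(provision_cluster, run_training_job, tear_down):
    # Compute dimension: provisioned on demand and released as soon as the job is done.
    cluster = provision_cluster(nodes=4, gpu=True)
    try:
        model_uri = run_training_job(cluster, data=DATA_LOCATION)
    finally:
        tear_down(cluster)   # nothing keeps running, or drawing power, around the clock
    return model_uri
</code></pre><p>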
If you follow this clear bifurcation between data and compute dimensions, our AI system will be cost effective, it will be performant, and it will also meet the needs of sustainability.</p><p><em><strong>6: There&#8217;s a concern about the sustainability and vendor lock-in of today&#8217;s AI platforms. For example, Open AI reportedly reached 10 billion in revenue but is losing around 5 billion a year, a situation some have dubbed a &#8220;subprime AI&#8221; crisis. Now, if enterprise architects build around such providers, they face continuity and lock-in risks. According to you, how should architects mitigate these risks?</strong></em></p><p><strong>Imran Ahmad:</strong> OK, so wherever there is a choice, do not use the proprietary vendor-specific APIs. Almost always we have two choices. You can use the vendor-specific APIs, or you can use a higher-level generic API. I will give you a specific example. When you are working with large language models, you can use the APIs that are provided by OpenAI &#8212; this is choice one, and because you were talking about OpenAI, let us go with that example. The second choice is that you can use LangChain. LangChain is an orchestrator. If you use the LangChain API, then what will happen is that your code will be talking to LangChain, and in LangChain it will be talking to the OpenAI-specific API.</p><p>Now let us look into a scenario that is unlikely, but can happen: let us say that OpenAI goes bankrupt. If the code is not directly talking to OpenAI, the connection between LangChain and OpenAI will change to perhaps LangChain to Gemini or LangChain to Claude. That is all that is needed. Your code is not dependent on OpenAI. Now, you can repeat this design pattern for the clouds as well. For example, if you are using cloud and you use open source APIs, then it means that, let us say you are using Docker containers. If you are using Docker containers, then your cloud computing is just a living space for your Docker containers. If you are using Kubernetes as an orchestrator for Docker containers, that is even better.</p><p>So it means that all you need is that your cloud computing platform becomes a Kubernetes enabler. Now, if one of those enablers goes bankrupt, all you need to do is move to a different one and you do not have to change even a single line of code. But in this approach we have issues as well. For example, all these vendors sometimes provide the best tools in their vendor-specific APIs. I will give you an example here. If you use Google Cloud, one of their most polished tools is called BigQuery. BigQuery is vendor-specific, and for AWS that is Redshift. Redshift is vendor-specific as well.</p><p>My recommendation is that still the risk of being hard coded to a vendor-specific tool is higher, especially at this point when things are changing so fast. We should be cautious, and we should be putting effort into being as vendor-agnostic as possible.</p><p><em><strong>7: Ensuring quality and maintainability in AI software is an emerging concern. Studies find many AI/ML codebases have minimal testing and documentation, often due to &#8220;lab-style&#8221; development by data scientists. And according to the State of Software 2025 report, only about 1.5% of the code in AI/big-data systems is test code (versus around 43% in traditional systems). Why do you think AI projects often end up with weaker software engineering practices? 
And what can architects do to instill better rigor&#8212;for example, would organizing cross-functional teams of data scientists and software engineers help bridge the gap and improve things like testing, security, and code quality?</strong></em></p><p><strong>Imran Ahmad:</strong> It is a new technology. In many cases, people are learning as they are implementing, and this is a byproduct of that. With mature technologies, the test cases have already been established, so we know from other similar projects what the criteria of success are, both in a functional and a non-functional way.</p><p>With something new, it is just working, and then we need to ask ourselves: what is the best way to test its functionality? For example, if hallucination is a concern, how do we test whether our solution is hallucinating or not? If accuracy is a concern, how do we test that? In a Gen AI solution, the metrics themselves are still evolving, and testing is all about quantifying whether our solution is meeting the agreed-upon goals or not. Those agreed-upon goals are still evolving, and that is one of the reasons, as you said, that these projects are not well tested. Yes, that is a concern, but things will improve.</p><p>AI is quite subjective in different ways, so a project may be successful for me, but for you it may be a failure. There is some subjectivity there, and as you brought up, one way of mitigating that is to come up with a consensus among people with different roles and different skills&#8212;a data scientist, a data engineer, a project manager, and perhaps a business analyst or a person who is in charge of production. They will have different views of the success of a project.</p><p>For a data scientist or an AI engineer, success is mostly about meeting the functional requirements. For a person who is in production, they may have no idea about algorithms, ROC curves, AUC, recall, or precision. For them, success is putting the Dockerized solution on a server and making sure that the non-functional requirements of reliability, security, performance, and availability are met. If it is an application for approval or refusal of a mortgage application, for the person who is in production it is all about whether the service is available or not, whereas for the data scientist it is all about the metrics related to data science.</p><p>So we have to bring them all to the table and come up with a consensus: what does end-to-end success for this project look like, both in development and in production? Whenever there is a problem, people do not have to agree on everything, but they have to speak out about what they think of the solution. Then they discuss, they understand each other&#8217;s world, and they come up with a consensus&#8212;a compromise. Once that is made, you follow that as the criteria of success. This is something that needs to happen, especially for large-scale projects.</p><h1>Designing Scalable and Robust AI Systems</h1><p><em><strong>8: AI systems must be built with scale in mind from the start. On the training side, deep learning models demand substantial compute (GPUs/TPUs) and efficient distribution of tasks. On the inference side, serving many users requires horizontal scaling, containerization, and load balancing to keep latency low. What are some architectural strategies you recommend to handle scalability for AI? 
How do you approach designing for heavy training workloads versus high-volume real-time inference in a production system?</strong></em></p><p><strong>Imran Ahmad:</strong> First of all, it depends on the problem you are trying to solve. The scalability requirements are different for different problems. Let me give you an idea. When you are training a model, this is where most of the costs are incurred. You need GPUs&#8212;GPUs are expensive&#8212;you need CPUs as well, and you need to experiment and train over and over again.</p><p>However, scalability requirements in development have two characteristics. Number one is that once the training is done, you do not need those resources anymore. You still need to retrain the model, of course, but if you are developing the solution for two, three, or four months and then your model is trained and in production, all those 20 machines that you brought in will sit there doing nothing. That is why cloud computing is really good there: you can provision resources, and once you are done, elasticity becomes important.</p><p>Point number two is that there is no hard deadline associated with the training process. At inference, there is a deadline. If you swipe your credit card, the fraud detection result needs to come back within a few seconds. If someone is paying at a restaurant, that person cannot wait for 40 seconds. So at inference you have those deadlines.</p><p>When you are training the model in development, there is no such hard deadline; it is more about your comfort factor. If you can live with evolving the solution on a scaled-down system during the daytime, then at night you can submit the full-scale training job before you leave for home. It runs overnight, and you come back in the morning and the solution is there. During the daytime again, you work on a scaled-down system&#8212;one-tenth of the size&#8212;and you evolve it. If you follow that pattern, you can save a lot of cost. You use the off-hours for training, and in that case you can use a much smaller number of resources. You need to be innovative and creative there.</p><p>The second part of the equation is scalability for inference. Now let us say the model is trained and put into production. We need to carefully analyze the scalability requirements there. We should not over-design; we should not under-design. Let me give you a couple of scenarios.</p><p>Again, take the example of a credit card. Each time you swipe the card, the result&#8212;whether it is a fraudulent transaction or a regular one&#8212;needs to come back in about two seconds. That is a hard deadline, so you have to make your servers performant enough to meet that deadline. On the other hand, imagine a bank manager who, at the end of the day, just needs to look at a spreadsheet of the transactions that went through and see how many were likely to be fraudulent so they can be reviewed. In that case, the requirement is &#8220;end of day.&#8221;</p><p>There, we do not need real-time endpoints. We can live with batch-mode inference and save a lot of cost. You do not need to provision real-time HTTP endpoints. All you need is to gather your unlabeled data and create a batch&#8212;at the end of the day, the top of the hour, the end of the week, whatever granularity works&#8212;submit it to the server, and it produces the labels: how many are likely to be fraudulent and how many are not.</p><p>So real-time inference is not always needed; if you use it everywhere, it is expensive and you may be over-designing the system. 
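</p><p><em>A minimal illustration of those two deployment shapes for the same trained model; load_model, predict, and the request fields are hypothetical stand-ins for whatever your serving code actually uses.</em></p><pre><code>def score_batch(load_model, transactions):
    # Batch mode: run once at the end of the day; no always-on endpoint to pay for.
    model = load_model()
    return [(tx["id"], model.predict(tx)) for tx in transactions]

def make_realtime_handler(load_model):
    # Real-time mode: the model stays loaded behind an endpoint so that each
    # request, such as a card swipe, can meet a hard latency deadline.
    model = load_model()
    def handle(request):
        return {"transaction_id": request["id"], "fraud_score": model.predict(request)}
    return handle
</code></pre><p>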
To get scalability right, you have to carefully analyze the requirements first and, based on that, design and architect the system.</p><p><em><strong>9: Integrating AI into existing enterprise environments can be complex. Teams often need to balance cloud-native AI services with capabilities within the customer&#8217;s current on-premises infrastructure so they can leverage existing investments and avoid disruption. How do you evaluate which deployment strategy is appropriate for a given project? What factors&#8212;for example, data sensitivity, legacy system constraints, regulatory requirements, or team skills&#8212;should influence whether AI systems run on-premises versus in the cloud or in a completely new environment?</strong></em></p><p><strong>Imran Ahmad:</strong> OK, so there are two things here. First of all, I suggest that we carefully determine the maturity level. There are four maturity levels, and those maturity levels cover both the technical infrastructure maturity and the skill maturity level.</p><p>Let us imagine a company. There are 30 people working in that company, and they are working on developing a product that deals with recommendations. It is a recommendation engine that recommends products to their existing customers, and they are using some algorithms, but now they want to modernize that. They want to use deep learning, they want to use Gen AI, and they want to use cloud computing.</p><p>The first requirement is that they cannot afford any disruption. So, first you need to look at what maturity level you are going for, but you also have the hard requirements that you have to use the existing infrastructure and you have to use the existing people. Then we have to develop a phased approach. Usually there are four phases.</p><p>Phase one is where we come up with the plan, looking at the current situation and deciding what the path forward is. This is where we start. In that phased approach, depending on the maturity level, we may say that in phase one perhaps we can move this part of the system and keep the other part on-premises. Using the example of that company, perhaps accounting can stay on-premises, but the algorithms can move to the cloud. That is one thing we can do.</p><p>Then we have to figure out how we are going to create a pipeline that can link the on-premises environment with the cloud. Usually what we do is keep redundant systems both on the cloud and on-premises, and slowly we test that and then we remove the part that is no longer needed on-premises. So this phased approach will be vertical-specific and company-dependent, and it will reduce the risk; that usually works.</p><p>In some cases we do not have a choice. If you are working for a government organization or at a financial company, then sometimes there are regulatory requirements that your data cannot be on the cloud. There are three sectors&#8212;usually government, healthcare, and the financial industry&#8212;and in these three, some of their data needs to be compliant with existing regulations. It is not impossible, but it is more difficult for them to bring the data to the cloud. For government, sometimes it is not even possible to bring the data to the cloud.</p><p>Let me give you an example. There is a tool from IBM that is called IBM SPSS Modeler. Banks and other companies in the financial industry are still using it.
If your processes are dependent on that and it is working fine, you will not get the same level of comfort if you move to the cloud, because you are using a legacy system with a lot of embedded knowledge. All of that embedded knowledge will not be available, so now you are tied to your legacy system unless and until you are ready to retire your legacy software. There is no way you can move to the cloud.</p><p>Then sometimes what happens is that companies, when they say that they will move to the cloud, think mainly in terms of cost savings. I will give you an example here. The Canadian federal government, about four or five years ago, thought that they would move to the cloud, and they started that journey. The infrastructure to support the Canadian federal government is worth billions of dollars. They thought that they would save money, and the study was that it would save about 20% of the cost. That was the initial study.</p><p>Now, five years down the road, that did not happen. They moved to the cloud and now they have spent more money. Cost has increased by about 12%. That is the number. And there is a reason for it. The reason is that if you do not make a conscious effort, the simplest architectures in the cloud are not cost-effective. If you run a virtual machine 24/7, it will meet the functional requirements, but it is not elastic&#8212;yet that is the easiest solution.</p><p>That is why, throughout this talk, we have been talking about the case for architecture: taking a step back and spending some time there, because in the long run, in that example, if you calculate cost, initially they thought that the cost would be 20% less; it is 12% more. What that tells us is that we should not rush into the cloud. We should first understand what architecture we need, and once you have that clear architectural vision, then you implement that so that in the long run you are going to be saving the money.</p><p>If, in a hurry, you have already started with something like the easiest possible solution, it will be very difficult to change it down the road when you have already started your computing resources, you already have your compute dimension, data dimension, and functional dimension running. If you want to change it, it will be very difficult and risky. You are doing something that you should have done a couple of years ago. That is why there is a whole section in our book that talks about the case for architecture&#8212;why we should, and what is the need for system architecture for AI systems.</p><p><em><strong>10: User experience is a critical yet often overlooked aspect of AI systems. Even if the model is accurate, poor UX can block adoption. What can architects and designers do to ensure an AI system delivers a good UX and drives user adoption? For example, what is your view on using user-centered design practices or designing for diverse user needs such as voice UIs and accessibility features? Do you have any best practices for aligning AI architecture with great UX design?</strong></em></p><p><strong>Imran Ahmad:</strong> Yes. So for UX we should always be designing the system, we should always be thinking of it as a service. If you are a technical person and you have a spouse who is non-technical&#8212;or if you have a brother or sister who is non-technical&#8212;think about that person and whether that person can use this service or not.</p><p>My brother is a medical doctor, so I always think: OK, the eventual service that I will provide, can he use it or not? 
Sometimes what happens is that we bring too much technicality to the front. We are very impressed with our own algorithms, our own models, and our own infrastructure, but the end user is a non-technical person. They should not even need to know the details in the data dimension or the compute dimension or which models we are using. That all should be a black box once things are done.</p><p>It is a good idea to always try to see, through the eyes of a non-technical person, how easy it is to use it as a service. So think about it as a service. Your solution should be a service to the end user. There are different zoom levels. You can think of your solution as a microservice architecture. Now, microservice architecture is quite technical; it is great for providing abstraction to a data engineer, but not to the end user. We need to zoom out more.</p><p>I am into photography, so I think in terms of zoom levels. Zoom out more and think of it as a service. At the highest zoom level the user just sees, &#8220;This is a service that helps me do X,&#8221; and everything else is hidden.</p><p>A good illustration is that sometimes we are using AI without even noticing it. The greatest example is Google Maps. When you use Google Maps, it uses an optimization algorithm to get you from point A to point B. If you look under the hood&#8212;because my PhD was in algorithms&#8212;optimization algorithms are one of the hard areas. There is the famous example of the travelling salesman problem: you have a list of cities&#8212;city one, city two, city three, city four&#8212;and you try to find the optimal route. This is an NP-hard problem.</p><p>So it means that whenever you say, &#8220;OK, I want to go from point A to point B,&#8221; you do not realize that under the hood there is a lot going on. First of all, your GPS location needs to be tracked. Then the destination needs to be there, and the traffic situation needs to be there&#8212;what are the real-time traffic conditions on each of the possible routes&#8212;so it is dynamic in nature as well. And then you reach your destination, and it asks you for feedback, and we do not even realize that for this simple use case there is so much power being used.</p><p>This is the best example. People use it. My daughter can use it; she uses it to go to her school. People will use it if they find the service easy to use, and we do not need to know what is under the hood. That is the UX.</p><p>And I will tell you about the gap there as well. In real time it needs to know that we are travelling on those routes, and the way it collects that information is that it assumes people are carrying those devices in their car, and if those cars slow down, it means that there is traffic congestion. It works most of the time. But where I live in the north, there is a place called Gatineau Park. It is about 80 kilometers long. People are biking there, their GPS devices are running Google Maps, and Google Maps always thinks there is traffic congestion. It is always red. But if you go there, there is no one there. So there will be failures. It is not that algorithms always work.</p><p>Still, as a user, you trust it because of the overall experience: it is easy to use, it hides the complexity, and most of the time it works.
That is what we should be aiming for when we align AI architecture with great UX design.</p><h1>Emerging Trends and Future Outlook</h1><p><em><strong>11: The rise of &#8220;agentic AI&#8221; is a hot topic in 2025. We touched on it in the last conversation we had. Major platforms are jumping in&#8212;for instance, Microsoft&#8217;s new Azure AI Agent service helps orchestrate multiple specialized agents and tools. What might this shift from single AI applications to multi-agent systems mean for software architects? How might architectures evolve to accommodate networks of AI agents that can plan, collaborate, and act autonomously? What challenges should we be prepared for in areas like agent coordination, security, or reliability?</strong></em></p><p><strong>Imran Ahmad:</strong> OK, first, let us think about this. Right now, when we design an AI system, the goal is to mimic human wisdom. That is what artificial intelligence is: mimicking human wisdom.</p><p>Imagine a person who wants to develop a fraud detection system and wants to get it done by the end of Monday. The first step in the human mind is discovery: OK, what are the requirements, and what are the tools that are available? Maybe there are existing tools, maybe there are friends to ask about which tools exist. In my mind, I will orchestrate. I will use those tools in different ways, I will come up with a plan, and I will start using those tools. Some of those tools will work, some will not, and the solution that I deliver will be the result of using existing tools, being aware of the tools that are available to me, and combining them in a meaningful way.</p><p>An AI agent is mimicking exactly this human behavior. An ideal agent should be aware of the tools that are available. Second, it should be able to orchestrate those tools in a way that leads to a meaningful solution. Third, it should be ready for surprises. Just like I can change my plan when something unexpected happens, the agent should be dynamic enough that it can change and re-plan as it goes. These are the three attributes of an agent.</p><p>In an agentic system, a large language model is just one of the tools. It is one of the important tools, but right now the large language model sometimes becomes the &#8220;king&#8221; and everything else is forgotten. What Azure has provided, and what Google has also provided with their own agent solutions&#8212;for example, agent spaces, agent design tools&#8212;is a way to step back and see these as orchestration platforms. We can zoom out and look at them in a vendor-agnostic manner; essentially, they are all doing almost the same thing.</p><p>Now, for architects, the first thing is that they should be aware of these new developments. That is why this book is about the architecture of AI systems. We are entering a time&#8212;2025 and 2026&#8212;where AI architecture itself is becoming a specialty. You need to be aware of these developments and track them on a regular basis. One way I keep up is by subscribing to good YouTube channels and other high-quality sources. There is a lot of content out there where people give talks but do not really know what they are talking about, so you have to be selective. And you have to recognize that what is relevant today may not be relevant at the end of 2025.</p><p>At the same time, some fundamentals do not change: the need for good architectures, the need for performant architectures, the need to create operational excellence, and the need to have data that is reliable. 
If agentic systems are one way of doing things, they are not the only way. There will always be new ways coming. You should keep an eye on them and keep incorporating new ideas as they come along.</p><p>The challenges are very similar to what we saw with Kubernetes. When Kubernetes was introduced, there was so much excitement. I used to teach courses on Kubernetes, and people mainly wanted to learn how to design and manage applications on it; they were less interested in the internals. Now, if you use a managed service like Vertex AI, under the hood it provisions a Kubernetes system for you and you do not need to think about those details; you just use it.</p><p>Right now, these agentic systems are like Kubernetes in its early days. They are still being developed, so sometimes they will work, sometimes they will not. But you will see that in less than a year these systems will become mature. As an architect, you should expect that maturity. Things like agents talking to each other should come out of the box; multi-agent systems, where each agent is a specialist with its own piece of wisdom for a particular vertical, will become the norm.</p><p>Our responsibility as architects is to start bringing these entities into our architecture and then let the system evolve and mature in the coming months. Some glitches will be there, but over time those glitches will be resolved.</p><p><em><strong>12: Enterprises also grapple with how to integrate their data with AI models effectively. One common pattern is bringing domain knowledge into AI workflows so that models can reason over real enterprise context. What is the right approach for infusing domain knowledge into AI systems? Do you think Retrieval-Augmented Generation (RAG) will remain the dominant architecture for bringing enterprise data into AI workflows, or will other patterns become more prominent as AI capabilities evolve?</strong></em></p><p><strong>Imran Ahmad:</strong> Yes. RAG is becoming obsolete in some ways&#8212;you are right about that&#8212;because context windows are becoming larger and larger, and that can remove the need for RAG. But it also means that with each request you may have to send a lot of information, and that may not be an efficient use of the model.</p><p>The advantage of RAG is that it is more efficient. Instead of sending everything, you only attach the right vectors or the right text. So our requests become more focused, and we are not wasting capacity on irrelevant context.</p><p>Agentic RAG is a step ahead. This is something that is still being developed, and classical RAG may become obsolete eventually. That is why I was saying earlier that these systems are expected to evolve. But RAG is still important, because you need to understand RAG in order to get to agentic RAG. In the book we have talked about RAG, and I feel that this is the right learning path: learn the simple use case before moving to the more complex one.</p><p>Coming back to your question, there are always multiple ways of doing things. You can have agentic RAG, you can have a large context window, you can have what I would call &#8220;classical&#8221; RAG. There will be an overlap in functionality between these approaches. In that case, it becomes subjective. 
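</p><p><em>A minimal, self-contained sketch of that &#8220;attach only the right text&#8221; idea behind classical RAG: crude word overlap stands in for real vector embeddings, and the documents and prompt format are our own illustrative assumptions.</em></p><pre><code>DOCS = [
    "Refunds for card payments are processed within 5 business days.",
    "Wallet top-ups above 10,000 EUR require an extra verification step.",
    "Our office cafeteria menu changes every Monday.",
]

def relevance(query, doc):
    # Stand-in for embedding similarity: fraction of query words found in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q.intersection(d)) / len(q)

def retrieve(query, docs, k=2):
    ranked = sorted(docs, key=lambda d: relevance(query, d), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    # Only the top-k snippets are attached, keeping the request focused.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do card refunds take?", DOCS))
</code></pre><p>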
You have to carefully see what the advantages and disadvantages are for each option, and then choose the approach that gives you the best solution that is available currently.</p><p><em><strong>13: Some say the &#8220;AI architect&#8221; is no longer just a technologist, but a strategic leader at the intersection of data, infrastructure, and product. How do you see the role of architects changing as AI becomes a core part of software systems?</strong></em></p><p><strong>Imran Ahmad:</strong> Yes. So the traditional architect was basically operating in the days of the waterfall methodology, where you had clearly defined phases: your project gets approved, it gets funding, then someone writes the business requirements for you. Then there is a layer of red tape. After that comes the architect, who designs the system&#8212;and whatever that person designs is written in stone. Then the technical team needs to implement it, and the criteria of success is meeting that design in the most precise way. Gone are those days.</p><p>The reason is that now the architect needs to be involved in the iterative process. When you are doing AI, you are trying new things, you are experimenting, and sometimes ideas will not work. So it means that the role of the architect is more dynamic in nature. As you move towards AI systems, the architect has to be involved in the pilot project; the architect may need to refactor, may need to redesign the data dimension or the compute dimension if they see performance bottlenecks. So the role has become more agile, but the need for the software architect is still there. It is very important&#8212;it has become more important than ever.</p><p>Let me give you a reason. A large-scale project is like building a home. In some villages, people still build houses without an architect. They have bricks, they have an idea&#8212;&#8220;let us build a room here, let us put a kitchen there&#8221;&#8212;and they just start. But in an organized way, an architect first plans: &#8220;OK, this is the room, this is the hall, this is the kitchen,&#8221; makes a blueprint, gets it approved, and then we start building the home.</p><p>Now think about this: if the architecture is wrong&#8212;let us say the bedroom was supposed to be on the ground floor because the owner has a knee problem and cannot climb the stairs&#8212;but that decision was not captured, and the bedroom ends up on the first floor, then you have a serious problem. You can imagine how expensive and disruptive it is to change the structure after everything has been built. The same goes for large-scale software architecture. The basic templates need to be decided before you build the system.</p><p>That is why there needs to be an architect who designs the large-scale components, and then someone starts filling in the details. Otherwise, you end up with very costly mistakes. If you look at some real-world stories&#8212;for example, JP Morgan&#8212;you will find cases where they designed their system and spent minimal time on architecture. They picked, for example, MongoDB, went ahead with their design, and eight months down the road they realized that this was the wrong choice. There was a loss of revenue, a loss of time, and this is something we want to avoid at all costs.</p><p>So the role of the architect in the age of AI is not going away. 
It is becoming more central: more dynamic, more involved throughout the lifecycle, and more responsible for making sure we do not build the &#8220;bedroom upstairs&#8221; when the user cannot climb the stairs.</p><p><em><strong>14: What new responsibilities or skills&#8212;for example, understanding model behavior, data governance, or AI ethics&#8212;should architects cultivate now to successfully design and oversee AI-enabled software in the coming years?</strong></em></p><p><strong>Imran Ahmad:</strong> This is essentially about making yourself aware of what technologies are available and what is happening in AI. The architect should not treat AI as a black box or something that is &#8220;someone else&#8217;s job.&#8221; You should be able to understand, at least at a high level, what these AI components do and how they behave.</p><p>A key skill is the ability to choose the right AI components under given requirements: which model to use, what kind of data pipeline is needed, what kind of storage is appropriate, and how the compute dimension should be designed. You should be able to look at the requirements and say, &#8220;Under these constraints, this combination of components will work best.&#8221; That selection ability is very important.</p><p>Another responsibility is to understand the implications of AI decisions on things like data governance, security, and compliance. When you bring AI into the system, you are also bringing in new questions: how the data is collected, how it is stored, how it is used for training, how it is monitored in production, and how you make sure that you are meeting ethical and regulatory expectations.</p><p>So for many architects, this means retraining themselves in AI. For some, AI is a blind spot at the moment. Closing that blind spot is crucial: keep learning about AI concepts, stay current with the tools and patterns, and build enough understanding that you can make informed architectural decisions. You do not have to be the person implementing every model, but you should be comfortable enough with AI that you can confidently design, review, and oversee AI-enabled systems end to end.</p><div><hr></div><p>To go deeper on designing robust, scalable AI-enabled systems&#8212;from integrating machine learning into existing architectures to managing risks like underperformance, cost overruns, and operational complexity&#8212;check out <em><strong><a href="https://www.packtpub.com/en-us/product/architecting-ai-software-systems-9781804619469">Architecting AI Software Systems</a></strong></em> by <strong>Richard D Avila</strong> and <strong>Imran Ahmad</strong> (Packt, 2025). 
Through a structured progression of architectural concepts, real-world case studies, and hands-on exercises (including a fictional AI-enabled system you can dissect end to end), it shows software and systems architects, CTOs, VPs of Engineering, AI/ML engineers, and developers how to select the right models and data pipelines, use architectural models to ensure cohesion, simulate and optimize AI performance through iteration, and apply patterns and heuristics to integrate AI into large-scale systems with strong user experience and performance&#8212;so you can confidently architect AI-driven products across a range of domains.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.packtpub.com/en-us/product/architecting-ai-software-systems-9781804619469" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3POB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2305437-423b-4c1c-a6f4-e3dab69fc532_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!3POB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2305437-423b-4c1c-a6f4-e3dab69fc532_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!3POB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2305437-423b-4c1c-a6f4-e3dab69fc532_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!3POB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2305437-423b-4c1c-a6f4-e3dab69fc532_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3POB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2305437-423b-4c1c-a6f4-e3dab69fc532_2250x2775" width="312" height="384.85714285714283" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2305437-423b-4c1c-a6f4-e3dab69fc532_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:312,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Architecting AI Software Systems&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.packtpub.com/en-us/product/architecting-ai-software-systems-9781804619469&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Architecting AI Software Systems" title="Architecting AI Software Systems" srcset="https://substackcdn.com/image/fetch/$s_!3POB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2305437-423b-4c1c-a6f4-e3dab69fc532_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!3POB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2305437-423b-4c1c-a6f4-e3dab69fc532_2250x2775 848w, 
https://substackcdn.com/image/fetch/$s_!3POB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2305437-423b-4c1c-a6f4-e3dab69fc532_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!3POB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2305437-423b-4c1c-a6f4-e3dab69fc532_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here&#8217;s what some readers have said:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aZpm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aZpm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png 424w, https://substackcdn.com/image/fetch/$s_!aZpm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png 848w, https://substackcdn.com/image/fetch/$s_!aZpm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png 1272w, https://substackcdn.com/image/fetch/$s_!aZpm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aZpm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png" width="853" height="802" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:802,&quot;width&quot;:853,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:192721,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/180580239?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aZpm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png 424w, https://substackcdn.com/image/fetch/$s_!aZpm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png 848w, https://substackcdn.com/image/fetch/$s_!aZpm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png 1272w, https://substackcdn.com/image/fetch/$s_!aZpm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06f8ef9-65e6-43a3-a492-bb9478950297_853x802.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KWHP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KWHP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png 424w, https://substackcdn.com/image/fetch/$s_!KWHP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png 848w, https://substackcdn.com/image/fetch/$s_!KWHP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png 1272w, https://substackcdn.com/image/fetch/$s_!KWHP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KWHP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png" width="851" height="753" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:753,&quot;width&quot;:851,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:151425,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/180580239?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KWHP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png 424w, https://substackcdn.com/image/fetch/$s_!KWHP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png 848w, https://substackcdn.com/image/fetch/$s_!KWHP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png 1272w, https://substackcdn.com/image/fetch/$s_!KWHP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5540c6d8-b754-46af-b453-c37a646ec2e9_851x753.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 
4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Architecting AI-Native Platforms in the Real World: A Conversation with Amar Akshat]]></title><description><![CDATA[Agentic architecture, cell-based boundaries, and prompts-as-code for reliable, auditable AI in payments and wallets]]></description><link>https://deepengineering.substack.com/p/architecting-ai-native-platforms</link><guid isPermaLink="false">https://deepengineering.substack.com/p/architecting-ai-native-platforms</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Wed, 19 Nov 2025 10:52:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/R8xSq42-iOM" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI is already in the loop for writing code, reviewing changes, and even drafting architecture diagrams&#8212;but turning those capabilities into resilient, auditable, production-grade systems in regulated domains is still hard. In payments and financial services especially, architects have to reconcile non-deterministic models with deterministic guarantees around correctness, security, and compliance.</p><p>In this conversation, we speak with <strong>Amar Akshat</strong>&#8212;<strong>SVP of Architecture</strong> at <strong>Paysafe Group</strong> and author of the forthcoming book <em><strong>Decode the Compiler </strong></em><strong>(Packt, 2026)</strong>. At Paysafe, Amar has led large-scale modernization and AI-native transformation across payments, wallets, and compliance platforms. Earlier, at Apple, he helped shape the architectural foundations of Apple Pay and contributed to wallet and tokenization frameworks. His work focuses on making architecture itself intelligent&#8212;blending principles like CAP, Twelve-Factor, and Zero Trust with AI-driven reasoning and automation.</p><p>Over the course of the interview, Amar explains how his teams are bringing AI into the architecture loop through MCPX, ArchX, and &#8220;cell&#8221; architectures that keep analysis and decision paths safely bounded. We dig into when to keep workflows purely deterministic versus putting an AI in the path, how to structure data, guardrails, and system prompts as first-class design elements, and how to choose between modular monoliths and microservices for AI-heavy workloads. 
Amar also shares concrete practices around confidence-based routing and trust deltas, prompts-as-code and AI Behavior Reviews, prompt manifests as &#8220;Dockerfiles for AI,&#8221; cost control with &#8220;cache, batch, distill,&#8221; and vendor-neutral orchestration via protocols like chat completions and MCP.</p><p>Looking ahead, Amar reflects on the skills architects now need and how compiler-level thinking informs the design of AI-driven systems. We close with a preview of <em>Decode the Compiler</em> and why understanding what compilers actually do to our code can change how we reason about performance, optimization, and large-scale architecture.</p><p>You can watch the full conversation below or read on for the complete Q&amp;A transcript.</p><div id="youtube2-R8xSq42-iOM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;R8xSq42-iOM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/R8xSq42-iOM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><h1>Introduction</h1><p><em><strong>1: Can you give us a quick overview of your current focus areas at the AI&#8211;architecture intersection? Which lens do you think we should use today&#8212;compiler-centric system design or product architecture, and why?</strong></em></p><p><strong>Amar Akshat:</strong> Right now my focus is what I call agentic architecture&#8212;designing systems that can reason about themselves. At <a href="https://www.paysafe.com/en/">Paysafe</a>, that&#8217;s embodied in MCPX, which you talked about.</p><p>And we have something else called Archx, which is basically an AI-native workflow that powers things like onboarding, fraud analysis, and observability, but also reasoning with systems and their own capabilities&#8212;such as Zero Trust and the CAP theorem. A lot of what we do today is about codifying architectural experience.</p><p>For example, we have trained internal AI agents to analyze architectural decision records, or ADRs, and suggest reusable design patterns, effectively learning from the scars of every project before it.</p><p>And when you talk about lens, it&#8217;s an interesting analogy you use with compilers and system design. I would want to use the <strong>system design lens</strong>. You see, architecture isn&#8217;t abstract for me&#8212;it&#8217;s very progressive, it&#8217;s very pragmatic, and it has building blocks like servers, data flows, queues, failure domains.</p><p>My work sits between things like compiler intelligence and distributed systems and their logic. So if you think about it, the compiler is just an early architect. It takes intent, it optimizes it under constraints, and it produces an executable structure&#8212;and that&#8217;s the same mental loop I want our AI agents to have when designing complex systems through architecture.</p><h1>AI&#8217;s Impact on the Architect&#8217;s Role</h1><p><em><strong>2: I think you are probably one of the best people to ask my next question to, because you sit exactly at this intersection, which I think a lot of people are still trying to make sense of. 
So where does AI practically help architects now, and where is human judgment non-negotiable?</strong></em></p><p><strong>Amar Akshat:</strong> I get this asked a lot in my team as well. We have tools like <a href="https://cursor.com/">Cursor</a>, for example, or Replit <a href="https://replit.com/agent3">Agent 3</a>, or GitHub <a href="https://githubnext.com/projects/copilot-workspace">Copilot Workspace</a>. They basically act as junior architects today.</p><p>They help me generate documentation. They suggest failure patterns from known premises and known previous experiences, and they help me validate deployment diagrams. For example, every good generative AI can create brilliant Mermaid diagrams. So it can start off as your starting whiteboard&#8212;where you throw in your constraints and components&#8212;and it&#8217;ll start with a basic Mermaid diagram that would take an architect a few hours to actually come up with.</p><p>At Paysafe, we are using AI during architectural reviews. It will ingest our ADRs from before, diagrams, and codebases, and then it will flag inconsistencies between what we said we would build and what we actually deployed, because it has the whole lineage of design from scratch all the way to deployment. So it can reason and tell you, with evidence, that this was the original plan to be deployed, this was the scale pattern, and what we ended up deploying.</p><p>Human judgment, on the other hand, still owns context, risk appetite, regulatory nuance, and product trade-offs&#8212;the politics between product and regulatory. That is all still owned by humans. The AI can propose, but humans prioritize.</p><p>I know that my business can reasonably make money in EMEA and Europe for now, so I will prioritize regulatory nuances of EMEA and Europe, and then put through my roadmap what will come in the Americas. So that is the beautiful mix between how humans and AI interact in architectural designs.</p><p><em><strong>3: What&#8217;s a good mental model for deciding when to put AI in the loop versus keeping a purely deterministic path?</strong></em></p><p><strong>Amar Akshat:</strong> If a task mainly benefits from pattern recognition, that&#8217;s where putting AI in the loop makes sense. If it instead requires legal, financial, or compliance certainty, we keep it deterministic. I think of AI as a kind of auto-complete for patterns: it can look at data and say, &#8220;this is PII,&#8221; &#8220;this is PCI,&#8221; &#8220;these are the compliance guardrails you&#8217;ll need.&#8221; That&#8217;s where this kind of AI can work. The parts that demand strict, predictable behavior should stay more deterministic, and that&#8217;s where we sometimes choose not to use AI.</p><h1>Architectural Patterns for AI&#8209;Infused Systems</h1><p><em><strong>4: What baseline architecture patterns do you recommend for shipping AI features reliably?</strong></em></p><p><strong>Amar Akshat:</strong> The first one is the <strong>data postulate</strong>, the second one is the <strong>guardrail postulate</strong> and the third one is the <strong>system prompt package</strong> itself. </p><p>So when I say <strong>data</strong>, I mean what is the current state of the data that is made available by MCP servers. Such as transactions, such as user records, addresses, etc. </p><p><strong>Guardrail</strong> is about making sure what is allowed to be done and what is not allowed to be done. Do you want to completely ground the system? 
Do you want only strictly deterministic responses, or do you want to use the existing LLMs? </p><p>And then the <strong>system prompt</strong> is about saying, what is my input format and what am I expecting the results to be in? And what are the other nuances I want my system to take care of automatically? So, for example, do you want more deterministic performance or do you want more accuracy? Do you want more transparency or do you want more speed? These are the kinds of trade-offs that we encode into that system prompt package. This also includes things like LangChain and OpenAI, or Azure&#8217;s AI Foundry for RAG. </p><p>And we have our own prompt manifests for governance. So, each inference has a manifest attached to it, and that is published to a data plane. So you can imagine things like Kafka plus FastAPI. And each inference is observable, so you can actually observe the latency, the accuracy, etcetera. That is the current model which works for us. Where it breaks down is where execution is critical and user experience is critical. If decisions need to be made quickly using judgment, then you cannot rely on LLMs. Then we deploy lightweight sidecar models. Things like OpenAI mini models or a local Llama 3 for shared tasks and even fraud scoring, which has to be real time in a transaction. We try to do these things in a centralized fashion.</p><p><em><strong>5: How do you decide boundaries for AI components such as separate services, sidecars, or embedded libraries?</strong></em></p><p><strong>Amar Akshat:</strong> So it is all what I call an architecture based on cells. A cell, you know, is a component of the human body&#8212;the tiniest unit of life. So, all of our AI deployments are these tiniest units of life, with their own regulatory nuances contained within themselves.</p><p>So, for example, if I&#8217;m talking about a wallet cell, everything which can support the wallet&#8212;its guardrails, its prompt package, its MCP servers, RAG, plus its fine-tuned models&#8212;will also participate in that cell and it won&#8217;t leak any data. <strong>The idea is the critical path of analysis never leaves the cell boundary, and it only leaves the cell boundary for audit and storage purposes.</strong> That keeps us safe first of all, fast, and then deterministic. No other data is going to change the way my wallet cell behaves, for example. The same thing applies to payment execution, the same thing applies to transaction ledgers, and so on and so forth.</p><p><em><strong>6: When would you favor modular monolith over microservices for AI and vice versa?</strong></em></p><p><strong>Amar Akshat:</strong> If shared memory and stateful context really matter&#8212;say in a conversational commerce system&#8212;a modular monolith with well-defined internal modules works best. Imagine two people asking what to buy for Diwali (an Indian festival) in different parts of India, but their shared history and the same product catalog matter for recommendations&#8212;that&#8217;s a great case for a modular monolith with clear internal boundaries.</p><p>If scalability and isolation are paramount&#8212;for example, in fraud detection&#8212;microservices tend to win, because AI workloads often oscillate between those two needs.
Many of us think of this as the <em>context&#8211;isolation trade-off</em>: which is more important for your use case, rich shared context or strong isolation?</p><h1>Reliability, Safety, and Testing</h1><p><em><strong>7:  How do you design for correctness and failure isolation when models are non-deterministic?</strong></em></p><p><strong>Amar Akshat:</strong> We route by confidence, really. If the model&#8217;s output confidence is less than a threshold, it escalates to a deterministic rule-based system, or we bring a human in the loop. We use things like LangSmith and internal logging to track trust deltas per request. We have effective guardrails and fallbacks&#8212;prompt validation and schema enforcement; we use things like Pydantic. We are a big Python shop for some AI-based workflows, and we use Pydantic plus semantics and sanity checks.</p><p>So a human only steps in for logic failures, not syntax, really. And we have a comprehensive testing strategy for AI features. For example, one of my cohorts runs drift pipelines. They will evaluate daily by comparing outputs to gold datasets&#8212;datasets that are deterministic and known to be correct&#8212;and any semantic drift triggers a review. Basically, you have to look at AI prompts as code. That&#8217;s it. Our CI/CD basically treats <strong>prompts as code</strong>. Every change goes through the peer-review process, with automated regression and some kind of sandbox deployment to test these against the gold dataset.</p><p><em><strong>8: You mentioned guardrails and fallbacks earlier. If you had to distill it down to a couple, which guardrails, fallbacks, or human-in-the-loop steps have been most effective in practice?</strong></em></p><p><strong>Amar Akshat:</strong> In terms of guardrails and fallbacks, first of all, as I was describing before, guardrails are learned. We learn these guardrails from execution. Every prompt package has a version, and with every new execution and failure we put in more guardrails.</p><p>For example, if the AI system ended up putting someone&#8217;s email address from the RAG into a response that was meant to be PII-sanitized, we will again augment the guardrail to include that sanity check. Those guardrails are implemented by tools like prompt security, which ensure that none of these guardrail filters let you pass the data back to the customer. </p><p>If you apply a middleware kind of concept like prompt security&#8212;or any of those use cases where you can apply these guardrail policies before the prompt goes into the LLM and before the response comes back to the user&#8212;you will have effectively masked your failure pattern.</p><p>Human-in-the-loop is usually very, very important when it comes to response quality. Every response has a confidence score, and if the confidence score goes below a certain threshold, a human will come in and try to analyze what was wrong. Was the data too noisy? Was there too much guardrail or too little guardrail? Or was it a format problem, right&#8212;did we come back with bad formats, like something breaking the CI/CD somewhere or changes, etc.? So <strong>the combination of middleware components like prompt security and the usage of guardrails with a human in the loop is very important.</strong></p><p><em><strong>9: Can you describe a sensible testing strategy for AI features covering eval data drift and regressions?</strong></em></p><p><strong>Amar Akshat:</strong> I think the testing strategy for AI systems is fundamentally about learning from mistakes. 
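</p><p><em>As a rough sketch of the gold-dataset checks described above (comparing a pipeline&#8217;s current answers against known-good answers and flagging the run when agreement drops), with data, metric, and threshold that are illustrative assumptions rather than the real pipeline:</em></p><pre><code>from difflib import SequenceMatcher

GOLD = {   # prompts paired with answers known to be correct
    "Is transaction 841 within the card limit?": "yes",
    "Which country issued card 4412?": "France",
}

def current_model(prompt):
    # Stand-in for the real model; imagine the second answer drifted after a prompt edit.
    return {"Is transaction 841 within the card limit?": "yes",
            "Which country issued card 4412?": "Spain"}[prompt]

def similarity(expected, actual):
    return SequenceMatcher(None, expected.lower(), actual.lower()).ratio()

def drift_check(threshold=0.9):
    scores = [similarity(answer, current_model(prompt)) for prompt, answer in GOLD.items()]
    agreement = sum(scores) / len(scores)
    status = "OK" if agreement >= threshold else "REVIEW: possible semantic drift"
    return round(agreement, 2), status

print(drift_check())   # low agreement on the drifted answer triggers a review
</code></pre><p>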
Similarly, we have to make sure the AI learns from its own mistakes. The idea is that we have to monitor things like semantic drift, hallucination rate, and related metrics, and you have to monitor them with real-world data in sandboxes.</p><p>And then you have to, first of all, come up with a reasonable notion of success for yourself. So let&#8217;s say you are dealing with a lot of complaints. You have an AI system which analyzes your complaints and makes sure they&#8217;re being handled correctly. You run it in sandboxes with masked PII so that you have a reasonable testing ground around them, and then per execution you look at things like their semantic drift, hallucination rate, and the trust delta, right?</p><p>Every pipeline will come back with these metrics, and those evals plugging into your CI/CD are very important because your prompt is changing&#8212;changing just like code on a daily basis&#8212;and your prompt changes can sometimes have an exponential impact on your determinism.</p><h1>Observability and Operability</h1><p><em><strong>10: Which signals matter most in production for AI features?</strong></em></p><p><strong>Amar Akshat:</strong> Yeah. So these signals&#8212;basically everything we tested for&#8212;now start to matter in production as well.</p><p>The first is <strong>cost</strong>. We are a financial company, we have millions of transactions going through, and a small change in cost per transaction can exponentially impact our revenue or margins.</p><p>The second thing is <strong>hallucination rate</strong>. Each hallucination in something as deterministic as fraud analysis costs us money, because it can lead to incorrect decisions on transactions.</p><p>And then the third part is obviously things around the <strong>sanity of the whole system itself</strong>. You should be making sure that, as you introduce or tune AI, you&#8217;re not unintentionally impacting real transactions or degrading the user experience&#8212;you might otherwise be causing attrition in your user base. These things matter for us, and we monitor them very closely in our production systems.</p><p><em><strong>11: How do you set up automated and human feedback loops to improve models or prompts without breaking user-reliable behavior?</strong></em></p><p><strong>Amar Akshat:</strong> Yeah, so feedback is pretty automated. The agent will log all low-confidence events, a human reviews them, and the examples get relabelled. And as I was telling you, prompts are versioned with Git tags so we can replay failures exactly. Because it is an agent, it can run this augmentation asynchronously by itself. So what we have today is that every failure is then analyzed by a different model so that we don&#8217;t have model bias, right? </p><p>For example, a failure that originated in an OpenAI model will be reviewed by a Claude Sonnet model. And the feedback we obtain from there will be asynchronously applied to the OpenAI prompt package&#8212;the whole thing that went into OpenAI. And we then, over time, figure out what is working better for us. Which model is best able to review the feedback and failures of a different model? And then we have these model couplings formed by that, and all of it is tracked via Git tags. So every release has a JSON in it which says: here was the analysis and scoring, here were the recommended prompt or guardrail changes, we applied them, and this was the final score.
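</p><p><em>A minimal sketch of that per-release record; the field names and values below are our own assumptions to make the idea concrete, not the actual schema used at Paysafe.</em></p><pre><code>import json
from dataclasses import dataclass, field, asdict

@dataclass
class ReleaseAuditRecord:
    prompt_git_tag: str
    producing_model: str
    reviewing_model: str
    analysis: str
    recommended_changes: list = field(default_factory=list)
    score_after_apply: float = 0.0

record = ReleaseAuditRecord(
    prompt_git_tag="wallet-prompts-v41",
    producing_model="gpt-4-turbo",
    reviewing_model="claude-sonnet",
    analysis="Response leaked an unmasked email address from RAG context.",
    recommended_changes=["add PII mask to guardrail filter", "tighten output schema"],
    score_after_apply=0.93,
)

# The JSON that would ship with the release, so any decision can be replayed later.
print(json.dumps(asdict(record), indent=2))
</code></pre><p>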
So auditability is incredibly important in our ecosystem because this is real data you&#8217;re dealing with, this is real developers&#8217; time you&#8217;re dealing with, and then also sometimes you&#8217;re dealing with real transactional data. So we need to understand which particular change and which particular recommendation gave us transactional benefit across the feedback loops.</p><h1>Data, Privacy, and Governance</h1><p><em><strong>12: How do you protect sensitive data while keeping AI useful?</strong></em></p><p><strong>Amar Akshat:</strong> That&#8217;s a great topic, and it&#8217;s at the top of every executive&#8217;s mind in the industry right now. We basically <strong>redact personally identifiable data before inference</strong> and use <strong>hybrid RAG</strong>, where private embeddings always stay in-house.</p><p>For example, we can use something like <strong><a href="https://www.pinecone.io/learn/series/vector-databases-in-production-for-busy-engineers/cicd-pinecone-local/">Pinecone Local</a></strong>, where it runs as a local instance and private embeddings never leave our environment. Public context is then fetched externally in a secure and deterministic way&#8212;for example, a regulatory change, the impact of that change, or human sentiment around a new law. Those external signals are handled in a more deterministic, controlled way.</p><p>At the heart of all this is our <strong>middleware</strong>. That&#8217;s where we apply these policies: even if you wanted things like sentiment or PII, it will not flow into the inference layer if we don&#8217;t want it to. All AI access is integrated with <strong>SAML-based authentication</strong>, so we know who is accessing it and can augment their prompt with their role, etc. On top of that, there&#8217;s a <strong>guardrail middleware</strong> where we always apply a particular set of rules based on their role and permissions.</p><p>So even if you accidentally put my email address into the prompt, it will be filtered out before it leaves the system. That&#8217;s where our middleware stack plays a huge role, along with <strong>lightweight governance around prompting</strong>. Our <strong>prompt manifest</strong> defines who owns the prompt, what its data scope is, and its safety rating. You can think of a prompt manifest as a <strong>Dockerfile for AI</strong>&#8212;basically, it&#8217;s auditable but still fast to work with.</p><p>And finally, for governance, <strong>auditability and traceability are paramount</strong> for us. We log every inference as an &#8220;architectural replay,&#8221; which includes things like model ID, prompt version, and data snapshot. That way, our compliance teams can reproduce any decision path deterministically.</p><p><em><strong>13: What is your take on auditability and traceability for AI decisions when regulation applies?</strong></em></p><p><strong>Amar Akshat:</strong> Regulation is paramount. I&#8217;m dealing with EU regulation and the AI Regulation Act on a daily basis, and basically it goes back to how deterministic you can make your decision paths.</p><p>Our whole goal is that anytime anything breaks our determinism, or that score, we either chuck it out of production immediately or we treat it as a P1&#8212;like a priority-one incident, right? So any production workflow losing determinism at a given threshold will be treated as a production incident.
It is no longer a developer playground or anything.</p><p>And because we are able to log the model ID&#8212;basically the architectural snapshot, the replay of it&#8212;we are able to log the model ID, prompt version, and the data. We can go back to the decision path and change any of these variables to make sure determinism can be achieved immediately. Our Ops teams are actually trained to do this on a daily basis.</p><h1><strong>Cost, Performance, and Vendor Strategy</strong></h1><p><em><strong>14: How do you avoid provider lock-in without slowing delivery?</strong></em></p><p><strong>Amar Akshat:</strong> We try to stick to the protocols the market is standardizing on&#8212;for example, the chat completions APIs and MCP. These may start with a single company, but over time they become common practice across the industry. So we abstract orchestration through these well-known protocol APIs.</p><p>When I talked about MCPX, that&#8217;s essentially our <strong>multi-provider orchestrator</strong> across OpenAI, Azure, Anthropic, and our on-prem models. The reason this works is that all of them support chat completions&#8211;style APIs and MCP-compatible patterns. So as long as any external or internal AI provider follows those APIs and protocols, we&#8217;re fine.</p><p>On top of that, we put an <strong>AI gateway</strong> in front. Based on things like request headers or your SAML identity information, we can route you to an Azure model versus an OpenAI model, or to an internal model. That is how we avoid lock-in in practice while still moving fast.</p><p><em><strong>15: If we talk about capacity planning and cost control, what has worked for you in terms of caching, batching, smaller models, etc.?</strong></em></p><p><strong>Amar Akshat:</strong> I think the mantra is very simple: cache, batch, distill. We use a tiny Llama for high-volume routing tasks and GPT-4 Turbo for design-time reasoning. So if it is dynamic data like customer support or architecture, design, etc., we stay with prompt engineering because, in that case, flexibility beats precision.</p><p><em><strong>16: When do you feel training or fine-tuning is worth it versus prompt engineering when it comes to a foundation model?</strong></em></p><p><strong>Amar Akshat:</strong> I think if your domain is stable&#8212;maybe KYC or risk scoring&#8212;the signals are very well known, the domain is stable, and then we use fine-tuning, because it&#8217;s a very well-known, stable, signal-based domain.</p><p>And as I was saying before, if it is dynamic&#8212;if it is changing a lot&#8212;Spanish customers complain in a different way, English customers complain in a very sarcastic tone, and Indian customers complain in a very direct tone, sometimes in Hindi or regional languages. Then we stay with prompt engineering, because we have specialized customer teams who know how their customers complain and can create prompts more easily to manage those customer complaints. So yeah, that&#8217;s my short answer.</p><h1>Team, Skills, and Process</h1><p><em><strong>17: What new skills should architects or senior engineers acquire in 2025 and beyond to stay effective with AI in the stack?</strong></em></p><p><strong>Amar Akshat:</strong> Architects and senior engineers must learn <strong>prompt literacy</strong>, <strong>model evaluation</strong>, and <strong>probabilistic reasoning</strong>. That&#8217;s paramount. 
You don&#8217;t need to train models; you need to <strong>design systems that can survive their uncertainty</strong>.</p><p><em><strong>18: How do you adapt design reviews, ADRs, and incident response for AI-specific risks and ongoing learning?</strong></em></p><p><strong>Amar Akshat:</strong> Our design reviews have introduced a first-class concept called <strong>AI Behavior Reviews</strong>. We explicitly acknowledge that AI behavior is non-deterministic, and we treat that as a first-class part of the review process.</p><p>ADRs now capture <strong>prompt decisions</strong> and <strong>fallback strategies</strong> as part of the architecture record. And on the operations side, our SREs include an <strong>AI SRE</strong> role&#8212;someone who understands when it&#8217;s <strong>model drift</strong>, not code, that broke the system.</p><p>As I mentioned earlier, we&#8217;ve trained Ops people to understand the determinism profile of every AI pipeline. So now they can recognize that it wasn&#8217;t the code that failed; it was drift in the AI behavior&#8212;and they know when to switch off that pipeline or replace it with something else.</p><h1>Case Study</h1><p><em><strong>19:</strong> <strong>Can you walk us through a recent AI&#8209;related technical decision you&#8217;ve made: the options, the trade&#8209;offs, and how you validated the outcome.</strong></em></p><p><strong>Amar Akshat:</strong> That&#8217;s actually a very good question. I recently had a very interesting case. We create wallet workflows almost daily, and one of my teams was tasked with designing the checkout experience I was telling you about earlier&#8212;for our digital wallet.</p><p>This is the same problem I&#8217;ve solved multiple times before, in products like Paysafe and Paysafe Checkout. So it&#8217;s a problem I know well, and I had a clear sense of where I wanted to end up. What we did this time was use an AI assistant to generate candidate designs and then critique its own designs whenever they broke our zero-trust rules.</p><p>Eventually it produced essentially the same Mermaid diagram I would have drawn myself. It compressed many years of my experience into about 35 minutes of assessment, and it did a beautiful job of reproducing that design while honoring the constraints: partition tolerance was paramount, zero trust was paramount, and it respected those.</p><p>Then we validated it against failure scenarios&#8212;almost like a chaos check. For example: what if the system crashed at point A, B, or C? Does the system remain deterministic? Is the integrity of transaction persistence still correct? As I mentioned, the AI kept iterating until all the constraints were met. Its initial few iterations achieved consistency but not zero trust.</p><p>Next time, I plan to pair it with a chaos agent of some sort to automatically explore failure domains, and we&#8217;ll see how that goes.</p><p><strong>20: Are there any emerging patterns or standards you&#8217;re watching that could reshape how AI components integrate?</strong></p><p><strong>Amar Akshat:</strong> I think all of this starts from orchestration. You can look at things like OpenAI&#8217;s protocols, Google&#8217;s APIs, or Visa&#8217;s agentic commerce protocol&#8212;everything starts from orchestration. And when orchestration is involved, zero trust is involved. And when zero trust is involved, deterministic fallback is also involved.</p><p>You&#8217;re an orchestrator: you&#8217;re orchestrating tasks, and you cannot blindly trust anyone in the world. 
So you apply zero trust, and then you ask, &#8220;When something fails, how do I fall back?&#8221; That&#8217;s where workflow engines come in, and I&#8217;m watching patterns that bring those engines together with AI.</p><p>I&#8217;m especially interested in cases where ambiguity is not known until the ambiguity actually shows up. I don&#8217;t think the <em>existence</em> of ambiguity has been mathematically described yet&#8212;<em>when</em> will ambiguity occur? When ambiguity occurs, it&#8217;s obviously not a clean mathematical situation, but predicting <em>when</em> it will occur is still unknown to systems.</p><p>That&#8217;s why I want to see chaos agents enter the market&#8212;agents whose job is to disrupt AI workflows. Right now we live too much in &#8220;happy path syndrome,&#8221; where we assume the happy path is the only path that really happens in execution. That is not true; anything can happen, anything can fail.</p><p>Every design must still be explainable by a junior engineer, basically. And simplicity is still the ultimate scaling factor. That&#8217;s all.</p><h1>Hot takes</h1><p><em><strong>21: According to you, which production metric most correlates with perceived quality?</strong></em></p><p><strong>Amar Akshat:</strong> The trust delta.</p><p><em><strong>22: OK, and what&#8217;s the smallest useful model card or change log for shipped prompts or agents?</strong></em></p><p><strong>Amar Akshat:</strong> We use Microsoft Guidance, and Microsoft Guidance lets you treat your prompt as code. So even the commit messages become the smallest kind of change log that tells you what changed between two versions of a prompt. I would say commit messages, now.</p><h1>Looking Ahead</h1><p><em><strong>23: What constraints or first principles do you feel keep AI projects grounded, and what will look obvious five years from now about architecting AI-heavy systems?</strong></em></p><p><strong>Amar Akshat:</strong> So first principles still apply, as in, any AI project will still not break the CAP theorem. The CAP theorem, when it applies to the determinism of applications&#8212;distributed applications&#8212;will still apply. So you will have trade-offs when you want consistency and partition tolerance. Availability will suffer irrespective of whether a human or an AI is writing the code or designing the system, right?</p><p>So those first principles remain, and an app will still be judged by its Twelve-Factor App principles. AI apps are no exception. They may be self-healing, but their app constraints are still Twelve-Factor. Zero Trust is a model defined to safely execute critical workloads in the world, and that will still continue to apply.</p><p>One thing AI will add is the ability to self-heal with ample data and context at hand, which is a great principle we should actually capitalize on and try to create systems which, over time, go towards determinism rather than away from it. And &#8220;fail fast&#8221; is still very important, right? If something is not working for you&#8212;if determinism is not there&#8212;we should fail fast rather than have our transactional integrity or our customers suffer.</p><p>Looking ahead, I think if all the architects are on the same page, we should start versioning and feeding our contexts back into the AI. All the ADRs should go into the AI. The codebase should be scanned and understood by the AI on a regular basis.
And then we should keep ourselves honest whenever the AI tells us that our ADRs and our codebases have diverged, which means we haven&#8217;t been true to our architectural design, right?</p><p>That will allow our AIs to have even more context in the world, and then they can apply these contextual patterns to create any advanced AI system, right? Any advanced AI system will still have deterministic models and dimensions&#8212;it is still working under those same constraints of the CAP theorem, etc. These are solved problems in every nook and cranny of the world. We just have to bring them together in an architectural model&#8212;not just a conversation, but an actual architectural model out there&#8212;and then let it weigh in with you on your high-scale design as a senior architect.</p><h1>About the book: <em>Decode the Compiler</em></h1><p><strong>24: You&#8217;re working on Decode the Compiler. According to you, how does deeper knowledge of compilation or codegen inform how we design AI-driven systems today?</strong></p><p><strong>Amar Akshat:</strong> Actually, that&#8217;s a great question. I&#8217;ll start with an anecdote. When I was growing up, I read this book by Yashavant Kanetkar, called <em>Let Us C++</em>, and I was taught that when you initialize a pointer&#8212;or when you allocate a pointer in C using <code>malloc</code>&#8212;you must always typecast it to the right variable type you&#8217;re using. I kept that in my head; it was my first education, and it stayed with me until I went into the depths of the compiler at Apple, Clang.</p><p>And I realized that I should not be doing this extra typecasting, because I am now telling the compiler what to do. The compiler knows what to do. It has seen your system, it has seen your code&#8212;however beautiful or ugly it may be. It has known your system constraints. It knows what to do, right? Let it do what it does best. The problem is we don&#8217;t understand what it does.</p><p>How a compiler makes your <code>for</code> loop efficient, or makes the incrementing variable within the loop efficient, for example&#8212;we don&#8217;t know. Many of us don&#8217;t know that compilers will automatically make some variables <code>register</code> variables in C and C++, and it is very important for us to know that so that, when we are writing more advanced code, those design patterns can stick with us. And we can apply those same patterns as larger-scale habits.</p><p>In one way, compilers are trying to spoil us&#8212;trying to make us lazy&#8212;because they let us not take care of those finer performance details by ourselves and do it on our behalf, which is great, but then we are also losing that sharp curve of learning there. So my book is about understanding, from the compiler&#8217;s own output or the compiler&#8217;s own dump of what it has performed, what it has done on your code, right?</p><p>You shouldn&#8217;t be surprised. I think it&#8217;s a very, very interesting thing to learn&#8212;even for a simple <code>for</code> or <code>while</code> loop&#8212;how many performance improvements the compiler is making on your behalf. And that is what my book is all about: trying to decode the compiler&#8217;s kindness towards us.</p>
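<p><em>Editor&#8217;s note: to ground the two compiler behaviors mentioned above, here is a minimal C sketch of our own (not from the book). It shows that the result of <code>malloc</code> needs no cast in C, and that an optimizing build will typically keep a simple loop counter in a register on its own; the compile commands in the trailing comment are just one common way to inspect what the compiler actually emitted.</em></p>
<pre><code>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

int main(void) {
    /* malloc returns void *, which converts implicitly to any object
       pointer type in C, so an explicit (int *) cast adds nothing. */
    int *values = malloc(1000 * sizeof *values);
    if (values == NULL) {
        return 1;
    }

    /* A plain counting loop: with optimization enabled (e.g. -O2) the
       compiler will normally keep i and sum in registers and may unroll
       or vectorize the loop; the old `register` keyword is at most a hint. */
    long sum = 0;
    for (int i = 0; i &lt; 1000; i++) {
        values[i] = i;
        sum += values[i];
    }

    printf("%ld\n", sum);
    free(values);
    return 0;
}

/* One way to look at what the compiler actually did:
 *   cc -O2 -S example.c          (emit the generated assembly)
 *   gcc -O2 -fopt-info example.c (GCC's own optimization report)   */
</code></pre>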
<p><em><strong>25: Is there a personal motivation or vision that led you to, you know, make the decision to write this book at this point in your life?</strong></em></p><p><strong>Amar Akshat:</strong> Oh yes, of course. So when I was at Morgan Stanley and when I came into Apple, I was deeply involved in build and integration systems. I was deeply involved in deep compiler workflows&#8212;understanding common build failure patterns&#8212;and I was at the heart of a team which was basically accepting code changes from the entire Apple operating system developer base inside Apple.</p><p>So I was seeing these common failure patterns across teams, and I was like, I wish I could run a podcast and almost every week tell people that this is a very common failure pattern all of you have. It&#8217;s just that it is not well documented. And, you know, sometimes the compiler steps in and does it for you and things like that. So syntax failures&#8212;sure, the compiler will reject you. But the subtle efficiency improvements which the compiler does, or sometimes we as humans do to make a couple of integrations work correctly, were almost too beautiful for me to just keep to myself.</p><p>So I wanted people to understand&#8212;when students go into engineering college today, or write their first few C programs&#8212;that they should be surprised to see what is happening beneath the compiler, right? Even if a &#8220;hello world&#8221; just comes in front of you, what it took the compiler to do it for you is a beautiful experience I went through, and I want the world to go through that as well.</p><p><em><strong>26: Who is the ideal reader for this book? According to you, who will find it to be the most useful?</strong></em></p><p><strong>Amar Akshat:</strong> I think the architects and the senior developers would be the ideal readers, because they understand that when they look at how the compiler optimizes their code, they will be surprised and inspired. Those optimizations apply to us in real-world architecture as well. You would realize that the compiler does so many things to scale your tokens, your token chunking, or to make your lookup of a particular data structure faster.</p><p>And those are the same patterns which we apply in our day-to-day architecture as well, like when we do caching or when we do streaming of tokens, etc. So senior developers and architects will be inspired. Junior developers and people who are upcoming in the market will be surprised.
So it will also apply to them&#8212;to get surprised, beautifully surprised, flabbergasted, I would say.</p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Mastering GitHub in the Real World: A Conversation with Ayodeji Ayodele]]></title><description><![CDATA[Scaling collaboration and CI/CD on GitHub&#8212;branch protections, rule sets, inner source, and Copilot-powered workflows that boost delivery without compromising security.]]></description><link>https://deepengineering.substack.com/p/mastering-github-in-the-real-world</link><guid isPermaLink="false">https://deepengineering.substack.com/p/mastering-github-in-the-real-world</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Thu, 06 Nov 2025 07:43:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/qfZWQg4AnV4" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>From secure collaboration and branch protections to reusable workflows and AI-assisted development, GitHub now sits at the center of how software gets built&#8212;and scaled&#8212;inside modern organizations. In this conversation, we speak with <strong>Ayodeji Ayodele</strong>&#8212;author of the <em><strong><a href="https://www.packtpub.com/en-us/product/github-foundations-certification-guide-9781836206040">GitHub Foundations Certification Guide</a></strong></em> (Packt, 2025)&#8212;about helping teams move from &#8220;using Git&#8221; to leading with GitHub: collaborating transparently, automating confidently, and protecting the software supply chain without slowing delivery.</p><p>Ayodeji is a seasoned architect, DevOps evangelist, and Agile coach with over 18 years of experience across Financial Services, Tech, FMCG, Manufacturing, and the Public Sector. He&#8217;s worked with CIOs and engineering leaders throughout Asia, Oceania, and Africa to drive enterprise adoption of DevOps and Agile practices&#8212;helping teams ship better software, faster. Currently a Senior Customer Success Architect at GitHub, Ayodeji partners with large organizations to align GitHub&#8217;s tools and workflows to real business outcomes&#8212;improving developer velocity, security, and collaboration at scale.</p><p>In this interview, we dig into what the GitHub Foundations Certification covers in practice, how to level up from issues and pull requests to governance with rule sets and quality gates, and where GitHub Copilot (and emerging agentic capabilities) can responsibly boost productivity.
We also discuss inner source as a cultural shift, strategies for CI/CD that avoid pipeline bloat, and pragmatic approaches to secrets management, dependency hygiene, and build provenance.</p><p>Looking ahead, Ayodeji shares how AI is reshaping developer workflows, what skills will keep engineers relevant, and how to cultivate a documentation-first, asynchronous culture across time zones.</p><p>You can watch or listen to/download the full conversation below&#8212;or read on for the complete transcript.</p><div id="youtube2-qfZWQg4AnV4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;qfZWQg4AnV4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/qfZWQg4AnV4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;e97c54b2-75ae-4e35-a0b0-d742584d68bc&quot;,&quot;duration&quot;:4245.4204,&quot;downloadable&quot;:true,&quot;isEditorNode&quot;:true}"></div><div><hr></div><p><em><strong>1: What gap does your book, GitHub Foundation Certifications Guide, fill for today&#8217;s developers?</strong></em> </p><p><strong>Ayodeji Ayodele:</strong> My book bridges the gap between knowing Git basics and truly mastering GitHub as a collaborative, secure, and scalable platform. Many developers know how to commit and push code, right? But sometimes they struggle with collaboration, automation, and security, so I believe the book helps with that. Secondly, the GitHub Foundation certification is a great benchmark, but in the past there wasn&#8217;t a practical, hands-on guide to help people prepare and apply those skills in real terms, given the fact that GitHub certifications in general are just barely two years old. So we don&#8217;t have that many resources out there. I wanted to create a resource that&#8217;s not just exam-focused, but also helps developers become confident contributors in any organization. So <strong>I wrote this book to help developers go from &#8220;I can use GitHub&#8221; to &#8220;I can lead with GitHub.&#8221;</strong></p><p><em><strong>2: Your book promises to take readers from fundamentals to advanced GitHub features&#8212;from better collaboration and project management to secure workflows and even AI-powered coding with Copilot. Now, as you said, many developers use GitHub daily but may not be leveraging features like issues and pull requests. How can even experienced developers benefit from this book, and what are some important GitHub best practices or features that even seasoned engineers often overlook?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Yes, you&#8217;re correct&#8212;even seasoned engineers often miss out on GitHub&#8217;s advanced features that can supercharge collaboration and code quality. For example, GitHub Copilot&#8212;our latest product&#8212;is a massive game changer in the developer world, particularly if you&#8217;re using it both within the IDE and on the github.com platform. So GitHub Copilot is not just for the IDE; you can use and benefit from the great values that Copilot brings even on the github.com platform. 
Not just that&#8212;it supports multiple IDEs, up to about six, which means you don&#8217;t have to learn a new IDE for some of those other AI products in the software development space today. You can bring those models and still use the development environment you already use today. </p><p>So Copilot supports multiple models&#8212;you can use all the GPT models, you can use the Claude Sonnet models, and there are also the Google Gemini models as well, all within GitHub Copilot. And then there are the agentic capabilities that we now see as the future of AI in software development&#8212;whereby you can have this huge backlog of issues, assign those issues to Copilot, and Copilot will spin up its own separate environment, triage the issue, and write the code that fits the description to implement that issue. That may be a feature request, or it may be fixing a bug and things like that. And going down the line, there are so many other things coming out in the next few months around helping to improve code quality as well. There is also a feature called GitHub Advanced Security that helps people manage vulnerabilities, and you can also bring in vulnerabilities that are reported by other security tools and even fix them within the same platform. Those are things you can use today. </p><p>Then, in terms of best practices, we&#8217;ve got pull request summaries for improving the semantics of pull request descriptions and titles. We have rule sets for protecting branches and for protecting workflows when you run them&#8212;there&#8217;s a huge number of different rules you can apply to improve governance and to improve CI/CD automated checks&#8212;and there&#8217;s leveraging issues for transparent communication and collaboration as well. Finally, mastering these tools elevates you from an individual contributor to being a team enabler. GitHub isn&#8217;t just a code repository&#8212;it&#8217;s a platform for building better software together.</p><p><em><strong>3: As we know, open-source code now underpins almost everything. Given this ubiquity, what advice do you have for developers to harness open source effectively?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Open source is the backbone of modern software. Contributing is the best way to learn&#8212;contributing to open-source projects helps you grow and give back to the community. In fact, roughly 50&#8211;60% of software today is built in open source or on top of open-source components and libraries. So open source is integral to how we build software in the world today. I&#8217;d say start small&#8212;fix typos in public repositories and improve documentation. You&#8217;re just like everyone else, and the GitHub platform is home to over 150 million developers across different skill sets and interests, so you&#8217;ll always find a space that fits you. If you&#8217;re worried your contribution won&#8217;t meet standards, there are tools to help. GitHub Copilot&#8212;free for open-source projects&#8212;can suggest or improve code; after you&#8217;ve written code, you can ask Copilot to review it for standards. </p><p>We also have GitHub Advanced Security, and many of its security components are free for public repositories. GitHub takes its role as the home where the world builds software seriously, so we provide security and AI-powered tools to open-source communities at no cost. Beyond that, there&#8217;s the GitHub Community Discussions space where people suggest improvements to GitHub itself&#8212;join in and learn by doing.
Open source is a two-way street: you give, you learn, you grow.</p><p><em><strong>4: How can engineers get involved in open-source projects on GitHub while balancing the risks and rewards of depending on community-maintained code?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Rephrasing that: if you want to get involved, start at github.com/explore. You&#8217;ll also find /trending, where you can see repositories gaining popularity&#8212;sometimes a new repo skyrockets because it&#8217;s exactly what everyone was looking for, whether a library, a design template, or a scaffolding component. There&#8217;s no judgment&#8212;you&#8217;re just like everyone else among 150 million developers on GitHub, spanning all experience levels and interests. About the risks: you may worry your contributions aren&#8217;t up to standard, or fear embarrassment. There&#8217;s no shame. </p><p>Use GitHub Copilot&#8212;currently free for open source&#8212;to suggest or improve code, and even ask it to review your code for standards after you&#8217;ve written it. Plus, GitHub Advanced Security offers many features free for public projects. As part of our commitment to the open-source community, we provide these security and AI-powered tools free to help you get started and build with confidence.</p><p><em><strong>5: How can the principles of open source&#8212;such as open collaboration, transparency, fork and pull&#8212;be used within companies to improve teamwork and code reuse?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> <strong>Bringing open-source practices inside companies&#8212;what we call inner source&#8212;breaks down silos and accelerates innovation.</strong> Transparency, forking and pulling workflows, and opening discussions all drive better code and teamwork. If companies are looking to get started, the InnerSource Commons website has very useful resources; my colleague Yuki in Tokyo is involved there. </p><p>To mitigate common issues like unclear ownership, be transparent and make it easy for people to contribute: add a CONTRIBUTING.md file with clear guidelines, explain what the project is, what it does, and where help is needed. </p><p>Use GitHub Discussions internally so people can collaborate and ask questions, and look to the GitHub Community for inspiration on how a discussion forum works. Leadership support matters, too&#8212;secure buy-in and celebrate contributions, whether at the community level or with incentives like badges or prizes. Inner source turns every developer into a potential innovator, not just a code consumer.</p><p><em><strong>6: Have you seen any challenges in adopting this inner source model, and what strategies can help overcome them?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Resistance to change is common&#8212;people feel comfortable with the status quo. </p><ul><li><p>Start by explaining what inner source is and keep the environment judgment-free so everyone feels supported. </p></li><li><p>Provide clear guidelines on what can be done, and communicate&#8212;over-communicate&#8212;so no one is surprised by the rollout. </p></li><li><p>Keep everyone in the loop on what you&#8217;re doing as a program, when you&#8217;re starting, and how you&#8217;ll deliver the change. </p></li><li><p>And don&#8217;t underestimate leadership involvement; having leadership support is critical to driving the change internally.</p></li></ul><p><strong>7: Modern software teams automate extensively. 
How critical is it for developers to integrate CI/CD pipelines and automation into their workflow?</strong></p><p><strong>Ayodeji Ayodele:</strong> CI/CD is one of my favorite topics. CI/CD is hard. For example, you can introduce pipelines; in GitHub you can have three&#8212;one for the build phase, one for the test phase, and one for the deployment phase. And so when you think about CI/CD, you think of a process and a practice or a methodology that helps you automate all of those steps from the point where you collect requirements and analyze what needs to be developed. Then you build. You want to build and test as you build. So test-driven development&#8212;if that&#8217;s the model you follow&#8212;will require that you build your test before you write your code. And sometimes what we also do is that you write your code and write your test, depending on your methodology&#8212;whether it&#8217;s an agile methodology or something like that. So, for me, CI/CD is a practice and a set of standards that you want to have within your organization that helps you automate all of the very tedious, boring, repetitive tasks that you would ordinarily have to do by hand. </p><p>And when you think of CI/CD, you want to think about the entire journey of software development&#8212;so from the point of having an idea, building a prototype, and then having a feature request that you want to design, you want to test that feature, you want to deploy it and make that feature available to your customers. That&#8217;s the journey&#8212;it&#8217;s a life cycle. And when you think of life cycles, you&#8217;re thinking in terms of product management. You&#8217;re thinking about getting your ideas and turning those ideas into a reality for customers, for your users. And if you are going to really do that at scale, you cannot do it by having people on your team run commands manually on their terminal&#8212;copying configuration files here and there. You want to eliminate the room for human error. You want to make sure it&#8217;s repeatable. You want to make sure that the process that you follow in Team A is the same process Team B follows. </p><p>So CI/CD allows you to have those standards built so that your teams are able to build with confidence, and you&#8217;re able to ship more frequently. So the more frequently you ship a feature, the faster you are going to meet customer needs. And so CI/CD gives you that leverage to repeatedly ship features to your customers and not have to wait so long just because you&#8217;re following a very careful manual process. And the second thing I want to say is that CI/CD helps you to test early, test often. When you test early, you reduce the cost of the bugs&#8212;you reduce the cost of fixing them because you can discover them very early in the development life cycle. And when you do that, it&#8217;s cheaper to fix as opposed to discovering a bug after you have launched a feature to production. And then the last thing about CI/CD that I want to highlight is that you want to make sure that it&#8217;s repeatable. And so you want to write your configuration as code so that you are able to repeatedly do the same thing over and over again and in the shortest time possible. And that just lets you ship faster and safer. </p><p>There are a number of things that you can do to improve the CI/CD pipeline. You can introduce automated quality gates within your pipelines so that there are quality checkpoints to prevent low-quality code from going into your main branch.
And for that, for example, in GitHub today, we have code scanning tools&#8212;you know, code scanning is available today as part of GitHub Advanced Security. And that lets you set up a rule set that scans your code for vulnerabilities. And you can then block a pull request from being merged based on that. You can also block pull requests by checking for tests, checking that your tests are passing, checking that your build pipeline&#8212;like the build process&#8212;completes and is successful. And then checking that the deployment pipeline meets all of the criteria or standards required before you actually deploy to production. </p><p>And so once you have those things set up&#8212;essentially a very solid, repeatable process with automation and with quality gates in place&#8212;you are going to be able to confidently ship production-grade software again and again. And when you centralize CI/CD&#8212;like, for example, you can have it built as reusable workflows&#8212;you are able to introduce them across all your repositories whenever you want to merge a pull request, and you know you have a central place where you can manage the quality gates, the rule sets, and the different things that need to be checked across your organization. This gives you a very repeatable process across the organization&#8212;across different business units&#8212;and you&#8217;re able to have that single pane of glass that lets you see every single repository and the quality within them. So CI/CD is integral to software development today&#8212;I would say it&#8217;s an integral part of software development, yeah.</p><p><em><strong>8: What guidance would you give for using tools like GitHub Actions to streamline testing, integration, and deployment, and to avoid &#8220;pipeline overload&#8221; while maintaining software quality?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> I would say start with one of the most important points around deployments&#8212;which is continuous deployment. If you&#8217;re working in an organization where you are able to build a great testing culture&#8212;so you are building both unit tests, and you have automated tests running end-to-end, and you are able to bake quality into your check-in process&#8212;then you can enable continuous deployment so that you have frequent deployments to production without, you know, requiring someone to push a button after every merge into main. And the reason is that you have discipline, and you have the culture and processes to ensure that those checks are repeatedly passing. So when a pull request lands on main, the deployment happens automatically&#8212;so that is one. </p><p>And another point is that release gates help with staging: having an environment that lets you validate in a staging environment before you deploy to production. That&#8217;s really helpful when you have many teams and the product is really important and there are key delivery timelines, so release gates make sure that when you are in the staging environment it&#8217;s actually very close to the production environment. You can do some scale testing and even chaos testing in that staging environment. Now, in terms of avoiding pipeline overload, what I&#8217;ve seen is that when teams begin to add more and more and more&#8212;like, you know, &#8220;Let&#8217;s add this because it&#8217;s nice to have&#8221;&#8212;you&#8217;ll quickly get to a point where your pipelines take a long time to run.
That means you&#8217;re blocking people from shipping because you now have this huge, long pipeline. So what I would say is that you want to keep your pipelines small in terms of the number of steps for a particular task. And when you break down your tasks&#8212;like, for example, your unit tests are separate&#8212;you want to break integration tests out separately, and then you have performance tests and other tests that you want to do. And by breaking these things down into a pipeline for a particular purpose, you are able to keep them short and small, so that when you are running the pipeline for unit tests, it&#8217;s not including all the steps that you need for, say, integration tests. </p><p>So you want to break them down, and you want to make sure that they are not blocked by each other. For example, let&#8217;s say you want to run a build pipeline, and then you also want to run unit tests and integration tests. You should be able to run them in parallel such that when your build is done and the unit tests are done, then you can do code coverage and other types of analysis. And you can go ahead and do the integration tests while the unit tests are running&#8212;rather than having them chained where you have to wait for one to finish before the other can begin. So run in parallel&#8212;parallelization lets you get things done faster. The other thing I would say is: cache artifacts between steps. When you are building artifacts, you want to be able to reuse them easily between pipelines or even within a pipeline. And so when you cache artifacts, it saves you the time it takes for a build to start from scratch every time. </p><p>For example, in Node.js, you can cache the dependencies; in Python, you can cache the dependencies; in Java you can cache dependencies&#8212;you can cache them across runs. And this makes sure that if there&#8217;s a small change you&#8217;re making, you don&#8217;t have to start from scratch. And then, finally, when you&#8217;re thinking about CI/CD, you want to think about making it reusable&#8212;so reusable workflows. That way, when you have standard, common steps, you can reuse them across teams. And then also make it modular. So when you have a task, you want to have a complete task that&#8212;once that task is completed&#8212;then you can run the next task in parallel or the next task that depends on the outcomes of the previous tasks. And so with all of that, I would say, yeah&#8212;automate the boring stuff. Focus on what makes your products unique.</p>
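<p><em>Editor&#8217;s note: as one illustration of the advice above, here is a minimal GitHub Actions workflow sketch of our own (not taken from the interview). Build, unit tests, and integration tests run as independent parallel jobs, dependencies are cached through <code>actions/setup-node</code>, and a coverage job waits only on the jobs it needs; the job names and npm scripts are placeholders for whatever your project actually uses. Each job also doubles as a status check that a branch rule set could mark as required.</em></p>
<pre><code># .github/workflows/ci.yml -- illustrative sketch only
name: CI

on:
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm              # reuses the npm cache across runs
      - run: npm ci
      - run: npm run build

  unit-tests:
    runs-on: ubuntu-latest        # runs in parallel with build
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm test

  integration-tests:
    runs-on: ubuntu-latest        # also parallel, kept as its own short pipeline
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run test:integration   # placeholder script name

  coverage:
    needs: [build, unit-tests]    # waits only for the jobs it depends on
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "collect and publish coverage here"
</code></pre>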
<p><em><strong>9: In your experience, what branching strategy works best for teams on GitHub?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> I have a preference, so disclaimer first: this is an opinionated preference. For me, I prefer trunk-based development. A lot&#8212;because with the trunk-based development model, your history stays clean and you have only one single branch at every point in time. And you might ask, what if I need to go back in time and roll back? There is also Git Flow, which is, you know, very good. But the best branching strategy is the one that your team understands and can follow consistently. Git Flow can work for, you know, complex release cycles where you want to have different release version numbers. Consistency and clarity matter&#8212;that&#8217;s, uh, what I would say, yeah.</p><p><em><strong>10: Now we know what you prefer personally. But for teams&#8212;what would you recommend for them? Would it be trunk-based development or Git Flow?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Yeah, I would say trunk-based development for most teams&#8212;that&#8217;s your speed, because you can quickly make changes, validate, and collaborate. So that is particularly important for teams, yeah.</p><p><em><strong>11: Let us talk about the pros and cons of branching strategies. Could you quickly summarize what trunk-based development is best suited for versus Git Flow?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> If you&#8217;re building software and want to release features more often&#8212;say, once every day&#8212;with frequent changes to production or very short sprint cycles, trunk-based development is the best approach. If you tend to have multiple release versions in production&#8212;or you have the kind of application where customers expect separate versioned releases&#8212;then Git Flow is good so you can keep those separate branches of the same codebase and use them regularly. So yeah, that&#8217;s what I&#8217;ll say.</p><p><em><strong>12: With supply-chain attacks on the rise, security is a huge concern. What steps should developers and teams take on platforms like GitHub&#8212;enabling two-factor authentication, etc.&#8212;to prevent such attacks?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Security is everyone&#8217;s job. Every role needs to consider security as very important&#8212;not just the security architect. Every developer needs to factor security into the code they build, so use the built-in tools you have on GitHub to protect your code at every stage. For example, at the development stage, GitHub push protection helps block secrets from leaking from your IDE when you&#8217;re about to push to the remote&#8212;so that helps even before you get to CI/CD. When you&#8217;re about to merge, there are rule sets to protect code&#8212;for example, from accidental deletions of branches&#8212;and from dependencies and vulnerabilities. You can run vulnerability scans out of the box with a single-click default setup for code scanning. For secret scanning, you can scan code at rest or even code in PRs. </p><p>There are also security configurations you can apply to different sets of repositories&#8212;and you can have code scanning run on a predetermined frequency, whether weekly or monthly, across the organization. There&#8217;s also Dependabot, which looks at your dependencies and recommends updates; it tends to keep dependencies up to date and even opens automatic pull requests you can review and merge. For software supply chain integrity, you can implement build provenance using SLSA (a build attestation standard)&#8212;GitHub Actions is always on SLSA Build Level 2. Within GitHub, you can generate attestations to prove dependencies are valid&#8212;like a blockchain of your deployments&#8212;to show nothing was tampered with; you can store that on GitHub or as an artifact in Artifactory or other external package managers. </p>
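<p><em>Editor&#8217;s note: as a concrete illustration of the Dependabot point above, this is a minimal <code>.github/dependabot.yml</code> sketch of our own. It asks Dependabot to check npm dependencies and the actions used in workflows once a week and open update pull requests; the ecosystems, directory, and limits are placeholders for your own stack.</em></p>
<pre><code># .github/dependabot.yml -- illustrative sketch only
version: 2
updates:
  - package-ecosystem: "npm"             # watches package.json and the lock file
    directory: "/"                       # where the manifest lives
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 5          # cap the number of open update PRs

  - package-ecosystem: "github-actions"  # keeps the actions used in workflows current
    directory: "/"
    schedule:
      interval: "weekly"
</code></pre>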
<p>Definitely use two-factor authentication. GitHub supports that, and you can use single sign-on with Okta or Azure AD. You can also use Teams to manage roles, create custom roles with different permissions, and make sure every commit is signed and verified to add confidence that the person changing the code is the right person.</p><p><em><strong>13: How do Git and GitHub enable effective asynchronous collaboration across different time zones and teams?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Yep, like you rightly hinted&#8212;yes, I work remotely. In fact, Git and GitHub are made for asynchronous teamwork. It helps. You don&#8217;t have to be in a meeting to make changes on GitHub, and clear communication is built into the platform. You can use GitHub Projects for planning. You can track status for each task&#8212;who&#8217;s working on what&#8212;and then it also provides a timeline view where you&#8217;re seeing the timeline for the different tasks. That helps when you have teams in different parts of the world, even people in the U.S. and [elsewhere]. GitHub Issues is that central place where we collaborate on a particular issue&#8212;you can create an issue to capture an idea, a bug, or a feature request. You can tag people, add labels that are customized to the way your organization works, and then you can create task lists within the issue. So it helps with pull requests as well&#8212;it comes out of the box. </p><p>Pull requests are one very fantastic feature of GitHub. You can also use GitHub Discussions for discussions that are not specific to a particular issue or pull request&#8212;discussions are really trackable. You can also use discussions for social collaboration. Yep, these are the different features I can think of now. And then there&#8217;s documentation&#8212;wiki pages and README files. You can also create road map views and use that to collaborate on different projects.</p><p><em><strong>14: Are there any secret tips you&#8217;d like to share&#8212;little things we might have missed&#8212;about GitHub collaboration in a remote-team context?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Yeah, absolutely. And the fact that you work with different people across different time zones means documentation becomes very important. Even though they&#8217;re developers&#8212;and we&#8217;re geeks&#8212;sometimes we just want to write in, you know, shorthand and short comments. But if you&#8217;re working asynchronously, you want to provide context in your issues and pull requests so people can understand the &#8220;why&#8221; and the &#8220;how&#8221; behind changes without a meeting. Use templates for issues and PRs, and follow a consistent convention for titles and descriptions. Use labels and project boards to make status clear at a glance. Encourage code owners and reviewers to leave actionable comments, not just approvals. And you want to have regular check-ins with your team&#8212;maybe you have a sync once a week or once every two weeks&#8212;so people feel connected while still relying on async work.</p><p><em><strong>15: You have a background in DevOps and change management&#8212;how do you see platforms like GitHub influencing team culture and process?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> GitHub is a fantastic tool for collaboration&#8212;especially when you want to bring people from silos to becoming more collaborative. In the development world, we call GitHub &#8220;social coding&#8221; because you&#8217;re writing code and working together with people.
There&#8217;s transparency: when you&#8217;re making changes, you&#8217;re seeing what others are doing, and others can see what you&#8217;re doing&#8212;that transparency is really important for providing feedback. When you&#8217;re reviewing code, you can add inline comments, and that feedback can also drive continuous improvement. When you put automation in place, it saves people time, and they can use that time to work on bigger problems&#8212;so having that automation helps teams become more collaborative. People feel this developer happiness when they use GitHub in a transparent, collaborative way. Collaboration is a very core pillar of how the platform is built and shaped. Yeah.</p><p><em><strong>16: Are there any common pitfalls teams should avoid when integrating GitHub into their DevOps workflows?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Over-customization is one I see often. Sometimes platform engineering teams or DevOps engineers want to customize everything, and that can take away from the standard way of doing things. There&#8217;s overhead for you to maintain the customization, and overhead for the people who have to use it. You want to reduce that so your end users and consumers can use your application or software more easily&#8212;so avoid over-customization. Neglecting documentation is another. I&#8217;ve seen people create pull requests with great changes but little or no context. </p><p>Today, with GitHub Copilot, you can easily summarize the changes you&#8217;ve made in the pull request&#8212;beautifully&#8212;so try not to neglect documentation. Also, skipping retrospectives is a pitfall. Retrospectives help you look at what you did well, what you didn&#8217;t do well, and where you can improve&#8212;so don&#8217;t skip them. Over-customizing your platform, neglecting documentation, and skipping retrospectives are common pitfalls&#8212;and culture matters. The right tools can change what&#8217;s on your menu.</p><p><em><strong>17: Let&#8217;s talk specifically about Copilot and the future of coding. Is AI-assisted coding a boon to productivity&#8212;that&#8217;s the debate. From your perspective, how is AI changing the day-to-day work of developers?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> I&#8217;ve been in the industry for about 20 years writing software, and I&#8217;ve never seen anything like this before. It fundamentally improves and changes the way we write software today, and it&#8217;s not just a buzzword&#8212;AI has come to stay. We&#8217;ve seen people reduce the time it takes to introduce new features; we&#8217;ve seen improvements in the quality of the code as well&#8212;higher test coverage and fewer vulnerabilities when they scan the code. </p><p>On GitHub, we build GitHub on the GitHub platform&#8212;Copilot is the number one contributor with the highest number of contributions per week and per month today on the GitHub platform. We believe in the platform and in what AI has come to improve. The agentic capabilities I&#8217;ve seen today&#8212;if I had them coming up earlier in my career&#8212;I would achieve a lot of great things. </p><p>So if you&#8217;re not using AI today, you&#8217;re likely playing catch-up, because AI allows you to move at a much faster rate and safely, in a secure manner. AI is now the pair programmer. You&#8217;ll be seeing agents&#8212;so you can assign some complex or boring work to an agent within GitHub, the GitHub platform. 
You can have multiple issues in your backlog, and an agent can hand off from one agent to another, and GitHub Copilot has different agents. We will also be releasing many new agents, so you&#8217;ll have agents as your pair programmer or even your teammates. In addition to a development team of humans, you&#8217;ll now have these agents alongside the team, doubling and tripling their output and their throughput compared to those who don&#8217;t have AI today, yeah.</p><p><em><strong>18: Do you foresee AI assistance becoming a standard part of development? How should developers&#8212;especially junior engineers&#8212;take advantage of tools like Copilot while continuing to hone their coding skills?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> GitHub Copilot not only helps you write code&#8212;it helps you understand the code. You can ask GitHub to get a better understanding of the codebase. Let&#8217;s say you inherited it from a senior developer; Copilot can help you understand the different components of the code and what it thinks the code does. It can explain concepts for you&#8212;coding terminologies and development practices&#8212;and even identify components in the stack you use: &#8220;These are the components you use here.&#8221; You can ask questions such as, &#8220;How can I test-run this?&#8221;&#8212;and GitHub Copilot can help you go through that flow. Secondly, it helps with prototyping. You&#8217;re given the requirements, and you need to quickly prototype and experiment&#8212;an important task many people underestimate that developers do today. </p><p>From translating ordinary business needs into what software should do, Copilot can help with ideation, brainstorming, and prototyping as well. These are very good areas where a junior developer can really benefit. It also ensures the code follows certain coding practices, which means a junior developer can work as if the person is an experienced senior developer&#8212;because they have that assistant by their side. So you can solve problems with AI, not just write code.</p><p><em><strong>19: Let&#8217;s talk about caveats a little bit. Can Copilot negatively impact code quality? What is your take on this?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Oh, that&#8217;s a tricky one. Yes and no. Yes&#8212;in the sense that if you don&#8217;t know what you&#8217;re doing and something goes wrong, it will be hard for you to understand the code base and figure out how to fix it yourself. And if you find yourself in a remote area or you don&#8217;t have internet access, or you&#8217;re on a system where AI is not allowed&#8212;say, you&#8217;re building some, you know, covert, highly secure environment software&#8212;how will you be able to cope? So you also want to make sure that you understand the language it&#8217;s written in, so that you can triage some of those things. And in terms of whether AI in general can introduce bugs in the code&#8212;there are times when, if there is no good prompting, Copilot can build (or AI tools can build) code in a different way, because the model has its own preferences. You can read up on how to write good prompts for AI tools&#8212;for GitHub Copilot. </p><p>Then there are times where you can say, &#8220;I want you to do it in this particular way, and I want you to use these libraries,&#8221; because typically there is more than one way of solving a problem. 
If you have a kind of library that is homegrown or approved for use, you can create what we call Copilot Instructions to instruct Copilot to write code in a way that is accepted by the organization you&#8217;re working in. And whenever it introduces bugs, there is a review agent you can use to review the entire code base itself and review it against best practices&#8212;and even against your internal standard practices&#8212;within an enterprise or a team.</p><p><em><strong>20: How do you feel teams can use AI coding assistance responsibly to ensure the generated code meets quality and security standards?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Good question. Not all AI coding tools are the same, and the dependency models you use can determine what kind of responsible use is available. For example, GitHub Copilot&#8212;being part of the Microsoft ecosystem&#8212;has multiple layers internally for the responsible use of AI. First, it looks at the kind of prompts you&#8217;re sending, checks them against responsible-use standards, and sanitizes them. When it sends back the code, it looks at that code to be sure it&#8217;s a responsible use&#8212;making sure there are no personal data leaks or similar risks&#8212;within the system itself, depending on which AI system or product you&#8217;re using. And for the human being, I would say always review&#8212;and then use automated checks&#8212;because the volume of what AI will be contributing to your code will increase. </p><p>That means it will be time-consuming to review everything manually, so you want to reduce that burden with automation while still reviewing. Make sure a lot of the review work has been done in advance with automated checks, and treat Copilot as a collaborator, not a replacement&#8212;that&#8217;s why GitHub calls them &#8220;co-pilots,&#8221; not &#8220;the pilot.&#8221; Someone is still in charge, driving what it does, yeah.</p><p><em><strong>21: Continuing to talk about AI and its impact&#8212;The latest Stack Overflow developer survey shows a paradox &#8211; nearly 80% of devs are now using AI tools in some form, yet only ~3% highly trust the answers from AI. In fact, 75% of developers say they turn to a human colleague when they don&#8217;t trust an AI&#8217;s answer. How do you envision the collaboration between developers and AI tools going forward? For example, will coding become more about validating and refining AI-generated solutions?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Yeah, I think I&#8217;d be keen to see that report and see, you know, what tools they&#8217;re using. It would be good to see a breakdown of that.</p><p>Because I feel it may not be the same experience for every tool. With that said, this revolution itself&#8212;right? Oh, you know, this is another time in human history when you look back and see that a huge change has occurred. This is another change. And there is, as expected, a bit of resistance when change comes. Some people have not used it, and some people who used it don&#8217;t understand exactly everything it does and how it runs in the back end. And it&#8217;s difficult to trust what you don&#8217;t know&#8212;how it runs or how it works. You may need to understand the design and the architecture of that AI tool to know what&#8217;s going on under the hood. 
</p><p>And many of these AI tools also give flexibility to configuration&#8212;so you can configure what the AI is able to do, and that way you can control and decide the way the AI works and what it&#8217;s allowed to see. For example, with GitHub Copilot, there&#8217;s a set of files you can exclude so the AI will never look at those files&#8212;maybe they&#8217;re secret files and things like that&#8212;and it will never look at them, even in your codebase. So this helps. And then knowledge sharing as well&#8212;some people love it because they&#8217;ve used it very well and they&#8217;ve seen the impact it has had on their productivity. So knowledge sharing will help the community; it will help people balance up and understand things better&#8212;how some of these tools work. So, yeah, I&#8217;ll encourage knowledge sharing, and then understanding the design and the architecture and the documentation on how it runs.</p><p><em><strong>22: That&#8217;s some really good advice. But how do we ensure that junior devs still learn critical thinking instead of blindly accepting AI output? Because that&#8217;s a genuine risk the community is facing at this time.</strong></em></p><p><strong>Ayodeji Ayodele:</strong> Yes&#8212;I would say use AI as a learning tool, not a crutch. Use AI to understand and know things better. And you can also use AI to augment knowledge. At times, knowledge is scattered within an enterprise in different sources, and it&#8217;s hard to have access to&#8212;or remember&#8212;all those areas&#8212;of course we have other platforms like SharePoint, Jira, ServiceNow, and things like that. Use AI to augment and consolidate this knowledge base so that people can have a richer source to derive information from. Consolidating the knowledge base can really help. </p><p>And of course you can also have meetups; they can really help to improve knowledge sharing. You can also come and showcase what you&#8217;ve built, and people can, ask questions to&#8212;maybe, you know&#8212;test your assumptions for the software you built, and that can help. AI itself can help in the scaling of that&#8212;in bringing all of it together. In building that, you can also capture notes and conversations and improvements, suggestions, and interpret them back into the code. Or document those comments and feedback in your repositories&#8212;AI can help with that for you&#8212;but humans have to ask the right questions, yeah.</p><p><em><strong>23: There&#8217;s a lot of debate about whether AI makes people more productive or makes them worse&#8212;you know, more dependent on them. And I think at this point it&#8217;s a bit pointless to go into that debate. But complex, creative problem-solving is still uniquely human. In the same Stack Overflow survey, roughly 40% of developers said that AI tools performed poorly on complex tasks. Having said that&#8212;since the space is moving fast and there are developments and improvements&#8212;given AI&#8217;s current limitations, what uniquely human skills should developers focus on strengthening now to remain relevant?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> First and foremost, there was that comment about AI not being able to perform complex tasks. In the last three months, I&#8217;ve seen a major change. The models have evolved and people are beginning to say, &#8220;You know what, this is going really magical.&#8221; So I&#8217;ve seen some AI models out there that can perform really complex tasks. 
There are some models that take a shorter time and just give you&#8212;on the fly. So there are different strengths to different models. </p><p>So yes, AI can help with complex tasks, but AI can&#8217;t replace creativity. AI cannot replace empathy. It cannot replace problem-solving. These are innate skills for humans. I know I&#8217;ve seen some people trying things like that, but I don&#8217;t think they can ever be like humans. So I don&#8217;t think AI will replace human beings, you know. So you want to focus on creativity, on communication, on design thinking, and, you know, adaptability as well.</p><p><em><strong>24: What core competencies will define a successful developer in the next 5&#8211;10 years as tools like Copilot evolve?</strong></em></p><p><strong>Ayodeji Ayodele:</strong> So the developer role is evolving&#8212;more than just writing code, it&#8217;s about understanding systems, collaborating across functions, and adapting as tools change. You want to build strong fundamentals, but stay curious and keep learning new paradigms, frameworks, and practices as they emerge. Communication and collaboration matter a lot&#8212;being able to explain your thinking and work well with others. So the most valuable skill in tech isn&#8217;t coding&#8212;it&#8217;s learning.</p><div><hr></div><p>To build practical mastery of Git and GitHub&#8212; from version control basics to collaborative workflows, secure automation, and AI-assisted productivity&#8212;check out <em><strong><a href="https://www.packtpub.com/en-us/product/github-foundations-certification-guide-9781836206040">GitHub Foundations Certification Guide</a></strong></em> by Ayodeji Ayodele (Packt, 2025). Through step-by-step labs, real-world projects, and exam strategies, it helps you prepare for the GitHub Foundations certification while adopting best practices for issues and pull requests, GitHub Projects, privacy and security controls, and GitHub Copilot&#8212;so you can level up your skills and ship better software, faster.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.packtpub.com/en-us/product/github-foundations-certification-guide-9781836206040" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c1C4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d177e6d-10ef-4e8d-9954-776efd177ad7_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!c1C4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d177e6d-10ef-4e8d-9954-776efd177ad7_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!c1C4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d177e6d-10ef-4e8d-9954-776efd177ad7_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!c1C4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d177e6d-10ef-4e8d-9954-776efd177ad7_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c1C4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d177e6d-10ef-4e8d-9954-776efd177ad7_2250x2775" width="372" height="458.86813186813185" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d177e6d-10ef-4e8d-9954-776efd177ad7_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:372,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;GitHub Foundations Certification Guide&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.packtpub.com/en-us/product/github-foundations-certification-guide-9781836206040&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="GitHub Foundations Certification Guide" title="GitHub Foundations Certification Guide" srcset="https://substackcdn.com/image/fetch/$s_!c1C4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d177e6d-10ef-4e8d-9954-776efd177ad7_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!c1C4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d177e6d-10ef-4e8d-9954-776efd177ad7_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!c1C4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d177e6d-10ef-4e8d-9954-776efd177ad7_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!c1C4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d177e6d-10ef-4e8d-9954-776efd177ad7_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here&#8217;s what some readers have said:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FvUZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FvUZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png 424w, https://substackcdn.com/image/fetch/$s_!FvUZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png 848w, https://substackcdn.com/image/fetch/$s_!FvUZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png 1272w, https://substackcdn.com/image/fetch/$s_!FvUZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FvUZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png" width="885" height="485" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e55c92f-6989-49dc-9cdc-702306914746_885x485.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:885,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:109413,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/178064311?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FvUZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png 424w, https://substackcdn.com/image/fetch/$s_!FvUZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png 848w, https://substackcdn.com/image/fetch/$s_!FvUZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png 1272w, https://substackcdn.com/image/fetch/$s_!FvUZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e55c92f-6989-49dc-9cdc-702306914746_885x485.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 
4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VEfR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VEfR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png 424w, https://substackcdn.com/image/fetch/$s_!VEfR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png 848w, https://substackcdn.com/image/fetch/$s_!VEfR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png 1272w, https://substackcdn.com/image/fetch/$s_!VEfR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VEfR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png" width="871" height="386" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:386,&quot;width&quot;:871,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:84580,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/178064311?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VEfR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png 424w, 
https://substackcdn.com/image/fetch/$s_!VEfR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png 848w, https://substackcdn.com/image/fetch/$s_!VEfR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png 1272w, https://substackcdn.com/image/fetch/$s_!VEfR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b9a3ff-7f6b-44c5-8c4e-a8bf341e5845_871x386.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Jch1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jch1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png 424w, https://substackcdn.com/image/fetch/$s_!Jch1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png 848w, https://substackcdn.com/image/fetch/$s_!Jch1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png 1272w, https://substackcdn.com/image/fetch/$s_!Jch1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Jch1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png" width="856" height="388" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8328e5f-7901-4408-8343-78bba89ca87a_856x388.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:388,&quot;width&quot;:856,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95238,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/178064311?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Jch1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png 424w, https://substackcdn.com/image/fetch/$s_!Jch1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png 848w, https://substackcdn.com/image/fetch/$s_!Jch1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png 1272w, https://substackcdn.com/image/fetch/$s_!Jch1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8328e5f-7901-4408-8343-78bba89ca87a_856x388.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[From Enablement to Reliability: How Platform Engineering Aligns with SRE Goals – A conversation 
with Sean P Alvarez and Ajay Chankramath ]]></title><description><![CDATA[On distinguishing SRE from Platform Engineering, building developer-first platforms, and the practical lessons behind treating platforms as products.]]></description><link>https://deepengineering.substack.com/p/pragmatic-platform-engineering-at</link><guid isPermaLink="false">https://deepengineering.substack.com/p/pragmatic-platform-engineering-at</guid><dc:creator><![CDATA[Sushma Reddy]]></dc:creator><pubDate>Thu, 25 Sep 2025 09:06:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/7iW_vavUFs0" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Platform Engineering is often confused with Site Reliability Engineering (SRE) or seen as the latest rebranding of DevOps. In reality, it represents a distinct shift: treating internal platforms as products, designed for adoption and developer experience. In this conversation, we speak with <strong>Sean P. Alvarez</strong> and <strong>Ajay Chankramath</strong>&#8212;co-authors of the forthcoming <em><strong><a href="https://www.packtpub.com/en-us/product/the-platform-engineers-handbook-9781806380121">Platform Engineer&#8217;s Handbook</a></strong></em> (Packt, 2026)&#8212;about where SRE and Platform Engineering converge, where they differ, and why collaboration between them is essential in today&#8217;s organizations.</p><p>Sean P Alvarez is the <strong>Chief Technology Officer</strong> of the <strong>Life Sciences business </strong>at<strong> Brillio</strong>, where he leads engineering teams and advises clients on cloud strategy and platform modernization. With over 15 years of experience in regulated industries and consulting, he specializes in applying Platform Engineering principles to drive enterprise-scale transformation.</p><p>Ajay Chankramath is the <strong>Cofounder and CEO of <a href="https://platformetrics.com/about.html">Platformetrics</a></strong>. With more than 35 years of global experience, he has led platform engineering at Thoughtworks, Oracle, Broadridge, and Xilinx, and is recognized as a Platform Engineering Ambassador and Team Topologies Advocate.</p><p>Together, they are writing <em><strong><a href="https://www.packtpub.com/en-us/product/the-platform-engineers-handbook-9781806380121">The Platform Engineer&#8217;s Handbook</a></strong></em> on building secure, developer-focused platforms that streamline modern software delivery. The book takes a hands-on, &#8220;build first, clarify as you go&#8221; approach&#8212;guiding readers from source control governance and Kubernetes runtimes to observability, self-service onboarding, and AI-augmented tooling. Built for practitioners, it equips engineers to design platforms that scale without disrupting delivery.</p><p>You can watch the full conversation below&#8212;or read on for the complete transcript.</p><div id="youtube2-7iW_vavUFs0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;7iW_vavUFs0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/7iW_vavUFs0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><p><em><strong>1: Sean, Can you take us back to the moment when you first realized you were doing what we now call Platform Engineering? 
I imagine the term might not have existed yet, but the work was already there. </strong></em></p><p><strong>Sean Alvarez:</strong> It&#8217;s a great question, and thank you. I&#8217;ve always worked in somewhat regulated industries&#8212;life sciences and the financial industry. In those industries, the release process tends to be cumbersome. There are a lot of compliance issues, a lot of sign-offs that need to happen, and a lot of double and triple checking to make sure things won&#8217;t go wrong or that auditing won&#8217;t get messed up.</p><p>When I first started working with companies that would burn deliverables onto CDs and move things back and forth, I tried to introduce automation, but there wasn&#8217;t much trust in it at first. That led to centralized teams who stayed up all hours of the night doing deployments over and over again. There really wasn&#8217;t trust to move to something like what we&#8217;d now call DevOps.</p><p>At the same time, there was a desire to speed up development as the industry matured and as more start-ups entered the space and moved more nimbly. We wanted to make the deployment process&#8212;and the SDLC overall&#8212;less of a dreaded gate. Instead of developers thinking, &#8220;Oh no, I need to submit this to the DevOps team or the Operations team,&#8221; we wanted to turn it into an enabler.</p><p>To do that, we had to work across silos&#8212;security teams, networking teams, compliance, even upper management and governance&#8212;to put automation in place. That&#8217;s really where Platform Engineering took off for me: ensuring deployments were safe, compliant, and reliable, while allowing developers to move faster and giving the organization confidence that releases would go smoothly.</p><p><em><strong>2: Every company has its own unique engineering culture. In your journey across different organizations, how have those environments shaped your understanding of what Platform Engineering really is?</strong></em></p><p><strong>Sean Alvarez:</strong> As I moved into more of a consulting role, I worked across industries and saw more organizations that operated closer to start-ups&#8212;moving fast even at the risk of breaking things. As the industry matured, I realized Platform Engineering wasn&#8217;t just about enabling speed. Developers also had to want to use it.</p><p>When you have individual full-stack teams in control of their deployments, Platform Engineering can look like extra work&#8212;just another backlog item or another check. If the platform feels forced, adoption won&#8217;t happen. Instead, it has to be something they want to use.</p><p>It&#8217;s not a &#8220;build it and they will come&#8221; situation&#8212;it has to be product-oriented. Just like external products compete for market share, internal platforms have to attract developer adoption. If we think of engineers as internal customers, that adoption becomes the real measure of success. It&#8217;s where the interests of leadership and developers align, and that&#8217;s when Platform Engineering really becomes powerful.</p><p><em><strong>3: Ajay, in your journey across organizations, how did different environments shape your understanding of Platform Engineering?</strong></em></p><p><strong>Ajay Chankramath:</strong> My journey has been slightly different, but not too dissimilar to what Sean described.</p><p>Many years ago, I had a role supporting developers, though I was a developer myself. My first job involved building complex algorithmic software. 
But I noticed that developers&#8212;including my own team&#8212;always encountered friction. They couldn&#8217;t get things done the way they wanted.</p><p>Back then, roles were structured differently. There was a clear divide between developers writing code and then &#8220;throwing it over the wall&#8221; to another group&#8212;experienced engineers who built the product software. I started looking at this challenge through fundamental principles of software design, like loose coupling and high cohesion, which have been around for decades.</p><p>The first thing I did was identify common developer pain points and create reusable components to improve productivity. That led to a patent we called ROMS&#8212;Reusable Object Modules. This was 25 years ago, before DevOps, SRE, or Platform Engineering existed. Looking back, ROMS was essentially a fundamental building block of a platform: reusable capability packaged as a library.</p><p>We presented it at conferences, and that became my first foray into what I&#8217;d now recognize as platform thinking. The spark came from applying core software principles, which is probably the theme of today&#8217;s conversation&#8212;how principles extend into SRE and Platform Engineering.</p><p>Eventually, leadership asked me to build a team around this work. That transition took me from software development into support-related activities, which we then called Software Productivity Automation. We didn&#8217;t have today&#8217;s terminology, but the idea was similar.</p><p>Over time, working mainly in large companies, I saw the federated model in action: a centralized set of services with individual teams building on top. It&#8217;s a strong model because it gives smaller teams autonomy. But the downside is that centralized teams face huge backlogs. That almost never works out, and so the real question becomes: are individual teams able to be self-sufficient?</p><p>With the advent of public cloud, that&#8217;s changed significantly. Over the last 10&#8211;15 years, cloud services have made it much easier for smaller organizations to adopt these practices and integrate them into their ecosystems.</p><p><em><strong>4: What are some of the biggest misconceptions you see today when teams practice SRE and Platform Engineering?</strong></em></p><p><strong>Ajay Chankramath:</strong> One of the biggest misconceptions is that constructs like SRE, Platform Engineering, DevOps, or DevEx are simply newer terms replacing older ones. That&#8217;s absolutely not the case. Each has its own role in the larger scheme.</p><p>We emphasize this in our book: DevOps is not a team and should not be a job title. It&#8217;s a cultural paradigm&#8212;about improving collaboration and communication across the SDLC.</p><p>SRE is about applying software engineering principles to operations to create highly reliable production systems. DevEx&#8212;developer experience&#8212;has always been there. It&#8217;s about how developers interact with tools, frameworks, and processes throughout the SDLC.</p><p>If you define these clearly, the overlap becomes easier to see. Platform Engineering sits at the center, enabling all three&#8212;giving developers self-service capabilities, enabling SREs, and supporting the DevOps culture.</p><blockquote><p>That&#8217;s why <strong>it&#8217;s wrong to think Platform Engineering replaces SRE. SRE has existed since 2004 and continues to serve a distinct purpose</strong>. 
Platform Engineering is complementary&#8212;it enables SRE and DevOps to succeed together.</p></blockquote><p><strong>Sean Alvarez:</strong> I&#8217;ve also seen misconceptions where SRE is viewed only as production support. People then ask: does Platform Engineering automate that away? The answer is no.</p><p>SRE plays a vital role in ensuring production reliability. Platform Engineering spans the entire SDLC&#8212;from onboarding new team members to building, deploying, and running software in production. SRE should inform what Platform Engineering builds to make their jobs more efficient, but they aren&#8217;t the only users of the platform.</p><p>Platform success depends on collaboration&#8212;SREs, developers, and business owners all need to work together.</p><p><em><strong>5: There&#8217;s a lot of overlap between SRE, DevOps, DevEx, and Platform Engineering. Would you like to talk a bit about what each brings to the table, how you personally draw boundaries between them, and where they are similar or different?</strong></em></p><p><strong>Ajay Chankramath:</strong> Absolutely, that&#8217;s a great question. Let me step back and offer one-line definitions for clarity.</p><blockquote><p><strong>DevOps</strong> is the cultural movement aimed at breaking down silos. It improves collaboration across the SDLC&#8212;from planning and coding to testing, deployment, and operations.</p></blockquote><blockquote><p><strong>SRE</strong> began at Google as the application of software engineering principles to operations. Historically, operators knew systems well but weren&#8217;t software developers. Google shifted that model, expecting SREs to understand both software design and systems. Today, not every organization does it the same way, but the principle remains: SRE is more than system administration&#8212;it&#8217;s a role that requires deeper software knowledge.</p></blockquote><blockquote><p><strong>DevEx</strong> is the outcome of enabling developers to be more productive. It&#8217;s well-studied now, with research from books like <em>Accelerate</em> and companies such as GetDX. Measurement is critical: you can&#8217;t improve productivity without first understanding where you stand.</p></blockquote><blockquote><p><strong>Platform Engineering</strong>, then, is about the tools, processes, and techniques that improve all these aspects. It&#8217;s not just building automation or software&#8212;it&#8217;s building capabilities as a product. That&#8217;s why it&#8217;s different. Platform engineers don&#8217;t &#8220;take code and push it to production.&#8221; That&#8217;s an anti-pattern. Instead, the role is about enabling developers to be self-sufficient, reducing friction, and making the SDLC as fast and efficient as possible.</p></blockquote><p>Think of it like providing APIs or libraries. Developers need to ask: &#8220;What do I need to be productive?&#8221; Platform Engineering exists to deliver those capabilities.</p><p><strong>Sean Alvarez:</strong> What makes our <em>Platform Engineer&#8217;s Handbook</em> unique is its practical approach. We&#8217;re less about theory and more about how to actually create these capabilities.</p><p>If you look at tooling, the distinctions become clearer. Platform Engineering often brings to mind Kubernetes clusters and deployment automation&#8212;but it&#8217;s much broader than that. 
It includes enabling new projects, automating pipelines, and creating self-service capabilities.</p><p>SRE, on the other hand, focuses on reliability&#8212;SLIs, SLOs, SLAs, and observability. Their responsibility is to ensure systems are running well in production.</p><p>Developer experience isn&#8217;t just about portals like Backstage. It&#8217;s about making platforms easy to use. Whether through APIs, CLI tools, or portals, DevEx ensures developers adopt what Platform Engineering provides.</p><p>Together, these roles form a layered model: Platform Engineering builds capabilities, DevEx makes them usable, and SRE ensures reliability. The overlap is real, but the responsibilities remain distinct.</p><p><em><strong>6: Looking back on your years building and scaling platforms, what was one of the hardest lessons you learned at the enterprise level?</strong></em></p><p><strong>Sean Alvarez:</strong> When people hear &#8220;enterprise,&#8221; they often think &#8220;process&#8221;&#8212;multiple levels of management, sign-offs, and compliance. That usually leads to a push for standardization.</p><p>In many organizations, Platform Engineering is introduced as a way to rein in DevOps sprawl. Instead of every team building its own pipeline or observability, leadership wants a single, standardized platform for everyone.</p><p>But forcing a single solution across the enterprise is often a recipe for failure. It creates complexity, slows delivery, and leaves some teams feeling stifled. They start working around the platform to meet their needs, and friction grows.</p><blockquote><p>A better approach is the <strong>80/20 rule</strong>: serve the 80% of teams whose needs can be standardized, and let the remaining 20% adapt their processes where necessary. That reduces time-to-value, avoids endless edge-case debates, and ensures most teams actually benefit from the platform.</p></blockquote><p><strong>Ajay Chankramath:</strong> Sean made a great point about the 80/20 rule. You&#8217;ll never get 100% success, and you shouldn&#8217;t aim for it. The question is: how do you achieve that 80%?</p><p>In our book, we outline seven principles that guide successful platforms. A few highlights:</p><blockquote><ol><li><p><strong>Measure what you improve.</strong> Quantify waste and friction so improvements are visible.</p></li><li><p><strong>Treat platforms as products.</strong> This isn&#8217;t just automation&#8212;it&#8217;s technical product management. Capabilities must be managed like products with clear value propositions.</p></li><li><p><strong>Balance build vs. buy.</strong> Engineers love to build, but with so many products available, organizations must consider total cost of ownership.</p></li><li><p><strong>Design for composability.</strong> Platforms aren&#8217;t just Kubernetes. They consist of multiple components. Each must be composable, extensible, and replaceable.</p></li><li><p><strong>Prioritize observability.</strong> Don&#8217;t limit it to applications&#8212;extend it across the SDLC.</p></li><li><p><strong>Enable team autonomy.</strong> If teams wait on others for security or approvals, waste accumulates. Platforms must empower autonomy.</p></li><li><p><strong>Articulate value.</strong> This is critical. 
Success depends on clearly communicating the value unlocked, not just building capabilities.</p></li></ol></blockquote><p>Without these principles&#8212;especially value articulation&#8212;platform engineering risks losing executive support and ultimately failing.</p><p><em><strong>7: Do you think there is confusion or resistance when people start working on Platform Engineering or its principles? Are there gaps, and what has worked for you in bridging those gaps?</strong></em></p><p><strong>Sean Alvarez:</strong> One reason organizations adopt Platform Engineering is to standardize and make developers&#8217; jobs easier. But in doing so, we often take control away from developers.</p><p>For example, if we say, &#8220;Every developer must use this deployment pipeline,&#8221; it might simplify things for the organization, but inevitably some teams will run into cases it doesn&#8217;t support. Maybe the pipeline was designed for a single service, but a team needs to deploy three. Now they&#8217;re frustrated, waiting on Platform Engineering to adapt the pipeline&#8212;or worse, being told to make their work fit into it.</p><p>That dynamic quickly creates confusion and resentment. Developers no longer want to use the platform, and adoption stalls.</p><p>The way to bridge this gap is to treat the platform as an internal product. The most effective way is to have a <strong>technical product owner</strong>&#8212;someone who understands product management practices but also has the technical depth to talk with developers.</p><p>This role continuously interviews developers, identifies gaps in their day-to-day work, and ensures the platform gives them flexibility&#8212;guardrails where needed, but also options to override defaults when necessary. By organizing and prioritizing a backlog around developer needs, a technical product owner ensures the platform provides real value, which drives adoption and organizational success.</p><p><strong>Ajay Chankramath:</strong> Sean covered most of it, but I&#8217;ll add this: the technical product owner role is not about saying &#8220;yes&#8221; to everything developers ask for. Every developer comes with their own perspective, and building every request would be unsustainable.</p><p>The job is to balance ROI across the organization. Building platform capabilities costs time and money. The question is: what value will this unlock across the enterprise, not just for an individual team?</p><p>That&#8217;s where challenging requirements becomes important. By pushing back and prioritizing based on organizational value, the technical product owner ensures investments make sense.</p><p>This ties back to the seventh principle we mentioned earlier&#8212;<strong>articulating value.</strong> Without it, platforms risk losing executive sponsorship. We&#8217;ve seen it happen: executives reassign platform engineers back into product teams because they don&#8217;t see visible value.</p><p>A strong technical product owner prevents that by showing how the platform delivers ROI across the organization. That value articulation is often the difference between success and failure.</p><p><strong>8: Your book, </strong><em><strong>Platform Engineer&#8217;s Handbook</strong></em><strong>, takes a &#8220;build first, clarify as you go&#8221; approach. Instead of starting with strategy decks and frameworks, you dive straight into building. Why do you think this approach works better, especially for technologists new to Platform Engineering? 
And what makes it more effective than leading with theory?</strong></p><p><strong>Sean Alvarez:</strong> Great question. If you think about who gets into Platform Engineering, it&#8217;s often people from two backgrounds. Some are software developers who understand APIs and architecture but aren&#8217;t used to handling infrastructure. Others come from operations or SRE roles, familiar with Terraform, ServiceNow, or Dynatrace, but less with software development practices like GitHub projects or release pipelines.</p><p>Platform Engineering covers the entire SDLC, which is a big scope. You can&#8217;t really &#8220;practice&#8221; it in a small sandbox&#8212;it requires working with real teams, real projects, and real deployments.</p><p>That&#8217;s why starting with building makes sense. It&#8217;s similar to &#8220;dogfooding&#8221;&#8212;using what you create. If you build even a small demo platform and deploy an application on it, you immediately see the friction developers face. That teaches you what features are needed. You then build those features, measure their impact, and learn from the experience.</p><p>The theory and strategy become clearer once you&#8217;ve lived through the practice. You&#8217;re not just reading slides&#8212;you&#8217;ve felt the difference. That makes the lessons stick and prepares you to scale up to enterprise scenarios.</p><p><strong>Ajay Chankramath:</strong> I agree. Developers understand software best. If you show them real builds and workflows, you bring them along much faster than if you start with strategy.</p><p>When this idea was first proposed, Sean and I knew it would be a challenge because we were used to explaining the &#8220;why&#8221; first. But flipping the model is powerful. It engages developers with what they already enjoy&#8212;building&#8212;and then shows them why it matters.</p><p>That&#8217;s why we believe this approach will make the book stand out. It&#8217;s not theory-heavy; it&#8217;s practical, hands-on, and aligned with how developers actually learn.</p><p><em><strong>9: Ajay, you&#8217;ve repeatedly emphasized the seven principles and the importance of measuring value. Do you think taking a &#8220;build first, clarify as you go&#8221; approach will help highlight that key aspect of value measurement?</strong></em></p><p><strong>Ajay Chankramath:</strong> That was the first question that came to my mind when we considered writing this book. We&#8217;ve always said you need to measure something before you can improve it. How would that work with a &#8220;build first&#8221; model?</p><p>The more I thought about it, the more I realized this approach makes value measurement easier. When you&#8217;re actually building, the value is visible right there&#8212;not in a spreadsheet or a theoretical model. Developers can see what they&#8217;ve created, how much effort it took, and what productivity gains it delivers.</p><p>Instead of abstract discussions, you have concrete examples: &#8220;I built this, and here&#8217;s the measurable improvement.&#8221; That makes the articulation of value much stronger.</p><p>This is also one of the unique aspects of the book. I haven&#8217;t seen another resource take this approach&#8212;making value measurement practical and hands-on. It&#8217;s a challenge, but we&#8217;re confident it will provide a powerful way to connect principles with real-world practice.</p><p><em><strong>10: Looking ahead, as more organizations mature their platform teams, how do you see the SRE role evolving? 
Do you expect a dramatic shift in the next few years?</strong></em></p><p><strong>Ajay Chankramath:</strong> Absolutely&#8212;and it&#8217;s a great question. Let me break it down across a few dimensions:</p><blockquote><ol><li><p><strong>Specialization:</strong> SREs focus on complex reliability challenges, while platform engineers focus on building capabilities that enable both developers and SREs. The partnership between these roles will strengthen. Instead of friction over &#8220;who owns what,&#8221; SREs and platform engineers will complement each other.</p></li><li><p><strong>Abstraction.</strong> SREs will increasingly work at higher levels&#8212;service meshes and clusters&#8212;rather than building individual features. Their focus will stay on ensuring reliability under pressure, not on building platform products. That&#8217;s where collaboration with platform engineers becomes critical.</p></li><li><p><strong>Domain-driven platform engineering:</strong> This means applying platform principles directly into products, not just infrastructure. For SREs, it requires more domain knowledge&#8212;something Google emphasized in its original SRE model but that has been diluted over time. I believe we&#8217;ll see a return to that principle.</p></li><li><p><strong>AI:</strong> SREs are already using AI for anomaly detection, root cause analysis, and automated remediation. Platform engineers will need to provide capabilities that make this easier. AI won&#8217;t replace these roles, but it will reshape them, moving focus from rote tasks to domain-driven decision-making.</p></li><li><p><strong>Risk and compliance:</strong> Especially in industries like finance and healthcare, SREs will need to take on more responsibility here, supported by platform capabilities. Compliance is not going away&#8212;it&#8217;s only becoming more central.</p></li></ol></blockquote><p>We&#8217;ll also see some fluidity between roles. Some SREs will transition into Platform Engineering if their interests and skills align more with building capabilities, and vice versa. This cross-pollination will strengthen both practices.</p><p><em><strong>11: Sean, you mentioned that much of Platform Engineering is about automating deployments and similar processes. Do you think AI can really make a big difference here, or is it still hype and early experimentation?</strong></em></p><p><strong>Sean Alvarez:</strong> Ajay mentioned the growing importance of domain knowledge, and I think that&#8217;s the key. Every time there&#8217;s a wave of abstraction, people ask if their jobs will disappear.</p><p>When cloud and serverless databases arrived, DBAs wondered if they were obsolete. Now with AI, people are asking the same thing: will software developers vanish because AI can write code? Do we even need Platform Engineering if AI can generate infrastructure as code, analyze logs, and raise alerts automatically?</p><p>The reality is that AI can take over rote, repetitive tasks&#8212;like writing observability queries or scanning logs for anomalies. That frees SREs and platform engineers to focus on higher-value work: understanding what uptime means in a given domain, or which processes truly require five nines of reliability.</p><p>For example, in fintech, stock-trading throughput during the day has a very different priority than a nightly batch process. One demands higher uptime even if it costs millions more; the other can tolerate delays. 
SREs and platform engineers, with their domain understanding, are the ones who can make those calls and guide how AI should be applied.</p><p>So I see AI as an inflection point. It won&#8217;t replace these roles&#8212;it will elevate them. The day-to-day debugging and manual tasks will shrink, while the focus shifts to domain analysis and delivering business value.</p><p><em><strong>12: For someone currently in an SRE or DevOps role who is now expected to build or contribute to an internal platform, what&#8217;s the one mindset shift or practical skill you&#8217;d suggest they prioritize? And what&#8217;s the first practical step they should take to set themselves up for success?</strong></em></p><p><strong>Sean Alvarez:</strong> The mindset shift is this: whenever you build something&#8212;whether it&#8217;s a script or an automation&#8212;ask yourself, &#8220;Will someone need to contact me to use this?&#8221; If the answer is yes, it&#8217;s not done yet.</p><p>The goal is self-sufficiency. If a developer needs help every time they use your tool, you&#8217;re stuck in an operations role&#8212;answering tickets all day&#8212;instead of moving on to the next feature. True platform engineering means building things that others can use independently. That mindset is critical.</p><p><strong>Ajay Chankramath:</strong> Sean gave a great example. To tie it together: the biggest shift is adopting a <strong>product mindset.</strong> Every script or automation should be treated like a product with long-term viability.</p><p>The second piece&#8212;and sometimes more important than technical skills&#8212;is <strong>communication</strong>. Build your soft skills. Relationships, collaboration, and communication determine whether a platform succeeds. Developers historically avoided this, but today it&#8217;s essential.</p><p>AI or not, tools will come and go. The constant is people. 
If you can communicate, align stakeholders, and articulate value, you&#8217;ll set yourself&#8212;and your platform&#8212;up for success.</p><div><hr></div><p>To explore the principles and practices discussed in this conversation in greater depth&#8212;including building developer-focused platforms from a blank slate, embedding observability and security, enabling self-service onboarding, and layering in AI-augmented services&#8212;keep an eye out for <em><strong><a href="https://www.packtpub.com/en-us/product/the-platform-engineers-handbook-9781806380121">The Platform Engineer&#8217;s Handbook</a></strong></em> by Sean P Alvarez and Ajay Chankramath, coming August 2026.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.packtpub.com/en-us/product/the-platform-engineers-handbook-9781806380121" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9e60!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd93f741-5bea-42d7-980f-d0972b8f6ea5_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!9e60!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd93f741-5bea-42d7-980f-d0972b8f6ea5_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!9e60!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd93f741-5bea-42d7-980f-d0972b8f6ea5_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!9e60!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd93f741-5bea-42d7-980f-d0972b8f6ea5_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9e60!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd93f741-5bea-42d7-980f-d0972b8f6ea5_2250x2775" width="370" height="456.4010989010989" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd93f741-5bea-42d7-980f-d0972b8f6ea5_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:370,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The Platform Engineer's Handbook&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.packtpub.com/en-us/product/the-platform-engineers-handbook-9781806380121&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Platform Engineer's Handbook" title="The Platform Engineer's Handbook" srcset="https://substackcdn.com/image/fetch/$s_!9e60!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd93f741-5bea-42d7-980f-d0972b8f6ea5_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!9e60!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd93f741-5bea-42d7-980f-d0972b8f6ea5_2250x2775 848w, 
https://substackcdn.com/image/fetch/$s_!9e60!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd93f741-5bea-42d7-980f-d0972b8f6ea5_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!9e60!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd93f741-5bea-42d7-980f-d0972b8f6ea5_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The book provides a hands-on, progressive journey: from source control governance and Kubernetes runtimes to developer portals, reusable CI/CD workflows, infrastructure blueprints, and FinOps observability. Each chapter combines concepts with lab-based exercises and production-ready patterns, equipping engineers to build scalable, secure platforms that streamline software delivery.</p>]]></content:encoded></item><item><title><![CDATA[Pragmatic Clean Architecture in Python: A Conversation with Sam Keen]]></title><description><![CDATA[Applying DDD and the dependency rule in real-world Python&#8212;keeping frameworks at the edge, modeling with dataclasses/Pydantic, and using AI without breaking clean boundaries.]]></description><link>https://deepengineering.substack.com/p/pragmatic-clean-architecture-in-python</link><guid isPermaLink="false">https://deepengineering.substack.com/p/pragmatic-clean-architecture-in-python</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Thu, 18 Sep 2025 06:28:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/NhG2zzaek_E" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>From structuring APIs and isolating domain logic to refactoring legacy systems, Python&#8217;s flexibility presents both opportunities and challenges for building sustainable software. 
In this conversation, we speak with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Sam Keen&quot;,&quot;id&quot;:11641009,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4c7deeb-ea40-40f3-8793-06bd0d441d29_500x500.jpeg&quot;,&quot;uuid&quot;:&quot;ea36128a-0f98-45f9-83fb-73f6bb9ae3b4&quot;}" data-component-name="MentionToDOM"></span> &#8212;author of <em><strong><a href="https://www.packtpub.com/en-us/product/clean-architecture-with-python-9781836642886">Clean Architecture with Python</a></strong></em> (Packt, 2025)&#8212;about applying architectural principles to real-world Python projects without sacrificing the language&#8217;s ethos.</p><p>Sam is a <strong>software engineering leader</strong> with over 25 years of experience, a polyglot developer who has used Python everywhere from early-stage startups to large-scale systems at AWS, Lululemon, and Nike. At Lululemon, he led the company&#8217;s first cloud-native development team, setting foundational standards for distributed architecture. Currently a <strong>Principal Engineer</strong> at <strong>Pluralsight</strong>, he focuses on leveraging generative AI for software engineering enablement&#8212;building tools that amplify developer productivity while preserving architectural integrity.</p><p>In this interview, we explore how clean architecture can be adapted to Python&#8217;s dynamic nature, where SOLID principles prove tricky, and how to keep frameworks like Django and FastAPI from leaking into core logic. We also discuss pragmatic strategies for enforcing the dependency rule, modeling entities and value objects with Python&#8217;s modern features, and managing testing and refactoring in complex systems. Looking ahead, Sam offers insights on how AI is reshaping development workflows and what it means for applying clean architecture across services and scaling applications.</p><p>You can watch the full conversation below&#8212;or read on for the complete transcript.</p><div id="youtube2-NhG2zzaek_E" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;NhG2zzaek_E&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/NhG2zzaek_E?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><p><em><strong>1: What motivated you to write Clean Architecture with Python, and why do you think Python as a language needs its own treatment of these ideas?</strong></em></p><p><strong>Sam Keen:</strong> I&#8217;ve been in development for quite some time&#8212;and in Python for a big portion of that. As for the topic, clean architecture has been around; Uncle Bob&#8217;s book has been out for quite some time. I&#8217;d always seen it discussed in the context of static languages like Java and C#. In a lot of Python communities, the thinking was that we didn&#8217;t quite need that&#8212;that it seemed overburdensome. So I wanted to take an approach and see: can we take from clean architecture the aspects that help us maintain larger Python codebases&#8212;without trying to turn it into Java? I knew there would be pushback, so I definitely wanted to keep it Pythonic.</p><p>As for why write a book&#8212;this is my first published book. 
During the COVID years, I had a lot of time and got into doing YouTube tutorials for game development. I really liked that content-creation process, and when the opportunity came up to write this book, I thought, yeah, that&#8217;s the next step. I&#8217;ve always wanted to write a book, so those motivations aligned with good timing.</p><p><em><strong>2: In your book, how do you explain clean architecture in Pythonic terms&#8212;especially to developers who know the classic concepts but struggle to apply them cleanly in real-world Python projects?</strong></em></p><p><strong>Sam Keen:</strong> Kind of returning to the previous answer: clean architecture aligns well with the Pythonic ethos&#8212;one of Python&#8217;s core features is to be explicit rather than implicit. Clean architecture gives you that map for your application. If you&#8217;re doing object-oriented development, the first things you&#8217;re concerned with are the classes&#8212;how to design those classes. I&#8217;m sure we&#8217;ll talk about SOLID. You use those SOLID principles on the class itself to ensure it&#8217;s cohesive. Clean architecture takes that and expands on it&#8212;how do we apply some of these same principles to the entire application as a whole? The original &#8220;clean architecture&#8221; was explicit that it&#8217;s not a framework and not a rigid set of rules; it&#8217;s a set of principles to follow and adapt to your needs. There isn&#8217;t one playbook for clean&#8212;it depends on what you&#8217;re building. You don&#8217;t want to build a skyscraper if it&#8217;s a little cottage codebase.</p><p><em><strong>3: SOLID principles are often cited as foundational to clean architecture. According to you, which of them are easiest and which of them are the hardest to apply in a dynamic language like Python?</strong></em></p><p><strong>Sam Keen:</strong> I think it&#8217;s kind of the usual suspects. With SOLID, the S&#8212;the <strong>single responsibility principle</strong>&#8212;a lot of folks comprehend pretty well. It&#8217;s the first letter, and people understand that a class should have a single concern, that sort of thing. That translates well into Python. The compiler isn&#8217;t going to help you much in that regard&#8212;it&#8217;s more of a human design decision.</p><p>It gets more into the nuance with the I&#8212;the <strong>interface segregation principle</strong>&#8212;having well-structured and focused interfaces. That&#8217;s where it gets tricky. For instance, you might have a vehicle class. You wouldn&#8217;t want to have the engine be part of that vehicle base class, because you may have gasoline cars but also electric cars. If you couple the concept of an engine directly into the vehicle base class, you&#8217;ll end up with a class that has a power level and a fuel level in liters. You&#8217;ll have parts of that interface that don&#8217;t make sense for all of the concrete classes that inherit from it.</p><p>Another one is the L&#8212;the <strong>Liskov substitution principle</strong>. This is about ensuring that if you have a base class and then classes implementing it, none of those subclasses disrupt the contract of the base class. Anywhere in your code where you&#8217;re referring to the base class, you should be able to insert one of the child classes and have it function fine. That&#8217;s something a compiler in a static language would help with. 
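</p><p><em>A minimal sketch of the kind of contract break Sam is describing, with mypy standing in for the compiler. The repository and task names below are illustrative assumptions, not code from the book.</em></p><pre><code class="language-python"># A subclass that weakens its base-class contract violates Liskov substitution:
# code written against TaskRepository can no longer trust what get() returns.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    id: str
    title: str


class TaskRepository:
    def get(self, task_id: str) -> Task:
        """Contract: always return a Task, never None."""
        raise NotImplementedError


class SqlTaskRepository(TaskRepository):
    # Widening the return type breaks substitutability. CPython runs this
    # happily; mypy reports the override as incompatible with the supertype.
    def get(self, task_id: str) -> Optional[Task]:
        return None
</code></pre><p><em>A compiler in a statically typed language would reject that override at build time.</em></p><p>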
In Python, you don&#8217;t have that, so type hinting&#8212;and I&#8217;m sure we&#8217;ll talk more about this since it&#8217;s core to the book&#8212;paired with mypy gives you a little bit of that compiler-like type checking. That can be very helpful in Python.</p><p>And then of course, unit testing is always good. So the L and the I are the tougher ones, while S is the easiest.</p><p><em><strong>4: How do you enforce the dependency rule in your projects so that business logic stays free from frameworks, ORMs, or infrastructure code?</strong></em></p><p><strong>Sam Keen:</strong> The dependency rule is really core to clean architecture. If you get that one thing right, you&#8217;re doing quite well. Conceptually, clean architecture is built in layers. You have the <strong>inner domain layer</strong>&#8212;core business objects with very few dependencies. Then you move out to the <strong>application layer</strong>, which orchestrates workflows and manages those entities&#8212;for example, &#8220;save task&#8221; or &#8220;complete task.&#8221;</p><p>Then there&#8217;s the <strong>interfaces layer</strong>. It should be a thin layer, often with controllers that translate between the outer layer and the inner business core&#8212;the application and domain layers. Finally, the <strong>outer layer</strong> is your frameworks and drivers. That&#8217;s where all the volatility is&#8212;things like SQLAlchemy and external dependencies that you don&#8217;t control.</p><p>Back to your question: how do you enforce the dependency rule? The principle is that <strong>dependencies all face inward</strong>. The domain layer shouldn&#8217;t be aware of anything above it. The application layer shouldn&#8217;t be aware of anything above it. They should only depend on what&#8217;s inside.</p><p>In Python, just like in other languages, you can check this. One way is pragmatic&#8212;make sure the team understands the principle and why it matters. Another way is structural&#8212;use a folder structure: a folder for domain objects, a folder for application objects, and so on. That way, each file is bound by the rules of its layer.</p><p>You can also get precise with automation. For example, in the book we give a simple fitness function test. It runs linting across import statements and checks them against the known directory structure of the layers. If it finds that a use case in the application layer is importing from, say, a driver, it fails the build. That shifts knowledge of violations left, so developers can correct them quickly.</p><p>So, it&#8217;s a hierarchy: first ensure the team understands the principle, then reinforce it with clear folder structure, and finally back it up with tests that assert violations.</p><div><hr></div><p><em><strong>5: What kind of project structure do you recommend? Do you prefer separation by layer&#8212;domain, interfaces, etc.&#8212;or by feature? And how do you keep the layout from becoming overly rigid?</strong></em></p><p><strong>Sam Keen:</strong> One example we mentioned earlier is having a folder per layer&#8212;that&#8217;s definitely a possibility. But to step back, the bigger principle is: always have the simplest solution that meets the needs of your project and your team. You don&#8217;t want to overbuild.</p><p>For example, you might have a very simple CRUD application, essentially an API fronting a database with create, update, delete functionality. There&#8217;s not much to it. 
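<p><em>Stepping back to the automated check described in the previous answer, here is a minimal sketch of an import-linting fitness test; the <code>myapp</code> package layout and layer names are assumptions, not the book&#8217;s code:</em></p><pre><code>import ast
from pathlib import Path

# Which layers each layer is allowed to import from (an assumed layout).
ALLOWED = {
    "domain": {"domain"},
    "application": {"domain", "application"},
    "interfaces": {"domain", "application", "interfaces"},
}


def test_dependency_rule() -> None:
    src = Path("myapp")
    for layer, allowed in ALLOWED.items():
        for py_file in (src / layer).rglob("*.py"):
            tree = ast.parse(py_file.read_text())
            for node in ast.walk(tree):
                if isinstance(node, (ast.Import, ast.ImportFrom)):
                    module = getattr(node, "module", None) or node.names[0].name
                    if module.startswith("myapp."):
                        target = module.split(".")[1]
                        # Fail the build when a layer imports outward.
                        assert target in allowed, (
                            f"{py_file} ({layer} layer) imports from {target}"
                        )
</code></pre>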
In that case, you could build it in more of a feature structure&#8212;say, a task microservice&#8212;and just go with FastAPI in a one- or two-file implementation. That makes sense because there isn&#8217;t much domain logic in that particular service. I see a lot of single-file frameworks that work quite well for these small cases. That&#8217;s one end of the spectrum.</p><p>On the other end, as projects and codebases grow larger, with more complex business rules, you start to shift toward the domain structure we talked about earlier&#8212;a folder per layer. That helps put in a structure where the right thing to do is also the easiest. For example, if your team decides to support multiple users instead of just one, you now need a User domain object. With a folder-per-layer design, the map is already there: the business object goes in the domain folder, the orchestration goes in the application folder, and so on.</p><p>It&#8217;s not one-size-fits-all. It&#8217;s something you can evolve into. And when your team makes an intentional decision to bend a common practice&#8212;for pragmatic reasons&#8212;document it in an ADR, an architectural decision record. That way it&#8217;s explicit and intentional, not accidental complexity. It also helps developers who come later&#8212;or even yourself six months down the road&#8212;understand why that choice was made.</p><p><em><strong>6: Domain-driven design is a big part of your approach. How do you model things like entities and value objects in idiomatic Python?</strong></em></p><p><strong>Sam Keen:</strong> Something very popular in modern Python is the use of <strong>dataclasses</strong>. They&#8217;re a great way to model domain objects because they eliminate boilerplate and make classes very easy to comprehend&#8212;you see just the attributes and functions.</p><p>For entities, which have an identity, I use a thin base entity class that every entity extends. A Task, a User, any entity extends this base entity class. That gives you a primary ID field&#8212;a reserved field for all entities. Anything in the system that knows it&#8217;s dealing with an entity knows that ID field is there and that it&#8217;s universally unique.</p><p>In Python specifically, in that entity class you&#8217;d also implement the <code>__hash__</code> and <code>__eq__</code> methods. That keeps you in line with the concept of an entity having identity. You can change all the attributes of that class, but it will always represent the same person or the same task&#8212;it&#8217;s just that its attributes have changed.</p><p>A value object is different. It doesn&#8217;t have an ID field; it&#8217;s defined solely by its properties. Again, a dataclass works well, but here you&#8217;d set <code>frozen=True</code> to make it immutable. For example, an exchange rate could be modeled this way. If any of its attributes change, it&#8217;s no longer the same object. By making it immutable, you guarantee that once an exchange rate object is created, its values won&#8217;t change, which fits the definition of a value object.</p><p>So, as a Python developer, the tools for following domain-driven design are built into the language. It aligns well with the principles of clean architecture.</p><p><em><strong>7: Frameworks like FastAPI or Django can easily creep into core logic. 
How do you recommend engineers keep frameworks out of the domain and use case layers?</strong></em></p><p><strong>Sam Keen:</strong> Again, it builds on what we talked about&#8212;deciding on the directory structure that makes sense for your project. You want to make the right thing to do the easy thing. A clear structure gives you context: when you look at a class in the application layer, it should be obvious if it&#8217;s behaving abnormally by depending on a framework.</p><p>Beyond structure, it comes down to applying principles and patterns. Take databases, for example. SQLAlchemy is a framework. You shouldn&#8217;t see any reference to it in the domain or application layers. The anti-pattern would be a User class in the domain layer with SQLAlchemy methods directly on it to save to the database&#8212;that&#8217;s direct coupling.</p><p>Instead, you use the repository pattern. You define an interface with simple methods like <code>save_user</code>, <code>get_user</code>, <code>delete_user</code>. Your User object depends only on that contract. Then, in the frameworks layer, you implement that repository using SQLAlchemy or whatever tool you need.</p><p>That way, the domain isn&#8217;t directly coupled to the framework&#8212;both the domain object and the implementation simply agree to the interface. This is the general pattern across the board: keep frameworks out of your core logic by making them details implemented at the outer layers.</p><p><em><strong>8: Do you use Pydantic or dataclasses in your domain models, or do you restrict them to boundaries? How do you handle input validation and transformation cleanly?</strong></em></p><p><strong>Sam Keen:</strong> That&#8217;s an interesting one. When I was writing the book, I actually drifted into being too strict with clean architecture&#8212;treating it almost like a framework. In some early drafts, I found myself duplicating property validation in two places: once in the interface layer and again in the domain layer, but using different mechanisms. Anytime you&#8217;re duplicating validation, that&#8217;s a red flag.</p><p>The framework I was using was <strong>Pydantic</strong>, which is very popular in Python. I use it extensively. It has strong validation and serialization methods. In practice, I made a calculated choice: in some applications, I would allow Pydantic into the domain layer. That&#8217;s because it&#8217;s mainstream, well supported, and it reduced a lot of boilerplate and duplicated code in that specific case.</p><p>That&#8217;s the bigger point&#8212;clean architecture is a set of principles, not a rigid framework. Sometimes you&#8217;ll make compromises to reduce complexity, and that&#8217;s OK. The important thing is to be transparent about it. Document the decision in an ADR&#8212;an architectural decision record&#8212;so it&#8217;s clear to the team and future developers that it was a conscious, intentional choice. That way, you&#8217;ve managed the trade-off explicitly rather than letting accidental complexity creep in.</p><p><em><strong>9: What&#8217;s your testing strategy for a clean architecture codebase? You mentioned unit testing and things like that a bit earlier, but where do unit, integration, and end-to-end tests fit within this concept?</strong></em></p><p><strong>Sam Keen:</strong> A very common approach to testing is the test pyramid&#8212;Martin Fowler and others have popularized this. You want the base of that to be unit tests, which just test individual functions. 
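<p><em>For illustration, a minimal sketch that ties the last few answers together: a value object, an entity, a repository interface, and a fast unit test that substitutes an in-memory fake for the database; names such as <code>Task</code> and <code>TaskRepository</code> are illustrative, not the book&#8217;s code:</em></p><pre><code>import uuid
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class ExchangeRate:
    """Value object: no identity, defined solely by its attributes, immutable."""
    currency: str
    rate: float


@dataclass
class Task:
    """Entity: identity lives in the id field, other attributes may change."""
    title: str
    done: bool = False
    id: uuid.UUID = field(default_factory=uuid.uuid4)


class TaskRepository(Protocol):
    """Port defined by the inner layers; SQLAlchemy lives outside, behind it."""
    def save(self, task: Task) -> None: ...
    def get(self, task_id: uuid.UUID) -> Task: ...


class InMemoryTaskRepository:
    """Test double that satisfies the same contract."""
    def __init__(self) -> None:
        self._items: dict[uuid.UUID, Task] = {}

    def save(self, task: Task) -> None:
        self._items[task.id] = task

    def get(self, task_id: uuid.UUID) -> Task:
        return self._items[task_id]


def complete_task(repo: TaskRepository, task_id: uuid.UUID) -> Task:
    """Use case: depends only on the repository interface, never the driver."""
    task = repo.get(task_id)
    task.done = True
    repo.save(task)
    return task


def test_complete_task() -> None:
    repo = InMemoryTaskRepository()
    task = Task(title="write the tests first")
    repo.save(task)
    assert complete_task(repo, task.id).done is True
</code></pre>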
They&#8217;re very quick&#8212;testing the behavior of a class on its own&#8212;so you want a large number of those. Above that are integration tests, where classes work together; in clean architecture, that often means classes working across layers. At the very top are end-to-end tests, where you actually boot up infrastructure and test as a real user against the application. Those are slow and often brittle because they test interfaces that change quite a bit, so maintaining many of them is a burden.</p><p>If you have a tightly coupled application, it&#8217;s hard to implement that pyramid&#8212;you can end up with an &#8220;ice cream cone,&#8221; because unit tests are hard to write and you brute-force a lot of end-to-end tests. Clean architecture enables you to have the true pyramid. The domain layer has no dependencies, so you can easily write unit tests without starting a database or worrying about network calls. Up into the application layer, because you&#8217;ve used dependency inversion and coded use cases against interfaces rather than concrete classes, a test can insert a mock that implements the interface. That keeps those tests very quick as well. You still need end-to-end tests for critical workflows as final validation, but the confidence comes from a plethora of fast unit tests and some integration tests. Clean really enables that true test pyramid.</p><p><em><strong>10: What&#8217;s your approach to refactoring legacy Python code? How do you introduce clean architecture there without overhauling everything at once?</strong></em></p><p><strong>Sam Keen:</strong> We have a chapter devoted to this, and much of it is common practice regardless of language. What you don&#8217;t want to do is a Big Bang release&#8212;rewriting the entire stack and trying to release it all at once hardly ever works. It takes too long, requirements change, and the system you&#8217;re rebuilding keeps changing. You have to figure out how to slice up the problem. The <strong>&#8220;strangler fig&#8221;</strong> is a common metaphor: it&#8217;s a fig vine that grows up and engulfs a tree.</p><p>Clean helps because you&#8217;re going from a system without a map&#8212;something chaotic&#8212;to one that has a map and discrete components. That lets you take Service A and split it off into a clean architecture with full test coverage. Where to start: begin at the lower layers. Look at your legacy system and find all the parts that in aggregate build the concept of a user. Extract that domain logic and build your true domain User in clean architecture. Then, using gateways and feature flags, parallelize traffic to the old system and the new system, compare, and ensure parity in state changes for the user across both.</p><p>Beyond domain objects, look for natural bounded contexts&#8212;returning to domain-driven design&#8212;and extract them into domains with their use cases. Do everything you can to avoid a Big Bang release. Clean architecture gives you the guidance to restructure incrementally and release with confidence.</p><p><em><strong>11: How is AI changing the way we approach clean architecture?</strong></em></p><p><strong>Sam Keen:</strong> AI is the definition of disruptive. This generative AI wave we&#8217;ve come in on is really interesting because, again, we started the book a little over a year ago and, in AI years, that&#8217;s forever ago. 
I think baby ChatGPT-3 was coming out or something, and LLMs were just starting to be able to build Snake&#8212;that was the extent of what they could build. And then you look at where we are now.</p><p>There are a couple of dimensions. If you&#8217;re thinking of AI as a feature you would add to an application&#8212;integrating an LLM into an application&#8212;nothing really changes. That&#8217;s a driver&#8212;a framework driver. So the knowledge of, say, <strong>LangChain</strong> or <strong>LlamaIndex</strong>&#8212;really common frameworks&#8212;that&#8217;s all going to stay in the outer layers. Same playbook, and then you integrate that down to your pure domain objects. So that part doesn&#8217;t change.</p><p>The other part is using AI to build with&#8212;coding tools and these sorts of things. It helps. We talked about writing tests&#8212;AIs are great at writing unit tests, so you can definitely leverage it there. That&#8217;s where you mitigate hallucination&#8212;you&#8217;re writing tests to validate, so you can know the AI is doing the right thing. Another example: we have that User that needs to be saved to a database, and it has an interface contract. You can use AI to build the concrete class against that interface&#8212;at least get a start on it. Some might think of that as boilerplate, but it can do that.</p><p>Overall, the advantage is this idea of &#8220;context engineering.&#8221; The AI&#8217;s ability to help is only as good as the context you give it. If you let it index legacy, tightly coupled systems, it&#8217;s not really sure what the plan was&#8212;there kind of wasn&#8217;t one&#8212;so the AI will continue to build against that codebase without much of a plan. Whereas, if you&#8217;re explicit about your approach and you have these four folders with easily defined rationale of what goes into each folder, all that context goes to the LLM&#8212;since it&#8217;s helping a human. Using clean architecture and having that playbook for how to build out your application helps AIs do the right thing and not go off the rails. It&#8217;s exciting.</p><p><em><strong>12: How do you apply clean architecture across multiple services or modules? Should each follow the full layering pattern or do you recommend something else?</strong></em></p><p><strong>Sam Keen:</strong> We touched on this a little bit. It&#8217;s not one pattern to fit everything. With multiple services, it&#8217;s the same idea. The purpose&#8212;under the context that what you&#8217;re building is multiple services&#8212;matters. You may have portions that are really straightforward CRUD applications. You just know that you don&#8217;t want direct database access, so you put an API in front of that. Maybe, at this point in time, it&#8217;s kind of one-to-one mapping&#8212;you have it for defense if you need to change it in the future. In those cases, FastAPI and using Pydantic throughout might be the right mechanism.</p><p>But your larger, business-rule-centric, orchestrating parts of the application&#8212;those are where you invest in a fully layered approach. And even though you&#8217;ve built services, you treat them as framework drivers with respect to one another. You have Service A you&#8217;ve built; you have Service B you&#8217;ve built. 
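<p><em>A minimal sketch of that idea: Service B defines a small port for what it needs, and Service A sits behind an adapter in the outer layer; <code>UserDirectory</code>, the endpoint path, and the use of <code>httpx</code> are assumptions for illustration:</em></p><pre><code>from typing import Protocol

import httpx  # assumed HTTP client; any client library would do


class UserDirectory(Protocol):
    """Port owned by the inner layers of Service B."""
    def email_for(self, user_id: str) -> str: ...


class ServiceAUserDirectory:
    """Adapter in the outer layer: Service A is just an implementation detail."""
    def __init__(self, base_url: str) -> None:
        self._base_url = base_url

    def email_for(self, user_id: str) -> str:
        response = httpx.get(f"{self._base_url}/users/{user_id}")
        response.raise_for_status()
        return response.json()["email"]


def notify_user(directory: UserDirectory, user_id: str) -> str:
    """Use case in Service B: knows nothing about Service A, HTTP, or JSON."""
    return f"Sending notification to {directory.email_for(user_id)}"
</code></pre>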
Service B treats Service A as a detail and doesn&#8217;t let Service A&#8217;s implementation details get pulled into the deeper layers of Service B.</p><p><em><strong>13: When building scalable applications, how do you decide what belongs in the core versus what stays at the edge&#8212;for example, pagination, caching, or authentication?</strong></em></p><p><strong>Sam Keen:</strong> There&#8217;s a thought process to that. You get into the domain-driven design mentality of asking what&#8217;s core to the domain. So a user&#8212;what&#8217;s core to a user&#8212;versus transport protocols and those sorts of things, which are just details, computer concepts.</p><p>A useful heuristic is: what would still make sense even if this weren&#8217;t a computer program? For example, a user having a schedule is a concept that made sense before computers were invented. But the knowledge that we&#8217;re transferring using JSON or gRPC&#8212;that&#8217;s a computer concept; it&#8217;s a detail of the platform we&#8217;re implementing these user domain objects on. To be facetious, if we switch to quantum computers in ten years, we&#8217;ll bring our domain objects with us, but all those other details are going to change.</p><p>So that&#8217;s the macroscopic way to think about it&#8212;what are the nouns, the objects of our system&#8212;versus what&#8217;s a transport or technology detail that&#8217;s not core and should stay at the edge. There is some nuance to authentication. You may have a system that needs to be impenetrable, so every layer may need to validate the authentication&#8212;like a zero-trust approach. But you may not be in that case, so you may stop authentication at, say, the adapters layer. Then everything below either assumes authentication or doesn&#8217;t have knowledge of it. That&#8217;s an example where that computer concept could come all the way down to the domain out of need.</p><p><em><strong>14: How do you manage inter-service communication in Python systems built with clean architecture?</strong></em></p><p><strong>Sam Keen:</strong> Again, that&#8217;s a common practice regardless of architecture, but clean helps you with it. In event-driven architecture, messages are another way you can leak implementation details. Be very cognizant of the information you put in the message. It should be past tense&#8212;&#8220;this happened&#8221;&#8212;those sorts of practices.</p><p>Messages are a common way for implementation details to get transmitted across the wire, rather than through a leaky interface at the code layer. Even at the transport level, you can leak implementation details that you don&#8217;t want to, and that will cause coupling as well.</p><div><hr></div><p>To learn Clean Architecture through a series of real-world, code-centric examples and exercises, optimize system componentization, and significantly reduce maintenance burden and overall complexity, check out <em><strong><a href="https://www.packtpub.com/en-us/product/clean-architecture-with-python-9781836642886">Clean Architecture with Python</a></strong></em> by <a href="https://open.substack.com/users/11641009-sam-keen?utm_source=mentions">Sam Keen</a>. 
The book helps you apply Clean Architecture concepts confidently to new Python projects and legacy code refactoring.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.packtpub.com/en-us/product/clean-architecture-with-python-9781836642886" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NtED!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7059d45c-81bb-4525-8a53-28d479a5593b_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!NtED!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7059d45c-81bb-4525-8a53-28d479a5593b_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!NtED!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7059d45c-81bb-4525-8a53-28d479a5593b_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!NtED!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7059d45c-81bb-4525-8a53-28d479a5593b_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NtED!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7059d45c-81bb-4525-8a53-28d479a5593b_2250x2775" width="268" height="330.5824175824176" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7059d45c-81bb-4525-8a53-28d479a5593b_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:268,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Clean Architecture with Python&quot;,&quot;title&quot;:&quot;Clean Architecture with Python&quot;,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.packtpub.com/en-us/product/clean-architecture-with-python-9781836642886&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Clean Architecture with Python" title="Clean Architecture with Python" srcset="https://substackcdn.com/image/fetch/$s_!NtED!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7059d45c-81bb-4525-8a53-28d479a5593b_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!NtED!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7059d45c-81bb-4525-8a53-28d479a5593b_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!NtED!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7059d45c-81bb-4525-8a53-28d479a5593b_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!NtED!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7059d45c-81bb-4525-8a53-28d479a5593b_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Inside Go Systems Programming: A Conversation with Mihalis Tsoukalos]]></title><description><![CDATA[Concurrency patterns, runtime optimizations, and memory management in modern Go]]></description><link>https://deepengineering.substack.com/p/inside-go-systems-programming-a-conversation</link><guid isPermaLink="false">https://deepengineering.substack.com/p/inside-go-systems-programming-a-conversation</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Wed, 20 Aug 2025 09:39:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/S6MJKJoqiQU" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>From goroutine scheduling quirks to profile-guided optimizations, Go has grown into a language that balances simplicity with systems-level power. In this conversation, we speak with Mihalis Tsoukalos&#8212;author of <em><strong><a href="https://www.packtpub.com/en-ch/product/mastering-go-9781805122647">Mastering Go</a></strong>, Fourth Edition</em> (Packt, 2025)&#8212;about what it takes to write high-performance, maintainable Go in today&#8217;s evolving ecosystem.</p><p>Mihalis is a Unix systems engineer and prolific technical author whose books <em>Go Systems Programming</em> and <em>Mastering Go</em> have become staples for developers working close to the metal with Go and Linux. He holds a BSc in Mathematics from the University of Patras and an MSc in IT from University College London, and his work has appeared in <em>Linux Journal</em>, <em>USENIX ;login:</em>, and <em>C/C++ Users Journal</em>. His expertise spans systems programming, time series data, and databases, but his reputation in the Go community comes from distilling that low-level experience into accessible, practical guidance.</p><p>In this interview, we explore what motivated the fourth edition of <em>Mastering Go</em> and the audiences it serves, the realities of structuring goroutines and channels correctly, and the concurrency patterns that actually hold up under production workloads. We also dive into Go&#8217;s runtime improvements, profiling and memory-management workflows, and the maturing role of generics in real-world projects. 
Beyond language features, Mihalis shares his perspective on observability, the expanding standard library, and how Go compares with Rust and Zig for systems programming. Looking ahead, he offers a candid view of where Go is headed&#8212;from concurrency safety to ecosystem maturity&#8212;without losing sight of its defining trait: clarity without unnecessary complexity.</p><p>You can watch the full conversation below&#8212;or read on for the complete transcript.</p><div id="youtube2-S6MJKJoqiQU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;S6MJKJoqiQU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/S6MJKJoqiQU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><p><em><strong>1: What motivated you to write the 4th edition of Mastering Go? Who should pick it up, and what kind of projects will the book help them with?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> First of all, I want to start with a disclaimer&#8212;nothing, no book or any other resource, can replace experience. You have to try things. That&#8217;s the general idea, and that&#8217;s why I wrote the book&#8212;to make you try.</p><p>The main reason for the 4th edition of <em>Mastering Go</em> was the continued growth and evolution of Go. Since the last edition, the language has seen major changes. The most important was the addition of generics in Go 1.18, a long-awaited feature that really shifted how developers think about type safety and code reuse. Alongside that, we&#8217;ve had improvements to modules, better WebAssembly support, and lots of enhancements across the standard library and toolchain, including faster testing for the testing process. So it made perfect sense to update the book to reflect where Go is today.</p><p>Another big motivator was feedback from the Go community. <em>Mastering Go</em> has always aimed to be a practical, hands-on guide, and readers kept asking for more real-world examples&#8212;especially about things like concurrency, networking, and systems-level programming. This edition builds on that by going deeper into those topics and refining the guidance on writing idiomatic, maintainable Go code. Again, I have to say it: nothing can replace experience. You have to try things all the time. That&#8217;s the point of learning something new.</p><p>The book is best suited for intermediate to advanced developers&#8212;people who already understand the basics of programming and want to work with Go and take their skills further. It is particularly useful for engineers working on backend systems, command-line infrastructure tools, high-performance network applications, or cloud-native services. 
It&#8217;s also a solid resource for developers coming from languages like C, C++, Java, or Python who want to build scalable, efficient systems.</p><p>For example, it can be applied in projects like migrating a payments platform from Node.js to Go to reduce latency in transaction processing and better handle traffic, writing a custom reverse proxy in Go to maximize performance and manage connection concurrency efficiently, or an observability team building high-throughput log collectors or trace aggregators that must process and forward millions of events per second.</p><p>So overall, this edition is not a minor update&#8212;it&#8217;s a reflection of how far Go has come as a language and how central it is becoming in areas like cloud computing, DevOps, and systems engineering. It is really for anyone serious about mastering Go and using it to build high-performance, real-world applications.</p><p><em><strong>2: Today, Go&#8217;s concurrency model remains a major draw, with Go 1.22 fixing the long-standing loop variable capture issue. How do you now recommend structuring goroutines and channels to avoid common bugs?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> The concurrency model of Go has always been one of its strongest features because it&#8217;s simple, easy to understand, yet powerful. You can&#8217;t have everything, but what Go offers is pretty much what programmers want. Goroutines are lightweight, channels give you a clear way to coordinate between them, and it is generally easy to express concurrent logic in a readable way.</p><p>But you can always go wrong, especially when working on more complex or high-performance systems. One issue that has tripped up a lot of developers over the years was the loop variable capture problem. If you launched goroutines inside a loop, you could accidentally end up capturing the loop variable in a closure, which meant that all your goroutines might reference the same variable&#8212;not what you intended. Usually, you wanted each goroutine to take a different variable value. The typical workaround was to reassign the variable inside the loop, but it was error-prone. This was finally fixed in Go 1.22: now the language creates a new instance of the loop variable on each iteration, so closures and goroutines get the correct value automatically. It&#8217;s a small change in behavior, but it eliminates a very common class of bugs and makes concurrent code cleaner and more predictable.</p><p>That said, even with this fix in place, you still need to be cautious when writing concurrent code. A few best practices:</p><ul><li><p>Always be explicit about goroutine ownership and lifecycle. The best way to do that is by using <code>context.Context</code> to manage cancellation and timeouts. This ensures goroutines don&#8217;t hang around longer than they should, avoiding memory leaks and unpredictable behavior.</p></li><li><p>Limit concurrency when needed. Just because goroutines are lightweight doesn&#8217;t mean you should spin up thousands of them without thinking. If you&#8217;re processing a large number of tasks or I/O operations, use worker pools, semaphores, or bounded channels to keep things under control.</p></li><li><p>Avoid unbuffered channels for high-volume communication. They&#8217;re great for synchronization, but if you&#8217;re passing a lot of data around, buffered channels reduce blocking and improve performance.</p></li><li><p>Always close channels properly. Only the sender should close the channel, and only once. 
Closing channels from multiple places or from the receiver side can cause panics or race conditions.</p></li><li><p>Use the <code>select</code> statement defensively, especially when working with multiple channels. A default case can help you avoid blocking in situations where responsiveness matters, like event loops or fault-tolerant systems.</p></li><li><p>Don&#8217;t force everything through channels. Although they look practical at first, sometimes mutexes or atomic operations are a better fit. Think carefully before you start writing code and designing your program.</p></li></ul><p>So overall, Go 1.22 makes life easier for concurrent programming, but writing robust concurrent code still requires discipline, clear design, and a good understanding of how goroutines and channels behave under the hood. That&#8217;s what really helps you build systems that are both maintainable and production-ready. Again&#8212;think before you start writing code, and don&#8217;t just throw in goroutines because they&#8217;re lightweight.</p><p><em><strong>3: When you think about concurrency at a systems level, which patterns do you find most effective for real-world workloads? Are there particular idioms you keep returning to, like worker pools or pipelines?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> At the systems level, concurrency is not just a feature&#8212;it&#8217;s a design principle. It influences how your software scales, how efficiently it uses resources, and how it behaves under pressure. Go gives you the primitives&#8212;goroutines and channels&#8212;but using them well requires a solid set of patterns you can rely on.</p><p>The first is worker pools. They are probably the most universally effective pattern. The Apache web server used to do this with threads. Instead of spawning a new goroutine for every task, you maintain a fixed set of workers that pull from a task queue. This gives you controlled concurrency&#8212;you&#8217;re not overloading the system with thousands of goroutines, and you stay within limits like memory, file descriptors, or database connections. This makes system behavior under load much more predictable because you know exactly what resources you&#8217;re using. For example, on a project I worked on, we used worker pools in a log processing service that handled thousands of files per hour without any issues.</p><p>The second pattern is pipelines. These are great when you want to break a task into stages and process each stage concurrently. Each stage runs in its own goroutine and passes data to the next using a channel chain. It&#8217;s a clean way to handle streaming data transformations or multi-step processing. It encourages modularity and makes it easier to deal with backpressure and separation of concerns.</p><p>Another critical piece is <code>context.Context</code>, which I consider non-negotiable in any serious concurrent Go application. It&#8217;s the standard way to manage timeouts, cancellations, and deadlines across goroutines. If you&#8217;re handling HTTP requests, running background jobs, or coordinating distributed tasks, <code>context</code> helps you shut things down cleanly and avoid goroutine leaks. This is especially important when interacting with external systems like databases or APIs, where you don&#8217;t want calls hanging indefinitely. For example, if you&#8217;re writing a TCP server and connections are not closed properly, you might run out of ports to serve new requests.</p><p>Another pattern I use is fan-out/fan-in. 
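<p><em>For illustration, a minimal sketch combining the worker-pool and fan-in ideas from this answer: a fixed set of workers pulls from a task queue and the results are collected on a single channel; the pool size and the work itself are placeholders:</em></p><pre><code>package main

import (
    "fmt"
    "sync"
)

func main() {
    tasks := make(chan int)
    results := make(chan int)

    // A fixed number of workers pull from the queue: controlled concurrency.
    var wg sync.WaitGroup
    for w := 0; w &lt; 4; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for t := range tasks {
                results &lt;- t * t // placeholder for the real work
            }
        }()
    }

    // Close results exactly once, after every worker has finished.
    go func() {
        wg.Wait()
        close(results)
    }()

    // Feed the queue; only this sender closes the tasks channel.
    go func() {
        for i := 1; i &lt;= 8; i++ {
            tasks &lt;- i
        }
        close(tasks)
    }()

    // Fan-in: collect everything on one channel.
    for r := range results {
        fmt.Println(r)
    }
}
</code></pre>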
Fan-out means launching multiple goroutines to handle parts of a job in parallel, and fan-in means collecting the results into a single place. Combined with worker pools, this is a powerful way to parallelize work and aggregate results efficiently. I used fan-out/fan-in for a monitoring aggregator service with many microservices for health and metrics data&#8212;some over HTTP&#8212;and then collected the results into a single response.</p><p>I also rely heavily on <code>select</code> statements. Being able to multiplex across multiple channels or listen for a cancellation signal or timeout is incredibly powerful. It helps you write responsive systems that can recover from delays, retry on failure, or time out gracefully.</p><p>One principle I&#8217;ve learned over time: don&#8217;t reach for channels by default. A mutex or an atomic operation might be more appropriate, creating a simpler, cleaner, and less error-prone design.</p><p>Finally, goroutine supervision is critical. You need to track what your goroutines are doing, make sure they shut down cleanly, and prevent them from sitting idle in the background.</p><p>To sum up: the patterns I find most effective are worker pools, pipelines, the context package, fan-out/fan-in, and <code>select</code> statements. These help you create reliable, maintainable concurrent Go code.</p><p><em><strong>4: Let&#8217;s talk about Go&#8217;s runtime performance, which has improved noticeably. Tail latencies are down and the garbage collector is smarter. What profiling techniques do you recommend to help teams actually realize these gains?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> That&#8217;s a good question, because sometimes you have issues and you don&#8217;t know what&#8217;s going on behind the scenes. Go has made real progress on runtime performance. The garbage collector in particular has seen big gains in terms of pause times and CPU usage. The garbage collector runs as a goroutine&#8212;everything in Go is a goroutine, including the garbage collector. The special thing about it is that sometimes, for the garbage collector to operate, everything else must freeze briefly, because you can&#8217;t create new variables while the collector is cleaning up.</p><p>Tail latencies have also come down, which makes Go a strong option for performance-sensitive systems like APIs, proxies, and backend infrastructure. But those gains don&#8217;t happen automatically&#8212;you need to profile and measure to benefit from them. Optimization without measurement is guesswork.</p><p>Go provides excellent tools for profiling. With the right approach, those tools can lead to real improvements. A practical point: don&#8217;t wait until you have a problem to learn about optimization and measurement. Experiment ahead of time so that when a problem arises, you&#8217;re ready to use the tools. Also, be very careful when using them on production systems&#8212;you might crash them. Don&#8217;t run measurements during peak hours unless it&#8217;s absolutely necessary.</p><p>The first tool I recommend is <strong>Pprof</strong>, which is built in and very powerful. The <code>net/http/pprof</code> package exposes several types of profiles&#8212;CPU, heap, goroutines, blocking operations, mutex contention&#8212;and you can access them through an HTTP endpoint in your web browser. Then you can visualize them using <code>go tool pprof</code> or one of the newer web interfaces.</p><p>I usually start with CPU profiles. 
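<p><em>A minimal sketch of exposing the Pprof endpoints in a long-running service, as described above; the listen address is a placeholder and should be restricted to admin access:</em></p><pre><code>package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
)

func main() {
    // Serve the profiling endpoints on a separate, internal-only port.
    log.Println(http.ListenAndServe("localhost:6060", nil))
}
</code></pre><p><em>A 30-second CPU profile can then be collected with <code>go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30</code>, and heap data with <code>go tool pprof http://localhost:6060/debug/pprof/heap</code>.</em></p>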
Run them under realistic load and see where your code is spending time&#8212;it&#8217;s often not where you expect. Heap profiles are equally important, especially now that the garbage collector is more efficient. If you can cut down unnecessary allocations, the collector has less to clean, and your application runs more smoothly.</p><p>One mistake teams make is profiling only with benchmarks or local tests. You need to profile under real workloads in production or in a staging environment that closely mimics production. Many teams now include Pprof endpoints in production, behind secure admin-only routes, so they can safely collect data without affecting users.</p><p>For deeper insight, I recommend <strong>runtime tracing</strong>. The <code>runtime/trace</code> package provides a timeline of goroutine scheduling, system calls, garbage collection, and other events. Paired with Pprof, it helps explain why a goroutine was delayed or what caused a latency spike. You can collect traces with <code>go test -trace</code> or via code, and then explore them with <code>go tool trace</code>.</p><p>If you&#8217;re doing micro-optimizations, the <strong>Go benchmarking framework</strong> is excellent. Metrics like allocations per operation, bytes per operation, or nanoseconds per operation help you track how small changes affect performance, especially in tight loops or hot paths like serialization or hashing. Even one extra allocation can have a big impact under heavy load, so it&#8217;s worth running <code>go test -bench</code> regularly if you&#8217;re tuning critical functions.</p><p>It&#8217;s also important to watch for goroutine leaks or contention. Use goroutine and block profiles to track how many goroutines are running and whether they&#8217;re getting stuck. If the goroutine count keeps rising, that&#8217;s often a sign of a leak or unexpected blocking.</p><p>Beyond profiling, observability matters. The best-performing teams invest in continuous metrics and dashboards. Tools like Prometheus, combined with Go&#8217;s ability to export metrics, let you track garbage collection pause times, allocation rates, goroutine counts, and more. With alerting, you can catch issues before they impact users or your boss&#8212;which is never a good surprise.</p><p>A concrete case: I once worked on a high-throughput telemetry pipeline. The team was seeing unusually high CPU usage during peak hours, even though the runtime looked idle. The issue turned out to be repeated use of <code>json.Marshal</code> inside a loop, which was allocating and copying far more data than necessary. Replacing it with a streaming encoder solved the problem and made everything much faster.</p><p>So in short, Go&#8217;s runtime has improved, but to realize those gains you must measure continuously, profile under real workloads, and act on what you find.</p><p><em><strong>5: Profile-guided optimization became stable in Go 1.21. Where does PGO make a real difference, and when might it not be worth the effort?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> The stabilization of profile-guided optimization (PGO) in Go 1.21 was a big milestone for performance-focused developers. Go has traditionally emphasized implicit, fast compiler optimizations, but PGO changes that. 
It gives us a new way to fine-tune performance based on how our code actually runs in production.</p><p>In simple terms, PGO lets the compiler make smarter decisions using real-world runtime data&#8212;things like which functions are called most often, which branches get taken, and where the hot paths are. With that information, the compiler can reorder functions to improve caching, inline code more intelligently, and reduce indirect calls. The result is lower CPU usage and better latency, especially in high-throughput or tight-loop scenarios.</p><p>So where does PGO shine? It&#8217;s great for performance-critical systems with stable workloads&#8212;things like low-latency services, backend infrastructure, proxies, or message brokers. In these environments, even small improvements in CPU can translate into real wins. It also makes a difference in hot-path code: tight loops that run millions of times, or CPU-bound routines like encoders, parsers, or math-heavy computations. PGO helps optimize layout and branching in those areas, reducing stalls and improving instruction-cache behavior.</p><p>If you&#8217;re running large-scale or long-lived services, even small gains add up&#8212;a 5% CPU saving across hundreds of instances is significant.</p><p>That said, PGO isn&#8217;t always worth the effort. For applications with unpredictable or highly variable workloads, the profile you generate today might not reflect tomorrow&#8217;s behavior. It&#8217;s also not ideal for short-lived command-line tools or scripts. And if your codebase is still changing rapidly, PGO is premature. Finish stabilizing your application first, then consider it.</p><p>In general, PGO is a powerful tool, but like any optimization technique, it&#8217;s most effective when used deliberately. If you&#8217;ve already profiled your application, you know where the bottlenecks are, and you want to squeeze out more performance without rewriting code, then PGO is a great next step. But it won&#8217;t solve every problem. My advice is to experiment with it on your own time so you&#8217;re ready to use it when it&#8217;s truly needed.</p><p><em><strong>6: Memory is always a tricky area. What is your typical workflow for diagnosing memory leaks or reducing high allocation rates in Go systems? Do you have any favorite tools or patterns you like using?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> Although modern computers have plenty of memory, we still need to watch for leaks and excessive allocations. When I&#8217;m diagnosing memory issues in Go&#8212;whether a potential leak or just high allocation pressure&#8212;the first step is to establish a baseline. That means running the service under real or representative load and collecting memory data that reflects actual behavior, not just synthetic benchmarks.</p><p>From there, I rely heavily on Go&#8217;s built-in tooling, especially Pprof. I usually instrument the service with an HTTP endpoint using <code>net/http/pprof</code>, then capture heap profiles at different points&#8212;typically one right after startup and another after the service has been running under load. Comparing these snapshots helps answer key questions: Are allocations growing continuously? Which types are taking the most memory? 
Is the garbage collector doing more work than expected?</p><p>I load these profiles into <code>go tool pprof</code> or use the web interface, focusing on views like &#8220;in-use space&#8221; or &#8220;in-use objects.&#8221; If I see unexpected memory growth, I look for object types that shouldn&#8217;t be long-lived but are still hanging around. I also use the <code>-alloc_space</code> and <code>-alloc_objects</code> views to see where allocations are happening most frequently. That helps distinguish between a true leak and simply too many short-lived allocations.</p><p>A common pattern I follow is taking delta comparisons between snapshots. If memory usage looks flat but allocation counts are high, that&#8217;s usually a sign of churn, not a leak. Tools like <code>go test -bench -benchmem</code> are useful here&#8212;they show allocation behavior in tight loops or hot paths and help validate changes quickly.</p><p>When reducing allocations, I start with escape analysis. Running <code>go build -gcflags=-m</code> tells you which variables are escaping to the heap and why. Small changes&#8212;like passing a pointer instead of a value, or reusing a buffer&#8212;can keep data on the stack and reduce garbage collector pressure. If I see repeated allocations of slices, maps, or temporary structs in performance-sensitive areas, I consider <code>sync.Pool</code>, preallocating, or reusing buffers carefully. Even avoiding repeated string concatenations in loops or unnecessary interface conversions can make a noticeable difference.</p><p>For long-running services, I also recommend taking full memory dumps periodically and tracking object retention over time. That helps catch leaks caused by forgotten references. Continuous monitoring with Prometheus and visualization in Grafana is also valuable&#8212;it makes unexpected trends easy to spot.</p><p>Ultimately, avoiding memory leaks comes down to habits: profile early, understand your allocation patterns, avoid global state, and monitor in production. It&#8217;s not just about saving memory&#8212;it&#8217;s about running a system that behaves predictably under load and doesn&#8217;t wake you up in the middle of the night.</p><p>One memorable case involved a team whose Go service gradually climbed in memory usage over several days, even under steady load. Garbage collection seemed fine, but comparing heap profiles revealed that a map of cached Protobuf messages was never shrinking. The problem was a custom cache with no eviction policy&#8212;it just kept growing. To make matters worse, the keys were strings derived from user input, so the cardinality was unbounded. The fix was introducing a bounded LRU cache with periodic cleanup. The key insight came from seeing that the live object count of a specific type kept rising across heap snapshots. Without those profiles, it would have been much harder to pinpoint and fix.</p><p><em><strong>7: Generics have been around for a couple of releases now. What patterns have you seen work well, and where do you think developers are overusing or misusing them?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> Now that generics have had time to mature over a few Go releases, we are starting to see clear patterns around where they shine and where they can go off the rails.</p><p>One of the most effective use cases has been writing reusable, type-safe data structures and algorithms. 
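<p><em>A minimal sketch of the kind of reusable, type-safe helper Mihalis is describing; <code>Filter</code> is an illustrative name, not a standard library function:</em></p><pre><code>package main

import "fmt"

// Filter returns the elements of s for which keep returns true.
// The type parameter replaces interface{} plus runtime type assertions.
func Filter[T any](s []T, keep func(T) bool) []T {
    out := make([]T, 0, len(s))
    for _, v := range s {
        if keep(v) {
            out = append(out, v)
        }
    }
    return out
}

func main() {
    evens := Filter([]int{1, 2, 3, 4, 5}, func(n int) bool { return n%2 == 0 })
    fmt.Println(evens) // [2 4]
}
</code></pre>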
Things like generic slices, sets, maps, or utility functions&#8212;map, filter, reduce&#8212;have become much easier to implement in a way that&#8217;s both clean and performant. This has led to better library code, especially in packages dealing with collections, number crunching, or parsing. Libraries that used to rely on the empty interface and type assertions now benefit from compile-time safety with very little extra syntax. That&#8217;s a big improvement in terms of both correctness and readability.</p><p>Another area where generics work really well is domain-specific helper functions. For example, a pagination utility that works across different types of records, or a retry wrapper that can handle arbitrary operations. These kinds of generics eliminate boilerplate and keep APIs consistent without losing clarity. When used thoughtfully, they make code more declarative and reduce the need for duplicating logic across packages or modules.</p><p>That said, there have also been missteps. A common one is overgeneralization&#8212;creating overly abstract, flexible APIs just because the language allows it. Another is wrapping generic types in ways that obscure intent. Instead of simply using a slice of type <code>T</code>, some developers introduce unnecessary abstractions that add layers without real benefit, making the codebase harder to understand.</p><p>There&#8217;s also a tendency among some developers to import functional programming paradigms wholesale&#8212;monads, chaining combinators, deeply nested generic utilities. While elegant in languages designed for them, these patterns often clash with Go&#8217;s core philosophy of clarity, simplicity, and explicit flow of control. The result can be clever-looking code that&#8217;s hard to read and even harder to debug.</p><p>In short, generics are a powerful addition to Go, but like any powerful tool, they need to be used with purpose and restraint. Think before you reach for them, and prefer clear designs. The goal should always be code that is easy to understand and maintain.</p><p><em><strong>8: Go 1.23 adds iterator functions and generic type aliases. How do you see those changing how we write Go, especially in libraries?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> The addition of iterator functions and generic type aliases in Go 1.23 might look like a quiet update, but it&#8217;s actually a significant step forward in writing more expressive, reusable, and composable code&#8212;particularly in libraries. These features build on the foundation of generics and help capture common programming patterns more naturally, while still keeping Go&#8217;s strengths of simplicity and clarity.</p><p>Take iterator functions. Go has always relied on <code>for</code> loops and <code>range</code> for iteration, and that worked well. But now, with iterator functions, we can encapsulate iteration logic as values&#8212;functions that yield elements one at a time. That might sound like a small shift, but it opens up powerful patterns like lazy evaluation, functional-style pipelines, and composable data flows. You&#8217;re no longer stuck rewriting the same loop boilerplate; you can abstract iteration into helpers that are both type-safe and ergonomic.</p><p>Then there are generic type aliases, which reduce friction when using generic types across packages. Before, if you wanted to tailor a generic type like <code>map[K]V</code> or <code>Option[T]</code> to your domain, you often had to rewrap or reimplement it. 
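<p><em>A minimal sketch of an iterator function in the Go 1.23 style described above, before the discussion returns to type aliases; <code>Evens</code> and its limit are illustrative:</em></p><pre><code>package main

import (
    "fmt"
    "iter"
)

// Evens lazily yields the even numbers up to limit, one at a time.
func Evens(limit int) iter.Seq[int] {
    return func(yield func(int) bool) {
        for n := 0; n &lt;= limit; n += 2 {
            if !yield(n) {
                return // the consumer stopped early
            }
        }
    }
}

func main() {
    for n := range Evens(10) {
        fmt.Println(n)
    }
}
</code></pre>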
That made things verbose and diluted the usefulness of generic libraries. Now, with type aliases that support generics, you can define concise, strongly typed shortcuts for common patterns. This improves readability and makes code easier to work with, without introducing runtime overhead.</p><p>I think these features will lead to more expressive APIs and more composable, domain-agnostic utility packages. We&#8217;ll likely see libraries offering richer iterator utilities&#8212;things like filter, map, and reduce&#8212;implemented in a way that feels native to Go.</p><p>That said, the real challenge for library authors will be balance&#8212;using these tools to enhance code, not overcomplicate it. If done well, these features could significantly modernize the Go ecosystem, especially in areas like data processing and systems-level programming, where reusable containers, iterators, and higher-order utilities really shine.</p><p><em><strong>9: Fuzzing is built into Go now. How have you seen teams make fuzz testing practical, and what are some tips to get value from fuzzing beyond just turning it on?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> Fuzz testing is a powerful technique for uncovering edge cases, subtle bugs, and even security issues&#8212;things that traditional unit tests often miss. Since Go 1.18 added fuzzing support directly into the <code>go test</code> tool, we&#8217;ve seen some teams begin experimenting with it. Again, it&#8217;s important to experiment first.</p><p>But as you said, just enabling fuzzing isn&#8217;t enough. To really benefit, teams need a focused, deliberate approach. The teams that get the most out of fuzz testing usually start by targeting critical code paths&#8212;places where the software processes complex or untrusted inputs. Think parsers, codecs, or deserialization logic. These are prime candidates because they&#8217;re hard to reason about and easy to break with unexpected input. And one important rule here: never trust user input. Writing fuzz tests for these areas helps surface bugs that could otherwise go unnoticed.</p><p>It&#8217;s also important to seed the fuzzer well, instead of letting it start with purely random inputs. Give it examples representative of real data&#8212;this helps the fuzzing engine explore the space more intelligently and find meaningful values faster.</p><p>Integration is key. The most effective teams make fuzz testing part of continuous integration. They run short fuzzing sessions locally during development for quick feedback, and then schedule longer runs overnight or during off-hours on CI servers. That way, fuzzing becomes a continuous part of testing, not just something you do once in a while.</p><p>Beyond just finding crashes, fuzz testing is excellent for hardening error handling. It ensures your code doesn&#8217;t panic, leak resources, or hang when it gets bad input. And when you combine fuzz tests with other tools like the race detector, you can catch data races that wouldn&#8217;t show up otherwise. That combination improves reliability across the board.</p><p>One more tip: keep your fuzz functions deterministic and free of side effects. Avoid calling external systems or relying on randomness inside the test itself. Deterministic behavior makes failures easier to reproduce and debug.</p><p>In short, fuzz testing is most valuable when used deliberately&#8212;targeting the right parts of your code, seeding it well, integrating it into workflows, and combining it with other tools. 
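<p><em>A minimal fuzz target in the style described here, seeded with representative inputs; <code>Reverse</code> stands in for whatever parser or codec is under test:</em></p><pre><code>package parse

import "testing"

// Reverse is a stand-in for the code under test (parsers, codecs, and so on).
func Reverse(s string) string {
    b := []byte(s)
    for i, j := 0, len(b)-1; i &lt; j; i, j = i+1, j-1 {
        b[i], b[j] = b[j], b[i]
    }
    return string(b)
}

func FuzzReverse(f *testing.F) {
    // Seed the corpus with representative inputs rather than pure noise.
    for _, seed := range []string{"", "go", "hello, world"} {
        f.Add(seed)
    }
    f.Fuzz(func(t *testing.T, s string) {
        if out := Reverse(Reverse(s)); out != s {
            t.Errorf("round trip changed %q to %q", s, out)
        }
    })
}
</code></pre><p><em>Placed in a <code>_test.go</code> file, this runs as a short local session with <code>go test -fuzz=FuzzReverse -fuzztime=30s</code> and as a regular regression test in CI.</em></p>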
Done right, it&#8217;s not just about uncovering obscure crashes&#8212;it&#8217;s about building more robust, resilient Go systems.</p><p><em><strong>10: Observability is another area you cover in your book. What do you recommend for monitoring and tracing Go systems effectively, especially under high concurrency?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> Observability is absolutely essential when running Go systems at scale, especially in high-concurrency environments. It gives you the visibility to understand how your application behaves in production, diagnose issues quickly, and keep performance and reliability where they need to be.</p><p>For monitoring, the first step is always metrics&#8212;both system-level and application-specific. We mentioned Prometheus earlier; it&#8217;s the go-to choice in the Go ecosystem, largely because of its flexibility and strong community support. The key is to instrument your code with meaningful metrics. Put simply: if you don&#8217;t collect the right metrics, you won&#8217;t solve your issues.</p><p>So collect things like request rates, error counts, latency percentiles, goroutine counts, and garbage collection pauses. These tell you how the system is behaving and where things might degrade under load. You also get a lot of value from the Go runtime metrics exposed through the <code>runtime/metrics</code> package. These provide insight into memory usage, garbage collection activity, and goroutine scheduling&#8212;crucial when dealing with thousands of concurrent operations.</p><p>Metrics give you an aggregated view, but tracing lets you zoom in. With distributed tracing&#8212;using something like OpenTelemetry&#8212;you can follow individual requests as they move through different parts of your system. That&#8217;s where you see latency accumulation, service interactions, or contention points. Under high concurrency, tracing is especially useful for catching queuing delays, lock contention, or slow dependencies&#8212;issues that metrics alone might mask.</p><p>One of the most important practices here is context propagation. We&#8217;ve already discussed the <code>context.Context</code> type. This is your mechanism for passing timeouts, cancellations, and tracing data across API boundaries and goroutines. If you don&#8217;t propagate context properly, you&#8217;ll miss spans or lose correlation in your traces. End-to-end consistency in instrumentation is critical, especially for workloads where a request might fan out into multiple goroutines.</p><p>Of course, high concurrency also means generating a lot of telemetry data, so you need to be smart about sampling and rate limiting. Adaptive sampling works well&#8212;prioritizing traces based on latency, errors, or unusual behavior. This way, you capture the most informative data without overwhelming your observability systems or introducing overhead.</p><p>And observability isn&#8217;t just about collecting data&#8212;it&#8217;s about acting on it. Instead of relying only on fixed thresholds, use anomaly detection and pattern-based alerts. Dashboards that track Go-specific behaviors&#8212;like spikes in goroutines or increased garbage collector pauses&#8212;make it easier to spot problems early and understand what&#8217;s really happening.</p><p>In short, effective observability in high-concurrency Go systems means combining detailed metrics, distributed tracing with proper context propagation, smart sampling, and ongoing analysis. 
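</p><p><em>As a small illustration of the runtime side of this, reading a couple of the <code>runtime/metrics</code> samples mentioned above takes only a few lines; in a real service these values would be exported to whatever metrics backend is in use:</em></p><pre><code>package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// Ask the runtime for the live goroutine count and heap object bytes.
	samples := []metrics.Sample{
		{Name: "/sched/goroutines:goroutines"},
		{Name: "/memory/classes/heap/objects:bytes"},
	}
	metrics.Read(samples)

	for _, s := range samples {
		// Both of these particular metrics are uint64-valued.
		fmt.Printf("%s = %d\n", s.Name, s.Value.Uint64())
	}
}
</code></pre><p>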
With those in place, you&#8217;re in a much better position to detect issues early, debug complex behavior, and keep systems running smoothly at scale.</p><p><em><strong>11: The standard library keeps expanding with utility packages and smarter routing in </strong></em><code>net/http</code><em><strong>. Do these reduce the need for external frameworks? What do you feel is still missing?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> The standard library of Go has always been one of its strongest features&#8212;clean, composable, well-tested, and rich. Over time, the maintainers have added to it in a very deliberate way. Things like smarter routing in <code>net/http</code> and new utility packages like <code>slices</code>, <code>maps</code>, and <code>cmp</code> have made it easier to build web services, command-line tools, and system-level software directly on top of the standard library.</p><p>Yes, these improvements are definitely reducing the need for external frameworks, especially for small to mid-sized applications. One of the best examples is <code>net/http</code>, which has steadily improved: better routing logic, smoother integration with middleware patterns, improved support for HTTP/2 and structured headers, and overall better ergonomics. For teams that prioritize simplicity, performance, and long-term maintainability, that&#8217;s a big win.</p><p>The new utility packages also help. Tasks like filtering, slicing, comparing maps, or writing type-safe logic can now be done concisely and idiomatically, reducing boilerplate and external dependencies.</p><p>That said, the standard library doesn&#8217;t replace third-party libraries entirely&#8212;especially when working on more complex systems or domain-specific problems. For example, I&#8217;ve written HTTP services in Go using Gorilla rather than plain <code>net/http</code>, and for building command-line tools I&#8217;ve used Cobra and Viper. The famous Docker tool has been written in Go using Cobra, and the Hugo static site generator also relies on Cobra and Viper. These are powerful tools for real-world utilities.</p><p>So, while the standard library is strong and keeps evolving, there are still gaps&#8212;particularly in areas like higher-level CLI frameworks or more sophisticated HTTP tooling. I expect the standard library will continue to improve, but tools like Cobra and Viper still fill important roles.</p><p><em><strong>12: Even experienced Go developers make mistakes. What are some of the less obvious ones you still see when people work on performance-sensitive or concurrent systems?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> Everyone makes mistakes&#8212;that&#8217;s how we learn. The important part is being careful not to make them in production systems.</p><p>Even experienced Go developers can run into subtle issues when working on performance-sensitive or concurrent systems. A lot of this comes from Go&#8217;s simplicity. Goroutines are lightweight, channels are first-class, and the standard library gives you powerful tools. But that simplicity can hide complexity, and mistakes often come from relying too much on defaults or making assumptions about how the runtime behaves.</p><p>One common pitfall is spinning up goroutines without proper cancellation or lifecycle management. 
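</p><p><em>A minimal sketch of the remedy, with a worker that watches its context instead of running forever; <code>worker</code> and <code>jobs</code> are just placeholder names:</em></p><pre><code>package main

import (
	"context"
	"fmt"
	"time"
)

// worker exits as soon as its context is cancelled, so it cannot leak.
func worker(ctx context.Context, jobs &lt;-chan int) {
	for {
		select {
		case &lt;-ctx.Done():
			fmt.Println("worker stopping:", ctx.Err())
			return
		case j, ok := &lt;-jobs:
			if !ok {
				return // channel closed, nothing left to do
			}
			_ = j // do the actual work here
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	go worker(ctx, make(chan int))

	// Give the timeout a chance to fire; the goroutine is gone afterwards.
	time.Sleep(200 * time.Millisecond)
}
</code></pre><p>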
We&#8217;ve discussed before that using <code>context.Context</code> gives you control&#8212;allowing you to cancel goroutines properly and avoid memory leaks.</p><p>Another mistake is assuming channels are always the right concurrency primitive. When I first learned about channels, I thought they could solve every concurrency problem. But that&#8217;s not true. In some cases, a mutex or an atomic variable is more efficient and easier to work with. Think carefully before using channels.</p><p>Memory allocation is another big one. Developers often overlook how temporary allocations&#8212;like slices created in a tight loop, or boxing values into interfaces&#8212;can lead to heavy garbage collection overhead, which gets worse under high concurrency. Tools like pprof or <code>go test -bench=. -benchmem</code> help you spot these patterns, but ideally, you should design with memory efficiency in mind from the start.</p><p>Another mistake is making false assumptions about how the scheduler works. Developers sometimes expect goroutines to be preempted fairly, but in CPU-bound loops without I/O or channel operations, goroutines might not yield control. This can lead to starvation or uneven workload distribution. Newer versions of Go have improved scheduling and preemption, but in rare cases you still need to explicitly yield with <code>runtime.Gosched()</code> to let other goroutines run.</p><p>So overall, the issues I see are not usually about syntax&#8212;they&#8217;re about architecture. They come from assumptions about how Go handles concurrency and performance under the hood. The way to avoid them is by profiling continuously, testing under realistic loads, and building a solid mental model of how the Go runtime behaves at scale. In other words, learn the internals&#8212;don&#8217;t just assume.</p><p><em><strong>13: You&#8217;ve worked very close to the metal for many years now. How would you compare Go and Rust for systems programming, especially in terms of performance, safety, and maintainability?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> This is a question I get often. Go and Rust take very different approaches to systems programming, and choosing between them depends on the specific priorities of your project.</p><p>Rust gives you fine-grained control over memory and concurrency with zero-cost abstractions that can deliver exceptional performance. Its ownership model and borrow checker eliminate entire classes of bugs at compile time&#8212;things like data races or use-after-free errors. That makes Rust a great choice for low-level systems where correctness and reliability are absolutely critical&#8212;think operating system components, device drivers, or performance-sensitive networking.</p><p>But that level of control comes with a steep learning curve. Rust&#8217;s mental model&#8212;ownership, lifetimes, trait bounds&#8212;can slow teams down, especially if they&#8217;re new to the language. Refactoring or prototyping requires great care to satisfy the compiler. Rust&#8217;s tooling (Cargo, Clippy, Rust Analyzer) is excellent, but the language demands precision. That pays off in safety and performance, but it can be a barrier in fast-moving or exploratory environments.</p><p>Go, by contrast, is all about simplicity and development speed. Its concurrency model with goroutines and channels is approachable and powerful. The garbage collector handles memory management, so you don&#8217;t need to think about it most of the time. 
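</p><p><em>When it does start to matter, the benchmark flags mentioned a moment ago make the cost visible; a minimal sketch:</em></p><pre><code>package work

import (
	"strings"
	"testing"
)

// Run with: go test -bench=Join -benchmem
// The B/op and allocs/op columns then show the per-iteration allocation cost.
func BenchmarkJoin(b *testing.B) {
	parts := []string{"alpha", "beta", "gamma", "delta"}
	b.ReportAllocs()
	for i := 0; i &lt; b.N; i++ {
		_ = strings.Join(parts, ", ")
	}
}
</code></pre><p>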
Go may not match Rust in raw performance for compute-heavy workloads, but its performance is consistent and more than good enough for most system-level use cases.</p><p>That predictability, combined with readability and minimalism, makes Go practical for building high-throughput services, container tools, infrastructure automation, and other backend-heavy systems. It&#8217;s also easier to onboard new developers, and because Go code tends to look the same across teams, long-term maintainability is a real strength.</p><p>On safety, Go doesn&#8217;t give you compile-time guarantees like Rust. It won&#8217;t catch data races at compile time, but it does offer good tools: the race detector, a solid testing framework, and a culture that values clarity and explicitness. Go avoids complexity by design&#8212;no macro-heavy DSLs, no surprising inference&#8212;so the code stays understandable even as systems grow.</p><p>To sum up: Rust is the right tool when performance and safety are top priorities and you&#8217;re ready to invest in upfront complexity. Go shines when development speed, operational simplicity, and long-term maintainability matter more.</p><p>As an example, we once evaluated Rust for a packet inspection engine but chose Go due to faster development time and easier team onboarding.</p><p>In the past few months, I&#8217;ve also had the chance to explore Zig. It sits closer to Rust in terms of low-level control, but it&#8217;s much easier to learn. Zig has no garbage collector&#8212;you manage memory manually&#8212;but it&#8217;s far simpler than Rust. It may be a sweet spot between Go and Rust when you want to go lower without Rust&#8217;s complexity.</p><p><em><strong>14: Looking ahead, what are you most excited about in Go&#8217;s evolution over the next few releases? Where do you see the ecosystem heading?</strong></em></p><p><strong>Mihalis Tsoukalos:</strong> What excites me most is how Go continues to evolve while staying true to its roots&#8212;pragmatic, simple, but increasingly powerful.</p><p>One area I&#8217;m watching closely is the ongoing evolution of generics. Since type parameters were introduced in Go 1.18, each release has built on that foundation, most recently with features like generic type aliases and iterator functions in Go 1.23. These aren&#8217;t flashy changes, but they&#8217;re meaningful. They enable more expressive and reusable code across the ecosystem&#8212;richer data structures, functional-style APIs, and cleaner abstractions in libraries. I look forward to seeing how the standard library and open-source projects embrace these tools to offer more composable, idiomatic patterns without losing Go&#8217;s clarity.</p><p>Performance tuning and runtime observability are also maturing quickly. Built-in fuzzing, profile-guided optimizations, and expanded runtime metrics are pushing Go beyond being just easy to use&#8212;it&#8217;s becoming easy to optimize too. For teams building high-performance systems, this is a big deal. I think profiling and performance tuning will become a routine part of development workflows, just like writing tests.</p><p>Concurrency is another area that keeps evolving. Go has always had a clean concurrency model, but with Go used increasingly in multicore, high-load environments&#8212;APIs, networking layers, real-time systems&#8212;there&#8217;s more attention on scheduler improvements, memory footprint reduction, and smarter resource usage. 
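</p><p><em>One small, concrete example of that direction is the per-iteration loop variable behavior that newer Go releases guarantee:</em></p><pre><code>package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for _, region := range []string{"us-east", "eu-west", "ap-south"} {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Since Go 1.22, each iteration gets its own copy of region, so
			// all three values print; older versions commonly printed the
			// last element three times from a loop written this way.
			fmt.Println(region)
		}()
	}
	wg.Wait()
}
</code></pre><p>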
The recent fix to the goroutine loop variable capture bug is a good example: a small change, but it eliminates a long-standing issue and makes concurrent programming safer without adding complexity.</p><p>Beyond the language, the ecosystem is maturing fast. We&#8217;re seeing better libraries, stronger tooling for testing, static analysis, and cross-compilation, and an overall improved developer experience. Projects like TinyGo, Go Cloud, and Go&#8217;s growing presence in WebAssembly and embedded environments point to a future where Go isn&#8217;t just a server-side language&#8212;it&#8217;s part of a broader portable systems toolkit.</p><p>At the same time, community efforts around formal APIs, versioning best practices, and module proxy infrastructure show that Go is becoming more production-hardened and resilient.</p><p>So in short, I&#8217;m excited that Go is getting more powerful without becoming more complicated. That&#8217;s rare in programming languages. Go is investing in performance, safety, and tooling in a way that feels very Go-like: minimal, orthogonal, and deliberate. The future looks bright because Go isn&#8217;t chasing trends&#8212;it&#8217;s solving real problems with clarity and focus. I think we&#8217;ll see it used in even more places&#8212;cloud, systems, edge, maybe even mobile&#8212;while continuing to be a language teams can rely on for the long haul.</p><div><hr></div><p>To explore the ideas discussed in this conversation&#8212;including concurrency design patterns, profiling techniques, and Go&#8217;s evolving support for generics and fuzzing&#8212;check out <em><strong><a href="https://www.packtpub.com/en-ch/product/mastering-go-9781805122647">Mastering Go, Fourth Edition</a></strong></em> by Mihalis Tsoukalos, available from Packt. This 740-page comprehensive guide dives deep into advanced Go concepts such as RESTful servers, memory management, the garbage collector, TCP/IP, and observability.</p><p>Fully updated with coverage of Go generics, fuzz testing, Docker integration, and performance optimization, the book combines detailed explanations with real-world exercises. Readers build high-performance servers, develop robust command-line utilities, work with JSON and databases, and refine their understanding of Go&#8217;s internals. 
Each chapter is designed to strengthen both conceptual mastery and hands-on practice, from error handling and data types to concurrency, profiling, and advanced testing.</p><p>Whether you&#8217;re building network systems, optimizing cloud-native applications, or simply aiming to deepen your Go expertise, <em>Mastering Go</em> provides a practical foundation for writing professional, production-grade software.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.packtpub.com/en-ch/product/mastering-go-9781805122647"><img src="https://substackcdn.com/image/fetch/$s_!VBef!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84ede4bc-d8fb-4c67-a46b-0855efef6779_2250x2775" alt="Mastering Go" title="Mastering Go"></a></figure></div><p>Here is what some readers have said:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o3Vt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b5d82c7-07a2-4fc2-9ed0-1de68158d324_1067x341.png"><img src="https://substackcdn.com/image/fetch/$s_!o3Vt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b5d82c7-07a2-4fc2-9ed0-1de68158d324_1067x341.png" alt=""></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6xEb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b48c91-d866-4a96-befe-521fea38005c_1073x435.png"><img src="https://substackcdn.com/image/fetch/$s_!6xEb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b48c91-d866-4a96-befe-521fea38005c_1073x435.png" alt=""></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f0qJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13515eea-ec51-4399-b530-4e2f69c3c44c_1072x375.png"><img src="https://substackcdn.com/image/fetch/$s_!f0qJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13515eea-ec51-4399-b530-4e2f69c3c44c_1072x375.png" alt=""></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Designing for Decades: A Conversation with Alexander Kushnir on Longevity, Maintainability, and Embedded Systems at Scale]]></title><description><![CDATA[A MedTech systems engineer unpacks what it means to build software that must survive regulatory cycles, hardware obsolescence, and engineering turnover.]]></description><link>https://deepengineering.substack.com/p/designing-for-decades-a-conversation</link><guid isPermaLink="false">https://deepengineering.substack.com/p/designing-for-decades-a-conversation</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Tue, 12 Aug 2025 11:22:40 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1fe6b484-b1fd-47de-9026-747e11c38330_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In safety-critical domains, code longevity isn&#8217;t a nice-to-have&#8212;it&#8217;s a baseline constraint. Software must coexist with hardware for ten years or more, while withstanding evolving standards, team turnover, and limited upgrade paths. In this Deep Engineering Q&amp;A, we ask industry veteran <strong><a href="https://www.linkedin.com/in/alexander-kushnir-1852454/">Alexander Kushnir</a></strong> about the realities of building and maintaining embedded systems that endure. 
We explore long-term technical debt, the discipline of software rejuvenation, and why modern C++ idioms are reshaping how engineers think about embedded maintainability.</p><p>Alexander Kushnir is a principal software engineer at Johnson &amp; Johnson MedTech, specializing in electrophysiology systems. With about 20 years of experience across medical devices, industrial controllers, and networked embedded platforms, he has worked on everything from motion control firmware and network switches to VoIP and medical device software. His core expertise lies in embedded Linux, modern C++, cross-platform development, and HW/SW integration. He has also built and led a two-day workshop on CMake.</p><p><em><strong>1: How do you approach the challenge of managing architectural technical debt in systems with 10+ year hardware lifecycles, especially in regulated environments where major refactoring or redesign is costly and risky?</strong></em></p><p><strong>Alexander Kushnir:</strong></p><p>Technical debt is actually a real problem. However, we can follow several strategies to mitigate the issue:</p><ol><li><p><strong>Build modular software:</strong> This strategy pays off again and again. It helps us to isolate a specific functionality, which makes the task of &#8220;replacing the wheel in a moving car&#8221; easier.</p></li><li><p><strong>&#8220;Divide and conquer&#8221;:</strong> Separate your application logic from the hardware-dependent logic. You will benefit from being able to run the hardware-independent logic elsewhere (for instance, in a simulator or using software mocks that simulate hardware behavior).</p></li><li><p><strong>Test, test, test:</strong> If you follow the previous advice, you should be able to test the logic on your development PC, not just on your target. Why is that good? You can write and run your unit tests with much shorter cycles (think compiling, loading, and debugging, all on your PC instead of the device).</p></li><li><p><strong>Use industry-standard and up-to-date tools:</strong> Even though it is not a hard requirement, tools keep evolving, and if you fall too far behind, then when you eventually need to investigate an issue in the field, you may find yourself forced to use newer tools you&#8217;ve never worked with&#8212;leaving you at a disadvantage.</p></li></ol><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://deepengineering.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://deepengineering.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p><em><strong>2: What strategies do you use to mitigate hardware obsolescence in long-lived systems?</strong></em></p><p><strong>Alexander Kushnir:</strong></p><p>Of course. It is not exactly my responsibility, but I am in the loop. When designing a hardware platform, the engineer must ensure that the components they choose have long-term support. Having said that, I prefer to use an off-the-shelf System-on-Module (SOM) integrated on a custom board, rather than developing a board with the same CPU (or FPGA) and having to address most basic interfaces such as memory or flash storage during the board bring-up. 
This reduces the complexity of board bring-up and makes it easier to handle hardware obsolescence, because the SOM vendor typically manages low-level design, interface validation, and long-term component sourcing.</p><p><em><strong>3: How do you reconcile the need for regular updates (e.g. for security patches or feature improvements) with the need to minimize disruption and regulatory overhead?</strong></em></p><p><strong>Alexander Kushnir:</strong></p><p>Every change needs to be justified.</p><p>One of the projects I am most proud of was adding a firmware update capability to a device my team was developing.</p><p>However, the regulatory burden remains &#8212; any update that could affect safety or compliance still requires formal review and, if necessary, re-certification. In practice, we minimize disruption by:</p><ul><li><p>Separating safety-critical functions into a stable, validated firmware baseline that is rarely touched.</p></li><li><p>Isolating updatable modules (non-critical logic, UI features, analytics, etc.) so they can evolve without impacting certified components.</p></li><li><p>Using risk-based change management to decide when an update is worth the cost of triggering the regulatory process &#8212; for example, prioritizing security patches and critical bug fixes, while bundling minor enhancements into larger, less frequent releases.</p></li></ul><p>In this way, the need to keep embedded software up to date becomes operationally similar to maintaining conventional PC or cloud-based software, but with the extra discipline required for regulated environments.</p><p><em><strong>4: What architectural patterns help maintain software flexibility in these conditions? For instance, have you used hardware abstraction layers, multi-process architectures, or IPC frameworks to decouple software from specific hardware so you can update or add features without a full redesign? How effective have these methods been in extending the usable life of older platforms in your experience?</strong></em></p><p><strong>Alexander Kushnir:</strong></p><p>Abstract all you can. Whether one is taking the OOP approach (C++, my love), or a procedural one, abstraction and modularity must be applied. A Hardware Abstraction Layer (HAL) is an excellent example of abstraction, as the application logic is not aware of the hardware (for example, the Linux paradigm takes abstraction to the extreme - everything is a file, whether it is a network connection, a hardware device, or a real file - the user reads from and writes to a file).</p><p>A multi-process architecture makes sense when the software has many functionalities, so that if one of them malfunctions, it won&#8217;t affect the others. For instance, once I worked on an infrastructure that included a terminal (CLI), database engine, and several more features. So, if the DB engine crashed, the terminal would continue running unaffected thanks to the isolation between processes.</p><p>Another, trickier use of a multi-process architecture is when a programmer needs to use a GPL-licensed library in a proprietary environment and is not interested in exposing the code. In such a case they can create a process that links with the GPL-licensed library, and communicates with the main software using a well-defined interface such as a pipe, a socket, or shared memory.</p><p>I will repeat myself - abstract all you can. However, you must pay attention to the cost of these abstractions. For example, if you use runtime polymorphism, you&#8217;ll need to profile your virtual dispatches to verify that they create no bottleneck in your critical path.</p><p><em><strong>5: How do you decide what to keep backward compatible versus when to break from legacy constraints? Are there lessons from enduring platforms (for example, the VMEbus standard stayed relevant for 40+ years by emphasizing modularity and backward compatibility) that you apply to provide a clear migration path for long-term customers?</strong></em></p><p><strong>Alexander Kushnir:</strong></p><p>Well, that&#8217;s a tough question. If the device interfaces with the outside world, changing that interface will always be the last priority. However, if changes are inevitable, they can be mitigated. For example, if you think ahead when designing the protocol, you can add versioning so that new features or changes do not affect older generations of devices. In some cases, you can run multiple versions in parallel or provide adapters to bridge old and new systems, giving customers a clear migration path. This approach is similar to what made platforms like VMEbus last for decades&#8212;keep the external contracts stable, design for modularity, and plan for evolution without forcing everyone to upgrade at once.</p><p><em><strong>6: In a system meant to last a decade or more, how do you design for maintainability to slow down software aging? Can you share practices you use to avoid &#8220;bit rot&#8221; that ensure the codebase remains clean and adaptable to new requirements over time?</strong></em></p><p><strong>Alexander Kushnir:</strong></p><p>All principles mentioned in my answer to the first question apply here. You can&#8217;t avoid software aging, as the ecosystem moves quickly. However, if your system is modular enough, the changes can be rolled out gradually, for instance, refactoring module by module, after testing each one thoroughly.</p><p>Additionally, CI tests are a must. I would even say that every pull request should be gated, i.e. a pull request should be merged only if it passes all the tests. Many developers don&#8217;t like writing tests, but as a matter of fact, the tests protect them, and give developers the confidence to make major changes without breaking things.</p><p><em><strong>7: Have you observed issues like memory leaks, data corruption, or performance degradation creeping in over long uptimes in embedded systems? If so, what proactive fault-tolerance techniques do you recommend to address this?</strong></em></p><p><strong>Alexander Kushnir:</strong></p><p>I don&#8217;t believe in regular restarts or &#8220;scheduled maintenance&#8221; where the only action is a reboot. If there&#8217;s a problem like a memory leak, it should be fixed&#8212;not hidden&#8212;especially on a resource-tight device.</p><p>Memory leaks are possible, of course, but they can be avoided. In modern C++, for example, using smart pointers eliminates most manual memory management errors. During development, I also recommend dynamic memory analysis tools such as Valgrind, which is still underrated in pre-release testing. Combined with thorough code reviews and targeted stress tests, these measures catch leaks and other resource issues before deployment, reducing the need for reactive &#8220;rejuvenation&#8221; in the field. </p><p><em><strong>8: What fault-tolerance strategies do you build in to ensure long-term reliability? 
Can you share how you determine the right level of redundancy or self-diagnostic capability for a design that needs to last a decade?</strong></em></p><p><strong>Alexander Kushnir:</strong></p><p>All the systems I&#8217;ve built have interacted with a human at some point&#8212;whether an operator, a technician, or an end user. In such cases, the most practical solution is a periodic health check, or Built-In Test (BIT), that monitors critical components and manages system state when a fault is detected. Typically, this means indicating the issue to the user&#8212;via an LED, buzzer, or display&#8212;so corrective action can be taken.</p><p>The specifics depend on the criticality of the system. For non-safety-critical designs, the goal is early detection and clear reporting so the failure can be fixed before it escalates. For higher-reliability requirements, BIT can be combined with fault isolation, allowing unaffected subsystems to keep running, or with limited redundancy (e.g., a backup sensor or communication path) to maintain partial functionality. The &#8220;right&#8221; level of redundancy or self-diagnostics is always a trade-off between cost, power, size, and the consequences of downtime&#8212;but even in minimal designs, proactive monitoring and clear fault signaling are essential for long-term reliability.</p><p><em><strong>9: How do you ensure that devices you design today can be kept secure 10+ years down the line?</strong></em></p><p><strong>Alexander Kushnir:</strong></p><p>Like I&#8217;ve mentioned before, one of the features I&#8217;m most proud of is the firmware update capability we built into one of the devices I worked on. I think this is a crucial capability&#8212;not just for delivering new functionality, but also for applying OS and security patches over the device&#8217;s entire lifetime.</p><p>To keep a system secure for 10+ years, the update mechanism itself must be secure: signed and verified updates, encrypted transport, and a rollback option in case an update fails. In regulated environments, it also needs to integrate with compliance workflows so updates can be deployed without breaking certification. In some cases, it&#8217;s wise to design for network segmentation or controlled update channels, so that only trusted endpoints can initiate the process. Without this foundation from day one, long-term patching becomes either risky or impossible.</p><p><em><strong>10: Are there insights or practices&#8212;whether from automotive, avionics, or industrial IoT&#8212;that you find relevant or transferable to your work? Are there philosophies or practices from other domains that you think MedTech could borrow&#8212;or should avoid?</strong></em></p><p><strong>Alexander Kushnir:</strong></p><p>I think the processes in MedTech are good, but slow. Code review, documentation, testing&#8212;these all have a clear purpose, and they exist for good reasons. But no process has to be sanctified. Code review isn&#8217;t done just because &#8220;that&#8217;s the rule&#8221;; it&#8217;s done to catch defects and improve design. The same goes for documentation and tests&#8212;they&#8217;re tools, not rituals.</p><p>That&#8217;s something I see in other industries as well. Automotive has learned to speed up iterations without skipping the essentials, especially with OTA updates. Avionics shows how you can lock down safety-critical code while still evolving peripheral systems. 
From these, I think MedTech can borrow the idea of tailoring process intensity to the context&#8212;keeping rigorous control where safety demands it, but streamlining where it doesn&#8217;t. The key is to always ask: how crucial is this step at the stage we&#8217;re in right now?</p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://deepengineering.substack.com/p/designing-for-decades-a-conversation?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://deepengineering.substack.com/p/designing-for-decades-a-conversation?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Building Emergency-Ready AI: A Conversation with Tony Dunsworth]]></title><description><![CDATA[How open models, synthetic data, and lightweight infrastructure are enabling secure, cost-effective AI for 911 systems and public safety operations.]]></description><link>https://deepengineering.substack.com/p/building-emergency-ready-ai-a-conversation</link><guid isPermaLink="false">https://deepengineering.substack.com/p/building-emergency-ready-ai-a-conversation</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Wed, 06 Aug 2025 11:12:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/aSF99x_lk-I" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For engineers building AI systems in production, scale isn't always the hard part&#8212;constraints are. In public safety, the challenge isn&#8217;t how to train the biggest model, but how to make a small one reliable. You&#8217;re dealing with scarce compute, strict privacy rules, and zero tolerance for downtime&#8212;while still trying to deliver tools that make a difference.</p><p>In this conversation, <em>Deep Engineering</em> speaks with <strong><a href="https://www.linkedin.com/in/tonyrdunsworth/">Tony Dunsworth</a></strong>, a Database Administrator and Data Scientist with the City of Alexandria, where he helps shape the data backbone of 911 emergency response systems. With a Ph.D. in data science and over 15 years of operational experience, Tony has worked across the full data lifecycle&#8212;engineering, database administration, applied analytics, and forecasting&#8212;in some of the most mission-critical environments in government.</p><p>His recent work focuses on deploying AI with limited resources: leveraging open models, running on commodity hardware, and building reliable systems without exposing sensitive public safety data. In this interview, he shares practical strategies for architecting cost-effective AI tools, using synthetic data to train and validate models securely, and maintaining performance under stress in high-stakes domains. 
From real-world lessons to emerging trends in multilingual AI assistance, Tony brings a rare combination of technical depth and civic-minded design.</p><p>You can watch the full conversation below&#8212;or read on for the complete transcript.</p><div id="youtube2-aSF99x_lk-I" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;aSF99x_lk-I&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/aSF99x_lk-I?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://deepengineering.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://deepengineering.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>Architecting AI Solutions on a Tight Budget</h1><p><em><strong>1: Public safety agencies often have minimal tech budgets and few specialist staff. For example, roughly 75% of U.S. 911 centers run with only two to five people on duty and limited resources. Given these constraints, how do you identify and implement AI solutions that are both useful and affordable?</strong></em></p><p><strong>Tony Dunsworth:</strong> I think centers have to find their own specific pain points. Each one has its own type of immediate thing that they feel they need, and there are a lot of companies that provide solutions and focus on that space.</p><p>One of the biggest spaces right now is intercepting non-emergency calls. In the United States and in Canada, when you call 911 or you call your local non-emergency line, you're going to the same pool of telecommunicators. So one of the things that we're trying to do is figure out how we can take some of those calls&#8212;especially for non-emergency services&#8212;and find ways to reroute them so they don't get through. If we have fewer calls coming into the center, having fewer telecommunicators isn't as big a deal.</p><p>My dissertation, for example, identified different libraries&#8212;LLM-based or neural network-based libraries&#8212;that could be deployed with minimal technical resources. If you just knew how to write five or six lines of Python code and had someone show you how to write the right import statements, you could get those forecasts that you needed.</p><p>There are other products&#8212;like Blue Sky Analytics, which is an R-based analytics platform&#8212;that are drag-and-drop or menu-driven. You don't have to know as much about how to program in R. As long as you can tell it, &#8220;Here's the variable I want, and I want the summary of it,&#8221; it can produce that for you in two clicks. So you don't have to have the same programming logic. A lot of it is: where do you identify what you need first, and what&#8217;s out there that you can use to leverage your budget in the best way possible?</p><p><em><strong>2: Can you share an example of where a specific budget-friendly analytics or ML tool can make a difference in emergency operations?</strong></em></p><p><strong>Tony Dunsworth:</strong> Blue Sky&#8212;because I've recommended it to some of my colleagues before. 
You can load a dataset and then, just by clicking through menus, tell it what you're looking for. You tell it, &#8220;I want a bar graph of the number of calls that we&#8217;ve taken in this time period,&#8221; like, say, &#8220;I want to know how many calls we&#8217;ve taken in a month.&#8221; You can see how that goes.</p><p>Or I can say, &#8220;How many traffic stops have we executed in the last three weeks, by week?&#8221; And then look at that even further and say, &#8220;Tell me where they&#8217;re located.&#8221; And through a couple of clicks and a little bit of looking through the tech, you can easily get those pictures. You can see a better profile of your data, so you can see what your data is telling you.</p><p><em><strong>3: Over the past year or so, open-source AI models have quickly narrowed the performance gap against proprietary giants, while also offering advantages in cost control and security. To what extent has this trend influenced your approach to building AI systems? And what benefits or trade-offs have you observed while doing so?</strong></em></p><p><strong>Tony Dunsworth:</strong> I've built two AI labs. I've built one personally that I use&#8212;that I'd say is a little more forward-thinking, a little more pushing the envelope. And then I've built one in our environment at work that I'm more conservative with. But leveraging both of those gives me better control, like you said, over cost. So I'm not accidentally running up a large bill with some model provider because I didn't realize I'd run out of tokens, or I didn't realize that I had pushed the model to limits it wasn't expected to go to.</p><p>And I try to be creative. For example, I may use models in my personal lab that I can't use at the office due to some underlying security concern. But it gives me a focus on what I can work with.</p><p>The biggest trade-off is understanding that the speed is going to be a lot slower. Even with my lab having 24 gigs of RAM, or my office lab having 32 gigs of RAM, they are still noticeably slower than if I'm using an off-site LLM to do similar tasks. So you have to model your trade-off, because I have to also look at what kind of data I'm using&#8212;so that I'm not putting protected health information or criminal justice information out into an area where it doesn't belong and where it could be used for other purposes. So the on-premises local models are more appealing for me because I can do more with them&#8212;I don't have the same concern about the data going out of the networks.</p><p><em><strong>4: Using open models internally can help with data privacy and cost, and these models can also be optimized to run efficiently on local hardware, letting you keep sensitive data in-house and save on cloud costs. Have you already taken advantage of such approaches in your work? For example, have you been able to fine-tune smaller open models or use lightweight ML frameworks on-premise to deploy AI without requiring too much high-end hardware or high cloud bills?</strong></em></p><p><strong>Tony Dunsworth:</strong> Yeah. Right now, I've been doing more of the work with lightweight or smaller LLM models because they're easier to get ramped up with. 
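</p><p><em>For readers who would rather express those summaries in code than menu clicks, a minimal pandas sketch of the same two questions follows; the file name and the call_time/call_type columns are hypothetical and would need to match your own CAD export.</em></p><pre><code>import pandas as pd

# Hypothetical CAD export; adjust the file and column names to your data.
calls = pd.read_csv("cad_export.csv", parse_dates=["call_time"])

# How many calls were taken each month.
print(calls["call_time"].dt.to_period("M").value_counts().sort_index())

# Traffic stops over the last three weeks, grouped by week.
cutoff = calls["call_time"].max() - pd.Timedelta(weeks=3)
recent = calls[calls["call_time"] >= cutoff]
stops = recent[recent["call_type"] == "TRAFFIC STOP"]
print(stops["call_time"].dt.to_period("W").value_counts().sort_index())
</code></pre><p>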
They're easier to go ahead and get something that I can put into a test lab and expose to people and say, &#8220;OK, let me know what this is doing for you&#8212;size-wise, resource-wise, token-wise.&#8221;</p><p>I'm interested in working on quantizing models a little bit better so that I can try to use larger models and have a bit more experience with some of the larger models, because some of them are very promising. I have to be cognizant of my environment, knowing that I definitely have some restrictions, but I&#8217;ve also got to work on closing my own knowledge gap.</p><p>I lean on published resources&#8212;I have a subscription&#8212;so I have access to a lot of tools and materials that help. And I use other tutorials along the way so that I can get better at handling that type of programming. I'm relatively new to it, so I'm learning as I go along how to better leverage my own engineering skills to improve the performance of those LLM models so that I can do more with them.</p><p><em><strong>5: Can you briefly tell us what you mean by quantizing as a way of optimizing?</strong></em></p><p><strong>Tony Dunsworth:</strong> Quantizing is a way to optimize an LLM by making it work more efficiently with fewer resources&#8212;less memory consumption. It doesn&#8217;t leverage all of the parameters at once, so it&#8217;s able to package things a little bit better. It packages your requests and the tokens a little more efficiently so that the model can work a little faster and return your responses&#8212;or return your data&#8212;a little quicker to you, so that you can be more interactive with it.</p><p><em><strong>6: Even as open-source AI becomes more capable, experts still talk about challenges around scalability and enterprise-grade reliability for these tools. What kind of obstacles or failures have you encountered so far when employing open-source or low-cost solutions in a mission-critical setting?</strong></em></p><p><strong>Tony Dunsworth:</strong> The biggest challenge is the resource base. Unfortunately, none of us have infinite resources&#8212;I wish we did. So running into a situation where a model bogs down because I'm pushing the model hard for something, and at the same time I'm pushing the resources the model's running on very hard&#8212;it gets frustrating.</p><p>And that's one of the reasons I'm trying my hand at learning how to quantize models in my personal laboratory, because I recognize that I've got to make my work more responsive to my users&#8212;or it&#8217;s not worth it. That&#8217;s one of the biggest frustrations: when I can get it working reasonably well for me, and then I turn it loose to a test group and it seems like it doesn't work at all for them. You feel like you've put all this time into it, and it's not doing the things you promised it would do.</p><p>The other big challenge is upkeep. That is: do I reevaluate my models? How do I retrain? Do I bring in a new model that uses fewer resources and retrain it&#8212;and go through all that process again? It&#8217;s about continuously reevaluating my work and the work of the model behind it. How is that working? How do those fit together? Am I stressing my resources? Am I actually getting the coding right to deliver what I'm intending?</p><p>And I have to make sure I move my data&#8212;and use my data&#8212;in the best way possible to augment what I can do with the model. So there are always a few challenges. 
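</p><p><em>For context on the quantization discussion above: in practice it usually means storing a model&#8217;s weights at reduced numeric precision so the model fits in less memory and responds faster. A minimal sketch of loading an open model in 4-bit form, assuming the Hugging Face transformers and bitsandbytes stack on a CUDA-capable machine; the model name is a placeholder.</em></p><pre><code>import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "example-org/example-7b"  # placeholder; substitute the open model you actually use

# Store weights in 4-bit form to cut memory use; compute in float16.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

prompt = "Summarize last week's call volume trends."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
</code></pre><p>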
It&#8217;s not really enough to build it&#8212;I&#8217;ve got to take care of it too. And I have to plan for upgrades and enhancements, because once the users get ahold of it, they think of things that I haven't thought of. They say, &#8220;Well, can you get it to do this?&#8221; And it becomes a whole iterative process: how do I take those and fold them into the next round?</p><p><em><strong>7: What are some strategies to ensure that an inexpensive solution is still reliable and maintainable? For example, that a prediction model will perform accurately under real emergency conditions&#8212;or that a homegrown tool will be supported in the long term?</strong></em></p><p><strong>Tony Dunsworth:</strong> The &#8220;supported in the long term&#8221; part is an educational experience. Right now, within the city, I'm probably at the vanguard of programming AI models. We've adopted some external software that gives us some AI interaction, as it stands right now. But as we push into what we can do, I'm starting to write more of the scripts and programs myself.</p><p>The challenge is finding the colleagues who are interested, and getting the time to work with them to show what I've done, listen to their experiences, see what they're doing or what they're interested in&#8212;and then putting that together so we can start building a game plan that advances it a little farther. And that&#8217;s always a challenge, because we all have our day jobs, and this isn't necessarily our immediate core function.</p><p>So it&#8217;s about finding that time in between&#8212;and how do we work it into our core functions so that we can spend more time building new things that we can then show to the city and say, &#8220;These are the things we can do. And if you give us a few more resources, or more time, or more opportunity, we can do even more.&#8221;</p><h1>Synthetic Data for High-Stakes Domains</h1><p><em><strong>8: In high-stakes domains like emergency services, real datasets&#8212;like 911 logs&#8212;are often sensitive, proprietary, or too scarce to use freely for AI development. How can synthetic data, that is, artificially generated datasets, address this problem? What strategies do you currently use to generate synthetic emergency call data that preserves the important real-world patterns? </strong></em></p><p><strong>Tony Dunsworth:</strong> I started out a long time ago working with a commercial product, and one of the things they suggested was to take a real dataset and take it apart. Because I&#8217;m also the analyst who works with that data, I take it apart and find the things I need to see in it. For example, our center is a combined center, so it answers for police service, fire service, and medical services. I look at the ratio of the calls we receive so that when I make that dataset, it reflects those ratios properly.</p><p>Or I look at the reports I currently generate that I want to transition to maybe AI generation, or I want to be able to automate in some fashion. I look at that data. So I use real data as a seed&#8212;insofar as I find details about the data that I then want to recreate. And for me, it&#8217;s a lot of statistical recreation.</p><p>The generator I&#8217;m working on now is all Python. It&#8217;s all open-source scripts. 
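</p><p><em>A minimal sketch of the statistical-recreation approach described here: seed the generator with ratios and distribution parameters measured from real data, then emit synthetic rows. The service mix, distribution parameters, and field names below are invented for illustration.</em></p><pre><code>import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_calls = 1_000

# Ratios and parameters would be measured from the real dataset; these are placeholders.
service_mix = {"police": 0.60, "fire": 0.15, "ems": 0.25}
services = rng.choice(list(service_mix), size=n_calls, p=list(service_mix.values()))

# Elapsed seconds between events modeled as log-normal rather than normal.
queue_to_dispatch = rng.lognormal(mean=3.2, sigma=0.6, size=n_calls)

synthetic = pd.DataFrame({
    "service": services,
    "queue_to_dispatch_s": np.round(queue_to_dispatch, 1),
})
print(synthetic.head())
print(synthetic["service"].value_counts(normalize=True))
</code></pre><p>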
But now that I have that down fairly well, I can feed that into an AI model and say, &#8220;OK, I need you to examine this.&#8221; This is where I make a lot of progress: I set it in my local model and say, &#8220;OK, this is the generator. What can we do with the data that comes out of it?&#8221; I feed it prompts and questions and say, &#8220;Let&#8217;s massage the model a little bit more so you can generate that dataset based off these parameters.&#8221;</p><p>And I&#8217;ve gotten some really interesting results&#8212;fairly positive. I build a lot of it, and then I pass it off to my local models to refine what I&#8217;m working on and make it a little better. Between the two, it's made it so that I can generate fairly realistic datasets that I can use in many different places.</p><p>I use it in testing my analytics models because I can do an analysis of that dataset really fast&#8212;I know what I&#8217;m looking for&#8212;and then I can have my model do the same thing. I make sure that they match. If it tells me something I didn&#8217;t see, I go back and recreate it: is it there, or is it something I need to work on because maybe my model isn&#8217;t as accurate as I thought?</p><p>One of the biggest challenges is, when I first started building datasets, a lot of my elapsed times between events&#8212;because we calculate how many seconds it takes between one event and another&#8212;were all normally distributed. In real data, it may be exponentially distributed, log-normal, Poisson, or a gamma distribution. So I check. I run tests against synthetic data to say, &#8220;OK, I just want the numeric variables. I don&#8217;t want the header. I don&#8217;t want anything else&#8212;just the numeric variable.&#8221;</p><p>I&#8217;ve written a script that analyzes the distribution and compares it. Then it tells me, &#8220;Here&#8217;s the distribution, here are its details.&#8221; And I feed that back into the model and say, &#8220;OK, make sure you&#8217;re using this distribution with these parameters when you generate these elapsed times between events.&#8221;</p><p><em><strong>9: Government and industry groups have started emphasizing that synthetic data is a key privacy-preserving technique. For example, a 2023 federal strategy noted that synthetic data can unlock the beneficial power of data analysis while protecting privacy. Yet it also noted adoption has been slow due to limited awareness, lack of standards, and concerns about quality. In your experience so far, what are the biggest hurdles to using synthetic data in practice?</strong></em></p><p><strong>Tony Dunsworth:</strong> The biggest hurdle was pushback from peers at first. When you start presenting synthetic data&#8212;well, the reason I started using it for presentations and for instruction was that I didn&#8217;t want to lose my audience if they recognized the events. So when I explained to them, &#8220;This is all synthetic data,&#8221; they kind of looked at it like, &#8220;Well, it looks correct.&#8221;</p><p>We started taking it apart, and that&#8217;s when we found the challenges I identified earlier&#8212;like the normal distribution showing up in places where it shouldn&#8217;t. 
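</p><p><em>A sketch of the kind of distribution check described above: fit a few candidate distributions to a numeric column of the synthetic output and compare the goodness of fit with SciPy. The file and column names are illustrative.</em></p><pre><code>import pandas as pd
from scipy import stats

col = pd.read_csv("synthetic_calls.csv")["queue_to_dispatch_s"].dropna()

# Fit candidate distributions and compare them with a Kolmogorov-Smirnov test;
# a lower statistic means the fitted distribution tracks the data more closely.
candidates = {"norm": stats.norm, "lognorm": stats.lognorm, "gamma": stats.gamma, "expon": stats.expon}
for name, dist in candidates.items():
    params = dist.fit(col)
    result = stats.kstest(col, name, args=params)
    print(f"{name:8s} ks_stat={result.statistic:.4f} p={result.pvalue:.4f}")
</code></pre><p>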
But as I&#8217;ve iterated on and refined the generators over time&#8212;the way I do it&#8212;the closer I bring it to a real dataset, the more they see, &#8220;Oh wait a minute, can I use the generator so that I can give a presentation without exposing real data?&#8221; And I explain how to use the generator, what the data output&#8217;s going to look like, how to do things with it.</p><p>Now people are more interested. Just a couple of weeks ago, I had a company say, &#8220;Hey, have you thought about using that generator for this purpose?&#8221; I hadn&#8217;t, but I said, &#8220;You guys can clone it&#8212;it&#8217;s a public repository. Clone it and start working with it.&#8221; I keep it under an open-source license. Just give me the improvements back&#8212;that&#8217;s the only rule.</p><p>I&#8217;m starting to see more people interested in using synthetic data so we can preserve privacy while still advancing what we&#8217;re doing. A lot of what I&#8217;m seeing now is in training, in teaching, and especially in testing&#8212;because if we can generate a dataset, or even just the pieces we need, we can plug that into the application to test the output.</p><p><em><strong>10: Have you faced any challenges when it comes to validating models trained on synthetic data and how they will perform on real-world data?</strong></em></p><p><strong>Tony Dunsworth:</strong> Thankfully, not yet. Actually, I&#8217;ve been pretty successful. I think the more realistic I&#8217;ve made my datasets, the better I&#8217;ve seen the models perform.</p><p>When I first started&#8212;yeah&#8212;I got some really weird output, especially when trying to build models that look at data analytics. One of the things we want to do is improve the use of analytics in centers. I want to make it easier for analysts, or for people, to analyze data. And I found that because my models weren&#8217;t as realistic as I thought they were, the output obviously wasn&#8217;t as accurate as I&#8217;d like it to be.</p><p>So I found programmer challenges more than application challenges, because it was the old saying: garbage in, garbage out. And when I was giving it certain types of data, it gave me exactly what I would expect&#8212;but it wasn&#8217;t what I needed. Seeing how the output wasn&#8217;t what I was looking for helped me refine my models and refine my overall analytics goals.</p><p><em><strong>11: Like you&#8217;ve mentioned before, when training staff or demoing new tools, you never use real 911 call data&#8212;only synthetic data&#8212;to avoid exposing sensitive information. And you've talked about how you've dealt with the challenge of skepticism within your team or the people you work with. But when it comes specifically to trainees&#8212;those just starting off&#8212;or decision-makers, how have you found their reception to the idea of using synthetic scenarios for learning?</strong></em></p><p><strong>Tony Dunsworth:</strong> For the people I'm training, once you get over the initial explanation of why we&#8217;re using synthetic data, they&#8217;re usually on board with it. Decision-makers are a little more cautious. They&#8217;ll say, &#8220;Well, I want to make sure this is as realistic as possible,&#8221; because if we&#8217;re going to do this, we need that.</p><p>Over time, as you show them what you're working on and how it comes out, they&#8217;re the ones who&#8212;once they come on board&#8212;advocate as strongly or more strongly than I do. 
So that initial resistance melts away. But usually, for folks&#8212;especially when I&#8217;m training analytics&#8212;for them it&#8217;s like, &#8220;OK, you want to see some of the nastiest things we can put to you, to see what you can do and how well you perform?&#8221; All I have to do is change a couple of things in the model and let them go.</p><p>And they're like, &#8220;Hey, this is great,&#8221; because now they can focus on a solution. They can focus on a specific area that they want to work on or need help with. And I can make sure the data gives them that challenge, as opposed to waiting for events to line up correctly with real data. So it becomes a lot easier, especially in training&#8212;when you're training programmers or analysts&#8212;because you can really hone in on what you want them to focus on.</p><p><strong>12: What safeguards or best practices would you say you follow to ensure that nothing about the synthetic data inadvertently results in a privacy violation&#8212;or duplicates an actual incident in an identifiable way?</strong></p><p><strong>Tony Dunsworth:</strong> I do two different things. One, I follow a lot of the ethical guidelines I was taught. I was very fortunate throughout my education&#8212;through my software engineering courses, my analytics and data science courses at university&#8212;that ethics was stressed as one of the most important things we needed to focus on alongside practice. So I have very solid ethical programming training.</p><p>I studied for the Certified Ethical Hacker exam years and years ago and took the practice exams and passed them. So I have a good background in knowing how to do things in an ethical way, and I keep a hold of that.</p><p>The other thing I do is I'm careful about where I pull seed data from. I'm careful about what kind of data I'm reproducing. For example, I don't reproduce narratives that come with calls&#8212;because they&#8217;d be too close. They&#8217;d be too close to maybe a triggering event or something that could come out looking like actual incident data. So when I'm working with tabular data, I'm very narrow and specific about what I do.</p><p>I also double-check when I get output from&#8212;say&#8212;a Faker library to generate names. I go through my own list of people I know to make sure that name doesn&#8217;t show up as someone I might know. Because, yeah, John Smith is fairly standard&#8212;but if I got a &#8220;Tony Dunsworth&#8221; accidentally generated from the Faker library, I&#8217;d say, &#8220;Whoa, wait a minute&#8212;we&#8217;re going to redo this.&#8221; I don&#8217;t want to see my name in there because now it may be too close.</p><p>The other thing is, I've started reviewing different frameworks&#8212;like the NIST framework we talked about in earlier conversations&#8212;so I can make sure I'm incorporating even more safeguards and rails into my practice. That way I can guide my development processes more efficiently to maintain ethical standards, maintain privacy standards, and make sure I'm doing everything as tightly as possible.</p><p>I&#8217;ve talked to my colleagues in the private sector, and I see I&#8217;m not the only one doing these things. My colleagues in the private sector&#8212;especially in the AI sphere&#8212;are doing the same. 
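</p><p><em>One way to automate the name check described above, assuming the Python Faker package; the blocklist entries are placeholders for the names you would actually screen out.</em></p><pre><code>from faker import Faker

fake = Faker()
# Names of real people you know or work with; regenerate any accidental match.
blocklist = {"tony dunsworth", "jane example"}

def safe_name() -> str:
    name = fake.name()
    while name.lower() in blocklist:
        name = fake.name()
    return name

print([safe_name() for _ in range(5)])
</code></pre><p>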
So I feel confident that the people I work with, whether it's public, private, or within my organization, are all operating from the same place: protecting as much information as we can, leveraging the metadata as best we can, but making sure we're not exposing anything that we shouldn&#8217;t.</p><h1>Risks, Lessons, and Future Directions in Public Safety AI</h1><p><em><strong>13: Deploying AI in emergency services carries unique risks&#8212;from technical failures to ethical concerns&#8212;but also promises new capabilities. Regardless, introducing AI into emergency response is undeniably a high-stakes initiative. Lives can be affected by a bad prediction or a system failure. How do you approach risk management for AI tools in such a critical domain? You already talked about NIST&#8217;s AI Risk Management Framework, specifically when it came to privacy, but what about other kinds of risks?</strong></em></p><p><strong>Tony Dunsworth:</strong> One of the biggest risk mitigations is starting out from the beginning&#8212;knowing what you want to use AI for and how you'll define whether it&#8217;s working well. How do you handle the data in transit? What data sources are you exposing to the AI, and what is it going to do with that data? Are you feeding that data back into model training at the same time? Is that data staying within your organization, or is it leaving?</p><p>For example, even if I build an AI in my organization, if I'm leveraging an off-site model, am I feeding that data back to that model? What data am I feeding, and how do I work with that vendor to make sure the data is being used the way I intend?</p><p>I prefer to keep my training in-house. That way, if I'm training my model off my own data to improve its accuracy, that doesn't leave my organization. But that&#8217;s a personal preference&#8212;I don&#8217;t want to expose that risk.</p><p>So a lot of it comes down to: How do you define success? What are you using it for? If you're using it just to say, &#8220;Well, we're using AI,&#8221; I'm going to be the first one to raise my hand and say, &#8220;Stop. We're not going to do it just for the sake of doing it.&#8221; But if we have a defined use case and a measurable success rate, that changes things. For example, we're using AI in our quality assurance, and it&#8217;s enabled our QA manager to process more calls, which is improving our ability to serve our community. And we can see noticeable improvements.</p><p>Now we know that the AI assistance is paying off. It's doing what we want it to do. And we know that the risk-reward of our data management and governance is providing a positive result. So we continue with it.</p><p>A lot of that is what we have to do&#8212;find that trade-off. What do we want it to do? How do we define if it&#8217;s doing it? And how do we handle the data in between?</p><p>We know that data is going to contain sensitive information. In this case, we know the data contains sensitive information, but we also know the company we purchased the software from showed us how they protect that data. We found that their protection was FedRAMP grade, CJIS certified&#8212;it had all the protections built in that protected us not only legally but ethically. So we knew what they were doing with our data.</p><p>That's one thing I&#8217;m very strident about: asking our vendors, &#8220;How do you protect our data? How do you use our data?&#8221; And teaching the executives how to ask those questions&#8212;what questions are important and how to ask them. 
Whether they&#8217;re negotiating with me or with an outside vendor, they need to know how to get to the core: How do we protect our data while still getting the most use out of it?</p><p><em><strong>14: If you had to create a list of recommended steps, what would you say are the steps to test and validate a model to ensure it wouldn't fail in the middle of a 911 operation?</strong></em></p><p><strong>Tony Dunsworth:</strong> I always recommend stress testing. Get synthetic data together to test it&#8212;and then just, in the middle of your testing lab, hit it all at once. Hit it with everything you've got, all at the same time. Stress it. Make it work really hard and see how it performs.</p><p>If you see it bogging down&#8212;if there&#8217;s a slowdown&#8212;is the slowdown still acceptable? Are you still getting enough back that you can continue in time and see that it&#8217;s giving you accurate responses? That's the first step of testing.</p><p>The second step: stress it again. Make it do everything all at once. And if it continues to perform well, then you have some confidence that when you&#8217;re deploying it in a situation where the wheels have fallen off and everything happens at once, it&#8217;s still going to be reliable enough for you to continue to operate.</p><p>We know that, with the nature of our business, anything can break at any second. So as long as we know it can handle some of that stress, and we can overstress it in test and it still works well enough, then we feel confident that&#8212;even if it breaks&#8212;it may break in a way we hadn&#8217;t anticipated, but we know it can still recover and come back to service quickly.</p><p><em><strong>15: Can you share a lesson from a project that didn&#8217;t go as expected&#8212;perhaps an AI model that initially underperformed or a data initiative that faced resistance?</strong></em></p><p><strong>Tony Dunsworth:</strong> Yeah. I started building an AI-assisted analytics platform, and I thought I had gotten all of my stuff right. I started in testing&#8212;the first couple of tests worked well&#8212;and then I fed it a dataset that was a little more challenging than the ones I had used before, and it threw up. For lack of a better descriptor, it just said, &#8220;I got nothing.&#8221; It worked for 20 minutes on what I thought was a basic problem and then said, &#8220;I cannot create a solution.&#8221;</p><p>Unfortunately, I did that stress test in front of management.</p><p>Yeah. Lesson learned. I went back. I&#8217;m back at the drawing board. I threw everything out and started redoing it, because I found it was also overly complex. What seemed natural and normal to me&#8212;when I asked people to poke at it and look into it&#8212;they couldn&#8217;t get from Step 1 to Step 3, because Step 2 wasn&#8217;t obvious.</p><p>So I took all that feedback in and said, &#8220;We&#8217;re going to throw this codebase away. We&#8217;re going to start with a new codebase, but we&#8217;re going to start with engineering it in a different way.&#8221; And I&#8217;m in the middle of that process now. I&#8217;m really hopeful. I&#8217;m really optimistic that I have the workflow defined a lot better.</p><p>So what I&#8217;m looking at is not one global solution like I wanted, but micro-solutions that will do different things inside the same framework. 
And I think that will work a lot better.</p><p><em><strong>16: If you had to pin down the most important lesson you learned from that experience, what would it be&#8212;and how has it impacted your practice moving forward?</strong></em></p><p><strong>Tony Dunsworth:</strong> Better user feedback.</p><p>Because I&#8217;m a trained analyst&#8212;I have multiple degrees in it&#8212;I assumed I could develop an analytics flow that would work for everybody. I learned really quickly: it worked well for me, but it didn&#8217;t work well for my target audience. And I failed to take my own target audience into account. I assumed they&#8217;d look at my workflow and say, &#8220;Oh, that&#8217;s the right one.&#8221; But that wasn&#8217;t what they wanted. What they wanted was a different workflow.</p><p>I learned that lesson the hard way&#8212;that the workflow didn&#8217;t work for them. So back to the drawing board.</p><p>Now, I reach out more often to power users&#8212;because really, they&#8217;re my user base&#8212;and ask: &#8220;What do you want to see? Do you want to see this? Will this work for you?&#8221; I pitch ideas, get ideas back, and that&#8217;s building a better workflow.</p><p>Ultimately, that&#8217;s the core of it. It&#8217;s the same as in any software engineering: if your users aren&#8217;t going to be comfortable using it, it doesn&#8217;t matter how many bells and whistles it has. It doesn&#8217;t matter how great it works. If they&#8217;re not going to use it&#8212;it doesn&#8217;t work.</p><p><em><strong>17: Looking ahead, what developments are you personally most excited about at the intersection of AI and public safety?</strong></em></p><p><strong>Tony Dunsworth:</strong> I&#8217;m excited about a couple of different things. First and foremost, there&#8217;s a lot of focus in the vendor community on bringing AI to non-emergency calls so that we can intercept them and take care of them&#8212;without the telecommunicators manning our emergency lines being the ones who have to handle them.</p><p>Now we&#8217;re starting to integrate more of our&#8212;here in the United States&#8212;311 program, where it&#8217;s things like: there&#8217;s a pothole in the neighborhood, or the trash pickup didn&#8217;t happen. In the past, people would call those into the emergency lines. Now we&#8217;re fusing those together so that if someone calls in, it can be routed to the right place&#8212;whether it&#8217;s 311 staff or even a different department in the government structure that&#8217;s more suited to handle it.</p><p>So we&#8217;re providing more efficient service to the public and reducing the volume that our call-takers handle. It really is a community win-win, because now we can get those services out faster.</p><p>The other thing is: my first degree many years ago was in linguistics. So I&#8217;m excited about finding better solutions to handle multiple languages. In the city I work in, we publish our city documentation in four languages: English, Spanish, Amharic, and I believe it&#8217;s Sudanese Arabic. Because those are our major population groups.</p><p>In another community, maybe they need Vietnamese or Korean or Chinese&#8212;or French, in Canada. If we can improve the quality and speed of translation, we won&#8217;t have to put the caller on hold to grab a language line and an interpreter. We can get that mobile response out to someone in an emergency several seconds faster.</p><p>That can save a life. That can get someone to medical treatment faster. 
That can calm a situation down faster. If we can do that, we&#8217;re benefiting our community and taking stress off of our telecommunicators at the same time.</p><p>Those are the most promising things to me&#8212;how we can do those things more efficiently, more quickly, and with greater benefit to our communities.</p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://deepengineering.substack.com/p/building-emergency-ready-ai-a-conversation?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://deepengineering.substack.com/p/building-emergency-ready-ai-a-conversation?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Inside LLVM Backends: A Conversation with Quentin Colombet]]></title><description><![CDATA[From instruction selection to MLIR: how modern compilers generate code&#8212;and what&#8217;s next for LLVM backend development.]]></description><link>https://deepengineering.substack.com/p/inside-llvm-backends-a-conversation</link><guid isPermaLink="false">https://deepengineering.substack.com/p/inside-llvm-backends-a-conversation</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Wed, 30 Jul 2025 13:31:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/jeRwNETuffE" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>From instruction selection and register allocation to TableGen quirks and machine learning integrations, backend development in LLVM requires a unique blend of precision, modularity, and deep architectural insight. In this conversation, we speak with Quentin Colombet&#8212;author of <em><strong><a href="https://www.packtpub.com/en-us/product/llvm-code-generation-9781835462577">LLVM Code Generation</a></strong></em> (Packt, 2025)&#8212;about what it takes to build and extend modern backends in one of the most widely adopted compiler infrastructures in the world.</p><p>Quentin is a veteran LLVM contributor with over two decades of experience in compiler backend development. He&#8217;s the architect of GlobalISel&#8212;LLVM&#8217;s modern instruction selection framework&#8212;and serves as code owner for LLVM&#8217;s register allocators. Since joining Apple in 2012, he&#8217;s contributed to backend support for x86, AArch64, and Apple GPUs, and has worked across a wide range of architectures including microcontrollers, DSPs, and ASICs. Beyond his technical leadership, Quentin is also a long-time mentor to new contributors in the LLVM community.</p><p>In this interview, we cover the motivation behind his book, how to balance onboarding content with advanced internals, and the design rationale for GlobalISel&#8217;s modular pipeline. We also dig into backend portability, the realities of debugging codegen issues, and why MLIR and machine learning are reshaping how developers think about compiler design&#8212;from handwritten passes to auto-derived heuristics. 
Whether you're writing a new target from scratch or trying to understand how instruction legalization actually works, Quentin offers a candid and deeply practical guide to the LLVM backend.</p><p>You can watch the full conversation below&#8212;or read on for the complete transcript.</p><div id="youtube2-jeRwNETuffE" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;jeRwNETuffE&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/jeRwNETuffE?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><p><em><strong>Q1: What motivated you to write LLVM Code Generation, and how does it address the existing gaps in LLVM documentation, particularly concerning back end development?</strong></em></p><p><strong>Quentin Colombet:</strong> That's a great question. Part of the answer is already in the introduction you gave. I've been mentoring people ramping up on LLVM for a long time, and one of my motivations was that I'm kind of a lazy guy&#8212;I don&#8217;t want to repeat myself a thousand times. I wanted to put into writing whatever is needed to ramp up in the LLVM ecosystem.</p><p>I'm exaggerating, of course, because writing a book is a lot of work, but that was the primary motivation: to give people resources to get started with LLVM. This is something we've discussed a lot within the community, especially around back end development, where there&#8217;s a big need because that part of LLVM is not well documented.</p><p>People have to make a significant effort to get into that part of LLVM, and when they finally do, it's like, &#8220;Well, now I know it, why would I bother explaining it or writing it down?&#8221; That&#8217;s something I saw time and time again. At one point, I decided, &#8220;Let&#8217;s really write this down so everyone has a kind of source of truth for these things.&#8221; That was the main motivation for the book.</p><p>Now, just to add a bit more on documentation: if you look at LLVM, it's actually one of the most well-documented open source projects I know. The documentation is very well made, but the key piece that is well documented is the LLVM intermediate representation (IR). From a back end development perspective, that&#8217;s actually the middle end. If you look at a typical compiler pipeline, your source language gets compiled into LLVM IR by the front end. Then you enter the middle end, where most of the optimizations happen on the IR, and finally the back end, which deals with machine IR.</p><p>And it&#8217;s this last part&#8212;the back end and machine IR&#8212;that is pretty much undocumented. That&#8217;s what I wanted to cover in the book.</p><p><em><strong>2: The book appears to cater to both newcomers and seasoned LLVM developers. How were you able to balance foundational concepts with advanced topics like TableGen or Machine IR to ensure accessibility without oversimplifying things?</strong></em></p><p><strong>Quentin Colombet:</strong> Yeah, that's a good question. I have to admit, the trade-off wasn't easy. 
And frankly, I&#8217;d say the jury&#8217;s still out&#8212;ultimately it&#8217;s going to be the readers who tell us whether we did a good job.</p><p>In the book, I tend to go very deep, but I also present all the concepts I use, so everything is self-contained. For seasoned people, that may feel like too much content, but for newcomers, at least they have access to that information.</p><p>To strike a balance, each chapter of the book comes with a quiz. The idea is to cover what&#8217;s been presented throughout the chapter. Instead of starting by reading the chapter, you can begin with the quiz. If you can&#8217;t answer the questions, that probably means it&#8217;s a good idea to read the chapter. If you can answer everything easily, then maybe you can move on to the next one.</p><p>For some concepts, I wish I had gone further, but the book is already pretty big&#8212;I think it's around 600 pages&#8212;so I had to make cuts. I cut content that goes deeper into some guts of LLVM that aren&#8217;t necessarily useful unless you want to modify them.</p><p>For example, you mentioned that I'm the code owner for register allocation. I can tell you exactly how it works inside and out, but in practice, that's not very useful unless you're modifying it. If you're just using it, it&#8217;s more helpful for me to explain how to use the API, how to configure it, and where to find more information if you want to go deeper.</p><p><em><strong>3: Could you elaborate on how the Global ISel framework improves upon SelectionDAG and FastISel in terms of performance, extensibility, and target customization?</strong></em></p><p><strong>Quentin Colombet:</strong> Sure. We're going very deep, very fast here. Just to give a little context for people who may not have read the book or aren't familiar with LLVM: earlier, I mentioned the front end, middle end, and back end of the compiler pipeline.</p><p>Instruction selection&#8212;what we call ISel&#8212;is the transition from the middle end to the machine IR part. It&#8217;s where you go from a target-agnostic representation, like LLVM IR, to something specific to your actual target architecture, like X86 or AArch64. That&#8217;s when the actual instructions of the final assembly begin to show up. Instruction selection is about getting to those instructions and picking the best possible ones.</p><p>Now, in terms of performance, extensibility, and target customization, Global ISel has several advantages. First, it&#8217;s much younger than the other two frameworks, so we were able to learn from the mistakes of the past. From the start, it has a much more modular design. Instruction selection actually involves multiple steps&#8212;one of them is called legalization.</p><p>Legalization is where you map high-level concepts from your source code, like a multiplication, to the actual instructions supported by your target. For example, maybe you're doing <code>A * B</code>, but your target doesn&#8217;t support a multiply instruction. So you break it down into a series of adds, like what you learn in school. That&#8217;s legalization&#8212;making something that&#8217;s illegal for the target legal by using what's available.</p><p>In SelectionDAG and FastISel, all those steps happen in one monolithic pass. You go from LLVM IR to machine IR in one shot, and there&#8217;s not much you can do in between. It&#8217;s a black box. But with Global ISel, it&#8217;s a set of distinct optimization passes. 
Between those passes, you can insert your own target-specific or generic optimizations. That modularity gives you better flexibility, more opportunities for code reuse, and makes debugging and testing easier.</p><p>On performance: Global ISel operates directly on machine IR, which is the core IR for the back end. SelectionDAG uses its own intermediate representation, so the path goes from LLVM IR &#8594; SelectionDAG IR &#8594; Machine IR&#8212;two hops. With Global ISel, it's just LLVM IR &#8594; Machine IR. That one fewer hop already gives a performance boost&#8212;your compiler runs faster.</p><p>There&#8217;s also scope. SelectionDAG and FastISel work at the basic block level&#8212;each block is processed independently. That limits what you can do in terms of optimization across blocks. Global ISel works at the function level, giving you more context and more opportunities for optimization.</p><p>So to sum it up: Global ISel is faster, more modular, easier to debug, and gives you a broader optimization scope.</p><p><em><strong>4: What should developers keep in mind when porting Global ISel to a new target? How do components like call lowering, register bank info, and legalization interact in that process?</strong></em></p><p><strong>Quentin Colombet:</strong> The four APIs you mentioned&#8212;call lowering, register bank info, legalizer info, and the instruction selector&#8212;correspond to the different stages I just described in Global ISel.</p><p>Call lowering is about mapping source-level arguments&#8212;like in C++&#8212;to the actual registers or stack locations for your target. Register bank info defines how values are assigned to register banks&#8212;essentially how you deal with the target's physical registers. Legalizer info, as I explained earlier, tells the compiler what's legal or illegal for your target and how to transform the illegal stuff into legal instructions.</p><p>Now, in terms of challenges, you mentioned &#8220;porting&#8221; Global ISel to a new target. There are two cases here. One is if your target already has an existing implementation using SelectionDAG. Then the challenge becomes: how do you reuse as much of that code as possible when moving to Global ISel? That&#8217;s tough because the intermediate representations are different. There's no magic bullet&#8212;you&#8217;ll have to reimplement parts of it.</p><p>We did put in place a compatibility layer using TableGen, which is LLVM&#8217;s domain-specific language used to describe instruction selection rules. You can reuse some of the same TableGen descriptions for both SelectionDAG and Global ISel, which helps a bit. But there's still a lot of code to rewrite.</p><p>The other case is when you&#8217;re writing a Global ISel implementation from scratch. The challenge there is design. There&#8217;s no single right way to do instruction selection. You need a coherent plan&#8212;decide when and how you're going to lower each operation. For example, when do you break down a multiply into adds? You could do it during legalization, later, or even earlier&#8212;it&#8217;s up to you.</p><p>Because Global ISel is modular, it&#8217;s easy to look at just one piece at a time. But if you're not careful, those pieces may not fit together properly, or you may end up implementing functionality that doesn&#8217;t even make sense in the broader pipeline. My advice is to keep things grounded. 
Always go back to what a real programmer might write, and make sure your lowering works end-to-end&#8212;from LLVM IR all the way to assembly. Then you can break it down into phases, confident that everything connects properly.</p><p>Also, among those components, call lowering is probably the easiest. It&#8217;s mostly about implementing the ABI&#8212;the binary-level convention for passing arguments in your target&#8217;s calling convention.</p><p><em><strong>5: How does TableGen facilitate back end development, and what should developers be careful of when working with it?</strong></em></p><p><strong>Quentin Colombet:</strong> TableGen is kind of the hated child in LLVM. It&#8217;s a domain-specific language&#8212;a programming language developed within LLVM to help with LLVM development. And it&#8217;s used everywhere.</p><p>For example, in Clang, the user-facing part of LLVM, TableGen is used to define compiler options&#8212;optimization levels, warnings, and so on. It's also used for target features, like enabling vector or cryptographic extensions. And it&#8217;s heavily used in instruction selection, to define selection patterns. You&#8217;ll also see it used for intrinsics and many other things.</p><p>So if you work with LLVM, you'll touch the TableGen DSL at some point.</p><p>TableGen itself isn&#8217;t that hard. It has its own syntax, which is a bit weird at first, but once you get used to it, it&#8217;s manageable. The tricky part is that the syntax alone doesn't tell you the semantics&#8212;what your code actually <em>means</em> depends entirely on how it&#8217;s used, and that varies by context.</p><p>So you might write something in TableGen that looks the same whether you&#8217;re defining a Clang option or an intrinsic, but behind the scenes, the classes and records you use have totally different meanings. That&#8217;s because the semantics are defined not by TableGen itself, but by the backend generator that processes your TableGen input and turns it into C++ code or whatever else.</p><p>Ultimately, TableGen is a code-generation tool. If you&#8217;re adding a new Clang option, for example, doing it manually would mean registering it with the driver, wiring it up to different components, and writing a lot of boilerplate. TableGen lets you describe the option once, and the boilerplate is generated for you.</p><p>But that generation behavior is backend-specific. So the same TableGen syntax might generate completely different code depending on which backend is processing it. That&#8217;s the first difficulty: you have to do a kind of mental shift when working across different TableGen backends, even though the syntax looks identical.</p><p>The second issue is error messages. When something goes wrong&#8212;when you use the wrong syntax or reference something incorrectly&#8212;the errors are often vague or inconsistent. Different backends give different kinds of feedback, and understanding those messages isn&#8217;t easy.</p><p>I talk about this in the book. I offer some guidance on how to look into the TableGen backend code and reverse-engineer what's going wrong. But at the end of the day, it's often trial and error. Everyone in the LLVM community kind of dislikes TableGen, so if you don&#8217;t enjoy working with it, that&#8217;s expected.</p><p>That said, it's still a powerful tool. It exists to improve the productivity of compiler developers. 
You just have to get used to it.</p><p><em><strong>6: LLVM 20 introduced several back end improvements, including Global ISel refinements and expanded RISC-V support. In your view, what were some of the most significant recent changes, and how do they reflect broader trends in compiler infrastructure?</strong></em></p><p><strong>Quentin Colombet:</strong> LLVM 20 is an interesting release because a lot of the work isn&#8217;t immediately visible to users&#8212;but it&#8217;s still meaningful. One big area of focus was compiler speed. We spent a lot of time making the compiler faster using techniques like profile-guided optimization, which optimizes the compiler itself based on how it&#8217;s used in practice. I won't go into too much detail on that, but the upshot is: the compiler should be faster compared to previous versions.</p><p>Another behind-the-scenes improvement was in release management. Between LLVM 19 and 20, the process for accepting patches after cutting the release branch became more rigorous. That means fewer last-minute bugs. From the user&#8217;s perspective, we hope this results in a more stable release.</p><p>There were also some backend-internal improvements that aren't user-facing but help compiler writers. For instance, function and attribute metadata in the intermediate representation got more precise. These attributes&#8212;things attached to functions to express constraints or additional semantics&#8212;now let us do more aggressive optimizations or better enforce correctness.</p><p>Remember, LLVM IR isn&#8217;t just for C or C++. It supports a wide range of source languages like Rust or domain-specific languages for GPUs. The IR needs to be expressive enough to handle all those cases, and attributes are one way we enrich it to describe what&#8217;s allowed or expected. With LLVM 20, we can be more precise, which means more optimization opportunities and tighter guarantees.</p><p>As for RISC-V: it&#8217;s an interesting beast. The spec is still evolving, and people keep adding new extensions. As those extensions mature, they get added to the LLVM backend. That means the instruction selector can now take advantage of those extensions to generate better code. If your processor supports a new, more efficient instruction, LLVM can now use it. So performance improves for the end user.</p><p><em><strong>7: Tools like </strong></em><code>llvm-isel-fuzzer</code><em><strong> have been instrumental in uncovering backend bugs. How do such specialized fuzzing tools integrate into LLVM&#8217;s development workflow, and what benefits have you seen them bring to backend stability?</strong></em></p><p><strong>Quentin Colombet:</strong> I have some experience with fuzzing tools, but I haven&#8217;t used them heavily in my day-to-day workflow. Well&#8212;actually, I have used them, but mostly as a hardening tool. You need to get a lot of the basics right in your compiler before fuzzing becomes relevant. So fuzzing is usually one of the last things on your to-do list.</p><p>What fuzzers are really good at is finding edge cases&#8212;things that are technically valid but extremely weird. These aren&#8217;t necessarily inputs that a human would ever write, but they do exercise parts of the compiler in unexpected ways. That can help uncover crashes or subtle bugs, especially security-related ones.</p><p>This is particularly important in contexts like GPUs, where the compiler might actually run on a user&#8217;s device&#8212;on their phone, watch, or tablet. 
If an attacker can crash the compiler, that opens up a possible security vulnerability. Crashes are opportunities for malicious code injection, so from that angle, fuzzers are a valuable tool.</p><p>That said, fuzzers don&#8217;t help much with the quality of the generated code. They're not about finding missed optimizations. They&#8217;re about making sure the compiler doesn&#8217;t crash or behave in weird, undefined ways. In practice, these tools are often used by academics who stress-test the LLVM infrastructure and then report issues. It&#8217;s more of a backend stability thing than a user-facing performance tool.</p><p><em><strong>8: Let&#8217;s talk about ML techniques&#8212;starting with MLGO, introduced by Trofin et al. in 2021. With machine learning now being applied to compiler optimizations, how do you see it influencing backend development, particularly in instruction selection and register allocation?</strong></em></p><p><strong>Quentin Colombet:</strong> That&#8217;s something I&#8217;m really curious about. Compilers are full of heuristics, and machine learning is great at discovering heuristics we never would&#8217;ve thought of&#8212;automatically.</p><p>There&#8217;s a lot of potential here, but there are also challenges. One big challenge is identifying the right parameters to feed into your machine learning model. To use an analogy: could you price a house just by counting the number of windows? There&#8217;s probably some correlation, but it&#8217;s not enough.</p><p>Similarly, in something like register allocation, the features you use to train your model may not carry enough information for it to make meaningful decisions. That&#8217;s a general problem in machine learning: capturing the right features is a bit of a black art. You try something, see how it performs, and iterate.</p><p>Another challenge is integration. If you look at Global ISel, SelectionDAG, or the register allocator, the APIs don&#8217;t necessarily give you a lot of hooks to inject machine learning guidance into the inner workings. If all you can do is tweak some knobs from the outside, you may not be able to make meaningful improvements.</p><p>So then the question becomes: do you need to write your own instruction selector or register allocator to take full advantage of machine learning? I think the answer is yes&#8212;but we&#8217;ll see how things evolve.</p><p>There&#8217;s also the issue of compile time. Machine learning models can be slow. Will users tolerate waiting 10 seconds for a 1% improvement in performance? What about 10 minutes? There&#8217;s always a trade-off.</p><p>This isn't a new problem. For decades, we&#8217;ve known that some compiler optimizations can be solved optimally using things like integer linear programming. But we don&#8217;t use them because they&#8217;re too slow. So while ML is promising, especially in research, there&#8217;s still a lot of work to do before it becomes practical in production compilers.</p><p><em><strong>9: MLIR introduces a multi-level intermediate representation offering more flexibility in compiler design. How does MLIR interact with LLVM&#8217;s backend, and what advantages does it bring to code generation for heterogeneous systems?</strong></em></p><p><strong>Quentin Colombet:</strong> MLIR is a relatively new addition to the LLVM family. As you said, it stands for Multi-Level Intermediate Representation, but what that really means is that it's a <em>framework</em> for defining your own intermediate representations. 
It gives you a huge design space for creating IRs that suit your specific needs.</p><p>This is especially useful in heterogeneous systems. With MLIR, you can model both your CPU and GPU modules within the same IR. That opens up optimization opportunities across different targets, which is something LLVM IR alone can&#8217;t do.</p><p>For example, let&#8217;s say your CPU is calling a GPU function. In traditional LLVM IR, those components would be handled separately. But with MLIR, you can represent both sides in one place. That means you could move computations between devices more easily or apply cost models to decide what should run where.</p><p>That said, MLIR is a layer <em>above</em> LLVM IR. It doesn&#8217;t replace it&#8212;it feeds into it. So if your front end used to produce LLVM IR directly, now it might produce MLIR instead, and that gets lowered into LLVM IR. A good example is machine learning frameworks like PyTorch&#8212;they can output MLIR, do graph-level optimizations there, and then lower to LLVM IR for final code generation.</p><p>But here&#8217;s the catch: if you do everything in MLIR, you have to implement those transformations yourself. You&#8217;re not automatically reusing LLVM&#8217;s optimizations. The LLVM backend still handles codegen, but only after MLIR has done its work.</p><p>There&#8217;s an opportunity here to reuse more of the LLVM backend within MLIR or to integrate the two more tightly. We&#8217;ll see where that goes. But right now, MLIR gives you a lot of flexibility&#8212;if you&#8217;re willing to build on top of it.</p><p><em><strong>10: Debugging backend passes can be complex. Are there tools or methodologies you personally recommend for diagnosing and resolving issues in code generation?</strong></em></p><p><strong>Quentin Colombet:</strong> Yes&#8212;and I cover this in the book because it's a key problem. Compilers are complex systems, and debugging them efficiently is critical.</p><p>The first thing you can leverage is LLVM&#8217;s logging infrastructure. It lets you see what&#8217;s happening as your program is lowered through the LLVM pipeline. You can enable logging globally or for specific passes if you already suspect where things are going wrong.</p><p>But if you don&#8217;t yet know what&#8217;s misbehaving, the next step is to reduce your input. LLVM provides tools like <code>llvm-extract</code> and <code>llvm-reduce</code> for that. For example, you can use <code>llvm-extract</code> to isolate a single function from your input file. Then you can keep compiling just that function to reproduce the issue.</p><p><code>llvm-reduce</code> goes further. You give it a predicate&#8212;say, &#8220;this IR causes a crash&#8221;&#8212;and it will automatically minimize the IR while preserving that behavior. So instead of debugging hundreds or thousands of lines, you end up with 10 lines that still reproduce the problem. That&#8217;s a huge productivity win.</p><p>That works great for compiler crashes. The harder case is miscompiles&#8212;where the compiler doesn&#8217;t crash, but the generated program behaves incorrectly. Those are tougher because there&#8217;s no obvious failure signal from the compiler itself.</p><p>In that case, the first step is to disable interprocedural optimizations. These cross-function transformations can obscure things, and turning them off helps isolate the problem. Then you can start narrowing things down using function boundaries. 
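</p><p><em>For the llvm-reduce workflow described above, the predicate is just an executable that exits 0 when the candidate IR is still &#8220;interesting&#8221;. A minimal sketch of such a test script in Python, assuming the failure being chased is a crash in llc; the flags are whatever reproduces your particular failure.</em></p><pre><code>#!/usr/bin/env python3
"""Interestingness test for llvm-reduce.
Usage: llvm-reduce --test=interesting.py crash.ll  (make this script executable first)."""
import subprocess
import sys

candidate = sys.argv[1]  # llvm-reduce passes the candidate IR file as an argument
proc = subprocess.run(["llc", "-O2", candidate, "-o", "/dev/null"], capture_output=True)

# A non-zero exit from llc (a crash or assertion) means the reduction is still interesting.
sys.exit(0 if proc.returncode != 0 else 1)
</code></pre><p>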
Because function calls follow a known ABI, you can mix and match how functions are compiled&#8212;for example, compile one with optimizations and one without&#8212;to see which one introduces the bug.</p><p>Eventually, you can isolate a problematic function, reduce it, and get a minimal reproducer. At that point, there&#8217;s no substitute for staring at the assembly and figuring out what went wrong. Maybe someday machine learning will help with that&#8212;but for now, it&#8217;s still a manual process.</p><p><em><strong>11: For developers interested in contributing to LLVM backend development, what areas currently need attention, and how can new contributors get involved effectively?</strong></em></p><p><strong>Quentin Colombet:</strong> That&#8217;s a great question&#8212;and one I hear a lot from newcomers. First, I&#8217;d encourage people to step back and think about what &#8220;contribution&#8221; really means. A lot of folks assume it&#8217;s only about writing code and sending patches, but there are other valuable ways to contribute.</p><p>For instance, filing issues is a big help. If you encounter an IR that causes a crash or incorrect behavior, reporting that clearly is already a contribution. LLVM has a ton of open bugs, and the bug tracker keeps growing. One way to help the project is by triaging those bugs&#8212;reproducing them, reducing the input, and making them easier to debug. That&#8217;s a great way to get familiar with the toolchain and to practice the debugging techniques I mentioned earlier.</p><p>You can also contribute by reviewing patches. Code review is essential to the progress of any open source project. And reviewing is a good way to learn&#8212;pick an area you&#8217;re interested in, follow the changes, and build your understanding. Even asking questions like &#8220;Could you add more comments?&#8221; or &#8220;I didn&#8217;t understand this part&#8221;&#8212;those are helpful. What&#8217;s unclear to you might be unclear to others, too.</p><p>When you do feel ready to submit patches, there are plenty of open issues you can work on. And if you reach out, LLVM contributors will usually help guide you through writing and reviewing your first patch.</p><p>In terms of specific areas: loop optimizations historically haven&#8217;t been LLVM&#8217;s strong suit, so there&#8217;s definitely room for improvement there. And more broadly, there's always interesting new research coming out of academia. If you see something promising, try implementing it in LLVM&#8212;you&#8217;ll learn a lot, and you might improve the compiler.</p><p>So yeah, there&#8217;s no shortage of opportunities to contribute.</p><p><em><strong>12: Looking ahead, what trends or technologies do you anticipate will shape the future of LLVM backend development, and how should developers prepare for them?</strong></em></p><p><strong>Quentin Colombet:</strong> We touched on this earlier, but I think MLIR is going to be a big part of the future. It&#8217;s already being adopted widely&#8212;especially in the ML world. For example, Triton, a language used to write high-performance ML kernels, is based on MLIR. Nvidia is using it in tools like CUTLASS. So if you're working in that space, learning MLIR is a must.</p><p>Even beyond ML, I think it&#8217;s becoming important to understand the <em>full</em> compiler stack. For backend developers, that means knowing what happens in the front end and middle end, too. 
Producing the right LLVM IR from MLIR is critical&#8212;because the LLVM backend performs best when the IR is shaped a certain way.</p><p>LLVM is a great C and C++ compiler, and a lot of the backend optimizations have been unconsciously tuned over the years for those languages. So when other languages generate IR that looks very different, things may not work as well. You have two options: improve the backend to handle those patterns better, or adjust your front end to generate IR that looks more like C++. Knowing the full stack helps you make the right choice.</p><p>Also, AI is a powerful tool for understanding code or exploring optimizations. Use it&#8212;but stay in control. If you generate code with AI, make sure you understand what it&#8217;s doing. In some environments, you might not even be allowed to use AI due to copyright or security policies. So it&#8217;s important to be able to work without it, too.</p><p>Finally, I&#8217;d say the ultimate goal is to make compilers more accessible to end users. Sometimes the best way to help developers isn&#8217;t to make the compiler smarter&#8212;it&#8217;s to give users better knobs to express what they want. Tools like Triton succeed because they expose low-level control in a usable way. That&#8217;s something we should all aim for: making the compiler a more useful tool for developers.</p><div><hr></div><div><hr></div><p>To explore the ideas discussed in this conversation&#8212;including how to design instruction selectors, build legalizer stages, and debug backend passes with LLVM&#8217;s own reduction tools&#8212;check out <em><strong><a href="https://www.packtpub.com/en-us/product/llvm-code-generation-9781835462577">LLVM Code Generation</a></strong></em> by Quentin Colombet, available from Packt. This 620-page comprehensive guide walks readers through the internals of LLVM&#8217;s backend infrastructure, from transforming IR to generating optimized machine code. With step-by-step examples, targeted exercises, and hands-on walkthroughs using TableGen, Machine IR, and GlobalISel, it&#8217;s both a reference and a roadmap for backend developers working on real-world architectures. 
Whether you&#8217;re building a custom target, contributing to LLVM itself, or deepening your compiler expertise, this book provides a practical foundation for mastering the backend.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.packtpub.com/en-us/product/llvm-code-generation-9781835462577" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VMSy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdba154a1-1adc-42aa-9613-d830b86b54fc_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!VMSy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdba154a1-1adc-42aa-9613-d830b86b54fc_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!VMSy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdba154a1-1adc-42aa-9613-d830b86b54fc_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!VMSy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdba154a1-1adc-42aa-9613-d830b86b54fc_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VMSy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdba154a1-1adc-42aa-9613-d830b86b54fc_2250x2775" width="288" height="355.25274725274727" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dba154a1-1adc-42aa-9613-d830b86b54fc_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:288,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;LLVM Code Generation&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.packtpub.com/en-us/product/llvm-code-generation-9781835462577&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="LLVM Code Generation" title="LLVM Code Generation" srcset="https://substackcdn.com/image/fetch/$s_!VMSy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdba154a1-1adc-42aa-9613-d830b86b54fc_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!VMSy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdba154a1-1adc-42aa-9613-d830b86b54fc_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!VMSy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdba154a1-1adc-42aa-9613-d830b86b54fc_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!VMSy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdba154a1-1adc-42aa-9613-d830b86b54fc_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here is what some readers have said: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bkDn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bkDn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png 424w, https://substackcdn.com/image/fetch/$s_!bkDn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png 848w, https://substackcdn.com/image/fetch/$s_!bkDn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png 1272w, https://substackcdn.com/image/fetch/$s_!bkDn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bkDn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png" width="873" height="538" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:538,&quot;width&quot;:873,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:103726,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/169655369?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bkDn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png 424w, https://substackcdn.com/image/fetch/$s_!bkDn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png 848w, https://substackcdn.com/image/fetch/$s_!bkDn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png 1272w, https://substackcdn.com/image/fetch/$s_!bkDn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abfc8de-2e45-4f27-b10e-eb3804ff3c20_873x538.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!krGS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!krGS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png 424w, https://substackcdn.com/image/fetch/$s_!krGS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png 848w, https://substackcdn.com/image/fetch/$s_!krGS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png 1272w, https://substackcdn.com/image/fetch/$s_!krGS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!krGS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png" width="887" height="372" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:372,&quot;width&quot;:887,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74521,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/169655369?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!krGS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png 424w, https://substackcdn.com/image/fetch/$s_!krGS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png 848w, https://substackcdn.com/image/fetch/$s_!krGS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png 1272w, https://substackcdn.com/image/fetch/$s_!krGS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f9b39a-683f-417d-a478-ff299ffe91a6_887x372.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 
8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GW-A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GW-A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png 424w, https://substackcdn.com/image/fetch/$s_!GW-A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png 848w, https://substackcdn.com/image/fetch/$s_!GW-A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png 1272w, https://substackcdn.com/image/fetch/$s_!GW-A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GW-A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png" width="863" height="207" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:207,&quot;width&quot;:863,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:43173,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/169655369?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GW-A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png 424w, 
https://substackcdn.com/image/fetch/$s_!GW-A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png 848w, https://substackcdn.com/image/fetch/$s_!GW-A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png 1272w, https://substackcdn.com/image/fetch/$s_!GW-A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8616fcc9-8ca3-4932-af60-2c746c1f49e5_863x207.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Foundations of Quantum Programming: A Conversation with Prof. Elías F. Combarro]]></title><description><![CDATA[From qubit states to Shor&#8217;s algorithm: how quantum programming really works&#8212;and where it&#8217;s headed.]]></description><link>https://deepengineering.substack.com/p/foundations-of-quantum-programming</link><guid isPermaLink="false">https://deepengineering.substack.com/p/foundations-of-quantum-programming</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Wed, 23 Jul 2025 11:04:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/yJ65VPMza-o" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>From quantum search and error correction to tooling constraints and software reproducibility, programming for quantum computers is unlike anything in classical systems. In this conversation, we speak with Professor El&#237;as F. Combarro&#8212;co-author of <em><strong><a href="https://www.packtpub.com/en-us/product/a-practical-guide-to-quantum-computing-9781835885956">A Practical Guide to Quantum Computing</a></strong> </em>(Packt, 2025)&#8212;about what it means to write, reason about, and teach quantum software in a world where the hardware is still catching up. This book also serves as a prequel to the 2023 book, <em><strong><a href="https://www.packtpub.com/en-us/product/a-practical-guide-to-quantum-machine-learning-and-quantum-optimization-9781804613832">A Practical Guide to Quantum Machine Learning and Quantum Optimization</a></strong></em><strong> </strong>also co-authored by Combarro.</p><p>Combarro is a full professor in the Department of Computer Science at the University of Oviedo in Spain. With degrees in both mathematics and computer science, his research spans computation theory, logic, quantum optimization, and algebraic structures. He has held research appointments at CERN and Harvard, and served on the advisory board of CERN&#8217;s Quantum Technology Initiative from 2021 to 2024. His recent work focuses on bridging mathematical formalism with executable quantum systems.</p><p>In this interview, we cover foundational algorithms like Shor&#8217;s and Grover&#8217;s, why Qiskit emerged as the most practical tool for teaching and experimentation, and how to build mental models that scale from toy examples to real circuits. 
Along the way, we explore entanglement, measurement, abstraction, simulation, and what quantum advantage might realistically mean in the coming decade&#8212;for engineers, researchers, and systems designers alike.</p><p>You can watch the full conversation below&#8212;or read on for the complete transcript.</p><div id="youtube2-yJ65VPMza-o" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;yJ65VPMza-o&quot;,&quot;startTime&quot;:&quot;1s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/yJ65VPMza-o?start=1s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h1><strong>The Book: </strong><em><strong>A Practical Guide to Quantum Computing</strong></em></h1><p><em><strong>1: What prompted this decision to go back to basics with your second book?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> I would say there are two main reasons for going back to these foundational algorithms in quantum computing with our second book. The first is that we actually wanted to include these algorithms in the first book, but it was impossible. If you&#8217;ve read it, you&#8217;ll know it&#8217;s almost 700 pages long&#8212;much more than we expected when we started writing. So it just wasn&#8217;t feasible to include anything else.</p><p>We were always thinking, &#8220;Oh, we should have included those very important algorithms.&#8221; It wasn&#8217;t possible, so we had this lingering idea to come back and write another book&#8212;or maybe an extended edition&#8212;that would include them. These foundational algorithms are very important in quantum computing. They&#8217;re probably the first algorithms that everyone studies when they start learning the field. But we chose to focus on more modern algorithms in the first book, like those used for optimization and machine learning, because at the time those were hot topics and there weren&#8217;t many books covering them.</p><p>Then, in addition to that desire to write about these foundational algorithms, new courses in quantum computing have been introduced&#8212;at our university and many others. I&#8217;m currently teaching two different courses, and possibly a third is coming next year. We felt the need to have a textbook for these classes. The courses include content from our first book on quantum machine learning and optimization, but they also cover foundational topics like Shor&#8217;s algorithm, Grover&#8217;s algorithm, and protocols like quantum teleportation and quantum key distribution. These are quite different from optimization and machine learning, but equally important.</p><p>So we felt both a personal and practical need: we wanted to write about these topics, and we also needed good materials for our students&#8212;and for anyone, anywhere in the world, who wants an introduction to quantum computing.</p><p><em><strong>2: Was there any feedback that you received for your first book that influenced your approach in terms of pedagogy or technical depth in the second book, which is more foundational?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> Well, I must start by saying we were overwhelmed by the reception of the book. 
Even yesterday, I received a message from a master&#8217;s student here in Spain&#8212;I&#8217;ve never met him&#8212;but he wrote to say he was defending his master&#8217;s thesis in quantum machine learning, and that one of the main reasons he chose the topic was our book. That&#8217;s just one example of the many wonderful messages we&#8217;ve received. We&#8217;ve been really overwhelmed and happy with the response.</p><p>I think the main problem&#8212;if you can call it a problem&#8212;with the first book was that we had a lot of code intertwined with the explanations of the concepts and algorithms. We&#8217;re hands-on people. We like to learn by doing, so we thought it was important that readers could fire up Anaconda or JupyterLab and run code to reinforce the concepts they were learning. That was our approach.</p><p>But the drawback is that quantum software libraries evolve very quickly. They change versions frequently, and some of the code may become outdated or need small modifications to keep working. We discovered this after publishing the book. Readers appreciated having code alongside the explanations, but it also made the book harder to update because everything was so interwoven.</p><p>So, in the new book, we decided to separate the code out. We have chapters that focus only on code, and others that focus only on explanations. That way, readers can learn the algorithm in one chapter and then go to a separate chapter to see how to implement it&#8212;run it, modify it, and do exercises. At the same time, this structure makes it easier to update the code since it&#8217;s not embedded in the explanatory text.</p><p>I still think both approaches have merit and can be useful for students, but this new format is likely easier for someone who picks up the book two, three, or five years from now. They can learn the algorithm, then check online for updated notebooks if needed.</p><p><strong>3: In many ways, the second book is like a prequel, isn&#8217;t it? For those who may find it more challenging to start with the first one. You&#8217;ve used the same hands-on approach and focused more on Qiskit. What makes Qiskit the right choice for teaching foundational quantum computing, in your view?</strong></p><p><strong>El&#237;as F. Combarro:</strong> Well, this was a difficult decision because nowadays there are several quantum programming languages&#8212;or rather, packages or libraries&#8212;to choose from. All of them are nice and have their own advantages and drawbacks.</p><p>In our first book, we included three different languages: Qiskit, of course, but also PennyLane and D-Wave&#8217;s Ocean. We were focusing on quantum machine learning in that book, and for that, PennyLane is probably even better than Qiskit. And if you want to program D-Wave&#8217;s quantum annealers, you need Ocean&#8212;there&#8217;s no other way to access those machines.</p><p>But in this new book, we&#8217;re going back to basics, so we didn&#8217;t need three languages&#8212;just one was enough. For foundational algorithms, almost any quantum programming language would suffice. That said, Qiskit has the largest number of features, and it&#8217;s the easiest one for accessing quantum computers online. For us, that was very important. You can run code locally on simulators, which are not real quantum computers but simulate their behavior. 
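</p><p><em>As a small illustration of the local-simulator workflow (a sketch for this write-up, not code from the book), the following assumes the <code>qiskit</code> and <code>qiskit-aer</code> packages are installed. It builds a two-qubit Bell-state circuit and runs it on a local simulator.</em></p><pre><code>from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

# Bell state: Hadamard on qubit 0, then CNOT from qubit 0 to qubit 1
qc = QuantumCircuit(2, 2)
qc.h(0)
qc.cx(0, 1)
qc.measure([0, 1], [0, 1])

# Run locally on the Aer simulator; switching to real hardware mainly means
# swapping this backend for one obtained from a cloud provider.
result = AerSimulator().run(qc, shots=1000).result()
print(result.get_counts())  # roughly half '00' and half '11'
</code></pre><p>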
But at the same time, it&#8217;s great to be able to access actual quantum computers online, and Qiskit makes that very easy.</p><p>For instance, one of the exercises we propose in the book is to take a quantum protocol, run it locally on a simulator, and then run it on a real quantum computer. You only need to change three or four lines of code to make that switch, but it&#8217;s very satisfying to say, &#8220;I&#8217;m running this on an actual quantum computer.&#8221;</p><p>There&#8217;s one exercise in particular that I really like. It&#8217;s based on a protocol used to explore whether nature is quantum or classical. This is called the CHSH game, and we explain it in full detail in the book. We give the code and ask readers to run it on a quantum computer. The result is a figure of merit&#8212;the ratio of times you win the game&#8212;and this number exceeds what&#8217;s possible with classical systems. To me, that&#8217;s fascinating. The physicists who originally performed this kind of experiment won the Nobel Prize in 2022. And now, you can run something like it yourself, just with a laptop connected to the internet.</p><h1><strong>Core Quantum Concepts</strong></h1><p><em><strong>4: How should software developers think about a single qubit and its state? For example, why use the Bloch sphere or state vector representation? And how does that view change when moving to two or more qubits?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> That&#8217;s a very important question. When you think about qubits and their states&#8212;especially with a large number of qubits, like in today&#8217;s quantum computers&#8212;intuition becomes very difficult.</p><p>For example, when you access quantum computers online, some of them have 127 qubits. That means you&#8217;re implicitly working with 2^127 numbers. That&#8217;s such a huge number, it&#8217;s hard to even imagine. So developing intuition about those kinds of structures is really tough.</p><p>With just one qubit, though, we have a nice geometric representation called the Bloch sphere. I must say I&#8217;m not a very visual thinker&#8212;geometry isn&#8217;t something I&#8217;m particularly intuitive about. I prefer symbolic and algebraic representations. But the Bloch sphere is helpful: every point on the surface of the sphere represents a possible state of your qubit, and quantum gates&#8212;operations&#8212;can be visualized as rotations of this sphere. It&#8217;s a nice way to see what&#8217;s happening when you apply operations to a single qubit.</p><p>Personally, I prefer thinking in terms of state vectors&#8212;ordinary vectors, in this case with complex numbers, though often you can think of them as real numbers. So for one qubit, you only need two numbers. For more qubits, it becomes a longer vector, and the system&#8217;s state is just this vector of numbers. Any operation you perform is just a matrix multiplication on this vector. To me, that&#8217;s the most useful mental model for what&#8217;s happening in the computer.</p><p>There are other geometric representations that extend beyond one qubit, but I find them more exotic than helpful&#8212;though that may just be because I&#8217;m not a geometrical thinker.</p><p>What I always tell my students is this: the amount of math you need to get started with quantum computing is surprisingly small. You just need to know what a vector is, what a matrix is, and how to multiply a matrix by a vector. And even if you don&#8217;t know that, we cover it in an appendix. 
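</p><p><em>A tiny numerical sketch of the state-vector view (illustrative, assuming only <code>numpy</code>): a single-qubit state is a length-2 complex vector, and applying a gate is just a matrix-vector multiplication.</em></p><pre><code>import numpy as np

ket0 = np.array([1, 0], dtype=complex)                        # the |0&gt; state as a vector
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard gate as a matrix

state = H @ ket0                 # applying the gate is a matrix multiplication
print(state)                     # [0.707+0j, 0.707+0j]: an equal superposition
print(np.abs(state) ** 2)        # measurement probabilities: [0.5, 0.5]
</code></pre><p>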
So you don&#8217;t need advanced math or physics to begin&#8212;you can start right away if you understand vectors and matrices.</p><p><em><strong>5: What does an entangled two-qubit state look like in this representation? Why can&#8217;t it be factored into independent single-qubit states?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> Yeah, this is a very surprising aspect of quantum systems. Entangled systems can't be described by just looking at the states of their individual parts. So you can have a system with two qubits, and there&#8217;s a global state that describes both together. But if you look at either one in isolation, and then try to reconstruct the full system&#8217;s state from those parts&#8212;you can&#8217;t. You need the full global state.</p><p>This is exactly why you need 2&#8319; amplitudes for an n-qubit system. If each part could be described on its own, you&#8217;d only need two numbers per qubit. But the correlations between the qubits&#8212;the entanglement&#8212;are encoded in the rest of the numbers. They're not locally accessible from just the individual qubit states.</p><p>This was something that was very surprising in the early days of quantum physics. Even Einstein was baffled by it. He called it &#8220;spooky action at a distance,&#8221; because when you have entangled particles or qubits, a modification to one part instantly affects the other, even if they're far apart. But Einstein also developed general relativity, which imposes a speed limit on how fast information can travel. So the idea of this instantaneous change really disturbed him. But it&#8217;s been experimentally confirmed, over and over&#8212;including in the CHSH game I mentioned earlier.</p><p>Now, going back to representations: Bloch spheres are great for individual qubits, but they break down for entangled states with two or more qubits. That&#8217;s why I find the vector representation more useful. It gives you a mathematical way to check whether a state is entangled. If you can factor the state into a product of individual qubit states, it&#8217;s not entangled. But if you can&#8217;t&#8212;if the state isn&#8217;t a product state&#8212;then it&#8217;s entangled, and you must treat the system as a whole.</p><p><em><strong>6: Why is entanglement considered such a crucial resource in quantum algorithms?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> Because this property&#8212;entanglement&#8212;only exists in quantum systems. It doesn&#8217;t happen in classical physics. And that means you can use it to implement protocols and algorithms that are simply impossible with classical resources.</p><p>We describe some of these in the book&#8212;for example, how to send information using superdense coding, or how to teleport quantum states. These kinds of applications absolutely require entangled states. You can&#8217;t do them with classical means alone.</p><p>From the perspective of quantum computing and quantum information science, these are concrete ways to exploit entanglement. It could even be central to a future quantum internet. With entanglement, you can teleport states over long distances, which could be a useful communication tool. So these ideas and protocols are practical ways in which entanglement becomes a real computational and informational resource.</p><p><strong>7: In what fundamental way does measuring a qubit differ from reading a classical bit?</strong></p><p><strong>El&#237;as F. 
Combarro:</strong> This is one of the most surprising things for people new to quantum computing&#8212;or quantum physics in general. In classical computing, you take it for granted that you can always inspect your data. You can look at variables, data structures, lists, trees&#8212;whatever&#8212;and see exactly what values they hold.</p><p>But in quantum computing, it's completely different. Quantum states can be in superposition, possibly entangled, and described by a large number of amplitudes. But when you perform a measurement, you can't access all that information. You only get a small part of it.</p><p>Take a single qubit, for example. Its state is described by two complex numbers. In theory, that&#8217;s an infinite amount of information&#8212;real numbers can have infinite decimal places. But when you measure it, you only get a single classical bit: 0 or 1. The act of measuring collapses the state probabilistically into either 0 or 1.</p><p>And once you&#8217;ve measured it, you&#8217;ve destroyed the original state. If you measure it and get 0, and then measure it again, you&#8217;ll just keep getting 0&#8212;you&#8217;ve lost everything about the prior superposition. The system collapses, and that collapse is irreversible.</p><p>And this randomness is fundamental. If you run the exact same quantum algorithm twice, with the same input, you might get different outcomes. For people used to classical programming, that's very strange&#8212;how can the same inputs give different outputs? But it&#8217;s intrinsic to quantum mechanics. It&#8217;s not like classical randomized algorithms where the randomness comes from a pseudo-random number generator. In quantum computing, the probabilistic behavior is built into the physics.</p><p>So quantum measurement differs from classical data retrieval in two big ways: first, it&#8217;s probabilistic; and second, it changes the state of the system. You can&#8217;t measure the same system multiple times and expect to extract more information. Once you measure, the original state is gone.</p><p><em><strong>8: How would you say developers can decide which qubits to measure, and at what points in a circuit?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> It depends on the algorithm or the application. But until very recently, with actual quantum computers, you could only measure qubits at the end of a circuit. That might sound like a limitation, but it&#8217;s really not&#8212;because of something called the deferred measurement principle.</p><p>There&#8217;s a theorem in quantum computing that says if you want to measure something in the middle of a circuit, you can simulate that effect by deferring the measurement to the end and adjusting the circuit accordingly. So in terms of computational power, there&#8217;s no difference.</p><p>Now, some platforms like Qiskit do allow mid-circuit measurements. You can measure certain qubits partway through the execution of a circuit. But in practice, it's still often simpler to just measure at the end.</p><p><em><strong>9: What strategies, according to you, can help manage the randomness introduced by quantum measurement?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> First, we need to distinguish between two different sources of randomness. One is intrinsic to quantum theory&#8212;this is the probabilistic nature of measurement itself, and it&#8217;s unavoidable. 
The second is due to imperfections in actual quantum hardware&#8212;noise, gate errors, and environmental interactions.</p><p>To handle the intrinsic randomness, you need to apply statistical methods. For example, suppose you're trying to determine whether a certain element with a specific property is present in a vector&#8212;maybe you're looking for a client from Spain in a customer database. You might use Grover&#8217;s algorithm for this. Even if the client exists, Grover&#8217;s only gives a probabilistic guarantee that you'll find it. Maybe the probability is 99%, but if you&#8217;re unlucky and only run it once, you might miss it.</p><p>So what do you do? You repeat the algorithm multiple times and either take the best result or use a voting scheme. For example, say you&#8217;re using a quantum classifier to determine whether an image is a cat or a dog. If you measure the output qubit once and get 0 (cat), you can&#8217;t be sure. But if you repeat the process 100 times and get 70 zeros and 30 ones, then you can conclude it&#8217;s most likely a cat.</p><p>Similarly, in quantum phase estimation&#8212;which is important in many fields&#8212;you repeat the procedure to get better and better approximations. The more you repeat it, the more accurate the estimate.</p><p>Now, regarding noise from hardware imperfections: in most of the book, we work with idealized quantum computers. But in the last part, we introduce quantum error correction. There are also simpler techniques like error mitigation. One method involves calibrating the machine&#8212;measuring how often errors occur when you input known states like 0 or 1. With that data, you can adjust your measurements afterward to account for those errors.</p><p>And then there&#8217;s full quantum error correction, which is more sophisticated and harder to implement, but also more powerful. It's essential for working with real quantum hardware in the long term.</p><h1><strong>Key Quantum Algorithms</strong></h1><p><em><strong>10: Grover&#8217;s algorithm</strong></em> <em><strong>offers a quadratic speedup for unstructured search. Could you explain the core idea of amplitude amplification and what assumptions or resources it requires?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> Grover&#8217;s algorithm is probably my favorite quantum algorithm. It&#8217;s mathematically beautiful and, from a computer science perspective, very surprising.</p><p>The core idea is this: suppose you have some data, like a vector, and you want to find an element that satisfies a certain property. If the data is unstructured&#8212;for example, if the entries are in random order&#8212;then classically, the only option is to search one by one. If there are a million entries, you might have to check all million.</p><p>But with Grover&#8217;s algorithm, even if the data is completely unstructured, you can find the correct entry with only about a thousand checks&#8212;because of its quadratic speedup. That&#8217;s a huge difference, and it gets more dramatic as the data size increases.</p><p>How does it work? First, you create a superposition of all possible inputs. Then, through amplitude amplification, you increase the probability of measuring the correct solution. And this happens through a series of geometric transformations&#8212;rotations, essentially. With each step, you rotate closer to the solution vector. 
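</p><p><em>A purely numerical sketch of this rotation picture (written for this article, not a circuit from the book): with 8 items and one marked index, each Grover iteration, the oracle sign-flip followed by inversion about the mean, raises the probability of the marked item, and, as discussed next, going past the optimal number of iterations lowers it again.</em></p><pre><code>import numpy as np

N, marked = 8, 3
state = np.full(N, 1 / np.sqrt(N))        # uniform superposition over 8 items

for k in range(1, 7):
    state[marked] *= -1                   # oracle: flip the sign of the marked amplitude
    state = 2 * state.mean() - state      # diffusion: inversion about the mean
    print(k, round(float(state[marked]) ** 2, 3))  # probability of measuring the marked item

# Prints roughly 0.781, 0.945, 0.330, ...: the probability peaks around
# two iterations here and then falls off if you keep iterating.
</code></pre><p>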
If you stop at the right point, your probability of measuring the correct answer is very high.</p><p>But&#8212;and this is important&#8212;if you keep going beyond that optimal point, you start rotating away from the solution again. So running Grover&#8217;s too many times actually makes your results worse. That&#8217;s very different from classical search, where more effort generally improves your chances.</p><p>As for resources: like most quantum algorithms, Grover&#8217;s assumes you can represent your input as a quantum oracle. You can&#8217;t just read a file from a hard drive. You have to encode the information into a function that can be queried by the quantum computer.</p><p>For example, if you&#8217;re searching for a client from Spain, your oracle takes an index and checks whether that client meets the condition. It returns true or false. In the book, we explain how to implement such oracles for different problems so you can actually use Grover&#8217;s in practice.</p><p><em><strong>11: Can Grover&#8217;s technique be applied to problems beyond literal database search?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> Yes, definitely. One application we didn&#8217;t cover in this book&#8212;but did in our previous one&#8212;is optimization. The idea of search naturally extends to optimization problems.</p><p>For example, suppose you&#8217;re looking for the client in your database who spent the most money last year. You don&#8217;t know in advance what that maximum value is, so you can&#8217;t search for a specific threshold. But what you can do is iteratively refine your threshold using Grover&#8217;s algorithm.</p><p>You might start by looking for clients who spent at least $1,000. If you find one, you raise the bar&#8212;$2,000, $3,000, and so on&#8212;until you no longer find anyone who meets the threshold. That gives you a way to zero in on the maximum.</p><p>This approach is called Grover Adaptive Search, and we explain it in our first book. It&#8217;s a straightforward extension of Grover&#8217;s ideas to optimization scenarios.</p><p><strong>12: What are the current limitations of Grover&#8217;s algorithm on near-term hardware?</strong></p><p><strong>El&#237;as F. Combarro:</strong> The limitations are similar to those affecting Shor&#8217;s algorithm and most of the algorithms in this book. These algorithms were designed for ideal quantum computers&#8212;machines with no noise and perfect gates.</p><p>But today&#8217;s hardware is noisy, and these algorithms typically involve long circuits with many operations. The longer the circuit, the more likely it is that errors will accumulate. That makes it hard to get reliable results.</p><p>Another issue is connectivity. Algorithms like Grover&#8217;s often require operations involving all qubits at once. But on current machines, you can&#8217;t directly entangle distant qubits. You have to insert additional gates just to move information around so that qubits can interact&#8212;and that inflates the circuit even more, making it more error-prone.</p><p>So the main problems are noise, long circuit depth, and limited qubit connectivity&#8212;all of which make it very hard to run Grover&#8217;s algorithm at useful scales today.</p><p><strong>13: Shor&#8217;s algorithm factors large integers exponentially faster than classical methods. Can you outline how it uses period finding and the quantum Fourier transform to achieve this speedup?</strong></p><p><strong>El&#237;as F. Combarro:</strong> Yes. 
I have to say this was the most difficult part of the book to write. I think it&#8217;s Chapter 11&#8212;almost at the end&#8212;and we build up to it across the earlier chapters. I&#8217;ve studied Shor&#8217;s algorithm for many years, so it's second nature to me. But trying to explain it clearly, from first principles, was a real challenge. At the same time, it was a lot of fun, because it forced me to restructure my understanding and find the simplest possible way to present the ideas.</p><p>Shor&#8217;s algorithm is incredibly important. On the surface, factoring integers may not seem all that exciting, but it underpins much of our modern cryptography. The security of online communication&#8212;including the connection we're using now&#8212;is based on cryptographic protocols that assume factoring large numbers is computationally hard. Even with powerful classical computers, it would take millions of years to factor large keys. But a quantum computer running Shor&#8217;s algorithm could break that encryption much more quickly. That&#8217;s why there's a global push to develop post-quantum cryptographic protocols.</p><p>The key idea behind Shor&#8217;s algorithm is that factoring can be reduced to a problem of period finding. That is, given a number a, you want to find the period r such that a<sup>r</sup> mod N = 1. This gives you a periodic function.</p><p>Classical computers are bad at finding the period of such functions efficiently. But quantum computers can do it using the quantum Fourier transform, which is extremely fast. And whenever you hear &#8220;Fourier transform,&#8221; think: &#8220;we&#8217;re trying to extract frequencies or periodicity.&#8221;</p><p>So, you create this periodic function by raising numbers to powers and taking remainders modulo N, and then you apply the quantum Fourier transform to extract the period. Once you have the period, you can compute the factors of the original number. That&#8217;s the heart of the algorithm.</p><p><em><strong>14: What are the main challenges in implementing Shor&#8217;s algorithm on today&#8217;s quantum hardware?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> It&#8217;s very similar to what we discussed with Grover&#8217;s algorithm. You need a lot of gates, which means long circuits&#8212;and that introduces a lot of noise.</p><p>Also, to factor large numbers, you need to store those numbers in the quantum computer. If you're working with cryptographic keys that are 2,000, 3,000, or 4,000 bits long, you&#8217;ll need at least that many qubits. Today&#8217;s largest quantum computers only have a few hundred physical qubits&#8212;and those are not error-corrected.</p><p>To get 2,000 or more reliable logical qubits, you&#8217;d need many times that number in physical qubits&#8212;perhaps hundreds of thousands. That&#8217;s far beyond current capabilities. So both the qubit count and noise levels are major obstacles to running Shor&#8217;s algorithm on real hardware today.</p><p><em><strong>15: Are there any smaller-scale or simplified versions of Shor&#8217;s algorithm that can be run on current hardware? Or perhaps other quantum algorithms for number-theoretic problems that might be practical sooner?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> Yes, actually. In the last few weeks, I&#8217;ve read several papers proposing simplified versions of Shor&#8217;s algorithm that reduce the number of qubits needed so it can run on smaller quantum computers. 
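</p><p><em>To make the period-finding reduction described in the answer to question 13 concrete, here is a small classical toy (illustrative only): it brute-forces the period of a^x mod N for N = 15, the step a quantum computer would instead perform with the quantum Fourier transform, and then recovers the factors from it.</em></p><pre><code>from math import gcd

N, a = 15, 7                      # small toy instance; assume gcd(a, N) == 1
r = 1
while pow(a, r, N) != 1:          # brute-force the period of f(x) = a^x mod N
    r += 1
print("period r =", r)            # r = 4 for a = 7, N = 15

# If r is even and a^(r/2) is not congruent to -1 mod N,
# gcd gives nontrivial factors of N.
if r % 2 == 0:
    x = pow(a, r // 2, N)
    print(gcd(x - 1, N), gcd(x + 1, N))   # prints 3 5
</code></pre><p>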
But even with those simplifications, the qubit requirements are still far beyond what we currently have.</p><p>As for other number-theoretic problems, there&#8217;s Simon&#8217;s problem, which is a purely academic problem with no practical application&#8212;but it's been used to demonstrate quantum advantage in a limited sense. I think just recently, maybe a day or two ago, I saw a paper where researchers ran a reduced version of Simon&#8217;s problem on real quantum hardware and showed some advantage.</p><p>The challenge is that most of the quantum advantage demonstrations we&#8217;ve seen so far are still for academic problems that don&#8217;t have real-world applications. They&#8217;re very interesting to researchers like me, but they&#8217;re not useful yet in a practical sense.</p><h1>Quantum Error Correction and Quantum Advantage</h1><p><em><strong>16: Quantum error correction, or QEC, is essential for scaling up quantum computers. What are the basic principles behind QEC&#8212;for example, the distinction between logical and physical qubits?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> The idea behind quantum error correction is similar to classical error correction. In classical computing, error correction happens all the time, mostly through redundancy. If you&#8217;re sending a bit over a noisy channel, and you send it just once, you can&#8217;t be sure it was received correctly. But if you send it three times&#8212;like 000 or 111&#8212;and the receiver gets 001, they can infer that the message was probably 0 and correct it.</p><p>The more redundancy you add, the more resilient the message becomes. If you use 1,000 bits instead of 3, you can drive the probability of an incorrect message down as far as you like.</p><p>Quantum error correction works on the same principle, with some important twists. Instead of storing information in a single qubit, you spread it across many qubits. The individual ones are called physical qubits, and the combined, encoded unit is a logical qubit. It&#8217;s an abstraction that behaves like a perfect qubit, even though it&#8217;s built from noisy ones.</p><p>But there&#8217;s a catch: in classical computing, you can check the value of a bit directly. In quantum computing, you can&#8217;t do that&#8212;you can&#8217;t measure a qubit without collapsing its state. So quantum error correction uses partial measurements&#8212;what we call syndrome measurements&#8212;that only reveal limited information about what kind of error may have occurred, without disturbing the actual quantum information.</p><p>From that limited data, you can then infer what correction to apply and restore the logical state. So it&#8217;s similar in spirit to classical error correction, but it has to work under stricter constraints due to the nature of quantum mechanics.</p><p><em><strong>17: The term &#8220;quantum advantage&#8221; or &#8220;supremacy&#8221; gets a lot of attention. How would you define quantum advantage rigorously? And can you cite examples of problems or tasks where even current noisy quantum devices might outperform classical ones?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> I think there are several ways to think about quantum advantage, and that&#8217;s part of why the term creates confusion&#8212;people use it to mean different things.</p><p>One kind is mathematical quantum advantage. That&#8217;s when you can prove, in theory, that a quantum algorithm outperforms any classical algorithm for a given task. 
Grover&#8217;s algorithm and Shor&#8217;s algorithm are examples. If you have a large enough, error-corrected quantum computer with the right connectivity, then mathematically, you can run these algorithms faster than on a classical computer. There&#8217;s no debate there.</p><p>But then there&#8217;s practical quantum advantage: showing, in real-world experiments, that a quantum computer solves a particular problem faster than the best known classical algorithms. That&#8217;s much harder to pin down, because classical computers are improving too. New classical algorithms appear all the time. So even if a quantum computer beats classical systems today, someone might develop a better classical algorithm next month&#8212;and then that quantum advantage disappears.</p><p>This actually happened in 2019 when Google claimed quantum supremacy for a specific problem. At the time, classical computers couldn&#8217;t solve it. But now they can&#8212;so that instance is no longer an example of quantum advantage. Of course, Google and others keep improving their quantum systems too, so it&#8217;s a race.</p><p>Eventually, for some problems, quantum computers will pull ahead for good. But we&#8217;re not there yet. So practical quantum advantage is a moving target&#8212;and a subtle one.</p><p>Another point to keep in mind: the problems used in today&#8217;s quantum advantage experiments are not practical. They're interesting academically, but they don't solve real-world problems. We expect that to change in the future, once we have larger and more stable quantum machines.</p><p><em><strong>18: Conversely, what are some of the most common misconceptions people have when they hear claims about quantum advantage?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> The biggest one is assuming that if quantum advantage has been demonstrated, then quantum computers can now solve all problems faster. That&#8217;s just not true.</p><p>In reality, these demonstrations apply to very specific, often artificial problems that don&#8217;t have practical applications. So people hear &#8220;quantum advantage&#8221; and think it means we can now simulate molecules faster or break encryption&#8212;but we&#8217;re not there yet.</p><p>Another misconception is assuming that quantum advantage, once demonstrated, is permanent. As I mentioned earlier, it&#8217;s not. It can vanish if a better classical algorithm is developed. So it&#8217;s not a one-time milestone&#8212;it&#8217;s part of an ongoing race between classical and quantum approaches.</p><h1>Quantum Software and Engineering Practices</h1><p><strong>19: As quantum computing frameworks like Qiskit mature, what programming abstractions have emerged? How do quantum circuits and gates, for example, map to classical programming concepts?</strong></p><p><strong>El&#237;as F. Combarro:</strong> This often surprises classical programmers. I remember the first computer science student who came to my office interested in quantum computing. He asked, &#8220;How do you implement a loop in a quantum computer?&#8221; And I had to say, &#8220;Come in and sit down&#8212;I have bad news.&#8221;</p><p>Quantum programs are fundamentally different. You don&#8217;t have loops. You don&#8217;t have persistent memory or data structures in the way you do in classical programming. What you have is a quantum circuit&#8212;a finite sequence of operations that runs once, from start to finish. You can't stop, inspect, or loop within the circuit. 
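</p><p><em>In code, that run-once-and-measure model looks roughly like the sketch below (assuming Qiskit with the qiskit-aer simulator installed); every decision about the outcomes happens classically, after the shots come back:</em></p><pre><code>from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

# One fixed circuit, executed start to finish many times; any "logic"
# on the outcomes happens classically afterwards.
qc = QuantumCircuit(2, 2)
qc.h(0)
qc.cx(0, 1)
qc.measure([0, 1], [0, 1])

counts = AerSimulator().run(qc, shots=1000).result().get_counts()
print(counts)                          # e.g. {'00': 517, '11': 483}
best = max(counts, key=counts.get)     # classical post-processing step
print("most frequent outcome:", best)
</code></pre><p>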
You run it, you measure, and then you&#8217;re done.</p><p>That&#8217;s why many quantum algorithms require a classical computer to post-process the results or control repeated executions. You might run a quantum circuit hundreds or thousands of times and then use a classical routine to aggregate the measurements and make decisions.</p><p>This structure makes it hard to build higher-level abstractions. Quantum circuits are more like assembly code&#8212;very low-level, with no branches or loops. That said, some reusable quantum subroutines have emerged, like amplitude amplification or the quantum Fourier transform. These can be treated as modular building blocks&#8212;like functions or libraries in classical programming.</p><p>But the core challenge is that quantum circuits must be executed in full. You can&#8217;t pause, inspect, or reuse intermediate results, because measurement collapses the state. That makes composition and modular design harder than in classical systems.</p><p><em><strong>20: Are there any emerging design patterns or standard libraries of quantum operations that developers can adopt to manage complexity in quantum code?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> Yes, libraries like Qiskit include some useful abstractions. In our book, for instance, we build a kind of design pattern for Boolean functions and oracles. These let you express conditions or constraints within quantum circuits, and they&#8217;re essential for algorithms like Grover&#8217;s.</p><p>That said, circuit design is still tricky. I&#8217;m not an expert in hardware-level optimization, but I&#8217;ve collaborated with people who are&#8212;one of our technical reviewers works at Qiskit and specializes in designing efficient quantum circuits for arithmetic operations like addition and multiplication. These optimizations are important for reducing the number of gates and minimizing noise.</p><p>There are some emerging design patterns, but it&#8217;s still early. Circuit construction is very problem-specific, and often requires deep insight into both the quantum algorithm and the hardware limitations. So we&#8217;re still a long way from having general-purpose, high-level abstractions like you&#8217;d find in classical software engineering.</p><p><em><strong>21: How does the notion of a quantum compiler or transpiler differ from a classical compiler? And what should a developer know about optimizing circuits for a given hardware backend?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> The concept of transpiling or compiling is quite similar to classical programming. In classical computing, you write code in a high-level language like C or Java, and then it gets compiled into machine code. In quantum computing, it&#8217;s the same idea: your code&#8212;written, say, in Qiskit&#8212;is translated into a sequence of low-level quantum operations that can be executed on hardware.</p><p>However, there are important differences. First, quantum &#8220;high-level&#8221; languages are still very low-level by classical standards. You don&#8217;t have loops, branches, or complex data structures. So the abstraction gap is smaller.</p><p>Second, unlike classical compilers, quantum transpilers can&#8217;t completely shield you from hardware details. In classical computing, you usually don&#8217;t have to think about the processor&#8217;s internal wiring. But in quantum computing, that kind of detail really matters. Not all qubits in a quantum computer are connected to each other. 
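</p><p><em>The cost of limited connectivity is easy to see by transpiling a small circuit onto a toy linear coupling map; a sketch, not tied to any particular device:</em></p><pre><code>from qiskit import QuantumCircuit, transpile
from qiskit.transpiler import CouplingMap

qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 4)                       # a two-qubit gate between distant qubits

line = CouplingMap.from_line(5)   # hardware that only connects 0-1-2-3-4
routed = transpile(qc, coupling_map=line, optimization_level=0)

print("before:", qc.count_ops(), "depth", qc.depth())
print("after: ", routed.count_ops(), "depth", routed.depth())
# The routed circuit picks up extra SWAPs, which is exactly the added
# depth and noise described here.
</code></pre><p>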
So if you want to apply a gate to two distant qubits, the transpiler has to insert extra operations to move data around&#8212;introducing noise and increasing circuit depth.</p><p>That&#8217;s why, as a quantum developer, you need to know something about the machine you&#8217;re targeting. For instance, suppose you&#8217;re writing a circuit that uses qubit 0 and qubit 10, and those two qubits aren&#8217;t physically adjacent. The transpiler will find a way to bring them together using swap gates&#8212;but that adds overhead and error risk.</p><p>In fact, even the quality of individual qubits varies. Quantum hardware is calibrated daily, and some qubits perform better than others. So if your algorithm is sensitive to noise, you may want to restrict it to the highest-quality qubits on a given device. Qiskit lets you see this kind of diagnostic information, and we explain how to use it in the book.</p><p><em><strong>22: Given that quantum states can&#8217;t be directly copied or fully observed, how do developers test and debug quantum algorithms in practice?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> That&#8217;s where classical simulation becomes absolutely essential. If you write quantum code and run it on real hardware and get unexpected results, it&#8217;s very hard to know why. Is it because of noise? Is it because of a bug in your logic? Is it just the randomness of quantum measurement?</p><p>To untangle that, you start by running your code on a classical simulator. These simulators are deterministic and noise-free&#8212;they give you the exact mathematical result of the circuit, assuming perfect qubits. This lets you validate whether your logic is correct before moving to actual quantum hardware.</p><p>The limitation is scale. Classical simulators require a lot of memory. For example, to simulate 38 qubits, we needed 8 terabytes of RAM. To simulate 39, we&#8217;d need 16 terabytes. So there&#8217;s an exponential wall. But up to 30 or so qubits, simulation is still feasible, and it&#8217;s extremely useful for debugging.</p><p>In practice, you&#8217;ll often go through three stages: first, run the code on a perfect simulator; second, use a simulator that includes noise; and third, move to real hardware. That way, you can isolate where the errors come from&#8212;whether they&#8217;re in your code, in the noise model, or in the physical device.</p><p><em><strong>23: What kinds of bugs or errors are most common in quantum code? And does Qiskit have any specific tools for debugging such issues?</strong></em></p><p><strong>El&#237;as Fern&#225;ndez Combarro &#193;lvarez:</strong> Since quantum programs are often run on classical simulators during development, you can use standard debugging tools&#8212;like breakpoints, inspection, or logging&#8212;just as you would in regular Python code. One of the most useful techniques is inspecting the state vector at different points in the circuit, especially when using a simulator.</p><p>There are also visual simulators that let you build circuits by dragging and dropping gates. These tools allow you to observe the state of the system at each step without performing a destructive measurement. That&#8217;s extremely helpful. You can see how the state evolves and whether it's behaving as expected.</p><p>Of course, if you&#8217;re working with 20 qubits, your state vector has over a million complex amplitudes, so inspecting the full state isn't always practical. 
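</p><p><em>For small circuits, that kind of state-vector inspection is a one-liner on the simulator; a minimal sketch:</em></p><pre><code>from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

qc = QuantumCircuit(2)
qc.h(0)
print(Statevector(qc))    # amplitudes after the Hadamard
qc.cx(0, 1)
print(Statevector(qc))    # amplitudes of the Bell state (|00> + |11>)/sqrt(2)
</code></pre><p>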
But in many cases, the circuits have structure or symmetry that helps you reason about what's going on without needing to track every number.</p><p>As for common errors: they&#8217;re often conceptual. For example, misunderstanding how measurement affects state, or using gates that don&#8217;t preserve intended entanglement. Indexing errors can also creep in&#8212;especially when you&#8217;re copying parts of circuits or trying to modularize components.</p><p><em><strong>24: Quantum computations yield probabilistic outcomes, and results can differ across hardware backends. How can developers ensure reproducibility of quantum experiments?</strong></em></p><p><strong>El&#237;as Fern&#225;ndez Combarro &#193;lvarez:</strong> Well&#8212;they can&#8217;t, at least not in the strict sense. Quantum computations are inherently probabilistic, so you can&#8217;t reproduce the exact same measurement result every time. What you can do is ensure a high probability of success.</p><p>Any useful quantum algorithm comes with some success guarantee. For example, Grover&#8217;s algorithm might give you a 99.9% chance of finding the right answer. But there&#8217;s always a non-zero chance it won&#8217;t. You could run it 1,000 times and still miss the correct result&#8212;it&#8217;s very unlikely, but possible. That&#8217;s just how quantum mechanics works.</p><p>However, reproducibility is possible in simulations. Simulators use pseudorandom number generators, so if you set a fixed random seed, you&#8217;ll get the same result every time&#8212;as long as the simulator version and environment stay the same. That&#8217;s what we do in the book: we specify the random seed so that readers can reproduce the results exactly.</p><p>So in summary: reproducibility is possible in simulations with fixed seeds, but not on real quantum hardware, because the randomness is fundamental and unavoidable.</p><p><em><strong>25: Let&#8217;s talk about developer experience. Qiskit and other tools have come a long way, but what gaps do you see remaining in terms of usability, documentation, or tooling?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> I mentioned one issue earlier&#8212;the rapid pace of change in quantum software. Qiskit, for example, used to update frequently, and each new version could break existing code. Maybe after just two or three months, your scripts would stop working if you upgraded to the latest version.</p><p>That situation has improved a lot. When we were writing this book, we knew that Qiskit 2.0&#8212;a major new version&#8212;was scheduled for release around the time we&#8217;d be finishing. We were nervous because we hadn&#8217;t written the book using that version, and we didn&#8217;t know what might change. That&#8217;s one of the reasons we separated the code from the explanatory chapters&#8212;to make updates easier. Fortunately, when Qiskit 2.0 came out, we only had to make a few changes. Most of the code ran out of the box.</p><p>Still, documentation is an area that needs more work&#8212;not just for Qiskit, but also for tools like PennyLane, which we used in our previous book. The reality is that many of these projects rely heavily on volunteers. Even at companies like IBM and Xanadu, which develop these libraries, not everything is fully documented&#8212;especially the newest features. 
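</p><p><em>Returning to the reproducibility point above: on a simulator, fixing the random seed is enough to make runs repeatable. A minimal sketch (assuming qiskit-aer; exact counts can still differ across simulator versions):</em></p><pre><code>from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

qc = QuantumCircuit(1, 1)
qc.h(0)
qc.measure(0, 0)

sim = AerSimulator()
r1 = sim.run(qc, shots=100, seed_simulator=1234).result().get_counts()
r2 = sim.run(qc, shots=100, seed_simulator=1234).result().get_counts()
print(r1, r2, r1 == r2)   # identical counts for identical seeds
</code></pre><p>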
Sometimes you have to read the source code to understand how something really works.</p><p>For that reason, in both our books we try to explain not just how to use the tools, but what&#8217;s happening under the hood. That way, readers don&#8217;t get stuck when something unexpected happens. For example, while writing this book, we discovered that the name of the result variable in Qiskit changes depending on whether you declare measurements at the beginning or at the end of a circuit. That wasn&#8217;t documented, and it took us a while to figure it out. So we included a clear explanation in the book to save others the trouble.</p><h1><strong>Hardware and Roadmap</strong></h1><p><em><strong>26: The hardware landscape includes superconducting qubits, trapped ions, photonics, spin qubits, and more. Could you compare some of the leading approaches in terms of qubit connectivity, coherence times, gate fidelities, and scalability?</strong></em></p><p><strong>El&#237;as F. Combarro:</strong> I should say upfront: I&#8217;m not a quantum hardware expert. My background is in algorithms and quantum software. But you do need to understand some hardware basics to run code effectively&#8212;otherwise, you&#8217;ll get results that are hard to explain.</p><p>Each technology has its strengths and weaknesses. The systems you can access with Qiskit, for example, are based on superconducting qubits. That&#8217;s the most mature platform right now. It&#8217;s close to classical hardware in terms of fabrication, which helps with scalability. But it has limitations. Coherence times&#8212;how long a qubit stays in a useful quantum state&#8212;are very short, usually in the microseconds. That limits how many gates you can apply before decoherence ruins your computation.</p><p>Gate fidelities on superconducting systems are getting quite good&#8212;99.9% or better&#8212;but they still introduce errors, especially in long circuits. Connectivity is also an issue: not all qubits are directly connected, which means more swap operations and more noise.</p><p>Trapped ions offer better coherence times&#8212;sometimes up to seconds&#8212;and generally better connectivity. But they&#8217;re harder to scale. You might get a few dozen high-quality qubits, but not hundreds or thousands yet.</p><p>Photonics is another promising direction. These systems maintain coherence for longer and can operate at room temperature, which is a big plus. But certain operations&#8212;like entangling gates&#8212;are harder to implement.</p><p>There are also newer platforms, like Rydberg atoms and neutral atom arrays. These use lasers to trap and manipulate individual atoms, and they have some unique advantages. For instance, with optical tweezers, you can physically move atoms around, allowing qubits to interact even if they&#8217;re far apart&#8212;solving some connectivity issues. But the operations are slower than in superconducting systems.</p><p>So each platform has trade-offs. And honestly, I think the technology that will enable large-scale, practical quantum computing probably hasn&#8217;t been invented yet. Many research teams are exploring different directions, and the winning approach may be something entirely new.</p><p><strong>27: Looking ahead 5 to 10 years, what do you consider realistic timelines for quantum computing to deliver practical benefits?</strong></p><p><strong>El&#237;as Fern&#225;ndez Combarro &#193;lvarez:</strong> That&#8217;s a very difficult question. 
People have been asking me this for years, and I still find it hard to estimate. Some things have moved faster than I expected&#8212;for example, we&#8217;re now seeing early demonstrations of quantum error correction with a few logical qubits constructed from many physical ones. That&#8217;s exciting.</p><p>At the same time, many of the limitations that quantum hardware had 20 years ago are still with us. So it&#8217;s a mixed picture.</p><p>Just recently, I read a paper by IBM researchers claiming they had matched classical accuracy on real quantum hardware for quantum chemistry problems. That&#8217;s not quantum advantage yet&#8212;they&#8217;re just reaching parity&#8212;but it&#8217;s a milestone. They&#8217;ve even said they aim to achieve <em>practical</em> advantage by 2026. That seems optimistic to me, but if it happens, it would be amazing.</p><p>Personally, I would guess five years. But if you ask me again next year, I might still say five. So it&#8217;s hard to pin down. Some breakthroughs could accelerate things dramatically, but until then, it&#8217;s wise to remain cautiously optimistic.</p><p><em><strong>28: What advice would you give to software architects and engineering teams who want to prepare for integrating quantum computing into their technology stack within the next five years?</strong></em></p><p><strong>El&#237;as Fern&#225;ndez Combarro &#193;lvarez:</strong> That&#8217;s a great question. We work with many companies that are trying to integrate quantum technologies into their workflows&#8212;not because they expect immediate results, but because they want to be ready when the time comes.</p><p>My advice is simple: start now. If you think quantum computing might be relevant to your domain, begin exploring it as early as possible. The learning curve is steep. Even if you already know Python, Java, C++, and Rust, quantum computing requires a different mindset. There are no loops, no traditional data structures, no copying of information. Measurement changes everything. You have to relearn how to think about programming.</p><p>In both our books, we&#8217;ve tried to make the field accessible. And based on the feedback we&#8217;ve received, I think we&#8217;ve been successful to some extent. But it&#8217;s still not easy. If you wait until quantum computing is mainstream, it may be too late to catch up.</p><p>The earlier you start, the better positioned you&#8217;ll be&#8212;both to understand the field and to take advantage of it when it becomes practically useful.</p><div><hr></div><p>To explore the ideas discussed in this conversation&#8212;including how to model multi-qubit systems, implement protocols like quantum key distribution, and run foundational algorithms like Grover&#8217;s and Shor&#8217;s on simulators and real hardware&#8212;check out <em><strong><a href="https://www.packtpub.com/en-us/product/a-practical-guide-to-quantum-computing-9781835885956">A Practical Guide to Quantum Computing</a></strong></em> by El&#237;as F. Combarro and Samuel Gonz&#225;lez-Castillo, available from Packt. This self-contained introduction uses Qiskit 2.1 to walk readers from single-qubit concepts to full quantum applications, with runnable code, clear mathematical explanations, and examples ranging from quantum money to fault-tolerant computation. 
It&#8217;s an ideal starting point for students, professionals, and self-learners preparing to engage with quantum programming in practice.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.packtpub.com/en-us/product/a-practical-guide-to-quantum-computing-9781835885956" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XTeN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c4262cb-90ba-41b2-8ea1-ceb1f3061715_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!XTeN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c4262cb-90ba-41b2-8ea1-ceb1f3061715_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!XTeN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c4262cb-90ba-41b2-8ea1-ceb1f3061715_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!XTeN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c4262cb-90ba-41b2-8ea1-ceb1f3061715_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XTeN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c4262cb-90ba-41b2-8ea1-ceb1f3061715_2250x2775" width="288" height="355.25274725274727" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c4262cb-90ba-41b2-8ea1-ceb1f3061715_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:288,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A Practical Guide to Quantum Computing&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.packtpub.com/en-us/product/a-practical-guide-to-quantum-computing-9781835885956&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A Practical Guide to Quantum Computing" title="A Practical Guide to Quantum Computing" srcset="https://substackcdn.com/image/fetch/$s_!XTeN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c4262cb-90ba-41b2-8ea1-ceb1f3061715_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!XTeN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c4262cb-90ba-41b2-8ea1-ceb1f3061715_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!XTeN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c4262cb-90ba-41b2-8ea1-ceb1f3061715_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!XTeN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c4262cb-90ba-41b2-8ea1-ceb1f3061715_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Designing Resilient Architectures for the Cloud and AI Era: A Conversation with Gabriel Baptista and Francesco Abbruzzese]]></title><description><![CDATA[An educator and Azure PaaS specialist joins forces with the creator of MVC/Blazor toolkits to unpack AI-ready architectures, cloud-native patterns, and the evolving role of software architects in .NET]]></description><link>https://deepengineering.substack.com/p/designing-resilient-architectures</link><guid isPermaLink="false">https://deepengineering.substack.com/p/designing-resilient-architectures</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Wed, 09 Jul 2025 07:41:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/UeHrWcwQgEk" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>From AI-assisted code generation to designing for edge computing and zero-downtime deployments, software architecture today demands fluency across disciplines. In this conversation, we speak with Gabriel Baptista and Francesco Abbruzzese&#8212;co-authors of <em><strong><a href="https://www.packtpub.com/en-us/product/software-architecture-with-c-12-and-net-8-9781805122456">Software Architecture with C# 12 and .NET 8</a></strong></em>&#8212;about how cloud-native and distributed systems are reshaping enterprise applications, where AI fits into architectural workflows, and why adaptability, not complexity, should guide long-term design decisions.</p><p>Gabriel Baptista has been working with software development since the early days of .NET. Today, he specializes in Azure Platform-as-a-Service (PaaS) solutions, teaches at computing engineering universities, and mentors tech startups across industries. Francesco Abbruzzese is the creator of the MVC Controls Toolkit and the Blazor Controls Toolkit. His career spans decision support systems for finance, top-selling video games, and over two decades of advocacy for Microsoft&#8217;s web stack. 
Together, they bring deep experience in enterprise systems, modern DevOps practices, and real-world architecture challenges.</p><p>Their book,<a href="https://www.packtpub.com/en-us/product/software-architecture-with-c-12-and-net-8-9781805122456"> </a><em><a href="https://www.packtpub.com/en-us/product/software-architecture-with-c-12-and-net-8-9781805122456">Software Architecture with C# 12 and .NET 8</a></em>, now in its fourth edition, translates high-level design theory into practical guidance for the .NET ecosystem. Covering everything from microservices and DevOps pipelines to design patterns, observability, and Kubernetes-ready architectures, the book is anchored by a detailed case study that walks readers through building an enterprise travel agency system from the ground up.</p><p>You can watch the full interview below&#8212;or read on for the complete transcript.</p><div id="youtube2-UeHrWcwQgEk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;UeHrWcwQgEk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/UeHrWcwQgEk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><p><em><strong>1: What inspired you to write Software Architecture with C# 12 and .NET 8?</strong></em></p><p><strong>Francesco Abbruzzese:</strong><br>Yes. The main point was that this book collects various subjects that are hard to find together in the same place. I wrote it mainly to put everything into something solid&#8212;a book&#8212;to capture my experience in one place. It&#8217;s also a useful tool for my job, for my advisory work, for my courses, and for my customers.</p><p><strong>Gabriel Lara Baptista:</strong><br>Well, to me, it was a really important milestone in my career&#8212;because writing a book is no small thing, right? I&#8217;ve been working in academia and in the industry for a long time, and this was a great opportunity to give back to the community. Like Francesco said, it&#8217;s a complete pipeline on how to create an enterprise solution, and that was really exciting.</p><p>There&#8217;s a big value in having the opportunity to write something like that. Today, I can use the book, as Francesco does, for teaching and in my current career&#8212;it&#8217;s something I can actually apply. So I really enjoyed the experience. It was a great opportunity to work with Francesco because we share a lot of common ground, and I think readers will see that in the book. It might look like it&#8217;s written by a single person, but we&#8217;ve worked together for a long time. It&#8217;s been a great collaboration.</p><p><strong>Francesco Abbruzzese:</strong><br>It was a pleasure for me too, Gabriel.</p><p><em><strong>2: AI is transforming pretty much everything in the tech landscape. Could you share some examples of how AI-driven tools are currently being used in architectural design, and what new paradigms you see emerging from AI&#8217;s influence in this space?</strong></em></p><p><strong>Gabriel Lara Baptista:</strong><br>Yeah, well, AI is the most talked-about topic in every discussion right now. What I think we, as architects, need to understand is that a good AI solution first requires a good software architecture&#8212;because AI only works with good data. 
Without good data, you cannot have a good AI.</p><p>As architects, we&#8217;ll be impacted by AI&#8212;positively or negatively&#8212;depending on how we work with it. Let me give two examples. Today, it's possible to upload an architecture diagram into an AI tool like ChatGPT and discuss with it whether you&#8217;re creating a good or bad design. That&#8217;s already possible. In some cases, I&#8217;ve used AI to give me feedback or suggest changes to my designs. It can do that.</p><p>But as a software architect, you still need to be a good analyst. You need to evaluate whether the output from the AI is actually correct. Especially in enterprise systems, that&#8217;s not always easy to do. So, yes, AI will change the world, but we&#8212;as individuals&#8212;need to use our intelligence to critically analyze whether the AI output is good or not, in any context.</p><p>As software architects, we need to understand that we have to build architectures that will support good AI&#8212;because if you don&#8217;t provide quality data, you won&#8217;t get quality AI. What do you think, Francesco?</p><p><strong>Francesco Abbruzzese:</strong><br>OK. I think AI is a valuable tool, but at least for now, it can&#8217;t completely replace the experience of a professional.</p><p>It helps save time. It can suggest choices&#8212;but sometimes those suggestions are wrong. Other times, those suggestions can be useful as a starting point for further investigation. AI can write some code, some diagrams, some designs that you might use as a base. That saves time.</p><p>Sometimes it suggests something you hadn&#8217;t thought of, or reminds you of a possibility you forgot to consider. That doesn&#8217;t mean it&#8217;s the best solution&#8212;but it&#8217;s a helpful suggestion. It&#8217;s a tool to avoid missing things and to save time.</p><p>At the moment, AI can&#8217;t replace the experience of a real professional&#8212;whether it&#8217;s an architect, a programmer, or someone else. For instance, I&#8217;ve never seen AI come up with a completely new algorithm. If you have to invent a new one, it&#8217;s not capable of doing that.</p><p>So yes, AI is a helpful tool&#8212;it helps save time and make sure you&#8217;re not overlooking something. But that&#8217;s all. There isn&#8217;t much more to say about it.</p><p><strong>Francesco Abbruzzese:</strong><br>And I think this won&#8217;t change much over time&#8212;at least not until we reach actual artificial general intelligence, something human-like.</p><p><em><strong>3: How can architects effectively use AI to enhance rather than replace their roles in software architecture?</strong></em></p><p><strong>Francesco Abbruzzese:</strong><br>OK. I&#8217;ve already addressed this a bit, because architects should use AI to save time&#8212;to write down some starting designs or starter code. For instance, in my company, we developed an AI tool that, through interaction with the architect, is able to generate a complete Visual Studio solution&#8212;with all the required projects&#8212;and some initial code.</p><p>Of course, that code doesn&#8217;t always work perfectly, but it&#8217;s a good starting point. It saves time compared to creating the entire solution manually, which usually takes a lot of time&#8212;especially if you're using complex architectures like the Clean Architecture or the main microservices architecture. 
These require solutions to be organized in a complex way, with many projects involved.</p><p>AI can help save time by generating most of the initial code. You can then modify it as needed, but even so, it can reduce the groundwork by 50 to 60 percent.</p><p>There&#8217;s no real risk of AI replacing us, because the hardest part is still modeling the real world&#8212;and at the moment, AI doesn&#8217;t have enough experience or understanding of the world to do that on its own. It needs a human to explain the requirements. We&#8217;re still quite far from AI being able to do everything without human help.</p><p><strong>Gabriel Lara Baptista:</strong><br>Yeah. Our book specifically discusses the role of the software architect in an enterprise team. As a software architect, you design the pipeline and solutions for the entire team. You&#8217;re responsible for the technical aspects of designing enterprise solutions.</p><p>Right now, AI can be used to accelerate processes&#8212;not just for the software architect but for the entire team. But I totally agree with Francesco: AI today is mostly helpful as a starting point. You still need a human to judge whether that output fits the actual requirements of the enterprise solution.</p><p>Especially in enterprise systems, the need for high-quality work is critical. Within a team, you&#8217;ll have junior developers, senior developers, analysts&#8212;and AI tools, especially those that assist with coding, can help all of them. Not just by generating code, which is often just the beginning, but by analyzing whether you&#8217;re following best practices in your code. We already have copilots that assist with that.</p><p>So, as a software architect, you need to evaluate whether these tools are right for your team&#8212;whether they help improve the team&#8217;s output and velocity. That&#8217;s where AI can play a supporting role: helping the architect define the team&#8217;s development pipeline.</p><p><em><strong>4: According to you, what specific architectural practices should be adopted to ensure cloud-native systems are resilient and adaptable?</strong></em></p><p><strong>Gabriel Lara Baptista:</strong><br>First, it&#8217;s important to say that in the near future, I believe most applications will be cloud-native. I agree that some applications still need to run on the edge, but it&#8217;s almost impossible not to have at least something running in the cloud to support those edge-based systems. This is something that everyone working in software development today needs to think about.</p><p>When it comes to building resilient and adaptable software&#8212;well, our book covers this topic in depth. We have several chapters that talk about resilience, writing good code, and adaptability&#8212;especially because the speed of development today is much higher than it was a few years ago.</p><p>We&#8217;re discovering new ways to build solutions every single day. We can&#8217;t always keep up that same pace on the architecture side, which is why we need to think carefully about how to design a software architecture that can be both adaptable and resilient.</p><p>We also need to account for non-functional requirements like security, performance, and&#8212;most importantly&#8212;resilience. 
One of the most critical factors is that applications can no longer afford downtime, especially enterprise applications that need to run 24/7.</p><p>To achieve that, you need to write good code&#8212;code that provides visibility into what&#8217;s happening, that integrates with retries, that enables better performance. A software architect has to consider these things from the very beginning&#8212;right when they start analyzing the application&#8217;s requirements.</p><p><strong>Francesco Abbruzzese:</strong><br>OK. In my opinion, it&#8217;s quite simple&#8212;cloud computing basically means distributed computing, with added adaptability. It allows you to change your hardware dynamically.</p><p>Cloud computing is really about reliability and adaptability. But you have to use the right architecture&#8212;that means applying the theory behind modern microservices and cloud-native systems.</p><p>I&#8217;m talking about reliable communication, orchestrators like Kubernetes, and automatic scaling&#8212;these are all provided by cloud platforms and also by Kubernetes itself. You also have tools for collecting metrics and adjusting the software&#8217;s behavior automatically based on those metrics. This is the essence of the theory we&#8217;re dealing with.</p><p>For example, in microservices architectures, reliable communication is essential. These applications are often structured like assembly lines&#8212;processing and transferring data step by step. That means it&#8217;s unacceptable to lose data. Communication must at least eventually succeed. It can be delayed, but it has to succeed.</p><p>There are many techniques and even formal theorems for reliable communication and for understanding what guarantees you can expect. The theory behind distributed computing and microservices is quite advanced at this point. You can find a lot of the basics and starting points in our book.</p><p><em><strong>5: Considering the sophistication of modern cyber threats, how can security by design principles be integrated throughout the architecture?</strong></em></p><p><strong>Francesco Abbruzzese:</strong><br>OK, it&#8217;s a good question. First, you have to understand that security by design isn&#8217;t a separate subject&#8212;it&#8217;s just a way of doing architecture and code, keeping security in mind while designing and coding.</p><p>So, the main answer is: education. You have to study the right way to do things, and avoid inventing untested or improbable solutions. You also need to choose the right team members&#8212;that&#8217;s a major factor.</p><p>That said, some tools can help. In my opinion, the best tool is code review&#8212;having a security expert review the code to identify potential vulnerabilities and ensure best practices are being followed.</p><p>Another important tool is using existing stacks and libraries that are already designed with security in mind&#8212;tools that follow best practices. Avoid inventing your own methods for things like authentication. There are already reliable, tested libraries available for that.</p><p>So, in my opinion, the three main pillars are: education, code reviews, and using existing secure libraries and frameworks.</p><p><strong>Gabriel Lara Baptista:</strong><br>Yeah, I totally agree with Francesco. I&#8217;d just like to complement his answer by expanding a bit on the three areas he mentioned.</p><p>First, education. A software architect needs to study information security. 
For example, OWASP provides a list of the top ten vulnerabilities you might encounter in your APIs. A software architect needs to be aware of these to prevent code that introduces such risks.</p><p>Second, when it comes to code review&#8212;think about DevOps. Instead of just implementing DevOps, why not implement DevSecOps? With DevSecOps, you can include static analysis tools that identify security issues early. These tools help architects and senior developers review the code produced by the team and ensure security practices are being followed.</p><p>Third, we need to be cautious when choosing libraries. One of the most critical security issues is using outdated or vulnerable libraries. So, as Francesco said, it&#8217;s essential to choose libraries that are secure and well-maintained.</p><p><em><strong>6: What strategies do you think architects should employ to ensure long-term architectural integrity?</strong></em></p><p><strong>Gabriel Lara Baptista:</strong><br>Well, architecture is not immutable&#8212;you can and should change it. It&#8217;s a living thing most of the time.</p><p>But if you design it with the principles we&#8217;ve been talking about&#8212;like security by design, reliability, adaptability, and monitoring&#8212;you&#8217;ll be in a much better position. These principles help you recognize when you&#8217;re hitting limitations and when the architecture needs to evolve.</p><p>I don&#8217;t believe you should start with the most complex architecture possible. In my opinion, you should begin with something simple. Most software, even enterprise software, doesn&#8217;t require an overly complex architecture to start with.</p><p>Start simple, and continually monitor and analyze whether you're hitting bottlenecks or facing challenges. That way, you can evolve your architecture gradually. This is especially easy with cloud technologies, where scaling or adjusting an application is much faster compared to on-premises systems.</p><p>Also, you need a strong pipeline&#8212;one that helps you understand the requirements, implement good code, perform thorough code reviews, run proper testing, and gradually deploy to production. Even if the application itself is simple, a solid pipeline ensures the architecture remains healthy over time.</p><p><strong>Francesco Abbruzzese:</strong><br>Yes. I completely agree&#8212;the main points are using the cloud, because it allows you to easily change the architecture, and monitoring your application so you can detect when requirements or conditions have changed.</p><p>Also, it's important to understand the requirements clearly and choose the best-fitting architecture. The better your initial architectural choice, the longer it will remain valid.</p><p>Let me add one more point. In my opinion, it&#8217;s very important to use techniques like Domain-Driven Design and microservices. These approaches allow you to make localized changes to your architecture without affecting the whole system. This makes your architecture easier to modify over time.</p><p>For example, each microservice can follow its own architectural approach if needed. You can update or tune just a small part of the system. This flexibility really helps with long-term architectural integrity.</p><p><strong>Gabriel Lara Baptista:</strong><br>Let me just add one thing, as Francesco said. When we implement applications using Domain-Driven Design, we can&#8217;t forget about design patterns. 
These patterns&#8212;whether at the code level or architectural level&#8212;make applications more extensible.</p><p>When Francesco talks about Domain-Driven Design with microservices, that&#8217;s a pattern. Why not implement a pattern that&#8217;s already widely adopted and proven? It makes things simpler. That&#8217;s why I mentioned earlier that simplicity is key.</p><p><em><strong>7: How do you envision the evolution of enterprise application architecture with the integration of things like edge computing and increased data centricity? What kind of industry-specific impacts do you think these changes could have?</strong></em></p><p><strong>Francesco Abbruzzese:</strong><br>OK. Collecting peripheral data and using all available data for decision-making&#8212;that&#8217;s definitely the future, no doubt about it. Applications will absolutely have to be implemented with this in mind.</p><p>But on one hand, we have the need to keep data separated. You mentioned data-driven design&#8212;each microservice should be responsible for its own data. There are also geographic constraints and other concerns that make data separation necessary.</p><p>On the other hand, we want to take advantage of all the available data. So what does this mean? It means we need a special kind of microservice&#8212;worker microservices that consolidate data. These services take peripheral data from different sources and consolidate it to feed decision-making processes.</p><p>This kind of data consolidation&#8212;supported by AI or statistical techniques&#8212;will become a central aspect of future applications. I can&#8217;t say for sure which technique will become dominant, but the key point is: we need to consolidate data in the right format and with the right level of detail for it to be useful. That&#8217;s what matters most.</p><p><strong>Gabriel Lara Baptista:</strong><br>I&#8217;d complement Francesco&#8217;s answer by saying that edge computing will impact every industry in the future&#8212;there&#8217;s no industry that won&#8217;t need it. Edge computing enables decision-making at the point of data collection, and that decision is often powered by AI.</p><p>The decision might happen at the edge, but the results of that decision need to be sent to a central location or data store. That&#8217;s why the future of application architecture will be entirely distributed. And because of that, the principles Francesco mentioned&#8212;Domain-Driven Design and microservices&#8212;will be even more important. <strong>Future applications will need to be distributed by design.</strong></p><p>We&#8217;ll also need to think about how to collect and consolidate all that edge data. And it&#8217;s important to understand that we don&#8217;t need to send all the raw data&#8212;just the decisions or critical insights. That&#8217;s a very important distinction when designing edge applications.</p><p>So, yes, distributed solutions will become necessary for every industry.</p><p><strong>Q: Gabriel, how do you see technological advancements&#8212;both current and future&#8212;altering career paths in software architecture? What new skills do you think are becoming essential for professional growth?</strong></p><p><strong>Gabriel Lara Baptista:</strong><br>Technically speaking, based on everything we&#8217;ve discussed so far, understanding how to design and implement distributed applications is essential. 
Whether we&#8217;re talking about microservices, serverless, or something else&#8212;at the core, we&#8217;re dealing with distributed systems. And that&#8217;s where the world is headed.</p><p>So, in the near future, anyone pursuing a career in software architecture will need a strong foundation in distributed computing. Information security is another critical area. And, of course, AI&#8212;we&#8217;ve mentioned it at several points today. Those three&#8212;distributed computing, security, and AI&#8212;are the technical pillars moving forward.</p><p>But there&#8217;s also another key aspect: soft skills. Architects need them. Whether you're implementing a DevOps pipeline, building a secure SaaS platform, or introducing observability, those changes will affect the entire team. To lead those changes successfully, you need to bring the team along with you. And soft skills will make that possible.</p><p><em><strong>8: How can architects ensure their designs effectively contribute to business outcomes&#8212;especially in industries where the impact of technology is significant?</strong></em></p><p><strong>Francesco Abbruzzese:</strong><br>OK. The key is making sure our work&#8212;our architecture and our design&#8212;adds value to the business. That&#8217;s the main point.</p><p>The best tool for this is DevOps. DevOps is designed specifically to align technical outcomes with business goals.</p><p>But I&#8217;ll say more. It&#8217;s also about collecting metrics, so you can see how your application is performing and where improvements are needed. And it&#8217;s about creating architectures that are easy to adapt. We already talked about this earlier&#8212;so again, Domain-Driven Design and microservices are essential.</p><p>If your architecture is based on these principles, you can modify individual parts easily. You don&#8217;t have to rewrite large portions of the application. You can identify the specific part that needs to change in order to deliver more value to the business.</p><p>Also, we need a different approach to gathering requirements. Instead of just collecting what the user says they want, we need to think about the business value behind each request. This mindset shift is similar to what we do with security by design. Every requirement we collect should be evaluated in terms of the value it adds to the business.</p><p>So for me, there are three key components: DevOps, Domain-Driven Design with microservices, and a value-oriented mindset from the very beginning of the requirement-gathering phase.</p><p><strong>Gabriel Lara Baptista:</strong><br>As Francesco said, DevOps is the answer.</p><p>Why is the answer DevOps?</p><p>Because with DevOps, you have the possibility to collect information in a small period of time about the evolution of the application. And then you have the feedback that gives you the opportunity to design the application better for the near future&#8212;<br>for the next cycle of DevOps.</p><p>Architecture, by concept, is the idea of implementing something that makes things easier to adapt and to maintain. So if you have a good architecture, it&#8217;s going to be easier to adapt for new functionalities that you may have or to maintain, if you have something wrong.</p><p>So with good architecture, you give space for the business team to define better solutions. And you&#8217;re going to have a faster feedback loop, and you're going to adapt the application faster. 
And you're going to give more value to the business.</p><p>So it&#8217;s a good cycle when you have DevOps well implemented, together with a good architectural design because you're going to happily increase the value of the solution that you are developing.</p><div><hr></div><p>To explore the topics covered in this conversation in greater depth&#8212;including building scalable enterprise systems with microservices, applying architectural patterns like Domain-Driven Design, and preparing .NET applications for Kubernetes and cloud-native deployments&#8212;check out <em><strong><a href="https://www.packtpub.com/en-us/product/software-architecture-with-c-12-and-net-8-9781805122456">Software Architecture with C# 12 and .NET 8</a></strong></em> by Gabriel Baptista and Francesco Abbruzzese, available from Packt. Now in its fourth edition, the book combines design fundamentals with hands-on .NET practices, covering everything from EF Core and DevOps pipelines to Blazor, OpenTelemetry, and a complete case study centered on a real-world travel agency system.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.packtpub.com/en-us/product/software-architecture-with-c-12-and-net-8-9781805127659" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!97w9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e19a2d8-04b9-4ddb-ac35-984fdf6d6eca_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!97w9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e19a2d8-04b9-4ddb-ac35-984fdf6d6eca_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!97w9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e19a2d8-04b9-4ddb-ac35-984fdf6d6eca_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!97w9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e19a2d8-04b9-4ddb-ac35-984fdf6d6eca_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!97w9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e19a2d8-04b9-4ddb-ac35-984fdf6d6eca_2250x2775" width="262" height="323.18131868131866" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e19a2d8-04b9-4ddb-ac35-984fdf6d6eca_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:262,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Software Architecture with C# 12 and .NET 8&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.packtpub.com/en-us/product/software-architecture-with-c-12-and-net-8-9781805127659&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Software Architecture with C# 12 and .NET 8" title="Software Architecture with C# 12 and .NET 8" 
srcset="https://substackcdn.com/image/fetch/$s_!97w9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e19a2d8-04b9-4ddb-ac35-984fdf6d6eca_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!97w9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e19a2d8-04b9-4ddb-ac35-984fdf6d6eca_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!97w9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e19a2d8-04b9-4ddb-ac35-984fdf6d6eca_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!97w9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e19a2d8-04b9-4ddb-ac35-984fdf6d6eca_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here is what some readers have said:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DAp7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DAp7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png 424w, https://substackcdn.com/image/fetch/$s_!DAp7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png 848w, https://substackcdn.com/image/fetch/$s_!DAp7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DAp7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DAp7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png" width="1082" height="632" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:632,&quot;width&quot;:1082,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:167827,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/167883846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DAp7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png 424w, https://substackcdn.com/image/fetch/$s_!DAp7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png 848w, https://substackcdn.com/image/fetch/$s_!DAp7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png 1272w, https://substackcdn.com/image/fetch/$s_!DAp7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809a105-0e2a-4885-b0f7-e3d74e6894e9_1082x632.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xLCR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xLCR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png 424w, https://substackcdn.com/image/fetch/$s_!xLCR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png 848w, https://substackcdn.com/image/fetch/$s_!xLCR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png 1272w, https://substackcdn.com/image/fetch/$s_!xLCR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xLCR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png" width="1063" height="315" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:315,&quot;width&quot;:1063,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64077,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/167883846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xLCR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png 424w, https://substackcdn.com/image/fetch/$s_!xLCR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png 848w, https://substackcdn.com/image/fetch/$s_!xLCR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png 1272w, https://substackcdn.com/image/fetch/$s_!xLCR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aeecf4c-c3f2-4de3-9167-8d84768a0fbf_1063x315.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft 
pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F-h-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F-h-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png 424w, https://substackcdn.com/image/fetch/$s_!F-h-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png 848w, https://substackcdn.com/image/fetch/$s_!F-h-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png 1272w, https://substackcdn.com/image/fetch/$s_!F-h-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F-h-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png" width="1087" height="426" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:426,&quot;width&quot;:1087,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110272,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://deepengineering.substack.com/i/167883846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F-h-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png 424w, https://substackcdn.com/image/fetch/$s_!F-h-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png 848w, https://substackcdn.com/image/fetch/$s_!F-h-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png 1272w, https://substackcdn.com/image/fetch/$s_!F-h-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e41715-4a27-495e-83aa-d4a2c049c0ca_1087x426.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Learning Python and Leading Engineers: A Conversation with Fabrizio Romano]]></title><description><![CDATA[An development manager, longtime developer, and teacher explores Python&#8217;s strengths, its real-world use cases, and the mindset shifts required to lead engineering 
teams.]]></description><link>https://deepengineering.substack.com/p/learning-python-and-leading-engineers</link><guid isPermaLink="false">https://deepengineering.substack.com/p/learning-python-and-leading-engineers</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Wed, 02 Jul 2025 08:38:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/-jyW6SS80lU" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this conversation, we speak with <a href="https://www.linkedin.com/in/gianchub/">Fabrizio Romano</a>&#8212;author of <em><strong><a href="https://www.packtpub.com/en-us/product/learn-python-programming-9781835882955">Learn Python Programming, Fourth Edition</a></strong></em>&#8212;about what initially drew him to Python, how its design supports clarity and learning, and what developers should understand about its strengths, limitations, and evolving tooling. We also explore how his experience as a software developer shaped his transition into engineering leadership, and why he believes non-technical challenges&#8212;like communication, stress, and emotional awareness&#8212;are often the real barriers to building great software.</p><p>Romano is the development manager of the Sohonet product development team. He has worked as a professional software developer since 1999 and has used Python extensively since 2007. He holds a master&#8217;s degree in computer science engineering from the University of Padova and has spoken at EuroPython (Berlin 2014, Bilbao 2015) and ProgSCon 2016, with talks covering topics like test-driven development and Python training at scale. In 2022, he taught Python to software engineers and data science students through a collaboration with Oxford University. Outside of work, he enjoys playing the guitar and teaching Python, mathematics, and meditation.</p><p>His 2024 book, <em><a href="https://www.packtpub.com/en-us/product/learn-python-programming-9781835882955">Learn Python Programming, Fourth Edition</a></em>, provides a comprehensive, up-to-date introduction to the language&#8212;covering everything from core syntax and data structures to real-world projects involving APIs, automation, and packaging. This edition includes updates for Python 3.9 through 3.12, new content on type hinting and CLI applications, and hands-on examples designed to help readers build confidence and independence as Python developers.</p><p>You can watch the full interview below&#8212;or read on for the complete transcript (reorganized to enable a smoother reading experience).</p><div id="youtube2--jyW6SS80lU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;-jyW6SS80lU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/-jyW6SS80lU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><h2>Getting Started with Python</h2><p><em><strong>1. You've spent some years now working with Python. What's your journey with the language been like, and what originally drew you to it?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>Back in the early 2000s, I was mostly coding in C#, building software for Windows machines. I was also coding in Java, PHP, and ASP.NET for websites.
But in my spare time&#8212;when I had more of it&#8212;I really liked playing on programming challenge websites and doing competitive programming.</p><p>On one of these forums where you discuss your solutions, I saw another Italian guy, Marco Berry, who eventually became a friend. He was using Python, and I noticed his solutions were conceptually the same as mine in C#, but much, much shorter. That caught my attention&#8212;his code was so concise.</p><p>We started talking, and he&#8217;s a great enthusiast of Python. Eventually, I decided to give it a try. I started learning just by playing on those websites and solving challenges, which was a great way to get into it. You learn about the various data structures Python offers, since you're doing algorithms&#8212;you get to understand what&#8217;s fast, what&#8217;s not, which data types to use.</p><p>That&#8217;s when I started considering switching to Python for work. But the original inspiration was really that it looked different and concise. It&#8217;s indented, and you don&#8217;t use curly braces to define scope&#8212;that made it stand out to me.</p><p><em><strong>2. Do you remember the names of any of these websites where you started out just playing around with Python?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>Yes, one of them is called Project Euler. I think it&#8217;s .net. I used to be one of the admins there, creating problems. It&#8217;s mostly math-oriented, and the problems look simple&#8212;but if you try a brute force algorithm, it would never finish. So it teaches you to think more efficiently.</p><p><em><strong>3. For someone looking to learn Python today, why would you say it's a good choice? What makes Python particularly well-suited for certain types of programming?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>First of all, the learning curve isn&#8217;t steep. Python is practically English&#8212;it's very readable and easy to learn the fundamentals. It runs everywhere, so it&#8217;s portable. It&#8217;s extremely coherent, logical, and elegant.</p><p>Guido van Rossum, who designed it, has a degree in math and computer science, and you can really see that in how Python is designed. For example, in other languages, if you want to know how many items are in a collection, you might use <code>.size</code> or <code>.length</code> or <code>.items</code>, depending on the type. In Python, you just use the <code>len()</code> function&#8212;it doesn&#8217;t matter what kind of collection it is. So it's consistent and intuitive.</p><p>Often, even if you don&#8217;t know the exact function you need, it&#8217;s easy to guess. That really helps&#8212;you don&#8217;t have to remember 7,000 different exceptions.</p><p>It&#8217;s also concise. On average, a Python program is about a quarter the size of an equivalent Java or C++ program. There&#8217;s an amazing community around the world, and Python has an extensive standard library as well as countless third-party libraries available on PyPI. Most of the time, if you need to do something, someone else has already written a library for it. You can just leverage that instead of reinventing the wheel.</p><p>It also integrates well with other languages, so you can extend it. Google, for many years, used Python as glue code to connect components written in other languages&#8212;and now it&#8217;s even more prominent there. 
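</p><p><em>Several of the qualities Romano lists&#8212;readability, consistency, conciseness&#8212;show up even in a tiny sketch (an illustrative example, not taken from the interview): the built-in <code>len()</code> works the same way on every collection type, where other languages expose a different method per type.</em></p><pre><code># len() works the same way on every built-in collection,
# instead of a different .size / .length / .count per type.
samples = [
    [1, 2, 3],          # list
    (1, 2, 3, 4),       # tuple
    {"a": 1, "b": 2},   # dict
    {"x", "y", "z"},    # set
    "hello",            # str
]
for value in samples:
    print(type(value).__name__, len(value))
</code></pre><p>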
It&#8217;s also become the de facto standard for data science.</p><p>For me, one of Python&#8217;s most beautiful aspects is how it uses protocols like the generator and iterator protocols, the way it does polymorphism, and the flexibility you have with multiple inheritance. There&#8217;s a lot of freedom and power in how Python is designed. It&#8217;s often called a language for adults because it doesn&#8217;t constrain you too much. When I write Java, I sometimes feel claustrophobic&#8212;probably because I&#8217;m so used to Python.</p><h2>When Not to Use Python</h2><p><em><strong>4. On the flip side, are there any situations where Python isn't a perfect fit&#8212;cases where you'd recommend using another language?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>Sure. The elephant in the room is usually speed. Although to be fair, Python&#8217;s speed today is rarely a problem. But if you really need maximum performance, you're usually better off writing in C or using something like Rust.</p><p>For example, data science libraries like NumPy and pandas are compiled and use C under the hood. That&#8217;s why they&#8217;re so fast. But if speed is crucial for your use case, then you might want a compiled language.</p><p>That said, in all the years I&#8217;ve been using Python, speed has never been an issue for me&#8212;probably because I haven&#8217;t worked in domains where performance is the top priority.</p><p>Another area is mobile app development. Python is still not great for that. You&#8217;re probably better off using Swift or something like Dart with Flutter. Java&#8217;s been around longer and is more mature in that space.</p><p>The language is evolving, though. For example, in Python 3.14, I was reading that they&#8217;re going to address some of these limitations&#8212;so who knows, maybe in the future we won&#8217;t even think of these as limitations anymore.</p><h2>Evolving Language and Tools</h2><p><em><strong>5. Python 3.13 introduced a revamped interactive interpreter and an experimental free-threaded mode. How do you see these impacting day-to-day development, particularly for teams working on large-scale applications?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>Personally, I almost never use the interactive interpreter because I use Jupyter Notebook. It&#8217;s like a better-looking version of the interpreter&#8212;you run things in cells, and you don&#8217;t deal with the issues the classic Python shell had. For instance, changing a function you&#8217;ve already defined used to be a nightmare. That&#8217;s now been addressed in the new shell, which is more powerful and user-friendly.</p><p>It&#8217;s especially helpful when you're not on your local development environment&#8212;like if you&#8217;re SSH-ed into a server and don&#8217;t have the luxury of tools like Jupyter. In those situations, having a shell with multiline editing, coloring, syntax highlighting&#8212;that&#8217;s a big plus. It just makes your job easier when your toolset is limited.</p><p>As for the free-threaded mode, which we touched on earlier&#8212;it&#8217;s about removing the Global Interpreter Lock (GIL), something that's been discussed forever. There&#8217;s potential for performance gains, especially on multi-core systems. 
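</p><p><em>To illustrate why this matters, here is a minimal sketch (an illustrative example, not from the interview): under the current GIL, threads cannot run pure-Python, CPU-bound work in parallel, which is why the <code>multiprocessing</code> module is the usual way to use several cores today. On a multi-core machine, the process-based version below should finish noticeably faster than the thread-based one.</em></p><pre><code>import multiprocessing
import threading
import time

def cpu_bound(n):
    # Pure-Python busy work: with the GIL, two threads running this
    # take turns on one core instead of running in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_threads(n, workers=2):
    threads = [threading.Thread(target=cpu_bound, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def run_processes(n, workers=2):
    # Each process has its own interpreter (and its own GIL),
    # so CPU-bound work can actually use several cores.
    with multiprocessing.Pool(workers) as pool:
        pool.map(cpu_bound, [n] * workers)

if __name__ == "__main__":
    n = 5_000_000
    for label, run in [("threads", run_threads), ("processes", run_processes)]:
        start = time.perf_counter()
        run(n)
        print(label, round(time.perf_counter() - start, 2), "seconds")
</code></pre><p>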
But as of Python 3.13, it's still experimental, so we don&#8217;t know exactly how it&#8217;ll play out.</p><p>That said, even with the GIL in place, developers have always had workarounds&#8212;like older libraries such as greenlet or eventlet, which let you write code that behaves a bit like multithreaded code. That&#8217;s useful when your threads are I/O-bound&#8212;like when your app is waiting for an HTTP response. In those moments, you can yield and let the CPU do something else.</p><p>Nowadays, we have <code>asyncio</code>, and Python also has a threading library. If you really want to leverage multiple cores, you can use the <code>multiprocessing</code> module&#8212;it&#8217;s a bit more complex, but it works well. And as I mentioned before, libraries like NumPy or TensorFlow already bypass the GIL internally for performance-critical operations.</p><p>So it&#8217;s not like Python developers are constantly frustrated by the GIL&#8212;but it is a thing. Removing it, if done right, will definitely help.</p><p><em><strong>6. Looking back at all the different versions, what would you say have been the most pivotal moments that transformed your personal workflow?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>Great question. I think there are two aspects: one is how Python itself evolved, and the other is how all the tools in the Python ecosystem evolved around it.</p><p>A few years ago, we moved from Python 2 to Python 3&#8212;and those years were painful. You wanted to use Python 3, but then you&#8217;d realize that the library you needed hadn&#8217;t been ported yet. So you couldn&#8217;t.</p><p>Thankfully, that&#8217;s over now. Python 2 is legacy. It&#8217;s rare to find a Python 2 project these days. I haven&#8217;t worked with Python 2 in maybe nine or ten years.</p><p>Python 3 brought things like full Unicode support. Over time, more features came in&#8212;like type annotations, <code>asyncio</code>, data classes, structural pattern matching, and virtual environments built right into the standard library. Libraries like <code>mock</code> were also integrated into the standard library&#8212;now it's <code>unittest.mock</code>.</p><p>So Python is faster, richer, and more modern now. But I wouldn&#8217;t say any one feature was dramatically transformative. They&#8217;ve all been meaningful improvements, and I really appreciate all the effort the community and core developers have put into making the language better.</p><p>When I think about my day-to-day workflow, though, what&#8217;s really transformed things are the tools.</p><p>I use an IDE&#8212;these days it&#8217;s VS Code. I&#8217;ve used others before: Komodo, IntelliJ IDEA, NetBeans, Sublime. But VS Code suits me best now.</p><p>Then there are the frameworks that rose in the last 10 to 15 years&#8212;Django, FastAPI. They became widely adopted and brought strong communities with them.</p><p>Jupyter Notebook was another game changer. When I was teaching on behalf of Oxford University, I never used slides. All my materials were in notebooks. If a student said, &#8220;I don&#8217;t understand,&#8221; I could immediately run an example and help them see what&#8217;s happening. A slide can&#8217;t do that. A notebook is alive&#8212;you can play with it.</p><p>Testing tools improved too. Libraries like <code>pytest</code> make testing a much more pleasant experience. Tools like Celery made background tasks easier.</p><p>And then of course, the whole deployment landscape changed.
Ten or fifteen years ago, we were deploying to physical servers. Then we moved to virtual machines. Now it&#8217;s all containers&#8212;Docker, Kubernetes, Ansible, AWS, Terraform. That was a complete revolution.</p><p>For instance, today I&#8217;m writing a library that interacts with RabbitMQ. I just spin up a RabbitMQ container&#8212;no need to clutter my laptop with installations that might conflict with each other. Containers made everything easier.</p><p>Then you have formatting and linting tools like Black, Flake8, Ruff, isort. I&#8217;m old enough to still know PEP 8 by heart&#8212;it&#8217;s the Python Enhancement Proposal that defines the language&#8217;s style guide. But these days, I rarely format my code manually. I just hit a shortcut, Black takes care of it. I use other tools to check for unused imports or issues. It saves a ton of time.</p><p>There's also a newer tool called UV, written in Rust. It used to be called Puffin. It&#8217;s trying to unify the experience around tools like <code>pip</code>, <code>pip-tools</code>, <code>poetry</code>, <code>virtualenv</code>, and <code>twine</code>. UV is blazing fast. I still use it alongside other tools, but it&#8217;s great to see a move toward standardizing Python tooling&#8212;especially for beginners.</p><p>Because one of the biggest hurdles for new developers is setting up their environment: managing different Python versions, setting up virtual environments. That&#8217;s hard. And it&#8217;s not something you want to cover in Chapter 1 of a beginner&#8217;s book.</p><p>In <em>Learn Python Programming</em>, we try to guide readers to helpful resources so they can learn that part too. But yes, simplifying setup is one of the most valuable directions Python tooling is heading in&#8212;and UV is a big part of that.</p><h2>The Book: <em>Learn Python Programming</em></h2><p><em><strong>7. Let&#8217;s talk about your book now. What kind of problems does Learn Python Programming help readers solve? And what sort of professional would benefit most from it?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>The book is called <em>Learn Python Programming</em>, so the title already hints that it&#8217;s for beginners. Anyone starting their career with Python will probably benefit from reading it.</p><p>By the fourth edition, I think we&#8217;ve really found a good balance between theory and practice, especially in the later chapters. It took four editions to get there, but I&#8217;m happy with where we are now.</p><p>We provide all the foundational knowledge a developer needs in the first part of the book. Then, we move into practical projects&#8212;like how to build an API, how to build a CLI application, how to package your Python projects.</p><p>One of my goals for this book, in every edition, was to write it in a way that would age well. Technology books can become outdated quickly, so I tried to infuse each chapter with lessons about programming and being a developer that aren&#8217;t tied exclusively to Python. I&#8217;ve included a bit of my personal experience in each chapter.</p><p>So even if you&#8217;re a seasoned developer, there might still be something interesting in the book for you. 
I&#8217;d say it&#8217;s suitable for beginners, but also for mid-level and senior developers.</p><p>I get messages every now and then from people on LinkedIn saying, &#8220;I&#8217;ve been developing for years, but I&#8217;m starting with Python and picked up your book&#8212;I learned this or that.&#8221; That&#8217;s always nice to hear.</p><p><em><strong>8. What are the minimum prerequisites for someone to benefit from the book? What should they already know?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>Ideally, you have some experience with another language, or at least an idea of what programming is about. But even if you haven&#8217;t coded before, the book can still be useful&#8212;it&#8217;ll just take a bit more effort in the beginning.</p><p>If you&#8217;ve dabbled in another language or understand basic programming concepts, then you&#8217;re equipped to start the journey. We begin with a bird&#8217;s-eye view of Python&#8212;how to install it, then we dive into the basics: data types, functions, objects, and so on. It&#8217;s all in there.</p><p>The book also offers suggestions on where to go deeper. That&#8217;s something I care about a lot. When I mentor people on my team, I try to mentor them in a way that eventually they don&#8217;t depend on me, or on a book, or on anything. I help them develop a method they can use to unblock themselves when they get stuck&#8212;which happens a lot when you&#8217;re a developer. The book tries to teach this kind of mindset too.</p><p>We put a lot of effort into helping readers learn that methodology&#8212;something they can rely on whenever they need it.</p><h2>AI in Software Development</h2><p><em><strong>9. From a team management perspective, how do you see these tools affecting how developers collaborate and write code?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>It&#8217;s undeniable&#8212;that's the direction developers are going. We have to use AI. I think a developer who refuses to embrace AI today is probably going to be obsolete very soon. That&#8217;s just the reality of our world.</p><p>At Sohonet, in my role, I got everyone on my team set up with GitHub Copilot. I wanted them to start using it, get familiar with it, and understand how to leverage what it can offer.</p><p>It might not affect how people collaborate as much, but it definitely changes the way you write code. It can be very helpful. We also have people trying out Cursor. We&#8217;re always testing things&#8212;there&#8217;s the Windsurf editor from the Codeium team, and the new Trae editor from ByteDance, the company behind TikTok.</p><p>I try to stay up to date with what&#8217;s new and see if anything&#8217;s worth exploring. But at the end of the day, most of my developers use Copilot.</p><p>Copilot is especially helpful for menial or repetitive tasks&#8212;like hardcoding different test cases. It&#8217;s really good at predicting what the next test case might be. It&#8217;s also good at giving you a structural starting point for what you&#8217;re trying to write. Sometimes it misses, but most of the time it&#8217;s helpful.</p><p>Even when it's just acting like a better IntelliSense, it&#8217;s still useful. When you're editing or refactoring code, it often guesses quite nicely what you're going to do next. So instead of rewriting a line yourself, you just hit Tab and it&#8217;s done. For those aspects, I&#8217;d definitely recommend it. Everyone should at least try it, see what it can do for them, and then decide.</p><p><em><strong>10.
Are there any red flags&#8212;areas where developers should be cautious about over-relying on these tools?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>Yes, absolutely. This is just my opinion, but I think part of the job&#8212;especially when I was a full-time software developer&#8212;was to smash my brain against a problem now and then. That&#8217;s really beneficial for your thinking. It forces you to explore different perspectives, to be creative, to problem-solve, to recall things you&#8217;ve learned. It keeps your mental muscles in shape.</p><p>Relying too much on AI to unlock you, to figure out the next step, or to tell you how to solve a problem&#8212;that&#8217;s risky. I still want to &#8220;go to the gym&#8221; up here. I want to make that effort. I want to learn and know things.</p><p>So I think the best approach is for each developer to find the right balance&#8212;using AI as a tool, but still keeping their minds fit and challenged. That way, you continue to learn and grow.</p><h2>Tool Agnosticism and Pragmatism</h2><p><em><strong>11. In an earlier conversation, you mentioned that you're not strongly opinionated about tools and languages. How has adopting a more pragmatic, tool-agnostic mindset benefited you and your team?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>This is a really good question&#8212;and something I think is very important. You&#8217;ve probably heard the saying: &#8220;When the only tool you have is a hammer, everything looks like a nail.&#8221; The same applies to programming. If your vocabulary is limited, you can&#8217;t fully articulate your ideas.</p><p>If I only know Python, or one framework, then whenever a problem arises, I&#8217;ll try to make that problem fit the tools I already know. But maybe there&#8217;s another tool that&#8217;s better suited to solve it. If you're strongly opinionated, you may not even see that tool as an option. You&#8217;ll be too focused on making your preferred solution work.</p><p>So in my team, I try to help everyone aim for the best idea&#8212;not my idea, but the best idea. Usually, the best idea comes out of conversation. And conversations involve different people, different perspectives, different experiences. One person might prefer FastAPI, another might say Flask, another Django. We talk about it, weigh the options, and go from there.</p><p>Developers who want to be great developers should steer clear of having too many fixed opinions. Strong opinions cloud your judgment and give you tunnel vision.</p><h2>Non-Technical Challenges in Software Teams</h2><p><em><strong>12. You have said in the past that technical challenges are rarely the biggest problem in software development. What did you mean by that? What are some of the more critical non-technical challenges that teams face today?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>That&#8217;s a really good question.</p><p>At Sohonet, we work with cutting-edge technology for the media industry. Sometimes we&#8217;re building things no one else is doing, so yes&#8212;there are technical difficulties. But that&#8217;s a small part of the job. The rest of the product still has a front end, a back end, a platform or API, and it stores data. Those aren&#8217;t particularly hard to build. They require skill, sure, but a senior developer won&#8217;t be overwhelmed by something like &#8220;how do I make a request to an API?&#8221; We know how to do those things.</p><p>Most of what we do is fairly routine. 
You&#8217;re not constantly solving deep technical puzzles. The real challenges lie in everything around the code: working in a team, following a process, collaborating with departments that don&#8217;t share your workflow, or aligning with people who don&#8217;t fully understand what you're doing.</p><p>For example, one big challenge is process alignment. Say you have 11 people on a team&#8212;and not all of them share the same understanding of how your process works. That creates friction. You do something a certain way, someone else doesn&#8217;t get why, and suddenly they&#8217;re asking, &#8220;Why is Fab doing that?&#8221; Maybe I&#8217;ve disrupted their flow. Now we have to have a conversation. Whose way wins&#8212;mine or theirs?</p><p>That&#8217;s why process is so important. Everyone needs to be on the same page, so my actions are predictable to others. We shouldn&#8217;t have to talk about why I moved a task from one column to another&#8212;we should all just know, because it&#8217;s our shared process.</p><p>Some teams solve this by having a very strict process&#8212;lots of rules and constraints. That might work for them. But not for my team. I&#8217;ve set up a flexible process that&#8217;s based on values and guidelines, not hard rules.</p><p>When your values are things like openness, focus, commitment, respect, and courage&#8212;then whatever decision you make, you can measure it against those values. Is it aligned with our principles?</p><p>It&#8217;s the same with certain spiritual disciplines. Take the Japanese lineage of the Reiki system&#8212;it&#8217;s based on five principles: don&#8217;t be angry, don&#8217;t worry, be grateful, be compassionate, and so on. Whenever you do something, you can ask: &#8220;Will this make me more angry or more grateful? Is it compassionate?&#8221; And from there, you judge whether the action is right.</p><p>At work, I see it similarly. We use Agile, so we value individuals and interactions over processes and tools. That means instead of a long pull request thread with 15 comments, you just sit down and talk. When you type things out, you lose tone, facial expression, and body language. A quick conversation can resolve things much faster and with less misunderstanding.</p><p>So our process is flexible, but it requires everyone to stay mentally engaged. You have to think about how you work. Just like with AI&#8212;I don&#8217;t want people to become robots following rigid steps. I want them to stay creative, to own their work, to think independently, and still be predictable to others.</p><p>That&#8217;s just one non-technical challenge. Another big one is stress.</p><p>We&#8217;re all under pressure to deliver&#8212;often for yesterday. That&#8217;s just how things are. It&#8217;s not unique to Sohonet; it&#8217;s everywhere. And stress creates negative emotions: frustration, anger, anxiety.</p><p>I&#8217;m lucky&#8212;my team is great. Eleven people who really get along. But sometimes, interactions with other departments cause friction. Maybe they don&#8217;t understand the way we work. I&#8217;ve seen this at other companies too&#8212;it&#8217;s not uncommon.</p><p>So I need to tend to that. I have to pay attention. In meetings, I watch body language. I listen to tone of voice. I notice how people type on Slack, how quickly they respond, how they phrase things. If something feels off, I talk to the person and try to understand what&#8217;s going on.</p><p>Sometimes that means helping them manage stress directly. 
For example, I&#8217;ve had sessions with about half my team where I&#8217;ve taught them simple meditation techniques&#8212;just enough so they can re-center themselves when something throws them off.</p><p>We also do a lot of one-on-ones. I try to teach the basics of emotional intelligence&#8212;how to cope with difficult emotions. Because when you're upset, frustrated, or angry&#8212;those are all different shades of the same thing&#8212;it triggers a chemical change in your body. Your sympathetic nervous system kicks in. That&#8217;s your fight-or-flight response.</p><p>And that means tunnel vision. Your body starts consuming more than it needs. Natural functions slow down. You&#8217;re in emergency mode, and if you keep stimulating that state without learning how to manage it, it becomes a health risk.</p><p>So we work on identifying the root cause of stress. Usually, it&#8217;s not &#8220;that guy did something I don&#8217;t like.&#8221; The deeper issue is: &#8220;I&#8217;m not willing to accept that he did that.&#8221;</p><p>The tricky part is that most people confuse acceptance with passivity. If I say, &#8220;accept this situation,&#8221; it doesn&#8217;t mean you put up with it and do nothing. It means: recognize that this is reality. And once you&#8217;re calm and grounded in that reality, your mind is in the best position to find a solution.</p><p>This is something you learn in martial arts too. If you want to react quickly and effectively, you need to be fully relaxed. A relaxed mind is a creative mind. It can solve problems. It also keeps you healthier.</p><p>So that&#8217;s a big part of what I do&#8212;mentoring my team on how to stay calm and centered, even when things are tough. We acknowledge problems and try to solve them while keeping our minds clear and steady.</p><h2>Becoming a Development Manager</h2><p><em><strong>13. You&#8217;ve transitioned into a development manager position over the past few years. What has that shift been like for you? What&#8217;s been the biggest challenge in leading developers?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>I&#8217;ve been a developer for a long, long time. I started coding when I was 16. But at some point&#8212;for me, at least&#8212;writing code wasn&#8217;t giving me the same satisfaction it used to. I found myself more drawn to helping people&#8212;developing people, not just code.</p><p>Of course, I still love writing code. But I get more enthusiastic about helping others grow. For maybe the past 11 years, I&#8217;ve been in some kind of leadership role&#8212;leading teams, being a line manager. So when I joined Sohonet, I came in as a team lead, and then three years ago I moved into the development manager role. It felt completely natural.</p><p>The skills we learn as developers aren&#8217;t confined to software. They transfer to life. Like: approach a problem from different angles, break it down into smaller parts, choose the right tool for the job. Those aren&#8217;t just programming lessons&#8212;they&#8217;re life lessons.</p><p>So for me, that transition into management came naturally. And a big reason for that goes back to an experience I had teaching math maybe 25 or 30 years ago.</p><p>To this day, I think it&#8217;s the memory I cherish most from being a teacher.</p><p>There was a little girl&#8212;maybe 11 or 12 years old&#8212;who couldn&#8217;t understand how to work with fractions. Everyone had tried to teach her: her dad, her mom, her teacher, a family friend who knew math. 
But they all came to the same conclusion&#8212;that she just wasn&#8217;t smart enough to understand mathematics.</p><p>One day, her mother spoke with my neighbor, who happened to know that I was giving private math lessons. Eventually, the girl came to my house for a session. I was trying to help her add 1/3 and 1/5. Of course, you have to find the least common denominator&#8212;in this case, 15. But she didn&#8217;t grasp why we were doing that.</p><p>She knew the textbook by heart. She had memorized the rules. But she didn&#8217;t make the connection&#8212;why we do those things. So I tried a different approach.</p><p>I asked her, &#8220;What is one banana plus one banana?&#8221; She said, &#8220;Two bananas.&#8221; Then I asked, &#8220;What&#8217;s one apple plus one apple?&#8221; She said, &#8220;Two apples.&#8221; Then I asked, &#8220;What&#8217;s one banana plus one apple?&#8221; She paused. It didn&#8217;t make sense to her at first. But then she said, &#8220;Two fruits.&#8221; And I said, &#8220;Exactly.&#8221;</p><p>That&#8217;s what we do with fractions. One-third and one-fifth are like bananas and apples&#8212;you can&#8217;t add them directly. You have to find a different way to represent them so they can be added up. For us, that means finding a common denominator.</p><p>Her face changed completely&#8212;she said, &#8220;I understand.&#8221; And I think, in that moment, she also realized, &#8220;I&#8217;m not stupid. I just needed someone to explain it differently.&#8221;</p><p>For me, that&#8217;s the gift. As a teacher, you don&#8217;t just insist on your way of explaining things. If one road doesn&#8217;t work, you find another. You keep trying until you discover a path that works for that particular person&#8217;s mind.</p><p>That&#8217;s something I&#8217;ve carried with me ever since. And I try to apply that same approach with my team at Sohonet. It comes very naturally to me, and I&#8217;m very passionate about it.</p><p><em><strong>14. If someone wants to become a development manager like you, what kind of skills should they start building early in their career to make that transition?</strong></em></p><p><strong>Fabrizio Romano:</strong><br>That&#8217;s a really good question. Have you ever heard the saying, &#8220;People don&#8217;t leave companies&#8212;they leave managers&#8221;?</p><p>For many developers, the natural progression seems to be moving into a lead or manager role. But not everyone is suited for that. Some people are amazing developers, but they don&#8217;t have the people skills&#8212;or the passion&#8212;for managing others.</p><p>I&#8217;ve seen that a lot. I&#8217;ve also been lucky to work with some really good managers, so I&#8217;ve seen both sides.</p><p>The thing is, in our industry, we often promote people into management roles just because they&#8217;re technically strong. But managing people is a completely different skill set. If you&#8217;re someone who&#8217;s drawn to logic, machines, and technical problems&#8212;and you&#8217;re not interested in helping people grow&#8212;then you probably shouldn&#8217;t go down the management path.</p><p>For me, it&#8217;s not just about acquiring a skill set, although training can help. You can train as a coach, mentor, teacher. 
But underneath all of that, you need something foundational&#8212;a genuine desire to care for people.</p><p>That&#8217;s what this job is really about: doing your best to help the people you manage become healthier, happier, more skilled professionals&#8212;and hopefully better human beings too. Because a lot of these skills are transferable to life.</p><p>If you&#8217;re only doing it because it&#8217;s your next step, or because someone handed you the role, it can be tough. People aren&#8217;t logical like machines. Managing them requires effort, empathy, and patience.</p><p>You have to listen. You have to figure out how each person communicates, what motivates them, what makes them tick. Everyone is unique. That&#8217;s the real work. And you have to want to do it.</p><p>For me, helping people grow is a worthwhile use of my time. More than career progression, more than money or recognition, I care about planting those seeds&#8212;helping someone move forward in their life and career. If that&#8217;s something that excites you, then yes, becoming a development manager is a great option.</p><p>And I do think it&#8217;s important to have a solid foundation in software development before stepping into this role. If you&#8217;ve written code, if you&#8217;ve been under deadline pressure, if you&#8217;ve lost a whole afternoon going down a rabbit hole because your ego wouldn&#8217;t let you ask for help&#8212;then you&#8217;ll understand what the people you manage are going through.</p><p>That empathy makes you more effective as a manager.</p><p>But no, I don&#8217;t think it&#8217;s a natural progression for everyone. Some people just aren&#8217;t interested in that kind of career&#8212;and that&#8217;s completely fine.</p><p><em><strong>15. What advice would you give someone navigating their early career and figuring out what kind of developer they want to be?</strong></em></p><p>For me, it happened quite naturally. When I started using Python as my main language, I used it for everything&#8212;websites, apps, experiments with different frameworks. I just loved writing Python code.</p><p>Back in 2013 or 2014, I was in Manila teaching Python and a bit of data science with pandas. At the time, Jupyter Notebook was still called IPython Notebook. I was using that instead of slides&#8212;it made it easier to show code live and help people understand what was happening. I was working for a social media advertising company, and we had a department of analysts there. I went for two weeks to train them.</p><p>So curiosity drove me. I wanted to explore all the aspects of the language.</p><p>If someone&#8217;s starting out today, they&#8217;ll probably need to specialize. Computer science and software engineering are such vast fields now&#8212;it&#8217;s hard to know everything. So I&#8217;d recommend you take a step back and assess your personal interests. What excites you most?</p><p>Do you love websites? Are you a front-end person? A back-end person? Do you enjoy DevOps&#8212;managing the infrastructure behind a product rather than building the product itself?</p><p>It&#8217;s important to appreciate the different domains and figure out what pulls you in. That&#8217;s probably your best path forward.</p><p>To do that, you need a holistic understanding of all the disciplines involved in software engineering. 
Explore different areas&#8212;otherwise, you might think you&#8217;re choosing from the only two options you know, when actually there are ten, and the right one for you is in the other eight.</p><p>Of course, we also need to be pragmatic. Look at the market and see what job opportunities are available. You still have to pay rent, buy a house. But at the end of the day, you want your job to feel like something you&#8217;re passionate about&#8212;not just a paycheck.</p><p>Find where your strengths lie and where you see yourself succeeding. Learn as much as you can. Contribute to open source. Join Discord communities. Check GitHub projects. It&#8217;s so easy to find information today&#8212;people starting now are very lucky. When I started, we had libraries. With actual books.</p><p>We didn&#8217;t use ChatGPT to do research.</p><p>There are so many fields&#8212;web development, data science, machine learning, AI, automation, DevOps, game development. Python is even used in game development now, which is super interesting.</p><div><hr></div><p>To dive deeper into the ideas discussed in this conversation&#8212;including Python&#8217;s core syntax, its application in scripting, automation, and API development, and how to build real-world projects with modern tooling&#8212;check out <em><a href="https://www.packtpub.com/en-us/product/learn-python-programming-9781835882955">Learn Python Programming, Fourth Edition</a></em> by Fabrizio Romano and Heinrich Kruger, available from Packt.</p><p>This comprehensive guide has been fully updated for Python 3.9 through 3.12 and includes new chapters on type hinting, command-line application development, and updated examples reflecting current best practices in Python web development. Whether you're learning Python for the first time or looking to deepen your fluency with current features and workflows, this edition offers a clear, practical path to becoming a confident and independent Python developer.</p><div class="captioned-image-container"><figure><a href="https://www.packtpub.com/en-us/product/learn-python-programming-9781835882955"><img src="https://substack-post-media.s3.amazonaws.com/public/images/d3499d65-b1d0-4742-a3fe-77d436985c3f_2250x2775" alt="Learn Python Programming" title="Learn Python Programming"></a></figure></div><p>Here is what some readers have said:</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/5cc813db-bab3-4ff5-b33d-5e9f5e74e913_882x812.png" alt="Reader review" width="882" height="812"></figure></div><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/3af22451-498b-472d-bc1e-2eb9525717bd_865x356.png" alt="Reader review" width="865" height="356"></figure></div><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/72451984-80a1-4b3f-83c6-33ae5ae574a7_881x486.png" alt="Reader review" width="881" height="486"></figure></div>]]></content:encoded></item><item><title><![CDATA[Building Smarter Systems with Algorithms and Agents: A Conversation with Imran
Ahmad]]></title><description><![CDATA[A machine learning educator and experienced public-sector data scientist breaks down algorithmic thinking, GenAI systems, and the shift from models to agents in real-world software engineering]]></description><link>https://deepengineering.substack.com/p/building-smarter-systems-with-algorithms</link><guid isPermaLink="false">https://deepengineering.substack.com/p/building-smarter-systems-with-algorithms</guid><dc:creator><![CDATA[Divya Anne Selvaraj]]></dc:creator><pubDate>Wed, 25 Jun 2025 08:00:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/V1waip4bCog" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>From scaling fraud detection systems to embedding AI agents in robotics, algorithmic thinking is more central to software engineering than ever before. In this conversation, we speak with Imran Ahmad&#8212;author of <em><strong><a href="https://www.packtpub.com/en-us/product/50-algorithms-every-programmer-should-know-9781803247762">50 Algorithms Every Programmer Should Know</a></strong></em>&#8212;about how the definition of &#8220;core algorithms&#8221; is evolving in the AI era, what classical techniques still offer in modern architectures, and why practical experimentation, not memorization, is key to mastering real-world problem solving.</p><p>Imran is a data scientist at the Advanced Analytics Solution Center (A2SC) within the Canadian Federal Government, where he builds machine learning solutions for high-stakes use cases in public services. He is also a visiting professor at Carleton University, an authorized instructor for Google and AWS, and an experienced educator whose story-driven teaching style has resonated with thousands of learners. His 2023 book, <em>50 Algorithms Every Programmer Should Know</em>, spans foundational computer science through to advanced AI systems. He is now working on his next title, <em>30 Agents Every AI Engineer Should Know</em>, which explores how autonomous agents can orchestrate tools, models, and data sources to solve complex problems in the real world.</p><p>You can watch the full interview below&#8212;or read on for the complete transcript.</p><div id="youtube2-V1waip4bCog" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;V1waip4bCog&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/V1waip4bCog?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><p><em><strong>1. The first edition of your book 40 Algorithms Every Programmer Should Know came out in 2020. The second edition, published in 2023, expanded to include 50 algorithms. What changed in the industry&#8212;or in your own perspective&#8212;that led to this evolution?</strong></em></p><p><strong>Imran Ahmad:</strong><br>So, first of all, <em>40 Algorithms Every Programmer Should Know</em> was essentially a narration of the world around me&#8212;the problems I was trying to solve. These books are story-driven. Each chapter presents a problem, and algorithms are the tools I use to solve them. They&#8217;re very powerful tools.</p><p>When I wrote the first book in 2020, it was the early days of COVID&#8212;a difficult year. 
Since we were stuck at home, I thought it would be a good time to reflect on and write about the problems I had solved over the last 20 years. That&#8217;s what the first book was about.</p><p>Now, to your question&#8212;why was there a need to write the second edition? From 2020 to 2023, we saw the rise of generative AI, attention mechanisms, Transformers, autoencoders, and the increased importance of sequential data. The boom in GenAI and the surrounding ecosystem really shifted things.</p><p>Back in 2020, GenAI existed but it wasn&#8217;t everything. Today, if you ask a college student or even someone outside of data science, like a government employee, &#8220;What is AI?&#8221;, the answer is often &#8220;ChatGPT&#8221; or &#8220;GenAI.&#8221; That&#8217;s how dominant it has become.</p><p>So, the 10 additional algorithms in the second edition are focused on sequential data&#8212;things like RNNs, LSTMs, GRUs&#8212;all of which lead up to attention mechanisms. They lay the foundation for working in this new AI-driven world.</p><p><em><strong>2. You also had a Discord community around the book. Did feedback from that community influence the updates in the second edition?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Yes, definitely. When you write a book, you have to think carefully about your target audience. As an author, you want to go broad&#8212;but not too broad. There's always a balance between depth and breadth.</p><p>Some books cover a lot of topics but don&#8217;t go deep. Others focus intensely on a single topic. With <em>50 Algorithms</em>, I had to strike that balance. Covering 50 algorithms in a single book means I couldn&#8217;t go in-depth on all of them. So I had to choose which ones deserved a deeper dive&#8212;based on what I felt was needed and relevant.</p><p>Now, coming to your question: in the Discord community, most readers liked that approach. Of course, you can't please everyone&#8212;some wanted different algorithms covered in more depth. A lot of the feedback was, &#8220;Why not have a separate book just on the 10 new algorithms from the second edition?&#8221; Because GenAI, Transformers, LSTMs, autoencoders, and attention mechanisms have become so important.</p><p>And that's exactly why I&#8217;m writing another book that goes deeper into the GenAI world. But overall, most readers found the book useful across a wide variety of scenarios and use cases.</p><p><em><strong>3. Let us talk about your upcoming book&#8212;30 Agents Every AI Engineer Should Know. What are the new challenges in AI engineering that you're aiming to address with this book?</strong></em></p><p><strong>Imran Ahmad:</strong><br>We have a lot of hope around AI&#8212;that it can eventually replace a human. But if you think about how a person in a company solves a problem, they rely on a set of tools. Depending on the problem, they might first do a web search, then read a book, email someone, call a friend, or go to the library. After gathering information, they create a solution.</p><p>An &#8220;agent&#8221; is meant to replace that kind of human reasoning. It should be able to discover the tools in the environment around it, and have the wisdom to orchestrate a solution tailored to the problem. We're not there yet, but that's what we're striving for. The 30 agents I will cover in my next book represent the next step&#8212;the next generation of the algorithmic world we live in.</p><p><em><strong>4.
How would you define the term &#8220;agent&#8221; in the context of your upcoming book?</strong></em></p><p><strong>Imran Ahmad:</strong><br>An agent is an entity that has the wisdom to work independently and autonomously. It can explore its environment, discover available tools, select the right ones, and create a workflow to solve a specific problem. That&#8217;s the dream agent.</p><p>Now, most agents we use today have only part of that capability. But we&#8217;re striving for agents that mimic how a person like me&#8212;a data scientist&#8212;would work when given a problem. The agent should understand the ecosystem, be aware of each tool&#8217;s strengths and weaknesses, and know when and how to use them in combination.</p><p>It might use web search, a calculus engine, an academic paper locked behind a university firewall, and a large language model&#8212;all as part of its toolkit. And that&#8217;s an important point: a large language model is not the only tool. It&#8217;s perhaps the most important one right now, but real wisdom lies outside the LLM&#8212;in the agent.</p><p><em><strong>5. How do you see the definition of core algorithms evolving in today&#8217;s software landscape? Are there any areas you think will become essential knowledge in the next few years?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Yeah, so when we talk about algorithms, you can look at them from two perspectives: the practitioner&#8217;s and the researcher&#8217;s.</p><p>The first thing is to identify your role. If you&#8217;re a researcher, you&#8217;ll be concerned with things like partial differential equations, optimization techniques, proofs of optimality, NP-hardness, and so on.</p><p>But as a practitioner, you don&#8217;t want to dive into unnecessary details. I often use the analogy of a car. Do you want to build a car and understand every component of the engine? Or do you just want to drive it? If you want to drive it, you need to know the essentials&#8212;how to maintain it&#8212;but not necessarily every internal detail. That&#8217;s the practitioner role.</p><p>This book is written for practitioners. It does go into some depth where needed, especially to help practitioners understand where a solution is coming from. That deeper understanding helps you choose the right algorithm for the problem you&#8217;re trying to solve.</p><p>If you know a bit more about how the engine works, you can choose the right car for your needs. Similarly, with algorithms, even a minimal understanding of how they work under the hood can help you make better decisions.</p><p><em><strong>6. From your own work, can you share an example where selecting or optimizing an algorithm made a measurable difference to a system&#8217;s scalability or resilience?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Yes, I can give many examples.</p><p>When algorithms are taught in universities or academic environments, they&#8217;re usually applied to small, curated datasets. I call this &#8220;manicured pedicure data.&#8221; But that&#8217;s not real data.</p><p>The problem is that people take online or graduate-level courses and learn algorithms in isolation. They never see the kind of messy, large-scale data we deal with in the real world. In my work, I often deal with datasets containing 2 million, 5 million, even 10 million rows. 
If you&#8217;re applying a graph algorithm on that scale, you need something that performs.</p><p>For example, in my first book, <em>40 Algorithms Every Programmer Should Know</em>, I included the Apriori algorithm. It&#8217;s well-known for association analysis&#8212;discovering causal relationships between features in unsupervised learning scenarios, like weather data or market baskets.</p><p>But when I used Apriori in practice, I found it doesn&#8217;t scale. It generates thousands of rules and then filters them after the fact. There&#8217;s a newer, better algorithm called FP-Growth that does the filtering at the source. It only generates the rules you actually need, making it far more scalable.</p><p>That&#8217;s why in the second edition&#8212;<em>50 Algorithms</em>&#8212;I replaced Apriori with FP-Growth.</p><p>This ties back to the idea that in real-world applications, non-functional requirements become important&#8212;performance, security, availability. In academia, we focus on functional requirements&#8212;like "this algorithm should detect fraud." And yes, the algorithm might technically work. But in practice, you also have to consider how it performs, how scalable it is, whether it can run as a cloud service, and so on.</p><p>Sometimes you need to run the algorithm in a distributed fashion, apply divide-and-conquer techniques, and optimize how you prepare and process data. That&#8217;s what makes the solution scalable and production-ready.</p><p><em><strong>7. When it comes to performance tuning or system scaling, do you often find yourself revisiting core algorithms? Or are these optimizations typically more architectural?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Both. Not all algorithms have tuning parameters, but many do.</p><p>Take machine learning or deep learning algorithms, for example&#8212;your batch size, number of epochs, learning rate, and optimizer choice are all hyperparameters. These are crucial for determining whether your solution will scale.</p><p>With other algorithms&#8212;like FP-Growth, which I mentioned earlier&#8212;there are fewer tuning knobs. You can still adjust things like lift, confidence, and support thresholds, but the algorithm is largely driven by its internal design.</p><p>In cases like that, the optimization often shifts to the infrastructure&#8212;finding a more performant backend to run the algorithm on. And cloud computing is perfect for this. We&#8217;re lucky to be in the cloud era; it&#8217;s a catalyst for building large-scale systems that simply weren&#8217;t feasible 20&#8211;25 years ago.</p><p>What we aim for are elastic architectures&#8212;systems that can expand and shrink based on the demands of the algorithm. This not only improves performance but also keeps costs down.</p><p><em><strong>8. Have you ever encountered a situation where choosing the wrong algorithm led to long-term issues?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Yes. Especially in predictive analytics.</p><p>Take something like approving or refusing a mortgage application at a bank. You could start with simple intuition-based rules. Then maybe you move to a decision tree, which gives you more structure and explainability. Decision trees are often used in sectors like finance, government, and healthcare because they&#8217;re white-box models&#8212;transparent and interpretable.</p><p>But decision trees can overfit. They also don&#8217;t perform well when there&#8217;s a high correlation between features. 
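</p><p><em>A quick editorial aside: the overfitting Imran describes is easy to reproduce on synthetic data with redundant, highly correlated features. The sketch below is our own illustration (not from the book), comparing an unconstrained decision tree with a gradient-boosted ensemble from scikit-learn on the same data.</em></p><pre><code class="language-python"># Illustrative sketch: a plain decision tree vs. a boosted ensemble on
# synthetic tabular data with redundant (correlated) features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# n_redundant features are linear combinations of the informative ones,
# mimicking the high feature correlation mentioned above.
X, y = make_classification(
    n_samples=5_000, n_features=20, n_informative=5,
    n_redundant=10, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42,
)

for model in (DecisionTreeClassifier(random_state=42),
              GradientBoostingClassifier(random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__,
          f"train={model.score(X_train, y_train):.3f}",
          f"test={model.score(X_test, y_test):.3f}")

# The unconstrained tree typically fits the training set almost perfectly but
# drops on the test set; the boosted ensemble usually generalises better here.
</code></pre><p>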
In such cases, you need to move to more advanced algorithms. If you don&#8217;t want to go all the way to deep learning, you can use something like SVM or XGBoost. These models overcome some of the limitations of simpler ones.</p><p>This whole process is iterative. You start with a simple model, test it, and gradually move toward more complex ones. It&#8217;s usually a mistake to begin with something too complex&#8212;it can be overkill, like using a forklift to lift a sheet of paper.</p><p><em><strong>9. How do you advise engineers to integrate algorithmic thinking into architecture-level decisions, especially when working with distributed systems or cloud platforms?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Let&#8217;s start with machine learning, since that&#8217;s where this comes up often.</p><p>ML algorithms have very different requirements during training, testing, and production. During training, you need a lot of data to establish meaningful causal relationships between features and labels. That, in turn, requires a lot of processing power&#8212;GPUs, ideally. It&#8217;s expensive and time-consuming.</p><p>This is where cloud computing really shines. The cloud gives you elastic architectures&#8212;you can spin up 2,000 nodes for 2 or 10 hours, train your model, and then shut it down. The cost is manageable, and so is the time, because of that elasticity. You might pay $100&#8211;200 for that burst of power, and you're done.</p><p>Once the model is trained and tested, the final model file is often very small. I joke that it&#8217;s like the tail of an elephant&#8212;tiny compared to the size of the data and effort used to build it. You can even deploy it on your phone.</p><p>So, the hardware and architecture needs vary drastically across different stages. If you want to optimize for cost and performance, you need elastic systems. Cloud computing&#8212;whether AWS, Google Cloud, or Azure&#8212;offers exactly that. It doesn&#8217;t matter which provider you choose, but definitely use the cloud.</p><p><em><strong>10. Are there any patterns or heuristics you use to identify when algorithm choice should drive architectural decisions?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Yes&#8212;it depends on the complexity of the problem you&#8217;re solving.</p><p>If you&#8217;re working on something simple, then a straightforward design and algorithm are usually enough. Overdesign is a common issue&#8212;we sometimes overthink things unnecessarily.</p><p>First, ask yourself: does this problem justify the additional complexity that a particular algorithm might bring into your architecture? Then evaluate that decision along three axes: cost, performance, and time.</p><p>Whatever you implement will likely increase your hardware requirements and, therefore, your cost. So, can you justify that increase? Are you gaining significant improvements in accuracy? Are you saving time?</p><p>If an algorithm is more accurate, more time-efficient, and the cost increase is justified, then it&#8217;s probably the right choice. That&#8217;s the kind of trade-off analysis I recommend for selecting algorithms at the architecture level.</p><p><em><strong>11. Classical algorithms like graph search or dynamic programming often underpin modern AI systems. Could you share an example where these &#8220;old&#8221; techniques support cutting-edge AI today?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Absolutely. 
I see them as complementary&#8212;not rivals.</p><p>Take search algorithms, for instance. When you're preparing datasets for AI, especially in enterprise environments, you often have massive data lakes&#8212;structured and unstructured data all in one place. Now, say you're training a model for fraud detection. You need to figure out which data is relevant from that massive repository.</p><p>Search algorithms can help you locate the relevant features and datasets. They support the AI workflow by enabling smarter data preparation.</p><p>Classical algorithms also play a role in things like link analysis, establishing causal relationships, and assessing data quality. Even classical NLP techniques are still useful.</p><p>I think of it like training at the gym. Maybe AI is your &#8220;main muscle,&#8221; but to build a strong body&#8212;or in this case, a performant system&#8212;you need to train the supporting muscles too. Classical algorithms are part of that foundation.</p><p><em><strong>12. Could you give us an example of a specific hybrid approach&#8212;where classical algorithms support AI systems in production?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Sure. One example is the Apriori algorithm and its more scalable successor, FP-Growth. These are classical algorithms used for association analysis. In my opinion, association analysis is one of the most powerful and underutilized techniques.</p><p>Let me walk you through an example. A couple of hours ago, I went to a corner shop to buy milk. Now, imagine we had 30 days of transaction data from that shop&#8212;say 10,000 transactions. Each row in that dataset would list the items purchased in a single sale.</p><p>If you feed that data into FP-Growth, it will find relationships&#8212;like if someone buys milk, they&#8217;re likely to buy cheese too. Or if someone buys a shaving razor, they also buy shaving gel. These are the kinds of patterns the algorithm surfaces.</p><p>Now, where does AI come in? Well, in real-world data, labels aren&#8217;t always clearly defined like they are in academic &#8220;toy&#8221; datasets. You often have to derive labels from combinations of features. Algorithms like FP-Growth help you discover those relationships. They can even help you decide which features to treat as labels or predictors.</p><p>So here, a classical algorithm is directly informing the data preparation and feature engineering stages of your AI pipeline. It&#8217;s a great example of how classical and modern techniques work hand-in-hand.</p><p><em><strong>13. Optimization is key in both classical algorithms and AI models. What are some common mistakes you've seen engineers make when trying to optimize systems that combine both worlds?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Optimization is absolutely critical. Let&#8217;s talk about how important it is.</p><p>Math can be cruel. If you're not careful, your problem might never converge&#8212;not in hours, not in days. If you accidentally introduce an exponential factor in the wrong place, it might take years&#8212;or even centuries&#8212;for the solution to converge. The sun might die before your algorithm finishes!</p><p>So no, it's not OK to say, &#8220;I&#8217;m not in a hurry, I&#8217;ll just let it run.&#8221; In algorithmic work, things can spiral out of control very quickly. That&#8217;s why optimization isn't a luxury&#8212;it&#8217;s a necessity.</p><p>You can optimize at multiple levels. First, hardware. 
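</p><p><em>Returning briefly to the corner-shop example above: the sketch below is our own illustration (not from the book) of association analysis with FP-Growth, using the open-source mlxtend library as one readily available implementation. The toy transactions are made up; the support, confidence, and lift thresholds are the tuning knobs Imran mentions.</em></p><pre><code class="language-python"># Illustrative sketch: market-basket analysis with FP-Growth via mlxtend.
import pandas as pd
from mlxtend.frequent_patterns import association_rules, fpgrowth
from mlxtend.preprocessing import TransactionEncoder

# A handful of toy baskets standing in for 30 days of shop transactions.
transactions = [
    ["milk", "cheese", "bread"],
    ["milk", "cheese"],
    ["razor", "shaving gel"],
    ["milk", "bread"],
    ["razor", "shaving gel", "milk"],
    ["cheese", "bread"],
]

# One-hot encode the baskets into a boolean item matrix.
encoder = TransactionEncoder()
onehot = encoder.fit(transactions).transform(transactions)
baskets = pd.DataFrame(onehot, columns=encoder.columns_)

# Mine frequent itemsets at the source, then keep only rules above the chosen
# support and confidence thresholds; lift is reported per rule.
itemsets = fpgrowth(baskets, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
</code></pre><p>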
For deep learning especially, using a GPU can speed up training by a factor of 1,000. I&#8217;ve dedicated a chapter to this in my book&#8212;why GPUs make such a huge difference.</p><p>Then there&#8217;s hyperparameter tuning. Traditionally, this took a lot of time&#8212;trial and error. But now we have tools like <strong>Vizier</strong>, developed by Google. It&#8217;s designed to find optimal hyperparameter configurations.</p><p>In historical terms, a &#8220;Vizier&#8221; was the advisor to the Sultan, helping the kingdom run better. In the same way, Vizier helps your training process by intelligently selecting hyperparameters. And what&#8217;s fascinating is that Vizier is based on classical heuristic algorithms&#8212;not deep learning.</p><p>There are two types of algorithms: those that guarantee an optimal solution, and heuristics, which give you a &#8220;good enough&#8221; solution without that guarantee. Vizier falls into the second category, and it&#8217;s very effective.</p><p>So, even in advanced AI workflows, we&#8217;re relying on classical heuristic methods to optimize performance.</p><p><em><strong>14. In addition to Vizier, are there any other tools or techniques you recommend for profiling and diagnosing algorithmic bottlenecks in AI workflows?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Yes. You can profile and diagnose at several levels, but let&#8217;s start by clarifying what AI really is.</p><p>AI is about formulating a signal from patterns embedded in data. For example, say you have 100,000 records, and some indicate fraud while others don&#8217;t. AI is about detecting that differentiator&#8212;the signal&#8212;and using it to train a model.</p><p>But before you even get to model training, you go through the CRISP-DM lifecycle: understanding the data, preparing it, training the model, testing, and deploying. That first stage&#8212;data understanding&#8212;is crucial. This is where classical algorithms can help.</p><p>Let&#8217;s say your dataset has a lot of randomness and noise. If there's no clear signal, then even a Nobel Prize&#8211;winning scientist won&#8217;t be able to build a good model. So you need to assess your data before you commit to AI.</p><p>Classical techniques&#8212;search algorithms, dynamic programming, and others&#8212;help you profile the data. Your approach will depend on the type of data. For IoT data, you might use graph algorithms. For government or healthcare data, you'd need other tools.</p><p>The key takeaway is this: before jumping into training a machine learning model, use classical methods to ensure that your data even has a learnable pattern. Otherwise, you&#8217;re wasting time and resources.</p><p><em><strong>15. Many developers learn algorithms for interviews but struggle to apply them meaningfully in real systems. How can learning approaches evolve to make algorithmic knowledge stick?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Learning algorithms for interviews is a good start&#8212;it shows initiative. But in interview prep, you're not solving real-world problems. You&#8217;re anticipating what might be asked, and the interviewer only has limited time, so the conversation stays at a shallow level.</p><p>That kind of surface-level knowledge is expected&#8212;and sufficient&#8212;for most programming roles. Of course, if you&#8217;re applying for a deep learning role at a company like Google Brain, they&#8217;ll expect more depth. 
But in general, you're not expected to know things like how to tune hyperparameters in FP-Growth for an interview.</p><p>To truly make algorithmic knowledge stick, you need to use algorithms to solve actual problems. That&#8217;s when you'll face real challenges, discover edge cases, and realize that you may need to know other algorithms just to get your main one working. You'll learn how to prepare the data, optimize your solution, and deploy it in a scalable way.</p><p>There&#8217;s an entire ecosystem&#8212;an algorithmic community&#8212;that supports every solution. And that&#8217;s why I keep saying: classical and modern algorithms aren&#8217;t separate worlds. They complement each other, and a solid understanding of both is essential.</p><p><em><strong>16. Do you have any favorite practical exercises or projects that help solidify algorithmic understanding beyond textbook examples?</strong></em></p><p><strong>Imran Ahmad:</strong><br>Yes. I recommend use-case-driven projects. In industry terms, we call these <em>verticals</em>. Concepts and algorithms are the <em>horizontals</em>&#8212;they span multiple use cases. Verticals are specific domains like government, healthcare, or finance.</p><p>I work in government, so I&#8217;ll use that as an example. Governments around the world are legal custodians of citizen data. That includes everything from criminal records to healthcare, immigration, and employment data. If used responsibly, this data can change lives.</p><p>Governments can identify vulnerable populations, deliver services more efficiently, and even lift people out of poverty. So if you want to work on a meaningful project, start with real government data.</p><p>One of the best sources is <a href="https://www.data.gov/">data.gov</a>, which contains nearly half a million datasets from all levels of the U.S. government. You&#8217;ll find datasets on the environment, public health, traffic, flights, and more.</p><p>Similar portals exist for other countries too&#8212;Canada, Europe, India. The Indian government, for example, is making great progress toward open and accountable data access.</p><p>These are not &#8220;academic&#8221; datasets&#8212;they&#8217;re real, messy, and relevant. Choose a vertical you care about, download a dataset, pick an algorithm, and try to solve a problem. That&#8217;s the best way to solidify your learning.</p><div><hr></div><p>To explore the topics covered in this conversation in greater depth&#8212;including practical guidance on applying algorithms to real-world systems, navigating the evolution from models to agents, and understanding the role of classical techniques in AI workflows&#8212;check out <em><strong><a href="https://www.packtpub.com/en-us/product/50-algorithms-every-programmer-should-know-9781803247762">50 Algorithms Every Programmer Should Know</a></strong></em> by Imran Ahmad, available from Packt. 
And keep an eye out for his upcoming book, <em>30 Agents Every AI Engineer Should Know</em>, publishing later this year.</p><div class="captioned-image-container"><figure><a href="https://www.packtpub.com/en-us/product/50-algorithms-every-programmer-should-know-9781803247762"><img src="https://substack-post-media.s3.amazonaws.com/public/images/10c1bd15-b1a1-44d0-aaab-fc3034c5d163_2250x2775" alt="50 Algorithms Every Programmer Should Know" title="50 Algorithms Every Programmer Should Know"></a></figure></div><p>Here is what some readers have said:</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/2b2ddabb-e4a1-47e5-a87e-fe525edfd7ed_1076x522.png" alt="Reader review" width="1076" height="522"></figure></div><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/34bbbe4c-fdb3-43f3-9259-660483ec4868_1075x375.png" alt="Reader review" width="1075" height="375"></figure></div><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/3ed6f1cb-60e5-48d2-a461-ccd9c0e6f7fc_1072x428.png" alt="Reader review" width="1072" height="428"></figure></div>]]></content:encoded></item></channel></rss>