Rethinking Test-Driven Development for the AI Era: A Conversation with Kevlin Henney
On misconceptions, design pressure, legacy code, language cultures, and why testing and review skills—not tools or AI—will shape how teams use TDD.
Test-driven development sits in an awkward place in many teams: widely cited, unevenly practiced, and often misunderstood. For some developers, TDD is a niche technique that only applies to greenfield code; for others, it is reduced to “writing some unit tests” after the fact. In between those extremes are practical concerns about legacy systems, language ecosystems, CI pipelines, AI-generated code, and the day-to-day pressures of shipping software with limited time and attention.
In this Q&A, we speak with Kevlin Henney — independent consultant, speaker, writer, and trainer — whose career sits at the intersection of software design and everyday development practice. Kevlin works with companies on code, design, practices, and people; contributes to the Modern Software Engineering YouTube channel; and is co-author of A Pattern Language for Distributed Computing and On Patterns and Pattern Languages in the Pattern-Oriented Software Architecture series, as well as editor of 97 Things Every Programmer Should Know and co-editor of 97 Things Every Java Programmer Should Know.
Across the conversation, Kevlin unpacks why TDD adoption stalls even for experienced developers, the misconceptions that blur the line between “developer testing” and true TDD, and how tests shape design without losing sight of the bigger architectural picture. He talks through introducing tests into large legacy codebases, how language and ecosystem culture influence testing practice, and what distinguishes good, specification-like tests from brittle method-by-method checks. We also explore tooling choices, where TDD fits alongside integration, acceptance, contract, and performance testing, and how team leaders can sustain testing discipline under deadline pressure. Finally, Kevlin shares his perspective on AI-assisted development, the risks of outsourcing tests to generators, and why, in an era of increasingly automated code, testing and review skills matter more than ever.
You can watch the full conversation below or read on for the complete Q&A transcript.
1: Adopting TDD can be tricky, even for seasoned developers. In your experience, what are the main reasons that experienced developers struggle when first adopting TDD?
Kevlin Henney: I think there are different kinds of developers, and they will have different reasons for struggle. At one level, you are asking people to do something different from what they normally do. That is the first challenge. Just as a human being, that is always going to be difficult, particularly when you already have a set of habits in place. Regardless of how effective those habits actually are, we always perceive the habits that we have as being comfortable. That is why they are habits, and sometimes we have a justification for them.
So trying to get anybody to do something different from something they already do is going to be a challenge. Interestingly, the more experience you have, the more of a disadvantage you may be at. If you are relatively new to software development, then everything is fresh, and every new idea is likely to be treated on an equal footing.
But even then, we need to understand that a novice developer can sometimes struggle, and sometimes we have the issue with people who are in that overlap space where they are not formally developers but do a lot with code. I am thinking particularly of data scientists and engineers who might not consider themselves to be developers, but who have worked extensively with Python and associated libraries such as NumPy and Pandas. They are in the development space, but they do not necessarily have the insight of development culture and concepts, and often they have semi-effective workarounds that they have created, which get them by every day. For most people, that is the starting point.
When it comes to testing of any kind, we do not necessarily have as good a story for people as we do for creating a feature and doing a demo. These are very well-practiced within the software development space, and often videos and books will emphasize these, and there is much less on any kind of testing, let alone TDD. So testing tends to be more ad hoc. When you are trying to get somebody to do something like TDD, which is a very structured workflow, that is your challenge: you are trying to get them to do something different.
Then one of the other challenges is often the way that TDD is described. There is a simple mantra, “red, green, refactor.” You write a failing test for something, then you make it pass, and then you refactor. Although that is a very simple mechanical description, and it is not wrong, it is not very motivating. It leads to the reaction, “Why do you want me to write something that does not work?” That is not the right mindset. “Write a thing that does not work and then make it work” does not feel like a motivating mindset.
So I think that is another obstacle. Often the examples or the way that TDD is taught make a lot more sense to somebody who has expertise in it, or when you are coaching alongside somebody, than they do when you are just offering somebody the mantra. It is not compelling. I will be the first to say this. When I do workshops and training courses for companies, I will describe the red-green-refactor cycle. You need to know that. But then I go into it, I take it apart, and I say what is really going on.
At that point, it becomes easier to motivate. The first point is that you are not just writing a failing test. You are writing something for a behavior that you do not have. Because you do not have that behavior, of course it is not going to pass. But the goal is not simply to make it pass. The goal is to write what you want for the new behavior.
The next motivation is actually a simple constraint. In many cases, we can end up yak shaving or just running off into the horizon with complex behaviors, saying, “I will just write everything, and then I will come back and test it later.” If we do that, we often end up with things that are not as simple as they should be, and we do not ask ourselves the questions, “Is this what I need? Is there a simpler way to do this?”
So TDD is literally a limiting factor. It is like throttling back the instinct to just throw everything at the screen. Instead, you are going to take steps so that you understand every step and consider it. It is really a scoping mechanism. The idea is: now I am going to make it pass with something that is no more complex than necessary, so that I fully understand what the next step is going to be. I can guarantee that this is always working, but I am also going to give myself the opportunity for refactoring.
When explained like this, I am not going to say that it suddenly turns on all the lights, but it does make more sense. Then we move to the next level, where we say, “Let us forget the red and green. Red and green are side effects.” Your real goal is: tell me what you want to have working. Here is a piece of code. It has a certain amount of functionality. You want it to do something else. What does that extra bit look like? Show me an example. Somebody says, “Well, it should do this.”
“OK, great. Does it do that now?” “No, it does not.” That is why it either does not compile or it does not pass a test, because you are asking for something new. A test is a change request. When you describe it that way, a lot of people say, “Oh, I see. You are writing a change request to yourself. You are saying, ‘I want to have a piece of code that does this, but it does not do that yet. Here is a really concrete example of what I want.’”
Now I am going to work towards that, and I am done within a couple of minutes, and I can continue from there. The point is that you are not simply teaching somebody how to test, although there is a truth in that. You are actually trying to rewire how they think about the very act of coding, and that is hard.
That is why you will find that it is a skill. It is something worth practicing, and it is a practice that, once you have it, you can draw upon. It does not mean you have to do it all the time, but if you have never practiced it, how can you say, “That would be appropriate now as a tool or a technique”? You are trying to rewire how people think about the act of coding, and that is difficult. So you will meet resistance because of change, but also resistance because it is a fundamentally different way of doing something for which people already have some behaviors.
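To make the "test as a change request" framing concrete, here is a minimal sketch in Python with pytest. The function and its behavior are invented for illustration; the point is the order of events, not the example itself.

```python
# A change request written to yourself: "I want the code to normalize
# a name to title case." Before the function below existed, the test
# failed (red), not because anything was broken, but because the
# behavior was missing.

def normalize(name: str) -> str:
    # The simplest step that satisfies the request (green); refactor after.
    return name.title()


def test_a_name_is_normalized_to_title_case():
    assert normalize("ada LOVELACE") == "Ada Lovelace"
```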
2: You have talked a lot about the mindset shifts required, and you said that adopting TDD itself is a skill. Are there any specific skill gaps you can point out that tend to be the biggest hurdles for developers who do not adopt TDD?
Kevlin Henney: Honestly, sometimes the skill gap is simply testing itself, unit testing in particular. In other words, developers do not have a habit of any kind for testing, or testing is something that happens later, sometimes in a mad rush. Therefore, the tests that are written are quite difficult to read, and people often have this idea of tests being second-class citizens.
Often you look at tests and you say, “Yes, they look like second-class citizens,” and people create tests that are difficult to maintain, sometimes because they have never been shown what a good test looks like. For many people, when they are learning TDD, they encounter the fact that there are many things they are trying to learn at the same time, and one of them is, “What does a good test look like?”
That is an issue. Picking up on what I said earlier, tests are specifications. There are many ways of thinking about testing, but the way that we are encouraging here is specifying. Your test should be an explanation, a description that captures intent, and it should have an example. The example is the centerpiece, and you want to capture the intent in the name. If you have a test that does too much, it is not a good test.
It turns out that many of these things are things that people do not already know or do. So in addition to the workflow, there is an additional skill: how do you write a good test?
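As a small illustration of that skill, here is a sketch in Python using a plain list as the subject; the examples are invented. The first test does too much under a vague name; the rewrites each state one idea.

```python
# A test that does too much: several unrelated behaviors under one
# vague name. When it fails, the name tells you nothing.
def test_list():
    items = []
    assert items == []
    items.append("a")
    assert items[-1] == "a"
    assert items.pop() == "a"


# One concept per test, intent captured in the name, example as the
# centerpiece.
def test_a_new_list_is_empty():
    items = []
    assert not items


def test_appending_an_item_puts_it_at_the_end():
    items = ["a"]
    items.append("b")
    assert items[-1] == "b"
```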
The other skill that is often missing is that people do not necessarily have a good code sense or design sense. By that I mean they often do not know what good code looks like. When you say, “And now refactor,” they do not know what to do, because although they may have a refactoring menu available to them, and although they may know the meaning of the word, they do not actually know what “better than this” looks like.
So you end up with code that just gets bigger and bigger, with more ifs and whiles. That is not what I have in mind. Where is the simplification? They are not actively looking for simplification. That is a design skill, and that is quite difficult to teach.
Therefore, if you are going to make the best use of any workflow, and this is not unique to TDD, you need to be actively looking for good design. Many workflows suffer because people are not asking, “How do I make this simpler? How do I make sure I have less code to maintain in future?” You want to write less code. Your goal is not to produce more code; it is to produce less code.
The most common thing I see is that programmers do not know how to write less code. There is absolutely nothing wrong with writing too much code as your first draft. What matters is what you do with your second draft, and that is the problem. Many people do not have that second draft because they have not worked alongside somebody who can show them what that looks like.
You cannot expect this to arrive by magic, as if you start learning how to code and suddenly develop a good instinct for the right balance and structure of a method, of a class, and for what an interface should or should not look like to be effective and easy to change. Without helping people develop that sense, almost any workflow you throw at them is going to make things potentially worse.
We see that with AI: people who do not know how to code can produce a lot of code. They need to learn to produce less. You can use AI to produce less. The skill is to produce less that does exactly what you want, because then you have less that can go wrong and less to read.
This is something that I do not think we get across well enough. For me, TDD helps with that, because it always reminds me: “OK, now cook it down. You have this; cook it down.” You have tests; they work. You have a safety net. There is a skill there, which is very much code sense, both for the tests and for the body of the code itself.
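A minimal sketch of that second draft in Python; the function is invented. The behavior does not change, the test stays green, and there is simply less code afterwards.

```python
# First draft: nothing wrong with this as a starting point.
def total_due_first_draft(orders):
    total = 0
    for order in orders:
        if order["status"] == "open":
            total = total + order["amount"]
    return total


# Second draft: the same behavior, cooked down. Less to read,
# less to get wrong.
def total_due(orders):
    return sum(o["amount"] for o in orders if o["status"] == "open")


def test_total_due_sums_only_open_orders():
    orders = [{"status": "open", "amount": 10},
              {"status": "closed", "amount": 99},
              {"status": "open", "amount": 5}]
    assert total_due(orders) == 15
```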
3: What do you see as the biggest misconceptions or myths about TDD among developers and teams today?
Kevlin Henney: I do not know that there is necessarily just one, but there are a few. One is that it is only something you can do with new code, that it can only be used in a greenfield situation. Another is that TDD is very much centered on your unit testing framework and things like that.
So there are these kinds of ideas, and we live and work in an industry where jargon is often thrown around and sometimes used very imprecisely. In a number of companies, "we are doing TDD" turns out to mean "our developers write tests." That is great. There is nothing wrong with that. But in that case the code leads and the tests follow, which is a different workflow. That is perfectly fine. I am not here to tell people that TDD is the only way to work.
What I am trying to avoid is a kind of semantic dilution if “TDD” comes to mean just that developers are testing. That is great, but we would like to call that “developer testing” as a general term, rather than TDD, which is a very specific workflow.
4: Are there any particular false beliefs you frequently find yourself debunking?
Kevlin Henney: Oh, yes. There is one that has two sides to it. First of all, I separate out a couple of things. Sometimes when people are being negative about TDD, they are not talking about TDD; they are talking about unit testing. They are using “TDD” to stand in for unit testing in an environment where, culturally, within the organization, you do not test. That becomes a reinforcing thing. It is not about TDD at all in that case.
Then there is another one that does come from people who do practice TDD. Every now and then you will hear the slogan that TDD is not about testing, it is about design. I know what they are trying to do. They are trying to emphasize that testing is not just an act of verification. We often have this idea of testing as purely about verification, a kind of gatekeeping activity. But saying “TDD is not about testing” is not a true statement, and I always have problems when people present it that way.
At least half of my work is with companies who say, “We want to do TDD,” when what they really want to do is testing. TDD is a discipline, a workflow. You can tell when people are doing it. It is also the most extreme thing they have probably done in terms of testing discipline, so why give it a name that you also use for everything else? I always make sure people are aware there are many workflows.
What I try to make sure they understand is that when we say TDD, we mean something specific. It is not a magic spell. It is a particular way of working that gives you certain kinds of feedback and certain kinds of design pressure. Part of that pressure is on you as a developer to ask, “What am I actually asking for? Do I know what I want from the code?” Sometimes the honest answer is that you do not know what you want. That is a recognition of ignorance, that you do not yet have enough knowledge. At that point you may need to discover that knowledge, perhaps by spiking something or exploring, rather than pretending you are doing TDD when you are not.
Another part of this is the tests themselves. Some of them are actually quite large, and you have to ask, “Do I really want that? Is that genuinely helpful, or is that telling me something about the design?” Often the test is large because the design is causing that. If the interface feels very clunky, then that is telling you something about the design as well.
So as to what it feels like: testing is in fact the way you experience the design. Rather than looking at testing as a purely quantitative activity—“I got this percentage of statement coverage, I have done my job”—you can ask, “What does it feel like to write the test? What does it feel like to use the code I am providing?” If the answer is, “Yes, it is quite easy, it feels natural,” that is good feedback. If it feels like having a code review where you have to do most of the work yourself, then the tests are giving you a signal that there may be something up with the design.
5: There is an ongoing debate about TDD’s effect on software design and architecture. Some argue that focusing on small tests leads to fragmented design or lack of “big picture” thinking. How do you believe TDD influences software design?
Kevlin Henney: Hmm. I think it goes back to what I was saying about having this kind of design sense or code sense. If you are only ever going to think small, then yes, TDD will have those effects and you will end up with fragmentation rather than a cohesive design. That is one of the reasons it is quite important to make sure that you have a reasonable test hierarchy, that you are testing at all levels, and why, when you are doing this, you should always be taking the big picture view as well.
And this is, I guess, where the driving metaphor that is used extensively when talking about TDD becomes even more appropriate these days. When I drive, there are three places that I am typically looking. I am looking at the road immediately in front of me. I am also looking down at the dashboard to see what my car is telling me. And I am also looking at a map to see what the big picture is.
The problem is that many people, and this is again not just a TDD thing, I find it with different roles in development, are only ever looking at one of these at a time. Of course, if you are only looking at the dashboard, you are not going to see what is in the road in front of you; you are going to slam into something. But if you are only looking at what is immediately in front of you, that does not tell you what the bigger picture looks like or what the trends in traffic are, for example. So you are not getting feedback at all three levels. You are only ever looking at one and ignoring the feedback from the others.
So here is the thing. If you are using TDD and you have ended up with a fragmented design, it is because you are not looking at the bigger picture. When you launch into TDD, you should have a vision of where you are going. The problem is that sometimes people do not actually have an idea of where they are going. I often think of it as sketching out an approach. Do not commit yourself to detail. This is not a committed design; it is literally a sketch.
As a sketch, what TDD is going to do is fill in the details. I know that drawing is not a very common skill among developers; it is one of those things I always ask people about. Music is very common. Gaming is very common, whether computer-based games or board games. Certain sports are very common. Drawing is not. But when you draw, you often sketch the form and then put in the detail, and that detail sometimes tells you that maybe the form is not right.
People often launch in hoping that if they start drawing in the bottom right-hand corner, a miracle will occur. If you are a brilliant artist, yes, a miracle will occur; you will produce a great picture. But most of us need to keep looking at the big picture, always asking: how is this going to be used, and how does this affect that?
When some people say, “TDD does not do this,” my answer is, “No, that is your job.” TDD’s job is to do the sketching. It is your job as the artist to see the bigger picture and say, “I am drawing the wrong thing,” or “Maybe that needs to be moved,” or to take the feedback. If you are only taking feedback at one level, that is great; many people take feedback at zero levels. However, you need to be looking at multiple perspectives. Some of them are closer and some of them are further away.
So I do not really accept the criticism that TDD causes this. I accept that there may be a misunderstanding of the role of TDD, that people are sometimes saying, "If I do TDD, magic will occur." As I told my kids when they were growing up, there is no such thing as magic. There is just you, a tool, and a technique. That is it. If you are misapplying the technique, that is not the technique's fault, so there is a learning opportunity.
6: When it comes to scaling TDD in a larger organization, what challenges do enterprises face in rolling out TDD across teams? Based on what you have seen, what strategies help make TDD stick in the long run at the organizational level?
Kevlin Henney: Although I am very keen on TDD, I am not sure an organization should be rolling out TDD at all. It is a workflow practice, and if you can get it working within a team, that is great, but there is no reason another team has to do it. It might not be helpful for an organization to be mandating these things.
What the organization needs to care about is not so much the way teams produce their tests, but whether the builds work together and whether there are comparable testing philosophies across different teams. If you have a team doing a more traditional "test later, towards the end of the sprint" approach, and let us say they are really effective, with really good design and interfaces that evolve nicely, I would not mess with that. They are doing a perfectly good job, and because we have organized around teams, that does not really interfere. As long as our teams have some kind of alignment and relationship with an architecture, I do not think there is a problem there to be solved.
What we do want is the idea that we have a consistent or a reasonably consistent and compatible view of testing across the organization, and that if TDD helps me get that, then that is what I should be encouraging. But I am not going to say that it is going to be the thing I should focus on. I think that what an organization probably wants to focus on at the organizational level is: if we have various build pipelines, do these build pipelines follow similar philosophies of testing?
A build pipeline without any testing in it is not really a CI/CD pipeline. I am being careful to use the term "build pipeline" here, because people will often say, "Oh, it is our CI/CD pipeline." Is it? Are you doing continuous integration, and are you doing continuous delivery? CI/CD is predicated on the idea that you have tests. In fact, to be fair, CI/CD is predicated on trunk-based development and a lot of tests. That is what it means; you can go and look at the original books, and they are very clear on this. A lot of companies definitely have build pipelines, but do they qualify as CI/CD pipelines? Not always, not by the strictest definition. I think that distinction is the more valuable thing to focus on.
So let us put it this way. I do not think an organization needs to worry about what people are doing in their homes, but they probably need to worry about the road system. In this sense, organizationally, when we look at software architecture, we need to be thinking of software architecture more like urban planning. We want to have consistent rules and models for the roads. We want to have a consistent layout, see what the issues are, and agree on things about roads and services. What people do in their homes and how they structure their homes and how they do it, I think that can be a lot more freeing, as long as we have the knowledge available and maybe one team can coach another and we can say, “You can become our enabling team; we are going to try this practice.” I think that is great, but I am not sure I want the organization to get involved in some of the more detailed practices that support what goes on. I think what goes on, what the output is, and how teams integrate is probably more important than specifically what they do on the inside.
I think what can make it stick builds on what I have said: a team needs to feel that it owns its practices. Teams and individuals sometimes respond quite poorly when they are told what they are going to do and do not really feel it. If a team is told, "You are going to do TDD," that is not a way of getting them to do anything well.
If they can make it their own habit, if they can create it, if it is their decision, that is really important, and so is feeling that they have learned something. Within any large organization, and this is obviously a question of different organizations and different scales, we are going to find different kinds of teams trying to produce different kinds of products.
Some people say, "We do not do TDD here." Be very careful that "we do not do TDD here" is not also "we do not do testing here." Going back to what we have already discussed, that is what I hear when many people say, "We do not do TDD" or "TDD is not appropriate for us." They are using TDD to mean any kind of testing, and so they are using the wrong word. What they are actually saying, much more bluntly, is, "We do not have tests." If they said that, it would be far more direct, and we could work with it.
We need to work out whether that affects us or not. If it is a team that is just prototyping and giving us the results of prototypes, then that is not important. If it is a team that is prototyping a design, yes, we want tests, because you are telling us that this code, which we do not know whether or not it works, is the basis for what we are going to build. Prototyping can involve TDD and it can involve tests. I have done that a number of times in the past. So really it is a case of trying to understand from the organizational level how to get the knowledge out there and make the knowledge feel much more natural.
For many people, any kind of unit testing habit is the challenge, as is having tests that run quickly. I would treat those as the questions to address, and what we may find is that TDD follows by example, particularly if somebody within the organization has experience of it and can drive it, show it, and demonstrate it. Then it may become a lot easier.
Lead by example in this case rather than by mandate. Basically say, "Look, there are a lot of different testing workflows. Our objective is to get better testing, to make testing of any kind more convenient. Let me show you this. I am going to use a test-driven workflow." When you do that, it is much more open, and I think people are more likely to adopt it. Whereas if somebody goes around measuring different teams in a very obvious way, teams justifiably offer a little bit of resistance.
7: Legacy code is a reality for most teams. If a team inherits a large untested codebase, how would you recommend they approach introducing TDD or even more testing in that scenario?
Kevlin Henney: I think that is a really good question, because it matches a lot of people’s lived experience. The key point is that you have to prioritize. From where you are, perfection is impossible, so you have to look at what is possible, and that is going to be a little different for each codebase and for each team. A lot depends on whether you have what I would call a “maintenance mindset.” If you have that mindset, it is going to be very difficult to adopt TDD.
By “maintenance mindset,” I do not just mean software maintenance in the narrow sense. I mean the broader idea that “we are just maintaining whatever it is we do.” You often see this where initial development has been done in one location, and then the work is offshored to another group. The second group is told, “You are just maintaining it,” and people there may not think of themselves as doing software development. In reality, they are. There is no real separate thing called “maintenance” when it comes to software products. It is all software development. There is not “software development plus maintenance”; there is just software development.
So the first step is to reclaim the right words. You are doing software development. Everything you do has the potential to change the architecture. It is your responsibility not to preserve the problems in the existing codebase, but to eliminate them. “Maintenance” as stasis is not what you want. Your job is to be more ambitious: to make the product better than it was when you received it. How do you do that? One obvious obstacle is that you would love to test everything, but you have poor test coverage. In that case, do not try to test everything. Instead, decide how to prioritize what you test.
A useful way to do that is to look at what is going to happen in the next quarter. Suppose over the next three months you are going to add features in a particular part of the codebase. If that corner of the codebase is already relatively well isolated, then you lean into that. Reinforce the isolation. Make sure you have good automated refactoring tools available. Remember that your compiler will still catch many type-based errors. You can introduce separation and decouple tightly coupled code without relying on a large pre-existing test suite. You can lean on automated refactoring, appropriate review, and, as Michael Feathers puts it in Working Effectively with Legacy Code, “lean on the compiler.”
I have done this with teams: we deliberately ignore much of the rest of the codebase for the moment and decide, “We are going to make this part really good.” Once you have something isolated, it becomes easier to test. Unit testing and even integration testing are really about understanding isolation, loosening coupling, and improving cohesion of code units. Those are the practices that improve your code sense. Many developers these days do not have a clear understanding of coupling and cohesion. They get distracted by principle catalogs that are not very coherent. For example, the SOLID principles do not form a coherent set of design ideas; they are a bunch of things thrown together and they miss many important aspects. I know I will get comments for saying this, but I have been doing this long enough to say that SOLID principles are next to useless if you want to learn how to write good code. You are better off learning and reinforcing the fundamentals.
If you can isolate a small part of the system, that becomes your zone of “new development.” This is a bit like urban planning. In a city you cannot change everything at once, but you can change a particular district. Because that area is separated, you can make it good and benefit from that separation. That is one technique for allowing a team to claim territory and improve not only their testing but also their design. The important idea is that you are not just improving testing habits; you are improving the code itself. Testing and design are not separate activities. Treating tests as something separate is part of the misconception. Tests show you how the code fits together and whether your design is good. If you say testing is difficult, you are actually saying the design is difficult. That is useful feedback: “What do we change so this becomes easier?”
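Here is one way such an isolation point, a seam in Michael Feathers' terms, might look in Python; the domain and names are invented, and this is a sketch of the technique rather than a prescription.

```python
# Before: a hard-wired dependency made the logic untestable without
# a real database connection.
#
# def overdue_invoices():
#     rows = db.query("SELECT * FROM invoices")   # global connection
#     return [r for r in rows if r["days_late"] > 30]

# After: the dependency becomes a parameter, a seam. The logic can now
# be tested in isolation; production code passes in the real query.
def overdue_invoices(fetch_invoices):
    return [row for row in fetch_invoices() if row["days_late"] > 30]


def test_invoices_more_than_thirty_days_late_are_overdue():
    fake_fetch = lambda: [{"id": 1, "days_late": 45},
                          {"id": 2, "days_late": 3}]
    assert [row["id"] for row in overdue_invoices(fake_fetch)] == [1]
```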
A practical goal to hold is that in six months’ time it should be easier to work on this codebase than it is now. That will involve more than just testing. It will involve changing code, build settings, and all kinds of small things. You are trying to improve the overall situation. Another way to prioritize is to “ask the system itself.” Treat the legacy system as having a body of knowledge and let it tell you what to focus on. If you have a million lines of code, a team of ten is not going to transform it overnight, so do not try. Instead, look at the system’s history. What changed? What keeps changing? Look at the parts of the code that change most often.
It does not matter whether those parts are changing for good reasons or bad reasons. If they are changing frequently, that is where you want to improve both testing and developer experience. If you are frequently changing something, you are more likely to break it, and you are also more likely to benefit from making it better. Parts of the system that are incredibly stable do not need the same attention. That does not mean they are automatically good. Some things are stable because they are terrible and people are frightened to touch them. But if they are not changing, they are no more broken than they were before, and they already “work” in the sense that people rely on them as they are.
So use the system to tell you what to change. The system already has an opinion, visible in its history and defect patterns. Do you have a heat map of where your defects are? That is where you want your tests. In that sense, you can use the legacy nature of the system constructively and positively. I think we often overlook that because it is not immediately obvious, but it is a very practical way to introduce more testing and TDD-like practices into a legacy codebase.
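As a sketch of "asking the system," the following Python snippet counts recent changes per file from Git history. It assumes it is run inside a Git repository; the time window is arbitrary.

```python
import subprocess
from collections import Counter

# List the files touched by each commit in the last quarter; the most
# frequently changed files are the first candidates for tests.
log = subprocess.run(
    ["git", "log", "--since=3 months ago", "--name-only", "--pretty=format:"],
    capture_output=True, text=True, check=True,
).stdout

churn = Counter(line for line in log.splitlines() if line.strip())
for path, changes in churn.most_common(10):
    print(f"{changes:4d}  {path}")
```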
8: You have worked with a variety of programming languages, from C++ to higher-level languages like Python. Do you find that TDD plays out differently depending on the language or tech stack?
Kevlin Henney: Yes, it does, but not always for the reasons people might think. Sometimes it is more about culture than language features. Just as natural languages are associated with different cultures, programming languages have associated cultures, idioms, and toolchains. So you have the syntax of a language, but you also have the tools that are available and the habits that have grown up around them.
Culturally, testing as developer testing is far more prevalent in the Java world. There is nothing inherent in the Java language that makes it more amenable to testing than a language like Python, but testing is more likely to be present. That is because modern unit testing, at least in the popular sense, grew up around Java. The JUnit framework appeared in the late 1990s and was integrated with Eclipse. That made it normal for unit testing frameworks to be integrated into IDEs. Java was the language in which those practices and cultural habits were first formed. As a result, Java developers are much more likely to encounter JUnit and similar tools in an integrated environment early in their careers. In that sense, Java is “better suited” to TDD than Python, not because of the language itself, but because of the surrounding ecosystem.
Python, by contrast, does not have a single standard IDE in the same way. If you are working with Java, you are very likely using IntelliJ or a similar environment. If you are using Python, you might be coming from many different directions. If you are a data scientist, you have a different view of the world. Data scientists do not usually use Java; they use languages like Python. With Python you have people who consider themselves software developers, children who are learning to code, people who are scripting, people who are doing data science, and so on. There is not a single core culture, so you end up with disparate practices. Python itself also predates the period when automated unit testing became a strong habit. That is not to say Python developers do not test, but the cultural environment around Python does not have the same unified testing norm as the Java ecosystem. So in that sense, what you see with TDD or testing is often more about development culture, who is around you, and what information and tools are available.
If we move to C and C++, the classic systems programming and embedded languages, we see yet another culture. These are contexts where you are much less likely to find unit testing. If people are testing, they often test at the system level and not even at the integration level, let alone at the level of small units of code. So culturally that is an obstacle to TDD.
Then there are the language characteristics themselves. Python is a much “looser” language; it is dynamically typed, and that can actually make some aspects of TDD easier. I sometimes joke that when I am using Python, I do not need a mocking framework because Python is the mocking framework. Mocking frameworks were invented for statically typed languages like Java, where the language does not easily support meta-level behavior. Those languages are less elastic and less plastic. In Python, I can reshape almost anything. The language itself is a tool that can be used to modify itself. At that level, from a purely linguistic point of view, Python can make testing easier.
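For example, here is a sketch using pytest's monkeypatch fixture; the dice example is invented. A plain attribute assignment would work just as well, which is the point.

```python
import random
import sys


def roll():
    return random.randint(1, 6)


def is_critical_hit():
    return roll() == 6


# No separate mocking framework needed: the module under test is
# reshaped directly, and pytest undoes the change afterwards.
def test_a_roll_of_six_is_a_critical_hit(monkeypatch):
    monkeypatch.setattr(sys.modules[__name__], "roll", lambda: 6)
    assert is_critical_hit()
```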
However, cultural habits can get in the way even there. For example, many Python developers, especially in more data-science-oriented contexts, have a habit of reading and writing files everywhere and accessing files in every function. That makes testing harder, and it is something I try to encourage people to stop doing. In C and C++, there are language constructs that encourage longer build times and more source-file dependencies. There are also design habits that do not lead to natural decoupling or obvious substitution points where you can say, “I can easily put something else here because the design allows it.” In those environments, you sometimes have to push uphill against the prevailing culture of the codebase to get to a design that is test-friendly.
So yes, languages can make TDD easier or harder, but only sometimes is that because of the language features themselves. Very often it is due to the surrounding culture: the design culture, the testing culture, and the practices that have grown up in and around that language.
9: The quality of tests is crucial in TDD. What are some best practices you recommend for writing good tests in a TDD cycle?
Kevlin Henney: One principle I always come back to is that each test should be testing one concept, one idea. That does not mean it necessarily has just a single assertion, but it should have a single focus. What is the thing you are trying to demonstrate? That should be easily summarized by the test name. This is one of those cases where naming something is not merely labeling; it is testing as design, because a different naming approach will cause you to create different tests.
My preferred habit is to use tests that are propositions. Let us take this cup, for example. Some developers might say, "I have a constructor," and write a testConstructor method, then testFill, then testDrink. What you are doing is going through the shopping list of methods and writing a test for each one, and there is no way to produce good tests using that technique. I actually demonstrate this when I run training courses: if you just go one method at a time and say, "I am going to write a test for this method," you cannot write good tests, partly because, in order to drink from a cup, I need to create it first.
So I have already involved the constructor; I am not just testing the drink method. I also need to fill the cup, so I am using the fill method, then I can drink from it, and then I need to determine whether it became empty. I have just used four different operations. I am not testing a single operation; I am testing an interaction. This is why the perspective of BDD, behavior-driven development, gives us a different way of understanding what you are after: you are testing behavior, and the behavior is not in a single method but in the composition of different methods in different scenarios.
Another reason you do not want testDrink, for example, is that I can drink in two different scenarios: from an empty cup and from a full cup. That is not one test case, that is two. They are very different, and they have different outcomes. So if you are currently doing that, it is a huge test smell. If I just see tests where each method has a corresponding test method (testA, testB, testC), you do not have good tests. It is as simple as that.
I always lay it down as a challenge to people: show me if you have any counterexamples. Nobody has ever been able to come up with a good example that contradicts that observation. What you need to be doing is testing behaviours, or in some cases we would look at it as testing a property. There is a fluid overlap between these approaches. You make a statement, a propositional statement. By propositional statement, I mean that we describe something and the way that it is.
“A new cup is empty.” “Drinking from a full cup empties it.” These two sentences are the test names. So literally your test name should be as easy to read as if it were a specification, which is what I said earlier. In other words, each test needs to be organized and thought of as a specification with an example. Here is the thing that we are showing. This is the expected outcome in this scenario. This is the behaviour or the property that we are entitled to, and that we are requiring at this particular point.
When we start looking at it like that, you suddenly realize your tests are not just a bunch of assertions with bits of setup. You are telling a story. You are describing the system from a specification-oriented point of view. You are giving people a series of logical propositions. "A new cup is empty" is a proposition. If that test fails, what does that mean? It means a new cup is not empty. I can tell immediately from the test name what is wrong. I might not know why, but I know what. Whereas if testConstructor fails, I have no idea what that even means.
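Here is a sketch of those two propositions as tests, in Python with pytest; the Cup class is the toy example from the conversation, invented for illustration. Note how the second test necessarily composes the constructor, fill, and drink.

```python
class Cup:
    def __init__(self):
        self.empty = True

    def fill(self):
        self.empty = False

    def drink(self):
        self.empty = True


def test_a_new_cup_is_empty():
    assert Cup().empty


def test_drinking_from_a_full_cup_empties_it():
    cup = Cup()
    cup.fill()
    cup.drink()
    assert cup.empty
```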
So the point is, your tests are units of meaning. Or, put another way, they are not just verification, they are communication. You are communicating actual meanings. If your testing philosophy is that you are just poking your code to verify it, you are going to end up with tests named after methods, or even worse — and I have seen this a few times — test1, test2, test3. Honestly, that is not going to help anybody.
You can always tell whether a team has really understood this, because if they are testing like that, there is no way they have a good testing habit. They are doing testing as an afterthought. It does not feel good. I would not like to write tests like that, and if somebody said, "Kevlin, why are you not writing tests?" I am going to say, "Because it feels wasteful, and it is annoying and frustrating." If you adopt those practices, it is annoying. I would not want to write tests like that.
So test quality needs to be quite high; otherwise you are going to end up with unmanaged technical debt in your test base as well as problems in your code base. You do not want to double your problems, you want to reduce them. Your tests should be a clear explanation of what your system does in detail, along with intention. For me, that is what I put under the heading of GUTs, good unit tests, a term from Alistair Cockburn. TDD does not miraculously cause you to write GUTs. You need to realize that you are in the driving seat. Having a nice car does not make you a better driver, and I think there are a lot of people who would benefit from that analogy.
Then you need to listen to your tests. What are they telling you? Perhaps they are telling you, "This is not cohesive. Everything is bound up, and there is too much in one place." If you want to test a behavior in that, or a related group of behaviors, then that related group of behaviors is its own module or its own class. Why is it hiding inside another class? This is design feedback. Again, sometimes the difficulty of testing comes from the difficulty of the code.
So I would say listen to your tests. When people ask, "How do I test the private stuff?" my stock answer is that, generally speaking, you do not. That is a signal that you need to separate something out. You are not dealing with one idea, you are dealing with two, and one of them is hidden inside the other. Pull it out. Do an Extract Class and focus on that. It is clearly important because you value it. You just said, "I want to test these behaviors." You probably even have words and names for it, but it is hiding embedded inside another class. So give it first-class citizenship and extract it.
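A minimal before-and-after sketch of that extraction in Python; the parsing example is invented.

```python
# Before: the urge to test a "private" helper is the signal.
#
# class Report:
#     def _parse_line(self, line): ...   # hidden idea, hard to reach

# After: the hidden idea gets first-class citizenship of its own.
class LineParser:
    def parse(self, line):
        name, _, value = line.partition("=")
        return name.strip(), value.strip()


def test_a_line_is_split_into_name_and_value():
    assert LineParser().parse("colour = blue") == ("colour", "blue")
```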
At the same time, I recognize that there is a point here. If I told you that and you have a major release tomorrow, that is probably not helpful advice from me. That is why I do not say, "You should never test private stuff." What I say is that you should take it as a signal, and you probably do not want to do it. In cases where we need a little pragmatism, I would say, "Yes, I am going to weaken the encapsulation on the class in one way or another, but I need to flag it loudly: this is technical debt I need to manage."
If I have ignored that warning three times, take it as a “three times and you are out” rule. If I keep coming back to the same code, and my colleagues and I keep coming back to the same code and saying, “Yes, we said we would fix this,” now you need to properly schedule it as a piece of work, because you are always working around it. You are not working with your code, you are working around your code.
That would be something I would say. Again, that is not really so much a tooling thing as understanding what your test is telling you. When it comes to mocking, I do not have any strong opinions, except that most people mock too much, rather than understanding that excessive mocking is an indication that you have a problem that you should be solving. Do not lean into it by adding more mocking. Lean into it by asking a different question: “How do I mock less? What rearrangement of interfaces or class responsibilities would make this easier?”
I generally think that people use too much mocking anyway, even in quite good designs. There are simpler ways of looking at it, and they confuse themselves. So you end up with a lot of mocks and a lot of mock noise, which is not to say that mocking is not useful. Most of the time, I think the guidance I gave to one team years ago still holds: if you are not mocking, you probably need to learn how to mock. But if you are already mocking, you probably need to learn to mock less.
I do not have specific feedback on mocking tools, except to say that sometimes I do not find them particularly necessary because of the language. I made the comment about Python earlier. In some languages the language itself is effectively the mocking framework. So for me the emphasis is less on specific tools and more on what your tests are telling you about your design, your responsibilities, and your coupling.
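As an illustration of mocking less, here is a sketch in Python where a plain in-memory fake stands in for a mocking framework; all names are invented. The test couples to behavior rather than to a recorded sequence of calls.

```python
# A simple hand-rolled fake: no expectations, no call verification,
# just the behavior the code under test needs.
class InMemoryAccounts:
    def __init__(self, balances):
        self.balances = dict(balances)

    def balance_of(self, account_id):
        return self.balances[account_id]


def can_withdraw(accounts, account_id, amount):
    return accounts.balance_of(account_id) >= amount


def test_a_withdrawal_within_the_balance_is_allowed():
    accounts = InMemoryAccounts({"alice": 100})
    assert can_withdraw(accounts, "alice", 60)
```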
10: Does tooling make or break the TDD experience?
Kevlin Henney: If you are able to establish the workflow, and the code that you are working with has the right properties, or you are moving in the direction of it being loosely coupled and highly cohesive — you are using good, classic design practices to organize your code — then you are going to get most of the experience that you need, and that will not change too much between testing frameworks.
I used a technique years ago where I would get people initially testing with just plain asserts, just a straight assertion, whatever is available in the language or library, without using a testing framework. Then I would get them to refactor towards a framework. That actually turns out to be quite useful. One company did this for their C and C++ code and actually created a framework that they then controlled, which was very useful for their embedded environment. It is not something I ever really did with the Java folks, and I occasionally do it, sometimes as a bit of fun with Python. But I do not do that very much anymore because these are solved problems.
The point of that exercise was to show people that, first of all, the fundamental ideas in a testing framework are not too complex, but also that you would be surprised how little you need to get a testing workflow. But that said, I like to have a testing framework that supports a number of basic features. Obviously, when a test fails, I want to continue with the rest of the tests. I want to be able to have parameterized tests so that my tests can be data-driven.
Any testing framework that does not support that in 2025 is, in my view, an interesting beta, but it is not yet a proper testing framework. It is a 2000s testing framework. I like to have a testing framework that allows me a way of organizing and grouping tests easily. These features streamline the overall testing experience, but they also allow you to have more expressive tests.
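For instance, here is a sketch in Python of the journey from plain asserts to a data-driven test; the leap-year example is invented.

```python
import pytest


def leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Starting point: no framework at all, just the language's own assert.
def plain_checks():
    assert leap_year(2024)
    assert not leap_year(1900)


# Refactored towards a framework: the same checks, now parameterized
# and data-driven.
@pytest.mark.parametrize("year, expected", [
    (2024, True),    # ordinary leap year
    (1900, False),   # century not divisible by 400
    (2000, True),    # century divisible by 400
])
def test_leap_year(year, expected):
    assert leap_year(year) == expected
```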
Whether that makes or breaks TDD, I do not think it goes quite that far, although I can imagine being sufficiently frustrated in some cases that it would break. However, I think good tooling improves the experience dramatically, and if you get a better experience, you are going to do more of it. There is something to be said for that: good tooling can encourage better habits and more frequent testing.
In terms of specific tools or frameworks that I personally like using for TDD, I have mentioned some of them already a couple of times — the ones that I think flow best for me. Obviously some of these are going to be a matter of personal experience. If I am using Java, then JUnit 5 is my choice, and that is actually a little bit different from JUnit 4. I found JUnit 4 got in my way a little bit, but JUnit 5 has just enough that allows me not to be working around the framework.
In the C++ space, Catch is my framework of choice, and I would also encourage the use of Catch for C. In other words, if you are in an environment where the production code is in C, do your testing in C++, because the tools are generally more powerful. That is a common pattern anyway, but I would use Catch there. It allows you to be much more specification-oriented.
There is no surprise, I think, if I say that I am comfortable using Jest with TypeScript and JavaScript. With C#, NUnit is my choice. Occasionally I will do work in languages where I have less familiarity with the frameworks. I did something for a client a couple of years back in Ruby, and we used RSpec, which quite impressed me. It had been years since I had used it. I found it still a little limited in some senses, but I also found that I could create some really nice testing approaches with it.
My opinions on that and other frameworks are slightly less strong, but those are the ones that normally stand out. There is nothing unusual in that list. The key point is that the workflow and the design properties of your code matter most, and the tooling, when it supports that well, can significantly improve your TDD experience.
11: Putting TDD in the context of overall testing, how does TDD fit with other testing practices on a project? You have talked about it and hinted at this briefly throughout, but if you were to just focus on this aspect, how do you see it?
Kevlin Henney: Yes, I think normally when we talk about TDD, we tend to lean towards the unit testing side, because that gives us the fastest feedback cycle. There is no single standard definition of what a unit test is, but the one that is perhaps most widely used and accepted is very much about the isolation question: can I isolate a piece of code from external dependencies, external runtime dependencies?
If the answer is yes, then that is a unit. It is not about language constructs. It is not “is it a class, is it a module, is it a function,” whatever. It is about isolatability, and the idea that I am not going across a significant boundary of communication. I am not hitting the file system, I am not communicating with another service. The idea is that I am contained and therefore, as a nice consequence, it is going to run fast, but also I control everything about it. You do not control the file system, you do not control the network. Those are outside your control. They may be under your influence, but not your control. So if we define “unit” from that point of view, then we get a fast feedback cycle and it tells us something about the internal structure and looseness of coupling and the strength of cohesion of what we are building.
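A sketch of that isolation-based definition in Python; the example is invented. The unit never touches the file system, so the test controls everything and runs fast, while the boundary crossing lives in a thin wrapper.

```python
import io


# A unit by the isolation definition: no file system, no network.
def count_errors(lines):
    return sum(1 for line in lines if "ERROR" in line)


def test_error_lines_are_counted():
    log = io.StringIO("INFO ok\nERROR boom\nERROR again\n")
    assert count_errors(log) == 2


# The boundary crossing stays at the edge, covered by slower
# integration tests rather than unit tests:
#
# def count_errors_in_file(path):
#     with open(path) as f:
#         return count_errors(f)
```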
From that point of view, it feels like TDD is very much in the unit testing space. But that said, exactly the same workflow works for integration tests. There is nothing different there. You can use the same workflow. All of the same test recommendations pretty much apply, but I will probably be using other aspects of my unit testing framework. For example, if I can pull in data files, then it starts becoming a little more serious. It just means that when a test fails or when a test passes, I cannot guarantee that the reason it passes or fails is only to do with correctness of code. If your network is down and what you are testing involves wandering across the network, your test will fail and it will not be because your test is wrong or your code is wrong; it will be because the outside world is wrong. It is to do with the nature of feedback, and it will also be a bit slower. But everything else about it—the sensibility, the mindset, the structure, the naming, the partitioning—all of these other things are kind of the same.
Then we hit things like acceptance test driven development, ATDD, which I always find difficult to say. It is a lot easier to write it down. Acceptance test driven development is where we are actually looking just beneath the UI skin of an app, so potentially very much end-to-end without UI interaction, or at an integration level, but the idea is that it is still code on code and we are doing that. The idea with that is that clearly the small iterative steps will not be as small. They are probably very much more at the feature level. We are looking not at minutes to hours, but hours to days before a certain test may pass, and that is acceptable. The same sensibility applies. This is also where many people will associate behavior driven development, although behavior driven development is a philosophy that also applies to the unit tests. For many people, they think of BDD in this higher-level space. I want to be very careful to say it is not just in that space; it is just the way that many people approach it. But again, that can lead you to structuring your workflow in the same way, so you can see there is a sympathy between many of these kinds of testing.
We can also look at other forms of testing. Contract testing is where we would say we are testing external code. Historically I call that conformance testing. For me, contract testing is what you always do because you are testing the contract of the class; you are defining it. It is not about third-party code, but the term has come to mean code that is external to the code that you are writing and that you want to test conforms to your expectations. I think that is a really important point because it is complementary; it is not the same. It addresses an issue that sometimes people have when they are writing TDD. Let us say, for example, that I am using somebody’s cup framework that I have obtained from GitHub, and maybe I have had some bad experiences with that framework in the past and there were some bugs in it. That is annoying, because I am trying to focus on my code, but I am discovering bugs in somebody else’s third-party code.
The problem is that you end up putting an extra little test into your tests just to check that the bit that broke before does not break again, and so on. You end up with a lot of “drive-by” tests: you are testing your thing, but you are also testing this other thing. Do not do that. Your test now has mixed responsibilities, and you want to separate them. That is where contract or conformance tests fit in. The idea is to say, “Everything we depend on that might cause a problem, we have tests that check.” If those tests fail, we do not even bother running our own tests, because there is no point: if the foundation we are building on is already broken, why would we test what sits on top of it? So this is a clear separation, written in what we might call a defect-driven style rather than a test-driven style: “This is not working,” or “It was not working historically,” so I will write a test to make sure the latest version, and the versions we use in future, keep working.
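A minimal sketch of that separation, using Python’s standard json module as a stand-in for a third-party dependency: the expectation about someone else’s code lives in its own conformance test, kept apart from our own tests.

    # Conformance (contract) test: pins our expectations of a
    # dependency we do not control. The standard json module stands
    # in here for a third-party package; a behaviour that once bit
    # us gets its own test, separate from tests of our own code.
    import json
    import unittest

    class JsonDependencyConformanceTest(unittest.TestCase):
        def test_round_trip_preserves_nested_structure(self):
            data = {"name": "cup", "sizes": [1, 2, 3]}
            self.assertEqual(data, json.loads(json.dumps(data)))

    if __name__ == "__main__":
        unittest.main()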
We may also have other tests, like performance tests. Performance tests are typically something slightly different, because they follow a different experimental design; they have a different workflow. If I drink from a full cup and the cup does not empty, that is a bug, straight out. But if I say that we have a particular availability target or a particular performance limit, then statistically we may find that sometimes the test passes and sometimes it does not, because of the way the operating system schedules work, and so on. We are not dealing with something that is simply true or false. We are dealing with something that is better or worse, and there is a grey area. We really want to be in this performance space 90 percent of the time, and we will tolerate 10 percent outside it. That suggests that the nature of the test requires a different philosophy. We can pass and fail at certain limits, but not in the same way, and we do not just run it once and say, “That passed.” We need to draw from different samples, sometimes at different scales. Those tests feel very different in that sense. Again, they are complementary, but not in a way that fits inside the TDD workflow; it is actually quite separate. They are testing behaviors that sit outside the basic semantics: performance characteristics and so on. That requires a different mindset, I feel.
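That statistical mindset might be sketched like this, with an illustrative workload and an assumed 50-millisecond budget: the test samples many runs and asserts on the 90th percentile rather than on a single observation.

    # A sketch of the statistical mindset: sample many runs, then
    # assert on a percentile rather than one observation. The
    # workload and 50 ms budget are illustrative, not a benchmark.
    import time
    import unittest

    def operation_under_test():
        # Stand-in workload; in practice, the code being measured.
        sorted(range(10_000), reverse=True)

    class OperationPerformanceTest(unittest.TestCase):
        def test_90th_percentile_is_within_budget(self):
            samples = []
            for _ in range(100):
                start = time.perf_counter()
                operation_under_test()
                samples.append(time.perf_counter() - start)
            samples.sort()
            p90 = samples[89]  # 90th of 100 ordered samples
            # 10 percent of runs may exceed the budget and still pass.
            self.assertLess(p90, 0.050)

    if __name__ == "__main__":
        unittest.main()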
So for me, TDD sits largely in the automated developer testing space, mostly centered on unit tests, but the same workflow and sensibility extend into integration tests and acceptance-level tests. Around that, we have complementary practices such as contract or conformance tests, characterization tests for third-party APIs, and performance tests with their own experimental mindset. All of that lives together in the broader testing picture on a project.
12: Maintaining TDD discipline under pressure: from a team leadership perspective, how can leads or senior developers encourage the team to stick to TDD when deadlines are tight or when people feel tempted to “just code it and test later”? Are there any habits or cultural practices that help sustain TDD in the real world of rapid timelines?
Kevlin Henney: Yes. I think the thing is, again, it comes back to whether or not it is your idea, whether or not you feel you own that idea. If TDD is something you only do when the team leader is in the room, and the minute they walk out of the room you stop doing it, then you do not have it. Your team is not doing TDD. It is a kind of performative TDD. You are doing it because you are supposed to, and that is understandable. But it means that the minute you feel any kind of pressure, you are going to throw that out of the window. We see this with a lot of different practices; it is not unique to TDD.
The point is that you need to reach the stage where it is a habit and you embody it: “This is what we do.” Also, if you have enough experience, you start realizing that the minute you start throwing out certain disciplines, you are going to pay for it later. This is where so much of our technical debt comes from. It does not come from some kind of magic genie that pops into your code. Well, actually, maybe agentic AI can reduce the quality of your code while your back is turned, but the point is that technical debt does not magically appear in your code. You know it got there for a reason.
People often like to say that certain practices only work in certain cases. Honestly, there is a truth to that, but the chances are that whatever you are working on is not so special that practices you normally find useful suddenly stop applying. If you are already finding TDD useful, lean into that. Lean into it a bit more. If you are not, then that is a different discussion. But from the team lead perspective, the job of a team leader is not a controlling role. It is a leadership role. Leadership is not about managing; it is mostly about example, about enabling, about making people see opportunities and making it somehow easier for them to try the right thing than to try the wrong thing.
In some cases, TDD may be a good thing for them. That is great. How do we make that feel like it belongs to them and it is their practice, not the team lead’s practice? Not the organization’s practice, but my practice. How do I make it my practice so that when I start on a new team, that is how I work? When I go for an interview, that is how I describe what I do, because it is my practice, not the team’s practice or the organization’s practice or the team leader’s practice. This is not a practice that belongs to that person or that entity; it is my practice.
For me, that is the skill, which means that there is no easy answer. I am afraid if anybody is watching and hoping for an easy answer, there is not one. But that is the skill and also the subtlety in it: moving from performative compliance under pressure to a place where TDD is something the developers feel they own, something they practice because it helps them, so they are less likely to abandon it when deadlines are tight.
13: AI can draft tests fast, but quality is uneven. What acceptance gates—e.g., minimum mutation score (PIT), property-based invariants, automated test-smell checks, and explicit review rules—would you require so they increase fault detection? How would you enforce this in CI to scale safely?
Kevlin Henney: I would actually take a step back before talking about specific acceptance gates and ask why you are using AI in the first place. Why are you using AI to generate tests? What problem are you trying to solve by doing that? A lot of teams cannot answer that question. They say, “We are using AI because we were told to use AI,” and then we are right back at, “You were told to do stuff; this is not your practice, this is somebody else’s practice.” For many people the story is, “We do not have many tests,” so now they generate a lot of tests with AI, but they do not understand those tests. They do not know what is being tested, or whether the tests are correct.
Recently I wrote a blog post called “Think For Yourself,” where I gave people four things to consider whenever they want to integrate anything that is AI generated into their code base. The first question is: does it work? Do the tests pass in a way that gives you confidence that they are actually verifying the right behavior? The second question is: do you understand the generated code or tests? If you do not know that something works, or you do not understand it, step away. Do not pretend that you are being productive.
There is a common illusion here. Somebody will say, “AI has boosted my productivity,” and then you ask them how they know. How are they measuring that? The answer is often that they are not measuring it. They just have the feeling of speed because the AI produces a lot of stuff in ten minutes. Then they spend the rest of the week fixing it. That is not productivity; that is the creation of legacy. In my view, one working definition of legacy code is “code written by somebody else.” AI-generated code fits that definition perfectly. I am not saying, “Do not use AI.” I am saying that good use of AI requires more understanding, not less, and that requires tests and review.
The number of times I have been fooled or could have been fooled if I did not have tests is significant. That is why I write my own tests. I do not trust AI to do a better job than I can, because I still have to explain the behavior I want. In the time it takes me to explain that precisely enough to an AI, I could have written the tests myself, and I would know exactly why I wrote them and what design decisions they encode. AI might be useful for generating certain coverage-oriented tests in situations where coverage is very poor, but even there I have questions. If I am using AI to generate tests as well as code, I must spend most of my time reviewing, and reviewing is a skill that many people do not have.
I spent years learning how to review: fiction, non-fiction, technical material, books, articles, and code. Code review is not “I glanced at it and it looks good to me.” That is not review. If you do not understand the generated code and tests, you have a problem waiting to happen. The upside is that you can treat AI as a teacher as well as a generator. If you are using AI-generated tests, ask yourself: do you understand what your code is doing when viewed through those tests? What can you learn from them?
Then there is the question of taking control. What is the difference between the generated code or tests and what you would have written? If you were to write that test yourself, what would you have done? They might be similar, but they will often be different. Understanding that difference is an education. Sometimes I look at the generated result and think, “That is a really good way of doing it.” In other cases I look at it and think, “That is not a very good way of doing it at all.” Either way, I have learned something.
So my final recommendation in that space is to add one more gate: can you think of at least one way to improve what has been generated? Do not treat AI output as something that is simply “good enough to accept.” Treat it as a starting point. That gives you two big groups of questions. The first is: why are you using AI to generate your tests? Do you have a clear understanding of the benefit and how you will measure that benefit? If you do not, do not do it. Being busier is not the same as being productive.
The second group applies if you do decide to generate code or tests with AI. Use this list of four gates: does it work, do you understand what you have, what is the difference between what AI produced and what you would have done, and can you think of at least one way to improve it. If you habitually apply those checks, you will learn a lot. You will be using AI as a possibility generator, not as an autopilot. You will be interacting with it, passing judgment, using your design sense, and either accepting the result because you know why it is good, or changing it because you know what is wrong and how to fix it.
In other words, you turn AI into an assistant or a coach. The problem I see at the moment is that many people are backseat drivers with AI. They have no idea what is being generated on their behalf. They do not understand what is being tested. When they have to fix an issue or extend the code, they discover that they do not know enough, and it takes them longer. They are not using AI in the right way.
So my general advice is this: be crystal clear about why you are doing something, especially with tests. For me, the strength lies in you writing the tests, not in outsourcing them to AI. Tests are your executable specification and your feedback loop. If you hand that over to a tool without understanding or review, no mutation threshold in CI will save you. You may use metrics and gates in your pipeline, but the real acceptance gates are still the human ones: clarity of purpose, understanding, comparison with your own judgment, and deliberate improvement.
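For readers who do want a pipeline-level gate that keeps the specification in human hands, one option is a property-based invariant, sketched here with the Hypothesis library; the encode/decode pair is purely illustrative. The property itself is the human-written specification, whatever tool generated the code under test.

    # A human-written, property-based invariant using the Hypothesis
    # library (run with pytest). The encode/decode pair is a stand-in;
    # the property must hold for every generated input, not just for
    # hand-picked examples.
    from hypothesis import given, strategies as st

    def encode(text: str) -> str:
        return text[::-1]  # illustrative transformation

    def decode(text: str) -> str:
        return text[::-1]

    @given(st.text())
    def test_decode_inverts_encode(text):
        assert decode(encode(text)) == text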
14: Finally, looking forward: What do you see as the future of TDD and automated testing practices? Are there emerging trends—perhaps in tooling (like property-based testing, AI-assisted test generation) or in process (like BDD, Continuous Deployment practices)—that you believe will shape how experienced developers approach TDD in the coming years?
Kevlin Henney: If you do not already have an automated testing habit, now is a very good time to start. With AI in the mix, you will find that generated code sometimes fails in ways that make intuitive sense, where you look at the mistake and think, “Yes, I can see how that happened given the training data.” In other cases, the mistakes are very odd: failures where you think, “Why would you ever do that?” You need to become better at testing to deal with both.
Interestingly, this is something I was saying years before large language models. Around 2016 or 2017, at a conference in Poland called MobiConf, somebody asked me about the future of AI. At that point everyone was guessing about where AI would go. My answer was that you need to get better at testing. That is still my answer. The more AI we add, the more testing skill we need. I do not want AI anywhere near company-critical code without tests and without proper reviewing skills.
So one part of the future is skills. You need to get good at testing, and you need to get really good at reviewing. Reviewing is not just a testing skill; it is a design skill. You cannot review code effectively unless you have deep design experience, which means you also need to learn to code. Coding remains relevant, because otherwise you do not know what you are looking at. This is not about language tricks. It is about familiarity with a kind of precision that you normally only see in areas like science and mathematics. Those are the skills you want to build, and they sit in the same space as the precision and specification thinking that testing requires.
In terms of the specific workflow of TDD, I find it hard to make strong predictions. My sense is that TDD adoption will always be relatively low compared to the overall population of developers. As things stand, many people still do not test at all. There is a significant and increasing proportion of developers who do test, and that has changed over the last couple of decades. The needle has moved. Within that group there is a smaller subset who will try TDD, or have TDD as part of their toolkit and can employ it when appropriate.
I would like that number to go up. Leaving AI out of it for a moment, I think TDD is a good practice. I have thought that for a long time. It is helpful because it encourages incremental thinking and clarity. Done with the right sensibility, it leaves behind something worth inheriting, rather than something people curse you for.
If you bring AI back into the picture, those same qualities remain valuable. Clear tests, incremental feedback, and a strong sense of specification help you reason about AI-generated code and about changes in general. I would like to think that AI might even increase the uptake of TDD, because it will force people to confront questions about correctness and understanding more directly. But whether that happens, and to what extent, is difficult to predict.
So my view of the future is less about a specific new tool or fashionable acronym and more about emphasis. We will see more AI and more automation, but the teams that thrive will be the ones that double down on testing skill, review skill, design sense, and the ability to work with precise specifications. TDD is one of the workflows that aligns naturally with that direction, and that makes it a practice that will continue to be relevant, even if it never becomes universal.