The C++ Programmer’s Mindset on Abstraction Costs, “Future You,” and Thinking with the Machine: A Conversation with Sam Morley
From STL leverage and modular design to cache-aware performance, concurrency pitfalls, and memory-safety habits in modern C++
C++ rewards engineers who treat problem-solving as a deliberate process rather than an improvisation. In this conversation, Sam Morley returns repeatedly to that theme: decompose the work until it becomes a set of solvable, “atomic” parts, then choose abstractions that fit the real constraints of the system. He argues that abstractions are never free, even when runtime overhead is low, and that good design means balancing competing costs: build time, cognitive load, flexibility, and performance. That same pragmatism shows up in his emphasis on leaning on the standard library, iterating from “working” to “fast” based on measurement, and understanding when low-level details like cache behavior and memory access patterns should influence how you structure code.
Morley, author of The C++ Programmer’s Mindset and a research engineer with a background in mathematics who maintains a high-performance C++/Python library for data science, also frames maintainability as a problem of empathy for “future you.” He discusses writing code that can be understood months later, structuring systems with clear separation of concerns, and treating concurrency and memory safety as design problems rather than afterthoughts. Along the way, he outlines practical guidance on thread-safe architectures, where synchronization mechanisms go wrong, and how ideas from Rust’s ownership model can sharpen a C++ engineer’s instincts about lifetimes, pointer safety, and undefined behavior.
You can watch the full conversation below or read on for the complete Q&A transcript.
1: For an experienced engineer, what does adopting the C++ programmer’s mindset look like in practice? How does it change the way you approach complex software challenges?
Sam Morley: Experienced engineers—and probably some less experienced engineers as well—are probably using this framework of computational thinking already. The framework itself, as I came to discover when I was putting this together, is really a set of common elements that one finds oneself doing when solving problems. It's less about the actual components of the framework and more about how one connects those with the different mechanisms, features, and facilities within and around the C++ language—that's what makes this an interesting discussion topic.
So, one might be quite experienced at solving software problems, but what we’re doing here is more about connecting those with a broader thinking about the system and the language—and all of the facilities around those—which is hopefully the additional knowledge that I’m imbuing. As I said, most people are already kind of familiar with this sort of framework, even if they’re not conscious of that fact.
One of the things I want people to take away is that it’s sometimes very helpful to really think about your process. So when you do solve a problem, look back and think: How did I do this? How did I break this problem up? What abstractions did I find? Where did I find them? Where were the common elements—things that I’d seen before? What were the things that I’ve never seen before? Try and make a mental note of those facts, because these are the things that will come up again. From a longevity point of view, it’s important to not only remember your solution, as it were, but also your process.
Because if you've fallen down a hole once, it becomes quite easy to fall down it again if you don't remember that the hole was there. So if you document your process and think about your process—at least maybe not physically document, but mentally document the process that you go through to solve a problem—then you can remember these facts much better.
Moreover, if you’re experienced, then you should also be mentoring your more junior people, and this is also helpful for them. So it’s important, for lots of reasons, to think about what you’re doing and how you’re doing it.
And that’s one aspect of the book. The other is connecting it with the C++ language, the broader system in which it operates, and how you marry those two things together to make an overall, hopefully better, more efficient, and faster solution.
2: In your book you talk about breaking down challenges and choosing the right abstractions to build the most efficient solutions. Can you walk us through a concrete example where this approach made a big difference?
Sam Morley: I want to start by challenging this a little bit. The notion that one can solve a problem without breaking it down into smaller parts is kind of folly. I don’t think it’s possible, really. You might not be conscious of the fact that you’ve broken it down into smaller parts, but you’re almost surely doing it. Even something as simple as doing some arithmetic—you might think that you’re just adding N numbers together—but really what you’re doing is you’re adding two numbers together, and then adding the result of that to the next one, and then adding the result of that to the next one. It’s an expanding-brackets problem. Whether you are conscious of this fact or not, the point is that you’re solving several smaller problems that look like the same big problem.
However, the way that you break down problems obviously matters, and some ways are more efficient than others. For that reason, it is a good thing to get into.
I want to talk about something that I did a few years ago now, which involved taking frames out of a very large number of video files, sending them to one of the ML services on Azure—so this was over a REST interface to Azure—and then getting the results back and storing them on disk in files. This is a very meaty problem. There are lots of different elements here: there's loading all the files and then decomposing them into individual frames; then there's sending all of those frames off to the Azure service; there's getting the results back; and then there's writing them to disk. So immediately there are four components to think about.
Once you'd expanded this to 100,000 videos or something, each with a few hundred frames, the numbers here are pretty enormous. In order to get them to and from the Azure service in a meaningful amount of time, we had to multiplex this. So we had multiple threads all sending frames to multiple Azure endpoints, because each of the endpoints is rate-limited—you can only send, I don't know, 10 requests a second or something to each one.
But actually, sending requests was not the bottleneck. Getting the results back was part of the bottleneck. The biggest problem that we actually encountered was writing the results back to the disk, because this was hundreds of gigabytes of results at the end of the day. We were getting results back from the service so fast that we had to build some back pressure into the system to slow things down when we had a backlog of writes queued for the relatively slow spinning-rust disks that we had.
So there we have a very interesting structure. We start off with these four big components. Within the first component, we have reading video files from the disk where they were resident, decomposing them into a number of frames that was then passed on to another subsystem, which was responsible for sending these frame things up to the Azure service and waiting for the results. This was quite carefully orchestrated so that we didn’t hit the rate limit—so figuring out how to do that was an interesting problem.
Then there was another component which was taking the results that were being returned from the Azure service and collecting them into a buffer. This was the sub-problem of figuring out that we were thrashing the disks trying to write all these results out. So we installed a buffer in between. We wrote into a buffer and then had another worker process that would take the things from the buffer in big chunks and write them out to the disk.
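For illustration, here is a minimal sketch of the kind of bounded buffer that provides that back pressure: producers block when the buffer is full, so the pipeline naturally slows to the speed of the disk. The capacity, types, and names are invented for this example, not taken from the original system.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

template <typename T>
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {}

    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        // Back pressure: producers wait here until the writer drains space.
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(item));
        not_empty_.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return item;
    }

private:
    std::size_t capacity_;
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};
```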
So there was this sort of filtering down of problems. You start off with big, challenging, meaty problems at the top, and then each one of those gets decomposed into smaller bits, and then smaller bits still, until you reach a level where you either have an existing algorithm to do it, or it’s some functionality handled by a library, or it’s some other kind of interaction with the world. Talking to Azure, talking to a disk—these are sort of base-level problems that you can solve quickly.
It’s all about bringing down the level of the larger problem to these small, atomic things which you can actually solve using facilities that you have. That’s the real challenge. But I don’t think that you could just write a singular piece of software that would do all of these things together without breaking it down into these components. I don’t think that’s possible.
3: Abstraction in Detail (Chapter 2 of your book) covers when to use different language features—simple functions, classes, templates, etc.—for a given task. How do you determine the appropriate level of abstraction in modern C++?
Sam Morley: Abstraction is tricky. It really depends on the purpose of the code. What am I trying to achieve with my code? I want to say up front: there are no zero-cost abstractions. People will claim up and down that things are zero-cost abstractions. They're really not; every abstraction has a cost. Now, this might be a runtime cost, which is what people usually refer to, but that's not the only type of cost.
Templates, for example, have very little runtime cost, but they do have a significant build-time cost. Including lots of templated code might make your runtime faster, but it will surely expand your build time. They also carry a pretty heavy cognitive load: the effort required to reason about heavily templated programs is significantly higher than for ordinary plain C++ code with no templates.
So getting that balance right—what am I trying to achieve with this code? Is this supposed to be a set of components of high-performance systems that really need to have the best possible runtime performance, and I don't care about build time? Or is this a general-purpose thing that needs to be extremely flexible, and I do care about the cognitive load of people who are going to work with this code? It's all about balancing the different competing costs and also competing utilities. How flexible is my system? How fast is my system?
Now, it’s not necessarily true that abstractions are always bad. Sometimes you can use an abstraction and it adds very, very little to any of the loads. For example, introducing a very small templated helper function is very useful and it adds basically no overhead, and if that’s used correctly it can be a big help to the program.
But sometimes—and I’m especially guilty of this—you can over-abstract. I’m a mathematician; we like our abstractions. You can over-abstract and make the thing more complicated than it needs to be, and at this point you start to lose something. It might be runtime performance if this is a virtual class hierarchy, or it might be build-time performance if you have heavy template code. Or it could be that you no longer can reason about your software because it’s now so complicated and filled with all sorts of clever bit-hacking tools and abstraction mechanisms that you no longer understand. It’s finding a balance.
Now that I’m conscious of this fact, I try to keep my abstraction as minimal as possible. I look for the minimal amount of abstraction I need in order to solve the problem without going too far and over-generalizing it. This has come back to bite me recently: I over-specified an interface to the point where it only satisfied the conditions in one very specific instance, and I had to rework the entire interface to make it fit the actual thing that I should have programmed against in the first place. It can come back to bite you. Hopefully that doesn’t happen very often, but when it does, it is always painful.
This is why thinking about the abstraction up front is important, and it goes hand in hand with the way that you decompose your problems as well—thinking about what the abstractions might be if you decomposed it one way versus a different way. If you pick one and it turns out not to be the right choice, at least you now understand why that abstraction didn't fit—and the more problems that you solve, the more you get used to this idea.
4: What guidelines help decide when a straightforward function is enough versus when you should introduce a class or a template to solve a problem?
Sam Morley: Yeah—if you start with a simple function and you make it a class template or you make it a function template, like I said, that can afford you a lot of flexibility at very low cost. I do this sometimes internally. For instance, I have a function which does something to a pair of integers, and I don’t know exactly what type of integer I want to use later on, so I just template it. Because the cost of doing this is basically nil, it means I don’t have to go back and refactor my code later when I change my mind about what integers I use. That kind of thing can be very low cost and high maintainability—high friendliness when it comes to programming.
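A tiny illustration of that kind of low-cost templating (the helper and its name are invented, but the shape is the point): the integer type is left open, so changing it later needs no refactoring.

```cpp
#include <cstdint>

// Hypothetical helper: the integer type is a template parameter, so switching
// from 32-bit to 64-bit indices later requires no changes here.
template <typename Int>
constexpr Int mid_index(Int lo, Int hi) {
    return lo + (hi - lo) / 2;  // written to avoid the (lo + hi) overflow pitfall
}

static_assert(mid_index<std::int32_t>(0, 10) == 5);
static_assert(mid_index<std::uint64_t>(100, 200) == 150);
```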
The cost of moving from a simple function to a class is higher, especially if that class has virtual functions—if it's abstract in the other sense of the word. If that's the case, then you're now incurring a runtime performance penalty, which may be warranted. Runtime performance penalties are not always a bad thing. As long as they're away from hot code—the bits of code that need to run at maximum speed—you can get away with an awful lot of slop when it comes to runtime cost, especially in instances where bandwidth and latency are limited by some other factor, like a network connection or a disk or something like that.
But really there are three reasons—or at least three main reasons—why you might want to use a class instead.
The first is that you have some kind of internal state that needs to be managed carefully. For example, a std::vector or std::map manages its internal storage, and if you were to code this by hand in line in a function, you would almost surely get something wrong. These are managing the state very carefully, and you then don’t need to worry about those details. Your code is much more readable if you’re using a std::vector than if you have a bunch of goto statements for resizing a buffer when it overflows and things. This is not very nice code to read.
The second reason to use a class is if you have some kind of behavior that needs to be flexibly abstract. What I mean by that is: you have an interface which reads and writes data from some source, but the source of the data is unknown. You might be reading from a disk or reading from a network socket, and this is a really great place to encapsulate the reading and writing process, because the outward interface—the bit that you're really programming against—is the same in both instances. You have a read function; you have a write function. It doesn't really matter how that is implemented behind the scenes, as long as those two things work. Wrapping this in a nice class—and it doesn't have to be a virtual class; it can be a template, or some combination of both perhaps—is a very convenient way of packaging the behaviors that are specific to one mechanism for doing that thing.
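One hypothetical shape for that read/write encapsulation, using a virtual interface (a template with the same member functions would work equally well); the names and signatures are illustrative:

```cpp
#include <cstddef>
#include <span>

// Callers program against read()/write() and never care whether the bytes
// come from a file, a socket, or anywhere else.
class ByteStream {
public:
    virtual ~ByteStream() = default;
    virtual std::size_t read(std::span<std::byte> out) = 0;
    virtual std::size_t write(std::span<const std::byte> in) = 0;
};

// Code written against the interface works for every implementation:
std::size_t copy_some(ByteStream& from, ByteStream& to,
                      std::span<std::byte> buf) {
    std::size_t n = from.read(buf);
    return to.write(buf.first(n));
}
```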
The third reason is that you—or some other developer later on—need a point of customization. This is a slightly nuanced point. C++ templates are very powerful, but function templates are sometimes a little trickier to use than class templates, because class templates can be partially specialized, whereas function templates cannot—they can only be overloaded or fully specialized. So it's a really powerful technique to use a class template inside a function template, which allows someone to provide a specialization that customizes the behavior of the function template without directly touching the code within. This is a very nuanced use, I contend, but it is very useful. I've used this pattern a few times in my code, and I've seen it in other code as well. I think the first time I noticed it was in NVIDIA's CUTLASS library, though I'd probably used it before that without being conscious of the fact that I was using this particular pattern. It's somewhat analogous to a sort of bridge or command interface that you might find in the Gang of Four, but with templates instead of virtual classes.
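A small sketch of that customization-point pattern, with invented names: the function template delegates to a class template, and users specialize the class template (including partially) without ever touching the function.

```cpp
#include <string>
#include <utility>

// Default behaviour lives in a class template...
template <typename T>
struct Serializer {
    static std::string apply(const T& value) {
        return std::to_string(value);  // fine for arithmetic types
    }
};

// ...which can be partially specialized, something a function template
// cannot do directly:
template <typename T>
struct Serializer<std::pair<T, T>> {
    static std::string apply(const std::pair<T, T>& p) {
        return "(" + Serializer<T>::apply(p.first) + ", "
                   + Serializer<T>::apply(p.second) + ")";
    }
};

// The function template never changes; customization happens entirely
// through Serializer specializations.
template <typename T>
std::string serialize(const T& value) {
    return Serializer<T>::apply(value);
}
```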
So those are my guidelines. If you have other uses, I’d be interested to hear what kind of reasons you would use a class rather than just sticking to a simple function.
5: One key to proficient C++ is knowing the standard library. How important is it for developers to leverage the STL’s algorithms and containers instead of writing their own from scratch?
Sam Morley: OK, so there are two things about the STL which are really important to remember. First is that it is a set of very flexible and very generic algorithms and containers for a very wide range of purposes. And secondly—and probably more importantly—it is always there and you can always use it. “Always” being a little bit tricky there—embedded developers, please don't get angry with me—but for most C++ developers the STL is something that you can rely on and use whenever you need it.
And generally speaking, these are very, very good, very high-performance facilities, and they can make your life much easier. So what the STL does, in effect, is make your development window smaller: you spend less time implementing standard things and more time implementing the difficult things. It raises the floor of what is the base-level problem that you can solve without thinking about it.
If you remember when I was talking through my example earlier, you have these layers of problems. You start with big problems, you make them smaller, you make them smaller, and eventually you get down to a set of problems—maybe not at the same level everywhere—but you get down to problems which you know how to solve using standard tools or libraries. So what the STL does is it gives you one level up from having to write those things for yourself. It’s one less problem to solve, and this means you can move much faster. You can develop much faster.
Now, they might not give you the performance that you need. You might have to change the way that these work in order to get the performance that you need, but a large amount of the time the STL will probably give you all the performance you need, providing that you’re using the right algorithms and containers. OK, but that’s a separate question. The thing that it does do is speed up the development cycle.
If you implement something from scratch the first time and it doesn't perform as well as you need, then fixing that might become problematic. And moreover, when something goes wrong, you might not know whether there's some characteristic that you missed somewhere else in the problem, or whether this new thing that you've implemented is the source of the bug. You can get around that with testing and things. But really, if you're prototyping something, you might know that you can't use the STL pieces in the final version because they won't perform well enough—yet building it with them at the beginning is still the right way to get started, and it means that you can find a solution. It doesn't have to be the best solution.
Solving problems is an iterative process. You don’t always find a solution—let alone the best solution—the first time round. You probably have to take many bites at the apple. So first you solve the problem, and then you make it fast. And only by measuring do you know which bits are not fast. So starting with the STL will probably get you most of the way, and you’ll probably find that other parts of your software are the slow parts.
Now, there are some caveats. First, a lot of libraries provide faster or slightly more flexible alternatives, or things with different properties, which are basically drop-in replacements for the STL. For example, Boost containers are a set of more expansive and more flexible container types that are drop-in replacements in most cases for STL equivalents. Abseil has the same set of things, and probably other libraries too. These are really great if you're already working in, say, a project that's using Abseil—you already have all of those container types at your fingertips—and sometimes they do perform better. And things like small inline vectors are extremely useful for a lot of things, and both of those libraries provide such a thing.
Now, the other side of that is the algorithms. Similarly, there are other libraries that provide standard STL-like algorithms. NVIDIA Thrust is one that comes to mind. This is parallel algorithms. C++17 introduced execution policies for the standard algorithms, which cause them to run multi-threaded or vectorized. Thrust was sort of prior to that, and it's specifically geared towards running on NVIDIA GPUs and NVIDIA libraries, but it's the same set of functionality, actually. It's a set of very general-purpose algorithm template functions which dispatch very cleverly through various pathways to give you a fast implementation of whatever that algorithm is doing on whatever device you're doing it on. And it's a very clean and efficient way of writing very parallelizable, very general-purpose code.
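For reference, the execution-policy dispatch looks like this; whether std::execution::par actually runs multi-threaded depends on your standard library and build (libstdc++, for example, needs TBB linked in):

```cpp
#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<double> data(1'000'000, 0.5);

    // Same algorithm, two dispatches: sequential and parallel.
    std::sort(std::execution::seq, data.begin(), data.end());
    std::sort(std::execution::par, data.begin(), data.end());
}
```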
There is one more caveat that I want to mention, and that is that writing custom containers is a very dangerous game to play. Writing containers is hard. There are so many things you have to keep track of. You have to keep track of the construction and destruction of your elements. If that’s not a trivial thing, that is something you have to be very careful of. If you’re doing bulk allocations, you need to be careful that you have properly moved everything, and how you handle the errors. If something goes wrong during the copy, during the allocation, how do you unwind that? What guarantees can you give to the outside world—the rest of your program—about how that process happens? And moreover, how do you efficiently move things from an old allocation to a new allocation?
These are all very complicated and difficult things. I’m not saying that people aren’t capable of doing it, but I am saying that it’s very difficult to get right. If you are reimagining containers, then you should be asking why rather than how. There are genuine reasons to use different containers, but I don’t think you should be implementing them necessarily yourself. I would reach for a standard container library—like Boost or Abseil containers—and rely on the work of a lot of people to maintain those good implementations rather than trying to hack together something yourself.
6: Do you find that mastery of the standard library is a distinguishing factor in how efficiently developers can solve problems in C++?
Sam Morley: It surely can be. This goes back to the notion of what is the smallest problem that you know how to solve without thinking, and having a very good understanding of what is in the standard library—what the things in the standard library are capable of delivering, and how you might reasonably do that—will certainly raise this floor.
If you know that the standard library contains binary search functions, for instance, then that immediately is taking the place of having to solve a problem of how you binary search through something. Obviously this is a very well-understood thing; it’s just an example. But knowing how to make use of some of the more tricky and multifaceted std algorithms—for example, transform, reduce—knowing how to make use of that efficiently will make the range of problems that you can solve without doing a lot of hard work yourself quite a lot larger.
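Two quick examples of the kind of floor-raising meant here: binary search and a fused transform-plus-reduction, both straight from the standard library.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> sorted{1, 3, 5, 7, 9};

    // Binary search without writing one: lower_bound finds the first
    // position where 5 could be inserted while keeping the range sorted.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), 5);
    bool found = (it != sorted.end() && *it == 5);

    // transform_reduce fuses a per-element transform with a reduction:
    // the sum of squares, computed in a single pass.
    long sum_sq = std::transform_reduce(
        sorted.begin(), sorted.end(), 0L, std::plus<>{},
        [](int x) { return static_cast<long>(x) * x; });

    (void)found;
    (void)sum_sq;
}
```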
However, it's not necessarily true that you can't be efficient without the STL. You can absolutely be very, very productive—productive is probably a better word than efficient. The factor is speed and convenience. Like I said, the STL allows you to get going very quickly because it's there, ready to use. You don't have to worry about linking or importing or doing anything difficult. And moreover, you don't have to worry about licensing and things, which do come up occasionally. It's there ready to go, and you can just use it. So it makes a big difference to how quickly you can deliver solutions.
It also makes a big difference in how quickly you can iterate on solutions. If you build something that works but is slow, then you can make it faster. I don’t think it’s necessarily important for you to use the standard library exclusively. If you’re already working in an ecosystem that provides standard-library-like abstractions, possibly more flexibly, then by all means use those things. If you always have Boost available to you, then use Boost. Boost also provides a great set of many, many more features besides what is in the standard library, and making use of those things will also enhance your productivity.
Similarly, if you're in Abseil, then use Abseil. But you should still keep track of what is in the standard library, because the library stack you're using now might not be the library stack you're using tomorrow: move to a different project and you may no longer have Boost, or Abseil, or Folly at hand. The STL is a constant factor. If you're using C++, you more or less always have the STL, so having it in the back of your mind all the time is always a good idea. And it certainly will make you faster—not necessarily in code execution time, but certainly in development time.
7: C++ is a multi-paradigm language with many powerful features, some of which can be a double-edged sword for maintainability. Since the goal is to build scalable, maintainable solutions, what best practices do you suggest to keep C++ codebases clean and manageable?
Sam Morley: Yeah, this is a tricky question. There are, of course, a lot of general-purpose good practices that apply here—things like documenting your code and leaving lots of comments about how your function operates, what guarantees it expects, and what guarantees it gives, and understanding that.
Before we jump into this, I want to introduce the notion of “future you.” Future you is your future self, and for all intents and purposes, this is a different person. Because when you’re writing some code, you understand things in the context of what you’re doing at the moment. Future you will have lost this context. So when you come back to your code in a month, six months, a year’s time, and you look at it and you think, “What was I thinking to make this code?” almost surely the answer is, “I don’t know.”
So writing comments is not just for other people—it’s also for yourself. You don’t have to go overboard and say, “I add these two numbers together,” because that’s not a useful comment. But I’ve taken to doing this quite recently where I’ve been working on some very intricate mathematical expressions and processes: I’ve taken to writing very big, chunky block comments. It’s like, “Right, OK, this is where we are in this process. This is how the next set of things works. This is what it should do. This is broadly how I’m going to implement the algorithm to do this.”
These comments save me so much pain when I jump off the project for a week and then go back and have to remember exactly what I was trying to do. It takes you a few minutes to sit and think about what that thing was, but that’s time well spent because now you’re thinking about the problem. This is where you can do some of this work of breaking down the problem—abstracting, finding common patterns, things that you recognize, things that you know how to implement—and then you should be able to spot those elements in the thing that follows. Doing this work in the code, in the body of the code, will keep it there so that when you come back to it, you can remember what you were thinking.
And moreover, this also applies to other people—not just future you. But that’s general-purpose advice.
Specifically for C++ things, and more with scalability in mind: having a very strict separation of concerns is a very good idea. You want to keep code that does numerical computations away from code that talks to users. You want to separate different functionalities as much as possible, and ideally you want to test those in isolation. Having a very modular, very pick-and-choose kind of situation will really help with that.
Sometimes it’s not possible to do this easily. Sometimes separating things can be really hard work. But being able to test and benchmark your high-performance components in isolation can really help you understand what they’re doing, how they’re doing it, how fast they’re doing it, and make sure that everything there is correct before you integrate that into the rest of your program.
It also means that if you're doing some work that involves distributing large computations over a large cluster or on the cloud or something, you can write the different distribution mechanisms separately and then just reuse your tight-loop computation routines inside those. So it affords you a great deal of flexibility to modularize your code and separate components into their own libraries, or even just separate namespaces within a library. These kinds of things can make a big difference in the way that you can test and run your code.
A couple more points: you should always pay attention to thread safety, even if your application is not going to be multi-threaded. You should be thinking, at some point, this might be multi-threaded; I might need to access this class, these class members, from different threads—so how do I make sure that that’s a thread-safe thing to do?
And the third thing is to make sure that you keep your build system clean. I use CMake, typically. Make sure you keep that clean, and keep it in a way that is easy to see what the individual components are. Moreover, if you need to extract bits and put them in their own library, make sure that’s an easy process, because build systems can get left behind, and having a broken build system is far worse than having broken code. It’s much harder to figure out what exactly has gone wrong if your build system is broken. So those are my points.
8: When using advanced features like template metaprogramming, clever lambdas, or other C++ “power tools,” how do you ensure the code stays readable and team-friendly rather than turning into an overly complex “wizardry”?
Sam Morley: Yeah—I mean, wizardry is the right word. I’ve seen some horrendous template metaprogramming in my life. I’ve written some horrendous template metaprogramming in my life. I’m going to be the first one to admit that it’s never worth it.
Generally, I stay away from template metaprogramming nowadays. The need for it has diminished somewhat: with concepts and constexpr functions now part of the standard, and the amount of flexibility they afford you going up, the need for very complicated template metaprogramming has gone down.
There are other reasons, of course. Templates are very expensive from a build-time point of view. Instantiating a complicated template metaprogramming construct can easily double the compile time for a particular C++ file. And that’s not healthy if you’re building 10,000 of these—that’s a lot of time. There’s a good reason why Google, when they wrote Abseil, kept their metaprogramming to an absolute minimum. They’re very explicit about this fact. It’s because the compile-time costs are just too high.
And moreover, going back to the “future you” idea: if you write template metaprogramming code, future you will have a hard time understanding it, because it’s one of those things that makes sense while you’re writing it, and then it becomes immediately impenetrable. So I would stay away from template metaprogramming as much as possible. There are some isolated things that are useful—like using SFINAE to enable or disable particular instantiations of templates and things—but always keep that as minimal as possible.
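A side-by-side of that shift, with invented function names: the pre-C++20 SFINAE idiom next to the concepts version that has largely replaced it.

```cpp
#include <concepts>
#include <type_traits>

// Pre-C++20: enable this overload only for integral types, via SFINAE.
template <typename T,
          typename = std::enable_if_t<std::is_integral_v<T>>>
T twice_sfinae(T x) { return x + x; }

// C++20: the same constraint as a concept; shorter, and the compiler's
// error messages actually say what went wrong.
template <std::integral T>
T twice_concept(T x) { return x + x; }
```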
For lambdas, lambdas are interesting because, used correctly, they can really enhance the readability of your code. They can really make it much easier to understand. On the flip side of that, they can really, really make it hard to understand what the code is doing. So my general advice for using lambdas is: keep them relatively short, and avoid having lambdas which capture and modify values that are a long way away.
What I mean by that is: suppose you have a big function that is performing some kind of calculation, and at the top you have a couple of lambdas which capture a row number. Let’s say you’re doing a matrix multiplication. It captures a row number, and the lambda accesses data from a particular row and then advances the row number. Now using that lambda will always cause confusion because the row number is a long way away from where the lambda is used. So every time you think, “What is this lambda doing?” it’s modifying something that you’ve not looked at for a long time because your screen has been further down the page.
Done correctly, this can be quite a powerful pattern. Done incorrectly, it really is a hindrance to you remembering what your code is doing. Almost surely in this instance, if you have a value which is initialized and then only ever modified or used by a lambda, it would almost surely be better encapsulated in a class of some description separately, so that the dependency on this thing—and the fact that this is a value that’s only modified or used by the class—is very explicit.
So that’s my thoughts, but that only really applies if the lambda is modifying a value. If it’s just capturing and doing something to it, that’s different. One of my favorite uses of lambdas is to capture a pointer that’s come in as a span or something that’s come in as a function argument, and then return particular subspans or particular elements from that span. For writing a matrix multiplication, for example, you might want to return a submatrix, or you might want to return a row or a column, and using a lambda for that purpose is really helpful because it saves the amount of work that you have to write again and again. And also it’s not modifying anything. Modifying is the problem.
As soon as you’re just returning a particular row, a particular column, or a particular element, that’s less problematic. In the past, you probably would have used a macro for doing these kinds of operations, but this is just C++. We don’t use macros anymore.
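A sketch of that pattern, with invented names and an assumed row-major layout: capture a span argument and return subspans, mutating nothing.

```cpp
#include <cstddef>
#include <span>

double sum_diagonal(std::span<const double> matrix, std::size_t n) {
    // Pure accessor lambda: captures by value, modifies nothing, and saves
    // writing the index arithmetic at every use site.
    auto row = [matrix, n](std::size_t i) { return matrix.subspan(i * n, n); };

    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += row(i)[i];
    }
    return total;
}
```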
So those kinds of uses are fine, but I would generally try to keep your lambdas very short—and if they do need to capture things, remember the locality in the code of where you’re capturing from, and try not to let that drift too much.
9: Let’s talk a little bit about performance, concurrency, and safety, specifically in C++. You have a chapter in your book on understanding the machine, covering topics like modern CPU architecture, memory errors, SIMD instructions, and branch prediction. Why should today’s C++ developers care about these low-level details?
Sam Morley: OK, so let's think of it like this. Suppose you are driving down a road. If you're going along an unfamiliar road, you have to drive slower. Suppose it's dark—you don't know where the turns are, you don't know what the traffic is like, you don't know what the road condition is like, so you drive slower to be cautious. And this is what writing code without thinking about the system is like. In this world, the system that you're running the code on is the road, the code that you're writing is the car, and you're thinking ahead about what the road conditions are going to be like—although, in a lot of cases, you actually do know what the road condition is going to be like.
And in those conditions—like if the road is flat and straight, the road condition is good, there’s good visibility, there’s little traffic—you can go faster. And this is really what understanding the machine is all about: understanding how the different levels of cache interact, and how one retrieves data and then operates on it efficiently is a big part of how you make applications fast. If you ignore the cache, the code will work, but it will be much, much slower.
So, for example, most people in computer games have this discussion of structure of arrays or arrays of structs. The pattern is very simple. If you have, say, a set of objects inside your game, do you put those in a vector of structs, where the struct has all the different properties—like position, velocity, mass, whatever—or do you put them in separate arrays? One array for positions, one array for velocities, one array for masses, and so on. And this makes a big difference because of the cache and also because of vectorization. If you’re going to operate on positions only, then having a contiguous set of positions in memory means you can fetch them and operate on them very efficiently. Whereas if you have an array of structs, then you’re fetching positions but you’re also fetching velocities and masses and all the other stuff that you don’t need at that point in time, and you’re wasting bandwidth and you’re wasting cache.
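In code, the two layouts look like this (field names invented); the struct-of-arrays pass over positions touches only the bytes it needs:

```cpp
#include <cstddef>
#include <vector>

// Array of structs: position, velocity, and mass are interleaved, so a
// position-only pass drags the unused fields through the cache too.
struct Particle {
    float px, py, pz;
    float vx, vy, vz;
    float mass;
};
using ParticlesAoS = std::vector<Particle>;

// Struct of arrays: each property is contiguous, so a position-only pass
// reads exactly the data it needs and vectorizes naturally.
struct ParticlesSoA {
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass;
};

void advance(ParticlesSoA& p, float dt) {
    for (std::size_t i = 0; i < p.px.size(); ++i) {
        p.px[i] += p.vx[i] * dt;  // contiguous reads and writes only
        p.py[i] += p.vy[i] * dt;
        p.pz[i] += p.vz[i] * dt;
    }
}
```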
So that’s one of the really classic examples. Another really classic example is matrix multiplication. Matrix multiplication is interesting because, in one direction of your matrix, you’re accessing data sequentially, which is really good. That’s really great for cache hierarchy. In the other direction, you’re accessing it with a huge stride, so the elements that you touch as you move from row to row down a particular column are far apart in memory, so you have to go a long way between these elements. So this is really bad for cache locality.
In order to address this, you do tiling. You take a small chunk of your matrix and use the data in that as much as possible so that you make the most of those expensive load operations, and you do as much operation as you can on that small tile of matrix. Then you move to the next tile.
In the book, I show a very marked improvement over a very naive implementation—it's like a factor of four or something—and this was the point at which I started to engage a bit more with the pipelining and SIMD parts of this. You can dramatically speed things up from there.
And if you want examples of this kind of thing, FFTW is a really great code base to look at. It's a very difficult code base to read, because it's a C code base and it's full of macros, but you can spot some elements of what they're doing. One of their techniques is pipelining: lining up lots of independent operations so the compiler can stack them up and keep execution flowing, rather than hitting the situation of "I need this value, but now I have to wait for it."
Also, they will use lots of SIMD operations and vectorization at the end. So that’s where I would suggest that people look. This is prevalent across all compute domains. It’s just about understanding what is the limiting factor in the performance of your software and then having some knowledge of the underlying computer—or whatever system you happen to be operating on—and really making use of every part of that.
For machine learning, for example, the models are huge now—billions of parameters, trillions of parameters even—and throughput really matters. Taking an extra microsecond to do a computation might not sound like much, but those micro-efficiencies really make a big difference in the long run. For general-purpose compute, if you’re interacting with a disk, or interacting with a network, or interacting with a user, then those details might not matter because you’re limited by something else. So it’s all about understanding where and when it’s appropriate.
10: Can you share an example of how understanding hardware behavior can guide a C++ programmer to write more efficient or optimized code?
Sam Morley: Well, I mean, OK—this structure of arrays discussion is certainly one example of this. I come from a sort of scientific computing, high-performance compute for machine learning kind of background, or at least that’s where I am now, and here I always have to think about this.
One of the real classic examples of where you really need to understand these things is the one I described a moment ago: matrix multiplication. In one direction of the matrix you're accessing data sequentially, which the cache hierarchy loves; in the other direction you're accessing with a huge stride, which it hates. So you tile: take a small chunk of the matrix, make the most of those expensive load operations by doing as much work as you can on that tile, and then move to the next tile.
This is something that you have to think about if you’re writing high-performance code, because you can’t just write the naive triple loop and expect it to be fast. It will work, but it will not be fast. If you want it to be fast, you have to structure your computation so that it plays nicely with the cache and the memory hierarchy. And the same kind of thinking applies to lots of other algorithms as well.
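A sketch of what tiling does to the triple loop, for row-major n-by-n matrices. The tile size is a placeholder to tune per machine, and real kernels add packing and SIMD on top.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

void matmul_tiled(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n) {
    const std::size_t T = 64;  // tile edge: illustrative, tune to your cache
    for (std::size_t i0 = 0; i0 < n; i0 += T)
        for (std::size_t k0 = 0; k0 < n; k0 += T)
            for (std::size_t j0 = 0; j0 < n; j0 += T)
                // Stay inside one tile so the loaded pieces of A and B are
                // reused while they are still resident in cache.
                for (std::size_t i = i0; i < std::min(i0 + T, n); ++i)
                    for (std::size_t k = k0; k < std::min(k0 + T, n); ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = j0; j < std::min(j0 + T, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```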
So that’s a really quick example of how understanding hardware behavior—specifically cache locality and memory access patterns—can guide you to write code that’s much more efficient.
11: Your book also delves into parallel computing and even GPU programming, which is notoriously difficult with pitfalls like data races and deadlocks. Coming back to the mindset aspect of things, what mental models or strategies do you recommend for designing multi-threaded C++ applications?
Sam Morley: Yeah, thankfully modern C++ really does make this a lot easier. There are two different scenarios I want to highlight.
The first is where you have a large amount of data that you need to process and you want to do this in parallel. Now, with some caveats, this is relatively safe to do in a multi-threaded environment, because you just give each thread a different range of values to operate on. There's never any overlap: each thread goes away, does its work, and puts its results in its own part of the buffer. There are no data races; there are no problems there.
And this is a safe thing to do, and it's very easy to do with parallel algorithms or OpenMP and things like that, which will do a lot of the hard work for you of checking that these conditions are not violated. Setting up the problem so that it works this way is the trick—there are some conditions on that. Operating on self-referential data, or data that refers to other parts of the data, is obviously going to cause problems. But that wouldn't be an appropriate usage of those things anyway.
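A hand-rolled sketch of that disjoint-ranges idea (names invented; in practice a parallel algorithm or an OpenMP pragma does this for you). Each thread owns one slice of the input and the matching slice of the output, so no synchronization is needed:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Assumes out.size() == in.size().
void square_all(const std::vector<double>& in, std::vector<double>& out,
                unsigned num_threads) {
    std::vector<std::thread> workers;
    const std::size_t chunk = in.size() / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end =
            (t + 1 == num_threads) ? in.size() : begin + chunk;
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = in[i] * in[i];  // ranges never overlap across threads
        });
    }
    for (auto& w : workers) w.join();
}
```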
The other type of multi-threaded environment that you might have is where you have several worker threads that are handling different events within a bigger system, and here you have shared state. So each of the threads has some kind of global—or inter-thread, at least—state that they need to access. This could be for communicating between threads. So, for example, you might have one worker which is dispatching work to all of the other worker threads. This would be your main thread stacking up operations it needs performing, and the typical way that you would do this is with a queue.
So you’d have a thread-safe queue that you put work into. Each thread comes along, queries the queue, and says, “Is there any more work for me to do?” If so, it takes the job out and works on it in isolation, and this operation is thread-safe. It has to be thread-safe.
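A minimal version of such a queue, assuming mutex-plus-condition-variable as the mechanism (real implementations add more, such as bounded capacity or lock-free internals):

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>

template <typename Job>
class WorkQueue {
public:
    void push(Job job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

    // Blocks until a job arrives; returns nullopt only once the queue is
    // closed and fully drained, which tells workers to exit.
    std::optional<Job> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !jobs_.empty(); });
        if (jobs_.empty()) return std::nullopt;
        Job job = std::move(jobs_.front());
        jobs_.pop();
        return job;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Job> jobs_;
    bool closed_ = false;
};
```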
But also, you might have some global configuration or some kind of global data that you need to access everywhere. And there it becomes really important to understand what it really means to be thread-safe. Thread safety is a tricky thing. You need to understand where things can be mutated, who has ownership over particular things, and where that ownership can change.
Ideally—and this is something that will come up later, I'm sure—you want to have this model where only one place in your code—one thread, one function, one whatever—can modify a value at any given time. This can be achieved in one of two ways. Either you design the architecture of the program so that any given value is only ever touched by one thread—this is the distributed-data model—or you have a synchronization mechanism like an atomic or a mutex-locked value, or some other kind of threading mechanism for controlling access to a particular resource.
In the latter case, it’s very easy to get this wrong. Deadlocks can happen. You can still end up with data races if you use these things inappropriately.
So what I would suggest is that if you do have to write multi-threaded code, you read very carefully the documentation on cppreference or some other equivalent source for all of the different synchronization mechanisms that are available in C++, and you really try and understand what each one of those things is for and how it operates. Then you'll be much better equipped when you are trying to design a class that needs to be shared between multiple threads—how you manage the mutability. That might be interior mutability—mutable values within the class—or exterior mutability, where you need to take a mutable instance of the class and actually do something with it.
Ideally, you need all of that to be thread-safe, and knowing what the different options are will enable you to actually write this code. Hopefully that will mean that you don’t have deadlocks or data races. Always test your code.
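One way to make the shared-configuration case mentioned above safe, sketched with a reader/writer lock so the lock lives inside the class and callers cannot forget to take it (names invented):

```cpp
#include <shared_mutex>
#include <string>
#include <unordered_map>

class SharedConfig {
public:
    std::string get(const std::string& key) const {
        std::shared_lock lock(mutex_);  // shared: readers don't block readers
        auto it = values_.find(key);
        return it == values_.end() ? std::string{} : it->second;
    }

    void set(const std::string& key, std::string value) {
        std::unique_lock lock(mutex_);  // exclusive: one writer at a time
        values_[key] = std::move(value);
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, std::string> values_;
};
```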
12: Robustness and security are critical in systems programming. With C++'s manual memory management and the potential for undefined behavior, how can C++ engineers improve the safety of their code?
Sam Morley: Go and learn some Rust. I know a lot of C++ programmers turn their nose up when Rust is mentioned, and generally the feeling that I get from a lot of people is that, “Oh, we don’t need Rust. We can do all of this in C++.” But that’s not the point. The point is that Rust has a bit of a learning curve, particularly for C++ developers, because they go into it with a C++ attitude, and the Rust compiler isn’t having any of that.
The Rust compiler forces you to think very carefully about ownership and lifetimes, and whether it's safe to move things from one thread to another. That's its whole design: managing access and the validity of values across an entire system, and very carefully enforcing properties like whether it's safe to send things or share things between threads. Rust has these two traits, Sync and Send, which basically determine whether you can share things or send things between threads safely.
The same applies to async programming. Even if you're not using multiple threads, you still need to think about this for async programming as well. Learning a bit of Rust will force you to think about these things up front—along with many other things you should definitely be thinking about, like what counts as unsafe code. These are things that C++ programmers tend to take for granted without actually thinking about what they're doing.
When is it actually safe to dereference a pointer? The answer is almost never. It’s almost never safe to dereference a pointer. That’s fundamentally an unsafe thing to do. You don’t know where that pointer came from. You may do, but you don’t really know where that pointer came from. You don’t know whether it’s valid or not. These are things that you have to reason about as the developer.
Rust forces you to think of this as an unsafe operation, and because of that you’re far more cautious about actually doing it. And these concepts—this way of thinking—is transferable. Learning Rust, learning a bit of Rust, will make you better at writing safe C++.
The reverse is not true. Learning C++ will not make you good at writing Rust code. In fact, it will probably make you very frustrated. But getting over that frustration and understanding why Rust enforces these things is important, because these are the same principles that allow you to write safe code anywhere, not just in Rust.
13: Are there any particular practices or modern C++ features you advocate for to prevent things like buffer overflows, memory leaks, things like that—while retaining the performance and control that C++ offers?
Sam Morley: Yeah, absolutely. I mean, it’s not exactly a new feature, but using std::array rather than C-style arrays is definitely a huge win. Smart pointers mean you don’t ever manage memory by hand.
There are some cases where you might actually do this, but most of the time, writing operator new in your code is an anti-pattern by this point. Use a smart pointer; use a container.
The mantra of my containers section is: just use std::vector. It applies most of the time. And use std::span rather than raw pointers or C-style arrays for passing data around. It adds an extra layer of memory safety—and yes, it can carry a small runtime performance cost, but that's negligible compared to the risk of your code crashing out because of an invalid memory access, or—worse—producing garbage that goes unnoticed.
The best-case scenario for a bad memory access is a crash. That’s the computer responding to a bad thing. If it goes unnoticed, it could happen for months before you notice that this has been producing garbage the entire time, by which point you’ve wasted months. So those are the things that I would reach for first.
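The span version of a typical interface, for comparison with pointer-plus-length; the function here is invented for illustration:

```cpp
#include <span>
#include <vector>

// The callee sees the size, so the "pointer plus trusted length" failure
// mode disappears at the interface.
double sum(std::span<const double> values) {
    double total = 0.0;
    for (double v : values) total += v;  // range-for: no index to get wrong
    return total;
}

int main() {
    std::vector<double> data{1.0, 2.0, 3.0};
    double whole = sum(data);               // vector converts implicitly
    double prefix = sum({data.data(), 2});  // or view just part of it
    (void)whole;
    (void)prefix;
}
```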
But the other thing is: stop using the old C functions. Many C library functions that have been around a long time have well-documented vulnerabilities in this sense. gets—the function from the C library which does an unchecked read from standard input to read a line of text—is fundamentally unsafe: I can make a line of terminal input as long as I need, and that's a sure way of getting a buffer overflow. (It was eventually removed from the C and C++ standards altogether.) There are safer equivalents, but generally speaking, don't use the C library if you can avoid it—especially the unchecked I/O and string functions like gets, sprintf, and strcpy. These things you have to be very, very careful about.
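The C++ replacement for the gets() pattern is a one-liner, since std::string grows to fit and leaves no fixed buffer to overflow:

```cpp
#include <iostream>
#include <string>

int main() {
    std::string line;
    if (std::getline(std::cin, line)) {  // reads a whole line, any length
        std::cout << "read " << line.size() << " characters\n";
    }
}
```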
14: Let's finally talk about your book, The C++ Programmer's Mindset. You're both a research engineer and a mathematician, and you maintain a high-performance C++/Python library for data science. The book itself combines practical insight with academic rigor. What drove you to write The C++ Programmer's Mindset? Did you observe a gap in how C++ developers approach problem-solving that you wanted to address with this book?
Sam Morley: So it’s an interesting question. Going in, of course I had to do a bit of market research around this, but my feeling was: I like solving problems.
The main motivation for me writing this book was to share my feelings about solving problems—my enthusiasm for solving problems. There will always be a new problem to solve. You'll never—almost surely, anyway—encounter a situation where you've solved all the problems. There will always be a new one, and it will be interesting because it's new. And the more problems you solve, the better you get at it, for sure.
But this is not just a passive process. As I mentioned at the beginning, a lot of people are going through this process of computational thinking—using the framework that we described—without thinking about it, and one of the things I wanted to highlight in this book was: in order to get better at solving problems, you need to be conscious of what you're doing to solve them. You need to think about what it is that you actually need to do and how you can do it—not just in the context of the problem, but in the context of thinking about the problem, understanding the problem.
And something else that I feel quite strongly about is that I feel like a lot of C++ developers could benefit from being conscious of the environment in which they operate—thinking about the operating system, the underlying hardware, thinking about what the different mechanisms that they’re using are, how those things are informed by and inform the problem-solving process.
Do I need a map, or do I need a hash map, or do I need a vector? These are design questions that are informed by the implementation, and those relationships are really what the book is about. It’s about thinking about the language, the hardware, the operating system—all of those things combined—in the context of solving problems, and how the process of solving problems is informed by, and informs, the choices that you make elsewhere.
So that’s the message that I eventually decided was going to be the topic of the book.
15: What mindset shift or new capabilities do you expect a seasoned C++ developer to gain after reading your book?
Sam Morley: Yeah—so, seasoned developers might feel that they already have a pretty strong grasp of solving problems, and that's probably true; there are a lot of very talented engineers out there. I would suggest, though, that everybody has something to learn. You can't ever know everything. So the mindset shift is exactly that: you can't know everything, so learn as much as you can from as many people as you can, and hope that that fills in as many gaps as you need. That's the philosophy I would hope seasoned developers take away from this.
In terms of new capabilities, seasoned developers might already be pretty familiar with cache hierarchy and things like that. What they may not be so familiar with is this linkage between the problem-solving process, the implementation details, and the other factors. Computers are complicated machines, so understanding all of these things is impossible, of course, but you can understand parts of it, and moreover you can tune your problem-solving process to fit what you have and where you'll be working. It's a two-way street, and that, I hope, is something that even senior engineers can think about while they're reading.
One of the key things that I mention very early on in the book is this “future you” idea. That will be helpful for you in the future—for future you—but it will also be helpful for less senior people who are learning this process for themselves. Being able to point out to them where and why certain parts of the process can be so tremendously helpful, and to impart this understanding of how all of these different moving parts interact with one another, can be really, really powerful. That is something that I hope even a seasoned engineer can gain from this book.
To go deeper into the ideas Sam Morley discusses in this interview—treating C++ problem-solving as a deliberate process, choosing abstractions with a clear-eyed view of their costs, and connecting design decisions to the realities of hardware, build systems, and team maintainability—see The C++ Programmer’s Mindset (Sam Morley, Packt, 1st ed., Nov 2025). The book introduces computational thinking as a practical framework—decomposition, abstraction, and pattern recognition—and shows how to apply it using modern C++ features to build solutions that are maintainable, efficient, and reusable. Across small examples and a larger case study, Morley covers using algorithms and data structures effectively, designing modular code, analyzing performance, and scaling work with concurrency, GPUs, and profiling tools—aimed at intermediate C++ developers who want to strengthen both their technical toolkit and the way they approach complex software challenges.
Here’s what some readers have said:
Morley's advice about learning Rust to write better C++ is spot-on and often overlooked. The ownership model forces you to confront assumptions about pointer validity that C++ lets you gloss over until production. I've seen teams struggle for weeks with concurrency bugs that Rust's compiler catches at compile time. It's not about switching languages; it's about internalizing safety patterns that improve systems programming across any language.