Inside LLVM Backends: A Conversation with Quentin Colombet
From instruction selection to MLIR: how modern compilers generate code—and what’s next for LLVM backend development.
From instruction selection and register allocation to TableGen quirks and machine learning integrations, backend development in LLVM requires a unique blend of precision, modularity, and deep architectural insight. In this conversation, we speak with Quentin Colombet—author of LLVM Code Generation (Packt, 2025)—about what it takes to build and extend modern backends in one of the most widely adopted compiler infrastructures in the world.
Quentin is a veteran LLVM contributor with over two decades of experience in compiler backend development. He’s the architect of GlobalISel—LLVM’s modern instruction selection framework—and serves as code owner for LLVM’s register allocators. Since joining Apple in 2012, he’s contributed to backend support for x86, AArch64, and Apple GPUs, and has worked across a wide range of architectures including microcontrollers, DSPs, and ASICs. Beyond his technical leadership, Quentin is also a long-time mentor to new contributors in the LLVM community.
In this interview, we cover the motivation behind his book, how to balance onboarding content with advanced internals, and the design rationale for GlobalISel’s modular pipeline. We also dig into backend portability, the realities of debugging codegen issues, and why MLIR and machine learning are reshaping how developers think about compiler design—from handwritten passes to auto-derived heuristics. Whether you're writing a new target from scratch or trying to understand how instruction legalization actually works, Quentin offers a candid and deeply practical guide to the LLVM backend.
You can watch the full conversation below—or read on for the complete transcript.
Q1: What motivated you to write LLVM Code Generation, and how does it address the existing gaps in LLVM documentation, particularly concerning backend development?
Quentin Colombet: That's a great question. Part of the answer is already in the introduction you gave. I've been mentoring people ramping up on LLVM for a long time, and one of my motivations was that I'm kind of a lazy guy—I don’t want to repeat myself a thousand times. I wanted to put into writing whatever is needed to ramp up in the LLVM ecosystem.
I'm exaggerating, of course, because writing a book is a lot of work, but that was the primary motivation: to give people resources to get started with LLVM. This is something we've discussed a lot within the community, especially around back end development, where there’s a big need because that part of LLVM is not well documented.
People have to make a significant effort to get into that part of LLVM, and when they finally do, it's like, “Well, now I know it, why would I bother explaining it or writing it down?” That’s something I saw time and time again. At one point, I decided, “Let’s really write this down so everyone has a kind of source of truth for these things.” That was the main motivation for the book.
Now, just to add a bit more on documentation: if you look at LLVM, it's actually one of the most well-documented open source projects I know. The documentation is very well made, but the key piece that is well documented is the LLVM intermediate representation (IR). From a back end development perspective, that’s actually the middle end. If you look at a typical compiler pipeline, your source language gets compiled into LLVM IR by the front end. Then you enter the middle end, where most of the optimizations happen on the IR, and finally the back end, which deals with machine IR.
And it’s this last part—the back end and machine IR—that is pretty much undocumented. That’s what I wanted to cover in the book.
Q2: The book appears to cater to both newcomers and seasoned LLVM developers. How were you able to balance foundational concepts with advanced topics like TableGen or Machine IR to ensure accessibility without oversimplifying things?
Quentin Colombet: Yeah, that's a good question. I have to admit, the trade-off wasn't easy. And frankly, I’d say the jury’s still out—ultimately it’s going to be the readers who tell us whether we did a good job.
In the book, I tend to go very deep, but I also present all the concepts I use, so everything is self-contained. For seasoned people, that may feel like too much content, but for newcomers, at least they have access to that information.
To strike a balance, each chapter of the book comes with a quiz. The idea is to cover what’s been presented throughout the chapter. Instead of starting by reading the chapter, you can begin with the quiz. If you can’t answer the questions, that probably means it’s a good idea to read the chapter. If you can answer everything easily, then maybe you can move on to the next one.
For some concepts, I wish I had gone further, but the book is already pretty big—I think it's around 600 pages—so I had to make cuts. I cut content that goes deeper into some guts of LLVM that aren’t necessarily useful unless you want to modify them.
For example, you mentioned that I'm the code owner for register allocation. I can tell you exactly how it works inside and out, but in practice, that's not very useful unless you're modifying it. If you're just using it, it’s more helpful for me to explain how to use the API, how to configure it, and where to find more information if you want to go deeper.
Q3: Could you elaborate on how the GlobalISel framework improves upon SelectionDAG and FastISel in terms of performance, extensibility, and target customization?
Quentin Colombet: Sure. We're going very deep, very fast here. Just to give a little context for people who may not have read the book or aren't familiar with LLVM: earlier, I mentioned the front end, middle end, and back end of the compiler pipeline.
Instruction selection—what we call ISel—is the transition from the middle end to the machine IR part. It’s where you go from a target-agnostic representation, like LLVM IR, to something specific to your actual target architecture, like X86 or AArch64. That’s when the actual instructions of the final assembly begin to show up. Instruction selection is about getting to those instructions and picking the best possible ones.
Now, in terms of performance, extensibility, and target customization, Global ISel has several advantages. First, it’s much younger than the other two frameworks, so we were able to learn from the mistakes of the past. From the start, it has a much more modular design. Instruction selection actually involves multiple steps—one of them is called legalization.
Legalization is where you map high-level concepts from your source code, like a multiplication, to the actual instructions supported by your target. For example, maybe you're doing A * B, but your target doesn’t support a multiply instruction. So you break it down into a series of adds, like what you learn in school. That’s legalization—making something that’s illegal for the target legal by using what's available.
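To make that concrete, here is a minimal C++ sketch (not taken from the book) of how a hypothetical target might declare its multiplication rules through GlobalISel's LegalizerInfo. The class name MyTargetLegalizerInfo and the specific rules are illustrative assumptions, and helper names and include paths shift slightly between LLVM versions.

```cpp
// Minimal sketch, assuming a hypothetical "MyTarget" backend: declare which
// G_MUL forms are legal and how to legalize the rest. Illustrative only.
#include "llvm/CodeGen/GlobalISel/LegalizerInfo.h"
#include "llvm/CodeGen/TargetOpcodes.h"

using namespace llvm;

class MyTargetLegalizerInfo : public LegalizerInfo {
public:
  MyTargetLegalizerInfo() {
    const LLT S32 = LLT::scalar(32);
    const LLT S64 = LLT::scalar(64);

    // 32- and 64-bit multiplies map directly onto hardware instructions;
    // everything else is clamped into that range (narrow values are widened,
    // wide values are split into several legal operations).
    getActionDefinitionsBuilder(TargetOpcode::G_MUL)
        .legalFor({S32, S64})
        .clampScalar(0, S32, S64)
        .widenScalarToNextPow2(0);

    // Finalize the rule tables (the exact call varies across LLVM versions).
    getLegacyLegalizerInfo().computeTables();
  }
};
```

Because each opcode's rules are declared in one place like this, a new target can come up incrementally: support the handful of operations your first test programs need, then add rules as real code exercises more of the generic IR.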
In SelectionDAG and FastISel, all those steps happen in one monolithic pass. You go from LLVM IR to machine IR in one shot, and there’s not much you can do in between. It’s a black box. But with Global ISel, it’s a set of distinct optimization passes. Between those passes, you can insert your own target-specific or generic optimizations. That modularity gives you better flexibility, more opportunities for code reuse, and makes debugging and testing easier.
On performance: Global ISel operates directly on machine IR, which is the core IR for the back end. SelectionDAG uses its own intermediate representation, so the path goes from LLVM IR → SelectionDAG IR → Machine IR—two hops. With Global ISel, it's just LLVM IR → Machine IR. That one fewer hop already gives a performance boost—your compiler runs faster.
There’s also scope. SelectionDAG and FastISel work at the basic block level—each block is processed independently. That limits what you can do in terms of optimization across blocks. Global ISel works at the function level, giving you more context and more opportunities for optimization.
So to sum it up: Global ISel is faster, more modular, easier to debug, and gives you a broader optimization scope.
Q4: What should developers keep in mind when porting GlobalISel to a new target? How do components like call lowering, register bank info, and legalization interact in that process?
Quentin Colombet: The APIs you mentioned—call lowering, register bank info, and legalizer info—plus the instruction selector correspond to the different stages I just described in Global ISel.
Call lowering is about mapping source-level arguments—like in C++—to the actual registers or stack locations for your target. Register bank info defines how values are assigned to register banks—essentially how you deal with the target's physical registers. Legalizer info, as I explained earlier, tells the compiler what's legal or illegal for your target and how to transform the illegal stuff into legal instructions.
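To give a rough map of where those pieces live in the source, a GlobalISel-enabled target typically provides a subclass for each of those framework classes. The skeleton below is a hypothetical illustration (the MyTarget names are assumptions, not code from any real backend), and header locations differ a bit between LLVM releases.

```cpp
// Hypothetical skeleton of the hooks a "MyTarget" backend would provide for
// GlobalISel; bodies are elided, only the roles are annotated.
#include "llvm/CodeGen/GlobalISel/CallLowering.h"
#include "llvm/CodeGen/GlobalISel/InstructionSelector.h"
#include "llvm/CodeGen/GlobalISel/LegalizerInfo.h"
#include "llvm/CodeGen/RegisterBankInfo.h"

namespace llvm {

// Maps IR-level arguments and return values to registers/stack slots (ABI).
class MyTargetCallLowering : public CallLowering { /* ... */ };

// Describes the target's register banks and how values are assigned to them.
class MyTargetRegisterBankInfo : public RegisterBankInfo { /* ... */ };

// States which generic operations are legal and how to legalize the rest.
class MyTargetLegalizerInfo : public LegalizerInfo { /* ... */ };

// Picks the final target instructions for the legalized generic machine IR.
class MyTargetInstructionSelector : public InstructionSelector { /* ... */ };

} // namespace llvm
```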
Now, in terms of challenges, you mentioned “porting” Global ISel to a new target. There are two cases here. One is if your target already has an existing implementation using SelectionDAG. Then the challenge becomes: how do you reuse as much of that code as possible when moving to Global ISel? That’s tough because the intermediate representations are different. There's no magic bullet—you’ll have to reimplement parts of it.
We did put in place a compatibility layer using TableGen, which is LLVM’s domain-specific language used to describe instruction selection rules. You can reuse some of the same TableGen descriptions for both SelectionDAG and Global ISel, which helps a bit. But there's still a lot of code to rewrite.
The other case is when you’re writing a Global ISel implementation from scratch. The challenge there is design. There’s no single right way to do instruction selection. You need a coherent plan—decide when and how you're going to lower each operation. For example, when do you break down a multiply into adds? You could do it during legalization, later, or even earlier—it’s up to you.
Because Global ISel is modular, it’s easy to look at just one piece at a time. But if you're not careful, those pieces may not fit together properly, or you may end up implementing functionality that doesn’t even make sense in the broader pipeline. My advice is to keep things grounded. Always go back to what a real programmer might write, and make sure your lowering works end-to-end—from LLVM IR all the way to assembly. Then you can break it down into phases, confident that everything connects properly.
Also, among those components, call lowering is probably the easiest. It’s mostly about implementing the ABI—your target’s calling convention for how arguments and return values are passed at the binary level.
Q5: How does TableGen facilitate backend development, and what should developers be careful of when working with it?
Quentin Colombet: TableGen is kind of the hated child in LLVM. It’s a domain-specific language—a programming language developed within LLVM to help with LLVM development. And it’s used everywhere.
For example, in Clang, the user-facing part of LLVM, TableGen is used to define compiler options—optimization levels, warnings, and so on. It's also used for target features, like enabling vector or cryptographic extensions. And it’s heavily used in instruction selection, to define selection patterns. You’ll also see it used for intrinsics and many other things.
So if you work with LLVM, you'll touch the TableGen DSL at some point.
TableGen itself isn’t that hard. It has its own syntax, which is a bit weird at first, but once you get used to it, it’s manageable. The tricky part is that the syntax alone doesn't tell you the semantics—what your code actually means depends entirely on how it’s used, and that varies by context.
So you might write something in TableGen that looks the same whether you’re defining a Clang option or an intrinsic, but behind the scenes, the classes and records you use have totally different meanings. That’s because the semantics are defined not by TableGen itself, but by the backend generator that processes your TableGen input and turns it into C++ code or whatever else.
Ultimately, TableGen is a code-generation tool. If you’re adding a new Clang option, for example, doing it manually would mean registering it with the driver, wiring it up to different components, and writing a lot of boilerplate. TableGen lets you describe the option once, and the boilerplate is generated for you.
But that generation behavior is backend-specific. So the same TableGen syntax might generate completely different code depending on which backend is processing it. That’s the first difficulty: you have to do a kind of mental shift when working across different TableGen backends, even though the syntax looks identical.
The second issue is error messages. When something goes wrong—when you use the wrong syntax or reference something incorrectly—the errors are often vague or inconsistent. Different backends give different kinds of feedback, and understanding those messages isn’t easy.
I talk about this in the book. I offer some guidance on how to look into the TableGen backend code and reverse-engineer what's going wrong. But at the end of the day, it's often trial and error. Everyone in the LLVM community kind of dislikes TableGen, so if you don’t enjoy working with it, that’s expected.
That said, it's still a powerful tool. It exists to improve the productivity of compiler developers. You just have to get used to it.
Q6: LLVM 20 introduced several backend improvements, including GlobalISel refinements and expanded RISC-V support. In your view, what were some of the most significant recent changes, and how do they reflect broader trends in compiler infrastructure?
Quentin Colombet: LLVM 20 is an interesting release because a lot of the work isn’t immediately visible to users—but it’s still meaningful. One big area of focus was compiler speed. We spent a lot of time making the compiler faster using techniques like profile-guided optimization, which optimizes the compiler itself based on how it’s used in practice. I won't go into too much detail on that, but the upshot is: the compiler should be faster compared to previous versions.
Another behind-the-scenes improvement was in release management. Between LLVM 19 and 20, the process for accepting patches after cutting the release branch became more rigorous. That means fewer last-minute bugs. From the user’s perspective, we hope this results in a more stable release.
There were also some backend-internal improvements that aren't user-facing but help compiler writers. For instance, function attributes and metadata in the intermediate representation got more precise. These attributes—things attached to functions to express constraints or additional semantics—now let us do more aggressive optimizations or better enforce correctness.
Remember, LLVM IR isn’t just for C or C++. It supports a wide range of source languages like Rust or domain-specific languages for GPUs. The IR needs to be expressive enough to handle all those cases, and attributes are one way we enrich it to describe what’s allowed or expected. With LLVM 20, we can be more precise, which means more optimization opportunities and tighter guarantees.
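As a small illustration of the kind of information attributes carry, here is a hedged C++ sketch of optimization code consulting and adding function attributes. The attributes used are long-standing, common examples of the mechanism, not the specific LLVM 20 refinements discussed above.

```cpp
// Sketch: how a pass might query and set function attributes.
// The attributes below are just common, pre-existing examples.
#include "llvm/IR/Function.h"

using namespace llvm;

// A callee that provably only reads memory and never unwinds is a much
// safer candidate for aggressive transformations such as code motion.
static bool isSafeToHoistCallsTo(const Function &Callee) {
  return Callee.onlyReadsMemory() &&
         Callee.hasFnAttribute(Attribute::NoUnwind);
}

// Front ends attach attributes like this one so later passes can rely on
// stronger guarantees than the IR instructions alone would give.
static void markAsNoFree(Function &F) {
  F.addFnAttr(Attribute::NoFree);
}
```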
As for RISC-V: it’s an interesting beast. The spec is still evolving, and people keep adding new extensions. As those extensions mature, they get added to the LLVM backend. That means the instruction selector can now take advantage of those extensions to generate better code. If your processor supports a new, more efficient instruction, LLVM can now use it. So performance improves for the end user.
Q7: Tools like llvm-isel-fuzzer have been instrumental in uncovering backend bugs. How do such specialized fuzzing tools integrate into LLVM’s development workflow, and what benefits have you seen them bring to backend stability?
Quentin Colombet: I have some experience with fuzzing tools, but I haven’t used them heavily in my day-to-day workflow. Well—actually, I have used them, but mostly as a hardening tool. You need to get a lot of the basics right in your compiler before fuzzing becomes relevant. So fuzzing is usually one of the last things on your to-do list.
What fuzzers are really good at is finding edge cases—things that are technically valid but extremely weird. These aren’t necessarily inputs that a human would ever write, but they do exercise parts of the compiler in unexpected ways. That can help uncover crashes or subtle bugs, especially security-related ones.
This is particularly important in contexts like GPUs, where the compiler might actually run on a user’s device—on their phone, watch, or tablet. If an attacker can crash the compiler, that opens up a possible security vulnerability. Crashes are opportunities for malicious code injection, so from that angle, fuzzers are a valuable tool.
That said, fuzzers don’t help much with the quality of the generated code. They're not about finding missed optimizations. They’re about making sure the compiler doesn’t crash or behave in weird, undefined ways. In practice, these tools are often used by academics who stress-test the LLVM infrastructure and then report issues. It’s more of a backend stability thing than a user-facing performance tool.
Q8: Let’s talk about ML techniques—starting with MLGO, introduced by Trofin et al. in 2021. With machine learning now being applied to compiler optimizations, how do you see it influencing backend development, particularly in instruction selection and register allocation?
Quentin Colombet: That’s something I’m really curious about. Compilers are full of heuristics, and machine learning is great at discovering heuristics we never would’ve thought of—automatically.
There’s a lot of potential here, but there are also challenges. One big challenge is identifying the right parameters to feed into your machine learning model. To use an analogy: could you price a house just by counting the number of windows? There’s probably some correlation, but it’s not enough.
Similarly, in something like register allocation, the features you use to train your model may not carry enough information for it to make meaningful decisions. That’s a general problem in machine learning: capturing the right features is a bit of a black art. You try something, see how it performs, and iterate.
Another challenge is integration. If you look at Global ISel, SelectionDAG, or the register allocator, the APIs don’t necessarily give you a lot of hooks to inject machine learning guidance into the inner workings. If all you can do is tweak some knobs from the outside, you may not be able to make meaningful improvements.
So then the question becomes: do you need to write your own instruction selector or register allocator to take full advantage of machine learning? I think the answer is yes—but we’ll see how things evolve.
There’s also the issue of compile time. Machine learning models can be slow. Will users tolerate waiting 10 seconds for a 1% improvement in performance? What about 10 minutes? There’s always a trade-off.
This isn't a new problem. For decades, we’ve known that some compiler optimizations can be solved optimally using things like integer linear programming. But we don’t use them because they’re too slow. So while ML is promising, especially in research, there’s still a lot of work to do before it becomes practical in production compilers.
Q9: MLIR introduces a multi-level intermediate representation offering more flexibility in compiler design. How does MLIR interact with LLVM’s backend, and what advantages does it bring to code generation for heterogeneous systems?
Quentin Colombet: MLIR is a relatively new addition to the LLVM family. As you said, it stands for Multi-Level Intermediate Representation, but what that really means is that it's a framework for defining your own intermediate representations. It gives you a huge design space for creating IRs that suit your specific needs.
This is especially useful in heterogeneous systems. With MLIR, you can model both your CPU and GPU modules within the same IR. That opens up optimization opportunities across different targets, which is something LLVM IR alone can’t do.
For example, let’s say your CPU is calling a GPU function. In traditional LLVM IR, those components would be handled separately. But with MLIR, you can represent both sides in one place. That means you could move computations between devices more easily or apply cost models to decide what should run where.
That said, MLIR is a layer above LLVM IR. It doesn’t replace it—it feeds into it. So if your front end used to produce LLVM IR directly, now it might produce MLIR instead, and that gets lowered into LLVM IR. A good example is machine learning frameworks like PyTorch—they can output MLIR, do graph-level optimizations there, and then lower to LLVM IR for final code generation.
But here’s the catch: if you do everything in MLIR, you have to implement those transformations yourself. You’re not automatically reusing LLVM’s optimizations. The LLVM backend still handles codegen, but only after MLIR has done its work.
There’s an opportunity here to reuse more of the LLVM backend within MLIR or to integrate the two more tightly. We’ll see where that goes. But right now, MLIR gives you a lot of flexibility—if you’re willing to build on top of it.
Q10: Debugging backend passes can be complex. Are there tools or methodologies you personally recommend for diagnosing and resolving issues in code generation?
Quentin Colombet: Yes—and I cover this in the book because it's a key problem. Compilers are complex systems, and debugging them efficiently is critical.
The first thing you can leverage is LLVM’s logging infrastructure. It lets you see what’s happening as your program is lowered through the LLVM pipeline. You can enable logging globally or for specific passes if you already suspect where things are going wrong.
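For reference, the logging he refers to is LLVM's LLVM_DEBUG facility. The sketch below uses a made-up debug category name, my-backend-pass, purely for illustration.

```cpp
// Minimal sketch of LLVM's debug logging inside a (hypothetical) pass.
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"

// The category name is an assumption for this example.
#define DEBUG_TYPE "my-backend-pass"

static void reportRewrites(unsigned NumRewritten) {
  // Printed only in assertion-enabled builds, and only when the tool is run
  // with -debug (all categories) or -debug-only=my-backend-pass (just this one).
  LLVM_DEBUG(llvm::dbgs() << "rewrote " << NumRewritten << " instructions\n");
}
```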
But if you don’t yet know what’s misbehaving, the next step is to reduce your input. LLVM provides tools like llvm-extract and llvm-reduce for that. For example, you can use llvm-extract to isolate a single function from your input file. Then you can keep compiling just that function to reproduce the issue.
llvm-reduce goes further. You give it a predicate—say, “this IR causes a crash”—and it will automatically minimize the IR while preserving that behavior. So instead of debugging hundreds or thousands of lines, you end up with 10 lines that still reproduce the problem. That’s a huge productivity win.
That works great for compiler crashes. The harder case is miscompiles—where the compiler doesn’t crash, but the generated program behaves incorrectly. Those are tougher because there’s no obvious failure signal from the compiler itself.
In that case, the first step is to disable interprocedural optimizations. These cross-function transformations can obscure things, and turning them off helps isolate the problem. Then you can start narrowing things down using function boundaries. Because function calls follow a known ABI, you can mix and match how functions are compiled—for example, compile one with optimizations and one without—to see which one introduces the bug.
Eventually, you can isolate a problematic function, reduce it, and get a minimal reproducer. At that point, there’s no substitute for staring at the assembly and figuring out what went wrong. Maybe someday machine learning will help with that—but for now, it’s still a manual process.
Q11: For developers interested in contributing to LLVM backend development, what areas currently need attention, and how can new contributors get involved effectively?
Quentin Colombet: That’s a great question—and one I hear a lot from newcomers. First, I’d encourage people to step back and think about what “contribution” really means. A lot of folks assume it’s only about writing code and sending patches, but there are other valuable ways to contribute.
For instance, filing issues is a big help. If you encounter an IR that causes a crash or incorrect behavior, reporting that clearly is already a contribution. LLVM has a ton of open bugs, and the bug tracker keeps growing. One way to help the project is by triaging those bugs—reproducing them, reducing the input, and making them easier to debug. That’s a great way to get familiar with the toolchain and to practice the debugging techniques I mentioned earlier.
You can also contribute by reviewing patches. Code review is essential to the progress of any open source project. And reviewing is a good way to learn—pick an area you’re interested in, follow the changes, and build your understanding. Even asking questions like “Could you add more comments?” or “I didn’t understand this part”—those are helpful. What’s unclear to you might be unclear to others, too.
When you do feel ready to submit patches, there are plenty of open issues you can work on. And if you reach out, LLVM contributors will usually help guide you through writing and reviewing your first patch.
In terms of specific areas: loop optimizations historically haven’t been LLVM’s strong suit, so there’s definitely room for improvement there. And more broadly, there's always interesting new research coming out of academia. If you see something promising, try implementing it in LLVM—you’ll learn a lot, and you might improve the compiler.
So yeah, there’s no shortage of opportunities to contribute.
Q12: Looking ahead, what trends or technologies do you anticipate will shape the future of LLVM backend development, and how should developers prepare for them?
Quentin Colombet: We touched on this earlier, but I think MLIR is going to be a big part of the future. It’s already being adopted widely—especially in the ML world. For example, Triton, a language used to write high-performance ML kernels, is based on MLIR. Nvidia is using it in tools like CUTLASS. So if you're working in that space, learning MLIR is a must.
Even beyond ML, I think it’s becoming important to understand the full compiler stack. For backend developers, that means knowing what happens in the front end and middle end, too. Producing the right LLVM IR from MLIR is critical—because the LLVM backend performs best when the IR is shaped a certain way.
LLVM is a great C and C++ compiler, and a lot of the backend optimizations have been unconsciously tuned over the years for those languages. So when other languages generate IR that looks very different, things may not work as well. You have two options: improve the backend to handle those patterns better, or adjust your front end to generate IR that looks more like C++. Knowing the full stack helps you make the right choice.
Also, AI is a powerful tool for understanding code or exploring optimizations. Use it—but stay in control. If you generate code with AI, make sure you understand what it’s doing. In some environments, you might not even be allowed to use AI due to copyright or security policies. So it’s important to be able to work without it, too.
Finally, I’d say the ultimate goal is to make compilers more accessible to end users. Sometimes the best way to help developers isn’t to make the compiler smarter—it’s to give users better knobs to express what they want. Tools like Triton succeed because they expose low-level control in a usable way. That’s something we should all aim for: making the compiler a more useful tool for developers.
To explore the ideas discussed in this conversation—including how to design instruction selectors, build legalizer stages, and debug backend passes with LLVM’s own reduction tools—check out LLVM Code Generation by Quentin Colombet, available from Packt. This 620-page comprehensive guide walks readers through the internals of LLVM’s backend infrastructure, from transforming IR to generating optimized machine code. With step-by-step examples, targeted exercises, and hands-on walkthroughs using TableGen, Machine IR, and GlobalISel, it’s both a reference and a roadmap for backend developers working on real-world architectures. Whether you’re building a custom target, contributing to LLVM itself, or deepening your compiler expertise, this book provides a practical foundation for mastering the backend.
Here is what some readers have said: