Deep Engineering #11: Quentin Colombet on Modular Codegen and the Future of LLVM’s Backend
How LLVM’s modular backend improves code generation across targets—by breaking down instruction selection into testable, reusable passes
Welcome to the eleventh issue of Deep Engineering
LLVM has long been celebrated for its modular frontend and optimizer. But for years, its backend—the part responsible for turning IR into machine code—remained monolithic, with instruction selectors like SelectionDAG and FastISel combining multiple responsibilities in a single, opaque pass. That’s now changing, as modular pipelines begin to reshape how LLVM handles instruction selection.
This issue delves into GlobalISel, the instruction selection framework designed to replace SelectionDAG and FastISel with a more modular, testable, and maintainable architecture. Built around a pipeline of distinct passes—IR translation, legalization, register bank selection, and instruction selection—GlobalISel improves backend portability, supports new ISAs like RISC-V, and makes it easier to debug and extend LLVM across targets.
To understand the design decisions behind GlobalISel—and the broader implications for backend engineering—we spoke with its architect, Quentin Colombet. A veteran LLVM contributor who joined Apple in 2012, Colombet has worked across CPU, GPU, and DSP backends and is also the code owner of LLVM’s register allocators. His perspective anchors our analysis of the trade-offs, debugging strategies, and real-world impact of modular code generation.
We also include an excerpt from LLVM Code Generation (Packt, 2025), Colombet’s new book. The selected chapter introduces TableGen, LLVM’s domain-specific language for modeling instructions and backend logic—a central tool in GlobalISel's extensibility, despite its sharp edges.
You can watch the complete interview and read the transcript here or scroll down to read the feature and book excerpt.
Deconstructing Codegen with Quentin Colombet
How LLVM’s Modular Backends Enable Portable, Maintainable Optimization
LLVM’s instruction selection was long dominated by SelectionDAG and FastISel, both monolithic frameworks that performed legalization, scheduling, and selection in a single pass per basic block. This design limited code reuse and optimization scope. GlobalISel was created to improve performance, granularity, and modularity. It operates on whole functions and uses Machine IR directly, avoiding the need for a separate IR like SelectionDAG. This reduces overhead and improves compile times. While AArch64’s GlobalISel was initially slower than x86’s DAG selector at -O0, ongoing work has closed the gap; by LLVM 18, GlobalISel’s fast path was within 1.5× of FastISel.
Perhaps more importantly, GlobalISel breaks down instruction selection into independent passes. Rather than one big conversion, it has a pipeline: IR translation, legalization of unsupported types, register bank selection, and actual instruction selection. Quentin Colombet, LLVM’s GlobalISel architect, explains that in SelectionDAG
“all those steps happen in one monolithic pass…It’s a black box. But with GlobalISel, it’s a set of distinct optimization passes. Between those passes, you can insert your own target-specific or generic optimizations. That modularity gives you better flexibility, more opportunities for code reuse, and makes debugging and testing easier.”
GlobalISel is designed as a toolkit of reusable components. Targets can share the common core pipeline and customize only what they need. Even the fast -O0 and optimized -O2 selectors now use the same pipeline structure, just configured differently. This is a big change from the past, where ports often had to duplicate logic across FastISel and SelectionDAG. The modular design not only avoids code duplication, it establishes clear debug boundaries between stages. If a bug or suboptimal codegen is observed after instruction selection, a backend engineer can pinpoint whether it originated in the legalization phase, the register banking phase, or elsewhere, by inspecting the Machine IR after each pass. LLVM’s infrastructure supports dumping the MIR at these boundaries, making it far easier to diagnose issues than untangling a single mega-pass. As Colombet notes,
“Instruction selection actually involves multiple steps…From the start, [GlobalISel] has a much more modular design.”
The benefit is that each phase (e.g. illegal operation handling) can be tested and understood in isolation.
Portability for New Targets and ISAs
A clear motivation for this overhaul is target portability. LLVM today must cater to a wide variety of architectures – not just x86 and ARM, but RISC-V (with its ever-expanding extensions), GPUs, DSPs, FPGAs, and more. A monolithic selector makes it hard to support radically different ISAs without accumulating lots of target-specific complexity. GlobalISel’s design, by contrast, forces a clean separation of concerns that parallels how one thinks about a new target. There are four major target hooks in GlobalISel, corresponding to the key decisions a backend must make:
CallLowering – how to lower abstract calls and returns into the concrete calling convention (registers, stack slots) of the target.
LegalizerInfo – what operations and types are natively supported by the target, and how to expand or break down those that aren’t. For example, if the target lacks a 64-bit multiply, the legalizer might specify to chop it into smaller multiplies or call a runtime helper.
RegisterBankInfo – the register file characteristics, such as separate banks (e.g. general-purpose vs. floating-point registers) and the cost of moving data between banks.
InstructionSelector – the final pattern matching that turns “generic” machine ops into actual target opcodes.
Each of these components is relatively self-contained. When bringing LLVM to a new architecture, developers can implement and test them one by one. Colombet advises keeping the big picture in mind:
“There’s no single right way to do instruction selection…because GlobalISel is modular, it’s easy to look at just one piece at a time. But if you’re not careful, those pieces may not fit together properly, or you may end up implementing functionality that doesn’t even make sense in the broader pipeline.”
In practice, the recommended approach is to first ensure you can lower a simple function end-to-end (even if using slow or naive methods), then refine each stage knowing it fits into the whole. This incremental path is much more feasible with a pipelined design than it was with SelectionDAG’s all-or-nothing pattern matching.
Real-world experience shows the value of this approach. RISC-V, for instance, has been rapidly adding standard and vendor-specific extensions. LLVM 20 and 21 have seen numerous RISC-V backend updates – from new bit-manipulation and crypto instructions to the ambitious “V” vector extension. With GlobalISel, adding support for a new instruction set extension often means writing TableGen patterns or legality rules without touching the core algorithm. In early 2025, LLVM’s RISC-V backend even gained vendor extensions like Xmipscmove and Xmipslsp for custom silicon.
This kind of targeted enhancement – adding a handful of operations in one part of the pipeline – is exactly what the modular design enables. It’s telling that as soon as the core GlobalISel framework matured, targets like AArch64 and AMDGPU quickly adopted it for their -O0 paths, and efforts are underway to make it the default at higher optimization levels.
New CPU architectures (for example, a prospective future CPU with unusual 128-bit scalar types) can be accommodated by plugging in a custom legalizer and reusing the rest of the pipeline. And non-traditional targets stand to gain as well. Apple’s own GPU architecture, which Colombet has worked on, was one early beneficiary of a GlobalISel-style approach – its unusual register and instruction structure could be cleanly modeled through custom RegisterBank and Legalizer logic, rather than fighting a general-purpose DAG matcher.
The result is that LLVM’s backend is better positioned to embrace emerging ISAs. As Colombet noted,
“The spec [for RISC-V] is still evolving, and people keep adding new extensions. As those extensions mature, they get added to the LLVM backend…If your processor supports a new, more efficient instruction, LLVM can now use it.”
Another aspect of portability is code reuse across targets. GlobalISel makes it possible to write generic legalization rules – for example, how to lower a 24-bit integer multiply using 32-bit operations – once in a target-independent manner. Targets can then opt into those rules or override them with a more optimal target-specific sequence. In SelectionDAG, some of that was possible, but GlobalISel is designed with such flexibility in mind from the start. This pays off when supporting families of architectures (say, many ARM variants or entirely new ones) – one can leverage the existing passes instead of reinventing the wheel. Even the register allocator and instruction scheduling phases (which come after instruction selection) can benefit from more uniform input thanks to GlobalISel producing consistent results across targets.
Easier Debugging and Maintenance
The switch to a modular backend isn’t just about adding features – it also improves the day-to-day experience of compiler engineers maintaining and debugging the code generator. With the old monolithic pipeline, a failure in codegen (like an incorrect assembly sequence or a compiler crash) often required reverse-engineering the entire selection process. By contrast, GlobalISel’s structured passes and the use of Machine IR make it far more tractable. Engineers can inspect the MIR after each stage (translation, legalization, register bank selection, and so on) using flags such as llc’s -stop-after=<pass> and -print-after-all, to see where things start to diverge from expectations. For instance, if an out-of-range immediate wasn’t properly handled, the issue will be visible right after the Legalizer pass – before it ever propagates to final assembly. This clear separation of concerns reduces the cognitive load in debugging.
Colombet emphasizes testing and debugging as first-class considerations. He advocates using tools like llvm-extract and llvm-reduce to isolate the function or instruction that triggers a bug.
“Instead of debugging hundreds or thousands of lines, you end up with 10 lines that still reproduce the problem. That’s a huge productivity win,” Colombet says of minimizing test cases.
With GlobalISel, this strategy can be taken even further. Each pass in the pipeline can often be run on its own, enabling unit-test-like isolation. LLVM’s verifier checks invariants between passes, so errors tend to surface closer to their source.
This modular design yields tangible benefits:
Clearer failure boundaries: MIR can be inspected after each phase (translation, legalization, register bank selection).
Faster diagnosis: bugs can be isolated and reproduced at the level of a single pass.
Built-in correctness checks: verifier routines catch many issues early.
Reuse over reinvention: less hand-written C++, more declarative TableGen logic.
TableGen, for its part, remains a double-edged sword. GlobalISel backends rely heavily on it to define matching rules, allowing reuse across targets. But the tooling is infamously brittle. As Colombet puts it:
“TableGen is kind of the hated child in LLVM… The syntax alone doesn't tell you the semantics… what your code means depends on how it’s used in the backend generator. And the error messages are often vague or inconsistent… everyone in the LLVM community kind of dislikes TableGen.”
Despite its flaws, TableGen is central to GlobalISel’s maintainability. It helps abstract instruction complexity into compact, reusable rules — a major win for modern ISAs.
Backend stability is also reinforced by fuzzing. Tools like llvm-isel-fuzzer generate random IR to stress-test instruction selectors, uncovering obscure failures that user test cases might miss. Colombet highlights their importance, especially in contexts like GPU drivers:
“In contexts like GPU drivers, a compiler crash could potentially be exploited, so hardening the backend against unexpected input is vital.”
While fuzzing doesn’t improve performance, it ensures each GlobalISel pass handles unexpected inputs robustly. Over time, this approach, combining modularity, reproducibility, automation, and stress-testing, has made LLVM’s backend infrastructure more resilient and easier to evolve.
Supporting New Hardware Paradigms
LLVM’s move toward a modular backend reflects two broader architectural shifts in computing: the rise of heterogeneous computing, which LLVM addresses through MLIR; and the growing use of machine learning to guide compiler decisions, exemplified by projects like MLGO. Both point to the same trend: modularity, data-driven optimization, and architectural flexibility in modern compilers.
The Rise of Heterogeneous Computing
As heterogeneous systems combining CPUs, GPUs, and specialized accelerators become standard, compilers must generate efficient code across dissimilar targets and optimize across their boundaries. LLVM’s response is the Multi-Level Intermediate Representation (MLIR), which we covered in Deep Engineering #9: a flexible, extensible IR framework that sits above traditional LLVM IR and enables high-level, domain-specific optimizations before lowering to machine code.
Colombet explains:
“With MLIR, you can model both your CPU and GPU modules within the same IR. That opens up optimization opportunities across different targets… you could move computations between devices more easily or apply cost models to decide what should run where.”
This enables compilers to consider cross-device trade-offs early in the pipeline — for example, determining whether a tensor operation should run on a GPU or CPU based on context or cost. MLIR achieves this via a layered, dialect-based design: each dialect captures a different level of abstraction (e.g., tensor algebra, affine loops, GPU kernels), which can be progressively lowered. Once it reaches LLVM IR, the standard code generation path, including GlobalISel, takes over.
MLIR’s integration with GlobalISel brings key advantages:
Targets like GPUs or DSPs can be supported by implementing GlobalISel hooks for custom codegen.
MLIR transformations can assume the backend will honor those hooks, enabling consistent lowering.
LLVM 20 improved backend metadata and attribute precision, allowing frontends like Swift and Rust to better express semantic constraints to the optimizer — particularly important in multi-language, multi-device builds.
Although GlobalISel doesn’t directly manage CPU–GPU splitting, its modular design makes it easier to support unconventional targets cleanly, whether an Apple GPU or a DSP with custom arithmetic units. The combination of MLIR’s flexible front-end IR and GlobalISel’s extensible backend forms a coherent pipeline for future hardware.
Growing Use of Machine Learning to Guide Compiler Decisions
A second major shift, still largely experimental, is the integration of machine learning inside the compiler itself. Research tools like Machine Learning Guided Optimization (MLGO) have shown promising results in replacing fixed heuristics with learned policies. In 2021, Trofin et al. used reinforcement learning to drive LLVM’s inliner, achieving ~5% code size reductions at -Oz with only ~1% additional compile time. The same framework was applied to register allocation, learning spill strategies that occasionally outperformed the default greedy allocator.
Colombet sees real potential here:
“Compilers are full of heuristics, and machine learning is great at discovering heuristics we never would’ve thought of.”
But he’s also clear about the practical challenges. First is the problem of feature extraction — the task of encoding program state into meaningful inputs for a model:
“To use an analogy: could you price a house just by counting the number of windows? There’s probably some correlation, but it’s not enough. Similarly, in something like register allocation, the features you use to train your model may not carry enough information.”
Even with good features, integration into the backend is nontrivial. LLVM’s register allocator and GlobalISel weren’t built with explicit “decision points” for ML models to hook into.
“If all you can do is tweak some knobs from the outside, you may not be able to make meaningful improvements… do we need to write our own instruction selector or register allocator to take full advantage of machine learning? I think the answer is yes – but we’ll see.”
The implication is that further modularization may be needed — isolating backend subproblems (like spill code insertion or instruction choice) into well-defined, pluggable interfaces. This would allow learned components to replace or guide specific decisions without requiring wholesale rewrites. Such a hybrid model — rule-based infrastructure augmented by ML at critical junctures — aligns with the trajectory GlobalISel already began: decoupling backend logic into testable, replaceable units.
Whether through MLIR’s IR layering or MLGO’s data-driven policies, the common trend is clear: LLVM’s backend is evolving toward composability, configurability, and adaptability by refactoring it into pieces that are easier to understand, reuse, and eventually learn. By decomposing code generation into well-defined passes, LLVM has made it easier to support new ISAs such as RISC-V, extend to targets like GPUs and DSPs, and integrate with tools like MLIR. The transition is still ongoing, and trade-offs remain—compile-time costs, tooling gaps, and the complexity of mixing TableGen with C++—but the payoff is clear: a backend that is more debuggable, more maintainable, and better prepared for architectural change. As machine learning and domain-specific IRs reshape the frontend, GlobalISel ensures that the backend can evolve in parallel. It is not just a rewrite; it is infrastructure for the next era of compilers.
If the architectural case for modular code generation in LLVM caught your attention, Quentin Colombet’s book, LLVM Code Generation offers the definitive deep dive. Colombet, the architect behind GlobalISel, takes readers inside the backend machinery of LLVM—from instruction selection and register allocation to debugging infrastructure and TableGen. The following excerpt—Chapter 6: TableGen – LLVM’s Swiss Army Knife for Modeling—introduces the declarative DSL that powers much of LLVM’s backend logic. It explains how TableGen structures instruction sets, eliminates boilerplate, and underpins the extensibility that modular backends depend on.
TableGen – LLVM’s Swiss Army Knife for Modeling by Quentin Colombet
The complete “Chapter 6: TableGen – LLVM’s Swiss Army Knife for Modeling” from the book LLVM Code Generation by Quentin Colombet (Packt, May 2025).
For every target, there are a lot of things to model in a compiler infrastructure to be able to do the following:
Represent all the available resources
Extract all the possible performance
Manipulate the actual instructions
This list is not exhaustive, but the point is that you need to model a lot of details of a target in a compiler infrastructure.
While it is possible to implement everything with your regular programming language, such as C++, you can find more productive ways to do so. In the LLVM infrastructure, this takes the form of a domain-specific language (DSL) called TableGen.
In this chapter, you will learn the TableGen syntax and how to work your way through the errors reported by the TableGen tooling. These skills will help you be more productive when working with this part of the LLVM ecosystem.
This chapter focuses on TableGen itself, not the uses of its output through the LLVM infrastructure. How the TableGen output is used is, as you will discover, TableGen-backend-specific and will be covered in the relevant chapters. Here, we will use one TableGen backend to get you accustomed to the structure of the TableGen output, starting you off on the right foot for the upcoming chapters.
Use code LLVM20 for 20% off at packtpub.com.
🛠️Tool of the Week
DirectX Shader Compiler (DXC) – HLSL Compiler Based on LLVM/Clang
DXC is Microsoft’s official open-source compiler for High-Level Shader Language (HLSL), built on LLVM and Clang. It supports modern shader development for Direct3D 12 and Vulkan via SPIR-V, and is widely used in production graphics engines across the gaming and visual computing industries.
Highlights:
LLVM-Based Shader Compilation: Leverages the LLVM infrastructure to provide robust parsing, optimization, and code generation for HLSL, targeting both DXIL (DirectX Intermediate Language) and SPIR-V.
Cross-Platform Targeting: Supports SPIR-V output for Vulkan through the -fspv-target-env flag, making it viable for multi-platform engines needing portability between Direct3D and Vulkan.
Modern Shader Features: Enables developers to use Shader Model 6.x features, including wave operations, ray tracing, and mesh shaders, with forward compatibility for future models.
Active Development and Tooling Improvements: The June 2025 release (v1.8.2406.1) added new diagnostics, SPIR-V fixes, -ftime-trace support for compilation profiling, and improvements to the dxcompiler API surface.
Tech Briefs
2025 AsiaLLVM - Understanding Tablegen generated files in LLVM Backend | Prerona Chaudhuri: This beginner-focused talk covers how TableGen generates key C++ backend files in LLVM—such as CodeEmitter, DisassemblerTables, and RegisterInfo—using AArch64 examples to explain how MIR instructions are encoded, decoded, and mapped to target-specific definitions.
Type-Alias Analysis: Enabling LLVM IR with Accurate Types | Zhou et al.: Introduces TypeCopilot, a type-alias analysis framework for LLVM IR that overcomes the limitations of opaque pointers by inferring multiple concrete pointee types per variable, enabling accurate, type-aware static analyses with up to 98.57% accuracy and 94.98% coverage.
LLVM 22 Compiler Enters Development With LLVM 21 Now Branched: LLVM 21 has been officially branched for release—introducing support for AMD GFX1250 (RDNA 4.5?), NVIDIA GB10, and expanded RISC-V features—while LLVM 22 development begins with continued backend enhancements, Clang 21 updates for C++2c and AVX10 changes, and LLVM 22.1 expected around March 2026.
The Architecture of Open Source Applications (Volume 1): LLVM | Chris Lattner: This book chapter presents LLVM as a modular, retargetable compiler infrastructure built around a typed intermediate representation (LLVM IR), designed from the outset as a set of reusable libraries rather than a monolithic toolchain.
2024 LLVM Dev Mtg - State of Clang as a C and C++ Compiler | Aaron Ballman: Clang's lead maintainer outlined ongoing progress across C and C++ standards support, tooling, diagnostics, and community growth—highlighting Clang’s expanding role within LLVM, its near-complete C++20 and C23 conformance, and persistent challenges like compile-time overhead and documentation.
That’s all for today. Thank you for reading this issue of Deep Engineering. We’re just getting started, and your feedback will help shape what comes next.
Take a moment to fill out this short survey we run monthly—as a thank-you, we’ll add one Packt credit to your account, redeemable for any book of your choice.
We’ll be back next week with more expert-led content.
Stay awesome,
Divya Anne Selvaraj
Editor-in-Chief, Deep Engineering
If your company is interested in reaching an audience of developers, software engineers, and tech decision makers, you may want to advertise with us.