Implementing a Diff Algorithm in MoonBit

A hands-on introduction to MoonBit through the implementation of a version-control-grade diff tool.

Jun 04, 2025

MoonBit is an emerging programming language that has a robust toolchain and relatively low learning curve. As a modern language, MoonBit includes a formatter, a plugin supporting VSCode, an LSP server, a central package registry, and more. It offers the friendly features of functional programming languages with manageable complexity.

To demonstrate MoonBit’s capabilities, we’ll implement a core software development tool—a diff algorithm. Diff algorithms are essential in software development, helping identify changes between different versions of text or code. They power critical tools in version control systems, collaborative editing platforms, and code review workflows, allowing developers to track modifications efficiently. If you have ever used git diff then you are already familiar with such algorithms.

The most widely used approach is Eugene W. Myers Diff algorithm, proposed in the paper “An O(ND) Difference Algorithm and Its Variations”. This algorithm is widely used for its optimal time complexity. Its space-efficient implementation and ability to find the shortest edit script make it superior to alternatives like patience diff or histogram diff and make it the standard in version control systems like Git and many text comparison tools such as Meld.

In this tutorial, we’ll implement a version of the Myers Diff algorithm in MoonBit. This hands-on project is ideal for beginners exploring MoonBit, offering insight into version control fundamentals while building a tool usable by both humans and AI through a standard API.

We will start by developing the algorithm itself, then build a command line application that integrates the Component Model and the Model Context Protocol , leveraging MoonBit’s WebAssembly (Wasm) backend. Wasm is a blooming technology that provides privacy, portability, and near-native performance by running assembly-like code in virtual machines across platforms —qualities that MoonBit supports natively, making the language well-suited for building efficient cross-platform tools.

By the end of this tutorial, you’ll have a functional diff tool that demonstrates these capabilities in action.

Project Setup

Let’s first create a new moonbit project by running:

moon new --lib diff

The following will be the project structure of the code. The moon.mod.json contains the configuration for the project, while the moon.pkg.json contains the configuration for each package. top.mbt is the file we'll be editing throughout this post.

├── LICENSE
├── moon.mod.json
├── README.md
└── src
    ├── lib
    │   ├── hello.mbt
    │   ├── hello_test.mbt
    │   └── moon.pkg.json
    ├── moon.pkg.json
    └── top.mbt

We will be comparing two pieces of text, divided each into lines. Each line will include its content and a line number. The line number helps track the exact position of changes, providing important context about the location of changes when displaying the differences between the original and modified files.

pub(all) struct Line {
  number : Int
  content: @string.View
} derive(Show)

The diff algorithm works by generating an edit script that transforms sequence A (the original version) into a sequence B (the new version). The edit script includes three operations: insertion, deletion or retention.

pub(all) enum Edit {
  Insert(Line)
  Keep(Line, Line)
  Delete(Line)
} derive(Show)

Thus, here’s the diff function signature:

pub fn diff(origin: Array[Line], new: Array[Line]) -> Array[Edit] {
  ...
}

The goal is to minimize insertions and deletions to produce a meaningful diff. A meaningful diff prioritizes human readability by identifying the set of changes needed to transform one text into another —preserving unchanged sections and highlighting only the actual modifications. This makes it much easier for developers to quickly understand what changed between versions.

Applying the Diff Algorithm

Before diving into the implementation, let’s walk through a simple example to illustrate how the diff algorithm detects and represents changes between two versions of text.

Let's first define a test case that will guide our implementation:

test {
  let origin =
    #|a
    #|c
  let new =
    #|a
    #|b
    #|c
  inspect!(
    EditScript(
      diff(
        origin
        .split("\n")
        .collect()
        .mapi(fn(index, content) { { number: index, content } }),
        new
        .split("\n")
        .collect()
        .mapi(fn(index, content) { { number: index, content } }),
      ),
    ),
    content=
      #| a
      #|+b
      #| c
      #|
    ,
  )
}

We certainly don't want to see the following result as it is harder to read; it doesn't highlight the critical change clearly.

-a
-c
+a
+b
+c

Before implementing an optimized approach, let us first explore a simple but inefficient depth-first search (DFS) method to understand the core problem of computing diffs.

Naive DFS Implementation

A straightforward approach tries all possible transformations and selects the shortest path:

pub fn diff(origin : Array[Line], new : Array[Line]) -> Array[Edit] {
  fn dfs(
    origin : ArrayView[Line],
    new : ArrayView[Line],
    ops : @immut/array.T[Edit]
  ) -> @immut/array.T[Edit] {
    match (origin, new) {
      ([], []) => return ops
      ([], _) =>
        return ops.concat(@immut/array.from_array(new.map(Edit::Insert)))
      (_, []) =>
        return ops.concat(@immut/array.from_array(origin.map(Edit::Delete)))
      ([origin_head, .. origin_tail], [new_head, .. new_tail]) => {
        if origin_head.content == new_head.content {
          return dfs(
            origin_tail,
            new_tail,
            ops.push(Edit::Keep(origin_head, new_head)),
          )
        }
        let insert = dfs(origin, new_tail, ops.push(Edit::Insert(new_head)))
        let delete = dfs(origin_tail, new, ops.push(Edit::Delete(origin_head)))
        if insert.length() < delete.length() {
          return insert
        } else {
          return delete
        }
      }
    }
  }

  dfs(origin[:], new[:], @immut/array.new()).to_array()
}

This method is inefficient for large inputs due to exponential time complexity. For each position where characters don't match, the algorithm explores two possible paths (insert or delete), resulting in O(2^(n+m)) worst-case complexity. Without memoization, it redundantly computes the same subproblems. For real-world scenarios with substantial text differences, this approach quickly becomes impractical, consuming excessive memory and processing time.

So now, it's time to explore the Myers Diff algorithm which improves efficiency by greedily searching for the shortest path. The following implementation illustrates the core concept of the algorithm rather than providing a direct 1:1 copy from the original paper.

Myers Diff

Myers algorithm models the problem as an editing graph, where each state (x, y) represents x deletions and y insertions. The shortest path from (0, 0) to (n, m) minimizes operations.

The graph below visualizes the editing process between two sequences:

  0   1   2   3   4   5   6   7
0 o---o---o---o---o---o---o---o
  |   |   | \ |   |   |   |   | -C
1 o---o---o---o---o---o---o---o
  |   | \ |   |   | \ | \ |   | -B
2 x---o---o---o---o---o---o---o
  | \ |   |   | \ |   |   | \ | -A
3 o---x---o---o---o---o---o---o
  |   | \ |   |   | \ | \ |   | -B
4 o---o---x---o---o---o---o---o
  | \ |   |   | \ |   |   | \ | -A
5 o---o---o---x---o---o---o---o
  |   |   | \ |   |   |   |   | -C
6 o---o---o---o---x---o---o---o
   +A  +B  +C  +A  +B  +B  +A

Myers' algorithm follows a few key principles:

It greedily retains common elements, because Keep operations have no cost.
It prioritizes paths with fewer insertions and deletions to minimize the number of operations.
It tracks the optimal paths along diagonals, ensuring efficient computation of the shortest transformation sequence.

So, here’s the naive implementation based on these principles:

pub fn diff(origin: Array[Line], new: Array[Line]) -> Array[Edit] {
  struct Vertex {
    x: Int
    y: Int
  } derive(Eq, Hash, Show)

  let n = origin.length()
  let m = new.length()

  // A hashmap to keep track of the best path reached on each diagonal
  let mut map = { 0: ({ x: 0, y: 0 }, @immut/array.new()) }

  for d in 0..<(n + m) {
    let next_map: Map[Int, (Vertex, @immut/array.T[Edit])] = {}

    for _, edits in map {
      // Greedily try to keep the longest common subsequence
      let mut x = edits.0.x
      let mut y = edits.0.y
      let mut edits = edits.1

      while x < n && y < m && origin[x].content == new[y].content {
        edits = edits.push(Keep(origin[x], new[y]))
        x += 1
        y += 1
      }

      // Check if we've reached the goal
      if x == n && y == m {
        return edits.to_array()
      }

      // Try to delete the original line
      if x < n {
        let next_vertex = { x: x + 1, y }
        let next_edits = edits.push(Delete(origin[x]))

        if next_map[x + 1 - y] is None ||
           (next_map[x + 1 - y] is Some(({ x, y: _ }, _)) && next_vertex.x > x) {
          next_map[x + 1 - y] = (next_vertex, next_edits)
        }
      }

      // Try to insert the new line
      if y < m {
        let next_vertex = { x, y: y + 1 }
        let next_edits = edits.push(Insert(new[y]))

        if next_map[x - y - 1] is None ||
           (next_map[x - y - 1] is Some(({ x: _, y }, _)) && next_vertex.y > y) {
          next_map[x - y - 1] = (next_vertex, next_edits)
        }
      }
    }

    map = next_map
  }

  // Unreachable
  panic()
}

Notice that this implementation is not the most efficient, as it explores each previous case twice. In practice, the algorithm can be optimized to process once per diagonal.

There’s also a variant of Myers’ algorithm that uses linear memory. For a detailed explanation, refer to the MoonBit documentation on Myers diff part 3.

Building a CLI application

Having a library is nice, it’s time to put it into action, by creating an application.

Let’s create a new MoonBit project by cloning the CLI template repository:

git clone https://github.com/peter-jerry-ye/cli-template.git

Alternatively, you can also create a repository from the template directly on GitHub and then clone your own version.

Notice that even though we are creating an “executable”, there is no main function. Instead, we are developing a “library” that imports and exports specific functions based on the wasi-cli, the standard API set for command-line interfaces in Wasm applications. This setup imports functions such as read and write, while exporting an entry function that serves as the program’s starting point.

Next, let us add a local dependency pointing to the library that we’ve just built by editing the moon.mod.json file:

{
  "name": "moonbit-community/cli-template",
  "deps": {
    "peter-jerry-ye/wasi-imports": "0.1.4",
    "peter-jerry-ye/io": "0.3.2",
    "peter-jerry-ye/async": "0.1.6",
    // check the `moon.mod.json` of the previous project
    "username/hello": {
      // path to the other project root
      "path": "../diff"
    }    
  },
  "source": "src"
}

With the CLI project set up and the diff library linked as a local dependency, we’re ready to build a functional command-line interface. The next step is to handle user input, read files, and display the diff results—making the tool usable for humans.

Building a CLI for Human Interaction

This CLI application will:

Retrieve command-line arguments
Read input files
Compute the Diff
Display the result

Getting arguments

First, we will retrieve the command line arguments using the get_arguments function from the environment package.

The top function in cli/src/stub.mbt retrieves the arguments and validates them:

async fn top() -> Unit! {
  let args = @environment.get_arguments()
  guard args is [_arg, origin, current] else {
    @io.println!("Usage: diff <origin> <current>")
  }
  @io.println!("Diffing \{origin} and \{current}")
}

The cli/src/moon.pkg.json file should include the following import for the environment interface of wasi-cli:

{
  // "link": { ... },
  "import": [
    // ...
    // add import environment interface of wasi-cli
    "peter-jerry-ye/wasi-imports/interface/wasi/cli/environment"
  ]
}

Although asynchronous I/O is not strictly necessary here, the current io package provides only an async API, so we use it for compatibility.

Next, let us try to execute our program. As we are building a Wasm module that uses the WASI-CLI (a standard API for command line interfaces), we need wasm-tools to convert our Wasm to embed the necessary interfaces and wasmtime to run the program. The following commands build and execute the Wasm component:

# Build Wasm
moon build --target wasm --debug
wasm-tools component embed --encoding utf16 wit target/wasm/debug/build/cli-template.wasm -o target/cli.core.wasm
wasm-tools component new target/cli.core.wasm -o target/cli.wasm
# Run the program
wasmtime target/cli.wasm a.mbt b.mbt

This should produce the following output:

Diffing a.mbt and b.mbt

Reading files

Next, we’ll read the input files specified as command-line arguments.

For this step, we need additional imports to enable filesystem access and handle file content encoding:

{
  // ...
  "import": [
    // ...
    "peter-jerry-ye/wasi-imports/interface/wasi/filesystem/preopens",
    "peter-jerry-ye/wasi-imports/interface/wasi/filesystem/types",
    "peter-jerry-ye/wasi-imports/interface/wasi/io/streams",
    "tonyfettes/encoding"
  ]
}

Next, we define a function that will read a file from a provided path. The function retrieves the preopened directory (passed to the runtime), opens the file, reads its contents via an input stream, and converts the content from UTF-8 to UTF-16 encoding (the internal encoding for MoonBit Strings):

async fn read(path : String) -> String! {
  let decoder = @encoding.decoder(UTF8)
  // Get the directory
  let directories = @preopens.get_directories()
  guard directories is [(root, _), ..] else {
    fail!("No preopen directories found.")
  }
  // Get the file within the directory
  let file = root
    .open_at(
      @types.PathFlags::default(),
      path,
      @types.OpenFlags::default(),
      @types.DescriptorFlags::default().set(@types.DescriptorFlagsFlag::READ),
    )
    .unwrap_or_error!()
  // Get the input stream
  let input_stream = file.read_via_stream(0).unwrap_or_error!()
  // Read the full content and decode UTF8 to UTF16 String
  let content = decoder.decode!( @io.read_all!(stream=input_stream).contents())
  input_stream.drop()
  file.drop()
  content
}

Note: For simplicity, this function does not include robust error handling —the program will exit and print an error message if something goes wrong. Additionally, it assumes that the files to be diffed reside within the same directory, which must be accessible to the Wasm runtime due to the sandboxed environment and “deny by default” security principle.

Next, let us update the top function to read the file contents:

async fn top() -> Unit! {
  let args = @environment.get_arguments()
  guard args is [_arg, origin, current] else {
    @io.println!("Usage: diff <origin> <current>")
  }
  // @io.println!("Diffing \{origin} and \{current}")
  let origin_content = read!(origin)
  let current_content = read!(current)
}

Now, we can rerun the previous build commands and execute the program as before. This time we must provide a directory path (assuming `A.txt` and `B.txt` exist in that directory) because Wasm modules execute within sandboxed environments that follow a "deny by default" security model. Without explicitly granting access via runtime flags, the program can literally access nothing:

# Rerun the previous build commands
moon build --target wasm --debug
wasm-tools component embed --encoding utf16 wit target/wasm/debug/build/cli-template.wasm -o target/cli.core.wasm
wasm-tools component new target/cli.core.wasm -o target/cli.wasm
# Execute the Wasm
wasmtime run --dir . target/cli.wasm A.txt B.txt

Displaying the Diff results

The final step is to integrate the diff algorithm and display the results. First, let us import the diff package into your project:

{
  // ...
  "import": [
    // ...
    {
      "path": "username/hello",
      "alias": "diff"
    }
  ]
}

Next, let us use the diff package to compare the files and print the result. ANSI escape codes are used to add color, making insertions and deletions easier to distinguish:

async fn top() -> Unit! {
  ...
  let origin_content = read!(origin)
  let current_content = read!(current)
  let origin_lines : Array[@diff.Line] = origin_content
    .split("\n")
    .collect()
    .mapi(fn(i, content) { { number: i, content } })
  let current_lines : Array[@diff.Line] = current_content
    .split("\n")
    .collect()
    .mapi(fn(i, content) { { number: i, content } })
  let script = @diff.diff(origin_lines, current_lines)
  for edit in script {
    match edit {
      Insert(value)  => @io.println!("\u001B[32m+\{value.content}")
      Keep(value, _) => @io.println!("\u001B[37m \{value.content}")
      Delete(value)  => @io.println!("\u001B[31m-\{value.content}")
    }
  }
}

With this, the CLI tool is complete.

Building the CLI for AI integration

While the command-line tool we have created works well for human users —and for AIs that can call the CLI tools—supporting the Model Context Protocol (MCP) makes it accessible to a broader range of AI systems. MCP defines how tools integrate with AI clients, enabling features like prompt generation and autocompletions. With MCP, our diff tool can be used seamlessly by any client that supports the protocol, such as VSCode, Cherry Studio, and more.

In this section, we’ll use the MCP server SDK for MoonBit to register our tool and launch the server, allowing AI clients to integrate and use the diff functionality.

Let us add the mcp-server sdk by running:

moon add peter-jerry-ye/mcp-server

And add the following import to our moon.pkg.json:

{
  // ...
  "import": [
    // ...
    "peter-jerry-ye/async/stream",
    {
      "path": "peter-jerry-ye/mcp-server",
      "alias": "mcp"
    },
    "peter-jerry-ye/mcp-server/schema"
  ]
}

Now, let us remove the top function and setup a run function that defines the tool and launches the server. Each tool requires a name, a description, and an input schema (parameters are passed as JSON). We will also need to define a callback to handle incoming requests.

///| Run the program.
pub fn run() -> Result[Unit, Unit] {
  // Create a new server
  let server = @mcp.new(Bytes::from_fixedarray(@random.get_random_bytes(32)))
  // Add the tool
  server.add_tool(
    name="diff",
    description="Diff two files under the current repository",
    // Define the input by using json schema
    inputSchema=@schema.object({
      "origin": @schema.string(),
      "current": @schema.string(),
    }),
    // Define the callback for handling the request
    cb=async fn(args, _server) {
      guard args is { "origin": String(origin), "current": String(current), .. } else {
        return {
          isError: true,
          payload: [Text(text="Usage: diff <origin> <current>")],
        }
      }
      // Handle error just in case the files aren't found correctly
      let (origin_content, current_content) = try {
        (read!(origin), read!(current))
      } catch {
        error =>
          return {
            isError: true,
            payload: [Text(text="Failed to read the file: \{error}")],
          }
      }
      let origin_lines : Array[@diff.Line] = origin_content
        .split("\n")
        .collect()
        .mapi(fn(i, content) { { number: i, content } })
      let current_lines : Array[@diff.Line] = current_content
        .split("\n")
        .collect()
        .mapi(fn(i, content) { { number: i, content } })
      let script = @diff.diff(origin_lines, current_lines)
      let builder = StringBuilder::new()
      for edit in script {
        match edit {
          Insert(value) => builder.write_string("+ \{value.content}\n")
          Keep(value, _) => builder.write_string("  \{value.content}\n")
          Delete(value) => builder.write_string("- \{value.content}\n")
        }
      }
      { isError: false, payload: [Text(text=builder.to_string())] }
    },
  )
  server.serve(
    stdin=@stream.input_stream_reader(@io.stdin_stream),
    stdout=@stream.output_stream_writer(@io.stdout_stream),
    stderr=@stream.output_stream_writer(@io.stderr_stream),
  )
  @io.event_loop.run()
  Ok(())
}

Take VS Code as an example MCP client. Configure MCP servers with the following settings in the file .vscode/mcp.json:

{
  "servers": {
    "my-diff-server": {
      "type": "stdio",
      "command": "wasmtime",
      "args": [
        "--dir",
        "cli", // path to the workspace where the files to be diffed exists
        "cli/target/cli.wasm" // path to the cli.wasm
      ]
    }
  }
}

With this setup, AI clients can now invoke the diff tool to compare two files directly.

Conclusion

In this article, we demonstrated how to implement a diff algorithm in MoonBit and transformed it into a fully functional tool—usable by both humans via the CLI and AI systems via the MCP. This project showcased MoonBit’s capabilities for building efficient, portable applications with WebAssembly.

Feel free to explore further enhancements, such as optimizing the diff algorithm or extending the tool for new AI clients.

A guest post by

Zihang

MoonBit developer