Writing A Brainf*ck JIT

Posted: July 2024

GitHub

Brainf*ck

I will be abbreviating Brainfck as 'BF' from here on out*

BF is an extremely simple language. In BF, there are 30,000 cells. Each cell has a capacity of 1 byte. BF allows us to manipulate these cells with the given instructions.

Operations

Operations	Usage
<	Moves cell pointer 1 space to the left
>	Moves cell pointer 1 space to the right
+	Increments the value at the currently selected cell by 1
-	Decrements the value at the currently selected cell by 1
.	Print the value at the currently selected cell as ASCII
,	Read input from user, and store at the currently selected cell as ASCII
[	Jump to the next ']' if the currently selected cell's value is 0
]	Jump to the previous '[' if the currently selected cell's value is NOT 0

Loops

All the operations are straight-forward, except for the [ and ] operations. Sqaure brackets are the only form of loops in BF. They are the equivalent of the following C code:

while(i != 0) {
    // Do whatever...
}

Now that we understand BF, we can move onto the JIT.

ARM64

I have written an ARM7TDMI interpreter, so I am fairly fluent in ARM assembly. I've been wanting to learn ARM64 assembly for some time now, and this is my first time writing a JIT. This is why I picked BF. Due to the simplicity of the language, a lot of programmers like myself use BF as a learning tool of sorts.

Back to ARM64. Here are some important details on ARM64 instructions:

- Each ARM64 instruction takes up 32 bits (at least the ones I've used).

- PC-relative jumps take an immediate value, which is the amount of bytes to jump forward and multiply it by 4.

Writing the JIT

Here is a basic overview of how my compilation loop works.

int i = 0;

// Allocate memory with RWX permissions
uint8_t *memory = mmap(NULL, JIT_MEM_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

// microasm is a helper struct that makes writing raw bytes easier.
microasm bin = {.dest = memory};

// Continue until we hit EOF
while((i = fgetc(bf_file)) != EOF) {
    char input = (char)i;

    const uint8_t pos_reg = 9; // Register that contains the cell pointer.
    // Switch on character
    switch (input) {
    }
}

Now comes the tricky part... generating ARM64 instructions.

We'll tackle the 2 easiest BF operations, incrementing and decrementing the cell pointer.

'<' and '>'

// Rd: register which stores the result of Rn - imm
void asm_arm64_immsub(microasm *a, uint8_t rd, uint8_t rn, uint8_t imm) {
    uint32_t instruction = 0xD1000000; // Equivalent to `sub x0, x0, #0`
    instruction |= (rn << 5) | rd; // Rd is stored at the first 5 bits of the instruction, then rn is stored next.
    instruction |= (imm << 10); // Imm takes 12 bits of storage, however we never use more than 8 bits when using this instruction.

    asm_write_32bit(a, instruction); // Helper function which adds our instruction to the JIT result
}

// Rd: register which stores the result of Rn + imm
void asm_arm64_immadd(microasm *a, uint8_t rd, uint8_t rn, uint8_t imm) {
  uint32_t instruction = 0x91000000; // Equivalent to `add x0, x0, #0`
  instruction |= (rn << 5) | rd;
  instruction |= (imm << 10);

  asm_write_32bit(a, instruction);
}

...

case '<':
    asm_arm64_immsub(&bin, pos_reg, pos_reg, 1);
    break;

case '>':
    asm_arm64_immadd(&bin, pos_reg, pos_reg, 1);
    break;

This one wasn't too bad. Onto the next!

'+' and '-'

Ok, we have ran into a problem. How do we access the 30k cells from assembly? Welcome to ARM64 calling convention! When we call our finished JIT block, we can pass arguments. They will be stored from register x0 to x8, depending on how many arguments you pass. For now, we only need to pass an array of 30k uint8_ts. Since it's the first argument, we can safely expect x0 to contain the address.

To index into this chunk of memory, we can use simple pointer arithmatic. Now that pos_reg is the register that contains our index into this byte array, we can simply add pos_reg and x0 together to get the address of the requested cell.

However, we can't just leave x0 equal to the memory address. It's not allowed, as x0 should not be used as a scratch register. So we will modify the code before the loop as so:

const uint8_t data_reg = 10;
const uint8_t value_at_pos_reg = 12; // Will be used for accessing values in the array
asm_arm64_regmov(&bin, data_reg, 0); // Move value at x0 to x10

...

while() {
    // switch
}

Onto the addition and subtraction:

void asm_arm64_regadd(microasm *a, uint8_t rd, uint8_t rn, uint8_t rm,
                      uint8_t imm_shift) {
  uint32_t instruction = 0x8B000000; // Equivalent to `add x0, x0, x0`
  instruction |= (rn << 5) | rd; // You already know...
  instruction |= (imm_shift << 10) & ((1 << 5) - 1); // This is an optional shift value, which we do not use
  instruction |= (rm << 16);

  asm_write_32bit(a, instruction);
}

void asm_arm64_regldrb(microasm *a, uint8_t rt, uint8_t rn) {
  uint32_t instruction = 0x39400000; // Equivalent to `ldrb x0, =0x00000000`
  instruction |= (rn << 5) | rt;

  asm_write_32bit(a, instruction);
}

void asm_arm64_regstrb(microasm *a, uint8_t rt, uint8_t rn) {
  uint32_t instruction = 0x39000000; // Equivalent to `strb x0, =0x00000000`
  instruction |= (rn << 5) | rt;

  asm_write_32bit(a, instruction);
}

...

case '+':
    // Add `pos_reg` and `data_reg` together, to find the address of the requested cell
    asm_arm64_regadd(&bin, value_at_pos_reg, pos_reg, data_reg, 0); 

    // Load the value from the address into x13
    asm_arm64_regldrb(&bin, 13, value_at_pos_reg);

    // Increment `x13` by 1
    asm_arm64_immadd(&bin, 13, 13, 1);
    
    // Store `x13` back to it's location in memory
    asm_arm64_regstrb(&bin, 13, value_at_pos_reg);

    // Clear `x13`
    asm_arm64_immmov(&bin, 13, 0);
    break;

case '-':
    // Add `pos_reg` and `data_reg` together, to find the address of the requested cell
    asm_arm64_regadd(&bin, value_at_pos_reg, pos_reg, data_reg, 0); 

    // Load the value from the address into x13
    asm_arm64_regldrb(&bin, 13, value_at_pos_reg);

    // Decrement `x13` by 1
    asm_arm64_immsub(&bin, 13, 13, 1);
    
    // Store `x13` back to it's location in memory
    asm_arm64_regstrb(&bin, 13, value_at_pos_reg);

    // Clear `x13`
    asm_arm64_immmov(&bin, 13, 0);

We got the 4 easiest operations out of the way, onto the last 4...

'.' and ','

These two are quite complicated, as they require reading and writing to the console. In order to do both of these tasks, we will be using syscalls.

For example, in ARM64 assembly we can print a word using the write syscall:

/* https://peterdn.com/post/2020/08/22/hello-world-in-arm64-assembly/ */
/* Data segment: define our message string and calculate its length. */
msg:
    .ascii        "Hello, ARM64!\n"
len = . - msg

.text

/* Our application's entry point. */
.globl _start
_start:
    /* syscall write(int fd, const void *buf, size_t count) */
    mov     x0, #1      /* fd := STDOUT_FILENO */
    ldr     x1, =msg    /* buf := msg */
    ldr     x2, =len    /* count := len */
    mov     w8, #64     /* write is syscall #64 */
    svc     #0          /* invoke syscall */

    /* syscall exit(int status) */
    mov     x0, #0      /* status := 0 */
    mov     w8, #93     /* exit is syscall #93 */
    svc     #0          /* invoke syscall */

The kernel reads a string from the address stored in x1. This makes our lives a little bit easier. We can skip loading the value from memory.

case '.':
    // Calculate address of value at current cell and store result in `x0`
    asm_arm64_regadd(&bin, 1, pos_reg, data_reg, 0); 

    // `x8` tells the kernel which syscall we want to invoke.
    // 0x40 = write syscall
    asm_arm64_immmov(&bin, 8, 0x40);
    
    // Specify we want to write to STDOUT
    asm_arm64_immmov(&bin, 0, 1);

    // Length is 1, since `.` only outputs one ASCII character
    asm_arm64_immmov(&bin, 2, 1);

    // Invoke syscall
    asm_arm64_syscall(&bin, 0);

    // Clear registers `x0` to `x2`
    asm_arm64_immmov(&bin, 0, 0);
    asm_arm64_immmov(&bin, 1, 0);
    asm_arm64_immmov(&bin, 2, 0);

Reading user input is similar to what we just did. We just change the syscall number and the arguments

case ',':
    // read syscall
    asm_arm64_immmov(&bin, 8, 63);
    
    // `x0` specifies we are reading from STDIN
    asm_arm64_immmov(&bin, 0, 0);
    
    // `x1` stores the memory address where we would like the value to be written
    asm_arm64_regadd(&bin, 1, pos_reg, data_reg, 0);

    // `x2` stores how many characters we will read in
    asm_arm64_immmov(&bin, 2, 1);
    
    // Invoke syscall
    asm_arm64_syscall(&bin, 0);
    
    // Clear registers `x0` to `x2`
    asm_arm64_immmov(&bin, 0, 0);
    asm_arm64_immmov(&bin, 1, 0);
    asm_arm64_immmov(&bin, 2, 0);

Two more to go, we are almost there!

'[' and ']'

This is the hardest one by far... Originally, I overcomplicated this part and paid dearly for it (it did not work...). My final solution was using two arrays and a stack, which would push the current location every time it spotted the '[' character, and popped a location off the stack when it encountered the ']' character. This ensured that no matter how many nested loops the BF program had, we would always capture the right loop.

The two arrays were named lpos and rpos. Each loop was given a unique ID. Using this ID, you could lookup the beginning and end of the loop. For example: lpos[2] is the beginning of the loop with the ID 2

We will be passing lpos and rpos to the JIT program as arguments

All in all, this took quite the fuckery to figure out...

// Jumps to pc + (imm * 4), as this allows us to jump further in either direction
void asm_arm64_pcrelbranch_ze(microasm *a, uint8_t rt, uint32_t imm) {
  uint32_t instruction = 0xB4000000; // Equivalent to `cbz x0, #0`
  instruction |= rt;
  instruction |= imm << 5;

  asm_write_32bit(a, instruction);
}

// Same thing as above
void asm_arm64_pcrelbranch_nz(microasm *a, uint8_t rt, uint32_t imm) {
  uint32_t instruction = 0xB5000000; // Equivalent to `cbnz x0, #0`
  instruction |= rt;
  instruction |= imm << 5;

  asm_write_32bit(a, instruction);
}

void asm_arm64_regldr(microasm *a, uint8_t rt, uint8_t rn) {
  uint32_t instruction = 0xF9400000; // Equivalent to `ldr x0, =0x00000000`
  instruction |= (rn << 5) | rt;

  asm_write_32bit(a, instruction);
}

...

uint32_t loop_count = 0; // Used for keeping track of loop ID's

// lpos and rpos will hold memory addresses to jump to, which are the beginning and end of their respective loops
// NOTE: What BF program has more than 1024 loops?!
uint64_t lpos = malloc(1024 * 8); 
uint64_t rpos = malloc(1024 * 8);

// Leaving out the definition of the Stack struct, as it's out of scope of this blog post.
// It's written within the source code if you are curious.
Stack s_loops = stack_init(1024 * 8);

// `x14 and `x15` contains the address of the `lpos` and `rpos` arrays respectively.
const uint8_t loop_lpos = 14;
const uint8_t loop_rpos = 15;

...

// while loop

case '[':
    loop_count++; // This is the loop ID for this given loop 


    // Ok. This is quite complicated. 
    // If you don't remember, `bin` is of the type `microasm`,
    // which is a helper struct for writing bytes/instructions to an array.
    // The `dest` field is the memory address of the raw array.
    // When we invoke this JIT block, the PC (program counter) will jump to the beginning of this array.
    // Since we have access to this memory, we can calculate the offsets between '[' and ']' instructions in the raw array.
    // So `bin.dest` would be equal to something like `0x7fffff30`.
    // Which is the current spot in memory we are operating in right now.

    // Later on, when our program encounters ']', it will look up the beginning of it's loop using it's loop ID,
    // and index into the `lpos` array and jump back, restarting the loop.

    // The reason we add 24 to `bin.dest` is to account for the instructions in front of it.
    // We do not want ']' to jump back to this current instruction, as it would cause undefined behavior in our BF program.
    // If we jump to this address, we would land on top of the `pcrelbranch` instruction.
    lpos[loop_count] = (uint64_t)bin.dest + (6 * 4); 
    
    uint32_t *loop_id = malloc(4);
    *loop_id = loop_count; // Clone `loop_count` value
    stack_push(&s_loops, loop_id); // Pushes `loop_id` onto stack

    // Load value in current cell
    asm_arm64_regadd(&bin, value_at_pos_reg, pos_reg, data_reg, 0);
    asm_arm64_regldrb(&bin, 13, value_at_pos_reg);

    // Calculate end-of-loop address via pointer arithmatic
    // Store the address in `x2`
    // We multiply `loop_count` by 8, since we are dealing with 64 bit values within the array.
    // This index does not currently exist, but when the program is finished being compiled,
    // it will exist.
    asm_arm64_immadd(&bin, 2, loop_rpos, loop_count * 8);
    
    // Notice we aren't using ldrb, but ldr
    // This loads a full 64 bit value rather than an 8 bit value
    
    // Load end-of-loop address into `x3`
    asm_arm64_regldr(&bin, 3, 2);

    // A conditional PC-relative branch, which only branches if `x13` does not equal 0
    // If `x13` is 0, it will jump over the unconditional branch next to it.
    // The `2` tells the PC to jump 8 bytes forward (2 * 4)
    asm_arm64_pcrelbranch_nz(&bin, 13, 2);

    // If `x13` == 0, this will skip the loop and jump to the corresponding ']' location
    asm_arm64_br(&bin, 3);
    break;

case ']':
    // Retrieve last loop ID from the stack
    uint32_t *loop_id_ptr = (uint32_t *)stack_pop(&s_loops);
    uint32_t loop_id = *loop_id_ptr;

    // Don't leave a mess
    free(loop_id_ptr); 

    // Save current memory location, for '[' to jump to
    rpos[loop_id] = (uint64_t)bin.dest;
    
    // You should know what this does by now.
    asm_arm64_regadd(&bin, value_at_pos_reg, pos_reg, data_reg, 0);

    // You should know this too
    asm_arm64_regldrb(&bin, 13, value_at_pos_reg);
    
    // If `x13` is zero, jump 16 bytes ahead
    // Else, prepare to jump back to the beginning of the loop
    asm_arm64_pcrelbranch_ze(&bin, 13, 4);
    
    // Calculate beginning-of-loop address and store into `x4`
    asm_arm64_immadd(&bin, 4, loop_lpos, loop_id * 8);
    
    // Load 64 bit address from `x4` into `x12`
    asm_arm64_regldr(&bin, value_at_pos_reg, 4);
    
    // Branch to the beginning of the loop
    asm_arm64_br(&bin, value_at_pos_reg);
    break;

This is the entire JIT compilation loop. I find BF to be a perfect language for writing tooling for. It's simplicity makes learning how to write interpreters and JITs somewhat bearable.

Optimizations

Let's say you have a piece of BF code that looks like this:

++[>++++++++[>++++++<]<]>>+. This prints the letter 'a'

The JIT would output the following assembly:

  // This is the assembly for one add
  add x12, x9, x8; Calculate address for current cell
  ldrb x13, x12; Load value
  add x13, #1; Increment by 1
  strb x13, x12; Store value back
  mov x13, #0; Clear x13
  ...

The JIT would generate those 5 instructions every time it runs into a '+' character. Instead, anytime the JIT encounters a '+' character it continues reading the file until it encounters a character that isn't '+'. We count how many times the loop ran, and that's our value to increment by!

This is done for '-', '<' and '>' operations.

I hope this inspired you to go and write a JIT of your own.

made by ~~> Mateo