Compatibility for Compilers and OSes

The Raspberry Pi 5 uses a 64-bit ARMv8.2-A CPU (Cortex-A76), so it supports the AArch64 instruction set, and any assembler that targets AArch64 can be used. The Raspberry Pi firmware configuration also allows the operating system to start in 32-bit mode; to avoid unnecessary problems, the examples here assume that the OS runs in 64-bit mode. This can be verified in the /boot/firmware/config.txt file, where the parameter “arm_64bit” must be set to 1 (64-bit mode). If this parameter is missing, it can be added at the end of the file. There are two common assembler choices: the GNU Assembler (GAS), invoked as “as” or through GCC, and the LLVM integrated assembler, usually invoked through Clang. Clang itself is a compiler front end for C-family languages; the assembling is done by LLVM, whose integrated assembler supports most LLVM targets, including ARM and AArch64. Both can be installed on the Raspberry Pi operating system:

GNU GCC: sudo apt install build-essential
CLANG: sudo apt install clang
or if just LLVM is needed: sudo apt install llvm
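
The 64-bit requirement described above can also be checked from the command line. The following is a minimal sketch, assuming Raspberry Pi OS with the configuration file at /boot/firmware/config.txt; the helper name describe_machine is hypothetical and only classifies the string printed by “uname -m”:

```shell
#!/bin/sh
# describe_machine is a hypothetical helper: it maps a `uname -m` string
# to the word size of the running kernel.
describe_machine() {
    case "$1" in
        aarch64|arm64) echo "64-bit" ;;
        armv6l|armv7l) echo "32-bit" ;;
        *)             echo "unknown" ;;
    esac
}

MACHINE=$(uname -m)
echo "kernel: $MACHINE ($(describe_machine "$MACHINE"))"

# getconf reports the word size of the user-space environment.
if command -v getconf >/dev/null 2>&1; then
    echo "userland: $(getconf LONG_BIT)-bit"
fi

# Show the arm_64bit setting, if the file and the parameter exist.
CONFIG=/boot/firmware/config.txt
if [ -r "$CONFIG" ]; then
    grep -n '^arm_64bit' "$CONFIG" || echo "arm_64bit is not set in $CONFIG"
fi
```

On a correctly configured system, the first line should report aarch64.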

Either way, the assembly code must be saved in a file with the extension “.S” or “.s” before it can be assembled into an executable binary file. When a file is built through the GCC or Clang driver, “.S” files are first passed through the C preprocessor, while “.s” files are assembled as they are. To make it easier to navigate between programs, it is recommended to create a new folder for each new code project. The folder will then contain the assembly source file, the object file, and the executable binary file.

All examples are assembled with GAS (the GNU Assembler). Assuming that the assembly code already exists in a program.s file, it can be assembled with “as -o program.o program.s” and then linked with “ld -o program program.o”. The first step creates the object file program.o, which holds the machine code, text messages, constants, and other data in a single file. The linker then resolves the symbols in the object file and produces the executable binary.

Sometimes it is advantageous to build the code on another PC, because larger projects assemble and link faster there. For small projects, it may be easier to build directly on Raspberry Pi OS, which removes the need for a connection between the PC and the Raspberry Pi.
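
The two build steps, on the Raspberry Pi or on another PC, can be wrapped in a small script. This is only a sketch under assumptions: the helper tool_prefix is hypothetical, and the cross-tool prefix “aarch64-linux-gnu-” is the one used by the Debian/Ubuntu package binutils-aarch64-linux-gnu, so other distributions may name the tools differently.

```shell
#!/bin/sh
# tool_prefix is a hypothetical helper: native tools on an AArch64 host,
# the Debian/Ubuntu cross-binutils prefix on any other host.
tool_prefix() {
    case "$1" in
        aarch64|arm64) echo "" ;;
        *)             echo "aarch64-linux-gnu-" ;;
    esac
}

PREFIX=$(tool_prefix "$(uname -m)")
NAME=${1:-hello}            # project name without extension

if command -v "${PREFIX}as" >/dev/null 2>&1 && [ -f "$NAME.s" ]; then
    # Assemble to an object file, then link to an executable.
    "${PREFIX}as" -o "$NAME.o" "$NAME.s" &&
    "${PREFIX}ld" -o "$NAME" "$NAME.o"
else
    # Tools or source file missing: only show what would be run.
    echo "would run: ${PREFIX}as -o $NAME.o $NAME.s"
    echo "would run: ${PREFIX}ld -o $NAME $NAME.o"
fi
```

An executable built on the PC still has to be copied to the Raspberry Pi before it can be run.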

The following basic assembly program, saved as hello.s, prints a text message:

.global _start
_start:
    MOV X0, #1              // file descriptor 1 = stdout
    LDR X1, =msg            // address of string
    MOV X2, #14             // string length in bytes, including "\n"
    MOV X8, #64             // write syscall
    SVC #0
    MOV X8, #93             // exit syscall
    MOV X0, #0              // exit status 0
    SVC #0
msg: .ascii "Text Message!\n"

Build it with “as -o hello.o hello.s” followed by “ld -o hello hello.o” on the Linux command line. After that, it can be run with the “./hello” command, and “Text Message!” should be shown in the terminal. This example uses Linux system calls.
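
The length passed in X2 has to equal the number of bytes in the string, newline included. A minimal shell sketch for counting the bytes before hard-coding the value (the helper name byte_count is hypothetical):

```shell
#!/bin/sh
# byte_count is a hypothetical helper: it expands backslash escapes such
# as \n in its argument and prints the resulting number of bytes.
byte_count() {
    printf '%b' "$1" | wc -c | tr -d ' '
}

byte_count 'Text Message!\n'   # prints 14
```

If the value in X2 is smaller than this count, the write system call simply stops early and the trailing newline is not printed.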

System calls

How the created code is invoked depends on how it is built and at which level it runs. The most common, and at the same time the easiest, way is to write and execute the code as a user-space program under an OS such as Linux. On Raspberry Pi OS, the assembler and the linker turn the assembly code into an ELF executable.
The commands “as -o program.o program.s” and “ld -o program program.o” create a Linux executable file. This file follows the ELF format (Executable and Linkable Format). After that, the program can be executed as an ordinary Linux program (i.e. ‘./program’). At this moment, the shell (bash) asks the Linux kernel to run it through the execve() system call. The Linux kernel reads the ELF header of the program and loads the code and data into memory. The kernel also sets up the stack and places the argument count and the addresses of the argument strings on it. The last thing the kernel does is set the program counter register to the entry point recorded in the ELF header, which is the memory address of the _start symbol. At this point, the program code begins executing. Unlike an ordinary function, _start has no caller to return to: the program receives its arguments on the stack and reports its result through the exit status.
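
The ELF header that the kernel reads can be inspected from the shell. The sketch below checks the four magic bytes at the start of a file and, if readelf from binutils is installed, prints a few header fields. The helper name is_elf is hypothetical, and /bin/sh is used only as a stand-in target; set TARGET to ./program to inspect the program built above.

```shell
#!/bin/sh
# is_elf is a hypothetical helper: it succeeds when the file starts with
# the ELF magic bytes 0x7f 'E' 'L' 'F'.
is_elf() {
    [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "7f454c46" ]
}

TARGET=${TARGET:-/bin/sh}      # assumption: stand-in for ./program

if is_elf "$TARGET"; then
    echo "$TARGET is an ELF file"
else
    echo "$TARGET is not an ELF file"
fi

# readelf -h prints the ELF header, including the entry point address
# that the kernel loads into the program counter.
if command -v readelf >/dev/null 2>&1 && is_elf "$TARGET"; then
    readelf -h "$TARGET" | grep -E 'Class|Machine|Entry point'
fi
```

For a program assembled for the Raspberry Pi 5, the Machine field should read AArch64.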

In a pure assembly program, the label “_start” must be present. This creates an entry point into the program where Linux can jump in. Let's take the previously developed example and break it down.

.global _start // define _start label as global so the linker can see it
_start: // the point where the created program begins
MOV X0, #1 // pass an argument - file descriptor 1 (stdout)
LDR X1, =msg // pass an argument - address of string
MOV X2, #14 // pass an argument - string length in bytes, including "\n"
MOV X8, #64 // write syscall number
SVC #0 // call syscall
MOV X8, #93 // exit syscall number
MOV X0, #0 // pass an argument 0 (success)
SVC #0 // finish this program execution and exit
msg: .ascii "Text Message!\n"

Another rule is a proper exit from the program. In Linux, the code that starts at the label ‘_start’ cannot finish with the RET instruction, because there is no return address: this code is the first thing executed after loading into memory. It must end by telling the kernel to end this program’s execution. This is also done through a system call:
MOV X8, #93 // syscall number for exit
MOV X0, #0 // return code
SVC #0 // make the syscall
After these instructions are executed, the kernel stops the process and frees its memory. Processor control is returned to the OS shell, which receives the return code.
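
The return code placed in X0 is visible to the shell as the exit status of the command. The sketch below shows how to read it; “sh -c 'exit N'” is only a stand-in for an assembly program that puts N into X0 before the exit system call, and the helper name run_and_report is hypothetical.

```shell
#!/bin/sh
# run_and_report is a hypothetical helper: it runs a command and prints
# the exit status that the kernel handed back to the shell.
run_and_report() {
    status=0
    "$@" || status=$?
    echo "exit status: $status"
}

run_and_report sh -c 'exit 0'   # stand-in for a program with X0 = 0
run_and_report sh -c 'exit 3'   # stand-in for a program with X0 = 3
```

With a real program, the same check is “./program; echo $?”.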

Bare-Metal program code

As for the second level, the code can be bare-metal, which means it does not depend on an OS. This type of code is much more complex to write because it relies heavily on knowledge of the hardware, its address space, and related details. On the other hand, bare-metal code can execute faster and with more predictable timing than the same code under an OS, because the OS schedules multiple tasks and may halt the program for some time while other programs run. Bare-metal code is present on all devices with processors, whether or not the device uses an OS. The bootloader is a bare-metal program that prepares the hardware for the OS. If there is no OS, the program is designed to perform a specific task on the hardware.

A bare-metal program runs immediately after a system reset. Take the bootloader of the Raspberry Pi as an example. Before the bootloader is loaded into memory, a small startup binary sets the CPU state, initialises memory, and sets SP to a high address. Caches and memory management units (MMUs) are turned off by default because they have to be configured before use, but the startup code can be edited to enable them. At the end of the startup code, the program counter register is set to the address where the main program (i.e., the bootloader) starts. The bootloader checks the hardware, initialises it, and prepares it to work with the operating system. The bootloader and the operating system can be replaced with a bare-metal program. In that case, the program code runs directly on the processor; there is no system-call layer and no stack setup unless the program code provides them.

The following example code blinks an LED connected to Raspberry Pi GPIO17 (pin 11 on the physical header). The register layout in the listing is the one used on older Raspberry Pi versions, so in theory the example should work there once the peripheral base address matches the board. On the Raspberry Pi 5 it becomes much harder, because the GPIO pins are handled by a dedicated chip (RP1): the PCIe link must be initialised, the address space must be known, and much more. Unfortunately, there is no comprehensive documentation for Raspberry Pi 5, including its address space and other details.

Listing 2: Bare metal program code

    // Assumption: the base addresses are board-specific and must be checked
    // against the hardware documentation of the target board.
    .equ PERIPH_BASE,   0x40000000              // RP1 peripheral base
    .equ GPIO_BASE,     PERIPH_BASE + 0x0D0000  // GPIO controller base
    .equ GPFSEL0,       0x00  // Function select register 0
    .equ GPFSEL1,       0x04  // Function select register 1
    .equ GPSET0,        0x1C  // Pin output set register
    .equ GPCLR0,        0x28  // Pin output clear register

// the next section directive marks the following lines as program code
    .section .text
    .global _start
_start:
    // Initialise stack pointer
    LDR     X0, =stack_top
    MOV     SP, X0

    // Configure GPIO17 as output
    LDR     X1, =GPIO_BASE
    LDR     W2, [X1, #GPFSEL1]     // Each pin takes 3 bits
    BIC     W2, W2, #(0x7 << 21)   // Clear bits for GPIO17
    ORR     W2, W2, #(0x1 << 21)   // Set bits to 001 (GPIO17 is output)
    STR     W2, [X1, #GPFSEL1]

blink_loop:
    // Set GPIO17 high
    MOV     W3, #(1 << 17)
    STR     W3, [X1, #GPSET0]

    // Simple delay loop
    MOV     X4, #0x200000  // set start value of the counter
delay1:
    SUBS    X4, X4, #1     // decrease register value
    B.NE    delay1         // if not equal to zero, repeat

    // Set GPIO17 low
    STR     W3, [X1, #GPCLR0]

    // Delay again
    MOV     X4, #0x200000
delay2:
    SUBS    X4, X4, #1
    B.NE    delay2

    B       blink_loop     // repeat

// Reserve space for the stack; it grows downwards from stack_top
    .balign 16
stack_bottom:
    .space 4096
stack_top:
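
The GPFSEL1 offset and the shift by 21 bits used for GPIO17 follow from the register layout assumed in the listing: ten pins per 32-bit function select register and three bits per pin. A small shell sketch of that arithmetic, useful when adapting the listing to another pin (the helper name gpfsel_field is hypothetical):

```shell
#!/bin/sh
# gpfsel_field is a hypothetical helper: for a GPIO number it prints the
# function select register offset, the bit shift, and the 3-bit mask,
# assuming 10 pins per 32-bit register and 3 bits per pin.
gpfsel_field() {
    pin=$1
    printf 'offset=0x%02X shift=%d mask=0x%08X\n' \
        $(( (pin / 10) * 4 )) \
        $(( (pin % 10) * 3 )) \
        $(( 7 << ((pin % 10) * 3) ))
}

gpfsel_field 17   # prints offset=0x04 shift=21 mask=0x00E00000
```

The set and clear registers need no such calculation for pins 0 to 31, because each pin owns exactly one bit (1 << pin).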

This code must be loaded by the bootloader as “kernel8.img” at address 0x80000. These commands assemble and link the program and convert it into a raw binary image that replaces the OS:

as blink.s -o blink.o
ld blink.o -Ttext=0x80000 -o blink.elf
objcopy -O binary blink.elf kernel8.img

After that, the code continues to execute until the power is switched off, the processor is reset, or an unexpected exception occurs. Here, the code is responsible for everything, including setting up the stack pointer, enabling caches, handling interrupts, and working with devices directly at hardware addresses. This requires reviewing all hardware-related documentation. In return, such programs can be faster and more predictable in timing than programs that run under an OS. This is the typical way to build device firmware, an RTOS, or a bootloader.

The same code can be adjusted to work as a kernel module or a driver. In such a case, the code will require much more editing and investigation into OS-related documentation. The kernel must know how often this code should execute, and the code may also need to work with mutexes. This is better implemented in C, because it relies on many interfaces provided by the Linux kernel.

en/multiasm/paarm/chapter_5_8.txt · Last modified: 2025/12/04 13:50 by eriks.klavins
