| en:multiasm:paarm:chapter_5_8 [2024/09/27 21:00] – pczekalski | en:multiasm:paarm:chapter_5_8 [2025/12/04 13:50] (current) – [Bare-Metal program code] eriks.klavins | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Compatibility for Compilers and OSes ====== | ====== Compatibility for Compilers and OSes ====== | ||
| + | |||
| + | So, as the Raspberry Pi 5 uses a 64-bit ARMv8.2A CPU (Cortex-A76), | ||
| + | Two assemblers are available: the GNU Assembler (GAS), normally invoked through GCC, and the LLVM integrated assembler, normally invoked through Clang. Clang itself is a C-family compiler front end; the assembling is done by LLVM, whose integrated assembler supports most LLVM targets, including ARM and ARM64. Both can be installed on the Raspberry Pi operating system: | ||
| + | * GNU GCC: '' | ||
| + | * CLANG: '' | ||
| + | * or if just LLVM is needed: '' | ||
| + | |||
| + | Either way, the assembly code must be saved in a file with the extension “.S” or “.s” before it can be assembled into an executable binary file. When built through GCC, a “.S” file is first run through the C preprocessor, while a “.s” file is assembled as is. To make it easier to navigate between programs, it is recommended to create a new folder for each code project. The folder will then contain the assembler code file, the compiled object file, and the executable binary file. | ||
| + | |||
| + | All examples will be compiled with GCC assembler or GAS (GNU Assembler). Assuming that the assembler code already exists in a program.S file, it can be compiled with “'' | ||
| + | |||
| + | Sometimes it is advantageous to compile the code on another PC, since larger projects compile faster there. For small code projects, however, it may be easier to compile directly on Raspberry Pi OS. This removes the need for a connection between the PC and the Raspberry Pi. | ||
| + | |||
| + | The following basic assembly code, saved as hello.s, prints a text message: | ||
| + | < | ||
| + | < | ||
| + | .global _start | ||
| + | _start: | ||
| + | MOV X0, #1 // file descriptor 1: stdout | ||
| + | LDR X1, =msg // address of the string | ||
| + | MOV X2, #14 // string length in bytes, including the newline | ||
| + | MOV X8, #64 // write syscall | ||
| + | SVC #0 | ||
| + | MOV X8, #93 // exit syscall | ||
| + | MOV X0, #0 // exit status 0 | ||
| + | SVC #0 | ||
| + | msg: .ascii "Text Message!\n" | ||
| + | </ | ||
| + | </ | ||
| + | Build it by calling "'' | ||
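| + | |||
| + | The length passed in X2 has to match the byte count of the string, newline included. As a minimal sketch, the assembler can compute that length itself; the symbol name ''msg_len'' and the use of a ''.rodata'' section are illustrative assumptions, not part of the original example: | ||
| + | <code> | ||
| + | .section .rodata | ||
| + | msg: .ascii "Text Message!\n" | ||
| + | .equ msg_len, . - msg // bytes between msg and the current location (hypothetical name) | ||
| + | |||
| + | .section .text | ||
| + | .global _start | ||
| + | _start: | ||
| + | MOV X0, #1 // file descriptor 1: stdout | ||
| + | LDR X1, =msg // address of the string | ||
| + | LDR X2, =msg_len // length computed by the assembler | ||
| + | MOV X8, #64 // write syscall | ||
| + | SVC #0 | ||
| + | MOV X0, #0 // exit status 0 | ||
| + | MOV X8, #93 // exit syscall | ||
| + | SVC #0 | ||
| + | </code> | ||
| + | With this approach the string can be edited without recounting its characters by hand. | ||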
| + | |||
| + | ===== System calls ===== | ||
| + | |||
| + | How the operating system starts the created code depends on how the code is built and where it runs, because this can be done at several different levels. At the first level, the code is written and executed as a user-space program on an OS such as Linux. This is the most common and also the easiest way to create and execute assembly code. On Raspberry Pi OS, the toolchain assembles and links the code into an ELF executable.\\ | ||
| + | Commands “'' | ||
| + | |||
| + | In a pure assembly program, the label “'' | ||
| + | * ''< | ||
| + | *'' | ||
| + | *'' | ||
| + | *'' | ||
| + | *'' | ||
| + | *'' | ||
| + | *'' | ||
| + | *'' | ||
| + | *'' | ||
| + | *'' | ||
| + | *'' | ||
| + | |||
| + | Another rule is a proper exit from the program. In Linux, the code cannot use the RET instruction to return from the label ‘_start’, where the program was started, because there is no return address: the program code is the first thing executed after loading into memory. It must therefore end by telling the kernel to terminate the program. This is also done through a system call:\\ | ||
| + | ''< | ||
| + | ''< | ||
| + | ''< | ||
| + | After these instructions are executed, the kernel terminates the process and frees its memory, including the stack. Control returns to the parent process, typically the OS shell. | ||
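| + | |||
| + | As a minimal sketch of this exit sequence, the program below does nothing except report a status to the kernel. The file name ''exit42.s'' and the status value 42 are illustrative assumptions: | ||
| + | <code> | ||
| + | // exit42.s (hypothetical file name) | ||
| + | .global _start | ||
| + | _start: | ||
| + | MOV X0, #42 // exit status handed back to the parent process | ||
| + | MOV X8, #93 // exit syscall | ||
| + | SVC #0 // the kernel ends the process; execution never continues past this point | ||
| + | </code> | ||
| + | If it is built the same way as hello.s and run from a shell, ''echo $?'' should then print 42. | ||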
| + | |||
| + | |||
| + | ===== Bare-Metal program code ===== | ||
| + | |||
| + | At the second level, the code can be written as bare-metal code that does not depend on an OS. This type of code is much more complex because it relies heavily on knowledge of the hardware, its address space, and related details. On the other hand, bare-metal code usually runs with less overhead and more predictable timing than the same code under an OS, because the OS schedules multiple tasks and a program can be suspended for some time while other programs run. Code of this kind is present on all devices with processors, whether or not the device uses an OS. The bootloader is a bare-metal program that prepares the hardware for the OS. If there is no OS, the program is designed to perform a specific task on the hardware. | ||
| + | |||
| + | A special program runs immediately after a system reset. Take the bootloader of the Raspberry Pi as an example. Before the bootloader is loaded into memory, dedicated startup code sets the CPU state, initialises memory and sets SP to a high address. Caches and memory management units (MMUs) are turned off by default to save energy in case something fails, but the startup code can be edited to enable them. At the end of the startup code, the Program Counter register is set to the address where the main program (i.e., the bootloader) starts. The bootloader checks the hardware, then initialises and prepares it to work with the operating system. The bootloader and the operating system can also be replaced with a bare-metal program. Such program code runs directly on the processor; there is no system-call layer, and no stack is set up unless the program code provides it. | ||
| + | |||
| + | Example code to blink the LED connected to Raspberry Pi GPIO17 (pin 11 on the physical board). Theoretically, | ||
| + | |||
| + | < | ||
| + | < | ||
| + | < | ||
| + | .equ PERIPH_BASE, | ||
| + | .equ GPIO_BASE, | ||
| + | .equ GPFSEL0, | ||
| + | .equ GPFSEL1, | ||
| + | .equ GPSET0, | ||
| + | .equ GPCLR0, | ||
| + | |||
| + | // the following section contains program code | ||
| + | .section .text | ||
| + | .global _start | ||
| + | _start: | ||
| + | // Initialise stack pointer | ||
| + | LDR X0, =stack_top | ||
| + | MOV SP, X0 | ||
| + | |||
| + | // Configure GPIO17 as output | ||
| + | LDR X1, =GPIO_BASE | ||
| + | LDR W2, [X1, # | ||
| + | BIC W2, W2, #(0x7 << 21) // Clear the 3 function-select bits of GPIO17 | ||
| + | ORR W2, W2, #(0x1 << 21) // Set the field to 001 (GPIO17 is output) | ||
| + | STR W2, [X1, #GPFSEL1] | ||
| + | |||
| + | blink_loop: | ||
| + | // Set GPIO17 high | ||
| + | MOV W3, #(1 << 17) | ||
| + | STR W3, [X1, #GPSET0] | ||
| + | |||
| + | // Simple delay loop | ||
| + | MOV X4, # | ||
| + | delay1: | ||
| + | SUBS X4, X4, #1 // decrement the counter; sets the Z flag at zero | ||
| + | B.NE delay1 | ||
| + | |||
| + | // Set GPIO17 low | ||
| + | STR W3, [X1, #GPCLR0] | ||
| + | |||
| + | // Delay again | ||
| + | MOV X4, #0x200000 | ||
| + | delay2: | ||
| + | SUBS X4, X4, #1 | ||
| + | B.NE delay2 | ||
| + | |||
| + | B | ||
| + | |||
| + | // Reserve space for the stack; SP starts at stack_top and grows downward | ||
| + | .balign 16 | ||
| + | .space 4096 | ||
| + | stack_top: | ||
| + | |||
| + | </ | ||
| + | </ | ||
| + | This code must be loaded with the bootloader as “kernel8.img” at address '' | ||
| + | - '' | ||
| + | - '' | ||
| + | - '' | ||
| + | |||
| + | After that, the code will continue to execute until the power is switched off, the processor is RESET, or an unexpected exception occurs. Here, the code is responsible for everything, including setting up the stack pointer, enabling caches, handling interrupts, and working with devices directly at hardware addresses. This requires reviewing all hardware-related documentation. In return, such programs can be faster and more predictable than programs that run under an OS. This is a typical way to design programs such as device firmware, an RTOS, or a bootloader. | ||
| + | |||
| + | The same code can be adapted to work as a kernel module or a driver. In that case, it requires much more editing and a study of the OS-related documentation. The kernel must know how often this program should execute, and the code may also need to work with mutexes. This is better implemented in C, as it relies on many interfaces provided by the Linux kernel. | ||
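| + | |||
| + | The shift of 21 used for GPIO17 in the blink example does not have to be worked out by hand. The sketch below assumes the layout that the example implies (ten pins per function-select register, three bits per pin); all symbol names are hypothetical: | ||
| + | <code> | ||
| + | .equ LED_PIN, 17 // GPIO number from the blink example | ||
| + | .equ FSEL_INDEX, LED_PIN / 10 // = 1, so the field is in GPFSEL1 | ||
| + | .equ FSEL_SHIFT, (LED_PIN % 10) * 3 // = 21, lowest bit of the 3-bit field | ||
| + | .equ FSEL_MASK, 0x7 << FSEL_SHIFT // value for BIC: clears the field | ||
| + | .equ FSEL_OUT, 0x1 << FSEL_SHIFT // value for ORR: 001 selects output | ||
| + | .equ LED_BIT, 1 << LED_PIN // bit written to GPSET0 / GPCLR0 | ||
| + | </code> | ||
| + | Changing ''LED_PIN'' then updates the mask and bit constants together, although a pin in a different function-select register would also need a different register offset. | ||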
| + | |||