Procedures and Functions Call Standards

In assembler there is no technical difference between procedures and functions. The distinction is about purpose and structure: a function usually returns a value and is often called from multiple places, while a procedure is focused on a specific task and typically does not return a value. This section uses the term function, as it is common in other programming languages as well.

A function is a labelled block of code that performs a specific task and can be reused from different parts of a program. Functions follow a consistent structure: they set up a stack frame, save the necessary registers, perform the task, and return control to the caller.

First, we need to get familiar with branch instructions. Branching is how a processor handles decision-making and control flow. A branch jumps to another location in the code, usually marked with a unique label, either conditionally or unconditionally. Branches let the program repeat specific actions, skip parts of the code, or handle different tasks based on comparisons.

B exit_loop @ unconditional branch to exit_loop
This instruction makes the CPU jump directly to exit_loop and continue with the first instruction after that label. It does not check any flags in the status register. The instruction has a range restriction: the target label must lie within ±128 MB of the branch instruction itself, because the offset is encoded relative to the current PC value. If the program is small, for example 2 MB of instruction code, this restriction can be ignored. Otherwise it must be considered, but there is another branch instruction that does not have such a restriction.
ADR X0, exit_loop @ load exit_loop address into X0 register
BR X0 @ unconditional branch to address stored in X0 register
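The register must hold the address of the target location. The address can be loaded in many ways; in this example, the ADR instruction is used. Note that ADR itself only reaches labels within ±1 MB of the instruction, so for more distant targets the address has to be obtained another way, for example with an ADRP and ADD pair or by loading it from memory.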
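As a minimal sketch of a register branch whose target address does not come from ADR, the program below loads the full 64-bit address from a literal pool with the LDR pseudo-instruction. It assumes the GNU assembler and a statically linked AArch64 Linux program; the file name and the labels finish and exit are made up for illustration, and comments use the // marker.

```asm
// far_branch.s - branch through a register holding a full 64-bit address
    .text
    .global _start
_start:
    mov     x0, #1              // status reported if the branch were skipped
    ldr     x1, =finish         // assembler places the address in a literal pool
    br      x1                  // jump to the address in x1; x30 is not changed
    b       exit                // never reached
finish:
    mov     x0, #7              // status reported after the jump
exit:
    mov     x8, #93             // Linux exit system call number
    svc     #0                  // exit status = x0
```

Built with as and ld and then run, the program should finish with exit status 7, which shows that execution continued at finish.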

Conditional Branches

Conditional branches rely on the status flags in the status register. These flags are:

N - Negative
C - Carry
V - Overflow
Z - Zero

Example code:
SUBS X0, X1, #3 @ subtract and update status register
AND X1, X2, X3 @ Perform logical AND operation
B.EQ label @ conditional branch using flags set with SUBS instruction
The logical AND instruction does not update the status register, so the branch condition relies on the result of the SUBS instruction. The status flags always reflect the most recent instruction that updates them (SUBS, ADDS, ANDS, CMP and similar); all other instructions leave the flags untouched. AArch64 supports many conditional branches. Here are examples of conditions:
B.EQ label @ if Z==1 Equal
B.NE label @ if Z==0 Not equal
B.CS label @ if C==1 Carry set (identical to B.HS)
B.HS label @ if C==1 Unsigned higher or same (identical to B.CS)
B.CC label @ if C==0 Carry clear (identical to B.LO)
B.LO label @ if C==0 Unsigned lower (identical to B.CC)
B.MI label @ if N==1 Minus, the result is negative
B.PL label @ if N==0 Plus, the result is positive or zero
B.VS label @ if V==1 Signed overflow
B.VC label @ if V==0 No signed overflow
B.HI label @ if C==1 AND Z==0 Unsigned higher
B.LS label @ if C==0 OR Z==1 Unsigned lower or same
B.GE label @ if N==V Signed greater than or equal
B.LT label @ if N!=V Signed less than
B.GT label @ if Z==0 AND N==V Signed greater than
B.LE label @ if Z==1 OR N!=V Signed less than or equal
B.AL label @ branch always
B.NV label @ encoded as "never", but in AArch64 it behaves exactly like B.AL
These instructions check the condition flags set by a previous instruction that updates the status register, such as CMP or ANDS. Note that the B.AL and B.NV conditions are of little practical use. B.AL is normally replaced by the plain B (Branch) instruction, because the result is the same. B.NV is not a "do nothing" instruction like NOP (No Operation): in AArch64 the NV condition code exists only so that its encoding has a valid disassembly, and it behaves the same as AL.

Several conditions check the same flag and differ only in name. These mnemonics exist for code readability. ARM keeps both sets because, historically, ARM assembly used CS (Carry Set) and CC (Carry Clear) for arithmetic carry; HS (Higher or Same) and LO (Lower) came with value comparison, to make unsigned comparisons read naturally. Aliased condition mnemonics produce the same binary code. The instruction set documentation also defines instruction aliases, not only condition aliases: for example, CMP is an alias of SUBS that discards the result. The alias carries the meaning for the reader, while the binary encoding is identical. This makes reverse engineering harder, because a disassembler has to choose one of the possible mnemonics for each encoding.
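The difference between the signed and the unsigned conditions is easiest to see when the same comparison is read both ways. The sketch below assumes the GNU assembler and a statically linked AArch64 Linux program; the file name is made up, comments use the // marker, and the result is reported through the exit status.

```asm
// cond_demo.s - one CMP, interpreted with a signed and an unsigned condition
    .text
    .global _start
_start:
    mov     x1, #-1             // 0xFFFFFFFFFFFFFFFF: -1 signed, huge unsigned
    mov     x2, #1
    mov     x0, #0              // result bits are collected here

    cmp     x1, x2              // flags from x1 - x2: N=1, Z=0, C=1, V=0
    b.ge    1f                  // signed: N == V is false, branch not taken
    orr     x0, x0, #1          // bit 0: x1 < x2 as signed numbers
1:
    cmp     x1, x2              // same flags again
    b.ls    2f                  // unsigned: C == 0 or Z == 1 is false, not taken
    orr     x0, x0, #2          // bit 1: x1 > x2 as unsigned numbers
2:
    mov     x8, #93             // Linux exit system call number
    svc     #0                  // exit status = x0
```

The program should finish with exit status 3: the value in X1 is less than X2 when read as signed and greater when read as unsigned, which is why the choice between B.LT/B.GE and B.LO/B.HS matters.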

Example: Loop with Condition

loop_start:
    CMP X0, #0
    B.EQ loop_end
    SUB X0, X0, #1
    B loop_start
loop_end:
    NOP

This loop runs until X0 becomes zero.

Example: Compare and branch

CBZ X0, label     @ Branch if zero
CBNZ X0, label    @ Branch if not zero

These combine a test and a branch into one instruction. Useful for tight loops and fast decisions.
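A counting loop is a typical place for CBNZ. The following sketch sums the numbers 5 down to 1; it assumes the GNU assembler and a statically linked AArch64 Linux program, uses // comments, and the file name and the label sum_loop are made up for illustration.

```asm
// cbnz_loop.s - countdown loop without CMP
    .text
    .global _start
_start:
    mov     x1, #5              // loop counter
    mov     x0, #0              // running sum
sum_loop:
    add     x0, x0, x1          // sum += counter
    sub     x1, x1, #1          // counter -= 1 (SUB does not touch the flags)
    cbnz    x1, sum_loop        // repeat while counter != 0
    mov     x8, #93             // Linux exit system call number
    svc     #0                  // exit status = x0 = 5+4+3+2+1 = 15
```

Because CBNZ tests the register directly, the loop needs neither a CMP nor a flag-setting SUBS, and the condition flags stay available for other code.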

Example: Test and branch

TBZ X1, #3, label  @ Branch if bit 3 is zero
TBNZ X1, #3, label @ Branch if bit 3 is not zero

These check a single bit in a register and branch based on its value. Suitable for flag or bit checks.

Note that none of the branch instructions shown so far update the link register, so they cannot be used to call functions or procedures.

The branch with a link is one of the instructions that updates the link register. The following example uses the BL instruction to call a function add_two.

add_two:
    ADD X0, X0, X1    @ X0 = X0 + X1
    RET
	
MOV X0, #5 @ write value 5 into X0 register
MOV X1, #3 @ write value 3 into X1 register
BL add_two @ branch to function "add_two" and store PC+4 into Link register

The BL instruction branches to the function and updates the link register at the same time. To return to the instruction that follows the BL, the function must finish with the RET instruction. Functions in assembly rely on the same calling conventions used in higher-level languages. The AArch64 calling convention specifies how arguments are passed, how return values are handled, and which registers must be preserved across function calls.
The link register (LR) is a special-purpose register in the AArch64 architecture; the X30 register fills this role. Its primary purpose is to hold the address of the instruction that the program must return to after a function call. The processor sets it automatically when executing branch instructions such as BL (branch with link), which allows the program to continue at the correct place after the function finishes.
The link register helps avoid extra memory access: the return address is kept in a register rather than pushed to the stack, which makes function calls faster. A function begins with a unique label whose name should not be the same as any instruction. Arguments are passed in registers X0 to X7, so register passing limits both the number and the size of arguments. When more than eight 64-bit values must be passed, the remaining ones go on the stack; an array or a large data structure is normally kept in memory and passed by its address. A single return value, such as a function status, is returned in the X0 register. If a function must return more data than fits in registers, the result is written to memory, for example to a buffer on the caller's stack.
A typical function:
function_name: @ Function label or function name
STP X29, X30, [SP, #-16]! @ Prologue of function
MOV X29, SP @ Prologue of function
… @ Function body
LDP X29, X30, [SP], #16 @ Epilogue of function
RET @ Epilogue of function
This code saves the frame pointer (X29) and the link register (X30) on the stack and moves the stack pointer down by 16 bytes. After doing the work, it restores the saved values, moves the stack pointer back, and returns. Saving X30 matters whenever a function calls another function, or whenever the return address must survive a longer section of code: every BL overwrites X30, so without the saved copy the function could no longer return to its own caller. MOV, STR, or STP instructions can be used to preserve such information. For example, STR X30, [SP, #-16]! stores register X30 (the link register) on the stack and updates the stack pointer; later, the link register can be restored with LDR X30, [SP], #16. The STP and LDP instructions used in the typical function above do the same for a pair of registers at once. Such a pattern is typical at the very beginning and at the very end of a function, and it may be modified slightly. It ensures the program can return to the correct place even after nested function calls.
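The effect of nested calls on X30 can be tried out with a small program. The sketch below assumes the GNU assembler and a statically linked AArch64 Linux program; add_three is a hypothetical function introduced only for this illustration, and comments use the // marker.

```asm
// nested_call.s - a function that calls another function must save x30
    .text
    .global _start

// add_two: leaf function, returns x0 + x1. It makes no calls,
// so x30 stays intact and no stack frame is needed.
add_two:
    add     x0, x0, x1
    ret

// add_three: returns x0 + x1 + x2 by calling add_two twice.
add_three:
    stp     x29, x30, [sp, #-32]!   // save FP and LR, reserve 32 bytes
    mov     x29, sp
    str     x2, [sp, #16]           // keep the third argument; a callee may change x2
    bl      add_two                 // x0 = a + b (this BL overwrites x30)
    ldr     x1, [sp, #16]           // x1 = third argument
    bl      add_two                 // x0 = a + b + c
    ldp     x29, x30, [sp], #32     // restore FP and LR, release the frame
    ret                             // returns to _start through the restored x30

_start:
    mov     x0, #5
    mov     x1, #3
    mov     x2, #4
    bl      add_three
    mov     x8, #93                 // Linux exit system call number
    svc     #0                      // exit status = x0 = 12
```

Without the STP/LDP pair, the RET in add_three would use the X30 value written by the second BL and jump back into add_three itself instead of returning to _start.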

The Stack pointer

The stack pointer (SP) is a special register in AArch64 that always points to the top of the stack. The stack itself is an organised block of memory where data is stored and removed in sequential order, so data cannot be placed at arbitrary positions inside it. It temporarily holds data, return addresses, local variables, and saved registers during function calls. In AArch64, the stack grows downward: each time data is pushed onto the stack, the stack pointer value decreases and the top of the stack moves to a lower address. Each exception level has its own stack pointer, which allows the processor to handle function calls, interrupts, and exceptions safely without mixing data from different exception levels.

The stack is a Last In, First Out (LIFO) structure: the data pushed last is the first to be removed. The stack pointer always indicates the current top of the stack.

On AArch64, the stack must always remain 16-byte-aligned.

This requirement is defined by the ABI (Application Binary Interface). The alignment ensures compatibility with SIMD and floating-point operations and avoids unaligned memory access errors. SIMD (NEON) instructions operate on the 128-bit (16-byte) registers Q0..Q31, and the processor is optimised for aligned load and store instructions, so aligned data means faster memory operations and fewer data transfer cycles.

The stack pointer behaves like a general-purpose register in many arithmetic and memory operations, but with some restrictions. Only a limited set of instructions accept it (such as ADD, SUB, MOV, and loads and stores that use SP as the base address), and when stack alignment checking is enabled, a memory access through an SP value that is not 16-byte aligned raises an exception.
STR X0, [SP, #-16]! @ decreases SP by 16, then stores the value of X0 at the new top of the stack
LDR X0, [SP], #16 @ loads a value from the top of the stack into X0 register and then increases SP by 16
Function prologues include instructions that push data onto the stack, and epilogues include instructions that pop it off again. Popping does not erase anything: the data stays in memory, only the stack pointer moves, and the next push overwrites the old values. The stack is ordinary memory, but it occupies only a small, reserved portion of the entire memory address space.
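The LIFO order and the 16-byte steps can be observed with two pushes and two pops. This is a minimal sketch that assumes the GNU assembler and a statically linked AArch64 Linux program; the file name is made up and comments use the // marker.

```asm
// lifo_demo.s - values come off the stack in reverse order
    .text
    .global _start
_start:
    mov     x1, #1
    mov     x2, #2
    str     x1, [sp, #-16]!     // push 1 (SP moves down by 16 to stay aligned)
    str     x2, [sp, #-16]!     // push 2
    ldr     x3, [sp], #16       // pop: x3 = 2, the value pushed last
    ldr     x4, [sp], #16       // pop: x4 = 1
    mov     x5, #10
    madd    x0, x3, x5, x4      // x0 = x3 * 10 + x4 = 21
    mov     x8, #93             // Linux exit system call number
    svc     #0                  // exit status = x0
```

The exit status should be 21, showing that 2 came off first. Each single-register push here uses only 8 of the 16 bytes it reserves; storing a pair with STP uses the slot fully.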

When working with the stack and function calls, four key elements are used: the stack pointer, the stack (the memory itself), the stack frame, and the frame pointer (register X29). The first two are explained above. The stack is built from multiple stack frames. A stack frame is an area of the stack that holds the data and register values to be preserved during one function call. Register X29 (FP, the frame pointer) holds the address, inside the current stack frame, at which the previous (pre-prologue) X29 value is saved, so the frames form a chain back to the caller. Pushing the stack means creating a new stack frame, and this is done by the function prologue.

When a function begins executing, it often needs to preserve specific registers and allocate space for local variables. This is called creating a stack frame. The frame holds all the data that must be restored when the function returns. A typical function prologue (the code that runs when entering a function) might look like this:
STP X29, X30, [SP, #-16]!
MOV X29, SP
This saves the previous frame pointer (X29) and link register (X30) on the stack, then updates the current frame pointer to the new value of SP. The link register holds the return address, so saving it ensures the program can return to the caller correctly after the function finishes.

When the function is done, it uses an epilogue to restore the saved registers and free the stack space:
LDP X29, X30, [SP], #16
RET
This restores the old frame pointer and link register, moves the stack pointer back up, and returns to the saved address.

A function may need more registers than the ones it is free to overwrite. In that case it uses the callee-saved registers, and it must back up each one it uses and restore it before returning. The first callee-saved register to be used is X19. If a function needs more, it works its way up the register list to X28, the final callee-saved register. The prologue will be like this example:
STP FP, LR, [SP, #-0x20]! @ Push (make) new Frame with size of 32 bytes
STP X19, X20, [SP, #0x10] @ Backup (save) X19 and X20 onto the Frame
MOV FP, SP @ Update fp to update the back chain

The epilogue for the above prologue would be like this example:
LDP X19, X20, [SP, #0x10] @ Restore old X19 and X20
LDP FP, LR, [SP], #0x20 @ Restore old FP, LR, and SP. Frame is popped (destroyed)
RET @ End function, return to Caller
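A function that keeps values alive across calls is where callee-saved registers pay off. The sketch below assumes the GNU assembler and a statically linked AArch64 Linux program; square and sum_squares are hypothetical functions used only for this illustration, and comments use the // marker.

```asm
// callee_saved.s - x19 and x20 survive calls to other functions
    .text
    .global _start

square:                             // returns x0 * x0
    mul     x0, x0, x0
    ret

sum_squares:                        // x0 = n, returns 1*1 + 2*2 + ... + n*n
    stp     x29, x30, [sp, #-32]!   // new 32-byte frame: FP and LR
    stp     x19, x20, [sp, #16]     // back up the callee-saved registers we use
    mov     x29, sp
    mov     x19, #0                 // x19 = accumulator
    mov     x20, x0                 // x20 = counter
    cbz     x20, 2f                 // n == 0: nothing to add
1:
    mov     x0, x20
    bl      square                  // may change x0..x18, but not x19/x20
    add     x19, x19, x0
    sub     x20, x20, #1
    cbnz    x20, 1b
2:
    mov     x0, x19                 // return value
    ldp     x19, x20, [sp, #16]     // restore the caller's x19 and x20
    ldp     x29, x30, [sp], #32     // restore FP and LR, pop the frame
    ret

_start:
    mov     x0, #4
    bl      sum_squares
    mov     x8, #93                 // Linux exit system call number
    svc     #0                      // exit status = 16 + 9 + 4 + 1 = 30
```

Keeping the accumulator and the counter in X19 and X20 avoids reloading them from the stack after every call; the price is one STP in the prologue and one LDP in the epilogue.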

Stack Pointer in Exception Levels

AArch64 supports multiple exception levels (EL0 to EL3), and each level has its own stack pointer. Code running at EL1 or above can select between two stack pointers through the SPSel register:

SP_EL0 – stack pointer used when running code at EL0 (user mode)
SP_ELx – stack pointer used for each higher exception level (EL1, EL2, EL3)

When an exception or interrupt is taken, the CPU automatically switches to the stack pointer of the exception level that handles it. This prevents user-level code from corrupting kernel or hypervisor data and keeps the stacks for each privilege level separate. From EL1 or higher, the user-mode stack pointer can also be read or configured manually through its system register:
MRS X0, SP_EL0 @ Read user-mode stack pointer
MSR SP_EL0, X1 @ Write a new value to it

Using the Stack for Parameter Passing

In AArch64, the first eight function arguments are passed in registers X0 to X7. If there are more than eight arguments, the rest are passed on the stack. The caller lowers SP and stores the extra arguments in the space it has just allocated before executing the BL (branch with link) instruction, so on entry to the callee the first stack argument is at [SP]. The callee can access these arguments either through the frame pointer or via LDR instructions relative to SP. Example:
STR X8, [SP, #-16]!
BL long_function
ADD SP, SP, #16
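This pushes the ninth argument (held in X8 in this example) onto the stack before calling long_function, which can then read it back. After the call returns, ADD SP, SP, #16 releases the stack space again.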
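Both sides of a nine-argument call can be sketched in one small program. It assumes the GNU assembler and a statically linked AArch64 Linux program; sum9 is a hypothetical function, X9 is used as a scratch register for the ninth value, and comments use the // marker.

```asm
// nine_args.s - eight arguments in registers, the ninth on the stack
    .text
    .global _start

sum9:                           // returns the sum of all nine arguments
    add     x0, x0, x1
    add     x0, x0, x2
    add     x0, x0, x3
    add     x0, x0, x4
    add     x0, x0, x5
    add     x0, x0, x6
    add     x0, x0, x7
    ldr     x9, [sp]            // ninth argument: first stack slot on entry
    add     x0, x0, x9
    ret

_start:
    mov     x0, #1
    mov     x1, #2
    mov     x2, #3
    mov     x3, #4
    mov     x4, #5
    mov     x5, #6
    mov     x6, #7
    mov     x7, #8
    mov     x9, #9
    str     x9, [sp, #-16]!     // push the ninth argument, keep SP 16-byte aligned
    bl      sum9
    add     sp, sp, #16         // caller removes the stack argument
    mov     x8, #93             // Linux exit system call number
    svc     #0                  // exit status = 1+2+...+9 = 45
```

Because sum9 is a leaf function that does not move SP, the ninth argument is still at [SP] when it is read. A function with its own frame would read it at a positive offset from X29 instead.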

Interrupts

An interrupt is a special signal that may cause the processor to temporarily stop normal program execution to handle an event that requires immediate attention. Interrupts are part of the previously described exception system. Processors handle interrupts in a similar way regardless of the architecture: the processor saves the current program state, that is, the program counter and the status register. AArch64 then switches to a privileged exception level and jumps to the exception vector, a specific address at which the processor finds the instructions that handle the current event. The code at the exception vector usually branches to a specific handler function.

In the ARMv8 documentation, the interrupt is treated as a subset of exceptions. An exception is any event that can force the CPU to stop normal code execution and start an exception handler. There are four types of exceptions.

Synchronous exception – exceptions of this type are always caused by the currently executed instruction. For example, a synchronous exception is generated when an STR instruction tries to store data at a memory location that does not exist or does not permit writes. Synchronous exceptions can also be used to create a “software interrupt”: a synchronous exception generated deliberately by the SVC instruction.

IRQ (Interrupt Request) – these are normal interrupts. They are always asynchronous, meaning they have nothing to do with the currently executing instruction. In contrast to synchronous exceptions, asynchronous exceptions are usually generated not by the processor itself but by external hardware.

FIQ (Fast Interrupt Request) – this type of exception is called “fast interrupts” and exists solely for prioritising exceptions. It is possible to configure some interrupts as “normal” and others as “fast”. Fast interrupts will be signalled first and will be handled by a separate exception handler.

SError (System Error) – like IRQ and FIQ, SError exceptions are asynchronous and are generated by external hardware. Unlike IRQ and FIQ, SError always indicates some error condition.

Each exception type needs its own handler, a special function that handles that particular event. Separate handlers should also be defined for each state from which an exception can be taken. If the code that handles exceptions runs at EL1, those states are defined as follows:

EL1t – the exception is taken from EL1 while the stack pointer is shared with EL0. This happens when the SPSel register holds the value 0.
EL1h – the exception is taken from EL1 while EL1 uses its dedicated stack pointer. This means that SPSel holds the value 1.
EL0_64 – the exception is taken from EL0 executing in 64-bit mode.
EL0_32 – the exception is taken from EL0 executing in 32-bit mode.

In total, 16 exception handlers must be defined (four exception types multiplied by four source states). The structure that holds the entry code for all of them is called the exception vector table, or just the vector table. The AArch64 Reference Manual has information about the vector table structure. Each exception vector has its own offset from the table base address, and consecutive vectors are 0x80 bytes apart.
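The layout of the sixteen vectors can be sketched as a bare-metal EL1 vector table. This is not a Linux user program: it only makes sense in code that runs at EL1. It assumes the GNU assembler with // comments; the ventry macro, unhandled_exception, and install_vectors are hypothetical names, and irq_el1_handler is assumed to be defined elsewhere, for example as the handler in Listing 5.

```asm
// vectors.s - layout sketch of an EL1 exception vector table
    .macro  ventry target
    .balign 0x80                    // each vector occupies 0x80 (128) bytes
    b       \target                 // entry code: branch to the real handler
    .endm

    .text
    .balign 0x800                   // the table base must be 2 KB aligned
vector_table_el1:
    // EL1t: taken from EL1 while SP_EL0 is selected
    ventry  unhandled_exception     // 0x000 Synchronous
    ventry  unhandled_exception     // 0x080 IRQ
    ventry  unhandled_exception     // 0x100 FIQ
    ventry  unhandled_exception     // 0x180 SError
    // EL1h: taken from EL1 while SP_EL1 is selected
    ventry  unhandled_exception     // 0x200 Synchronous
    ventry  irq_el1_handler         // 0x280 IRQ
    ventry  unhandled_exception     // 0x300 FIQ
    ventry  unhandled_exception     // 0x380 SError
    // EL0_64: taken from EL0 running in 64-bit mode
    ventry  unhandled_exception     // 0x400 Synchronous
    ventry  irq_el1_handler         // 0x480 IRQ
    ventry  unhandled_exception     // 0x500 FIQ
    ventry  unhandled_exception     // 0x580 SError
    // EL0_32: taken from EL0 running in 32-bit mode
    ventry  unhandled_exception     // 0x600 Synchronous
    ventry  unhandled_exception     // 0x680 IRQ
    ventry  unhandled_exception     // 0x700 FIQ
    ventry  unhandled_exception     // 0x780 SError

unhandled_exception:                // park the core on anything unexpected
    wfe
    b       unhandled_exception

install_vectors:                    // call once at EL1 during start-up
    adr     x0, vector_table_el1
    msr     vbar_el1, x0            // VBAR_EL1 = base address of the table
    isb                             // make the new base visible before continuing
    ret
```

Each slot has room for up to 32 instructions, so very short handlers can live in the table itself; branching out, as done here, is the more common choice.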

There is no fixed number of interrupts available for the processor. The total number of available interrupts is defined by the Generic Interrupt Controller (GIC) implemented in the system. The Raspberry Pi 5 has a GIC-500 interrupt controller, and according to the ARM GIC architecture, it can have up to 1020 different interrupt IDs:

ID0..ID15 are used for Software Generated Interrupts (interrupts that software on one core raises, typically to signal another core)
ID16..ID31 are used for Private Peripheral Interrupts, which belong to a single core
The remaining IDs (ID32..ID1019) are for Shared Peripheral Interrupts

In practice, the Raspberry Pi 5 may have hundreds of interrupts from different sources in use: the BCM2712 SoC has many internal peripheral interrupts, and the RP1 chip (the one that handles the I/O lines and other on-board peripherals) delivers additional interrupts over the PCIe bus. The Linux OS adds its own software interrupts, and finally, Linux combines them through the GIC.

Interrupts can be disabled (masked) and enabled (unmasked) through the DAIF bits of the processor state. For example:
MSR DAIFCLR, #2 @ enable IRQ (interrupt request)
MSR DAIFCLR, #1 @ Enable FIQ
MSR DAIFSET, #2 @ disable IRQ
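Code that must not be interrupted usually restores the previous mask state instead of enabling IRQs unconditionally, so that it also works when IRQs were already masked. The fragment below is a sketch for privileged code (EL1 or higher), not a Linux user program; irq_save and irq_restore are hypothetical helper names, and it assumes the GNU assembler with // comments.

```asm
// irq_mask.s - mask IRQs for a critical section and put the old state back
    .text

// irq_save: masks IRQs and returns the previous DAIF value in x0
irq_save:
    mrs     x0, daif            // read the current mask bits
    msr     daifset, #2         // set the I bit: IRQs are now masked
    ret

// irq_restore: x0 = value returned earlier by irq_save
irq_restore:
    msr     daif, x0            // restore the mask bits exactly as they were
    ret
```

A caller would keep the value returned by irq_save in a register or on the stack, run the critical section, and then pass the value to irq_restore.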

The Stack Pointer and Interrupt Handling

When an interrupt or exception occurs, the processor automatically saves a minimal state. It then switches to the stack pointer associated with the exception level that handles the exception, so the interrupt handler can safely use the stack at that level without overwriting user data. For example, when an IRQ arrives while user code runs at EL0 and is taken to EL1, the CPU switches from the user’s stack (SP_EL0) to the kernel’s stack (SP_EL1). This change is invisible to user code and helps isolate privilege levels. Inside an interrupt handler, the code must save and restore any registers it modifies. A minimal handler might look like this:
irq_handler: @ the label for the interrupt handler
STP X0, X1, [SP, #-16]!
@ Handle the interrupt (event)
LDP X0, X1, [SP], #16
ERET @ return from interrupt (exception) handler

Here, the stack pointer ensures the handler has a private area to store data safely, even if multiple interrupts occur.

Listing 5: Simple examples of interrupt handlers

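irq_el1_handler:
    @ Save registers (X30 too, because BL overwrites it)
    STP X0, X1, [SP, #-16]!
    STP X2, X3, [SP, #-16]!
    STP X19, X30, [SP, #-16]!

    @ Acknowledge interrupt (example for GIC)
    MRS X19, ICC_IAR1_EL1      @ Read interrupt ID into a callee-saved register
    CMP X19, #1020             @ IDs 1020..1023 are special (1023 = spurious)
    B.HS irq_done

    @ Handle interrupt (custom code here)
    @ A real handler must also save any of X4..X18 that the called code changes
    MOV X0, X19                @ Pass the interrupt ID as the first argument
    BL handle_device_irq

    @ Signal end of interrupt
    MSR ICC_EOIR1_EL1, X19

irq_done:
    @ Restore registers
    LDP X19, X30, [SP], #16
    LDP X2, X3, [SP], #16
    LDP X0, X1, [SP], #16
    ERET                       @ Return from exception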
