Procedures and Functions Call Standards

At the machine level, assembler makes no distinction between procedures and functions. The difference lies in purpose and structure: functions usually return a value and are often called from multiple places, whereas procedures tend to focus on a specific task and typically do not return a value. This section uses the term function, since it is common in other programming languages as well. A function is a labelled block of code that performs a specific task and can be reused from different parts of a program. Functions follow a consistent structure: setting up a stack frame, saving the necessary registers, performing the task, and returning control to the caller.

First, we need to get familiar with branch instructions. Branching is how a processor handles decision-making and control flow. A branch jumps to another location in the code, either conditionally or unconditionally, and the target is usually marked with a unique label. Branches let the program repeat specific actions, skip parts of the code, or handle different tasks based on comparisons.

B exit_loop @ unconditional branch to exit_loop
This instruction forces the CPU to jump directly to exit_loop, and execution continues with the first instruction after the exit_loop label. It does not check any flags in the status register. The instruction has a range restriction: the target label must lie within ±128 MB of the current PC address, that is, of the branch instruction itself. If the program is small, for example 2 MB of instruction code, this restriction can be ignored. Otherwise it must be taken into account, but there is another instruction that has no such restriction.
ADR X0, exit_loop @ load exit_loop address into X0 register
BR X0 @ unconditional branch to address stored in X0 register
The register must hold the address of the target location. The address can be loaded in many ways; in this example, the ADR instruction is used. Note that ADR itself only reaches labels within ±1 MB of the instruction, so the address of a more distant target has to be loaded another way.
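When the target lies beyond the ±1 MB reach of ADR, the address can be built in other ways. The sketch below assumes GNU assembler syntax for an ELF target (where AArch64 comments start with //) and a hypothetical label far_target. It shows two common alternatives, not one sequence: an ADRP/ADD pair and a literal-pool load.

```asm
    // Option 1: ADRP + ADD, reaches targets within ±4 GB of the PC
    ADRP X0, far_target              // X0 = address of the 4 KB page containing far_target
    ADD  X0, X0, :lo12:far_target    // add the offset of far_target within that page
    BR   X0                          // unconditional branch to the address in X0

    // Option 2: load the full 64-bit address from a literal pool
    LDR  X1, =far_target             // the assembler stores the address in a literal pool
    BR   X1                          // unconditional branch to the address in X1
```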

Conditional Branches

Conditional branches rely on the status flags in the status register. These flags are:

N - Negative
C - Carry
V - Overflow
Z - Zero

Example code:
SUBS X0, X1, #3 @ subtract and update status register
AND X1, X2, X3 @ Perform logical AND operation
B.EQ label @ conditional branch using flags set with SUBS instruction
As the comments point out, the logical AND instruction does not update the status register, so the branch condition relies on the result of the SUBS instruction. The flags always reflect the most recent instruction that updates them. AArch64 supports many branch conditions. Here are examples of conditions:
B.EQ label @ if Z==1 Equal
B.NE label @ if Z==0 Not equal
B.CS label @ if C==1 Carry set. Unsigned higher or same; for floating point: greater than, equal to, or unordered (identical to B.HS)
B.HS label @ if C==1 Identical to B.CS
B.CC label @ if C==0 Carry clear. Unsigned lower; for floating point: less than (identical to B.LO)
B.LO label @ if C==0 Identical to B.CC
B.MI label @ if N==1 Minus. The result is negative
B.PL label @ if N==0 Plus. The result is positive or zero
B.VS label @ if V==1 Signed overflow
B.VC label @ if V==0 No signed overflow
B.HI label @ if C==1 AND Z==0 Unsigned higher
B.LS label @ if C==0 OR Z==1 Unsigned lower or same
B.GE label @ if N==V Signed greater than or equal
B.LT label @ if N!=V Signed less than
B.GT label @ if Z==0 AND N==V Signed greater than
B.LE label @ if Z==1 OR N!=V Signed less than or equal
B.AL label @ branch always
B.NV label @ in AArch64 behaves exactly like B.AL (branch always)
These instructions check the condition flags set by a previous instruction that updates the status register, such as CMP or ANDS. Note that the AL and NV conditions are of little practical use. B.AL is normally replaced with the plain B instruction (Branch) without any condition, because the result is the same. Despite its name ("never"), the NV condition in AArch64 does not act as a NOP (No Operation): it behaves exactly like AL, and the encoding exists only so that every condition code has a valid disassembly. It is therefore not used in practice.

Several conditions check the same condition flag and differ only in name. These mnemonics exist only for code readability. The instruction set documentation also defines instruction aliases, not only condition aliases. ARM keeps both sets of mnemonics because, historically, ARM assembly used CS (Carry Set) and CC (Carry Clear) for arithmetic carry. The unsigned comparison names HS (Higher or Same) and LO (Lower) were added later so that comparisons read naturally. ARM assembly supports all of these mnemonics, and aliased mnemonics produce the same binary code. Instruction aliases work the same way: they improve readability while encoding to identical binary code. This also complicates reverse engineering, because the same binary code can be disassembled into any of the aliased forms.
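The practical difference between the unsigned conditions (HI, HS, LO, LS) and the signed ones (GT, GE, LT, LE) shows up when the same bit pattern is read both ways. A minimal sketch, assuming GNU assembler syntax (comments start with //) and the hypothetical labels signed_greater and unsigned_higher:

```asm
    MOV  X0, #-1              // X0 = 0xFFFFFFFFFFFFFFFF
    MOV  X1, #1               // X1 = 1
    CMP  X0, X1               // computes X0 - X1 and sets N=1, Z=0, C=1, V=0
    B.GT signed_greater       // not taken: as signed values, -1 < 1 (N != V)
    B.HI unsigned_higher      // taken: as unsigned values, X0 is larger (C==1 and Z==0)
```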

Example: Loop with Condition

loop_start:
    CMP X0, #0
    B.EQ loop_end
    SUB X0, X0, #1
    B loop_start
loop_end:
    NOP

This loop runs until X0 becomes zero.

Example: Compare and branch

CBZ X0, label     @ Branch if zero
CBNZ X0, label    @ Branch if not zero

These combine a test and a branch into one instruction. Useful for tight loops and fast decisions.
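For example, a countdown loop can end with a single CBNZ instead of a CMP and B.NE pair. A small sketch, assuming GNU assembler syntax (comments start with //); the label sum_loop and the register choice are illustrative:

```asm
    MOV  X0, #5               // loop counter
    MOV  X1, #0               // running sum
sum_loop:
    ADD  X1, X1, X0           // sum += counter
    SUB  X0, X0, #1           // counter -= 1
    CBNZ X0, sum_loop         // repeat while the counter is not zero
    // here X0 = 0 and X1 = 5 + 4 + 3 + 2 + 1 = 15
```

CBZ and CBNZ neither read nor change the condition flags, so flags set earlier remain valid after the loop.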

Example: Test and branch

TBZ X1, #3, label  @ Branch if bit 3 is zero
TBNZ X1, #3, label @ Branch if bit 3 is not zero

These check a single bit in a register and branch based on its value. Suitable for flag or bit checks.
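A typical use is testing the lowest bit to tell odd values from even ones. The sketch assumes GNU assembler syntax (comments start with //); the label is_even is hypothetical:

```asm
    TBZ  X0, #0, is_even      // bit 0 clear: X0 is even, skip the adjustment
    ADD  X0, X0, #1           // bit 0 set: X0 is odd, round it up to the next even value
is_even:
    // X0 holds an even value at this point
```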

Note that these branch instructions do not update the link register, so they are not suitable for calling functions or procedures: the callee would have no return address.

The branch with a link is one of the instructions that updates the link register. The following example uses the BL instruction to call a function add_two.

add_two:
    ADD X0, X0, X1    @ X0 = X0 + X1
    RET
	
MOV X0, #5	@ write value 5 into X0 register
MOV X1, #3	@ write value 3 into X1 register
BL add_two	@ branch to function "add_two" and store the return address (PC+4) into the link register

BL branches to the function and updates the link register at the same time. To return to the instruction that follows the BL, the function must end with a RET instruction, which branches to the address held in the link register. Functions in assembly rely on the same calling conventions used in higher-level languages. The AArch64 calling convention specifies how arguments are passed, how return values are handled, and which registers must be preserved across function calls.
In AArch64, the link register (LR) is register X30. Its primary purpose is to hold the address of the instruction that the program must return to after a function call. The processor sets it automatically when executing a branch instruction such as BL, which stands for branch with link. This allows the program to return to the correct place after a function finishes.
The link register helps avoid extra memory access: the return address is kept in a register rather than pushed to the stack, which makes function calls faster. A function begins with a unique label, which must not match any instruction mnemonic. If a function takes arguments, they are passed in registers X0 to X7. Passing arguments in registers limits their number and size. When more than eight 64-bit values must be passed to the called function, or when an array or a larger data structure must be passed, the stack must be used. If the function returns a value, the X0 register holds it; a function status code, for example, can be returned this way. If the function returns more than a single value, the results can be written to memory, for example to a stack area reserved by the caller.
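As an illustration of stack-passed arguments, the sketch below passes nine values to a hypothetical function sum9: the first eight in X0 to X7 and the ninth on the stack, where the callee finds it at [SP] on entry. GNU assembler syntax is assumed (comments start with //), and the names sum9 and call_sum9 are made up for the example.

```asm
sum9:                               // returns the sum of its nine arguments in X0
    LDR  X9, [SP]                   // ninth argument, placed on the stack by the caller
    ADD  X0, X0, X1
    ADD  X0, X0, X2
    ADD  X0, X0, X3
    ADD  X0, X0, X4
    ADD  X0, X0, X5
    ADD  X0, X0, X6
    ADD  X0, X0, X7
    ADD  X0, X0, X9                 // add the stack-passed argument last
    RET

call_sum9:
    STP  X29, X30, [SP, #-16]!      // save FP and LR, because BL overwrites LR
    MOV  X29, SP
    SUB  SP, SP, #16                // reserve a 16-byte-aligned area for stack arguments
    MOV  X9, #9
    STR  X9, [SP]                   // ninth argument goes to the lowest stack address
    MOV  X0, #1                     // arguments one to eight go into X0..X7
    MOV  X1, #2
    MOV  X2, #3
    MOV  X3, #4
    MOV  X4, #5
    MOV  X5, #6
    MOV  X6, #7
    MOV  X7, #8
    BL   sum9                       // X0 = 1 + 2 + ... + 9 = 45
    ADD  SP, SP, #16                // release the argument area
    LDP  X29, X30, [SP], #16        // restore FP and LR
    RET
```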
A typical function:
function_name: @ Function label or function name
STP X29, X30, [SP, #-16]! @ Prologue of function
MOV X29, SP @ Prologue of function
… @ Function body
LDP X29, X30, [SP], #16 @ Epilogue of function
RET @ Epilogue of function
This code saves the frame pointer (X29) and the link register (X30) on the stack and moves the stack pointer down by 16 bytes. After doing the work, it restores the saved values and returns. Saving X30 matters whenever a function calls another function, or when the return address must survive a longer section of code: a nested BL overwrites X30, so the original return address has to be kept on the stack. The STR and STP instructions store registers to stack memory. For example, STR X30, [SP, #-16]! stores register X30 (the link register) on the stack and updates the stack pointer. Later, the link register can be restored with LDR X30, [SP], #16. This pattern is typical at the very beginning and the very end of a function, and it may vary slightly. The STP and LDP instructions used in the typical function above store and load register pairs, so the frame pointer and the link register are handled by a single instruction. This ensures the program can return to the correct place even after nested function calls.
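The effect of a nested call can be sketched as follows. The hypothetical function add_two_twice calls add_two two times, so it must save X30 before the first BL. GNU assembler syntax is assumed (comments start with //).

```asm
add_two:
    ADD  X0, X0, X1                 // X0 = X0 + X1
    RET                             // leaf function: X30 is untouched, nothing to save

add_two_twice:
    STP  X29, X30, [SP, #-16]!      // save FP and LR; the BL below overwrites X30
    MOV  X29, SP
    BL   add_two                    // X0 = X0 + X1
    BL   add_two                    // X0 = X0 + X1 again (add_two happens to leave X1 intact)
    LDP  X29, X30, [SP], #16        // restore FP and the original return address
    RET                             // return to the caller of add_two_twice
```

The second call relies on add_two leaving X1 unchanged. That holds for this particular function, but the calling convention does not guarantee it for X0 to X7 in general.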

The Stack pointer

The stack pointer (SP) is a special register in AArch64 that always points to the top of the stack. The stack itself is a block of memory in which data is stored in sequential order, so data cannot be placed at arbitrary locations inside it. It temporarily holds data, return addresses, local variables, and saved registers during function calls. In AArch64, the stack grows downward: each time data is pushed onto the stack, the stack pointer value decreases and the top of the stack moves to a lower address. Each exception level has its own stack pointer. This allows the processor to handle function calls, interrupts, and exceptions safely without mixing data from different exception levels.

The stack is a Last In, First Out (LIFO) structure. The data pushed onto the stack last is the first data to be removed later. The stack pointer always indicates the current top of the stack.

On AArch64, the stack must always remain 16-byte-aligned.

This requirement is defined by the ABI (Application Binary Interface). The alignment ensures compatibility with SIMD and floating-point operations and avoids unaligned memory access errors. SIMD (NEON) instructions operate on the 128-bit (16-byte) registers Q0 to Q31. The processor is optimised for aligned load and store instructions, which makes memory operations faster and reduces data transfer cycles.
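In practice this means stack space is reserved in multiples of 16 bytes, even when less is needed. A minimal sketch, assuming GNU assembler syntax (comments start with //) and a function that needs 24 bytes for three local 64-bit values:

```asm
    SUB  SP, SP, #32          // 24 bytes needed, rounded up to the next multiple of 16
    STR  X0, [SP]             // local 1 at SP+0
    STR  X1, [SP, #8]         // local 2 at SP+8
    STR  X2, [SP, #16]        // local 3 at SP+16 (SP+24 to SP+31 is unused padding)
    // ... work with the locals ...
    LDR  X0, [SP]             // read local 1 back
    ADD  SP, SP, #32          // release the space; SP is back at its 16-byte-aligned value
```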

The stack pointer can be used like a general-purpose register in many arithmetic and memory operations, but with some restrictions. In particular, whenever SP is used as the base address of a load or store, it must be 16-byte aligned; with stack alignment checking enabled, a misaligned SP causes an alignment fault on that access.
STR X0, [SP, #-16]! @ stores the value in X0 at the address SP-16 and updates SP to that address
LDR X0, [SP], #16 @ loads a value from the top of the stack into X0 register and then increases SP by 16
Function prologues contain instructions that push data onto the stack, and epilogues contain instructions that pop data off the stack. Popped data is not erased, though: it stays in memory, and only the stack pointer moves, marking the location where new data can be written. The stack occupies physical memory, but it reserves only a small portion of the entire memory address space.
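The push and pop order can be seen with two register pairs: the pair pushed last must be popped first. A small sketch, assuming GNU assembler syntax (comments start with //):

```asm
    STP  X0, X1, [SP, #-16]!  // push X0 and X1; SP decreases by 16
    STP  X2, X3, [SP, #-16]!  // push X2 and X3; SP decreases by another 16
    // ... X0 to X3 can be overwritten here ...
    LDP  X2, X3, [SP], #16    // pop the pair that was pushed last
    LDP  X0, X1, [SP], #16    // pop the pair that was pushed first; SP has its original value
```

After the two LDP instructions the four values are still present in memory below SP until something overwrites them.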

When working with the stack and function calls, four key elements are used: the stack pointer, the stack (a region of memory), the stack frame, and the frame pointer (register X29). The stack pointer and the stack have already been explained. The stack pointer points to the stack, but the stack itself is built from multiple stack frames. A stack frame is an area of the stack that holds data and register values to be preserved during function calls. Register X29 (FP, the frame pointer) holds the memory address at which the previous (pre-prologue) X29 value is stored in the current stack frame. Pushing the stack means creating a new stack frame, and this is done by the function prologue.
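Because each frame stores the caller's X29 next to the return address, the frames form a linked list that a debugger can walk. A minimal sketch, assuming GNU assembler syntax (comments start with //), a function whose prologue is STP X29, X30, [SP, #-16]! followed by MOV X29, SP, and a caller that follows the same convention; the scratch registers X9 to X11 are an arbitrary choice:

```asm
    LDR  X9,  [X29]           // X9  = caller's frame pointer (the saved X29)
    LDR  X10, [X29, #8]       // X10 = return address into the caller (the saved X30)
    LDR  X11, [X9, #8]        // X11 = return address saved in the caller's own frame
```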

When a function begins executing, it often needs to preserve specific registers and allocate space for local variables. This is called creating a stack frame. The frame holds all the data that must be restored when the function returns. A typical function prologue (the code that runs when entering a function) might look like this:
STP X29, X30, [SP, #-16]!
MOV X29, SP
This saves the previous frame pointer (X29) and link register (X30) on the stack, then updates the current frame pointer to the new value of SP. The link register holds the return address, so saving it ensures the program can return to the caller correctly after the function finishes.

When the function is done, it uses an epilogue to restore the saved registers and free the stack space:
LDP X29, X30, [SP], #16
RET
This restores the old frame pointer and link register, moves the stack pointer back up, and returns to the saved address.

A function may need to back up registers to free them for its own work. In that case, the callee-saved registers are the ones to save. The first one to be used is X19. If a function needs more, it works its way up the register list to X28, the final callee-saved register. The prologue then looks like this example:
STP FP, LR, [SP, #-0x20]! @ Push (make) new Frame with size of 32 bytes
STP X19, X20, [SP, #0x10] @ Backup (save) X19 and X20 onto the Frame
MOV FP, SP @ Update fp to update the back chain

The epilogue for the above prologue would be like this example:
LDP X19, X20, [SP, #0x10] @ Restore old X19 and X20
LDP FP, LR, [SP], #0x20 @ Restore old FP, LR, and SP. Frame is popped (destroyed)
RET @ End function, return to Caller
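
Callee-saved registers are useful for values that must survive a nested call. The sketch below, in GNU assembler syntax (comments start with //), computes a*a + b*b with a hypothetical helper named square; the function names and the use of X19 and X20 are illustrative.

```asm
square:
    MUL  X0, X0, X0                 // X0 = X0 * X0
    RET

sum_of_squares:                     // X0 = a, X1 = b; returns a*a + b*b in X0
    STP  X29, X30, [SP, #-0x20]!    // push a 32-byte frame, save FP and LR
    STP  X19, X20, [SP, #0x10]      // save the callee-saved registers used below
    MOV  X29, SP                    // set up the frame pointer
    MOV  X19, X1                    // keep b in X19; a called function may change X1
    BL   square                     // X0 = a*a
    MOV  X20, X0                    // keep a*a in X20 across the next call
    MOV  X0, X19                    // pass b as the argument
    BL   square                     // X0 = b*b
    ADD  X0, X0, X20                // X0 = b*b + a*a
    LDP  X19, X20, [SP, #0x10]      // restore X19 and X20
    LDP  X29, X30, [SP], #0x20      // restore FP and LR, pop the frame
    RET
```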

Stack Pointer in Exception Levels

en/multiasm/paarm/chapter_5_7.1764843563.txt.gz · Last modified: 2025/12/04 10:19 by eriks.klavins
