
--- en:multiasm:paarm:chapter_5_2 [2025/05/29 08:46] – eriks.klavins
+++ en:multiasm:paarm:chapter_5_2 [2025/12/02 21:31] (current) – eriks.klavins
@@ Line 1: / Line 1: @@
 ====== ARM Assembly Language Specifics ======
-CODE FORMAT and IMAGES MISSING
-Before starting programming, it's good to know how to add comments to our newly created code and the syntax specifics. This mainly depends on the selected compiler. All code examples provided will be created for the GNU ARM ASM. { https://sourceware.org/binutils/docs-2.26/as/}
+Before starting to program, it is good to know how to add comments to the code and what the syntax specifics are. Both mainly depend on the selected assembler. All code examples provided are written in GNU ARM ASM [[https://sourceware.org/binutils/docs-2.26/as/]].
-The symbol ‘@’ is used to create a comment in the code till the end of the exact line. Some other assembly languages use ‘;’ to comment, but for ARM assembly language, the ‘;’ character indicates a new line to separate statements. The suggestion is to avoid the use of this ‘;’ character; there will be no such statements that would be divided into separate code lines. Multiline comments are created in the same way as in C/C++ programming languages by use of ‘/*’ and ‘*/’ character combinations.
+In GNU assembly for 32-bit ARM code (A32 and T32), the symbol ‘@’ starts a comment that runs to the end of the line. For A64 code, the GNU assembler uses ‘//’ for the same purpose. Some other assembly languages use ‘;’ to start a comment. In GNU ARM assembly, however, ‘;’ separates statements in the same way as a new line. It is best to avoid the ‘;’ character, because each statement in the examples is written on its own line. Multiline comments are created in the same way as in the C/C++ programming languages, using the ‘/*’ and ‘*/’ character combinations. Comment and syntax rules differ between tools, so it is better to read the manual of each assembler and toolchain used to write and build the code.
-Now, let's compare the Cortex-M and Cortex-A series. The ARMv8-A processor series supports three instruction sets: A32, T32, and A64. A32 and T32 are the Arm and Thumb instruction sets, respectively. These instruction sets are used to execute instructions in the AArch32 Execution state. The A64 instruction set is used when executing in the AArch64 Execution state, and there are no 64-bit wide instructions; it does not refer to the size of the instructions in the memory.
-ARM Cortex-M processors based on ARMv7 can operate with two instruction types, Thumb and ARM instruction types. The Thumb instructions are 16-bit wide and have some restrictions, like access to the registers. Many microcontrollers are currently based on the ARMv7 processor, but new ones based on ARMv8 and ARMv8.1 are already available. The ARMv8-M series processors work only with Thumb instructions, and there is no ARM state to execute ARM32 instructions, so these processors do not support the A32 instruction set. The feature is removed, maybe because there was no need to use the ARM state – a lot of compilers use the Thumb instruction set.
+Now, let's compare the Cortex-M and Cortex-A series. The ARMv8-A processor series supports three instruction sets: A32, T32, and A64. A32 and T32 are the Arm and Thumb instruction sets, respectively, and they are executed in the AArch32 Execution state. The A64 instruction set is used in the AArch64 Execution state. Despite its name, A64 has no 64-bit-wide instructions. Every A64 instruction is 32 bits wide in memory, and ‘64’ refers to the 64-bit execution state, with its 64-bit registers and addresses.
-Looking at the ARMv8-A processors, the instruction sets make some difference. AArch32 execution state can execute programs designed for ARMv7 processors. This means many A32 or T32 instructions on ARMv8 are identical to the ARMv7 ARM or THUMB instructions, respectively. On the ARMv7 processors, the THUMB instruction set has some restrictions, like reduced access to general-purpose registers, whereas the ARM instruction set has all the registers accessible. Similar access restrictions are on the ARMv8 processors.
+ARMv7 processors of the Cortex-A and Cortex-R series can execute two instruction sets: ARM and Thumb. Thumb instructions were originally 16 bits wide and have some restrictions, such as limited access to registers. Thumb-2 added 32-bit Thumb instructions that remove many of these restrictions. Many microcontrollers are currently based on ARMv7-M processors, but newer ones based on ARMv8-M and ARMv8.1-M are already available. Cortex-M processors, whether based on ARMv7-M or ARMv8-M, execute only Thumb instructions. They have no ARM state to execute 32-bit ARM instructions, so they do not support the A32 instruction set. The ARM state was probably left out because microcontrollers have little need for it, and compilers generate Thumb code for them anyway.
-Let's look at the machine code produced by the assembler instructions to determine why the exact instruction set has some restrictions on the ARMv8 processor. We will take just one instruction, one of the most common, the MOV instruction. More than one machine code for the MOV instruction is produced because of the different addressing modes. We will take just one instruction, which copies a value stored in one register to another.
-**A64**:
+Looking at the ARMv8-A processors, the choice of instruction set makes a difference. The AArch32 execution state can execute programs designed for ARMv7 processors. This means that many A32 and T32 instructions on ARMv8 are identical to the ARMv7 ARM and Thumb instructions, respectively. On ARMv7 processors, the Thumb instruction set has some restrictions, such as reduced access to the general-purpose registers, whereas the ARM instruction set provides access to all of them. Similar restrictions apply to the T32 instruction set on ARMv8 processors.
-	The machine code for the MOV instruction is given in the figure above. The bit values presented are fixed for proper operation identification. The ‘sf’ bit identifies data encoding variant 32-bit (sf=0) or 64-bit(sf=1). [ISA_A64 p.646]
+Let's look at the machine code generated by the assembler to see why each instruction set imposes its own restrictions on the ARMv8 processor. We will take one of the most common instructions: MOV. The assembler generates several machine code variants for MOV because the instruction has different addressing modes. We will look only at the variant that copies a value from one register to another. All information on the machine code is available on the ARM homepage: for A64, the document number is DDI0602.
+**A64**
+The machine code for the MOV instruction is given in the figure below. The fixed bit values identify the operation. The ‘sf’ bit selects the data encoding variant: 32-bit (sf=0) or 64-bit (sf=1). Apart from this bit, the 32-bit and 64-bit forms of the instruction have the same binary code.
+{{:en:multiasm:paarm:mov64_32_bitform.jpg|}}
 The ‘Rm’ and the ‘Rd’ bit fields hold the number of the source register, from which the data is copied, and the number of the destination register. Each field is 5 bits wide, so 32 register numbers can be encoded: X0–X30 plus number 31, which this instruction interprets as the zero register. The ‘sf’ bit only identifies the number of bits copied between registers: 32 or 64. MOV (register) is an alias of the ORR (shifted register) instruction with the zero register as its first operand. The ‘opc’ bitfield therefore selects the logical operation (ORR). When set, the ‘N’ bit inverts the second operand (as in ORN); for MOV it is 0. In some other instructions, such as the logical operations with an immediate value, the ‘N’ bit is used together with a 6-bit immediate field (‘imms’) to encode the operand.
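The following Python sketch packs the fields described above into a 32-bit word, which makes the bit layout concrete. It assumes the ORR (shifted register) layout, of which MOV (register) is an alias, with ‘opc’ = 01, shift = 00, ‘N’ = 0, ‘imm6’ = 0 and ‘Rn’ = 11111. The function name is hypothetical. This is an illustration, not a full encoder, and DDI0602 remains the authoritative source for the field values.

```python
# Sketch of the A64 MOV (register) encoding, modelled as its alias
# ORR Rd, ZR, Rm: sf | opc | 01010 | shift | N | Rm | imm6 | Rn | Rd


def encode_a64_mov_reg(rd: int, rm: int, sf: int) -> int:
    """Return the 32-bit machine code for MOV Xd, Xm (sf=1) or MOV Wd, Wm (sf=0)."""
    if not (0 <= rd <= 31 and 0 <= rm <= 31):
        raise ValueError("A64 register fields are 5 bits wide (0..31)")
    if sf not in (0, 1):
        raise ValueError("sf must be 0 (32-bit) or 1 (64-bit)")
    opc = 0b01    # logical operation: ORR
    shift = 0b00  # shift type applied to Rm: LSL
    n = 0         # N = 1 would invert Rm (ORN)
    imm6 = 0      # shift amount: none
    rn = 0b11111  # first operand: register 31, the zero register here
    return (
        (sf << 31)
        | (opc << 29)
        | (0b01010 << 24)  # fixed bits of the logical (shifted register) group
        | (shift << 22)
        | (n << 21)
        | (rm << 16)
        | (imm6 << 10)
        | (rn << 5)
        | rd
    )


print(hex(encode_a64_mov_reg(0, 1, sf=1)))  # MOV X0, X1 -> 0xaa0103e0
print(hex(encode_a64_mov_reg(0, 1, sf=0)))  # MOV W0, W1 -> 0x2a0103e0
```

The two results differ only in bit 31, which is the ‘sf’ bit.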
-**A32**:
-	The figure above shows the same instruction, but the register address is smaller than for the instruction in A64. The bitfields ‘Rd’ and ‘Rm’ are 4-bit wide, so only 16 CPU registers can be addressed using this instruction in A32.
-Other bitfields, like ‘cond’, are also used for conditional instruction execution. The ‘S’ bit identifies whether the status register must be updated. The ’stype’ bitfield is used for the type of shift to be applied to the second source register, and finally, the ‘imm5’ bitfield is used to identify the shift amount 0 to 31.
-**T32**: The Thumb instructions have multiple machine codes for this one operation
-T1 THUMB instruction D bit and Rd fields together identify the destination register. The source and destination registers can now be addressed through only four bits: only 16 general-purpose registers are accessible. A smaller number of registers can be accessed in the following machine code.
-The OP bitfield identifies the shift type, and imm5 identifies its amount. The result Rd will be shifted by imm5 bits of the Rm register value. Notice that only three bits are used to address the general-purpose registers – only eight registers are accessible.
-Finally, the last machine code for this instruction is 16 16-bit wide instruction, but still, the Rd and the Rm fields are four bits wide. This instruction has more shift operations available than the previous one, but instead of one imm5 field, which identifies the shift amount, it is divided into two parts: imm3 and imm2. Both parts are combined for the same purpose: to identify the shift amount.
-Different machine codes for the T32 instructions give the ability to choose the most suitable one, but the code must be consistent with one machine code type. Switching between machine code types in the processor is still possible, but compiling such code with multiple machine codes will be even more complicated than learning assembler.
-Summarising these instruction sets, the A32 was best for ARMv7 processors, with only 16 general-purpose registers available. For the ARMv8, there are 31 registers to be addressed, forcing ARM to introduce us to the A64 instruction set, where 32 registers can be addressed. This is why use of the A64 instruction set in the following sections.
-===== Instruction options =====
-All assembly languages use similar mnemonics for arithmetic operations, although some may require additional suffixes to select instruction options. A32 instructions can take condition suffixes (such as EQ or NE) that make them execute conditionally. The four most significant bits of many A32 instructions (the ‘cond’ field) provide this ability. Unfortunately, A64 has no such option. Instead, it has special conditional instructions, which are described in the following subsection.
-In the previous section, we looked at a straightforward instruction and its exact machine code. Examining the machine code of each instruction is a perfect way to learn all its options and restrictions. To help you read and understand the instruction set documentation, here is another example: the ADD instruction in the A64 instruction set.
-The ADD instruction: let's first look at the assembler instruction, which adds two registers together and stores the result in a third register.
-ADD X0, X1, X2       // X0 = X1 + X2
-To determine the possible options for this instruction, we need to look at the instruction set documentation. There, we can find three main forms of the ADD instruction. In addition, as with other data-processing instructions, the ‘S’ suffix can be added to update the condition flags (NZCV) in the processor status.
-1. The ADD and ADDS instructions with an extended register:
-ADD X3, X4, W5, UXTW       // X3 = X4 + zero-extended W5
-ADDS X3, X4, W5, UXTW      // X3 = X4 + zero-extended W5, and update the flags
-The machine code representation of this assembler instruction looks like this:
-	ADDS X3, X4, W5, UXTW
-Rd = X3  // destination register, where the result is stored
-Rn = X4  // first source operand
-Rm = W5  // second source operand, which is extended to 64 bits
-We already know that the ‘sf’ bit identifies the length of the data (32 or 64 bits). The main difference between these two instructions is the ‘S’ bit, which is also reflected in the instruction name. The ‘S’ bit tells the processor to update the status bits (condition flags) after the instruction executes. These flags are crucial for conditional operations. Bit 30 (‘op’) and the ‘opt’ bits are fixed for this instruction. The three ‘option’ bits (bits 13 to 15) select how the second source operand (‘Rm’) is extended. This is handy when the source operands differ in length, for example when the first operand is 16 bits wide and the second is 8 bits wide. The second operand must then be extended so that both operands have the same width.
-The three ‘option’ bits give eight different ways to extend the second source operand, and the table below explains them. Only the options themselves are listed, because their bit values are not needed to learn the assembler.
-^ Extension ^ Meaning ^
-| UXTB/SXTB | Unsigned/signed byte (8-bit) is extended to the operand size (32 or 64 bits) |
-| UXTH/SXTH | Unsigned/signed halfword (16-bit) is extended to the operand size (32 or 64 bits) |
-| UXTW/SXTW | Unsigned/signed word (32-bit) is extended to a double word (64-bit) |
-| UXTX/SXTX or LSL | Double word (64-bit) is used as is, since it already has the full width; useful mainly together with a shift amount |
-For the 64-bit form, LSL is the preferred name for UXTX when the ‘Rd’ or ‘Rn’ operand is register number 31 (‘11111’), which in this instruction means the stack pointer (SP). In all other cases, UXTX is used. For the 32-bit form, the same applies to UXTW. The shift amount in the ‘imm3’ field must be 0 to 4, and larger values are reserved.
-In conclusion, this instruction type is handy when the operands have different lengths, but that is not all. After extension, the second operand can be shifted left by 0 to 4 bits, which multiplies it by 1, 2, 4, 8 or 16. This works for both the 32-bit (Wn) and the 64-bit (Xn) forms. The signed options SXTB/H/W/X are used when the second operand can hold negative integers.
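The extend-then-shift rule can be modelled in a few lines of Python. This sketch covers only the 64-bit form. It assumes that the extension keeps the selected low bits of ‘Rm’, sign- or zero-extends them to 64 bits, and then shifts the result left by 0 to 4. The names `extend_reg` and `add_extended` are hypothetical, and the flag update of ADDS is not modelled.

```python
# Sketch of how ADD (extended register) forms its second operand in the
# 64-bit form: extend Rm, shift it left by 0..4, then add it to Rn.

MASK64 = (1 << 64) - 1

# option -> (number of source bits used, sign-extend?)
EXTEND = {
    "UXTB": (8, False), "UXTH": (16, False), "UXTW": (32, False), "UXTX": (64, False),
    "SXTB": (8, True), "SXTH": (16, True), "SXTW": (32, True), "SXTX": (64, True),
}


def extend_reg(value: int, option: str, amount: int = 0) -> int:
    """Extend the low bits of value as the option says, then shift left."""
    if not 0 <= amount <= 4:
        raise ValueError("the shift amount must be 0..4")
    width, signed = EXTEND[option]
    v = value & ((1 << width) - 1)    # keep only the selected low bits
    if signed and v >> (width - 1):    # top bit set: treat as negative
        v -= 1 << width
    return (v << amount) & MASK64      # wrap to 64 bits like a register


def add_extended(xn: int, rm: int, option: str, amount: int = 0) -> int:
    """Model ADD Xd, Xn, Rm, <option> #amount (flags are not modelled)."""
    return (xn + extend_reg(rm, option, amount)) & MASK64


w5 = 0xFFFFFFFD  # W5 holds -3 as a 32-bit pattern
print(add_extended(100, w5, "SXTW", 2))         # 100 + (-3 * 4) = 88
print(add_extended(100, w5, "UXTW", 2))         # 100 + 4294967293 * 4 = 17179869272
print(hex(add_extended(0, 0x1234, "UXTB", 1)))  # lowest byte 0x34, times 2 -> 0x68
```

The first two calls show why the signed options matter. The same bit pattern in W5 gives a very different sum depending on whether it is sign- or zero-extended.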
-ADDS X3, X4, W5, SXTW #2
-    /* Sign-extend W5 to 64 bits, then shift it left by 2 (LSL), which multiplies it by 4;
-    the W5 register itself is not changed. Add the result to X4 and
-    store it in X3: X3 = X4 + (W5 * 4). ADDS also updates the condition flags. */
-ADD X3, X4, W5, UXTB #1
+**A32**
-    /* Take the lowest byte of W5 (W5[7:0])
-    Zero-extend it to 64 bits
+The figure below shows the same instruction, but its register fields are smaller than in the A64 instruction. The bitfields ‘Rd’ and ‘Rm’ are 4 bits wide, so only 16 CPU registers can be addressed using this instruction in A32.
-    Shift it left by 1 (multiply by 2)
-    Add the result to X4 and store it in X3 */
+{{:en:multiasm:paarm:mov32_bitform.svg|}}
+The other bitfields have the following purposes. ‘cond’ holds the condition for conditional execution. The ‘S’ bit identifies whether the status register must be updated. The ‘stype’ bitfield selects the type of shift applied to the second source register, and the ‘imm5’ bitfield gives the shift amount, 0 to 31.
-ADD X7, X8, W9, SXTH
+<table tab_label>
-    /* Take W9[15:0], sign-extend to 64 bits without shifting */
+<caption>Shift options</caption>
-    /* Add to X8 and store in X7 */
+^ ‘stype’ bit-field value ^ Shift type ^ Meaning ^
-    /* X7 = X8 + W9[15:0] */
+|0b00 |LSL |Logical Shift Left |
+|0b01 |LSR |Logical Shift Right |
+|0b10 |ASR |Arithmetic Shift Right |
+|0b11 |ROR |ROtate Right |
+</table>
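The A32 layout can be sketched the same way. This Python sketch assumes the MOV (register) layout cond | 0001101 | S | 0000 | Rd | imm5 | stype | 0 | Rm and uses the ‘stype’ values from the table above. The names are hypothetical, and the special cases where imm5 = 0 is combined with LSR, ASR or ROR are not handled.

```python
# Sketch of the A32 MOV (register) encoding:
# cond | 0001101 | S | 0000 | Rd | imm5 | stype | 0 | Rm

COND_AL = 0b1110  # condition "always"
STYPE = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}


def encode_a32_mov_reg(rd, rm, shift="LSL", amount=0, s=0, cond=COND_AL):
    """Return the 32-bit machine code for MOV{S}{cond} Rd, Rm{, shift #amount}."""
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("A32 register fields are 4 bits wide (R0..R15)")
    if not 0 <= amount <= 31:
        raise ValueError("imm5 holds a shift amount of 0..31")
    # imm5 = 0 with LSR, ASR or ROR encodes special cases
    # (shift by 32, or RRX) that this sketch does not model.
    return (
        (cond << 28)            # condition for conditional execution
        | (0b0001101 << 21)     # fixed bits: data-processing group, MOV opcode
        | (s << 20)             # S = 1: update the status flags
        | (rd << 12)            # destination register (bits 19..16 stay 0000)
        | (amount << 7)         # imm5: shift amount
        | (STYPE[shift] << 5)   # stype: shift type
        | rm                    # source register
    )


print(hex(encode_a32_mov_reg(0, 1)))            # MOV R0, R1         -> 0xe1a00001
print(hex(encode_a32_mov_reg(0, 1, "LSL", 2)))  # MOV R0, R1, LSL #2 -> 0xe1a00101
```

The 4-bit register fields explain the limit of 16 addressable registers in A32.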
-2. The ADD (ADDS) instructions with an immediate value:
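+Many instructions include options such as bit shifting, and there are also dedicated instructions for binary bit shifting. A shift changes the operand value. Shifting a register left by one bit multiplies the value by 2, as long as no significant bits are shifted out. A logical right shift divides an unsigned value by 2. An arithmetic right shift divides a signed value by 2, rounding towards minus infinity, because it copies the sign bit into the vacated bit. A rotation moves the bits shifted out on the right back in on the left.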
-The machine code shows the maximum value that can be added to the register. The 12-bit ‘imm12’ field restricts the immediate value to 0 to 4095. In addition, the ‘sh’ bit shifts the immediate value left (LSL) by 12 bits, so multiples of 4096 up to 4095 × 4096 can also be added.
-Examples:
-ADD W0, W1, #100    // Basic 32-bit ADD
-// Add 100 to W1 and store the result in W0; no shift is performed
-ADD X0, X1, #4095   // Basic 64-bit ADD
+{{ :en:multiasm:paarm:instroptlsl.svg |}}
-// Add 4095 to X1 and store the result in X0
+{{ :en:multiasm:paarm:instroptlsr.svg |}}
+{{ :en:multiasm:paarm:instroptasr_1.svg |}}
+{{ :en:multiasm:paarm:instroptasr_2.svg |}}
+{{ :en:multiasm:paarm:instroptror.svg |}}
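The four shift types pictured above can be reproduced on 32-bit values with this short Python sketch. It assumes 32-bit registers, as in A32. The helper names are hypothetical and simply mirror the mnemonics.

```python
# Sketch of the four shift types on 32-bit register values.
MASK32 = 0xFFFFFFFF


def lsl(x: int, n: int) -> int:
    """Logical Shift Left: zeros enter from the right, bits leave on the left."""
    return (x << n) & MASK32


def lsr(x: int, n: int) -> int:
    """Logical Shift Right: zeros enter from the left (unsigned divide by 2**n)."""
    return (x & MASK32) >> n


def asr(x: int, n: int) -> int:
    """Arithmetic Shift Right: copies of the sign bit enter from the left."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32  # Python's >> on negative ints is arithmetic


def ror(x: int, n: int) -> int:
    """Rotate Right: bits shifted out on the right re-enter on the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


print(lsl(5, 1), lsr(10, 1))    # 10 5
print(hex(asr(0xFFFFFFF8, 1)))  # -8 >> 1 = -4 -> 0xfffffffc
print(hex(lsr(0xFFFFFFF8, 1)))  # 0x7ffffffc
print(hex(ror(0x00000001, 1)))  # 0x80000000
```

The same bit pattern 0xFFFFFFF8 gives different results with ASR and LSR, because only ASR treats it as the signed value -8.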
-ADD X2, X3, #1, LSL #12 // 64-bit ADD with a shifted immediate (LSL #12)
+**T32**
-// Add 4096 to X3 (1 << 12 = 4096)
-// Store the result in X2
-ADD X4, SP, #256    // Using SP as the base register
+The Thumb instructions have multiple machine codes for this one operation.
-// Add 256 to SP and store the result in X4 (SP itself is unchanged); useful for frame setup or stack management
-ADD SP, SP, #64     // Writing the result to SP (stack pointer arithmetic)
+{{:en:multiasm:paarm:thumbt1.svg|}}
-// Add 64 to SP and write the result back to SP
+In the T1 Thumb encoding, the ‘D’ bit and the 3-bit ‘Rd’ field together identify the destination register. Both the source and destination registers are therefore addressed with 4 bits, so 16 general-purpose registers are accessible. The next machine code can access even fewer registers.
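This Python sketch shows how the ‘D’ bit and the 3-bit ‘Rd’ field combine into a 4-bit register number. It assumes the T1 layout 010001 10 | D | Rm | Rd. The function names are hypothetical.

```python
# Sketch of the 16-bit T1 encoding of MOV (register):
# 010001 10 | D | Rm (4 bits) | Rd (3 bits)


def encode_t1_mov(rd: int, rm: int) -> int:
    """Return the 16-bit halfword for MOV Rd, Rm in the T1 encoding."""
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("T1 can address R0..R15")
    d = rd >> 3  # bit 3 of the destination number goes into the separate D bit
    return (0b01000110 << 8) | (d << 7) | (rm << 3) | (rd & 0b111)


def t1_destination(halfword: int) -> int:
    """Rebuild the 4-bit destination register number from D and Rd."""
    return (((halfword >> 7) & 1) << 3) | (halfword & 0b111)


print(hex(encode_t1_mov(0, 1)))  # MOV R0, R1 -> 0x4608
print(hex(encode_t1_mov(8, 1)))  # MOV R8, R1 -> 0x4688
print(t1_destination(0x4688))    # 8
```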
-ADD W5, W6, #2, LSL #12 // 32-bit ADD with a shifted immediate
+{{:en:multiasm:paarm:thumbt2.svg|}}
-// Add 8192 to W6 and store the result in W5 (2 << 12 = 8192)
-ADDS X7, X8, #42   // ADDS (immediate): flag-setting
+The ‘OP’ bitfield specifies the shift type, and ‘imm5’ specifies the shift amount. The value of the Rm register is shifted by imm5 bits, and the result is written to Rd. Notice that each general-purpose register is addressed with only three bits, so only eight registers (R0–R7) are accessible.
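A Python sketch of this 16-bit layout, 000 | op | imm5 | Rm | Rd, makes the 3-bit register limit visible. The names are hypothetical. The sketch assumes that op = 0b11 belongs to other instructions and is therefore rejected. It also leaves out the rule that imm5 = 0 with LSR or ASR means a shift by 32. Outside an IT block, these 16-bit forms update the flags, which is why they are written MOVS and LSLS.

```python
# Sketch of the 16-bit T2 encoding of MOV (register) with an immediate shift:
# 000 | op (2 bits) | imm5 | Rm (3 bits) | Rd (3 bits)

T2_OP = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10}  # op = 0b11 belongs to other instructions


def encode_t2_shift(rd: int, rm: int, shift: str = "LSL", amount: int = 0) -> int:
    """Return the 16-bit halfword for MOVS Rd, Rm, <shift> #amount (low registers only)."""
    if not (0 <= rd <= 7 and 0 <= rm <= 7):
        raise ValueError("T2 register fields are 3 bits wide: only R0..R7")
    if not 0 <= amount <= 31:
        raise ValueError("imm5 holds 0..31")
    return (T2_OP[shift] << 11) | (amount << 6) | (rm << 3) | rd


print(hex(encode_t2_shift(0, 1, "LSL", 2)))  # LSLS R0, R1, #2 -> 0x88
print(hex(encode_t2_shift(0, 1)))            # MOVS R0, R1     -> 0x8
```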
-// Add 42 to X8, store the result in X7 and update the condition flags (NZCV)
+Finally, the last machine code for this instruction is a 32-bit-wide instruction made of two 16-bit halfwords, and its Rd and Rm fields are again 4 bits wide. This instruction provides more shift types than the previous one. Instead of a single imm5 field for the shift amount, it splits the amount into two fields, imm3 and imm2, which are combined to form the shift amount.
-ADDS X9, X10, #3, LSL #12  // ADDS with a shifted immediate
+{{:en:multiasm:paarm:thumbt3.svg|}}
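The split shift amount is easiest to see in code. This Python sketch assumes the T3 layout 11101010010 | S | 1111 for the first halfword and 0 | imm3 | Rd | imm2 | stype | Rm for the second. The names are hypothetical. SP and PC carry additional restrictions in this encoding (see the architecture reference manual), which the sketch does not check.

```python
# Sketch of the 32-bit T3 encoding of MOV (register), stored as two halfwords:
# first:  11101010010 | S | 1111
# second: 0 | imm3 | Rd (4 bits) | imm2 | stype | Rm (4 bits)

STYPE = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}


def encode_t3_mov(rd, rm, shift="LSL", amount=0, s=0):
    """Return (first, second) halfwords for MOV{S}.W Rd, Rm{, shift #amount}."""
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("T3 register fields are 4 bits wide")
    if not 0 <= amount <= 31:
        raise ValueError("imm3:imm2 holds 0..31")
    imm3, imm2 = amount >> 2, amount & 0b11  # split the 5-bit shift amount
    first = (0b11101010010 << 5) | (s << 4) | 0b1111
    second = (imm3 << 12) | (rd << 8) | (imm2 << 6) | (STYPE[shift] << 4) | rm
    return first, second


def t3_shift_amount(second: int) -> int:
    """Recombine imm3 and imm2 into the shift amount."""
    return (((second >> 12) & 0b111) << 2) | ((second >> 6) & 0b11)


print([hex(h) for h in encode_t3_mov(8, 1)])            # MOV.W R8, R1     -> ['0xea4f', '0x801']
print([hex(h) for h in encode_t3_mov(0, 1, "LSL", 5)])  # LSL.W R0, R1, #5 -> ['0xea4f', '0x1041']
```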
-// Add 12288 to X10 and store the result in X9 (3 << 12 = 12288)
-// Update the condition flags
-ADDS X11, SP, #512 // ADDS with SP as the base register
+Different machine codes for the T32 instructions allow the assembler to choose the most suitable one. A 16-bit encoding gives denser code, while a 32-bit encoding gives access to more registers and options. 16-bit and 32-bit encodings can be mixed in the same T32 code, and the GNU assembler normally selects the encoding itself. The ‘.N’ (narrow) and ‘.W’ (wide) qualifiers can be used to request a specific width.
-// Add 512 to SP, store the result in X11 and update the condition flags
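The ‘imm12’ and ‘sh’ restriction can be checked with a small Python sketch. It decides whether a constant fits the ADD (immediate) form and then packs it into the word sf | op | S | 100010 | sh | imm12 | Rn | Rd, assuming op = 0 for ADD. The function names are hypothetical, and only non-negative constants are considered.

```python
# Sketch: can a constant be used in ADD/ADDS (immediate), and how is it encoded?
# Layout: sf | op | S | 100010 | sh | imm12 | Rn | Rd   (op = 0 for ADD)


def split_add_immediate(value: int):
    """Return (imm12, sh) for #value, or None if the value cannot be encoded."""
    if 0 <= value <= 0xFFF:
        return value, 0            # fits in imm12 directly
    if value & 0xFFF == 0 and 0 < value >> 12 <= 0xFFF:
        return value >> 12, 1      # imm12 shifted left by 12 (LSL #12)
    return None


def encode_add_imm(rd: int, rn: int, value: int, sf: int = 1, s: int = 0) -> int:
    """Return the machine code for ADD{S} Rd, Rn, #value (register 31 as Rn is SP)."""
    parts = split_add_immediate(value)
    if parts is None:
        raise ValueError(f"{value} cannot be encoded as an ADD immediate")
    imm12, sh = parts
    return (sf << 31) | (s << 29) | (0b100010 << 23) | (sh << 22) | (imm12 << 10) | (rn << 5) | rd


print(split_add_immediate(4095))        # (4095, 0)
print(split_add_immediate(12288))       # (3, 1), as in ADDS X9, X10, #3, LSL #12
print(split_add_immediate(4097))        # None: neither form can hold 4097
print(hex(encode_add_imm(31, 31, 64)))  # ADD SP, SP, #64 -> 0x910103ff
```

A constant such as 4097 needs a different approach, for example loading it into a register first.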
-3. The ADD (ADDS) instructions with a shifted register:
+Summarising these instruction sets, A32 was best suited to ARMv7 processors, which have only 16 general-purpose registers. In the AArch64 state, ARMv8 provides 31 general-purpose registers (X0–X30). This forced ARM to introduce the A64 instruction set, whose 5-bit register fields can encode 32 register numbers. Number 31 selects either the stack pointer or the zero register, depending on the instruction. This is why all code examples in the following sections use the A64 instruction set.
-The final ADD form adds two registers, and the second one (‘Rm’) can be shifted before the addition. The shift type can be LSL (Logical Shift Left), LSR (Logical Shift Right) or ASR (Arithmetic Shift Right). The fourth shift type, ROR, is not available for ADD. The 6-bit ‘imm6’ field gives the number of bits by which ‘Rm’ is shifted before it is added to ‘Rn’: 0 to 63 for the 64-bit form and 0 to 31 for the 32-bit form.
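This Python sketch models the shifted-register form as Rd = Rn + (Rm shifted by imm6) for 32-bit or 64-bit registers. The function name is hypothetical. The sketch rejects ROR, reflecting the restriction described above, and does not model flags.

```python
# Sketch of ADD (shifted register): Rd = Rn + (Rm shifted by imm6).


def add_shifted(rn: int, rm: int, shift: str, amount: int, width: int = 64) -> int:
    """Model ADD Rd, Rn, Rm, <shift> #amount for 32- or 64-bit registers (no flags)."""
    mask = (1 << width) - 1
    if shift not in ("LSL", "LSR", "ASR"):
        raise ValueError("ADD accepts only LSL, LSR or ASR (no ROR)")
    if not 0 <= amount < width:
        raise ValueError(f"imm6 must be 0..{width - 1} for {width}-bit registers")
    rm &= mask
    if shift == "LSL":
        operand2 = rm << amount
    elif shift == "LSR":
        operand2 = rm >> amount
    else:  # ASR: interpret Rm as signed, so the sign bit is copied in
        signed = rm - (1 << width) if rm >> (width - 1) else rm
        operand2 = signed >> amount
    return (rn + operand2) & mask


print(add_shifted(1, 3, "LSL", 4))                          # 1 + 3 * 16 = 49
print(hex(add_shifted(0, 0xFFFFFFF0, "ASR", 2, width=32)))  # -16 >> 2 = -4 -> 0xfffffffc
print(hex(add_shifted(0, 0xFFFFFFF0, "LSR", 2, width=32)))  # 0x3ffffffc
```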
-Similar options are available for other arithmetic instructions, such as SUB, and for other instructions, such as LDR. The instruction set documentation gives the information needed to determine the options of each instruction and the restrictions on their use.

en/multiasm/paarm/chapter_5_2.1748508406.txt.gz · Last modified: 2025/05/29 08:46 by eriks.klavins