Data Types and Encoding

Higher-level programming languages define several data types. In some languages the programmer doesn’t have to declare the type explicitly, because the type of a value is assigned automatically during compilation or interpretation. The first division is between real numbers and integers. Integers are further divided into unsigned types, which hold only non-negative values, and signed types, which hold both positive and negative values. The final division is by size: the number of bytes needed to store the largest required value.
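The division by signedness and size can be sketched with C's fixed-width integer types. This is only an illustration: the function names bytes_needed_unsigned and bytes_needed_signed are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* The fixed-width types occupy exactly 1, 2, 4 and 8 bytes. */
static_assert(sizeof(int8_t) == 1, "int8_t must occupy one byte");
static_assert(sizeof(int16_t) == 2, "int16_t must occupy two bytes");
static_assert(sizeof(int32_t) == 4, "int32_t must occupy four bytes");
static_assert(sizeof(int64_t) == 8, "int64_t must occupy eight bytes");

/* Smallest size in bytes (1, 2, 4 or 8) that can hold an unsigned value. */
unsigned bytes_needed_unsigned(uint64_t value)
{
    if (value <= UINT8_MAX)  return 1;
    if (value <= UINT16_MAX) return 2;
    if (value <= UINT32_MAX) return 4;
    return 8;
}

/* Smallest size in bytes (1, 2, 4 or 8) that can hold a signed value. */
unsigned bytes_needed_signed(int64_t value)
{
    if (value >= INT8_MIN  && value <= INT8_MAX)  return 1;
    if (value >= INT16_MIN && value <= INT16_MAX) return 2;
    if (value >= INT32_MIN && value <= INT32_MAX) return 4;
    return 8;
}
```

For example, the value 255 fits in one unsigned byte, but the value 128 already needs two bytes when it is stored as a signed integer, because a signed byte covers only -128 to 127.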

Many instructions take a suffix that indicates the length of the data they operate on. The load/store instructions are used here as an example of handling multiple data lengths. Loading floating-point (real-number) values is described in the next chapter; all the following examples use integers. One 64-bit register can hold eight 8-bit values, four 16-bit values, or two 32-bit values. Keep in mind that the entire 64-bit register is affected even when it receives only an 8-bit or a 16-bit value; the content of the remaining bytes in the register depends on the instruction used.
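The way one 64-bit register value divides into smaller pieces can be modelled in C with shifts and narrowing casts. This is a sketch only: the names lane8, lane16 and lane32 are hypothetical, and index 0 is assumed to mean the least significant piece.

```c
#include <assert.h>
#include <stdint.h>

/* Return 8-bit piece `index` (0 = least significant) of a 64-bit value. */
uint8_t lane8(uint64_t reg, unsigned index)
{
    assert(index < 8);            /* eight 8-bit values fit in 64 bits */
    return (uint8_t)(reg >> (index * 8));
}

/* Return 16-bit piece `index` (0 = least significant) of a 64-bit value. */
uint16_t lane16(uint64_t reg, unsigned index)
{
    assert(index < 4);            /* four 16-bit values fit in 64 bits */
    return (uint16_t)(reg >> (index * 16));
}

/* Return 32-bit piece `index` (0 = least significant) of a 64-bit value. */
uint32_t lane32(uint64_t reg, unsigned index)
{
    assert(index < 2);            /* two 32-bit values fit in 64 bits */
    return (uint32_t)(reg >> (index * 32));
}
```

With the value 0x1122334455667788, lane8 with index 0 returns 0x88, lane16 with index 1 returns 0x5566, and lane32 with index 1 returns 0x11223344.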

LDR X0, [X1] // load X0 with the data located at the address held in X1

STR X1, [X2] // store the contents of X1 to the memory address held in X2

The LDR and STR instructions are the basic instructions that load or store an entire register from or to memory, so either 8 bytes or 4 bytes are transferred. The register size depends on the notation used: Xn (64-bit) or Wn (32-bit). To load a single byte, use the LDRB and LDRSB instructions. LDRB zero-extends the byte and accepts only a 32-bit (Wn) destination register; LDRSB sign-extends the byte and accepts either a Wn or an Xn destination.

LDRB W0, [X1]

LDRSB W0, [X1]

Similar rules apply to the 16-bit loads, LDRH and LDRSH. The zero-extending loads (LDRB, LDRH) accept only a Wn destination; a 64-bit form is not needed, because any write to a 32-bit register automatically clears the upper 32 bits of the corresponding 64-bit register. The sign-extending loads (LDRSB, LDRSH) accept a Wn or an Xn destination and extend the sign bit up to the width of the destination. To store a byte or half-word in memory, the data must be held in a 32-bit register; the STRB or STRH instruction then writes only the least significant byte or half-word.
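The effect of the byte loads and stores on the full 64-bit register can be modelled in C. The sketch below is not how the hardware is implemented. The function names are hypothetical, and each returned value stands for the complete 64-bit register after the load. Note that AArch64 accepts both a Wn and an Xn destination for the sign-extending loads, so both cases are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of LDRB Wt, [Xn]: the byte is zero-extended, upper bits are 0. */
uint64_t model_ldrb(const uint8_t *mem)
{
    assert(mem != NULL);
    return (uint64_t)mem[0];
}

/* Model of LDRSB Wt, [Xn]: sign-extend to 32 bits; the write to Wt
   then clears the upper 32 bits of the 64-bit register. */
uint64_t model_ldrsb_w(const uint8_t *mem)
{
    assert(mem != NULL);
    /* Assumes the usual two's complement conversion to int8_t. */
    int32_t w = (int8_t)mem[0];
    return (uint64_t)(uint32_t)w;
}

/* Model of LDRSB Xt, [Xn]: sign-extend to the full 64 bits. */
uint64_t model_ldrsb_x(const uint8_t *mem)
{
    assert(mem != NULL);
    int64_t x = (int8_t)mem[0];
    return (uint64_t)x;
}

/* Model of STRB Wt, [Xn]: only the least significant byte is written. */
void model_strb(uint8_t *mem, uint64_t reg)
{
    assert(mem != NULL);
    mem[0] = (uint8_t)reg;
}
```

For a memory byte of 0x80, the zero-extending model yields 0x80, the 32-bit signed model yields 0xFFFFFF80, and the 64-bit signed model yields 0xFFFFFFFFFFFFFF80. The half-word instructions behave the same way with 16 bits instead of 8.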

Big/little-endian

ARMv8.0 introduced a feature that is the primary focus of this section: control over endianness. The feature named “FEAT_MixedEnd” allows programmers to control the endianness of data accesses to memory, which means that an ARMv8 processor with this feature implements both little-endian and big-endian modes. Note that FEAT_MixedEnd is optional. Not all processor manufacturers implement it, but its presence can be verified by checking the BigEnd bit field of the ID_AA64MMFR0_EL1 register. If FEAT_MixedEnd is implemented, then the “FEAT_MixedEndEL0” feature is also implemented, which means that the endianness used at EL0 can be changed. On the Raspberry Pi 5 running AArch64, the higher exception levels control the endianness of the lower exception levels. For example, code running at EL1 can control the endianness of EL0. Note that if Linux is already running on the Raspberry Pi, the endianness of EL1 should not be changed, because the kernel runs at EL1 and, at the moment, no OS can switch its endianness at runtime.

Because ARMv8 supports both modes, it may be necessary to determine the endianness settings of EL0. The CPU's system registers hold all the information about the current endianness configuration. Whether the selected CPU supports mixed endianness is reported by ID_AA64MMFR0_EL1, the AArch64 Memory Model Feature Register 0.

Two bit fields in ID_AA64MMFR0_EL1 describe endianness support: BigEndEL0, bits [19:16], and BigEnd, bits [11:8]. BigEndEL0 is meaningful only when BigEnd reads 0x0. If the selected CPU doesn’t support mixed-endian operation at EL0, the BigEndEL0 field reads 0x0, and the manual notes that the SCTLR_EL1.E0E bit then has a fixed value. SCTLR_EL1 is the System Control Register for EL1, which controls the system, including the memory system, at EL1 and EL0. The E0E bit (bit 24) is zero by default. When it is zero, explicit data accesses at EL0 are little-endian; when it is one, explicit data accesses at EL0 are big-endian.
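Decoding these fields is plain bit extraction, sketched below in C. The sketch assumes the raw register values have already been obtained by privileged code (for example with an MRS instruction at EL1) and only decodes them. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [hi:lo] of a 64-bit register value. */
uint64_t bit_field(uint64_t reg, unsigned hi, unsigned lo)
{
    assert(hi < 64 && lo <= hi);
    uint64_t width = hi - lo + 1;
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (reg >> lo) & mask;
}

/* BigEnd field, bits [11:8] of ID_AA64MMFR0_EL1. */
uint64_t mmfr0_bigend(uint64_t id_aa64mmfr0)
{
    return bit_field(id_aa64mmfr0, 11, 8);
}

/* BigEndEL0 field, bits [19:16] of ID_AA64MMFR0_EL1. */
uint64_t mmfr0_bigend_el0(uint64_t id_aa64mmfr0)
{
    return bit_field(id_aa64mmfr0, 19, 16);
}

/* E0E, bit 24 of SCTLR_EL1: 0 = little-endian, 1 = big-endian
   explicit data accesses at EL0. */
int sctlr_el1_e0e(uint64_t sctlr_el1)
{
    return (int)bit_field(sctlr_el1, 24, 24);
}
```

As an illustration with made-up register values, 0x100 decodes to BigEnd = 1 and BigEndEL0 = 0, while an SCTLR_EL1 value with only bit 24 set decodes to E0E = 1.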

Other registers can also affect the system's endianness. Finally, it is worth noting that most software runs in little-endian mode: data and program code in memory use little-endian byte order. Network communication protocols (TCP/IP), on the other hand, mostly use big-endian byte order. Some file formats also store data in big-endian order, for example the JPEG image format; others, such as TIFF, can use either order and declare the one used in the file header.
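The difference between the byte order of the host and the big-endian order used on the network can be sketched in C. The helper names below are hypothetical. The shift-based store and load produce the same byte sequence on any host, which is one common way to avoid depending on the CPU's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);    /* look at the byte at the lowest address */
    return first == 1;
}

/* Write a 32-bit value in big-endian (network) byte order. */
void store_be32(uint8_t *out, uint32_t value)
{
    assert(out != NULL);
    out[0] = (uint8_t)(value >> 24);   /* most significant byte first */
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Read a 32-bit value stored in big-endian (network) byte order. */
uint32_t load_be32(const uint8_t *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Storing 0x11223344 with store_be32 always produces the bytes 0x11, 0x22, 0x33, 0x44 in increasing address order, whether the host is little-endian or big-endian.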