Data Types and Encoding

Fundamental data types are the data elements stored in memory as 8-bit (Byte), 16-bit (Word), 32-bit (Doubleword), 64-bit (Quadword) or 128-bit (Double quadword) values, as shown in figure 1. Many instructions process data of these types without any special interpretation; it is up to the programmer to decide how to interpret the inputs and results of an instruction. Data types longer than a single byte are stored in little-endian order: the least significant byte is stored at the lowest address, and this address is also the address of the data item as a whole.

Figure 1: Fundamental data types
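The little-endian layout described above can be checked with a short sketch. It uses Python's standard `struct` module only to emulate how a doubleword would be laid out in memory; the value `0x12345678` is an arbitrary example, not taken from the original text.

```python
import struct

value = 0x12345678                 # an arbitrary 32-bit doubleword
raw = struct.pack("<I", value)     # "<" = little-endian, "I" = 32-bit unsigned

# The byte at the lowest address (offset 0) is the least significant byte.
for offset, byte in enumerate(raw):
    print(f"address + {offset}: 0x{byte:02X}")

# Reading the same bytes back as a little-endian integer restores the value.
restored = int.from_bytes(raw, byteorder="little")
print(hex(restored))
```

The address of the whole doubleword is the address of the first byte listed, which holds the low-order byte.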

The data types used in x64 architecture processors are integers and floating point numbers. Integers are processed by the main CPU as single values (scalars). They can also be packed into vectors and processed with SIMD instructions, including MMX and part of the SSE and AVX instruction sets. Integers can be interpreted as unsigned or signed. They can also represent a pointer to a variable or to an address within the code. Scalar real numbers are processed by the FPU. SSE and AVX instructions support calculations on scalars or vectors composed of real values. All of these variants are stored within the fundamental data types. The rest of this chapter describes the scalar and packed integer and floating-point types.

Integers are numbers without a fractional part. In x64 architecture, it is possible to define data of different sizes, all of them based on bytes. A single byte is the smallest item of information that can be stored in memory; even if not all of its bits are used, nothing smaller than a byte can be stored. Two bytes form a Word, so in x64 architecture the Word data type means 16 bits. Two words form the Double Word data type (32 bits), and four words form the Quad Word data type (64 bits). With the large registers of modern processors, a few instructions can also use the Double Quad Word data type, containing 128 bits (sometimes called an Octal Word).

Integer data types can be one, two, four or eight bytes in length. Unsigned integers are encoded in natural binary code: the minimum value is 0 (all bits cleared), and the maximum value has bits “1” at all available positions. The x64 architecture supports the unsigned integer data types shown in table 1.

Table 1: Unsigned integer data types

Name	Number of bits	Maximum value	Minimum value (hex)	Maximum value (hex)
Byte	8	255	0x00	0xFF
Word	16	65535	0x0000	0xFFFF
Doubleword	32	4294967295	0x00000000	0xFFFFFFFF
Quadword	64	18446744073709551615	0x0000000000000000	0xFFFFFFFFFFFFFFFF
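The limits in table 1 follow directly from the bit widths. The sketch below recomputes them; the helper names `unsigned_max` and `wrap_unsigned` are illustrative and not part of any library.

```python
UNSIGNED_TYPES = {"Byte": 8, "Word": 16, "Doubleword": 32, "Quadword": 64}

def unsigned_max(bits):
    """Largest unsigned value: all `bits` positions set to 1."""
    return (1 << bits) - 1

def wrap_unsigned(value, bits):
    """Keep only the low `bits` bits, as a fixed-width register would."""
    return value & unsigned_max(bits)

for name, bits in UNSIGNED_TYPES.items():
    width = bits // 4                      # number of hex digits
    maximum = unsigned_max(bits)
    print(f"{name:<10} {bits:>2} bits  max = {maximum} (0x{maximum:0{width}X})")
```

`wrap_unsigned` illustrates why incrementing the maximum value gives 0: the carry out of the highest bit does not fit in the data type.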

Signed integers are encoded in two's complement binary code. The highest bit of the value is the sign bit: if it is zero, the number is non-negative; if it is one, the number is negative. The minimum (most negative) value is encoded with the highest bit equal to 1 and all other bits equal to zero. The maximum value has a “0” bit at the highest position and bits “1” at all other positions. The x64 architecture supports the signed integer data types shown in table 2.

Table 2: Signed integer data types

Name	Number of bits	Minimum value	Maximum value	Minimum value (hex)	Maximum value (hex)
Signed Byte	8	-128	127	0x80	0x7F
Signed Word	16	-32768	32767	0x8000	0x7FFF
Signed Doubleword	32	-2147483648	2147483647	0x80000000	0x7FFFFFFF
Signed Quadword	64	-9223372036854775808	9223372036854775807	0x8000000000000000	0x7FFFFFFFFFFFFFFF
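The hexadecimal patterns in table 2 can be reproduced with a minimal two's complement sketch. The function names below are hypothetical helpers written for this illustration.

```python
def to_twos_complement(value, bits):
    """Return the unsigned bit pattern that encodes a signed `value`."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    return value & ((1 << bits) - 1)

def from_twos_complement(pattern, bits):
    """Interpret an unsigned bit pattern as a signed value."""
    sign_bit = 1 << (bits - 1)
    if pattern & sign_bit:              # sign bit set: the value is negative
        return pattern - (1 << bits)
    return pattern

print(hex(to_twos_complement(-128, 8)))     # minimum signed byte
print(from_twos_complement(0xFF, 8))        # all ones encodes -1
```

The same bit pattern therefore has two readings: `0xFF` is 255 as an unsigned byte and -1 as a signed byte. The instruction used decides which one applies.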

Vector data types were introduced with SIMD instructions, starting with the MMX extension and continuing with the SSE and AVX extensions. They are packed data types containing multiple elements of the same size. The elements can be treated as signed or unsigned, depending on the algorithm and the instructions used.

The 64-bit packed integer data type contains eight Bytes, four Words or two Doublewords as shown in figure 2.

Figure 2: 64-bit packed integer data types
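As a rough illustration, the same eight bytes can be viewed as eight Bytes, four Words or two Doublewords. The sketch below emulates this with `struct`; the byte values 01 to 08 are arbitrary test data, and little-endian order within each element is assumed, as described earlier in this chapter.

```python
import struct

# Eight bytes as they would lie in memory, lowest address first.
packed = bytes(range(1, 9))                 # 01 02 03 04 05 06 07 08

as_bytes = struct.unpack("<8B", packed)     # eight 8-bit elements
as_words = struct.unpack("<4H", packed)     # four 16-bit elements
as_dwords = struct.unpack("<2I", packed)    # two 32-bit elements

print([hex(x) for x in as_bytes])
print([hex(x) for x in as_words])
print([hex(x) for x in as_dwords])
```

Element 0 is always built from the bytes at the lowest addresses, so the first Word is `0x0201` and the first Doubleword is `0x04030201`.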

The 128-bit packed integer data type contains sixteen Bytes, eight Words, four Doublewords or two Quadwords as shown in figure 3.

Figure 3: 128-bit packed integer data types

The 256-bit packed integer data type contains thirty-two Bytes, sixteen Words, eight Doublewords, four Quadwords or two Double Quadwords as shown in figure 4.

Figure 4: 256-bit packed integer data types

The 512-bit packed integer data type contains sixty-four Bytes, thirty-two Words, sixteen Doublewords, eight Quadwords or four Double Quadwords as shown in figure 5. Double Quadwords are not used as operands; they appear only as the results of some operations.

Figure 5: 512-bit packed integer data types
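The element counts listed for the 64-, 128-, 256- and 512-bit packed types are simply the vector width divided by the element width. A small sketch, with `lane_counts` as a hypothetical helper name:

```python
ELEMENT_BITS = {
    "Byte": 8,
    "Word": 16,
    "Doubleword": 32,
    "Quadword": 64,
    "Double Quadword": 128,
}

def lane_counts(vector_bits):
    """Number of elements of each size that fit in a packed vector.

    Only element sizes smaller than the vector itself are listed,
    because a packed type holds at least two elements.
    """
    return {
        name: vector_bits // bits
        for name, bits in ELEMENT_BITS.items()
        if bits < vector_bits
    }

for width in (64, 128, 256, 512):
    print(width, lane_counts(width))
```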

Floating point values encode data for calculations on real numbers. Depending on the precision required by the algorithm, different data sizes can be used. Scalar data types are supported by the FPU (Floating Point Unit), which offers single precision, double precision and double extended precision real numbers. In C/C++ compilers, they are usually referred to as the float, double and long double data types, respectively, although the size of long double depends on the compiler. Vector (packed) floating-point data types can be processed by many SSE and AVX instructions, which speeds up vector, matrix and artificial intelligence calculations. Vector units can process half precision, single precision and double precision formats. The 16-bit Brain Float format was introduced for dot product calculations to improve the efficiency of AI training and inference algorithms. Floating point data types are shown in figure 6 and described in table 3. The table shows the number of bits actually stored. For normalized values, the effective mantissa is one bit longer, because the highest bit, representing the integer part, is always “1” and does not need to be stored (except in the Double extended format, where the integer bit is stored explicitly).

Figure 6: Floating point data types in x64 architecture

Table 3: Floating point data types

Name	Bits	Mantissa bits	Exponent bits
Double extended	80	64	15
Double precision	64	52	11
Single precision	32	23	8
Half precision	16	10	5
Brain Float	16	7	8
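To make the field widths in table 3 concrete, the sketch below splits a single precision value into its sign, exponent and mantissa fields. It assumes the standard IEEE 754 layout with an exponent bias of 127, which table 3 does not state. The Brain Float helper simply truncates the single precision pattern to its upper 16 bits; real conversion instructions apply rounding, so this is a simplification.

```python
import struct

# (total bits, mantissa bits, exponent bits), copied from table 3.
FORMATS = {
    "Double extended": (80, 64, 15),
    "Double precision": (64, 52, 11),
    "Single precision": (32, 23, 8),
    "Half precision": (16, 10, 5),
    "Brain Float": (16, 7, 8),
}

def single_bits(x):
    """Return the 32-bit pattern of x encoded as single precision."""
    (pattern,) = struct.unpack("<I", struct.pack("<f", x))
    return pattern

def decode_single(x):
    """Split a single precision value into (sign, exponent, mantissa)."""
    pattern = single_bits(x)
    return pattern >> 31, (pattern >> 23) & 0xFF, pattern & 0x7FFFFF

def value_from_fields(sign, exponent, mantissa):
    """Rebuild a normalized value; the leading integer bit 1 is implicit."""
    return (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)

def to_bfloat16_bits(x):
    """Upper 16 bits of the single precision pattern (truncation only)."""
    return single_bits(x) >> 16

print(decode_single(-2.5))
print(hex(to_bfloat16_bits(1.0)))
```

In every format the stored fields add up to the total size: one sign bit plus the exponent bits plus the mantissa bits.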

Floating point vectors are formed with single or double precision packed data formats. They are processed by SSE or AVX instructions following the SIMD approach. A 128-bit packed data format can store four single-precision data elements or two double-precision data elements. A 256-bit packed data format can store eight single-precision values or four double-precision values. A 512-bit packed data format can store sixteen single-precision values or eight double-precision values. These packed data types are shown in figure 7. Instructions operating on 16-bit half-precision values or Brain Floats can process twice as many elements at once as instructions operating on single-precision data. It is worth mentioning that some instructions operate on a single floating-point value, using only the lowest element of each operand.

Figure 7: Packed floating point data types in x64 architecture
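A minimal sketch of the 128-bit packed layouts, again emulated with `struct` (format codes `e`, `f` and `d` stand for half, single and double precision). The variable names are illustrative and the element values are arbitrary.

```python
import struct

# Three ways to fill 128 bits (16 bytes) with floating point elements.
xmm_singles = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)              # four singles
xmm_doubles = struct.pack("<2d", 1.0, 2.0)                        # two doubles
xmm_halves = struct.pack("<8e", *[float(i) for i in range(8)])    # eight halves

def lanes(vector_bits, element_bits):
    """Number of floating point elements in a packed vector."""
    return vector_bits // element_bits

# Scalar instructions use only the lowest element (lowest address).
lowest = struct.unpack_from("<f", xmm_singles, 0)[0]

print(len(xmm_singles), len(xmm_doubles), len(xmm_halves))
print(lanes(512, 32), lanes(512, 64), lanes(512, 16))
print(lowest)
```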

A bit field is a data type whose size is given by the number of bits it occupies. A bit field can start at any bit position in a fundamental data type and can be up to 32 bits long. MASM supports it with the RECORD data type. The bit field type is shown in figure 8.

Figure 8: The bit field data type
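Reading and writing a bit field comes down to a shift and a mask. The sketch below shows the idea with two hypothetical helpers; it does not reproduce the MASM RECORD syntax.

```python
def extract_field(value, start, length):
    """Read a bit field of `length` bits starting at bit position `start`."""
    mask = (1 << length) - 1
    return (value >> start) & mask

def insert_field(value, start, length, field):
    """Return `value` with the bit field replaced by `field`."""
    mask = ((1 << length) - 1) << start
    return (value & ~mask) | ((field << start) & mask)

byte = 0b10110110
print(bin(extract_field(byte, 2, 3)))        # bits 2..4 of the byte
print(hex(insert_field(0x00, 4, 4, 0xA)))    # write 0xA into the high nibble
```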

Pointers store the address of a memory location that holds the information of interest. They can point to data or to an instruction. If segmentation is enabled, pointers can be near or far. A far pointer contains the logical address (formed from the segment and offset parts). A near pointer contains the offset only. The offset can be 16, 32 or 64 bits long. The segment selector is always stored as a 16-bit number. The possible pointer types are shown in figure 9.

Figure 9: The near and far pointer types
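The size of a far pointer is the offset size plus the 16-bit selector. The sketch below assumes the usual x86 memory layout, in which the offset is stored at the lower address and the selector follows it; the selector and offset values are made up for the example.

```python
import struct

# struct formats for the offset part: 16, 32 or 64 bits, little-endian.
OFFSET_FORMAT = {16: "H", 32: "I", 64: "Q"}

def pack_far_pointer(selector, offset, offset_bits=32):
    """Lay out a far pointer in memory: offset first, then the selector."""
    fmt = "<" + OFFSET_FORMAT[offset_bits] + "H"
    return struct.pack(fmt, offset, selector)

far = pack_far_pointer(0x0023, 0x00401000)   # example values only
print(far.hex(), len(far), "bytes")
```

A near pointer would consist of the offset bytes alone, so it is two bytes shorter than the matching far pointer.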

The offset is often the result of complex addressing mode calculations and is called an effective address.
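As an illustration of such a calculation, the most general x86 addressing form combines a base, a scaled index and a displacement. The function below is a sketch of that arithmetic, not of any specific instruction; the parameter names are chosen for this example.

```python
def effective_address(base=0, index=0, scale=1, displacement=0, address_bits=64):
    """Compute base + index * scale + displacement, wrapped to the address size."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    mask = (1 << address_bits) - 1
    return (base + index * scale + displacement) & mask

# Address of element 3 in an array of quadwords that starts 16 bytes
# past the address held in the base register.
print(hex(effective_address(base=0x1000, index=3, scale=8, displacement=0x10)))
```

The result is the offset part of the pointer. With segmentation enabled, it is combined with the segment part to form the logical address.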
