Principles of Instructions Encoding

In x86 processors, instructions are encoded using a varying number of bytes. Some instructions are encoded in just a single byte, others require several bytes, and the longest are 15 bytes long. In general, an instruction is composed of the fields presented in figure 1.

Figure 1: Fields of the instruction

The fields are as follows:

Instruction prefixes
Opcode
MOD R/M byte
SIB byte (Scale Index Base byte)
Displacement
Immediate data

The only field which is present in every instruction is the opcode. Other fields appear depending on the detailed function of the instruction, its arguments and modifications. Their appearance also depends on the target version of the processor (16, 32 or 64-bit). In the following sections, we will show some principles of each field encoding.

Registers encoding in the instructions

Registers are used in the instructions as the operands which hold the data for calculations, or as elements of the address expression to calculate the indirect address in memory. Wherever the register appears, it is encoded with the same bit pattern. Initially, in 16 and 32-bit machines, three bits were used. In 64-bit processors, an additional, leading bit is used, which doubles the number of available registers. In table 1, the encoding of general-purpose registers in 16 and 32-bit processors is presented.

Table 1: Encoding of registers in 16- and 32-bit x86 processors

Bits	000	001	010	011	100	101	110	111
8-bit registers	AL	CL	DL	BL	AH	CH	DH	BH
16-bit registers	AX	CX	DX	BX	SP	BP	SI	DI
32-bit registers	EAX	ECX	EDX	EBX	ESP	EBP	ESI	EDI

In table 2, the encoding of general-purpose registers in x64 processors is shown. The additional leading bit = 0 comes from the REX prefix. For instructions without the REX prefix, the encoding of 8-bit registers is the same as in x86.

Table 2: Encoding of registers in x64 processors with a leading bit = 0

Bits	0.000	0.001	0.010	0.011	0.100	0.101	0.110	0.111
8-bit registers	AL	CL	DL	BL	SPL	BPL	SIL	DIL
16-bit registers	AX	CX	DX	BX	SP	BP	SI	DI
32-bit registers	EAX	ECX	EDX	EBX	ESP	EBP	ESI	EDI
64-bit registers	RAX	RCX	RDX	RBX	RSP	RBP	RSI	RDI

In table 3, the encoding of general-purpose registers in x64 processors is shown. The additional bit = 1 comes from the REX prefix.

Table 3: Encoding of registers in x64 processors with a leading bit = 1

Bits	1.000	1.001	1.010	1.011	1.100	1.101	1.110	1.111
8-bit registers	R8L	R9L	R10L	R11L	R12L	R13L	R14L	R15L
16-bit registers	R8W	R9W	R10W	R11W	R12W	R13W	R14W	R15W
32-bit registers	R8D	R9D	R10D	R11D	R12D	R13D	R14D	R15D
64-bit registers	R8	R9	R10	R11	R12	R13	R14	R15
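The lookup described by tables 1 to 3 can be sketched in a few lines of Python. This is only an illustration: the function name gpr_name and its parameters (ext for the leading bit taken from the REX prefix, has_rex for the presence of the prefix) are hypothetical, and the register names follow the spelling used in the tables above.

```python
# Register names indexed by the 3-bit code, following tables 1 to 3.
GPR_LOW = {
    8: ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL"],   # with a REX prefix
    16: ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"],
    32: ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"],
    64: ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"],
}
LEGACY_8BIT = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]  # without a REX prefix
SUFFIX = {8: "L", 16: "W", 32: "D", 64: ""}                     # R8L, R8W, R8D, R8


def gpr_name(code, size, ext=0, has_rex=True):
    """Return the register selected by a 3-bit code and a leading bit."""
    if not 0 <= code <= 7 or ext not in (0, 1) or size not in GPR_LOW:
        raise ValueError("invalid register encoding")
    if ext:
        # Leading bit = 1 selects R8..R15 (table 3).
        return f"R{8 + code}{SUFFIX[size]}"
    if size == 8 and not has_rex:
        # Without REX, codes 100..111 mean AH, CH, DH, BH (table 1).
        return LEGACY_8BIT[code]
    return GPR_LOW[size][code]
```

For example, gpr_name(0b100, 8, has_rex=False) gives AH, while the same code with a REX prefix present gives SPL.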

Table 4 presents the encoding of extension registers. Note that for the FPU and MMX registers, which are also available on 32-bit processors, the encoding is the same as with the additional leading bit = 0.

Table 4: Encoding of additional registers in x64 processors with a leading bit = 0

Bits	0.000	0.001	0.010	0.011	0.100	0.101	0.110	0.111
FPU registers	ST0	ST1	ST2	ST3	ST4	ST5	ST6	ST7
MMX registers	MM0	MM1	MM2	MM3	MM4	MM5	MM6	MM7
XMM registers	XMM0	XMM1	XMM2	XMM3	XMM4	XMM5	XMM6	XMM7
YMM registers	YMM0	YMM1	YMM2	YMM3	YMM4	YMM5	YMM6	YMM7

Table 5 presents the encoding of extension registers with the leading bit = 1. As there are only 8 MMX registers, it makes no difference for them whether the leading bit is 0 or 1. FPU registers are not available with the leading bit = 1.

Table 5: Encoding of additional registers in x64 processors with a leading bit = 1

Bits	1.000	1.001	1.010	1.011	1.100	1.101	1.110	1.111
MMX registers	MM0	MM1	MM2	MM3	MM4	MM5	MM6	MM7
XMM registers	XMM8	XMM9	XMM10	XMM11	XMM12	XMM13	XMM14	XMM15
YMM registers	YMM8	YMM9	YMM10	YMM11	YMM12	YMM13	YMM14	YMM15

Instruction prefixes

There are four groups of prefixes:

Lock and repeat
Segment override and branch hints
Operand size override
Address size override

In modern 64-bit processors, REX and VEX prefixes were introduced to extend the number of registers and to enable the RISC-like three-argument instructions.

The lock prefix is valid for instructions that work in a read-modify-write manner. An example of such an instruction can be adding a constant or register content to the variable in the memory.

 lock add QWORD PTR [rax], 5

The lock prefix appears as a single byte with the value 0xF0 before the opcode. It makes the read-modify-write sequence atomic: for the duration of the instruction, the processor blocks DMA requests (and any other requests that would gain control of the buses), which prevents the memory contents at the same address from being modified at the same time by the processor and another bus master such as a DMA controller.

The repeat prefixes are valid for string instructions. They change the behaviour of an instruction from single to repeated execution. The REP prefix is encoded as a single byte with the value 0xF3. It causes the instruction to repeat CX times (ECX or RCX with a 32-bit or 64-bit address size), decreasing the counter with every repetition and modifying the index registers appropriately. The REPE/REPZ (0xF3) and REPNE/REPNZ (0xF2) prefixes are used with string instructions which compare or scan strings. In addition to counting, they can finish the repetition early depending on the zero flag: REPE repeats while the zero flag is set and stops when it is cleared, whereas REPNE repeats while the zero flag is cleared and stops when it is set.
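The termination rule of the conditional repeat prefixes can be modelled in Python. The sketch below imitates repe cmpsb on two byte strings; the function name and the returned tuple are hypothetical, the direction flag is assumed to be clear, and the model starts with the zero flag set (on real hardware the flag is simply left unchanged when the counter is already zero).

```python
def repe_cmpsb(src, dst, count):
    """Model of `repe cmpsb`: compare bytes while the counter is nonzero and ZF = 1.

    Returns (remaining_count, zero_flag, bytes_compared).
    """
    zero_flag = True      # assumption of this model, see the text above
    index = 0
    while count != 0:
        zero_flag = src[index] == dst[index]   # CMPSB sets ZF when the bytes are equal
        index += 1                             # both index registers advance
        count -= 1                             # the counter decreases on every repetition
        if not zero_flag:                      # REPE stops as soon as ZF is cleared
            break
    return count, zero_flag, index
```

Comparing b"abXd" with b"abYd" and a counter of 4 stops after the third byte with the counter equal to 1 and the zero flag cleared; two equal strings run the counter down to 0 and leave the zero flag set.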

The segment override prefix is used in the segmented mode of operation. It causes the instruction to access data placed in a segment other than the default. For example, the mov instruction accesses data in the segment pointed to by DS, which can be changed to ES with a prefix.

mov BYTE PTR [ebx], 5     ;DS as the default segment
mov BYTE PTR ES:[ebx], 5  ;ES segment override (results in appearance of the byte 0x26 as the prefix)

0x2E – CS segment override
0x36 – SS segment override
0x3E – DS segment override
0x26 – ES segment override
0x64 – FS segment override
0x65 – GS segment override

In 64-bit mode, the CS, SS, DS and ES segment overrides are ignored.

The branch hint prefixes can appear together with conditional jump instructions. These prefixes can be used to support the branch prediction unit of the processor in determining if the branch shall be taken (0x3E) or not taken (0x2E). This function is enabled if there is no history in the branch prediction unit yet, and static branch prediction is used.

The operand size and address size override prefixes can change the default size of operands and addresses. For example, if the processor operates in 32-bit mode, using the 0x66 prefix changes the size of an operand to 16 bits, and using the 0x67 prefix changes the address encoding from 32 bits to 16 bits. To better understand the behaviour of prefixes, let us consider a simple instruction with different variants. Let's start with a 32-bit processor.

mov BYTE PTR [ebx], 0x5   ;encoded as 0xC6, 0x03, 0x05
mov WORD PTR [ebx], 0x5   ;encoded as 0x66, 0xC7, 0x03, 0x05, 0x00 
mov DWORD PTR [ebx], 0x5  ;encoded as 0xC7, 0x03, 0x05, 0x00, 0x00, 0x00

Notice that because the default operand size is a 32-bit doubleword, the prefix 0x66 appears in the 16-bit version (WORD PTR). It is also visible that the 8-bit version (BYTE PTR) has a different opcode (0xC6 instead of 0xC7), while the MOD R/M byte (0x03) stays the same. The size of the immediate argument also differs: 1, 2 or 4 bytes.

The address override prefix (0x67) appears if we change the register to a 16-bit bx.

mov BYTE PTR [bx], 0x5   ;encoded as 0x67, 0xC6, 0x07, 0x05
mov WORD PTR [bx], 0x5   ;encoded as 0x67, 0x66, 0xC7, 0x07, 0x05, 0x00 
mov DWORD PTR [bx], 0x5  ;encoded as 0x67, 0xC7, 0x07, 0x05, 0x00, 0x00, 0x00

The same situation can be observed if we use a 32-bit address register (ebx) and assemble the same instructions for a 64-bit processor.

mov BYTE PTR [ebx], 0x5   ;encoded as 0x67, 0xC6, 0x03, 0x05
mov WORD PTR [ebx], 0x5   ;encoded as 0x67, 0x66, 0xC7, 0x03, 0x05, 0x00 
mov DWORD PTR [ebx], 0x5  ;encoded as 0x67, 0xC7, 0x03, 0x05, 0x00, 0x00, 0x00

While we use a native 64-bit address register in a 64-bit processor, the address size override prefix disappears.

mov BYTE PTR [rbx], 0x5   ;encoded as 0xC6, 0x03, 0x05
mov WORD PTR [rbx], 0x5   ;encoded as 0x66, 0xC7, 0x03, 0x05, 0x00 
mov DWORD PTR [rbx], 0x5  ;encoded as 0xC7, 0x03, 0x05, 0x00, 0x00, 0x00
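To experiment with the listings above, the prefix bytes can be separated from the rest of the instruction with a small Python helper. This is a minimal sketch: the name split_prefixes is hypothetical, only the single-byte prefixes discussed in this section are recognised, and the REX prefix is deliberately not treated here.

```python
# Single-byte prefixes described in this section.
LEGACY_PREFIXES = {
    0xF0: "LOCK",
    0xF2: "REPNE/REPNZ",
    0xF3: "REP/REPE/REPZ",
    0x2E: "CS override / branch not taken",
    0x36: "SS override",
    0x3E: "DS override / branch taken",
    0x26: "ES override",
    0x64: "FS override",
    0x65: "GS override",
    0x66: "operand-size override",
    0x67: "address-size override",
}


def split_prefixes(code):
    """Split an encoded instruction into (prefix names, remaining bytes)."""
    names = []
    position = 0
    # Prefixes come first; stop at the first byte that is not a known prefix.
    while position < len(code) and code[position] in LEGACY_PREFIXES:
        names.append(LEGACY_PREFIXES[code[position]])
        position += 1
    return names, bytes(code[position:])
```

Applied to 0x67, 0x66, 0xC7, 0x07, 0x05, 0x00 (mov WORD PTR [bx], 0x5 assembled for 32-bit mode), it reports the address-size and operand-size overrides and leaves 0xC7, 0x07, 0x05, 0x00.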

The REX prefix (0x41, 0x48, 0x49, etc.) appears in long mode, e.g. if we enlarge the argument size to 64 bits or use any of the new registers. Long mode makes it possible to use the new registers and 64-bit calculations with the original set of instructions. The REX prefix contains four fixed bits (0100) and four control bits as shown in figure 2.

Figure 2: REX prefix

bit W: operand size override; it is “1” for a 64-bit operand and “0” for the default operand size
bit R: extension to ModRM.reg
bit X: extension to SIB.index
bit B: extension to ModRM.rm or SIB.base

Bits R, X and B enable the use of new registers.

mov BYTE PTR [r8], 0x5    ;encoded as 0x41, 0xC6, 0x00, 0x05
mov BYTE PTR [r9], 0x5    ;encoded as 0x41, 0xC6, 0x01, 0x05
mov BYTE PTR [r10], 0x5   ;encoded as 0x41, 0xC6, 0x02, 0x05
mov DWORD PTR [r8], 0x5   ;encoded as 0x41, 0xC7, 0x00, 0x05, 0x00, 0x00, 0x00 
mov QWORD PTR [rbx], 0x5  ;encoded as 0x48, 0xC7, 0x03, 0x05, 0x00, 0x00, 0x00
mov QWORD PTR [r8], 0x5   ;encoded as 0x49, 0xC7, 0x00, 0x05, 0x00, 0x00, 0x00

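From the code shown in the last example, you can observe where the register used in the instruction is encoded: the lower three bits sit in the R/M field of the MOD R/M byte, and the leading bit comes from bit B of the REX prefix.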
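The four control bits can be extracted from a REX byte with simple shifts and masks. The following Python sketch uses a hypothetical helper name, decode_rex, and returns None for bytes that do not carry the fixed 0100 pattern in their upper half.

```python
def decode_rex(byte):
    """Return the W, R, X and B bits of a REX prefix, or None if the byte is not one."""
    if (byte & 0xF0) != 0x40:     # the upper four bits of REX are fixed to 0100
        return None
    return {
        "W": (byte >> 3) & 1,     # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,     # extends ModRM.reg
        "X": (byte >> 1) & 1,     # extends SIB.index
        "B": byte & 1,            # extends ModRM.rm or SIB.base
    }
```

For the prefixes used above, 0x41 sets only B, 0x48 sets only W, and 0x49 sets both W and B, which matches the three groups of instructions in the listing.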

We encourage you to assemble or disassemble some further instructions. You can do it with an Online x86 / x64 Assembler and Disassembler ^[1] by Taylor Hornby. Experimentation can show you where each piece of information is encoded in the instruction.

Instruction opcode

The instruction opcode is the mandatory field in every instruction. It encodes the main function of the operation. Expanding the processor's capabilities by adding new instructions required defining longer opcodes. The opcode can be 1, 2 or 3 bytes in length. New instructions usually contain an additional byte or two bytes at the beginning called an escape sequence. Possible opcode sequences are:

opcode
0x0F opcode
0x0F 0x38 opcode
0x0F 0x3A opcode
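The four sequences listed above can be told apart by looking at the first one or two bytes. A minimal Python sketch, assuming that all prefixes have already been removed and using the hypothetical name split_opcode:

```python
def split_opcode(code):
    """Return (escape bytes, opcode byte) for an instruction without prefixes."""
    if code[0] != 0x0F:
        return b"", code[0]                # one-byte opcode, no escape sequence
    if code[1] in (0x38, 0x3A):
        return bytes(code[:2]), code[2]    # 0x0F 0x38 or 0x0F 0x3A escape sequence
    return bytes(code[:1]), code[1]        # 0x0F escape sequence
```

For the byte sequence 0x0F, 0x38, 0x15 the helper reports the escape sequence 0x0F 0x38 and the opcode 0x15.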

In the new instructions, the VEX or XOP prefix is used as the escape sequence.

0xC4 Three-byte VEX
0xC5 Two-byte VEX
0x8F Three-byte XOP

VEX-encoded instructions are written with V at the beginning. Let's look at the example of the blending instruction.

blendvpd xmm0, xmm1               ; encoded as 0x66, 0x0F, 0x38, 0x15, 0xC1 
vblendvpd xmm0, xmm1, xmm2, xmm3  ; encoded as 0xC4, 0xE3, 0x71, 0x4B, 0xC2, 0x30

The first instruction, blendvpd, has only two explicit arguments; in this encoding scheme it is not possible to encode more. It uses the mandatory prefix 0x66 and the 0x0F, 0x38 escape sequence. The second version, vblendvpd, has four arguments. It is encoded with a three-byte VEX escape sequence 0xC4, 0xE3, 0x71.
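The two bytes that follow 0xC4 pack several fields. The Python sketch below unpacks them; the field layout (R, X, B and vvvv stored inverted, a 5-bit opcode map selector, and the W, L and pp fields) is taken from the Intel manual rather than from this section, and the function name decode_vex3 is hypothetical.

```python
def decode_vex3(code):
    """Decode the two payload bytes that follow the 0xC4 escape byte."""
    if code[0] != 0xC4:
        raise ValueError("not a three-byte VEX prefix")
    b1, b2 = code[1], code[2]
    return {
        "R": ((b1 >> 7) & 1) ^ 1,          # stored inverted
        "X": ((b1 >> 6) & 1) ^ 1,          # stored inverted
        "B": ((b1 >> 5) & 1) ^ 1,          # stored inverted
        "map": b1 & 0x1F,                  # 1 = 0x0F, 2 = 0x0F 0x38, 3 = 0x0F 0x3A
        "W": (b2 >> 7) & 1,
        "vvvv": ((b2 >> 3) & 0xF) ^ 0xF,   # additional source register, stored inverted
        "L": (b2 >> 2) & 1,                # 0 = 128-bit, 1 = 256-bit vectors
        "pp": b2 & 3,                      # 0 = none, 1 = 0x66, 2 = 0xF3, 3 = 0xF2
    }
```

For 0xC4, 0xE3, 0x71 the result selects opcode map 3 (the 0x0F 0x3A map), the implied prefix 0x66, 128-bit vectors and vvvv = 1, that is xmm1, the second operand of the vblendvpd example.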

MOD R/M byte

The MOD R/M byte encodes the addressing mode, a register which is used as an operand in the instruction, and registers used for addressing of memory. It can also extend the opcode in some instructions. It has the fields shown in figure 3.

In general, the fields have the following meaning:

Mod - Mode. This 2-bit field gives the register/memory mode.
Reg - Register. This 3-bit field specifies one of the general-purpose registers used as the operand. It can also be the opcode extension.
R/M - Register/memory. This 3-bit field specifies the second register as the operand or combination of registers used in the calculation of the address in memory.

In x86, the Mod field specifies one of four possible addressing modes, and the R/M field specifies which register, or pair of registers, is used for address calculation. If the Mod field is 11 (binary), the R/M field specifies the second register in the instruction. Details are shown in table 6.

Table 6: Encoding of addressing mode with MOD R/M byte in 16-bit x86 processors

	R/M
Mod	000	001	010	011	100	101	110	111
00	[BX + SI]	[BX + DI]	[BP + SI]	[BP + DI]	[SI]	[DI]	[disp16]	[BX]
01	[BX + SI + disp8]	[BX + DI + disp8]	[BP + SI + disp8]	[BP + DI + disp8]	[SI + disp8]	[DI + disp8]	[BP + disp8]	[BX + disp8]
10	[BX + SI + disp16]	[BX + DI + disp16]	[BP + SI + disp16]	[BP + DI + disp16]	[SI + disp16]	[DI + disp16]	[BP + disp16]	[BX + disp16]
11	AX/AL	CX/CL	DX/DL	BX/BL	SP/AH	BP/CH	SI/DH	DI/BH

Let's look at some examples of instruction encoding. First, look at the data transfer between two registers.

              ;                         MOD REG R/M   MOD               REG   R/M
mov al, dl    ;encoded as 0x88, 0xD0    11  010 000   Register operand  DL    AL
mov ax, dx    ;encoded as 0x89, 0xD0    11  010 000   Register operand  DX    AX
mov dx, si    ;encoded as 0x89, 0xF2    11  110 010   Register operand  SI    DX
mov si, dx    ;encoded as 0x89, 0xD6    11  010 110   Register operand  DX    SI

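Notice that in the first and second lines, different opcodes are used, but the MOD R/M bytes are identical: the opcode selects between 8-bit (0x88) and 16-bit (0x89) operands, so the same register codes name DL and AL in one case and DX and AX in the other. The opcode also determines the direction of the data transfer. For 0x89, the REG field is the source and the R/M field is the destination, so swapping the two fields in the last two lines reverses the transfer.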
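The three fields can be separated with shifts and masks, with Mod in bits 7-6, Reg in bits 5-3 and R/M in bits 2-0. A minimal Python sketch with a hypothetical helper name:

```python
def split_modrm(byte):
    """Split a MOD R/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11     # bits 7-6
    reg = (byte >> 3) & 0b111    # bits 5-3
    rm = byte & 0b111            # bits 2-0
    return mod, reg, rm
```

For 0xD0 the helper returns (3, 2, 0), which is the 11 010 000 pattern of the first two lines of the listing, and for 0xF2 it returns (3, 6, 2).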

Now, a few examples of indirect addressing without displacement.

              ;                         MOD REG R/M   MOD               REG   R/M    
mov dx,[si]   ;encoded as 0x8B, 0x14    00  010 100   Reg. only addr.   DX    [SI]
mov dx,[di]   ;encoded as 0x8B, 0x15    00  010 101   Reg. only addr.   DX    [DI]
mov dx,[bx+di];encoded as 0x8B, 0x11    00  010 001   Reg. only addr.   DX    [BX+DI]
mov cx,[bx+di];encoded as 0x8B, 0x09    00  001 001   Reg. only addr.   CX    [BX+DI]

Now, a few examples of indirect addressing with displacement.

              ;                             MOD REG R/M   MOD              REG  R/M       Disp  
mov dx,[bp+62];encoded as 0x8B, 0x56, 0x3E  01  010 110   Reg.+disp addr.  DX   [BP+disp] 0x3E
mov [bp+62],dx;encoded as 0x89, 0x56, 0x3E  01  010 110   Reg.+disp addr.  DX   [BP+disp] 0x3E
mov dx,[si+13];encoded as 0x8B, 0x54, 0x0D  01  010 100   Reg.+disp addr.  DX   [SI+disp] 0x0D
mov si,[bp]   ;encoded as 0x8B, 0x76, 0x00  01  110 110   Reg.+disp addr.  SI   [BP+disp] 0x00

If we look at the first two lines, we can observe that the MOD R/M bytes are identical. The only difference is the opcode, which determines the direction of the data transfer.

Notice also that the last instruction is encoded as BP + displacement, even though there is no displacement in the mnemonic. If you look at table 6, you can observe that there is no addressing mode with [BP] only, because the combination Mod = 00, R/M = 110 is used for [disp16]. BP must therefore appear with a displacement, here equal to 0.
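Table 6 translates directly into a lookup. The Python sketch below describes the R/M operand of a 16-bit instruction from its MOD R/M byte; the name operand16 is hypothetical, and for Mod = 11 only the 16-bit register names are returned.

```python
RM16 = ["BX + SI", "BX + DI", "BP + SI", "BP + DI", "SI", "DI", "BP", "BX"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def operand16(modrm):
    """Describe the R/M operand of a 16-bit instruction, following table 6."""
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:
        return REG16[rm]                          # register operand
    if mod == 0b00:
        # R/M = 110 is the special case: a bare 16-bit address instead of [BP].
        return "[disp16]" if rm == 0b110 else f"[{RM16[rm]}]"
    disp = "disp8" if mod == 0b01 else "disp16"
    return f"[{RM16[rm]} + {disp}]"
```

With the bytes from the listings, 0x14 gives [SI], 0x11 gives [BX + DI], and both 0x56 and 0x76 give [BP + disp8].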

In 32-bit mode, registers used for addressing can be specified with the SIB byte, but this is not always the case. If a single register is used (with some exceptions), it still can be encoded with the MOD R/M byte, and there is no SIB byte in the instruction. In 64-bit long mode, the MOD R/M byte encoding works in a similar manner to 32-bit, with the difference that the MOD R/M byte is extended with R and B bits from the REX prefix, enabling the use of more registers in instructions.

Table 7: Encoding of addressing mode with MOD R/M byte in x64 processors for leading bit =0

	0.R/M
Mod	0.000	0.001	0.010	0.011	0.100	0.101	0.110	0.111
00	[RAX]	[RCX]	[RDX]	[RBX]	[SIB]	[RIP + disp32]	[RSI]	[RDI]
01	[RAX + disp8]	[RCX + disp8]	[RDX + disp8]	[RBX + disp8]	[SIB + disp8]	[RBP + disp8]	[RSI + disp8]	[RDI + disp8]
10	[RAX + disp32]	[RCX + disp32]	[RDX + disp32]	[RBX + disp32]	[SIB + disp32]	[RBP + disp32]	[RSI + disp32]	[RDI + disp32]
11	RAX	RCX	RDX	RBX	RSP	RBP	RSI	RDI

Table 8: Encoding of addressing mode with MOD R/M byte in x64 processors for leading bit =1

	1.R/M
Mod	1.000	1.001	1.010	1.011	1.100	1.101	1.110	1.111
00	[R8]	[R9]	[R10]	[R11]	[SIB]	[RIP + disp32]	[R14]	[R15]
01	[R8 + disp8]	[R9 + disp8]	[R10 + disp8]	[R11 + disp8]	[SIB + disp8]	[R13 + disp8]	[R14 + disp8]	[R15 + disp8]
10	[R8 + disp32]	[R9 + disp32]	[R10 + disp32]	[R11 + disp32]	[SIB + disp32]	[R13 + disp32]	[R14 + disp32]	[R15 + disp32]
11	R8	R9	R10	R11	R12	R13	R14	R15

Scale Index Base byte

The SIB byte was added in 32-bit machines to enable the use of any register in addressing. It encodes the scaling factor, index and base register used in address calculations. As a result, the choice of registers for calculations is significantly wider. The SIB byte has the fields shown in figure 4.

The scaling factor specifies the number 1, 2, 4 or 8 by which the content of the index register is multiplied in the process of the address calculation as presented in table 9. For details on addressing, please refer to the section Addressing Modes in Instructions.

Table 9: Encoding of scaling factor in SIB byte

Bits	00	01	10	11
Scaling factor	1	2	4	8

The index and base fields specify the index and base registers, respectively. In 32-bit processors, the register encoding is similar to that presented in table 1. Details for 32-bit addressing are presented in tables 10 and 11.

Table 10: Encoding of registers in the index field of the SIB byte in 32-bit processors

Bits index	000	001	010	011	100	101	110	111
32-bit index register	EAX	ECX	EDX	EBX	*	EBP	ESI	EDI

Table 11: Encoding of registers in the base field of the SIB byte in 32-bit processors

Bits base	000	001	010	011	100	101	110	111
32-bit base register	EAX	ECX	EDX	EBX	ESP	EBP	ESI	EDI

In 64-bit long mode, the SIB byte is extended with X and B bits from the REX prefix, enabling the use of more registers in instructions. Details are presented in tables 12 and 13.

Table 12: Encoding of registers in the index field of the SIB byte extended with X bit in 64-bit processors

Bits X.Index	0.000	0.001	0.010	0.011	0.100	0.101	0.110	0.111
64-bit index register	RAX	RCX	RDX	RBX	*	RBP	RSI	RDI

Bits X.Index	1.000	1.001	1.010	1.011	1.100	1.101	1.110	1.111
64-bit index register	R8	R9	R10	R11	R12	R13	R14	R15

Table 13: Encoding of registers in the base field of the SIB byte extended with B bit in 64-bit processors

Bits B.Base	0.000	0.001	0.010	0.011	0.100	0.101	0.110	0.111
64-bit base register	RAX	RCX	RDX	RBX	RSP	RBP	RSI	RDI

Bits B.Base	1.000	1.001	1.010	1.011	1.100	1.101	1.110	1.111
64-bit base register	R8	R9	R10	R11	R12	R13	R14	R15

In tables 10 and 12, the * means that there is no index register encoded in the instruction. This results in base-only or direct addressing.

Let's look at some code examples, considering the 32-bit version first. In all instructions, there is a MOD R/M byte and a SIB byte. The MOD R/M byte is identical for all instructions. The REG field indicates the eax register, and the combination of MOD and R/M indicates that the registers are specified with the SIB byte.

;MOD R/M (second byte) is 0x04 for all instructions:
                       ;                    MOD REG R/M   REG  MOD & R/M
                       ;                     00 000 100   eax  SIB is present
 
;SIB (third byte) is 0x0B, 0x4B, 0x8B or 0xCB:
                       ;                  Scale Index Base  Scale Index Base
mov eax, [ebx+ecx]     ;0x8B, 0x04, 0x0B     00   001  011     x1   ecx  ebx
mov eax, [ebx+ecx*2]   ;0x8B, 0x04, 0x4B     01   001  011     x2   ecx  ebx
mov eax, [ebx+ecx*4]   ;0x8B, 0x04, 0x8B     10   001  011     x4   ecx  ebx
mov eax, [ebx+ecx*8]   ;0x8B, 0x04, 0xCB     11   001  011     x8   ecx  ebx
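The decoding used in the listing above can be reproduced with a short Python sketch. The names decode_sib and effective_address are hypothetical, register values are passed in an ordinary dictionary, and the special case of base = 101 with Mod = 00 (a 32-bit displacement without a base register) is not handled.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_sib(sib):
    """Return (scale, index, base) of a SIB byte in 32-bit mode."""
    scale = 1 << (sib >> 6)                  # 00, 01, 10, 11 -> 1, 2, 4, 8
    index_code = (sib >> 3) & 0b111
    # Index code 100 means that no index register is used (the * in table 10).
    index = None if index_code == 0b100 else REG32[index_code]
    base = REG32[sib & 0b111]
    return scale, index, base


def effective_address(sib, regs, disp=0):
    """Compute base + index * scale + disp from a dict of register values."""
    scale, index, base = decode_sib(sib)
    address = regs[base] + disp
    if index is not None:
        address += regs[index] * scale
    return address & 0xFFFFFFFF              # addresses wrap around at 32 bits
```

With EBX = 0x1000 and ECX = 3, the SIB byte 0x8B (scale 4, index ecx, base ebx) yields the address 0x100C.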

Now let's look at examples for x64 processors. The SIB byte is extended with bits from the REX prefix. We'll start with examples similar to those shown for 32-bit machines.

;REX prefix (first byte) is 0x48 for all instructions:
                       ;                   7                           0
                       ;                 +---+---+---+---+---+---+---+---+
                       ;                 | 0   1   0   0 | W | R | X | B |
                       ;                 +---+---+---+---+---+---+---+---+
                       ;                                   1   0   0   0
 
;MOD R/M (third byte) is 0x04 for all instructions:
                       ;                    MOD R.REG R/M   REG  MOD & R/M
                       ;                     00 0.000 100   rax  SIB is present
 
                       ;                        Scale X.Index B.Base  Scale Index Base
mov rax, [rbx+rcx]     ;0x48, 0x8B, 0x04, 0x0B     00   0.001  0.011     x1   rcx  rbx
mov rax, [rbx+rcx*2]   ;0x48, 0x8B, 0x04, 0x4B     01   0.001  0.011     x2   rcx  rbx
mov rax, [rbx+rcx*4]   ;0x48, 0x8B, 0x04, 0x8B     10   0.001  0.011     x4   rcx  rbx
mov rax, [rbx+rcx*8]   ;0x48, 0x8B, 0x04, 0xCB     11   0.001  0.011     x8   rcx  rbx

If any of the new registers (R8-R15) is used in the instruction, it changes the bits in the REX prefix.

                       ;                        Scale X.Index B.Base  Scale Index Base
mov rax, [r10+rcx]     ;0x49, 0x8B, 0x04, 0x0A     00   0.001  1.010     x1   rcx  r10
mov rax, [rbx+r11]     ;0x4A, 0x8B, 0x04, 0x1B     00   1.011  0.011     x1   r11  rbx
mov r12, [rbx+rcx]     ;0x4C, 0x8B, 0x24, 0x0B     00   0.001  0.011     x1   rcx  rbx
 
                       ;Last instruction has the MOD R/M REG field extended 
                       ;by the R bit from the REX prefix.
                       ;                    MOD R.REG R/M   REG  MOD & R/M
                       ;                     00 1.100 100   r12  SIB is present
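The way the R, X and B bits extend the three-bit fields can be checked with a small Python decoder for exactly this instruction shape: a REX prefix, opcode 0x8B, a MOD R/M byte with Mod = 00 and R/M = 100, and a SIB byte. The name decode_mov_load is hypothetical, REX.W = 1 is assumed (64-bit register names), and the special cases of index 0.100 (no index) and base x.101 are not handled.

```python
REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
         "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15"]


def decode_mov_load(code):
    """Decode `REX 0x8B modrm sib` into (destination, base, index, scale)."""
    rex, opcode, modrm, sib = code[0], code[1], code[2], code[3]
    if (rex & 0xF0) != 0x40 or opcode != 0x8B:
        raise ValueError("expected a REX prefix followed by opcode 0x8B")
    if (modrm >> 6) != 0b00 or (modrm & 0b111) != 0b100:
        raise ValueError("expected Mod = 00 and R/M = 100 (SIB present)")
    r, x, b = (rex >> 2) & 1, (rex >> 1) & 1, rex & 1
    dest = REG64[(r << 3) | ((modrm >> 3) & 0b111)]   # R extends ModRM.reg
    index = REG64[(x << 3) | ((sib >> 3) & 0b111)]    # X extends SIB.index
    base = REG64[(b << 3) | (sib & 0b111)]            # B extends SIB.base
    return dest, base, index, 1 << (sib >> 6)
```

For 0x4A, 0x8B, 0x04, 0x1B the decoder returns RAX, RBX, R11 and scale 1, and for 0x4C, 0x8B, 0x24, 0x0B it returns R12, RBX, RCX and scale 1.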

Certainly, the presented examples do not exhaust all possible situations. For a more detailed explanation, please refer to the documentation by AMD^[2], Intel^[3], OSDev wiki^[4] or other interesting sources mentioned at the bottom of this section.

Displacement

Displacement gives the offset for memory operands. Depending on the addressing mode, it can be the direct memory address or an additional offset added to the contents of the base, index register or both. Displacement can be 1, 2, or 4 bytes long. Some instructions allow using an 8-byte displacement. In these instructions, there is no immediate field.
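As an illustration for 16-bit addressing, the choice between a one-byte and a two-byte displacement can be sketched in Python. The function name is hypothetical. The sketch relies on two facts not spelled out in this section: a one-byte displacement is sign-extended by the processor, and multi-byte displacements are stored least significant byte first. It always emits a displacement, so the Mod = 00 form without a displacement is not considered.

```python
def encode_displacement16(value):
    """Return (Mod field, displacement bytes) for a 16-bit [register + value] operand."""
    if -128 <= value <= 127:
        # Fits in a signed byte: Mod = 01 and one displacement byte.
        return 0b01, value.to_bytes(1, "little", signed=True)
    if -32768 <= value <= 65535:
        # Otherwise Mod = 10 and two displacement bytes, least significant first.
        return 0b10, (value & 0xFFFF).to_bytes(2, "little")
    raise ValueError("displacement does not fit in 16 bits")
```

For the displacement 62 the sketch returns Mod = 01 and the single byte 0x3E, as in the encoding of mov dx,[bp+62]; for 300 it returns Mod = 10 and the bytes 0x2C, 0x01.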

Immediate

Some instructions require an immediate value. The instruction determines the length of the immediate value. The immediate can be 1, 2, 4 or 8 bytes long. When an 8-byte immediate value is encoded, no displacement can be encoded.
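How the immediate field follows the other fields can be shown by building the encodings of mov BYTE/WORD/DWORD PTR [ebx], value for 32-bit mode in Python. The function name mov_mem_ebx_imm is hypothetical, and only unsigned values that fit in the chosen size are accepted.

```python
def mov_mem_ebx_imm(value, size):
    """Encode `mov <size> PTR [ebx], value` for 32-bit mode (size in bytes: 1, 2 or 4)."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    prefix = b"\x66" if size == 2 else b""        # operand-size override for 16-bit data
    opcode = b"\xC6" if size == 1 else b"\xC7"    # 0xC6 = byte form, 0xC7 = word/dword form
    modrm = b"\x03"                               # Mod = 00, Reg = 000, R/M = 011 -> [EBX]
    immediate = value.to_bytes(size, "little")    # least significant byte first
    return prefix + opcode + modrm + immediate
```

Calling mov_mem_ebx_imm(5, 2) produces 0x66, 0xC7, 0x03, 0x05, 0x00, and with the value 0x12345678 and size 4 the immediate appears as 0x78, 0x56, 0x34, 0x12.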

For detailed information on instruction encoding, please refer to the documentation provided by Intel and AMD. You can also find interesting information on the following websites: X86 Opcode and Instruction Reference ^[5] by MazeGen and x86 and amd64 instruction reference ^[6] by Félix Cloutier.

^[1] https://defuse.ca/online-x86-assembler.htm#disassembly

^[2] https://docs.amd.com/v/u/en-US/40332-PUB_4.08

^[3] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

^[4] https://wiki.osdev.org/X86-64

^[5] http://ref.x86asm.net/index.html

^[6] https://www.felixcloutier.com/x86/

en/multiasm/papc/chapter_6_6.txt · Last modified: 2025/08/01 06:59 by ktokarz