===== Pipeline =====
As was described in the previous chapter, executing a single instruction requires many actions which must be performed by the processor. We could see that each step, or even substep, can be performed by a separate logical unit. This feature has been used by the designers of modern processors to create processors in which instructions are executed in a pipeline. A pipeline is a collection of logical units that execute many instructions at the same time - each of them at a different stage of execution. If the instructions arrive in a continuous stream, the pipeline allows the program to execute faster than on a processor that does not have one. Note that the pipeline does not reduce the time of execution of a single instruction. It increases the throughput of the instruction stream.
A simple pipeline is implemented in AVR microcontrollers. It has two stages, which means that while one instruction is executed, another one is fetched, as shown in Fig {{ref>pipelineavr}}.
<figure pipelineavr>
<caption>The two-stage pipeline of AVR microcontrollers</caption>
</figure>
Modern processors implement longer pipelines. For example, the Pentium III used a 10-stage pipeline, the Pentium 4 a 20-stage one, and the Pentium 4 Prescott even a 31-stage pipeline. Does a longer pipeline mean faster program execution? Everything has benefits and drawbacks. The undoubted benefit of a longer pipeline is that more instructions are executed at the same time, which gives a higher instruction throughput. The problem appears when branch instructions come. While a conditional branch is in the instruction stream, the processor does not know in advance which instructions to fetch next; if it guesses wrongly, the instructions already in the pipeline must be discarded, and the longer the pipeline, the more work is lost.
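The price of a wrong guess can be estimated with equally simple arithmetic. The sketch below assumes that a misprediction throws away roughly one pipeline's worth of work; the 5% misprediction rate and the number of branches are made-up figures, used only to show how the penalty grows with pipeline depth.

<code c>
#include <stdio.h>

/* Rough assumption: a mispredicted branch costs about one pipeline
   depth of wasted cycles, because the fetched instructions are discarded. */
int main(void)
{
    const int depths[] = { 10, 20, 31 };   /* example stage counts             */
    const long branches = 1000000;         /* branches executed by the program */
    const double miss_rate = 0.05;         /* assumed misprediction rate       */

    for (int i = 0; i < 3; i++) {
        double wasted = branches * miss_rate * depths[i];
        printf("%2d-stage pipeline: about %.0f wasted cycles\n",
               depths[i], wasted);
    }
    return 0;
}
</code>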
| ===== Superscalar ===== | ===== Superscalar ===== | ||
The superscalar processor increases the speed of program execution because it can execute more than a single instruction during a clock cycle. This is realised by simultaneously dispatching instructions to different execution units of the processor, as shown in Fig {{ref>superscalar}}.
<figure superscalar>
<caption>Superscalar execution with multiple execution units</caption>
</figure>
In the x86 family, the first processor with two paths of execution was the Pentium, with two execution units called the U and V pipelines. Modern x64 processors like the i7 implement six execution units. Not all execution units have the same functionality, as shown in Table {{ref>executionunits}}.
<table executionunits>
<caption>Execution units of a modern x64 processor</caption>
^ Execution unit ^ Functionality ^
| 0 | Integer, Floating point multiplication |
| 1 | Integer, Floating point addition, SSE addition |
| 2 | Address generation, load |
| 3 | Address generation, store |
| 4 | Data store |
| 5 | Integer, Branch, SSE addition |
</table>
| + | |||
| + | |||
| + | ===== Branch prediction ===== | ||
| + | As it was mentioned, the pipeline can suffer invalidation if the conditional branch is not properly predicted. The branch prediction unit is used to guess the outcome of conditional branch instructions. It helps to reduce delays in program execution by predicting the path the program will take. Prediction is based on historical data and program execution patterns. | ||
| + | There are many methods of predicting the branches. In general, the processor implements the buffer with the addresses of the last few branch instructions with a history register for every branch. Based on history, the branch prediction unit can guess if the branch should be taken. | ||
| + | |||
| + | ===== Hyperthreading ===== | ||
| + | Hyper-Threading Technology is an Intel approach to simultaneous multithreading technology, which allows the operating system to execute more than one thread on a single physical core. | ||
| + | For each physical core, the operating system defines two logical processor cores and shares the load between them when possible. The hyperthreading technology uses a superscalar architecture to increase the number of instructions that operate in parallel in the pipeline on separate data. With Hyper-Threading, | ||
| + | |||
| + | <note info> | ||
| + | The real path of instruction processing is much more complex. Additional techniques are implemented to achieve better performance, | ||
| + | </ | ||