This is an old revision of the document!
Energy Efficient Coding
Assembler code is assembled into a single object code. Compilers, instead, take high-level language code and convert it to machine code. And during compilation, the code may be optimised in several ways. For example, there are many ways to implement statements, FOR loops or Do-While loops in the assembler. There are some good hints for optimising the assembler code as well, but these are just hints for the programmer.
Take into account the instruction execution time (or cycle). Some instructions take more than one CPU cycle to execute, and there may be other instructions that achieve the desired result.
Try to use the register as much as possible without storing the temporary data in the memory.
Eliminate unnecessary compare instructions by doing the appropriate conditional jump instruction based on the flags that are already set from a previous arithmetic instruction. Remember that arithmetic instructions can update the status flags if the postfix S is used in the instruction mnemonic.
It is essential to align both your code and data to get a good speedup. For ARMv8, the data must be aligned on 16-byte boundaries. In general, if alignment is not used on a 16-byte boundary, the CPU will eventually raise an exception.
And yet, there are still multiple hints that can help speed up the computation. In small code examples, the speedup will not be noticeable. The processors can execute millions of instructions per second.
Processors today can execute many instructions in parallel using pipelining and multiple functional units. These techniques allow the reordering of instructions internally to avoid pipeline stalls (Out-of-Order execution), branch prediction to guess the branching path, and others. Without speculation, each branch in the code would stall the pipeline until the outcome is known. These situations are among the factors that reduce the processor's computational power.
Speculative instruction execution
Barriers(instruction synchronization / data memory / data synchronization / one way BARRIER)
Conditional instructions
Power saving