Differences

This shows you the differences between two versions of the page.

en:multiasm:cs:chapter_3_11 [2025/01/08 17:39] ktokarz
en:multiasm:cs:chapter_3_11 [2025/05/15 06:20] (current) – [Integers] ktokarz
Line 3: Line 3:
  
 ===== Integers =====
-Integer data types can be 8, 16, 32 or 64 bits long. If the encoded number is unsigned it is stored in binary representation, while the value is signed the representation is two's complement. In two's complement representation, the most significant bit (MSB) represents the sign of the number. Zero means a non-negative number, one represents a negative value. The table {{ref>binarynumbers}} shows the integer data types with their ranges.
+Integer data types can be 8, 16, 32 or 64 bits long. If the encoded number is unsigned, it is stored in natural binary representation; if the value is signed, the representation is two's complement. The range of a natural binary number starts at zero, when all bits are equal to zero. When all bits are equal to one, the value can be calculated with the expression {{ :en:multiasm:cs:equation_binary.png?200 |}}, where n is the number of bits in the number.
 + 
 +In two's complement representation, the most significant bit (MSB) represents the sign of the number. Zero means a non-negative number, one represents a negative value. The table {{ref>binarynumbers}} shows the integer data types with their ranges.
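The two integer encodings can be illustrated with a minimal Python sketch. It assumes that the expression in the image above is 2^n - 1 (the value of an n-bit number with all bits set); the helper names ''unsigned_max'' and ''as_signed'' are hypothetical and chosen only for this illustration.

```python
def unsigned_max(n: int) -> int:
    """Largest n-bit natural binary value: all bits set, i.e. 2**n - 1."""
    return (1 << n) - 1


def as_signed(raw: int, n: int) -> int:
    """Interpret the low n bits of raw as a two's complement number."""
    raw &= unsigned_max(n)          # keep only the n least significant bits
    sign_bit = 1 << (n - 1)         # mask selecting the MSB
    # MSB set -> negative value: subtract 2**n; MSB clear -> value unchanged
    return raw - (1 << n) if raw & sign_bit else raw


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        lowest = as_signed(1 << (n - 1), n)         # only the MSB set
        highest = as_signed(unsigned_max(n) >> 1, n)  # all bits set except the MSB
        print(f"{n:2d} bits: unsigned 0..{unsigned_max(n)}, signed {lowest}..{highest}")
```

The same bit pattern therefore yields different values depending on the interpretation: the 8-bit pattern with all bits set is 255 as an unsigned number and -1 in two's complement.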
 <table binarynumbers>
 <caption> Integer binary numbers</caption>
Line 18: Line 20:
  
 ===== Floating point =====
-Integer calculations do not always cover all mathematical requirements of the algorithm. To represent real numbers the floating point encoding is used. A floating point is the representation of the value //A// which is composed of two numbers
-  * Mantissa (Ma)
-  * Exponent (Ea)
+Integer calculations do not always cover all mathematical requirements of the algorithm. To represent real numbers, floating point encoding is used. A floating point number is the representation of the value //A//, which is composed of three fields:
+  * Sign bit
+  * Exponent (E)
+  * Mantissa (M)
 fulfilling the equation
 {{ :en:multiasm:cs:equation_floating.png?200 |}}
  
-There are two main types of real numbers, called floating point values. Single precision is the number which is encoded in 32 bits. Double precision floating point number is encoded with 64 bits. They are presented in Fig{{ref>realtypes}} while Table{{ref>realnumbers}} shows their properties.
+There are two main types of real numbers, called floating point values. A single precision number is encoded in 32 bits. A double precision number is encoded in 64 bits. They are presented in Fig{{ref>realtypes}}.
  
 <figure realtypes>
Line 30: Line 33:
 <caption>Illustration of single and double precision real numbers</caption>
 </figure>
 +
 +Table{{ref>realnumbers}} shows the number of bits used for the exponent and mantissa in single and double precision floating point numbers. It also presents the smallest and largest values which can be stored using these formats (they are absolute values; the stored number can be positive or negative depending on the sign bit).
  
 <table realnumbers>
 <caption> Floating point numbers</caption>
-^ Precision        ^ Exponent  ^ Mantissa  ^ The smallest  ^ The largest  ^
-| Single (32 bit)  | 8 bits    | 23 bits   | 10(-44.85)    | 10(38.53)    |
-| Double (64 bit)  | 11 bits   | 52 bits   | 10(-323.3)    | 10(308.3)    |
+^ Precision        ^ Exponent  ^ Mantissa  ^ The smallest                             ^ The largest                              ^
+| Single (32 bit)  | 8 bits    | 23 bits   | {{ :en:multiasm:cs:min_fp_32.png?105 }}  | {{ :en:multiasm:cs:max_fp_32.png?100 }}  |
+| Double (64 bit)  | 11 bits   | 52 bits   | {{ :en:multiasm:cs:min_fp_64.png?120 }}  | {{ :en:multiasm:cs:max_fp_64.png?100 }}  |
 </table>
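The limits in the table can be derived from the field widths alone. The sketch below is an illustration under stated assumptions, not the method used to produce the table: it assumes the IEEE 754 rules (bias of 2^(e-1) - 1 for an e-bit exponent, an implicit leading one for normal numbers) and takes "the smallest" to mean the smallest positive subnormal value, which agrees with the powers of ten listed in the removed table rows. The function name ''fp_limits'' is hypothetical.

```python
import math


def fp_limits(exp_bits: int, man_bits: int) -> tuple[float, float]:
    """Return (smallest positive, largest finite) magnitude for the given field widths.

    Assumes IEEE 754 conventions; works for widths up to double precision,
    because the results are held in Python floats (64-bit doubles).
    """
    bias = (1 << (exp_bits - 1)) - 1
    # Smallest subnormal: only the lowest mantissa bit set, minimum exponent
    smallest = math.ldexp(1.0, 1 - bias - man_bits)
    # Largest finite: mantissa 1.111...1 (binary), maximum usable exponent
    largest = math.ldexp(2.0 - math.ldexp(1.0, -man_bits), bias)
    return smallest, largest


if __name__ == "__main__":
    for name, e, m in (("Single", 8, 23), ("Double", 11, 52)):
        lo, hi = fp_limits(e, m)
        print(f"{name}: smallest = 10^{math.log10(lo):.2f}, largest = 10^{math.log10(hi):.2f}")
```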
 +
 +The most common representation of real numbers on computers is standardised in IEEE Standard 754. It implements two modifications which make the calculations easier for computers.
 +  * The biased exponent
 +  * The normalised mantissa
 +Biased exponent means that a bias value is added to the real exponent value. As a result, all stored exponents are non-negative, which makes it easier to compare numbers.
 +The normalised mantissa is adjusted to have exactly one bit of the value "1" to the left of the binary point. It requires an appropriate exponent adjustment.
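Both modifications can be observed by splitting a single precision number into its fields. The following is a minimal sketch using Python's standard ''struct'' module; the bias of 127 and the implicit leading one are taken from IEEE 754 for the 32-bit format, and the function names ''decode_float32'' and ''rebuild_normal'' are hypothetical.

```python
import struct


def decode_float32(x: float) -> tuple[int, int, int]:
    """Split x, encoded as IEEE 754 single precision, into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = bits >> 31                 # 1 bit
    biased_exp = (bits >> 23) & 0xFF  # 8 bits, real exponent + 127
    fraction = bits & 0x7FFFFF        # 23 bits to the right of the binary point
    return sign, biased_exp, fraction


def rebuild_normal(sign: int, biased_exp: int, fraction: int) -> float:
    """Recompute the value of a normal number (biased exponent 1..254) from its fields."""
    mantissa = 1 + fraction / 2 ** 23     # normalised: implicit "1." in front
    real_exp = biased_exp - 127           # remove the bias
    return (-1) ** sign * mantissa * 2.0 ** real_exp


if __name__ == "__main__":
    s, e, f = decode_float32(-6.5)        # -6.5 = -1.625 * 2**2
    print(s, e, hex(f), rebuild_normal(s, e, f))
```

Because the normalised mantissa always starts with "1.", that leading bit does not have to be stored, which is why only the 23 fraction bits appear in the encoding.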
 +
 +
  
 ===== Texts =====
en/multiasm/cs/chapter_3_11.1736357988.txt.gz · Last modified: 2025/01/08 17:39 by ktokarz
CC Attribution-Share Alike 4.0 International