CS 3843 Computer Organization Notes on Chapter 2

Floating Point - summary

Basic representation:

V = (-1)^s * M * 2^E.

Format:

    ------------------------
   |s|  exp   |     frac    |
    ------------------------

k = number of exp bits
Bias = 2^(k-1) - 1
f = number of frac bits
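
As a quick check of the Bias formula, here is a minimal Python sketch. The helper name bias and the three (k, f) pairs are illustrative additions, not part of the notes above; the pairs are the field widths of the IEEE 754 half, single, and double precision formats.

```python
def bias(k: int) -> int:
    """Return the exponent bias 2^(k-1) - 1 for a k-bit exp field."""
    return (1 << (k - 1)) - 1

# (name, k, f): field widths of three common IEEE 754 formats.
for name, k, f in [("half", 5, 10), ("single", 8, 23), ("double", 11, 52)]:
    print(f"{name}: k={k}, f={f}, Bias={bias(k)}")
# half: k=5, f=10, Bias=15
# single: k=8, f=23, Bias=127
# double: k=11, f=52, Bias=1023
```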

Normalized:

exp is neither all 0's nor all 1's:
M = 1 + .frac, which means 1 + frac × 2^-f (frac read as an unsigned integer)
E = exp - Bias
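
The normalized rules can be tried on a real single-precision value. The sketch below assumes k = 8 and f = 23 (IEEE 754 single precision); the function names fields and decode_normalized are made up for this example.

```python
import struct

def fields(x: float, k: int = 8, f: int = 23):
    """Split the single-precision encoding of x into (s, exp, frac)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> (k + f)
    exp = (bits >> f) & ((1 << k) - 1)
    frac = bits & ((1 << f) - 1)
    return s, exp, frac

def decode_normalized(s: int, exp: int, frac: int, k: int = 8, f: int = 23) -> float:
    """Apply V = (-1)^s * M * 2^E with the normalized rules for M and E."""
    if not 0 < exp < (1 << k) - 1:
        raise ValueError("exp must be neither all 0's nor all 1's")
    bias = (1 << (k - 1)) - 1
    M = 1 + frac * 2.0 ** -f      # implied leading 1
    E = exp - bias
    return (-1) ** s * M * 2.0 ** E

s, exp, frac = fields(-6.5)
print(s, exp, frac)                     # 1 129 5242880
print(decode_normalized(s, exp, frac))  # -6.5
```

Here -6.5 = -1.625 × 2^2, so exp = 2 + 127 = 129 and frac = 0.625 × 2^23 = 5242880.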

Denormalized:

exp = all 0's
M = .frac, which means frac × 2^-f (no implied leading 1)
E = 1 - Bias
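
Using E = 1 - Bias rather than 0 - Bias makes the largest denormalized value sit just below the smallest normalized value. The sketch below illustrates this with a small hypothetical 8-bit format (k = 4, f = 3, so Bias = 7); the format and the function name are assumptions chosen for the example.

```python
def decode_denormalized(s: int, frac: int, k: int, f: int) -> float:
    """Apply V = (-1)^s * M * 2^E with the denormalized rules (exp = all 0's)."""
    bias = (1 << (k - 1)) - 1
    M = frac * 2.0 ** -f          # no implied leading 1
    E = 1 - bias
    return (-1) ** s * M * 2.0 ** E

k, f = 4, 3                       # hypothetical 8-bit format, Bias = 7
bias = (1 << (k - 1)) - 1

zero = decode_denormalized(0, 0, k, f)             # 0.0
smallest_denorm = decode_denormalized(0, 1, k, f)  # 1/8 * 2^-6 = 1/512
largest_denorm = decode_denormalized(0, 7, k, f)   # 7/8 * 2^-6 = 7/512
smallest_norm = 1.0 * 2.0 ** (1 - bias)            # exp = 1, frac = 0: 8/512

print(zero, smallest_denorm, largest_denorm, smallest_norm)
```

The step from 7/512 to 8/512 is the same 1/512 spacing as between adjacent denormalized values.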

Infinity:

exp = all 1's, frac = 0 (s = 0 gives +infinity, s = 1 gives -infinity)

NaN:

exp = all 1's, frac != 0
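
The four cases can be combined into one decoder. This is a sketch, not a reference implementation: the function name decode is illustrative, and the test patterns assume single precision (k = 8, f = 23).

```python
import math

def decode(bits: int, k: int, f: int) -> float:
    """Decode a (1 + k + f)-bit pattern using the four cases in these notes."""
    s = (bits >> (k + f)) & 1
    exp = (bits >> f) & ((1 << k) - 1)
    frac = bits & ((1 << f) - 1)
    bias = (1 << (k - 1)) - 1
    sign = -1.0 if s else 1.0

    if exp == (1 << k) - 1:               # exp = all 1's
        return sign * math.inf if frac == 0 else math.nan
    if exp == 0:                          # denormalized
        M, E = frac * 2.0 ** -f, 1 - bias
    else:                                 # normalized
        M, E = 1 + frac * 2.0 ** -f, exp - bias
    return sign * M * 2.0 ** E

print(decode(0x3F800000, 8, 23))  # 1.0   (normalized)
print(decode(0xC0D00000, 8, 23))  # -6.5  (normalized)
print(decode(0x00000001, 8, 23))  # smallest positive denormalized, 2^-149
print(decode(0x7F800000, 8, 23))  # inf
print(decode(0x7FC00000, 8, 23))  # nan
```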