CS 3843 Computer Organization
Notes on Chapter 2

Floating Point - summary

Basic representation:
V = (-1)^s * M * 2^E

Format:
    ------------------------
   |s|  exp   |     frac    |
    ------------------------
k = number of exp bits
Bias = 2^(k-1) - 1
f = number of frac bits
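
The layout above can be checked on a real value. The following is a minimal sketch, assuming IEEE single precision (k = 8, f = 23); the names K, F, bias and fields are illustrative and do not come from the notes.

```python
import struct

K = 8    # assumed number of exp bits (single precision)
F = 23   # assumed number of frac bits (single precision)

def bias(k):
    """Bias = 2^(k-1) - 1 for a format with k exp bits."""
    return 2 ** (k - 1) - 1

def fields(x):
    """Return (s, exp, frac) of x encoded as a 32-bit float."""
    # Reinterpret the 4-byte encoding as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> (K + F)                # top bit
    exp = (bits >> F) & ((1 << K) - 1)  # next K bits
    frac = bits & ((1 << F) - 1)        # low F bits
    return s, exp, frac
```

For example, fields(1.0) gives s = 0, exp = 127, frac = 0, and 127 is exactly bias(8), so E = 0.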

Normalized:
exp is neither all 0's nor all 1's:
M = 1 + .frac which means 1 + frac × 2^(-f)
E = exp - Bias
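
The normalized rules can be turned into a short decoder. This is a sketch only: the function name decode_normalized and its defaults (k = 8, f = 23, i.e. single precision) are assumptions made for illustration.

```python
def decode_normalized(s, exp, frac, k=8, f=23):
    """Value of a normalized number given its three integer fields."""
    if exp == 0 or exp == 2 ** k - 1:
        raise ValueError("exp is all 0's or all 1's: not normalized")
    bias = 2 ** (k - 1) - 1
    M = 1 + frac * 2.0 ** (-f)   # implied leading 1 plus the fraction
    E = exp - bias
    return (-1) ** s * M * 2.0 ** E
```

With a small 8-bit format (k = 4, f = 3, Bias = 7), exp = 14 and frac = 7 give M = 1.875 and E = 7, so the value is 240.0.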

Denormalized:
exp = all 0's
M = .frac which means frac × 2^(-f)
E = 1 - Bias
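
The denormalized rules can be sketched the same way. The function name decode_denormalized and its defaults (k = 8, f = 23) are illustrative assumptions, not part of the notes.

```python
def decode_denormalized(s, frac, k=8, f=23):
    """Value of a denormalized number (exp field is all 0's)."""
    bias = 2 ** (k - 1) - 1
    M = frac * 2.0 ** (-f)   # no implied leading 1
    E = 1 - bias             # fixed exponent, not 0 - bias
    return (-1) ** s * M * 2.0 ** E
```

Using E = 1 - Bias rather than -Bias keeps the spacing even across the boundary. In the 8-bit format with k = 4 and f = 3, the largest denormalized value is 7/512 and the smallest normalized value is 8/512. With frac = 0 the result is zero, and s selects +0.0 or -0.0.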

Infinity:
exp = all 1's, frac == 0


NaN:
exp = all 1's, frac != 0
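
The four cases can be told apart from the exp and frac fields alone. The sketch below assumes IEEE single precision (8 exp bits, 23 frac bits); the function name classify and the returned strings are illustrative.

```python
import struct

def classify(x):
    """Name the category of x when encoded as a 32-bit float."""
    # Reinterpret the 4-byte encoding as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exp = (bits >> 23) & 0xFF    # 8 exp bits
    frac = bits & 0x7FFFFF       # 23 frac bits
    if exp == 0xFF:              # exp = all 1's
        return "infinity" if frac == 0 else "NaN"
    if exp == 0:                 # exp = all 0's (includes zero)
        return "denormalized"
    return "normalized"
```

Under these rules zero falls in the denormalized category, because its exp field is all 0's. The sign bit is ignored here, so +infinity and -infinity both report "infinity".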