CS 3843 Computer Organization Notes on Chapter 3 through Section 3.3

Start Reading Chapter 3

Chapter 3: Machine Level Representation of Programs

This chapter deals with the representation of machine code and its relationship to assembly language.

All of the examples in this chapter are related to the Intel architecture, mainly the 32-bit IA32 instructions.

Section 3.1: History

The Intel instruction set has its ancestry in the Intel 8008 microprocessor which was launched in 1972.
Each successive microprocessor was backwards compatible with the previous one, requiring the later microprocessors to contain instructions that are mostly unused.

History:

1972: 8008 (3.5K-10,000nm) - first Intel microprocessor with 8-bit words.
The instruction set was designed by Datapoint Corporation which was a leading maker of programmable CRT terminals.
Datapoint was based in San Antonio, so you might say that the Intel architecture started just a few miles from here.
1974: 8080 (4.5K-6,000nm) - first successful Intel microprocessor, had some 16-bit instructions.
1978: 8086 (29K-3,000nm) - One of the first 16-bit microprocessors.
20-bit addresses with segmented address space.
1979: 8088 (29K-3,000nm) - An 8086 with an 8-bit external bus - basis of the original IBM PC
1980: 8087 (45K-3,000nm) - A floating point coprocessor for the 8086 and 8088, formed the basis for the IEEE floating point standard.
1982: 80286 (134K-1,500nm) - basis of the IBM PC-AT and MS Windows
1985: 80386 (275K-1,500nm) (also called i386) - added flat address space, could run Linux.
1989: 80486 (1.2M-1000nm) - integrated the floating point processor
1993: Pentium (3.1M-800nm) - improved performance
1995: PentiumPro (5.5M-500nm) - new processor design
1997: Pentium 2 (7M-350nm) - more of the same
1999: Pentium 3 (8.2M-250nm) - new floating point instructions
2000: Pentium 4 (42M-180nm) - double precision floating point and many new instructions.
2004: Pentium 4E (125M-90nm) - added hyperthreading
2006: Core2 Duo (291M-65nm) - multiple cores, not hyperthreading
2008: Core i7 Quad (781M-45nm) - multiple cores and hyperthreading
2010: Itanium Tukwila (2B-65nm) - instruction-level parallelism
2011: Xeon Westmere (2.6B-32nm) - 10 cores
2012: Xeon Phi (5.0B-22nm) - 62 cores

Moore's Law
In 1965, Gordon Moore, who later co-founded Intel, predicted that the number of transistors that could fit on a chip would double every year for 10 years.
In 1975, he revised this to doubling every 2 years.

Question:

If the number of transistors on a chip was 3500 in 1972,
and the number doubled every 2 years,
how many transistors could fit on a chip in 2012?

Answer:

Section 3.2: Program Encoding

IA32 instruction set
The full IA32 has a large number of instructions, partly because of backward compatibility.
We will learn about it by looking at how C code is converted into assembly language.

What happens when you compile a C program?

gcc -O1 -o t t.c
The -O1 is a compiler option that selects the first level of optimization, which limits the optimizations used.
The C preprocessor expands include files and #define macros, plus some other things
The compiler generates assembly code: t.s
The assembler converts the assembly code into object code: t.o
The linker combines the object code with the libraries to produce an executable: t
The t.s file is not saved by default.
You can look at the assembly code generated using:
gcc -O1 -S t.c

This produces a file t.s.

Today's News: February 14

Exam on Wednesday of next week.

IA32 32-bit registers

Name	Use
`%eax`	accumulator - general caller-save register used for returning 32-bit values
`%ecx`	counter - general caller-save register
`%edx`	data - general caller-save register
`%ebx`	base - general callee-save register
`%esi`	source - general callee-save register
`%edi`	destination - general callee-save register
`%esp`	stack pointer
`%ebp`	frame pointer

Question:

What is meant by "caller-save" and "callee-save" registers?

Answer:

Why these strange names for the registers:

It goes back to the 8080, an 8-bit machine with registers: A, B, C, D, etc.
The 8086 had 16-bit registers: ax, bx, cx, dx, where ax was made up of two 8-bit registers, al and ah. Similarly with bx, cx, and dx.
The 32-bit version (80386) extended these to 32 bits, making eax, ebx, etc.
The low 16 bits of eax are just ax, and ax is made up of ah and al.

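The 64-bit IA-64 (Itanium) architecture has 128 64-bit general registers called r0 - r127.
The x86-64 extension of IA32 instead has 16 64-bit general-purpose registers: the eight above widened to %rax, %rcx, etc., plus %r8 - %r15.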

Example 1

int add(int x, int y) { 
   int z;
   z = x + y;
   return z;
}

If this is in sum.c we can produce assembly code with:

gcc -O1 -S sum.c

This generates the following:

        .file   "sum.c"
        .text
.globl add
        .type   add, @function
add:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        addl    8(%ebp), %eax
        popl    %ebp
        ret
        .size   add, .-add
        .ident  "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
        .section        .note.GNU-stack,"",@progbits

Note:

Ignore the lines that begin with a period (.); these are directives to the assembler and linker, not instructions.
The pushl and popl save and restore %ebp
In movl, the first argument is the source, the second is the destination
addl adds the source and destination and stores the result in the destination.
%eax is used to hold the return value.
x and y are at 8(%ebp) and 12(%ebp)

Today's News: February 17

Exam on Wednesday of this week.
See the floating point summary here.

Question:

Could we directly use %esp to avoid using %ebp and eliminate the push and pop?
That is, would the following code work? (3 instructions instead of 6)

add:
   movl    12(%esp), %eax
   addl    8(%esp), %eax
   ret

Answer:

We can generate an object file using:

cc -c sum.s

This produces sum.o which we can examine with:

objdump -d sum.o

which produces:

sum.o:     file format elf32-i386

Disassembly of section .text:

00000000 <add>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 45 0c                mov    0xc(%ebp),%eax
   6:   03 45 08                add    0x8(%ebp),%eax
   9:   5d                      pop    %ebp
   a:   c3                      ret

Note:

Each instruction takes up 1 to 15 bytes
Common instructions such as push, pop, or ret, are short.
An instruction can be decoded by its starting bits. For example, a one-byte instruction beginning with 01011 is a pop and one beginning with 01010 is a push: in the listing above, 5d is 01011101 and 55 is 01010101.

To use this program, we need a main to call it:

#include <stdio.h>

int add(int x, int y);

int main() {
   int x = 12;
   int y = 31;
   int z;
   z = add(x, y);
   printf("x is %d, y is %d, and z is %d\n",x,y,z);
   return 0;
}

If this is called e1.c we do:

cc -O1 -S e1.c

to create: e1.s which is

	.file	"e1.c"
	.section	.rodata.str1.4,"aMS",@progbits,1
	.align 4
.LC0:
	.string	"x is %d, y is %d, and z is %d\n"
	.text
.globl main
	.type	main, @function
main:
	leal	4(%esp), %ecx
	andl	$-16, %esp
	pushl	-4(%ecx)
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ecx
	subl	$20, %esp
	movl	$31, 4(%esp)
	movl	$12, (%esp)
	call	add
	movl	%eax, 16(%esp)
	movl	$31, 12(%esp)
	movl	$12, 8(%esp)
	movl	$.LC0, 4(%esp)
	movl	$1, (%esp)
	call	__printf_chk
	movl	$0, %eax
	addl	$20, %esp
	popl	%ecx
	popl	%ebp
	leal	-4(%ecx), %esp
	ret
	.size	main, .-main
	.ident	"GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
	.section	.note.GNU-stack,"",@progbits

Question:

When add is called in the main program, the first parameter is at (%esp) and the second parameter is at 4(%esp).
Why does the add function use offsets 8 and 12 to access these?

Answer:

Section 3.3: Data Formats

Data formats for IA32:

suffix	type	bits	used for
`b`	Byte	8	`char`
`w`	Word	16	`short`
`l`	Double Word	32	`int`, `long`, pointers
`s`	Single Precision	32	`float`
`l`	Double Precision	64	`double`
`t`	Extended Precision	80 or 96	`long double`

Note: No direct support for long long (64-bit integer)
