Start Reading Chapter 3
Chapter 3: Machine Level Representation of Programs
This chapter deals with the representation of machine code and its
relationship to assembly language.
All of the examples in this chapter are related to the Intel architecture,
mainly the 32-bit IA32 instructions.
Section 3.1: History
The Intel instruction set has its ancestry in the Intel 8008 microprocessor,
which was launched in 1972.
Each successive microprocessor was backwards compatible with the previous one,
requiring the later microprocessors to contain instructions that are mostly
unused.
History:
- 1972: 8008 (3.5K-10,000nm) - first Intel microprocessor with 8-bit words.
The instruction set was designed by Datapoint Corporation
which was a leading maker of programmable CRT terminals.
Datapoint was based in San Antonio, so you might say that the Intel
architecture started just a few miles from here.
- 1974: 8080 (4.5K-6,000nm) - first successful Intel microprocessor, had some 16-bit instructions.
- 1978: 8086 (29K-3,000nm) - One of the first 16-bit microprocessors.
20-bit addresses with segmented address space.
- 1979: 8088 (29K-3,000nm) - An 8086 with an 8-bit external bus - basis of the original IBM PC
- 1980: 8087 (45K-3,000nm) - A floating point coprocessor for the 8086 and 8088, formed the
basis for the IEEE floating point standard.
- 1982: 80286 (134K-1,500nm) - basis of the IBM PC-AT and MS Windows
- 1985: 80386 (275K-1,500nm) (also called i386) - added flat address space, could run Linux.
- 1989: 80486 (1.2M-1000nm) - integrated the floating point processor
- 1993: Pentium (3.1M-800nm) - improved performance
- 1995: PentiumPro (5.5M-500nm) - new processor design
- 1997: Pentium 2 (7M-350nm) - more of the same
- 1999: Pentium 3 (8.2M-250nm) - new floating point instructions
- 2000: Pentium 4 (42M-180nm) - double precision floating point and many new instructions.
- 2004: Pentium 4E (125M-90nm) - added hyperthreading
- 2006: Core2 Duo (291M-65nm) - multiple cores, not hyperthreading
- 2008: Core i7 Quad (781M-45nm) - multiple cores and hyperthreading
- 2010: Itanium Tukwila (2B-65nm) - instruction-level parallelism
- 2011: Xeon Westmere (2.6B-32nm) - 10 cores
- 2012: Xeon Phi (5.0B-22nm) - 62 cores
Moore's Law
In 1965, Gordon Moore, who later co-founded Intel, predicted that the
number of transistors that could fit on a chip would double
every year for 10 years.
In 1975, he revised this to doubling every 2 years.
Question:
If the number of transistors on a chip was 3500 in 1972,
and the number doubled every 2 years,
how many transistors could fit on a chip in 2012?
Answer:
Section 3.2: Program Encoding
IA32 instruction set
The full IA32 has a large number of instructions, partly because of backward
compatibility.
We will learn about it by looking at how C code is converted into assembly language.
What happens when you compile a C program?
- gcc -O1 -o t t.c
- The -O1 is a compiler option requesting the first level of optimization.
Higher levels can rearrange the code so much that it is hard to relate to the C source.
- The C preprocessor expands include files and #define macros, plus some other things
- The compiler generates assembly code: t.s
- The assembler converts the assembly code into object code: t.o
- The linker combines the object code with the libraries to produce
an executable: t
- The t.s file is not saved by default.
- You can look at the assembly code generated using:
gcc -O1 -S t.c
This produces a file t.s.
Today's News: February 14
Exam on Wednesday of next week.
IA32 32-bit registers
Name | Use |
%eax | accumulator - general caller-save register used for returning 32-bit values |
%ecx | counter - general caller-save register |
%edx | data - general caller-save register |
%ebx | base - general callee-save register |
%esi | source - general callee-save register |
%edi | destination - general callee-save register |
%esp | stack pointer |
%ebp | frame pointer |
Question:
What is meant by "caller-save" and "callee-save" registers?
Answer:
Why these strange names for the registers:
- It goes back to the 8080, an 8-bit machine with
registers: A, B, C, D, etc.
- The 8086 had 16-bit registers: ax, bx, cx, dx, where ax was made up of two 8-bit registers, al and ah.
Similarly with bx, cx, and dx.
- The 32-bit version (80386) extended these to 32 bits, making eax, ebx, etc.
- The low 16 bits of eax are just ax, and ax is made up of ah and al.
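The 64-bit Itanium architecture (IA-64) has 128 64-bit general registers called
r0 - r127. The x86-64 extension of IA32 instead widens the eight registers above
to 64 bits (%rax, %rbx, etc.) and adds %r8 - %r15.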
Example 1
int add(int x, int y) {
    int z;
    z = x + y;
    return z;
}
If this is in sum.c we can produce assembly code with:
gcc -O1 -S sum.c
This generates the following:
.file "sum.c"
.text
.globl add
.type add, @function
add:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
addl 8(%ebp), %eax
popl %ebp
ret
.size add, .-add
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",@progbits
Note:
- ignore the lines that begin with .
- The pushl and popl save and restore %ebp
- In movl, the first argument is the source, the second is the destination
- addl adds the source and destination and stores the result in the
destination.
- %eax is used to hold the return value.
- x and y are at 8(%ebp) and 12(%ebp)
Today's News: February 17
Exam on Wednesday of this week.
See the floating point summary
here.
Question:
Could we directly use %esp to avoid using %ebp and eliminate the push and pop?
That is, would the following code work? (3 instructions instead of 6)
add:
movl 12(%esp), %eax
addl 8(%esp), %eax
ret
Answer:
We can generate an object file using:
cc -c sum.s
This produces sum.o which we can examine with:
objdump -d sum.o
which produces:
sum.o: file format elf32-i386
Disassembly of section .text:
00000000 <add>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 0c mov 0xc(%ebp),%eax
6: 03 45 08 add 0x8(%ebp),%eax
9: 5d pop %ebp
a: c3 ret
Note:
- Each instruction takes up 1 to 15 bytes
- Common instructions such as push, pop, or ret, are short.
- An instruction can be decoded by its starting bits, for example,
an instruction beginning with 01011 is a pop and 01010 is a push.
To use this program, we need a main to call it:
#include <stdio.h>

int add(int x, int y);

int main() {
    int x = 12;
    int y = 31;
    int z;
    z = add(x, y);
    printf("x is %d, y is %d, and z is %d\n",x,y,z);
    return 0;
}
If this is called e1.c we do:
cc -O1 -S e1.c
to create e1.s, which is:
.file "e1.c"
.section .rodata.str1.4,"aMS",@progbits,1
.align 4
.LC0:
.string "x is %d, y is %d, and z is %d\n"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl $31, 4(%esp)
movl $12, (%esp)
call add
movl %eax, 16(%esp)
movl $31, 12(%esp)
movl $12, 8(%esp)
movl $.LC0, 4(%esp)
movl $1, (%esp)
call __printf_chk
movl $0, %eax
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",@progbits
Question:
When add is called in the main program, the first parameter is at (%esp)
and the second parameter is at 4(%esp).
Why does the add function use offsets 8 and 12 to access these?
Answer:
Section 3.3: Data Formats
Data formats for IA32:
suffix | type | bits | used for |
b | Byte | 8 | char |
w | Word | 16 | short |
l | Double Word | 32 | int, long, pointers |
s | Single Precision | 32 | float |
l | Double Precision | 64 | double |
t | Extended Precision | 80 or 96 | long double |
Note: No direct support for long long (64 bit integer)