CS 3843 Computer Organization Notes on Chapter 3 Sections 3.6 and 3.7

Sections 3.6 and 3.7 Overview

Conditional jump instructions are typically used as follows:

     cmpl %eax, %ebx
     ja   .label
     ...
.label:

The instruction:

cmpl %eax, %ebx

is interpreted as: compare %ebx to %eax.
Note that the operand order may seem backwards at first.
The ja instruction (jump on above) transfers control if the previous comparison found that %ebx was above %eax.
When doing unsigned comparisons, we use above and below.

If we wanted to do a signed comparison we would use jg instead of ja.
For signed comparisons, we use greater and less.
The code would look like:

     cmpl %eax, %ebx
     jg   .label
     ...
.label:

Notice that there is no change in the cmpl instruction.

How this works
The cmp instructions set or clear four flags.
A flag has a boolean value (true or false) and each can be represented by a single bit.
If the value of the bit is 1, we say the bit is set, or is true.
If the value of the bit is 0, we say the bit is clear, or is false.
The 4 flags that are used are the carry flag, zero flag, sign flag, and overflow flag.
The instruction cmpl %eax, %ebx calculates (%ebx-%eax) and sets the flags based on the result.

The zero flag (ZF) tells us if the result was 0.
So je (jump on equal) will just use the zero flag.
The carry flag (CF) is set if the subtraction (of unsigned values) produces the wrong value, that is, if the true result is negative and so cannot be represented as an unsigned number.
So jb (jump on below) uses only CF, because CF is set exactly when %ebx is smaller than %eax (as unsigned numbers).
The jbe (jump if below or equal) will use both of these flags since we want to jump if either is true.
Note that the ja (jump on above) can use the same 2 flags since this is the opposite of jbe.
Since jbe will jump if (CF|ZF) is set, ja will jump if ~(CF|ZF) = ~CF&~ZF.
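
The unsigned case can be sketched in C as follows. This is only an illustration, assuming 32-bit operands; the names cmp_unsigned, takes_jb, takes_jbe, and takes_ja are hypothetical helpers, with s1 playing the role of %ebx and s2 the role of %eax in cmpl %eax, %ebx.

```c
#include <stdint.h>
#include <stdbool.h>

/* The two flags that matter for unsigned comparisons. */
struct uflags {
    bool cf;  /* carry flag: a borrow was needed, i.e. s1 < s2 as unsigned */
    bool zf;  /* zero flag: s1 - s2 == 0 */
};

/* Models cmpl s2, s1, which computes s1 - s2 and keeps only the flags. */
struct uflags cmp_unsigned(uint32_t s1, uint32_t s2)
{
    struct uflags f;
    uint32_t diff = s1 - s2;   /* wraps modulo 2^32 */
    f.cf = s1 < s2;            /* the true result would be negative */
    f.zf = (diff == 0);
    return f;
}

/* Jump conditions expressed in terms of the flags. */
bool takes_jb(struct uflags f)  { return f.cf; }             /* CF       */
bool takes_jbe(struct uflags f) { return f.cf || f.zf; }     /* CF|ZF    */
bool takes_ja(struct uflags f)  { return !f.cf && !f.zf; }   /* ~CF&~ZF  */
```

With s1 = 0xFFFFFFFF and s2 = 1 no borrow is needed, so takes_ja is true. The same bit patterns read as signed values are -1 and 1, for which jg would not jump.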

Section 3.6: Control

IA32 uses jump instructions for transfer of control.

Section 3.6.1: Condition Codes

IA32 uses four single-bit flags called condition codes which are set by certain instructions based on the result of the instruction.

Flag	Name	Use
CF	carry flag	carry out of most significant bit: for unsigned overflow
ZF	zero flag	zero
SF	sign flag	sign bit is set
OF	overflow flag	two's complement (signed) overflow: set if the sign bit of the result is not correct

OF flag: result of add or sub has wrong sign

addl sets the OF if both operands have the same sign
but the result has a different sign.
subl A, B calculates B - A and sets OF if
B ≥ 0, A < 0 and the computed B - A < 0, or
B < 0, A > 0 and the computed B - A > 0

CF flag: carry out of high bit

addl sets CF if unsigned result does not fit
subl A, B calculates B - A and sets CF if A (unsigned) is bigger than B
sal and shl set the carry bit to the last MSB (most significant bit) shifted out
sar and shr set the carry bit to the last LSB (least significant bit) shifted out
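
The rules for addl and subl can be checked with a small C sketch. The function names are hypothetical, and the arithmetic is done on uint32_t so that wraparound is well defined in C; bit 31 is the sign bit.

```c
#include <stdint.h>
#include <stdbool.h>

/* OF for addl: both operands have the same sign,
   but the result has a different sign. */
bool add_sets_of(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                     /* wraps modulo 2^32 */
    return ((~(ua ^ ub) & (ua ^ sum)) >> 31) != 0;
}

/* CF for addl: the unsigned sum does not fit in 32 bits. */
bool add_sets_cf(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    return sum < a;                             /* wrapped around */
}

/* OF for subl A, B (computes B - A): the operands have different
   signs and the result's sign differs from B's sign. */
bool sub_sets_of(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ub - ua;
    return (((ub ^ ua) & (ub ^ diff)) >> 31) != 0;
}
```

Adding 0x7FFFFFFF and 1 sets OF but not CF, while adding 0xFFFFFFFF and 1 sets CF but not OF, which shows that the two flags answer different questions about the same bits.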

The following instructions set the condition codes appropriately:

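inc, dec, neg, add, sub, mul, imul, xor, or, and, sal, shl, sar, shr

Note that inc and dec leave CF unchanged, not does not modify any flags, and div and idiv leave the flags undefined.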

The following instructions do not modify the condition codes:

mov, leal, push, pop, call, ret, cltd

Today's News: March 7

Assignment 2 regrade due today.

In addition, there are test and compare instructions which take two operands:

cmp S₂, S₁ : cmpb, cmpw, cmpl : performs S₁ - S₂
test S₂, S₁ : testb, testw, testl : performs S₁ & S₂

These do not store the result of the computation in the destination; only the condition codes are set.
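
A minimal C sketch of the test instruction, assuming 32-bit operands. The names tflags and test_flags are hypothetical. Only ZF and SF are modeled; test also clears CF and OF.

```c
#include <stdint.h>
#include <stdbool.h>

struct tflags {
    bool zf;  /* result of the AND is zero */
    bool sf;  /* sign bit of the AND is set */
};

/* Models testl s2, s1: computes s1 & s2, sets the flags,
   and discards the result. */
struct tflags test_flags(uint32_t s1, uint32_t s2)
{
    uint32_t r = s1 & s2;
    struct tflags f = { r == 0, (r >> 31) != 0 };
    return f;
}
```

A common idiom is to test a register against itself, as in testl %ebx, %ebx: the AND of a value with itself is the value, so ZF reports whether it is zero and SF whether it is negative.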

Section 3.6.2: Accessing (and Understanding) the Condition Codes

You can set a byte to 0 or 1 based on the condition flags with the set instructions.
These take a single byte operand as the destination: either an 8-bit register or a single byte of memory.
The variations are:

instruction	synonym	effect	description
sete D	setz	D = ZF	equal or zero
setne D	setnz	D = ~ZF	not equal or not zero

sets D		D = SF	negative
setns D		D = ~SF	nonnegative

setg D	setnle	D = ~(SF^OF)&~ZF	signed greater
setge D	setnl	D = ~(SF^OF)	signed greater or equal
setl D	setnge	D = SF^OF	signed less
setle D	setng	D = (SF^OF)|ZF	signed less or equal

seta D	setnbe	D = ~CF&~ZF	unsigned above
setae D	setnb	D = ~CF	unsigned above or equal
setb D	setnae	D = CF	unsigned below
setbe D	setna	D = CF|ZF	unsigned below or equal

Note that the carry flag (CF) is used for unsigned comparisons while the combination of SF and OF is used for signed comparisons.

The important part of this table is the effect field which shows how the 4 condition codes are related to various tests.
The entries in red are easiest to understand. Others from the same group can be easily derived from these.

The description field is based on a previous instruction of the form:

cmp S₂, S₁

negative refers to the value of S₁ - S₂
greater, less, above, or below refer to comparing S₁ to S₂.
For example, greater is true if S₁ is greater than S₂
Note that the order might seem backwards.

cmpl Example 1

Question:

Consider the following code segment:

cmpl   $10, $20
jle     .L1

Does this jump?

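Answer:

No. Here S₂ = 10 and S₁ = 20, so the instruction calculates 20 - 10 = 10.
The result is positive and there is no overflow, so ZF = 0, SF = 0, and OF = 0, and (SF^OF)|ZF is false.
Equivalently, jle jumps when S₁ ≤ S₂, and 20 ≤ 10 is false.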

You should be able to derive the effect column for each line of the above table.
The ones in red are the simplest, and should be relatively easy. The others in a group can be derived from the red one.

Unsigned comparisons

In interpreting the effect and description, consider the instruction:
cmpl S₂, S₁ which calculates S₁ - S₂.
If S₁ and S₂ are unsigned, S₁ is below S₂ if the result of S₁ - S₂ cannot be correctly represented. In this case the carry flag is set.
The other three comparisons can be understood from this one using de Morgan's laws.

Example: derive the flags conditions for seta from those of setb:

setb is determined by the CF, so setbe is determined by CF|ZF
seta is the opposite of setbe, so seta is determined by ~(CF|ZF) = ~CF&~ZF

Signed comparisons

The signed comparisons are a bit more complicated.
Consider setl
- When is S₁ < S₂?
- Answer: S₁ - S₂ < 0.
- This is true if the sign bit is set, so you might think that just SF is sufficient.
- But recall that the SF is sometimes incorrect.
- This is indicated by the OF flag.
- So if SF is correct (OF=0), we just test SF = 1, or SF.
- If SF is incorrect (OF=1), we want SF = 0.
- This is SF^OF.
The other signed comparisons can be derived from setl using de Morgan's laws.

Example: derive the flags conditions for setle from those of setl:

setl is determined by SF^OF, so setle is determined by (SF^OF)|ZF
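
The signed derivation can be sketched in C as follows. This is an illustration under the assumption of 32-bit two's complement operands; cmp_signed and the cond_ helpers are hypothetical names. Since the flags are single bits, SF^OF is written as sf != of.

```c
#include <stdint.h>
#include <stdbool.h>

struct sflags { bool zf, sf, of; };

/* Models cmpl s2, s1 for signed operands: computes s1 - s2
   and keeps only the flags. */
struct sflags cmp_signed(int32_t s1, int32_t s2)
{
    uint32_t u1 = (uint32_t)s1, u2 = (uint32_t)s2;
    uint32_t diff = u1 - u2;                      /* wraps modulo 2^32 */
    struct sflags f;
    f.zf = (diff == 0);
    f.sf = (diff >> 31) != 0;                     /* sign bit of result */
    /* overflow: operands differ in sign and the result's sign
       differs from the sign of s1 */
    f.of = (((u1 ^ u2) & (u1 ^ diff)) >> 31) != 0;
    return f;
}

bool cond_l(struct sflags f)  { return f.sf != f.of; }              /* SF^OF          */
bool cond_le(struct sflags f) { return (f.sf != f.of) || f.zf; }    /* (SF^OF)|ZF     */
bool cond_ge(struct sflags f) { return !(f.sf != f.of); }           /* ~(SF^OF)       */
bool cond_g(struct sflags f)  { return !(f.sf != f.of) && !f.zf; }  /* ~(SF^OF)&~ZF   */
```

Comparing the most negative 32-bit value to 1 is a case where SF alone gives the wrong answer: the subtraction overflows, SF is 0 and OF is 1, and SF^OF correctly reports "less".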

See a summary of discussion of flags here.

Section 3.6.3: Jump Instructions and Their Encoding
Section 3.6.4: Translating Conditional Branches

Jump instructions change the flow of control so that the next instruction executed is not necessarily the one that follows in memory.

The traditional instruction cycle is also called the fetch-and-execute cycle or the fetch-decode-execute cycle.
The program counter (PC) register contains the address of the next instruction to execute.

Fetch: read the instruction whose address is in the PC
Increment PC: increment PC so that it points to the next instruction
Decode: determine what instruction this is
Execute: do what the instruction indicates
Store: store the result

Continue doing this in a loop forever.

A jump instruction is one that modifies the PC during the execute phase.
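
The cycle can be sketched as a loop in C for a toy machine. Everything here is hypothetical: the opcodes OP_HALT, OP_INC, and OP_JMP are made up for illustration and are not IA32 encodings, the PC is an index into an array rather than a byte address, and a halt instruction and a step limit are added so that the loop ends.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy machine; not IA32. */
enum { OP_HALT, OP_INC, OP_JMP };

struct insn {
    int op;         /* one of the opcodes above */
    size_t target;  /* used only by OP_JMP */
};

/* Runs prog until OP_HALT (or max_steps) and returns the accumulator. */
int run(const struct insn *prog, int max_steps)
{
    size_t pc = 0;   /* program counter: index of the next instruction */
    int acc = 0;     /* the machine's only data register */
    for (int step = 0; step < max_steps; step++) {
        struct insn cur = prog[pc];  /* fetch */
        pc++;                        /* increment PC */
        switch (cur.op) {            /* decode, then execute */
        case OP_INC:
            acc++;                   /* execute and store the result */
            break;
        case OP_JMP:
            pc = cur.target;         /* a jump overwrites the PC */
            break;
        case OP_HALT:
            return acc;
        }
    }
    return acc;
}
```

Because the PC has already been incremented when the instruction executes, an instruction that leaves the PC alone falls through to its successor, and a jump simply replaces that value.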

IA32 has two types of unconditional jump instructions, direct and indirect:

jmp Label (direct)
jmp *Operand (indirect; Operand is one of the addressing modes)

Unconditional jumps are rarely used, except with conditional jumps.
Examples to follow.

IA32 Conditional Jump instructions.
There are no indirect conditional jumps.
This table is similar to the table for the set instructions.

instruction	synonym	Jump Condition	description
je Label	jz	ZF	equal or zero
jne Label	jnz	~ZF	not equal or not zero

js Label		SF	negative
jns Label		~SF	nonnegative

jg Label	jnle	~(SF^OF)&~ZF	signed greater
jge Label	jnl	~(SF^OF)	signed greater or equal
jl Label	jnge	SF^OF	signed less
jle Label	jng	(SF^OF)|ZF	signed less or equal

ja Label	jnbe	~CF&~ZF	unsigned above
jae Label	jnb	~CF	unsigned above or equal
jb Label	jnae	CF	unsigned below
jbe Label	jna	CF|ZF	unsigned below or equal

Today's News: March 17

Welcome back from spring break.

Jump Example

Consider the C program in jump.c

int simple_jump(int x, int y, int z) {
   if (x == 0)
      return y-z;
   return z-y;
}

After cc -O1 -S jump.c, jump.s contains:

simple_jump:
        pushl   %ebp
        movl    %esp, %ebp

        cmpl    $0, 8(%ebp)     // compare x to 0
        jne     .L2             // jump if x != 0
                                // get here if x == 0
        movl    12(%ebp), %eax  // y into %eax
        subl    16(%ebp), %eax  // y - z into %eax
        jmp     .L3             // done
.L2:                            // this is the case x != 0
        movl    16(%ebp), %eax  // z in %eax
        subl    12(%ebp), %eax  // z - y in %eax

.L3:                            // common return
        popl    %ebp
        ret
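
The control flow of this assembly code can be mirrored in C using goto. This is a sketch for reading the listing, not how one would normally write the function; the name simple_jump_goto and the labels L2 and L3 are chosen to match the assembly labels.

```c
/* Same behavior as simple_jump, with the jumps made explicit. */
int simple_jump_goto(int x, int y, int z)
{
    int result;
    if (x != 0)          /* cmpl $0, 8(%ebp); jne .L2 */
        goto L2;
    result = y - z;      /* fall through: x == 0 */
    goto L3;             /* jmp .L3 */
L2:
    result = z - y;      /* x != 0 */
L3:
    return result;       /* common return */
}
```

Note that the compiler tests the opposite condition (x != 0) so that the code for the if body can follow the comparison directly.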

There are several ways that jump instructions are encoded.
The simplest uses a PC-relative destination.
After cc -c -O1 jump.c and objdump -d jump.o we get

00000000 <simple_jump>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 7d 08 00             cmpl   $0x0,0x8(%ebp)
   7:   75 08                   jne    11 <simple_jump+0x11>
   9:   8b 45 0c                mov    0xc(%ebp),%eax
   c:   2b 45 10                sub    0x10(%ebp),%eax
   f:   eb 06                   jmp    17 <simple_jump+0x17>
  11:   8b 45 10                mov    0x10(%ebp),%eax
  14:   2b 45 0c                sub    0xc(%ebp),%eax
  17:   5d                      pop    %ebp
  18:   c3                      ret

In the disassembled object code, labels have been replaced by addresses relative to the start of the program. All addresses and offsets here are in hexadecimal.
During the execute phase of the jne instruction at 7, the PC has the value 9, the address of the next instruction.
The encoding of jne (75 08) shows a jump offset of 8, and 9 + 8 = 11 (hex).
During execution of the jmp instruction at f, the PC has the value 11.
The jump offset is 6, giving 11 + 6 = 17.
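
The target calculation can be written out in C. This sketch assumes the short form of the jump shown above, which is 2 bytes long and carries a signed one-byte offset; jump_target is a hypothetical helper name.

```c
#include <stdint.h>

/* addr: address of the jump instruction
   len:  length of the jump instruction in bytes (2 in the listing)
   rel8: the signed one-byte offset stored in the encoding */
uint32_t jump_target(uint32_t addr, uint32_t len, int8_t rel8)
{
    uint32_t next_pc = addr + len;             /* PC during execute */
    return next_pc + (uint32_t)(int32_t)rel8;  /* sign-extend, then add */
}
```

For the listing above, jump_target(0x7, 2, 0x08) gives 0x11 and jump_target(0xf, 2, 0x06) gives 0x17. Because the offset is signed, a negative value jumps backwards, which is how loops are encoded.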

Section 3.6.5: Loops

Loop Example 1: a do-while loop

int fact_do(int n) {
   int result = 1;
   do {
      result *=n;
      n--;
   } while (n > 1);
   return result;
}

and the corresponding assembly code:

fact_do:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %edx  // n in %edx
        movl    $1, %eax       // result is in %eax
.L2:
        imull   %edx, %eax     // result = result * n
        subl    $1, %edx       // n--;
        cmpl    $1, %edx       // compare n to 1
        jg      .L2            // jump if n > 1
        popl    %ebp
        ret

Loop Example 2: A while loop

int fact_while(int n) {
   int result = 1;
   while (n > 1) {
      result *= n;
      n--;
   }
   return result;
}

and the corresponding assembly code:

fact_while:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %edx   // n in %edx
        movl    $1, %eax        // result in %eax
        cmpl    $1, %edx        // see if n > 1
        jle     .L3             // no, we are done
.L6:
        imull   %edx, %eax      // result = result * n
        subl    $1, %edx        // n--;
        cmpl    $1, %edx        // compare again
        jg      .L6             // keep going if n > 1
.L3:
        popl    %ebp
        ret

Note the use of the test before entering the loop and again at the end of the loop.
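
The structure of this translation, a guard test followed by a do-while loop, can be written in C with goto. This is a reading aid under the assumption that it mirrors the listing above; fact_while_goto and the label names are hypothetical.

```c
/* Same behavior as fact_while, structured like the assembly. */
int fact_while_goto(int n)
{
    int result = 1;
    if (n <= 1)          /* cmpl $1, %edx; jle .L3 (initial test) */
        goto done;
loop:                    /* .L6 */
    result *= n;         /* imull %edx, %eax */
    n--;                 /* subl $1, %edx */
    if (n > 1)           /* cmpl $1, %edx; jg .L6 (test at the end) */
        goto loop;
done:                    /* .L3 */
    return result;
}
```

Testing at the bottom of the loop means each iteration executes one conditional jump, rather than a conditional jump at the top plus an unconditional jump back.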

Loop Example 3: A for loop

int fact_for(int n) {
   int i;
   int result = 1;
   for (i=2; i <= n; i++)
      result *=i;
   return result;
}

and the corresponding assembly code:

fact_for:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %ecx  // n in %ecx
        movl    $2, %edx       // 2 in %edx (this is i)
        movl    $1, %eax       // 1 into %eax (the result)
        cmpl    $1, %ecx       // compare n to 1
        jle     .L3            // done if n <= 1 (continue if n >= 2)
.L6:
        imull   %edx, %eax    // result = result * i;
        addl    $1, %edx      // i++
        cmpl    %edx, %ecx    // compare n to i
        jge     .L6           // continue if n >= i
.L3:
        popl    %ebp
        ret
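
The for loop follows the same pattern, with the initialization done before the guard test and the update done before the test at the bottom. A C sketch with goto, using the hypothetical name fact_for_goto:

```c
/* Same behavior as fact_for, structured like the assembly. */
int fact_for_goto(int n)
{
    int i = 2;           /* movl $2, %edx */
    int result = 1;      /* movl $1, %eax */
    if (n <= 1)          /* cmpl $1, %ecx; jle .L3 */
        goto done;
loop:                    /* .L6 */
    result *= i;         /* imull %edx, %eax */
    i++;                 /* addl $1, %edx */
    if (n >= i)          /* cmpl %edx, %ecx; jge .L6 */
        goto loop;
done:                    /* .L3 */
    return result;
}
```

The initial test i <= n with i equal to 2 appears in the assembly as a comparison of n with the constant 1, since 2 <= n is the same as n > 1 for integers.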

You can find a trace of this example here

We will skip sections 3.6.6 and 3.6.7

Today's News: March 19

Section 3.7: Procedures

A procedure involves:

Passing data
Passing control
Returning a value

Section 3.7.1: Stack Frame Structure

The stack is used for passing parameters, for local variables, and storing other values.

The stack is organized into pieces called stack frames.

See Figure 3.21 from the book.

Look at the current frame in the diagram.
All procedures start with the following two instructions:

pushl %ebp
movl %esp, %ebp

Notice the following:

The %ebp frame pointer points to the saved %ebp register on the stack.
Usually %ebp does not change.
The first parameter is at 8(%ebp) because of the return address and saved %ebp.
%ebp is used to address data on the caller's stack (such as the parameters)
In our examples, %esp usually did not change during execution, but in general it will when space on the stack is allocated for
- saved registers
- local variables
- parameters of procedures that will be called
You need to understand the terminology: caller and callee.
In Example 8 of the previous section, we saved the callee-save register %ebx

Section 3.7.2: Transferring Control

Three instructions used for supporting procedures:

call label
leave
ret

call can also have the form call *Operand, but we will not be using it.

call pushes the return address (the current PC, which already holds the address of the instruction following the call) on the stack and sets the PC to the label.
leave is equivalent to:

movl %ebp, %esp
popl %ebp

The purpose of the first of these is to restore the stack pointer to the value it had after the initial push of %ebp.
We have not seen leave before because none of our procedures have needed to change %esp, so the first of these was not necessary.

ret pops the return address into the PC
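
The effect of call, leave, and ret on the stack can be simulated in C. This is a toy model with made-up names (machine, push, pop, do_call, prologue, do_leave, do_ret): the stack is an array of 4-byte words, addresses are byte addresses, and the stack grows toward lower addresses as on IA32.

```c
#include <stdint.h>

struct machine {
    uint32_t mem[64];  /* stack memory; the word at address a is mem[a / 4] */
    uint32_t esp;      /* stack pointer (byte address of the top word) */
    uint32_t ebp;      /* frame pointer */
    uint32_t pc;       /* program counter */
};

void push(struct machine *m, uint32_t v)
{
    m->esp -= 4;                      /* the stack grows downward */
    m->mem[m->esp / 4] = v;
}

uint32_t pop(struct machine *m)
{
    uint32_t v = m->mem[m->esp / 4];
    m->esp += 4;
    return v;
}

/* call: push the return address, then set the PC to the target. */
void do_call(struct machine *m, uint32_t return_addr, uint32_t target)
{
    push(m, return_addr);
    m->pc = target;
}

/* Procedure entry: pushl %ebp; movl %esp, %ebp. */
void prologue(struct machine *m)
{
    push(m, m->ebp);
    m->ebp = m->esp;
}

/* leave: movl %ebp, %esp; popl %ebp. */
void do_leave(struct machine *m)
{
    m->esp = m->ebp;
    m->ebp = pop(m);
}

/* ret: pop the return address into the PC. */
void do_ret(struct machine *m)
{
    m->pc = pop(m);
}
```

Starting with esp = 256, a call followed by the prologue leaves the return address at 252 and the saved %ebp at 248. No matter how far %esp is lowered afterwards, leave brings it back to 252, so ret finds the return address on top of the stack.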

Section 3.7.3: Register Usage Conventions

IA32 has eight 32-bit general-purpose registers.

%ebp and %esp have special uses and must be maintained as in the examples.
%eax, %ecx, and %edx are caller-save registers
- This means that the caller is responsible for saving these registers, if necessary, before calling a procedure.
- The procedure may modify these registers as it sees fit.
- %eax is used for return values of 32 bits or less
%ebx, %esi, and %edi are callee-save registers.
- The caller assumes that they will not be changed by a procedure call.
- If a procedure wants to use these, it must save and restore them.

Section 3.7.4: A Procedure Example from the book

Example: swap_add (from book)

int swap_add(int *xp, int *yp) {
   int x = *xp;
   int y = *yp;
   *xp = y;
   *yp = x;
   return x + y;
}

And here is the caller:

int caller() {
   int arg1 = 534;
   int arg2 = 1057;
   int sum = swap_add(&arg1, &arg2);
   int diff = arg1 - arg2;
   return sum*diff;
}

See Figure 3.24 from the book.

Here is the assembly code generated for swap_add:

swap_add:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        movl    8(%ebp), %edx   // xp
        movl    12(%ebp), %ecx  // yp
        movl    (%edx), %ebx    // x
        movl    (%ecx), %eax    // y
        movl    %eax, (%edx)    // *xp = y
        movl    %ebx, (%ecx)    // *yp = x
        addl    %ebx, %eax      // x + y for return
        popl    %ebx
        popl    %ebp
        ret

And here is the assembly language code for the caller:

caller:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp      // allocate 6 words on the stack
        movl    $534, -4(%ebp) // 534 on stack
        movl    $1057, -8(%ebp)// 1057 on stack
        leal    -8(%ebp), %eax // address of 1057 into %eax
        movl    %eax, 4(%esp)  // address of 1057 on stack
        leal    -4(%ebp), %eax // address of 534 into %eax
        movl    %eax, (%esp)   // address of 534 on stack
        call    swap_add
.R1:    movl    -4(%ebp), %edx // arg1 into %edx
        subl    -8(%ebp), %edx // arg1 - arg2 into %edx
        imull   %edx, %eax     // diff * return value in %eax
        leave                  // restore the stack pointer
        ret

I added the .R1 label to aid in tracing.
You can find a trace of this example here

Why did the compiler reserve 6 words = 24 bytes on the stack when it only needed 3 words?

Convention: the total number of stack bytes used by a function should be a multiple of 16.
This counts the 4 bytes for the return address and the 4 bytes for the saved %ebp.
If only 3 words were reserved, this would be 12 + 8 = 20 bytes.
To get this up to 32, we need to add 12 more bytes, or 3 more words.
This does not reduce the speed of execution.
It does use a small amount of extra memory.
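
The arithmetic above can be expressed as a short C function. This models only the convention as stated here, assuming the 8 bytes of overhead are the return address and the saved %ebp; frame_alloc is a hypothetical name, and a real compiler may make different choices.

```c
/* Returns the number of bytes to subtract from %esp so that
   needed bytes + return address + saved %ebp is a multiple of 16. */
unsigned frame_alloc(unsigned needed)
{
    unsigned overhead = 8;   /* 4 for return address + 4 for saved %ebp */
    unsigned total = (needed + overhead + 15u) & ~15u;  /* round up to 16 */
    return total - overhead;
}
```

For the caller above, frame_alloc(12) is 24: 12 + 8 = 20 rounds up to 32, and 32 - 8 = 24 bytes are reserved with subl.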

Today's News: March 21

Section 3.7.5: Recursive Procedures

Recursive Factorial

int rfact(int n) {
   int result;
   if (n < 1)
      result = 1;
   else
      result = n * rfact(n-1);
   return result;
}

and the corresponding assembly code:

rfact:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx              // %ebx is a callee-save register
        subl    $4, %esp          // reserve 4 extra bytes on the stack
        movl    8(%ebp), %ebx     // n into %ebx
        movl    $1, %eax          // 1 into %eax
        testl   %ebx, %ebx        // test n
        jle     .L3               // jump if n <= 0  (same as n < 1)
        leal    -1(%ebx), %eax    // %eax = n - 1
        movl    %eax, (%esp)      // move n-1 onto the stack
        call    rfact             // call rfact with parameter n-1
.R1:
        imull   %ebx, %eax        // n * return value into %eax for return
.L3:
        addl    $4, %esp          // restore %esp
        popl    %ebx              // restore %ebx
        popl    %ebp
        ret

I have added the label .R1 so we can use it in tracing.
You can find a trace of this example here

Recursion Efficiency

Question:

What is the maximum amount of stack space needed for rfact(n)?

Answer:

Today's News: March 24

Question:

We need to push %ebx since it is a callee-save register.
How would this change if we used a caller-save register instead?

Answer:

Question:

Compare the time for calculating n! using the for loop and the recursive method.

Answer:
