Context Switching

the fifth chapter in our series on How to Write an Operating System

Causing a Task Switch

The i386 defines a hardware data structure called the task state segment (TSS), which holds the segment selectors, the general purpose registers, the stack pointers for privilege levels 0 through 2 (in addition to the task's own ESP), the instruction pointer, and various permissions and flags.

After the task and TSS have been created and all the relevant fields are filled in correctly, you need to create a GDT entry for the TSS, and from there you can switch to the new context at will.

Creating a Task State Segment

struct TSS32 {
   unsigned short backlink, __blh;
   unsigned long esp0;
   unsigned short ss0, __ss0h;
   unsigned long esp1;
   unsigned short ss1, __ss1h;
   unsigned long esp2;
   unsigned short ss2, __ss2h;
   unsigned long cr3, eip, eflags;
   unsigned long eax,ecx,edx,ebx,esp,ebp,esi,edi;
   unsigned short es, __esh;
   unsigned short cs, __csh;
   unsigned short ss, __ssh;
   unsigned short ds, __dsh;
   unsigned short fs, __fsh;
   unsigned short gs, __gsh;
   unsigned short ldt, __ldth;
   unsigned short trace, iomapbase;
};

The fields of the TSS should reflect the initial state of the registers when the context is started. Most of the registers can be left alone for the application's use; however, the following fields must be filled in before the task runs for the first time:

The Page Directory Base Pointer (CR3) should be set to the current value of CR3 until you've created a new address space for the task.
The Instruction Pointer (EIP) needs to be set to the entry point of the task.
eFlags should, at the very least, be set to 0x0002 (bit 1 is reserved and always set); add 0x0200 if the task should start with interrupts enabled.
The Segment Registers (CS, DS, ES, FS, GS, SS) need to be set to valid selectors for the task's code and data segments.
The Stack Pointer (ESP) should be set to the (double-word aligned) top of the stack you've set aside for this task.
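
The list above can be sketched as a small initialization routine. This is only an illustration under stated assumptions: Tss32Fixed is a hypothetical restatement of struct TSS32 using fixed-width types (so the layout is the same 104 bytes regardless of how wide the compiler makes "long"), and tssInit() with its parameters is a made-up helper, not part of any existing kernel.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-width restatement of struct TSS32. */
struct Tss32Fixed {
   uint16_t backlink, backlinkPad;
   uint32_t esp0;
   uint16_t ss0, ss0Pad;
   uint32_t esp1;
   uint16_t ss1, ss1Pad;
   uint32_t esp2;
   uint16_t ss2, ss2Pad;
   uint32_t cr3, eip, eflags;
   uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
   uint16_t es, esPad;
   uint16_t cs, csPad;
   uint16_t ss, ssPad;
   uint16_t ds, dsPad;
   uint16_t fs, fsPad;
   uint16_t gs, gsPad;
   uint16_t ldt, ldtPad;
   uint16_t trace, iomapbase;
};

/* The 32-bit TSS is 104 bytes; refuse to compile if the layout drifts. */
_Static_assert(sizeof(struct Tss32Fixed) == 104, "i386 TSS is 104 bytes");

#define EFLAGS_RESERVED1 0x00000002u   /* bit 1 of eFlags, always set */

/* Fill in the fields a task needs before its first run.
   All arguments are assumptions supplied by the caller:
   cr3      - page directory base (the current CR3 if sharing the address space)
   entry    - entry point of the task
   stackTop - top of the stack set aside for the task
   codeSel, dataSel - selectors for the task's code and data segments */
void tssInit(struct Tss32Fixed *tss, uint32_t cr3, uint32_t entry,
             uint32_t stackTop, uint16_t codeSel, uint16_t dataSel)
{
   memset(tss, 0, sizeof *tss);        /* everything else starts at zero */

   tss->cr3    = cr3;
   tss->eip    = entry;
   tss->eflags = EFLAGS_RESERVED1;
   tss->cs     = codeSel;
   tss->ds = tss->es = tss->fs = tss->gs = tss->ss = dataSel;
   tss->esp    = stackTop & ~3u;       /* round down to a double-word boundary */
}
```

A caller would pass the current CR3 value, the address of the task's entry function, and the end address of the task's stack. The general purpose registers are left at zero for the task's own use.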

Adding the TSS to the GDT

A TSS needs to be added to the GDT before any context switching can occur. A TSS entry (type 9) looks just like every other GDT entry, with appropriate base, limit, and protection values. Whether you allocate GDT entries for tasks dynamically or statically is your choice; however, if you allocate TSS entries statically (so that they persist beyond a single context switch), you will most likely need to clear the Busy bit yourself (a Busy TSS is type 11, an Available TSS is type 9).
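
As a rough sketch of what such an entry looks like in memory, the following builds an Available TSS descriptor and clears the Busy bit. The names GdtEntry, gdtMakeTss(), gdtClearBusy() and gdtIsBusy() are hypothetical; the sketch assumes a DPL of 0 and byte granularity, and it only fills in the 8-byte entry without installing it in a live GDT.

```c
#include <stdint.h>

#define DESC_PRESENT       0x80u   /* P bit of the access byte */
#define TSS_TYPE_AVAILABLE 0x09u   /* type 9: Available 32-bit TSS */
#define TSS_TYPE_BUSY      0x0Bu   /* type 11: Busy 32-bit TSS */
#define TSS_BUSY_BIT       0x02u   /* the only bit that differs between the two */

/* One 8-byte GDT entry, fields in memory order. */
struct GdtEntry {
   uint16_t limitLow;        /* limit bits 0..15 */
   uint16_t baseLow;         /* base bits 0..15 */
   uint8_t  baseMid;         /* base bits 16..23 */
   uint8_t  access;          /* P, DPL, type */
   uint8_t  limitHighFlags;  /* limit bits 16..19 (low nibble), flags (high nibble) */
   uint8_t  baseHigh;        /* base bits 24..31 */
};

_Static_assert(sizeof(struct GdtEntry) == 8, "GDT entries are 8 bytes");

/* Build an Available TSS descriptor (DPL 0, byte granularity).
   base is the linear address of the TSS, limit is its size minus one. */
struct GdtEntry gdtMakeTss(uint32_t base, uint32_t limit)
{
   struct GdtEntry e;

   e.limitLow       = (uint16_t)(limit & 0xFFFFu);
   e.baseLow        = (uint16_t)(base & 0xFFFFu);
   e.baseMid        = (uint8_t)((base >> 16) & 0xFFu);
   e.access         = (uint8_t)(DESC_PRESENT | TSS_TYPE_AVAILABLE);  /* 0x89 */
   e.limitHighFlags = (uint8_t)((limit >> 16) & 0x0Fu);
   e.baseHigh       = (uint8_t)((base >> 24) & 0xFFu);
   return e;
}

/* Turn a Busy TSS (type 11) back into an Available one (type 9). */
void gdtClearBusy(struct GdtEntry *e)
{
   e->access &= (uint8_t)~TSS_BUSY_BIT;
}

int gdtIsBusy(const struct GdtEntry *e)
{
   return (e->access & TSS_BUSY_BIT) != 0;
}
```

For a TSS with no I/O permission bitmap the limit would be 103 (the 104-byte structure minus one).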

Setting and Reading the Current Context

The ltr() function loads the Task Register (TR) with a TSS selector, thereby setting the current task; str() reads the current selector back. Because every context switch both stores the old set of register values and loads in a new set, the TR needs to be loaded before the first task switch occurs, so that the processor has a valid TSS in which to save the outgoing state. And because each task switch loads the TR with a new value, you will probably not need to touch it again once you start switching.

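void ltr(unsigned int selector)
{
   /* ltr takes a 16-bit operand, so pass the selector as a short */
   asm volatile ("ltr %0": :"r" ((unsigned short) selector));
}

unsigned int str(void)
{
   unsigned short selector;

   /* str stores the 16-bit selector of the current task */
   asm volatile ("str %0":"=r" (selector));
   return selector;
}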
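
Note that the value handed to ltr() is a segment selector, not a GDT index. A minimal sketch of the conversion follows; gdtSelector() and selectorIndex() are hypothetical helper names, and which GDT slot holds your TSS is entirely up to your kernel.

```c
#include <stdint.h>

/* A selector packs three fields:
   bits 3..15  index into the descriptor table
   bit  2      table indicator (0 = GDT, 1 = LDT)
   bits 0..1   requested privilege level (RPL) */
uint16_t gdtSelector(unsigned int index, unsigned int rpl)
{
   /* table indicator left at 0, so the selector refers to the GDT */
   return (uint16_t)((index << 3) | (rpl & 3u));
}

/* Recover the GDT index from a selector, e.g. one returned by str(). */
unsigned int selectorIndex(uint16_t selector)
{
   return selector >> 3;
}
```

If the TSS descriptor sits in GDT slot 5, for example, the kernel would call ltr(gdtSelector(5, 0)), which passes 0x28.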

Switching to Another Context

A far jump or far call to a TSS selector (the offset is ignored) will cause a task switch:

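/* "switch" is a reserved word in C, so the function needs another name */
void taskSwitch(unsigned int selector)
{
   /* 48-bit far pointer: 32-bit offset followed by 16-bit selector */
   struct {
      unsigned int offset;
      unsigned short selector;
   } __attribute__((packed)) target;

   target.offset = 0;   /* irrelevant for a task switch */
   target.selector = selector;

   /* NOTE: lcall sets eFlags[NT] bit, whereas ljmp does not */
   asm volatile ("lcall *%0": :"m" (target) :"memory");

   /* we get back to here when someone else has switched
      back to us */
}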

Returning to a Called Context

When a nested context switch occurs, either manually (via the lcall instruction as above) or because of an interrupt or exception routed through a task gate, the processor sets the Nested Task (NT) bit in the new task's eFlags register, sets the Busy (B) bit in the type field of the new task's GDT entry, stores the selector of the previous task in the back-link field of the called TSS, and transfers control to the new task.

So if the called task executes an iret instruction with the eFlags[NT] bit set, the processor assumes that it should return to the calling task (whose selector is conveniently stored in the backlink field). This is very useful in a few instances:

when writing interrupt or exception handlers. These can be written identically whether you've used an interrupt/trap gate or a task gate (the mechanisms of which, of course, are very different).
before you've written a scheduler. By transferring control to another thread and then returning, you can test the operating system (complete with context-switching ability) without losing the determinism of only having one thread of control.

NOTE: an iret from a system call and an iret from a task switch have very different mechanics, but they are the exact same instruction. The processor decides which mechanism to use based on the NT flag. The kernel needs to know which method it expects the processor to take, and to set the NT flag to a known state if there is any possibility of the opposite case being true.

void taskReturn(void) 
{
/* asm ("pushf; orl  $0x00004000, (%esp); popf; iret");     sets NT   */
/* asm ("pushf; andl $0xffffbfff, (%esp); popf; iret");     clears NT */
   asm ("iret");            /* assumes the processor will be correct */
}

NOTE: a context switch due to an ljmp instruction does not set eFlags[NT], nor change the value of the backlink field.
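
To make the bookkeeping concrete, here is a toy software model of the three cases discussed in this section (lcall, ljmp, and iret with NT set). It is not kernel code and does not touch the processor; TaskModel and the model*() functions are hypothetical names, and the model deliberately ignores register save/restore and the faults real hardware raises (for example, when the target TSS is already busy).

```c
#include <stdint.h>

#define EFLAGS_NT 0x00004000u   /* Nested Task bit of eFlags */

/* Only the state that task linkage touches. */
struct TaskModel {
   uint16_t selector;   /* TSS selector of this task */
   uint16_t backlink;   /* back-link field of its TSS */
   uint32_t eflags;     /* eFlags image saved in its TSS */
   int      busy;       /* Busy bit of its GDT entry */
};

/* lcall: the new task is nested inside the caller. */
struct TaskModel *modelCall(struct TaskModel *from, struct TaskModel *to)
{
   to->backlink = from->selector;   /* remember who called us */
   to->eflags  |= EFLAGS_NT;
   to->busy     = 1;                /* the caller stays busy as well */
   return to;
}

/* ljmp: a plain hand-over, with no nesting recorded. */
struct TaskModel *modelJump(struct TaskModel *from, struct TaskModel *to)
{
   from->busy = 0;
   to->busy   = 1;                  /* NT and backlink are left untouched */
   return to;
}

/* iret: switches tasks only if NT is set in the current task. */
struct TaskModel *modelIret(struct TaskModel *cur,
                            struct TaskModel *tasks, int count)
{
   int i;

   if (!(cur->eflags & EFLAGS_NT))
      return cur;                   /* ordinary iret, no task switch */

   for (i = 0; i < count; i++) {
      if (tasks[i].selector == cur->backlink) {
         cur->eflags &= ~EFLAGS_NT; /* NT is cleared in the task being left */
         cur->busy    = 0;
         return &tasks[i];
      }
   }
   return cur;   /* no such task: real hardware would raise an exception */
}
```

Walking two tasks through modelCall() followed by modelIret() ends up back in the first task with the second one marked available again, whereas after modelJump() an iret has nowhere to return to because NT was never set.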