Week 2 - Processes and Process Management
Process
A process is an instance of an executing program. It has some state in memory such as:
- The code that is being run,
- Initial data it started with,
- The heap associated with the application, and
- The stack associated with the application.
This all lives in the process's address space.
Aside
Heap (OS)
The heap of a process is memory that is allocated dynamically at run time. It is used to store data whose size may vary dramatically depending on what the application is run on, for example data read into memory from a file. This memory stays allocated until it is explicitly de-allocated, so managing the heap can come with considerable overhead and may require techniques like garbage collection or custom allocators.
Stack (OS)
The stack of an application is a LIFO (last in, first out) structure of stack frames, each containing a function's parameters, local variables and return address. A frame is pushed when a function is called and popped once the function completes. The stack therefore acts as the control flow for a process, determining where to return to once a function has completed. The stack's maximum size is fixed when the process starts, and growing beyond that size causes a stack overflow.
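To make the last-in, first-out behaviour concrete, here is a minimal Python sketch that simulates a call stack as a list of frames with a fixed maximum depth. It is a model, not how a real OS lays out stack memory: `Frame`, `CallStack`, `max_frames` and the numeric return addresses are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One stack frame: a function's parameters, locals and return address."""
    function: str
    params: dict
    return_address: int
    local_vars: dict = field(default_factory=dict)


class CallStack:
    """Toy call stack with a fixed maximum number of frames (hypothetical model)."""

    def __init__(self, max_frames: int):
        self.max_frames = max_frames
        self.frames = []

    def call(self, function: str, params: dict, return_address: int) -> None:
        # Pushing past the fixed limit models a stack overflow.
        if len(self.frames) >= self.max_frames:
            raise OverflowError("stack overflow")
        self.frames.append(Frame(function, params, return_address))

    def ret(self) -> int:
        # The most recently called function returns first (last in, first out).
        frame = self.frames.pop()
        return frame.return_address


stack = CallStack(max_frames=3)
stack.call("main", {}, return_address=0x00)
stack.call("parse", {"path": "in.txt"}, return_address=0x10)
stack.call("read_line", {"fd": 3}, return_address=0x24)
print(hex(stack.ret()))  # read_line returns first: 0x24
```

Calling a fourth function on this three-frame stack would raise `OverflowError`, which stands in for the real stack overflow described above.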
Address space (OS)
For a process, the address space is the virtual memory associated with the executing program. This keeps memory management within the program simple and leaves the handling of physical memory to the OS.
Metaphor
A process is like an order of toys:
- State of execution:
  - Completed,
  - Waiting, or
  - In progress.
- Parts and temporary holding area:
  - Pieces used to make the toy, and
  - Containers to put the pieces in.
- May require special hardware:
  - Sewing machine, or
  - Glue gun.
This is analogous to an OS, where a process has:
- State of execution:
  - Program counter, and
  - Stack.
- Parts and temporary holding area:
  - Data, and
  - Register state in memory.
- May require special hardware:
  - I/O, or
  - Access to sound output.
Process execution state
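For the OS to stop and start running processes it must keep track of what each one is doing. For this it uses information such as the program counter, the contents of the CPU registers and the stack pointer.
All this information is stored in the PCB: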
Process control block (PCB)
A process control block is a data structure that holds the state of a process. This includes but is not limited to:
- Process identification (PID):
  - Of both the process and its parent, if one exists.
- Process state,
- Program counter,
- CPU registers,
- Memory management information,
- Scheduling information,
- Accounting information:
  - CPU usage, elapsed time, user/system time.
- I/O status,
- Process privileges, and
- Process metadata.
The PCB is created and initialised when a process starts, and it is updated frequently while the process executes. It is the job of the OS to keep it correct and up to date, as it needs this information whenever it stops and restarts the process.
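As a rough illustration, the fields listed above can be grouped into a single record. This is a minimal Python sketch only; real kernels use their own structures (Linux, for example, uses `task_struct`), and field names such as `page_table_base` and `uid` are hypothetical stand-ins for the categories in the list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ProcessState(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"


@dataclass
class PCB:
    # Process identification
    pid: int
    parent_pid: Optional[int]  # None if the process has no parent
    # Execution state, saved and restored on a context switch
    state: ProcessState = ProcessState.NEW
    program_counter: int = 0
    registers: dict = field(default_factory=dict)
    # Memory-management information (hypothetical: base address of the page table)
    page_table_base: int = 0
    # Scheduling information
    priority: int = 0
    # Accounting information, in hypothetical time units
    user_time: int = 0
    system_time: int = 0
    # I/O status: open file descriptors
    open_files: list = field(default_factory=list)
    # Privileges (hypothetical: owning user id)
    uid: int = 0


init = PCB(pid=1, parent_pid=None)
shell = PCB(pid=42, parent_pid=init.pid, priority=10)
shell.state = ProcessState.READY  # the OS keeps the PCB up to date as the process runs
```

Using `default_factory` gives every PCB its own register and file tables rather than sharing one mutable default between processes.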
Switching process
While a process is running, its execution context (the program counter and other register values) lives in the CPU registers. To suspend that process, the OS must save those register values into the process's PCB in memory and then load the next process's saved state from its PCB into the CPU registers. This is called a context switch.
Context switch (CPU)
Context switching is costly for two reasons:
- Direct costs: the CPU has to spend cycles saving the outgoing process's register state to its PCB in memory and loading the incoming process's state back into the registers.
- Indirect costs: the CPU has multiple layers of cache. After a switch, the caches still hold mostly the previous process's data, so the new process suffers cache misses and memory access is temporarily much slower until the caches are repopulated (a "cold cache").
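The save-and-restore step of a context switch can be sketched in a few lines of Python. This is a simulation under simplifying assumptions: `CPU`, `PCB` and `context_switch` are hypothetical names, the "registers" are a plain dict, and only the direct cost (copying state) is modelled, not the cache effects.

```python
from dataclasses import dataclass, field


@dataclass
class PCB:
    pid: int
    state: str = "ready"
    program_counter: int = 0
    registers: dict = field(default_factory=dict)


@dataclass
class CPU:
    program_counter: int = 0
    registers: dict = field(default_factory=dict)


def context_switch(cpu: CPU, old: PCB, new: PCB) -> None:
    # Save: copy the outgoing process's execution context into its PCB.
    old.program_counter = cpu.program_counter
    old.registers = dict(cpu.registers)
    # Assumes the old process was preempted; a process blocking on I/O
    # would move to "waiting" instead.
    old.state = "ready"
    # Restore: load the incoming process's saved context onto the CPU.
    cpu.program_counter = new.program_counter
    cpu.registers = dict(new.registers)
    new.state = "running"


cpu = CPU(program_counter=0x400, registers={"rax": 7})
p1 = PCB(pid=1, state="running")
p2 = PCB(pid=2, program_counter=0x800, registers={"rax": 99})
context_switch(cpu, p1, p2)
print(hex(cpu.program_counter), p1.state, p2.state)  # 0x800 ready running
```

Every field copied here corresponds to work a real kernel does on each switch, which is why the direct cost grows with the amount of state that must be saved.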
Process life cycle
During its lifetime a process moves through several states:
- New: when a process is created, the OS allocates and initialises its PCB; once admitted, the process moves to the ready state.
- Ready: the process is able to execute but is not yet running on the CPU.
- Waiting: if the process has to wait on some event, such as network or I/O activity, it is moved into the waiting state until that event completes.
- Running: the scheduler has dispatched the process, context switching onto it and loading its saved state into the CPU registers.
- Terminated: once a process has exited or hit an error, it moves to the terminated state to be cleaned up.
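The list above names the states but not which moves between them are legal. The Python sketch below fills those in from the classic five-state model (for example, running to ready on preemption); treat the transition table as that textbook assumption rather than any particular OS's rules. `State`, `ALLOWED` and `transition` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"


# Legal moves in the classic five-state model.
ALLOWED = {
    State.NEW: {State.READY},                  # admitted
    State.READY: {State.RUNNING},              # dispatched by the scheduler
    State.RUNNING: {State.READY,               # preempted (e.g. time slice expired)
                    State.WAITING,             # blocked on I/O or an event
                    State.TERMINATED},         # exited or errored
    State.WAITING: {State.READY},              # I/O or event completed
    State.TERMINATED: set(),                   # final state
}


def transition(current: State, target: State) -> State:
    """Return target if the move is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = State.NEW
for nxt in (State.READY, State.RUNNING, State.WAITING, State.READY,
            State.RUNNING, State.TERMINATED):
    state = transition(state, nxt)
print(state)  # State.TERMINATED
```

Note that a waiting process never goes straight back to running: it must return to the ready state and be chosen by the scheduler again.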
Creation
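When you start the computer the OS starts a number of processes that have privileged access. These in turn create the applications that you run on your computer. Two system calls are involved in creating a new process:
- Fork: this creates a new child process that is a copy of the current process, including the program counter, so both processes continue from the instruction after the fork. It returns 0 in the child and the child's PID in the parent.
- Exec: this replaces the current process's program (code, data, heap and stack) with a new program, while the process keeps its PID.
The normal flow for a process to start another program is to call fork, then call exec in the child.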
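A minimal sketch of the fork-then-exec pattern, using Python's thin wrappers over the POSIX calls (`os.fork`, `os.execvp`, `os.waitpid`), so it runs only on Unix-like systems. The helper name `spawn` is hypothetical, and the parent waiting for the child with `waitpid` is an addition not described in the notes.

```python
import os
import sys


def spawn(argv: list) -> int:
    """Run argv in a new child process via fork + exec; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Replace this process's program with argv[0].
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; exit without running parent cleanup.
            os._exit(127)
    # Parent: fork returned the child's PID. Wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


code = spawn([sys.executable, "-c", "import sys; sys.exit(3)"])
print(code)  # 3
```

The `if pid == 0` branch is how the two copies created by fork tell themselves apart, since both resume from the same instruction.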
CPU Scheduler
The CPU scheduler is the part of the OS that determines which of the ready processes will be dispatched to the CPU next and how long it should run for. This is done via three operations:
- Preempt: interrupt the running process and save its context.
- Schedule: run the scheduler to choose the next process.
- Dispatch: dispatch the chosen process and switch to its context.
An efficient OS wants to spend as much time as possible running the processes the user wants to run, and as little time as possible on the three operations above.
There are two important decisions to make when designing a scheduler:
- How long should processes run for?
- What metrics should be used to choose the next process to run?
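The notes do not fix a scheduling policy, so as one concrete example here is round-robin, a standard policy that answers both questions simply: each process runs for at most one fixed time quantum, and the next process is the head of a first-come, first-served ready queue. This Python simulation is a sketch that assumes all processes arrive at time 0 and never block; `round_robin`, `bursts` and `quantum` are hypothetical names.

```python
from collections import deque


def round_robin(bursts: dict, quantum: int) -> list:
    """Simulate round-robin scheduling; return (process, ticks run) for each slice."""
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    ready = deque(bursts.items())  # ready queue in arrival order
    timeline = []
    while ready:
        name, remaining = ready.popleft()  # schedule: take the head of the ready queue
        ran = min(quantum, remaining)      # dispatch: run for at most one quantum
        timeline.append((name, ran))
        remaining -= ran
        if remaining > 0:
            ready.append((name, remaining))  # preempt: back to the tail of the queue
    return timeline


print(round_robin({"A": 5, "B": 2}, quantum=2))
# [('A', 2), ('B', 2), ('A', 2), ('A', 1)]
```

A smaller quantum makes the system more responsive but triggers more preempt/schedule/dispatch cycles, which is exactly the overhead an efficient OS tries to minimise.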
I/O scheduling
When a process issues an I/O request, the request is handled by the device driver for that device. The process enters the waiting state until the device raises an interrupt, handled by its driver, to signal that the operation has completed; the process then moves back to the ready state. The waiting state can also end in other ways, for example a timeout.
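The following Python sketch models that bookkeeping: blocked processes sit in a waiting table keyed by request, and either a completion interrupt or an expired timeout moves them to the ready queue. It is a toy model, not a real kernel or driver API; `IOWaitQueue`, `block`, `interrupt`, `tick` and the integer deadlines are all hypothetical.

```python
from collections import deque


class IOWaitQueue:
    """Toy model of processes blocked on I/O (hypothetical, not a real kernel API)."""

    def __init__(self):
        self.ready = deque()  # PIDs ready to run
        self.waiting = {}     # request id -> (pid, deadline)

    def block(self, pid: int, request_id: int, deadline: int) -> None:
        # The process issued an I/O request and leaves the CPU until it completes.
        self.waiting[request_id] = (pid, deadline)

    def interrupt(self, request_id: int) -> None:
        # The device signals completion: the process becomes ready again.
        pid, _ = self.waiting.pop(request_id)
        self.ready.append(pid)

    def tick(self, now: int) -> None:
        # Requests whose deadline has passed time out and also become ready.
        expired = [rid for rid, (_, deadline) in self.waiting.items() if deadline <= now]
        for rid in expired:
            pid, _ = self.waiting.pop(rid)
            self.ready.append(pid)


q = IOWaitQueue()
q.block(pid=7, request_id=1, deadline=100)
q.block(pid=8, request_id=2, deadline=50)
q.interrupt(1)  # the I/O for PID 7 finished
q.tick(now=60)  # PID 8's request timed out
print(list(q.ready))  # [7, 8]
```

In both cases the process returns to the ready state rather than running immediately; it still has to be picked by the scheduler.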
Inter-process communication
As modern applications get more complex, they are increasingly structured as multiple processes communicating with one another. However, the OS is deliberately structured to isolate processes from one another. Therefore they need to communicate using inter-process communication (IPC) mechanisms provided by the OS.
Inter-process communication (IPC)
Inter-process communication is the set of mechanisms, exposed by the OS as APIs, through which different processes can communicate with one another. There are four main categories:
- Message-passing IPC: the OS offers an API (for example send and receive) and a communication channel, such as a buffer or message queue, through which it passes messages from one process to the other. This has the advantage that it is managed by the OS and is safe, but every message needs system calls and is copied through the kernel, which incurs overhead.
- Shared memory IPC: the OS maps a region of physical memory into the virtual address space of both processes. After setup the OS is out of the way, but the two processes must agree on how to use the shared memory, sometimes re-implementing mechanisms, such as synchronization, that the OS would otherwise provide.
- Higher level semantics: Such as shared files or Remote Procedure Calls (RPC).
- Synchronization: mechanisms by which two processes coordinate so as not to adversely affect one another's operation, for example mutexes or semaphores.
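The first two categories can be contrasted with a short POSIX-only Python sketch. A pipe (`os.pipe`) stands in for kernel-mediated message passing, where each message is copied through a kernel buffer via system calls, and an anonymous shared mapping (`mmap.mmap(-1, ...)`, which is `MAP_SHARED` by default and therefore stays shared across `fork`) stands in for shared memory. These are swapped-in concrete mechanisms chosen for illustration; the helper names `message_passing` and `shared_memory` are hypothetical, and `waitpid` is used as a deliberately crude synchronization point where real code would use a semaphore or mutex.

```python
import mmap
import os
import struct


def message_passing(msg: bytes) -> bytes:
    """Child sends msg to the parent through a kernel-managed pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)        # child only writes
            os.write(write_fd, msg)  # copied into a kernel buffer (short messages only)
        finally:
            os._exit(0)              # exiting closes write_fd, signalling EOF
    os.close(write_fd)               # parent only reads
    chunks = []
    while chunk := os.read(read_fd, 4096):  # empty bytes means EOF
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)               # reap the child
    return b"".join(chunks)


def shared_memory(value: int) -> int:
    """Child stores value in an anonymous shared mapping; parent reads it back."""
    shm = mmap.mmap(-1, 8)  # anonymous, MAP_SHARED by default: visible across fork
    pid = os.fork()
    if pid == 0:
        try:
            struct.pack_into("q", shm, 0, value)  # plain memory write, no syscall
        finally:
            os._exit(0)
    os.waitpid(pid, 0)  # crude synchronization: read only after the child exits
    result = struct.unpack_from("q", shm, 0)[0]
    shm.close()
    return result


print(message_passing(b"hello from child"))  # b'hello from child'
print(shared_memory(42))                      # 42
```

The trade-off from the list shows up directly: the pipe needs a system call per read and write, while the shared-memory write is an ordinary store, but the processes must coordinate themselves so the reader does not look before the writer has finished.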