Operating System Summary#

I. Overview#

Concurrency and Parallelism#

  • Concurrency: A single processor handles multiple tasks "simultaneously"; here "simultaneously" is a macroscopic notion: the tasks are actually executed alternately within a given period of time.
  • Parallelism: Multiple processors, or a multi-core processor, handle several different tasks at the same time; here "simultaneously" means truly executing multiple tasks at the same moment.

Sharing#

Sharing means that resources in the system can be used by multiple concurrent processes. There are two kinds of sharing: mutually exclusive sharing and simultaneous sharing. Resources that can only be shared mutually exclusively are called critical resources, such as printers: only one process may access such a resource at a time, so a synchronization mechanism is needed to manage access to critical resources.

Virtualization#

Virtualization technology converts a physical entity into multiple logical entities. There are mainly two types of virtualization technologies: time division multiplexing and space division multiplexing. For example, multiple processes can concurrently execute on the same processor using time division multiplexing, allowing each process to take turns occupying the processor, executing for a small time slice and quickly switching.

Synchronous and Asynchronous#

  • Synchronous means the whole processing sequence executes in order, and results are returned only after every step has completed. It is a linear style of execution in which the flow cannot jump ahead. It is generally used where the order of steps matters, such as user login: the system can only log the user in after verification has finished.

  • Asynchronous means that only the call is issued; the caller does not wait for the called method to complete but continues with the subsequent steps. It is a form of concurrent processing in which one task does not have to finish before others run, such as loading page data: the page does not need to wait for all data to be retrieved before it is displayed.

Kernel Mode and User Mode#

  • Kernel Mode: The CPU can access all of memory as well as peripheral devices such as hard drives and network cards, and it can switch itself from one program to another.

  • User Mode: Access to memory is restricted and access to peripheral devices is not allowed; the CPU can be taken away from the program, so CPU time can be given to other programs.

  • There are three ways for user mode to switch to kernel mode: system calls, exceptions, and interrupts from peripheral devices.


II. Processes and Threads#

Process#

A process is an instance of a program in execution, possessing its own program counter and internal state. It is an independent unit for resource allocation and scheduling in the system. The Process Control Block (PCB) describes the basic information and running state of the process. The creation and termination of processes refer to operations on the PCB.

Thread#

A thread is the basic unit of independent scheduling. A process can have multiple threads that share the process's resources. For example, in a browser, the browser process contains many threads, such as HTTP request threads, event response threads, rendering threads, etc. The concurrent execution of threads allows the browser to respond to other user events while initiating an HTTP request when a new link is clicked.

Differences#

  • Resource Ownership
    A process is the basic unit of resource allocation, while a thread does not own resources; threads can access resources belonging to the process.

  • Scheduling
    Threads are the basic unit of independent scheduling. Within the same process, switching threads does not cause a process switch. Switching from a thread in one process to a thread in another process will cause a process switch.

  • System Overhead
    Creating or terminating a process requires the system to allocate or reclaim resources such as memory space and I/O devices, which incurs much greater overhead than creating or terminating threads. Similarly, during process switching, the current executing process's CPU environment must be saved, and the new scheduled process's CPU environment must be set, while thread switching only requires saving and setting a small amount of register content, resulting in minimal overhead.

  • Communication
    Inter-process communication (IPC) requires synchronization and mutual exclusion mechanisms to ensure data consistency. Threads can communicate by directly reading/writing data segments (such as global variables) within the same process.
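
As a minimal sketch of the last point, using Python's threading module: two threads update a counter that lives in the process's shared address space, and a Lock stands in for the mutual-exclusion mechanism.

```python
import threading

counter = 0                      # data shared by all threads in this process
lock = threading.Lock()          # mutual exclusion for the shared data

def worker(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:               # only one thread updates the counter at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)                   # 400000 with the lock; unpredictable without it
```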

Process States#

  • Ready State: The process has acquired every resource it needs except the CPU and can execute as soon as the processor is allocated to it.

  • Running State: The process has been allocated the processor and its program is executing.

  • Blocked State: The process is waiting for some condition to be satisfied, such as the completion of an I/O operation, and cannot execute until it is; this is called the blocked state.

Thread States#

  • New (New): The thread is derived within a process; it can be spawned by either a process or another thread.

  • Blocked (Block): If a thread needs to wait for an event during execution, it becomes blocked.

  • Unblocked (Unblock): If the event blocking the thread occurs, the thread is activated and enters the ready queue.

  • Scheduled (Schedule): A ready thread is selected to enter the execution state.

  • Finished (Finish): If a thread finishes execution, its register context and stack contents will be released.

Thread Implementation Methods#

  • User-Space Thread Implementation
    User-level threads are implemented in user programs without kernel support. They do not depend on the operating system core, and application processes use thread library functions to create, synchronize, schedule, and manage threads. There is no need for user mode/kernel mode switching, making it fast. The operating system kernel is unaware of the existence of multiple threads, so if one thread blocks, the entire process (including all its threads) will block. Since processor time slices are allocated based on processes, the execution time of each thread is relatively reduced.

  • Kernel-Level Thread Implementation
    Kernel-level threads are created and terminated by the operating system kernel. The kernel maintains context information for processes and threads, as well as thread switching. A kernel thread blocked due to I/O operations will not affect the execution of other threads.

  • Advantages of User-Space Thread Implementation

    1. Can be implemented in operating systems that do not support threads.
    2. The cost of creating and destroying threads, as well as thread switching, is much lower than that of kernel threads.
    3. Allows each process to customize its own scheduling algorithm, making thread management more flexible.
    4. Threads can utilize more table space and stack space than kernel-level threads.
  • Disadvantages of User-Space Thread Implementation

    1. Only one thread can run at a time within the same process. If one thread makes a system call and blocks, the entire process will be suspended.
    2. Page faults can also cause the entire process to be suspended.
  • The advantages and disadvantages of kernel-level threads are roughly the opposite of those of user-level threads. In practice, an operating system can combine the two in a hybrid implementation.

Coroutines#

Coroutines are a lighter-weight alternative to threads. Just as a process can have multiple threads, a thread can have multiple coroutines. Most importantly, coroutines are not managed by the operating system kernel but are controlled entirely by the program (they execute in user mode). This brings a significant performance benefit: switching between coroutines does not cost what a thread switch costs. Essentially, coroutines are threads in user space; there is no kernel context switch and no switching between user mode and kernel mode, because all of a thread's coroutines run within that single thread.

Coroutine Applications#

  • Lua has used coroutines since version 5.0, implemented through the coroutine extension library.
  • Python can implement coroutines using yield/send. After Python 3.5, async/await became the better alternative (a minimal sketch follows this list).
  • Go language has a very powerful and concise implementation of coroutines, allowing the easy creation of hundreds or thousands of coroutines to execute concurrently.
  • Java does not have native support for coroutines, but some open-source frameworks simulate coroutine functionality, such as the Kilim framework.
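
A minimal sketch of the async/await style mentioned above (Python 3.7+ assumed): two coroutines run concurrently inside a single thread, and the switches happen at the await points rather than through the kernel scheduler.

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # awaiting suspends this coroutine and lets the event loop run another one;
    # no kernel thread switch is involved
    await asyncio.sleep(delay)
    return f"{name} done after {delay}s"

async def main() -> None:
    results = await asyncio.gather(fetch("a", 1.0), fetch("b", 1.0))
    print(results)   # both finish in about 1s total, not 2s

asyncio.run(main())
```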

Questions and Thoughts#

  • There is too much mixed information online about process and thread states, making it hard to understand their relationship. Currently, I believe that processes should also include new and finished states, which is similar to threads. I think only running, ready, and blocked states should be considered basic states, while others like new, finished, and awakened should only be seen as state transition operations.
  • In Java, threads also have a waiting state, which actually refines the blocked state into BLOCKED and WAITING. The difference is that waiting is active waiting, while blocking is passive blocking. In the waiting state, only an active wake-up can transition it to the ready state, while in the blocking state, the thread itself is continuously contending for a lock. Once it acquires the lock, it will switch to the ready state without needing to be actively awakened by others.

III. Inter-Process Communication Methods#

Pipe#

A pipe is a shared file that connects a reading process and a writing process for data exchange.

  • It is half-duplex (i.e., data can only flow in one direction) and has fixed read and write ends. A pipe only has significance when both read and write ends exist.
  • When the read end finds the pipe empty, it needs to sleep and wait until data is available to be awakened. Similarly, the write end waits when the pipe is full until it is awakened.
  • It can be treated as a special file and read or written with the normal read and write functions. However, it is not an ordinary file, does not belong to any file system, and exists only in memory.
  • Mutual exclusion: Only one process can read or write to the pipe at any given time.
  • There are unnamed pipes and named pipes; the former is used for communication between parent and child processes, while the latter is used for communication between any two processes running on the same machine.
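
A minimal sketch of an unnamed pipe between a parent and a child process, assuming a Unix-like system where os.fork is available:

```python
import os

r, w = os.pipe()                 # unnamed pipe: r is the read end, w is the write end

pid = os.fork()
if pid == 0:                     # child: writes into the pipe
    os.close(r)                  # each side closes the end it does not use
    os.write(w, b"hello from child")
    os.close(w)
    os._exit(0)
else:                            # parent: reads from the pipe
    os.close(w)
    data = os.read(r, 1024)      # blocks until the child writes or closes its end
    os.close(r)
    os.waitpid(pid, 0)
    print(data.decode())
```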

Message Queue#

A message queue is a linked list of messages stored in the kernel. A message queue is identified by an identifier (i.e., queue ID).

  • Message queues are record-oriented, with messages having specific formats and priorities.
  • Message queues are independent of the sending and receiving processes; when a process terminates, the message queue and its contents are not deleted.
  • Message queues can implement random querying of messages; messages do not necessarily have to be read in a first-in-first-out order and can also be read by message type.

Signal#

A more complex communication method used to notify the receiving process that a certain event has occurred. It is the only asynchronous communication mechanism in inter-process communication.
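
A minimal sketch of signal delivery, again assuming a Unix-like system: the process installs a handler for SIGUSR1 and, for demonstration, sends the signal to itself; normally another process would send it.

```python
import os
import signal
import time

def on_usr1(signum, frame):
    # runs asynchronously with respect to the main flow of the program
    print(f"received signal {signum}")

signal.signal(signal.SIGUSR1, on_usr1)   # register the handler

os.kill(os.getpid(), signal.SIGUSR1)     # another process would normally send this
time.sleep(0.1)                          # give the handler a chance to run
```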

Semaphore#

Unlike the IPC structures introduced earlier, a semaphore is a counter. Semaphores are used to implement mutual exclusion and synchronization between processes, rather than to store inter-process communication data.

  • Semaphores are used for synchronization between processes; to pass data between processes they must be combined with shared memory.
  • Semaphores are based on the operating system's PV operations, and operations on semaphores are atomic operations.
  • Each PV operation on a semaphore is not limited to incrementing or decrementing the semaphore value by 1 but can add or subtract any positive integer.
  • Supports semaphore groups.
  • Note⚠️: Semaphores and signals are not the same concept; they are very different.

Shared Memory#

Refers to two or more processes sharing a given storage area.

  • Shared memory is the fastest form of IPC because processes directly access memory.
  • Since multiple processes can operate simultaneously, synchronization is required.
  • Semaphores and shared memory are usually used together, with semaphores synchronizing access to shared memory.
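
A minimal sketch using Python's multiprocessing.shared_memory (available since 3.8): the parent creates a named shared block, and a child process attaches to it by name and reads the bytes directly.

```python
from multiprocessing import Process, shared_memory

def reader(name: str) -> None:
    shm = shared_memory.SharedMemory(name=name)   # attach to the existing segment
    print(bytes(shm.buf[:5]).decode())            # read straight from shared memory
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=16)
    shm.buf[:5] = b"hello"                        # write straight into shared memory
    p = Process(target=reader, args=(shm.name,))
    p.start()
    p.join()
    shm.close()
    shm.unlink()                                  # free the segment
```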

Socket#

Sockets are also a mechanism for inter-process communication, and they can enable communication between processes on different hosts. A socket can be seen as an endpoint of inter-process communication; each socket has a unique name through which other processes can locate it, connect to it, and exchange data.

Summary#

  • Pipe: Slow speed, limited capacity.
  • Message Queue: Capacity is limited by the system, and care must be taken when reading for the first time to consider any previous unread data.
  • Semaphore: Cannot pass complex messages, only used for synchronization.
  • Shared Memory: Can easily control capacity, fast, but synchronization must be maintained. For example, when one process is writing, another process must be cautious about read/write issues, similar to thread safety. Of course, shared memory can also be used for communication between threads, but it is unnecessary since threads already share a block of memory within the same process.

IV. Synchronization Mechanisms#

Background#

  • Concurrent execution of multiple processes (or threads) can lead to mutual constraints, for example when two processes need to:
    1. Share a unique hardware device.
    2. Share the same memory area.
    3. Depend on each other: one process's execution depends on the results another process produces with the shared resource.

If multiple processes must cooperate in a particular temporal order to complete a task, the relationship is called synchronization; if the processes are not cooperating but are related only because they share an exclusive resource, the relationship is called mutual exclusion, which is a special form of synchronization.

  • A synchronization mechanism must follow these four rules:
    1. Enter when free: when no process/thread is in the critical section, any process may enter.
    2. Wait when busy: when a process/thread is in the critical section, other processes must wait and may not enter the critical section.
    3. Bounded waiting: a process/thread waiting to enter the critical section must not wait indefinitely.
    4. Yield while waiting (optional): a process/thread that cannot enter the critical section should release the CPU, for example by transitioning to the blocked state.

Semaphore#

  • Semaphore and PV primitive operations were invented by Dijkstra and are one of the most widely used mutual exclusion methods.

  • Semaphore and PV operation principles:

    1. When a process wants to enter the shared area, it first performs the P operation, decrementing S by 1.
    2. When a process leaves the shared area, it performs the V operation, incrementing S by 1.
    3. Entering and exiting the shared area are atomic operations (their execution cannot be interrupted).
  • Defect: The readability of the program is relatively poor, and the management of semaphores is dispersed among the participating entities, which may lead to deadlocks, process starvation, and other issues.

Monitor (Unique to Process Synchronization)#

  • A monitor is an extension and improvement of the semaphore mechanism and is a simpler synchronization method.
  • A monitor is an object or module that can be safely accessed by multiple processes/threads.
  • The methods within a monitor are protected by a mutex, meaning that only one visitor is allowed to use them at any given time.
  • Characteristics: Safety, Mutual Exclusion, Sharing.

Critical Section (Thread Synchronization)#

A critical section serializes access by multiple threads to shared resources or to a section of code; it is fast and well suited to controlling access to data. At any moment only one thread is allowed to access the shared resource. If several threads try to access it, once one thread has entered, the others are suspended and must wait; only after the thread inside the critical section leaves can another thread take its place.

Mutual Exclusion#

A mutex is more complex than a critical section: it can provide safe sharing of a public resource not only among threads of the same application but also among threads belonging to different applications.

Event (Thread Synchronization)#

Events keep threads synchronized through notification operations and also make it convenient to perform operations that compare the priorities of multiple threads.

Differences in Synchronization Mechanisms Between Processes and Threads#

  • Semaphores, monitors, and mutexes are synchronization mechanisms for processes, while semaphores and mutexes can also be used for thread synchronization, but monitors are only used in process synchronization.
  • Thread synchronization, in addition to semaphores and mutexes, also includes critical sections and events. There is no mention of these two methods being used for process synchronization (unknown).

Classic Synchronization Problems#

  • Producer-Consumer Problem (a semaphore-based sketch follows this list)
  • Dining Philosophers Problem
  • Reader-Writer Problem
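
A sketch of the producer-consumer problem solved with counting semaphores and a mutex, using Python threads in place of processes; the buffer size and item counts are illustrative.

```python
import threading
from collections import deque

BUF_SIZE = 5
buffer = deque()
mutex = threading.Semaphore(1)          # binary semaphore protecting the buffer
empty = threading.Semaphore(BUF_SIZE)   # counts free slots
full = threading.Semaphore(0)           # counts filled slots

def producer():
    for item in range(10):
        empty.acquire()                 # P(empty): wait for a free slot
        with mutex:
            buffer.append(item)
        full.release()                  # V(full): one more item available

def consumer():
    for _ in range(10):
        full.acquire()                  # P(full): wait for an item
        with mutex:
            item = buffer.popleft()
        empty.release()                 # V(empty): one more free slot
        print("consumed", item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```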



V. Deadlock#

A deadlock refers to a situation where multiple processes are in a state of impasse due to competing for resources. If no external force intervenes, the processes in deadlock cannot continue executing.

Causes of Deadlock#

  • Insufficient system resources.
  • Improper order of process execution.
  • Improper resource allocation, etc.

Four Necessary Conditions for Deadlock#

  • Mutual Exclusion Condition: Processes exclusively use the resources allocated to them.
  • Hold and Wait Condition: Processes do not release the resources they have acquired while waiting for other resources.
  • No Preemption Condition: Resources already acquired by a process cannot be forcibly taken away until they are used up.
  • Circular Wait Condition: There exists a circular wait chain of processes and resources during deadlock.

Deadlock Handling#

  • Preventing Deadlock: Disrupt one or more of the four necessary conditions for deadlock; this is relatively simple to implement, but overly strict limitations can reduce system resource utilization and throughput.
  • Avoiding Deadlock: During dynamic resource allocation, prevent the system from entering an unsafe state (a state that may lead to deadlock), as the Banker's algorithm does (a safety-check sketch follows this list).
  • Detecting Deadlock: Allow the system to run and potentially enter deadlock, then use certain algorithms to detect deadlock and identify the resources and processes involved, taking relevant measures to eliminate detected deadlocks. This is difficult to implement.
  • Resolving Deadlock: In conjunction with deadlock detection, free the system from deadlock (by terminating processes or preempting resources). For processes and resources related to detected deadlocks, release some resources by terminating or suspending them and allocate them to processes in a blocked state, changing their state to ready. This is also difficult to implement.
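
A sketch of the safety check at the core of the Banker's algorithm; the available, allocation, and need matrices below are an assumed textbook-style example, not data from this article.

```python
def is_safe(available, allocation, need):
    """Return True if some ordering lets every process finish (Banker's safety check)."""
    work = available[:]                       # resources currently free
    finish = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for i, (alloc, nd) in enumerate(zip(allocation, need)):
            # process i can finish if its remaining need fits in `work`
            if not finish[i] and all(n <= w for n, w in zip(nd, work)):
                work = [w + a for w, a in zip(work, alloc)]   # it then releases its allocation
                finish[i] = True
                progressed = True
    return all(finish)

# assumed example: 3 resource types, 5 processes P0..P4
available  = [3, 3, 2]
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
need       = [[7, 4, 3], [1, 2, 2], [6, 0, 0], [0, 1, 1], [4, 3, 1]]
print(is_safe(available, allocation, need))   # True: e.g. P1, P3, P4, P0, P2 can all finish
```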

VI. Scheduling Algorithms#

First-Come, First-Served (FCFS)#

Can be used as both job scheduling and process scheduling algorithms; schedules according to the order in which jobs or processes arrive; thus, it is more favorable for long jobs.

Shortest Job First (SJF)#

A job scheduling algorithm that selects the job with the shortest estimated time from the ready queue for processing until a result is obtained or it cannot continue executing. Disadvantage: Not favorable for long jobs; does not consider the importance of jobs; running time is estimated and unreliable.

Highest Priority First (HPF)#

Can be used as both job scheduling and process scheduling algorithms; when scheduling jobs, it selects the job with the highest priority from the ready queue for processing. Since it involves priority, it can be divided into preemptive and non-preemptive. Additionally, priority determination can be static (a fixed value determined in advance based on process type, resource requirements, user requests, etc.) or dynamic (increasing or decreasing with the progress of the process or waiting time).

Highest Response Ratio Next (HRRN)#

FCFS may cause dissatisfaction among short job users, while SJF may cause dissatisfaction among long job users. Therefore, HRRN was proposed, selecting the job with the highest response ratio for execution. Response ratio = 1 + (job waiting time / job processing time).

Round Robin#

Processes are placed in a queue based on their arrival order, and the CPU time slice is allocated to the process at the front of the queue. When the time slice is exhausted, a timer interrupts, pauses the current process, and places it at the end of the queue, repeating the cycle.
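
A small round-robin simulation sketch; the burst times and the time quantum are assumed values chosen only to show how the ready queue rotates.

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate round robin; bursts maps process name -> remaining CPU time."""
    queue = deque(bursts.items())        # processes in arrival order
    clock = 0
    finish = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)    # run for at most one time slice
        clock += run
        if remaining - run > 0:
            queue.append((name, remaining - run))   # back to the tail of the queue
        else:
            finish[name] = clock
    return finish

# assumed example: three processes arriving at time 0, quantum of 2
print(round_robin({"P1": 5, "P2": 3, "P3": 1}, quantum=2))
# {'P3': 5, 'P2': 8, 'P1': 9}
```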

Multilevel Feedback Queue#

Currently recognized as a better scheduling algorithm; it sets multiple ready queues, each with different priorities, with the first queue having the highest priority and the others decreasing in priority. The higher the priority of the queue, the shorter the time slice allocated. When a process arrives, it is placed in the first queue according to FCFS. If it is not completed after scheduling, it is placed at the end of the second queue for waiting. If it is still not completed after the second scheduling, it is placed at the end of the third queue, and so on. Only when the previous queue is empty will the next queue's processes be scheduled.


VII. Memory Management#

Basic Knowledge#

Linking#

Linking combines the compiled object modules into a load module, forming a complete logical address space.
Three linking methods:

  • Static Linking: Linking before loading.
  • Dynamic Linking at Load Time: Linking while loading.
  • Dynamic Linking at Runtime: Linking and loading only when needed during execution.

Purpose: To determine the complete logical address.

Loading#

Loading places the load module into memory, where physical addresses are formed.

Three loading methods:

  • Absolute Loading: Generates absolute addresses at compile time, single job environment.
  • Static Relocation: Converts logical addresses to physical addresses at load time, multi-job batch processing operating systems.
  • Dynamic Relocation: Converts logical addresses to physical addresses at runtime, requiring setting relocation registers, modern operating systems.

Purpose: To determine the final physical address.

Fragmentation#

  • Internal Fragmentation (allocated): Memory areas allocated to a process where some parts are not used.
  • External Fragmentation (unallocated): Some free partitions in memory are too small to be utilized.

Continuous Allocation Management Methods#

  • Single Continuous Allocation: Memory is divided into system and user areas, allowing only one user program (no external fragmentation, internal fragmentation exists).
  • Fixed Partition Allocation: The entire user space is divided into several fixed-size partitions, which can be equal or unequal. Each partition can only hold one job, managed by a partition description table (no external fragmentation, internal fragmentation exists).
  • Dynamic Partition Allocation: Partitions are dynamically established based on process size, using free partition tables or free partition chains for management. The dynamic partition algorithm selects suitable free partitions for allocation (external fragmentation exists, no internal fragmentation).

Dynamic Partition Allocation Algorithms#

First Fit Algorithm (FF)#

When allocating memory, it starts searching from the head of the free partition table until it finds the first free partition that can satisfy the size requirement. This algorithm is simple to implement and preferentially uses low-address free partitions, preserving the larger free partitions at higher addresses for later allocation to larger jobs. However, repeatedly splitting the low-address area leaves behind many small free partitions that are hard to use, and because every search starts from the low-address end, the search overhead grows.

Next Fit Algorithm (NF)#

Unlike the first fit algorithm, when allocating memory space for a process, it starts searching from the next free partition after the last found free partition. Therefore, a search pointer is needed to indicate the starting point for the next search. If the last free partition's size still cannot meet the requirements, it should return to the first free partition. This algorithm makes the internal distribution of free partitions more uniform and appropriately reduces the overhead of searching for free partitions, but it can easily split large free partitions.

Best Fit Algorithm (BF)#

Each time memory is allocated for a job, the smallest free partition that meets the requirements is allocated to the process, i.e., finding a free partition with the smallest value. This can avoid "wasting large resources on small jobs," but it can easily produce small fragments.

Worst Fit Algorithm (WF)#

In contrast to the best fit algorithm, it always selects the largest free partition that meets the requirements for allocation, i.e., finding a free partition with the largest value. This minimizes the possibility of fragmentation and is beneficial for medium and small jobs.
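
A sketch contrasting the first fit and best fit algorithms described above; the free-partition sizes and the request size are assumed values.

```python
def first_fit(free_sizes, request):
    # scan from low addresses and take the first partition that is large enough
    for i, size in enumerate(free_sizes):
        if size >= request:
            return i
    return None

def best_fit(free_sizes, request):
    # take the smallest partition that is still large enough
    candidates = [(size, i) for i, size in enumerate(free_sizes) if size >= request]
    return min(candidates)[1] if candidates else None

free = [100, 500, 200, 300, 600]   # assumed free partitions, in address order
print(first_fit(free, 212))        # 1  -> the 500 KB partition
print(best_fit(free, 212))         # 3  -> the 300 KB partition
```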

Non-Continuous Allocation Management Methods#

Basic Paging Storage Management#

Basic paging storage management does not have page replacement functionality (i.e., it does not implement virtual memory functionality), so the entire program's pages must be loaded into memory before it can run. Since program data is stored in different pages, which are dispersed in memory, a page table is needed to record the mapping relationship between logical addresses and actual storage addresses to achieve mapping from page numbers to physical block numbers. Since the page table is also stored in memory, accessing memory data in a paging system requires two memory accesses (one to access the page table in memory to find the specified physical block number, plus the page offset to obtain the actual physical address; the second is to access memory based on the first obtained physical address to retrieve data).

To reduce the efficiency impact caused by two memory accesses, paging management introduces a translation lookaside buffer (TLB) mechanism (similar to caching mechanisms). In memory management with TLB, when accessing memory data, the page number is first queried in the TLB. If found, it indicates that the page table entry to be accessed is in the TLB, and the corresponding physical block number is read directly from the TLB. If not found, the page table in memory is accessed to obtain the physical address, while adding that mapping entry from the page table to the TLB, which may involve TLB replacement algorithms.
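
A sketch of the translation described above, with an assumed page size, an assumed page table, and a plain dict standing in for the TLB: the logical address is split into a page number and an offset, and the page number is translated through the TLB or, on a miss, through the page table.

```python
PAGE_SIZE = 4096                     # assumed page size (4 KB)

page_table = {0: 5, 1: 9, 2: 3}      # page number -> physical frame number (assumed)
tlb = {}                             # small cache of recent translations

def translate(logical_addr: int) -> int:
    page, offset = divmod(logical_addr, PAGE_SIZE)
    if page in tlb:                  # TLB hit: no extra memory access for the page table
        frame = tlb[page]
    else:                            # TLB miss: read the in-memory page table, then cache the entry
        frame = page_table[page]     # a missing entry here would be a page fault
        tlb[page] = frame
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1234)))        # page 1, offset 0x234 -> frame 9 -> 0x9234
```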

Advantages and disadvantages: No external fragmentation, high memory utilization. However, the contents of each page are unrelated, making programming and sharing less convenient.

Supplementary Concepts:

  • Page Frame = Memory Block = Physical Block = Physical Page (Memory Partition)
  • Page = Logical Address Space Partition of the Process
  • Pages and Page Frames correspond one-to-one (Page Table, page table entries consist of "page number" and "block number," with page numbers not occupying memory space).
  • Page Table Length refers to the total number of page table entries, i.e., the total number of pages.
  • Page Table Entry Length refers to how much storage space each page table entry occupies.
  • Page Size refers to how much storage space a page occupies.

Basic Segmented Storage Management#

Programs are divided into segments based on content or procedural relationships (functions), with each segment having its own name. A user job or process containing segments corresponds to a two-dimensional linear virtual space, which is a two-dimensional virtual memory. Segmented management allocates memory in units of segments and then translates segmented virtual addresses into actual physical addresses through an address mapping mechanism.

  • Process Address Space: Divided into several segments according to the program's logical relationships, each segment has a segment name and is addressed starting from 0.
  • Logical Address Structure: Segment number and segment offset (offset within the segment).
  • Memory Allocation Rules: Allocation is done in units of segments, with each segment occupying contiguous space in memory, and segments do not need to be adjacent.
  • Segment table entries consist of segment number (implicit, does not occupy storage space), segment length, and base address.
  • The difference from page table address translation is that it also checks whether the segment offset exceeds the segment length recorded in the segment table.

Advantages and disadvantages:

  • Different types of segments can have different protections.
  • Segments can be shared in units, including through dynamic linking for code sharing.
  • Does not produce internal fragmentation but can produce external fragmentation, resulting in lower memory utilization compared to paging.

Differences Between Paging and Segmentation#

  • Pages are the physical units of information, proposed as a discrete allocation mechanism from the perspective of system memory utilization; segments are logical units of information, each containing a complete set of meaningful information, proposed as a memory management mechanism from the user's perspective.
  • The size of a page is fixed and determined by the system; the size of a segment is uncertain and determined by the user.
  • The logical address structure of paging is one-dimensional (offset), while segmentation is two-dimensional (segment name + offset).

Segmented Paging Storage Management#

  • The process is first divided into segments by logical module, and each segment is then divided into pages; this produces no external fragmentation and only a small amount of internal fragmentation.
  • Logical address structure (two-dimensional): segment number, page number, and page offset; each page table entry maps a page number to the number of the block in which that page is stored.

Address transformation: Obtain segment number, page number, and page offset from the logical address --> Compare with the segment length in the segment table register to check for boundary overflow --> Find the corresponding segment table entry based on the segment table starting address and segment number --> Check the page number against the page table length recorded in the segment table to see if it exceeds the boundary --> Query the page table entry based on the page table address and page number --> Obtain the physical address from the block number and page offset --> Access the target unit.

In simple terms, the process involves three memory accesses: checking the segment table --> checking the page table --> accessing the target unit (if a TLB is set up, this can be done in one memory access).

Memory Space Expansion#

Overlay Technique#

The overlay technique uses a fixed area (holding program segments that are never loaded or unloaded during execution) and several overlay areas (holding program segments that are loaded and unloaded during execution as needed). It is not transparent to the user, so it increases the programming burden, and overlaying takes place within a single program or process.

Swapping Technique#

The swapping technique is used when memory is tight, swapping out certain processes to free up memory space, and then swapping in certain processes. The external disk is divided into file areas and swap areas, with swapped-out processes placed in the swap area, occurring between different processes (or jobs).

Virtual Storage Technology (Virtual Memory)#

  • The principle of virtual memory usage: Locality Principle (similar to the reason for using B-trees as index data structures in MySQL).
    1. Temporal Locality: Instructions and data currently accessed will likely be accessed again soon.
    2. Spatial Locality: Memory units surrounding the currently accessed memory space are likely to be accessed soon.
    3. Cache Technology: Frequently used data is placed in faster storage.

Based on the locality principle, when a program is loaded, only part of the program is loaded into memory, while the rest remains in external storage, allowing the program to start executing. During program execution, when the accessed information is not in memory, the operating system loads the required part into memory and continues executing the program. On the other hand, the operating system swaps out temporarily unused content from memory to external storage, freeing up space for information that will be loaded into memory. In this way, the system appears to provide users with a storage device that is much larger than the actual memory, referred to as virtual storage.

  • Definition of virtual memory: Programs can run without being fully loaded; dynamic loading occurs during execution, and if memory is insufficient, replacement occurs.

  • Characteristics of virtual memory:

    1. Multiplicity: There is no need to load the entire job into memory at once during execution; it can be divided into multiple loads.
    2. Swapping: There is no need for the job to remain resident in memory throughout its execution; it can be swapped in and out during execution.
    3. Virtuality: Logically expands memory capacity; users see a memory capacity much larger than the actual capacity.
  • Implementation of virtual memory technology: Demand Paging Management, Demand Segmented Management, Demand Segmented Paging Management.

  • Costs of using virtual memory:

    1. Management of virtual memory requires establishing many data structures, occupying additional memory.
    2. The conversion from virtual addresses to physical addresses increases instruction execution time.
    3. Paging in and out requires disk I/O, consuming time.
    4. If a page contains only part of the data, memory is wasted.

Page Replacement Algorithms#

Optimal Page Replacement Algorithm (OPT)#

An algorithm that only has theoretical significance, used to evaluate other page replacement algorithms. The replacement strategy is to replace the page that will not be accessed for the longest time in the future.

First-In, First-Out Page Replacement Algorithm (FIFO)#

The operating system maintains a linked list of all pages currently in memory, with the most recently entered page at the tail and the earliest entered page at the head. When a page fault occurs, the head page is eliminated, and the newly loaded page is appended to the tail. This is a simple and crude replacement algorithm that does not consider page access frequency information.

Least Recently Used Algorithm (LRU)#

The algorithm assigns each page an access field to record the time t since the last access. Each time a replacement occurs, the page with the largest t value is replaced (implementation can use registers or stacks).
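
A minimal LRU sketch that uses an OrderedDict as the recency stack: each access moves the page to the most-recently-used end, and on a fault with all frames full the least-recently-used page is evicted; the reference string is an assumed example.

```python
from collections import OrderedDict

def lru_faults(reference_string, frames):
    memory = OrderedDict()            # pages ordered from least to most recently used
    faults = 0
    for page in reference_string:
        if page in memory:
            memory.move_to_end(page)  # hit: mark as most recently used
        else:
            faults += 1
            if len(memory) == frames:
                memory.popitem(last=False)   # evict the least recently used page
            memory[page] = True
    return faults

print(lru_faults([7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2], frames=3))  # 9 faults
```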

Clock Replacement Algorithm (also known as Not Recently Used Algorithm, NRU)#

Pages are set with an access bit; when a page is accessed, the access bit is set to 1. All pages are stored in a circular queue, with a pointer pointing to the oldest page. When replacing a page, if the current pointer's page access bit is 0, it is replaced; otherwise, it is set to 0, and the process continues until a page with an access bit of 0 is found.

Improved Clock Algorithm#

Based on the Clock algorithm, an additional modified bit is added, and replacement is judged based on both the access bit and the modified bit. Pages with both access and modified bits set to 0 are prioritized for replacement, followed by pages with an access bit of 0 and a modified bit of 1.

Least Frequently Used Algorithm (LFU)#

Registers are set to record the number of times a page has been accessed, and the page with the least access count is replaced during each replacement. The problem is that the access register does not accurately reflect the current page access count because access speed is fast, so accessing once and accessing 100 times within the update interval of the register are treated the same. Additionally, LFU and LRU are very similar, with similar hardware support, but the key distinction is that one is based on time and the other on count (for example, for registers pa 001111 and pb 111000, if using LRU, pa will be eliminated; if using LFU, pb will be eliminated).

Page Buffering Algorithm (PBA)#

During replacement, pages, regardless of whether they have been modified, are not swapped out to disk but are first temporarily held in a linked list of pages in memory (modified and unmodified page lists can be combined or not). When a page is accessed again, it can be retrieved directly from these lists without disk I/O. When the number of modified pages in the list reaches a certain threshold, sequential write operations to disk are performed (effectively merging multiple I/Os into one).

Page Allocation Strategies#

Resident Set: Refers to the collection of physical blocks allocated to a process in demand paging management (i.e., the number of physical blocks allocated to the process by the system).

Fixed Allocation with Local Replacement: The resident set is allocated before the process runs, and when a page fault occurs, only a page from the process can be swapped out.

Variable Allocation with Global Replacement: Whenever a page fault occurs, a new physical block is allocated, which may come from free physical blocks or require swapping out pages from other processes.

Variable Allocation with Local Replacement: Dynamically increases or decreases the number of physical blocks allocated to a process based on the frequency of page faults.

When to Load Pages: Pre-load strategy (before running); demand load strategy (during running).

Where to Load Pages: UNIX method: Pages that are used for the first time are loaded from the file area, and swapped-out pages are written back to the swap area, being loaded from the swap area again when used.

Thrashing: Caused by insufficient physical blocks allocated to a process, where just swapped-out pages are immediately swapped back into main memory, and just swapped-in pages are immediately swapped out. If this is due to a page replacement strategy error, the replacement algorithm can be modified to resolve the issue. If it is due to too many running programs, causing the program to be unable to load all frequently accessed pages into memory simultaneously, the number of concurrent programs should be reduced; otherwise, terminate the process or increase the size of the resident set.

Working Set: Refers to the set of pages that a process actually accesses within a certain time interval (the size of the resident set is generally not less than the size of the working set to prevent thrashing).

Supplement:

  • Page Fault (or Page Error): When a program references a part of the address space that is not in physical memory, the operating system is responsible for loading the missing part into physical memory and re-executing the failed instruction.
    An example problem calculating page faults.
  • Belady's Anomaly: When using the FIFO algorithm, if a process is not allocated all the pages it requests, there may be an anomaly where increasing the number of allocated pages results in a higher page fault rate.
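
In the spirit of the example problem mentioned above, a sketch that counts FIFO page faults and reproduces Belady's anomaly with the classic reference string: 3 frames give 9 faults while 4 frames give 10.

```python
from collections import deque

def fifo_faults(reference_string, frames):
    memory = deque()                  # oldest page at the left end
    faults = 0
    for page in reference_string:
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                memory.popleft()      # evict the page that entered memory first
            memory.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, frames=3))    # 9 faults
print(fifo_faults(refs, frames=4))    # 10 faults: more frames, more faults (Belady's anomaly)
```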

VIII. File System#

Disk Scheduling Algorithms#

  1. First-Come, First-Served (FCFS)
  2. Shortest Seek Time First (SSTF): Allows the request closest to the current track to start the disk drive, i.e., the job with the shortest seek time is executed first, regardless of the order in which requests arrive, thus overcoming the issue of excessive movement of the disk arm in the FCFS scheduling algorithm.
  3. Scan Algorithm (SCAN), or Elevator Scheduling Algorithm: Always starts from the current position of the disk arm and selects the request closest to the arm in its direction of movement; if there are no requests in that direction, the arm reverses direction. Because the arm's movement resembles an elevator, this is also called the elevator scheduling algorithm (a sketch of the service order follows this list).
  4. Circular Scan Algorithm (CSCAN): The circular scan scheduling algorithm is an improvement on the scan algorithm. The disk arm moves in a single direction, from the outside to the inside. Starting from the current position, it selects the request closest to the current disk arm in the direction of movement. If there are no requests in that direction, it returns to the outermost position and accesses the job request with the smallest cylinder number.
  5. NStepSCAN Algorithm: The N-step SCAN algorithm divides the disk request queue into several subsequences of length N, processing these sub-queues in order using the FCFS algorithm, while each sub-queue is processed according to the SCAN algorithm. When N is large, the algorithm approaches the SCAN algorithm; when N=1, it degenerates into the FCFS algorithm. The N-step SCAN algorithm can effectively avoid the "disk arm sticking" phenomenon.
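
A sketch of the service order produced by the SCAN (elevator) algorithm from item 3 above; the pending track requests, head position, and initial direction are assumed values.

```python
def scan_order(requests, head, direction="up"):
    """Return the order in which SCAN services the requested tracks."""
    lower = sorted(r for r in requests if r < head)    # tracks below the head
    upper = sorted(r for r in requests if r >= head)   # tracks at or above the head
    if direction == "up":
        return upper + lower[::-1]    # sweep upward first, then back down
    return lower[::-1] + upper        # sweep downward first, then back up

requests = [98, 183, 37, 122, 14, 124, 65, 67]   # assumed pending track requests
print(scan_order(requests, head=53, direction="up"))
# [65, 67, 98, 122, 124, 183, 37, 14]
```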

IX. I/O Model#

Reference: I/O in Operating Systems and High-Performance I/O Models

I/O Between Devices#

  • Polling
    The CPU actively polls various devices to check their status; if there is data, it performs I/O.
  • Interrupt
    When a device has data, it sends an interrupt, and the CPU decides whether to respond to the interrupt, then interrupts to handle the device's I/O. The CPU does not need to frequently poll the device status; it simply passively receives interrupts.
  • Direct Memory Access (DMA)
    If one byte of data interrupts once, transferring 1KB of data would require 1024 interrupts, which wastes CPU time. Thus, DMA was introduced, where the CPU only needs to provide the starting and ending addresses to DMA, and DMA handles the data I/O in between. This liberates the CPU from byte-level intervention to block-level intervention.
  • Channel Control
    The DMA method can only control one device's data at a time; for multiple blocks of data, the CPU still needs to intervene multiple times. Thus, channels were introduced to control I/O, which are more powerful than DMA, capable of controlling multiple blocks of data and I/O for multiple devices, further liberating the CPU from involvement in the I/O process.

I/O Between User Processes#

  • Blocking
    User threads call certain system functions to retrieve data from the kernel, and the thread remains in a blocking state until the data reaches the kernel cache.

  • Non-blocking
    The user thread's call returns immediately regardless of whether the kernel buffer has data; it may or may not get data, but the thread is never put into a blocked state.

  • I/O Multiplexing
    Multiplexing means one thread manages multiple I/Os. The thread still blocks during the call, and it returns when one or more of the I/Os have data; the thread then traverses the I/Os to find which ones are ready. For example, with sockets the thread calls select() and enters a blocked state; when any watched I/O has data, the thread leaves the blocked state and gets the chance to continue executing (a minimal sketch using Python's selectors module follows this list).

  • Signal-Driven I/O
    Registering a signal and a callback function for an I/O means that once the signal is triggered, the callback function reads the data. For example, registering a "readable" signal for a socket means that when data arrives and becomes readable, the signal is triggered, executing the callback function to copy data from the kernel cache to user space.

  • Asynchronous I/O
    In asynchronous I/O, once the operating system completes the copy of data from the kernel to user space, it notifies the user thread via a signal that it can proceed with the next operation. This eliminates the need for the user thread to block while copying data.
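
A minimal multiplexing sketch for the I/O multiplexing bullet above, using Python's selectors module (which is typically backed by epoll on Linux and kqueue on BSD/macOS): a single blocking select() call watches two sockets and reports which one became readable.

```python
import selectors
import socket

sel = selectors.DefaultSelector()          # epoll on Linux, kqueue on BSD/macOS, else select

a_recv, a_send = socket.socketpair()       # two independent "connections" for the demo
b_recv, b_send = socket.socketpair()
sel.register(a_recv, selectors.EVENT_READ, data="conn A")
sel.register(b_recv, selectors.EVENT_READ, data="conn B")

b_send.sendall(b"ping")                    # only connection B becomes readable

events = sel.select(timeout=1)             # one blocking call watches both sockets
for key, _mask in events:
    print(key.data, "is readable:", key.fileobj.recv(1024))

for s in (a_recv, a_send, b_recv, b_send):
    s.close()
sel.close()
```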

Principles of select, epoll, and kqueue#

  • select
    Select manages multiple sockets, returning when it receives an I/O interrupt from the network card, without knowing which socket fd corresponds to this interrupt. The user thread needs to traverse and determine.
  • epoll
    When epoll receives an I/O interrupt, it looks up which socket fd corresponds to this interrupt. Epoll establishes a red-black tree (a type of balanced binary tree), making lookups very efficient. The user registers interested socket events by inserting the socket fd into the red-black tree, using the interrupt number as the key, which can be understood as a (interrupt number, socket fd) tuple. Removing events involves deleting a node from the tree.

When an I/O interrupt occurs, epoll copies the data from the network card to the kernel cache, looks up the corresponding fd in the red-black tree based on the interrupt number, and adds the fd to the ready list, preparing to return it to the user thread. The user directly obtains the ready fd.

  • kqueue
    When a socket I/O interrupt occurs, it looks up the corresponding socket fd in a hash table and then places it in a linked list to return. When the user registers an interested event, it adds a fd to the hash table.