JVM Series - Garbage Collection in JVM#
Content from:
1. Garbage Collection Determination#
Types of References#
-
Strong Reference
: As long as a strong reference exists, the garbage collector will never reclaim the referenced object. -
Soft Reference
: When the JVM deems memory to be insufficient, it will reclaim the objects pointed to by soft references before throwing an OutOfMemoryError. -
Weak Reference
: As long as the JVM performs garbage collection, regardless of whether memory is sufficient, it will reclaim the objects associated with weak references. -
Phantom Reference
: Also known as ghost reference or phantom reference, it is the weakest type of reference. The existence of a phantom reference does not affect the object's lifetime. It merely provides a mechanism to perform certain actions after the object has been finalized, typically used for post-mortem cleanup mechanisms.
Determining Object Reachability#
If an object is not referenced by any other object or variable, it is considered an invalid object and needs to be reclaimed.
Reference Counting Method#
A counter is maintained in the object header; each time the object is referenced, the counter is incremented by 1; when the reference is invalidated, the counter is decremented by 1. When the counter reaches 0, the object is considered invalid.
The implementation of the reference counting algorithm is simple, and its determination efficiency is high, making it a good algorithm in most cases. However, mainstream Java virtual machines do not use the reference counting algorithm for memory management, primarily because it struggles to resolve the issue of circular references
between objects:
For example, if object objA and objB both have a field instance, setting objA.instance = objB and objB.instance = objA leads to a situation where their reference counts are not 0 due to mutual referencing, preventing the reference counting algorithm from notifying the GC collector to reclaim them.
Reachability Analysis Method#
All objects directly or indirectly associated with GC Roots are valid objects, while objects not associated with GC Roots are invalid objects.
GC Roots refers to:
- Objects referenced in the Java Virtual Machine stack (local variable table in stack frames)
- Objects referenced in the native method stack
- Objects referenced by constants in the method area
- Objects referenced by class static properties in the method area
GC Roots does not include objects referenced by objects in the heap, thus avoiding the issue of circular references.
Reclaiming Invalid Objects in the Heap#
For objects deemed unreachable in reachability analysis, it does not mean they cannot be alive.
Determining if finalize() Needs to Be Executed#
The JVM will determine whether it is necessary to execute the finalize() method for this object. If the object does not override the finalize() method or if the finalize() method has already been called by the virtual machine, it is considered "not necessary to execute." In this case, the object is essentially reclaimed.
If the object is deemed necessary to execute the finalize() method, it will be placed in an F-Queue queue, and the virtual machine will execute these finalize() methods with lower priority, but it will not ensure that all finalize() methods will finish executing. If the finalize() method contains time-consuming operations, the virtual machine will stop pointing to that method and clear the object.
Object Rebirth or Death#
If during the execution of the finalize() method, this
is assigned to some reference, then the object is reborn. If not, it will be cleared by the garbage collector.
Note:
The finalize() method of any object will only be automatically called by the system once. If the object faces another collection, its finalize() method will not be executed again, and attempts to self-rescue in finalize() will fail.
Reclaiming Memory in the Method Area#
The method area stores class information, constants, and static variables with longer lifetimes. Only a small amount of garbage is cleared during each garbage collection. The method area primarily clears two types of garbage:
- Abandoned constants
- Useless classes
Determining Abandoned Constants#
As long as constants in the constant pool are not referenced by any variable or object, these constants will be cleared. For example, if a string "bingo" enters the constant pool, but currently no String object references the "bingo" constant in the constant pool, and there are no other references to this literal, if necessary, the "bingo" constant will be cleaned out of the constant pool.
Determining Useless Classes#
Determining whether a class is a "useless class" has stringent conditions.
- All objects of this class have been cleared.
- The ClassLoader that loaded this class has been reclaimed.
- The java.lang.Class object of this class is not referenced anywhere, and its methods cannot be accessed through reflection.
When a class is loaded into the method area by the virtual machine, an object representing that class is created in the heap. This object is created when the class is loaded into the method area and is cleared when the class is deleted from the method area.
Memory Leak#
In Java, if a long-lived object holds a reference to a short-lived object
, it is very likely to cause a memory leak. For example, in the singleton pattern, we can often consider the lifecycle of this singleton object to be roughly the same as that of the entire program, making it a long-lived object. If this object holds references to other objects, a memory leak may occur.
For more detailed content, please refer to: Detailed Explanation of JAVA Memory Leak (Causes, Examples, and Solutions)
2. Garbage Collection Algorithms#
After learning how to determine invalid objects, useless classes, and abandoned constants, the remaining task is to reclaim this garbage. Common garbage collection algorithms include the following:
Mark-and-Sweep Algorithm#
Determine which data needs to be cleared, mark them, and then clear the marked data.
This method has two disadvantages:
- Efficiency issue: The efficiency of both the marking and clearing processes is not high.
- Space issue: Mark-and-sweep can produce a large amount of non-contiguous memory fragments, and too many fragments may lead to the inability to find enough contiguous memory for allocating larger objects, necessitating an early trigger of another garbage collection action.
Copying Algorithm (Young Generation)#
To solve the efficiency problem, the "copying" collection algorithm was introduced. It divides the available memory into two equal-sized blocks, using only one block at a time. When this block runs out of memory and garbage collection is needed, the surviving objects are copied to the other block, and the first block is completely cleared. This algorithm has its pros and cons:
- Advantages: There is no issue of memory fragmentation.
- Disadvantages: The memory is halved, wasting space.
To solve the space utilization problem, memory can be divided into three blocks: Eden, From Survivor, and To Survivor, with a ratio of 8:1:1, using Eden and one Survivor block at a time. During garbage collection, the surviving objects in Eden and Survivor are copied to the other Survivor space in one go, and then Eden and the recently used Survivor space are cleared. This way, only 10% of the memory is wasted.
However, we cannot guarantee that no more than 10% of the objects will survive during each collection. When the Survivor space is insufficient, it needs to rely on other memory (referring to the old generation) for allocation guarantees.
Allocation Guarantee#
By clearing abandoned data in the old generation to expand the free space in the old generation, allocation guarantees can be provided for the young generation.
Before JDK 6 Update 24:
Before a Minor GC occurs, the virtual machine will first check whether the maximum available contiguous space in the old generation is greater than the total space of all objects in the young generation. If this condition is met, Minor GC can be ensured to be safe; if not, the virtual machine will check whether the HandlePromotionFailure value is set to allow guarantee failures. If so, it will continue to check whether the maximum available contiguous space in the old generation is greater than the average size of objects promoted to the old generation in previous collections. If it is, an attempt will be made to perform a Minor GC, even though this Minor GC is risky; if it is less than that or if HandlePromotionFailure is set to disallow risky operations, a Full GC will be performed.
After JDK 6 Update 24:
As long as the contiguous space in the old generation is greater than the total size of objects in the young generation or the average size of previous promotions, a Minor GC will be performed; otherwise, a Full GC will be performed.
This process is called allocation guarantee.
Mark-and-Compact Algorithm (Old Generation)#
Before reclaiming garbage, first mark the abandoned objects, then move the unmarked objects to one side, and finally clear the other side.
This is a garbage collection algorithm for the old generation. Objects in the old generation generally have longer lifetimes, so each garbage collection will have a large number of surviving objects. If the copying algorithm is used, a large number of surviving objects need to be copied each time, which is inefficient.
Generational Collection Algorithm#
Memory is divided into several blocks based on the different lifetimes of objects. Generally, the Java heap is divided into young generation and old generation, and the most appropriate collection algorithm is used based on the characteristics of each generation.
- Young Generation: Copying Algorithm
- Old Generation: Mark-and-Sweep Algorithm, Mark-and-Compact Algorithm
3. Garbage Collectors#
Young Generation Garbage Collectors#
Serial Garbage Collector (Single-threaded)#
Only one GC thread is opened for garbage collection, and all user threads are stopped during garbage collection (Stop The World).
Generally, client applications require less memory, do not create too many objects, and the heap memory is not large, so the garbage collection time is short. Even if all user threads are stopped during this time, there will not be a noticeable lag. Therefore, the Serial garbage collector is suitable for client use.
Since the Serial collector only uses one GC thread, it avoids the overhead of thread switching, making it simple and efficient.
ParNew Garbage Collector (Multi-threaded)#
ParNew is the multi-threaded version of Serial. Multiple GC threads perform garbage cleanup in parallel. However, the cleanup process still requires Stop The World.
ParNew pursues "low pause time," and the only difference from Serial is that it uses multiple threads for garbage collection, which can improve performance compared to Serial in a multi-CPU environment; however, thread switching incurs additional overhead, so it performs worse than Serial in a single CPU environment.
Parallel Scavenge Garbage Collector (Multi-threaded)#
Parallel Scavenge, like ParNew, is also a multi-threaded young generation garbage collector. However, there are significant differences between the two:
- Parallel Scavenge: Pursues CPU throughput and can complete specified tasks in a short time, making it suitable for non-interactive background calculations.
- ParNew: Aims to reduce user pause time, making it suitable for interactive applications.
Pursuing high throughput can be achieved by reducing the time spent on actual work during GC execution; however, running GC infrequently means that each time GC runs, there will be a lot of work to do because the number of objects accumulated in the heap during this period is high. A single GC will take more time to complete, leading to higher pause times. Conversely, considering low pause times, it is better to run GC frequently to complete it more quickly, which in turn leads to a decrease in throughput.
- Set the percentage of garbage collection time to total CPU time through the parameter
-XX:GCTimeRadio
. - Set the maximum pause time during garbage processing through the parameter
-XX:MaxGCPauseMillis
. - Enable adaptive policy through the command
-XX:+UseAdaptiveSizePolicy
. As long as we set the heap size and MaxGCPauseMillis or GCTimeRadio, the collector will automatically adjust the size of the young generation, the ratio of Eden and Survivor, and the age at which objects enter the old generation to get as close as possible to our set MaxGCPauseMillis or GCTimeRadio.
Old Generation Garbage Collectors#
Serial Old Garbage Collector (Single-threaded)#
The Serial Old collector is the old generation version of Serial, both being single-threaded collectors that only enable one GC thread, suitable for client applications. Their only difference is that Serial Old works in the old generation using the "mark-compact" algorithm, while Serial works in the young generation using the "copying" algorithm.
Parallel Old Garbage Collector (Multi-threaded)#
The Parallel Old collector is the old generation version of Parallel Scavenge, pursuing CPU throughput.
CMS Garbage Collector#
The CMS (Concurrent Mark Sweep) collector is designed to achieve the shortest collection pause time (pursuing low pause). It allows user threads and GC threads to execute concurrently during garbage collection, so users do not experience noticeable lag during garbage collection.
The CMS collector implements a "mark-and-sweep" algorithm and consists of four steps:
Initial Mark
: Stop The World, using only one initial marking thread to mark all objects directly associated with GC Roots.Concurrent Marking
: Using multiple marking threads to execute concurrently with user threads. This process performs reachability analysis and marks all abandoned objects. It is slow.Re-mark
: Stop The World, using multiple marking threads to execute concurrently, marking newly abandoned objects identified during the concurrent marking process.Concurrent Cleanup
: Using only one GC thread to execute concurrently with user threads, clearing the previously marked objects. This process is very time-consuming.
The concurrent marking and concurrent cleanup processes take the longest time and can work alongside user threads, so overall, the memory reclamation process of the CMS collector is executed concurrently with user threads.
Disadvantages
- Sensitive to CPU resources, low throughput;
- Cannot handle floating garbage, leading to frequent Full GC;
- The recycling algorithm it uses - the "mark-and-sweep" algorithm can lead to a large amount of space fragmentation at the end of the collection.
G1 Garbage Collector#
G1 (Garbage-First) is a garbage collector aimed at server applications. It does not have the concept of young and old generations but divides the heap into independent Regions. The G1 collector maintains a priority list in the background and, when performing garbage collection, first estimates the amount of garbage in each Region, then prioritizes the Regions with the highest reclamation value based on the allowed collection time. This method of dividing memory space into Regions and prioritizing reclamation ensures that the G1 collector can achieve as high a collection efficiency as possible within a limited time (breaking memory into pieces).
Overall, the G1 collector is based on the "mark-and-compact" algorithm, while locally (between two Regions), it is based on the "copying" algorithm, meaning that no memory fragmentation occurs during operation.
Each Region has a Remembered Set, which records the regions of all objects referenced within that region. During reachability analysis, simply adding the Remembered Set to GC Roots prevents the need to traverse the entire heap memory. Therefore, even if an object and the objects it references may not be in the same Region, when garbage collection occurs, it does not need to scan the entire heap memory to perform a complete reachability analysis.
If we do not consider the maintenance of the Remembered Set, the work process of the G1 collector can be divided into the following steps:
Initial Mark
: Stop The World, using only one initial marking thread to mark all objects directly associated with GC Roots.Concurrent Marking
: Using one marking thread to execute concurrently with user threads. This process performs reachability analysis and is slow.Final Mark
: Stop The World, using multiple marking threads to execute concurrently.Filtering Reclamation
: Reclaiming abandoned objects, which also requires Stop The World and uses multiple filtering reclamation threads to execute concurrently.
Additional Expansion#
Situations That Trigger JVM to Perform Full GC#
Calling the System.gc() Method#
The invocation of this method is a suggestion for the JVM to perform Full GC; note that this is just a suggestion and not guaranteed
, but in many cases, it will trigger Full GC, thereby increasing the frequency of Full GC. Generally, we only need to let the virtual machine manage memory by itself; we can disable the call to System.gc() through -XX:+DisableExplicitGC
.
Insufficient Old Generation Space#
Insufficient space in the old generation will trigger Full GC operations. If space remains insufficient after this operation, the following error will be thrown: java.lang.OutOfMemoryError: Java heap space
.
Insufficient Permanent Generation Space#
In the JVM specification, the method area in the runtime data area is also known as the permanent generation (Permanent Generation) in the HotSpot virtual machine, storing class information, constants, static variables, etc. When the system needs to load many classes, reflect classes, and call methods, the permanent generation may become full, triggering Full GC. If Full GC still cannot reclaim space, the JVM will throw the following error: java.lang.OutOfMemoryError: PermGen space
.
Others#
-
Promotion failed and concurrent mode failure during CMS GC.
Promotion failed refers to the allocation guarantee failure mentioned above, while concurrent mode failure occurs when objects need to be placed in the old generation during CMS GC, but there is insufficient space in the old generation. -
The average size of Minor GC promotions to the old generation exceeds the remaining space in the old generation.