Yige

JVM Series - Class Loader Mechanism#

Content organized from:

  1. JVM Series (Five) - Detailed Explanation of JVM Class Loading Mechanism
  2. In-depth Understanding of JVM Class Loading Mechanism

I. Brief Introduction#

The virtual machine loads the data that describes a class from the class file into memory, then verifies, converts, resolves, and initializes it, ultimately forming a Java type that the virtual machine can use directly. This is the class loading mechanism of the virtual machine.

II. Class Loading Process and Lifecycle#

The class loading process is divided into three steps (five stages): Loading -> Linking (Verification, Preparation, Resolution) -> Initialization

Loading#

Description of the loading process:

  • Locate the .class file using the fully qualified name of the class and obtain its binary byte stream.
  • Convert the static storage structure represented by the byte stream into the runtime data structure of the method area.
  • Generate a java.lang.Class object for this class in the Java heap, serving as the access entry for these data in the method area.

Linking#

Linking consists of three steps: verification, preparation, and resolution.

Verification#

Verification is the first step of the linking phase, used to ensure that the information in the Class byte stream meets the requirements of the virtual machine.

Specific forms of verification:

  • File format verification: Verifies whether the byte stream conforms to the Class file format specification; for example: whether it starts with 0xCAFEBABE, whether the major and minor version numbers are within the processing range of the current virtual machine, and whether the constants in the constant pool have unsupported types.
  • Metadata verification: Performs semantic analysis on the information described by the bytecode (note: compare with the semantic analysis during the javac compilation phase) to ensure that the described information meets the requirements of the Java language specification; for example: whether this class has a superclass (every class except java.lang.Object must have one).
  • Bytecode verification: Determines whether the program semantics are legal and logical through data flow and control flow analysis.
  • Symbolic reference verification: Ensures that the resolution action can be executed correctly.
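The file format check described above can be imitated in a few lines. The following is a minimal sketch, not the virtual machine's actual verifier: it only reads the first eight bytes of a class file (magic number, minor version, major version). The class name ClassFileHeader and its methods are made up for this illustration, and it assumes that the class file of java.lang.Object can be read as a system resource.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for this article: holds the first 8 bytes of a class file.
class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;

    ClassFileHeader(int magic, int minorVersion, int majorVersion) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Layout at the start of every class file: u4 magic, u2 minor_version, u2 major_version.
    static ClassFileHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        return new ClassFileHeader(magic, minor, major);
    }

    // Reads the header of a class that the system class loader can find as a resource.
    static ClassFileHeader ofSystemClass(String binaryName) {
        String resource = binaryName.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean hasValidMagic() {
        return magic == 0xCAFEBABE;
    }

    public static void main(String[] args) {
        ClassFileHeader header = ofSystemClass("java.lang.Object");
        System.out.printf("magic=0x%08X major=%d minor=%d%n",
                header.magic, header.majorVersion, header.minorVersion);
    }
}
```

The major version printed depends on the JDK that runs the example, which is why no expected output is shown.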

Preparation#

Preparation allocates memory for the class's static variables and initializes them to default values. The preparation process typically allocates a structure to store class information, which includes member variables, methods, and interface information defined in the class.

Specific behaviors:

  • At this time, memory allocation only includes class variables (static), and does not include instance variables, which will be allocated in the Java heap along with the object during instantiation.
  • The initial values set here are usually the default zero values of the data types (such as 0, 0L, null, false, etc.), rather than the values explicitly assigned in Java code. The exception is a static final field initialized with a compile-time constant: its value is recorded in the class file and assigned during preparation.
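The zero value set during preparation can be observed from ordinary Java code. In the sketch below (class and field names are invented for illustration), the first static initializer calls a method that reads counter before the assignment counter = 123 has run, so it sees the value left by the preparation stage.

```java
class PreparationDemo {
    // Runs first during initialization. At this point `counter` still holds
    // the zero value from the preparation stage, so 0 is recorded here.
    static int seenBeforeAssignment = readCounter();

    // The explicit assignment happens later, during initialization.
    static int counter = 123;

    // Compile-time constant: javac records the value in the class file
    // and inlines reads of it, so it is never observed as 0.
    static final int LIMIT = 456;

    static int readCounter() {
        return counter;
    }

    public static void main(String[] args) {
        System.out.println("seenBeforeAssignment = " + seenBeforeAssignment); // prints 0
        System.out.println("counter = " + counter);                           // prints 123
        System.out.println("LIMIT = " + LIMIT);                               // prints 456
    }
}
```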

Resolution#

Resolution: Converts the symbolic references in the constant pool into direct references.

Symbolic References: Symbolic references describe the referenced target with a set of symbols, which can be any form of literal as long as it can uniquely locate the target. Symbolic references are independent of memory layout, so the referenced object does not necessarily need to be loaded into memory. Various virtual machine implementations may have different memory layouts, but the accepted symbolic references must be consistent, as their literal forms are clearly defined in the Class file format.

Direct References: A direct reference can be a pointer that points directly to the target, a relative offset, or a handle that can indirectly locate the target. Direct references are related to the memory layout of the virtual machine, and the same symbolic reference generally does not translate to the same direct reference across different virtual machines. If a direct reference exists, then its target must already be in memory.

Types of constants in the constant pool:

  • The number of constants in the constant pool is not fixed, so a u2 type unsigned number is placed at the beginning of the constant pool to store the current capacity of the constant pool.
  • Each constant in the constant pool is a table, with the first position of the table being a u1 type flag (tag), representing the type of the current constant.

| Type | Tag | Description |
| --- | --- | --- |
| CONSTANT_Utf8_info | 1 | UTF-8 encoded string |
| CONSTANT_Integer_info | 3 | Integer literal |
| CONSTANT_Float_info | 4 | Float literal |
| CONSTANT_Long_info | 5 | Long literal |
| CONSTANT_Double_info | 6 | Double literal |
| CONSTANT_Class_info | 7 | Symbolic reference to a class or interface |
| CONSTANT_String_info | 8 | String literal |
| CONSTANT_Fieldref_info | 9 | Symbolic reference to a field |
| CONSTANT_Methodref_info | 10 | Symbolic reference to a method in a class |
| CONSTANT_InterfaceMethodref_info | 11 | Symbolic reference to a method in an interface |
| CONSTANT_NameAndType_info | 12 | Partial symbolic reference (name and descriptor) to a field or method |
| CONSTANT_MethodHandle_info | 15 | Represents a method handle |
| CONSTANT_MethodType_info | 16 | Identifies a method type |
| CONSTANT_InvokeDynamic_info | 18 | Represents a dynamic method call site |
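
To make the layout concrete, here is a rough sketch that walks the constant pool of a class file and counts how many entries carry each tag. It relies only on the u2 count and the u1 tag described above, plus the fixed payload size of each entry type. The class name ConstantPoolTags is hypothetical, and tags 17, 19, and 20 are handled even though they are not listed in the table, because newer class file versions define them.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for this article: tallies constant pool entries by tag.
class ConstantPoolTags {

    static Map<Integer, Integer> countTags(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        data.readInt();            // magic
        data.readUnsignedShort();  // minor_version
        data.readUnsignedShort();  // major_version
        int count = data.readUnsignedShort(); // constant_pool_count = number of entries + 1

        Map<Integer, Integer> tally = new TreeMap<>();
        for (int index = 1; index < count; index++) {
            int tag = data.readUnsignedByte();
            tally.merge(tag, 1, Integer::sum);
            switch (tag) {
                case 1: // Utf8: u2 length followed by that many bytes
                    skip(data, data.readUnsignedShort());
                    break;
                case 3: case 4: // Integer, Float
                    skip(data, 4);
                    break;
                case 5: case 6: // Long, Double occupy two constant pool slots
                    skip(data, 8);
                    index++;
                    break;
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                    skip(data, 2);
                    break;
                case 15: // MethodHandle
                    skip(data, 3);
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18: // refs, NameAndType, Dynamic, InvokeDynamic
                    skip(data, 4);
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return tally;
    }

    private static void skip(DataInputStream data, int bytes) throws IOException {
        data.readFully(new byte[bytes]);
    }

    // Assumes the class file is readable as a system resource.
    static Map<Integer, Integer> ofSystemClass(String binaryName) {
        String resource = binaryName.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            return countTags(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ofSystemClass("java.lang.Object")); // map of tag -> number of entries
    }
}
```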

The resolution action mainly targets seven types of symbolic references: class or interface, field, class method, interface method, method type, method handle, and call site qualifier.

Initialization#

Initialization: Assigns correct initial values to class static variables.

Goals of Initialization#

  • Achieve initialization of the initial values specified when declaring class static variables;
  • Achieve initialization of initial values set using static code blocks.

Steps of Initialization#

  • If this class has not been loaded or linked, first load and link this class;
  • If the direct superclass of this class has not been initialized, first initialize its direct superclass;
  • If there are initialization statements in the class, execute them in order.

Timing of Initialization#

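The first situation that triggers initialization is encountering one of four bytecode instructions: new, getstatic, putstatic, or invokestatic. In Java code, these instructions are most commonly generated when: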

  • Creating a new object
  • Setting or getting a static field of a class (excluding static final fields whose values were placed in the constant pool at compile time)
  • Calling a static method of a class
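
The three cases above can be checked with a small experiment. This is an illustrative sketch with invented class names: each class records a message from its static block, so the log shows exactly when initialization happens.

```java
import java.util.ArrayList;
import java.util.List;

class ActiveLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class ByNew {
    static { ActiveLog.EVENTS.add("ByNew"); }
}

class ByField {
    static int counter = 1; // not final, so accesses compile to getstatic/putstatic
    static { ActiveLog.EVENTS.add("ByField"); }
}

class ByMethod {
    static { ActiveLog.EVENTS.add("ByMethod"); }
    static void touch() { }
}

class ActiveReferenceDemo {
    static List<String> run() {
        new ByNew();        // new
        ByField.counter++;  // getstatic followed by putstatic
        ByMethod.touch();   // invokestatic
        return new ArrayList<>(ActiveLog.EVENTS);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [ByNew, ByField, ByMethod]
    }
}
```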

Order of Initialization between Parent and Child Classes in Java#

  1. Static member variables and static code blocks in the parent class
  2. Static member variables and static code blocks in the child class
  3. Ordinary member variables and code blocks in the parent class, constructor of the parent class
  4. Ordinary member variables and code blocks in the child class, constructor of the child class
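
The order listed above can be confirmed by logging every step. The following sketch uses made-up class names; within each class the field initializer is written before the block, so it runs first.

```java
import java.util.ArrayList;
import java.util.List;

class OrderLog {
    static final List<String> EVENTS = new ArrayList<>();

    // Returns a value so that it can be used as a field initializer.
    static int log(String event) {
        EVENTS.add(event);
        return 0;
    }
}

class OrderParent {
    static int parentStatic = OrderLog.log("parent static field");
    static { OrderLog.log("parent static block"); }

    int parentField = OrderLog.log("parent instance field");
    { OrderLog.log("parent instance block"); }

    OrderParent() { OrderLog.log("parent constructor"); }
}

class OrderChild extends OrderParent {
    static int childStatic = OrderLog.log("child static field");
    static { OrderLog.log("child static block"); }

    int childField = OrderLog.log("child instance field");
    { OrderLog.log("child instance block"); }

    OrderChild() { OrderLog.log("child constructor"); }
}

class InitOrderDemo {
    static List<String> run() {
        new OrderChild();
        return new ArrayList<>(OrderLog.EVENTS);
    }

    public static void main(String[] args) {
        // Prints the parent's static steps, the child's static steps,
        // the parent's instance steps and constructor, then the child's.
        run().forEach(System.out::println);
    }
}
```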

Active and Passive References of Classes#

The Java Virtual Machine specification strictly stipulates that only an active reference to a class triggers its initialization. All other forms of reference are called passive references and do not trigger initialization.

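Active Reference
An active reference is a use of a class that triggers its initialization if it has not been initialized yet, for example creating an instance with new, reading or writing a non-constant static field, calling a static method, making a reflective call on the class, initializing a subclass, or starting the JVM with the class as the main class.

Passive Reference
Any reference that is not an active reference is called a passive reference. At most, the class is loaded and linked; no initialization is performed.
Forms of passive references include: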

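  1. Referencing a static field of a parent class through a subclass initializes only the parent class that declares the field, not the subclass;
  2. Defining an array whose element type is the class (for example, new SomeClass[10]) does not trigger the initialization of this class;
  3. Accessing compile-time constants (static final fields with constant values) defined in a class does not trigger the initialization of this class, because the value is copied into the constant pool of the referencing class at compile time.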
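
All three forms can be tried out together. In this sketch (class names are invented), only the parent class that actually declares the static field ends up initialized.

```java
import java.util.ArrayList;
import java.util.List;

class PassiveLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class PassiveParent {
    static int value = 42;
    static { PassiveLog.EVENTS.add("PassiveParent initialized"); }
}

class PassiveChild extends PassiveParent {
    static { PassiveLog.EVENTS.add("PassiveChild initialized"); }
}

class PassiveConstants {
    static final String GREETING = "hello"; // compile-time constant
    static { PassiveLog.EVENTS.add("PassiveConstants initialized"); }
}

class PassiveReferenceDemo {
    static List<String> run() {
        // Form 2: creating an array of the type does not initialize the element class.
        PassiveChild[] array = new PassiveChild[2];

        // Form 1: the field is declared in PassiveParent, so only PassiveParent is initialized.
        int value = PassiveChild.value;

        // Form 3: the constant is inlined by javac, so PassiveConstants is not initialized.
        String greeting = PassiveConstants.GREETING;

        return new ArrayList<>(PassiveLog.EVENTS);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [PassiveParent initialized]
    }
}
```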

III. Three Types of Class Loaders#

  • The Bootstrap Classloader is initialized after the Java Virtual Machine starts.
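  • The Bootstrap Classloader is responsible for loading the ExtClassLoader and setting the parent loader of ExtClassLoader to the Bootstrap Classloader (in Java code this parent is represented as null, because the Bootstrap Classloader is not a Java object).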
  • After the Bootstrap Classloader loads the ExtClassLoader, it will load the AppClassLoader and set the parent loader of AppClassLoader to ExtClassLoader.

Bootstrap ClassLoader#

Bootstrap Classloader: Responsible for loading the libraries that the virtual machine recognizes (such as rt.jar) from JDK\jre\lib, where JDK stands for the JDK installation directory, or from the path specified by the -Xbootclasspath parameter. All classes whose names start with java. are loaded by the Bootstrap ClassLoader. The Bootstrap Classloader cannot be directly referenced by Java programs.

Extension ClassLoader#

Extension Classloader: This loader is implemented by sun.misc.Launcher$ExtClassLoader and is responsible for loading all libraries in the JDK\jre\lib\ext directory or in the directories specified by the java.ext.dirs system property (such as classes starting with javax.). Developers can directly use the extension class loader.

Application ClassLoader#

Application Classloader: This class loader is implemented by sun.misc.Launcher$AppClassLoader, responsible for loading classes specified by the user class path (classes under the program's own classpath). Developers can directly use this class loader, and if the application does not define its own class loader, this is generally the default class loader in the program.
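
The hierarchy can be inspected at runtime by following getParent() from the application class loader. This is a small sketch with an invented class name. The loader class names it prints depend on the JDK version: the implementation classes named above belong to JDK 8 and earlier, while newer JDKs use a platform class loader in place of the extension class loader.

```java
import java.util.ArrayList;
import java.util.List;

class ClassLoaderChain {
    // Walks from the given loader up to the bootstrap loader, which Java code sees as null.
    static List<String> describe(ClassLoader start) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader loader = start; loader != null; loader = loader.getParent()) {
            chain.add(loader.getClass().getName());
        }
        chain.add("bootstrap (represented as null)");
        return chain;
    }

    public static void main(String[] args) {
        describe(ClassLoader.getSystemClassLoader()).forEach(System.out::println);

        // Core classes are loaded by the bootstrap loader, so this prints null.
        System.out.println(String.class.getClassLoader());
    }
}
```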

Class Loader Isolation Issues#

Each class loader has its own namespace to store loaded classes. When a class loader loads a class, it first looks up the fully qualified name in its own namespace to check whether the class has already been loaded.

The unique identification of classes in JVM and Dalvik is ClassLoader id + PackageName + ClassName, so it is possible for two classes with the same package name and class name to exist in a running program. Moreover, if these two classes are not loaded by the same ClassLoader, an instance of one class cannot be cast to the other. This is the isolation of ClassLoader.
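
The effect can be reproduced without any external files. The sketch below assembles the bytes of a minimal, empty class by hand and defines it in two separate loaders. Everything in it is made up for illustration, including the class name demo.Widget and the helper names. The class file has no fields or methods, so it can be loaded and compared but not instantiated.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class IsolationDemo {

    // A loader that only delegates to the bootstrap loader and defines classes from raw bytes.
    static final class ByteArrayLoader extends ClassLoader {
        ByteArrayLoader() {
            super((ClassLoader) null);
        }

        Class<?> define(String binaryName, byte[] bytes) {
            return defineClass(binaryName, bytes, 0, bytes.length);
        }
    }

    // Builds the class file of an empty public class that extends java.lang.Object.
    static byte[] emptyClassBytes(String binaryName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE); // magic
            out.writeShort(0);        // minor_version
            out.writeShort(52);       // major_version (Java 8)
            out.writeShort(5);        // constant_pool_count = 4 entries + 1
            out.writeByte(1); out.writeUTF(binaryName.replace('.', '/')); // #1 Utf8: this class name
            out.writeByte(7); out.writeShort(1);                          // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");           // #3 Utf8: superclass name
            out.writeByte(7); out.writeShort(3);                          // #4 Class -> #3
            out.writeShort(0x0021);   // access_flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);        // this_class
            out.writeShort(4);        // super_class
            out.writeShort(0);        // interfaces_count
            out.writeShort(0);        // fields_count
            out.writeShort(0);        // methods_count
            out.writeShort(0);        // attributes_count
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Defines the class in a brand-new loader each time it is called.
    static Class<?> loadIsolated(String binaryName) {
        return new ByteArrayLoader().define(binaryName, emptyClassBytes(binaryName));
    }

    public static void main(String[] args) {
        Class<?> first = loadIsolated("demo.Widget");
        Class<?> second = loadIsolated("demo.Widget");
        System.out.println(first.getName().equals(second.getName())); // prints true
        System.out.println(first == second);                          // prints false
        System.out.println(first.isAssignableFrom(second));           // prints false
    }
}
```

Both Class objects carry the same fully qualified name, yet the JVM treats them as unrelated types because their defining loaders differ.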

To solve the isolation problem of class loaders, the JVM introduces the parent delegation model.

IV. Parent Delegation Model#

Core idea: First, check whether the class has been loaded from the bottom up; second, attempt to load the class from the top down.

The workflow of the parent delegation model is: If a class loader receives a class loading request, it will first delegate the request to its parent loader to complete, and so on. Therefore, all class loading requests should ultimately be passed to the top-level Bootstrap Classloader. Only when the parent loader cannot find the required class within its search range and cannot complete the loading, will the child loader attempt to load the class itself.

Specific Loading Process#

  • When the AppClassLoader loads a class, it will first delegate the class loading request to its parent loader, ExtClassLoader, to complete.
  • When the ExtClassLoader loads a class, it will also first delegate the class loading request to the Bootstrap Classloader to complete.
  • If the Bootstrap Classloader fails to load (for example, if the class is not found in %JAVA_HOME%/jre/lib), it will use the ExtClassLoader to attempt loading;
  • If the ExtClassLoader also fails, it will use the AppClassLoader to load, and if the AppClassLoader also fails, a ClassNotFoundException will be thrown.
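
The same flow can be written down as a loader that spells out each step. This is a simplified sketch of the delegation logic, not the JDK's own source: DelegatingLoader, trace, and tryLoad are names invented here, and the sketch assumes a non-null parent.

```java
import java.util.ArrayList;
import java.util.List;

class DelegatingLoader extends ClassLoader {
    // Records which step handled each request, for demonstration only.
    final List<String> trace = new ArrayList<>();

    DelegatingLoader(ClassLoader parent) {
        super(parent); // this sketch assumes parent is not null
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Step 1: check whether this loader has already loaded the class.
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                try {
                    // Step 2: delegate to the parent, which repeats the same steps.
                    trace.add("delegate " + name);
                    loaded = getParent().loadClass(name);
                } catch (ClassNotFoundException parentMiss) {
                    // Step 3: only when every ancestor fails does this loader try itself.
                    trace.add("findClass " + name);
                    loaded = findClass(name); // the inherited findClass throws ClassNotFoundException
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    // Convenience wrapper: returns null instead of throwing when nothing can load the class.
    Class<?> tryLoad(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        DelegatingLoader loader = new DelegatingLoader(ClassLoader.getSystemClassLoader());
        System.out.println(loader.tryLoad("java.lang.String")); // prints class java.lang.String
        System.out.println(loader.tryLoad("no.such.Type"));     // prints null
        loader.trace.forEach(System.out::println);
    }
}
```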

Significance of the Parent Delegation Model#

  • Prevents multiple copies of the same bytecode from appearing in memory, creating a hierarchy for classes.
  • Ensures the safe and stable operation of Java programs.

Take java.lang.Object as an example: a request to load it is delegated upward layer by layer until the Bootstrap ClassLoader loads it from rt.jar under <JAVA_HOME>\lib. Thus, if someone maliciously writes their own java.lang.Object containing harmful code, the parent delegation model ensures that only the version from rt.jar is loaded into the JVM, which protects the core foundational classes.

Expansion
Why does Java SPI break the parent delegation model?

V. Class Loading Methods#

  • Initialized and loaded by the JVM when starting the application from the command line.
  • Dynamically loaded using the Class.forName() method.
  • Dynamically loaded using the ClassLoader.loadClass() method.

Class.forName() and ClassLoader.loadClass()

  • Class.forName(): Loads the .class file into the JVM and, by default, also initializes the class, so its static code blocks are executed. The overload Class.forName(name, initialize, loader) skips initialization when initialize is false;
  • ClassLoader.loadClass(): Only loads the .class file into the JVM without executing the static code blocks; they are executed only when the class is first initialized, for example on newInstance.
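
The difference is easy to observe by counting how often a static block has run after each call. The classes below are invented for this sketch; the class literal is used only to obtain the name and does not initialize the class by itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ForNameLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class LazyTarget {
    static { ForNameLog.EVENTS.add("LazyTarget static block"); }
}

class ForNameDemo {
    // Returns the number of static block executions seen after each of the three calls.
    static List<Integer> run() {
        try {
            String name = LazyTarget.class.getName(); // a class literal does not trigger initialization
            ClassLoader loader = ForNameDemo.class.getClassLoader();

            loader.loadClass(name);                   // loads only
            int afterLoadClass = ForNameLog.EVENTS.size();

            Class.forName(name, false, loader);       // initialization explicitly disabled
            int afterForNameNoInit = ForNameLog.EVENTS.size();

            Class.forName(name);                      // loads, links, and initializes
            int afterForName = ForNameLog.EVENTS.size();

            return Arrays.asList(afterLoadClass, afterForNameNoInit, afterForName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [0, 0, 1]
    }
}
```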

VI. Custom Loaders#

Applications are loaded through the cooperation of these three types of class loaders, and if necessary, we can add custom class loaders. Since the built-in class loaders of the JVM only know how to load standard Java class files from the local file system, writing your own ClassLoader allows for the following:

  • Automatically verify digital signatures before executing untrusted code.
  • Dynamically create customized classes that meet specific user needs.
  • Retrieve Java classes from specific locations, such as databases and networks.
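
A custom loader usually only needs to override findClass, so that the parent delegation logic inherited from ClassLoader stays intact. The following is a minimal sketch that reads class files from a directory; DirectoryClassLoader and the root directory are assumptions for illustration, and the same shape applies if the bytes come from a database or a network call instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class DirectoryClassLoader extends ClassLoader {
    private final Path root;

    DirectoryClassLoader(Path root, ClassLoader parent) {
        super(parent);
        this.root = root;
    }

    // Called by the inherited loadClass only after the parent loaders have failed.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path file = root.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(file);
            // A signature or checksum check on `bytes` could be added here before defining the class.
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

A caller would create the loader with a directory of compiled classes and the system class loader as parent, then call loadClass with a fully qualified class name. Classes that the parent loaders can already find are still loaded by them, and only the remaining names reach findClass.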