My notes from the insightful talk “JVM Anatomy 101” by Nikita Lipsky. The talk provides a deep dive into the internals of the Java Virtual Machine (JVM), covering everything from bytecode and class loading to memory management and garbage collection. You can watch the original talk here.

Java class file and bytecode

Class file

  1. Version
  2. Constant Pool
  3. Class name, modifiers
  4. Superclass, superinterfaces
  5. Fields
  6. Methods
  7. Attributes
  • Fields, methods may have attributes (e.g., values of constant fields)
  • The main attribute of a method is its code: Java bytecode
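
The start of the structure listed above can be read with a plain DataInputStream. This is a minimal sketch, not a full parser: it reads only the fixed-size prefix of a class file. The class name ClassFileHeader and its methods are made up for illustration, and the sketch assumes the class file of java.lang.Object is readable as a resource. Note that a 4-byte magic number (0xCAFEBABE) precedes the version.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: holds the first few items of a class file.
class ClassFileHeader {
    final int magic;
    final int minor;
    final int major;
    final int constantPoolCount;

    ClassFileHeader(int magic, int minor, int major, int constantPoolCount) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads magic, version and constant pool count; everything after that is ignored.
    static ClassFileHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();               // u4 magic
        int minor = data.readUnsignedShort();     // u2 minor_version
        int major = data.readUnsignedShort();     // u2 major_version
        int cpCount = data.readUnsignedShort();   // u2 constant_pool_count
        return new ClassFileHeader(magic, minor, major, cpCount);
    }

    // Assumption: Object.class can be opened as a resource of java.lang.Object.
    static ClassFileHeader ofObjectClass() {
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                throw new IllegalStateException("Object.class is not available as a resource");
            }
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileHeader h = ofObjectClass();
        System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
                h.magic, h.major, h.minor, h.constantPoolCount);
    }
}
```

The printed major version depends on the JDK that runs the example, so no fixed output is shown here.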

Java bytecode

  1. Instruction array
  2. Operand stack
  3. Local variables array (method arguments, local variables)

The JVM specification defines each instruction strictly, so two JVMs that conform to the specification cannot execute the same bytecode with different results.

Java Runtime

A JVM alone is not enough to run a Java program. A Java Runtime is needed, which consists of:

  • JVM
  • Platform classes
    • core classes (Object, String, etc)
    • Java standard APIs (IO, NET, NIO, AWT/Swing, etc.)
  • Implementation of native methods of platform classes (OS-specific, distributed as native dynamic libraries — .dll, .so, .dylib)
  • Auxiliary files (time zones, locales descriptions, media resources, etc.)

Classloading engine

Classloading

JVM executes classes from the following sources

  • Java Runtime (platform classes)
  • Application classpath
  • Autogenerated on-the-fly (Proxy, Reflection accessors, invokedynamic implementations)
  • Provided by the application itself

Every class is loaded by a class loader:

  • Platform classes are loaded by the bootstrap class loader
  • Classes from the application classpath are loaded by the system class loader (AppClassLoader)
  • Application classes may create user-defined class loaders

The JVM can load two different classes with the same name, provided that they are loaded by different class loaders

  • Each class loader forms its own namespace of class names
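
The namespace rule can be observed with a small user-defined class loader. This is a sketch under assumptions: Greeter, IsolatingLoader and LoaderDemo are hypothetical names, and the example assumes the compiled Greeter class file can be found as a resource through the parent loader (true for a normal classpath run). The loader defines Greeter itself instead of delegating to its parent, so the JVM ends up with two distinct classes of the same name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical class that gets loaded twice.
class Greeter {
}

// Defines Greeter from its class bytes; every other class is delegated to the parent.
class IsolatingLoader extends ClassLoader {
    private static final String TARGET = Greeter.class.getName();

    IsolatingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(TARGET)) {
            return super.loadClass(name, resolve);   // usual parent-first delegation
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                byte[] bytes = readClassBytes(name);
                c = defineClass(name, bytes, 0, bytes.length);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    // Assumption: the class file is visible as a resource of the parent loader.
    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        String resource = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(resource)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n; (n = in.read(buffer)) != -1; ) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class LoaderDemo {
    // getClassLoader() returns null for classes loaded by the bootstrap loader.
    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap" : loader.getClass().getSimpleName();
    }

    static Class<?> loadIsolated() {
        try {
            return new IsolatingLoader(Greeter.class.getClassLoader()).loadClass(Greeter.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> original = Greeter.class;
        Class<?> isolated = loadIsolated();
        System.out.println("String loaded by: " + describe(String.class.getClassLoader()));
        System.out.println("same name:  " + original.getName().equals(isolated.getName()));
        System.out.println("same class: " + (original == isolated));
    }
}
```

Overriding loadClass breaks the usual parent-first delegation on purpose; a loader that only overrides findClass would get the parent's Greeter back.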

Class loading process

  • Class file parsing: class format is checked (may throw ClassFormatError)
  • Creation of a runtime representation of the class in a special JVM memory area: the runtime constant pool in the Method Area (Metaspace in HotSpot since Java 8, Permanent Generation before that)
  • Loading of a superclass and superinterfaces

Linking

  • Java bytecode verification
  • Preparation
  • Resolution of symbolic references

Java bytecode verification

  • Performed once for a class
  • Instructions correctness checks
  • Operand stack and local variables out of bounds checks
  • Type assign compatibility checks

Class initialization

Before any method of a class can be executed, the class must be initialized, which means running its static initializer

class MyClass {
    static int i = 10;
    static String s = "Hello";
    static {
        System.out.println(s);
    }
}
  • Happens on first use
    • new
    • static field access
    • static method call
  • Triggers initialization of the superclass and of superinterfaces that declare default methods
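
The "first use" rule can be observed by logging from a static initializer. A minimal sketch with hypothetical names (InitDemo, Lazy): a class literal alone does not initialize the class, while reading a non-constant static field does.

```java
import java.util.ArrayList;
import java.util.List;

class InitDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Lazy {
        static int value = 10;   // not final, so not a compile-time constant

        static {
            LOG.add("Lazy initialized");
        }
    }

    // Records when initialization of Lazy happens relative to its uses.
    // The initializer runs once per class loader, so only the first call logs it.
    static List<String> run() {
        LOG.clear();
        LOG.add("before first use");
        Class<?> literal = Lazy.class;   // class literal: no initialization yet
        LOG.add("after class literal");
        int v = Lazy.value;              // static field access: triggers initialization
        LOG.add("read value " + v);
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

On the first call, "Lazy initialized" appears after "after class literal" and before "read value 10".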

Execution engine: interpreters, JIT, AOT

JVM may execute bytecode via

  • Interpretation
  • Translation into native code that runs directly on the CPU

Interpreter (Simple)

pc = 0;
do {
    fetch opcode at pc;
    if (operands) fetch operands;
    execute the opcode;
    calculate pc;
} while (there is more);
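
The loop above can be made concrete with a toy stack machine. This is only a sketch of the fetch/execute cycle: the opcodes PUSH, ADD, MUL and HALT and their numeric values are invented for illustration and are not real JVM opcodes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyInterpreter {
    // Hypothetical instruction set, not the JVM's.
    static final int PUSH = 0;   // followed by one operand: the constant to push
    static final int ADD = 1;
    static final int MUL = 2;
    static final int HALT = 3;   // stops and returns the top of the stack

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();   // operand stack
        int pc = 0;
        while (true) {
            int opcode = code[pc++];                 // fetch opcode at pc
            switch (opcode) {
                case PUSH:
                    stack.push(code[pc++]);          // fetch operand, advance pc past it
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + opcode + " at " + (pc - 1));
            }
        }
    }

    public static void main(String[] args) {
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};   // (2 + 3) * 4
        System.out.println(run(program));   // prints 20
    }
}
```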

Template interpreter

  • Every bytecode instruction is implemented as a sequence of target CPU instructions (template)
  • Instruction interpretation is just a jump to the corresponding template

Compilers

  • Non-optimizing
    • make it up as I go along
  • Simple Optimizing
    • HotSpot Client (C1)
  • Sophisticated Optimizing
    • HotSpot Server (C2)
  • Dynamic (Just-In-Time - JIT)
    • Translation into native code happens at application runtime
  • Static (Ahead-Of-Time - AOT)
    • Translation happens before program execution

Dynamic Compilers (JIT)

  • Work concurrently with program execution
  • Compile hot code only (determined by profiling)
  • Use profiling information to guide optimizations

Static Compilers (AOT)

  • Are not limited in resources for optimizations
  • Compile every method of a program using the most aggressive optimizations
  • No overheads at run-time (fast startup)

Meta information access subsystem: reflection, indy, JNI

Reflection

  • Allows access to classes, fields, and methods by name (given as a string) from a Java program
  • Is implemented in the JVM via access to Metaspace
  • Key feature of Java for many popular frameworks and for implementations of JVM-based programming languages (Groovy, Clojure, JRuby, etc.)
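
A short sketch of access by name using the standard java.lang.reflect API. ReflectionDemo, Point and the helper methods are hypothetical names introduced for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class ReflectionDemo {
    // Hypothetical class with a private field.
    static class Point {
        private int x = 7;
    }

    // Looks up a public no-argument method by its name and calls it.
    static Object callByName(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads an int field by its name, even if it is private.
    static int readIntField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.getInt(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName("hello", "toUpperCase"));   // prints HELLO
        System.out.println(readIntField(new Point(), "x"));       // prints 7
    }
}
```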

Method Handles and invokedynamic (JSR-292, indy)

To allow dynamic languages to be executed efficiently on the JVM, a new instruction called invokedynamic was added to the JVM instruction set
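
Java source code cannot emit invokedynamic directly (javac generates it, for example, for lambdas), but the method handle half of JSR-292 is available through java.lang.invoke. A minimal sketch, with MethodHandleDemo as a hypothetical name, that looks up String.length() as a handle and invokes it:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleDemo {
    static int lengthViaHandle(String s) {
        try {
            // Handle type is (String)int: the receiver becomes the first argument.
            MethodHandle length = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // invokeExact requires the call site types to match the handle type exactly.
            return (int) length.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("hello"));   // prints 5
    }
}
```

Unlike reflection, the access check happens once at lookup time, not on every call.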

Java Native Interface (JNI)

  • Binds the JVM with the outside world (OS)
  • C interface of the JVM
    • Does not depend on implementation details of a JVM
    • Is used for implementation of native methods in C language (or another low-level language)
    • JNI is used to implement platform specific parts of Java SE API: IO, NET, AWT
  • JNI is implemented in the JVM via access to Metaspace

Project Panama

  • C interop without coding in C:
    • Direct external C functions call from Java
    • C data structures access from Java

Threading, exception handling, synchronization

Threads

  • A Java thread is mapped one-to-one (1:1) to a native thread
  • Each thread has a reserved region of memory, its stack, which holds the local variables and operand stacks (method frames) of the methods executing in that thread
    • Thread stack size is a JVM argument: -Xss
  • A Java thread has detailed information about its stack (stack trace)
    • You may print or examine stack from a Java program
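
A small sketch of examining the stack from Java code. StackDemo and its methods are hypothetical names. The per-thread stack size passed to the Thread constructor plays a role similar to -Xss for a single thread, and the JVM may treat it as a hint only.

```java
class StackDemo {
    static String[] outer() {
        return inner();
    }

    static String[] inner() {
        // A new Throwable captures the stack of the creating thread;
        // element 0 is the innermost frame.
        StackTraceElement[] frames = new Throwable().getStackTrace();
        return new String[] {frames[0].getMethodName(), frames[1].getMethodName()};
    }

    // Runs a task on a thread created with a requested stack size in bytes.
    static void runWithStackSize(long stackSizeBytes, Runnable task) {
        Thread thread = new Thread(null, task, "custom-stack", stackSizeBytes);
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // The 512 KiB value is an arbitrary choice for this example.
        runWithStackSize(512 * 1024, () -> System.out.println(String.join(" <- ", outer())));
    }
}
```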

Project Loom

Problem: there are limitations on how many native threads can be created (native threads are expensive)

Solution: virtual threads (lightweight threads) managed by the JVM, abandoning the 1:1 scheme

Exception handling

The JVM knows everything about the call stack, which helps it implement exception handling.

Threads and Java Memory Model

Synchronization

  • For safe access to a shared memory between threads
  • Naive implementation may use OS synchronization primitives
    • Each Java object has an OS monitor as a hidden field
  • Highly optimized for the common case, in which contention occurs far less often than entry into a synchronized block
  • Today, it is recommended to use java.util.concurrent primitives instead of built-in synchronization
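
The two styles can be compared on a shared counter. A minimal sketch with hypothetical names (CounterDemo and its methods): one counter is guarded by the built-in monitor of the object, the other uses AtomicInteger from java.util.concurrent.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterDemo {
    private int plain;                                          // guarded by the monitor of this
    private final AtomicInteger atomic = new AtomicInteger();   // lock-free counter

    synchronized void incrementSynchronized() {
        plain++;
    }

    void incrementAtomic() {
        atomic.incrementAndGet();
    }

    // Increments both counters from several threads and returns {plain, atomic}.
    static int[] run(int threads, int incrementsPerThread) {
        CounterDemo counter = new CounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementSynchronized();
                    counter.incrementAtomic();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();   // join also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] {counter.plain, counter.atomic.get()};
    }

    public static void main(String[] args) {
        int[] result = run(4, 10_000);
        System.out.println(result[0] + " " + result[1]);   // prints 40000 40000
    }
}
```

Without synchronized on the first method, plain++ would be a data race and the first number could come out lower.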

Memory management: heap, allocation, GC

Memory allocation

  • Implementation of the new operator
  • Objects allocated with the new operator reside in the so-called Java heap
  • Java heap structure is JVM implementation specific
  • Java object layout is JVM implementation specific as well
  • Must be fast
    • The JVM requests memory from the OS for many objects at once, not for one at a time
    • Allocation by the bump-pointer technique
  • Must be thread-safe but parallel (non-blocking)
    • Thread-local heaps: every thread allocates from its own thread-local memory region
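
The combination of bump-pointer allocation and thread-local regions can be simulated in a few lines. This is a toy model, not how any particular JVM implements it: addresses are plain offsets into an imaginary heap, and all names and sizes (BumpAllocatorDemo, HEAP_SIZE, BUFFER_SIZE) are assumptions. HotSpot calls its thread-local regions TLABs (thread-local allocation buffers).

```java
import java.util.concurrent.atomic.AtomicInteger;

class BumpAllocatorDemo {
    static final int HEAP_SIZE = 1 << 20;     // hypothetical heap size in bytes
    static final int BUFFER_SIZE = 1 << 10;   // hypothetical thread-local buffer size

    // Shared heap pointer: handing out a fresh buffer is the only contended step.
    static final AtomicInteger heapTop = new AtomicInteger();

    // Per-thread buffer: [top, end) is free space owned by exactly one thread.
    static final class Buffer {
        int top;
        int end;
    }

    static final ThreadLocal<Buffer> LOCAL = ThreadLocal.withInitial(Buffer::new);

    // Returns the simulated address of a block of the given size.
    static int allocate(int size) {
        if (size <= 0 || size > BUFFER_SIZE) {
            throw new IllegalArgumentException("unsupported size: " + size);
        }
        Buffer buffer = LOCAL.get();
        if (buffer.top + size > buffer.end) {           // buffer exhausted: take a fresh one
            int start = heapTop.getAndAdd(BUFFER_SIZE);
            if (start + BUFFER_SIZE > HEAP_SIZE) {
                throw new IllegalStateException("simulated heap exhausted");
            }
            buffer.top = start;
            buffer.end = start + BUFFER_SIZE;
        }
        int address = buffer.top;                       // allocation is just a pointer bump,
        buffer.top += size;                             // with no locking on the fast path
        return address;
    }

    public static void main(String[] args) {
        int a = allocate(16);
        int b = allocate(24);
        System.out.println(b - a);   // prints 16
    }
}
```

The leftover space of an exhausted buffer is simply wasted in this sketch; a real allocator has to deal with it.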

Java Object Layout

The layout is not defined by the JVM specification; however, an implementation needs:

  • Java Object header
    • Reference to a class object
    • Monitor (lock word)
    • Identity hashcode
    • GC flags
  • Fields
    • May be reordered for the sake of size optimization, alignment, or target architecture specifics

Project Valhalla

  • The main idea is to introduce into the JVM a kind of object that does not require a header at all
    • Flattening an object into the primitive data it consists of
    • Removing unnecessary indirection in arrays

Garbage collection

Garbage consists of the objects that a program can no longer use.

Question: What objects can be used by a program?

Not Garbage

  1. Objects in static fields of classes
  2. Objects that are reachable from any method frame (local variables and operand stack)
  3. Objects referenced by “not garbage”

GC roots

  1. Objects in static fields of classes
  2. Objects that are accessible from thread stacks
  3. Objects that are referenced by JNI references in native methods

Not garbage aka live objects:

  1. Objects from GC roots
  2. Objects that are referenced by live objects

Everything else is garbage.
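
The definition above amounts to graph reachability from the GC roots. A toy sketch over a hypothetical object graph (Node, ReachabilityDemo and the example graph are invented for illustration; a real collector works on raw heap memory, not on Java collections):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilityDemo {
    // Hypothetical heap object: a name plus outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Live objects: the roots plus everything referenced by a live object.
    static Set<Node> findLive(Collection<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (live.add(node)) {            // first visit: follow its references
                pending.addAll(node.refs);
            }
        }
        return live;
    }

    // Everything in the heap that is not live is garbage.
    static List<String> garbageNames(Collection<Node> heap, Collection<Node> roots) {
        Set<Node> live = findLive(roots);
        List<String> garbage = new ArrayList<>();
        for (Node node : heap) {
            if (!live.contains(node)) {
                garbage.add(node.name);
            }
        }
        return garbage;
    }

    static List<String> example() {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.refs.add(b);   // root a references b
        c.refs.add(d);   // c and d reference each other,
        d.refs.add(c);   // but nothing live references them
        return garbageNames(Arrays.asList(a, b, c, d), Collections.singletonList(a));
    }

    public static void main(String[] args) {
        System.out.println(example());   // prints [c, d]
    }
}
```

The cycle between c and d is still garbage, because liveness is defined by reachability from the roots and not by whether something references an object.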

Tracing garbage collectors

  • Mark-and-sweep
    • Marks live objects, sweeps (frees) the garbage
  • Stop-and-copy
    • Heap is divided into two semi-spaces
    • Copies live objects to the second semi-space
    • The two semi-spaces swap roles on the next collection

Stop the World

  • Live objects are defined for a specific moment of program execution: the set of live objects changes during execution
  • To determine and collect the garbage, all threads should be paused (STW pause)

One of the main tasks of modern garbage collectors is to reduce the STW pause. Methods to reduce the STW pause:

  • Incremental
    • Do not collect all the garbage within GC pause
  • Parallel
    • Collect the garbage in parallel threads within GC pause
  • Concurrent
    • Collect the garbage concurrently with program execution

Generational Garbage Collection

Weak Generational Hypothesis

  • Most objects die young
  • Old objects rarely reference young objects

Generational GC:

  • Particular case of incremental GC
  • During minor collection cycles, only young objects are traced
  • Objects that survive several cycles are moved to the old generation

Thread Local Garbage Collection

Thread Local Hypothesis

  • Most objects die in the thread that created them

Thread local GC:

  • Collects thread local garbage within a respective thread, not pausing other threads