TornadoVM 2.0 Brings Automatic GPU Acceleration and LLM Support to Java
These articles are AI-generated summaries. Please check the original sources for full details.
TornadoVM 2.0: Heterogeneous Hardware Runtime for Java
The TornadoVM project has released version 2.0, an open-source runtime designed to automatically accelerate Java programs on CPUs, GPUs, and FPGAs. This release is especially relevant for developers building Large Language Model (LLM) solutions on the Java Virtual Machine (JVM).
While existing JVMs excel at portability and safety, they often struggle to exploit heterogeneous hardware fully. TornadoVM bridges this gap by offloading Java code to accelerators, managing memory transfers, and executing compute kernels. For suitable workloads, this can yield significant performance gains and reduce the cost of compute-intensive tasks.
Key Insights
- Runtime Compilation: TornadoVM acts as a Just-In-Time (JIT) compiler, translating Java bytecode to OpenCL C, NVIDIA CUDA PTX, or SPIR-V binary.
- Parallelism Models: Offers both a simple Loop Parallel API using annotations (@Parallel, @Reduce) and a more explicit Kernel API for GPU-style programming.
- LLM Inference Library: Includes GPULlama3.java, a pure Java library for LLM inference on GPUs, removing external dependencies and simplifying setup.
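To give a feel for what a reduction annotation such as @Reduce has to deal with, here is a rough plain-Java sketch of the underlying idea. It uses no TornadoVM API and is not TornadoVM's implementation; the class and method names (ReductionSketch, sumSequential, sumChunked) are hypothetical. A sequential sum funnels every iteration through one accumulator, whereas a parallel reduction typically gives each worker a private partial result and combines the partials at the end.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ReductionSketch {
    // Sequential reduction: every iteration reads and writes the same accumulator.
    public static float sumSequential(float[] data) {
        float acc = 0.0f;
        for (int i = 0; i < data.length; i++) {
            acc += data[i];
        }
        return acc;
    }

    // Chunked reduction: each chunk computes a private partial sum in parallel,
    // and the partial sums are combined sequentially afterwards.
    public static float sumChunked(float[] data, int chunks) {
        float[] partial = new float[chunks];
        int chunkSize = (data.length + chunks - 1) / chunks; // ceiling division
        IntStream.range(0, chunks).parallel().forEach(c -> {
            int start = c * chunkSize;
            int end = Math.min(start + chunkSize, data.length);
            float local = 0.0f;
            for (int i = start; i < end; i++) {
                local += data[i];
            }
            partial[c] = local; // each chunk writes only its own slot
        });
        float total = 0.0f;
        for (float p : partial) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        float[] data = new float[1024];
        Arrays.fill(data, 1.0f);
        System.out.println(sumSequential(data));
        System.out.println(sumChunked(data, 8));
    }
}
```

Both methods agree exactly on this input because every partial sum is a small whole number. For arbitrary floating-point data, the chunked version may differ from the sequential one in the last few bits, since the additions happen in a different order.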
Working Example
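import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

// Element-wise multiply; @Parallel marks the loop iterations as independent.
public static void vectorMul(FloatArray a, FloatArray b, FloatArray result) {
    for (@Parallel int i = 0; i < result.getSize(); i++) {
        result.set(i, a.get(i) * b.get(i));
    }
}

// Inside a method of the Example class, with a, b, and result already allocated:
var taskGraph = new TaskGraph("multiply")
    .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)   // copy inputs on the first run only
    .task("vectorMul", Example::vectorMul, a, b, result)
    .transferToHost(DataTransferMode.EVERY_EXECUTION, result);  // copy the result back after every run
var snapshot = taskGraph.snapshot();                            // immutable view of the task graph
new TornadoExecutionPlan(snapshot).execute();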
Practical Applications
- LLM Inference: GPULlama3.java enables running LLMs like Llama 3 and Qwen3 directly within Java applications on GPUs.
- Pitfall: Workloads with loop-carried dependencies, where one iteration needs the result of a previous one, may not benefit from TornadoVM's acceleration; careful analysis of code structure is required.
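To make the loop-dependency pitfall concrete, the plain-Java sketch below contrasts a loop whose iterations are independent with one that carries a dependency from each iteration to the next. It uses only the standard library, not TornadoVM, and the names (LoopDependencySketch, multiply, prefixSum) are hypothetical. The first loop is the kind of code that can be spread across many threads; the second, as written, has to run in order.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class LoopDependencySketch {
    // Independent iterations: out[i] depends only on a[i] and b[i],
    // so the iterations can run in any order or all at once.
    public static float[] multiply(float[] a, float[] b) {
        float[] out = new float[a.length];
        IntStream.range(0, a.length).parallel().forEach(i -> out[i] = a[i] * b[i]);
        return out;
    }

    // Loop-carried dependency: out[i] needs out[i - 1],
    // so this loop must run sequentially as written.
    public static float[] prefixSum(float[] a) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (i == 0 ? 0.0f : out[i - 1]) + a[i];
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {2f, 2f, 2f, 2f};
        System.out.println(Arrays.toString(multiply(a, b)));  // [2.0, 4.0, 6.0, 8.0]
        System.out.println(Arrays.toString(prefixSum(a)));    // [1.0, 3.0, 6.0, 10.0]
    }
}
```

Checking which of these two shapes a hot loop has is a reasonable first step before trying to offload it.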