Kernels and Function Inlining for Particle Simulation¶
Introduction¶
Kernels are the fundamental building blocks of parallel computation in Warp. For particula, kernels enable massively parallel computation of aerosol dynamics where each particle can be processed independently.
Kernel Basics¶
Defining a Kernel¶
Kernels are Python functions decorated with @wp.kernel:
import warp as wp
PI = 3.141592653589793 # matches math.pi exactly — never truncate
@wp.kernel
def compute_particle_masses(
diameters: wp.array(dtype=float),
densities: wp.array(dtype=float),
masses: wp.array(dtype=float),
):
"""Compute particle mass from diameter and density."""
tid = wp.tid()
# Volume of sphere: (4/3) * pi * r^3
radius = diameters[tid] * 0.5
volume = (4.0 / 3.0) * PI * radius * radius * radius
masses[tid] = densities[tid] * volume
Key Characteristics¶
- Thread Index: Use
wp.tid()to get the current particle's index - Type Annotations: All parameters must have Warp type annotations
- Parallel Execution: Each particle is processed independently
- No Return Values: Kernels write results to output arrays
Launching Kernels¶
# Create particle arrays
n_particles = 10000
diameters = wp.array(np.ones(n_particles) * 1e-6, dtype=float) # 1 micron
densities = wp.array(np.ones(n_particles) * 1000.0, dtype=float) # kg/m³
masses = wp.zeros(n_particles, dtype=float)
# Launch kernel - one thread per particle
wp.launch(compute_particle_masses, dim=n_particles, inputs=[diameters, densities, masses])
Function Inlining¶
Warp Functions¶
Functions decorated with @wp.func can be called from kernels and are automatically inlined during compilation for optimal performance.
@wp.func
def knudsen_number(diameter: float, mean_free_path: float) -> float:
"""Calculate Knudsen number for a particle."""
return 2.0 * mean_free_path / diameter
@wp.func
def cunningham_correction(knudsen: float) -> float:
"""Cunningham slip correction factor."""
# Cc = 1 + Kn * (A1 + A2 * exp(-A3/Kn))
A1 = 1.257
A2 = 0.400
A3 = 0.55
return 1.0 + knudsen * (A1 + A2 * wp.exp(-A3 / knudsen))
@wp.kernel
def compute_slip_corrections(
diameters: wp.array(dtype=float),
mean_free_path: float,
corrections: wp.array(dtype=float),
):
"""Compute Cunningham slip correction for each particle."""
tid = wp.tid()
kn = knudsen_number(diameters[tid], mean_free_path)
corrections[tid] = cunningham_correction(kn)
Benefits of Inlining¶
- Zero Overhead: Function calls are eliminated at compile time
- Optimization: Compiler can optimize across function boundaries
- Code Reuse: Write modular physics functions without performance penalty
- Type Safety: Full type checking at compile time
Advanced Kernel Patterns¶
Multi-Dimensional Kernels¶
For particle-particle interactions like coagulation:
@wp.kernel
def compute_coagulation_kernel_matrix(
diameters: wp.array(dtype=float),
kernel_matrix: wp.array2d(dtype=float),
temperature: float,
viscosity: float,
k_boltzmann: float, # from particula.util.constants.BOLTZMANN_CONSTANT
):
"""Compute coagulation kernel K(i,j) for all particle pairs."""
i, j = wp.tid()
d_i = diameters[i]
d_j = diameters[j]
# Brownian coagulation kernel
coag_kernel = (2.0 * k_boltzmann * temperature / (3.0 * viscosity)) * \
(d_i + d_j) * (1.0/d_i + 1.0/d_j)
kernel_matrix[i, j] = coag_kernel
# Launch with constants from particula.util.constants
from particula.util.constants import BOLTZMANN_CONSTANT
wp.launch(compute_coagulation_kernel_matrix,
dim=(n_particles, n_particles),
inputs=[diameters, kernel_matrix, temperature, viscosity,
float(BOLTZMANN_CONSTANT)])
Conditional Logic¶
@wp.kernel
def classify_particles(
diameters: wp.array(dtype=float),
classifications: wp.array(dtype=int),
):
"""Classify particles by size regime."""
tid = wp.tid()
d = diameters[tid]
# PM2.5: < 2.5 microns, PM10: < 10 microns
if d < 2.5e-6:
classifications[tid] = 0 # PM2.5 (fine)
elif d < 10.0e-6:
classifications[tid] = 1 # PM10 (coarse)
else:
classifications[tid] = 2 # Large particles
Atomic Operations¶
For operations that need synchronization across particles (e.g., total mass):
@wp.kernel
def compute_total_mass(
masses: wp.array(dtype=float),
concentrations: wp.array(dtype=float),
total_mass: wp.array(dtype=float),
):
"""Compute total aerosol mass concentration."""
tid = wp.tid()
mass_conc = masses[tid] * concentrations[tid]
wp.atomic_add(total_mass, 0, mass_conc)
Performance Considerations¶
Best Practices¶
- Minimize Memory Access: Reuse loaded particle properties
- Avoid Divergence: Keep conditional branches minimal
- Use Local Variables: Store frequently accessed data in registers
- Coalesce Memory Access: Access particle arrays in contiguous patterns
Example: Optimized Condensation Kernel¶
Physical constants are passed as kernel parameters from
particula.util.constants — never hardcoded in kernel code. This ensures
exact parity with the Python/NumPy implementations and a single source of
truth for all constants.
@wp.func
def saturation_vapor_pressure(
temperature: float,
latent_heat: float, # passed from Python side
gas_constant_vapor: float, # passed from Python side
ref_temperature: float, # passed from Python side
ref_pressure: float, # passed from Python side
) -> float:
"""Clausius-Clapeyron equation for water vapor.
Must match the Python equivalent exactly — no approximations.
"""
return ref_pressure * wp.exp(
(latent_heat / gas_constant_vapor)
* (1.0 / ref_temperature - 1.0 / temperature)
)
@wp.func
def kelvin_effect(
diameter: float,
surface_tension: float,
molar_mass: float,
density: float,
temperature: float,
gas_constant: float, # from particula.util.constants.GAS_CONSTANT
) -> float:
"""Kelvin effect correction for small droplets."""
exponent = (4.0 * surface_tension * molar_mass) / \
(gas_constant * temperature * density * diameter)
return wp.exp(exponent)
@wp.kernel
def compute_condensation_rates(
diameters: wp.array(dtype=float),
temperatures: wp.array(dtype=float),
vapor_pressures: wp.array(dtype=float),
growth_rates: wp.array(dtype=float),
surface_tension: float,
molar_mass: float,
density: float,
diffusion_coeff: float,
gas_constant: float, # from particula.util.constants.GAS_CONSTANT
latent_heat: float,
gas_constant_vapor: float,
ref_temperature: float,
ref_pressure: float,
):
"""Compute condensation growth rate for each particle.
All physical constants passed as parameters — see launch call below.
"""
tid = wp.tid()
# Load particle properties once
d = diameters[tid]
T = temperatures[tid]
p_vapor = vapor_pressures[tid]
# Compute equilibrium vapor pressure with Kelvin effect
p_sat = saturation_vapor_pressure(
T, latent_heat, gas_constant_vapor, ref_temperature, ref_pressure
)
kelvin = kelvin_effect(d, surface_tension, molar_mass, density, T,
gas_constant)
p_eq = p_sat * kelvin
# Supersaturation
S = p_vapor / p_eq - 1.0
# Growth rate (Maxwell equation)
growth_rates[tid] = (2.0 * diffusion_coeff * molar_mass * p_eq * S) / \
(density * gas_constant * T * d)
# Launch with constants from particula.util.constants
from particula.util.constants import GAS_CONSTANT
wp.launch(
compute_condensation_rates,
dim=n_particles,
inputs=[diameters, temperatures, vapor_pressures, growth_rates,
surface_tension, molar_mass, density, diffusion_coeff,
float(GAS_CONSTANT), latent_heat, gas_constant_vapor,
ref_temperature, ref_pressure],
)
Debugging Kernels¶
Print Statements¶
@wp.kernel
def debug_particle_kernel(
diameters: wp.array(dtype=float),
masses: wp.array(dtype=float),
):
tid = wp.tid()
if tid == 0: # Print only from first thread
wp.printf("First particle: d=%e m, m=%e kg\n",
diameters[0], masses[0])
Bounds Checking¶
Warp automatically checks array bounds in debug mode:
wp.config.verify_cuda = True # Enable bounds checking
Common Patterns for Aerosol Physics¶
Particle Size Distribution Moments¶
@wp.kernel
def compute_moments(
diameters: wp.array(dtype=float),
concentrations: wp.array(dtype=float),
moments: wp.array(dtype=float), # [M0, M1, M2, M3]
):
"""Compute size distribution moments."""
tid = wp.tid()
d = diameters[tid]
n = concentrations[tid]
# M_k = sum(n_i * d_i^k)
wp.atomic_add(moments, 0, n) # M0 (number)
wp.atomic_add(moments, 1, n * d) # M1 (mean diameter)
wp.atomic_add(moments, 2, n * d * d) # M2 (surface area)
wp.atomic_add(moments, 3, n * d * d * d) # M3 (volume/mass)
Diffusion Coefficient Calculation¶
PI = 3.141592653589793 # matches math.pi exactly
@wp.func
def stokes_einstein_diffusion(
diameter: float,
temperature: float,
viscosity: float,
slip_correction: float,
k_boltzmann: float, # from particula.util.constants.BOLTZMANN_CONSTANT
) -> float:
"""Stokes-Einstein diffusion coefficient with slip correction.
Must match the Python equivalent exactly — no approximations.
"""
return (k_boltzmann * temperature * slip_correction) / \
(3.0 * PI * viscosity * diameter)
@wp.kernel
def compute_diffusion_coefficients(
diameters: wp.array(dtype=float),
slip_corrections: wp.array(dtype=float),
diffusion_coeffs: wp.array(dtype=float),
temperature: float,
viscosity: float,
k_boltzmann: float, # from particula.util.constants.BOLTZMANN_CONSTANT
):
"""Compute diffusion coefficient for each particle."""
tid = wp.tid()
diffusion_coeffs[tid] = stokes_einstein_diffusion(
diameters[tid], temperature, viscosity, slip_corrections[tid],
k_boltzmann
)
# Launch with constants from particula.util.constants
from particula.util.constants import BOLTZMANN_CONSTANT
wp.launch(
compute_diffusion_coefficients,
dim=n_particles,
inputs=[diameters, slip_corrections, diffusion_coeffs,
temperature, viscosity, float(BOLTZMANN_CONSTANT)],
)
Testing Warp Kernels¶
Principle: Exact Parity with Python/NumPy¶
Every @wp.func and @wp.kernel must produce numerically identical
results to the equivalent Python/NumPy function in particula. There are no
approximations or shortcuts in the GPU version.
Pattern: Lightweight Wrapper Kernels¶
Test each @wp.func by writing a small @wp.kernel in the test file that
calls the function and writes results to an output array:
import numpy as np
import numpy.testing as npt
import pytest
wp = pytest.importorskip("warp")
from particula.util.constants import BOLTZMANN_CONSTANT
@wp.kernel
def _test_diffusion_kernel(
diameters: wp.array(dtype=float),
slip_corrections: wp.array(dtype=float),
results: wp.array(dtype=float),
temperature: float,
viscosity: float,
k_boltzmann: float,
):
"""Lightweight test wrapper for stokes_einstein_diffusion."""
tid = wp.tid()
results[tid] = stokes_einstein_diffusion(
diameters[tid], slip_corrections[tid],
temperature, viscosity, k_boltzmann,
)
def test_diffusion_gpu_matches_numpy():
"""GPU diffusion matches Python/NumPy implementation exactly."""
diameters = np.array([1e-7, 1e-6, 1e-5], dtype=np.float64)
slips = np.array([1.15, 1.02, 1.001], dtype=np.float64)
temp, visc = 298.15, 1.81e-5
# Python reference
from particula.particles.properties import get_diffusion_coefficient
expected = get_diffusion_coefficient(diameters, temp, visc, slips)
# GPU
results_wp = wp.zeros(3, dtype=float)
wp.launch(
_test_diffusion_kernel, dim=3,
inputs=[wp.array(diameters), wp.array(slips), results_wp,
temp, visc, float(BOLTZMANN_CONSTANT)],
)
wp.synchronize()
npt.assert_allclose(results_wp.numpy(), expected, rtol=1e-10)
Checklist¶
pytest.importorskip("warp")— skip entire test file if Warp unavailable- Wrapper kernel per function — one small
@wp.kernelper@wp.func - Compare against Python —
npt.assert_allclose(..., rtol=1e-10) - Constants as parameters — never hardcode; pass from
util.constants - Both devices — test
"cpu"always,"cuda"if available wp.synchronize()— always call before.numpy()
For full details, see the Testing Guide.
See Also¶
- Data Structures - Particle and aerosol data types
- Aerosol Dynamics Example - Coagulation, condensation
- Particle Interactions Example - Collisions, wall loss