Kernels and Function Inlining for Particle Simulation¶

Introduction¶

Kernels are the fundamental building blocks of parallel computation in Warp. For particula, kernels enable massively parallel computation of aerosol dynamics where each particle can be processed independently.

Kernel Basics¶

Defining a Kernel¶

Kernels are Python functions decorated with @wp.kernel:

import warp as wp

PI = 3.141592653589793  # matches math.pi exactly — never truncate

@wp.kernel
def compute_particle_masses(
    diameters: wp.array(dtype=float),
    densities: wp.array(dtype=float),
    masses: wp.array(dtype=float),
):
    """Compute particle mass from diameter and density."""
    tid = wp.tid()

    # Volume of sphere: (4/3) * pi * r^3
    radius = diameters[tid] * 0.5
    volume = (4.0 / 3.0) * PI * radius * radius * radius

    masses[tid] = densities[tid] * volume

Key Characteristics¶

Thread Index: Use wp.tid() to get the current particle's index
Type Annotations: All parameters must have Warp type annotations
Parallel Execution: Each particle is processed independently
No Return Values: Kernels write results to output arrays

Launching Kernels¶

# Create particle arrays
n_particles = 10000
diameters = wp.array(np.ones(n_particles) * 1e-6, dtype=float)  # 1 micron
densities = wp.array(np.ones(n_particles) * 1000.0, dtype=float)  # kg/m³
masses = wp.zeros(n_particles, dtype=float)

# Launch kernel - one thread per particle
wp.launch(compute_particle_masses, dim=n_particles, inputs=[diameters, densities, masses])

Function Inlining¶

Warp Functions¶

Functions decorated with @wp.func can be called from kernels and are automatically inlined during compilation for optimal performance.

@wp.func
def knudsen_number(diameter: float, mean_free_path: float) -> float:
    """Calculate Knudsen number for a particle."""
    return 2.0 * mean_free_path / diameter

@wp.func
def cunningham_correction(knudsen: float) -> float:
    """Cunningham slip correction factor."""
    # Cc = 1 + Kn * (A1 + A2 * exp(-A3/Kn))
    A1 = 1.257
    A2 = 0.400
    A3 = 0.55
    return 1.0 + knudsen * (A1 + A2 * wp.exp(-A3 / knudsen))

@wp.kernel
def compute_slip_corrections(
    diameters: wp.array(dtype=float),
    mean_free_path: float,
    corrections: wp.array(dtype=float),
):
    """Compute Cunningham slip correction for each particle."""
    tid = wp.tid()
    kn = knudsen_number(diameters[tid], mean_free_path)
    corrections[tid] = cunningham_correction(kn)

Benefits of Inlining¶

Zero Overhead: Function calls are eliminated at compile time
Optimization: Compiler can optimize across function boundaries
Code Reuse: Write modular physics functions without performance penalty
Type Safety: Full type checking at compile time

Advanced Kernel Patterns¶

Multi-Dimensional Kernels¶

For particle-particle interactions like coagulation:

@wp.kernel
def compute_coagulation_kernel_matrix(
    diameters: wp.array(dtype=float),
    kernel_matrix: wp.array2d(dtype=float),
    temperature: float,
    viscosity: float,
    k_boltzmann: float,  # from particula.util.constants.BOLTZMANN_CONSTANT
):
    """Compute coagulation kernel K(i,j) for all particle pairs."""
    i, j = wp.tid()

    d_i = diameters[i]
    d_j = diameters[j]

    # Brownian coagulation kernel
    coag_kernel = (2.0 * k_boltzmann * temperature / (3.0 * viscosity)) * \
                  (d_i + d_j) * (1.0/d_i + 1.0/d_j)

    kernel_matrix[i, j] = coag_kernel

# Launch with constants from particula.util.constants
from particula.util.constants import BOLTZMANN_CONSTANT
wp.launch(compute_coagulation_kernel_matrix, 
          dim=(n_particles, n_particles), 
          inputs=[diameters, kernel_matrix, temperature, viscosity,
                  float(BOLTZMANN_CONSTANT)])

Conditional Logic¶

@wp.kernel
def classify_particles(
    diameters: wp.array(dtype=float),
    classifications: wp.array(dtype=int),
):
    """Classify particles by size regime."""
    tid = wp.tid()
    d = diameters[tid]

    # PM2.5: < 2.5 microns, PM10: < 10 microns
    if d < 2.5e-6:
        classifications[tid] = 0  # PM2.5 (fine)
    elif d < 10.0e-6:
        classifications[tid] = 1  # PM10 (coarse)
    else:
        classifications[tid] = 2  # Large particles

Atomic Operations¶

For operations that need synchronization across particles (e.g., total mass):

@wp.kernel
def compute_total_mass(
    masses: wp.array(dtype=float),
    concentrations: wp.array(dtype=float),
    total_mass: wp.array(dtype=float),
):
    """Compute total aerosol mass concentration."""
    tid = wp.tid()
    mass_conc = masses[tid] * concentrations[tid]
    wp.atomic_add(total_mass, 0, mass_conc)

Performance Considerations¶

Best Practices¶

Minimize Memory Access: Reuse loaded particle properties
Avoid Divergence: Keep conditional branches minimal
Use Local Variables: Store frequently accessed data in registers
Coalesce Memory Access: Access particle arrays in contiguous patterns

Example: Optimized Condensation Kernel¶

Physical constants are passed as kernel parameters from particula.util.constants — never hardcoded in kernel code. This ensures exact parity with the Python/NumPy implementations and a single source of truth for all constants.

@wp.func
def saturation_vapor_pressure(
    temperature: float,
    latent_heat: float,      # passed from Python side
    gas_constant_vapor: float,  # passed from Python side
    ref_temperature: float,  # passed from Python side
    ref_pressure: float,     # passed from Python side
) -> float:
    """Clausius-Clapeyron equation for water vapor.

    Must match the Python equivalent exactly — no approximations.
    """
    return ref_pressure * wp.exp(
        (latent_heat / gas_constant_vapor)
        * (1.0 / ref_temperature - 1.0 / temperature)
    )

@wp.func
def kelvin_effect(
    diameter: float,
    surface_tension: float, 
    molar_mass: float,
    density: float, 
    temperature: float,
    gas_constant: float,  # from particula.util.constants.GAS_CONSTANT
) -> float:
    """Kelvin effect correction for small droplets."""
    exponent = (4.0 * surface_tension * molar_mass) / \
               (gas_constant * temperature * density * diameter)
    return wp.exp(exponent)

@wp.kernel
def compute_condensation_rates(
    diameters: wp.array(dtype=float),
    temperatures: wp.array(dtype=float),
    vapor_pressures: wp.array(dtype=float),
    growth_rates: wp.array(dtype=float),
    surface_tension: float,
    molar_mass: float,
    density: float,
    diffusion_coeff: float,
    gas_constant: float,  # from particula.util.constants.GAS_CONSTANT
    latent_heat: float,
    gas_constant_vapor: float,
    ref_temperature: float,
    ref_pressure: float,
):
    """Compute condensation growth rate for each particle.

    All physical constants passed as parameters — see launch call below.
    """
    tid = wp.tid()

    # Load particle properties once
    d = diameters[tid]
    T = temperatures[tid]
    p_vapor = vapor_pressures[tid]

    # Compute equilibrium vapor pressure with Kelvin effect
    p_sat = saturation_vapor_pressure(
        T, latent_heat, gas_constant_vapor, ref_temperature, ref_pressure
    )
    kelvin = kelvin_effect(d, surface_tension, molar_mass, density, T,
                           gas_constant)
    p_eq = p_sat * kelvin

    # Supersaturation
    S = p_vapor / p_eq - 1.0

    # Growth rate (Maxwell equation)
    growth_rates[tid] = (2.0 * diffusion_coeff * molar_mass * p_eq * S) / \
                        (density * gas_constant * T * d)

# Launch with constants from particula.util.constants
from particula.util.constants import GAS_CONSTANT
wp.launch(
    compute_condensation_rates,
    dim=n_particles,
    inputs=[diameters, temperatures, vapor_pressures, growth_rates,
            surface_tension, molar_mass, density, diffusion_coeff,
            float(GAS_CONSTANT), latent_heat, gas_constant_vapor,
            ref_temperature, ref_pressure],
)

Debugging Kernels¶

Print Statements¶

@wp.kernel
def debug_particle_kernel(
    diameters: wp.array(dtype=float),
    masses: wp.array(dtype=float),
):
    tid = wp.tid()

    if tid == 0:  # Print only from first thread
        wp.printf("First particle: d=%e m, m=%e kg\n", 
                  diameters[0], masses[0])

Bounds Checking¶

Warp automatically checks array bounds in debug mode:

wp.config.verify_cuda = True  # Enable bounds checking

Common Patterns for Aerosol Physics¶

Particle Size Distribution Moments¶

@wp.kernel
def compute_moments(
    diameters: wp.array(dtype=float),
    concentrations: wp.array(dtype=float),
    moments: wp.array(dtype=float),  # [M0, M1, M2, M3]
):
    """Compute size distribution moments."""
    tid = wp.tid()

    d = diameters[tid]
    n = concentrations[tid]

    # M_k = sum(n_i * d_i^k)
    wp.atomic_add(moments, 0, n)              # M0 (number)
    wp.atomic_add(moments, 1, n * d)          # M1 (mean diameter)
    wp.atomic_add(moments, 2, n * d * d)      # M2 (surface area)
    wp.atomic_add(moments, 3, n * d * d * d)  # M3 (volume/mass)

Diffusion Coefficient Calculation¶

PI = 3.141592653589793  # matches math.pi exactly

@wp.func
def stokes_einstein_diffusion(
    diameter: float,
    temperature: float,
    viscosity: float,
    slip_correction: float,
    k_boltzmann: float,  # from particula.util.constants.BOLTZMANN_CONSTANT
) -> float:
    """Stokes-Einstein diffusion coefficient with slip correction.

    Must match the Python equivalent exactly — no approximations.
    """
    return (k_boltzmann * temperature * slip_correction) / \
           (3.0 * PI * viscosity * diameter)

@wp.kernel
def compute_diffusion_coefficients(
    diameters: wp.array(dtype=float),
    slip_corrections: wp.array(dtype=float),
    diffusion_coeffs: wp.array(dtype=float),
    temperature: float,
    viscosity: float,
    k_boltzmann: float,  # from particula.util.constants.BOLTZMANN_CONSTANT
):
    """Compute diffusion coefficient for each particle."""
    tid = wp.tid()
    diffusion_coeffs[tid] = stokes_einstein_diffusion(
        diameters[tid], temperature, viscosity, slip_corrections[tid],
        k_boltzmann
    )

# Launch with constants from particula.util.constants
from particula.util.constants import BOLTZMANN_CONSTANT
wp.launch(
    compute_diffusion_coefficients,
    dim=n_particles,
    inputs=[diameters, slip_corrections, diffusion_coeffs,
            temperature, viscosity, float(BOLTZMANN_CONSTANT)],
)

Testing Warp Kernels¶

Principle: Exact Parity with Python/NumPy¶

Every @wp.func and @wp.kernel must produce numerically identical results to the equivalent Python/NumPy function in particula. There are no approximations or shortcuts in the GPU version.

Pattern: Lightweight Wrapper Kernels¶

Test each @wp.func by writing a small @wp.kernel in the test file that calls the function and writes results to an output array:

import numpy as np
import numpy.testing as npt
import pytest

wp = pytest.importorskip("warp")

from particula.util.constants import BOLTZMANN_CONSTANT


@wp.kernel
def _test_diffusion_kernel(
    diameters: wp.array(dtype=float),
    slip_corrections: wp.array(dtype=float),
    results: wp.array(dtype=float),
    temperature: float,
    viscosity: float,
    k_boltzmann: float,
):
    """Lightweight test wrapper for stokes_einstein_diffusion."""
    tid = wp.tid()
    results[tid] = stokes_einstein_diffusion(
        diameters[tid], slip_corrections[tid],
        temperature, viscosity, k_boltzmann,
    )


def test_diffusion_gpu_matches_numpy():
    """GPU diffusion matches Python/NumPy implementation exactly."""
    diameters = np.array([1e-7, 1e-6, 1e-5], dtype=np.float64)
    slips = np.array([1.15, 1.02, 1.001], dtype=np.float64)
    temp, visc = 298.15, 1.81e-5

    # Python reference
    from particula.particles.properties import get_diffusion_coefficient
    expected = get_diffusion_coefficient(diameters, temp, visc, slips)

    # GPU
    results_wp = wp.zeros(3, dtype=float)
    wp.launch(
        _test_diffusion_kernel, dim=3,
        inputs=[wp.array(diameters), wp.array(slips), results_wp,
                temp, visc, float(BOLTZMANN_CONSTANT)],
    )
    wp.synchronize()

    npt.assert_allclose(results_wp.numpy(), expected, rtol=1e-10)

Checklist¶

pytest.importorskip("warp") — skip entire test file if Warp unavailable
Wrapper kernel per function — one small @wp.kernel per @wp.func
Compare against Python — npt.assert_allclose(..., rtol=1e-10)
Constants as parameters — never hardcode; pass from util.constants
Both devices — test "cpu" always, "cuda" if available
wp.synchronize() — always call before .numpy()

For full details, see the Testing Guide.