GPU Acceleration

This section describes how to enable and use GPU acceleration in the Simulation-Based Inference (SBI) pipeline. A CUDA-capable GPU can substantially shorten neural network training, especially with large simulation sets or high-dimensional parameter spaces.

Requirements

To use GPU acceleration, your system must meet the following conditions:

NVIDIA GPU with CUDA support
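Recent NVIDIA driver (the official PyTorch wheels bundle the CUDA runtime; a separate CUDA Toolkit, in a version compatible with PyTorch, is only needed when building from source)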
PyTorch with CUDA backend installed

You can install PyTorch with CUDA 11.8 support as follows:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
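After installation, it can help to confirm that the installed build actually sees a GPU. The following is a minimal sketch; the helper name describe_cuda is hypothetical and not part of PyTorch or sbi.

```python
import torch


def describe_cuda():
    """Collect basic facts about the local PyTorch/CUDA setup (hypothetical helper)."""
    info = {
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


for key, value in describe_cuda().items():
    print(f"{key}: {value}")
```

If cuda_available is False although a GPU is present, the usual suspects are a CPU-only wheel or an outdated driver.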

Using the GPU in SBI Workflows

PyTorch creates tensors and models on the CPU by default. To use a GPU, move the relevant objects to the CUDA device explicitly:

import torch

# Fall back to the CPU when no CUDA device is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Example: preparing input data (theta and x are existing tensors)
theta = theta.to(device)
x = x.to(device)

Most sbi components are compatible with GPU execution, provided that inputs and models are transferred appropriately.
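One detail that is easy to miss is that tensors and modules behave differently under .to(). The sketch below illustrates this with placeholder data; the shapes and the small linear layer are assumptions made only for the example.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Tensor.to() returns a tensor on the target device; it does not modify
# the original in place, so the result must be reassigned.
theta = torch.rand(1000, 3)               # placeholder parameters
x = theta + 0.1 * torch.randn(1000, 3)    # placeholder simulator outputs
theta, x = theta.to(device), x.to(device)

# Tensors can also be created directly on the device, which skips the copy.
noise = torch.randn(1000, 3, device=device)

# Module.to() moves the parameters in place and returns the module itself.
net = nn.Linear(3, 3)
net.to(device)

print(theta.device, x.device, noise.device, next(net.parameters()).device)
```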

Training with GPU

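When training the neural posterior estimator, sbi uses the GPU if:

The inference object is created with its device argument set to the CUDA device (the default is 'cpu')
The prior is defined on that same device

sbi then places the neural network on that device and moves the training data there as needed. It is still good practice to confirm that the prior, inputs, and outputs remain on the same device.

from sbi.inference import SNPE

# The prior should already live on `device`; without device=device, training runs on the CPU
inference = SNPE(prior=prior, device=device)
density_estimator = inference.append_simulations(theta.to(device), x.to(device)).train()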

Performance Considerations

Batch Size and Memory

If GPU memory allows, increase the batch size; very small batches tend to leave the GPU underutilized.
Monitor GPU usage with nvidia-smi.
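Memory use can also be inspected from within Python. The sketch below reports peak allocated memory for a single forward and backward pass at several batch sizes. The helper peak_memory_mb and the toy network are illustrative assumptions, not part of sbi; on a machine without CUDA the helper simply returns 0.0.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"


def peak_memory_mb(batch_size, dim=64):
    """Peak GPU memory (MiB) for one forward/backward pass of a toy network.

    Returns 0.0 when no CUDA device is available.
    """
    net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, dim)).to(device)
    batch = torch.randn(batch_size, dim, device=device)
    if device == "cuda":
        torch.cuda.reset_peak_memory_stats()
    net(batch).pow(2).mean().backward()
    if device == "cuda":
        return torch.cuda.max_memory_allocated() / 2**20
    return 0.0


for batch_size in (64, 512, 4096):
    print(batch_size, f"{peak_memory_mb(batch_size):.1f} MiB")
```

In sbi, the batch size is set through the training_batch_size argument of train().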

Data Movement

Avoid frequent CPU–GPU transfers, which can bottleneck performance.
Pre-load and pre-transform data on the GPU when possible.
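The two points above can be sketched as follows: the data set is transferred once, standardized on the device, and mini-batches are then drawn by indexing tensors that already live there. The data, the z-scoring step, and the helper iterate_batches are assumptions for illustration, and the approach only suits data sets that fit in GPU memory.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder dataset that starts out in CPU memory.
theta_cpu = torch.rand(10_000, 3)
x_cpu = theta_cpu + 0.1 * torch.randn(10_000, 3)

# One bulk transfer, followed by transforms on the device.
theta = theta_cpu.to(device)
x = x_cpu.to(device)
x = (x - x.mean(dim=0)) / x.std(dim=0)  # z-score on the device


def iterate_batches(theta, x, batch_size):
    """Yield shuffled mini-batches by indexing tensors already on the device."""
    order = torch.randperm(theta.shape[0], device=theta.device)
    for start in range(0, theta.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield theta[idx], x[idx]


for theta_batch, x_batch in iterate_batches(theta, x, batch_size=1024):
    assert theta_batch.device == x.device  # no per-batch host-to-device copy
```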

Parallelism

The GPU only accelerates network training; simulation stays CPU-bound unless the simulator itself is parallelized or GPU-enabled.
The GPU helps most when training, rather than simulation, dominates the total run time.
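Whether training or simulation dominates is easy to measure before investing in GPU hardware. The sketch below times two stages with the standard library; run_simulations and run_training are hypothetical stand-ins for the real simulator and training call.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical stand-ins for the two stages of a pipeline.
def run_simulations(n):
    return [sum(i * j for j in range(200)) for i in range(n)]


def run_training(data):
    return sum(v % 7 for v in data)


data, t_sim = timed(run_simulations, 2_000)
_, t_train = timed(run_training, data)
train_share = t_train / (t_sim + t_train)
print(f"simulation: {t_sim:.3f}s, training: {t_train:.3f}s, training share: {train_share:.0%}")
```

When timing code that runs on the GPU, call torch.cuda.synchronize() before reading the clock, because CUDA kernels are launched asynchronously.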

Verifying GPU Usage

To confirm that training uses the GPU:

Run nvidia-smi in the terminal during training.
Use PyTorch utilities to print device assignment:

for param in density_estimator.parameters():
    print(param.device)
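For larger networks, printing every parameter gets noisy. A more compact check, sketched below, collects the set of devices used by all parameters and buffers. The helper devices_of is hypothetical, and the small network merely stands in for a trained density estimator, assuming that estimator is a torch.nn.Module.

```python
import torch
from torch import nn


def devices_of(module):
    """Return the set of devices holding the module's parameters and buffers."""
    tensors = list(module.parameters()) + list(module.buffers())
    return {t.device for t in tensors}


# Stand-in for a trained density estimator (any torch.nn.Module works).
device = "cuda" if torch.cuda.is_available() else "cpu"
net = nn.Sequential(nn.Linear(3, 32), nn.BatchNorm1d(32), nn.Linear(32, 3)).to(device)

found = devices_of(net)
print(found)
assert len(found) == 1, f"parameters are spread over several devices: {found}"
```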

Summary

GPU acceleration can dramatically reduce training time for neural density estimators in the SBI pipeline. Ensure all data and models are transferred to the CUDA device, and monitor resource usage to achieve optimal performance.

In the next section, we demonstrate a complete inference workflow using a real example.
