Python API Reference

Benchmark Module

nvbenjo.benchmark.benchmark_models(model_cfgs, measure_memory=True)

Benchmark the given models.

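Parameters:
  • model_cfgs (dict) – Mapping from a model key to its model configuration, e.g. a cfg.TorchModelConfig or cfg.OnnxModelConfig.

  • measure_memory (bool, optional) – Whether to also measure memory usage, by default True

Returns: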

A DataFrame containing the benchmarking results

Return type:

pd.DataFrame

Examples

Basic usage with a single PyTorch model:

from nvbenjo import cfg
from nvbenjo.utils import PrecisionType
from nvbenjo import benchmark

model_cfg = cfg.TorchModelConfig(
    name="torch-shufflenet-v2-x0-5",
    type_or_path="torchvision:shufflenet_v2_x0_5",
    shape=(("B", 3, 224, 224),),
    devices=["cpu"],
    batch_sizes=[1],
    num_warmup_batches=1,
    num_batches=2,
    runtime_options={
        "test1": cfg.TorchRuntimeConfig(compile=False, precision=PrecisionType.FP32),
    },
    custom_batchmetrics={
        "fps": 1.0,
    },
)
results = benchmark.benchmark_models({"model_1": model_cfg})
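
The configuration fields above describe a sweep over devices, batch sizes and runtime options. The sketch below enumerates the runs the example would produce. It assumes that benchmark_models runs every combination of device, batch size and runtime option, which this reference does not spell out, and it does not reflect nvbenjo's actual iteration order. expand_runs is a hypothetical helper and is not part of nvbenjo.

```python
from itertools import product


def expand_runs(devices, batch_sizes, runtime_options):
    """Enumerate (device, batch_size, runtime_name) combinations.

    Hypothetical helper: assumes one benchmark run per combination.
    Iterating a dict yields its keys, i.e. the runtime option names.
    """
    return list(product(devices, batch_sizes, runtime_options))


# Values mirroring the example config; the runtime option values do not matter here.
runs = expand_runs(["cpu"], [1], {"test1": None})
print(runs)  # [('cpu', 1, 'test1')]
```

Under the same assumption, two devices, two batch sizes and one runtime option would give four runs.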

nvbenjo.benchmark.load_model(type_or_path, device, runtime_config, **kwargs)

Load a model. Depending on the runtime configuration, this is either a PyTorch or an ONNX model.

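Parameters:
  • type_or_path (str) – Model type or path.

  • device – Device to load the model onto.

  • runtime_config – Runtime configuration for the model. Its type determines whether a PyTorch or an ONNX model is loaded.

  • **kwargs – Additional keyword arguments.

Returns: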

Loaded model instance

Return type:

Any
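
Choosing a backend from the runtime configuration amounts to dispatching on the configuration's type. The sketch below is a minimal illustration under that assumption, not nvbenjo's implementation. TorchRuntimeStub and OnnxRuntimeStub are hypothetical stand-ins for cfg.TorchRuntimeConfig and cfg.OnnxRuntimeConfig, and load_model_sketch only reports which backend it would use.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for cfg.TorchRuntimeConfig and cfg.OnnxRuntimeConfig.
@dataclass
class TorchRuntimeStub:
    precision: str = "fp32"


@dataclass
class OnnxRuntimeStub:
    intra_op_num_threads: int = 1


def load_model_sketch(type_or_path, device, runtime_config, **kwargs):
    """Pick a backend based on the type of the runtime configuration."""
    if isinstance(runtime_config, TorchRuntimeStub):
        backend = "torch"
    elif isinstance(runtime_config, OnnxRuntimeStub):
        backend = "onnx"
    else:
        raise TypeError(f"Unsupported runtime config: {type(runtime_config).__name__}")
    # A real loader would construct the model here; the sketch only reports its choice.
    return {"backend": backend, "source": type_or_path, "device": device, **kwargs}
```

Rejecting unknown configuration types with a TypeError makes an unsupported backend fail loudly, rather than silently falling back to one of the two loaders.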

PyTorch Utilities

nvbenjo.torch_utils.get_model(type_or_path, device, runtime_config, verbose=False, **kwargs)

Load a PyTorch model.

Parameters:
  • type_or_path (str) –

    Model type or path. Supports prefixes to specify the model source:

    • torchvision:<name> – Load a torchvision model (e.g. torchvision:resnet50), see torchvision.models.list_models()

    • huggingface:<name> – Load a HuggingFace AutoModel (e.g. huggingface:bert-base-uncased), see https://huggingface.co/docs/transformers/model_doc/auto

    • jit:<path> – Load a TorchScript/JIT model

    • torchexport:<path> – Load a torch.export saved model

    • aot:<path> – Load a pre-compiled AOT model

    • (no prefix) – Path to a model saved with torch.save or torch.jit.save

  • device (torch.device) – Device to load the model onto.

  • runtime_config (TorchRuntimeConfig) – Runtime configuration for the model.

  • verbose (bool, optional) – Whether to print verbose output, by default False

Returns:

Loaded model.

Return type:

ty.Any

Examples

>>> import torch
>>> from nvbenjo.torch_utils import get_model
>>> from nvbenjo.cfg import TorchRuntimeConfig
>>> model = get_model("torchvision:resnet18", device=torch.device("cpu"), runtime_config=TorchRuntimeConfig())
>>> model = get_model("/path/to/model.pth", device=torch.device("cuda"), runtime_config=TorchRuntimeConfig())
>>> model = get_model("jit:/path/to/model.pt", device=torch.device("cuda"), runtime_config=TorchRuntimeConfig())
>>> model = get_model("torchexport:/path/to/model.pt2", device=torch.device("cuda"), runtime_config=TorchRuntimeConfig())
>>> model = get_model("aot:/path/to/model.pt2", device=torch.device("cuda"), runtime_config=TorchRuntimeConfig())
>>> model = get_model("huggingface:bert-base-uncased", device=torch.device("cpu"), runtime_config=TorchRuntimeConfig())
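
The prefix convention for type_or_path can be illustrated with a small parser. This is a sketch, not nvbenjo's implementation. split_model_source and the "path" label are hypothetical. The parser only recognizes the five documented prefixes, so a path that happens to contain a colon, such as a Windows drive letter, still falls through to the no-prefix case.

```python
KNOWN_PREFIXES = ("torchvision", "huggingface", "jit", "torchexport", "aot")


def split_model_source(type_or_path: str) -> tuple[str, str]:
    """Split 'prefix:target' into (prefix, target).

    Hypothetical helper: strings without a known prefix are labeled "path".
    """
    prefix, sep, rest = type_or_path.partition(":")
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    # Anything else, including Windows paths such as C:\models\model.pth, is a file path.
    return "path", type_or_path


print(split_model_source("torchvision:resnet18"))  # ('torchvision', 'resnet18')
```

Matching only a closed list of prefixes is the design choice that keeps plain paths from being misread as model sources.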

nvbenjo.torch_utils.measure_gpu_memory_allocation(model, batch, device, iterations=3)

Measure peak memory usage during inference.

Returns both the PyTorch allocator peak (via torch.cuda.max_memory_allocated) and the process-level GPU memory peak (via pynvml sampling).

Parameters:
  • model (nn.Module | Callable) – The model to benchmark.

  • batch (nvbenjo.utils.TensorLike) – Sample input to the model.

  • device (torch.device) – The device where the model is located and shall be used for benchmarking.

  • iterations (int, optional) – Number of iterations to run for measuring memory allocation, by default 3

Returns:

(torch_memory_bytes, gpu_memory_bytes) — PyTorch allocator peak and process-level GPU memory peak.

Return type:

tuple[int, int]
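
The process-level peak comes from sampling, not from an allocator counter. The sketch below shows that general sampling pattern: a background thread polls a reading function and keeps the maximum. It assumes a callable that returns the current usage in bytes. In practice that callable could wrap a pynvml query, but that wiring is an assumption and is left out here. PeakSampler is hypothetical and not part of nvbenjo.

```python
import threading
import time


class PeakSampler:
    """Poll a memory-reading callable in a background thread and keep the maximum."""

    def __init__(self, read_bytes, interval_s=0.005):
        self._read_bytes = read_bytes  # hypothetical: returns current usage in bytes
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.peak_bytes = 0

    def _run(self):
        while not self._stop.is_set():
            self.peak_bytes = max(self.peak_bytes, self._read_bytes())
            # Event.wait doubles as an interruptible sleep between samples.
            self._stop.wait(self._interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        # One final reading so that usage just before stopping is not missed.
        self.peak_bytes = max(self.peak_bytes, self._read_bytes())
        return False
```

Usage: wrap the measured work in `with PeakSampler(read_fn) as sampler:` and read `sampler.peak_bytes` afterwards. A sampler can miss short spikes that fall between polls, which is one reason to also report an exact allocator peak such as torch.cuda.max_memory_allocated.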

nvbenjo.torch_utils.measure_repeated_inference_timing(model, sample, batch_size, model_device, transfer_to_device_fn=transfer_to_device, num_runs=100, progress_callback=None)

Measure inference times over num_runs repeated runs.

Parameters:
  • model (nn.Module) – The model to benchmark.

  • sample (nvbenjo.utils.TensorLike) – Sample input to the model.

  • batch_size (int) – The batch size of the sample.

  • model_device (torch.device) – The device where the model is located and shall be used for benchmarking.

  • transfer_to_device_fn (Callable, optional) – Function to transfer data to the specified device, by default transfer_to_device

  • num_runs (int, optional) – Number of inference runs to perform, by default 100

  • progress_callback (Optional[Callable], optional) – Callback function to report progress, by default None

Returns:

DataFrame containing timing results.

Return type:

pd.DataFrame
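
Because sample is a TensorLike, a transfer function may need to handle more than a single tensor. The sketch below walks a nested structure and applies a per-leaf function. It assumes that TensorLike can be a tensor or a nested tuple, list or dict of tensors; the exact definition is not given in this reference. With PyTorch, the leaf function would be something like `lambda t: t.to(device)`. map_leaves is hypothetical and is not nvbenjo's transfer_to_device.

```python
def map_leaves(data, fn):
    """Apply fn to every leaf of a nested tuple/list/dict structure.

    Hypothetical helper. Plain lists and tuples are rebuilt with their own type;
    namedtuples would need type(data)(*items) instead.
    """
    if isinstance(data, dict):
        return {key: map_leaves(value, fn) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(map_leaves(item, fn) for item in data)
    return fn(data)


print(map_leaves({"a": (1, 2), "b": [3]}, lambda x: x * 10))  # {'a': (10, 20), 'b': [30]}
```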

Examples

Measure the inference timing of a torchvision ResNet-18 on CPU:

import torch
from nvbenjo.torch_utils import measure_repeated_inference_timing
from nvbenjo.torch_utils import get_model
from nvbenjo.cfg import TorchRuntimeConfig

model = get_model("torchvision:resnet18", device=torch.device("cpu"), runtime_config=TorchRuntimeConfig())
sample = torch.randn(2, 3, 224, 224)  # batch size 2
results = measure_repeated_inference_timing(
    model=model,
    sample=sample,
    batch_size=2,
    model_device=torch.device("cpu"),
    num_runs=2
)
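
Repeated timing comes down to untimed warmup calls followed by individually timed calls. The sketch below is a stdlib-only version of that loop. It assumes one record per run; whether this mirrors the rows of the returned DataFrame is also an assumption. time_repeated and its num_warmup parameter are hypothetical. In nvbenjo, the documented knobs are the num_warmup_batches and num_batches config fields. For CUDA models, kernel launches are asynchronous, so the clock should only be read after torch.cuda.synchronize(). This CPU-only sketch omits that step.

```python
import statistics
import time


def time_repeated(fn, sample, batch_size, num_runs=100, num_warmup=2):
    """Time fn(sample) num_runs times after num_warmup untimed calls."""
    for _ in range(num_warmup):
        fn(sample)
    records = []
    for run in range(num_runs):
        start = time.perf_counter()
        fn(sample)
        # On a GPU, a device synchronization would be required before stopping the clock.
        elapsed = time.perf_counter() - start
        throughput = batch_size / elapsed if elapsed > 0 else float("inf")
        records.append({"run": run, "time_s": elapsed, "throughput": throughput})
    return records


if __name__ == "__main__":
    records = time_repeated(sum, list(range(10_000)), batch_size=1, num_runs=5)
    print(statistics.median(r["time_s"] for r in records))
```

Reporting the median rather than the mean keeps a few slow outlier runs from dominating the summary.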

ONNX Utilities

System Information

nvbenjo.system_info.get_system_info()

Retrieve system information.

Collects information about the operating system, CPU, memory, and GPU.

Returns:

A dictionary containing system information.

Return type:

dict[str, Any]
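
Part of this kind of dictionary can be reproduced with the standard library alone. The field names below are assumptions and are not the keys that get_system_info actually returns. basic_system_info is hypothetical. Memory and GPU details are omitted because the standard library has no portable API for them; a third-party package such as psutil is typically used for memory.

```python
import os
import platform


def basic_system_info():
    """Collect OS, Python and CPU details using only the standard library."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "python_version": platform.python_version(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),  # may be None if undeterminable
    }


print(basic_system_info())
```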

nvbenjo.system_info.get_gpu_info()

Retrieve information about GPUs in the system.

Includes information such as name, architecture, memory, clock speeds, CUDA capability, and driver version.

Returns:

A list of dictionaries containing GPU information.

Return type:

list[dict[str, Any]]
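
A few of the same GPU fields can be cross-checked from a shell with nvidia-smi. This reference does not state which library get_gpu_info uses internally. The sketch below assumes nvidia-smi is on the PATH. parse_gpu_csv and query_gpus are hypothetical helpers that return a list of dictionaries, one per GPU, similar in shape to the return type above.

```python
import csv
import io
import subprocess

QUERY_FIELDS = ("name", "memory.total", "driver_version")


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(QUERY_FIELDS, row)) for row in rows if row]


def query_gpus():
    """Run nvidia-smi (requires an NVIDIA driver installation) and parse its output."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_csv(out)
```

With nounits, memory.total is reported as a bare number in MiB. The values stay strings here and need converting before doing arithmetic with them.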

Examples

PyTorch

Basic PyTorch benchmark
"""Basic PyTorch benchmark of a torchvision ResNet-50 with an FP32 runtime option."""

import torch

from nvbenjo import benchmark, cfg
from nvbenjo.utils import PrecisionType

device = "cuda" if torch.cuda.is_available() else "cpu"

model_cfg = cfg.TorchModelConfig(
    name="resnet50",
    type_or_path="torchvision:resnet50",
    shape=(("B", 3, 224, 224),),
    devices=(device,),
    batch_sizes=(1, 8),
    num_warmup_batches=2,
    num_batches=5,
    runtime_options={
        "fp32": cfg.TorchRuntimeConfig(
            precision=PrecisionType.FP32,
            matmul_precision="high",
            cuda_graphs=device == "cuda",  # CUDA graphs only apply when running on a GPU
            compile="torch_compile",
            enable_profiling=False,
        ),
    },
)
results = benchmark.benchmark_models({"resnet50": model_cfg})
# results is a pandas DataFrame with latency, throughput, and memory columns
print(results[["model", "runtime_options", "batch_size", "time_inference"]].to_string())
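
In shape, each entry describes one model input, and "B" marks the batch dimension. The sketch below assumes that "B" is replaced by each value in batch_sizes when inputs are generated. The exact substitution rules, such as whether other symbolic names are allowed, are not documented here. resolve_shape is hypothetical.

```python
def resolve_shape(shape, batch_size, symbol="B"):
    """Replace the symbolic batch dimension with a concrete batch size."""
    return tuple(batch_size if dim == symbol else dim for dim in shape)


shapes = (("B", 3, 224, 224),)  # as in the model config above
for batch_size in (1, 8):
    print([resolve_shape(s, batch_size) for s in shapes])
# [(1, 3, 224, 224)]
# [(8, 3, 224, 224)]
```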

ONNX

Basic ONNX benchmark
"""Basic ONNX benchmark with runtime options."""

import os

from nvbenjo import benchmark, cfg

model_path = os.path.expanduser("~/Downloads/resnet50-v2-7.onnx")
if not os.path.isfile(model_path):
    raise SystemExit(
        f"ONNX model not found at {model_path}. Download it with:\n"
        "  wget -O ~/Downloads/resnet50-v2-7.onnx "
        "https://github.com/onnx/models/raw/refs/heads/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx"
    )

model_cfg = cfg.OnnxModelConfig(
    name="resnet50-onnx",
    type_or_path=model_path,
    shape=({"name": "data", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)},),
    devices=("cpu",),
    batch_sizes=(1, 8),
    num_warmup_batches=2,
    num_batches=5,
    runtime_options={
        "default": cfg.OnnxRuntimeConfig(
            intra_op_num_threads=2,
            graph_optimization_level="ORT_ENABLE_BASIC",
        ),
    },
)
results = benchmark.benchmark_models({"resnet50": model_cfg})
print(results[["model", "runtime_options", "batch_size", "time_inference"]].to_string())
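
Each ONNX input entry carries a name, an element type, a shape and a value range. The sketch below generates one input as a flat list. It assumes two things this reference does not spell out: that min_max bounds uniformly sampled input values, and that "B" is replaced by the batch size. A real runner would build a NumPy array with the dtype given by type; this sketch ignores type and always produces floats. make_random_input is hypothetical.

```python
import math
import random


def make_random_input(spec, batch_size, seed=0):
    """Build (shape, flat list of uniform random floats) for one ONNX input spec."""
    shape = tuple(batch_size if dim == "B" else dim for dim in spec["shape"])
    low, high = spec.get("min_max", (0.0, 1.0))
    rng = random.Random(seed)  # seeded so repeated benchmarks see identical inputs
    values = [rng.uniform(low, high) for _ in range(math.prod(shape))]
    return shape, values


spec = {"name": "data", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)}
shape, values = make_random_input(spec, batch_size=1)
print(shape, len(values))  # (1, 3, 224, 224) 150528
```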