Python API Reference
Benchmark Module
- nvbenjo.benchmark.benchmark_models(model_cfgs, measure_memory=True)
Benchmark the given models.
- Parameters:
  - model_cfgs (Dict[str, TorchModelConfig | OnnxModelConfig])
  - measure_memory (bool, optional) – Whether to measure memory usage during benchmarking, by default True
- Returns:
A DataFrame containing the benchmarking results
- Return type:
pd.DataFrame
Examples
Basic usage with single PyTorch model:
from nvbenjo import cfg
from nvbenjo.utils import PrecisionType
from nvbenjo import benchmark

model_cfg = cfg.TorchModelConfig(
    name="torch-shufflenet-v2-x0-5",
    type_or_path="torchvision:shufflenet_v2_x0_5",
    shape=(("B", 3, 224, 224),),
    devices=["cpu"],
    batch_sizes=[1],
    num_warmup_batches=1,
    num_batches=2,
    runtime_options={
        "test1": cfg.TorchRuntimeConfig(compile=False, precision=PrecisionType.FP32),
    },
    custom_batchmetrics={
        "fps": 1.0,
    },
)
results = benchmark.benchmark_models({"model_1": model_cfg})
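The shape entry `("B", 3, 224, 224)` in the example above is paired with `batch_sizes`. The following is a minimal sketch of how such a template could be expanded, assuming `"B"` marks the batch dimension. `resolve_shape` is a hypothetical helper written for illustration and is not part of nvbenjo.

```python
from typing import Sequence, Tuple, Union

Dim = Union[int, str]


def resolve_shape(template: Sequence[Dim], batch_size: int) -> Tuple[int, ...]:
    """Replace every "B" placeholder in the template with the batch size.

    Hypothetical helper: assumes "B" denotes the batch dimension.
    """
    resolved = []
    for dim in template:
        if dim == "B":
            resolved.append(batch_size)
        elif isinstance(dim, int):
            resolved.append(dim)
        else:
            raise ValueError(f"unsupported dimension placeholder: {dim!r}")
    return tuple(resolved)


# One concrete input shape per configured batch size.
shapes = [resolve_shape(("B", 3, 224, 224), b) for b in (1, 8)]
print(shapes)
```

Under this assumption, each value in `batch_sizes` yields one concrete input shape to benchmark.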
- nvbenjo.benchmark.load_model(type_or_path, device, runtime_config, **kwargs)
Load a model, which may be a PyTorch or an ONNX model depending on the runtime configuration.
- Parameters:
  - type_or_path (str) – String specifying the model type or path
  - device (torch.device) – Device to load the model onto
  - runtime_config (TorchRuntimeConfig or OnnxRuntimeConfig) – Runtime configuration for the model
- Returns:
Loaded model instance
- Return type:
Any
PyTorch Utilities
- nvbenjo.torch_utils.get_model(type_or_path, device, runtime_config, verbose=False, **kwargs)
Load a PyTorch model.
- Parameters:
  - type_or_path (str) – Model type or path. Supports prefixes to specify the model source:
    - torchvision:<name> – Load a torchvision model (e.g. torchvision:resnet50), see torchvision.models.list_models()
    - huggingface:<name> – Load a HuggingFace AutoModel (e.g. huggingface:bert-base-uncased), see https://huggingface.co/docs/transformers/model_doc/auto
    - jit:<path> – Load a TorchScript/JIT model
    - torchexport:<path> – Load a torch.export saved model
    - aot:<path> – Load a pre-compiled AOT model
    - (no prefix) – Path to a model saved with torch.save or torch.jit.save
  - device (torch.device) – Device to load the model onto.
  - runtime_config (TorchRuntimeConfig) – Runtime configuration for the model.
  - verbose (bool, optional) – Whether to print verbose output, by default False
- Returns:
Loaded model.
- Return type:
ty.Any
Examples
>>> model = get_model("torchvision:resnet18", device=torch.device("cpu"), runtime_config=TorchRuntimeConfig())
>>> model = get_model("/path/to/model.pth", device=torch.device("cuda"), runtime_config=TorchRuntimeConfig())
>>> model = get_model("jit:/path/to/model.pt", device=torch.device("cuda"), runtime_config=TorchRuntimeConfig())
>>> model = get_model("torchexport:/path/to/model.pt2", device=torch.device("cuda"), runtime_config=TorchRuntimeConfig())
>>> model = get_model("aot:/path/to/model.pt2", device=torch.device("cuda"), runtime_config=TorchRuntimeConfig())
>>> model = get_model("huggingface:bert-base-uncased", device=torch.device("cpu"), runtime_config=TorchRuntimeConfig())
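To make the prefix convention for `type_or_path` concrete, here is a minimal sketch of how such a string could be split into a source prefix and a name or path. It illustrates the documented convention only and is not nvbenjo's actual loading code. `split_model_spec` and `KNOWN_PREFIXES` are hypothetical names.

```python
from typing import Optional, Tuple

# Prefixes listed in the get_model parameter documentation.
KNOWN_PREFIXES = ("torchvision", "huggingface", "jit", "torchexport", "aot")


def split_model_spec(type_or_path: str) -> Tuple[Optional[str], str]:
    """Split "prefix:rest" into (prefix, rest); return (None, path) if no known prefix."""
    prefix, sep, rest = type_or_path.partition(":")
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    # No recognized prefix: treat the whole string as a file path.
    return None, type_or_path


for spec in ("torchvision:resnet18", "jit:/path/to/model.pt", "/path/to/model.pth"):
    print(spec, "->", split_model_spec(spec))
```

Matching only against the known prefixes keeps paths that contain a colon themselves (for example a Windows drive letter) from being misread as a prefix.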
- nvbenjo.torch_utils.measure_gpu_memory_allocation(model, batch, device, iterations=3)
Measure peak memory usage during inference.
Returns both the PyTorch allocator peak (via torch.cuda.max_memory_allocated) and the process-level GPU memory peak (via pynvml sampling).
- Parameters:
  - model (nn.Module | Callable) – The model to benchmark.
  - batch (nvbenjo.utils.TensorLike) – Sample input to the model.
  - device (torch.device) – The device on which the model is located and which is used for benchmarking.
  - iterations (int, optional) – Number of iterations to run for measuring memory allocation, by default 3
- Returns:
(torch_memory_bytes, gpu_memory_bytes) — PyTorch allocator peak and process-level GPU memory peak.
- Return type:
tuple[int,int]
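Both returned values are byte counts. A small sketch of turning such a tuple into a readable report follows. `format_memory` is a hypothetical helper, and the numbers passed to it are placeholder values, not measurements.

```python
from typing import Tuple

MIB = 1024 ** 2  # bytes per mebibyte


def format_memory(peaks: Tuple[int, int]) -> str:
    """Format (torch_memory_bytes, gpu_memory_bytes) as MiB values."""
    torch_bytes, gpu_bytes = peaks
    return (
        f"torch allocator peak: {torch_bytes / MIB:.1f} MiB, "
        f"process GPU peak: {gpu_bytes / MIB:.1f} MiB"
    )


# Placeholder byte counts standing in for a real measurement.
report = format_memory((512 * MIB, 768 * MIB))
print(report)
```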
- nvbenjo.torch_utils.measure_repeated_inference_timing(model, sample, batch_size, model_device, transfer_to_device_fn=<function transfer_to_device>, num_runs=100, progress_callback=None)
Measure inference times.
- Parameters:
  - model (nn.Module) – The model to benchmark.
  - sample (nvbenjo.utils.TensorLike) – Sample input to the model.
  - batch_size (int) – The batch size of the sample.
  - model_device (torch.device) – The device on which the model is located and which is used for benchmarking.
  - transfer_to_device_fn (Callable, optional) – Function to transfer data to the specified device, by default transfer_to_device
  - num_runs (int, optional) – Number of inference runs to perform, by default 100
  - progress_callback (Optional[Callable], optional) – Callback function to report progress, by default None
- Returns:
DataFrame containing timing results.
- Return type:
pd.DataFrame
Examples
Measure Inference:
import torch
from nvbenjo.torch_utils import measure_repeated_inference_timing
from nvbenjo.torch_utils import get_model
from nvbenjo.cfg import TorchRuntimeConfig

model = get_model("torchvision:resnet18", device=torch.device("cpu"), runtime_config=TorchRuntimeConfig())
sample = torch.randn(2, 3, 224, 224)  # batch size 2
results = measure_repeated_inference_timing(
    model=model,
    sample=sample,
    batch_size=2,
    model_device=torch.device("cpu"),
    num_runs=2,
)
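For readers unfamiliar with repeated-run timing, the general pattern can be sketched with the standard library alone. This is an illustration of the idea under simple assumptions, not nvbenjo's implementation. `time_repeated_calls` is a hypothetical helper, and the lambda stands in for a model forward pass.

```python
import time
from typing import Callable, List


def time_repeated_calls(fn: Callable[[], object], num_runs: int = 100, num_warmup: int = 1) -> List[float]:
    """Call fn num_warmup times untimed, then num_runs times, returning per-run seconds."""
    for _ in range(num_warmup):
        fn()  # warm-up runs are executed but not recorded
    timings = []
    for _ in range(num_runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload in place of a model forward pass.
timings = time_repeated_calls(lambda: sum(range(10_000)), num_runs=5)
print(f"mean: {sum(timings) / len(timings):.6f} s over {len(timings)} runs")
```

On a CUDA device, wall-clock timing of this kind is only meaningful if the device is synchronized before each timestamp (for example with `torch.cuda.synchronize()`), because kernel launches are asynchronous.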
ONNX Utilities
System Information
Examples
PyTorch
"""Basic PyTorch benchmark comparing precision modes."""
import torch

from nvbenjo import benchmark, cfg
from nvbenjo.utils import PrecisionType

device = "cuda" if torch.cuda.is_available() else "cpu"

model_cfg = cfg.TorchModelConfig(
    name="resnet50",
    type_or_path="torchvision:resnet50",
    shape=(("B", 3, 224, 224),),
    devices=(device,),
    batch_sizes=(1, 8),
    num_warmup_batches=2,
    num_batches=5,
    runtime_options={
        "fp32": cfg.TorchRuntimeConfig(
            precision=PrecisionType.FP32,
            matmul_precision="high",
            cuda_graphs=True,
            compile="torch_compile",
            enable_profiling=False,
        ),
    },
)

results = benchmark.benchmark_models({"resnet50": model_cfg})
# results is a pandas DataFrame with latency, throughput, and memory columns
print(results[["model", "runtime_options", "batch_size", "time_inference"]].to_string())
ONNX
"""Basic ONNX benchmark with runtime options."""
import os

from nvbenjo import benchmark, cfg

model_path = os.path.expanduser("~/Downloads/resnet50-v2-7.onnx")
if not os.path.isfile(model_path):
    raise SystemExit(
        f"ONNX model not found at {model_path}. Download it with:\n"
        "  wget -O ~/Downloads/resnet50-v2-7.onnx "
        "https://github.com/onnx/models/raw/refs/heads/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx"
    )

model_cfg = cfg.OnnxModelConfig(
    name="resnet50-onnx",
    type_or_path=model_path,
    shape=({"name": "data", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)},),
    devices=("cpu",),
    batch_sizes=(1, 8),
    num_warmup_batches=2,
    num_batches=5,
    runtime_options={
        "default": cfg.OnnxRuntimeConfig(
            intra_op_num_threads=2,
            graph_optimization_level="ORT_ENABLE_BASIC",
        ),
    },
)

results = benchmark.benchmark_models({"resnet50": model_cfg})
print(results[["model", "runtime_options", "batch_size", "time_inference"]].to_string())