Configuration

Nvbenjo is configured through Hydra. The dataclasses listed below define the configuration schema and can also be used directly from the Python API. See Examples for configuration files to use with the command line interface.

Main configuration classes

class nvbenjo.cfg.BenchConfig(nvbenjo=<factory>, output_dir=None)

Main benchmark configuration container.

Parameters:

nvbenjo (NvbenjoConfig) – Nvbenjo-specific configuration settings.
output_dir (str or None) – Directory path where benchmark results will be saved. If None, uses Hydra’s default output directory.

class nvbenjo.cfg.NvbenjoConfig(measure_memory=True, models=<factory>)

Root configuration for nvbenjo benchmarking.

Parameters:

measure_memory (bool) – Whether to measure GPU memory allocation during benchmarking.
models (dict[str, TorchModelConfig | OnnxModelConfig]) – Dictionary mapping model names to their configurations. See TorchModelConfig and OnnxModelConfig for details.
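
To illustrate how the two containers nest, the following sketch rebuilds them as local stand-in dataclasses so that it runs without nvbenjo installed. The field names and defaults follow the signatures above. The model entry is a plain dict standing in for a TorchModelConfig, and the model name and output path are made up for the example. With nvbenjo installed, the real classes would be imported from nvbenjo.cfg instead.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


# Stand-ins mirroring the documented signatures (assumption: the real
# classes in nvbenjo.cfg are ordinary dataclasses with these fields).
@dataclass
class NvbenjoConfig:
    measure_memory: bool = True
    models: dict = field(default_factory=dict)


@dataclass
class BenchConfig:
    nvbenjo: NvbenjoConfig = field(default_factory=NvbenjoConfig)
    output_dir: Optional[str] = None


cfg = BenchConfig(
    nvbenjo=NvbenjoConfig(
        measure_memory=False,
        # Plain dict used in place of a TorchModelConfig instance.
        models={"resnet50": {"type_or_path": "torchvision:resnet50"}},
    ),
    output_dir="results/resnet50",  # hypothetical path
)

# Nested view of the configuration, similar in layout to a config file.
nested = asdict(cfg)
```

Leaving output_dir at None keeps Hydra's default output directory, as described for BenchConfig.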

PyTorch

class nvbenjo.cfg.TorchModelConfig(name='resnet', type_or_path='torchvision:wide_resnet101_2', kwargs=<factory>, shape=('B', 3, 224, 224), num_warmup_batches=5, num_batches=50, batch_sizes=(16, 32), devices=('cpu', ), runtime_options=<factory>, custom_batchmetrics=<factory>, model_kwargs=<factory>)

PyTorch model configuration.

Parameters:

name (str) – Name of the model.
type_or_path (str) –
Model type or path. Supports prefixes to specify the model source:
- torchvision:<name> – Load a torchvision model (e.g. torchvision:resnet50)
- huggingface:<name> – Load a HuggingFace AutoModel (e.g. huggingface:bert-base-uncased)
- jit:<path> – Load a TorchScript/JIT model
- torchexport:<path> – Load a torch.export saved model
- aot:<path> – Load a pre-compiled AOT model
- (no prefix) – Path to a model saved with torch.save or torch.jit.save

Note

For torchexport and aot models, precision is baked in at export time and cannot be changed at runtime.

kwargs (dict) – Additional keyword arguments to pass when instantiating the model.

shape (tuple) –

Input shape of the model. Use "B" to denote the batch size dimension. Examples:

# Single input shape
("B", 3, 224, 224)

# Multiple input shapes
(("B", 3, 224, 224), ("B", 10))

# Dictionary with metadata
({"name": "input1", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)},)

# Multiple dictionary inputs
(
    {"name": "input1", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)},
    {"name": "input2", "type": "int", "shape": (1, 3)},
    {"name": "input3", "type": "int", "shape": (), "value": 42},
)

num_warmup_batches (int) – Number of warm-up batches to run before measuring performance.
num_batches (int) – Number of batches to run for performance measurement.
batch_sizes (tuple) – Tuple of batch sizes to benchmark.
devices (tuple of str) – Tuple of device names to benchmark on (e.g., 'cpu', 'cuda:0').
runtime_options (dict[str, TorchRuntimeConfig]) – Dictionary mapping runtime names to their specific runtime configurations.
custom_batchmetrics (dict[str, float])
model_kwargs (dict)
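
The prefix convention for type_or_path can be pictured as a split on the first colon. The helper below is a hypothetical sketch, not nvbenjo's actual loader: split_type_or_path and KNOWN_PREFIXES are names invented here, and how nvbenjo itself parses the string is not documented on this page.

```python
from typing import Optional, Tuple

# Prefixes taken from the type_or_path description above.
KNOWN_PREFIXES = ("torchvision", "huggingface", "jit", "torchexport", "aot")


def split_type_or_path(type_or_path: str) -> Tuple[Optional[str], str]:
    """Return (prefix, remainder), or (None, value) when no known prefix is present."""
    prefix, sep, rest = type_or_path.partition(":")
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    # No recognised prefix: treat the whole string as a file path.
    return None, type_or_path


source, target = split_type_or_path("torchvision:resnet50")
```

Checking the prefix against a fixed set, rather than splitting on any colon, keeps paths that happen to contain a colon (for example a Windows drive letter) in the no-prefix branch.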

class nvbenjo.cfg.TorchRuntimeConfig(compile='False', compile_kwargs=<factory>, precision=PrecisionType.FP32, matmul_precision=None, cuda_graphs=False, cuda_graph_kwargs=<factory>, enable_profiling=False, profiling_prefix=None, profiler_kwargs=<factory>, cache_dir=<factory>)

PyTorch runtime configuration.

Parameters:

compile (str) –
Model compilation mode:
- false – No compilation (default)
- torch_compile – Compile with torch.compile (PyTorch 2.0+)
- aot_compile – Ahead-of-time compilation via torch._inductor
compile_kwargs (dict) – Additional keyword arguments passed to torch.compile or aoti_compile_and_package.
precision (PrecisionType) – Precision type for model inference (e.g., fp32, fp16, amp).
matmul_precision (str or None) – Precision for float32 matrix multiplications on GPUs with tensor cores (torch.set_float32_matmul_precision). One of "highest", "high", or "medium". When None (default), the current PyTorch global setting is left unchanged.
cuda_graphs (bool) – Wrap inference in a CUDA Graph capture/replay. Eliminates per-launch CPU dispatch overhead at the cost of one captured graph per (model, batch_size, shape, dtype). Requires a CUDA device; ignored on CPU.
cuda_graph_kwargs (dict) – Additional keyword arguments passed to torch.cuda.graph.
enable_profiling (bool) – Whether to enable PyTorch profiler during inference.
profiling_prefix (str or None) – Prefix for profiler output files. If None, a default path will be used.
profiler_kwargs (dict) – Additional keyword arguments for torch.profiler.profile.
cache_dir (str or None) – Directory for caching AOT-compiled packages. When set, the AOT compile step is skipped on cache hits keyed by (torch/cuda version, model identity, file size+mtime for path-based models, shape, batch_size, precision, compile_kwargs, device type, GPU compute capability). Defaults to $XDG_CACHE_HOME/nvbenjo/torchcache (or ~/.cache/nvbenjo/torchcache if XDG_CACHE_HOME is unset). Set to None to disable caching.
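
The default cache location described for cache_dir can be reproduced with the standard library. This is a minimal sketch under stated assumptions: default_torch_cache_dir is a hypothetical helper, not part of nvbenjo, and treating an empty XDG_CACHE_HOME like an unset one is an assumption borrowed from the XDG Base Directory convention.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_torch_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve <cache root>/nvbenjo/torchcache following the rule described above."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_CACHE_HOME")
    # Fall back to ~/.cache when XDG_CACHE_HOME is unset (or empty, by assumption).
    root = Path(xdg) if xdg else Path.home() / ".cache"
    return root / "nvbenjo" / "torchcache"


cache_path = default_torch_cache_dir({"XDG_CACHE_HOME": "/tmp/xdg"})
```

Passing the environment as an argument makes the rule easy to check without modifying the real process environment.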

ONNX

class nvbenjo.cfg.OnnxModelConfig(name='resnet', type_or_path='torchvision:wide_resnet101_2', kwargs=<factory>, shape=('B', 3, 224, 224), num_warmup_batches=5, num_batches=50, batch_sizes=(16, 32), devices=('cpu', ), runtime_options=<factory>, custom_batchmetrics=<factory>)

ONNX model configuration.

Parameters:

name (str) – Name of the model.
type_or_path (str) – Model type or path. Can be a local file path or a model identifier.
kwargs (dict) – Additional keyword arguments to pass when instantiating the model.

shape (tuple) –

Input shape of the model. Use "B" to denote the batch size dimension.

Examples:

# Single input shape
("B", 3, 224, 224)

# Multiple input shapes
(("B", 3, 224, 224), ("B", 10))

# Dictionary with metadata
({"name": "input1", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)},)

# Multiple dictionary inputs
(
    {"name": "input1", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)},
    {"name": "input2", "type": "int", "shape": (1, 3)},
    {"name": "input3", "type": "int", "shape": (), "value": 42},
)

num_warmup_batches (int) – Number of warm-up batches to run before measuring performance.
num_batches (int) – Number of batches to run for performance measurement.
batch_sizes (tuple) – Tuple of batch sizes to benchmark.
devices (tuple of str) – Tuple of device names to benchmark on (e.g., 'cpu', 'cuda:0').
runtime_options (dict[str, OnnxRuntimeConfig]) – Dictionary mapping runtime names to their specific runtime configurations.
custom_batchmetrics (dict[str, float])
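
To make the "B" placeholder concrete, the sketch below substitutes a batch size into each of the shape forms shown in the examples for the shape parameter. resolve_shape and resolve_shapes are hypothetical helpers written for this page; nvbenjo's own input generation, including how type, min_max and value are applied, may differ.

```python
from typing import List, Sequence, Tuple


def resolve_shape(shape: Sequence, batch_size: int) -> Tuple[int, ...]:
    """Replace every "B" entry in a single shape with the batch size."""
    return tuple(batch_size if dim == "B" else dim for dim in shape)


def resolve_shapes(spec: Sequence, batch_size: int) -> List[Tuple[int, ...]]:
    """Return one concrete shape per model input for the given spec."""
    # A single shape such as ("B", 3, 224, 224) holds plain dimensions,
    # so its first element is neither a nested shape nor a dict.
    if not spec or not isinstance(spec[0], (tuple, list, dict)):
        return [resolve_shape(spec, batch_size)]
    shapes = []
    for entry in spec:
        # Dictionary entries carry the dimensions under the "shape" key.
        dims = entry["shape"] if isinstance(entry, dict) else entry
        shapes.append(resolve_shape(dims, batch_size))
    return shapes


single = resolve_shapes(("B", 3, 224, 224), 16)
multiple = resolve_shapes((("B", 3, 224, 224), ("B", 10)), 32)
```

Shapes without a "B" entry, such as (1, 3) or the scalar shape (), pass through unchanged, so their size does not vary across the configured batch_sizes.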

class nvbenjo.cfg.OnnxRuntimeConfig(execution_providers=None, graph_optimization_level='ORT_ENABLE_ALL', intra_op_num_threads=1, inter_op_num_threads=0, log_severity_level=3, enable_profiling=False, profiling_prefix=None, provider_options=None)

ONNX Runtime configuration.

Parameters:

execution_providers (tuple of str or None) – Tuple of execution providers to use (e.g., ('CPUExecutionProvider', 'CUDAExecutionProvider')). If None, uses the default provider.
graph_optimization_level (str) – Graph optimization level for ONNX Runtime. Options are 'ORT_ENABLE_ALL', 'ORT_ENABLE_LAYOUT', 'ORT_ENABLE_BASIC', 'ORT_DISABLE_ALL'.
intra_op_num_threads (int) – Number of threads used to parallelize the execution within nodes.
inter_op_num_threads (int) – Number of threads used to parallelize the execution of the graph (between nodes).
log_severity_level (int) – Logging severity level (0=VERBOSE, 1=INFO, 2=WARNING, 3=ERROR, 4=FATAL).
enable_profiling (bool) – Whether to enable profiling in ONNX Runtime.
profiling_prefix (str or None) – Prefix for profiling output files. If None, a default path will be used.
provider_options (sequence of dict or None) – Additional options for each execution provider.
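
Because provider_options holds one dict per execution provider, the two fields have to line up by position. The sketch below checks that pairing and builds the providers and provider_options keyword arguments that onnxruntime.InferenceSession accepts. session_kwargs is a hypothetical helper and the device_id option is only an example value; whether nvbenjo forwards the fields in exactly this way is an assumption.

```python
from typing import Any, Dict, Optional, Sequence


def session_kwargs(
    execution_providers: Optional[Sequence[str]] = None,
    provider_options: Optional[Sequence[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build providers/provider_options keyword arguments for an ONNX Runtime session."""
    kwargs: Dict[str, Any] = {}
    if execution_providers is not None:
        kwargs["providers"] = list(execution_providers)
    if provider_options is not None:
        # Options are matched to providers by index, so the lengths must agree.
        if execution_providers is None or len(provider_options) != len(execution_providers):
            raise ValueError("provider_options needs one dict per execution provider")
        kwargs["provider_options"] = [dict(options) for options in provider_options]
    return kwargs


kwargs = session_kwargs(
    execution_providers=("CUDAExecutionProvider", "CPUExecutionProvider"),
    provider_options=({"device_id": 0}, {}),
)
```

An empty dict is used for providers that need no extra options, which keeps the two sequences the same length.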