Configuration
Nvbenjo uses Hydra for configuration. The dataclasses listed below define the configuration schema and can also be used directly from the Python API. See Examples for configuration files to use with the command-line interface.
Main configuration classes
- class nvbenjo.cfg.BenchConfig(nvbenjo=<factory>, output_dir=None)
Main benchmark configuration container.
- Parameters:
nvbenjo (NvbenjoConfig) – Nvbenjo-specific configuration settings.
output_dir (str or None) – Directory path where benchmark results will be saved. If None, uses Hydra's default output directory.
- class nvbenjo.cfg.NvbenjoConfig(measure_memory=True, models=<factory>)
Root configuration for nvbenjo benchmarking.
- Parameters:
measure_memory (bool) – Whether to measure GPU memory allocation during benchmarking.
models (dict[str, TorchModelConfig | OnnxModelConfig]) – Dictionary mapping model names to their configurations. See TorchModelConfig and OnnxModelConfig for details.
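To make the nesting concrete, the sketch below mirrors the two classes above as a plain Python dictionary. The field names and defaults are taken from the signatures on this page; the model name `resnet50_example`, the chosen values, and the use of a dictionary in place of the actual dataclasses are assumptions made for illustration only.

```python
# Plain-dict mirror of BenchConfig -> NvbenjoConfig -> models.
# Illustration only: the real objects are the nvbenjo.cfg dataclasses.
bench_config = {
    "output_dir": None,  # None: fall back to Hydra's default output directory
    "nvbenjo": {
        "measure_memory": True,
        "models": {
            # Hypothetical model name chosen for this example.
            "resnet50_example": {
                "type_or_path": "torchvision:resnet50",
                "shape": ("B", 3, 224, 224),
                "batch_sizes": (16, 32),
                "devices": ("cpu",),
            },
        },
    },
}


def model_names(cfg):
    """Return the names of all models registered in the config, sorted."""
    return sorted(cfg["nvbenjo"]["models"])


print(model_names(bench_config))
```

Each key under `models` is the name a model is registered under; its value holds the per-model settings described in the sections that follow.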
PyTorch
- class nvbenjo.cfg.TorchModelConfig(name='resnet', type_or_path='torchvision:wide_resnet101_2', kwargs=<factory>, shape=('B', 3, 224, 224), num_warmup_batches=5, num_batches=50, batch_sizes=(16, 32), devices=('cpu', ), runtime_options=<factory>, custom_batchmetrics=<factory>, model_kwargs=<factory>)
PyTorch model configuration.
- Parameters:
name (str) – Name of the model.
type_or_path (str) – Model type or path. Supports prefixes to specify the model source:
  - torchvision:<name> – Load a torchvision model (e.g. torchvision:resnet50)
  - huggingface:<name> – Load a HuggingFace AutoModel (e.g. huggingface:bert-base-uncased)
  - jit:<path> – Load a TorchScript/JIT model
  - torchexport:<path> – Load a torch.export saved model
  - aot:<path> – Load a pre-compiled AOT model
    Note: For torchexport and aot models, precision is baked in at export time and cannot be changed at runtime.
  - (no prefix) – Path to a model saved with torch.save or torch.jit.save
kwargs (dict) – Additional keyword arguments to pass when instantiating the model.
shape (tuple) – Input shape of the model. Use "B" to denote the batch size dimension. Examples:

    # Single input shape
    ("B", 3, 224, 224)

    # Multiple input shapes
    (("B", 3, 224, 224), ("B", 10))

    # Dictionary with metadata
    ({"name": "input1", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)},)

    # Multiple dictionary inputs
    (
        {"name": "input1", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)},
        {"name": "input2", "type": "int", "shape": (1, 3)},
        {"name": "input3", "type": "int", "shape": (), "value": 42},
    )

num_warmup_batches (int) – Number of warm-up batches to run before measuring performance.
num_batches (int) – Number of batches to run for performance measurement.
batch_sizes (tuple) – Tuple of batch sizes to benchmark.
devices (tuple of str) – Tuple of device names to benchmark on (e.g., 'cpu', 'cuda:0').
runtime_options (dict[str, TorchRuntimeConfig]) – Dictionary mapping runtime names to their specific runtime configurations.
model_kwargs (dict)
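The prefix convention for `type_or_path` can be illustrated with a small parser. This is a minimal sketch of how such a string might be split, not nvbenjo's actual loader; the helper name `split_type_or_path` and the constant `KNOWN_PREFIXES` are hypothetical, and only the prefixes listed above are recognised.

```python
# Prefixes documented for TorchModelConfig.type_or_path.
KNOWN_PREFIXES = ("torchvision", "huggingface", "jit", "torchexport", "aot")


def split_type_or_path(type_or_path):
    """Split 'prefix:rest' into (prefix, rest).

    Returns (None, type_or_path) when no known prefix is present, which
    corresponds to the "(no prefix)" case: a plain path to a saved model.
    """
    prefix, sep, rest = type_or_path.partition(":")
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    return None, type_or_path


for value in ("torchvision:resnet50", "huggingface:bert-base-uncased", "models/net.pt"):
    print(value, "->", split_type_or_path(value))
```

Checking the prefix against a fixed set keeps paths that happen to contain a colon, such as Windows drive letters, in the "no prefix" branch.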
- class nvbenjo.cfg.TorchRuntimeConfig(compile='False', compile_kwargs=<factory>, precision=PrecisionType.FP32, matmul_precision=None, cuda_graphs=False, cuda_graph_kwargs=<factory>, enable_profiling=False, profiling_prefix=None, profiler_kwargs=<factory>, cache_dir=<factory>)
PyTorch runtime configuration.
- Parameters:
compile (str) – Model compilation mode:
  - false – No compilation (default)
  - torch_compile – Compile with torch.compile (PyTorch 2.0+)
  - aot_compile – Ahead-of-time compilation via torch._inductor
compile_kwargs (dict) – Additional keyword arguments passed to torch.compile or aoti_compile_and_package.
precision (PrecisionType) – Precision type for model inference (e.g., fp32, fp16, amp).
matmul_precision (str or None) – Precision for float32 matrix multiplications on GPUs with tensor cores (torch.set_float32_matmul_precision). One of "highest", "high", or "medium". When None (default), the current PyTorch global setting is left unchanged.
cuda_graphs (bool) – Wrap inference in a CUDA Graph capture/replay. Eliminates per-launch CPU dispatch overhead at the cost of one captured graph per (model, batch_size, shape, dtype). Requires a CUDA device; ignored on CPU.
cuda_graph_kwargs (dict) – Additional keyword arguments passed to torch.cuda.graph.
enable_profiling (bool) – Whether to enable PyTorch profiler during inference.
profiling_prefix (str or None) – Prefix for profiler output files. If None, a default path will be used.
profiler_kwargs (dict) – Additional keyword arguments for torch.profiler.profile.
cache_dir (str or None) – Directory for caching AOT-compiled packages. When set, the AOT compile step is skipped on cache hits keyed by (torch/cuda version, model identity, file size+mtime for path-based models, shape, batch_size, precision, compile_kwargs, device type, GPU compute capability). Defaults to $XDG_CACHE_HOME/nvbenjo/torchcache (or ~/.cache/nvbenjo/torchcache if XDG_CACHE_HOME is unset). Set to None to disable caching.
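The default `cache_dir` rule can be written out as a short function. This is a sketch of the documented fallback only, not the library's own code: the helper name `default_cache_dir` is hypothetical, and treating an empty `XDG_CACHE_HOME` the same as an unset one is an assumption.

```python
import os
from pathlib import Path


def default_cache_dir(environ=None):
    """Return $XDG_CACHE_HOME/nvbenjo/torchcache, or ~/.cache/nvbenjo/torchcache.

    `environ` defaults to os.environ; passing a dict makes the rule easy to test.
    """
    env = os.environ if environ is None else environ
    xdg = env.get("XDG_CACHE_HOME")
    # Assumption: an empty value is handled like an unset variable.
    base = Path(xdg) if xdg else Path.home() / ".cache"
    return base / "nvbenjo" / "torchcache"


print(default_cache_dir({"XDG_CACHE_HOME": "/tmp/xdg"}))
print(default_cache_dir({}))
```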
ONNX
- class nvbenjo.cfg.OnnxModelConfig(name='resnet', type_or_path='torchvision:wide_resnet101_2', kwargs=<factory>, shape=('B', 3, 224, 224), num_warmup_batches=5, num_batches=50, batch_sizes=(16, 32), devices=('cpu', ), runtime_options=<factory>, custom_batchmetrics=<factory>)
ONNX model configuration.
- Parameters:
name (str) – Name of the model.
type_or_path (str) – Model type or path. Can be a local file path or a model identifier.
kwargs (dict) – Additional keyword arguments to pass when instantiating the model.
shape (tuple) – Input shape of the model. Use "B" to denote the batch size dimension. Examples:

    # Single input shape
    ("B", 3, 224, 224)

    # Multiple input shapes
    (("B", 3, 224, 224), ("B", 10))

    # Dictionary with metadata
    ({"name": "input1", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)},)

    # Multiple dictionary inputs
    (
        {"name": "input1", "type": "float", "shape": ("B", 3, 224, 224), "min_max": (0, 1)},
        {"name": "input2", "type": "int", "shape": (1, 3)},
        {"name": "input3", "type": "int", "shape": (), "value": 42},
    )

num_warmup_batches (int) – Number of warm-up batches to run before measuring performance.
num_batches (int) – Number of batches to run for performance measurement.
batch_sizes (tuple) – Tuple of batch sizes to benchmark.
devices (tuple of str) – Tuple of device names to benchmark on (e.g., 'cpu', 'cuda:0').
runtime_options (dict[str, OnnxRuntimeConfig]) – Dictionary mapping runtime names to their specific runtime configurations.
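The role of the "B" placeholder in `shape` can be shown with a short sketch that substitutes a concrete batch size into each of the three documented forms. The helpers `resolve_shape` and `resolve_inputs` are hypothetical and exist only for this example; how nvbenjo itself expands shapes and generates input data is not described here.

```python
def resolve_shape(dims, batch_size):
    """Replace every "B" placeholder in one input shape with batch_size."""
    return tuple(batch_size if dim == "B" else dim for dim in dims)


def resolve_inputs(shape, batch_size):
    """Return one concrete shape per model input.

    Accepts a single shape, a tuple of shapes, or a tuple of dictionaries
    that carry the shape under the "shape" key.
    """
    if all(isinstance(dim, (int, str)) for dim in shape):
        return [resolve_shape(shape, batch_size)]  # single input shape
    resolved = []
    for item in shape:
        dims = item["shape"] if isinstance(item, dict) else item
        resolved.append(resolve_shape(dims, batch_size))
    return resolved


print(resolve_inputs(("B", 3, 224, 224), 16))
print(resolve_inputs((("B", 3, 224, 224), ("B", 10)), 32))
```

Shapes without a "B" entry, such as `(1, 3)` or the scalar shape `()`, pass through unchanged.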
- class nvbenjo.cfg.OnnxRuntimeConfig(execution_providers=None, graph_optimization_level='ORT_ENABLE_ALL', intra_op_num_threads=1, inter_op_num_threads=0, log_severity_level=3, enable_profiling=False, profiling_prefix=None, provider_options=None)
ONNX Runtime configuration.
- Parameters:
execution_providers (tuple of str or None) – Tuple of execution providers to use (e.g., ('CPUExecutionProvider', 'CUDAExecutionProvider')). If None, uses the default provider.
graph_optimization_level (str) – Graph optimization level for ONNX Runtime. Options are 'ORT_ENABLE_ALL', 'ORT_ENABLE_LAYOUT', 'ORT_ENABLE_BASIC', 'ORT_DISABLE_ALL'.
intra_op_num_threads (int) – Number of threads used to parallelize the execution within nodes.
inter_op_num_threads (int) – Number of threads used to parallelize the execution of the graph (between nodes).
log_severity_level (int) – Logging severity level (0=VERBOSE, 1=INFO, 2=WARNING, 3=ERROR, 4=FATAL).
enable_profiling (bool) – Whether to enable profiling in ONNX Runtime.
profiling_prefix (str or None) – Prefix for profiling output files. If None, a default path will be used.
provider_options (sequence of dict or None) – Additional options for each execution provider.
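Because `provider_options` holds one dictionary per entry in `execution_providers`, the two sequences are read in parallel. The sketch below pairs them into the `(name, options)` form that ONNX Runtime's `InferenceSession` accepts for its `providers` argument. The helper `pair_providers` is hypothetical, and whether nvbenjo combines the two fields this way internally is an assumption.

```python
def pair_providers(execution_providers, provider_options=None):
    """Pair each execution provider name with its options dictionary, in order."""
    if execution_providers is None:
        return None  # leave the choice of provider to the runtime default
    if provider_options is None:
        return list(execution_providers)
    if len(provider_options) != len(execution_providers):
        raise ValueError("need exactly one options dict per execution provider")
    return [(name, dict(opts)) for name, opts in zip(execution_providers, provider_options)]


providers = pair_providers(
    ("CUDAExecutionProvider", "CPUExecutionProvider"),
    ({"device_id": 0}, {}),
)
print(providers)
```

Order matters in this form, since ONNX Runtime tries the listed providers in sequence, so the options must follow the same order as the provider names.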