Trace¶
- pydantic model flashinfer_bench.data.Trace¶
Complete trace linking a solution to a definition with evaluation results.
A Trace represents the complete record of benchmarking a specific solution implementation against a specific computational workload definition. It includes the workload configuration and evaluation results.
Special case: A “workload trace” contains only definition and workload fields (with solution and evaluation set to None), representing a workload configuration without an actual benchmark execution.
Show JSON schema
{ "title": "Trace", "description": "Complete trace linking a solution to a definition with evaluation results.\n\nA Trace represents the complete record of benchmarking a specific solution\nimplementation against a specific computational workload definition. It includes\nthe workload configuration and evaluation results.\n\nSpecial case: A \"workload trace\" contains only definition and workload fields\n(with solution and evaluation set to None), representing a workload configuration\nwithout an actual benchmark execution.", "type": "object", "properties": { "definition": { "description": "Name of the Definition that specifies the computational workload.", "minLength": 1, "title": "Definition", "type": "string" }, "workload": { "$ref": "#/$defs/Workload", "description": "Concrete workload configuration with specific axis values and inputs." }, "solution": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Name of the Solution implementation (None for workload-only traces).", "title": "Solution" }, "evaluation": { "anyOf": [ { "$ref": "#/$defs/Evaluation" }, { "type": "null" } ], "default": null, "description": "Evaluation results from benchmarking (None for workload-only traces)." } }, "$defs": { "Correctness": { "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.", "properties": { "max_relative_error": { "default": 0.0, "description": "Maximum relative error observed across all output elements.", "title": "Max Relative Error", "type": "number" }, "max_absolute_error": { "default": 0.0, "description": "Maximum absolute error observed across all output elements.", "title": "Max Absolute Error", "type": "number" }, "extra": { "anyOf": [ { "additionalProperties": true, "type": "object" }, { "type": "null" } ], "default": null, "description": "Extra metrics for correctness evaluation.", "title": "Extra" } }, "title": "Correctness", "type": "object" }, "Environment": { "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.", "properties": { "hardware": { "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').", "minLength": 1, "title": "Hardware", "type": "string" }, "libs": { "additionalProperties": { "type": "string" }, "description": "Dictionary of library names to version strings used during evaluation.", "title": "Libs", "type": "object" } }, "required": [ "hardware" ], "title": "Environment", "type": "object" }, "Evaluation": { "description": "Complete evaluation result for a solution on a workload.\n\nRecords the full outcome of benchmarking a solution implementation\nagainst a specific workload, including status, metrics, and environment.", "properties": { "status": { "$ref": "#/$defs/EvaluationStatus", "description": "The overall evaluation status indicating success or failure mode." }, "environment": { "$ref": "#/$defs/Environment", "description": "Environment details where the evaluation was performed." 
}, "timestamp": { "description": "Timestamp when the evaluation was performed (ISO format recommended).", "minLength": 1, "title": "Timestamp", "type": "string" }, "log": { "default": "", "description": "Captured stdout/stderr from the evaluation run.", "title": "Log", "type": "string" }, "correctness": { "anyOf": [ { "$ref": "#/$defs/Correctness" }, { "type": "null" } ], "default": null, "description": "Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status)." }, "performance": { "anyOf": [ { "$ref": "#/$defs/Performance" }, { "type": "null" } ], "default": null, "description": "Performance metrics (present only for PASSED status)." } }, "required": [ "status", "environment", "timestamp" ], "title": "Evaluation", "type": "object" }, "EvaluationStatus": { "description": "Status codes for evaluation results.\n\nEnumeration of all possible outcomes when evaluating a solution\nagainst a workload, covering success and various failure modes.", "enum": [ "PASSED", "INCORRECT_SHAPE", "INCORRECT_NUMERICAL", "INCORRECT_DTYPE", "RUNTIME_ERROR", "COMPILE_ERROR", "TIMEOUT" ], "title": "EvaluationStatus", "type": "string" }, "Performance": { "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.", "properties": { "latency_ms": { "default": 0.0, "description": "Solution execution latency in milliseconds.", "minimum": 0.0, "title": "Latency Ms", "type": "number" }, "reference_latency_ms": { "default": 0.0, "description": "Reference implementation latency in milliseconds for comparison.", "minimum": 0.0, "title": "Reference Latency Ms", "type": "number" }, "speedup_factor": { "default": 0.0, "description": "Performance speedup factor compared to reference (reference_time / solution_time).", "minimum": 0.0, "title": "Speedup Factor", "type": "number" } }, "title": "Performance", "type": "object" }, "RandomInput": { "description": "Random input generation descriptor.\n\nRepresents a specification for generating random tensor input data\nduring workload execution and benchmarking.", "properties": { "type": { "const": "random", "default": "random", "title": "Type", "type": "string" } }, "title": "RandomInput", "type": "object" }, "SafetensorsInput": { "description": "Input specification for data loaded from safetensors files.\n\nRepresents tensor data that will be loaded from a safetensors file\nusing a specific tensor key within that file.", "properties": { "type": { "const": "safetensors", "default": "safetensors", "description": "The input type identifier for safetensors data.", "title": "Type", "type": "string" }, "path": { "description": "Path to the safetensors file containing the tensor data.", "minLength": 1, "title": "Path", "type": "string" }, "tensor_key": { "description": "Key identifier for the specific tensor within the safetensors file.", "minLength": 1, "title": "Tensor Key", "type": "string" } }, "required": [ "path", "tensor_key" ], "title": "SafetensorsInput", "type": "object" }, "ScalarInput": { "description": "Scalar literal input specification.\n\nRepresents a scalar value (integer, float, or boolean) that will be\nused as a direct input parameter to the computational workload.", "properties": { "type": { "const": "scalar", "default": "scalar", "description": "The input type identifier for scalar values.", "title": "Type", "type": "string" }, "value": { "anyOf": [ { "type": "integer" }, { "type": "number" }, { "type": "boolean" } ], 
"description": "The scalar value to be used as input. Must be int, float, or bool.", "title": "Value" } }, "required": [ "value" ], "title": "ScalarInput", "type": "object" }, "Workload": { "description": "Concrete workload configuration for benchmarking.\n\nDefines a specific instance of a computational workload with concrete\nvalues for all variable axes and specifications for all input data.\nThis represents an executable configuration that can be benchmarked.", "properties": { "axes": { "additionalProperties": { "minimum": 0, "type": "integer" }, "description": "Dictionary mapping axis names to their concrete integer values. All values must be\npositive.", "title": "Axes", "type": "object" }, "inputs": { "additionalProperties": { "anyOf": [ { "$ref": "#/$defs/RandomInput" }, { "$ref": "#/$defs/SafetensorsInput" }, { "$ref": "#/$defs/ScalarInput" } ] }, "description": "Dictionary mapping input names to their data specifications.", "title": "Inputs", "type": "object" }, "uuid": { "description": "Unique identifier for this specific workload configuration.", "minLength": 1, "title": "Uuid", "type": "string" } }, "required": [ "axes", "inputs", "uuid" ], "title": "Workload", "type": "object" } }, "required": [ "definition", "workload" ] }
- Fields:
definition (str)
workload (flashinfer_bench.data.trace.Workload)
solution (str | None)
evaluation (flashinfer_bench.data.trace.Evaluation | None)
- field definition: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]¶
Name of the Definition that specifies the computational workload.
- Constraints:
min_length = 1
- field workload: Workload [Required]¶
Concrete workload configuration with specific axis values and inputs.
- field solution: str | None = None¶
Name of the Solution implementation (None for workload-only traces).
- field evaluation: Evaluation | None = None¶
Evaluation results from benchmarking (None for workload-only traces).
- is_workload_trace() → bool¶
Check if this is a workload-only trace.
- Returns:
True if this is a workload trace without solution/evaluation data.
- Return type:
bool
- is_successful() → bool¶
Check if the benchmark execution was successful.
- Returns:
True if this is a regular trace with successful evaluation status. False for workload traces or failed evaluations.
- Return type:
bool
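A minimal sketch of a workload-only trace (the definition name, axis name, input name, and UUID below are placeholders, not values from the library):

```python
from flashinfer_bench.data import RandomInput, Trace, Workload

# Workload-only trace: definition and workload are set; solution and
# evaluation keep their None defaults.
workload = Workload(
    axes={"batch_size": 8},          # placeholder axis name and value
    inputs={"x": RandomInput()},     # generate random data for input "x"
    uuid="wl-0001",                  # placeholder identifier
)
trace = Trace(definition="my_definition", workload=workload)

assert trace.is_workload_trace()     # no solution or evaluation attached
assert not trace.is_successful()     # workload traces never count as successful
```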
- pydantic model flashinfer_bench.data.RandomInput¶
Random input generation descriptor.
Represents a specification for generating random tensor input data during workload execution and benchmarking.
Show JSON schema
{ "title": "RandomInput", "description": "Random input generation descriptor.\n\nRepresents a specification for generating random tensor input data\nduring workload execution and benchmarking.", "type": "object", "properties": { "type": { "const": "random", "default": "random", "title": "Type", "type": "string" } } }
- Fields:
type (Literal['random'])
- field type: Literal['random'] = 'random'¶
The input type identifier for random data generation.
- pydantic model flashinfer_bench.data.ScalarInput¶
Scalar literal input specification.
Represents a scalar value (integer, float, or boolean) that will be used as a direct input parameter to the computational workload.
Show JSON schema
{ "title": "ScalarInput", "description": "Scalar literal input specification.\n\nRepresents a scalar value (integer, float, or boolean) that will be\nused as a direct input parameter to the computational workload.", "type": "object", "properties": { "type": { "const": "scalar", "default": "scalar", "description": "The input type identifier for scalar values.", "title": "Type", "type": "string" }, "value": { "anyOf": [ { "type": "integer" }, { "type": "number" }, { "type": "boolean" } ], "description": "The scalar value to be used as input. Must be int, float, or bool.", "title": "Value" } }, "required": [ "value" ] }
- Fields:
type (Literal['scalar'])
value (int | float | bool)
- field type: Literal['scalar'] = 'scalar'¶
The input type identifier for scalar values.
- field value: int | float | bool [Required]¶
The scalar value to be used as input. Must be int, float, or bool.
- pydantic model flashinfer_bench.data.SafetensorsInput¶
Input specification for data loaded from safetensors files.
Represents tensor data that will be loaded from a safetensors file using a specific tensor key within that file.
Show JSON schema
{ "title": "SafetensorsInput", "description": "Input specification for data loaded from safetensors files.\n\nRepresents tensor data that will be loaded from a safetensors file\nusing a specific tensor key within that file.", "type": "object", "properties": { "type": { "const": "safetensors", "default": "safetensors", "description": "The input type identifier for safetensors data.", "title": "Type", "type": "string" }, "path": { "description": "Path to the safetensors file containing the tensor data.", "minLength": 1, "title": "Path", "type": "string" }, "tensor_key": { "description": "Key identifier for the specific tensor within the safetensors file.", "minLength": 1, "title": "Tensor Key", "type": "string" } }, "required": [ "path", "tensor_key" ] }
- Fields:
type (Literal['safetensors'])
path (str)
tensor_key (str)
- field type: Literal['safetensors'] = 'safetensors'¶
The input type identifier for safetensors data.
- field path: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]¶
Path to the safetensors file containing the tensor data.
- Constraints:
min_length = 1
- field tensor_key: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]¶
Key identifier for the specific tensor within the safetensors file.
- Constraints:
min_length = 1
- flashinfer_bench.data.InputSpec¶
Union type representing all possible input specification types.
Alias of RandomInput | SafetensorsInput | ScalarInput
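As a sketch, the three input kinds might appear together in a single inputs mapping (the input names, file path, and tensor key are hypothetical):

```python
from flashinfer_bench.data import RandomInput, SafetensorsInput, ScalarInput

inputs = {
    "q": RandomInput(),                                   # random tensor data
    "k": SafetensorsInput(path="weights.safetensors",     # hypothetical file
                          tensor_key="layer0.k_proj"),    # hypothetical tensor key
    "scale": ScalarInput(value=0.125),                    # scalar literal
}

# Each entry is an InputSpec, i.e. RandomInput | SafetensorsInput | ScalarInput.
for name, spec in inputs.items():
    print(name, spec.type)   # -> random, safetensors, scalar
```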
- pydantic model flashinfer_bench.data.Workload¶
Concrete workload configuration for benchmarking.
Defines a specific instance of a computational workload with concrete values for all variable axes and specifications for all input data. This represents an executable configuration that can be benchmarked.
Show JSON schema
{ "title": "Workload", "description": "Concrete workload configuration for benchmarking.\n\nDefines a specific instance of a computational workload with concrete\nvalues for all variable axes and specifications for all input data.\nThis represents an executable configuration that can be benchmarked.", "type": "object", "properties": { "axes": { "additionalProperties": { "minimum": 0, "type": "integer" }, "description": "Dictionary mapping axis names to their concrete integer values. All values must be\npositive.", "title": "Axes", "type": "object" }, "inputs": { "additionalProperties": { "anyOf": [ { "$ref": "#/$defs/RandomInput" }, { "$ref": "#/$defs/SafetensorsInput" }, { "$ref": "#/$defs/ScalarInput" } ] }, "description": "Dictionary mapping input names to their data specifications.", "title": "Inputs", "type": "object" }, "uuid": { "description": "Unique identifier for this specific workload configuration.", "minLength": 1, "title": "Uuid", "type": "string" } }, "$defs": { "RandomInput": { "description": "Random input generation descriptor.\n\nRepresents a specification for generating random tensor input data\nduring workload execution and benchmarking.", "properties": { "type": { "const": "random", "default": "random", "title": "Type", "type": "string" } }, "title": "RandomInput", "type": "object" }, "SafetensorsInput": { "description": "Input specification for data loaded from safetensors files.\n\nRepresents tensor data that will be loaded from a safetensors file\nusing a specific tensor key within that file.", "properties": { "type": { "const": "safetensors", "default": "safetensors", "description": "The input type identifier for safetensors data.", "title": "Type", "type": "string" }, "path": { "description": "Path to the safetensors file containing the tensor data.", "minLength": 1, "title": "Path", "type": "string" }, "tensor_key": { "description": "Key identifier for the specific tensor within the safetensors file.", "minLength": 1, "title": "Tensor Key", "type": "string" } }, "required": [ "path", "tensor_key" ], "title": "SafetensorsInput", "type": "object" }, "ScalarInput": { "description": "Scalar literal input specification.\n\nRepresents a scalar value (integer, float, or boolean) that will be\nused as a direct input parameter to the computational workload.", "properties": { "type": { "const": "scalar", "default": "scalar", "description": "The input type identifier for scalar values.", "title": "Type", "type": "string" }, "value": { "anyOf": [ { "type": "integer" }, { "type": "number" }, { "type": "boolean" } ], "description": "The scalar value to be used as input. Must be int, float, or bool.", "title": "Value" } }, "required": [ "value" ], "title": "ScalarInput", "type": "object" } }, "required": [ "axes", "inputs", "uuid" ] }
- Fields:
axes (Dict[str, int])
inputs (Dict[str, flashinfer_bench.data.trace.RandomInput | flashinfer_bench.data.trace.SafetensorsInput | flashinfer_bench.data.trace.ScalarInput])
uuid (str)
- field axes: Dict[str, Annotated[int, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0)])]] [Required]¶
Dictionary mapping axis names to their concrete integer values. All values must be positive.
- field inputs: Dict[str, RandomInput | SafetensorsInput | ScalarInput] [Required]¶
Dictionary mapping input names to their data specifications.
- field uuid: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]¶
Unique identifier for this specific workload configuration.
- Constraints:
min_length = 1
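A minimal sketch of building and round-tripping a Workload with pydantic v2's standard serialization helpers (the axis and input names below are placeholders, not library conventions):

```python
import uuid

from flashinfer_bench.data import RandomInput, ScalarInput, Workload

workload = Workload(
    axes={"seq_len": 4096, "num_heads": 32},                    # placeholder axes
    inputs={"q": RandomInput(), "causal": ScalarInput(value=True)},
    uuid=str(uuid.uuid4()),
)

# Standard pydantic v2 round-trip: serialize to JSON, then validate it back.
payload = workload.model_dump_json()
restored = Workload.model_validate_json(payload)
assert restored == workload
```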
- pydantic model flashinfer_bench.data.Correctness¶
Correctness metrics from numerical evaluation.
Contains error measurements comparing the solution output against a reference implementation to assess numerical accuracy.
Show JSON schema
{ "title": "Correctness", "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.", "type": "object", "properties": { "max_relative_error": { "default": 0.0, "description": "Maximum relative error observed across all output elements.", "title": "Max Relative Error", "type": "number" }, "max_absolute_error": { "default": 0.0, "description": "Maximum absolute error observed across all output elements.", "title": "Max Absolute Error", "type": "number" }, "extra": { "anyOf": [ { "additionalProperties": true, "type": "object" }, { "type": "null" } ], "default": null, "description": "Extra metrics for correctness evaluation.", "title": "Extra" } } }
- Fields:
max_relative_error (float)
max_absolute_error (float)
extra (Dict[str, Any] | None)
- field max_relative_error: float = 0.0¶
Maximum relative error observed across all output elements.
- field max_absolute_error: float = 0.0¶
Maximum absolute error observed across all output elements.
- field extra: Dict[str, Any] | None = None¶
Extra metrics for correctness evaluation.
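As a sketch of how these fields relate to raw outputs, assuming NumPy arrays for the solution and reference results (the sample values are made up):

```python
import numpy as np

from flashinfer_bench.data import Correctness

out = np.array([1.0002, 1.9998, 3.0001])   # hypothetical solution output
ref = np.array([1.0, 2.0, 3.0])            # hypothetical reference output

abs_err = np.abs(out - ref)
rel_err = abs_err / np.maximum(np.abs(ref), 1e-12)  # guard against division by zero

correctness = Correctness(
    max_absolute_error=float(abs_err.max()),
    max_relative_error=float(rel_err.max()),
    extra={"mean_absolute_error": float(abs_err.mean())},  # optional extra metric
)
```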
- pydantic model flashinfer_bench.data.Performance¶
Performance metrics from timing evaluation.
Contains timing measurements and performance comparisons from benchmarking the solution against reference implementations.
Show JSON schema
{ "title": "Performance", "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.", "type": "object", "properties": { "latency_ms": { "default": 0.0, "description": "Solution execution latency in milliseconds.", "minimum": 0.0, "title": "Latency Ms", "type": "number" }, "reference_latency_ms": { "default": 0.0, "description": "Reference implementation latency in milliseconds for comparison.", "minimum": 0.0, "title": "Reference Latency Ms", "type": "number" }, "speedup_factor": { "default": 0.0, "description": "Performance speedup factor compared to reference (reference_time / solution_time).", "minimum": 0.0, "title": "Speedup Factor", "type": "number" } } }
- Fields:
latency_ms (float)
reference_latency_ms (float)
speedup_factor (float)
- field latency_ms: float = 0.0¶
Solution execution latency in milliseconds.
- Constraints:
ge = 0.0
- field reference_latency_ms: float = 0.0¶
Reference implementation latency in milliseconds for comparison.
- Constraints:
ge = 0.0
- field speedup_factor: float = 0.0¶
Performance speedup factor compared to reference (reference_time / solution_time).
- Constraints:
ge = 0.0
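A small sketch showing how the three fields fit together, using made-up latencies and the documented definition speedup_factor = reference_time / solution_time:

```python
from flashinfer_bench.data import Performance

latency_ms = 0.42            # hypothetical measured solution latency
reference_latency_ms = 0.63  # hypothetical reference latency

perf = Performance(
    latency_ms=latency_ms,
    reference_latency_ms=reference_latency_ms,
    # speedup_factor is defined as reference_time / solution_time
    speedup_factor=reference_latency_ms / latency_ms,
)
print(f"{perf.speedup_factor:.2f}x faster than the reference")
```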
- pydantic model flashinfer_bench.data.Environment¶
Environment information from evaluation execution.
Records the hardware and software environment details from when the evaluation was performed, enabling reproducibility analysis.
Show JSON schema
{ "title": "Environment", "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.", "type": "object", "properties": { "hardware": { "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').", "minLength": 1, "title": "Hardware", "type": "string" }, "libs": { "additionalProperties": { "type": "string" }, "description": "Dictionary of library names to version strings used during evaluation.", "title": "Libs", "type": "object" } }, "required": [ "hardware" ] }
- Fields:
hardware (str)
libs (Dict[str, str])
- field hardware: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]¶
Hardware identifier where the evaluation was performed (e.g., ‘NVIDIA_H100’).
- Constraints:
min_length = 1
- field libs: Dict[str, str] [Optional]¶
Dictionary of library names to version strings used during evaluation.
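A sketch of recording an environment, assuming PyTorch is installed and used only to report version strings (the hardware identifier follows the docstring's example):

```python
import torch  # assumption: PyTorch is available for version reporting

from flashinfer_bench.data import Environment

env = Environment(
    hardware="NVIDIA_H100",                       # identifier from the docstring example
    libs={
        "torch": torch.__version__,               # library name -> version string
        "cuda": torch.version.cuda or "n/a",      # None on CPU-only builds
    },
)
```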
- class flashinfer_bench.data.EvaluationStatus¶
Status codes for evaluation results.
Enumeration of all possible outcomes when evaluating a solution against a workload, covering success and various failure modes.
- PASSED = 'PASSED'¶
Evaluation completed successfully with correct results.
- INCORRECT_SHAPE = 'INCORRECT_SHAPE'¶
Solution produced output with incorrect tensor shape.
- INCORRECT_NUMERICAL = 'INCORRECT_NUMERICAL'¶
Solution produced numerically incorrect results.
- INCORRECT_DTYPE = 'INCORRECT_DTYPE'¶
Solution produced output with incorrect data type.
- RUNTIME_ERROR = 'RUNTIME_ERROR'¶
Solution encountered a runtime error during execution.
- COMPILE_ERROR = 'COMPILE_ERROR'¶
Solution failed to compile or build successfully.
- TIMEOUT = 'TIMEOUT'¶
Evaluation did not complete within the configured timeout.
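A sketch of branching on the status codes; only PASSED indicates success, and the remaining members are failure modes (the helper function below is illustrative, not part of the library):

```python
from flashinfer_bench.data import EvaluationStatus

def describe(status: EvaluationStatus) -> str:
    # Only PASSED counts as success; everything else is a failure mode.
    if status is EvaluationStatus.PASSED:
        return "passed"
    if status in (EvaluationStatus.COMPILE_ERROR, EvaluationStatus.RUNTIME_ERROR):
        return "failed to build or run"
    if status is EvaluationStatus.TIMEOUT:
        return "timed out"
    return "produced incorrect output"

print(describe(EvaluationStatus.INCORRECT_DTYPE))  # -> produced incorrect output
```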
- pydantic model flashinfer_bench.data.Evaluation¶
Complete evaluation result for a solution on a workload.
Records the full outcome of benchmarking a solution implementation against a specific workload, including status, metrics, and environment.
Show JSON schema
{ "title": "Evaluation", "description": "Complete evaluation result for a solution on a workload.\n\nRecords the full outcome of benchmarking a solution implementation\nagainst a specific workload, including status, metrics, and environment.", "type": "object", "properties": { "status": { "$ref": "#/$defs/EvaluationStatus", "description": "The overall evaluation status indicating success or failure mode." }, "environment": { "$ref": "#/$defs/Environment", "description": "Environment details where the evaluation was performed." }, "timestamp": { "description": "Timestamp when the evaluation was performed (ISO format recommended).", "minLength": 1, "title": "Timestamp", "type": "string" }, "log": { "default": "", "description": "Captured stdout/stderr from the evaluation run.", "title": "Log", "type": "string" }, "correctness": { "anyOf": [ { "$ref": "#/$defs/Correctness" }, { "type": "null" } ], "default": null, "description": "Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status)." }, "performance": { "anyOf": [ { "$ref": "#/$defs/Performance" }, { "type": "null" } ], "default": null, "description": "Performance metrics (present only for PASSED status)." } }, "$defs": { "Correctness": { "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.", "properties": { "max_relative_error": { "default": 0.0, "description": "Maximum relative error observed across all output elements.", "title": "Max Relative Error", "type": "number" }, "max_absolute_error": { "default": 0.0, "description": "Maximum absolute error observed across all output elements.", "title": "Max Absolute Error", "type": "number" }, "extra": { "anyOf": [ { "additionalProperties": true, "type": "object" }, { "type": "null" } ], "default": null, "description": "Extra metrics for correctness evaluation.", "title": "Extra" } }, "title": "Correctness", "type": "object" }, "Environment": { "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.", "properties": { "hardware": { "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').", "minLength": 1, "title": "Hardware", "type": "string" }, "libs": { "additionalProperties": { "type": "string" }, "description": "Dictionary of library names to version strings used during evaluation.", "title": "Libs", "type": "object" } }, "required": [ "hardware" ], "title": "Environment", "type": "object" }, "EvaluationStatus": { "description": "Status codes for evaluation results.\n\nEnumeration of all possible outcomes when evaluating a solution\nagainst a workload, covering success and various failure modes.", "enum": [ "PASSED", "INCORRECT_SHAPE", "INCORRECT_NUMERICAL", "INCORRECT_DTYPE", "RUNTIME_ERROR", "COMPILE_ERROR", "TIMEOUT" ], "title": "EvaluationStatus", "type": "string" }, "Performance": { "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.", "properties": { "latency_ms": { "default": 0.0, "description": "Solution execution latency in milliseconds.", "minimum": 0.0, "title": "Latency Ms", "type": "number" }, "reference_latency_ms": { "default": 0.0, "description": "Reference implementation latency in milliseconds for comparison.", 
"minimum": 0.0, "title": "Reference Latency Ms", "type": "number" }, "speedup_factor": { "default": 0.0, "description": "Performance speedup factor compared to reference (reference_time / solution_time).", "minimum": 0.0, "title": "Speedup Factor", "type": "number" } }, "title": "Performance", "type": "object" } }, "required": [ "status", "environment", "timestamp" ] }
- Fields:
status (flashinfer_bench.data.trace.EvaluationStatus)
environment (flashinfer_bench.data.trace.Environment)
timestamp (str)
log (str)
correctness (flashinfer_bench.data.trace.Correctness | None)
performance (flashinfer_bench.data.trace.Performance | None)
- field status: EvaluationStatus [Required]¶
The overall evaluation status indicating success or failure mode.
- field environment: Environment [Required]¶
Environment details where the evaluation was performed.
- field timestamp: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]¶
Timestamp when the evaluation was performed (ISO format recommended).
- Constraints:
min_length = 1
- field log: str = ''¶
Captured stdout/stderr from the evaluation run.
- field correctness: Correctness | None = None¶
Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status).
- field performance: Performance | None = None¶
Performance metrics (present only for PASSED status).
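Putting the pieces together, a self-contained sketch of a fully evaluated trace (the definition name, solution name, axis and input names, latencies, and error values are placeholders):

```python
from datetime import datetime, timezone

from flashinfer_bench.data import (
    Correctness, Environment, Evaluation, EvaluationStatus,
    Performance, RandomInput, Trace, Workload,
)

workload = Workload(
    axes={"batch_size": 8},          # placeholder axis
    inputs={"x": RandomInput()},     # placeholder input
    uuid="wl-0001",                  # placeholder identifier
)

evaluation = Evaluation(
    status=EvaluationStatus.PASSED,
    environment=Environment(hardware="NVIDIA_H100"),
    timestamp=datetime.now(timezone.utc).isoformat(),  # ISO format, as recommended
    correctness=Correctness(max_absolute_error=1e-4, max_relative_error=1e-3),
    performance=Performance(latency_ms=0.42, reference_latency_ms=0.63,
                            speedup_factor=1.5),
)

trace = Trace(
    definition="my_definition",   # placeholder Definition name
    solution="my_solution",       # placeholder Solution name
    workload=workload,
    evaluation=evaluation,
)
assert not trace.is_workload_trace()
assert trace.is_successful()
```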