flashinfer_bench.data.Trace

pydantic model flashinfer_bench.data.Trace

Complete trace linking a solution to a definition with evaluation results.

A Trace represents the complete record of benchmarking a specific solution implementation against a specific computational workload definition. It includes the workload configuration and evaluation results.

Special case: A “workload trace” contains only definition and workload fields (with solution and evaluation set to None), representing a workload configuration without an actual benchmark execution.
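
For illustration, a minimal sketch of building a workload-only trace with pydantic keyword arguments; the axis names, input names, and the import path for the input descriptors (RandomInput, ScalarInput) are assumptions, not part of this reference:

from flashinfer_bench.data import Trace
from flashinfer_bench.data.workload import Workload
# Assumption: the input descriptors live next to Workload; adjust the import if needed.
from flashinfer_bench.data.workload import RandomInput, ScalarInput

workload = Workload(
    axes={"batch_size": 8, "seq_len": 4096},   # concrete values for the variable axes
    inputs={
        "q": RandomInput(),                    # tensor data generated randomly at run time
        "scale": ScalarInput(value=0.125),     # scalar literal passed directly to the workload
    },
    uuid="0b9f3c2e-workload-example",          # any non-empty unique identifier
)

# A "workload trace": definition and workload only, solution/evaluation left as None.
trace = Trace(definition="my_attention_kernel", workload=workload)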

JSON schema:
{
   "title": "Trace",
   "description": "Complete trace linking a solution to a definition with evaluation results.\n\nA Trace represents the complete record of benchmarking a specific solution\nimplementation against a specific computational workload definition. It includes\nthe workload configuration and evaluation results.\n\nSpecial case: A \"workload trace\" contains only definition and workload fields\n(with solution and evaluation set to None), representing a workload configuration\nwithout an actual benchmark execution.",
   "type": "object",
   "properties": {
      "definition": {
         "description": "Name of the Definition that specifies the computational workload.",
         "minLength": 1,
         "title": "Definition",
         "type": "string"
      },
      "workload": {
         "$ref": "#/$defs/Workload",
         "description": "Concrete workload configuration with specific axis values and inputs."
      },
      "solution": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Name of the Solution implementation (None for workload-only traces).",
         "title": "Solution"
      },
      "evaluation": {
         "anyOf": [
            {
               "$ref": "#/$defs/Evaluation"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Evaluation results from benchmarking (None for workload-only traces)."
      }
   },
   "$defs": {
      "Correctness": {
         "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.",
         "properties": {
            "max_relative_error": {
               "default": 0.0,
               "description": "Maximum relative error observed across all output elements.",
               "title": "Max Relative Error",
               "type": "number"
            },
            "max_absolute_error": {
               "default": 0.0,
               "description": "Maximum absolute error observed across all output elements.",
               "title": "Max Absolute Error",
               "type": "number"
            },
            "extra": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Extra metrics for correctness evaluation.",
               "title": "Extra"
            }
         },
         "title": "Correctness",
         "type": "object"
      },
      "Environment": {
         "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.",
         "properties": {
            "hardware": {
               "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').",
               "minLength": 1,
               "title": "Hardware",
               "type": "string"
            },
            "libs": {
               "additionalProperties": {
                  "type": "string"
               },
               "description": "Dictionary of library names to version strings used during evaluation.",
               "title": "Libs",
               "type": "object"
            }
         },
         "required": [
            "hardware"
         ],
         "title": "Environment",
         "type": "object"
      },
      "Evaluation": {
         "description": "Complete evaluation result for a solution on a workload.\n\nRecords the full outcome of benchmarking a solution implementation\nagainst a specific workload, including status, metrics, and environment.",
         "properties": {
            "status": {
               "$ref": "#/$defs/EvaluationStatus",
               "description": "The overall evaluation status indicating success or failure mode."
            },
            "environment": {
               "$ref": "#/$defs/Environment",
               "description": "Environment details where the evaluation was performed."
            },
            "timestamp": {
               "description": "Timestamp when the evaluation was performed (ISO format recommended).",
               "minLength": 1,
               "title": "Timestamp",
               "type": "string"
            },
            "log": {
               "default": "",
               "description": "Captured stdout/stderr from the evaluation run.",
               "title": "Log",
               "type": "string"
            },
            "correctness": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Correctness"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status)."
            },
            "performance": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Performance"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Performance metrics (present only for PASSED status)."
            }
         },
         "required": [
            "status",
            "environment",
            "timestamp"
         ],
         "title": "Evaluation",
         "type": "object"
      },
      "EvaluationStatus": {
         "description": "Status codes for evaluation results.\n\nEnumeration of all possible outcomes when evaluating a solution\nagainst a workload, covering success and various failure modes.",
         "enum": [
            "PASSED",
            "INCORRECT_SHAPE",
            "INCORRECT_NUMERICAL",
            "INCORRECT_DTYPE",
            "RUNTIME_ERROR",
            "COMPILE_ERROR",
            "TIMEOUT"
         ],
         "title": "EvaluationStatus",
         "type": "string"
      },
      "Performance": {
         "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.",
         "properties": {
            "latency_ms": {
               "default": 0.0,
               "description": "Solution execution latency in milliseconds.",
               "minimum": 0.0,
               "title": "Latency Ms",
               "type": "number"
            },
            "reference_latency_ms": {
               "default": 0.0,
               "description": "Reference implementation latency in milliseconds for comparison.",
               "minimum": 0.0,
               "title": "Reference Latency Ms",
               "type": "number"
            },
            "speedup_factor": {
               "default": 0.0,
               "description": "Performance speedup factor compared to reference (reference_time / solution_time).",
               "minimum": 0.0,
               "title": "Speedup Factor",
               "type": "number"
            }
         },
         "title": "Performance",
         "type": "object"
      },
      "RandomInput": {
         "description": "Random input generation descriptor.\n\nRepresents a specification for generating random tensor input data\nduring workload execution and benchmarking.",
         "properties": {
            "type": {
               "const": "random",
               "default": "random",
               "description": "The input type identifier for random data generation.",
               "title": "Type",
               "type": "string"
            }
         },
         "title": "RandomInput",
         "type": "object"
      },
      "SafetensorsInput": {
         "description": "Input specification for data loaded from safetensors files.\n\nRepresents tensor data that will be loaded from a safetensors file\nusing a specific tensor key within that file.",
         "properties": {
            "type": {
               "const": "safetensors",
               "default": "safetensors",
               "description": "The input type identifier for safetensors data.",
               "title": "Type",
               "type": "string"
            },
            "path": {
               "description": "Path to the safetensors file containing the tensor data. The path is relative to the root\npath of the TraceSet.",
               "minLength": 1,
               "title": "Path",
               "type": "string"
            },
            "tensor_key": {
               "description": "Key identifier for the specific tensor within the safetensors file.",
               "minLength": 1,
               "title": "Tensor Key",
               "type": "string"
            }
         },
         "required": [
            "path",
            "tensor_key"
         ],
         "title": "SafetensorsInput",
         "type": "object"
      },
      "ScalarInput": {
         "description": "Scalar literal input specification.\n\nRepresents a scalar value (integer, float, or boolean) that will be\nused as a direct input parameter to the computational workload.",
         "properties": {
            "type": {
               "const": "scalar",
               "default": "scalar",
               "description": "The input type identifier for scalar values.",
               "title": "Type",
               "type": "string"
            },
            "value": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "number"
                  },
                  {
                     "type": "boolean"
                  }
               ],
               "description": "The scalar value to be used as input. Must be int, float, or bool.",
               "title": "Value"
            }
         },
         "required": [
            "value"
         ],
         "title": "ScalarInput",
         "type": "object"
      },
      "Workload": {
         "description": "Concrete workload configuration for benchmarking.\n\nDefines a specific instance of a computational workload with concrete\nvalues for all variable axes and specifications for all input data.\nThis represents an executable configuration that can be benchmarked.",
         "properties": {
            "axes": {
               "additionalProperties": {
                  "minimum": 0,
                  "type": "integer"
               },
               "description": "Dictionary mapping axis names to their concrete integer values. All values must be\npositive.",
               "title": "Axes",
               "type": "object"
            },
            "inputs": {
               "additionalProperties": {
                  "anyOf": [
                     {
                        "$ref": "#/$defs/RandomInput"
                     },
                     {
                        "$ref": "#/$defs/SafetensorsInput"
                     },
                     {
                        "$ref": "#/$defs/ScalarInput"
                     }
                  ]
               },
               "description": "Dictionary mapping input names to their data specifications.",
               "title": "Inputs",
               "type": "object"
            },
            "uuid": {
               "description": "Unique identifier for this specific workload configuration.",
               "minLength": 1,
               "title": "Uuid",
               "type": "string"
            }
         },
         "required": [
            "axes",
            "inputs",
            "uuid"
         ],
         "title": "Workload",
         "type": "object"
      }
   },
   "required": [
      "definition",
      "workload"
   ]
}

Fields:
  • definition (str)

  • workload (flashinfer_bench.data.workload.Workload)

  • solution (str | None)

  • evaluation (flashinfer_bench.data.trace.Evaluation | None)

field definition: str [Required]

Name of the Definition that specifies the computational workload.

Constraints:
  • min_length = 1

field workload: Workload [Required]

Concrete workload configuration with specific axis values and inputs.

field solution: str | None = None

Name of the Solution implementation (None for workload-only traces).

field evaluation: Evaluation | None = None

Evaluation results from benchmarking (None for workload-only traces).

is_workload_trace() → bool

Check if this is a workload-only trace.

Returns:
  True if this is a workload trace without solution/evaluation data.

Return type:
  bool

is_successful() → bool

Check if the benchmark execution was successful.

Returns:
  True if this is a regular trace whose evaluation completed with PASSED status; False for workload-only traces and failed evaluations.

Return type:
  bool
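
A hedged usage sketch of the two helpers, assuming pydantic v2's model_validate for parsing; all concrete names and numbers below are illustrative:

from flashinfer_bench.data import Trace

full_trace = Trace.model_validate({
    "definition": "my_attention_kernel",
    "workload": {
        "axes": {"batch_size": 8, "seq_len": 4096},
        "inputs": {"q": {"type": "random"}},
        "uuid": "0b9f3c2e-workload-example",
    },
    "solution": "triton_attention_v2",
    "evaluation": {
        "status": "PASSED",
        "environment": {"hardware": "NVIDIA_H100", "libs": {"torch": "2.4.0"}},
        "timestamp": "2024-05-01T12:00:00+00:00",
        "correctness": {"max_relative_error": 2.5e-3, "max_absolute_error": 1.2e-4},
        "performance": {"latency_ms": 0.42, "reference_latency_ms": 0.63, "speedup_factor": 1.5},
    },
})

assert not full_trace.is_workload_trace()   # solution and evaluation are present
assert full_trace.is_successful()           # evaluation status is PASSED

workload_only = Trace(definition="my_attention_kernel", workload=full_trace.workload)
assert workload_only.is_workload_trace()
assert not workload_only.is_successful()    # no evaluation attached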

pydantic model flashinfer_bench.data.Correctness

Correctness metrics from numerical evaluation.

Contains error measurements comparing the solution output against a reference implementation to assess numerical accuracy.

JSON schema:
{
   "title": "Correctness",
   "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.",
   "type": "object",
   "properties": {
      "max_relative_error": {
         "default": 0.0,
         "description": "Maximum relative error observed across all output elements.",
         "title": "Max Relative Error",
         "type": "number"
      },
      "max_absolute_error": {
         "default": 0.0,
         "description": "Maximum absolute error observed across all output elements.",
         "title": "Max Absolute Error",
         "type": "number"
      },
      "extra": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Extra metrics for correctness evaluation.",
         "title": "Extra"
      }
   }
}

Fields:
  • max_relative_error (float)

  • max_absolute_error (float)

  • extra (Dict[str, Any] | None)

field max_relative_error: float = 0.0

Maximum relative error observed across all output elements.

field max_absolute_error: float = 0.0

Maximum absolute error observed across all output elements.

field extra: Dict[str, Any] | None = None

Extra metrics for correctness evaluation.
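
A short sketch of filling in correctness metrics; the key used in extra is a hypothetical additional metric, not a defined field:

from flashinfer_bench.data import Correctness

correctness = Correctness(
    max_relative_error=2.5e-3,
    max_absolute_error=1.2e-4,
    extra={"mismatched_elements": 0},  # hypothetical extra metric
)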

pydantic model flashinfer_bench.data.Performance

Performance metrics from timing evaluation.

Contains timing measurements and performance comparisons from benchmarking the solution against reference implementations.

JSON schema:
{
   "title": "Performance",
   "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.",
   "type": "object",
   "properties": {
      "latency_ms": {
         "default": 0.0,
         "description": "Solution execution latency in milliseconds.",
         "minimum": 0.0,
         "title": "Latency Ms",
         "type": "number"
      },
      "reference_latency_ms": {
         "default": 0.0,
         "description": "Reference implementation latency in milliseconds for comparison.",
         "minimum": 0.0,
         "title": "Reference Latency Ms",
         "type": "number"
      },
      "speedup_factor": {
         "default": 0.0,
         "description": "Performance speedup factor compared to reference (reference_time / solution_time).",
         "minimum": 0.0,
         "title": "Speedup Factor",
         "type": "number"
      }
   }
}

Fields:
  • latency_ms (float)

  • reference_latency_ms (float)

  • speedup_factor (float)

field latency_ms: float = 0.0

Solution execution latency in milliseconds.

Constraints:
  • ge = 0.0

field reference_latency_ms: float = 0.0

Reference implementation latency in milliseconds for comparison.

Constraints:
  • ge = 0.0

field speedup_factor: float = 0.0

Performance speedup factor compared to reference (reference_time / solution_time).

Constraints:
  • ge = 0.0
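
A minimal sketch showing how the three fields relate; the timings are illustrative, and speedup_factor is simply reference latency over solution latency:

from flashinfer_bench.data import Performance

solution_ms, reference_ms = 0.42, 0.63          # illustrative timings
performance = Performance(
    latency_ms=solution_ms,
    reference_latency_ms=reference_ms,
    speedup_factor=reference_ms / solution_ms,  # reference_time / solution_time
)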

pydantic model flashinfer_bench.data.Environment

Environment information from evaluation execution.

Records the hardware and software environment details from when the evaluation was performed, enabling reproducibility analysis.

JSON schema:
{
   "title": "Environment",
   "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.",
   "type": "object",
   "properties": {
      "hardware": {
         "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').",
         "minLength": 1,
         "title": "Hardware",
         "type": "string"
      },
      "libs": {
         "additionalProperties": {
            "type": "string"
         },
         "description": "Dictionary of library names to version strings used during evaluation.",
         "title": "Libs",
         "type": "object"
      }
   },
   "required": [
      "hardware"
   ]
}

Fields:
  • hardware (str)

  • libs (Dict[str, str])

field hardware: str [Required]

Hardware identifier where the evaluation was performed (e.g., ‘NVIDIA_H100’).

Constraints:
  • min_length = 1

field libs: Dict[str, str] [Optional]

Dictionary of library names to version strings used during evaluation.
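
A minimal sketch of recording the evaluation environment; the library names and version strings are illustrative:

from flashinfer_bench.data import Environment

environment = Environment(
    hardware="NVIDIA_H100",
    libs={"torch": "2.4.0", "cuda": "12.4"},  # illustrative versions
)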

class flashinfer_bench.data.EvaluationStatus

Status codes for evaluation results.

Enumeration of all possible outcomes when evaluating a solution against a workload, covering success and various failure modes.

PASSED = 'PASSED'

Evaluation completed successfully with correct results.

INCORRECT_SHAPE = 'INCORRECT_SHAPE'

Solution produced output with incorrect tensor shape.

INCORRECT_NUMERICAL = 'INCORRECT_NUMERICAL'

Solution produced numerically incorrect results.

INCORRECT_DTYPE = 'INCORRECT_DTYPE'

Solution produced output with incorrect data type.

RUNTIME_ERROR = 'RUNTIME_ERROR'

Solution encountered a runtime error during execution.

COMPILE_ERROR = 'COMPILE_ERROR'

Solution failed to compile or build successfully.

TIMEOUT = 'TIMEOUT'

Evaluation did not complete within the configured timeout.
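
A hedged sketch of working with the enum; the grouping helper below is hypothetical, not part of the API:

from flashinfer_bench.data import EvaluationStatus

status = EvaluationStatus("PASSED")        # look up a member from its string value
assert status is EvaluationStatus.PASSED

def is_correctness_failure(s: EvaluationStatus) -> bool:
    # Hypothetical helper: True for any of the INCORRECT_* failure modes.
    return s in (
        EvaluationStatus.INCORRECT_SHAPE,
        EvaluationStatus.INCORRECT_NUMERICAL,
        EvaluationStatus.INCORRECT_DTYPE,
    )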

pydantic model flashinfer_bench.data.Evaluation

Complete evaluation result for a solution on a workload.

Records the full outcome of benchmarking a solution implementation against a specific workload, including status, metrics, and environment.

JSON schema:
{
   "title": "Evaluation",
   "description": "Complete evaluation result for a solution on a workload.\n\nRecords the full outcome of benchmarking a solution implementation\nagainst a specific workload, including status, metrics, and environment.",
   "type": "object",
   "properties": {
      "status": {
         "$ref": "#/$defs/EvaluationStatus",
         "description": "The overall evaluation status indicating success or failure mode."
      },
      "environment": {
         "$ref": "#/$defs/Environment",
         "description": "Environment details where the evaluation was performed."
      },
      "timestamp": {
         "description": "Timestamp when the evaluation was performed (ISO format recommended).",
         "minLength": 1,
         "title": "Timestamp",
         "type": "string"
      },
      "log": {
         "default": "",
         "description": "Captured stdout/stderr from the evaluation run.",
         "title": "Log",
         "type": "string"
      },
      "correctness": {
         "anyOf": [
            {
               "$ref": "#/$defs/Correctness"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status)."
      },
      "performance": {
         "anyOf": [
            {
               "$ref": "#/$defs/Performance"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Performance metrics (present only for PASSED status)."
      }
   },
   "$defs": {
      "Correctness": {
         "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.",
         "properties": {
            "max_relative_error": {
               "default": 0.0,
               "description": "Maximum relative error observed across all output elements.",
               "title": "Max Relative Error",
               "type": "number"
            },
            "max_absolute_error": {
               "default": 0.0,
               "description": "Maximum absolute error observed across all output elements.",
               "title": "Max Absolute Error",
               "type": "number"
            },
            "extra": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Extra metrics for correctness evaluation.",
               "title": "Extra"
            }
         },
         "title": "Correctness",
         "type": "object"
      },
      "Environment": {
         "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.",
         "properties": {
            "hardware": {
               "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').",
               "minLength": 1,
               "title": "Hardware",
               "type": "string"
            },
            "libs": {
               "additionalProperties": {
                  "type": "string"
               },
               "description": "Dictionary of library names to version strings used during evaluation.",
               "title": "Libs",
               "type": "object"
            }
         },
         "required": [
            "hardware"
         ],
         "title": "Environment",
         "type": "object"
      },
      "EvaluationStatus": {
         "description": "Status codes for evaluation results.\n\nEnumeration of all possible outcomes when evaluating a solution\nagainst a workload, covering success and various failure modes.",
         "enum": [
            "PASSED",
            "INCORRECT_SHAPE",
            "INCORRECT_NUMERICAL",
            "INCORRECT_DTYPE",
            "RUNTIME_ERROR",
            "COMPILE_ERROR",
            "TIMEOUT"
         ],
         "title": "EvaluationStatus",
         "type": "string"
      },
      "Performance": {
         "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.",
         "properties": {
            "latency_ms": {
               "default": 0.0,
               "description": "Solution execution latency in milliseconds.",
               "minimum": 0.0,
               "title": "Latency Ms",
               "type": "number"
            },
            "reference_latency_ms": {
               "default": 0.0,
               "description": "Reference implementation latency in milliseconds for comparison.",
               "minimum": 0.0,
               "title": "Reference Latency Ms",
               "type": "number"
            },
            "speedup_factor": {
               "default": 0.0,
               "description": "Performance speedup factor compared to reference (reference_time / solution_time).",
               "minimum": 0.0,
               "title": "Speedup Factor",
               "type": "number"
            }
         },
         "title": "Performance",
         "type": "object"
      }
   },
   "required": [
      "status",
      "environment",
      "timestamp"
   ]
}

Fields:
  • status (flashinfer_bench.data.trace.EvaluationStatus)

  • environment (flashinfer_bench.data.trace.Environment)

  • timestamp (str)

  • log (str)

  • correctness (flashinfer_bench.data.trace.Correctness | None)

  • performance (flashinfer_bench.data.trace.Performance | None)

field status: EvaluationStatus [Required]

The overall evaluation status indicating success or failure mode.

field environment: Environment [Required]

Environment details where the evaluation was performed.

field timestamp: str [Required]

Timestamp when the evaluation was performed (ISO format recommended).

Constraints:
  • min_length = 1

field log: str = ''

Captured stdout/stderr from the evaluation run.

field correctness: Correctness | None = None

Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status).

field performance: Performance | None = None

Performance metrics (present only for PASSED status).
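
Putting the pieces together, a hedged sketch of assembling a full Evaluation; all concrete values are illustrative:

from datetime import datetime, timezone

from flashinfer_bench.data import (
    Correctness,
    Environment,
    Evaluation,
    EvaluationStatus,
    Performance,
)

evaluation = Evaluation(
    status=EvaluationStatus.PASSED,
    environment=Environment(hardware="NVIDIA_H100", libs={"torch": "2.4.0"}),
    timestamp=datetime.now(timezone.utc).isoformat(),  # ISO-format timestamp
    log="correctness and timing checks completed",
    correctness=Correctness(max_relative_error=2.5e-3, max_absolute_error=1.2e-4),
    performance=Performance(latency_ms=0.42, reference_latency_ms=0.63, speedup_factor=1.5),
)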