flashinfer_bench.data.Trace

pydantic model flashinfer_bench.data.Trace

Complete trace linking a solution to a definition with evaluation results.

A Trace represents the complete record of benchmarking a specific solution implementation against a specific computational workload definition. It includes the workload configuration and evaluation results.

Special case: A “workload trace” contains only definition and workload fields (with solution and evaluation set to None), representing a workload configuration without an actual benchmark execution.
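
For illustration, a minimal sketch of building a workload-only trace with pydantic keyword arguments; the axis names, input names, and the import path for the input descriptors (RandomInput, ScalarInput) are assumptions, not part of this reference:

from flashinfer_bench.data import Trace
from flashinfer_bench.data.workload import Workload
# Assumption: the input descriptors live next to Workload; adjust the import if needed.
from flashinfer_bench.data.workload import RandomInput, ScalarInput

workload = Workload(
    axes={"batch_size": 8, "seq_len": 4096},   # concrete values for the variable axes
    inputs={
        "q": RandomInput(),                    # tensor data generated randomly at run time
        "scale": ScalarInput(value=0.125),     # scalar literal passed directly to the workload
    },
    uuid="0b9f3c2e-workload-example",          # any non-empty unique identifier
)

# A "workload trace": definition and workload only, solution/evaluation left as None.
trace = Trace(definition="my_attention_kernel", workload=workload)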

JSON schema:
{
   "title": "Trace",
   "description": "Complete trace linking a solution to a definition with evaluation results.\n\nA Trace represents the complete record of benchmarking a specific solution\nimplementation against a specific computational workload definition. It includes\nthe workload configuration and evaluation results.\n\nSpecial case: A \"workload trace\" contains only definition and workload fields\n(with solution and evaluation set to None), representing a workload configuration\nwithout an actual benchmark execution.",
   "type": "object",
   "properties": {
      "definition": {
         "description": "Name of the Definition that specifies the computational workload.",
         "minLength": 1,
         "title": "Definition",
         "type": "string"
      },
      "workload": {
         "$ref": "#/$defs/Workload",
         "description": "Concrete workload configuration with specific axis values and inputs."
      },
      "solution": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Name of the Solution implementation (None for workload-only traces).",
         "title": "Solution"
      },
      "evaluation": {
         "anyOf": [
            {
               "$ref": "#/$defs/Evaluation"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Evaluation results from benchmarking (None for workload-only traces)."
      }
   },
   "$defs": {
      "Correctness": {
         "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.",
         "properties": {
            "max_relative_error": {
               "default": 0.0,
               "description": "Maximum relative error observed across all output elements.",
               "title": "Max Relative Error",
               "type": "number"
            },
            "max_absolute_error": {
               "default": 0.0,
               "description": "Maximum absolute error observed across all output elements.",
               "title": "Max Absolute Error",
               "type": "number"
            },
            "extra": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Extra metrics for correctness evaluation.",
               "title": "Extra"
            }
         },
         "title": "Correctness",
         "type": "object"
      },
      "Environment": {
         "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.",
         "properties": {
            "hardware": {
               "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').",
               "minLength": 1,
               "title": "Hardware",
               "type": "string"
            },
            "libs": {
               "additionalProperties": {
                  "type": "string"
               },
               "description": "Dictionary of library names to version strings used during evaluation.",
               "title": "Libs",
               "type": "object"
            }
         },
         "required": [
            "hardware"
         ],
         "title": "Environment",
         "type": "object"
      },
      "Evaluation": {
         "description": "Complete evaluation result for a solution on a workload.\n\nRecords the full outcome of benchmarking a solution implementation\nagainst a specific workload, including status, metrics, and environment.",
         "properties": {
            "status": {
               "$ref": "#/$defs/EvaluationStatus",
               "description": "The overall evaluation status indicating success or failure mode."
            },
            "environment": {
               "$ref": "#/$defs/Environment",
               "description": "Environment details where the evaluation was performed."
            },
            "timestamp": {
               "description": "Timestamp when the evaluation was performed (ISO format recommended).",
               "minLength": 1,
               "title": "Timestamp",
               "type": "string"
            },
            "log": {
               "default": "",
               "description": "Captured stdout/stderr from the evaluation run.",
               "title": "Log",
               "type": "string"
            },
            "correctness": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Correctness"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status)."
            },
            "performance": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Performance"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Performance metrics (present only for PASSED status)."
            }
         },
         "required": [
            "status",
            "environment",
            "timestamp"
         ],
         "title": "Evaluation",
         "type": "object"
      },
      "EvaluationStatus": {
         "description": "Status codes for evaluation results.\n\nEnumeration of all possible outcomes when evaluating a solution\nagainst a workload, covering success and various failure modes.",
         "enum": [
            "PASSED",
            "INCORRECT_SHAPE",
            "INCORRECT_NUMERICAL",
            "INCORRECT_DTYPE",
            "RUNTIME_ERROR",
            "COMPILE_ERROR",
            "TIMEOUT"
         ],
         "title": "EvaluationStatus",
         "type": "string"
      },
      "Performance": {
         "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.",
         "properties": {
            "latency_ms": {
               "default": 0.0,
               "description": "Solution execution latency in milliseconds.",
               "minimum": 0.0,
               "title": "Latency Ms",
               "type": "number"
            },
            "reference_latency_ms": {
               "default": 0.0,
               "description": "Reference implementation latency in milliseconds for comparison.",
               "minimum": 0.0,
               "title": "Reference Latency Ms",
               "type": "number"
            },
            "speedup_factor": {
               "default": 0.0,
               "description": "Performance speedup factor compared to reference (reference_time / solution_time).",
               "minimum": 0.0,
               "title": "Speedup Factor",
               "type": "number"
            }
         },
         "title": "Performance",
         "type": "object"
      },
      "RandomInput": {
         "description": "Random input generation descriptor.\n\nRepresents a specification for generating random tensor input data\nduring workload execution and benchmarking.",
         "properties": {
            "type": {
               "const": "random",
               "default": "random",
               "description": "The input type identifier for random data generation.",
               "title": "Type",
               "type": "string"
            }
         },
         "title": "RandomInput",
         "type": "object"
      },
      "SafetensorsInput": {
         "description": "Input specification for data loaded from safetensors files.\n\nRepresents tensor data that will be loaded from a safetensors file\nusing a specific tensor key within that file.",
         "properties": {
            "type": {
               "const": "safetensors",
               "default": "safetensors",
               "description": "The input type identifier for safetensors data.",
               "title": "Type",
               "type": "string"
            },
            "path": {
               "description": "Path to the safetensors file containing the tensor data. The path is relative to the root\npath of the TraceSet.",
               "minLength": 1,
               "title": "Path",
               "type": "string"
            },
            "tensor_key": {
               "description": "Key identifier for the specific tensor within the safetensors file.",
               "minLength": 1,
               "title": "Tensor Key",
               "type": "string"
            }
         },
         "required": [
            "path",
            "tensor_key"
         ],
         "title": "SafetensorsInput",
         "type": "object"
      },
      "ScalarInput": {
         "description": "Scalar literal input specification.\n\nRepresents a scalar value (integer, float, or boolean) that will be\nused as a direct input parameter to the computational workload.",
         "properties": {
            "type": {
               "const": "scalar",
               "default": "scalar",
               "description": "The input type identifier for scalar values.",
               "title": "Type",
               "type": "string"
            },
            "value": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "number"
                  },
                  {
                     "type": "boolean"
                  }
               ],
               "description": "The scalar value to be used as input. Must be int, float, or bool.",
               "title": "Value"
            }
         },
         "required": [
            "value"
         ],
         "title": "ScalarInput",
         "type": "object"
      },
      "Workload": {
         "description": "Concrete workload configuration for benchmarking.\n\nDefines a specific instance of a computational workload with concrete\nvalues for all variable axes and specifications for all input data.\nThis represents an executable configuration that can be benchmarked.",
         "properties": {
            "axes": {
               "additionalProperties": {
                  "minimum": 0,
                  "type": "integer"
               },
               "description": "Dictionary mapping axis names to their concrete integer values. All values must be\npositive.",
               "title": "Axes",
               "type": "object"
            },
            "inputs": {
               "additionalProperties": {
                  "anyOf": [
                     {
                        "$ref": "#/$defs/RandomInput"
                     },
                     {
                        "$ref": "#/$defs/SafetensorsInput"
                     },
                     {
                        "$ref": "#/$defs/ScalarInput"
                     }
                  ]
               },
               "description": "Dictionary mapping input names to their data specifications.",
               "title": "Inputs",
               "type": "object"
            },
            "uuid": {
               "description": "Unique identifier for this specific workload configuration.",
               "minLength": 1,
               "title": "Uuid",
               "type": "string"
            }
         },
         "required": [
            "axes",
            "inputs",
            "uuid"
         ],
         "title": "Workload",
         "type": "object"
      }
   },
   "required": [
      "definition",
      "workload"
   ]
}

Fields:
  • definition (str)

  • workload (flashinfer_bench.data.workload.Workload)

  • solution (str | None)

  • evaluation (flashinfer_bench.data.trace.Evaluation | None)

field definition: str [Required]

Name of the Definition that specifies the computational workload.

Constraints:
  • min_length = 1

field workload: Workload [Required]

Concrete workload configuration with specific axis values and inputs.

field solution: str | None = None

Name of the Solution implementation (None for workload-only traces).

field evaluation: Evaluation | None = None

Evaluation results from benchmarking (None for workload-only traces).

is_workload_trace() → bool

Check if this is a workload-only trace.

Returns:
  True if this is a workload trace without solution/evaluation data.

Return type:
  bool

is_successful() → bool

Check if the benchmark execution was successful.

Returns:
  True if this is a regular trace whose evaluation completed with PASSED status; False for workload-only traces and failed evaluations.

Return type:
  bool
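
A hedged usage sketch of the two helpers, assuming pydantic v2's model_validate for parsing; all concrete names and numbers below are illustrative:

from flashinfer_bench.data import Trace

full_trace = Trace.model_validate({
    "definition": "my_attention_kernel",
    "workload": {
        "axes": {"batch_size": 8, "seq_len": 4096},
        "inputs": {"q": {"type": "random"}},
        "uuid": "0b9f3c2e-workload-example",
    },
    "solution": "triton_attention_v2",
    "evaluation": {
        "status": "PASSED",
        "environment": {"hardware": "NVIDIA_H100", "libs": {"torch": "2.4.0"}},
        "timestamp": "2024-05-01T12:00:00+00:00",
        "correctness": {"max_relative_error": 2.5e-3, "max_absolute_error": 1.2e-4},
        "performance": {"latency_ms": 0.42, "reference_latency_ms": 0.63, "speedup_factor": 1.5},
    },
})

assert not full_trace.is_workload_trace()   # solution and evaluation are present
assert full_trace.is_successful()           # evaluation status is PASSED

workload_only = Trace(definition="my_attention_kernel", workload=full_trace.workload)
assert workload_only.is_workload_trace()
assert not workload_only.is_successful()    # no evaluation attached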

pydantic model flashinfer_bench.data.Correctness

Correctness metrics from numerical evaluation.

Contains error measurements comparing the solution output against a reference implementation to assess numerical accuracy.

JSON schema:
{
   "title": "Correctness",
   "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.",
   "type": "object",
   "properties": {
      "max_relative_error": {
         "default": 0.0,
         "description": "Maximum relative error observed across all output elements.",
         "title": "Max Relative Error",
         "type": "number"
      },
      "max_absolute_error": {
         "default": 0.0,
         "description": "Maximum absolute error observed across all output elements.",
         "title": "Max Absolute Error",
         "type": "number"
      },
      "extra": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Extra metrics for correctness evaluation.",
         "title": "Extra"
      }
   }
}

Fields:
  • max_relative_error (float)

  • max_absolute_error (float)

  • extra (Dict[str, Any] | None)

field max_relative_error: float = 0.0

Maximum relative error observed across all output elements.

field max_absolute_error: float = 0.0

Maximum absolute error observed across all output elements.

field extra: Dict[str, Any] | None = None

Extra metrics for correctness evaluation.
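
A short sketch of filling in correctness metrics; the key used in extra is a hypothetical additional metric, not a defined field:

from flashinfer_bench.data import Correctness

correctness = Correctness(
    max_relative_error=2.5e-3,
    max_absolute_error=1.2e-4,
    extra={"mismatched_elements": 0},  # hypothetical extra metric
)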

pydantic model flashinfer_bench.data.Performance

Performance metrics from timing evaluation.

Contains timing measurements and performance comparisons from benchmarking the solution against reference implementations.

JSON schema:
{
   "title": "Performance",
   "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.",
   "type": "object",
   "properties": {
      "latency_ms": {
         "default": 0.0,
         "description": "Solution execution latency in milliseconds.",
         "minimum": 0.0,
         "title": "Latency Ms",
         "type": "number"
      },
      "reference_latency_ms": {
         "default": 0.0,
         "description": "Reference implementation latency in milliseconds for comparison.",
         "minimum": 0.0,
         "title": "Reference Latency Ms",
         "type": "number"
      },
      "speedup_factor": {
         "default": 0.0,
         "description": "Performance speedup factor compared to reference (reference_time / solution_time).",
         "minimum": 0.0,
         "title": "Speedup Factor",
         "type": "number"
      }
   }
}

Fields:
  • latency_ms (float)

  • reference_latency_ms (float)

  • speedup_factor (float)

field latency_ms: float = 0.0

Solution execution latency in milliseconds.

Constraints:
  • ge = 0.0

field reference_latency_ms: float = 0.0

Reference implementation latency in milliseconds for comparison.

Constraints:
  • ge = 0.0

field speedup_factor: float = 0.0

Performance speedup factor compared to reference (reference_time / solution_time).

Constraints:
  • ge = 0.0
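
A minimal sketch showing how the three fields relate; the timings are illustrative, and speedup_factor is simply reference latency over solution latency:

from flashinfer_bench.data import Performance

solution_ms, reference_ms = 0.42, 0.63          # illustrative timings
performance = Performance(
    latency_ms=solution_ms,
    reference_latency_ms=reference_ms,
    speedup_factor=reference_ms / solution_ms,  # reference_time / solution_time
)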

pydantic model flashinfer_bench.data.Environment

Environment information from evaluation execution.

Records the hardware and software environment details from when the evaluation was performed, enabling reproducibility analysis.

JSON schema:
{
   "title": "Environment",
   "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.",
   "type": "object",
   "properties": {
      "hardware": {
         "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').",
         "minLength": 1,
         "title": "Hardware",
         "type": "string"
      },
      "libs": {
         "additionalProperties": {
            "type": "string"
         },
         "description": "Dictionary of library names to version strings used during evaluation.",
         "title": "Libs",
         "type": "object"
      }
   },
   "required": [
      "hardware"
   ]
}

Fields:
  • hardware (str)

  • libs (Dict[str, str])

field hardware: str [Required]

Hardware identifier where the evaluation was performed (e.g., ‘NVIDIA_H100’).

Constraints:
  • min_length = 1

field libs: Dict[str, str] [Optional]

Dictionary of library names to version strings used during evaluation.
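
A minimal sketch of recording the evaluation environment; the library names and version strings are illustrative:

from flashinfer_bench.data import Environment

environment = Environment(
    hardware="NVIDIA_H100",
    libs={"torch": "2.4.0", "cuda": "12.4"},  # illustrative versions
)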

class flashinfer_bench.data.EvaluationStatus

Status codes for evaluation results.

Enumeration of all possible outcomes when evaluating a solution against a workload, covering success and various failure modes.

PASSED = 'PASSED'

Evaluation completed successfully with correct results.

INCORRECT_SHAPE = 'INCORRECT_SHAPE'

Solution produced output with incorrect tensor shape.

INCORRECT_NUMERICAL = 'INCORRECT_NUMERICAL'

Solution produced numerically incorrect results.

INCORRECT_DTYPE = 'INCORRECT_DTYPE'

Solution produced output with incorrect data type.

RUNTIME_ERROR = 'RUNTIME_ERROR'

Solution encountered a runtime error during execution.

COMPILE_ERROR = 'COMPILE_ERROR'

Solution failed to compile or build successfully.

TIMEOUT = 'TIMEOUT'

Evaluation did not complete within the configured timeout.
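
A hedged sketch of working with the enum; the grouping helper below is hypothetical, not part of the API:

from flashinfer_bench.data import EvaluationStatus

status = EvaluationStatus("PASSED")        # look up a member from its string value
assert status is EvaluationStatus.PASSED

def is_correctness_failure(s: EvaluationStatus) -> bool:
    # Hypothetical helper: True for any of the INCORRECT_* failure modes.
    return s in (
        EvaluationStatus.INCORRECT_SHAPE,
        EvaluationStatus.INCORRECT_NUMERICAL,
        EvaluationStatus.INCORRECT_DTYPE,
    )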

pydantic model flashinfer_bench.data.Evaluation

Complete evaluation result for a solution on a workload.

Records the full outcome of benchmarking a solution implementation against a specific workload, including status, metrics, and environment.

JSON schema:
{
   "title": "Evaluation",
   "description": "Complete evaluation result for a solution on a workload.\n\nRecords the full outcome of benchmarking a solution implementation\nagainst a specific workload, including status, metrics, and environment.",
   "type": "object",
   "properties": {
      "status": {
         "$ref": "#/$defs/EvaluationStatus",
         "description": "The overall evaluation status indicating success or failure mode."
      },
      "environment": {
         "$ref": "#/$defs/Environment",
         "description": "Environment details where the evaluation was performed."
      },
      "timestamp": {
         "description": "Timestamp when the evaluation was performed (ISO format recommended).",
         "minLength": 1,
         "title": "Timestamp",
         "type": "string"
      },
      "log": {
         "default": "",
         "description": "Captured stdout/stderr from the evaluation run.",
         "title": "Log",
         "type": "string"
      },
      "correctness": {
         "anyOf": [
            {
               "$ref": "#/$defs/Correctness"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status)."
      },
      "performance": {
         "anyOf": [
            {
               "$ref": "#/$defs/Performance"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Performance metrics (present only for PASSED status)."
      }
   },
   "$defs": {
      "Correctness": {
         "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.",
         "properties": {
            "max_relative_error": {
               "default": 0.0,
               "description": "Maximum relative error observed across all output elements.",
               "title": "Max Relative Error",
               "type": "number"
            },
            "max_absolute_error": {
               "default": 0.0,
               "description": "Maximum absolute error observed across all output elements.",
               "title": "Max Absolute Error",
               "type": "number"
            },
            "extra": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Extra metrics for correctness evaluation.",
               "title": "Extra"
            }
         },
         "title": "Correctness",
         "type": "object"
      },
      "Environment": {
         "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.",
         "properties": {
            "hardware": {
               "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').",
               "minLength": 1,
               "title": "Hardware",
               "type": "string"
            },
            "libs": {
               "additionalProperties": {
                  "type": "string"
               },
               "description": "Dictionary of library names to version strings used during evaluation.",
               "title": "Libs",
               "type": "object"
            }
         },
         "required": [
            "hardware"
         ],
         "title": "Environment",
         "type": "object"
      },
      "EvaluationStatus": {
         "description": "Status codes for evaluation results.\n\nEnumeration of all possible outcomes when evaluating a solution\nagainst a workload, covering success and various failure modes.",
         "enum": [
            "PASSED",
            "INCORRECT_SHAPE",
            "INCORRECT_NUMERICAL",
            "INCORRECT_DTYPE",
            "RUNTIME_ERROR",
            "COMPILE_ERROR",
            "TIMEOUT"
         ],
         "title": "EvaluationStatus",
         "type": "string"
      },
      "Performance": {
         "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.",
         "properties": {
            "latency_ms": {
               "default": 0.0,
               "description": "Solution execution latency in milliseconds.",
               "minimum": 0.0,
               "title": "Latency Ms",
               "type": "number"
            },
            "reference_latency_ms": {
               "default": 0.0,
               "description": "Reference implementation latency in milliseconds for comparison.",
               "minimum": 0.0,
               "title": "Reference Latency Ms",
               "type": "number"
            },
            "speedup_factor": {
               "default": 0.0,
               "description": "Performance speedup factor compared to reference (reference_time / solution_time).",
               "minimum": 0.0,
               "title": "Speedup Factor",
               "type": "number"
            }
         },
         "title": "Performance",
         "type": "object"
      }
   },
   "required": [
      "status",
      "environment",
      "timestamp"
   ]
}

Fields:
  • status (flashinfer_bench.data.trace.EvaluationStatus)

  • environment (flashinfer_bench.data.trace.Environment)

  • timestamp (str)

  • log (str)

  • correctness (flashinfer_bench.data.trace.Correctness | None)

  • performance (flashinfer_bench.data.trace.Performance | None)

field status: EvaluationStatus [Required]

The overall evaluation status indicating success or failure mode.

field environment: Environment [Required]

Environment details where the evaluation was performed.

field timestamp: str [Required]

Timestamp when the evaluation was performed (ISO format recommended).

Constraints:
  • min_length = 1

field log: str = ''

Captured stdout/stderr from the evaluation run.

field correctness: Correctness | None = None

Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status).

field performance: Performance | None = None

Performance metrics (present only for PASSED status).
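
Putting the pieces together, a hedged sketch of assembling a full Evaluation; all concrete values are illustrative:

from datetime import datetime, timezone

from flashinfer_bench.data import (
    Correctness,
    Environment,
    Evaluation,
    EvaluationStatus,
    Performance,
)

evaluation = Evaluation(
    status=EvaluationStatus.PASSED,
    environment=Environment(hardware="NVIDIA_H100", libs={"torch": "2.4.0"}),
    timestamp=datetime.now(timezone.utc).isoformat(),  # ISO-format timestamp
    log="correctness and timing checks completed",
    correctness=Correctness(max_relative_error=2.5e-3, max_absolute_error=1.2e-4),
    performance=Performance(latency_ms=0.42, reference_latency_ms=0.63, speedup_factor=1.5),
)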