Trace

pydantic model flashinfer_bench.data.Trace

Complete trace linking a solution to a definition with evaluation results.

A Trace represents the complete record of benchmarking a specific solution implementation against a specific computational workload definition. It includes the workload configuration and evaluation results.

Special case: A “workload trace” contains only definition and workload fields (with solution and evaluation set to None), representing a workload configuration without an actual benchmark execution.
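
A minimal construction sketch, assuming standard pydantic keyword initialization; the definition name, solution name, axis names, and input names are hypothetical placeholders, not values shipped with the library:

import uuid

from flashinfer_bench.data import (
    Environment,
    Evaluation,
    EvaluationStatus,
    RandomInput,
    Trace,
    Workload,
)

# Workload-only trace: just a definition name and a concrete workload,
# with solution and evaluation left at their None defaults.
workload_trace = Trace(
    definition="gemm_fp16",  # hypothetical Definition name
    workload=Workload(
        axes={"m": 1024, "n": 1024, "k": 4096},  # hypothetical axis names
        inputs={"A": RandomInput(), "B": RandomInput()},
        uuid=str(uuid.uuid4()),
    ),
)

# Full trace: the same workload plus the solution name and its evaluation.
# A PASSED evaluation would normally also carry correctness and performance
# metrics (see Evaluation below).
full_trace = Trace(
    definition="gemm_fp16",
    workload=workload_trace.workload,
    solution="gemm_fp16_triton_v1",  # hypothetical Solution name
    evaluation=Evaluation(
        status=EvaluationStatus.PASSED,
        environment=Environment(hardware="NVIDIA_H100", libs={"torch": "2.4.0"}),
        timestamp="2024-01-01T00:00:00+00:00",
    ),
)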

JSON schema:
{
   "title": "Trace",
   "description": "Complete trace linking a solution to a definition with evaluation results.\n\nA Trace represents the complete record of benchmarking a specific solution\nimplementation against a specific computational workload definition. It includes\nthe workload configuration and evaluation results.\n\nSpecial case: A \"workload trace\" contains only definition and workload fields\n(with solution and evaluation set to None), representing a workload configuration\nwithout an actual benchmark execution.",
   "type": "object",
   "properties": {
      "definition": {
         "description": "Name of the Definition that specifies the computational workload.",
         "minLength": 1,
         "title": "Definition",
         "type": "string"
      },
      "workload": {
         "$ref": "#/$defs/Workload",
         "description": "Concrete workload configuration with specific axis values and inputs."
      },
      "solution": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Name of the Solution implementation (None for workload-only traces).",
         "title": "Solution"
      },
      "evaluation": {
         "anyOf": [
            {
               "$ref": "#/$defs/Evaluation"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Evaluation results from benchmarking (None for workload-only traces)."
      }
   },
   "$defs": {
      "Correctness": {
         "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.",
         "properties": {
            "max_relative_error": {
               "default": 0.0,
               "description": "Maximum relative error observed across all output elements.",
               "title": "Max Relative Error",
               "type": "number"
            },
            "max_absolute_error": {
               "default": 0.0,
               "description": "Maximum absolute error observed across all output elements.",
               "title": "Max Absolute Error",
               "type": "number"
            },
            "extra": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Extra metrics for correctness evaluation.",
               "title": "Extra"
            }
         },
         "title": "Correctness",
         "type": "object"
      },
      "Environment": {
         "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.",
         "properties": {
            "hardware": {
               "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').",
               "minLength": 1,
               "title": "Hardware",
               "type": "string"
            },
            "libs": {
               "additionalProperties": {
                  "type": "string"
               },
               "description": "Dictionary of library names to version strings used during evaluation.",
               "title": "Libs",
               "type": "object"
            }
         },
         "required": [
            "hardware"
         ],
         "title": "Environment",
         "type": "object"
      },
      "Evaluation": {
         "description": "Complete evaluation result for a solution on a workload.\n\nRecords the full outcome of benchmarking a solution implementation\nagainst a specific workload, including status, metrics, and environment.",
         "properties": {
            "status": {
               "$ref": "#/$defs/EvaluationStatus",
               "description": "The overall evaluation status indicating success or failure mode."
            },
            "environment": {
               "$ref": "#/$defs/Environment",
               "description": "Environment details where the evaluation was performed."
            },
            "timestamp": {
               "description": "Timestamp when the evaluation was performed (ISO format recommended).",
               "minLength": 1,
               "title": "Timestamp",
               "type": "string"
            },
            "log": {
               "default": "",
               "description": "Captured stdout/stderr from the evaluation run.",
               "title": "Log",
               "type": "string"
            },
            "correctness": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Correctness"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status)."
            },
            "performance": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Performance"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Performance metrics (present only for PASSED status)."
            }
         },
         "required": [
            "status",
            "environment",
            "timestamp"
         ],
         "title": "Evaluation",
         "type": "object"
      },
      "EvaluationStatus": {
         "description": "Status codes for evaluation results.\n\nEnumeration of all possible outcomes when evaluating a solution\nagainst a workload, covering success and various failure modes.",
         "enum": [
            "PASSED",
            "INCORRECT_SHAPE",
            "INCORRECT_NUMERICAL",
            "INCORRECT_DTYPE",
            "RUNTIME_ERROR",
            "COMPILE_ERROR",
            "TIMEOUT"
         ],
         "title": "EvaluationStatus",
         "type": "string"
      },
      "Performance": {
         "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.",
         "properties": {
            "latency_ms": {
               "default": 0.0,
               "description": "Solution execution latency in milliseconds.",
               "minimum": 0.0,
               "title": "Latency Ms",
               "type": "number"
            },
            "reference_latency_ms": {
               "default": 0.0,
               "description": "Reference implementation latency in milliseconds for comparison.",
               "minimum": 0.0,
               "title": "Reference Latency Ms",
               "type": "number"
            },
            "speedup_factor": {
               "default": 0.0,
               "description": "Performance speedup factor compared to reference (reference_time / solution_time).",
               "minimum": 0.0,
               "title": "Speedup Factor",
               "type": "number"
            }
         },
         "title": "Performance",
         "type": "object"
      },
      "RandomInput": {
         "description": "Random input generation descriptor.\n\nRepresents a specification for generating random tensor input data\nduring workload execution and benchmarking.",
         "properties": {
            "type": {
               "const": "random",
               "default": "random",
               "title": "Type",
               "type": "string"
            }
         },
         "title": "RandomInput",
         "type": "object"
      },
      "SafetensorsInput": {
         "description": "Input specification for data loaded from safetensors files.\n\nRepresents tensor data that will be loaded from a safetensors file\nusing a specific tensor key within that file.",
         "properties": {
            "type": {
               "const": "safetensors",
               "default": "safetensors",
               "description": "The input type identifier for safetensors data.",
               "title": "Type",
               "type": "string"
            },
            "path": {
               "description": "Path to the safetensors file containing the tensor data.",
               "minLength": 1,
               "title": "Path",
               "type": "string"
            },
            "tensor_key": {
               "description": "Key identifier for the specific tensor within the safetensors file.",
               "minLength": 1,
               "title": "Tensor Key",
               "type": "string"
            }
         },
         "required": [
            "path",
            "tensor_key"
         ],
         "title": "SafetensorsInput",
         "type": "object"
      },
      "ScalarInput": {
         "description": "Scalar literal input specification.\n\nRepresents a scalar value (integer, float, or boolean) that will be\nused as a direct input parameter to the computational workload.",
         "properties": {
            "type": {
               "const": "scalar",
               "default": "scalar",
               "description": "The input type identifier for scalar values.",
               "title": "Type",
               "type": "string"
            },
            "value": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "number"
                  },
                  {
                     "type": "boolean"
                  }
               ],
               "description": "The scalar value to be used as input. Must be int, float, or bool.",
               "title": "Value"
            }
         },
         "required": [
            "value"
         ],
         "title": "ScalarInput",
         "type": "object"
      },
      "Workload": {
         "description": "Concrete workload configuration for benchmarking.\n\nDefines a specific instance of a computational workload with concrete\nvalues for all variable axes and specifications for all input data.\nThis represents an executable configuration that can be benchmarked.",
         "properties": {
            "axes": {
               "additionalProperties": {
                  "minimum": 0,
                  "type": "integer"
               },
               "description": "Dictionary mapping axis names to their concrete integer values. All values must be\npositive.",
               "title": "Axes",
               "type": "object"
            },
            "inputs": {
               "additionalProperties": {
                  "anyOf": [
                     {
                        "$ref": "#/$defs/RandomInput"
                     },
                     {
                        "$ref": "#/$defs/SafetensorsInput"
                     },
                     {
                        "$ref": "#/$defs/ScalarInput"
                     }
                  ]
               },
               "description": "Dictionary mapping input names to their data specifications.",
               "title": "Inputs",
               "type": "object"
            },
            "uuid": {
               "description": "Unique identifier for this specific workload configuration.",
               "minLength": 1,
               "title": "Uuid",
               "type": "string"
            }
         },
         "required": [
            "axes",
            "inputs",
            "uuid"
         ],
         "title": "Workload",
         "type": "object"
      }
   },
   "required": [
      "definition",
      "workload"
   ]
}

Fields:
  • definition (str)

  • workload (flashinfer_bench.data.trace.Workload)

  • solution (str | None)

  • evaluation (flashinfer_bench.data.trace.Evaluation | None)

field definition: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]

Name of the Definition that specifies the computational workload.

Constraints:
  • min_length = 1

field workload: Workload [Required]

Concrete workload configuration with specific axis values and inputs.

field solution: str | None = None

Name of the Solution implementation (None for workload-only traces).

field evaluation: Evaluation | None = None

Evaluation results from benchmarking (None for workload-only traces).

is_workload_trace() → bool

Check if this is a workload-only trace.

Returns:

True if this is a workload trace without solution/evaluation data.

Return type:

bool

is_successful() → bool

Check if the benchmark execution was successful.

Returns:

True if this is a regular trace with successful evaluation status. False for workload traces or failed evaluations.

Return type:

bool
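
For example, a hedged sketch of using these helpers to split a collection of traces (the list itself is assumed to come from elsewhere):

from typing import List

from flashinfer_bench.data import Trace

def split_traces(traces: List[Trace]):
    """Separate workload-only traces from traces with a successful benchmark run."""
    workload_only = [t for t in traces if t.is_workload_trace()]
    successful = [t for t in traces if t.is_successful()]
    return workload_only, successful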

pydantic model flashinfer_bench.data.RandomInput

Random input generation descriptor.

Represents a specification for generating random tensor input data during workload execution and benchmarking.

JSON schema:
{
   "title": "RandomInput",
   "description": "Random input generation descriptor.\n\nRepresents a specification for generating random tensor input data\nduring workload execution and benchmarking.",
   "type": "object",
   "properties": {
      "type": {
         "const": "random",
         "default": "random",
         "title": "Type",
         "type": "string"
      }
   }
}

Fields:
  • type (Literal['random'])

field type: Literal['random'] = 'random'

The input type identifier for random data generation.

pydantic model flashinfer_bench.data.ScalarInput

Scalar literal input specification.

Represents a scalar value (integer, float, or boolean) that will be used as a direct input parameter to the computational workload.

JSON schema:
{
   "title": "ScalarInput",
   "description": "Scalar literal input specification.\n\nRepresents a scalar value (integer, float, or boolean) that will be\nused as a direct input parameter to the computational workload.",
   "type": "object",
   "properties": {
      "type": {
         "const": "scalar",
         "default": "scalar",
         "description": "The input type identifier for scalar values.",
         "title": "Type",
         "type": "string"
      },
      "value": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "number"
            },
            {
               "type": "boolean"
            }
         ],
         "description": "The scalar value to be used as input. Must be int, float, or bool.",
         "title": "Value"
      }
   },
   "required": [
      "value"
   ]
}

Fields:
  • type (Literal['scalar'])

  • value (int | float | bool)

field type: Literal['scalar'] = 'scalar'

The input type identifier for scalar values.

field value: int | float | bool [Required]

The scalar value to be used as input. Must be int, float, or bool.

pydantic model flashinfer_bench.data.SafetensorsInput

Input specification for data loaded from safetensors files.

Represents tensor data that will be loaded from a safetensors file using a specific tensor key within that file.
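
The spec only records where the tensor lives. A hedged sketch of resolving it with the safetensors package follows; the file path and tensor key are hypothetical, and flashinfer_bench's own loading code may differ:

from safetensors import safe_open

from flashinfer_bench.data import SafetensorsInput

spec = SafetensorsInput(path="weights.safetensors", tensor_key="layer0.wq")

# Load only the referenced tensor from the file (PyTorch framework backend).
with safe_open(spec.path, framework="pt") as f:
    tensor = f.get_tensor(spec.tensor_key)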

JSON schema:
{
   "title": "SafetensorsInput",
   "description": "Input specification for data loaded from safetensors files.\n\nRepresents tensor data that will be loaded from a safetensors file\nusing a specific tensor key within that file.",
   "type": "object",
   "properties": {
      "type": {
         "const": "safetensors",
         "default": "safetensors",
         "description": "The input type identifier for safetensors data.",
         "title": "Type",
         "type": "string"
      },
      "path": {
         "description": "Path to the safetensors file containing the tensor data.",
         "minLength": 1,
         "title": "Path",
         "type": "string"
      },
      "tensor_key": {
         "description": "Key identifier for the specific tensor within the safetensors file.",
         "minLength": 1,
         "title": "Tensor Key",
         "type": "string"
      }
   },
   "required": [
      "path",
      "tensor_key"
   ]
}

Fields:
  • type (Literal['safetensors'])

  • path (str)

  • tensor_key (str)

field type: Literal['safetensors'] = 'safetensors'

The input type identifier for safetensors data.

field path: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]

Path to the safetensors file containing the tensor data.

Constraints:
  • min_length = 1

field tensor_key: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]

Key identifier for the specific tensor within the safetensors file.

Constraints:
  • min_length = 1

flashinfer_bench.data.InputSpec

Union type representing all possible input specification types.

alias of RandomInput | SafetensorsInput | ScalarInput
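
A sketch of a mixed inputs mapping, assuming standard pydantic construction; the input names, file path, and tensor key are hypothetical. The literal type field ('random', 'safetensors', 'scalar') is what distinguishes the variants when a serialized trace is parsed back:

from flashinfer_bench.data import RandomInput, SafetensorsInput, ScalarInput

inputs = {
    "query": RandomInput(),                 # random tensor generated at run time
    "weights": SafetensorsInput(
        path="weights.safetensors",         # hypothetical file
        tensor_key="layer0.wq",
    ),
    "causal": ScalarInput(value=True),      # scalar literal (int, float, or bool)
}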

pydantic model flashinfer_bench.data.Workload

Concrete workload configuration for benchmarking.

Defines a specific instance of a computational workload with concrete values for all variable axes and specifications for all input data. This represents an executable configuration that can be benchmarked.
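
A construction and round-trip sketch, assuming pydantic v2's model_dump_json / model_validate_json; the axis and input names are hypothetical:

import uuid

from flashinfer_bench.data import RandomInput, ScalarInput, Workload

wl = Workload(
    axes={"batch": 8, "seq_len": 2048, "head_dim": 128},
    inputs={"q": RandomInput(), "scale": ScalarInput(value=0.0883)},
    uuid=str(uuid.uuid4()),
)

# Serialize to JSON matching the schema below, then parse it back.
payload = wl.model_dump_json()
restored = Workload.model_validate_json(payload)
assert restored == wl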

JSON schema:
{
   "title": "Workload",
   "description": "Concrete workload configuration for benchmarking.\n\nDefines a specific instance of a computational workload with concrete\nvalues for all variable axes and specifications for all input data.\nThis represents an executable configuration that can be benchmarked.",
   "type": "object",
   "properties": {
      "axes": {
         "additionalProperties": {
            "minimum": 0,
            "type": "integer"
         },
         "description": "Dictionary mapping axis names to their concrete integer values. All values must be\npositive.",
         "title": "Axes",
         "type": "object"
      },
      "inputs": {
         "additionalProperties": {
            "anyOf": [
               {
                  "$ref": "#/$defs/RandomInput"
               },
               {
                  "$ref": "#/$defs/SafetensorsInput"
               },
               {
                  "$ref": "#/$defs/ScalarInput"
               }
            ]
         },
         "description": "Dictionary mapping input names to their data specifications.",
         "title": "Inputs",
         "type": "object"
      },
      "uuid": {
         "description": "Unique identifier for this specific workload configuration.",
         "minLength": 1,
         "title": "Uuid",
         "type": "string"
      }
   },
   "$defs": {
      "RandomInput": {
         "description": "Random input generation descriptor.\n\nRepresents a specification for generating random tensor input data\nduring workload execution and benchmarking.",
         "properties": {
            "type": {
               "const": "random",
               "default": "random",
               "title": "Type",
               "type": "string"
            }
         },
         "title": "RandomInput",
         "type": "object"
      },
      "SafetensorsInput": {
         "description": "Input specification for data loaded from safetensors files.\n\nRepresents tensor data that will be loaded from a safetensors file\nusing a specific tensor key within that file.",
         "properties": {
            "type": {
               "const": "safetensors",
               "default": "safetensors",
               "description": "The input type identifier for safetensors data.",
               "title": "Type",
               "type": "string"
            },
            "path": {
               "description": "Path to the safetensors file containing the tensor data.",
               "minLength": 1,
               "title": "Path",
               "type": "string"
            },
            "tensor_key": {
               "description": "Key identifier for the specific tensor within the safetensors file.",
               "minLength": 1,
               "title": "Tensor Key",
               "type": "string"
            }
         },
         "required": [
            "path",
            "tensor_key"
         ],
         "title": "SafetensorsInput",
         "type": "object"
      },
      "ScalarInput": {
         "description": "Scalar literal input specification.\n\nRepresents a scalar value (integer, float, or boolean) that will be\nused as a direct input parameter to the computational workload.",
         "properties": {
            "type": {
               "const": "scalar",
               "default": "scalar",
               "description": "The input type identifier for scalar values.",
               "title": "Type",
               "type": "string"
            },
            "value": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "number"
                  },
                  {
                     "type": "boolean"
                  }
               ],
               "description": "The scalar value to be used as input. Must be int, float, or bool.",
               "title": "Value"
            }
         },
         "required": [
            "value"
         ],
         "title": "ScalarInput",
         "type": "object"
      }
   },
   "required": [
      "axes",
      "inputs",
      "uuid"
   ]
}

Fields:
  • axes (Dict[str, int])

  • inputs (Dict[str, flashinfer_bench.data.trace.RandomInput | flashinfer_bench.data.trace.SafetensorsInput | flashinfer_bench.data.trace.ScalarInput])

  • uuid (str)

field axes: Dict[str, Annotated[int, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0)])]] [Required]

Dictionary mapping axis names to their concrete integer values. All values must be positive.

field inputs: Dict[str, RandomInput | SafetensorsInput | ScalarInput] [Required]

Dictionary mapping input names to their data specifications.

field uuid: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]

Unique identifier for this specific workload configuration.

Constraints:
  • min_length = 1

pydantic model flashinfer_bench.data.Correctness

Correctness metrics from numerical evaluation.

Contains error measurements comparing the solution output against a reference implementation to assess numerical accuracy.
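
One common way to populate these fields, sketched with NumPy; the library's own comparison (tolerances, zero-guarding) may differ:

import numpy as np

from flashinfer_bench.data import Correctness

def measure_correctness(out: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> Correctness:
    abs_err = np.abs(out - ref)
    rel_err = abs_err / np.maximum(np.abs(ref), eps)  # guard against division by zero
    return Correctness(
        max_absolute_error=float(abs_err.max()),
        max_relative_error=float(rel_err.max()),
        extra={"mean_absolute_error": float(abs_err.mean())},
    )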

JSON schema:
{
   "title": "Correctness",
   "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.",
   "type": "object",
   "properties": {
      "max_relative_error": {
         "default": 0.0,
         "description": "Maximum relative error observed across all output elements.",
         "title": "Max Relative Error",
         "type": "number"
      },
      "max_absolute_error": {
         "default": 0.0,
         "description": "Maximum absolute error observed across all output elements.",
         "title": "Max Absolute Error",
         "type": "number"
      },
      "extra": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Extra metrics for correctness evaluation.",
         "title": "Extra"
      }
   }
}

Fields:
  • max_relative_error (float)

  • max_absolute_error (float)

  • extra (Dict[str, Any] | None)

field max_relative_error: float = 0.0

Maximum relative error observed across all output elements.

field max_absolute_error: float = 0.0

Maximum absolute error observed across all output elements.

field extra: Dict[str, Any] | None = None

Extra metrics for correctness evaluation.

pydantic model flashinfer_bench.data.Performance

Performance metrics from timing evaluation.

Contains timing measurements and performance comparisons from benchmarking the solution against reference implementations.
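
A sketch with hypothetical timings; speedup_factor follows the documented convention reference_time / solution_time:

from flashinfer_bench.data import Performance

solution_ms, reference_ms = 0.42, 1.05  # hypothetical measurements
perf = Performance(
    latency_ms=solution_ms,
    reference_latency_ms=reference_ms,
    speedup_factor=reference_ms / solution_ms,  # reference_time / solution_time
)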

JSON schema:
{
   "title": "Performance",
   "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.",
   "type": "object",
   "properties": {
      "latency_ms": {
         "default": 0.0,
         "description": "Solution execution latency in milliseconds.",
         "minimum": 0.0,
         "title": "Latency Ms",
         "type": "number"
      },
      "reference_latency_ms": {
         "default": 0.0,
         "description": "Reference implementation latency in milliseconds for comparison.",
         "minimum": 0.0,
         "title": "Reference Latency Ms",
         "type": "number"
      },
      "speedup_factor": {
         "default": 0.0,
         "description": "Performance speedup factor compared to reference (reference_time / solution_time).",
         "minimum": 0.0,
         "title": "Speedup Factor",
         "type": "number"
      }
   }
}

Fields:
  • latency_ms (float)

  • reference_latency_ms (float)

  • speedup_factor (float)

field latency_ms: float = 0.0

Solution execution latency in milliseconds.

Constraints:
  • ge = 0.0

field reference_latency_ms: float = 0.0

Reference implementation latency in milliseconds for comparison.

Constraints:
  • ge = 0.0

field speedup_factor: float = 0.0

Performance speedup factor compared to reference (reference_time / solution_time).

Constraints:
  • ge = 0.0

pydantic model flashinfer_bench.data.Environment

Environment information from evaluation execution.

Records the hardware and software environment details from when the evaluation was performed, enabling reproducibility analysis.
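
A sketch that records library versions with importlib.metadata; the hardware identifier and library list are illustrative:

import importlib.metadata

from flashinfer_bench.data import Environment

libs = {name: importlib.metadata.version(name) for name in ("torch", "numpy")}
env = Environment(hardware="NVIDIA_H100", libs=libs)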

JSON schema:
{
   "title": "Environment",
   "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.",
   "type": "object",
   "properties": {
      "hardware": {
         "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').",
         "minLength": 1,
         "title": "Hardware",
         "type": "string"
      },
      "libs": {
         "additionalProperties": {
            "type": "string"
         },
         "description": "Dictionary of library names to version strings used during evaluation.",
         "title": "Libs",
         "type": "object"
      }
   },
   "required": [
      "hardware"
   ]
}

Fields:
  • hardware (str)

  • libs (Dict[str, str])

field hardware: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]

Hardware identifier where the evaluation was performed (e.g., ‘NVIDIA_H100’).

Constraints:
  • min_length = 1

field libs: Dict[str, str] [Optional]

Dictionary of library names to version strings used during evaluation.

class flashinfer_bench.data.EvaluationStatus

Status codes for evaluation results.

Enumeration of all possible outcomes when evaluating a solution against a workload, covering success and various failure modes.

PASSED = 'PASSED'

Evaluation completed successfully with correct results.

INCORRECT_SHAPE = 'INCORRECT_SHAPE'

Solution produced output with incorrect tensor shape.

INCORRECT_NUMERICAL = 'INCORRECT_NUMERICAL'

Solution produced numerically incorrect results.

INCORRECT_DTYPE = 'INCORRECT_DTYPE'

Solution produced output with incorrect data type.

RUNTIME_ERROR = 'RUNTIME_ERROR'

Solution encountered a runtime error during execution.

COMPILE_ERROR = 'COMPILE_ERROR'

Solution failed to compile or build successfully.

TIMEOUT = 'TIMEOUT'

Evaluation did not complete within the configured timeout.
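
A small sketch of branching on the status; only PASSED indicates a fully successful run, and the grouping below is illustrative rather than an API guarantee:

from flashinfer_bench.data import EvaluationStatus

def summarize(status: EvaluationStatus) -> str:
    if status is EvaluationStatus.PASSED:
        return "correct and timed"
    if status in (
        EvaluationStatus.INCORRECT_SHAPE,
        EvaluationStatus.INCORRECT_NUMERICAL,
        EvaluationStatus.INCORRECT_DTYPE,
    ):
        return "ran, but produced wrong output"
    return "did not produce usable output"  # RUNTIME_ERROR, COMPILE_ERROR, TIMEOUT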

pydantic model flashinfer_bench.data.Evaluation

Complete evaluation result for a solution on a workload.

Records the full outcome of benchmarking a solution implementation against a specific workload, including status, metrics, and environment.
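
A construction sketch for a passing evaluation, assuming standard pydantic keyword initialization; all metric values, versions, and the hardware identifier are hypothetical:

from datetime import datetime, timezone

from flashinfer_bench.data import (
    Correctness,
    Environment,
    Evaluation,
    EvaluationStatus,
    Performance,
)

evaluation = Evaluation(
    status=EvaluationStatus.PASSED,
    environment=Environment(hardware="NVIDIA_H100", libs={"torch": "2.4.0"}),
    timestamp=datetime.now(timezone.utc).isoformat(),  # ISO format, as recommended
    log="",                                            # captured stdout/stderr
    correctness=Correctness(max_relative_error=1.2e-3, max_absolute_error=3.4e-4),
    performance=Performance(
        latency_ms=0.42,
        reference_latency_ms=1.05,
        speedup_factor=1.05 / 0.42,
    ),
)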

JSON schema:
{
   "title": "Evaluation",
   "description": "Complete evaluation result for a solution on a workload.\n\nRecords the full outcome of benchmarking a solution implementation\nagainst a specific workload, including status, metrics, and environment.",
   "type": "object",
   "properties": {
      "status": {
         "$ref": "#/$defs/EvaluationStatus",
         "description": "The overall evaluation status indicating success or failure mode."
      },
      "environment": {
         "$ref": "#/$defs/Environment",
         "description": "Environment details where the evaluation was performed."
      },
      "timestamp": {
         "description": "Timestamp when the evaluation was performed (ISO format recommended).",
         "minLength": 1,
         "title": "Timestamp",
         "type": "string"
      },
      "log": {
         "default": "",
         "description": "Captured stdout/stderr from the evaluation run.",
         "title": "Log",
         "type": "string"
      },
      "correctness": {
         "anyOf": [
            {
               "$ref": "#/$defs/Correctness"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status)."
      },
      "performance": {
         "anyOf": [
            {
               "$ref": "#/$defs/Performance"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Performance metrics (present only for PASSED status)."
      }
   },
   "$defs": {
      "Correctness": {
         "description": "Correctness metrics from numerical evaluation.\n\nContains error measurements comparing the solution output against\na reference implementation to assess numerical accuracy.",
         "properties": {
            "max_relative_error": {
               "default": 0.0,
               "description": "Maximum relative error observed across all output elements.",
               "title": "Max Relative Error",
               "type": "number"
            },
            "max_absolute_error": {
               "default": 0.0,
               "description": "Maximum absolute error observed across all output elements.",
               "title": "Max Absolute Error",
               "type": "number"
            },
            "extra": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Extra metrics for correctness evaluation.",
               "title": "Extra"
            }
         },
         "title": "Correctness",
         "type": "object"
      },
      "Environment": {
         "description": "Environment information from evaluation execution.\n\nRecords the hardware and software environment details from when\nthe evaluation was performed, enabling reproducibility analysis.",
         "properties": {
            "hardware": {
               "description": "Hardware identifier where the evaluation was performed (e.g., 'NVIDIA_H100').",
               "minLength": 1,
               "title": "Hardware",
               "type": "string"
            },
            "libs": {
               "additionalProperties": {
                  "type": "string"
               },
               "description": "Dictionary of library names to version strings used during evaluation.",
               "title": "Libs",
               "type": "object"
            }
         },
         "required": [
            "hardware"
         ],
         "title": "Environment",
         "type": "object"
      },
      "EvaluationStatus": {
         "description": "Status codes for evaluation results.\n\nEnumeration of all possible outcomes when evaluating a solution\nagainst a workload, covering success and various failure modes.",
         "enum": [
            "PASSED",
            "INCORRECT_SHAPE",
            "INCORRECT_NUMERICAL",
            "INCORRECT_DTYPE",
            "RUNTIME_ERROR",
            "COMPILE_ERROR",
            "TIMEOUT"
         ],
         "title": "EvaluationStatus",
         "type": "string"
      },
      "Performance": {
         "description": "Performance metrics from timing evaluation.\n\nContains timing measurements and performance comparisons from\nbenchmarking the solution against reference implementations.",
         "properties": {
            "latency_ms": {
               "default": 0.0,
               "description": "Solution execution latency in milliseconds.",
               "minimum": 0.0,
               "title": "Latency Ms",
               "type": "number"
            },
            "reference_latency_ms": {
               "default": 0.0,
               "description": "Reference implementation latency in milliseconds for comparison.",
               "minimum": 0.0,
               "title": "Reference Latency Ms",
               "type": "number"
            },
            "speedup_factor": {
               "default": 0.0,
               "description": "Performance speedup factor compared to reference (reference_time / solution_time).",
               "minimum": 0.0,
               "title": "Speedup Factor",
               "type": "number"
            }
         },
         "title": "Performance",
         "type": "object"
      }
   },
   "required": [
      "status",
      "environment",
      "timestamp"
   ]
}

Fields:
  • status (flashinfer_bench.data.trace.EvaluationStatus)

  • environment (flashinfer_bench.data.trace.Environment)

  • timestamp (str)

  • log (str)

  • correctness (flashinfer_bench.data.trace.Correctness | None)

  • performance (flashinfer_bench.data.trace.Performance | None)

field status: EvaluationStatus [Required]

The overall evaluation status indicating success or failure mode.

field environment: Environment [Required]

Environment details where the evaluation was performed.

field timestamp: Annotated[str, FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])] [Required]

Timestamp when the evaluation was performed (ISO format recommended).

Constraints:
  • min_length = 1

field log: str = ''

Captured stdout/stderr from the evaluation run.

field correctness: Correctness | None = None

Correctness metrics (present for PASSED and INCORRECT_NUMERICAL status).

field performance: Performance | None = None

Performance metrics (present only for PASSED status).