flashinfer_bench.compile¶
flashinfer_bench.compile provides infrastructure for building solutions into executable runnables.
The typical workflow is:
Get the singleton registry:
registry = BuilderRegistry.get_instance()
Build a solution:
runnable = registry.build(definition, solution)
Execute:
result = runnable(**inputs)
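Putting the steps together, a minimal sketch assuming definition, solution, and inputs are already loaded:
from flashinfer_bench.compile import BuilderRegistry

registry = BuilderRegistry.get_instance()
runnable = registry.build(definition, solution)  # cached after the first build
result = runnable(**inputs)                      # value-returning call
registry.cleanup()                               # release cached runnables when done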
Registry¶
- class flashinfer_bench.compile.BuilderRegistry¶
Central registry for managing and dispatching builders.
The BuilderRegistry maintains a list of available builders and automatically selects the appropriate one for each solution. It also provides caching to avoid redundant builds of the same solution.
This class follows the singleton pattern - use get_instance() to obtain the shared registry instance.
- __init__(builders: List[Builder]) None¶
Initialize the registry with a list of builders.
- Parameters:
builders (List[Builder]) – List of builder instances to use. Must contain at least one builder.
- Raises:
ValueError – If the builders list is empty.
- Return type:
None
- classmethod get_instance() BuilderRegistry¶
Get the singleton registry instance.
On first call, this method initializes the registry by instantiating all available builders (those whose is_available() returns True) in priority order. Subsequent calls return the same instance.
The following builders are available (high to low priority):
TritonBuilder: Build Triton solutions.
TileLangBuilder: Build TileLang solutions.
PythonBuilder: Build Python solutions.
TVMFFIBuilder: Build CUDA/C++ solutions using TVM-FFI backend.
TorchBuilder: Build CUDA/C++ solutions using PyTorch extension system.
- Returns:
The shared registry instance.
- Return type:
BuilderRegistry
- build(definition: Definition, solution: Solution) Runnable¶
Build a solution into a runnable, using cache if available.
This method first checks if the solution has already been built (by comparing its hash). If not, it tries each registered builder in priority order until one reports it can build the solution. The resulting runnable is cached for future use.
This method is process-safe: concurrent builds of the same solution are serialized using file locks.
- Parameters:
definition (Definition) – The problem definition specifying the expected interface.
solution (Solution) – The solution to build.
- Returns:
An executable wrapper around the built solution.
- Return type:
Runnable
- Raises:
BuildError – If no registered builder can build this solution, or if the build fails.
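For example, a hedged sketch of handling build failures; the import path for BuildError is an assumption, so adjust it to the actual package layout:
from flashinfer_bench.compile import BuilderRegistry, BuildError  # BuildError location assumed

registry = BuilderRegistry.get_instance()
try:
    runnable = registry.build(definition, solution)
except BuildError as err:
    print(f"build failed: {err}")  # no builder matched, or compilation failed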
- build_reference(definition: Definition) Runnable¶
Build the reference implementation for a definition.
This is a convenience method that creates a pseudo-solution from the definition’s reference code and builds it using the standard build() method.
- Parameters:
definition (Definition) – The definition containing the reference implementation.
- Returns:
An executable wrapper around the reference implementation.
- Return type:
Runnable
- Raises:
BuildError – If the reference implementation cannot be built.
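A short sketch of checking a candidate solution against the reference, assuming definition, solution, and inputs are already loaded and the kernel produces a single tensor (torch.allclose is used purely for illustration):
import torch

reference = registry.build_reference(definition)
candidate = registry.build(definition, solution)
expected = reference(**inputs)
actual = candidate(**inputs)
assert torch.allclose(actual, expected)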
- cleanup() None¶
Clean up all built runnables and clear the cache.
This method calls cleanup() on all cached runnables to release resources, then clears the cache. Cleanup errors are caught and ignored to ensure all runnables are cleaned up.
- Return type:
None
Builder¶
- class flashinfer_bench.compile.Builder¶
Abstract base class for building solutions into runnable implementations.
A Builder transforms a (Definition, Solution) pair into a Runnable object, which is an executable implementation of the solution. Different builders handle different programming languages (e.g., Python, CUDA, Triton) and build systems.
Subclasses must implement all abstract methods. A concrete builder is expected to operate in the directory FIB_CACHE_PATH / builder_specific_subfolder / package_name, where package_name is a unique name derived from the solution.
- __init__(package_prefix: str, build_dir_name: str) None¶
Initialize the builder.
- Parameters:
package_prefix (str) – The prefix to prepend to the package name. This should be unique for each builder type.
build_dir_name (str) – The name of the build subdirectory of the concrete builder. This should be unique for each builder type.
- Return type:
None
- abstract static is_available() bool¶
Check if this builder is available in the current environment.
Override this method in subclasses to check for specific dependencies (e.g., CUDA, Triton, TVM). The default implementation returns True.
- Returns:
True if the builder can be used, False otherwise.
- Return type:
bool
- abstract can_build(solution: Solution) bool¶
Check if this builder can handle the given solution.
- Parameters:
solution (Solution) – The solution to check.
- Returns:
True if this builder can build the solution, False otherwise.
- Return type:
bool
- abstract build(definition: Definition, solution: Solution) Runnable¶
Build a solution into a runnable implementation.
This method compiles/loads the solution’s source code and returns a Runnable object that can be executed with the interface specified by the definition.
- Parameters:
definition (Definition) – The problem definition that specifies the expected interface.
solution (Solution) – The solution implementation to build.
- Returns:
An executable wrapper around the built implementation.
- Return type:
Runnable
- Raises:
BuildError – If the build fails for any reason (compilation errors, missing dependencies, etc.).
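As an illustration, a minimal sketch of a hypothetical custom builder; the Solution field language and the .name attributes on Definition and Solution are assumptions, so check the actual schemas before adapting this:
from flashinfer_bench.compile import Builder, Runnable, RunnableMetadata

class EchoBuilder(Builder):
    # Hypothetical builder that wraps pure-Python solutions without compiling anything.

    def __init__(self) -> None:
        super().__init__(package_prefix="echo_", build_dir_name="echo")

    @staticmethod
    def is_available() -> bool:
        return True  # no external dependencies required

    def can_build(self, solution) -> bool:
        return getattr(solution, "language", None) == "python"  # field name assumed

    def build(self, definition, solution) -> Runnable:
        def fn(**kwargs):
            raise NotImplementedError("load and call the solution's entry point here")

        metadata = RunnableMetadata(
            build_type="python",
            definition_name=definition.name,   # attribute name assumed
            solution_name=solution.name,       # attribute name assumed
            destination_passing_style=False,
            definition=definition,
        )
        return Runnable(callable=fn, metadata=metadata)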
Runnable¶
- class flashinfer_bench.compile.Runnable¶
An executable wrapper around a compiled solution.
A Runnable encapsulates a callable function along with metadata about how it was built and a cleanup function to release resources. It provides a uniform interface for executing solutions regardless of the build system or language used.
- __init__(callable: Callable[[...], Any], metadata: RunnableMetadata, cleaner: Callable[[], None] | None = None) None¶
Constructor for the Runnable class.
- Parameters:
callable (Callable[..., Any]) – The callable that is wrapped by the runnable.
metadata (RunnableMetadata) – The metadata for the runnable.
cleaner (Optional[Callable[[], None]]) – The cleaner function for the runnable. It will clean up the build artifacts/resources.
- Return type:
None
- metadata: RunnableMetadata¶
Metadata about the build process and source solution.
- call_kwargs(**kwargs: Any) Any¶
Call the runnable with keyword arguments.
This method calls the underlying compiled function with the provided inputs. If the function returns a single-element tuple, it is automatically unpacked to its single element for convenience.
- Parameters:
kwargs (Any)
- Return type:
Any
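For instance, assuming the definition declares inputs named x and w and a single output:
out = runnable.call_kwargs(x=x, w=w)  # a single-element output tuple is unpacked automatically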
- call_destination_passing(*args: Any) None¶
Call the callable in destination-passing style (DPS). If the callable is already in DPS style, this method calls it directly. If the callable is in value-returning style, this method converts it to DPS style and calls it.
- Parameters:
args (Any) – Positional arguments for the underlying function. Includes input tensors and output tensors.
- Return type:
None
- call_value_returning(*args: Any) Any¶
Call a destination-passing style (DPS) function in value-returning style.
Some solutions use the destination-passing style, where output tensors are passed as arguments and the function modifies them in place:
function(**input_tensors, **output_tensors) -> None
This method provides a value-returning interface by automatically allocating output tensors based on the definition, calling the DPS function, and returning the outputs:
result = runnable.call_value_returning(*input_tensors)  # -> output_tensors
- Parameters:
args (Any) – Positional arguments for input tensors matching the definition's input specification.
- Returns:
The output tensor(s). Single outputs are returned as-is, multiple outputs are returned as a tuple, and empty outputs return None.
- Return type:
Any
- Raises:
ValueError – If the metadata does not contain the full definition object needed for output tensor allocation.
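A hedged sketch contrasting the two calling styles for a DPS-built runnable, assuming PyTorch tensors and a single output shaped like the input (value-returning calls also require metadata.definition to be set):
import torch

x = torch.randn(1024, device="cuda")

# Destination-passing style: the caller allocates the output buffer.
y = torch.empty_like(x)
runnable.call_destination_passing(x, y)

# Value-returning style: outputs are allocated from the definition's output spec.
y = runnable.call_value_returning(x)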
- cleanup() None¶
Clean up build artifacts and release resources.
This method calls the cleaner function if one was provided during construction. It is idempotent: calling it multiple times is safe and has no additional effect after the first call.
- Return type:
None
- class flashinfer_bench.compile.RunnableMetadata¶
Metadata about a runnable implementation.
This class stores information about how a runnable was built, including the builder type, source definition/solution, and additional builder-specific data.
- field build_type: Literal['torch', 'tvm_ffi', 'python', 'triton'] | str [Required]¶
The type of build that produced this runnable (e.g., ‘python’, ‘torch’, ‘triton’, ‘tvm_ffi’).
- field definition_name: str [Required]¶
Name of the definition that specifies the expected interface.
- field solution_name: str [Required]¶
Name of the solution that was compiled into this runnable.
- field destination_passing_style: bool = True¶
Whether the runnable uses destination-passing style.
- field definition: Definition | None = None¶
The full definition that was used to build the runnable. It does not need to be set in general, but it is required when calling in keyword-argument style or value-returning style.
- field misc: Dict[str, Any] [Optional]¶
Miscellaneous metadata about the runnable. Contents vary by builder type.
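For example, the metadata can be inspected after a build to choose a calling convention (x and out are hypothetical tensors):
runnable = registry.build(definition, solution)
if runnable.metadata.destination_passing_style:
    runnable.call_destination_passing(x, out)
else:
    out = runnable.call_kwargs(x=x)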
Concrete Builders¶
- class flashinfer_bench.compile.builders.PythonBuilder¶
Bases: Builder
Builder for Python solutions.
This builder loads Python source files into a temporary module and returns a callable that can be executed. The sources are written to a cache directory and imported as a Python package.
- __init__() None¶
Initialize the builder.
- Return type:
None
- static is_available() bool¶
Check if Python is available in the current environment.
- Return type:
bool
- can_build(solution: Solution) bool¶
Check if this builder can handle the given solution.
- Parameters:
solution (Solution)
- Return type:
bool
- build(definition: Definition, solution: Solution) Runnable¶
Build a Python solution into a runnable.
This method writes the solution sources to a temporary directory, imports the module, and extracts the entry point function.
- Parameters:
definition (Definition) – The problem definition.
solution (Solution) – The Python solution to build.
- Returns:
An executable wrapper around the Python function.
- Return type:
Runnable
- Raises:
BuildError – If the entry file is not a Python file, the module import fails, or the entry symbol is not found or not callable.
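A minimal sketch of using the builder directly, outside the registry, assuming definition and solution describe a pure-Python kernel and inputs match the definition:
from flashinfer_bench.compile.builders import PythonBuilder

builder = PythonBuilder()
if builder.can_build(solution):
    runnable = builder.build(definition, solution)
    result = runnable(**inputs)
    runnable.cleanup()  # release build artifacts and resources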
- class flashinfer_bench.compile.builders.TritonBuilder¶
Bases: PythonBuilder
Builder for Triton solutions.
This builder extends PythonBuilder to handle Triton GPU kernels. Triton code is Python-based, so the build process is similar to PythonBuilder, with the main difference being the language tag in metadata.
- __init__() None¶
Initialize the builder.
- Return type:
None
- static is_available() bool¶
Check if Triton is available in the current environment.
- Returns:
True if Triton is installed, False otherwise.
- Return type:
bool
- can_build(solution: Solution) bool¶
Check if this builder can build the given solution. The solution should be Triton source code.
- Parameters:
solution (Solution) – Solution to check
- Returns:
True if solution language is Triton
- Return type:
bool
- build(definition: Definition, solution: Solution) Runnable¶
Build a Triton solution into a runnable.
This method delegates to PythonBuilder.build() and updates the build_type in metadata to ‘triton’.
- Parameters:
definition (Definition) – The problem definition.
solution (Solution) – The Triton solution to build.
- Returns:
An executable wrapper around the Triton kernel.
- Return type:
Runnable
- class flashinfer_bench.compile.builders.TVMFFIBuilder¶
Bases: Builder
Builder using TVM-FFI with automatic caching; it supports multi-process and multi-threaded compilation. The result is framework agnostic and supports DLPack interop with PyTorch, JAX, etc.
Cache logic: if the builder is asked to build the same solution again, it returns the cached result. If another builder instance requests the same solution and the build directory already exists, it also returns the cached result.
The solution to compile should be written in destination-passing style, i.e. the function should take the input tensors and the output tensors as arguments.
Examples
>>> builder = TVMFFIBuilder()
>>> runnable = builder.build(definition, solution)
>>> output = runnable(x=input_tensor)  # Allocates and returns output
>>> runnable.call_destination_passing(input_tensor, output_tensor)  # Destination-passing style
- __init__() None¶
Initialize the TVMFFIBuilder.
- Return type:
None
- static is_available() bool¶
Check if TVM-FFI is available in the current environment.
- Return type:
bool
- can_build(solution: Solution) bool¶
Check if this builder can build the given solution. The solution should be CUDA or C++ source code with TVM-FFI binding (or no binding specified, which defaults to TVM-FFI).
- Parameters:
solution (Solution) – Solution to check
- Returns:
True if solution language is CUDA or C++ and binding is TVM-FFI or None
- Return type:
bool
- build(definition: Definition, solution: Solution) Runnable¶
Build with automatic caching - compile once, load from cache afterwards.
This method implements intelligent caching:
1. Checks if a compiled .so file already exists
2. If not, writes source files and compiles them
3. Loads the compiled module (from cache or fresh build)
4. Returns a runnable wrapper
The caching is multi-process safe, enabling efficient parallel benchmarking.
- Parameters:
definition (Definition) – Problem definition specifying inputs/outputs
solution (Solution) – Solution containing source code and build specification
- Returns:
A runnable wrapper around the compiled TVM-FFI module that supports both value-returning style (via __call__) and destination-passing style (via call_destination_passing).
- Return type:
Runnable
- Raises:
BuildError – If compilation fails, module loading fails, or entry point is invalid
- class flashinfer_bench.compile.builders.TorchBuilder¶
Bases: Builder
Builder for CUDA solutions using PyTorch’s C++/CUDA extension loader.
This builder compiles C++/CUDA source files into a Python extension module using torch.utils.cpp_extension.load(). It supports common CUDA dependencies like cuBLAS, cuDNN, and CUTLASS.
- __init__() None¶
Initialize the TorchBuilder and discover available CUDA dependencies.
- Return type:
None
- static is_available() bool¶
Check if CUDA is available in the current environment.
- Returns:
True if PyTorch is installed and CUDA is available, False otherwise.
- Return type:
bool
- can_build(solution: Solution) bool¶
Check if this builder can handle the given solution. The solution should be CUDA or C++ source code with torch binding.
- Parameters:
solution (Solution) – Solution to check
- Returns:
True if solution language is CUDA or C++ and binding is torch
- Return type:
bool
- build(definition: Definition, solution: Solution) Runnable¶
Build a CUDA solution into a runnable.
This method writes the solution sources to a build directory, compiles them using PyTorch’s cpp_extension.load(), and returns a callable wrapper.
- Parameters:
definition (Definition) – The problem definition.
solution (Solution) – The CUDA solution to build.
- Returns:
An executable wrapper around the compiled extension.
- Return type:
Runnable
- Raises:
BuildError – If the entry file is not a C/C++/CUDA file, compilation fails, or the entry symbol is not found in the compiled extension.
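A hedged sketch of using TorchBuilder directly, assuming a CUDA solution with torch binding and already-loaded definition, solution, and inputs:
from flashinfer_bench.compile.builders import TorchBuilder

if TorchBuilder.is_available():               # requires PyTorch with CUDA
    builder = TorchBuilder()
    if builder.can_build(solution):           # CUDA/C++ sources with torch binding
        runnable = builder.build(definition, solution)
        out = runnable(**inputs)
        runnable.cleanup()                    # release the loaded extension's resources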