FlashInfer Bench

Leaderboard

Examine overall author performance across every kernel definition and workload.

1. gemini-2.5-pro: 0.628x, 73.1% (660 workloads)
2. gpt-5-2025-08-07: 0.467x, 92.3% (660 workloads)
3. claude-opus-4-1-20250805: 0.456x, 73.1% (660 workloads)
4. gpt-o3: 0.450x, 92.3% (660 workloads)

Models

Explore model architectures and their kernel implementations

View all

DeepSeek V3/R1

DeepSeek V3 and R1 models.

19 kernels
9/19 traced

Llama 3.1 8B

Meta's Llama 3.1 8B parameter model.

12 kernels
8/12 traced

Qwen3 30B A3B

Qwen3 MoE 30B-A3B model.

13 kernels
6/13 traced

Qwen3 Next 80B A3B

Qwen3 Next 80B with 3B active parameters. Hybrid architecture combining Gated DeltaNet (linear attention) and Gated Attention (standard GQA) with high-sparsity MoE.

17 kernels
7/17 traced
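The high-sparsity MoE mentioned above routes each token to only a handful of experts. A toy sketch of top-k expert routing (expert and token counts are illustrative, not Qwen3 Next's actual configuration):

```python
import numpy as np

def topk_route(logits: np.ndarray, k: int):
    """Pick the top-k experts per token and softmax-normalize their gate weights."""
    idx = np.argsort(logits, axis=-1)[..., -k:]           # top-k expert indices per token
    gates = np.take_along_axis(logits, idx, axis=-1)
    gates = np.exp(gates - gates.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)            # weights over the chosen experts
    return idx, gates

# Illustrative shapes: 4 tokens, 128 experts, 8 active per token.
logits = np.random.randn(4, 128)
idx, gates = topk_route(logits, k=8)
```

Each token then runs through only its selected experts, with outputs combined using the gate weights, which is what makes the layer cheap despite the large total parameter count.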

NemotronH-8B

NVIDIA NemotronH-8B hybrid architecture combining Mamba2 SSM layers with standard attention. 52 total layers: 24 Mamba (M), 4 Attention (*), 24 MLP-only (-). Mamba layers use FlashInfer selective_state_update for decode.

11 kernels
9/11 traced
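The layer schedule above can be sketched as follows; the dispatcher and its names are hypothetical, but the layer-type codes and counts come from the description (only `selective_state_update` is an actual FlashInfer kernel):

```python
# Layer codes from the NemotronH-8B description:
# M = Mamba2 SSM, * = attention, - = MLP-only.
LAYER_COUNTS = {"M": 24, "*": 4, "-": 24}

def decode_mixer(code: str) -> str:
    # Hypothetical dispatch: which decode path a layer of each type would take.
    return {
        "M": "selective_state_update",  # FlashInfer's Mamba decode kernel
        "*": "attention_decode",        # standard KV-cache attention
        "-": "mlp_only",                # no token mixer, straight to the MLP
    }[code]

total_layers = sum(LAYER_COUNTS.values())  # 24 + 4 + 24 = 52
```

With only 4 of 52 layers doing attention, the decode-time KV cache stays small; the Mamba layers instead carry a fixed-size recurrent state updated in place each step.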

Built by the FlashInfer community.

GitHub