FlashInfer Bench

Models

Explore model architectures and their kernel implementations

DeepSeek V3/R1

DeepSeek V3 and R1 models.

19 kernels
9/19 traced

Llama 3.1 8B

Meta's Llama 3.1 model with 8B parameters.

12 kernels
8/12 traced

Qwen3 30B A3B

Qwen3 mixture-of-experts (MoE) model with 30B total and 3B active parameters.

13 kernels
6/13 traced

Qwen3 Next 80B A3B

Qwen3 Next model with 80B total and 3B active parameters. Hybrid architecture combining Gated DeltaNet (linear attention) and Gated Attention (standard GQA) with a high-sparsity MoE.

17 kernels
7/17 traced

Built by the FlashInfer community.
