DeepSeek Sparse Attention (DSA) top-k indexer with FP8 quantization for DeepSeek-V3.2. Computes sparse-attention index scores using a ReLU activation and learned per-head weights, then selects the top-k KV cache indices. Formula: score = sum(relu(q @ K.T) * weights), summed over index heads. Matches the SGLang/deep_gemm implementation. Page size 64 variant.
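The scoring formula can be sketched in NumPy for a single query token; the shapes here are illustrative and ignore the paged fp8 cache layout described below:

```python
import numpy as np

def index_scores(q, k, weights):
    """Per-token index scores: weighted sum over heads of relu(q @ k^T).

    q:       [num_index_heads, index_head_dim]  (one query token)
    k:       [seq_len, index_head_dim]          (keys, shared across index heads)
    weights: [num_index_heads]                  (learned per-head weights)
    returns: [seq_len] scores used for top-k selection
    """
    logits = np.maximum(q @ k.T, 0.0)            # relu(q @ K.T) -> [heads, seq_len]
    return (weights[:, None] * logits).sum(0)    # weighted sum over heads

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8)).astype(np.float32)
k = rng.standard_normal((16, 8)).astype(np.float32)
w = rng.standard_normal(4).astype(np.float32)
scores = index_scores(q, k, w)
top2 = np.argsort(-scores)[:2]                   # indices of the 2 highest scores
```

The real kernel applies this per batch element over the dequantized paged K index cache, then keeps the `topk` highest-scoring token positions.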
Axes
| Name | Value |
|---|---|
| batch_size | var |
| num_index_heads | 64 |
| index_head_dim | 128 |
| page_size | 64 |
| topk | 2048 |
| max_num_pages | var |
| num_pages | var |
| kv_cache_num_heads | 1 |
| head_dim_with_scale | 132 |
Signature
Inputs
| Name | Type | Shape |
|---|---|---|
| q_index_fp8 | float8_e4m3fn | [batch_size, num_index_heads, index_head_dim] |
| k_index_cache_fp8 | int8 | [num_pages, page_size, kv_cache_num_heads, head_dim_with_scale] |
| weights | float32 | [batch_size, num_index_heads] |
| seq_lens | int32 | [batch_size] |
| block_table | int32 | [batch_size, max_num_pages] |
Outputs
| Name | Type | Shape |
|---|---|---|
| topk_indices | int32 | [batch_size, topk] |
Constraints
- topk <= max_num_pages * page_size
Reference Implementation
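The reference code did not survive extraction, so here is a hedged NumPy sketch of what the indexer plausibly computes. This is not the SGLang/deep_gemm kernel: the 132-byte K row layout (128 fp8 values followed by a 4-byte little-endian float32 per-token scale) is an assumption inferred from head_dim_with_scale = 132, fp8 tensors are represented as raw uint8/int8 bytes, any separate query-side scale is omitted, and `dsa_topk_indexer` / `fp8_e4m3_to_f32` are hypothetical names:

```python
import numpy as np

def fp8_e4m3_to_f32(b):
    """Decode fp8 e4m3fn bytes (viewed as uint8) to float32; NaN codes not handled."""
    b = np.asarray(b).view(np.uint8)
    sign = np.where(b & 0x80, -1.0, 1.0)
    exp = ((b >> 3) & 0x0F).astype(np.int32)
    man = (b & 0x07).astype(np.float64)
    mag = np.where(exp == 0,
                   man / 8.0 * 2.0 ** -6,               # subnormals
                   (1.0 + man / 8.0) * 2.0 ** (exp - 7))
    return (sign * mag).astype(np.float32)

def dsa_topk_indexer(q_fp8, k_cache, weights, seq_lens, block_table,
                     topk, page_size=64):
    """Top-k index selection over a paged fp8 K index cache (sketch).

    q_fp8:       [B, H, D] fp8 bytes (uint8)
    k_cache:     [num_pages, page_size, 1, D + 4] int8; each row is assumed to
                 hold D fp8 bytes plus a 4-byte little-endian float32 scale
    weights:     [B, H] float32; seq_lens: [B] int32; block_table: [B, max_pages] int32
    returns:     [B, topk] int32 token indices, padded with -1 when seq_len < topk
    """
    B, H, D = q_fp8.shape
    q = fp8_e4m3_to_f32(q_fp8)
    out = np.full((B, topk), -1, dtype=np.int32)
    for b in range(B):
        n = int(seq_lens[b])
        pages = block_table[b, : (n + page_size - 1) // page_size]
        rows = k_cache[pages].reshape(-1, D + 4)[:n]          # [n, D+4] gathered rows
        scale = rows[:, D:].copy().view(np.float32)           # [n, 1] per-token scale
        k = fp8_e4m3_to_f32(rows[:, :D]) * scale              # dequantized keys [n, D]
        logits = np.maximum(q[b] @ k.T, 0.0)                  # relu(q @ K.T) -> [H, n]
        scores = (weights[b][:, None] * logits).sum(axis=0)   # weighted sum over heads
        kk = min(topk, n)
        out[b, :kk] = np.argsort(-scores, kind="stable")[:kk]
    return out
```

Positions past `seq_lens[b]` are never scored, and unfilled output slots are padded with -1, consistent with the `topk <= max_num_pages * page_size` constraint above.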
