dsa_topk_indexer_fp8_h64_d128_topk2048_ps64

dsa_paged

DeepSeek Sparse Attention (DSA) top-K indexer with FP8 quantization for DeepSeek-V3.2. For each query, it computes a per-token index score over the cached keys using a ReLU activation and learned per-head weights, then selects the top-K KV cache indices. Formula: score = sum(relu(q @ K.T) * weights), summed over index heads. Matches the SGLang/deep_gemm implementation. Page size 64 variant.
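The scoring formula above can be sketched densely (no paging, no FP8) for a single query token; shapes follow the axes below, and the function name is illustrative, not the benchmark's API:

```python
import numpy as np

def indexer_scores(q, k, weights):
    """Dense sketch of the DSA indexer score for one query token.

    q       : [H, D]  query across H index heads
    k       : [T, D]  keys for T cached tokens
    weights : [H]     learned per-head weights
    returns : [T]     one index score per cached token
    """
    logits = np.maximum(q @ k.T, 0.0)               # relu(q @ K.T) -> [H, T]
    return (weights[:, None] * logits).sum(axis=0)  # weighted sum over heads -> [T]

H, D, T, K = 64, 128, 300, 8
rng = np.random.default_rng(0)
scores = indexer_scores(rng.standard_normal((H, D)),
                        rng.standard_normal((T, D)),
                        rng.standard_normal(H))
topk = np.argsort(-scores)[:K]  # indices of the K highest-scoring tokens
```

In the real kernel T can reach max_num_pages * page_size tokens and the top-K selection is over 2048 entries, so an efficient implementation would use a partial selection (e.g. argpartition) rather than a full sort.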

Axes

Name                  Value
batch_size            var
num_index_heads       64
index_head_dim        128
page_size             64
topk                  2048
max_num_pages         var
num_pages             var
kv_cache_num_heads    1
head_dim_with_scale   132
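One plausible reading of head_dim_with_scale = 132 (an assumption inferred from the axis values and the int8 cache dtype, not stated on this page) is 128 one-byte FP8 (e4m3) key values followed by a single 4-byte float32 dequantization scale per (token, head) row:

```python
import struct
import numpy as np

index_head_dim = 128                    # fp8 payload, 1 byte per value
scale_bytes = 4                         # one float32 scale per row
assert index_head_dim + scale_bytes == 132  # head_dim_with_scale

row = np.zeros(132, dtype=np.int8)      # one (token, head) row of the cache
struct.pack_into('<f', row, 128, 0.25)  # write a scale into the last 4 bytes
fp8_payload = row[:128]                 # 128 fp8-e4m3 values stored as raw bytes
(scale,) = struct.unpack_from('<f', row, 128)
```

Storing the scale inline with the quantized values keeps each row's dequantization data in the same cache line as its payload.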

Signature

Inputs

Name               Type and shape
q_index_fp8        float8_e4m3fn[batch_size, num_index_heads, index_head_dim]
k_index_cache_fp8  int8[num_pages, page_size, kv_cache_num_heads, head_dim_with_scale]
weights            float32[batch_size, num_index_heads]
seq_lens           int32[batch_size]
block_table        int32[batch_size, max_num_pages]

Outputs

Name          Type and shape
topk_indices  int32[batch_size, topk]

Constraints

  • topk <= max_num_pages * page_size

Reference Implementation

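A hedged NumPy sketch of what such a reference might look like, under stated assumptions: the fp8-e4m3 key bytes are treated as plain int8 values and dequantized as byte * scale (not a real e4m3 decode, only a stand-in for the data flow), and the layout of head_dim_with_scale = 132 as 128 value bytes plus a trailing float32 scale is inferred, not confirmed by this page. The function name is illustrative.

```python
import numpy as np

def dsa_topk_indexer_ref(q, k_cache, weights, seq_lens, block_table,
                         page_size=64, topk=2048):
    """Sketch of the paged top-K indexer (not the benchmark's hidden reference).

    q           : [B, H, D] float32 queries (fp8 in the real kernel)
    k_cache     : [P, page_size, 1, D + 4] int8 paged cache; per row the
                  first D bytes hold key values, the last 4 a float32 scale
    weights     : [B, H] float32 learned per-head weights
    seq_lens    : [B] int32 valid token count per sequence
    block_table : [B, max_num_pages] int32 logical-to-physical page map
    returns     : [B, topk] int32 token indices, padded with -1 past seq_len
    """
    B, H, D = q.shape
    out = np.full((B, topk), -1, dtype=np.int32)
    for b in range(B):
        T = int(seq_lens[b])
        n_pages = (T + page_size - 1) // page_size
        # Gather this sequence's pages into contiguous [T, D+4] rows.
        pages = k_cache[block_table[b, :n_pages], :, 0, :]
        rows = pages.reshape(-1, D + 4)[:T]
        scales = rows[:, D:].copy().view(np.float32).ravel()  # [T]
        k = rows[:, :D].astype(np.float32) * scales[:, None]  # simplified dequant
        logits = np.maximum(q[b] @ k.T, 0.0)                  # relu(q @ K.T) -> [H, T]
        scores = (weights[b][:, None] * logits).sum(axis=0)   # head-weighted sum -> [T]
        idx = np.argsort(-scores)[:topk].astype(np.int32)
        out[b, :idx.size] = idx
    return out
```

Per the constraint topk <= max_num_pages * page_size, a sequence shorter than topk cannot fill the output, hence the -1 padding; a production kernel would also replace the full argsort with a partial top-K selection.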