DeepSeek Sparse Attention (DSA) top-k indexer with FP8 quantization for DeepSeek-V3.2. Computes sparse-attention index scores using a ReLU activation and learned per-head weights, then selects the top-k KV cache indices. Formula: score = sum(relu(q @ K.T) * weights), summed over index heads. Matches the SGLang/deep_gemm implementation. Page size 64 variant.
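The scoring formula can be sketched in NumPy for a single query token; the shapes here are illustrative and ignore the paged fp8 cache layout described below:

```python
import numpy as np

def index_scores(q, k, weights):
    """Per-token index scores: weighted sum over heads of relu(q @ k^T).

    q:       [num_index_heads, index_head_dim]  (one query token)
    k:       [seq_len, index_head_dim]          (keys, shared across index heads)
    weights: [num_index_heads]                  (learned per-head weights)
    returns: [seq_len] scores used for top-k selection
    """
    logits = np.maximum(q @ k.T, 0.0)            # relu(q @ K.T) -> [heads, seq_len]
    return (weights[:, None] * logits).sum(0)    # weighted sum over heads

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8)).astype(np.float32)
k = rng.standard_normal((16, 8)).astype(np.float32)
w = rng.standard_normal(4).astype(np.float32)
scores = index_scores(q, k, w)
top2 = np.argsort(-scores)[:2]                   # indices of the 2 highest scores
```

The real kernel applies this per batch element over the dequantized paged K index cache, then keeps the `topk` highest-scoring token positions.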
Axes
| Name | Value |
|---|---|
| batch_size | var |
| num_index_heads | 64 |
| index_head_dim | 128 |
| page_size | 64 |
| topk | 2048 |
| max_num_pages | var |
| num_pages | var |
| kv_cache_num_heads | 1 |
| head_dim_with_scale | 132 |
Signature
Inputs
| Name | Type | Shape |
|---|---|---|
| q_index_fp8 | float8_e4m3fn | [batch_size, num_index_heads, index_head_dim] |
| k_index_cache_fp8 | int8 | [num_pages, page_size, kv_cache_num_heads, head_dim_with_scale] |
| weights | float32 | [batch_size, num_index_heads] |
| seq_lens | int32 | [batch_size] |
| block_table | int32 | [batch_size, max_num_pages] |
Outputs
| Name | Type | Shape |
|---|---|---|
| topk_indices | int32 | [batch_size, topk] |
Constraints
- topk <= max_num_pages * page_size
Reference Implementation
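The reference code did not survive extraction, so here is a hedged NumPy sketch of what the indexer plausibly computes. This is not the SGLang/deep_gemm kernel: the 132-byte K row layout (128 fp8 values followed by a 4-byte little-endian float32 per-token scale) is an assumption inferred from head_dim_with_scale = 132, fp8 tensors are represented as raw uint8/int8 bytes, any separate query-side scale is omitted, and `dsa_topk_indexer` / `fp8_e4m3_to_f32` are hypothetical names:

```python
import numpy as np

def fp8_e4m3_to_f32(b):
    """Decode fp8 e4m3fn bytes (viewed as uint8) to float32; NaN codes not handled."""
    b = np.asarray(b).view(np.uint8)
    sign = np.where(b & 0x80, -1.0, 1.0)
    exp = ((b >> 3) & 0x0F).astype(np.int32)
    man = (b & 0x07).astype(np.float64)
    mag = np.where(exp == 0,
                   man / 8.0 * 2.0 ** -6,               # subnormals
                   (1.0 + man / 8.0) * 2.0 ** (exp - 7))
    return (sign * mag).astype(np.float32)

def dsa_topk_indexer(q_fp8, k_cache, weights, seq_lens, block_table,
                     topk, page_size=64):
    """Top-k index selection over a paged fp8 K index cache (sketch).

    q_fp8:       [B, H, D] fp8 bytes (uint8)
    k_cache:     [num_pages, page_size, 1, D + 4] int8; each row is assumed to
                 hold D fp8 bytes plus a 4-byte little-endian float32 scale
    weights:     [B, H] float32; seq_lens: [B] int32; block_table: [B, max_pages] int32
    returns:     [B, topk] int32 token indices, padded with -1 when seq_len < topk
    """
    B, H, D = q_fp8.shape
    q = fp8_e4m3_to_f32(q_fp8)
    out = np.full((B, topk), -1, dtype=np.int32)
    for b in range(B):
        n = int(seq_lens[b])
        pages = block_table[b, : (n + page_size - 1) // page_size]
        rows = k_cache[pages].reshape(-1, D + 4)[:n]          # [n, D+4] gathered rows
        scale = rows[:, D:].copy().view(np.float32)           # [n, 1] per-token scale
        k = fp8_e4m3_to_f32(rows[:, :D]) * scale              # dequantized keys [n, D]
        logits = np.maximum(q[b] @ k.T, 0.0)                  # relu(q @ K.T) -> [H, n]
        scores = (weights[b][:, None] * logits).sum(axis=0)   # weighted sum over heads
        kk = min(topk, n)
        out[b, :kk] = np.argsort(-scores, kind="stable")[:kk]
    return out
```

Positions past `seq_lens[b]` are never scored, and unfilled output slots are padded with -1, consistent with the `topk <= max_num_pages * page_size` constraint above.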
