Gated Delta Net prefill with GVA configuration and k-last state layout: the state is stored as [N, H, V, K], i.e. [num_seqs, num_v_heads, head_size, head_size] with the key dimension last. Captured from Qwen3 Next linear attention layers.
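To make the head grouping and the k-last layout concrete, here is a minimal sketch. It assumes the head counts from this definition (16 query/key heads, 32 value heads, head size 128), that consecutive value heads share one query/key head, and that the state buffer is contiguous. The helper names `qk_head_for_v_head` and `k_last_offset` are hypothetical and do not come from the original page.

```python
# Head counts assumed from this definition's axes.
NUM_Q_HEADS = 16
NUM_V_HEADS = 32
HEAD_SIZE = 128


def qk_head_for_v_head(v_head: int) -> int:
    """Map a value head to the query/key head it shares.

    Assumption: value heads are grouped contiguously, so with 32 value
    heads and 16 query/key heads, value heads 0 and 1 both use head 0.
    """
    group = NUM_V_HEADS // NUM_Q_HEADS  # value heads per query/key head
    return v_head // group


def k_last_offset(n: int, h: int, v: int, k: int,
                  num_heads: int = NUM_V_HEADS,
                  dim_v: int = HEAD_SIZE,
                  dim_k: int = HEAD_SIZE) -> int:
    """Flat offset of state[n, h, v, k] in a contiguous [N, H, V, K] buffer.

    In the k-last layout the key index k varies fastest in memory.
    """
    return ((n * num_heads + h) * dim_v + v) * dim_k + k
```

With this layout, stepping `k` by one moves to the adjacent element, while stepping `v` by one skips a full row of `head_size` elements.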
Axes
| Name | Value |
|---|---|
| total_seq_len | var |
| num_seqs | var |
| num_q_heads | 16 |
| num_k_heads | 16 |
| num_v_heads | 32 |
| head_size | 128 |
| len_cu_seqlens | var |
Signature
Inputs
| Name | Type | Shape |
|---|---|---|
| q | bfloat16 | [total_seq_len, num_q_heads, head_size] |
| k | bfloat16 | [total_seq_len, num_k_heads, head_size] |
| v | bfloat16 | [total_seq_len, num_v_heads, head_size] |
| state | float32 | [num_seqs, num_v_heads, head_size, head_size] |
| A_log | float32 | [num_v_heads] |
| a | bfloat16 | [total_seq_len, num_v_heads] |
| dt_bias | float32 | [num_v_heads] |
| b | bfloat16 | [total_seq_len, num_v_heads] |
| cu_seqlens | int64 | [len_cu_seqlens] |
| scale | float32 | Scalar |
Outputs
| Name | Type | Shape |
|---|---|---|
| output | bfloat16 | [total_seq_len, num_v_heads, head_size] |
| new_state | float32 | [num_seqs, num_v_heads, head_size, head_size] |
Constraints
- len_cu_seqlens == num_seqs + 1
- total_seq_len == cu_seqlens[-1].item()
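The two constraints can be checked before launching a kernel. The sketch below assumes the usual convention that `cu_seqlens` starts at 0 and holds the running sum of per-sequence lengths. That convention is not stated in this definition, and the helpers `build_cu_seqlens` and `sequence_ranges` are hypothetical names used only for illustration.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Build cumulative sequence offsets, assuming a leading 0 entry."""
    return [0] + list(accumulate(seq_lens))


def sequence_ranges(cu_seqlens, num_seqs, total_seq_len):
    """Validate both constraints and return (start, end) token ranges."""
    if len(cu_seqlens) != num_seqs + 1:
        raise ValueError("len_cu_seqlens must equal num_seqs + 1")
    if cu_seqlens[-1] != total_seq_len:
        raise ValueError("total_seq_len must equal cu_seqlens[-1]")
    # Sequence i occupies packed rows cu_seqlens[i] : cu_seqlens[i + 1].
    return [(cu_seqlens[i], cu_seqlens[i + 1]) for i in range(num_seqs)]
```

Under that convention, three sequences of lengths 3, 5, and 2 are packed into 10 rows of `q`, `k`, and `v`, and each `(start, end)` pair selects one sequence's slice of the packed tensors.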
Reference Implementation
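What follows is only a minimal sketch of a gated delta rule recurrence that is consistent with the signature above. It is not this definition's reference code. It handles a single value head over a single sequence in pure Python and rests on several assumptions that this definition does not confirm: the decay gate is `exp(-exp(A_log) * softplus(a + dt_bias))`, the write strength is `sigmoid(b)`, `q` and `k` are used without extra normalization, and the output is read from the state after the update. The function name `gdn_prefill_one_head` is hypothetical.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Stable logistic function for both signs of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gdn_prefill_one_head(q, k, v, state, A_log, a, dt_bias, b, scale):
    """Sketch of a gated delta rule over one head and one sequence.

    q, k: [T][K] lists; v: [T][V]; state: [V][K] (k-last layout);
    a, b: [T]; A_log, dt_bias, scale: floats.
    Returns (output [T][V], new_state [V][K]).
    """
    S = [row[:] for row in state]  # copy so the input state is untouched
    out = []
    for t in range(len(q)):
        # Assumed gate parameterization (see lead-in).
        decay = math.exp(-math.exp(A_log) * softplus(a[t] + dt_bias))
        beta = sigmoid(b[t])
        # 1. Decay the previous state.
        S = [[decay * s for s in row] for row in S]
        # 2. Value the decayed state currently predicts for key k[t].
        pred = [sum(s * kk for s, kk in zip(row, k[t])) for row in S]
        # 3. Delta rule: move the prediction toward v[t] by a factor beta.
        delta = [beta * (v[t][i] - pred[i]) for i in range(len(S))]
        for i, row in enumerate(S):
            for j in range(len(row)):
                row[j] += delta[i] * k[t][j]
        # 4. Read the output with the scaled query.
        out.append([scale * sum(s * qq for s, qq in zip(row, q[t]))
                    for row in S])
    return out, S
```

A full prefill would repeat this for every sequence range given by `cu_seqlens` and every value head, picking the matching query/key head for each value head, and would do the arithmetic in float32 as the state dtype suggests.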
