Gated Delta Net decode with GVA configuration and k-last state layout. Single-token generation with recurrent state update. Captured from Qwen3 Next linear attention layers.
Axes
batch_size
varseq_len
1num_q_heads
16num_k_heads
16num_v_heads
32head_size
128Signature
Inputs
| Name | Type | Shape |
|---|---|---|
q | bfloat16 | [batch_size, seq_len, num_q_heads, head_size] |
k | bfloat16 | [batch_size, seq_len, num_k_heads, head_size] |
v | bfloat16 | [batch_size, seq_len, num_v_heads, head_size] |
state | float32 | [batch_size, num_v_heads, head_size, head_size] |
A_log | float32 | [num_v_heads] |
a | bfloat16 | [batch_size, seq_len, num_v_heads] |
dt_bias | float32 | [num_v_heads] |
b | bfloat16 | [batch_size, seq_len, num_v_heads] |
scale | float32 | Scalar |
Outputs
| Name | Type | Shape |
|---|---|---|
output | bfloat16 | [batch_size, seq_len, num_v_heads, head_size] |
new_state | float32 | [batch_size, num_v_heads, head_size, head_size] |
Constraints
- • num_v_heads >= num_q_heads
- • num_v_heads % num_q_heads == 0
- • num_k_heads == num_q_heads
Reference Implementation
Loading editor...
Loading solutions…
