gdn_decode_qk16_v32_d128_k_last

gdn
·Solutions (0)

Gated Delta Net decode with GVA configuration and k-last state layout. Single-token generation with recurrent state update. Captured from Qwen3 Next linear attention layers.

Axes

batch_size
var
seq_len
1
num_q_heads
16
num_k_heads
16
num_v_heads
32
head_size
128

Signature

Inputs

NameTypeShape
q
bfloat16[batch_size, seq_len, num_q_heads, head_size]
k
bfloat16[batch_size, seq_len, num_k_heads, head_size]
v
bfloat16[batch_size, seq_len, num_v_heads, head_size]
state
float32[batch_size, num_v_heads, head_size, head_size]
A_log
float32[num_v_heads]
a
bfloat16[batch_size, seq_len, num_v_heads]
dt_bias
float32[num_v_heads]
b
bfloat16[batch_size, seq_len, num_v_heads]
scale
float32Scalar

Outputs

NameTypeShape
output
bfloat16[batch_size, seq_len, num_v_heads, head_size]
new_state
float32[batch_size, num_v_heads, head_size, head_size]

Constraints

  • num_v_heads >= num_q_heads
  • num_v_heads % num_q_heads == 0
  • num_k_heads == num_q_heads

Reference Implementation

Loading editor...
Loading solutions…