MiniMax M2
The MiniMax M2 model has 62 decoder layers, grouped-query attention (GQA) with 48 query heads and 8 key-value heads, and a sparse mixture-of-experts (MoE) layer with 256 experts and top-8 sigmoid routing.
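To make these numbers concrete, here is a minimal sketch of how the head grouping and expert routing could work. It is not taken from the MiniMax M2 source. The names `M2Config`, `kv_head_for_query_head`, and `route_token` are hypothetical. The contiguous mapping of query heads to key-value heads and the renormalization of the selected gate scores are assumptions based on common GQA and MoE conventions.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class M2Config:
    # Values taken from the model description; the class itself is hypothetical.
    num_layers: int = 62
    num_q_heads: int = 48
    num_kv_heads: int = 8
    num_experts: int = 256
    top_k: int = 8


def kv_head_for_query_head(cfg: M2Config, q_head: int) -> int:
    """Map a query head to the KV head it shares (assumed contiguous groups)."""
    group_size = cfg.num_q_heads // cfg.num_kv_heads  # 48 // 8 = 6
    return q_head // group_size


def route_token(cfg: M2Config, router_logits: list[float]) -> list[tuple[int, float]]:
    """Pick the top-k experts for one token from per-expert router logits.

    Each logit is scored independently with a sigmoid (no softmax across
    experts). Renormalizing the k selected scores to sum to 1 is an
    assumption, not something the description states.
    """
    scores = [1.0 / (1.0 + math.exp(-x)) for x in router_logits]
    top = sorted(range(cfg.num_experts), key=lambda i: scores[i], reverse=True)
    top = top[: cfg.top_k]
    total = sum(scores[i] for i in top)
    return [(i, scores[i] / total) for i in top]


cfg = M2Config()
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(cfg.num_experts)]  # stand-in router output
routing = route_token(cfg, logits)
```

Under these assumptions, each key-value head serves 6 query heads, and each token activates 8 of the 256 experts per MoE layer.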
Architecture Overview
Architecture Summary
Total Modules: 18
Blocks: 4
Kernels: 14
Traced Kernels: 8
