MiniMax M2

The MiniMax M2 model has 62 decoder layers, grouped-query attention (GQA) with 48 query heads and 8 key/value heads, and sparse mixture-of-experts (MoE) layers with 256 experts and top-8 sigmoid routing.
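The sketch below illustrates two of the figures above. First, with 48 query heads sharing 8 KV heads, each KV head serves 48 / 8 = 6 query heads. Second, top-8 sigmoid routing picks 8 of the 256 experts for each token. The helper names are hypothetical. The sketch makes two assumptions: query heads are grouped contiguously, and routing weights are renormalized over the chosen experts. It leaves out any score-correction bias or load-balancing terms the real model may use, so treat it as an illustration rather than MiniMax's implementation.

```python
import math

# Figures taken from the model summary; everything else here is illustrative.
NUM_Q_HEADS = 48
NUM_KV_HEADS = 8
NUM_EXPERTS = 256
TOP_K = 8


def kv_head_for_query_head(q_head: int) -> int:
    """Hypothetical helper: map a query head to its shared KV head under GQA.

    Assumes contiguous grouping: query heads 0-5 share KV head 0, 6-11 share KV head 1, etc.
    """
    group_size = NUM_Q_HEADS // NUM_KV_HEADS  # 6 query heads per KV head
    return q_head // group_size


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route_token(router_logits: list[float], top_k: int = TOP_K) -> list[tuple[int, float]]:
    """Hypothetical helper: select top_k experts by sigmoid score.

    Assumption: selection uses raw sigmoid scores (no correction bias), and the
    selected scores are renormalized so the expert weights sum to 1.
    """
    if len(router_logits) != NUM_EXPERTS:
        raise ValueError(f"expected {NUM_EXPERTS} router logits, got {len(router_logits)}")
    # Unlike softmax, sigmoid scores each expert independently of the others.
    scores = [sigmoid(z) for z in router_logits]
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]


if __name__ == "__main__":
    print([kv_head_for_query_head(h) for h in (0, 5, 6, 47)])  # [0, 0, 1, 7]
    logits = [i / NUM_EXPERTS for i in range(NUM_EXPERTS)]  # strictly increasing logits
    routes = route_token(logits)
    print([expert for expert, _ in routes])  # [255, 254, 253, 252, 251, 250, 249, 248]
```

Sigmoid is monotonic, so ranking experts by sigmoid score selects the same experts as ranking by raw logit. The sigmoid only changes the mixing weights, which this sketch normalizes over the 8 selected experts.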

Architecture Overview


Architecture Summary

Total Modules: 18
Blocks: 4
Kernels: 14
Traced Kernels: 8