NemotronH-8B
NVIDIA's NemotronH-8B is a hybrid architecture that combines Mamba2 SSM layers with standard attention. It has 52 layers in total: 24 Mamba (M), 4 attention (*), and 24 MLP-only (-). The Mamba layers use FlashInfer's selective_state_update for decode.
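The layer layout can be sketched as a pattern string, one character per layer, which is a common way to describe hybrid Mamba/attention stacks. The interleaving below is a made-up placeholder for illustration; only the per-type totals (24 M, 4 *, 24 -, i.e. 52 layers) come from the description above, not the actual NemotronH-8B layer order.

```python
from collections import Counter

# One character per layer: M = Mamba2 SSM, * = attention, - = MLP-only.
# NOTE: this particular interleaving is hypothetical; only the totals
# (24 M, 4 *, 24 - => 52 layers) are taken from the model description.
PATTERN = ("M-" * 6 + "*") * 4  # 4 blocks of (6 x Mamba+MLP, then attention)

def layer_counts(pattern: str) -> Counter:
    """Count how many layers of each type appear in a pattern string."""
    return Counter(pattern)

counts = layer_counts(PATTERN)
print(len(PATTERN), dict(counts))  # 52 layers total
```

Encoding the stack as a string makes it easy to validate a config (e.g. assert the expected totals) before instantiating the actual modules.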
Architecture Overview
Architecture Summary
- Total Modules: 18
- Blocks: 7
- Kernels: 11
- Traced Kernels: 9
