NemotronH-8B

NVIDIA NemotronH-8B is a hybrid architecture that combines Mamba2 SSM layers with standard attention layers. It has 52 layers in total: 24 Mamba2 (M), 4 attention (*), and 24 MLP-only (-). The Mamba layers use FlashInfer's selective_state_update kernel during decode.
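
The layer mix above can be sketched as a per-layer dispatch table. This is a minimal illustration, not the actual implementation: the concrete interleaving of the 52 layers and the FlashInfer decode path are not reproduced here, and the operator names are hypothetical labels.

```python
# Hypothetical sketch of per-layer dispatch in a hybrid Mamba/attention
# stack, encoding only the stated counts: 24 Mamba2 (M), 4 attention (*),
# 24 MLP-only (-) = 52 layers. Layer ordering is not specified here.
LAYER_COUNTS = {"M": 24, "*": 4, "-": 24}  # Mamba2 / attention / MLP-only

def describe(symbol: str) -> str:
    """Map a layer symbol to the operator family it runs (illustrative names)."""
    return {
        "M": "mamba2_ssm",      # selective-state-space mixer
        "*": "self_attention",  # standard attention layer
        "-": "mlp",             # feed-forward only
    }[symbol]

total_layers = sum(LAYER_COUNTS.values())
assert total_layers == 52
```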

Architecture Summary

Total Modules: 18
Blocks: 7
Kernels: 11
Traced Kernels: 9