NemotronH-8B

NVIDIA NemotronH-8B is a hybrid architecture that combines Mamba2 SSM layers with standard attention layers. It has 52 layers in total: 24 Mamba2 (M), 4 attention (*), and 24 MLP-only (-). The Mamba layers use FlashInfer's selective_state_update kernel during decode.
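
The layer mix above can be sketched as a per-layer dispatch table. This is a minimal illustration, not the actual implementation: the concrete interleaving of the 52 layers and the FlashInfer decode path are not reproduced here, and the operator names are hypothetical labels.

```python
# Hypothetical sketch of per-layer dispatch in a hybrid Mamba/attention
# stack, encoding only the stated counts: 24 Mamba2 (M), 4 attention (*),
# 24 MLP-only (-) = 52 layers. Layer ordering is not specified here.
LAYER_COUNTS = {"M": 24, "*": 4, "-": 24}  # Mamba2 / attention / MLP-only

def describe(symbol: str) -> str:
    """Map a layer symbol to the operator family it runs (illustrative names)."""
    return {
        "M": "mamba2_ssm",      # selective-state-space mixer
        "*": "self_attention",  # standard attention layer
        "-": "mlp",             # feed-forward only
    }[symbol]

total_layers = sum(LAYER_COUNTS.values())
assert total_layers == 52
```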

Architecture Summary

Total Modules: 18
Blocks: 7
Kernels: 11
Traced Kernels: 9