NVIDIA Nemotron Hackathon 2026

MoE Quantization Calibration Pipeline

Routing-aware calibration data synthesis for MoE LLM quantization

PHASE 0: Quantize & Evaluate (GPTQ W4A16 · TRT-LLM)
PHASE 1: Analyze Expert Routing (Token → Expert mapping)
PHASE 2: Extract Text Patterns (Nemotron-3-Super)
PHASE 3: Generate Synthetic Data (NVIDIA Data Designer)
Repeat Phases 1→3 until expert activation is balanced
PHASE 0: Quantize the initial model and evaluate it on benchmark tasks
IN: Base model checkpoint
OUT: Quantized checkpoint; benchmark results
QUANTIZE: GPTQ_W4A16 · CALIB_SIZE: 128
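W4A16 denotes 4-bit weights with 16-bit activations. The sketch below illustrates only the weight-storage side, using plain symmetric round-to-nearest quantization per group. It is not GPTQ, which additionally uses calibration data to compensate rounding error, and it is not the TRT-LLM implementation. The function names and the `group_size` value are hypothetical choices for illustration.

```python
def quantize_w4_group(weights, group_size=4):
    """Symmetric round-to-nearest 4-bit quantization, one scale per group.

    Illustration only: a stand-in for the storage format, not GPTQ itself.
    Returns (codes, scales) where each code is an integer in [-8, 7].
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to the code 7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            codes.append(max(-8, min(7, q)))  # clamp to the signed 4-bit range
    return codes, scales


def dequantize_w4_group(codes, scales, group_size=4):
    """Reconstruct approximate weights from 4-bit codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


# Toy weights (assumed values, not taken from any real checkpoint).
weights = [1.4, -0.6, 0.25, 0.0]
codes, scales = quantize_w4_group(weights)
restored = dequantize_w4_group(codes, scales)
```

With round-to-nearest, the per-weight reconstruction error is bounded by half the group scale. Calibration-based methods such as GPTQ aim to reduce the resulting output error further, which is why the choice of calibration data matters for this pipeline.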

NeMo Evaluator Benchmark Results

GSM8K: 67.25% · GPQA Diamond: 22.73% · MMLU-Pro: 38.29%
PHASE 1: Analyze expert activation statistics on the calibration data
IN: Calibration dataset (D0_128); base model weights
OUT: Token-expert routing analysis
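A token-to-expert routing analysis can be sketched as counting, for each token, which experts a top-k router would select. The example below assumes the router logits for each token are already available as plain lists. The helper names, the toy logits, and the value of `k` are hypothetical; how the logits are extracted from the model is not shown here.

```python
def top_k_experts(router_logits, k):
    """Indices of the k largest logits (ties broken by lower expert index)."""
    order = sorted(range(len(router_logits)), key=lambda e: (-router_logits[e], e))
    return order[:k]


def expert_activation_counts(token_logits, num_experts, k):
    """Count how many tokens are routed to each expert under top-k routing."""
    counts = [0] * num_experts
    for logits in token_logits:
        for expert in top_k_experts(logits, k):
            counts[expert] += 1
    return counts


# Toy router logits for 3 tokens and 4 experts (assumed values).
token_logits = [
    [2.0, 0.1, -1.0, 0.5],
    [1.5, 0.2, 0.3, -0.7],
    [0.0, 3.0, 0.1, 2.5],
]
counts = expert_activation_counts(token_logits, num_experts=4, k=2)
```

In a real MoE model this count would be collected per layer, since each MoE layer has its own router.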


PHASE 2: Extract text patterns that cause frequent or scarce expert activation
IN: Annotated token samples; Nemotron-3-Super (vLLM)
OUT: Per-domain guidelines
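One way to prepare annotated token samples is to flag experts whose activation count is far above or below the uniform share, then group the tokens routed to them. The sketch below assumes per-expert counts and (token, expert) pairs are already available. The `ratio` threshold, the function names, and the sample tokens are hypothetical, and the step that sends these samples to the language model for pattern extraction is not shown.

```python
def classify_experts(counts, ratio=2.0):
    """Split experts into frequent and scarce relative to the uniform share.

    An expert is 'frequent' if its count is at least ratio times the uniform
    share, and 'scarce' if it is at most the uniform share divided by ratio.
    """
    uniform = sum(counts) / len(counts)
    frequent = [e for e, c in enumerate(counts) if c >= ratio * uniform]
    scarce = [e for e, c in enumerate(counts) if c <= uniform / ratio]
    return frequent, scarce


def annotate_samples(samples, frequent, scarce):
    """Group token texts by flagged expert, labelled 'frequent' or 'scarce'."""
    labels = {e: "frequent" for e in frequent}
    labels.update({e: "scarce" for e in scarce})
    annotated = {}
    for token, expert in samples:
        if expert in labels:
            entry = annotated.setdefault(expert, {"label": labels[expert], "tokens": []})
            entry["tokens"].append(token)
    return annotated


# Toy data (assumed values): 4 experts, 64 routed tokens in total.
counts = [40, 10, 4, 10]
frequent, scarce = classify_experts(counts)
samples = [("def", 0), ("return", 0), ("integral", 2), ("the", 1)]
annotated = annotate_samples(samples, frequent, scarce)
```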


PHASE 3: Generate a synthetic calibration dataset that achieves balanced expert activation
IN: Per-domain guidelines
OUT: Synthetic calibration dataset
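The page does not state how "balanced" is measured. One plausible criterion, shown here purely as an assumption, is the normalized entropy of the expert activation counts: 1.0 when every expert is used equally and 0.0 when a single expert receives all tokens. The function names and the `threshold` value are hypothetical.

```python
import math


def normalized_entropy(counts):
    """Entropy of the activation distribution divided by its maximum, in [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log(p)
    return entropy / math.log(len(counts))


def is_balanced(counts, threshold=0.95):
    """Hypothetical stopping rule for the Phase 1→3 loop."""
    return normalized_entropy(counts) >= threshold


# Toy activation counts (assumed values).
uniform_counts = [5, 5, 5, 5]
skewed_counts = [17, 1, 1, 1]
balanced_flags = (is_balanced(uniform_counts), is_balanced(skewed_counts))
```

Under this criterion, the loop over Phases 1 to 3 would stop once the counts measured on the newly generated calibration set pass the threshold.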
