Higher α = more imbalanced routing. Real workloads typically fall in 1.2–1.8.
Redundant Experts: 0
Extra physical expert copies for load balancing. Each costs ~2.4 GB HBM (DeepSeek-V3).
EP Ranks (GPUs): 8
Number of GPUs in the expert-parallel group.
[Chart: Load per GPU. Legend: balanced load, max-load GPU (straggler), mean load.]
[Figure: Expert Replication Map]
Each cell is a logical expert. Size encodes token load; badge shows replica count.
Hot experts (replicated) are highlighted. EPLB spreads their load across multiple GPUs.
γ = max_r / mean_r, the straggler factor from Section 3: the load on the max-load GPU divided by the mean load per GPU.
EPLB replicates popular experts so that token load is spread evenly across GPUs,
reducing γ toward 1.0 and directly cutting t_compute.
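The effect of replication on γ can be sketched in a few lines of Python. This is a rough illustration, not the EPLB implementation: it assumes expert popularity follows a power law with exponent α (the text only says that higher α means more imbalance), gives each redundant copy to the expert with the highest per-replica load, and places replicas heaviest-first on the least-loaded GPU. The function names `zipf_loads`, `replicate`, and `straggler_factor` are hypothetical, and the sketch ignores the constraint that every GPU hosts the same number of physical experts.

```python
import heapq


def zipf_loads(num_experts, alpha, total_tokens=1_000_000):
    """Assumed routing model: the expert at popularity rank k gets load ~ 1 / k**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_experts + 1)]
    scale = total_tokens / sum(weights)
    return [w * scale for w in weights]


def replicate(loads, num_redundant):
    """Greedy: each extra copy goes to the expert with the highest per-replica load."""
    replicas = [1] * len(loads)
    heap = [(-load, i) for i, load in enumerate(loads)]  # max-heap via negated loads
    heapq.heapify(heap)
    for _ in range(num_redundant):
        _, i = heapq.heappop(heap)
        replicas[i] += 1
        heapq.heappush(heap, (-loads[i] / replicas[i], i))
    return replicas


def straggler_factor(loads, replicas, num_gpus):
    """Place replicas heaviest-first on the lightest GPU; return gamma = max / mean."""
    pieces = sorted(
        (loads[i] / replicas[i] for i in range(len(loads)) for _ in range(replicas[i])),
        reverse=True,
    )
    gpu_loads = [0.0] * num_gpus  # a list of equal values is already a valid min-heap
    for piece in pieces:
        lightest = heapq.heappop(gpu_loads)
        heapq.heappush(gpu_loads, lightest + piece)
    mean_load = sum(gpu_loads) / num_gpus
    return max(gpu_loads) / mean_load


# 256 routed experts (DeepSeek-V3), 8 EP ranks, alpha inside the 1.2-1.8 range above.
loads = zipf_loads(num_experts=256, alpha=1.5)
gamma_before = straggler_factor(loads, [1] * len(loads), num_gpus=8)
replicas = replicate(loads, num_redundant=32)
gamma_after = straggler_factor(loads, replicas, num_gpus=8)
print(f"gamma without replication: {gamma_before:.2f}")
print(f"gamma with 32 redundant experts: {gamma_after:.2f}")
```

With no redundant experts, a single hot expert can hold more than one GPU's fair share of tokens, so no placement can bring γ near 1.0. Splitting that expert across replicas is what lets the placement step balance the load.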