Expert Parallel Load Balancing (EPLB)

Controls:

- Routing Skew (α), default 1.4: higher α means more imbalanced routing. Real workloads typically fall in 1.2–1.8.
- Redundant Experts, default 0: extra physical expert copies for load balancing. Each copy costs ~2.4 GB of HBM (DeepSeek-V3).
- EP Ranks (GPUs), default 8: number of GPUs in the expert-parallel group.
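To make the skew knob concrete, here is a small sketch of why imbalanced routing creates a straggler. It assumes, purely for illustration, that α acts as a Zipf exponent over a hypothetical pool of 64 logical experts (not necessarily the widget's exact model), samples token-to-expert loads, and measures per-GPU imbalance under plain round-robin expert placement with no replication:

```python
import numpy as np

def skewed_loads(n_experts=64, alpha=1.4, tokens=100_000, seed=0):
    """Sample per-expert token counts from a Zipf-like popularity curve.

    Treating the routing-skew knob alpha as a Zipf exponent is an
    assumption for illustration only.
    """
    rng = np.random.default_rng(seed)
    weights = np.arange(1, n_experts + 1, dtype=float) ** -alpha
    return rng.multinomial(tokens, weights / weights.sum())

def straggler_factor(expert_loads, n_gpus=8):
    """gamma = max per-GPU load / mean per-GPU load,
    with experts placed round-robin and no redundant copies."""
    gpu_load = np.zeros(n_gpus)
    for e, load in enumerate(expert_loads):
        gpu_load[e % n_gpus] += load
    return gpu_load.max() / gpu_load.mean()

loads = skewed_loads(alpha=1.4)
print(f"gamma at alpha=1.4: {straggler_factor(loads):.2f}")
print(f"gamma at alpha=1.8: {straggler_factor(skewed_loads(alpha=1.8)):.2f}")
```

With any skew the GPU holding the hottest expert dominates, so γ lands well above 1.0, and pushing α from 1.4 toward 1.8 makes it worse.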

[Interactive chart: load per GPU, comparing the max-load GPU (straggler) and the mean load against a fully balanced load. Adjust the controls to explore load balancing.]
Expert Replication Map

Each cell is a logical expert. Cell size encodes token load; the badge shows the replica count. Hot experts (those that get replicated) are highlighted. EPLB spreads their load across multiple GPUs.

γ = max_r(load_r) / mean_r(load_r) — the straggler factor from Section 3: the busiest EP rank's token load divided by the mean load across ranks. EPLB replicates popular experts so that token load is spread evenly across GPUs, reducing γ toward 1.0 and directly cutting t_compute.
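The replication idea can be sketched with a simple greedy placement. This is not DeepSeek's actual EPLB algorithm, just a minimal stand-in: give each of the hottest experts one extra copy (assuming its token load splits evenly across copies), then pack all expert shares onto GPUs largest-first, always onto the currently least-loaded GPU. The toy load curve and the one-extra-copy policy are assumptions for illustration:

```python
import heapq

def place_with_replicas(expert_loads, n_gpus=8, redundant=0):
    """Return gamma (max GPU load / mean GPU load) after a greedy
    replicate-and-pack placement. Not the real EPLB algorithm."""
    # Give each of the `redundant` hottest experts one extra copy;
    # assume its token load splits evenly across its copies.
    hot = sorted(range(len(expert_loads)),
                 key=lambda e: -expert_loads[e])[:redundant]
    shares = []
    for e, load in enumerate(expert_loads):
        copies = 2 if e in hot else 1
        shares += [load / copies] * copies
    # Longest-processing-time-first packing onto the least-loaded GPU.
    shares.sort(reverse=True)
    heap = [(0.0, g) for g in range(n_gpus)]
    heapq.heapify(heap)
    for s in shares:
        load, g = heapq.heappop(heap)
        heapq.heappush(heap, (load + s, g))
    gpu_loads = [load for load, _ in heap]
    return max(gpu_loads) / (sum(gpu_loads) / n_gpus)

# Toy skewed loads over 32 hypothetical experts.
loads = [1000 * k ** -1.4 for k in range(1, 33)]
print(place_with_replicas(loads, redundant=0))  # gamma, no replication
print(place_with_replicas(loads, redundant=4))  # gamma, 4 redundant copies
```

Even this crude policy pulls γ down sharply, at the cost of extra HBM for the copies (at DeepSeek-V3's ~2.4 GB per expert, 4 redundant copies cost roughly 9.6 GB across the group).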