Expert Parallel Load Balancing (EPLB)

Routing Skew (α): 1.4

A higher α means more imbalanced routing: a few hot experts receive a disproportionate share of tokens. Real workloads typically fall in the range 1.2–1.8.
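The widget does not define α. One common way to model skewed routing, assumed here, is a Zipf-like law: the expert ranked i-th by popularity receives a share of tokens proportional to 1/i^α. The sketch below uses hypothetical helper names and assumes 256 routed experts (DeepSeek-V3's count). It shows how quickly the hottest expert's load grows relative to the mean as α increases.

```python
def zipf_loads(num_experts: int, alpha: float, total_tokens: int) -> list[float]:
    """Assumed skew model: the expert ranked i-th by popularity (i = 1, 2, ...)
    receives a share of tokens proportional to 1 / i**alpha."""
    weights = [1.0 / (i + 1) ** alpha for i in range(num_experts)]
    norm = sum(weights)
    return [total_tokens * w / norm for w in weights]


def hottest_to_mean(loads: list[float]) -> float:
    """How many times the mean load the single hottest expert receives."""
    return max(loads) / (sum(loads) / len(loads))


# 256 routed experts matches DeepSeek-V3; alpha = 0 means perfectly uniform routing.
for alpha in (0.0, 1.2, 1.4, 1.8):
    loads = zipf_loads(num_experts=256, alpha=alpha, total_tokens=1_000_000)
    print(f"alpha={alpha}: hottest expert gets {hottest_to_mean(loads):.1f}x the mean load")
```

Under this model, α = 0 gives a ratio of exactly 1.0, and the ratio rises steeply through the 1.2–1.8 range.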

Redundant Experts: 0

Extra physical copies of experts, used for load balancing. For DeepSeek-V3, each copy costs about 2.4 GB of HBM.
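The ~2.4 GB figure can be roughly reproduced with back-of-the-envelope arithmetic. This sketch makes several assumptions. It uses DeepSeek-V3's published shapes: hidden size 7168, expert FFN width 2048, and 58 MoE layers. It assumes FP8 weights at one byte per parameter and ignores quantization scale factors. It also reads "one redundant expert" as one extra expert slot in every MoE layer.

```python
# Assumed DeepSeek-V3 shapes; check them against the model's own config.
HIDDEN_SIZE = 7168            # model dimension
MOE_INTERMEDIATE_SIZE = 2048  # FFN width of one routed expert
NUM_MOE_LAYERS = 58           # 61 transformer layers minus 3 leading dense layers
BYTES_PER_PARAM = 1           # FP8 weights; quantization scale factors ignored


def expert_copy_bytes(hidden: int, inter: int, layers: int, bytes_per_param: int) -> int:
    """Bytes for one extra expert slot in every MoE layer."""
    # A gated-SiLU expert has three projections: gate and up (hidden x inter), down (inter x hidden).
    params_per_layer = 3 * hidden * inter
    return params_per_layer * layers * bytes_per_param


size = expert_copy_bytes(HIDDEN_SIZE, MOE_INTERMEDIATE_SIZE, NUM_MOE_LAYERS, BYTES_PER_PARAM)
print(f"{size / 2**30:.2f} GiB per redundant expert")  # → 2.38 GiB per redundant expert
```

That is about 2.55 × 10^9 bytes, or 2.38 GiB, which is consistent with the ~2.4 GB quoted for each redundant expert.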

EP Ranks (GPUs): 8

Number of GPUs in the expert-parallel group.
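Without redundant experts, each logical expert lives on exactly one EP rank. A GPU's load is then the sum of its experts' loads, and each step waits on the max-load GPU (the straggler). The minimal sketch below assumes a hypothetical contiguous placement, where expert e goes to rank e // experts_per_rank, and uses a toy load vector of 16 experts on 4 ranks. It computes the per-GPU load and the max/mean imbalance ratio.

```python
def contiguous_placement(num_experts: int, num_ranks: int) -> list[int]:
    """Hypothetical baseline: experts split into equal contiguous blocks, one per rank."""
    if num_experts % num_ranks:
        raise ValueError("num_experts must be divisible by num_ranks")
    per_rank = num_experts // num_ranks
    return [e // per_rank for e in range(num_experts)]


def per_rank_load(expert_loads: list[float], placement: list[int], num_ranks: int) -> list[float]:
    """Sum the token load of every expert hosted on each rank."""
    loads = [0.0] * num_ranks
    for expert, load in enumerate(expert_loads):
        loads[placement[expert]] += load
    return loads


def imbalance(rank_loads: list[float]) -> float:
    """max / mean: 1.0 is perfectly balanced; the max-load rank is the straggler."""
    return max(rank_loads) / (sum(rank_loads) / len(rank_loads))


# Toy input: 16 logical experts on 4 ranks, with three hot experts (0, 4, 8).
expert_loads = [40, 5, 5, 5, 20, 5, 5, 5, 10, 5, 5, 5, 5, 5, 5, 5]
placement = contiguous_placement(len(expert_loads), num_ranks=4)
rank_loads = per_rank_load(expert_loads, placement, num_ranks=4)
print(rank_loads, f"{imbalance(rank_loads):.2f}")  # → [55.0, 35.0, 25.0, 20.0] 1.63
```

Here rank 0 hosts the hottest expert and carries 55 units against a mean of 33.75, so every other GPU idles while it finishes.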


Figure: Load per GPU. Legend: balanced load; max-load GPU (straggler); mean load.

Figure: Expert Replication Map

Each cell is a logical expert. Cell size encodes its token load, and a badge shows its replica count. Hot (replicated) experts are highlighted; EPLB spreads their load across multiple GPUs.
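Replication and placement can be sketched as two greedy passes. The first pass hands each redundant slot to the expert with the highest per-replica load. The second places replicas heaviest-first on the least-loaded GPU that still has a free slot (a longest-processing-time heuristic), preferring GPUs that do not already hold a copy. This is a simplified illustration in the spirit of EPLB, not DeepSeek's implementation: it assumes each replica serves an equal share of its expert's tokens, and all function names are hypothetical. The published EPLB also includes a node-aware hierarchical policy, which this sketch omits.

```python
def replicate(expert_loads: list[float], num_redundant: int) -> list[int]:
    """Greedily hand out redundant slots; returns the replica count per logical expert."""
    counts = [1] * len(expert_loads)
    for _ in range(num_redundant):
        # The next copy goes to the expert with the highest load per existing replica.
        hottest = max(range(len(expert_loads)), key=lambda e: expert_loads[e] / counts[e])
        counts[hottest] += 1
    return counts


def pack(expert_loads, counts, num_ranks):
    """Place replicas heaviest-first on the least-loaded rank that still has a free slot."""
    # Assumes each replica serves an equal share of its expert's tokens.
    replicas = [(expert_loads[e] / c, e) for e, c in enumerate(counts) for _ in range(c)]
    if len(replicas) % num_ranks:
        raise ValueError("total replicas must divide evenly across ranks")
    slots_per_rank = len(replicas) // num_ranks
    rank_loads = [0.0] * num_ranks
    rank_experts = [[] for _ in range(num_ranks)]
    for load, e in sorted(replicas, reverse=True):
        free = (r for r in range(num_ranks) if len(rank_experts[r]) < slots_per_rank)
        # Prefer ranks without a copy of e, then the lowest current load.
        r = min(free, key=lambda r: (e in rank_experts[r], rank_loads[r]))
        rank_loads[r] += load
        rank_experts[r].append(e)
    return rank_loads, rank_experts


def imbalance(rank_loads):
    """max / mean load across ranks; 1.0 is perfectly balanced."""
    return max(rank_loads) / (sum(rank_loads) / len(rank_loads))


expert_loads = [40, 5, 5, 5, 20, 5, 5, 5, 10, 5, 5, 5, 5, 5, 5, 5]
counts = replicate(expert_loads, num_redundant=4)  # 20 physical slots, 5 per rank
rank_loads, rank_experts = pack(expert_loads, counts, num_ranks=4)
print(rank_loads, f"{imbalance(rank_loads):.2f}")  # → [35.0, 35.0, 35.0, 30.0] 1.04
```

On this toy input, expert 0 gets four replicas and expert 4 gets two. The heaviest GPU then carries 35 units against a mean of 33.75. For comparison, placing the same 16 experts contiguously without replication would put 55 units on one GPU.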