Wide‑EP MoE Serving
3 parts
Part 1
Wide‑EP Mixture-of-Experts (MoE) Serving (Part 1/3): Why the Wire Becomes the Bottleneck
Part 1 of 3: build the Wide‑EP communication model from first principles—problem framing, notation, payload sizing, and the core communication-vs-compute time model.
Part 2
Wide‑EP Mixture-of-Experts (MoE) Serving (Part 2/3): Dual-Batch Overlap (DBO), Kernel Crossover, and the Hardware Cliff
Part 2 of 3: convert the model into tuning decisions—Dual-Batch Overlap (DBO), DeepEP low-latency (LL) vs high-throughput (HT) crossover, and where hardware locality boundaries create throughput cliffs.
Part 3
Wide‑EP Mixture-of-Experts (MoE) Serving (Part 3/3): Failure Modes, Load Balancing, and Portability
Part 3 of 3: production hardening for Wide‑EP—failure diagnostics, Expert Parallel Load Balancing (EPLB), Linear-Programming-Based Load Balancer (LPLB), software-stack portability, and final operator decision flow.