
Wide-EP Mixture-of-Experts (MoE) Serving: Dispatch/Combine, Dual-Batch Overlap (DBO), and Real Scaling Limits

Impala Team

A production guide to multi-node Mixture-of-Experts (MoE) inference in vLLM: how to model the communication-versus-compute tradeoff, choose between DeepEP low-latency (LL) and high-throughput (HT) modes by measurement, and tune Dual-Batch Overlap (DBO) and Expert Parallel Load Balancing (EPLB) for maximum throughput.