DeepEP: Performance Analysis

Parameters: batch tokens, EP ranks, latency, bandwidth, compute per token, expert skew.

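Two modes: low-latency (LL) and high-throughput (HT).

LL wins for small batches, where its lower fixed latency L dominates communication time; HT wins for large batches, where its higher effective bandwidth BW_eff dominates.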
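The small-batch versus large-batch trade-off can be sketched with a simple latency-plus-bandwidth cost model. Everything here is an assumption made for illustration: the model t_comm = L + bytes / BW_eff, the function names, and all numeric values are hypothetical and are not DeepEP measurements.

```python
def comm_time(batch_tokens, bytes_per_token, latency_s, bw_eff_bytes_per_s):
    """Assumed cost model: fixed latency plus payload over effective bandwidth."""
    return latency_s + batch_tokens * bytes_per_token / bw_eff_bytes_per_s


def crossover_batch(bytes_per_token, ll_latency_s, ll_bw, ht_latency_s, ht_bw):
    """Batch size at which LL and HT communication times are equal.

    Only defined when LL has the lower latency and HT the higher bandwidth.
    """
    per_token_gap = bytes_per_token * (1.0 / ll_bw - 1.0 / ht_bw)
    if per_token_gap <= 0 or ht_latency_s <= ll_latency_s:
        raise ValueError("no crossover: one mode wins at every batch size")
    return (ht_latency_s - ll_latency_s) / per_token_gap


# Hypothetical values, chosen only to make the crossover visible.
BYTES_PER_TOKEN = 8192
LL = {"latency_s": 20e-6, "bw": 20e9}   # lower L, lower BW_eff
HT = {"latency_s": 100e-6, "bw": 40e9}  # higher L, higher BW_eff

b_star = crossover_batch(BYTES_PER_TOKEN, LL["latency_s"], LL["bw"],
                         HT["latency_s"], HT["bw"])

for batch in (64, 4096):
    t_ll = comm_time(batch, BYTES_PER_TOKEN, LL["latency_s"], LL["bw"])
    t_ht = comm_time(batch, BYTES_PER_TOKEN, HT["latency_s"], HT["bw"])
    winner = "LL" if t_ll < t_ht else "HT"
    print(f"B={batch}: LL={t_ll * 1e6:.1f} us, HT={t_ht * 1e6:.1f} us -> {winner}")
print(f"crossover near B = {b_star:.0f} tokens")
```

Under this model the crossover moves to larger batches as the latency gap grows and to smaller batches as the bandwidth gap grows.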

Metric         LL w/ DBO    LL w/o DBO    HT w/ DBO    HT w/o DBO
Comm Time      —            —             —            —
Step Latency   —            —             —            —
Throughput     —            —             —            —

Throughput: R = B / t_{\text{step}} (tokens/sec), where B is the number of tokens in the batch and t_{\text{step}} is the step latency in seconds.

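Latency: t_{\text{step}} \approx \max(t_{\text{compute}},\ t_{\text{comm}}) with DBO, because communication overlaps compute. Without DBO the two phases run back to back, so t_{\text{step}} \approx t_{\text{compute}} + t_{\text{comm}}.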
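The two formulas above can be combined into a small calculator. This is a minimal sketch: the function names and sample timings are hypothetical, and the no-overlap case is assumed to be the plain sum of compute and communication time.

```python
def step_time(t_compute_s, t_comm_s, dbo):
    """Step latency in seconds.

    With DBO, communication is assumed to overlap compute fully, so the
    slower phase sets the step time. Without DBO, the phases are assumed
    to serialize (an assumption of this sketch).
    """
    if dbo:
        return max(t_compute_s, t_comm_s)
    return t_compute_s + t_comm_s


def throughput(batch_tokens, t_step_s):
    """R = B / t_step, in tokens per second."""
    return batch_tokens / t_step_s


# Hypothetical timings for one step.
B = 4096
T_COMPUTE = 2.0e-3
T_COMM = 1.5e-3

t_dbo = step_time(T_COMPUTE, T_COMM, dbo=True)
t_serial = step_time(T_COMPUTE, T_COMM, dbo=False)
r_dbo = throughput(B, t_dbo)
r_serial = throughput(B, t_serial)

print(f"with DBO:    t_step={t_dbo * 1e3:.2f} ms, R={r_dbo:,.0f} tokens/sec")
print(f"without DBO: t_step={t_serial * 1e3:.2f} ms, R={r_serial:,.0f} tokens/sec")
```

In this model DBO helps most when compute and communication times are close; when one phase dominates, the maximum and the sum differ little.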