DeepEP: Performance Analysis

Model inputs: batch tokens, EP ranks, latency, bandwidth, compute per token, expert skew.

Figure: Comm Time, LL (low-latency) vs HT (high-throughput).

LL wins for small batches because its fixed latency L is lower; HT wins for large batches because its effective bandwidth BW_eff is higher.
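One way to make this trade-off concrete is a simple alpha-beta cost model, t_comm = L + payload / BW_eff, and solving for the batch size at which the two modes cost the same. This is a sketch under stated assumptions, not DeepEP's own cost model: `CommMode`, `crossover_tokens`, the per-token payload, and the `skew` multiplier are hypothetical, and the numbers in the usage lines are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommMode:
    """Assumed alpha-beta cost model for one all-to-all mode (hypothetical)."""
    name: str
    latency_s: float   # fixed per-call overhead L, in seconds
    bw_eff_Bps: float  # effective bandwidth BW_eff, in bytes/second

    def comm_time(self, batch_tokens: float, bytes_per_tok: float, skew: float = 1.0) -> float:
        # t_comm = L + payload / BW_eff; skew >= 1 inflates the payload
        # to mimic the busiest rank under expert imbalance (an assumption).
        return self.latency_s + batch_tokens * bytes_per_tok * skew / self.bw_eff_Bps


def crossover_tokens(ll: CommMode, ht: CommMode, bytes_per_tok: float, skew: float = 1.0) -> float:
    """Batch size B* where LL and HT take equal time; LL is faster below B*."""
    per_tok_gain = bytes_per_tok * skew * (1.0 / ll.bw_eff_Bps - 1.0 / ht.bw_eff_Bps)
    if per_tok_gain <= 0:
        return float("inf")  # HT has no bandwidth edge, so LL never loses
    return max(0.0, (ht.latency_s - ll.latency_s) / per_tok_gain)


# Illustrative numbers only, not DeepEP measurements.
LL = CommMode("LL", latency_s=10e-6, bw_eff_Bps=50e9)
HT = CommMode("HT", latency_s=100e-6, bw_eff_Bps=150e9)
BYTES = 14336  # assumed payload per token, e.g. a 7168-dim BF16 hidden state
b_star = crossover_tokens(LL, HT, BYTES)
print(f"crossover ≈ {b_star:.0f} tokens")  # crossover ≈ 471 tokens
```

With these made-up inputs, batches below roughly 471 tokens favor LL and larger ones favor HT; raising `skew` scales the payload term for both modes and so moves the crossover to smaller batches.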

Throughput & Latency Analysis

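Metric                | LL w/ DBO | LL w/o DBO | HT w/ DBO | HT w/o DBO
Comm Time             |           |            |           |
Step Latency          |           |            |           |
Throughput (tokens/s) |           |            |           |

(LL = low-latency, HT = high-throughput, DBO = dual-batch overlap.)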

Throughput: $R = B / t_{\text{step}}$ (tokens/s), where $B$ is the number of batch tokens.

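Latency: with DBO, $t_{\text{step}} \approx \max(t_{\text{compute}},\ t_{\text{comm}})$, since communication for one micro-batch overlaps computation for the other; without DBO the two phases run back to back, so $t_{\text{step}} \approx t_{\text{compute}} + t_{\text{comm}}$.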
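The two latency cases and the throughput formula can be evaluated for all four configurations (LL or HT, with or without dual-batch overlap) in a few lines. This is a minimal sketch assuming an alpha-beta communication model and a fixed compute cost per token; `comm_time`, `step_latency`, `throughput`, and every numeric input are hypothetical and illustrative, not DeepEP measurements. It also uses the idealized max/sum rule directly, ignoring the extra fixed latency that splitting a batch into two micro-batches would add.

```python
def comm_time(latency_s: float, bw_eff_Bps: float, batch_tokens: int, bytes_per_tok: float) -> float:
    """Assumed alpha-beta model: t_comm = L + payload / BW_eff."""
    return latency_s + batch_tokens * bytes_per_tok / bw_eff_Bps


def step_latency(t_compute: float, t_comm: float, dbo: bool) -> float:
    """With overlap the shorter phase is hidden (max); without it the phases add."""
    return max(t_compute, t_comm) if dbo else t_compute + t_comm


def throughput(batch_tokens: int, t_step: float) -> float:
    """R = B / t_step, in tokens per second."""
    return batch_tokens / t_step


# Illustrative inputs only (not DeepEP measurements).
B = 4096                  # batch tokens
BYTES_PER_TOK = 14336     # assumed payload per token, bytes
COMPUTE_PER_TOK = 0.2e-6  # assumed expert compute per token, seconds
MODES = {"LL": (10e-6, 50e9), "HT": (100e-6, 150e9)}  # (L in s, BW_eff in B/s)

rows = {}
t_comp = B * COMPUTE_PER_TOK
for mode, (lat, bw) in MODES.items():
    t_comm = comm_time(lat, bw, B, BYTES_PER_TOK)
    for dbo in (True, False):
        t_step = step_latency(t_comp, t_comm, dbo)
        label = f"{mode} {'w/' if dbo else 'w/o'} DBO"
        rows[label] = (t_comm, t_step, throughput(B, t_step))

for label, (t_comm, t_step, r) in rows.items():
    print(f"{label:11s} comm={t_comm * 1e6:7.1f} us  step={t_step * 1e6:7.1f} us  R={r / 1e6:5.2f} Mtok/s")
```

At this batch size the assumed HT communication time is shorter than the compute time, so with overlap the step becomes compute-bound, while LL stays communication-bound in both columns.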

DeepEP dispatch/combine kernel intuition: the low-latency kernels reduce the fixed per-call overhead (L), while the high-throughput kernels raise effective bandwidth (BW_eff) through topology-aware forwarding.