DeepEP: Under the Hood


Topology Sketch

High-throughput (HT) kernels: NVLink gather → RDMA → NVLink scatter. Low-latency (LL) kernels: direct RDMA, double-buffered.
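The two routes can be compared hop by hop with a minimal sketch. It assumes global GPU ranks numbered node by node and a fixed number of GPUs per node. The `Hop` type, the `ht_path`/`ll_path` helpers, the choice of local GPU 0 as each node's forwarding proxy, and keeping same-node HT traffic on NVLink are all hypothetical simplifications. They are not DeepEP's API or its actual forwarding scheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hop:
    link: str  # "nvlink" or "rdma"
    src: int   # global GPU rank
    dst: int   # global GPU rank


def node_of(rank: int, gpus_per_node: int) -> int:
    # Ranks are assumed to be numbered node by node.
    return rank // gpus_per_node


def proxy_of(node: int, gpus_per_node: int) -> int:
    # Hypothetical choice: local GPU 0 of each node forwards inter-node traffic.
    return node * gpus_per_node


def ht_path(src: int, dst: int, gpus_per_node: int) -> List[Hop]:
    """HT-style route: NVLink gather -> RDMA -> NVLink scatter."""
    if src == dst:
        return []
    s_node = node_of(src, gpus_per_node)
    d_node = node_of(dst, gpus_per_node)
    if s_node == d_node:
        # Assumption: same-node traffic stays on NVLink.
        return [Hop("nvlink", src, dst)]
    s_proxy = proxy_of(s_node, gpus_per_node)
    d_proxy = proxy_of(d_node, gpus_per_node)
    hops: List[Hop] = []
    if src != s_proxy:
        hops.append(Hop("nvlink", src, s_proxy))  # gather on the source node
    hops.append(Hop("rdma", s_proxy, d_proxy))    # single inter-node hop
    if d_proxy != dst:
        hops.append(Hop("nvlink", d_proxy, dst))  # scatter on the destination node
    return hops


def ll_path(src: int, dst: int) -> List[Hop]:
    """LL-style route: one direct RDMA transfer, no NVLink staging."""
    return [] if src == dst else [Hop("rdma", src, dst)]
```

With 8 GPUs per node, `ht_path(3, 13, 8)` yields three hops (NVLink 3→0, RDMA 0→8, NVLink 8→13), while `ll_path(3, 13)` is a single RDMA hop. The HT route adds NVLink stages so that inter-node traffic funnels through fewer RDMA endpoints. The LL route skips staging entirely.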

Double Buffering Schedule

[Figure: double-buffering schedule. Legend: compute phase; LL comm (direct RDMA); HT comm (RDMA stage); HT comm (gather/scatter).]

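In the LL lane, each buffer slot is filled by a direct RDMA transfer. In the HT lane, each transfer follows the hierarchical gather → RDMA → scatter path. Double buffering makes the overlap safe: while compute reads from one slot, communication writes the next batch into the other slot. As a result, no buffer is overwritten while its data is still in use.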
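The slot rotation can be sketched as a toy schedule. The model assumes one micro-batch per step: at step `t`, communication writes micro-batch `t` into slot `t % num_slots` while compute consumes micro-batch `t - 1`. The helpers `double_buffer_schedule` and `is_safe` are hypothetical illustrations, not part of DeepEP.

```python
from typing import List, Optional, Tuple

# (slot index, micro-batch index), or None when that side is idle this step.
Use = Optional[Tuple[int, int]]


def double_buffer_schedule(num_steps: int, num_slots: int = 2) -> List[Tuple[Use, Use]]:
    """Return (comm, compute) slot usage for each step.

    Step t: comm writes micro-batch t into slot t % num_slots, while compute
    reads micro-batch t - 1 from slot (t - 1) % num_slots. One extra step at
    the end drains the last micro-batch.
    """
    schedule: List[Tuple[Use, Use]] = []
    for t in range(num_steps + 1):
        comm: Use = (t % num_slots, t) if t < num_steps else None
        compute: Use = ((t - 1) % num_slots, t - 1) if t >= 1 else None
        schedule.append((comm, compute))
    return schedule


def is_safe(schedule: List[Tuple[Use, Use]]) -> bool:
    # Overlap is unsafe if any step writes the slot that compute is reading.
    for comm, compute in schedule:
        if comm is not None and compute is not None and comm[0] == compute[0]:
            return False
    return True
```

With the default two slots, `is_safe(double_buffer_schedule(4))` holds because consecutive steps always alternate slots. With `num_slots=1`, the check fails at the first overlapped step, which is the in-flight overwrite that double buffering avoids.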