DBO Pipeline: Sequential vs. Overlapped Execution

Legend: Dispatch (comm), Expert Compute, Combine (comm), Idle

Top: Without overlap, each step runs Dispatch → Compute → Combine sequentially. GPUs idle during network transfers; the network idles during compute.
Bottom: With DBO, double buffering lets Step N's compute overlap with Step N+1's communication. Steady-state step time drops from t_comm + t_compute to max(t_comm, t_compute). The shaded idle blocks mark wasted GPU time, which DBO eliminates when communication is fully hidden behind compute (t_compute ≥ t_comm).
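
The step-time comparison in the caption can be checked with a small calculation. The sketch below is an idealized model, not a measurement: it assumes t_comm is the combined Dispatch and Combine time per step, that overlap is perfect in steady state, and that pipeline fill and drain are ignored. The function names and the example timings are hypothetical, chosen only for illustration.

```python
def step_time_sequential(t_comm: float, t_compute: float) -> float:
    """Steady-state step time when communication and compute never overlap."""
    return t_comm + t_compute


def step_time_overlapped(t_comm: float, t_compute: float) -> float:
    """Steady-state step time under idealized overlap: the slower side dominates."""
    return max(t_comm, t_compute)


def gpu_idle_fraction(t_comm: float, t_compute: float, overlapped: bool) -> float:
    """Fraction of each steady-state step during which the GPU does no compute."""
    if overlapped:
        step = step_time_overlapped(t_comm, t_compute)
    else:
        step = step_time_sequential(t_comm, t_compute)
    return (step - t_compute) / step


# Illustrative (assumed) timings in milliseconds: compute-bound case.
t_comm, t_compute = 3.0, 5.0
seq = step_time_sequential(t_comm, t_compute)
ovl = step_time_overlapped(t_comm, t_compute)
print(f"sequential: {seq} ms, overlapped: {ovl} ms, speedup: {seq / ovl:.2f}x")
print(f"GPU idle without overlap: {gpu_idle_fraction(t_comm, t_compute, False):.1%}")
print(f"GPU idle with overlap:    {gpu_idle_fraction(t_comm, t_compute, True):.1%}")
```

In this model the GPU idle fraction reaches zero only when t_compute ≥ t_comm. If communication is the slower side, the step time becomes t_comm and the GPU still waits for the difference, so overlap shrinks the idle time but does not remove it.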