Topology-aware scheduling

Technology Radar

Published: Nov 05, 2025

Nov 2025

Assess

GPUs and LPUs are no longer standalone devices but tightly coupled networks of accelerators whose performance depends on placement and topology. In rack-scale systems like NVIDIA's NVL72, 72 GPUs share over 13 TB of VRAM and act as a single accelerator, until workloads cross switch islands and collective operations turn into bottlenecks. Similarly, Groq's compile-time, software-scheduled architecture assumes deterministic data movement; arbitrary scheduling breaks those assumptions and, with them, predictability. Even within the same data center, GPU performance can vary significantly, which creates demand for topology-aware scheduling that accounts for both hardware layout and performance variability when placing jobs.
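To make the cross-island bottleneck concrete, here is a minimal bandwidth-only sketch of ring all-reduce, a common collective in data-parallel training. It assumes the textbook cost model in which each of N ranks sends 2(N-1)/N times the message size and every step is paced by the slowest link in the ring; latency, hierarchical or tree algorithms and compute overlap are ignored. The function name and the bandwidth figures in the usage lines are illustrative assumptions, not measurements of any real system.

```python
def ring_allreduce_seconds(message_bytes: float, link_bandwidths_gb_per_s: list[float]) -> float:
    """Bandwidth-only estimate of ring all-reduce time (hypothetical helper).

    Each of the N ranks sends 2 * (N - 1) / N * message_bytes in total, and
    because all ranks move data in lockstep, every step is paced by the
    slowest link in the ring.
    """
    n = len(link_bandwidths_gb_per_s)
    if n < 2:
        return 0.0  # a single rank has nothing to exchange
    bytes_per_rank = 2 * (n - 1) / n * message_bytes
    slowest_bytes_per_s = min(link_bandwidths_gb_per_s) * 1e9  # GB/s -> bytes/s
    return bytes_per_rank / slowest_bytes_per_s


# Illustrative numbers only: 8 ranks all-reducing a 1 GB gradient buffer.
one_gb = 1e9
intra_island = ring_allreduce_seconds(one_gb, [900.0] * 8)          # every link stays inside one NVLink domain
cross_island = ring_allreduce_seconds(one_gb, [900.0] * 7 + [50.0])  # one hop leaves the domain
print(f"{intra_island * 1e3:.2f} ms vs {cross_island * 1e3:.2f} ms")
```

With these assumed numbers, a single slower cross-domain hop makes the same all-reduce about 18 times slower (roughly 1.9 ms versus 35 ms), because the whole ring runs at the pace of its weakest link.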

Naive schedulers that ignore NVLink, PCIe or NIC topology often scatter multi-GPU workloads arbitrarily, which degrades step time and efficiency. Training workloads, which are synchronous and bandwidth-bound, favor contiguous NVLink islands with uniform, high-bandwidth paths for all-reduce and pipeline stages. These jobs should be co-scheduled based on fabric bandwidth, avoid cross-switch hops and treat link, switch and node boundaries as failure domains. Inference workloads, by contrast, are latency- and SLO-bound and typically balance replication across domains for high availability with sharding that keeps mixture-of-experts (MoE) and KV-cache locality on the shortest paths. Optimizing placement separately for the prefill and decode phases, along with micro-batching and tenant isolation, further improves efficiency. We believe topology-aware scheduling will become essential as accelerator performance grows increasingly dependent on network and data center topology. Our teams are already assessing Kueue and related projects to improve placement precision, boost performance and ensure reliable scaling for our clients.
