OMNIA’s paper accepted to USENIX ATC 2019
Prof. Jeon’s paper, titled “Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads”, has been accepted to USENIX ATC 2019. This work was done in collaboration with researchers at Microsoft in the US and China.
This paper presents an in-depth analysis of a multi-tenant GPU cluster running deep learning workloads. The discussion focuses on efficiency during job scheduling and execution, and on a taxonomy of failures.
OMNIA’s research on “Systems for AI” is well on track: of the 3 papers Prof. Jeon authored in top-tier systems conferences this year, 2 are about “Systems for AI”.