OMNIA’s paper is accepted to USENIX ATC 2021
Authors: Gangmuk Lim (UNIST), Jeongseob Ahn (Ajou University), Wencong Xiao (Alibaba Group), Youngjin Kwon (KAIST), Myeongjae Jeon (UNIST)
Title: Zico: Efficient GPU Memory Sharing for Concurrent DNN Training
Conference: USENIX Annual Technical Conference (USENIX ATC 2021)
A research paper entitled “Zico: Efficient GPU Memory Sharing for Concurrent DNN Training” has been accepted to USENIX ATC 2021. This work was done in collaboration with researchers at Ajou University, Alibaba Group, and KAIST. The paper proposes Zico, the first DNN system designed to reduce system-wide memory consumption for concurrent DNN training jobs. Zico monitors the memory usage patterns of the participating training jobs and makes the memory reclaimed from them globally shareable. Building on this memory management, Zico automatically decides how to share memory among the concurrent jobs so that they incur minimal delay under a given memory budget.
USENIX ATC is one of the top conferences for multi-disciplinary computer systems research that spans computer architecture, compilers, and operating systems.