ECE Colloquium: Yangwook Kang (Samsung Electronics USA) – “Accelerating I/O Intensive Applications Using In-Storage Computing”


Abstract:

  Moving code close to data has become a popular technique for improving the performance of data processing. For example, big-data processing engines such as Hadoop execute user-provided code on the data nodes where the data to be processed is located and receive only the results from each node, aiming to process data concurrently and locally. As the performance of flash devices increases, however, the CPU bottleneck in data nodes has also become a critical issue. Recent performance studies have shown that saturating the bandwidth of an NVMe SSD or a high-performance PCIe-based SSD would require two dedicated server CPUs (e.g., Intel Xeon E5) just for I/O processing. Because the current flash rack architecture has relatively few CPUs compared with the number of devices, adding data processing to data nodes might decrease both I/O and computation performance: the CPUs are too busy handling both I/O requests and data processing, leaving the device bandwidth idle most of the time.

  To address this issue, we introduce the design and use of a compute-enabled flash device, which leverages its internal computation power to process data. Based on the analysis and optimization of data processing applications, we demonstrate that offloading jobs to a device provides better scalability, performance, and energy efficiency by increasing the device's internal bandwidth utilization and reducing the resource requirements for data nodes, such as CPU, memory, and I/O bandwidth. In this presentation, we discuss the bottlenecks of data processing applications with high-performance SSDs and show how effectively in-storage computing can alleviate these issues in terms of host resource utilization, bandwidth utilization, and energy efficiency.

Speaker Bio:

  Yangwook Kang is a storage system researcher at Samsung Semiconductor, Inc. in the United States. He received his Ph.D. in computer science from the University of California, Santa Cruz, in 2014. Before that, he studied at Hongik University for his master’s degree, working on non-volatile memory systems.

  His main research interests are object-based storage systems, operating systems, non-volatile memories, and distributed data processing. He has focused on the use of non-volatile memories in object-based devices since 2008 and introduced various types of object devices such as compute-enabled SSDs and flash key-value store devices. He currently works on the design and development of an in-storage compute engine for SSDs and its infrastructure at Samsung.