Recommendation models are an important class of Machine Learning (ML) algorithms that offer a targeted user experience for a variety of web services. E-commerce websites use recommendation algorithms to suggest purchases, search engines use them to rank results, video streaming services use them to recommend videos, and social networks use them to surface relevant posts. Over time, recommendation models have evolved from simple collaborative-filtering designs to deep-learning-based approaches. Deep-learning-based recommendation models are at the core of a wide variety of services and consume significant infrastructure capacity and compute cycles in datacenters. Meta reports that more than 50% of its training cycles and around 80% of its inference cycles are devoted to deep-learning-based recommendation models. Similar trends can be found at Google, Alibaba, and Amazon.

Deep-learning-based recommendation models combine compute-intensive neural networks, memory-intensive embedding tables, and communication-heavy feature interactions. The embedding tables store the user and item features, and their size grows as more users and items interact. Production-scale recommendation models are already several terabytes in size with trillions of parameters. The size of each component differs based on the use case, which results in model-level heterogeneity across deep-learning-based recommendation models. Given the limited capacity of GPU HBM, the number of GPUs must scale with embedding table size. The alternative is to use commodity servers with a limited number of GPUs and place the embedding tables in CPU main memory. However, placing embedding tables in CPU main memory introduces system bottlenecks and slows down the overall training process.
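The sketch below illustrates this composition with a toy DLRM-style model in PyTorch: dense MLPs form the compute-intensive part, per-feature embedding tables form the memory-intensive part, and a pairwise dot-product interaction combines the two. All table sizes, embedding dimensions, and layer widths here are illustrative assumptions, not a production configuration.

```python
# A minimal sketch of a DLRM-style model; sizes are toy values for illustration.
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    def __init__(self, table_sizes, emb_dim=16, dense_in=13):
        super().__init__()
        # Memory-intensive component: one embedding table per sparse feature.
        # In production these tables can total terabytes, which is why their
        # placement (GPU HBM vs. CPU main memory) drives the system design.
        self.tables = nn.ModuleList(
            [nn.EmbeddingBag(n, emb_dim, mode="sum") for n in table_sizes]
        )
        # Compute-intensive component: dense MLPs, typically kept on the GPU.
        self.bottom_mlp = nn.Sequential(
            nn.Linear(dense_in, 64), nn.ReLU(), nn.Linear(64, emb_dim)
        )
        num_feats = len(table_sizes) + 1          # sparse features + dense output
        interact_dim = num_feats * (num_feats - 1) // 2
        self.top_mlp = nn.Sequential(
            nn.Linear(interact_dim + emb_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, dense_x, sparse_ids):
        # sparse_ids: list of (batch,) LongTensors, one per embedding table.
        d = self.bottom_mlp(dense_x)                          # (B, emb_dim)
        embs = [t(ids.unsqueeze(1)) for t, ids in zip(self.tables, sparse_ids)]
        feats = torch.stack([d] + embs, dim=1)                # (B, F, emb_dim)
        # Feature interaction: pairwise dot products between all features.
        z = torch.bmm(feats, feats.transpose(1, 2))           # (B, F, F)
        i, j = torch.triu_indices(z.size(1), z.size(2), offset=1)
        inter = z[:, i, j]                                    # (B, F*(F-1)/2)
        return self.top_mlp(torch.cat([d, inter], dim=1))

# Toy forward pass: 3 sparse features, batch of 4.
model = TinyDLRM(table_sizes=[1000, 500, 200])
dense = torch.randn(4, 13)
sparse = [torch.randint(0, n, (4,)) for n in [1000, 500, 200]]
logits = model(dense, sparse)   # (4, 1) click-probability logits
```

Even in this toy setting, the `nn.EmbeddingBag` tables dominate the parameter count, which is why the placement of embedding tables is the central question of this tutorial.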

The goal of this tutorial is to investigate the system bottlenecks that arise when embeddings are placed in CPU main memory, and to optimize embedding placement on commodity servers without scaling the number of GPU devices. The tutorial builds on our work published at VLDB 2022.

Goals

  • Understand the challenges of training large sparse recommendation models on commodity servers using real-world data.
  • Learn how to utilize the limited GPU HBM efficiently when training large sparse recommendation models.
  • Investigate the popularity distribution within the training data and exploit the skew in embedding-table access patterns (see the sketch after this list).
  • Train large sparse recommendation models on commodity servers with a limited number of GPU devices.
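As a concrete illustration of the third goal, the sketch below profiles how lookups concentrate on a small set of embedding rows and splits rows into hot and cold sets. It is a minimal sketch assuming the sparse inputs are available as integer index arrays; the `split_hot_cold` helper, the 0.01% hot-fraction cutoff, and the Zipfian toy data are illustrative assumptions, not FAE's exact policy.

```python
# A minimal sketch of profiling embedding-access skew from training data.
import numpy as np

def split_hot_cold(index_samples, table_size, hot_fraction=0.0001):
    """Count accesses per embedding row and split rows into hot and cold sets."""
    counts = np.bincount(index_samples, minlength=table_size)
    order = np.argsort(counts)[::-1]                 # rows, most popular first
    num_hot = max(1, int(table_size * hot_fraction))
    hot_rows, cold_rows = order[:num_hot], order[num_hot:]
    covered = counts[hot_rows].sum() / max(1, counts.sum())
    return hot_rows, cold_rows, covered

# Toy example: Zipfian accesses concentrate on a few rows, so a tiny
# hot set can cover most lookups -- the skew this tutorial exploits.
rng = np.random.default_rng(0)
ids = rng.zipf(a=1.2, size=1_000_000) % 100_000
hot, cold, covered = split_hot_cold(ids, table_size=100_000)
print(f"{len(hot)} hot rows cover {covered:.1%} of all accesses")
```

In a FAE-style design, the small hot set can then be pinned in GPU HBM while the cold rows stay in CPU main memory, so that most lookups are served at HBM bandwidth.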

Schedule

Time (EST)  Session Details                                            Speaker           Slides
8:00 am     Introduction to Recommender Systems                        Muhammad Adnan    Slides
8:20 am     Deep Learning based Recommendation Models (DLRM)           Prashant J. Nair  Slides
8:45 am     Challenges Associated with Training Recommendation Models  Muhammad Adnan    Slides
9:15 am     Setting Up Resources for Training                          Muhammad Adnan
10:00 am    Coffee Break
10:20 am    Skewness in Embedding Access Pattern                       Prashant J. Nair  Slides
10:40 am    Baseline DLRM Training Profiling                           Muhammad Adnan    Slides
11:15 am    FAE Training                                               Muhammad Adnan    Slides
12:10 pm    Conclusion                                                 Prashant J. Nair