Paperpile

Referenced Papers (19)

On loop tiling and placement for accelerating AI compilers

P Phothilimthana

"This paper describes an XLA TPU autotuner system that uses learned policies to optimize compiler decisions for graph-level optimizations like layout assignment and operator fusion."

Referenced at: 04:16
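
The core loop of such an autotuner is easy to sketch. Below is a minimal, illustrative Python version, not the paper's actual system: the search-space knobs and the evaluate() function are hypothetical stand-ins for XLA decisions such as layout assignment and operator fusion.

```python
import random

# Minimal sketch of an autotuning search loop, NOT the paper's actual
# system. The knobs below are hypothetical stand-ins for XLA decisions
# such as layout assignment and operator fusion.
SEARCH_SPACE = {
    "layout": ["NHWC", "NCHW"],
    "fuse_elementwise": [True, False],
    "tile_size": [64, 128, 256],
}

def sample_config():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(config):
    """Stand-in for compiling with `config` and timing the result on
    hardware; a learned cost model can replace slow measurements."""
    return random.random()  # pretend runtime in milliseconds

# Random-search baseline; a learned policy biases this sampling instead.
best = min((sample_config() for _ in range(100)), key=evaluate)
print("best config:", best)
```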

Learned memory allocation and beyond

Kathryn S. McKinley, Colin Raffel

CACM

"This work explores using LSTM networks to predict object lifetimes based on call stack information to improve memory allocation policies."

Referenced at: 07:41
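
The mechanism is straightforward to sketch: predict a lifetime class from the allocation-site call stack and place objects with similar predicted lifetimes in the same region so they die together. The Python below is an illustration only; the paper's predictor is an LSTM over call-stack symbols, while predict_lifetime_class() here is a hypothetical stand-in.

```python
from collections import defaultdict

# Sketch of lifetime-class-segregated allocation. The paper trains an
# LSTM over call-stack symbols; predict_lifetime_class() here is a
# hypothetical stand-in for that learned model.
LIFETIME_CLASSES = ["short", "medium", "long"]

def predict_lifetime_class(call_stack):
    # Stand-in: hash the stack to a class instead of running the model.
    return LIFETIME_CLASSES[hash(call_stack) % len(LIFETIME_CLASSES)]

# One region per predicted class, so objects likely to die together
# sit together and fragmentation drops.
regions = defaultdict(list)

def malloc(size, call_stack):
    cls = predict_lifetime_class(call_stack)
    block = bytearray(size)
    regions[cls].append(block)
    return block

buf = malloc(1024, ("main", "load_config", "parse"))
print({cls: len(blocks) for cls, blocks in regions.items()})
```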

SmartChoices: Augmenting software with learned implementations

Daniel Golovin, Gabor Bartok, Eric Chen, Emily Donahue, Tzu-Kuo Huang, Efi Kokiopoulou

arXiv [cs.SE], 2023

"This paper introduces SmartChoices, a software platform simplifying the integration of learned decisions into various parts of system and application code, allowing for online feedback."

Referenced at: 10:45
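
The integration pattern the paper describes, request a decision at a program point and later report feedback so the policy improves online, can be sketched with a hypothetical pseudo-API. The LearnedChoice class below is illustrative only and is not the actual SmartChoices interface.

```python
# Hypothetical pseudo-API sketching the SmartChoices usage pattern:
# ask for a learned decision, then report feedback so the policy can
# improve online. Illustrative only, not the real interface.
class LearnedChoice:
    def __init__(self, name, options):
        self.name, self.options = name, options
        self.stats = {o: [1.0, 2.0] for o in options}  # reward sum, count

    def choose(self, context):
        # Greedy over average reward; a real policy also uses `context`.
        return max(self.options,
                   key=lambda o: self.stats[o][0] / self.stats[o][1])

    def feedback(self, option, reward):
        self.stats[option][0] += reward
        self.stats[option][1] += 1

prefetch = LearnedChoice("prefetch_distance", ["near", "far"])
choice = prefetch.choose({"queue_depth": 7})
prefetch.feedback(choice, reward=0.8)  # close the online-learning loop
```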

HALP: Heuristic aided learned preference eviction policy for YouTube content delivery network

Zhenyu Song, Kevin Chen, Nikhil Sarda, Deniz Altinbüken, E Brevdo, Jimmy Coleman

NSDI, 2023

"This example demonstrates how SmartChoices can reduce the byte miss rate in YouTube's cache eviction policy by incorporating complex features influencing content popularity."

Referenced at: 13:14
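
The "heuristic aided" structure can be sketched as a two-stage decision: a cheap heuristic (LRU here) proposes eviction candidates, and a learned preference model picks the final victim. In the sketch below, score() is a hypothetical stand-in for the learned model and the feature set is invented.

```python
import heapq

# Two-stage eviction in the spirit of heuristic-aided policies: LRU
# proposes candidates, a learned model picks the victim. score() is a
# hypothetical stand-in for the learned preference model.
def score(features):
    # Higher score = predicted to stay popular, so keep it longer.
    return features["recent_hits"] / (1 + features["age_s"])

def pick_victim(cache, k=4):
    # Heuristic stage: the k least-recently-used entries.
    candidates = heapq.nsmallest(k, cache,
                                 key=lambda key: cache[key]["last_access"])
    # Learned stage: evict the candidate the model values least.
    return min(candidates, key=lambda key: score(cache[key]))

cache = {
    "a": {"last_access": 1, "recent_hits": 9, "age_s": 50},
    "b": {"last_access": 2, "recent_hits": 0, "age_s": 500},
    "c": {"last_access": 3, "recent_hits": 4, "age_s": 20},
}
print("evict:", pick_victim(cache, k=2))
```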

Ten lessons from three generations shaped Google's TPUv4i

N Jouppi

ISCA

"This paper discusses the design evolution of Google's Tensor Processing Units, highlighting how specialized hardware can significantly improve efficiency for machine learning workloads."

Referenced at: 15:59

A domain-specific accelerator for deep learning

J Dean, N Jouppi

ISCA

"This slide discusses the influential 2017 ISCA paper by Dean, Jouppi et al. which detailed the first TPU design and became the second most cited paper in ISCA's 50-year history, highlighting its significance in driving specialized hardware for machine learning."

Referenced at: 16:47

Energy and policy considerations for deep learning in NLP

Emma Strubell, Ananya Ganesh, A McCallum

ACL, 2019

"This paper made a flawed estimate of CO₂ emissions for neural architecture search by modeling the wrong hardware and assuming a US average data center, leading to an overestimation of carbon emissions."

Referenced at: 19:25
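
The disagreement is ultimately back-of-envelope arithmetic: emissions scale as energy × PUE × grid carbon intensity, so assuming power-hungry hardware and a dirty grid multiplies the estimate. The numbers below are illustrative placeholders, not figures from either paper.

```python
# Back-of-envelope CO2e arithmetic showing how assumptions drive the
# estimate. All numbers are illustrative placeholders, not figures
# from either paper.
def co2e_kg(accel_hours, watts_per_accel, pue, kg_co2e_per_kwh):
    kwh = accel_hours * watts_per_accel / 1000.0
    return kwh * pue * kg_co2e_per_kwh

hours = 100_000  # illustrative accelerator-hours for a search job

# Pessimistic assumptions: power-hungry GPU, average-grid carbon mix.
high = co2e_kg(hours, watts_per_accel=300, pue=1.6, kg_co2e_per_kwh=0.43)
# Measured-style assumptions: efficient accelerator, low-carbon site.
low = co2e_kg(hours, watts_per_accel=200, pue=1.1, kg_co2e_per_kwh=0.08)

print(f"pessimistic: {high:,.0f} kg, measured: {low:,.0f} kg, "
      f"ratio: {high / low:.0f}x")
```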

Quantifying the carbon footprint of AI research

D Patterson, J Gonzalez, U Hölzle

"This paper provides a more accurate assessment of AI research's carbon footprint, demonstrating that previous estimates were significantly overestimated and that neural architecture search can lead to more efficient models."

Referenced at: 20:52

Autonomous discovery of a unified methodology for fast chip design

M Yazgan, J Jiang, E Songhori, S Wang

"This work focuses on using machine learning to improve chip design by automatically generating test cases for verification, leading to improved quality and reduced time."

Referenced at: 25:00

A graph placement methodology for fast chip design

A Mirhoseini, A Goldie, M Yazgan, J Jiang, E Songhori

Nature, 2021

"This paper introduces a reinforcement learning approach for automated physical chip design, significantly reducing design time and improving wirelength."

Referenced at: 25:00
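
The proxy objective behind this line of work is easy to state: place macros so as to minimize the half-perimeter wirelength (HPWL) of the nets connecting them. The sketch below computes HPWL on a toy 3x3 grid and searches placements exhaustively; the exhaustive loop is a stand-in for the paper's learned policy, and the netlist is invented.

```python
import itertools

# Toy version of the placement objective: minimize half-perimeter
# wirelength (HPWL). The netlist and grid are invented, and exhaustive
# search stands in for the learned RL policy.
NETS = [("m0", "m1"), ("m1", "m2"), ("m0", "m2")]

def hpwl(placement):
    total = 0.0
    for net in NETS:
        xs = [placement[m][0] for m in net]
        ys = [placement[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

cells = list(itertools.product(range(3), range(3)))  # 3x3 grid
best = min(
    (dict(zip(["m0", "m1", "m2"], spots))
     for spots in itertools.permutations(cells, 3)),
    key=hpwl,
)
print("best placement:", best, "wirelength:", hpwl(best))
```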

TPUv5p and AI-hypercomputer

J Dean, N Jouppi

ISCA

"This citation points to the ISCA 2023 paper describing the increasing adoption of reinforcement learning-based placement techniques in Google's newer TPU chip designs, leading to significant wirelength reductions."

Referenced at: 27:21

Architecture design space exploration for transformer-based AI accelerators

"This paper explores the vast design space for transformer-based AI accelerators, considering both hardware design choices and compiler/software mapping to optimize performance across various workloads."

Referenced at: 29:32
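
A hedged sketch of what such an exploration loop looks like: enumerate candidate accelerator configurations, reject those over a silicon budget, and rank the rest with a crude roofline-style model over a workload mix. Every parameter below is an illustrative assumption, not the paper's actual search space or cost model.

```python
import itertools

# Illustrative hardware/software co-exploration loop; all numbers are
# invented stand-ins, not the paper's search space or cost model.
WORKLOADS = [          # (FLOPs, bytes moved) per kernel in the mix
    (8e9, 4e7),        # compute-heavy matmul
    (1e8, 2e8),        # bandwidth-heavy attention/normalization step
]

def runtime_s(flops_per_s, bytes_per_s):
    # Roofline: each kernel is bound by compute or by memory traffic.
    return sum(max(f / flops_per_s, b / bytes_per_s) for f, b in WORKLOADS)

def cost_units(flops_per_s, bytes_per_s):
    # Crude "silicon budget": compute and bandwidth both consume area.
    return flops_per_s / 64e12 + bytes_per_s / 0.4e12

design_space = itertools.product(
    [64e12, 128e12, 256e12],    # peak FLOP/s (e.g., systolic array size)
    [0.4e12, 0.8e12, 1.6e12],   # memory bandwidth in bytes/s
)
feasible = [d for d in design_space if cost_units(*d) <= 5.0]
best = min(feasible, key=lambda d: runtime_s(*d))
print("best (FLOP/s, bytes/s):", best)
```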

Gemini: A family of highly capable multimodal models

Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut

arXiv [cs.CL], 2023

"This paper introduces the Gemini family of multimodal models, designed to handle various data types like text, image, audio, and video as inputs and generate corresponding responses."

Referenced at: 30:52

Pathways: Asynchronous distributed dataflow for ML

P Barham, A Chowdhery, J Dean, S Ghemawat, S Hand, D Hurt

MLSys, 2022

"This paper introduces Pathways, an AI supercomputing architecture designed to seamlessly manage and map large-scale machine learning computations across various physical resources, including different network types."

Referenced at: 36:00

Cores that don't count

Peter H Hochschild, Paul Turner, J Mogul, R Govindaraju, Parthasarathy Ranganathan, D Culler

HotOS, 2021

"This paper investigates silent data corruption errors in hardware and their potential to spread through machine learning training systems, highlighting the need for robust error detection and mitigation strategies."

Referenced at: 37:25

Training Compute-Optimal Large Language Models

Jordan Hoffmann, Sebastian Borgeaud, A Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford

arXiv, 2022

"This paper discusses Chinchilla Scaling Laws, which guide decisions on optimal model size based on available compute resources, impacting strategies for inference cost reduction."

Referenced at: 40:15
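
The rule of thumb behind this is compact enough to compute directly: training compute is roughly C ≈ 6·N·D for N parameters and D training tokens, and the compute-optimal point puts D near 20·N. The sketch below applies that rule to an arbitrary example budget.

```python
# Chinchilla rule of thumb: C ~ 6 * N * D, with the compute-optimal
# point at roughly D ~ 20 * N. The budget below is an arbitrary
# example value, not a figure from the paper.
def compute_optimal(c_flops, tokens_per_param=20.0):
    # C = 6 * N * D and D = r * N  =>  N = sqrt(C / (6 * r))
    n_params = (c_flops / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

C = 1e23  # example FLOP budget
n, d = compute_optimal(C)
print(f"~{n / 1e9:.1f}B params trained on ~{d / 1e12:.2f}T tokens")
```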

Distilling the knowledge in a neural network

Geoffrey E Hinton, O Vinyals, J Dean

arXiv, 2015

"This paper introduces knowledge distillation, a technique to compress large, high-performing models into smaller, more efficient models for faster and cheaper inference, by training the smaller model to mimic the larger one's outputs."

Referenced at: 40:15
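
The training objective is simple enough to show directly. The sketch below implements the loss from the paper, cross-entropy against the teacher's temperature-softened distribution mixed with hard-label cross-entropy; the logits, alpha, and T values are toy choices.

```python
import numpy as np

# Distillation loss from Hinton et al.: match the teacher's
# temperature-softened distribution, mixed with hard-label
# cross-entropy. Logits, alpha, and T below are toy choices.
def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, hard_label,
                 T=4.0, alpha=0.5):
    p_t = softmax(teacher_logits / T)            # soft teacher targets
    p_s = softmax(student_logits / T)
    soft = -np.sum(p_t * np.log(p_s)) * T * T    # T^2 keeps grad scale
    hard = -np.log(softmax(student_logits)[hard_label])
    return alpha * hard + (1 - alpha) * soft

student = np.array([1.0, 0.5, -0.5])
teacher = np.array([3.0, 1.0, -2.0])
print("loss:", distill_loss(student, teacher, hard_label=0))
```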

A simple and efficient way to use a mixture of experts

"This work explores mixture-of-experts architectures for more efficient inference by activating only a small portion of a very large model, allowing for conditional computation and automatic sharding."

Referenced at: 40:15
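
The conditional-computation idea reduces to a router plus top-k expert selection: only k of n experts run per token, so parameter count scales without proportional compute. The sketch below is a generic top-k MoE layer in numpy with invented shapes, not any specific paper's architecture.

```python
import numpy as np

# Generic top-k mixture-of-experts routing sketch; shapes and k are
# illustrative, not a specific published architecture.
rng = np.random.default_rng(0)
d_model, n_experts, k = 16, 8, 2

router_w = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    scores = x @ router_w                      # one score per expert
    top = np.argsort(scores)[-k:]              # indices of the k best
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()                       # softmax over chosen experts
    # Conditional computation: only k of n_experts matmuls execute.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=d_model)
print("output shape:", moe_layer(token).shape)
```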

GSPM: A sparse mixture of experts

"This paper presents GSPM, a sparse mixture of experts model, which is another technique to reduce inference costs by selectively activating parts of a large model based on input."

Referenced at: 40:15