Referenced Papers (19)
On loop tiling and placement for accelerating AI compilers
Phitchaya Mangpo Phothilimthana
"This paper describes an XLA TPU autotuner system that uses learned policies to optimize compiler decisions for graph-level optimizations like layout assignment and operator fusion."
Learned memory allocation and beyond
Kathryn S. McKinley, Colin Raffel
CACM
"This work explores using LSTM networks to predict object lifetimes based on call stack information to improve memory allocation policies."
SmartChoices: Augmenting software with learned implementations
Daniel Golovin, Gabor Bartok, Eric Chen, Emily Donahue, Tzu-Kuo Huang, Efi Kokiopoulou
arXiv [cs.SE], 2023
"This paper introduces SmartChoices, a software platform simplifying the integration of learned decisions into various parts of system and application code, allowing for online feedback."
HALP: Heuristic aided learned preference eviction policy for YouTube content delivery network
Zhenyu Song, Kevin Chen, Nikhil Sarda, Deniz Altınbüken, Eugene Brevdo, Jimmy Coleman
NSDI, 2023
"This example demonstrates how SmartChoices can reduce the byte miss rate in YouTube's cache eviction policy by incorporating complex features influencing content popularity."
Ten lessons from three generations shaped Google's TPUv4i
Norman P. Jouppi
ISCA, 2021
"This paper discusses the design evolution of Google's Tensor Processing Units, highlighting how specialized hardware can significantly improve efficiency for machine learning workloads."
A domain-specific accelerator for deep learning
Jeff Dean, Norman P. Jouppi
ISCA, 2017
"This entry refers to the influential 2017 ISCA paper by Dean, Jouppi et al., which detailed the first TPU design and became the second most cited paper in ISCA's 50-year history, underscoring its significance in driving specialized hardware for machine learning."
Energy and policy considerations for deep learning in NLP
Emma Strubell, Ananya Ganesh, Andrew McCallum
ACL, 2019
"This paper made a flawed estimate of CO₂ emissions for neural architecture search by modeling the wrong hardware and assuming a US average data center, leading to an overestimation of carbon emissions."
Quantifying the Carbon Footprint of AI Research
David Patterson, Joseph Gonzalez, Urs Hölzle
"This paper provides a more accurate assessment of AI research's carbon footprint, demonstrating that previous estimates were significantly overestimated and that neural architecture search can lead to more efficient models."
Autonomous discovery of a unified methodology for fast chip design
Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang
"This work focuses on using machine learning to improve chip design by automatically generating test cases for verification, improving quality and reducing design time."
A Rewarding Journey Towards Automated IC Design with Reinforcement Learning
Azalia Mirhoseini, Mustafa Yazgan, Ebrahim Songhori
Nature
"This paper introduces a reinforcement learning approach for automated physical chip design, significantly reducing design time and improving wirelength."
TPU v5p and AI Hypercomputer
Jeff Dean, Norman P. Jouppi
ISCA, 2023
"This citation points to the ISCA 2023 paper describing the increasing adoption of reinforcement learning-based placement techniques in Google's newer TPU chip designs, leading to significant wirelength reductions."
Architecture design space exploration for transformer-based AI accelerators
"This paper explores the vast design space for transformer-based AI accelerators, considering both hardware design choices and compiler/software mapping to optimize performance across various workloads."
Gemini: A family of highly capable multimodal models
Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut
arXiv [cs.CL], 2023
"This paper introduces the Gemini family of multimodal models, designed to handle various data types like text, image, audio, and video as inputs and generate corresponding responses."
Pathways: an AI-supercomputing architecture
Jeff Dean, Norman P. Jouppi
"This paper introduces Pathways, an AI supercomputing architecture designed to seamlessly manage and map large-scale machine learning computations across various physical resources, including different network types."
Cores that don't count
Peter H. Hochschild, Paul Turner, Jeffrey C. Mogul, Rama Govindaraju, Parthasarathy Ranganathan, David E. Culler
HotOS, 2021
"This paper investigates silent data corruption errors in hardware and their potential to spread through machine learning training systems, highlighting the need for robust error detection and mitigation strategies."
Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford
arXiv, 2022
"This paper discusses Chinchilla Scaling Laws, which guide decisions on optimal model size based on available compute resources, impacting strategies for inference cost reduction."
Distilling the knowledge in a neural network
Geoffrey E. Hinton, Oriol Vinyals, Jeff Dean
arXiv, 2015
"This paper introduces knowledge distillation, a technique to compress large, high-performing models into smaller, more efficient models for faster and cheaper inference, by training the smaller model to mimic the larger one's outputs."
A simple and efficient way to use a mixture of experts
"This work explores mixture-of-experts architectures for more efficient inference by activating only a small portion of a very large model, allowing for conditional computation and automatic sharding."
GSPM: A sparse mixture of experts
"This paper presents GSPM, a sparse mixture of experts model, which is another technique to reduce inference costs by selectively activating parts of a large model based on input."