Paperpile

Referenced Papers (15)

Scaling laws for neural language models

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, et al.

arXiv [cs.LG], 2020

"This paper empirically found scaling laws for large language models, providing a power law for their performance based on parameters and data size."

Referenced at: 03:40
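
For context, the headline result can be summarized by the paper's approximate power laws for test loss L in parameters N, dataset size D, and compute C (exponents as reported by Kaplan et al.; N_c, D_c, C_c are fitted scale constants):

```latex
% Approximate scaling laws from Kaplan et al. (2020): test loss falls as a
% power law when each resource is scaled up with the others unconstrained.
\begin{aligned}
L(N) &\approx (N_c / N)^{\alpha_N}, & \alpha_N &\approx 0.076 \\
L(D) &\approx (D_c / D)^{\alpha_D}, & \alpha_D &\approx 0.095 \\
L(C) &\approx (C_c / C)^{\alpha_C}, & \alpha_C &\approx 0.050
\end{aligned}
```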

Attention Is All You Need

Ashish Vaswani, Noam M Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, et al.

Neural Inf Process Syst, 2017

"This seminal paper introduced the Transformer architecture, which became a universal model for various AI/machine learning tasks."

Referenced at: 08:08
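
The paper's core primitive is scaled dot-product attention. Below is a minimal NumPy sketch for illustration; it omits the multi-head projections, masking, and batching of the full architecture:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)      # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

# Toy self-attention over 4 tokens with dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(X, X, X)
```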

Assessing the brittleness of safety alignment via pruning and low-rank modifications

Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, et al.

arXiv [cs.LG], 2024

"This paper focuses on understanding the safety of large language models by identifying a small percentage of critical neurons using low-rank properties."

Referenced at: 14:34
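
To make the annotation concrete, here is a toy sketch of the general recipe of scoring weights on safety data and keeping a small top fraction as "critical". The first-order pruning score here is an illustrative assumption on my part, not the paper's exact pruning/low-rank procedure:

```python
import numpy as np

def critical_weight_mask(W, grad_safety, top_frac=0.03):
    """Toy first-order importance score on safety data.

    score = |W * dL_safety/dW| estimates how much the safety loss changes
    if a weight is removed; the top ~3% are flagged as safety-critical.
    Illustrative only; the actual method also exploits low-rank structure.
    """
    score = np.abs(W * grad_safety)
    cutoff = np.quantile(score, 1.0 - top_frac)   # threshold for the top fraction
    return score >= cutoff                        # boolean mask of critical weights
```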

Speculative decoding with Big Little Decoder

Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W Mahoney, Amir Gholami, et al.

arXiv [cs.CL], 2023

"This technique allows large language models to use smaller models for initial drafts and larger models for verification, saving inference costs during decoding."

Referenced at: 15:28
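
A minimal sketch of the draft-then-verify loop shared by this line of work. `small_lm` and `large_lm` are hypothetical greedy-decoding callables (token list in, next token out); real implementations compare full distributions with rejection sampling and verify all draft tokens in one batched forward pass:

```python
def speculative_decode(small_lm, large_lm, prompt, n_draft=4, max_len=64):
    """The small model drafts n_draft tokens; the large model verifies
    them and corrects the first mismatch. Greedy, unbatched sketch."""
    tokens = list(prompt)
    while len(tokens) < max_len:
        draft = []
        for _ in range(n_draft):                 # cheap drafting phase
            draft.append(small_lm(tokens + draft))
        for i, proposed in enumerate(draft):     # verification phase
            verified = large_lm(tokens + draft[:i])
            tokens.append(verified)              # the large model's token is always kept
            if verified != proposed:             # first mismatch: discard the rest
                break
    return tokens[:max_len]
```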

SpecDec++: Boosting speculative decoding via adaptive candidate lengths

Kaixuan Huang, Xudong Guo, Mengdi Wang

arXiv [cs.CL], 2024

"This technique allows large language models to use smaller models for initial drafts and larger models for verification, saving inference costs during decoding."

Referenced at: 15:28
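
SpecDec++'s refinement is to stop drafting adaptively instead of using a fixed candidate length. In this sketch, `accept_prob` is a hypothetical predictor of whether the next draft token will be accepted (the paper trains such a prediction head); drafting halts once acceptance looks unlikely:

```python
def adaptive_draft(small_lm, accept_prob, tokens, max_draft=8, threshold=0.7):
    """Draft tokens while the predicted acceptance probability stays high.

    accept_prob is a hypothetical callable returning the estimated chance
    that the large model will accept the next drafted token.
    """
    draft = []
    while len(draft) < max_draft and accept_prob(tokens + draft) >= threshold:
        draft.append(small_lm(tokens + draft))
    return draft
```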

Auto-Encoding Variational Bayes

Diederik P Kingma, Max Welling

arXiv [stat.ML], 2013

"Variational autoencoders are discussed as an early model that aimed to compress and recover data, which unexpectedly revealed capabilities for generating new content."

Referenced at: 21:28
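
The compress-and-recover training objective the annotation alludes to is the evidence lower bound (ELBO), stated here up to notation:

```latex
% ELBO maximized by a VAE: reconstruction accuracy minus a KL penalty
% that keeps the encoder q_phi close to the prior p(z).
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
- \mathrm{KL}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
```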

Generative Adversarial Networks

Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, et al.

arXiv [stat.ML], 2014

"Generative Adversarial Networks are introduced as another early model in generative AI, initially developed for supervised learning but proving powerful for generating realistic images as a byproduct."

Referenced at: 22:41
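
The adversarial game is the paper's minimax objective between a generator G and a discriminator D:

```latex
% The discriminator D learns to separate real data from generator samples;
% the generator G is trained to fool it.
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```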

Time Reversal of Diffusions

U G Haussmann, E Pardoux

Ann. Probab., 1986

"Cited as part of the mathematical foundation for deriving the time reversal of an Ornstein-Uhlenbeck process, crucial for diffusion models."

Referenced at: 28:11
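
The result specializes as follows for the Ornstein-Uhlenbeck process used as the forward noising process in diffusion models (stated up to notation):

```latex
% Forward OU noising process with marginal densities p_t:
\mathrm{d}X_t = -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t
% Its time reversal Y_t = X_{T-t} is again a diffusion, driven by the score:
\mathrm{d}Y_t = \bigl[\,Y_t + 2\,\nabla \log p_{T-t}(Y_t)\,\bigr]\,\mathrm{d}t
+ \sqrt{2}\,\mathrm{d}\bar{W}_t
```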

Unsupervised learning of image manifolds by semidefinite programming

Kilian Q Weinberger, Lawrence K Saul

Int. J. Comput. Vis., 2006

"Cited as the source for effective dimensionality estimates of image datasets, illustrating that practical data often reside on low-dimensional manifolds."

Referenced at: 32:02

The intrinsic dimension of images and its impact on learning

Phillip Pope, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein

arXiv [cs.CV], 2021

"Cited as a source for effective dimensionality estimates of image datasets, highlighting the low-dimensional nature of complex data."

Referenced at: 32:02
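
For a sense of how such effective-dimension numbers are obtained, here is a sketch of a nearest-neighbor estimator (the TwoNN idea of Facco et al.). This is one common estimator in this literature, not necessarily the one used in either paper above; Pope et al., for instance, use an MLE variant:

```python
import numpy as np

def twonn_intrinsic_dimension(X):
    """Estimate intrinsic dimension from the ratio of each point's two
    nearest-neighbor distances, whose distribution depends only on d."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # exclude self-distances
    r = np.sort(dists, axis=1)[:, :2]        # r1, r2 for every point
    mu = r[:, 1] / r[:, 0]                   # ratio r2 / r1 ~ Pareto(d)
    return len(X) / np.sum(np.log(mu))       # maximum-likelihood estimate of d

# Sanity check: a 2-D plane embedded linearly in 10-D ambient space
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
print(twonn_intrinsic_dimension(X))          # prints roughly 2
```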

Score approximation, estimation and distribution recovery of diffusion models on low-dimensional data

Minshuo Chen, Kaixuan Huang, Tuo Zhao, Mengdi Wang

ICML, 2023

"This paper provides statistical theory for diffusion models, demonstrating their ability to converge to data distribution and adapt to manifold structures efficiently."

Referenced at: 35:35
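
The object this theory controls is the learned score function; the population objective, up to the denoising reformulation used in practice, is:

```latex
% Score matching: fit s_theta to the score of the noised marginals p_t.
\min_\theta \; \mathbb{E}_{t}\,\mathbb{E}_{x_t \sim p_t}
\bigl\| s_\theta(x_t, t) - \nabla \log p_t(x_t) \bigr\|_2^2
```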

Classifier-Free Diffusion Guidance

Jonathan Ho, Tim Salimans

arXiv [cs.LG], 2022

"Introduces a method to guide diffusion models towards specific objectives without requiring a separate classifier, which is crucial for controlling generation."

Referenced at: 37:21
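
At sampling time the method extrapolates between the conditional and unconditional noise predictions of a single jointly trained network, with guidance weight w (matching the paper's formula up to convention):

```latex
% Classifier-free guidance: \varnothing denotes the dropped (null) condition;
% w = 0 recovers plain conditional sampling.
\tilde{\epsilon}_\theta(x_t, c)
= (1 + w)\,\epsilon_\theta(x_t, c) - w\,\epsilon_\theta(x_t, \varnothing)
```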

Diffusion models beat GANs on image synthesis

Prafulla Dhariwal, Alex Nichol

arXiv [cs.LG], 2021

"Introduces a method to guide diffusion models towards specific objectives using an external classifier to steer the generation process."

Referenced at: 37:21
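
Classifier guidance follows from Bayes' rule, which splits the conditional score into the unconditional score plus an external classifier's gradient; in practice the classifier term is scaled by a guidance strength s > 1:

```latex
% Conditional score via Bayes' rule, with external classifier p_phi(y | x_t):
\nabla_{x_t} \log p(x_t \mid y)
= \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p_\phi(y \mid x_t)
```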

Reward-directed conditional diffusion: Provable distribution estimation and reward improvement

Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang

arXiv [cs.LG], 2023

"This paper develops an algorithm and theory for leveraging pre-training to learn useful structures from unlabeled data, provably achieving near-optimal sub-optimality."

Referenced at: 39:13

DeepMind predicts millions of new materials

Robert F Service

Science, 2023

"DeepMind used generative AI to discover 2.2 million new crystal structures, a task that would take human researchers 800 years."

Referenced at: 41:10