Referenced Papers (15)
Scaling laws for neural language models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child
arXiv [cs.LG], 2020
"This paper empirically found scaling laws for large language models, providing a power law for their performance based on parameters and data size."
Attention Is All You Need
Ashish Vaswani, Noam M Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez
NeurIPS, 2017
"This seminal paper introduced the Transformer architecture, which became a universal model for various AI/machine learning tasks."
A versatile graph learning approach through LLM-based agent
Lanning Wei, Huan Zhao, Xiaohan Zheng, Zhiqiang He, Quanming Yao
arXiv [cs.LG], 2023
"This paper focuses on understanding the safety of large language models by identifying a small percentage of critical neurons using low-rank properties."
Speculative decoding with Big Little Decoder
Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W Mahoney, Amir Gholami
arXiv [cs.CL], 2023
"This technique allows large language models to use smaller models for initial drafts and larger models for verification, saving inference costs during decoding."
SpecDec++: Boosting speculative decoding via adaptive candidate lengths
Kaixuan Huang, Xudong Guo, Mengdi Wang
arXiv [cs.CL], 2024
"This technique allows large language models to use smaller models for initial drafts and larger models for verification, saving inference costs during decoding."
Auto-Encoding Variational Bayes
Diederik P Kingma, Max Welling
arXiv [stat.ML], 2013
"Variational autoencoders are discussed as an early model that aimed to compress and recover data, which unexpectedly revealed capabilities for generating new content."
Generative Adversarial Networks
Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair
arXiv [stat.ML], 2014
"Generative Adversarial Networks are introduced as another early model in generative AI, initially developed for supervised learning but proving powerful for generating realistic images as a byproduct."
Time Reversal of Diffusions
U G Haussmann, E Pardoux
Ann. Probab., 1986
"Cited as part of the mathematical foundation for deriving the time reversal of an Ornstein-Uhlenbeck process, crucial for diffusion models."
Unsupervised learning of image manifolds by semidefinite programming
Kilian Q Weinberger, Lawrence K Saul
Int. J. Comput. Vis., 2006
"Cited as the source for effective dimensionality estimates of image datasets, illustrating that practical data often reside on low-dimensional manifolds."
The intrinsic dimension of images and its impact on learning
Phillip Pope, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein
arXiv [cs.CV], 2021
"Cited as a source for effective dimensionality estimates of image datasets, highlighting the low-dimensional nature of complex data."
Score approximation, estimation and distribution recovery of diffusion models on low-dimensional data
Minshuo Chen, Kaixuan Huang, Tuo Zhao, Mengdi Wang
ICML, 2023
"This paper provides statistical theory for diffusion models, demonstrating their ability to converge to data distribution and adapt to manifold structures efficiently."
Classifier-Free Diffusion Guidance
Jonathan Ho, Tim Salimans
arXiv [cs.LG], 2022
"Introduces a method to guide diffusion models towards specific objectives without requiring a separate classifier, which is crucial for controlling generation."
Diffusion models beat GANs on image synthesis
Prafulla Dhariwal, Alex Nichol
arXiv [cs.LG], 2021
"Introduces a method to guide diffusion models towards specific objectives using an external classifier to steer the generation process."
Factorized diffusion architectures for unsupervised image generation and segmentation
Xin Yuan, Michael Maire
arXiv [cs.CV], 2023
"This paper develops an algorithm and theory for leveraging pre-training to learn useful structures from unlabeled data, provably achieving near-optimal sub-optimality."
DeepMind predicts millions of new materials
Robert F Service
Science, 2023
"DeepMind used generative AI to discover 2.2 million new crystal structures, a task that would take human researchers 800 years."