yt2papers

Referenced Papers (8)

Making sense of vision and touch: Learning multimodal representations for contact-rich tasks

Lee

IEEE Int. Conf. Robot. Autom.

"This paper is cited as an example of learning multimodal representations from vision and touch for contact-rich tasks in robotics."

Referenced at: 02:50

SMELLNET: A large-scale dataset for real-world smell recognition

Dewei Feng, Carol Li, Wei Dai, Paul Pu Liang

arXiv, 2025

"This paper provides a dataset for real-world smell recognition, cited in the context of AI for physical sensing, specifically AI for smell."

Referenced at: 02:50

VideoPoet: A large language model for zero-shot video generation

D Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Rachel Hornung

ICML, 2023

"The speaker uses this model as an example of multimodal generative AI that can translate between different modalities like text-to-video and video-to-audio."

Referenced at: 04:20

Developing ICU clinical behavioral atlas using ambient intelligence and computer vision

Wei Dai, Ehsan Adeli, Zelun Luo, Dev Dash, S Lakshmikanth, Zane Durante

NEJM AI, 2025

"This work is cited as an example of applying AI for holistic health, specifically for understanding the physical health of patients in an ICU by analyzing their behavior."

Referenced at: 04:55

OpenFace 3.0: A lightweight multitask system for comprehensive facial behavior analysis

Jiewen Hu, Leena Mathur, Paul Pu Liang, Louis-Philippe Morency

IEEE Int Conf Autom Face Gesture Recognit, 2025

"Cited in the context of holistic health, this work on facial behavior analysis is relevant to understanding social wellness through non-verbal cues."

Referenced at: 04:55

Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions

Mathur

EMNLP

"This paper highlights the challenges in developing AI with social intelligence, a key component of the speaker's discussion on holistic health."

Referenced at: 04:55

WebArena: A realistic web environment for building autonomous agents

Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar

Int Conf Learn Represent, 2023

"This paper is cited as an example of work on interactive AI agents that operate on the web, illustrating a shift from single-step prediction to multi-step action sequences."

Referenced at: 06:05

VideoWebArena: Evaluating multimodal agents on video understanding web tasks

Jang

ICLR

"This paper is cited as an example of building and evaluating interactive, multimodal agents on web tasks that require video understanding."

Referenced at: 06:05