
Overview

There has been growing interest in learning robot skills from humans. Both robots and humans can be viewed as physical agents that interact with the world. In recent years, computer vision researchers have focused on creating digital twins of humans in virtual environments that behave like real humans, while roboticists have been building physical agents capable of interacting with the real world. We believe that progress in one field can greatly benefit the other. On one hand, virtual humans can be regarded as a special form of robotic agent; on the other hand, robots can learn manipulation and locomotion from human demonstrations, including those performed by simulated humans. Through this workshop, we aim to bring these two fields together to explore common challenges such as retargeting, the embodiment gap, contact modeling, data scarcity, and the role of foundation models. We hope this workshop will inspire new research directions and foster cross-disciplinary collaboration in this space.

Schedule

| Time | Content | Speaker |
| 09:25 – 09:30 | Opening Remark | |
| 09:30 – 10:00 | Keynote 1 | Baoxiong Jia |
| 10:00 – 10:45 | Spotlight 1 | Haoru Xue, Yuxuan Kuang, Yilin Wu |
| 10:45 – 11:15 | Keynote 2 | Kristen Grauman |
| 11:15 – 11:45 | Keynote 3 | Umar Iqbal |
| 11:45 – 12:15 | Lunch Break | |
| | Poster Session | |
| 13:30 – 14:00 | Keynote 4 | Gerard Pons-Moll |
| 14:00 – 14:30 | Keynote 5 | Marc Pollefeys |
| 14:30 – 15:15 | Spotlight 2 | Toru Lin, Mert Albaba, Sirui Xu |
| 15:15 – 15:45 | Keynote 6 | Hanbyul Joo |
| 15:45 – 16:15 | Keynote 7 | Lingni Ma |
| 16:30 – 17:15 | Panel Discussion | |

Accepted / Invited Paper List

  • InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
    Sirui Xu, Samuel Schulter, Morteza Ziyadi, Xialin He, Xiaohan Fei, Yu-Xiong Wang, Liang-Yan Gui
  • UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos
    Gu Zhang, Qicheng Xu, Haozhe Zhang, Jianhan Ma, Long He, Yiming Bao, Zeyu Ping, Zhecheng Yuan, Chenhao Lu, Chengbo Yuan, Tianhai Liang, Xiaoyu Tian, Maanping Shao, Feihong Zhang, Mingyu Ding, Yang Gao, Hao Zhao, Hang Zhao, Huazhe Xu
  • Action-Sketcher: From Reasoning to Action via Visual Sketches for Robotic Manipulation
    Huajie Tan, Peterson Co, Yijie Xu, Shanyu Rong, Yuheng Ji, Cheng Chi, Xiansheng Chen, Zhongxia Zhao, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
  • AffordGen: Generating Diverse Demonstrations for Generalizable Object Manipulation with Affordance Correspondence
    Jiawei Zhang, Kaizhe Hu, Yingqian Huang, Yuanchen Ju, Zhengrong Xue, Huazhe Xu
  • Recovering Physically Plausible Human-Object Interactions from Monocular Videos
    Dingbang Huang, Etienne Vouga, Qixing Huang, Georgios Pavlakos
  • AGiLe: Learning Robust Long-Horizon Manipulation via Affordance-Grounded Bidirectional Latent Planning
    Zixuan Chen, Xiangrong Feng, Jieqi Shi, Lin Shao, Jing Huo, Yang Gao
  • CrossHOI: Learning Cross-View Representations for Monocular 3D Human-Object Interaction Reconstruction
    Pei Geng, Shanshan Zhang, Jian Yang
  • RHINO: Reconstructing Human Interactions with Novel Objects from Monocular Videos
    Lixin Xue, Chengwei Zheng, Georgios Paschalidis, Chen Guo, Manuel Kaufmann, Juan Zarate, Dimitrios Tzionas
  • CARI4D: Category Agnostic 4D Reconstruction of Human-Object Interaction
    Xianghui Xie, Bowen Wen, Yan Chang, Hesam Rabeti, Jiefeng Li, Ye Yuan, Gerard Pons-Moll, Stan Birchfield
  • OneHOI: Unifying Human-Object Interaction Generation and Editing
    Jiun Tian Hoe, Weipeng Hu, Xudong Jiang, Yap-Peng Tan, Chee Seng Chan
  • Decoupled Generative Modeling for Human-Object Interaction Synthesis
    Hwanhee Jung, Seunggwan Lee, Jeongyoon Yoon, SeungHyeon Kim, Giljoo Nam, Qixing Huang, Sangpil Kim
  • MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
    Kehong Gong, Zhengyu Wen, Weixia He, Mingxi Xu, Qi Wang, Ning Zhang, Zhengyu Li, Dongze Lian, Wei Zhao, Xiaoyu He, Mingyuan Zhang
  • Natural Human Motion Recovery by Aligning High-Order Temporal Dynamics from Monocular Videos
    Dingkun Wei, Zehong Shen, Yan Xia, Georgios Pavlakos, Yujun Shen, Xiaowei Zhou
  • AffordGrasp: Cross-Modal Diffusion for Affordance-Aware Grasp Synthesis
    Xiaofei Wu, Yi Zhang, Yumeng Liu, Yuexin Ma, Yujiao Shi, Xuming He
  • Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models
    Shengli Zhou, Minghang Zheng, Feng Zheng, Yang Liu
  • Grounded 3D-Aware Spatial Vision-Language Modeling
    An-Chieh Cheng, Yang Fu, Yatai Ji, Ligeng Zhu, Guanqi Zhan, Zhuoyang Zhang, Zhaojing Yang, Song Han, Yao Lu, Pavlo Molchanov, Vidya Nariyambut Murali, Jan Kautz, Xiaolong Wang, Hongxu Yin, Sifei Liu
  • FunFact: Building Probabilistic Functional 3D Scene Graphs via Factor-Graph Reasoning
    Zhengyu Fu, René Zurbrügg, Kaixian Qu, Marc Pollefeys, Marco Hutter, Hermann Blum, Zuria Bauer
  • Local Motion Matters: A Deconstruct–Recompose Paradigm for Reinforcement Learning Pre-training from Videos
    Jinwen Wang, Youfang Lin, Xiaobo Hu, Shuo Wang, Kai Lv
  • Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer
    Haoru Xue, Tairan He, Zi Wang, Qingwei Ben, Wenli Xiao, Zhengyi Luo, Xingye Da, Fernando Castañeda, Guanya Shi, Shankar Sastry, Linxi “Jim” Fan, Yuke Zhu
  • RAYNOVA: Scale-Temporal Autoregressive World Modeling in Ray Space
    Yichen Xie, Chensheng Peng, Mazen Abdelfattah, Yihan Hu, Jiezhi Yang, Eric Higgins, Ryan Brigden, Masayoshi Tomizuka, Wei Zhan

Overview

The goal of this workshop is to build communication between researchers who study human modeling and those who study robotics. Both humans and robots can be considered physical agents that interact with the world. In recent years, computer vision researchers have focused on creating digital twins of humans in virtual environments that behave like real humans, while roboticists have worked on building physical agents capable of interacting with the real world. We believe that progress in one field can greatly benefit the other. On one hand, virtual humans can be considered a special form of robotic agent. On the other hand, robots can learn manipulation and locomotion from human demonstrations, including from simulated humans. Through this workshop, we aim to bring these two fields together and explore common challenges such as contact modeling, motion prediction, overcoming data paucity, and the role of large-scale models. We hope that our workshop will provide a suitable platform for inspiring new research directions and solutions in this space.

Schedule

| Time | Speaker(s) |
| 9:25 - 9:30 | Opening Remark |
| 9:30 - 10:00 | Keynote: Hanbyul Joo |
| 10:00 - 10:45 | Spotlight: Aditya Prakash, Mandi Zhao, Rick Akkerman |
| 10:45 - 11:15 | Keynote: Michael Black |
| 11:15 - 11:45 | Keynote: Richard Newcombe |
| 11:45 - 12:15 | Poster Session @ ExHall D #182 - #201 |
| | Lunch Break |
| 13:30 - 14:00 | Keynote: Karen Liu |
| 14:00 - 14:30 | Keynote: Siyuan Huang |
| 14:30 - 15:00 | Spotlight: Roei Herzig, Sirui Xu |
| 15:15 - 15:45 | Keynote: Jitendra Malik |
| 15:45 - 16:15 | Keynote: Dinesh Jayaraman |
| 16:15 - 16:45 | Spotlight: Neerja Thakkar, Junyao Shi |
| 16:45 - 16:50 | Closing Remark |

Accepted / Invited Paper List

  • Poly-Autoregressive Prediction for Modeling Interactions
    Neerja Thakkar, Tara Sadjadpour, Jathushan Rajasegaran, Shiry Ginosar, Jitendra Malik
  • InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation
    Sirui Xu, Dongting Li, Yucheng Zhang, Xiyan Xu, Qi Long, Ziyin Wang, Yunzhi Lu, Shuchang Dong, Hezi Jiang, Akshat Gupta, Yu-Xiong Wang, Liang-Yan Gui
  • How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
    Aditya Prakash, Benjamin Lundell, Dmitry Andreychuk, David Forsyth, Saurabh Gupta, Harpreet Sawhney
  • InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
    Rick Akkerman, Haiwen Feng, Michael J. Black, Dimitrios Tzionas, Victoria Fernández Abrevaya
  • InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions
    Sirui Xu, Hung Yu Ling, Yu-Xiong Wang, Liang-Yan Gui
  • DexMachina: Functional Retargeting for Bimanual Dexterous Manipulation
    Zhao Mandi, Yifan Hou, Dieter Fox, Yashraj Narang, Ajay Mandlekar, Shuran Song
  • ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos
    Junyao Shi, Zhuolun Zhao, Tianyou Wang, Ian Pedroza, Amy Luo, Jie Wang, Yecheng Jason Ma, Dinesh Jayaraman
  • UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
    Jaehyun Kang, Hanjung Kim, Hyolim Kang, Meedeum Cho, Seon Joo Kim, Youngwoon Lee
  • Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
    Shaowei Liu, Chuan Guo, Bing Zhou, Jian Wang
  • HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
    Chen Bao, Jiarui Xu, Xiaolong Wang, Abhinav Gupta, Homanga Bharadhwaj
  • Sparse MoE Students for Efficient Knowledge Distillation
    Jongwon Ryu, Mingyu Jeon, Woojun Jung, Minuk Ma, Junyeong Kim
  • DiffCogNav: Diffusion-based Trajectory Planning for Cognitively-Aware Human Navigation Behavior
    Zhiwen Qiu, Ziang Liu, Tapomayukh Bhattacharjee, Saleh Kalantari
  • DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy
    Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani
  • BG-HOP: A Bimanual Generative Hand-Object Prior
    Sriram Krishna, Sravan Chittupalli, Sungjae Park
  • Agent-Agnostic Semantic Reasoning for Material-Aware Obstacle Handling in Autonomous Vehicles
    Ayush Bheemaiah, Seungyong Yang
  • Visual imitation enables contextual humanoid control
    Arthur Allshire, Hongsuk Choi, Junyi Zhang, David McAllister, Anthony Zhang, Chung Min Kim, Trevor Darrell, Pieter Abbeel, Jitendra Malik, Angjoo Kanazawa
  • VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
    Hanzhi Chen, Boyang Sun, Anran Zhang, Marc Pollefeys, Stefan Leutenegger
  • Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References
    Yitang Li, Mingxian Lin, Zhuo Lin, Yipeng Deng, Yue Cao, Li Yi
  • OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
    Mingjie Pan, Jiyao Zhang, Tianshu Wu, Yinghao Zhao, Wenlong Gao, Hao Dong
  • GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
    Rao Fu, Dingxi Zhang, Alex Jiang, Wanjia Fu, Austin Funk, Daniel Ritchie, Srinath Sridhar
  • SkillMimic: Learning Basketball Interaction Skills from Demonstrations
    Yinhuai Wang, Qihan Zhao, Runyi Yu, Ailing Zeng, Jing Lin, Zhengyi Luo, Hok Wai Tsui, Jiwen Yu, Xiu Li, Qifeng Chen, Jian Zhang, Lei Zhang, Ping Tan
  • InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing
    Jinlu Zhang, Yixin Chen, Zan Wang, Jie Yang, Yizhou Wang, Siyuan Huang
  • DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
    Zhixuan Liang, Yao Mu, Yixiao Wang, Tianxing Chen, Wenqi Shao, Wei Zhan, Masayoshi Tomizuka, Ping Luo, Mingyu Ding
  • EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild
    Yumeng Liu, Xiaoxiao Long, Zemin Yang, Yuan Liu, Marc Habermann, Christian Theobalt, Yuexin Ma, Wenping Wang

Reviewer Acknowledgement

We thank all reviewers who helped us with the review process: Nischal Reddy Chandra, Sichang Su, Zhiwen Qiu, Chen Bao, Rakhil Immidisetti, Xinpeng Liu, Abhiroop Chatterjee, Zi-ang Cao, Yu Wu, Yuqi Xie, Aditya Prakash, Sholder Lyko, Rynaa Grover, Susmita Ghosh, Poorvi Hebbar.