Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
- Qiuyuan Huang, Microsoft; Jianfeng Gao, Microsoft
Vision-Language Navigation is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. We propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL) and further introduce a Self-Supervised Imitation Learning (SIL) method to explore unseen environments by imitating its own past, good decisions.
-
-
Qiuyuan Huang
Principal Researcher
-
Jianfeng Gao
Technical Fellow & Corporate Vice President
-
-
Regardez suivant
-
-
Session: Compute & Trust (Systems)
- Ashish Panwar,
- Aditya Desai,
- Abhilash Jindal
-
Multimodal & Embodied Intelligence (Pt 1), Panel on Multimodal AI: Progress, Pitfalls, Possibilities
- Madhava Krishna,
- Sriram Ganapathy,
- Somak Aditya
-
Session on Compute & Trust (Security)
- Krishna Pillutla,
- Danish Pruthi
-
-
Session on Reasoning
- Hongxiang Fan,
- Nagarajan Natarajan
-
-
Session on Retrieval
- Lokesh Nagalapatti,
- Soumen Chakrabarti
-
-