Unlocking Zero-Shot Geospatial Reasoning via Indirect Rewards

Chenhui Xu; Fuxun Yu; Mike Bianco; Jacob Kovarskiy; Raphael Tang; Qi Zhang; Zirui Xu; William Levine; Brandon Dubbs; Heming Liao; C. Burgess; Suvam Bag; Jay Patravali; Rupanjali Kukal; Mikael Figueroa; Rishi Madhok; Nikolaos Karianakis; Jinjun Xiong

ICML 2026 | September 2025

Training robust reasoning vision-language models (VLMs) in rare domains such as geospatial analysis is fundamentally constrained by the scarcity of supervision. Raw geospatial imagery is abundant, but the amount of task-direct supervision lags far behind that of common domains. In this work, we validate an important conclusion: indirect verifiable rewards, derived from seemingly unrelated metadata, are sufficient to induce sophisticated and generalizable geospatial reasoning across more than 25 downstream tasks. We present Geo-R1 as one empirical instantiation of this paradigm. Rather than relying on limited task-specific annotations (i.e., direct rewards), Geo-R1 uses scalable, verifiable proxy rewards based on cross-view alignment with metadata (geolocation information) to drive reinforcement learning at scale. These indirect rewards lead the model to discover and internalize geospatial reasoning that transfers across diverse tasks: it achieves strong zero-shot transfer on out-of-distribution benchmarks and surpasses fully supervised specialists on some of them. These findings indicate that optimizing for indirect verifiable rewards may provide a scalable pathway to unlock generalized reasoning capabilities in rare domains with massive unlabeled data archives.
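
To make the idea of an indirect verifiable reward concrete, the following is a minimal sketch rather than the Geo-R1 implementation. It assumes a hypothetical cross-view matching task: the model sees one ground-level image and several candidate overhead images, and must name the candidate that shares the ground image's geolocation metadata. The `View` type, the `<answer>` tag format, and the function names `target_index` and `indirect_reward` are all illustrative assumptions, not details taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class View:
    """Hypothetical record: an image plus the geolocation stored in its metadata."""
    image_id: str
    lat: float
    lon: float


# Assumed output format: the model ends its reasoning with <answer>INDEX</answer>.
ANSWER_RE = re.compile(r"<answer>\s*(\d+)\s*</answer>")


def target_index(ground: View, candidates: Sequence[View]) -> int:
    """Derive the label from metadata alone: the candidate at the same coordinates.

    Exact equality is used because, in this sketch, paired views are assumed to
    carry coordinates copied from the same metadata record.
    """
    matches = [
        i for i, c in enumerate(candidates)
        if (c.lat, c.lon) == (ground.lat, ground.lon)
    ]
    if len(matches) != 1:
        raise ValueError("expected exactly one candidate at the ground view's location")
    return matches[0]


def indirect_reward(completion: str, target: int) -> float:
    """Return 1.0 if the completion names the metadata-matched candidate, else 0.0."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # unparseable output earns no reward
    return 1.0 if int(match.group(1)) == target else 0.0
```

In this sketch no human annotation enters the loop: the label comes entirely from geolocation metadata, and the reward can be checked mechanically, which is what would let such a signal scale to large unlabeled archives.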
