Unlocking Zero-Shot Geospatial Reasoning via Indirect Rewards
- Chenhui Xu
- Fuxun Yu
- Mike Bianco
- Jacob Kovarskiy
- Raphael Tang
- Qi Zhang
- Zirui Xu
- William Levine
- Brandon Dubbs
- Heming Liao
- C. Burgess
- Suvam Bag
- Jay Patravali
- Rupanjali Kukal
- Mikael Figueroa
- Rishi Madhok
- Nikolaos Karianakis
- Jinjun Xiong
Training robust reasoning vision-language models (VLMs) in rare domains, such as the geospatial domain, is fundamentally constrained by scarce supervision. While raw geospatial imagery is abundant, task-direct supervision is far scarcer than in common domains. In this work, we show that indirect verifiable rewards, derived from seemingly unrelated metadata, are sufficient to induce sophisticated and generalizable geospatial reasoning across 25+ downstream tasks. We present Geo-R1 as one empirical instantiation of this paradigm. Rather than relying on limited task-specific annotations (i.e., direct rewards), Geo-R1 uses scalable, verifiable proxy rewards based on cross-view alignment with geolocation metadata to drive reinforcement learning at scale. These indirect rewards lead the model to discover and internalize geospatial reasoning that transfers zero-shot across diverse tasks: it achieves strong zero-shot results on out-of-distribution benchmarks and surpasses fully supervised specialists on some of them. These findings indicate that optimizing for indirect verifiable rewards may provide a scalable pathway to unlock generalized reasoning capabilities in rare domains with massive unlabeled data archives.
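
To make the idea of an indirect verifiable reward concrete, the following is a minimal sketch of what such a reward could look like. It is not Geo-R1's actual reward: the multiple-choice task framing, the `<answer>` tag format, and the names `haversine_km`, `correct_option`, and `indirect_reward` are all assumptions made for illustration. The point it shows is that the label comes from geolocation metadata alone, with no task-specific annotation.

```python
import math
import re


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def correct_option(query_latlon, candidate_latlons):
    """Hypothetical labeling rule: the candidate tile nearest to the query
    image's recorded coordinates is the matching view."""
    distances = [haversine_km(query_latlon, c) for c in candidate_latlons]
    return "ABCDEFGH"[distances.index(min(distances))]


def indirect_reward(response, query_latlon, candidate_latlons):
    """Return 1.0 if the final answer names the matching tile, else 0.0.

    The <answer>X</answer> output format is an assumption of this sketch.
    """
    match = re.search(r"<answer>\s*([A-H])\s*</answer>", response)
    if match is None:
        return 0.0  # unparseable output earns no reward
    expected = correct_option(query_latlon, candidate_latlons)
    return 1.0 if match.group(1) == expected else 0.0


# Usage: a ground-level query image and three candidate overhead tiles,
# each described only by its (lat, lon) metadata.
candidates = [(40.0, -75.0), (34.05, -118.25), (51.5, -0.12)]
query = (34.05, -118.25)
reward = indirect_reward("The coastline suggests... <answer>B</answer>",
                         query, candidates)
```

Because the reward is computed from coordinates that accompany the imagery, a signal of this kind can be checked automatically for every sample in a large archive, which is the scalability property the abstract relies on.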