Unlocking Zero-Shot Geospatial Reasoning via Indirect Rewards

  • Chenhui Xu
  • Fuxun Yu
  • Mike Bianco
  • Jacob Kovarskiy
  • Raphael Tang
  • Qi Zhang
  • Zirui Xu
  • William Levine
  • Brandon Dubbs
  • Heming Liao
  • C. Burgess
  • Suvam Bag
  • Jay Patravali
  • Rupanjali Kukal
  • Mikael Figueroa
  • Rishi Madhok
  • Jinjun Xiong

ICML 2026

Training robust reasoning vision-language models (VLMs) in rare domains, such as the geospatial domain, is fundamentally constrained by scarce supervision. Raw geospatial imagery is abundant, but task-direct supervision lags far behind what is available in common domains. In this work, we show that indirect verifiable rewards, derived from seemingly unrelated metadata, are sufficient to induce sophisticated and generalizable geospatial reasoning across 25+ downstream tasks. We present Geo-R1 as one empirical instantiation of this paradigm. Rather than relying on limited task-specific annotations (i.e., direct rewards), Geo-R1 uses scalable, verifiable proxy rewards based on cross-view alignment with geolocation metadata to drive reinforcement learning at scale. These indirect rewards lead the model to discover and internalize geospatial reasoning that transfers zero-shot across diverse tasks, including out-of-distribution benchmarks, and on certain benchmarks the model even surpasses fully supervised specialists. These findings indicate that optimizing for indirect verifiable rewards may provide a scalable pathway to unlocking generalized reasoning capabilities in rare domains with massive unlabeled data archives.
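To make the idea of a metadata-derived verifiable reward concrete, here is a minimal Python sketch. It is not the Geo-R1 reward, whose exact form the abstract does not specify. It assumes a hypothetical cross-view task in which the model sees a ground-level image and picks one of several overhead candidate tiles, and the reward is checked purely against geotags. The names `haversine_km`, `cross_view_reward`, and `tolerance_km` are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cross_view_reward(chosen_index, candidate_locations, query_location,
                      tolerance_km=1.0):
    """Hypothetical binary reward: 1.0 if the chosen overhead tile's geotag lies
    within tolerance_km of the query image's geotag, else 0.0.

    chosen_index is the candidate the model selected (None if its answer could
    not be parsed). No task-specific annotation is needed: the check uses only
    the (lat, lon) metadata attached to the images.
    """
    if chosen_index is None or not 0 <= chosen_index < len(candidate_locations):
        return 0.0  # unparseable or out-of-range answers earn no reward
    lat, lon = candidate_locations[chosen_index]
    distance = haversine_km(lat, lon, *query_location)
    return 1.0 if distance <= tolerance_km else 0.0


# Usage with made-up coordinates: the query was taken at the second candidate's location.
candidates = [(40.0, -75.0), (34.05, -118.24)]
reward = cross_view_reward(1, candidates, (34.05, -118.24))
```

The reward is binary and computed by a fixed rule, so it can be verified automatically for every sample. That property is what lets this kind of signal scale to large unlabeled archives.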

GitHub