Abstract

LiDARs are widely used for mapping and localization in dynamic environments. However, their high cost limits their widespread adoption. Monocular localization in LiDAR maps using inexpensive cameras is therefore a cost-effective alternative for large-scale deployment. Nevertheless, most existing approaches struggle to generalize to new sensor setups and environments and require retraining or fine-tuning. In this paper, we present CMRNext, a novel approach for camera-LiDAR matching that is independent of sensor-specific parameters, generalizes across environments, and can be used in the wild for monocular localization in LiDAR maps and camera-LiDAR extrinsic calibration. CMRNext exploits recent advances in deep neural networks for matching cross-modal data, combined with standard geometric techniques for robust pose estimation. We reformulate the point-pixel matching problem as an optical flow estimation problem and solve the Perspective-n-Point (PnP) problem on the resulting correspondences to find the relative pose between the camera and the LiDAR point cloud. We extensively evaluate CMRNext on six different robotic platforms, including three publicly available datasets and three in-house robots. Our experimental evaluations demonstrate that CMRNext outperforms existing approaches on both tasks and effectively generalizes to previously unseen environments and sensor setups in a zero-shot manner.

Technical Approach

Overview of our approach

In this paper, we propose CMRNext, a novel DNN-based approach for camera-LiDAR matching that is independent of any sensor-specific parameters. Our approach can be employed for monocular localization in LiDAR maps, as well as camera-LiDAR extrinsic calibration. Moreover, CMRNext generalizes to different environments and different sensor setups, including different intrinsic and extrinsic parameters. This is achieved by decoupling the matching step from the metric pose estimation step. In particular, we first train a CNN to predict dense matches between image pixels and 3D points in the LiDAR point cloud, together with their respective uncertainties. These matches are then used to estimate the pose of the camera using a traditional PnP algorithm. Hence, the network reasons only at the pixel level and is thus independent of any metric information. The intrinsic parameters of the camera are instead given as input to the PnP algorithm, so the network can be used with different cameras without any retraining.
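To make the decoupling concrete, the following sketch illustrates the second stage: dense 2D-3D correspondences predicted by the network, filtered by their predicted uncertainties, are handed to a standard PnP-RANSAC solver together with the camera intrinsics. It uses OpenCV's solvePnPRansac; the array names, the median-based uncertainty filter, and the reprojection threshold are illustrative assumptions rather than the exact released implementation.

import numpy as np
import cv2


def estimate_pose(points_3d, points_2d, uncertainties, K, reproj_thresh=2.0):
    """Recover the camera pose from predicted 2D-3D matches via PnP + RANSAC.

    points_3d:     (N, 3) LiDAR points from the map
    points_2d:     (N, 2) matched pixel locations predicted by the network
    uncertainties: (N,)   predicted match uncertainties (lower is better)
    K:             (3, 3) camera intrinsic matrix, given only to PnP,
                   never to the network
    """
    # Discard the least confident half of the matches (illustrative heuristic).
    keep = uncertainties < np.median(uncertainties)
    if keep.sum() < 4:
        raise ValueError("PnP needs at least 4 correspondences")

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d[keep].astype(np.float64),
        points_2d[keep].astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None,                  # assume an undistorted image
        reprojectionError=reproj_thresh,  # inlier threshold in pixels
        flags=cv2.SOLVEPNP_EPNP,
    )
    if not ok:
        raise RuntimeError("PnP-RANSAC did not converge to a valid pose")

    R, _ = cv2.Rodrigues(rvec)  # axis-angle -> 3x3 rotation matrix
    return R, tvec              # pose of the map points in the camera frame

Because the intrinsics K enter only at this stage, the same trained network can be paired with a different camera simply by swapping K, which is what makes the approach independent of sensor-specific parameters.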

Video

Code

A PyTorch implementation of this project will be made available in our GitHub repository for academic use under the GPLv3 license. For any commercial purpose, please contact the authors. We will release the code upon the acceptance of our paper.

Publications

If you find our work useful, please consider citing our paper:

Daniele Cattaneo and Abhinav Valada
CMRNext: Camera to LiDAR Matching in the Wild for Localization and Extrinsic Calibration
Under review, 2024.

(PDF) (BibTeX)

Authors

Daniele Cattaneo

University of Freiburg

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by the German Research Foundation (DFG) Emmy Noether Program under grant number 468878300.