We propose a novel type of map for visual navigation, a renderable neural radiance map (RNR-Map), which is designed to contain the overall visual information of a 3D environment. The RNR-Map has a grid form and consists of latent codes at each pixel. These latent codes are embedded from image observations and can be converted to a neural radiance field, which enables image rendering given a camera pose. The recorded latent codes implicitly contain visual information about the environment, which makes the RNR-Map visually descriptive. This visual information can serve as a useful guide for visual localization and navigation. We develop localization and navigation frameworks that can effectively utilize the RNR-Map.
We evaluate the proposed frameworks on camera tracking, visual localization, and image-goal navigation. Experimental results show that the RNR-Map-based localization framework can find the target location from a single query image quickly and with accuracy competitive with other baselines. The localization framework is also robust to environmental changes, and even finds the most visually similar places when given a query image from a different environment. The proposed navigation framework outperforms existing image-goal navigation methods in difficult scenarios, under odometry and actuation noise. The navigation framework achieves a 65.7% success rate in curved scenarios of the NRNS dataset, which is an improvement of 18.6% over the current state-of-the-art.
The proposed method encodes observation images into latent codes at 91.9 Hz. These latent codes are embedded in the grid map according to their positions, and can be converted to a neural radiance field that renders the corresponding region. We provide some examples of building an RNR-Map below.
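To make the idea of embedding latent codes "according to their positions" concrete, the following is a minimal sketch, not the authors' implementation. It assumes each latent code already comes with a world-frame ground-plane coordinate (for example, obtained from depth and camera pose), and that codes landing in the same cell are averaged. The function name `register_latents` and the parameters `cell_size` and `map_size` are hypothetical.

```python
import math


def register_latents(points_xz, latents, cell_size=0.1, map_size=8):
    """Place latent codes into a square grid map, averaging codes per cell.

    points_xz: list of (x, z) ground-plane coordinates in metres (assumed given).
    latents:   list of latent codes (lists of floats), one per point.
    Returns a dict mapping (row, col) -> averaged latent code.
    """
    half = map_size // 2  # the world origin maps to the centre of the grid
    sums, counts = {}, {}
    for (x, z), code in zip(points_xz, latents):
        col = math.floor(x / cell_size) + half
        row = math.floor(z / cell_size) + half
        if not (0 <= row < map_size and 0 <= col < map_size):
            continue  # the point falls outside the map; skip it
        key = (row, col)
        if key not in sums:
            sums[key] = [0.0] * len(code)
            counts[key] = 0
        sums[key] = [s + c for s, c in zip(sums[key], code)]
        counts[key] += 1
    # Averaging is an assumption of this sketch; other aggregations are possible.
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


if __name__ == "__main__":
    points = [(0.05, 0.05), (0.07, 0.02), (-0.15, 0.25), (5.0, 0.0)]
    codes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
    for cell, code in sorted(register_latents(points, codes).items()):
        print(cell, code)
```

In this toy example the first two points share a cell, so their codes are averaged, and the last point lies outside the map and is dropped.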
We can locate places from an image by directly utilizing the latent codes, without rendering images. The visual localization process is fast (56.8 Hz) and accurate (99% inliers, with error under 50 cm).
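The idea of localizing directly in latent space can be illustrated with a small sketch. It is a simplified stand-in, not the localization framework itself: it assumes the query image has already been encoded into a small patch of latent codes, and it replaces any learned matching with a plain sliding-window cosine similarity. The names `cosine` and `localize` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def localize(map_grid, query):
    """Slide the query patch over the map and return the best-matching cell.

    map_grid: H x W nested list of latent codes.
    query:    h x w nested list of latent codes (h <= H, w <= W).
    Returns ((row, col), score), where (row, col) is the top-left map cell
    of the best match and score is the summed cosine similarity.
    """
    H, W = len(map_grid), len(map_grid[0])
    h, w = len(query), len(query[0])
    best_pos, best_score = None, float("-inf")
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            score = sum(
                cosine(map_grid[r + i][c + j], query[i][j])
                for i in range(h)
                for j in range(w)
            )
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


if __name__ == "__main__":
    # A 4 x 4 map of 2-D latent codes, empty except for two distinctive cells.
    grid = [[[0.0, 0.0] for _ in range(4)] for _ in range(4)]
    grid[1][2] = [1.0, 0.0]
    grid[2][2] = [0.0, 1.0]
    query_patch = [[[1.0, 0.0]], [[0.0, 1.0]]]
    print(localize(grid, query_patch))
```

Because no image is rendered during matching, the cost depends only on the map and patch sizes, which is consistent with the speed advantage described above.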
We observed that the visual localization framework with RNR-Map is robust to environmental changes.
The renderable property of RNR-Map also enables fine-level pose prediction.
The visual information in RNR-Map is useful for exploring an environment when searching for the location of a query image.
@InProceedings{Kwon_2023_CVPR,
author = {Kwon, Obin and Park, Jeongho and Oh, Songhwai},
title = {Renderable Neural Radiance Map for Visual Navigation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {9099-9108}
}