Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Loop closing is an important part of SLAM for autonomous mobile systems.
- BoW features can be used for loop searching and 6-DoF loop correction.
- BoW3D is a novel method for real-time loop closing in 3D LiDAR SLAM.
- BoW3D is efficient, pose-invariant and can be used for accurate point-to-point matching.
- BoW3D is tested on public datasets and shows better performance than other state-of-the-art algorithms.
- BoW3D takes an average of 48 ms to recognize and correct the loops on KITTI 00.
Paper Content
I. introduction
- Fast and robust loop closing is essential for long-term SLAM
- Standard frame-to-frame point registration algorithms may fail due to large drift in pose estimation
- Place recognition is done by building a database from images or point clouds
- Distance-based association can be used for loop closing with drift in a certain range
- BoW is used for efficient image retrieval in visual SLAM
- Challenges in 3D LiDAR SLAM due to irregularity, sparsity and disorder of LiDAR point cloud
- Proposed loop closing method builds BoW for 3D LiDAR point clouds
- Hash table used as basic structure of database
- Accurate point-to-point LinK3D matching used to calculate 6-DoF loop pose
Feature extraction
- Feature Extraction, Odometry and Mapping of A-LOAM, and Loop Closing are the three modules of the system
- BoW3D has been embedded in the loop closing thread
- Experimental results show that the method can reduce drifts and improve accuracy of 3D LiDAR SLAM
- Place recognition and pose graph optimization are two steps of loop closing
- Different sensor modalities have been used for loop closing, including camera images and 3D LiDAR points
- BoW is suitable for camera SLAM due to its efficiency
- BoW compresses image information and builds a tree to speed up loop retrieval
- Most existing methods are too time-consuming or can’t provide 6-DoF pose estimation
III. background review
A. review of LinK3D features
- BoW3D is based on the LinK3D feature
- LinK3D consists of three parts: keypoint extraction, descriptor generation and feature matching
- LinK3D descriptor is represented by a 180-dimension vector
- LinK3D is lightweight and takes an average of 32 ms to extract features from the point cloud
- LinK3D can be used to achieve accurate point-to-point matching, which enables it to be applied to fast 3D registration
B. review of bag of words
- BoW is used to recognize revisited places by retrieving 2D features
- BoW creates a visual vocabulary as a tree structure from a training image dataset
- BoW converts extracted features of a new image into a low-dimensional vector
- Vector contains term frequency and inverse document frequency (tf-idf) score
- A higher tf-idf score indicates a word that is frequent in the image but rare in the training dataset
- Similarity between word and words in database is computed if score is high enough
- An inverted index is used to look up the corresponding images
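
The tf-idf weighting described above can be illustrated with a minimal sketch. It assumes the common definition (term frequency times the logarithm of the inverse document frequency); the function name `tf_idf` and the toy counts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(image_words, doc_freq, num_images):
    """Return {word: tf-idf weight} for one image.

    image_words: visual-word ids observed in the image
    doc_freq:    {word: number of training images containing the word}
    num_images:  total number of training images
    """
    counts = Counter(image_words)
    total = len(image_words)
    weights = {}
    for word, n in counts.items():
        tf = n / total                                # frequency within this image
        idf = math.log(num_images / doc_freq[word])   # rarity across the dataset
        weights[word] = tf * idf
    return weights


# Toy example: "b" is rare in the dataset, "c" appears in every training image.
weights = tf_idf(["a", "a", "b", "c"], {"a": 50, "b": 5, "c": 100}, 100)
```

In this toy case the rare word "b" receives the largest weight, while "c", which occurs in every training image, receives a weight of zero.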
IV. methodology
- Proposed loop closing system based on BoW3D
- System embedded in state-of-the-art A-LOAM
- System consists of three parts: extracting LinK3D features, BoW3D encoding, and loop closure detection
A. BoW3D algorithm
- BoW3D algorithm proposed
- LinK3D descriptor used, no further conversion needed
- Hash table used to build one-to-one mapping between words and places
- Retrieval algorithm used to retrieve words and count frequency of each place
- Inverse document frequency used to measure difference between number of places
- Loop correction used to provide constraint for pose graph optimization
- BoW3D data structure consists of words and places
- Cost function is used to compute the loop
- The rotation and translation of the loop transformation are calculated
- Update algorithm is proposed to add new words and places to the database
- Descriptors are selected based on distance to LiDAR centers
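
The word-to-place hash table, the retrieval step and the update step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a word is taken to be a (dimension index, non-zero value) pair of a descriptor, a place is a (frame id, descriptor id) pair, descriptor values are small integers, and the class name `BoW3DSketch` and parameter `freq_threshold` are hypothetical. The idf-like ratio and the distance-based descriptor selection are omitted.

```python
from collections import Counter, defaultdict


class BoW3DSketch:
    """Toy word -> place-set database (illustrative only)."""

    def __init__(self, freq_threshold=5):
        # Hash table: word (dim index, value) -> set of places (frame id, descriptor id)
        self.table = defaultdict(set)
        self.freq_threshold = freq_threshold

    @staticmethod
    def words(descriptor):
        # Assumption: every non-zero entry of a descriptor yields one word.
        return [(dim, val) for dim, val in enumerate(descriptor) if val != 0]

    def update(self, frame_id, descriptors):
        """Add the words of a new frame to the database."""
        for desc_id, desc in enumerate(descriptors):
            for word in self.words(desc):
                self.table[word].add((frame_id, desc_id))

    def retrieve(self, descriptors):
        """Return the frame id with the most word hits, or None if too few."""
        votes = Counter()
        for desc in descriptors:
            for word in self.words(desc):
                for frame_id, _ in self.table.get(word, ()):
                    votes[frame_id] += 1
        if not votes:
            return None
        frame_id, count = votes.most_common(1)[0]
        return frame_id if count >= self.freq_threshold else None


db = BoW3DSketch(freq_threshold=3)
db.update(0, [[0, 7, 0, 2], [5, 0, 0, 1]])
db.update(1, [[0, 3, 9, 0], [4, 0, 6, 0]])
match = db.retrieve([[0, 7, 0, 2], [5, 0, 8, 1]])   # shares four words with frame 0
no_match = db.retrieve([[1, 1, 1, 1]])              # shares only one word with frame 0
```

Because lookups are plain hash-table accesses, the cost of a query in this sketch depends on the number of words in the query frame and on the sizes of the matching place sets, not on the total number of stored frames.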
B. loop optimization
- Pose graph is built for loop optimization
- Pose graph consists of global poses and observation constraints
- Cost function is minimized to optimize the pose graph
- Points in local map are updated based on optimized global poses
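
To make the idea of minimizing a cost over poses and observation constraints concrete, here is a deliberately simplified sketch: poses are 1D positions, each constraint is a measured offset between two poses, and the cost is an unweighted sum of squared residuals solved with NumPy's least-squares routine. The real system optimizes 6-DoF poses; the function name `optimize_pose_graph_1d` and the numbers are assumptions made for illustration.

```python
import numpy as np


def optimize_pose_graph_1d(num_poses, constraints):
    """Least-squares poses from relative measurements.

    constraints: list of (i, j, z) meaning "x_j - x_i was measured as z".
    Pose 0 is anchored at 0 to remove the global offset ambiguity.
    """
    rows, rhs = [], []
    for i, j, z in constraints:
        row = np.zeros(num_poses)
        row[j] += 1.0
        row[i] -= 1.0
        rows.append(row)
        rhs.append(z)
    anchor = np.zeros(num_poses)
    anchor[0] = 1.0
    rows.append(anchor)
    rhs.append(0.0)
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return x


# Three odometry steps that each overestimate the motion, plus one loop constraint.
odometry = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1)]
loop = [(0, 3, 3.0)]
poses = optimize_pose_graph_1d(4, odometry + loop)
```

The loop constraint pulls the last pose back from the drifted odometry estimate of 3.3 towards the measured 3.0, and the correction is spread evenly over the three odometry edges.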
V. experiments
- Evaluated performance of algorithm
- Used KITTI dataset with Velodyne HDL-64E S2
- 11 sequences with ground truth poses
- Used Euclidean distance and time difference to determine true positive loop pairs
- Experiments performed on notebook with Intel Core i7 @2.2 GHz processor and 16 GB RAM
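
The ground-truth criterion based on Euclidean distance and time difference can be sketched as a simple predicate. The threshold values and the name `is_true_loop` are hypothetical placeholders; the summary does not state the values used in the paper.

```python
import math


def is_true_loop(pos_a, pos_b, t_a, t_b, max_dist=10.0, min_time_gap=30.0):
    """True if two poses are close in space but far apart in time.

    max_dist (metres) and min_time_gap (seconds) are illustrative values only.
    """
    close_in_space = math.dist(pos_a, pos_b) <= max_dist
    far_in_time = abs(t_a - t_b) >= min_time_gap
    return close_in_space and far_in_time


revisit = is_true_loop((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 0.0, 100.0)
neighbour = is_true_loop((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 0.0, 5.0)
```

The time condition keeps consecutive scans, which are always spatially close, from being counted as loops.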
A. place recognition performance
- Our method outperforms other state-of-the-art LiDAR loop closure detection and place recognition methods
- Our method can be used to correct the full 6-DoF loop pose
- Our method is based on the LinK3D descriptors, which are pose invariant
- Our system forms constraints based on accurate point-to-point matching results
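
Given point-to-point matches, a 6-DoF relative pose can be estimated by a least-squares rigid alignment. The sketch below uses the standard SVD-based (Kabsch) solution as a stand-in; the summary does not specify which solver or outlier rejection the authors use, and the function name `rigid_transform` and the synthetic data are assumptions.

```python
import numpy as np


def rigid_transform(src, dst):
    """Least-squares R, t such that dst ≈ R @ src + t for matched Nx3 points."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# Synthetic check: rotate 30 degrees about z, then translate.
rng = np.random.default_rng(0)
src = rng.normal(size=(10, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ R_true.T + t_true
R_est, t_est = rigid_transform(src, dst)
```

With noise-free matches the estimate reproduces the true rotation and translation; with real matches, wrong correspondences would need to be rejected before this step.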
B. performance on LiDAR-based SLAM
- Evaluated performance of loop closing system used in 3D LiDAR-based SLAM
- Verified accuracy of loop correction and whole trajectories using Euclidean distance, Ξ and RMSE
- Results showed loop closing system can effectively correct cumulative errors and reduce drifts of 3D LiDAR SLAM system
C. hyperparameters setup and robustness analyzation
- Performance of BoW3D measured by F1 score and average detection time
- As Th_r increases, runtime increases and F1 score remains the same
- Setting Th_f less than 5 increases F1 score but requires more time
- Setting Th_f larger than 8 reduces robustness of algorithm
- Optimal settings for robustness: Th_r = 4, Th_f = 5
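
For reference, the F1 score used as the quality metric is the harmonic mean of precision and recall. A minimal sketch, with hypothetical detection counts:

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


score = f1_score(tp=80, fp=20, fn=20)   # precision and recall are both 0.8
```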
D. system runtime
- Evaluated average runtime of each module in SLAM system after integrating loop closing
- Used KITTI 00 dataset with 4K+ LiDAR scans
- Set Th_r = 4, Th_f = 5, number of closer features as 5 when adding to database, 3 when retrieving from database
- Runtime of each module shown in Fig. 7
- Each module operates separately in different threads
- Runtime of the mapping thread and pose graph optimization (PGO) is more than 100 ms, but both can run online because they execute at low frequency
- BoW3D takes less than 100 ms to process one frame, ensuring real-time performance of the system
VI. conclusion
- Proposed a novel 3D-feature-based bag of words algorithm for place recognition
- Consists of three parts: place retrieval, loop correction and database update
- Hash table used as overall structure of database
- Achieves competitive results compared to state-of-the-art methods
- Does not require pre-training or GPU resources