arxiv-summary: AI-summarized AI papers

BoW3D: Bag of Words for Real-Time Loop Closing in 3D LiDAR SLAM

Abstract: Loop closing is an important part of SLAM for autonomous mobile systems. BoW features can be used for loop searching and 6-DoF loop correction. BoW3D is a novel method for real-time loop closing in 3D LiDAR SLAM. BoW3D is efficient, pose-invariant and can be used for accurate point-to-point matching. BoW3D is tested on public datasets and shows better performance than other state-of-the-art algorithms....

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

Abstract: Large language models require significant GPU memory for inference. A procedure is developed to halve the memory needed for inference while retaining full-precision performance. A 175B-parameter 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without performance degradation. A two-part quantization procedure is developed to cope with emergent features in transformer language models....
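To make the 16-bit to 8-bit memory saving concrete, here is a minimal sketch of generic absmax Int8 quantization in plain Python. It is not the paper's two-part procedure, whose details are not given in this summary; the function names `quantize_absmax` and `dequantize` and the sample weights are illustrative assumptions.

```python
def quantize_absmax(values):
    """Map floats to int8-range codes using one shared absmax scale.

    Generic textbook scheme, assumed here for illustration only.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        # All zeros: any scale works, pick 1.0 to avoid dividing by zero.
        return [0] * len(values), 1.0
    scale = max_abs / 127.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 2.0]  # hypothetical sample values
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)
print(codes, restored)
```

Each code fits in one byte instead of two, which is where the halving comes from. The rounding error per value is bounded by half the scale, so a single large outlier inflates the error for every other value sharing that scale.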

TotalSegmentator: robust segmentation of 104 anatomical structures in CT images

Abstract: Focuses on automatic segmentation of multiple anatomical structures in CT images. Problems with existing algorithms: difficult to use, don’t generalize, can only segment one structure. A new dataset and segmentation toolkit solve these problems.
Paper Content
Introduction: Need for computer-based evaluation methods for radiological images. Segmentation of major anatomical structures. Work on segmenting specific anatomical structures (e....

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Abstract: Text-to-image models allow users to create images through natural language. A new approach is presented to allow creative freedom with 3-5 images of a user-provided concept. The approach uses “words” in the embedding space of a frozen text-to-image model to represent the concept. A single word embedding is sufficient for capturing unique and varied concepts....

Neural Density-Distance Fields

Abstract: Neural fields have been used for 3D vision tasks. Several methods have been proposed to estimate distance or density fields using neural fields. Neural Radiance Field (NeRF) does not provide density gradient in most empty regions. Neural Implicit Surface (NeuS) has limitations in objects’ surface shapes. This paper proposes Neural Density-Distance Field (NeDDF) to reciprocally constrain the distance and density fields. NeDDF enables explicit conversion from distance field to density field. Experiments show NeDDF can achieve high localization performance.
Paper Content
Introduction: Representing 3D shapes using coordinate-based neural networks. Neural Radiance Fields (NeRF) have shown impressive quality for tasks such as novel view-synthesis. Proposed Neural Density-Distance Field (NeDDF) achieves robust localization with distance fields while providing object reconstruction quality comparable to NeRF. Two main types of 3D shape representation in neural fields: density field and distance field. Distance field provides gradient over a wide range even after optimization converges. NeDDF has a network that inputs a position and outputs the distance and its gradient, and a converter that explicitly calculates the density. Three contributions: extending the distance field, recovering corresponding density, and implementation to alleviate instability of distance gradient.
Neural fields: Traditional way of representing volumes is to discretize density or distance into voxels. Memory-efficient representations such as octree or hash table have been proposed. Geometric deep learning methods can handle irregular non-grid structures. Neural fields can model output dimensions without increasing model capacity. Modeling using gradient information has been proposed.
Density field: Density field outputs volume density for 3D position. Used with color field to enable volume rendering. Low density value can describe semi-transparent objects. Can model specular reflections. NeRF has limitation of known camera pose and static scene. Many NeRF-based methods proposed to address this. Blank areas with density value of 0 have uncertain gradient directions. NeDDF provides consistent distance field while retaining expressiveness of density field. Can improve registration performance from rough initial camera poses.
Distance field: Distance field takes 3D position as input and outputs distance to nearest boundary. Widely used in fusion and registration because it provides stable surfaces and normal vectors. Provides residuals and gradient directions for fast-fitting of two shapes. KinectFusion and DynamicFusion use SDF for localization and shape integration. DeepSDF, SAL, UDF, IDR, UNISURF, VolSDF, NeuS use neural fields to handle distance fields. This study extends distance field to correspond to various density distributions from depth values.
Method: Distance and density fields are considered. Distance field is redefined to interpret arbitrary density fields. Conversion formula is introduced to obtain density of independent points from distance and gradient of distance value.
Distance field from density field: Distance field in boundary surfaces describes the distance to the nearest surface for a given location....
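As a concrete picture of a distance field that takes a 3D position and returns the distance to the nearest boundary together with a gradient direction, here is a minimal analytic sketch for a sphere. It is a classic signed distance function, not the NeDDF network or its redefined distance field; the function name `sphere_distance` and its default parameters are assumptions made for illustration.

```python
import math


def sphere_distance(position, center=(0.0, 0.0, 0.0), radius=1.0):
    """Return (signed distance, gradient) for a sphere boundary.

    The distance is positive outside the sphere and negative inside.
    The gradient is the unit vector pointing away from the center.
    """
    offset = [p - c for p, c in zip(position, center)]
    norm = math.sqrt(sum(o * o for o in offset))
    distance = norm - radius
    if norm == 0.0:
        # The gradient is undefined exactly at the center; return zeros.
        return distance, [0.0, 0.0, 0.0]
    gradient = [o / norm for o in offset]
    return distance, gradient


print(sphere_distance((3.0, 0.0, 0.0)))
print(sphere_distance((0.0, 0.5, 0.0)))
```

The gradient stays well defined far from the surface, which is the property the summary credits for making distance fields useful in registration. A density field that is zero in empty space offers no such direction there.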

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

Abstract: Recent progress in vision Transformers has been successful in various tasks. A convolution-based framework can be used to implement the key ingredients of vision Transformers. A new operation called Recursive Gated Convolution ($\textit{g}^\textit{n}$Conv) is highly flexible and customizable. HorNet is a new family of generic vision backbones based on $\textit{g}^\textit{n}$Conv. HorNet outperforms Swin Transformers and ConvNeXt....

Patchwork++: Fast and Robust Ground Segmentation Solving Partial Under-Segmentation Using 3D Point Cloud

Abstract: Ground segmentation is an essential task for 3D perception using 3D LiDAR sensors. Several ground segmentation methods have been proposed, but they have some limitations. Patchwork++ is a robust ground segmentation method that addresses these limitations. Patchwork++ uses adaptive ground likelihood estimation, temporal ground revert, region-wise vertical plane fitting, and reflected noise removal....

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

Abstract: YOLOv7 surpasses all known object detectors in both speed and accuracy. YOLOv7-E6 outperforms other detectors in speed and accuracy. YOLOv7 is trained on MS COCO dataset from scratch without using any other datasets or pre-trained weights.
Paper Content
Introduction: Real-time object detection is important in computer vision. Computing devices used for real-time object detection are mobile CPUs, GPUs, and NPUs. Real-time object detector proposed in paper supports mobile GPU and GPU devices from edge to cloud. Edge devices focus on speeding up operations such as convolution, depth-wise convolution, or MLP. Real-time object detectors for CPU are based on MobileNet, ShuffleNet, or GhostNet. Real-time object detectors for GPU are based on ResNet, DarkNet, or DLA. Proposed methods focus on optimization of training process. Model re-parameterization and dynamic label assignment are important topics in network training and object detection. Proposed methods address new issues discovered in training of object detector. Proposed methods reduce parameters and computation of state-of-the-art real-time object detector.
Model re-parameterization: Common practices for model-level reparameterization involve training multiple models and averaging their weights....
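The model-level practice mentioned last, training multiple models and averaging their weights, can be sketched in a few lines of plain Python. This illustrates only that generic idea, not YOLOv7's own re-parameterization; the function name `average_weights` and the dictionary-of-lists weight layout are assumptions.

```python
def average_weights(models):
    """Average several models' weights parameter by parameter.

    Each model is a dict mapping a parameter name to a flat list of
    floats. All models are assumed to share the same names and shapes.
    """
    count = len(models)
    averaged = {}
    for name in models[0]:
        # Group the i-th value of this parameter across all models.
        columns = zip(*(model[name] for model in models))
        averaged[name] = [sum(column) / count for column in columns]
    return averaged


# Two hypothetical models with one parameter each.
model_a = {"w": [1.0, 3.0]}
model_b = {"w": [3.0, 5.0]}
print(average_weights([model_a, model_b]))
```

The averaged model has the same size and inference cost as any single input model, which is why this kind of merging adds nothing at inference time.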

Feature Refinement to Improve High Resolution Image Inpainting

Abstract: Problem of degradation in inpainting quality of neural networks at high resolutions. Receptive field remains static when resolution increases. Downscaling image prior to inpainting produces coherent structure but lacks detail. Optimize intermediate feature maps of a network to improve inpainting results and establish new state-of-the-art.
Paper Content
Introduction: Image inpainting is the task of filling missing pixels or regions in an image....

Towards Robust Blind Face Restoration with Codebook Lookup Transformer

Abstract: Blind face restoration is a difficult problem that requires additional help to improve the mapping from degraded inputs to desired outputs. CodeFormer is a Transformer-based prediction network that models the global composition and context of low-quality faces for code prediction. CodeFormer has a controllable feature transformation module that allows a flexible trade-off between fidelity and quality....