Link to paper The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract Presents I$^2$-SDF, a method for intrinsic indoor scene reconstruction and editing Uses differentiable Monte Carlo raytracing on neural signed distance fields Jointly recovers shapes, incident radiance and materials from multi-view images Introduces a novel bubble loss and error-guided adaptive sampling scheme Decomposes neural radiance field into spatially-varying material of the scene Demonstrates superior quality on indoor scene reconstruction, novel view synthesis, and scene editing Paper Content Introduction Reconstructing 3D scenes from multi-view images is a fundamental task in computer science Neural Radiance Field (NeRF) uses MLPs to approximate the underlying geometry and appearance of a 3D scene Novel view synthesis is insufficient for scene editing applications Inverse rendering or intrinsic decomposition reconstructs and decomposes the scene into shape, shading and surface reflectance Complex indoor scenes are difficult to reconstruct I 2 -SDF is a new method to decompose a 3D scene into its underlying shape, material, and incident radiance components Bubble loss and error-guided adaptive sampling improve reconstruction quality on small objects I 2 -SDF enables photorealistic indoor scene relighting and editing High-quality synthetic indoor scene multi-view dataset provided Related work Neural implicit scene representations are used to represent 3D geometry and radiance information Neural radiance field (NeRF) uses a single MLP to encode a scene as a continuous volumetric field Follow-up works accelerate reconstruction speed using voxels, hashgrids or deep image features Neural fields can also be applied to represent 3D geometric functions Difficulties in handling shape-radiance ambiguity on texture-less surfaces Traditional multi-view stereo methods struggle with texture-less regions Learning-based MVS methods divided into two categories: depth-based and TSDF-based Neural implicit SDF methods used to tackle texture-less regions Inverse rendering attempts to reconstruct and factorize the scene with geometry, material and lighting Neural implicit representations used to estimate BRDF and lighting from image collections Recent methods mainly focus on single object reconstruction and do not handle spatially-varying lighting conditions Overview Goal is to decompose shape, radiance and material of indoor scene according to multi-view input images Implicit representations used to model geometry, radiance and material Pipeline consists of neural SDF field, neural radiance field, neural material fields and emission field Two-stage training scheme used to avoid training ambiguities Implicit neural surface representation and volume rendering Represent scene geometry as an implicit signed distance function (SDF) SDF maps 3D point to closest distance to surface Parameterize SDF and scene appearance as MLP Use differentiable volume rendering to learn scene implicit representation from images Color, depth, and normal of surface can be accumulated Intrinsics decomposition Indoor scenes contain objects of different scales and visibility levels Existing indoor reconstruction methods often fail to recognize and reconstruct thin or suspended objects Neural networks tend to converge faster on low-frequency information than high-frequency information Gradients for small objects can vanish due to the nature of neural networks To address this problem, “bubbles” are inserted to create gradients for SDF near small or thin objects Bubble loss is used to minimize the absolute SDF value of surface points Importance sampling algorithm is used to filter out large planar areas and preserve small-object areas Geometry loss is used to approximate the geometry field Depth and normal priors are used to handle shape-radiance ambiguity Smoothness loss is used to encourage smooth surface reconstruction Emitter semantic field Radiance field F c is trained from LDR images, causing under-estimation of light intensity from emitters....