Abstract
- Introducing 3DBiCar, a large-scale dataset of 3D biped cartoon characters
- Introducing RaBit, a parametric model built upon 3DBiCar
- Applications of 3DBiCar and RaBit include single-view reconstruction, sketch-based modeling, and 3D cartoon animation
- A part-sensitive texture reasoner is used to recover finer detail in local areas
Paper Content
Introduction
- Rapid development of digitization leads to demand for high-quality 3D articulated characters
- Creating 3D characters is labor-intensive and time-consuming
- 3D sensing devices make capturing 3D data from the real world convenient
- Parametric models and deep learning techniques can infer accurate 3D digital humans from single-view images or sparse sketches
- Introduction of 3DBiCar, a large-scale 3D biped cartoon character dataset with 1,500 high-quality 3D models
- RaBit, a generative model for 3D biped cartoon character generation
- BiCarNet, a baseline method for single-view reconstruction
- Applications of sketch-based modeling and 3D character animation
Related work
- 3D character datasets can be categorized as real-captured and computer-designed
- Real-captured datasets focus on human faces and bodies
- FaceWarehouse and FaceScape collect 3D faces with high diversity
- The CAESAR dataset is widely used for body shape modeling
- Computer-designed datasets lack diversity and are unsatisfactory
- 3DCaricShop and SimpModeling address the heads of non-realistic characters
- 3DBiCar is a large 3D biped cartoon character dataset
- Parametric shape modeling uses PCA and 3DMM
- Parametric texture modeling uses deep neural networks
- GANFIT, StylePeople, and GET3D use neural texture synthesis
Dataset
- Digitizing realistic and articulated human characters has made progress, but creating visually plausible biped cartoon characters is still difficult.
- 3DBiCar is a large-scale full-body 3D biped character dataset.
- 3DBiCar contains 1,500 high-quality 3D models with diverse identities, shapes, and textural styles.
- 3DBiCar has a unified mesh topology and provides various forms of data for each character.
Parametric modeling
- RaBit is proposed as the first parametric model of 3D biped cartoon characters
- The model contains a linear blend model for shapes and a neural generator for textures
- The parametric space is decomposed into an identity-related body parameter B, a non-rigid pose-related parameter Θ, and a texture-related parameter T
Shape modeling
- Linear shape models are used to represent 3D models.
- PCA has been used to model the human body and face.
- Character shape is parameterized linearly, as a mean shape plus a weighted sum of basis shapes.
- PCA is used to learn the shape model of RaBit from 1,050 characters.
- Eyeballs can be computed based on predefined landmarks.
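The linear shape model described above can be sketched with plain PCA. This is a minimal illustration, not the authors' code: the function names, the toy data, and the number of components are assumptions, and the meshes are assumed to share one topology so that each can be flattened to a vector of the same length.

```python
import numpy as np

def fit_pca_shape_model(meshes, n_components):
    """Fit a linear shape model to meshes of shape (N, V, 3) with shared topology."""
    n = meshes.shape[0]
    X = meshes.reshape(n, -1)                 # one flattened mesh per row
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]
    std = s[:n_components] / np.sqrt(max(n - 1, 1))  # per-component std. dev.
    return mean, basis, std

def decode_shape(mean, basis, beta):
    """Mean shape plus a weighted sum of basis shapes, returned as (V, 3)."""
    return (mean + beta @ basis).reshape(-1, 3)

def encode_shape(mean, basis, mesh):
    """Project a (V, 3) mesh onto the basis to get its shape coefficients."""
    return (mesh.reshape(-1) - mean) @ basis.T

# Toy usage with random data standing in for registered characters (hypothetical).
rng = np.random.default_rng(0)
toy_meshes = rng.normal(size=(20, 50, 3))
mean, basis, std = fit_pca_shape_model(toy_meshes, n_components=10)
beta = encode_shape(mean, basis, toy_meshes[0])
approx = decode_shape(mean, basis, beta)      # low-dimensional approximation
```

Setting all coefficients to zero returns the mean shape, and each coefficient moves the mesh along one principal direction.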
Pose modeling
- RaBit uses a vertex-based linear blend skinning technique.
- Pose parameter Θ is a set of angles.
- Pose function F_P transforms vertices from the rest pose to the posed mesh.
- G_k(Θ, J) is the global transformation of joint k.
- A(k) is the set of all ancestors of joint k.
- J_j is the location of the j-th joint.
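A minimal sketch of vertex-based linear blend skinning follows, to make the roles of the global joint transforms, the ancestor chain, and the joint locations concrete. It follows the standard formulation, not necessarily RaBit's exact one. Assumptions: each joint's angles are given in axis-angle form, `parents` lists every parent before its children with -1 for the root, and all function names are hypothetical.

```python
import numpy as np

def rodrigues(axis_angle):
    """Convert an axis-angle vector to a 3x3 rotation matrix."""
    angle = np.linalg.norm(axis_angle)
    if angle < 1e-12:
        return np.eye(3)
    k = axis_angle / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def global_transforms(theta, joints, parents):
    """Compose local joint transforms along each joint's ancestor chain."""
    n = len(parents)
    G = np.zeros((n, 4, 4))
    for k in range(n):
        local = np.eye(4)
        local[:3, :3] = rodrigues(theta[k])
        if parents[k] < 0:
            local[:3, 3] = joints[k]
            G[k] = local
        else:
            local[:3, 3] = joints[k] - joints[parents[k]]
            G[k] = G[parents[k]] @ local
    # Express each transform relative to the rest-pose joint location,
    # so that a zero pose leaves the mesh unchanged.
    for k in range(n):
        to_joint_frame = np.eye(4)
        to_joint_frame[:3, 3] = -joints[k]
        G[k] = G[k] @ to_joint_frame
    return G

def linear_blend_skinning(vertices, weights, theta, joints, parents):
    """Pose (V, 3) rest vertices using (V, K) skinning weights."""
    G = global_transforms(theta, joints, parents)
    blended = np.einsum("vk,kij->vij", weights, G)   # per-vertex 4x4 transform
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
    return np.einsum("vij,vj->vi", blended, homo)[:, :3]
```

Each vertex is moved by a weighted average of the joint transforms, with the weights taken from the skinning weight matrix.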
Texture modeling
- Traditional linear PCA can build a decent statistical shape model, but cannot represent high-frequency details in textures.
- GAN-based architectures have shown the capability of generating high-fidelity images.
- StyleGAN2-based techniques are used to generate UV texture maps with a coherent UV unfolding.
- Neural texture generator translates a latent code to a texture map.
- Textured mesh is generated by applying the texture map to the mesh model.
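Applying a UV texture map to a mesh amounts to looking up a color for each surface point through its UV coordinates. The sketch below does this per vertex with nearest-neighbor sampling. It is only an illustration: the function name is hypothetical, and the UV convention (u to the right, v upward, both in [0, 1]) is an assumption that may differ from the one used in 3DBiCar.

```python
import numpy as np

def sample_vertex_colors(texture, uv):
    """Look up one color per vertex from an (H, W, 3) texture.

    uv has shape (V, 2) with values in [0, 1]; v = 0 is assumed to be the
    bottom row of the image, so the row index is flipped.
    """
    h, w = texture.shape[:2]
    cols = np.clip(np.rint(uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.rint((1.0 - uv[:, 1]) * (h - 1)).astype(int), 0, h - 1)
    return texture[rows, cols]
```

Because the meshes share one topology and a coherent UV unfolding, the same UV coordinates can be reused for every generated texture map.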
Single-view reconstruction
- Single-view reconstruction is a popular task for 3D content generation
- BiCarNet is a baseline learning-based method for reconstructing 3D shape, pose, and texture from a single masked image of cartoon characters
- A part-sensitive texture reasoner (PSR) is used to address the loss of detailed appearance in small areas
- Five individual UV-mappings are designed for significant parts of the cartoon character
- Fuser is used to address blending artifacts
Experiments
- Split 3DBiCar into training and testing set
- Generate a large number of synthetic paired data
- 13,650 pairs for training
- BiCarNet takes an image with foreground masked as input
- BiCarNet can generate vivid 3D cartoon characters
- HMR-like blocks and RaBit for shape and pose learning
- Compare 3 methods for shape reconstruction
- Compare GAN-based texture generator with PCA-based inference
- Ablative analysis on BiCarNet without Fuser and Part-sensitive Reasoner
Sketch-based modeling
- Customizing 3D biped cartoon characters usually requires a lot of work with commercial tools.
- Sketch-based modeling allows amateur users to customize 3D shapes in a simple and intuitive way.
- 12,000 T-pose models were generated by sampling shape vectors and using RaBit.
- 108,000 sketch-model pairs were created using suggestive contour.
- ResNet-50 and MLPs were used to map input sketches to 100-dimensional shape parameters.
- Output characters are animation-ready and can be used with other commercial tools.
- Fig. 10 shows sketches created by users and corresponding models generated by the system.
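The training data for the sketch-to-shape regressor can be sketched as follows. The summary states only that shape vectors were sampled; drawing each coefficient from a zero-mean Gaussian scaled by a per-component standard deviation is an assumption, as are the function name and the placeholder `std` values.

```python
import numpy as np

def sample_shape_vectors(n_samples, std, seed=0):
    """Draw shape vectors with independent Gaussian coefficients (assumed prior)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_samples, std.shape[0])) * std

# Placeholder standard deviations for a 100-dimensional shape space (hypothetical).
std = np.linspace(1.0, 0.1, 100)
shape_vectors = sample_shape_vectors(12_000, std)

# 108,000 sketch-model pairs over 12,000 models gives the sketches per model.
sketches_per_model = 108_000 // 12_000
```

Each sampled vector would be decoded by the shape model into a T-pose mesh, rendered as sketches, and paired with the vector as the regression target.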
3D character animation
- Extract the human from video frames and use a temporal-aware encoder to recover the sequence of poses
- Use motion retargeting to convert poses to motion of cartoon characters
- Animation-ready characters generated by RaBit can be used for 3D animation
Conclusion
- 3DBiCar is the first large-scale 3D biped cartoon character dataset
- It contains 1,500 textured and skinned models with a consistent mesh topology
- RaBit is the first 3D full-body cartoon parametric model
- BiCarNet is a baseline method for reconstructing 3D textured models from a single image with cartoon characters
- Experimental results demonstrate the capability of 3DBiCar and RaBit as well as the effectiveness of BiCarNet
- Two applications, i.e., sketch-based modeling and 3D character animation, demonstrate the usability and practicality of the dataset and parametric model
- 4 image styles are defined based on their different sources: picture book, computer designed, hand drawn, and toy
- Shape model is learned from 1,050 models of 3DBiCar using PCA
- Pose modeling utilizes the consistent skeleton and skinning weight matrix defined in 3DBiCar
- Texture generator follows the architecture of StyleGAN2
- 3DBiCar is split into a training set (1,050 image-model pairs) and a testing set (450 pairs)
- Synthetic paired data is augmented with the help of RaBit
- Eyeball is approximated as a sphere
- Sketch-based modeling interface is implemented with the Qt framework
- 12,000 shape vectors are randomly sampled and fed to RaBit to generate 3D cartoon characters
- Suggestive contour is applied to render the front-view sketches with different abstraction levels
- ResNet-50 module and three MLPs are used as the encoder-decoder architecture
- pSp-encoder is used to learn a 512-dimensional texture vector from the image
- pSp is used as the basic building block to learn multiple local UV textures
- pix2pixHD is used as the fusion module (Fuser)