Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Introducing 3DBiCar, a large-scale dataset of 3D biped cartoon characters
Introducing RaBit, a parametric model built upon 3DBiCar
Applications of 3DBiCar and RaBit include single-view reconstruction, sketch-based modeling, and 3D cartoon animation
Part-sensitive texture reasoner used to make local areas more detailed

Paper Content

Introduction

Rapid development of digitization leads to demand for high-quality 3D articulated characters
Creating 3D characters is labor-intensive and time-consuming
3D sensing devices make capturing 3D data from the real world convenient
Parametric models and deep learning techniques can infer accurate 3D digital humans from single-view images or sparse sketches
Introduction of 3DBiCar, a large-scale 3D biped cartoon character dataset with 1,500 high-quality 3D models
RaBit, a generative model for 3D biped cartoon character generation
BiCarNet, a baseline method for single-view reconstruction
Applications of sketch-based modeling and 3D character animation

Related work

3D character datasets can be categorized as real-captured and computer-designed
Real-captured datasets focus on human faces and bodies
FaceWarehouse and FaceScape collect 3D faces with high diversity
The CAESAR dataset is widely used for body shape modeling
Computer-designed datasets lack diversity and are unsatisfactory
3DCaricShop and SimpModeling address non-realistic characters, but only their heads
3DBiCar is a large 3D biped cartoon character dataset
Parametric shape modeling uses PCA and 3DMM
Parametric texture modeling uses deep neural networks
GANFIT, StylePeople, and GET3D use neural texture synthesis

Dataset

Digitizing realistic and articulated human characters has made progress, but creating visually plausible biped cartoon characters is still difficult.
3DBiCar is a large-scale full-body 3D biped character dataset.
3DBiCar contains 1,500 high-quality 3D models with diverse identities, shapes, and textural styles.
3DBiCar has a unified mesh topology and provides various forms of data for each character.

Parametric modeling

RaBit is proposed as the first parametric model of 3D biped cartoon characters
The model combines a linear blend model for shapes with a neural generator for textures
Parametric space decomposed into identity-related body parameter B, non-rigid pose-related parameter Θ, and texture-related parameter T

Shape modeling

Linear shape models are used to represent 3D models.
PCA has been used to model the human body and face.
Character shape is parameterized linearly: a mean shape plus a weighted sum of shape basis vectors, with the body parameter B as the weights.
PCA is used to learn the shape model of RaBit from 1,050 characters.
Eyeballs can be computed based on predefined landmarks.
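
The linear shape model above can be illustrated with a small NumPy sketch. It assumes every mesh shares one topology and is flattened to a vector before PCA. The function names (`fit_shape_model`, `encode_shape`, `decode_shape`) and the toy data sizes are hypothetical and do not come from the paper or its code release.

```python
import numpy as np

def fit_shape_model(meshes, n_components):
    """Fit a PCA shape basis to meshes of shape (N, V, 3) with shared topology."""
    n, v, _ = meshes.shape
    flat = meshes.reshape(n, v * 3)
    mean = flat.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(flat - mean, full_matrices=False)
    basis = vt[:n_components]
    # Standard deviation of the data along each kept direction.
    std = s[:n_components] / np.sqrt(max(n - 1, 1))
    return mean, basis, std

def encode_shape(mean, basis, mesh):
    """Project one mesh (V, 3) onto the basis to get body parameters B."""
    return basis @ (mesh.reshape(-1) - mean)

def decode_shape(mean, basis, body_params):
    """Mean shape plus a weighted sum of basis vectors, returned as (V, 3)."""
    return (mean + body_params @ basis).reshape(-1, 3)

# Toy stand-in for the training meshes (random data, not 3DBiCar).
rng = np.random.default_rng(0)
toy_meshes = rng.normal(size=(20, 50, 3))
mean, basis, std = fit_shape_model(toy_meshes, n_components=5)
body_params = encode_shape(mean, basis, toy_meshes[0])
recon = decode_shape(mean, basis, body_params)
```

Setting all body parameters to zero returns the mean shape, and keeping more components makes `recon` a closer approximation of the input mesh.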

Pose modeling

RaBit uses a vertex-based linear blend skinning technique.
Pose parameter Θ is a set of angles.
Pose function F_P maps each vertex from the rest pose to the posed mesh.
G_k(Θ, J) is the global transformation of joint k.
A(k) is the set of all ancestors of joint k.
J_j is the location of the j-th joint.
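
The pose function described above can be sketched with NumPy. This is a minimal sketch of SMPL-style linear blend skinning, assuming axis-angle joint rotations (the summary only says Θ is a set of angles) and a parent array ordered so that every parent precedes its children. The function names, the toy two-joint skeleton, and the skinning weights are hypothetical.

```python
import numpy as np

def rodrigues(axis_angle):
    """Convert an axis-angle vector (3,) to a rotation matrix (3, 3)."""
    angle = np.linalg.norm(axis_angle)
    if angle < 1e-8:
        return np.eye(3)
    x, y, z = axis_angle / angle
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def global_transforms(theta, joints, parents):
    """Global transform of every joint, chained along its ancestors.
    parents[k] is the parent index of joint k, or -1 for the root."""
    G = np.zeros((len(parents), 4, 4))
    for k, p in enumerate(parents):
        local = np.eye(4)
        local[:3, :3] = rodrigues(theta[k])
        # Translation is the joint offset from its parent (absolute for the root).
        local[:3, 3] = joints[k] if p < 0 else joints[k] - joints[p]
        G[k] = local if p < 0 else G[p] @ local
    return G

def linear_blend_skinning(vertices, weights, theta, joints, parents):
    """Pose rest vertices (V, 3) using skinning weights (V, K)."""
    G = global_transforms(theta, joints, parents)
    # Subtract the rest-pose joint location so that a zero pose is the identity.
    for k in range(len(parents)):
        G[k, :3, 3] -= G[k, :3, :3] @ joints[k]
    blended = np.einsum("vk,kij->vij", weights, G)
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
    return np.einsum("vij,vj->vi", blended, homo)[:, :3]

# Toy skeleton: root at the origin, one child joint one unit above it.
joints = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
parents = [-1, 0]
rest_vertices = np.array([[0.0, 2.0, 0.0]])
weights = np.array([[0.0, 1.0]])  # vertex fully bound to the child joint
theta = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, np.pi / 2]])  # bend the child 90 degrees about z
posed = linear_blend_skinning(rest_vertices, weights, theta, joints, parents)
```

In this toy case the vertex swings around the child joint, moving from (0, 2, 0) to (-1, 1, 0), while a zero pose leaves the rest mesh unchanged.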

Texture modeling

Traditional linear PCA can build a decent statistical shape model, but cannot represent high-frequency details in textures.
GAN-based architectures have shown the capability of generating high-fidelity images.
StyleGAN2-based techniques are used to generate UV texture maps with a coherent UV unfolding.
Neural texture generator translates a latent code to a texture map.
Textured mesh is generated by applying the texture map to the mesh model.
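
The last step, applying a texture map to the mesh, can be illustrated as a per-vertex UV lookup. The sketch below covers only that lookup and not the neural generator. It assumes nearest-neighbour sampling and a UV convention where v = 0 is the bottom row of the image. The function name `sample_texture` and the toy 2×2 texture are hypothetical.

```python
import numpy as np

def sample_texture(texture, uv):
    """Nearest-neighbour lookup of per-vertex colors.
    texture: (H, W, 3) image; uv: (V, 2) coordinates in [0, 1]."""
    h, w, _ = texture.shape
    cols = np.clip(np.round(uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    # Flip v because image rows grow downward (assumed convention).
    rows = np.clip(np.round((1.0 - uv[:, 1]) * (h - 1)).astype(int), 0, h - 1)
    return texture[rows, cols]

# Toy 2x2 texture: top row red and green, bottom row blue and white.
toy_texture = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
])
toy_uv = np.array([[0.0, 0.0], [1.0, 1.0]])
vertex_colors = sample_texture(toy_texture, toy_uv)
```

Because all characters share one UV unfolding, the same `uv` array can be reused for every generated texture map.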

Single-view reconstruction

Single-view reconstruction is a popular task for 3D content generation
BiCarNet is a baseline learning-based method for reconstructing 3D shape, pose, and texture from a single masked image of a cartoon character
A part-sensitive texture reasoner (PSR) is used to address the loss of detailed appearance in small areas
Five individual UV-mappings are designed for significant parts of the cartoon character
Fuser is used to address blending artifacts

Experiments

3DBiCar is split into a training set and a testing set
A large amount of synthetic paired data is generated
13,650 pairs are used for training
BiCarNet takes an image with foreground masked as input
BiCarNet can generate vivid 3D cartoon characters
HMR-like blocks combined with RaBit are used for shape and pose learning
Three methods are compared for shape reconstruction
Compare GAN-based texture generator with PCA-based inference
Ablative analysis on BiCarNet without Fuser and Part-sensitive Reasoner
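
The data split can be sketched in a few lines of Python. This sketch assumes a random split by model index, which the summary does not confirm, and uses the 1,050 / 450 split sizes reported for 3DBiCar. The function name `split_dataset` and the seed are hypothetical.

```python
import random

def split_dataset(ids, n_train, seed=0):
    """Shuffle the ids reproducibly and split them into train and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

model_ids = range(1500)  # one id per 3DBiCar model
train_ids, test_ids = split_dataset(model_ids, n_train=1050)

# 13,650 training pairs divided by the number of training models.
pairs_per_train_model = 13650 / len(train_ids)
```

The 13,650 training pairs equal exactly 13 × 1,050, which is consistent with about 13 pairs per training model, although the summary does not state how the synthetic pairs are distributed.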

Sketch-based modeling

Customizing 3D biped cartoon characters usually requires a lot of work with commercial tools.
Sketch-based modeling allows amateur users to customize 3D shapes in a simple and intuitive way.
12,000 T-pose models were generated by sampling shape vectors and using RaBit.
108,000 sketch-model pairs were created using suggestive contour.
ResNet-50 and MLPs were used to map input sketches to 100-dimensional shape parameters.
Output characters are animation-ready and can be used with other commercial tools.
Fig. 10 shows sketches created by users and corresponding models generated by the system.
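
The training-data generation for the sketch regressor can be outlined as follows. This is a minimal sketch that assumes shape vectors are drawn from a zero-mean Gaussian, optionally scaled per dimension. The sampling distribution, the function name `sample_shape_vectors`, and the seed are assumptions, and the rendering of sketches with suggestive contours is not reproduced here.

```python
import numpy as np

def sample_shape_vectors(n_models, dim, std=None, seed=0):
    """Draw random shape vectors; std optionally scales each dimension."""
    rng = np.random.default_rng(seed)
    scale = np.ones(dim) if std is None else np.asarray(std, dtype=float)
    return rng.normal(size=(n_models, dim)) * scale

# 12,000 models, each described by a 100-dimensional shape vector.
shape_vectors = sample_shape_vectors(12_000, 100)

# 108,000 sketch-model pairs over 12,000 models.
sketches_per_model = 108_000 // 12_000

# Index every (model, sketch variant) pair that a renderer would need to produce.
pair_index = [(m, s) for m in range(len(shape_vectors)) for s in range(sketches_per_model)]
```

The counts work out to nine sketches per model. The regressor is then trained to map each sketch back to the 100-dimensional vector of its model.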

3D character animation

Humans are extracted from video frames, and a temporal-aware encoder recovers the sequence of poses
Motion retargeting converts these poses into motion for the cartoon characters
Animation-ready characters generated by RaBit can be used for 3D animation

Conclusion

3DBiCar is the first large-scale 3D biped cartoon character dataset
It contains 1,500 textured and skinned models with a consistent mesh topology
RaBit is the first 3D full-body cartoon parametric model
BiCarNet is a baseline method for reconstructing 3D textured models from a single image with cartoon characters
Experimental results demonstrate the capability of 3DBiCar and RaBit as well as the effectiveness of BiCarNet
Two applications, i.e., sketch-based modeling and 3D character animation, demonstrate the usability and practicality of the dataset and parametric model
Four image styles are defined based on their sources: picture book, computer designed, hand drawn, and toy
Shape model is learned from 1,050 models of 3DBiCar using PCA
Pose modeling utilizes the consistent skeleton and skinning weight matrix defined in 3DBiCar
Texture generator follows the architecture of StyleGAN2
3DBiCar is split into a training set (1,050 image-model pairs) and a testing set (450 pairs)
Synthetic paired data is augmented with the help of RaBit
Eyeball is approximated as a sphere
Sketch-based modeling interface is implemented with the Qt framework
12,000 shape vectors are randomly sampled and fed to RaBit to generate 3D cartoon characters
Suggestive contour is applied to render the front-view sketches with different abstraction levels
ResNet-50 module and three MLPs are used as the encoder-decoder architecture
pSp-encoder is used to learn a 512-dimensional texture vector from the image
pSp is used as the basic building block to learn multiple local UV textures
pix2pixHD is used as the fusion module (Fuser)
