arxiv-summary: AI-summarized AI papers

DreamFusion: Text-to-3D using 2D Diffusion

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Recent breakthroughs in text-to-image synthesis have been driven by diffusion models
- 3D synthesis requires large-scale datasets and efficient architectures, which don’t exist
- Text-to-3D synthesis is done using a pretrained 2D text-to-image diffusion model
- Loss based on probability density distillation enables use of 2D diffusion model as prior
- DeepDream-like procedure optimizes a randomly-initialized 3D model via gradient descent

Paper Content

Introduction
- Generative image models now support high-fidelity, diverse and controllable image synthesis
- Quality improvements come from large aligned image-text datasets and scalable generative model architectures
- Diffusion models are effective at learning high-quality image generators
- Applying diffusion models to other modalities requires large amounts of modality-specific training data
- This work develops techniques to transfer pretrained 2D image-text diffusion models to 3D object synthesis
- 3D generative models can be trained on explicit representations of structure
- GANs can learn controllable 3D generators from photographs of a single object category
- Neural Radiance Fields can be used for neural inverse rendering
- Many 3D generative approaches have found success incorporating NeRF-like models
- This work uses pretrained 2D image-text models for 3D synthesis
- Score Distillation Sampling (SDS) enables sampling via optimization in differentiable image parameterizations
- DreamFusion generates high-fidelity coherent 3D objects and scenes for user-provided text prompts

Diffusion models and score distillation sampling
- Diffusion models are generative models that learn to transform a sample from a noise distribution to a data distribution....
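
To make the idea of "sampling via optimization" more concrete, here is a minimal toy sketch of an SDS-style update. It is not DreamFusion's pipeline, and everything in it is an assumption for illustration: the "renderer" is the identity (the parameters are the image itself), the denoiser `toy_eps_hat` is a hypothetical oracle that believes the clean image is a fixed `TARGET`, and the weighting w(t) is set to 1. The update uses the commonly cited SDS form, in which the gradient is roughly w(t) * (predicted noise - injected noise) * dx/dtheta.

```python
import math
import random

TARGET = [0.2, -0.5, 0.9]  # hypothetical "data" the toy denoiser believes in


def toy_eps_hat(x_t, alpha_bar):
    """Hypothetical oracle noise predictor: assumes the clean image is TARGET."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - s * tgt) / n for xt, tgt in zip(x_t, TARGET)]


def sds_step(theta, lr, rng):
    """One SDS-style update with an identity renderer (x = theta, dx/dtheta = 1)."""
    alpha_bar = rng.uniform(0.1, 0.9)  # random noise level for this step
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in theta]  # injected noise
    x_t = [s * th + n * e for th, e in zip(theta, eps)]  # noised "render"
    eps_hat = toy_eps_hat(x_t, alpha_bar)
    grad = [eh - e for eh, e in zip(eps_hat, eps)]  # w(t) = 1 (assumption)
    return [th - lr * g for th, g in zip(theta, grad)]


def run_sds(steps=500, lr=0.05, seed=0):
    """Start from random parameters and apply SDS-style steps by gradient descent."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in TARGET]  # randomly initialized
    for _ in range(steps):
        theta = sds_step(theta, lr, rng)
    return theta
```

In this toy setting the randomly initialized parameters drift toward what the frozen denoiser considers likely, which is the role the pretrained 2D model plays as a prior in the summary above.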

Personalizing Text-to-Image Generation via Aesthetic Gradients

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Proposes a method to personalize a CLIP-conditioned diffusion model
- Guides the generative process towards custom aesthetics
- Validated with qualitative and quantitative experiments
- Uses the recent Stable Diffusion model and aesthetically-filtered datasets
- Code released on GitHub

Paper Content
- arXiv:2209.12330v1 [cs.CV] 25 Sep 2022
- Aim to provide user personalization to diffusion models
- Focus on learning custom objects from few images
- Alternative approach for personalization of text-to-image diffusion models
- Goal is to guide generative process towards custom aesthetics defined by user
- User chooses textual prompt to guide generation
- Represent aesthetic preferences of user with average of visual embeddings of images
- Measure agreement between CLIP representation of prompt and user preferences
- Perform gradient descent with respect to CLIP text encoder weights
- Only modify weights of CLIP text encoder
- Benefits: agnostic to the diffusion model, computationally cheap, and the user only needs to store one aesthetic embedding per set of images

Conclusion
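
The steps listed above (average the image embeddings, measure agreement with the prompt embedding, then update only the text encoder weights) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's code: the "text encoder" is a single hypothetical linear layer `W`, the prompt is a fixed feature vector `p`, agreement is cosine similarity, and the update is plain gradient ascent on that similarity (equivalently, descent on its negative).

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def aesthetic_embedding(image_embeddings):
    """Average of unit-normalized image embeddings, re-normalized."""
    unit = [normalize(v) for v in image_embeddings]
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    return normalize(mean)


def encode(W, p):
    """Toy stand-in for a text encoder: one linear layer (assumption)."""
    return [sum(w * x for w, x in zip(row, p)) for row in W]


def similarity(W, p, e):
    """Cosine similarity between the encoded prompt and the aesthetic embedding."""
    return sum(a * b for a, b in zip(normalize(encode(W, p)), e))


def personalize(W, p, e, lr=0.1, steps=20):
    """Gradient ascent on cosine similarity with respect to W only."""
    W = [row[:] for row in W]  # leave the caller's weights untouched
    for _ in range(steps):
        c = encode(W, p)
        norm = math.sqrt(sum(x * x for x in c))
        u = [x / norm for x in c]
        s = sum(a * b for a, b in zip(u, e))
        # Gradient of the cosine similarity with respect to the encoder output c.
        dc = [(ei - s * ui) / norm for ei, ui in zip(e, u)]
        for i, row in enumerate(W):
            for j in range(len(row)):
                row[j] += lr * dc[i] * p[j]  # chain rule: d sim / d W[i][j]
    return W
```

Only `W` changes here, which mirrors the point that the diffusion model itself is left alone and a single stored embedding per image set is enough to rerun the personalization.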

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Learned optimizers are neural networks that can accelerate machine learning model training
- Blackbox learned optimizers often struggle with stability and generalization when applied to tasks unlike those in their meta-training set
- Tools from dynamical systems are used to investigate the inductive biases and stability properties of optimization algorithms
- Simple modifications to a learned optimizer’s architecture and meta-training procedure can lead to improved stability and inductive bias
- The resulting learned optimizer outperforms the current state of the art and is capable of generalizing to tasks far different from those it was meta-trained on

Paper Content

Introduction
- Algorithms for stochastic non-convex optimization are important for neural network training
- Choice of optimization algorithm and hyperparameters is critical for model performance and training stability
- Few formal rules for choosing optimizers and hyperparameters
- Learned optimizers have been proposed
- Training methodology for learned optimizers is gradient-based
- Learned optimizers have reduced performance and stability when applied in different circumstances
- Learned optimizer performance is often highly dependent on random seed
- Use dynamical systems theory to characterize stability of parameter dynamics
- Propose changes to architecture and training of learned optimizers to improve stability and inductive bias
- Demonstrate improved stability and performance of resulting learned optimizer

Related work
- Learned optimization has seen a surge of interest due to success of deep learning methods
- Meta-learning algorithms have been successful in few-shot learning
- Early approaches to few-shot learning used blackbox models
- Later approaches used algorithmic inductive biases such as gradient descent, metric learning, convex optimization, Bayesian inference, changepoint detection
- Adaptive optimization algorithms developed to improve stability of learning algorithms
- Line of work focused on meta-learning a neural network to choose hyperparameters
- Blackbox optimizers developed to exploit expressivity of neural networks
- Curriculum learning used to stabilize learning
- Truncated zeroth order optimization and reinforcement learning used to address chaotic behavior
- Fixed momentum operators used to create stable-by-design hidden states

Problem statement
- Problem of training a neural network by optimizing its parameters
- Loss function acts on parameters and data
- Learned optimizer is defined by a parametric update function
- Goal is to minimize meta-loss
- Examining optimizer performance in noisy quadratic setting
- Loss at each timestep is randomly sampled
- Update of the form $\phi_{t+1} = \phi_t - (g_t + P \nabla_t)$
- Nominal terms shift the region of stability
- Stability is necessary for achieving optimizers that reduce the loss over long training horizons....
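
The claim that nominal terms shift the region of stability can be illustrated with a scalar toy problem. The sketch below is an assumption-heavy simplification, not the paper's analysis: the loss is the noise-free quadratic 0.5 * h * phi^2 (the summary describes a noisy setting), the learned part is a single scalar gain `p` standing in for P, and the nominal term g_t is assumed to be a plain gradient step `alpha * grad`. Under those assumptions the iteration is linear and converges exactly when |1 - (alpha + p) * h| < 1.

```python
def simulate(p, alpha, h=1.0, phi0=1.0, steps=200):
    """Iterate phi <- phi - (alpha * grad + p * grad) on the loss 0.5 * h * phi**2."""
    phi = phi0
    for _ in range(steps):
        grad = h * phi  # exact gradient; no noise in this toy version
        phi = phi - (alpha * grad + p * grad)
    return phi


def is_stable(p, alpha, h=1.0):
    """Linear stability test: the per-step multiplier must lie strictly inside (-1, 1)."""
    return abs(1.0 - (alpha + p) * h) < 1.0


if __name__ == "__main__":
    # The same slightly negative learned gain, with and without a nominal term.
    for alpha in (0.0, 0.1):
        print(alpha, is_stable(-0.05, alpha), simulate(-0.05, alpha))
```

With `alpha = 0` the stable range for `p` is (0, 2/h); with a nominal step `alpha > 0` it becomes (-alpha, 2/h - alpha), so a learned gain that would diverge on its own can still yield a contracting update.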

GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Industries are moving towards modeling massive 3D virtual worlds
- Need for content creation tools that can scale in terms of quantity, quality, and diversity of 3D content
- Aim to train performant 3D generative models that synthesize textured meshes
- Prior works lack geometric details, are limited in mesh topology, don’t support textures, or use neural renderers
- Introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures
- GET3D is able to generate high-quality 3D textured meshes

Paper Content

Introduction
- 3D content is important for many industries
- Manual creation of 3D assets is time-consuming and requires technical and artistic skills
- Creating many 3D models is difficult
- Generative 3D networks can produce high-quality and diverse 3D assets
- Requirements for 3D generative models: detailed geometry, arbitrary topology, textured mesh, 2D image supervision
- Prior work has focused on subsets of the requirements
- GET3D is a novel approach that fulfills all requirements
- GET3D can generate high-quality geometric and texture details
- GET3D can be adapted to other tasks, such as material and lighting effects and text-guided 3D shape generation

Related work
- 3D generative models have been developed to generate photorealistic images
- 3D generative models focus mainly on generating geometry and disregard appearance
- GET3D is able to generate diverse shapes with arbitrary topology, high quality geometry, and texture
- 3D-aware generative image synthesis has been developed to tackle the problem of 3D-aware image synthesis
- GET3D directly outputs textured 3D meshes that can be readily used in standard graphics engines

Method
- GET3D framework synthesizes textured 3D shapes
- Generation process is split into two parts: geometry branch and texture branch
- Training uses efficient differentiable rasterizer to render textured mesh into 2D images
- Model is differentiable, allowing for adversarial training from images
- Generator and rendering/loss functions introduced in Sec 3....

Efficient Few-Shot Learning Without Prompts

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Recent few-shot methods have achieved impressive results in label-scarce settings. These methods are difficult to employ due to high variability from manually crafted prompts and require billion-parameter language models. SetFit is an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SetFit works by fine-tuning a pretrained ST on a small number of text pairs and using the resulting model to generate rich text embeddings....
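
As a rough illustration of what "a small number of text pairs" can look like, the sketch below builds contrastive pairs from a handful of labeled examples: two texts with the same label form a positive pair, two texts with different labels form a negative pair. The function name `make_pairs` and the 1.0 / 0.0 targets are assumptions made for this example, and it enumerates every pair for simplicity, whereas a real setup would typically sample a subset.

```python
from itertools import combinations


def make_pairs(examples):
    """Build (text_a, text_b, target) triples from (text, label) examples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    """
    return [
        (a, b, 1.0 if label_a == label_b else 0.0)
        for (a, label_a), (b, label_b) in combinations(examples, 2)
    ]


if __name__ == "__main__":
    few_shot = [
        ("great movie", "pos"),
        ("loved it", "pos"),
        ("terrible plot", "neg"),
        ("waste of time", "neg"),
    ]
    for pair in make_pairs(few_shot):
        print(pair)
```

Because the number of pairs grows quadratically with the number of labeled examples, even a few examples per class give the embedding model a usable amount of training signal without any prompt.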

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Learning powerful representations in bird's-eye view (BEV) for perception tasks is gaining attention from industry and academia. Conventional approaches for autonomous driving algorithms use front or perspective view. BEV perception has advantages such as representing scenes intuitively and being fusion-friendly. Core problems for BEV perception include reconstructing 3D information, acquiring ground truth annotations, formulating pipelines, and adapting algorithms....

A Survey on Generative Diffusion Model

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Deep learning has potential for generation tasks due to its latent representation
- Generative models can generate observations randomly
- Diffusion Model is a rising class of generative models with powerful generating ability
- Diffusion Model has drawbacks such as slow generation process, single data types, low likelihood, and inability for dimension reduction
- Improved techniques for existing problems in the diffusion-based model field include speed-up improvement, data structure diversification, likelihood optimization, and dimension reduction
- Applications with diffusion models include computer vision, sequence modeling, audio, and AI for science

Paper Content

Introduction
- Deep generative models have potential to create patterns humans cannot distinguish
- Focus on diffusion-based generative models
- Diffusion models do not require aligning posterior distributions, dealing with intractable partition functions, training additional discriminators, or imposing network constraints
- Diffusion models have been used in computer vision, natural language processing, and graph analysis
- Lack of systematic taxonomy and analysis of research progress on diffusion models
- Diffusion models provide tractable probabilistic parameterization, stable training procedure, and unified loss function design
- Diffusion models have been used in computer vision, sequence modeling, audio processing, and AI for science
- Diffusion models have the inherent drawback of requiring many sampling steps and long sampling time
- Works aspire to accelerate diffusion process and improve sampling quality
- Improved diffusion algorithms are classified into four categories: speed-up improvement, data structure diversification, likelihood optimization, and dimension reduction
- Application of diffusion models to computer vision, natural language processing, bioinformatics, and speech processing
- Domain-specialized problem formulation, related datasets, evaluation metrics, and downstream tasks, along with sets of benchmarks
- Limitations of models and possible further-proof directions

Problem statement

Notions and definitions

State
- States are data distributions that describe diffusion models....

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Text-to-image models have enabled high-quality and diverse synthesis of images from a given text prompt. A new approach has been developed to “personalize” text-to-image diffusion models. The approach uses a few images of a subject to fine-tune a pretrained text-to-image model. The approach enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions....

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Investigated scaling behaviors of red teaming across 3 model sizes and 4 model types
- Released dataset of 38,961 red team attacks for others to analyze and learn from
- Analyzed data and found a variety of harmful outputs, ranging from offensive language to subtly unethical outputs

Paper Content

Introduction
- Large language models can have harmful behaviors
- Examples of harmful behaviors include reinforcing social biases, generating offensive/toxic outputs, leaking personal info, aiding in disinformation campaigns, generating extremist texts, and spreading falsehoods
- As AI systems improve, the scope of possible harms is likely to grow
- Strategies have been developed to address some of these harms
- Red teaming is a tool to address harm, using manual or automated methods to probe a language model for harmful outputs
- Paper describes early efforts to implement manual red teaming
- Investigated scaling behaviors for red teaming across 3 model sizes and 4 model types
- Released dataset of 38,961 red team attacks
- Described instructions, processes, and statistical methodologies for red teaming
- Proposed policy interventions for how to develop shared norms, practices, and technical standards for red teaming

Related work
- We use the same models as in our previous work
- We run additional experiments to determine the influence of model size on susceptibility to red team attacks
- We analyze the content of the attacks to understand the types of harms uncovered by red teaming
- We provide more detail on our red team methods and release the data
- We focus on reinforcement learning from human feedback as our most promising safety intervention

Red team task
- Developed an interface for red team members to have open-ended conversations with an AI assistant
- Provided a brief list of example conversation topics
- Asked participants to enter a description of how they intend to red team the model
- Reviewed literature and conducted interviews to incorporate best practices into task instructions and interface
- Warned participants of sensitive content
- Asked participants to select topics within their own risk tolerance
- Asked participants to select the more harmful of two model-generated responses
- Used dataset of pairs of model responses to train a harmlessness preference model
- Asked participants to rate how successful they were at making the AI assistant say something bad
- Assigned red team members to models at random

Models
- We derive dialogue models from a general language model and a helpful and harmless preference model....

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Standard diffusion models involve an image transform and an image restoration operator. A family of generative models can be constructed by varying the choice of image degradation. Diffusion models can be generalized to create generative models even when using deterministic degradations. Code is available at a given URL.

Paper Content

Introduction
- Diffusion models are powerful tools for generative modeling
- Diffusion models are based on random noise removal
- Diffusion models are understood as a random walk around the image density function
- Variational inference with a Gaussian prior is used to derive the loss for the denoising network
- Examining the need for Gaussian noise or any randomness for diffusion models to work
- Considering models built around arbitrary image transformations
- Generative behavior emerges when a sequence of updates is applied at test time
- Cold diffusions require no Gaussian noise or any randomness during training or testing

Background
- Generative models exist for natural language and images
- GANs have been used for image synthesis
- Diffusion models have become competitive for some applications
- Noise is used in training and sampling pipelines
- Noise is thought to expand the support of the low-dimensional training distribution
- Noise is also thought to act as data augmentation
- Iterative neural models have been used for inverse problems
- Diffusion models have been applied to inverse problems
- Noise is not a necessity in diffusion models
- Feature space similarity metrics have been proposed to measure how closely generative models approximate the real training data

Generalized diffusion
- Diffusion models have two components: an image degradation operator and a trained restoration operator....
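
The two-component view (a degradation operator plus a restoration operator, applied as a sequence of updates at test time) can be sketched with a fully deterministic toy. All names and choices here are assumptions for illustration: `degrade` linearly fades an image toward a constant `fill` value, and `restore` is an analytic inverse standing in for what would be a trained network. The loop alternates "restore an estimate of the clean image" with "re-degrade it to the next lower severity", which is one simple way such a sampler can be organized.

```python
def degrade(x0, t, T, fill=0.5):
    """Deterministic degradation: fade x0 toward a constant as t goes from 0 to T."""
    w = t / T
    return [(1.0 - w) * v + w * fill for v in x0]


def restore(x_t, t, T, fill=0.5):
    """Stand-in for a trained restoration network (assumption): analytic inverse, valid for t < T."""
    w = t / T
    return [(v - w * fill) / (1.0 - w) for v in x_t]


def sample(x_t, t, T):
    """Alternate restoration and re-degradation, stepping severity from t down to 0."""
    for s in range(t, 0, -1):
        x0_hat = restore(x_t, s, T)      # estimate the clean image
        x_t = degrade(x0_hat, s - 1, T)  # re-degrade to severity s - 1
    return x_t


if __name__ == "__main__":
    clean = [0.1, 0.9, 0.4]
    faded = degrade(clean, 9, 10)
    print(sample(faded, 9, 10))
```

No random noise is drawn anywhere in this loop, which is the point of the toy: the generative procedure is driven entirely by the pairing of a degradation with its learned (here, hand-written) inverse.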