arxiv-summary: AI-summarized AI papers

Revisiting Rotation Averaging: Uncertainties and Robust Losses

Link to paper The full paper is available here. You can also find the paper on PapersWithCode here. Abstract The paper revisits the rotation averaging problem applied in global Structure-from-Motion pipelines. In current methods, the cost function is only weakly connected to the input data. The proposed method directly propagates uncertainty from point correspondences into the rotation averaging. A variant of the MAGSAC loss is integrated into the rotation averaging problem. Results are superior to baselines in terms of accuracy on large-scale public benchmarks....

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Link to paper The full paper is available here. You can also find the paper on PapersWithCode here. Abstract Monocular depth estimation is a challenging task that predicts pixel-wise depth from a single 2D image. DiffusionDepth is a new approach that reformulates monocular depth estimation as a denoising diffusion process. The model learns to reverse a process that diffuses its own refined depth into a random depth distribution. DiffusionDepth is superior for generating accurate and highly detailed depth maps....

Bayesian at heart: Towards autonomic outflow estimation via generative state-space modelling of heart rate dynamics

Link to paper The full paper is available here. You can also find the paper on PapersWithCode here. Abstract Recent research is exploring how cognitive processes are affected by the brain and body. Physiological features such as breathing rhythms, heart rate, and skin conductance are being analyzed. Heart rate dynamics are of particular interest as they provide insight into the autonomic nervous system. Extracting useful information from heartbeats is challenging due to noisy estimates....

X-Avatar: Expressive Human Avatars

Link to paper The full paper is available here. You can also find the paper on PapersWithCode here. Abstract X-Avatar is a novel avatar model that captures the full expressiveness of digital humans. X-Avatar can be learned from 3D scans or RGB-D data. X-Avatar uses a part-aware learned forward skinning module. X-Avatar uses part-aware sampling and initialization strategies. X-Avatar uses a texture network to capture the appearance of the avatar. X-Avatar outperforms strong baselines in both data domains....

Ewald-based Long-Range Message Passing for Molecular Graphs

Link to paper The full paper is available here. You can also find the paper on PapersWithCode here. Abstract Recent years have seen fast improvement in neural architectures that learn potential energy surfaces from molecular data. Message Passing Neural Networks (MPNNs) are a key driver of this success. MPNNs rely on a spatial distance limit on messages, which can impede the learning of long-range interactions. Ewald message passing is proposed to address this drawback; it limits interactions via a cutoff on frequency instead of distance....
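The contrast between a distance cutoff and a frequency cutoff can be illustrated with a toy example. The sketch below is not the paper's architecture: it assumes a 1D periodic cell, scalar atom features, and an unweighted sum over Fourier modes, and all function names are hypothetical. It only shows that a distant atom is invisible to a distance-limited sum but still influences a frequency-limited one.

```python
import cmath
import math


def distance_cutoff_message(i, positions, features, box, r_cut):
    """Sum features of atoms within r_cut of atom i (1D periodic cell)."""
    total = 0.0
    for j, (x_j, h_j) in enumerate(zip(positions, features)):
        if j == i:
            continue
        d = abs(positions[i] - x_j)
        d = min(d, box - d)  # minimum-image distance in a periodic cell
        if d < r_cut:
            total += h_j
    return total


def frequency_cutoff_message(i, positions, features, box, n_max):
    """Sum over Fourier modes with |n| <= n_max; every atom contributes.

    Note: this toy version also includes atom i's own feature.
    """
    total = 0.0
    for n in range(-n_max, n_max + 1):
        k = 2.0 * math.pi * n / box
        # Structure-factor-like sum over all atoms for this frequency.
        s_k = sum(h * cmath.exp(-1j * k * x) for x, h in zip(positions, features))
        total += (s_k * cmath.exp(1j * k * positions[i])).real
    return total / (2 * n_max + 1)


positions = [0.0, 1.0, 5.0]
box, r_cut, n_max = 10.0, 2.0, 2
near = [1.0, 1.0, 1.0]
far_changed = [1.0, 1.0, 3.0]  # only the distant atom's feature changes

local_a = distance_cutoff_message(0, positions, near, box, r_cut)
local_b = distance_cutoff_message(0, positions, far_changed, box, r_cut)
global_a = frequency_cutoff_message(0, positions, near, box, n_max)
global_b = frequency_cutoff_message(0, positions, far_changed, box, n_max)
```

Here `local_a` and `local_b` are identical because the changed atom lies outside `r_cut`, while `global_a` and `global_b` differ because each retained frequency sums over all atoms.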

Video-P2P: Video Editing with Cross-attention Control

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Presents Video-P2P, a framework for real-world video editing with cross-attention control
- Adapts an image generation diffusion model to complete various video editing tasks
- Introduces a novel decoupled-guidance strategy for attention control
- Enables various text-driven editing applications
- Works well on real-world videos for generating new characters while preserving original poses and scenes

Paper Content

Introduction
- Video creation and editing are key tasks
- Text-driven editing is a promising pipeline
- Editing local objects in a video is challenging
- This paper proposes a pipeline for video editing
- Text-driven image editing requires a model to generate target content
- Attention control is the most effective pipeline for detailed image editing
- Inverting images into latent features with a pre-trained diffusion model
- Controlling attention maps in the denoising process to edit the image
- Proposing a novel framework to show pre-trained image diffusion model can be adapted for video editing
- Using a structure on inversion and attention control for all frames
- Adopting a method to convert a T2I model into a T2S model
- Optimizing a shared unconditional embedding for all frames to align the denoising latent features with the diffusion latent features
- Proposing a decoupled-guidance strategy in attention control

Text-driven editing
- Generative models have been used for image editing
- Video editing with generative models has seen advances recently
- Generative models can be used for tasks such as stylization and customization
- Proposed method allows for local editing with a diffusion model pre-trained on images

Method
- V is a real video with n frames
- Prompt-to-Prompt setting introduces source prompt P and edited prompt P* to generate edited video V*
- Video-P2P framework proposed to achieve cross-attention control in video editing
- Shared unconditional embedding optimized for video inversion
- Different guidance used for source and edited prompts, with attention maps incorporated

Video inversion
- Constructed a T2S model with 1x3x3 pattern convolution kernels and temporal attention
- Replaced self-attentions with frame-attentions
- Model processes video pair-by-pair and computes n times to obtain prediction for every frame
- Fine-tuned query projection matrices and additional temporal attention to perform noise prediction
- Used DDIM inversion to generate latent features and shared unconditional embedding for all frames

Decoupled-guidance attention control
- Existing works require an inference pipeline with both reconstruction ability and editability to perform attention control on real images....
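The DDIM inversion mentioned in the summary can be sketched with the standard deterministic DDIM update applied to each frame's latent. This is a minimal illustration under loud assumptions: latents are single floats, `ABARS` is a made-up schedule of cumulative alphas, and `toy_eps` is a hypothetical noise predictor that ignores its input. A real predictor is a network that depends on the latent, in which case inversion followed by denoising only approximately reconstructs the input.

```python
import math


def ddim_step(x, eps, abar_from, abar_to):
    """One deterministic DDIM move between two noise levels."""
    x0_pred = (x - math.sqrt(1.0 - abar_from) * eps) / math.sqrt(abar_from)
    return math.sqrt(abar_to) * x0_pred + math.sqrt(1.0 - abar_to) * eps


def toy_eps(x, t):
    """Hypothetical noise predictor; it ignores x, unlike a real network."""
    return 0.3 + 0.1 * t


ABARS = [0.99, 0.9, 0.6, 0.3]  # assumed cumulative alphas, clean -> noisy


def invert(frames):
    """Run every frame latent from the clean end to the noisy end."""
    out = []
    for x in frames:
        for t in range(len(ABARS) - 1):
            x = ddim_step(x, toy_eps(x, t), ABARS[t], ABARS[t + 1])
        out.append(x)
    return out


def denoise(latents):
    """Walk the same schedule backwards, noisy -> clean."""
    out = []
    for x in latents:
        for t in reversed(range(len(ABARS) - 1)):
            x = ddim_step(x, toy_eps(x, t), ABARS[t + 1], ABARS[t])
        out.append(x)
    return out


frames = [0.2, -0.5, 1.0]  # stand-ins for per-frame latents
latents = invert(frames)
reconstructed = denoise(latents)
```

Because the toy predictor does not depend on the latent, `reconstructed` matches `frames` up to floating-point error; the shared unconditional embedding described in the summary is a way to reduce the reconstruction gap that appears with a real predictor.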

Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- LLMs have been developed for commercial applications
- Inference hyperparameters can affect the utility/cost of text generation
- EcoOptiGen is a framework for economical hyperparameter optimization and cost-based pruning
- Experiments with GPT-3.5 models verify the effectiveness of EcoOptiGen
- EcoOptiGen is implemented in the FLAML library

Paper Content

Introduction
- LLMs have demonstrated impressive capabilities in a range of generative tasks
- LLMs have been used to build powerful user experiences
- Research community has studied the effect of individual hyperparameters on inference performance
- Need to optimize hyperparameters collectively and systematically
- Cost is a concern for most application builders
- Proposed cost-based pruning strategy to improve optimization efficiency
- Evaluated on four datasets, found higher quality hyperparameter settings than default
- Pruning technique increases tuning performance significantly
- Holistic hyperparameter optimization can mitigate idiosyncrasies

Background

Text generation with LLMs
- LLM Text Generation takes an input prompt and generates one or more output responses
- Input prompt can include multiple examples to demonstrate desired responses
- Output can be used by applications in various ways
- Cost of LLM Text Generation is measured in number of tokens in input and output
- Goal is to maximize utility of generated text under inference budget constraint
- Hyperparameters affect cost and utility of generated text
- Interactions between hyperparameters can be complex

EcoOptiGen
- Notations and definitions introduced for EcoOptiGen: Tuning Data D, Utility Function U, Budget Constraints B, Search Space S
- Utility of multiple verifiable responses is defined as the best utility score in all responses
- Budget Constraints B is a tuple of two values: B....
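The two definitions above (cost as the number of input and output tokens, utility of several responses as the best score among them) can be sketched as follows. This is an illustration only, not the FLAML implementation: the whitespace tokenizer, the canned responses, the exact-match scorer, and the single-number token budget are all assumptions made for the example.

```python
def token_count(text):
    """Whitespace tokenization; a stand-in for a real tokenizer."""
    return len(text.split())


def generation_cost(prompt, responses):
    """Cost = tokens in the input prompt plus tokens in all outputs."""
    return token_count(prompt) + sum(token_count(r) for r in responses)


def best_response_utility(responses, score):
    """Utility of several responses = best score among them."""
    return max(score(r) for r in responses)


prompt = "What is 2 + 2 ?"
canned = ["5", "four", "4"]  # made-up model outputs, in sampling order


def exact_match(response):
    return 1.0 if response.strip() == "4" else 0.0


def evaluate(n):
    """Utility and cost when requesting n responses (hypothetical setting)."""
    responses = canned[:n]
    return best_response_utility(responses, exact_match), generation_cost(prompt, responses)


budget = 9  # hypothetical token budget for this single instance
results = {n: evaluate(n) for n in (1, 2, 3)}
feasible = {n: uc for n, uc in results.items() if uc[1] <= budget}
# Prefer higher utility, then lower cost, among settings within budget.
best_n = max(feasible, key=lambda n: (feasible[n][0], -feasible[n][1]))
```

In this toy setting, requesting more responses raises both the chance that one of them is correct and the token cost, which is the trade-off the budget constraint is meant to control.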

The Descriptive Complexity of Graph Neural Networks

Link to paper The full paper is available here. You can also find the paper on PapersWithCode here. Abstract Graph Neural Networks (GNNs) can be used to compute graph queries. GNNs can use arbitrary real weights and a wide class of activation functions. GNNs with random initialization and global readout can compute the same queries as bounded depth Boolean circuits with threshold gates. Queries computable by a single GNN with piecewise linear activations and rational weights are definable in GFO+C without built-in relations....
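As a concrete picture of a GNN with rational weights and a piecewise linear activation computing a query, consider the sketch below. It assumes a single sum-aggregation layer with scalar features and a clipped linear activation; the function names and the example query ("does this node have at least two neighbours?") are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def clipped_linear(x):
    """Piecewise linear activation: clamp x to the interval [0, 1]."""
    return min(Fraction(1), max(Fraction(0), x))


def gnn_layer(adjacency, features, w_self, w_neigh, bias):
    """One sum-aggregation layer with scalar rational weights."""
    new_features = {}
    for v, neighbours in adjacency.items():
        aggregated = sum((features[u] for u in neighbours), Fraction(0))
        new_features[v] = clipped_linear(
            w_self * features[v] + w_neigh * aggregated + bias
        )
    return new_features


# Path graph a - b - c: only b has two neighbours.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {v: Fraction(1) for v in adjacency}

# With these weights the output is 1 exactly for nodes of degree >= 2.
has_two_neighbours = gnn_layer(
    adjacency, features, Fraction(0), Fraction(1), Fraction(-1)
)
```

Using `Fraction` keeps every weight and intermediate value rational, so the layer's output is exact rather than a floating-point approximation.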

Magnushammer: A Transformer-based Approach to Premise Selection

Link to paper The full paper is available here. You can also find the paper on PapersWithCode here. Abstract Premise selection is a fundamental problem of automated theorem proving. Traditional symbolic systems can be outperformed by a neural transformer-based approach called Magnushammer. Magnushammer achieved a proof rate of 59.5% on the PISA benchmark, compared to 38.3% for Sledgehammer. Combining Magnushammer with a neural formal prover based on a language model improved the proof rate from 57....

The Lie-Group Bayesian Learning Rule

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Bayesian Learning Rule provides a framework for generic algorithm design
- Difficult to use due to parameterization, gradients, and updates
- Extension based on Lie-groups simplifies difficulties
- New algorithm for deep learning with desirable attributes
- Exploits Lie-group structures for new algorithm design

Paper Content

Introduction
- Bayesian Learning Rule (BLR) provides a general framework to derive algorithms from optimization, deep learning, and graphical models
- BLR uses natural-gradient descent to find approximations of the generalized posterior distribution
- BLR has been used to design new algorithms for uncertainty estimation in deep learning
- BLR can be difficult to use for three reasons
- Extension of BLR based on Lie-groups proposed to address difficulties
- Lie-group BLR uses group’s exponential map to update candidate distributions
- Gradient computations simplified by reparameterization trick
- Update naturally stays within the manifold
- Use cases for algorithm design in deep learning using additive, multiplicative, and affine groups
- New algorithm with multiplicative group gives rise to networks with nodes that are forced to be either excitatory or inhibitory

The Bayesian learning rule
- BLR aims to find a posterior candidate in a space of candidate distributions
- Balancing the two terms requires an exploration-exploitation tradeoff
- Problem can be rewritten as an inference problem
- When the loss corresponds to the log-joint distribution of a Bayesian model, the solution coincides with the posterior distribution
- BLR is a natural-gradient descent algorithm
- BLR can recover many existing algorithms from a variety of fields
- Design of new algorithms is possible
- BLR can be difficult to use in many cases
- Computing the gradient with respect to µ is not always straightforward
- λ obtained by BLR may not always be valid natural parameters

The Lie-group Bayesian learning rule
- Proposing a Lie-group based extension of the BLR
- Describing Lie groups and their actions
- Parameterization and exponential map
- Deriving the new learning rule

Lie groups and their actions
- A Lie group is a set with a binary operation that is associative and has an identity element and inverses, and that is also a smooth manifold
- A smooth manifold is locally diffeomorphic to Euclidean space
- Examples of Lie groups include (R, +) and (R>0, ×)
- Cartesian product of two Lie groups is also a Lie group
- Action of a Lie group on parameter space is a smooth map
- Example of action is (A, b) • θ = Aθ + b

Lie group parametrization
- G acts on a space of measures
- Pushforwards are used to define another action on the space of measures
- A base distribution q0 is given with positive density
- The space of candidate distributions Q is the orbit of q0 under the action of G
- Every q in Q can be parametrized by group elements g
- Examples of exponential families (EFs) that can be parameterized this way include Gaussian and Bernoulli distributions
- This parameterization is useful for using non-EF distributions such as the Laplace distribution

The exponential map and Lie group updates
- Goal is to find a group element g* that minimizes a given energy function
- Exponential map is used to move in the direction of fastest descent
- Exponential map is a smooth function that maps the tangent space at the identity to the group
- For diagonal matrices, exponential map is given by Taylor series
- Update of the form g ← g exp(−αX) is used to move in the direction of X with a step-size of α

Simplifying gradients through reparametrization
- We will use the group’s exponential map to derive a new learning rule....
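The update g ← g exp(−αX) can be illustrated on the multiplicative group (R>0, ×). The sketch below is a deterministic toy, not the paper's learning rule: it assumes a scalar energy f(g) = (g − 3)^2 chosen for the example and takes X to be the derivative of f along the curve g·exp(t) at t = 0. It is meant only to show that the iterate stays positive by construction.

```python
import math


def energy_grad(g):
    """Derivative of the toy energy f(g) = (g - 3)^2."""
    return 2.0 * (g - 3.0)


def multiplicative_update(g, alpha):
    """One step g <- g * exp(-alpha * X) on the group (R>0, x)."""
    # Direction in the tangent space at the identity:
    # d/dt f(g * exp(t)) at t = 0, which equals g * f'(g).
    x = g * energy_grad(g)
    return g * math.exp(-alpha * x)


g = 1.0
trajectory = [g]
for _ in range(50):
    g = multiplicative_update(g, alpha=0.05)
    trajectory.append(g)
```

Every entry of `trajectory` is a product of positive numbers, so no projection or clipping is needed to keep the parameter in R>0, which is the sense in which the update "naturally stays within the manifold".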