arxiv-summary: AI-summarized AI papers

Towards Democratizing Joint-Embedding Self-Supervised Learning

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here. Abstract: Joint-embedding self-supervised learning (JE-SSL) has seen rapid development in recent years thanks to its promise of leveraging large unlabeled data. The development of JE-SSL methods has been driven by the search for ever-higher classification accuracies and by the use of computational resources. This has led to numerous preconceived ideas that carried over across methods. This work debunks these ideas to unleash the full potential of JE-SSL....

Mixture of Soft Prompts for Controllable Data Generation

Abstract: Large language models (LLMs) can generate fluent text when the output follows natural language patterns, but they struggle when the output is confined to a limited ontology. Mixture of Soft Prompts (MSP) is a parameter-efficient procedure for generating data in a controlled manner. MSP produces diverse and natural text while preserving label semantics. MSP achieves state-of-the-art results on three benchmarks....

Chemically Transferable Generative Backmapping of Coarse-Grained Proteins

Abstract: Coarse-graining (CG) is a method used to accelerate molecular simulations of protein dynamics. Backmapping is the reverse operation: restoring the atomistic details lost in the CG representation. Machine learning (ML) has been used to produce accurate and efficient CG simulations of proteins, but fast and reliable backmapping remains a challenge. Rule-based methods produce poor all-atom geometries that need computationally costly refinement....
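To make the coarse-graining step in the summary above concrete, here is a minimal sketch of one common mapping, in which each CG bead is placed at the center of mass of a group of atoms. The function name `coarse_grain`, the toy coordinates, and the grouping are all assumptions for illustration; the excerpt does not say which CG representation the paper uses.

```python
def coarse_grain(coords, masses, groups):
    """Map atoms to beads: each bead sits at the center of mass of its atom group.

    coords: list of (x, y, z) atom positions
    masses: list of atom masses, same order as coords
    groups: list of lists of atom indices, one list per bead (hypothetical mapping)
    """
    beads = []
    for group in groups:
        total_mass = sum(masses[i] for i in group)
        beads.append(tuple(
            sum(masses[i] * coords[i][axis] for i in group) / total_mass
            for axis in range(3)
        ))
    return beads


# Toy example: three atoms collapsed into two beads.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
masses = [1.0, 1.0, 2.0]
beads = coarse_grain(coords, masses, groups=[[0, 1], [2]])
```

The mapping discards information (three atom positions become two bead positions), which is why backmapping has to reconstruct plausible atomistic detail rather than simply invert the map.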

Dropout Reduces Underfitting

Abstract: Dropout is a regularizer for preventing overfitting in neural networks. Dropout can also mitigate underfitting when used at the start of training. Dropout reduces the directional variance of gradients across mini-batches. Early dropout (dropout used only during the initial phase of training) can improve performance in underfitting models. Late dropout (dropout disabled in the early iterations and activated only later in training) can regularize overfitting models.

Paper Content

Introduction: AlexNet's "ImageNet moment" in 2012 launched a new era in deep learning. Dropout was invented in 2012 and has since become widely adopted to reduce overfitting in neural networks. Deep learning is evolving quickly, and dropout has stayed relevant. The drop rate used with dropout has generally been decreasing over the years. Dropout can be used to tackle underfitting. Dropout can reduce gradient variance and allow the model to update in more consistent directions. Early and late dropout can improve results compared to no dropout and standard dropout.

Revisiting overfitting vs....
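The early and late dropout schedules described in the Dropout Reduces Underfitting summary can be sketched as a function that returns the drop rate for a given training step. This is only an illustration under stated assumptions: the name `drop_rate`, the default rate `p=0.1`, and the switch point `switch_frac=0.2` are hypothetical choices, not values taken from the paper.

```python
def drop_rate(step, total_steps, p=0.1, mode="early", switch_frac=0.2):
    """Return the dropout probability to use at a given training step.

    mode="early":    dropout only during the initial phase, then off
    mode="late":     no dropout during the initial phase, then on
    mode="standard": dropout throughout training
    switch_frac is the (assumed) fraction of training where the switch happens.
    """
    switch_step = int(switch_frac * total_steps)
    if mode == "early":
        return p if step < switch_step else 0.0
    if mode == "late":
        return 0.0 if step < switch_step else p
    if mode == "standard":
        return p
    raise ValueError(f"unknown mode: {mode!r}")


# Drop rates at the start and in the middle of a 100-step run.
early_start = drop_rate(0, 100, mode="early")
early_mid = drop_rate(50, 100, mode="early")
late_start = drop_rate(0, 100, mode="late")
late_mid = drop_rate(50, 100, mode="late")
```

In a training loop, the returned value would be written into the model's dropout layers before each step; how that is done depends on the framework in use.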

Understanding plasticity in neural networks

Abstract: Neural networks need plasticity to be adaptable and robust. The mechanisms behind plasticity loss are not well understood. This paper looks into plasticity loss to understand it better. Plasticity loss is connected to changes in the loss landscape. Saturated units and divergent gradient norms are not the cause of plasticity loss. Layer normalization can help preserve plasticity....

Auxiliary Functions as Koopman Observables: Data-Driven Polynomial Optimization for Dynamical Systems

Abstract: Presents a data-driven method for dynamical system analysis that does not require explicit model discovery. The method is implemented as a semidefinite program that can be solved numerically. The method is agnostic to whether the data is generated through a deterministic or stochastic process. Rigorous convergence results justify the applicability of the method.

Paper Content

Introduction: The Koopman operator is a linear description of nonlinear systems. The Koopman operator can be approximated through extended dynamic mode decomposition (EDMD). EDMD has convergence guarantees. EDMD can be used to provide system-level information from dynamic data. Auxiliary functions can be used to prove statements about dynamical systems. EDMD can be used to approximate auxiliary functions from data. The method can be applied to a broad class of deterministic or stochastic dynamical processes.

A class of dynamical systems: Introduces a general class of dynamical systems. Includes deterministic and stochastic differential equations or maps. Considers stochastic processes. Deterministic systems can be viewed as stochastic ones.

The general case: Let X_t denote the state of a stochastic process at time t....
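As a rough illustration of how EDMD approximates the Koopman operator from data, the sketch below fits a 2x2 matrix K by least squares so that psi(y) is approximately K psi(x) over snapshot pairs (x, y). It shows only the standard EDMD regression step, not the paper's semidefinite program. The dictionary `psi(x) = (x, x^2)`, the toy map y = 0.5 x, and the function names are assumptions chosen for illustration.

```python
import random


def psi(x):
    # Hypothetical dictionary of observables: x and x^2.
    return (x, x * x)


def edmd(xs, ys):
    """Least-squares fit of K such that psi(y) ~ K psi(x) over all snapshot pairs."""
    # G = sum psi(x) psi(x)^T,  A = sum psi(y) psi(x)^T
    g = [[0.0, 0.0], [0.0, 0.0]]
    a = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        px, py = psi(x), psi(y)
        for i in range(2):
            for j in range(2):
                g[i][j] += px[i] * px[j]
                a[i][j] += py[i] * px[j]
    # Explicit inverse of the 2x2 Gram matrix G.
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # Normal-equations solution: K = A G^{-1}.
    return [[sum(a[i][k] * g_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Snapshot pairs from the toy deterministic map y = 0.5 * x.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
ys = [0.5 * x for x in xs]
K = edmd(xs, ys)
```

For this toy map the dictionary is closed under the dynamics, since psi(0.5 x) = (0.5 x, 0.25 x^2), so the fitted K should be close to the diagonal matrix with entries 0.5 and 0.25. With a richer dictionary or stochastic data, the same regression yields only a finite-dimensional approximation.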

Consistency Models

Abstract: Diffusion models have made breakthroughs in image, audio, and video generation. However, diffusion models sample slowly, which limits their potential for real-time applications. Consistency models are proposed as a new family of generative models that achieve high sample quality without adversarial training. Consistency models support fast one-step generation and few-step sampling. Consistency models support zero-shot data editing without explicit training....

WiCE: Real-World Entailment for Claims in Wikipedia

Abstract: Models for textual entailment have been applied to settings like fact-checking and question answering. We propose WiCE, a new dataset for verifying claims in text. WiCE is built on real-world claims and evidence from Wikipedia. Annotations are over sub-sentence units of the hypothesis, decomposed automatically by GPT-3. Real claims in WiCE involve challenging verification problems....

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

Abstract: Introduce the Universal Speech Model (USM) for Automatic Speech Recognition (ASR) across 100+ languages. Pre-train the encoder on a large unlabeled multilingual dataset of 12 million hours and fine-tune on a smaller labeled dataset. Use multilingual pre-training with random-projection quantization and speech-text modality matching. Achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks....

X&Fuse: Fusing Visual Information in Text-to-Image Generation

Abstract: Introduces X&Fuse, a general approach for conditioning on visual information when generating images from text. Demonstrates the potential of X&Fuse in three different text-to-image generation scenarios. Retrieve&Fuse yields significant improvements on the MS-COCO benchmark, achieving a state-of-the-art FID score of 6.65 in zero-shot settings. Crop&Fuse outperforms the textual inversion method while being more than 100x faster....