arxiv-summary: AI-summarized AI papers

Crawling the Internal Knowledge-Base of Language Models

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Language models contain a significant body of factual knowledge. There is currently no mechanism for representing this knowledge explicitly. We propose a procedure for extracting a knowledge graph from a language model. The procedure is composed of sub-tasks, each with its own designed prompts. Evaluation shows high precision (82-92%) and a reasonable number of facts.

Paper Content

Introduction: Modern language models are trained on vast amounts of text that captures human knowledge. Language models can be viewed as knowledge-bases. Representing the knowledge in a language model as an explicit knowledge graph is the challenge addressed in this paper. A knowledge graph is a graph of entities and relations between them. The goal is to uncover the knowledge-base of a given language model. The approach decomposes the problem into multiple sub-tasks. The approach is evaluated with GPT-3, leading to high-precision graphs. The approach can generate facts outside the schema of Wikidata. Contributions are formulating the problem, presenting a prompt-based approach, and evaluating the approach with GPT-3.

Crawling KGs via prompting: (1) generate relations for an entity; (2) find objects for each relation; (3) improve recall with paraphrasing; (4) pool results to construct the final graph.

Relation generation: Our task is to generate a set of relations for a given subject entity....
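The four crawling steps summarized above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_lm` is a hypothetical stand-in that returns canned answers instead of calling a real model, and the prompt strings and `PARAPHRASES` table are invented, not the prompts designed in the paper.

```python
from collections import defaultdict


def toy_lm(prompt):
    """Hypothetical stand-in for a language model call; returns canned answers."""
    canned = {
        "List relations for: Alan Turing": ["born in", "field"],
        "Alan Turing | born in | ?": ["London"],
        "Where was Alan Turing born?": ["London", "Maida Vale"],
        "Alan Turing | field | ?": ["computer science"],
    }
    return canned.get(prompt, [])


# Hypothetical paraphrase templates per relation (step 3).
PARAPHRASES = {"born in": ["Where was {s} born?"]}


def crawl_entity(subject, lm=toy_lm):
    # Step 1: generate relations for the subject entity.
    relations = lm(f"List relations for: {subject}")
    objects = defaultdict(set)
    for rel in relations:
        # Step 2: ask for objects of each relation.
        prompts = [f"{subject} | {rel} | ?"]
        # Step 3: add paraphrased prompts to improve recall.
        prompts += [t.format(s=subject) for t in PARAPHRASES.get(rel, [])]
        for prompt in prompts:
            objects[rel].update(lm(prompt))
    # Step 4: pool the answers into (subject, relation, object) triples.
    return sorted((subject, rel, obj) for rel, objs in objects.items() for obj in objs)


triples = crawl_entity("Alan Turing")
```

In this toy run the paraphrased prompt contributes one extra object for "born in" that the first prompt missed, which is the recall effect the pooling step is meant to capture. A full crawl would repeat the procedure on newly discovered entities.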

Equivariant Architectures for Learning in Deep Weight Spaces

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Designing machine learning architectures that process neural networks in their raw weight-matrix form is a new research direction. This design is challenging due to the unique symmetry structure of deep weight spaces. If successful, these architectures could be used for a range of tasks, such as adapting a pre-trained network to a new domain....

SingSong: Generating musical accompaniments from singing

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Presents SingSong, a system that generates instrumental music to accompany input vocals. Builds on recent developments in musical source separation and audio generation. Applies a state-of-the-art source separation algorithm to a large corpus of music audio. Adapts AudioLM for conditional “audio-to-audio” generation tasks. Listeners expressed a preference for instrumentals generated by SingSong compared to a retrieval baseline.

Paper Content

Introduction

Related work: Microsoft Songsmith extracts pitch information from input vocals and predicts a sequence of symbolic chord labels. Lattner & Grachten (2019) and Grachten et al....

REPLUG: Retrieval-Augmented Black-Box Language Models

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: REPLUG is a retrieval-augmented language modeling framework. REPLUG treats the language model as a black box and augments it with a tuneable retrieval model. REPLUG prepends retrieved documents to the input for the frozen black-box LM. REPLUG can be applied to any existing retrieval and language models. REPLUG can be used to supervise the retrieval model....
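As a rough illustration of "prepending retrieved documents to the input", here is a minimal sketch. It is not REPLUG's implementation: the word-overlap `score` function is a toy retriever, the `build_input` helper is hypothetical, and joining all retrieved documents into a single prompt is an assumption made for brevity. The language model itself is never touched, which is the point of the black-box setup.

```python
def score(query, doc):
    """Toy retriever score (an assumption): number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve(query, corpus, k=2):
    """Return the k documents with the highest toy score."""
    ranked = sorted(corpus, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]


def build_input(query, corpus, k=2):
    """Prepend the retrieved documents to the query; the LM stays frozen."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"{context}\n\n{query}"


corpus = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "France borders Spain and Italy.",
]
query = "What is the capital of France?"
prompt = build_input(query, corpus, k=1)
# `prompt` is what would be sent to the black-box LM in place of `query`.
```

Because only the input string changes, the same wrapper could sit in front of any text-in, text-out model.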

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Vision-language pre-training has become expensive. BLIP-2 is a pre-training strategy that uses frozen pre-trained image encoders and language models. BLIP-2 bridges the modality gap with a lightweight Querying Transformer. BLIP-2 achieves state-of-the-art performance on various vision-language tasks. BLIP-2 outperforms Flamingo80B with 54x fewer trainable parameters. BLIP-2 has emerging capabilities of zero-shot image-to-text generation.

Paper Content

Introduction: Vision-language pre-training (VLP) research has seen rapid advancement in the past few years. Larger-scale pre-trained models have been developed to push the state of the art on various downstream tasks. Most state-of-the-art models incur high computation cost during pre-training. The paper proposes a generic and compute-efficient VLP method that bootstraps from pre-trained vision and language models. Pre-trained models offer high-quality visual representation and strong language generation. Unimodal pre-trained models remain frozen during pre-training. The Querying Transformer (Q-Former) is pre-trained with a two-stage strategy to facilitate cross-modal alignment. BLIP-2 achieves state-of-the-art performance on various vision-language tasks. BLIP-2 can perform zero-shot image-to-text generation. BLIP-2 is more compute-efficient than existing state-of-the-art methods.

Related work

End-to-end vision-language pre-training: Vision-language pre-training is used to learn multimodal foundation models with improved performance on various vision-language tasks....

Sample Efficient Deep Reinforcement Learning via Local Planning

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: This paper focuses on sample-efficient deep reinforcement learning with a simulator. The proposed algorithmic framework, UFLP, takes advantage of the ability to reset the environment to a previously observed state. UFLP can dramatically improve the sample cost of several baseline RL algorithms on difficult exploration tasks. UFLP can achieve super-human performance on the Atari game, Montezuma’s Revenge....
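The idea of resetting a simulator to a previously observed state can be shown on a toy problem. The sketch below is not UFLP: `ChainEnv`, its `get_state`/`set_state` methods, and the "restart from the furthest state seen" rule are all hypothetical choices made to keep the example small. It only shows why state resets help exploration: progress made in one short rollout is kept as the starting point of the next.

```python
import random


class ChainEnv:
    """Toy simulator (hypothetical): walk along a chain from 0 to `length`."""

    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def get_state(self):
        return self.pos

    def set_state(self, state):
        # Simulator access: jump straight to a previously observed state.
        self.pos = state

    def step(self, action):
        """Apply action -1 or +1; reward 1.0 only when the far end is reached."""
        self.pos = max(0, min(self.length, self.pos + action))
        return self.pos, float(self.pos == self.length)


def explore_with_resets(env, episodes=200, horizon=3, seed=0):
    rng = random.Random(seed)
    visited = {env.get_state()}
    for _ in range(episodes):
        # Restart from the furthest state seen so far instead of the start.
        # This selection rule is a toy heuristic specific to the chain.
        env.set_state(max(visited))
        for _ in range(horizon):
            state, reward = env.step(rng.choice([-1, 1]))
            visited.add(state)
            if reward:
                return visited
    return visited


visited = explore_with_resets(ChainEnv())
```

Without resets, a random walk of only three steps per episode could never reach a goal ten steps away; with resets, short rollouts are enough.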

A Discerning Several Thousand Judgments: GPT-3 Rates the Article + Adjective + Numeral + Noun Construction

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Knowledge of syntax includes knowledge of rare, idiosyncratic constructions. LLMs must overcome frequency biases to master such constructions. Prompted GPT-3 to give acceptability judgments on the English-language Article + Adjective + Numeral + Noun construction. Validated prompt using CoLA corpus of acceptability judgments. Compared GPT-3’s judgments to crowdsourced human judgments on a subset of sentences....

Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Proteins power a variety of processes in cells. Protein design enables engineering of cellular behavior. Structure-based protein design looks for designable, novel, and diverse structures. Search-based methods are limited due to the large space of sequences and structures. Generative models learn the low-dimensional structure of complex data distributions. Genie is a generative model of protein structures that performs discrete-time diffusion. Genie generates more designable, novel, and diverse protein backbones than existing models.

Paper Content

Introduction: Proteins play an essential role in cellular processes. Evolution has explored a small subregion of foldable protein space. Protein design efforts have focused on optimizing functional properties of naturally occurring proteins. Recent advances in protein structure prediction methods have enabled new approaches to explore structure space. Generative models can capture complex data distributions and have been applied to protein design. Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs) have been used. Denoising Diffusion Probabilistic Models (DDPMs) have shown promise in generating high quality 2D images. Multiple prior efforts have applied generative modeling to structure-based protein design. FoldingDiff uses internal coordinates to parameterize proteins. ProtDiff uses atomic coordinates in Cartesian space. AlphaFold2 combines implicit reasoning in a latent space with geometric reasoning in Cartesian space. Genie combines aspects of SE(3)-equivariant reasoning with DDPMs to create a diffusion process over protein backbone geometry in Cartesian space.

Methods: Genie is a DDPM that generates protein backbones as a sequence of Cα atomic coordinates. Genie performs diffusion directly in Cartesian space and uses an SE(3)-equivariant denoiser. Section 2....

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Training large models with billions of parameters is expensive and requires specialized HPC clusters. Alternative setups for training large models include using cheap “preemptible” instances or pooling resources from multiple regions. SWARM parallelism is a model-parallel training algorithm designed for poorly connected, heterogeneous and unreliable devices. SWARM parallelism was used to train a large Transformer language model with 1B shared parameters on preemptible T4 GPUs....

Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: BabyLM Challenge is a shared task for computer science research related to language modeling, human language acquisition, low-resource NLP, and cognitive modeling. Three tracks are available, two of which restrict the training data to pre-released datasets of 10M and 100M words. The final track only restricts the amount of text used, allowing innovation in the choice of the data, its domain, and even its modality....