arxiv-summary: AI-summarized AI papers

Learning Universal Policies via Text-Guided Video Generation

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- A goal of artificial intelligence is to construct an agent that can solve a variety of tasks
- Recent progress in text-guided image synthesis has yielded models able to generate complex novel images
- Investigates whether such tools can be used to construct more general-purpose agents
- Sequential decision making is cast as a text-conditioned video generation problem
- A text-encoded specification of the desired goal is used to synthesize a set of future frames
- Control actions are extracted from the generated video
- Leveraging text as the underlying goal specification enables combinatorial generalization to novel goals
- The policy-as-video formulation can represent environments with different state and action spaces in a unified space of images
- Leveraging pretrained language embeddings and widely available videos enables knowledge transfer

Paper Content

Introduction
- Building models that solve a diverse set of tasks is a dominant paradigm in vision and language
- Large pretrained models have demonstrated zero-shot learning of new language tasks
- Models have shown zero-shot classification and object recognition capabilities
- Training agents faces the challenge of environmental diversity
- Universal tokens are used to encode different environments
- Video is used as a universal interface for conveying action and observation behavior
- Text is used as a universal interface for expressing task descriptions
- The model enables combinatorial generalization, multi-task learning, action planning, and internet-scale knowledge transfer

Problem formulation
- Introduces a new abstraction, the Unified Predictive Decision Process (UPDP), as an alternative to the Markov Decision Process (MDP)
- Presents an instantiation of UPDP with diffusion models

Markov decision process
- The Markov Decision Process (MDP) is a broad abstraction used to formulate many sequential decision making problems
- Many RL algorithms have been derived from MDPs with empirical successes
- Existing algorithms are typically unable to combinatorially generalize across different environments
- There is no universal state interface across different control environments
- An MDP explicitly requires a real-valued reward function
- The dynamics model in an MDP is environment and agent dependent
- UPDP exploits images as a universal interface across environments, texts as task specifiers to avoid reward design, and a task-agnostic planning module
- UPDP bypasses reward design, state extraction and explicit planning, and allows for non-Markovian modeling of an image-based state space
- UPDP isolates the video-based planner from deferred action selection
- UPDP leverages existing large text-video models that have been pretrained on massive, web-scale datasets
- UPDP uses a continuous-time diffusion model to define a forward process and a generative process that reverses the forward process

Decision making with videos
- The proposed approach, UniPi, is an instantiation of the diffusion UPDP
- UniPi incorporates two main components: a diffusion model and a task-specific action generator

Universal video-based planner
- Text-to-video models have been successful
- We want to construct a video diffusion module as a trajectory planner
- This is more challenging than typical text-to-video modeling
- We use a constrained video synthesis model
- We use tiling to ensure environment consistency
- We use hierarchical planning
- We use flexible behavioral modulation

Task specific action adaptation
- Train a small model to estimate actions given input images
- Generate an action sequence given x_0 and c by synthesizing H image frames and applying the learned inverse-dynamics model
- Inferred actions can be executed via closed-loop or open-loop control
- Use open-loop control for computational efficiency

Experimental evaluation

Combinatorial policy synthesis
- Measured the ability of UniPi to generalize to different language tasks
- Used combinatorial robot planning tasks
- The robot must manipulate blocks to satisfy language instructions
- Split language instructions into two sets, one seen during training and one seen during testing
- Compared UniPi to three separate representative approaches
- Measured final task completion accuracy
- UniPi generalizes well to seen and novel combinations of language prompts
- Ablated UniPi on seen language instructions and in-relation-to tasks
- All components of UniPi are crucial for good performance
- Assessed the ability of UniPi to adapt at test time to new constraints

Multi-environment transfer
- Evaluated the ability of UniPi to learn across different tasks and generalize to unseen environments
- Used language guided manipulation tasks from Shridhar et al....
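
The action-extraction step summarized for UniPi (synthesize H frames from x_0 and c, apply an inverse-dynamics model to consecutive frames, then execute open-loop) might be sketched roughly as below. This is a toy illustration under strong assumptions, not the paper's implementation: frames are 1-D positions rather than images, the text condition c is replaced by a numeric goal, and `plan_video`, `inverse_dynamics`, `extract_actions`, and `execute_open_loop` are hypothetical stand-ins for the learned components.

```python
def plan_video(x0, goal, horizon):
    """Stand-in for the text-conditioned video planner (hypothetical).

    Returns `horizon` future 'frames' (here 1-D positions) that move
    from the initial frame x0 toward the goal in equal steps.
    """
    step = (goal - x0) / horizon
    return [x0 + step * (t + 1) for t in range(horizon)]


def inverse_dynamics(frame, next_frame):
    """Stand-in for the learned inverse-dynamics model (hypothetical).

    In this toy system the action is simply the displacement
    between two consecutive frames.
    """
    return next_frame - frame


def extract_actions(x0, frames):
    """Apply the inverse-dynamics model to each consecutive frame pair."""
    states = [x0] + frames
    return [inverse_dynamics(a, b) for a, b in zip(states, states[1:])]


def execute_open_loop(x0, actions):
    """Execute every planned action without re-planning (open-loop control)."""
    state = x0
    for action in actions:
        state += action
    return state


frames = plan_video(x0=0.0, goal=4.0, horizon=4)
actions = extract_actions(0.0, frames)
final_state = execute_open_loop(0.0, actions)
```

A closed-loop variant would call `plan_video` again after each executed action, which is the extra computation the summary says open-loop control avoids.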

In-Context Retrieval-Augmented Language Models

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Retrieval-Augmented Language Modeling (RALM) methods improve language modeling and provide a source attribution mechanism. Existing RALM approaches modify the LM architecture, making deployment complicated. This paper proposes an alternative, In-Context RALM, which leaves the LM architecture unchanged and prepends grounding documents to the input. In-Context RALM provides large LM gains and the document retrieval and ranking mechanism can be specialized to further boost performance....
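
The central idea in this summary, prepending a retrieved grounding document to the input while leaving the LM untouched, can be illustrated with a small sketch. It is not the paper's code: the word-overlap scorer, the `retrieve` and `build_prompt` helpers, and the two-sentence corpus are assumptions made purely for illustration, and a real system would use a proper retriever and pass the resulting prompt to an actual language model.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real tokenizer)."""
    return text.lower().split()


def overlap_score(query, document):
    """Count query tokens that also occur in the document (hypothetical scorer)."""
    doc_tokens = set(tokenize(document))
    return sum(1 for token in tokenize(query) if token in doc_tokens)


def retrieve(query, corpus, k=1):
    """Return the k documents with the highest overlap score."""
    ranked = sorted(corpus, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(prefix, corpus, k=1):
    """Prepend retrieved documents to the prefix; the LM itself is unchanged."""
    documents = retrieve(prefix, corpus, k)
    return "\n".join(documents + [prefix])


# Toy corpus, invented for this example.
corpus = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is the highest mountain on Earth.",
]
prompt = build_prompt("The Eiffel Tower is in the city of", corpus)
```

Because everything happens in the input string, the same `build_prompt` output could be sent to any off-the-shelf LM, which is the deployment advantage the abstract points to.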

Transformers Meet Directed Graphs

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Transformers are used for a variety of modalities, including images, audio, video, and undirected graphs. Transformers for directed graphs are an underexplored topic. Two direction- and structure-aware positional encodings for directed graphs are proposed. The extra directionality information is useful for downstream tasks. Model outperforms prior state of the art on Open Graph Benchmark Code2 by 14....

Mathematical Capabilities of ChatGPT

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Investigated mathematical capabilities of ChatGPT
- Tested on publicly available and hand-crafted datasets
- Measured performance against other models
- Tested usefulness to professional mathematicians
- Current datasets only cover elementary mathematics
- Introduced new dataset GHOSTS to cover graduate-level mathematics
- Benchmarked ChatGPT on GHOSTS
- ChatGPT’s mathematical abilities are below those of an average mathematics graduate student

Paper Content

Introduction
- ChatGPT is a widely known question-and-answer dialogue system
- It is the most talked about language model on Twitter
- It has been tested in a number of exam-related use cases
- It is believed to be used as an assistant by many professionals
- This paper focuses on analyzing the mathematical capabilities of ChatGPT
- It introduces new natural-language math datasets to benchmark ChatGPT’s performance

Related work
- ChatGPT is a large language model that can be used to perform mathematical reasoning
- Mathematical reasoning has been studied since 1959
- Classical approaches using symbolic encoding have reached a plateau
- There is a growing body of literature on learning mathematical relationships directly
- Most recently published large language models are tested on elementary-level mathematical reasoning datasets
- Variations of BERT have been shown to solve between 28% and 37% of problems on the AQuA-RAT dataset
- Minerva, based on PaLM, achieved a score of 50% on the MATH dataset
- Supervised approaches have outperformed classical solvers
- An up-to-date survey on mathematical datasets and performance of LLMs can be found in [23]
- Investigations related to ChatGPT’s performance consist of anecdotal evidence
- Ideas used in this article are echoed in [31] for formal mathematics

Datasets

Dataset creation
- Created a collection of 728 prompts
- Manually rated by experts
- MATH dataset and Symbolic-Integration subdataset taken from existing datasets
- Minerva and a supervised-learning approach used for comparison
- Hand-crafted datasets by the authors
- Creation of datasets requires advanced mathematical insight

Format
- GHOSTS dataset consists of multiple JSON-formatted files
- Each datapoint in a JSON file has a prompt, reference, MSC code, confidence, and timestamp
- Rating is a number from 1 to 5
- Errorcodes and warningcodes highlight failure modes of ChatGPT
- Comment field can provide context
- MSC codes indicate areas where ChatGPT performs better
- Prompt engineering is allowed
- Chats with ChatGPT are “cold”
- ChatGPT must provide the correct solution without clarification

The subdatasets
- Used LaTeX to encode mathematical input
- Experiments showed ChatGPT can process LaTeX-encoded mathematics
- Used exercises from books used to teach undergraduate/graduate courses in mathematics
- Used exercises from the Problem-Solving Strategies book for mathematical competitions
- Prompted ChatGPT to fill gaps in proofs from math....
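
To make the datapoint format described for GHOSTS more concrete, here is a rough sketch of what one record and a basic sanity check could look like. The summary only names the fields loosely, so every key name and every value below is a hypothetical placeholder, and `validate` is an illustrative helper rather than anything shipped with the dataset.

```python
import json

# Hypothetical record: all key names and values are placeholders,
# chosen only to mirror the fields listed in the summary.
record_json = """
{
  "prompt": "Compute the integral of x^2 from 0 to 1.",
  "reference": "placeholder source",
  "msc": ["26A06"],
  "confidence": "high",
  "timestamp": "2023-01-15",
  "rating": 4,
  "errorcodes": [],
  "warningcodes": [],
  "comment": ""
}
"""

REQUIRED_FIELDS = {"prompt", "reference", "msc", "confidence", "timestamp", "rating"}


def validate(record):
    """Check that required fields exist and the rating lies between 1 and 5."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not 1 <= record["rating"] <= 5:
        raise ValueError("rating must be between 1 and 5")
    return True


record = json.loads(record_json)
is_valid = validate(record)
```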

Benchmarking Large Language Models for News Summarization

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- LLMs have shown promise for automatic summarization
- Instruction tuning is the key to LLMs’ zero-shot summarization capability
- Existing studies have been limited by low-quality references
- Human evaluation over high-quality summaries from freelance writers shows LLM summaries are on par with human-written summaries

Paper Content

Introduction
- Large language models (LLMs) have shown promising results in zero-/few-shot tasks across a wide range of domains
- LLMs have potential for automatic summarization
- Design decisions contributing to success on summarization remain poorly understood
- Evaluation of 10 diverse LLMs with human evaluation on news summarization
- Instruction tuning is key to zero-shot summarization capability
- Self-supervised learning alone cannot induce strong summarization performance in the zero-shot setting
- Poor quality reference summaries reduce correlation between metric results and human judgement
- Recruit freelance writers to re-annotate 100 articles from the test set of CNN/DM and XSUM
- Best performing LLM is rated as comparable to freelance writers
- Instruction tuning, not model scale, is key to LLMs’ summarization capability
- Poor quality of training data makes comparison difficult

News summarization
- News summarization is the task of producing a concise paragraph that captures the main points of a news article....

Grounding Language Models to Images for Multimodal Generation

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Proposes an efficient method to ground pretrained text-only language models to the visual domain
- Leverages abilities of language models learnt from large scale text-only pretraining
- Keeps the language model frozen and finetunes input and output linear layers to enable cross-modality interactions
- Achieves strong zero-shot performance on grounded tasks such as contextual image retrieval and multimodal dialogue

Paper Content

Introduction
- LLMs are trained on text-only data and lack visual cues
- LLMs have limitations on tasks involving visual reasoning and grounding
- Propose a method to bootstrap a frozen LLM for processing and outputting multimodal data
- Model is efficient and requires less compute than existing models
- Model is capable of generating coherent multimodal outputs
- Model is capable of processing arbitrarily interleaved image-text inputs
- Model retains original abilities of the text-only LLM to generate text
- Model attains new multimodal dialogue and reasoning abilities
- Approach is model-agnostic and can be applied to larger or stronger LLMs
- Model is more accurate on long and complex free-form text than existing models

Method

Model architecture
- Language model uses a byte-level BPE tokenizer to extract a sequence of input tokens from text....

Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Recent text-to-image generative models can generate diverse and creative imagery. Current state-of-the-art diffusion models may fail to generate images that fully convey the semantics in the given text prompt. We introduce the concept of Generative Semantic Nursing (GSN) to help mitigate these failure cases. We use an attention-based formulation of GSN to guide the model to refine the cross-attention units....

Patch Gradient Descent: Training Neural Networks on Very Large Images

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Traditional CNN models are trained and tested on low resolution images. Patch Gradient Descent (PatchGD) allows existing CNN architectures to be trained on large-scale images. PatchGD updates the model on small parts of the image at a time. PatchGD is evaluated on two datasets with ResNet50 and MobileNetV2 models. PatchGD is more stable and efficient than standard gradient descent when handling large images....

Differentially Private Distributed Bayesian Linear Regression with MCMC

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Proposed a novel Bayesian inference framework for distributed differentially private linear regression
- Data is split between multiple parties who share summary statistics in a privacy-preserving way
- Developed a novel generative statistical model for privately shared statistics
- Bayesian estimation of regression coefficients is conducted using Markov chain Monte Carlo algorithms
- Also provided a fast version to perform Bayesian estimation in one iteration
- Proposed methods have computational advantages over competitors
- Numerical results on real and simulated data demonstrate well-rounded estimation and prediction

Paper Content

Introduction
- Linear regression is a mathematical method used in statistical research
- Many researchers have been working on linear regression since the 19th century
- Differential privacy is the most commonly used definition for privacy
- There is a growing interest in differentially private linear regression
- General-purpose Bayesian differentially private estimation methods can be used in regression problems
- Hierarchical model for privatised data and Bayesian estimation for the model parameters
- Differential privacy mechanisms for posterior sampling and linear regression
- General-purpose differentially private Markov chain Monte Carlo (MCMC) algorithms can be applied to regression
- Perturbing polynomial objective functions with privacy-preserving noise
- Perturbation of summary statistics
- Point estimation of the linear regression parameters
- Confidence intervals for the coefficients of linear regression
- Rates of convergence for parameter estimation with differential privacy
- Distributed setting where the total dataset is shared among multiple parties
- Adding noise to summary statistics of linear regression
- Fast Bayesian estimation methods
- MCMC algorithms for iterative sampling from posterior distributions

Differential privacy
- A differentially private algorithm takes in sensitive data and returns a random output....
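
The idea of parties sharing noise-perturbed summary statistics for linear regression can be sketched in a few lines. This is only a toy under stated assumptions, not the paper's method: it uses a one-dimensional, no-intercept regression, a plug-in point estimate instead of Bayesian MCMC, and a Gaussian noise scale (`noise_std`) chosen for illustration rather than calibrated to any privacy budget. The function names are hypothetical.

```python
import random


def local_summaries(xs, ys):
    """Sufficient statistics for no-intercept 1-D least squares."""
    s_xx = sum(x * x for x in xs)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    return s_xx, s_xy


def privatize(summaries, noise_std, rng):
    """Add Gaussian noise to each summary before sharing it.

    noise_std is an illustrative value, NOT calibrated to a privacy budget.
    """
    return tuple(s + rng.gauss(0.0, noise_std) for s in summaries)


def estimate_slope(shared):
    """Aggregate the parties' noisy summaries and solve for the slope."""
    total_xx = sum(s_xx for s_xx, _ in shared)
    total_xy = sum(s_xy for _, s_xy in shared)
    return total_xy / total_xx


rng = random.Random(0)
true_slope = 2.0
shared = []
for _ in range(3):  # three parties, each holding 200 private records
    xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    ys = [true_slope * x + rng.gauss(0.0, 0.1) for x in xs]
    shared.append(privatize(local_summaries(xs, ys), noise_std=1.0, rng=rng))

slope_hat = estimate_slope(shared)
```

Only the noisy pairs in `shared` leave each party; the raw `xs` and `ys` never do. A Bayesian treatment, as the summary describes, would instead model the noisy summaries generatively and sample the coefficient's posterior.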

The passive symmetries of machine learning

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
- Representation of data involves investigator choices
- Choices lead to exact symmetry
- Symmetries include coordinate freedom, gauge symmetry and units covariance
- Goal is to understand implications of passive symmetries for machine learning
- Discuss links to causal modeling
- Implementation of passive symmetries valuable for generalizing out of sample

Paper Content

Introduction
- ML has been inspired by mathematical physics
- Kernel trick and statistical mechanics techniques used in ML
- Representation of observables involves investigator choices
- ML methods should be written in a form that is equivariant to changes in investigator choices
- Geometric Principle: Laws of physics must be expressed as geometric relationships between geometric objects
- Symmetries of coordinate freedom and gauge symmetry
- Analogs of these symmetries could have big impacts in ML
- Two types of symmetries: passive and active
- Passive symmetries apply to all data analysis problems
- Guidance on how to structure ML models to respect passive symmetries

Passive symmetries
- Passive symmetries arise from redundancies or free parameters in data representation
- Active symmetries arise from observed or empirical invariances of physical laws
- Passive symmetries can be established without observations
- Active symmetries are rare and usually only appear in natural-science contexts
- Active and passive transformations are similar mathematically
- To enforce a passive symmetry, all relevant contextual information must be incorporated
- Ricci calculus is used to make objects equivariant to coordinate diffeomorphisms
- Passive symmetries have not featured in ML practice, but could be significant

Example: units covariance
- Units covariance is a passive symmetry that states the behavior of a system does not depend on the units system used....
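
As a small worked illustration of units covariance, consider the textbook pendulum period formula, which is not taken from the paper and serves here only as an assumed example. A prediction that depends on its inputs through a dimensionless-consistent combination gives the same physical answer whether lengths are measured in metres or centimetres.

```python
import math


def pendulum_period(length, gravity):
    """Period in seconds; length and gravity must share one length unit."""
    return 2.0 * math.pi * math.sqrt(length / gravity)


# Same physical pendulum described in two unit systems.
period_si = pendulum_period(length=1.0, gravity=9.81)      # metres, m/s^2
period_cgs = pendulum_period(length=100.0, gravity=981.0)  # centimetres, cm/s^2
```

Both calls describe the same system, so the two periods agree up to floating-point rounding. A model that combined the raw numbers in a unit-dependent way, for example by adding `length` and `gravity`, would not have this property.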