arxiv-summary: AI-summarized AI papers

A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract
PFMs are used for various downstream tasks with different data modalities. Pretraining is used to provide reasonable parameter initialization for a wide range of applications. GPT and BERT use Transformers to train on large datasets. AI has made waves in a variety of fields over the past few years. This study provides a comprehensive review of recent research advancements, current and future challenges, and opportunities for PFMs.
Paper Content
Introduction
PFMs are essential components of AI in the era of big data. PFMs are studied in the three major AI fields: NLP, CV and GL. PFMs are powerful general models that are effective in various fields or across fields. PFMs have demonstrated great potential in learning feature representations in various learning tasks. PFMs show superior performance when trained on multiple tasks with a large-scale corpus and then fine-tuned on similar small-scale tasks.
PFMs and pretraining
PFMs are based on pretraining techniques, which use large amounts of data and tasks. Pretraining originates from transfer learning in CV tasks. When applied to NLP, LMs capture rich knowledge beneficial for downstream tasks. Pretraining data can be derived from any unlabeled text corpus. Early pretraining was static, but dynamic pretraining techniques have been proposed. PFMs are used for text, image, and graph tasks. PFMs have two major advantages: they need only minor fine-tuning, and they have already been vetted for quality. Related work focuses on model efficiency, security, and compression.
Contribution and organization
Several survey studies have reviewed pretrained models for specific areas. Bommasani et....

Machine Love

Abstract
ML generates economic value. Many have problematic relationships with ML-powered applications. ML optimizes for what we want in the moment, not what is best for us. ML falls short of its potential to help us reach our highest aspirations. Love is a primary catalyst for human flourishing. This paper explores whether there is a useful conception of love fitting for machines to embody. This paper forwards a candidate conception of machine love. Experiments aim to highlight the need for richer models of human flourishing in ML. ML may be aligned to support our growth.
Paper Content
Problem: models of human behavior in ML are insufficient
18-year-olds may not be making decisions that maximize their expected lifetime wellbeing. Common models of human rationality applied and optimized by ML may not account for human flourishing. Positive psychology and psychotherapy suggest humans have an intrinsic drive towards growth and self-actualization. Maslow’s gridworld highlights the limitations of ignoring deeper facets of human psychology.
Contrasting revealed preferences and Maslow’s hierarchy
Models of human behavior in ML are similar to those in economics, where humans are seen as rational agents. Psychology considers the nuance of human behavior, including development over time and behavior that does not serve flourishing. ML that optimizes for revealed preferences is useful but reinforces existing behavior patterns. Maslow’s Hierarchy of Needs provides a model of human growth and flourishing. Humans have competing drives for safety and growth, which can be affected by environment. Optimizing towards observed choices or engagement without qualification can drive stagnation or regression. ML driven to help users meet their unmet needs might assist their growth and flourishing.
Maslow’s gridworld
Maslow’s gridworld environment is used to explore how optimizing for engagement may decouple from deeper conceptions of human flourishing....

Scalable Prompt Generation for Semi-supervised Learning with Language Models

Abstract
Prompt-based learning methods in semi-supervised learning settings have been effective on NLU datasets and tasks. Designing multiple prompts and verbalizers requires domain knowledge and human effort, making it difficult to scale. Two methods are proposed to automatically design multiple prompts and integrate an automatic verbalizer without sacrificing performance. The best average accuracy of 73.2% is obtained with the proposed methods....

On Equivalent Optimization of Machine Learning Methods

Abstract
Machine learning methods use iterative optimization algorithms for training. Choices of optimizer, learning rate, batch size, etc. must be made for deep neural networks. Koopman operator theory can be used to identify when choices lead to equivalent or non-equivalent optimization trajectories. Analysis of feedforward, fully connected neural networks provides a general characterization of when choices lead to equivalent or non-equivalent evolution of network parameters....

JANA: Jointly Amortized Neural Approximation of Complex Bayesian Models

Abstract
Proposes a new method for approximating intractable likelihood functions and posterior densities in Bayesian surrogate modeling. The method involves training three complementary networks in an end-to-end fashion. The method can be used for marginal likelihood estimation and posterior predictive estimation. It is benchmarked against state-of-the-art Bayesian methods, and a powerful diagnostic for joint calibration is proposed. The ability of recurrent likelihood networks to emulate complex time series models is investigated.
Paper Content
Introduction
Surrogate modeling and simulation-based inference are two important parts of a new generation of methods for simulation science. Surrogate modeling (SM) seeks to approximate the intractable likelihood function. Simulation-based inference (SBI) aims to approximate the intractable posterior distribution of a complex generative model. Specialized neural approximators have been developed to solve the intractable problem. JANA is a Bayesian neural framework for simultaneous amortized SM and SBI. JANA enables accurate solutions to downstream tasks like the estimation of marginal and posterior predictive distributions. JANA outperforms or is on par with other methods given identical simulation budgets. JANA unlocks the potential of powerful Bayesian tools for model comparison, validation, and calibration. JANA can compute marginal likelihoods and rapidly produce posterior samples and normalized likelihood estimates of new data instances.
Method
Problem formulation
Bayesian models are specified as a triple of a simulation program, externalized randomness, and prior knowledge about simulation parameters....
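The excerpt above says that JANA can compute marginal likelihoods once it has both a likelihood approximator and a posterior approximator. One standard way the two pieces can be combined, given here as an illustrative assumption rather than as the paper's exact procedure, is to rearrange Bayes' rule. The symbols are notation introduced for this sketch: θ for simulation parameters, x for data, ℓ_φ for the learned likelihood, and q_ψ for the learned posterior.

```latex
p(x) = \frac{p(x \mid \theta)\, p(\theta)}{p(\theta \mid x)}
\quad\Longrightarrow\quad
\log \hat{p}(x) = \log \ell_{\phi}(x \mid \theta) + \log p(\theta) - \log q_{\psi}(\theta \mid x)
```

The identity holds for any θ with nonzero posterior density, so with exact approximators the estimate would not depend on which θ is plugged in.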

Text-driven Visual Synthesis with Latent Diffusion Prior

Abstract
Text-to-image synthesis has made great progress. A generic approach using latent diffusion models as image priors is presented. Feature matching and KL divergence loss are used to improve the approach. The approach is tested on three applications: text-to-3D, StyleGAN adaptation, and layered image editing. Results show the method is better than existing baselines.
Paper Content
Introduction
Diffusion models have shown impressive image generation capabilities....

Auditing large language models: a three-layered approach

Abstract
LLMs represent a major advance in AI research. LLMs come with ethical and social challenges. Auditing is a promising governance mechanism to ensure AI systems are ethical, legal, and technically robust. Existing auditing procedures don’t address the governance challenges posed by LLMs. The article proposes a three-layered approach to audit LLMs in feasible and effective ways. The article discusses limitations of auditing LLMs.
Paper Content
Introduction
Auditing is a governance mechanism used to identify and mitigate risks associated with AI systems. Procedural regularity and transparency contribute to good governance. Proactivity in the design of AI systems helps identify risks and prevent harm. Operational independence between the auditor and the auditee contributes to objectivity and professionalism of the evaluation. Previous work on AI auditing has focused on ensuring specific applications meet predefined requirements. Foundation models are effective across many different tasks and display emergent capabilities when scaled. LLMs pose ethical and social challenges such as perpetuating harmful stereotypes, leaking personal data, spreading misinformation, plagiarism, and misuse of copyrighted material. Auditing procedures should be designed to capture the risks posed by LLMs. A three-layered approach combining governance, model, and application audits should be used to audit LLMs. Outputs of the audits should ensure LLMs are designed and deployed in ethical, legal, and technically robust ways.
The need to audit LLMs
Covers previous research on LLMs and their ethical and social challenges, the need for auditing procedures to capture the risks LLMs pose, and responses to potential objections to the approach.
The opportunities and risks of LLMs
LLMs represent a major advance in AI research. NLP researchers and practitioners have been developing software to analyse, manipulate, and generate natural language since the 1950s. Deep learning, neural architectures, and computational power have revolutionised the field. LLMs can approximate human performance on some benchmarks. LLMs are highly adaptable to various downstream applications. Scaling the model can result in emergent gains on a wide array of tasks. LLMs are accessible via open-source libraries, democratising the gains from deep language modelling. LLMs can introduce representational and allocational harms, compromise privacy, produce misleading information, be co-opted by users with bad intent, and incur high environmental costs.
The governance gap
LLMs pose methodological and normative challenges. LLMs are developed and adopted in two stages. It is difficult to assess LLMs independent of context. Performance of LLMs can be unpredictable. LLMs force AI labs and policymakers to face hard questions. Audits help identify risks, inform design, and inform public discourse.
Addressing initial objections
Auditing procedures should be established at different stages of supply chains....

Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks

Abstract
Intuitive psychology is a part of common-sense reasoning. Replicating this reasoning in machines is important for creating AI like humans. Recent tasks and benchmarks have focused on belief attribution in Theory-of-Mind tasks. These tasks have had successes and failures. Evaluation of models should be skeptical and failure cases should be given more weight....

Tuning computer vision models with task rewards

Abstract
Misalignment between model predictions and intended usage can be damaging. Reinforcement learning techniques can be used to align models with a task reward. This approach is effective for multiple computer vision tasks. This approach has potential to be widely useful for better aligning models with computer vision tasks.
Paper Content
Introduction
Complex outputs in computer vision require alignment with task risk. Researchers use postprocessing, global loss, and altered input data to improve behavior. NLP and RL fields have studied this problem and use imitation and reinforcement learning. Reward optimization has not been explored for computer vision tasks. Reward optimization works out-of-the-box for a wide range of computer vision tasks. Reward optimization can be used with evaluation metrics, human feedback, or holistic system performance.
Related work
Computer vision metrics have been optimized by computing pseudo-gradients and approximations. CRF loss is used to ensure segmentation mask consistency. Text generation is optimized with MLE and REINFORCE. Generalization of sampled outputs is an underlying issue. Reinforcement learning is used for vision tasks to attend to parts of the image and for iterative refinement.
Tuning models with rewards
A computer vision task is formulated as learning a function that maps an input to an output. Maximum-likelihood training maximizes the likelihood of ground-truth annotations. The goal is to learn a conditional distribution that maximizes a reward function. The framework has two steps: pretraining with maximum-likelihood estimation and tuning with the REINFORCE algorithm. Maximum-likelihood pretraining captures the distribution of the training data. The REINFORCE algorithm tunes the model to optimize an arbitrary reward function. The pretrained MLE model provides a good initial sampling strategy.
Practical applications
Use an encoder-decoder architecture with a ViT encoder and a Transformer decoder. Pretrain the model with maximum-likelihood estimation and tune with the task reward. Use an Adafactor variant as optimizer and sample greedily at inference time. Validation metrics may differ from task risk in a real scenario, requiring further validation or reward design.
Panoptic segmentation
Panoptic segmentation combines instance and semantic segmentation. Panoptic Quality (PQ) is used to measure the completeness and detail of predictions. Pretrained encoder-decoder Transformer model on COCO captions. Tuning for CIDEr to optimize with batch size 256 and 10k steps.
Object detection
The goal of object detection is to predict a tight bounding box for objects in an image. Many approaches have been proposed, but they don’t offer an explicit way to obtain a model aligned with the task risk. We use detection-specific rewards to optimize a vanilla detection data likelihood model. We represent a set of bounding boxes as a discrete sequence. We use the standard ViT-B/16 as image encoder and a 6-layer auto-regressive Transformer decoder. We pretrain an MLE model and then tune it with rewards for recall and mAP. We tune our MLE model to optimize the recall reward. We use a supervised loss to learn the expected IoU scores of sampled outputs plus the recall reward. We improve the reward by computing its value at various IoU ranges and by weighting each class. Our strong ViT-B result demonstrates the promise of the proposed task reward tuning.
Colorization
The colorization task is adding color to grayscale images. Standard image colorization models use MLE to generate plausible image coloring. The MLE model is tuned to produce vivid images with a “colorfulness” reward. The reward discourages gray colors and promotes color diversity. The tuning step increases vividness and diversity of predicted colors.
Image captioning
Image captioning is the task of generating text descriptions for images....
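The two-step recipe summarized above (maximum-likelihood pretraining, then REINFORCE tuning against a task reward) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and not the paper's setup: the "model" is a categorical distribution over three possible outputs, `pretrained_logits` stands in for an MLE-pretrained model, `toy_reward` is a made-up reward, and the batch-mean baseline is one common variance-reduction choice.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_tune(logits, reward_fn, steps=2000, batch_size=32, lr=0.1, seed=0):
    """Tune a categorical 'model' with the REINFORCE gradient estimator.

    Each step samples outputs from the current model, scores them with
    reward_fn, and moves the logits along
    (reward - baseline) * grad log p(sample).
    """
    rng = np.random.default_rng(seed)
    logits = np.array(logits, dtype=float)
    k = logits.size
    for _ in range(steps):
        probs = softmax(logits)
        samples = rng.choice(k, size=batch_size, p=probs)
        rewards = np.array([reward_fn(int(y)) for y in samples], dtype=float)
        # Batch-mean baseline (an illustrative choice) to reduce variance.
        advantages = rewards - rewards.mean()
        # For a softmax model, grad of log p(y) w.r.t. logits is onehot(y) - probs.
        onehot = np.eye(k)[samples]
        grad = (advantages[:, None] * (onehot - probs)).mean(axis=0)
        logits += lr * grad  # gradient ascent on expected reward
    return logits

# Hypothetical stand-in for an MLE-pretrained model: it mirrors the data
# distribution, which favors output 0 even though output 2 earns the most reward.
pretrained_logits = np.log(np.array([0.5, 0.3, 0.2]))

def toy_reward(y):
    """Made-up task reward: output 2 is what the task actually wants."""
    return [0.0, 0.2, 1.0][y]

tuned_logits = reinforce_tune(pretrained_logits, toy_reward)
before = softmax(pretrained_logits)
after = softmax(tuned_logits)
print("before tuning:", before.round(3))
print("after tuning: ", after.round(3))
```

Starting from the pretrained distribution matters in this sketch for the same reason the summary gives: the initial model already samples every output with non-negligible probability, so rewarded outputs are actually encountered during tuning.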

A Survey of Geometric Optimization for Deep Learning: From Euclidean Space to Riemannian Manifold

Abstract
Deep Learning (DL) has problems such as feature redundancy and vanishing/exploding gradients. Riemannian-based DL uses geometric optimization to update parameters on Riemannian manifolds. This article surveys the application of geometric optimization in DL networks for AI tasks. Toolboxes that implement optimization on manifold are discussed. Performance comparison between deep geometric optimization methods is made....