arxiv-summary: AI-summarized AI papers

Aligning Text-to-Image Models using Human Feedback

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Deep generative models have been used for text-to-image synthesis. Current models often generate images that are not well-aligned with text prompts. A fine-tuning method is proposed to improve alignment using human feedback.
Paper Content
Introduction:
- Deep generative models have been successful in generating high-quality images from text prompts
- Scaling of deep generative models to large-scale datasets has been a factor in this success
- Challenges remain in domains where large-scale text-to-image models fail to generate images that are well-aligned with text prompts
- Learning from human feedback has emerged as a powerful solution for aligning model behavior with human intent
- Proposed fine-tuning method for aligning text-to-image models using human feedback
- Fine-tuning with human feedback significantly improves the image-text alignment of a text-to-image model
- Learned reward function predicts human assessments of image quality more accurately than the CLIP score
- Careful investigation of several design choices is important for balancing alignment-fidelity tradeoffs
Related work:
- Variational auto-encoders, generative adversarial networks, auto-regressive models, and diffusion models have been proposed for modeling image distributions
- Combined with language encoders, these models have shown impressive results in text-to-image generation
- Text-to-image models struggle to generate images that are well-aligned with text prompts
- Techniques such as character-aware text encoders and structured representations of language inputs have been investigated to address these issues
- Human feedback has been used to improve various AI systems
- We propose a fine-tuning method with human feedback for improving text-to-image models
- Various evaluation protocols have been proposed to measure image-text alignment
- We train a reward function that is better aligned with human evaluations by exploiting pre-trained representations and human feedback data
Main method:
- Generate a set of diverse images from text prompts
- Human raters provide binary feedback on images
- Train a reward model to predict human feedback
- Fine-tune text-to-image model using reward-weighted log likelihood
Human data collection:
- Generated image-text dataset with prompts combining words or phrases from three categories (count, color, background)
- Collected binary feedback from human labelers on the image-text dataset
Reward learning:
- Measure image-text alignment by learning a reward function
- Data augmentation to improve data-efficiency and performance
- Generate N-1 text prompts with different semantics
- Use reward function to classify original prompt
- Auxiliary loss to encourage low values for prompts with different semantics
- Combined loss sums the feedback-prediction loss and the auxiliary loss, with the auxiliary term scaled by a penalty parameter
Updating the text-to-image model:
- Update text-to-image model with parameters θ by minimizing loss
- Minimize reward-weighted negative log-likelihood on model-generated dataset
- Minimize pre-training loss to reduce NLL on pre-training dataset
- Regularization in loss function enables model to generate more natural images
Experiments:
- Conducted experiments to test efficacy of fine-tuning approach
- Used human feedback in experiments
Experimental setup:
- Stable diffusion v1....
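
The reward-learning and fine-tuning losses summarized in this entry can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the squared-error form of the feedback loss, the softmax form of the prompt-classification loss, and the weights `lam` and `beta` are all assumptions chosen for the example. For a diffusion model the log-likelihood terms would in practice be approximated by a denoising objective; here they are passed in as plain numbers.

```python
import math


def feedback_loss(predicted, labels):
    """Mean squared error between predicted rewards and binary human labels.

    Assumption: a squared-error fit is used to predict the feedback.
    """
    return sum((p - y) ** 2 for p, y in zip(predicted, labels)) / len(labels)


def prompt_classification_loss(scores, original_index=0, temperature=1.0):
    """Cross-entropy for identifying the original prompt among N candidates.

    scores[i] is the reward of one image paired with candidate prompt i;
    the other N-1 prompts are augmentations with different semantics.
    Minimizing this pushes their scores down relative to the original.
    """
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_norm - scaled[original_index]


def reward_learning_loss(predicted, labels, scores, lam=0.5):
    """Combined loss: feedback loss plus lam times the auxiliary loss."""
    return feedback_loss(predicted, labels) + lam * prompt_classification_loss(scores)


def fine_tuning_loss(rewards, model_log_probs, pretrain_log_probs, beta=0.5):
    """Reward-weighted NLL on model-generated data plus beta times
    the plain NLL on pre-training data (the regularizer)."""
    weighted = sum(-r * lp for r, lp in zip(rewards, model_log_probs)) / len(rewards)
    pretrain = sum(-lp for lp in pretrain_log_probs) / len(pretrain_log_probs)
    return weighted + beta * pretrain


if __name__ == "__main__":
    # Two generated images: one judged good (reward 1), one judged bad (reward 0).
    print(fine_tuning_loss([1.0, 0.0], [-2.0, -4.0], [-1.0, -3.0], beta=0.5))
```

With a reward of 0, a badly aligned sample contributes nothing to the first term, so only well-rated samples have their likelihood increased; the `beta` term keeps the model close to its pre-training data.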

More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Large Language Models (LLMs) are being adopted in many systems, including IDEs and search engines. LLMs can be modulated via natural language prompts, but their internal functionality is unassessable. Prompt Injection (PI) attacks can be used to misalign LLMs and override instructions and filtering schemes. Augmenting LLMs with retrieval and API calling capabilities introduces a new set of attack vectors....

ProsAudit, a prosodic benchmark for self-supervised speech models

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: ProsAudit is a benchmark to assess structural prosodic knowledge in self-supervised learning (SSL) speech models. It consists of two subtasks and an evaluation dataset. The subtasks involve correctly identifying strong versus weak prosodic boundaries and distinguishing between pauses inserted between words and within words. Human evaluation scores are provided. SSL models were able to perform above chance on both tasks, even when trained on an unseen language....

Liquidity Providers Greeks and Impermanent Gain

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract:
- Traditional finance uses the Black & Scholes model to price derivatives
- Decentralized Finance (DeFi) and Automated Market Makers (AMMs) are becoming more important
- Liquidity Providers (LPs) are exposed to risks such as Impermanent Loss (IL)
- This paper proposes a method to calculate the Greeks of an LP
- Introduces Impermanent Gain, a product that LPs can use to hedge their position and traders can use to bet on a rise in volatility
Paper Content
Introduction:
- Overview of computer science ecosystem
- Principal actors in the ecosystem
Decentralized exchanges:
- DEXs are peer-to-peer marketplaces for crypto traders....

Controlled and Conditional Text to Image Generation with Diffusion Prior

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Denoising diffusion models have been used to generate high-quality images from text. DALLE-2’s two-step process uses a Diffusion Prior to generate a CLIP image embedding from text and a Diffusion Decoder to generate an image from the embedding. The Diffusion Prior can be used to constrain the generation to a specific domain without altering the Diffusion Decoder....

Bayes meets Bernstein at the Meta Level: an Analysis of Fast Rates in Meta-Learning with PAC-Bayes

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Bernstein’s condition is an assumption under which machine learning algorithms achieve fast rates of convergence. For example, under this condition the Gibbs algorithm has an excess risk of $O(d_{\pi}/n)$ instead of the standard $O(\sqrt{d_{\pi}/n})$. This paper examines the Gibbs algorithm in the context of meta-learning. Bernstein’s condition always holds at the meta level, regardless of its validity at the observation level....
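
To make the gap between the two rates concrete, here is a small worked comparison. The values $d_{\pi} = 10$ and $n = 1000$ are hypothetical, chosen only for illustration, and the constants hidden by the $O(\cdot)$ notation are ignored; $n$ is read as the number of observations and $d_{\pi}$ as a complexity term depending on the prior $\pi$.

```latex
\[
\frac{d_{\pi}}{n} = \frac{10}{1000} = 0.01,
\qquad
\sqrt{\frac{d_{\pi}}{n}} = \sqrt{0.01} = 0.1 .
\]
```

Under these assumed values the fast rate is ten times smaller than the standard one, and the ratio between the two, $\sqrt{d_{\pi}/n}$, keeps shrinking as $n$ grows.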

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Model parallelism is a way to scale a single large deep learning model beyond the memory limits of a single device. Model parallelism can also be used to serve multiple models on multiple devices, even when a single model can fit into a single device. AlpaServe is a novel serving system that determines an efficient strategy for placing and parallelizing collections of large deep learning models across a distributed cluster....

Modular Deep Learning

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Transfer learning is the dominant paradigm of machine learning. Pre-trained models can be fine-tuned for downstream tasks with fewer labelled examples. Modular deep learning is a promising solution to challenges of positive transfer and systematic generalisation. Modular architectures provide a unified view of research that evolved independently. Modularity can be used for scaling language models, causal inference, programme induction, and planning in reinforcement learning....

How Does In-Context Learning Help Prompt Tuning?

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract:
- Fine-tuning large language models is becoming impractical
- Prompt tuning (PT) and in-context learning (ICL) are parameter-efficient adaptation methods
- Instruction prompt tuning (IPT) combines PT and ICL
- How these methods interact with each other is unexplored
- This paper empirically studies when and how in-context examples improve PT
Paper Content
Introduction:
- LLMs are becoming too large to fine-tune all parameters for new tasks
- Three methods studied to adapt LLMs to downstream tasks: in-context learning (ICL), prompt tuning (PT), and instruction prompt tuning (IPT)
- ICL struggles on complex and out-of-domain tasks
- PT generally outperforms ICL, but is unstable and difficult to optimize
- IPT combines ICL and PT and is effective at adapting LLMs to the medical domain
- PT and IPT consistently outperform ICL across five text generation tasks
- Performance of PT and IPT depends on task and experimental configuration
- IPT outperforms PT when the in-context demonstration is similar to the test input
- PT exhibits high variance, IPT reduces variance
- Prompt embeddings learned via PT can be transferred to new tasks with in-context demonstrations
Background:
- Parameter-efficient fine-tuning methods specialize LLMs to a target task while adjusting a small number of task-specific parameters....
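
The difference between the three adaptation methods in this entry comes down to what is placed in front of the test input. The sketch below is illustrative only: `embed` is a toy stand-in for a frozen embedding layer, and the function names, vector size, and example tokens are hypothetical. It shows the shape of the model input under each method, not how the soft prompt is trained.

```python
DIM = 4  # toy embedding size, chosen arbitrarily for the example


def embed(tokens):
    """Toy stand-in for a frozen embedding layer: one vector per token."""
    return [[float(len(tok))] * DIM for tok in tokens]


def icl_input(demo_tokens, input_tokens):
    """ICL: a text demonstration is prepended; no parameters are trained."""
    return embed(demo_tokens) + embed(input_tokens)


def pt_input(soft_prompt, input_tokens):
    """PT: trainable soft-prompt vectors are prepended to the embedded input."""
    return soft_prompt + embed(input_tokens)


def ipt_input(soft_prompt, demo_tokens, input_tokens):
    """IPT: soft prompt, then the embedded demonstration, then the input."""
    return soft_prompt + embed(demo_tokens) + embed(input_tokens)


if __name__ == "__main__":
    soft_prompt = [[0.0] * DIM for _ in range(3)]  # 3 trainable vectors
    demo = ["great", "movie", "->", "positive"]
    test_input = ["dull", "plot", "->"]
    print(len(icl_input(demo, test_input)))
    print(len(pt_input(soft_prompt, test_input)))
    print(len(ipt_input(soft_prompt, demo, test_input)))
```

In this framing only `soft_prompt` would receive gradient updates during PT or IPT, while the language model and its embedding layer stay frozen.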

Guiding Large Language Models via Directional Stimulus Prompting

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Introduces a new framework, Directional Stimulus Prompting, to provide guidance for black-box frozen large language models on downstream tasks. Trains a policy LM to generate discrete tokens as a "directional stimulus" for each input. The policy LM can be trained through supervised learning and reinforcement learning. The framework is flexibly applicable to various LMs and tasks....