arxiv-summary: AI-summarized AI papers

Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract
- Traditional approaches to RL focus on learning decision policies from episodic decisions
- Some approaches refine representations via auxiliary self-supervised losses while learning decision policies
- Supervised language model cascades can adapt to many diverse manifolds
- Transfer methods for language models require human supervision
- Proposed self-supervised loss policy called contrastive distillation
- Contrastive distillation outperforms common methods of transfer learning
- Contrastive distillation is improved through sampling from memory
Paper Content
Introduction
- Humans excel at generating contrastive self-learning data in regimes of limited data
- Humans manifest latent axes of variation as sequences
- Transfer learning and a diversity of online learning algorithms are needed to build more causal representations
- Current machine learning approaches to transfer learning are less data-efficient
- Unsupervised pre-training then online supervised transfer is the current paradigm
- Auxiliary losses require view augmentations to be specified beforehand
- Self-learning approaches have been shown to be effective at improving in-domain accuracy
- Self-learning of sequences to model distribution shifts has been relatively unexplored
- Self-supervised loss surface created via generation of auxiliary sequences with high mutual information improves transfer learning
- Tradeoff between manifesting latent axes of variation and additional compute and training time
- Contrastive distillation and sampling from memory can improve contrastive distillation
Core contributions
- Synthesize 4 literatures into unifying model for self-learning
- Introduce contrastive distillation to improve transfer learning
- Sampling from episodic memory improves contrastive distillation
- Algorithm for sampling negative examples for contrastive losses
Related works
Self-learning via expert iteration
- Self-learning of large language models via CoT rollouts and majority voting has been shown to have promising results (Huang et al....

Generalized Decoding for Pixel, Image, and Language

Abstract
- Presents X-Decoder, a model that can predict pixel-level segmentation and language tokens
- Takes two types of queries as input: generic non-semantic and semantic queries
- Enables seamless interactions across tasks at different granularities
- Pretrained on limited segmentation data and millions of image-text pairs
- Achieves state-of-the-art results on open-vocabulary segmentation and referring segmentation on 8 datasets
Paper Content
Introduction
- Visual understanding at different levels of granularity is a longstanding problem in the vision community
- Tasks span from image-level (e....

Training language models for deeper understanding improves brain alignment

Abstract
Natural language processing (NLP) seeks to build systems that understand language. Recent works have trained language models on narrative datasets to extract critical information. This work investigates if these models are truly understanding the text or just learning a heuristic. Results suggest that training can lead to deeper language understanding. Findings have consequences for cognitive neuroscience and NLP....

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

Abstract
LLMs have good zero-shot generalization to new language tasks. LLMs are not effective for zero-shot VQA due to modality and task disconnection. Img2Prompt is a plug-and-play module that bridges the disconnections, allowing LLMs to perform zero-shot VQA without end-to-end training. Img2Prompt is flexible, reduces cost, and achieves comparable or better performance than end-to-end training....

Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

Abstract
- Increasing adoption of AI in drug discovery
- Existing works mainly use machine learning to utilize chemical structures, ignoring textual knowledge
- Presenting MoleculeSTM, a multi-modal molecule structure-text model
- Constructing PubChemSTM, the largest multi-modal dataset to date
- Designing two challenging zero-shot tasks based on text instructions
- MoleculeSTM has open vocabulary and compositionality via natural language
- Obtaining state-of-the-art generalization ability to novel biochemical concepts
Paper Content
Introduction
- Recent progress in AI promises to be transformative for drug discovery
- AI methods have been used to augment and accelerate current computational pipelines
- ML methods mainly focus on modeling chemical structure of molecules
- Supervised setting requires expensive annotations
- Unsupervised pretraining on large-scale databases proposed
- Existing molecule pretraining methods incorporate only chemical structures
- Textual data is being harnessed in large-scale multi-modal models
- Pretrained multi-modal models can generalize well to new categories and tasks
- Previous work attempted to leverage textual knowledge to learn molecule representation
- Proposed MoleculeSTM incorporates both molecular structural information and textual knowledge
- MoleculeSTM can be generalized to diverse downstream tasks in a zero-shot manner
- MoleculeSTM has two main attributes: open vocabulary and compositionality
Results
Overview and preliminaries
- MoleculeSTM consists of two branches: chemical structure and textual description
- Pretraining uses contrastive learning to reduce representation distance between same molecule pairs and increase distance between different molecule pairs
- Downstream tasks include zero-shot structure-text retrieval, zero-shot text-based molecule editing, and molecular property prediction
- Pretrained models are used for retrieval in the zero-shot setting
- Molecular property prediction uses pretrained encoder and adds a prediction head
Two principles for downstream task design
- Open vocabulary: language model can support exploration of novel biochemical concepts with unbound vocabulary
- Compositionality: language model can transform molecule property compositionality problem into language compositionality problem
- Language model can be used for drug re-purposing and text-based lead optimization
Downstream: zero-shot structure-text retrieval
- Retrieval task can be seen as a multiple-choice problem
- Pretrained encoders and projectors from MoleculeSTM are used and remain frozen in the task
- Example of setting (1) is given
Downstream: zero-shot text-based molecule editing
- MoleculeSTM and a pretrained molecule generative model are frozen
- Editing pipeline is split into two phases: space alignment and latent optimization
- Space alignment phase: learn an adaptor module to align the two latent spaces
- Latent optimization phase: optimize a latent code to be close to the representations of input molecule and text prompt
- Evaluation metric is satisfactory hit ratio, which is task-specific
Downstream: molecular property prediction
- MoleculeSTM is a pretrained chemical structure representation that shares information with external domain knowledge
- MoleculeNet is a benchmark used to evaluate the expressiveness of the pretrained molecule representation methods
- Evaluation metric is ROC-AUC
- Baselines include randomly initialized models, MegaMolBART, KV-PLM, AttrMasking, ContextPred, InfoGraph, MolCLR, and GraphMVP
- MoleculeSTM performs best on average across all eight tasks
Discussion
- Presented a multi-modal model, MoleculeSTM, to illustrate effectiveness of incorporating textual descriptions for molecule representation learning
- Confirmed improved performance of MoleculeSTM compared to existing methods
- MoleculeSTM can retrieve novel drug-target relations and modify molecule substructures to gain desired properties
- Outcomes of downstream tasks consistent with feedback from chemistry experts
Methods
- PubChem database is used as data source
- PubChemSTM dataset is constructed with 250K molecules and 281K structure-text pairs
- Chemical structure branch $f_c$ uses SMILES string and 2D molecular graph
- Textual description branch $f_t$ uses BERT model and pretrained SciBERT
- Pretraining uses contrastive learning strategy
- Pre-processing includes PubChemSTM-raw and PubChemSTM-extracted
- Vocabulary size is important factor
- Evaluation is computationally feasible
- Fuzzy matching is used for molecule editing task
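The MoleculeSTM summary describes contrastive pretraining that pulls matching structure-text pairs together and pushes mismatched pairs apart, and then treats zero-shot retrieval as a multiple-choice problem over frozen embeddings. The sketch below illustrates both ideas on toy vectors. It assumes an InfoNCE-style loss over cosine similarities, which is one common choice and not necessarily the objective used in the paper; the function names, the temperature value, and the 2-D embeddings are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def similarity_matrix(structure_embs, text_embs):
    # Cosine similarity between every structure and every text embedding.
    s = [normalize(v) for v in structure_embs]
    t = [normalize(v) for v in text_embs]
    return [[dot(si, tj) for tj in t] for si in s]

def info_nce(sim, temperature=0.1):
    # Mean cross-entropy of selecting the matching column (index i) in row i.
    total = 0.0
    for i, row in enumerate(sim):
        logits = [x / temperature for x in row]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(sim)

def contrastive_loss(structure_embs, text_embs, temperature=0.1):
    # Symmetric loss: structure-to-text plus text-to-structure (assumed form).
    sim = similarity_matrix(structure_embs, text_embs)
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (info_nce(sim, temperature) + info_nce(sim_t, temperature))

def retrieve(query_emb, candidate_embs):
    # Zero-shot retrieval as multiple choice: pick the most similar candidate.
    q = normalize(query_emb)
    scores = [dot(q, normalize(c)) for c in candidate_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy embeddings (hypothetical): pair i of structures matches pair i of texts.
structures = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[0.9, 0.1], [0.1, 0.9]]
texts_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = contrastive_loss(structures, texts_aligned)
loss_swapped = contrastive_loss(structures, texts_swapped)
best_choice = retrieve(structures[0], texts_aligned)
```

On this toy data the aligned pairs give a lower loss than the swapped pairs, and the retrieval helper picks the text whose embedding points the same way as the query structure. In a real pipeline the vectors would come from the frozen pretrained encoders and projectors rather than being written by hand.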

Hierarchically branched diffusion models for efficient and interpretable multi-class conditional generation

Abstract
Diffusion models generate realistic objects from complex data distributions. Diffusion models are computationally inefficient. Branched diffusion models offer improvements in efficient generation from multiple classes. Branched diffusion models offer advantages such as ease of extension and interpretability.
Paper Content
Introduction
- Diffusion models are popular for generating complex data distributions
- Diffusion models have been successful in generating images, videos, graphs, and tabular data
- Generating samples from diffusion models is computationally costly
- Training diffusion models can be a limitation in continual learning
- Branched diffusion models offer major improvements in computational efficiency for multi-class sample generation
- Branched diffusion models can generate new data classes efficiently
- Branched diffusion models offer interpretability into the diffusion and generation processes
Related work
- Song et al....

MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

Abstract
Instruction tuning is a new learning paradigm that fine-tunes pre-trained language models on tasks specified through instructions. MultiInstruct is the first multimodal instruction tuning benchmark dataset that consists of 47 diverse multimodal tasks. OFA is the base pre-trained model for multimodal instruction tuning and multiple transfer learning strategies are explored to leverage the large-scale Natural Instructions dataset....

Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval

Abstract
Contrastive learning has been used for retrieval of semantically aligned sentences. Generative model proposed for learning multilingual text embeddings. Model operates on parallel data in $N$ languages. Approximation encourages source separation in multilingual setting. Comparison between contrastive and generation-based approaches for learning multilingual text embeddings. Evaluated on suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval....

Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks

Abstract
- Introducing a new attack vector in the context of machine unlearning and other settings
- Adversary adds carefully crafted points to the training dataset with minimal impact on model’s predictions
- Adversary triggers request to remove subset of points, unleashing attack and negatively affecting model’s predictions
- Attack is a clean-label targeted attack, with goal of causing model to misclassify a specific test point
- Attack is tested on CIFAR-10, Imagenette, and Imagewoof datasets
Paper Content
Introduction
- ML research traditionally assumes a static pipeline
- Practical deployments are more dynamic
- Dynamic settings increase vulnerability to malicious actors
- Camouflaged data poisoning attacks introduced
- Attack takes place in two phases
- Attack goal is to misclassify one point in training set
- Algorithm based on gradient-matching approach
- Camouflage set constructed to undo impact of poison set
- Attack efficacy demonstrated on various models and datasets
Preliminaries
- Governments around the world have introduced legislation requiring organizations to delete user data upon request
- Machine unlearning is the process of removing user data from machine learning models
- Retraining from scratch is the ideal way to perform data deletion, but it can take a long time
- Machine unlearning research has focused on fast methods for data deletion and other implications
- Data poisoning is when an adversary modifies the training data to negatively impact the model’s behaviour
Related work
- Marchant et al....

There's Plenty of Room Right Here: Biological Systems as Evolved, Overloaded, Multi-scale Machines

Abstract
Computational models can be applied to the biological world. View should be observer-dependent and pragmatic. Living systems can perform multiple functions in the same place at the same time (polycomputing). Understanding of meso-scale events can be improved with an observer-centered framework. Overloading of different functions on the same hardware is an important design principle....