arxiv-summary: AI-summarized AI papers

Image-and-Language Understanding from Pixels Only

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Multimodal models are becoming more effective due to unified components. CLIPPO uses a single encoder to process both regular images and text rendered as images. CLIPPO performs image-based tasks with half the number of parameters and no text-specific tower or embedding. CLIPPO can perform well on natural language understanding tasks without any word-level loss. CLIPPO can achieve strong performance on multilingual multimodal retrieval without a tokenizer.
Paper Content
Introduction: Large-scale multimodal training of Transformer-based models has improved performance in different domains....

Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Abstract: LLMs have shown impressive results with little or no direct supervision. LLMs may have potential in information-seeking scenarios. Attributed QA is a key first step in developing attributed LLMs. Evaluation framework uses human annotations and an automatic metric. Benchmark a broad set of architectures for the task.
Paper Content
Introduction: Large language models (LLMs) have shown impressive results across a variety of natural language tasks. LLMs require little or no direct supervision. LLMs have potential in information-seeking scenarios. LLMs can produce compelling output in scenarios such as question answering and dialog. Difficult to construct labeled datasets for complex tasks. LLMs need to attribute text they generate. Attributed question answering task proposed. Evaluation framework proposed using human annotations and an automatic metric. Analysis of systems based on state-of-the-art components. Possibility of post-hoc attribution of LLM-generated answers.
Related work: Related work in computer science. Key areas of related work.
Question answering tasks: Question answering is a key way to discover and demonstrate advances in large language models....

FlexiViT: One Model for All Patch Sizes

Abstract: Vision Transformers convert images to sequences by slicing them into patches. Changing the patch size typically requires retraining the model, but randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes.
Paper Content
Introduction: ViTs process images by cutting them into nonoverlapping patches. Patchification is different from CNNs, which use small local and overlapping filters. Patchification has enabled new capabilities. Patch size is an important factor in ViT models. FlexiViT is a flexible ViT model that can match or outperform standard fixed-patch ViTs. FlexiViT can be used for efficient transfer learning and pre-training. FlexiViT representations are often similar across different patch sizes.
Related work: Exploiting patchification to improve ViT’s efficiency. Removing tokens during or after training. Training a cascade of Transformers to allow early exiting during inference. Keeping all tokens and not discarding any information. Training one “supernet” from which individual, differently-shaped “subnets” can be extracted. Patchifying an image at multiple scales and dropping random tokens to reduce the sequence length. Focusing on ViT’s patch size only and allowing benefit from existing pretrained models. Training models whose output vector contains meaningful subvectors.
Making ViT flexible: Standard ViT models are not flexible....
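To make the patch-size trade-off in the FlexiViT summary concrete, here is a minimal sketch in plain Python. It only illustrates non-overlapping patchification and drawing a patch size at random per training step. The function names, the toy 8x8 image, and the candidate patch sizes are all hypothetical, and the sketch does not show how FlexiViT itself adapts its weights to each patch size.

```python
import random


def patchify(image, patch_size):
    """Split a square 2D image (a list of rows) into non-overlapping patches."""
    side = len(image)
    if side % patch_size != 0:
        raise ValueError("patch_size must divide the image side length")
    patches = []
    for top in range(0, side, patch_size):
        for left in range(0, side, patch_size):
            # Each patch is a patch_size x patch_size block of the image.
            rows = image[top:top + patch_size]
            patches.append([row[left:left + patch_size] for row in rows])
    return patches


def sequence_length(side, patch_size):
    """Number of patches (tokens) produced for a side x side image."""
    return (side // patch_size) ** 2


def sample_patch_size(side, candidates, rng=random):
    """Pick one patch size at random from the candidates that divide the side."""
    valid = [p for p in candidates if side % p == 0]
    return rng.choice(valid)


# Toy 8x8 "image" whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

for p in (1, 2, 4, 8):
    # Smaller patches mean longer sequences, hence more compute per image.
    print(p, len(patchify(image, p)), sequence_length(8, p))

# Hypothetical candidate list; one size would be drawn per training step.
print(sample_patch_size(8, [2, 3, 4]))
```

Halving the patch size quadruples the sequence length, which is why patch size acts as a compute/accuracy knob for a ViT.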

Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation

Abstract: Human evaluation is the foundation for evaluating summarization systems and automatic metrics. Existing protocols and benchmarks have low inter-annotator agreement or lack scale. Proposed modified summarization salience protocol with fine-grained semantic units. RoSE benchmark with over 22k summary-level annotations. Comparing ACU protocol with other protocols. Evaluating existing automatic metrics using collected human annotations....

ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning

Abstract: Large language models can generate step-by-step reasoning to justify their final answers. It is difficult to objectively study the correctness of the reasoning steps without reliable methods. ROSCOE is a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics. ROSCOE can measure semantic consistency, logicality, informativeness, fluency, and factuality....

Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers

Abstract: Recent work has reported that AI classifiers can accurately predict SARS-CoV-2 infection status. 67,842 individuals with linked metadata were studied, of whom 23,514 tested positive for SARS-CoV-2. Subjects were recruited via the UK government's National Health Service Test-and-Trace programme and the REACT randomised surveillance survey. In an unadjusted analysis, AI classifiers predict SARS-CoV-2 infection status with high accuracy. After adjusting for confounders, classifier performance is weaker. Audio-based classifiers are outperformed by simple predictive scores based on user-reported symptoms.
Paper Content
Results
Study design: Invited volunteers to participate in study from March 2021 to March 2022. Collected audio recordings of four respiratory audio modalities. Final dataset consisted of 23,514 COVID+ and 44,328 SARS-CoV-2 PCR-negative (COVID−) individuals. Acoustic target should be causally linked to COVID-19. Acoustic target should not be self-identifiable. Acoustic target should enable high-utility COVID-19 screening.
Characterising and controlling recruitment bias: Audio-based COVID-19 classification results can be affected by the characteristics of the enrolled population....

Multimodal Teacher Forcing for Reconstructing Nonlinear Dynamical Systems

Abstract: Nonlinear dynamical systems (DS) are commonly accessed through time series measurements. Data modalities can include event counts and continuous signals. Sparse teacher forcing (TF) has been suggested as a control-theoretic method for training ML models on chaotic DS. A novel recurrent neural network (RNN) training framework has been developed for DS reconstruction based on multimodal variational autoencoders (MVAE)....

Manifestations of Xenophobia in AI Systems

Abstract: Xenophobia is a key driver of discrimination and conflict. Many ML fairness frameworks do not measure or mitigate xenophobic harms. Aim to bridge the gap between AI and xenophobia. Identify distinct types of xenophobic harms. Review potential interplay between AI and xenophobia in various application domains. Recommendations for inclusive, xenophilic design of future AI systems.
Paper Content
Introduction: AI is being used more and more in our daily lives. There is a need to ensure that the risks and benefits of AI are distributed fairly. AI systems can be biased against certain characteristics, such as race and gender. AI systems need to take into account structural and historical power asymmetries. This paper focuses on xenophobia, which is discrimination against the foreign. Xenophobia is a growing problem, especially during the COVID-19 pandemic. AI can be used to mitigate or amplify xenophobia. Existing AI fairness strategies focus on legally protected groups. AI systems need to make explicit normative choices when distinguishing between us and them. AI can be used to detect hate speech and dangerous speech. This paper reviews the impact of xenophobia in social media, healthcare, immigration, employment, and large pre-trained models. The paper makes a moral argument for inclusive, xenophilic systems.
On xenophobia: Xenophobia is a form of hostility or prejudice directed towards foreigners, immigrants or those construed as “others”. It can manifest as fear, dislike or hate towards people who are perceived to be different. It has been associated with misassociations, stereotyping and cognitive bias. It may be attitudinal prejudice or systematically biased institutional and structural processes. Xenophobia is distinct from racism. It is orientated around the notion of “civic ostracism”. It penalises individuals on the basis of their foreignness. It may result in discriminatory material disadvantages. It may deny individuals proper ethical recognition. It may restrict the effective exercise of individuals’ rights. It may manifest differently than racism or sexism.
Practical considerations: Development of technological solutions can address discrimination against those perceived to be foreign. Design of inclusive and xenophilic AI solutions can help address discrimination.
Social media: Social media can amplify xenophobia. Low barriers to entry and difficulty in moderation can lead to fake news and hateful speech. Social media can also provide a medium for positive and inclusive views. Social media can shape culture, both positively and negatively. There is a strong link between social media and hate crime. Harms in the digital sphere can extend to the real world. Social media can be used to shape public opinion and exclude minority views. Deepfakes can amplify xenophobic narratives.
Promise....

Protein Structure Prediction until CASP15

Abstract: AlphaFold2 was presented at CASP14 in Dec 2020, revolutionizing protein structure prediction. The AlphaFold2 code was released in summer 2021 and can accurately predict the structure of most proteins and protein-protein interactions. The AlphaFold2 release sparked an explosion of development in the field, improving AI-based methods for protein complexes, disordered regions, and protein design.
Paper Content
Background: Protein structure prediction is divided into two categories: template-based and de-novo. In the 1980s, methods were developed to predict structure by copying coordinates from an experimentally determined protein structure. The increase in sequence and structure data has enabled more proteins to be modelled accurately. The distinction between template-based and de-novo modelling has blurred over the last two decades through the use of fragments. In the 1990s, co-evolutionary signals in a multiple sequence alignment were proposed to predict structure. In 1999, a solution was proposed to increase the accuracy of contact predictions by separating direct and indirect contacts. Ten years ago, the idea of indirect correlation was rediscovered. Combining DCA and machine learning was a way to improve contact prediction. AlphaFold (version 1) was introduced at CASP13, with a deeper architecture and predicted distance probabilities.
AlphaFold v2....

Transformers learn in-context by gradient descent

Abstract: Transformers are the state-of-the-art neural network architecture for machine learning. Training Transformers on auto-regressive tasks is related to gradient-based meta-learning formulations. Training self-attention-only Transformers on simple regression tasks shows similarity between models learned by GD and Transformers. Optimized Transformers implement gradient descent in their forward pass. Transformers surpass plain gradient descent by an iterative curvature correction....
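For reference, the plain gradient descent baseline that this summary compares Transformers against can be sketched on a small linear regression task. This is a minimal illustration in plain Python, assuming a squared-error loss averaged over the examples. The toy data, the learning rate, and the function names are hypothetical, and the sketch does not reproduce the paper's construction of how self-attention layers implement such a step.

```python
def predict(w, x):
    """Linear model: dot product of weights and inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, xs, ys):
    """Half mean squared error over the examples."""
    n = len(xs)
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


def gd_step(w, xs, ys, lr):
    """One gradient descent step on the half mean squared error."""
    n = len(xs)
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / n
    return [wj - lr * gj for wj, gj in zip(w, grad)]


# Toy regression task: targets generated by a hidden linear map.
true_w = [2.0, -1.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [predict(true_w, x) for x in xs]

w = [0.0, 0.0]
print("initial loss:", loss(w, xs, ys))

# A single step already reduces the loss on the examples.
w = gd_step(w, xs, ys, lr=0.5)
print("loss after one step:", loss(w, xs, ys))

# Repeating the step drives the weights toward the hidden map.
for _ in range(200):
    w = gd_step(w, xs, ys, lr=0.5)
print("weights after many steps:", w)
```

Each call to `gd_step` uses only the example pairs and the current weights, which is the kind of update the summary says a trained Transformer can carry out within its forward pass.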