arxiv-summary: AI-summarized AI papers

Poor Man's Quality Estimation: Predicting Reference-Based MT Metrics Without the Reference

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Machine translation quality estimation (QE) predicts human judgements of a translation hypothesis without seeing the reference. State-of-the-art QE systems based on pretrained language models have been achieving remarkable correlations with human judgements. Limitations of these systems include being computationally heavy and requiring human annotations. Metric estimation (ME) predicts automated metric scores without the reference....

The Pipeline for the Continuous Development of Artificial Intelligence Models -- Current State of Research and Practice

Abstract
Companies struggle to develop and deploy AI models to production systems. Continuous pipelines for AI are an active research area. This paper includes a Multivocal Literature Review and semi-structured interviews. The paper provides and compares terminologies for DevOps, CI/CD, MLOps, lifecycle management, and CD4ML. The paper provides a list of potential triggers for reiterating the pipeline....

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction

Abstract
Deep neural networks (DNN) are increasingly trained over massive GPU accelerators. Contemporary parallelization plan generators rely on empirical rules that couple transformation and scheduling. SuperScaler is a system that facilitates the design and generation of highly flexible parallelization plans. SuperScaler can generate empirical parallelization plans and construct new plans that achieve up to 3....

Regeneration Learning: A Learning Paradigm for Data Generation

Abstract
Machine learning methods are used to build a mapping from source data to target data. Target data is high-dimensional and complex, making it difficult to learn the mapping. Regeneration learning is a learning paradigm that generates an abstraction of the target data, then uses it to generate the target data. Regeneration learning is a counterpart of traditional representation learning....

Explainable Multilayer Graph Neural Network for Cancer Gene Prediction

Abstract
Identification of cancer genes is a challenging problem in cancer genomics research. Computational methods, including deep neural networks, have been developed to address this issue. These methods fail to exploit gene-gene interactions and provide little explanation for their predictions. The proposed EMGNN approach leverages multiple gene-gene interaction networks and multi-omics data. EMGNN outperforms existing approaches and provides valuable biological insights into its predictions.

Paper Content

Introduction
Understanding gene function and disease pathogenicity depends on gene properties and interactions. High-throughput experiments enable profiling of genetic and molecular properties. Computational methods predict gene functions by combining gene properties and network connectivity. Predicting gene pathogenicity in disease-specific contexts is challenging. Cancer sequencing projects generate data for identifying novel cancer genes. EMOGI models multi-omics features of cancer genes in PPI networks to predict novel cancer genes. EMGNN is proposed to address the challenge of functional properties that are irrelevant to cancer disease physiology. EMGNN maximizes concordance of functional gene relationships with unknown disease physiology. EMGNN achieves state-of-the-art performance by combining information from 6 PPI networks. EMGNN identifies the most important multi-omics features and the most influential PPI networks.

Datasets
The proposed model was trained with 6 PPI networks. Mutation, copy number, DNA methylation and gene expression data from 29,446 samples from TCGA were used. The data cover 16 different cancer types.

Multilayer graph neural network
Graph neural networks (GNNs) are used to leverage both network structure and node features....

Is ChatGPT A Good Translator? A Preliminary Study

Abstract
ChatGPT is evaluated for machine translation. Candidate prompts generally work well. ChatGPT performs competitively with commercial translation products on high-resource European languages. ChatGPT lags behind significantly on low-resource or distant languages. ChatGPT does not perform as well as commercial systems on biomedical abstracts or Reddit comments.

Paper Content

Introduction
ChatGPT is an intelligent chatting machine. It is trained to follow instructions and provide detailed responses. It can answer followup questions, admit mistakes, challenge incorrect premises, and reject inappropriate requests. It can do various natural language processing tasks, including question answering, storytelling, logic reasoning, code debugging, and machine translation.

Evaluation setting
Three commercial translation products were used for comparison. Evaluation used Flores-101, the WMT19 Biomedical Translation Task, and the WMT20 Robustness Task. 50 sentences were sampled from each set for evaluation. BLEU score, ChrF++, and TER were used as metrics.

Translation prompts
ChatGPT was asked to provide ten concise prompts or templates for machine translation. Three candidate prompts were summarized from the results, with an extra added to one of them. The three candidate prompts were compared on a Chinese-to-English translation task, with TP3 performing the best in terms of all three metrics.

Multilingual translation
Four languages are evaluated: German, English, Romanian, and Chinese. 12 directions of translation are tested. German-English translation is considered a high-resource task. Romanian-English translation is considered a low-resource task. ChatGPT performs competitively for German-English translation. ChatGPT lags behind for Romanian-English translation. Translating between different language families is harder than within the same language family.

Translation robustness
ChatGPT was evaluated on WMT19 Bio and WMT20 Rob2 and Rob3 test sets....
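
The count of 12 translation directions in the multilingual translation summary follows from the four languages listed: every ordered (source, target) pair of distinct languages is one direction, so 4 × 3 = 12. A minimal sketch of that arithmetic (the variable names are illustrative and not taken from the paper):

```python
from itertools import permutations

# The four languages named in the summary.
languages = ["German", "English", "Romanian", "Chinese"]

# Each ordered pair of distinct languages is one translation direction.
directions = list(permutations(languages, 2))

for src, tgt in directions:
    print(f"{src} -> {tgt}")

print(len(directions))
```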

Multiview Compressive Coding for 3D Reconstruction

Abstract
Visual recognition aims to understand objects and scenes from a single image. 3D recognition is more challenging due to occlusions not depicted in the image. This work explores single-view 3D reconstruction by learning generalizable representations. A framework is introduced that operates on 3D points of single objects or whole scenes. The model, Multiview Compressive Coding (MCC), learns to compress the input appearance and geometry....

Everything is Connected: Graph Neural Networks

Abstract
Graphs are a main form of data from nature. Graphs are used to represent patterns in natural and artificial systems. Graphs are used in traffic forecasting, drug discovery, social network analysis and recommender systems. Graphs are related to image, text and speech processing.

Paper Content

The fundamentals: permutation equivariance and invariance
Studying data that lives on graphs is a good idea. Graph-structured inputs have a set of edges and a set of nodes. Each node has a feature vector. The node feature matrix is prepared by stacking the features. An adjacency matrix is used to represent edges. Permuting nodes and edges should not change outputs. GNNs can be classified into three spatial flavours. Node classification, graph classification and link prediction are three principal tasks. GNNs may be limited in terms of the problems they can solve. GNNs in Equation 10 are Turing universal.

GNNs without a graph: deep sets and transformers
The assumption that the input graph is given is often not true. The optimal computation graph may not be given. GNNs can modulate the input graph structure. The Deep Sets model assumes no edges. The lazy option is to assume a fully connected graph. Equation 9 reduces to the Transformer. From the NLP perspective, words interact, and the optimal graph is task-dependent. The Transformer is rederived. Transformer computations align with hardware. Storage complexity is better than for message passing. A third option is to infer the graph structure for the GNN. Latent graph inference is challenging. Various paradigms have been proposed to overcome this challenge.

GNNs beyond permutation equivariance: geometric graphs
Graphs can be endowed with spatial geometry. Features and coordinates of nodes can be transformed by 3D rotations, translations and reflections. Model proposed by Satorras et al....
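
The permutation requirement from the fundamentals section can be checked numerically on a toy graph. The sketch below is an illustration under stated assumptions, not the paper's Equation 9 or 10: it assumes a simple sum-aggregation layer, h_i = x_i + sum_j A[i][j] * x_j, and a sum readout, and all function names are hypothetical. Relabelling the nodes should permute the node-level outputs in the same way (equivariance) and leave the graph-level readout unchanged (invariance).

```python
def gnn_layer(X, A):
    """Assumed toy layer: h_i = x_i + sum_j A[i][j] * x_j (sum aggregation)."""
    n, d = len(X), len(X[0])
    return [
        [X[i][k] + sum(A[i][j] * X[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


def permute(X, A, perm):
    """Relabel nodes so that new node i is old node perm[i]."""
    Xp = [X[p] for p in perm]
    Ap = [[A[p][q] for q in perm] for p in perm]
    return Xp, Ap


def readout(H):
    """Graph-level output: sum of all node vectors."""
    d = len(H[0])
    return [sum(row[k] for row in H) for k in range(d)]


# Node feature matrix (one row per node) and adjacency matrix of a 3-node path.
# Integer features keep the equality checks below exact.
X = [[1, 0], [0, 2], [3, 1]]
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
perm = [2, 0, 1]

H = gnn_layer(X, A)
Xp, Ap = permute(X, A, perm)
Hp = gnn_layer(Xp, Ap)

# Node-level outputs are permuted the same way as the inputs.
equivariant = Hp == [H[p] for p in perm]
# The graph-level readout ignores node ordering.
invariant = readout(Hp) == readout(H)
print(equivariant, invariant)
```

Sum aggregation is used here because it does not depend on the order in which neighbours are visited; an order-dependent aggregator would generally fail both checks.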

AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation

Abstract
Generative transformer models have become complex and can process multiple input modalities. Current methods for explaining their predictions require a lot of extra memory and are difficult to use in production. AtMan provides explanations of generative transformer models with almost no extra cost. AtMan manipulates the attention mechanisms of transformers to produce relevance maps....

Batch Prompting: Efficient Inference with Large Language Model APIs

Abstract
LLMs can be computationally and financially costly to use. Batch prompting is an alternative prompting approach that reduces costs. Batch prompting can improve downstream performance.

Paper Content

Introduction
Large language models (LLMs) have shown strong capabilities in zero/few-shot settings. Recent work has made progress in in-context learning. LLMs can be costly in terms of token and time usage. Batch prompting is an alternative approach that allows the model to perform inference on multiple samples at once. Batch prompting reduces token and time costs while still retaining downstream performance. Batch prompting works well across different LLMs and reasoning methods.

Approach
The paper introduces batch prompting as an efficient alternative to standard prompting. It compares the token and time costs of batch and standard prompting.

Problem setup
The conventional paradigm for prompting LLMs for in-context learning involves selecting K in-context few-shot exemplars with both a context and an output....
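
The idea of running inference on multiple samples at once can be sketched as prompt construction plus answer parsing. This is a minimal sketch under assumptions: the `Q[i]:` / `A[i]:` tagging format, the function names, and the hard-coded response are all hypothetical illustrations, and no real LLM API is called.

```python
import re


def build_batch_prompt(exemplars, questions):
    """Build one prompt for several questions.

    exemplars: list of (context, output) pairs used as few-shot examples.
    questions: list of new contexts to answer in a single call.
    The Q[i]/A[i] tags are an assumed format, not taken from the summary.
    """
    lines = []
    for i, (context, _) in enumerate(exemplars, start=1):
        lines.append(f"Q[{i}]: {context}")
    for i, (_, output) in enumerate(exemplars, start=1):
        lines.append(f"A[{i}]: {output}")
    lines.append("")  # blank line between exemplars and new questions
    for i, question in enumerate(questions, start=1):
        lines.append(f"Q[{i}]: {question}")
    return "\n".join(lines)


def parse_batch_response(text, n):
    """Extract answers tagged A[1]..A[n]; a missing answer becomes None."""
    found = {
        int(index): answer.strip()
        for index, answer in re.findall(r"A\[(\d+)\]:\s*(.*)", text)
    }
    return [found.get(i) for i in range(1, n + 1)]


exemplars = [("2 + 2 = ?", "4"), ("3 * 3 = ?", "9")]
questions = ["5 - 1 = ?", "10 / 2 = ?"]

prompt = build_batch_prompt(exemplars, questions)
print(prompt)

# Stand-in for a model response; a real call would go here.
fake_response = "A[1]: 4\nA[2]: 5"
answers = parse_batch_response(fake_response, len(questions))
print(answers)
```

With this layout the few-shot exemplars are sent once per batch rather than once per question, which is where a token saving would come from; the parser returns `None` for any index the response omits so that misaligned outputs are visible to the caller.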