arxiv-summary: AI-summarized AI papers

A Watermark for Large Language Models

Abstract: Watermarking model output can help mitigate potential harms of large language models. The watermark can be embedded with minimal impact on text quality. The watermark can be detected using an open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of whitelist tokens and promoting use of whitelist tokens during sampling. Statistical test for detecting the watermark with interpretable p-values. Information-theoretic framework for analyzing the sensitivity of the watermark.
Paper Content
Introduction: Large language models (LLMs) can write documents, create executable code, and answer questions....
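The whitelist mechanism and detection test summarized above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the summary: the vocabulary size, the whitelist fraction `GAMMA`, the logit bonus `DELTA`, seeding the whitelist from the previous token, and the one-sided z-test used for detection. The "model" is a toy that assigns uniform logits.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000   # hypothetical vocabulary size
GAMMA = 0.5         # assumed fraction of the vocabulary on the whitelist
DELTA = 4.0         # assumed bonus added to whitelist logits


def whitelist(prev_token):
    """Derive a pseudo-random whitelist from the previous token (assumed scheme)."""
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def sample_watermarked(logits, prev_token, rng):
    """Boost whitelist logits by DELTA, then sample from the softmax."""
    allowed = whitelist(prev_token)
    boosted = [x + DELTA if i in allowed else x for i, x in enumerate(logits)]
    top = max(boosted)
    weights = [math.exp(x - top) for x in boosted]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(length, rng, watermark):
    """Toy generator: uniform logits, with or without the watermark."""
    tokens = [rng.randrange(VOCAB_SIZE)]
    for _ in range(length):
        if watermark:
            logits = [0.0] * VOCAB_SIZE
            tokens.append(sample_watermarked(logits, tokens[-1], rng))
        else:
            tokens.append(rng.randrange(VOCAB_SIZE))
    return tokens


def detect(tokens):
    """One-sided z-test. Null hypothesis: each token lands on the
    whitelist independently with probability GAMMA."""
    scored = len(tokens) - 1
    hits = sum(tokens[i] in whitelist(tokens[i - 1]) for i in range(1, len(tokens)))
    z = (hits - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

Note that `detect` needs only the token sequence and the whitelist rule, not the model, which matches the claim that detection works without access to the language model API or parameters.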

From Inclusive Language to Gender-Neutral Machine Translation

Abstract: Gender inclusivity in language is a topic of debate and research. Gender inclusivity in translation is largely unexplored. Gender-Neutral Translation (GNT) is a form of gender inclusivity in translation. MT models have been found to perpetuate gender bias and discrimination. Relevant institutional guidelines for Gender-Inclusive Language (GIL) are reviewed. GNT scenarios of use are discussed and a list of desiderata is devised. Main technical challenges to the implementation of GNT in MT are identified. Focus is on translation from English into Italian due to different rules for gender marking.
Paper Content
Introduction: Gender bias and discrimination perpetuated through language is a topic of discussion. Language can reflect the perceived value, power and status associated with genders in society. Psycholinguists have investigated the influence of gendered forms on cognition. Demand for Gender-Inclusive Language has grown. Responses have been disparate. Debates have assumed a binary approach. Two main linguistic strategies have been employed: innovative linguistic elements and neutral language. English is a leader of change towards gender-inclusive language. The situation is more complicated for other languages due to less timely discussions and grammatical structures. Proposing a list of desiderata for Gender-Neutral Translation. Challenges of implementing the desiderata in the context of Machine Translation.
Background: Gender expression is socially relevant. Language reflects social change. Language interacts with perception and representation of individuals. Appropriate use of gender expressions is critical for human and automatically generated language.
Gender and language: Gender is a complex concept that encompasses both social and individual aspects. Language expresses gender through personal pronouns, possessive adjectives, lexically gendered forms, and compounds. Gender representation in language can be discriminatory and reinforce social asymmetries. Androcentric normativity promotes the masculine gender as the human prototype. Stereotypes are reiterated and reinforced through associations of professional nouns and gender. Gender-Inclusive Language is a form of verbal hygiene to regulate language. Two strands of gender-related linguistic policies: non-sexist and non-heteronormative. Innovative approaches from grassroots efforts focus on direct forms of inclusive language. Gender-inclusive innovations are inconsistent across different languages. Gender-neutralization strategies are an actionable and acceptable form of GIL.
Gender (bias) and machine translation: Language technologies can amplify biased behaviors. Gender bias is a tendency to discriminate against certain individuals or groups. Gender bias is more evident in cross-lingual scenarios. Gender bias has both technical and societal implications. Recent works have focused on non-binary identities in NLP. Neutral translation is a path towards avoiding gendered inferences.
Review of guidelines for gender-inclusive language: Gender inclusivity is conceptualized differently in English and Italian guidelines. English guidelines adopt a non-heteronormative outlook. Italian guidelines address women and men only. Strategies to address discrimination at the linguistic level vary between English and Italian. Strategies to implement GIL are systematized in a multilingual perspective. Focus is largely on masculine generics. Discouraged androcentric forms. Avoid stereotypical associations. Focus on neutralization of pronouns in English and nouns in Italian. Neutralization strategies range from omissions to replacements of single words. Neutralization of short segments is preferable. Trade-off between neutrality and acceptability of text. Reformulation process resembles a rewriting process.
Desiderata for a gender-inclusive translation: Monolingual guidelines exist for Gender-Inclusive Language (GIL). Search queries for GIL only provide a few tips and tricks blog posts. Gender-Neutral Translation (GNT) is a form of GIL. GNT does not mark gender of human referents if not assigned in source text. Trade-off between neutrality and linguistic acceptability. Desiderata to guide GNT: avoid expressing gender if not in source, use proper expressions when gender is in source, avoid propagating masculine generics. Respect speaker’s choice of gender expression when translating 1st person singular referent.
Challenges and insights for a gender-neutral machine translation: Neutralization strategies systematized and converted into GNT desiderata. Technical challenges include dedicated data, metrics and architectures. Creation of dedicated benchmarks to determine advancements towards GNT. Benchmarks should comprise source sentences requiring GNT, aligned with a GNT counterpart in the target language. Evaluation protocol needs to be designed. Creation of multiple reference translations or a GNT-oriented quality estimation metric. Training models without GNT examples. Neutrally-constrained MT could rely on bilingual dictionaries. Training methodology to reward highly-probable and low-cost outputs. Disambiguating gender through wider context. Disambiguating gender through external knowledge.
Conclusions: Gender bias and discrimination in language is a rising concern in automatic translation. MT models have been found to amplify male visibility and stereotypes. This work focuses on the use of neutral forms devoid of gender marking for an English-Italian setting. An extensive review of gender neutralization strategies was conducted. A definition of gender-neutral translation suitable for cross-lingual contexts was outlined. Technical challenges for the implementation of a gender-neutral translation in MT were discussed.

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development

Abstract: Question Answering (QA) has made progress in recent years due to large pre-trained language models, benchmark datasets and algorithms. PRIMEQA is an open-source QA repository to facilitate replication of state-of-the-art QA methods. PRIMEQA supports core QA functionalities and auxiliary capabilities such as question generation. PRIMEQA is designed for building front-end applications, replicating SOTA methods and expanding pre-existing methods.
Paper Content
Introduction
Entry points: Provides different entry points for the QA community: top-level scripts, Jupyter notebooks, inference APIs, service layer, UI.
Pipelines for OpenQA: PRIMEQA core components and entry points make it easy to build an OpenQA pipeline....

Noisy Parallel Data Alignment

Abstract: Natural language processing disproportionately favors resource-rich languages. Most modern language technologies are either nonexistent or unreliable for under-resourced languages. OCR is used to convert endangered language documents into machine-readable data, but the output is often noisy. Word alignment models are not built to work under noisy conditions. This work studies existing word-level alignment models under noisy settings and aims to make them more robust....

InfiniCity: Infinite-Scale City Synthesis

Abstract: Proposed framework, InfiniCity, constructs and renders an unconstrainedly large and 3D-grounded environment from random noise. Decomposed into three modules: 2D map synthesis, 3D octree completion, and voxel-based neural rendering. Synthesizes arbitrary-scale and traversable 3D city environments. Allows flexible and interactive editing from users.
Paper Content
Introduction: Rapid evolution in generative modeling research. Generators can synthesize high-quality images, 3D content and videos. Most works focus on bounded space. Recent attempts to achieve infinite visual synthesis with neural implicit models. Take city scenes as a case study. Synthesizing a 3D environment is broken down into stages of global structure planning and local perfection. Proposed InfiniCity pipeline for infinite-scale 3D city scene generation. Framework makes best use of both 2D and 3D data. Interactive sampling GUI for fast and flexible user interaction.
Related work: Attempts to generate infinite environments using finite images. Divide-and-conquer strategy used to generate small patches. Autoregressive and non-autoregressive inference processes used. Generate 3D-grounded traversable environment of infinite scale. Leverage explicit 3D supervision to learn geometry of 3D environment. Octree used as 3D representation. Learn 3D structure from 2D image collections. GAN-based framework used for texturization.
InfiniCity: Generate infinite-scale 3D city scenes using 2D and 3D data. InfiniCity synthesis pipeline consists of three components. First component generates an arbitrarily large satellite map from random noise. Second component converts the map into a watertight voxel environment. Third component texturizes the voxel world.
Data preprocessing: Dataset consists of images with GPS-registered camera poses and a CAD model. Data is processed for 3 modules: octree-based voxel completion, bird’s-eye view scan, street-view render. Octree-based voxel completion: CAD model converted to set of octrees, surface octrees extracted into 2D images. Street-view render: GPS-registered camera location and annotated camera orientation used to render segmentation images.
Infinite 2D map synthesis: Generating 3D environments directly is currently not possible. We propose to start by synthesizing the corresponding 2D map. Leverage the infinite-pixel image synthesis ability of InfinityGAN. Generate categorical labels instead of real RGB satellite images. Model height map and surface normal vector to regularize structural plausibility. Apply contrastive patch discriminator to increase importance of fine-grained details. Synthesize tuples of images of arbitrary scale.
Voxel world completion: Model ensures final voxel structure is watertight and maintains original voxel surfaces. Adopt PVD as a critical baseline. Measure distribution distance similar to FID using an autoencoder. Outperforms PVD in evaluation setting. Pillar method creates undesired appearances for certain object classes. Synthesizing structure from satellite view simplifies and benefits structure synthesis. Bilateral filtering improves plausibility of structure and suppresses noise.
Texturization via neural rendering: Our method is the first attempt to generate infinite-scale 3D environments using 2D and 3D data....

Prediction-Powered Inference

Abstract: A framework for performing valid statistical inference when an experimental data set is supplemented with predictions from a machine-learning system. No assumptions are made on the machine-learning algorithm. Higher accuracy of predictions leads to smaller confidence intervals. Algorithms for computing valid confidence intervals for statistical objects. Demonstrated with data sets from proteomics, genomics, electronic voting, remote sensing, census analysis, and ecology.
Paper Content
Introduction: Machine-learning algorithms are used to make predictions. They can generate predictions for entities not studied experimentally. Examples of predictions include molecular activity, tumor prognoses, and microclimatic modeling. Analysis was done to form a confidence interval for the fraction of the Amazon rainforest lost between 2000 and 2015. Gold-standard labels were collected through field visits, an expensive process. Machine-learning predictions of forest cover were also available, based on satellite imagery. Two natural alternatives for constructing confidence intervals were used. Imputation approach yields a small confidence interval that fails to cover the true deforestation fraction. Classical approach covers the truth at the expense of a wider interval. Second example used to construct a confidence interval for an odds ratio between phosphorylation and disorder. Imputation approach significantly overestimates the true odds ratio. Prediction-powered inference framework provides an affirmative answer to the question of whether predictions can improve inferential quality. Confidence intervals cover the truth and are smaller than those obtained using the classical approach.
General principle: Goal is to estimate a quantity θ*. Have access to a small gold-standard data set and a large unlabeled data set. Use predictions from a machine-learning algorithm to estimate θ*. Introduce a problem-specific measure of prediction error called the rectifier, ∆f. Use gold-standard data to construct a confidence set for the rectifier, R. Construct confidence set $\mathcal{C}^{\mathrm{PP}}$ by rectifying θf with each value in R. Confidence intervals and p-values for a broad class of statistical problems.
Further preliminaries: Labeled data set is denoted as $(X, Y) \in (\mathcal{X} \times \mathcal{Y})^n$. Unlabeled data set is denoted as $(\tilde{X}, \tilde{Y}) \in (\mathcal{X} \times \mathcal{Y})^N$. Data sets are assumed to be i....
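The general principle above (estimate from predictions, then correct with a rectifier measured on gold-standard data) can be sketched for the simplest case, estimating a mean. This is a hedged illustration rather than the paper's full algorithm: the function names, the normal-approximation interval, and the choice of mean estimation are assumptions, since the summary gives no formulas.

```python
import math
from statistics import NormalDist, mean, variance


def classical_mean_ci(y_labeled, alpha=0.1):
    """Classical interval: uses only the gold-standard labels."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(variance(y_labeled) / len(y_labeled))
    center = mean(y_labeled)
    return center - half, center + half


def prediction_powered_mean_ci(y_labeled, f_labeled, f_unlabeled, alpha=0.1):
    """Prediction-powered interval for a mean (normal approximation, assumed).

    y_labeled   : gold-standard outcomes on the small labeled set
    f_labeled   : model predictions on the same labeled inputs
    f_unlabeled : model predictions on the large unlabeled set
    """
    n, big_n = len(y_labeled), len(f_unlabeled)
    errors = [f - y for f, y in zip(f_labeled, y_labeled)]
    rectifier = mean(errors)              # estimated prediction bias
    imputed = mean(f_unlabeled)           # estimate from predictions alone
    center = imputed - rectifier          # rectified estimate
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(variance(f_unlabeled) / big_n + variance(errors) / n)
    return center - half, center + half
```

When the predictions are accurate, the variance of `errors` is small and the interval is driven mostly by the large unlabeled set, which is why better predictions lead to narrower intervals.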

Fully transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study

Abstract: Deep learning (DL) can extract predictive and prognostic biomarkers from routine pathology slides in colorectal cancer (CRC). A DL test for the diagnosis of microsatellite instability (MSI) in CRC was approved in 2022. Current approaches rely on convolutional neural networks (CNNs). Transformer networks are outperforming CNNs and are replacing them in many applications....

Zorro: the masked multimodal transformer

Abstract: Attention-based models are useful for multimodal processing. Zorro is a technique that uses masks to control how inputs from each modality are routed inside Transformers. Zorro achieves state-of-the-art results on most relevant benchmarks for multimodal tasks. The resulting models are able to perform unimodal inference on both video and audio benchmarks.
Paper Content
Introduction: Humans and other animals integrate multiple modalities to build their view of the world. Humans can process information and perform tasks with only one modality. Most multimodal models need all modalities to operate. Zorro is a multimodal Transformer architecture that can operate with single or multiple modalities. Zorro has separate unimodal and multimodal representation streams. Zorro can be pre-trained with audio-visual contrastive loss. Zorro achieves state-of-the-art performance on relevant benchmarks.
Related work: Multimodal perception is challenging. Convolutional neural networks have been used to fuse activations from different modalities. Self-supervised audio-visual learning has been used to exploit cross-modality similarity. Transformer architectures have been used to process different modalities. Zorro masking produces unimodal outputs without running the model multiple times. Zorro regulates cross-modality communication by masking latent connections. Zorro is used for supervised and self-supervised training.
Architecture: Zorro architecture consists of three main blocks. Masking binary tensor used to specify which vectors are connected. Mask applied to self-attention and decoding cross-attention. Four outputs: audio-specific, video-specific, fusion-specific, global. Zorro-Swin and Zorro-HiP variants proposed. Contrastive audio-visual methods learn representations by aligning audio and video into a common embedding space. Self-supervised loss contrasting unimodal representations with the multimodal one.
Experiments: Evaluated Zorro architecture on multiple settings. Training and evaluation procedures, as well as main datasets used. Evaluated against state-of-the-art models on 3 audiovisual and 1 vision benchmark. Ablated main design decisions and compared different architectures.
Experimental details: Zorro is pre-trained using self-supervision and standard supervision. Four datasets are used for pre-training: AudioSet, YouTube-8M, ACAV-100M, and ImageNet-21k. Zorro is evaluated on AudioSet, VGGSound, and Kinetics-400. Zorro is also evaluated on unimodal fine-tuning tasks: Kinetics-400 for vision and ESC-50 for audio. Inputs to the model are video and audio.
State-of-the-art comparison: Evaluated Zorro against state-of-the-art methods. Trained Zorro from scratch on AudioSet-2M using audio and visual modalities. Zorro matches or outperforms other methods trained on AudioSet-2M. Zorro performs similarly to supervised state-of-the-art when pre-trained only with self-supervision. Zorro performs comparably with SOTA models when initialized with ViT pre-trained on ImageNet-21k. Zorro performs similarly to MBT when pre-trained on YouTube-8M. Zorro can be trained using unimodal self-supervised methods. Zorro performs only 2....
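The binary masking idea from the architecture summary can be sketched as below. The routing rule is an assumption inferred from the description of separate unimodal and multimodal streams: audio tokens attend only to audio, video tokens only to video, and fusion tokens to everything. The function names and token layout are hypothetical.

```python
import math


def zorro_mask(n_audio, n_video, n_fusion):
    """Binary mask: entry [q][k] is 1 if query token q may attend to key token k."""
    groups = ["audio"] * n_audio + ["video"] * n_video + ["fusion"] * n_fusion
    return [[1 if q == "fusion" or q == k else 0 for k in groups] for q in groups]


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores; masked entries get weight 0."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # Every token can attend to its own stream, so `kept` is never empty.
        kept = [s for s, m in zip(row_scores, row_mask) if m]
        top = max(kept)
        exps = [math.exp(s - top) if m else 0.0 for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Because the audio and video rows never receive information from the other modality, their outputs stay unimodal even when both inputs are present, which is consistent with the claim that unimodal outputs are produced without running the model multiple times.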

StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

Abstract: Text-to-image synthesis has seen progress due to large pretrained language models, large-scale training data, and scalable model families. The best-performing models need iterative evaluation to generate a single sample. Generative adversarial networks (GANs) only need a single forward pass and are faster, but remain far behind the state-of-the-art. This paper aims to identify the necessary steps to regain competitiveness....

DiffSDS: A language diffusion model for protein backbone inpainting under geometric conditions and constraints

Abstract: Simplifying protein structures as sequences of protein angles. Introducing a hidden atomic direction space (ADS). Geometric constraints can be efficiently imposed on the direction space. Introducing a Direct2Seq decoder (Dec_d2s). Combining Enc_s2d and Dec_d2s to form an SDS model. Applying the SDS model as a denoising neural network. DiffSDS outperforms previous strong baselines on protein inpainting.
Paper Content
Introduction: Aim to improve and simplify modeling of constrained protein backbone inpainting. AlphaFold breakthrough has led to increasingly sophisticated protein structure models. Language transformers could unify protein structure modeling. Convert protein structures into sequences of angles. Language models unsuitable for constrained structure design tasks. Insert hidden atomic direction space into language model to impose structural constraints. Evaluated DiffSDS on CATH4....
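The idea of representing a backbone as a sequence of angles can be illustrated with a small sketch. It assumes atoms are given as 3D coordinates and encodes them as bond angles and torsion angles; this particular choice of angles and all function names are illustrative assumptions, not the encoding DiffSDS actually uses.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_angle(p0, p1, p2):
    """Angle at p1 between the bonds p1-p0 and p1-p2, in radians."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.acos(max(-1.0, min(1.0, cos_t)))


def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral angle around the p1-p2 bond, in radians."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


def backbone_to_angles(coords):
    """Turn a list of atom coordinates into (bond angles, torsion angles)."""
    bonds = [bond_angle(*coords[i:i + 3]) for i in range(len(coords) - 2)]
    torsions = [torsion_angle(*coords[i:i + 4]) for i in range(len(coords) - 3)]
    return bonds, torsions
```

A chain of n atoms yields n - 2 bond angles and n - 3 torsion angles, giving a one-dimensional sequence that a language-style model could consume.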