Fairness in Language Models Beyond English: Gaps and Challenges

Abstract: Language models do not treat diverse demographic groups fairly. Most research on fairness has focused on English. This paper looks at fairness in multilingual and non-English contexts. Current research is limited and cannot be scaled across languages and cultures.

Paper Content

Introduction: Language models are susceptible to spurious correlations and to encoding biases. Representational harms refer to groups being misrepresented or underrepresented. Allocational harms refer to inequitable distribution of resources and opportunities. Bias can occur at multiple steps of the pipeline. Most work on fairness in NLP is Anglo-centric. NLP systems can reinforce and reproduce social and racial hierarchies. Harms from unfair models are insufficiently documented. The interplay between privacy, efficiency, and fairness in NLP is understudied.

Metrics for measurement: Monolingual systems have biases and challenges. Bias in NLP models can be quantified with intrinsic and extrinsic measures. WEAT and SEAT are commonly used metrics. StereoSet and CrowS-Pairs measure stereotypical proclivity. Blodgett et al....
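The WEAT metric mentioned above compares how strongly two sets of target word embeddings associate with two sets of attribute embeddings. The following is a minimal sketch of the effect-size computation, assuming tiny hand-made 2-D vectors in place of real embeddings. The vectors and function names are illustrative and do not come from the paper, and the use of the sample standard deviation is one common implementation choice.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """s(w, A, B): mean similarity of w to A minus mean similarity to B."""
    sim_a = mean(cosine(w, a) for a in attrs_a)
    sim_b = mean(cosine(w, b) for b in attrs_b)
    return sim_a - sim_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of mean associations of X and Y, scaled by the pooled spread."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings: X leans toward attribute A, Y toward attribute B.
toy_x = [(1.0, 0.1), (0.9, 0.2)]
toy_y = [(0.1, 1.0), (0.2, 0.9)]
toy_a = [(1.0, 0.0)]
toy_b = [(0.0, 1.0)]
print(weat_effect_size(toy_x, toy_y, toy_a, toy_b))
```

A positive value indicates that the X targets sit closer to attribute set A than the Y targets do; swapping X and Y flips the sign.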

February 24, 2023 · 1011 words · Krithika Ramesh, Sunayana Sitaram, Monojit Choudhury
Model-Based Uncertainty in Value Functions

Abstract: Problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. Characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation (UBE). The new uncertainty Bellman equation converges to the true posterior variance over values and explicitly characterizes the gap in previous work. It is easily integrated into common exploration strategies and scales naturally beyond the tabular setting. Experiments show that sharper uncertainty estimates improve sample efficiency.

Paper Content

Introduction: The goal of reinforcement learning (RL) agents is to maximize expected return. Model-based RL (MBRL) learns a statistical model of the environment. Recent improvements in deep MBRL algorithms are due to models that quantify epistemic and aleatoric uncertainty. The paper proposes learning the solution to a Bellman recursion prescribed by theory. Experiments in tabular and continuous control problems demonstrate improved sample efficiency. Model-free approaches to Bayesian RL directly model the distribution over values. Model-based Bayesian RL maintains a posterior over plausible MDPs. Dyna-style actor-critic algorithms are paired with model-based uncertainty estimates for improved performance. Optimism in the face of uncertainty (OFU) relies on building upper-confidence estimates of true values. The paper focuses on estimating and using the variance of the expected return for policy optimization. Aleatoric uncertainty about returns originates from the aleatoric noise of MDP transitions and the stochastic policy. Epistemic uncertainty about the value function is due to incomplete knowledge of the MDP.

Problem statement: The agent acts in an infinite-horizon MDP. Finite state and action spaces. Unknown transition function. Known and bounded reward function. Discount factor between 0 and 1. The reward function can be learned alongside the transition function. One-step dynamics. Stochastic policy. At each time step, the agent selects an action, receives a reward, and transitions to the next state. Bayesian setting with known prior distribution. The value function is the expected sum of discounted rewards. Estimate the variance of the value function. Independent transitions and an acyclic MDP are assumed. Posterior mean transition and value functions are of interest. Local uncertainty is defined and the UBE is solved.

Uncertainty Bellman equation: Built a new UBE whose fixed-point solution is equal to the variance of the value function. Showed the gap between the UBE and the variance of the value function. The Bellman recursion propagates knowledge about the local rewards. The UBE propagates a notion of local uncertainty. The fixed-point solution to the UBE encodes the long-term epistemic uncertainty about the values of a given state. Theorem 1 states that the U-values converge exactly to the variance of values. Theorem 2 presents a clear relationship that connects Theorem 1 with the upper bound. The uncertainty reward has two components: total uncertainty about the mean values and average aleatoric uncertainty about the value of the next state. Corollary 1 states that the solution to the UBE results in an upper bound of the variance. The gap between the exact reward function and the approximation is fully characterized by a gap term. The influence of the gap term depends on the stochasticity of the dynamics and the policy. The method returns the exact epistemic uncertainty about the values.

Toy example: The MRP has 4 possible combinations of δ and β. Assumptions 1 and 2 are satisfied. Table 1 includes results for uncertainty rewards and UBEs. $W^\pi(s_2) = U^\pi(s_2)$. Gap terms for $s_2$ cancel out. $W^\pi$ overestimates the variance of the value by ∼36%.

Variance-driven optimistic exploration: Propose a technique to solve the RL problem using uncertainty quantification of Q-values. Define $\Gamma_t$ as the posterior distribution over MDPs. Update the policy by solving the upper-confidence bound optimization problem. Typical RL techniques violate the theoretical assumptions. Propose a practical upper bound on the solution of the UBE. The tabular implementation uses a Dirichlet prior on the transition function and a Normal prior for rewards. The deep RL implementation uses the MBPO architecture and approximates the sum of cumulative uncertainty rewards. Train an ensemble of N value functions and a U-net. UBE-based methods have the added complexity of training the U-net. Small N reduces the computational burden.

Experiments: Evaluated performance of the policy optimization scheme. Examined different variance estimates from Section 4.

Tabular environments: Evaluated the tabular implementation in grid-world environments. Used PSRL as baseline. Tested the agent’s ability to explore over multiple time steps in the presence of a deterrent. Considered a deterministic version of the problem. The optimal policy is to always go right. Ran each method for 1000 episodes and five random seeds. Found using $u_{\min}$ = -0....
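To make "the variance over values induced by a distribution over MDPs" concrete, the sketch below estimates it by brute force: sample transition matrices from a Dirichlet posterior, evaluate each sample, and take the variance across samples. This is a Monte Carlo illustration, not the paper's UBE. It assumes a fixed policy (so the model reduces to a Markov reward process), known rewards, and Dirichlet pseudo-counts as the posterior parameters; all names and numbers are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def evaluate(transitions, rewards, gamma, sweeps=200):
    """Iterative policy evaluation: V = r + gamma * P V."""
    values = [0.0] * len(rewards)
    for _ in range(sweeps):
        values = [
            rewards[s] + gamma * sum(p * v for p, v in zip(transitions[s], values))
            for s in range(len(rewards))
        ]
    return values


def posterior_value_stats(counts, rewards, gamma, n_samples=200, seed=0):
    """Mean and variance of per-state values over sampled transition models."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        transitions = [sample_dirichlet(row, rng) for row in counts]
        samples.append(evaluate(transitions, rewards, gamma))
    n_states = len(rewards)
    means = [sum(v[s] for v in samples) / n_samples for s in range(n_states)]
    variances = [
        sum((v[s] - means[s]) ** 2 for v in samples) / n_samples
        for s in range(n_states)
    ]
    return means, variances


# Hypothetical 3-state example: few pseudo-counts means high epistemic uncertainty.
means, variances = posterior_value_stats(
    counts=[[1, 1, 1], [1, 1, 1], [1, 1, 1]], rewards=[0.0, 0.0, 1.0], gamma=0.9
)
print(means, variances)
```

As more transitions are observed (larger pseudo-counts), the sampled models agree more closely and the estimated variance shrinks, which is the epistemic uncertainty the summary refers to.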

February 24, 2023 · 1301 words · Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters
MUX-PLMs: Pre-training Language Models with Data Multiplexing

Abstract: Data multiplexing is a method to improve a model’s inference efficiency. Prior work on data multiplexing only used task-specific Transformers without pre-training. This paper develops pre-trained multiplexed language models (MUX-PLMs). MUX-PLMs can be widely finetuned on any downstream task. MUX-PLMs include a three-stage training procedure and novel multiplexing and demultiplexing modules. MUX-BERT and MUX-ELECTRA models achieve 2x/5x inference speedup with a 2-4% drop in performance on GLUE and 1-2% drop on token-level tasks....

February 24, 2023 · 788 words · Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang and 2 others
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics

Abstract: ProofNet is a benchmark for autoformalization and formal proving of undergraduate-level mathematics. ProofNet consists of 371 examples, each with a formal theorem statement, natural language theorem statement, and natural language proof. The problems are from popular undergraduate pure mathematics textbooks and cover topics such as real and complex analysis, linear algebra, abstract algebra, and topology....

February 24, 2023 · 622 words · Zhangir Azerbayev, Bartosz Piotrowski, Hailey Schoelkopf, Edward W. Ayers, Dragomir Radev and 1 other
Flexible Phase Dynamics for Bio-Plausible Contrastive Learning

Abstract: Learning algorithms used in neuroscience and neuromorphic chips use Contrastive Learning (CL). CL is traditionally implemented with rigid, temporally non-local, and periodic learning dynamics. Recent work explores how CL might be implemented by biological or neuromorphic systems. CL can be made temporally local and still function even if many dynamical requirements are relaxed. Theorems and numerical experiments provide theoretical foundations for CL methods for biological and neuromorphic neural networks.

Paper Content

Introduction: CL is a family of algorithms for learning representations. CL can be divided into equilibrium-based and non-equilibrium-based methods. CL has been proposed as a normative model for biological learning. Investigating alternative training dynamics for CL. CL learns representations by leveraging statistical differences between positive and negative samples. CL algorithms have a periodic, biphasic time course of learning. Investigating temporal non-locality, periodicity, learning rate modulation, and deterministic phase length. Proposing an importance sampling-inspired approach to estimating the gradient. It is not always optimal to spend an equal amount of time learning from positive and negative samples. Proving that equilibrium-based CL can still occur with no learning rate modulation and with noise in phase length. Relaxing requirements on the learning dynamics of CL.

Background: AI algorithms are being used as normative models of learning in biological neural networks. AI algorithms often exhibit properties that are computationally incompatible with neural hardware. Bio-plausible learning is relevant to both neuroscience and AI. Backpropagation is an example of an AI algorithm that is not bio-plausible. Neuromorphic computing is a potential solution to AI energy efficiency problems. CL algorithms do not require backpropagation for neural network-based learning. CL algorithms exhibit properties that have not been observed in the brain. CL algorithms require a specific pattern of learning rate scheduling and periodic stimulus/sample presentation. The Boltzmann machine is an example of a classic equilibrium-based CL method. Equilibrium-based CL must run internal dynamics on the space of the given model’s hidden unit activations. Non-equilibrium-based CL has been used for unsupervised generative modelling. Temporally local learning has been explored in equilibrium-based and non-equilibrium-based CL methods.

Theoretical results: CL can be generalized to temporally local learning dynamics. Learning rate and phase length can be studied in equilibrium-based CL systems. SGD is used to optimize a two-term objective function. Separate passes through the neural network are needed to estimate each term of the gradient. Classic CL algorithms have this form of gradient.

Gradient sampling to avoid non-locality: We can resolve the temporal non-locality of Equation 2 by considering the situation where the terms $g^+$ and $g^-$ are each used separately to do parameter updates....

February 24, 2023 · 1112 words · Ezekiel Williams, Colin Bredenberg, Guillaume Lajoie
In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages

Abstract: Multilingual generative language models are becoming more fluent in many languages. It is unknown what cultural biases are present in the predictions of these models. This work focuses on formality, a language property highly influenced by culture. Two popular multilingual language models were analyzed in 5 languages. The models were found to be biased towards the formal style when prompted neutrally....

February 23, 2023 · 1019 words · Asım Ersoy, Gerson Vizcarra, Tasmiah Tahsin Mayeesha, Benjamin Muller
DisCO: Portrait Distortion Correction with Perspective-Aware 3D GANs

Abstract: Close-up facial images often have perspective distortion. Proposed method for correcting perspective distortion in a single close-up face. The method uses GAN inversion and joint optimization of camera parameters and face latent code. The method uses focal length reparameterization, optimization scheduling, and geometric regularization. Results show improved visual quality compared to previous approaches.

Paper Content

Introduction: Millions of people take smartphone selfies every day. Smartphones have high-quality cameras. Selfies suffer from perspective distortion. Perspective distortion makes faces look unnatural and asymmetric. Existing methods aim to correct distortion using reconstruction-based and learning-based warping. 3D GAN inversion is proposed to correct distortion. 3D GAN inversion estimates facial geometry and camera-to-face distance. Optimization of the parameters is ill-posed. Three designs are proposed to address the problem. A quantitative evaluation protocol is established.

Related work: Selfie photos taken from close distances often exhibit perspective distortions. People are bothered by distorted facial features. Existing smartphones attempt to persuade people to take selfies from a longer distance. Existing perspective distortion methods have difficulty handling severe distortions. 3D face reconstruction from a single image is challenging. Existing methods are limited to reconstructing only the face. Prior works focus on normalizing head pose. Conditional generative models learn a face-specific GAN to generate a target face pose. 2D GAN inversion methods optimize the latent code for a single image. 3D GAN inversion approaches optimize the face latent code and part of the camera parameters. Jointly estimating face shape, camera-to-face distance, and focal length is challenging.

Method: Aim to manipulate the camera-to-subject distance of a single close-up face portrait. Propose 3D GAN inversion to invert the portrait to the corresponding face latent code and camera parameters. Adjust camera parameters according to user preference, especially camera-to-subject distance and focal length. Develop a workflow to warp and blend regions to compose a full-frame image/video.

Preliminary: StyleGAN maps random samples from a normal distribution to an intermediate latent vector. A 3D GAN uses additional camera parameters and a neural renderer to generate the final image. Training and inversion of 3D GANs require aligning and cropping the face.

Perspective-aware 3D GAN inversion: A 3D GAN with additional camera parameters can enable camera-controllable image generation. The inversion process is complicated when using a single face image. The problem is ill-posed, meaning multiple combinations of focal length, camera-to-subject distance, and face shape can match the input image. Existing 3D GAN inversions focus on far camera-to-subject distances. Accurate estimation of both camera-to-subject distance and focal length is necessary for near-range camera-to-subject distances. Focal length reparameterization, optimization scheduling, and landmark regularization are proposed to ease the ill-posedness and improve facial geometry and rendering results. Start from a close camera-to-subject distance to ease optimization. Optimization of face and camera parameters is asynchronous. An uncertainty-based landmark loss is used to increase sensitivity to camera-to-subject variation.

Stitching: The 3D GAN inversion method can manipulate camera distance and focal length to render virtual images. A system is developed to stitch the reprojected face with the original full image. The algorithm aligns and blends depth from the 3D GAN and depth estimated for the full image. The entire image is projected to a far distance using the same camera parameter as the 3D GAN. The generator is fine-tuned to make the border of the synthesis close to the warped full image. The refined synthetic far image and the warped full image are blended to produce the complete image.

Implementation details: Learning rates set to 1×10⁻², 5×10⁻³, and 3×10⁻⁴. EG3D pretrained on FFHQ is used in experiments. Camera parameters are initialized using Deng et al....

February 23, 2023 · 892 words · Zhixiang Wang, Yu-Lun Liu, Jia-Bin Huang, Shin'ichi Satoh, Sizhuo Ma and 2 others
Active Prompting with Chain-of-Thought for Large Language Models

Abstract: LLMs can be used for complex tasks such as arithmetic and commonsense reasoning. Task-specific prompts are important for LLMs to produce high-quality answers. Active-Prompt is a new method to adapt LLMs to different tasks with task-specific example prompts. Uncertainty-based active learning is used to select the most uncertain questions for annotation. Experimental results show the superiority of the proposed method....
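The selection step described above can be illustrated with a small sketch. It assumes that uncertainty is scored as disagreement, meaning the fraction of distinct answers among several sampled answers per question; that choice of measure, the function names, and the toy answers are assumptions for illustration, since the summary does not specify them. No model is called: the sampled answers are supplied as plain data.

```python
def disagreement(answers):
    """Fraction of distinct answers among the sampled answers for one question."""
    return len(set(answers)) / len(answers)


def select_most_uncertain(sampled_answers, n):
    """Return the n question ids with the highest disagreement score."""
    ranked = sorted(
        sampled_answers,
        key=lambda question: disagreement(sampled_answers[question]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical sampled answers for three questions (four samples each).
toy_samples = {
    "q1": ["4", "4", "4", "4"],
    "q2": ["7", "9", "8", "7"],
    "q3": ["2", "3", "2", "2"],
}
print(select_most_uncertain(toy_samples, 2))
```

The selected questions would then be the ones handed to a human for annotation and reused as task-specific example prompts.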

February 23, 2023 · 966 words · Shizhe Diao, Pengcheng Wang, Yong Lin, Tong Zhang
Improving Adaptive Conformal Prediction Using Self-Supervised Learning

Abstract: Conformal prediction is a tool for uncertainty quantification. It produces valid prediction intervals with finite-sample guarantees. Self-supervised learning can be used to improve the quality of conformal regressors. Self-supervised pretext tasks can improve the adaptability of conformal intervals. We use self-supervised error as an additional feature to estimate nonconformity scores. We demonstrate the benefit of the additional information using synthetic and real data.

Paper Content

Introduction: Machine Learning research is focused on minimizing a model’s predictive error on unseen data....
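For readers unfamiliar with conformal prediction, here is a minimal sketch of plain split conformal regression, which is the non-adaptive baseline and not the self-supervised method the paper proposes. It assumes absolute residuals on a held-out calibration set as nonconformity scores; the function names and numbers are illustrative.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for the requested coverage level.
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_interval(prediction, calibration_residuals, alpha=0.1):
    """Symmetric interval around a point prediction with target coverage 1 - alpha."""
    scores = [abs(r) for r in calibration_residuals]
    q = conformal_quantile(scores, alpha)
    return prediction - q, prediction + q


# Hypothetical calibration residuals (true value minus model prediction).
toy_residuals = [-3, 1, -2, 5, 4, -6, 7, 8, -9]
print(prediction_interval(10.0, toy_residuals, alpha=0.5))
```

Every test point gets the same interval width here. Adaptive variants, such as the one summarized above, instead scale the scores with an estimate of local difficulty so that intervals widen where the model is less reliable.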

February 23, 2023 · 890 words · Nabeel Seedat, Alan Jeffares, Fergus Imrie, Mihaela van der Schaar
DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models

Abstract: Neural Radiance Fields (NeRFs) have shown good results on novel view synthesis tasks. NeRFs learn a scene’s color and density fields by minimizing the photometric discrepancy between training views and differentiable renders of the scene. NeRFs can generate novel views from arbitrary camera positions, but can lead to artifacts when trained with few input views....

February 23, 2023 · 922 words · Jamie Wynn, Daniyar Turmukhambetov