arxiv-summary: AI-summarized AI papers

Understanding Political Polarisation using Language Models: A dataset and method

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract
- Aim to analyze political polarization in the US political system
- Use language models to help candidates make informed decisions
- Help voters understand candidates' views on economy, healthcare, education, etc.
- Dataset extracted from Wikipedia spanning 120 years
- Language model based method to analyze polarization
- Data divided into 2 parts: background and political information
- Hypothesis: political views should be based on reason and independent of factors like birthplace, alma mater, etc....

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

Abstract
- Visual recognition has improved rapidly in the early 2020s
- ConvNets (ConvNeXt) have demonstrated strong performance
- Self-supervised learning techniques (MAE) can potentially benefit ConvNets
- A new model family (ConvNeXt V2) has been proposed to improve performance on various recognition benchmarks
- Pre-trained ConvNeXt V2 models of various sizes are available
Paper Content
Introduction
- Visual recognition has ushered in a new era of large-scale visual representation learning
- Three main factors influence performance of a visual representation learning system: neural network architecture, training method, and data used for training
- Convolutional neural network architectures (ConvNets) have had a significant impact on computer vision research
- Transformer architecture has gained popularity due to its strong scaling behavior
- ConvNeXt architecture has modernized traditional ConvNets
- Focus of visual representation learning has shifted from supervised learning with labels to self-supervised pre-training with pretext objectives
- Masked autoencoders (MAE) have recently brought success in masked language modeling to the vision domain
- Combining design elements of architectures and self-supervised learning frameworks can present challenges
- Proposed to co-design the network architecture and the masked autoencoder under the same framework
- Introduced ConvNeXt V2, which demonstrates improved performance when used in conjunction with masked autoencoders
- ConvNeXt V2 models can be used in a variety of compute regimes and include models of varying complexity
Related work
- ConvNets were first introduced in the 1980s and have been improved over the years
- Supervised training on the ImageNet dataset has been used to discover innovations
- Self-supervised pretext tasks such as rotation prediction and colorization have been used
- ConvNeXt has excelled in scenarios requiring lower complexity
- Masked autoencoders are a self-supervised learning strategy for visual recognition
Fully convolutional masked autoencoder
- Approach is conceptually simple and runs in a fully convolutional manner
- Model predicts missing parts of raw input visuals given the remaining context
- Masking ratio of 0....

Massive Language Models Can Be Accurately Pruned in One-Shot

Abstract
- GPT family models can be pruned to 50% sparsity without retraining
- SparseGPT is a new pruning method designed for GPT-family models
- SparseGPT can reach 60% sparsity with minimal increase in perplexity
- SparseGPT is compatible with weight quantization approaches
Paper Content
Introduction
- Large Language Models (LLMs) from the Generative Pretrained Transformer (GPT) family have shown remarkable performance on a wide range of tasks....

Muse: Text-To-Image Generation via Masked Generative Transformers

Abstract
- Present Muse, a text-to-image Transformer model that is more efficient than diffusion or autoregressive models
- Trained on a masked modeling task in discrete token space
- Given text embeddings from a pre-trained large language model (LLM), the model is trained to predict randomly masked image tokens
- More efficient than diffusion and autoregressive models
- Achieves high-fidelity image generation and understanding of visual concepts
- New SOTA on CC3M with FID score of 6....

Rethinking with Retrieval: Faithful Large Language Model Inference

Abstract
- LLMs can have incomplete, out-of-date, or incorrect knowledge
- External knowledge can be used to assist LLMs
- Current methods for incorporating external knowledge require additional training or fine-tuning
- Rethinking with Retrieval (RR) is a novel post-processing approach that retrieves relevant external knowledge
- RR does not require additional training or fine-tuning and is not limited by the input length of LLMs
- RR is evaluated on three complex reasoning tasks and improves the performance of LLMs
Paper Content
Introduction
- LLMs have shown good performance without task-specific training or fine-tuning
- Chain-of-thought prompting can generate explanations and predictions
- Rethinking with retrieval uses decomposed reasoning steps to retrieve external knowledge
- RR provides more faithful explanations and more accurate predictions
- RR is evaluated on 3 complex reasoning tasks using GPT-3 175B and external knowledge sources
Related work
- Retrieval-enhanced LMs have been used to improve performance
- Search query generators have been used to retrieve relevant documents
- Documents have been used as additional context in generation tasks
- Human feedback has been used in a text-based web-browsing environment
- External knowledge sources have been incorporated to enhance LMs for tabular reasoning tasks
- Explicit rules have been added to inputs to improve reasoning ability
- Symbolic reasoning module has been introduced to improve coherence and consistency
- Careful prompting has been used to encourage LLMs to generate explanations
- Sampling diverse reasoning paths has been proposed to improve LLMs
- LLMs have been shown to generate incorrect supporting facts
- External knowledge has been proposed to enhance LLMs without additional training or fine-tuning
- Knowledge retrieval has been used to estimate the faithfulness of each reasoning path
- Inference procedure has been proposed to identify the most faithful prediction
Baselines
- Zero-shot/few-shot prompting: providing few in-context exemplars of input-output pairs in the prompt
- Chain-of-thought prompting: feeding LLMs step-by-step reasoning examples instead of standard input-output examples
- Self-consistency: sampling a diverse set of reasoning paths and selecting the most consistent answer by marginalizing the sampled paths
Commonsense reasoning
- Utilize Wikipedia as external knowledge base
- Use BM25 to retrieve top 10 most relevant paragraphs
- Use MPNet model to select most similar paragraph
- Use NLI model to obtain entailment and contradiction scores
- Calculate faithfulness of each reasoning path using $f_{\mathrm{KB}}(\cdot)$
Temporal reasoning
- Experiment uses TempQuestions dataset to investigate temporal reasoning
- Dataset includes 1,271 questions divided into four classes
- Focus on implicit temporal questions which contain free-text temporal expressions
- Utilize Wikidata as external knowledge base
- Extract temporal relations from Wikidata pages and convert to sentences
Tabular reasoning
- INFOTABS dataset consists of 23,738 hypotheses based on 2,540 tables
- Development set includes 1,800 hypotheses based on 200 tables
- Entailed and contradictory hypotheses are considered
- External knowledge bases WordNet and ConceptNet are used
- Tables are converted into textual premises
- Word relation triples are retrieved and converted into sentences
- Final prediction is obtained using procedure described in Section 4....
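To make the difference between the self-consistency baseline and a faithfulness-aware selection concrete, here is a minimal sketch. It assumes each sampled reasoning path has already been reduced to a final answer string. The second function assumes that per-path faithfulness scores are simply summed per answer; the summary does not spell out the actual inference procedure or the form of the faithfulness function, so both the aggregation rule and the scores below are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter, defaultdict


def self_consistency(answers):
    """Majority vote: return the most frequent answer across sampled paths."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def faithfulness_weighted(answers, scores):
    """Pick the answer whose paths have the largest total faithfulness.

    Assumption: scores are summed per answer. The scores themselves would
    come from a knowledge-based scorer, which is not implemented here.
    """
    if not answers or len(answers) != len(scores):
        raise ValueError("answers and scores must be non-empty and aligned")
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


# Five sampled reasoning paths and made-up (hypothetical) faithfulness scores.
sampled_answers = ["yes", "no", "yes", "no", "no"]
hypothetical_scores = [0.9, 0.2, 0.8, 0.1, 0.3]

majority = self_consistency(sampled_answers)
weighted = faithfulness_weighted(sampled_answers, hypothetical_scores)
```

With these made-up numbers the two rules disagree: the majority answer is supported by three paths with low scores, while the weighted rule prefers the answer backed by two well-supported paths.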

DensePose From WiFi

Abstract
- Advances in computer vision and machine learning have led to development in 2D and 3D human pose estimation.
- Human pose estimation from images is affected by occlusion and lighting.
- Radar and LiDAR technologies need specialized hardware that is expensive and power-intensive.
- Recent research has explored the use of WiFi antennas for body segmentation and key-point body detection....

Nowcasting Stock Implied Volatility with Twitter

Abstract
- Predicted next-day stock end-of-day implied volatility using random forests
- Examined usefulness of different sources of predictors and value of attention and sentiment features from Twitter
- Studied approach on 165 most liquid US stocks across 11 traditional market sectors
- Discovered stocks in certain sectors are easier to predict than others
- Possible reasons for the discrepancies include excess social media attention or low option liquidity
- Explored how approach fares throughout time by identifying four underlying market regimes in implied volatility
Paper Content
Introduction
- Social media has caused significant changes in the world, including financial markets
- Efficient Market Hypothesis (Fama, 1970) suggests rapid information diffusion could lead to higher price efficiency
- Behavioral economists argue social media could influence investors and incite herd behavior
- Data providers offer social media indicators to help financial institutions
- Academic studies have tried to quantify the interplay between social media and financial variables
- Existing research has mainly focused on Twitter and its influence on stock price, volatility and trading volume
- Current literature overlooks the interaction between social media and market implied volatility
- This study investigates the ability to predict one-day ahead movement in implied volatility using machine learning
- Stock universe is diversified across 11 traditional US stock market sectors
- Out-of-sample period spans from January 1st, 2013 till March 1st, 2019
- Hidden Markov models used to identify four regimes in implied volatility and measure performance across them
Preliminaries
- Explains market implied volatility and its relation to derivatives
- Describes random forest machine learning model for predictions
- Describes hidden Markov model for quantifying regimes in market implied volatility
Market implied volatility
- Options are a type of financial instrument
- Sellers of options are exposed to risk
- Measuring this risk requires considering expected price fluctuations of the underlying asset
- The CBOE Volatility Index combines implied volatility of different option contracts into an index
- The VIX is a measure of expected price fluctuations in the S&P 500 Index over the next 30 days
- Equation 1 is used to compute the VIX for a given term
- Option contracts typically have fixed expiration dates
- The VIX is calculated by linearly interpolating between two computed measures
Random forests
- Random forests are a machine learning approach for learning a predictive model
- They consist of multiple decision (or regression) trees whose predictions are combined
- Combination is typically done by taking the mode (or average) of all outputs
- Random forests are fast to build, not affected by feature scaling, robust to irrelevant predictors and noisy data
- Constructing an ensemble model by randomly subsampling both data points and features helps reduce overfitting
Hidden Markov models
- Hidden Markov models (HMM) are a generative approach for modeling systems that follow a Markov process
- HMM models the joint distribution of a sequence of hidden states and observations
- Parameters of HMM are initial state distribution, state transition model, and emission probabilities model
- Three key tasks associated with HMM are: probability of sequence of observations, best sequence of hidden states, and learning an HMM
Methodology
- Main goal of study is to explore 3 questions related to stock market performance
- Study uses random forests to predict stock market performance using stock price, implied volatility, and Twitter features
- Study covers 165 stocks over a 6 year period
- Performance is grouped by 11 traditional stock sectors
- Hidden Markov models used to identify 4 distinct implied volatility regimes per stock
Stock universe selection
- Looked at popular ETFs to obtain a diversified universe of stocks
- Selected 15 most liquid stocks per sector for a total of 165 stocks
- Excluded some stocks due to stock splits, late introduction, and ambiguous names
- Replaced excluded stocks to maintain 15 stocks per sector
Data acquisition and feature generation
- Data from Jan 1, 2011 to March 1, 2019 was used from 3 sources: stock prices, option contracts, and Twitter
- 4 features were extracted per stock for each trading day: closing price, 30-day implied volatility, total tweet count, and average sentiment polarity
- Sentiment polarity was calculated using VADER
- 2 additional predictors were generated per feature: daily difference and difference between daily value and exponential moving average of last 10 trading days
Predicting movements in implied volatility
- This study aims to predict one-day ahead movements in a stock's 30-day implied volatility....
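The two derived predictors from the feature generation step (daily difference, and the gap between the daily value and a 10-day exponential moving average) can be sketched as follows. This is an illustration under stated assumptions: the smoothing factor uses the common convention alpha = 2 / (span + 1), the average is seeded with the first observation, and the current day is included in the average. The summary does not specify any of these choices, and the function names are hypothetical.

```python
def daily_difference(values):
    """Day-over-day change; the first day has no predecessor, so it is None."""
    return [None] + [values[i] - values[i - 1] for i in range(1, len(values))]


def ema(values, span=10):
    """Exponential moving average with alpha = 2 / (span + 1).

    Assumption: seeded with the first value and updated recursively.
    """
    alpha = 2.0 / (span + 1)
    smoothed = []
    for i, x in enumerate(values):
        if i == 0:
            smoothed.append(x)
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def ema_gap(values, span=10):
    """Difference between each daily value and its exponential moving average."""
    return [x - e for x, e in zip(values, ema(values, span))]


# Made-up implied-volatility readings for one stock (illustrative only).
implied_vol = [0.20, 0.22, 0.21, 0.25, 0.24]
diff_feature = daily_difference(implied_vol)
gap_feature = ema_gap(implied_vol, span=10)
```

Applying the same two transforms to each of the 4 base features gives 8 derived predictors per stock and trading day, in addition to the raw values.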

Design on Matroids: Diversity vs. Meritocracy

Abstract
- Provide optimal solutions to institutions with dual goals of diversity and meritocracy.
- Provide a class of choice rules to maximize merit while attaining a diversity level.
Paper Content
Introduction
- Meritocratic systems have been idealized since ancient times.
- Confucius argued that those who govern should do so based on merit.
- Han dynasty adopted Confucianism and implemented civil service examinations....

A Survey for In-context Learning

Abstract
- LLMs are able to make predictions based on contexts and a few training examples
- This paper surveys and summarizes the progress, challenges, and future work in ICL
- It provides a formal definition of ICL and clarifies its correlation to related studies
- It discusses advanced techniques of ICL, including training strategies and prompting strategies
- It presents the challenges of ICL and provides potential directions for further research
Paper Content
Introduction
- Large language models (LLMs) can perform complex tasks with in-context learning (ICL)
- ICL requires a few examples to form a demonstration context, usually written in natural language templates
- ICL does not require parameter updates and directly performs predictions on the pretrained language models
- ICL provides an interpretable interface to communicate with large language models
- ICL is similar to the decision process of human beings by learning from analogy
- ICL is a training-free learning framework
- ICL has multiple attractive advantages, including being interpretable and training-free
Overview
- ICL relies on two stages: training and inference
- Training stage involves language models being trained on language modeling objectives
- Inference stage involves input and output labels being represented in natural language templates
- Taxonomy of ICL is shown in Fig....
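The idea of a demonstration context built from natural language templates can be illustrated with a small sketch. The template wording, the function name, and the sentiment examples are all hypothetical choices made for illustration; the survey summary does not prescribe any particular template.

```python
def build_icl_prompt(demonstrations, query, template="Review: {x}\nSentiment: {y}"):
    """Format (input, label) demonstrations and a query into one prompt.

    The query uses the same template with the label left blank, so a
    language model is expected to continue the text with its prediction.
    No model parameters are updated at any point.
    """
    blocks = [template.format(x=x, y=y) for x, y in demonstrations]
    blocks.append(template.format(x=query, y="").rstrip())
    return "\n\n".join(blocks)


# Hypothetical demonstrations for a sentiment task.
demos = [
    ("The plot was gripping.", "positive"),
    ("I fell asleep halfway through.", "negative"),
]
prompt = build_icl_prompt(demos, "A delightful surprise.")
```

The resulting string would be passed to a pretrained language model as-is; changing the demonstrations or the template changes the prediction without any training.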

Efficient Market Design with Distributional Objectives

Abstract
- Existence of a mechanism studied that weakly improves a policy objective on the distribution of student types to schools.
- Mechanism must satisfy constrained efficiency, individual rationality, and strategy-proofness.
- Mechanism need not exist in general.
- New notion of discrete concavity (M$^{\natural}$-concavity) introduced.
- Mechanism with desirable properties constructed when distributional objective satisfies M$^{\natural}$-concavity.
- Several distributional objectives that are natural in the setting satisfy M$^{\natural}$-concavity....