arxiv-summary: AI-summarized AI papers

TextBox 2.0: A Text Generation Library with Pre-trained Language Models

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Presents TextBox 2.0, a comprehensive and unified library for text generation research. Covers 13 common text generation tasks and 83 datasets. Incorporates 45 pre-trained language models. Implements 4 efficient training strategies and provides 4 generation objectives. Easy to use through a Python API or the command line.
Paper Content
Introduction: Text generation is a fundamental technique in many text applications. Pre-trained language models (PLMs) are the mainstream approach to developing effective text generation models. TextBox 2....

Sitting Posture Recognition Using a Spiking Neural Network

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Designed a personalized smart chair system to recognize sitting behaviors. Used a liquid state machine and a logistic regression classifier to construct a spiking neural network. Designed an algorithm to encode map-like data into cosine-rank sparsity data. Experimental results show a prediction precision of 88.52%.
Paper Content
I. Introduction: Smart cities are using digital twins to improve quality of life. The Global Burden of Disease study shows that people are suffering from lower back pain due to sitting behaviors. Sensing systems are needed to provide real-time feedback on sitting posture. This paper presents a deep learning-powered chair to teach the best sitting posture and provide feedback. The chair uses pressure sensors to obtain pressure data and a spiking neural network to recognize sitting posture.
II....

Closed-form control with spike coding networks

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Spiking neural networks (SNNs) are not yet efficient and robust enough for control. Biological agents use sparse and irregular spiking patterns for efficient and robust control. Most artificial SNNs used for control have dense and regular activity patterns. Spike Coding Networks (SCNs) offer a fully analytical solution for implementing dynamical systems in recurrent SNNs....

GraphCast: Learning skillful medium-range global weather forecasting

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: An ML-based weather simulator called “GraphCast” outperforms the most accurate deterministic operational medium-range weather forecasting system in the world. GraphCast is an autoregressive model based on graph neural networks and a novel high-resolution multi-scale mesh representation. GraphCast can make 10-day forecasts, at 6-hour time intervals, of five surface variables and six atmospheric variables, each at 37 vertical pressure levels, on a 0....
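
To make the phrase "autoregressive model" concrete, here is a minimal sketch of an autoregressive rollout in Python. It is not GraphCast's code: `rollout` and `toy_step` are hypothetical names, and the toy update rule merely stands in for a learned model. Only the step count follows from the summary above (10 days at 6-hour intervals).

```python
from typing import Callable, List

# From the summary: 10-day forecasts at 6-hour intervals.
FORECAST_DAYS = 10
STEP_HOURS = 6
NUM_STEPS = FORECAST_DAYS * 24 // STEP_HOURS  # 40 prediction steps


def rollout(step_fn: Callable[[float], float],
            initial_state: float,
            num_steps: int) -> List[float]:
    """Feed each prediction back in as the input for the next step."""
    states = []
    state = initial_state
    for _ in range(num_steps):
        state = step_fn(state)  # one-step-ahead prediction
        states.append(state)
    return states


def toy_step(state: float) -> float:
    """Hypothetical stand-in for a learned one-step model."""
    return 0.5 * state + 1.0


forecast = rollout(toy_step, 0.0, NUM_STEPS)
print(len(forecast))  # one predicted state per 6-hour step
```

Because every step consumes the previous prediction, errors can compound over the rollout, which is one reason long-horizon skill is hard to achieve with this kind of model.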

Detecting Objects with Graph Priors and Graph Refinement

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: The goal of the paper is to detect objects by exploiting their interrelationships. Infer a graph prior from object co-occurrence statistics. Model object relations as a function of initial class predictions and co-occurrence priors. Learn the object-relation joint distribution via energy-based modeling. Experiments show the method is detector agnostic, end-to-end trainable, and beneficial for rare object classes....

SuperGF: Unifying Local and Global Features for Visual Localization

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Advanced visual localization techniques require extracting global and local features from input images. SuperGF is a transformer-based aggregation model that unifies local and global features for visual localization. SuperGF is evaluated in terms of accuracy and efficiency, and is shown to be better than other methods. SuperGF can be implemented using various types of local features....

Stop using the elbow criterion for k-means and how to choose the number of clusters instead

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Choosing the parameter k for k-means clustering is a challenge. The elbow method is a common heuristic, but it is not reliable. Better alternatives have been known for a long time. Educators should discuss the problems of the elbow method and teach alternatives. Researchers and reviewers should reject conclusions drawn from the elbow method.
Paper Content
Introduction: Cluster analysis is used to identify subgroups in data. No single definition of a cluster exists. Different algorithms are used to find the best solution. K-means clustering is the most used and taught clustering method. K-means is simple and runs quickly. Choosing the number of clusters is a key problem with k-means.
K-means clustering: K-means clustering is a least-squares optimization problem....
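
The least-squares objective mentioned above can be illustrated with a short Python sketch. The function name `sse` and the toy data are assumptions made for illustration and do not come from the paper. The function computes the sum of squared distances from each point to its nearest center, which is the quantity k-means minimizes and the quantity an elbow plot shows against k.

```python
def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance to the closest center.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

sse_k1 = sse(points, [(5.0, 1.0)])               # one center at the overall mean
sse_k2 = sse(points, [(0.0, 1.0), (10.0, 1.0)])  # one center per pair

print(sse_k1, sse_k2)  # 104.0 4.0
```

With optimally placed centers this objective never increases as k grows, so the raw value alone cannot select k. This is why a separate criterion is needed in the first place.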

The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: At small training set sizes, finite-width networks can be approximated by an infinite-width neural network. After a critical sample size, finite-width networks become worse than infinite-width networks. Finite-size effects can become relevant for very small dataset sizes. Variance of the neural tangent kernel can explain the transition. Feature learning and ensemble averaging can push the transition to larger sample sizes....

Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Investigated how humans dub video content from one language to another. Leveraged a novel corpus of 319.57 hours of video from 54 professionally produced titles. Challenged assumptions made in the qualitative and machine-learning literature on dubbing. Argued for the importance of vocal naturalness and translation quality over isometric and lip-sync constraints. Found an influence of source-side audio on human dubs beyond the words of the translation.
Paper Content
Introduction: Considerable attention has been paid to the dubbing of video content from one language to another. Human dubbing has been studied from a qualitative perspective. Machine-learning practitioners have taken up the task of building multimodal systems for automatic dubbing. Human dubbing involves a sequence of contributors with control over different aspects of the process. A data-driven examination of the way humans actually perform this task is missing. Human dubbing is a “constrained translation”. Questions about isochrony, isometry, speech tempo, lip sync, translation quality, and source influence are explored. Insights are provided on research directions to address weaknesses in current automatic dubbing approaches.
Related work
Qualitative: Dubbing is a type of constrained translation. Dubs need to match the original video track. Dubs need isochronic, phonetic, and kinesic synchrony. Dubs need to be intelligible to the target language and culture. Dubs should sound natural. Dubs should preserve the semantic meaning of the source. Dubbing is a form of non-literal translation called “transcreation”. Scholars have investigated the role of power, ideology, identity, and similar considerations in dubbing.
Automatic dubbing: Automatic dub generation has been explored with a variety of constraints. Lip sync constraints have been integrated into dub generation. Adjusting mouth movements in the original video to match a dubbed audio track has been explored. Isometric machine translation has been used to produce a translation with similar length to the input. Controlling speaking rate in automatic dubbing systems to achieve prosodic alignment has been studied. Time-boundary relaxation has been used to control speaking rate and speech fluency. Integrating pause constraints directly into MT has been examined. End-to-end dubbing has been explored.
Empirical studies: Studies have attempted to examine human dubbing through a quantitative lens. Di Giovanni and Romero-Fresco found that audiences may not be as sensitive to lip sync as traditionally believed. Karakanta et al....

Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Larger Transformer-based pre-trained language models have lower perplexity but are less predictive of human reading times. Regression analyses show a positive log-linear relationship between perplexity and fit to reading times for certain models. Residual errors reveal systematic deviation of larger variants, such as underpredicting reading times of named entities. Larger models tend to ‘memorize’ sequences during training, making their surprisal estimates diverge from humanlike expectations....
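
For readers unfamiliar with the two quantities the summary relates, here is a minimal Python sketch using their standard definitions: surprisal is the negative log probability of a word given its context, and perplexity is the exponentiated mean surprisal. The probabilities below are made-up illustrative values, not outputs of any model from the paper, and the function names are hypothetical.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal of a word in bits: -log2 of its conditional probability."""
    return -math.log2(prob)


def perplexity(probs) -> float:
    """Perplexity: 2 raised to the mean per-word surprisal (in bits)."""
    mean_surprisal = sum(surprisal_bits(p) for p in probs) / len(probs)
    return 2.0 ** mean_surprisal


# Hypothetical per-word probabilities assigned by some language model.
word_probs = [0.5, 0.25, 0.125]

surprisals = [surprisal_bits(p) for p in word_probs]
print(surprisals)              # [1.0, 2.0, 3.0]
print(perplexity(word_probs))  # 4.0
```

A model that assigns higher probabilities to the observed words yields lower surprisal and therefore lower perplexity. The summary's point is that this improvement does not automatically translate into a better fit to human reading times.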