Attentive Mask CLIP

Abstract: Image token removal is an efficient augmentation strategy for reducing the cost of computing image features. However, removing a large portion of image tokens may discard the semantic content associated with a given text description. The paper proposes an attentive token removal approach for CLIP training that retains tokens with a high semantic correlation to the text description....
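The retention step described in this summary can be illustrated with a minimal sketch. It assumes each image token already has a relevance score with respect to the text; how the paper computes those scores is not shown here. The function name `keep_attentive_tokens`, the `keep_ratio` parameter, and the sample scores are hypothetical.

```python
def keep_attentive_tokens(scores, keep_ratio):
    """Return the sorted indices of the highest-scoring tokens.

    `scores` holds one hypothetical text-relevance score per image token.
    `keep_ratio` is the fraction of tokens to retain (an assumed knob).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    num_keep = max(1, round(len(scores) * keep_ratio))
    # Rank token indices from most to least relevant.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore the original token order for the retained subset.
    return sorted(ranked[:num_keep])


# Illustrative scores only; they are not taken from the paper.
scores = [0.05, 0.40, 0.10, 0.30, 0.02, 0.13]
kept = keep_attentive_tokens(scores, keep_ratio=0.5)
print(kept)
```

Only the retained indices would then be passed to the image encoder, which is where the compute saving would come from under these assumptions.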

December 16, 2022 · 964 words · Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang and 6 others
Connecting Permutation Equivariant Neural Networks and Partition Diagrams

Abstract: Schur-Weyl duality between the partition algebra and the symmetric group provides a stronger theoretical foundation for characterizing permutation equivariant neural networks. The paper unifies two separate bodies of literature and corrects some widely quoted results in the machine learning community. A graphical representation of a basis of set partitions is used to find a basis of matrices for learnable, linear, permutation equivariant layer functions....
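As a small illustration of what a linear permutation equivariant layer looks like, the sketch below covers only the simplest case, a map from $\mathbb{R}^{n}$ to $\mathbb{R}^{n}$, where such a layer is a combination of the identity matrix and the all-ones matrix. It is not the paper's general construction, and the names `equivariant_layer`, `permute`, `a`, and `b` are hypothetical.

```python
def equivariant_layer(x, a, b):
    """Apply (a * I + b * ones) to x without building the matrices."""
    total = sum(x)
    return [a * xi + b * total for xi in x]


def permute(x, perm):
    """Reorder x so that position i receives x[perm[i]]."""
    return [x[p] for p in perm]


x = [1.0, 2.0, 4.0]
perm = [2, 0, 1]

# Equivariance: permuting the input and then applying the layer
# should match applying the layer and then permuting the output.
lhs = equivariant_layer(permute(x, perm), a=0.5, b=0.25)
rhs = permute(equivariant_layer(x, a=0.5, b=0.25), perm)
print(lhs == rhs)
```

The two learnable weights `a` and `b` correspond to the two set partitions of a two-element set, which is the smallest instance of the basis-from-partitions idea the summary mentions.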

December 16, 2022 · 2488 words · Edward Pearce-Crump
Efficient Conditionally Invariant Representation Learning

Abstract: The paper introduces CIRCE, a measure of conditional independence for multivariate continuous-valued variables. It is used to learn neural features of data that are conditionally independent of a distractor given a target. It requires a single ridge regression from the target to kernelized features of the distractor. The paper provides estimation properties and consistency guarantees, and establishes that CIRCE is zero if and only if the features are independent of the distractor given the target....

December 16, 2022 · 985 words · Roman Pogodin, Namrata Deka, Yazhe Li, Danica J. Sutherland, Victor Veitch and 1 other
Brauer's Group Equivariant Neural Networks

Abstract: The paper characterizes group equivariant neural networks whose layers are tensor powers of $\mathbb{R}^{n}$ for 3 symmetry groups. It finds a spanning set of matrices for the learnable, linear, equivariant layer functions in the standard/symplectic basis of $\mathbb{R}^{n}$. The approach circumvents the requirement of decomposing tensor power spaces of $\mathbb{R}^{n}$ into irreducible representations. The results are based on Schur-Weyl dualities established by Brauer in a 1937 paper.
Paper Content
Introduction: Finding neural networks that are equivariant to a symmetry group is an active area of research. Requiring the overall network to be equivariant restricts the form of the neural network. Parameter sharing within each layer results in simpler, more interpretable models. Data from physical processes often comes with a certain type of symmetry. Constructing equivariant neural networks typically involves decomposing tensor product representations. The paper takes a different approach to constructing equivariant neural networks for 3 groups. The approach is motivated by a mathematical concept not seen in the machine learning literature. The combinatorics underlying the Brauer and Brauer-Grood vector spaces provides the theoretical background. The paper finds a spanning set of matrices for the learnable, linear, equivariant layer functions. The resulting neural networks are simple to implement. Schur-Weyl duality is a powerful mathematical concept for characterising group equivariant neural networks.
Preliminaries: The field of scalars is R. Tensor products are taken over R. [n] represents the set {1, ....

December 16, 2022 · 962 words · Edward Pearce-Crump
MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation

Abstract: Large language models have enabled progress in multi-step reasoning over text. When applied to text generation from semi-structured data, these methods suffer from low semantic coverage, hallucination, and logical inconsistency. MURMUR is a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning. MURMUR uses neural and symbolic modules, a grammar, and value functions to generate reasoning paths....

December 16, 2022 · 1185 words · Swarnadeep Saha, Xinyan Velocity Yu, Mohit Bansal, Ramakanth Pasunuru, Asli Celikyilmaz
Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better

Abstract: The problem of hallucinations in neural machine translation has been recognized for a long time, but progress on alleviating it has been limited. Standard sequence log-probability is more informative than previously thought. A method is proposed to evaluate the percentage of source contribution to a generated translation. The method improves detection accuracy for severe hallucinations by a factor of 2....
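The sequence log-probability signal mentioned in this summary can be sketched as follows. This is a minimal illustration, assuming the per-token probabilities of a generated translation are already available from a model. The length normalization, the function names, the threshold of -1.0, and the sample probabilities are all assumptions for illustration, not values from the paper.

```python
import math


def seq_logprob(token_probs):
    """Length-normalized log-probability of a generated sequence."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def is_suspect(token_probs, threshold=-1.0):
    """Flag a translation whose score falls below a hypothetical threshold."""
    return seq_logprob(token_probs) < threshold


# Made-up token probabilities for two translations.
confident = [0.90, 0.80, 0.95]
hesitant = [0.20, 0.10, 0.30]

print(is_suspect(confident), is_suspect(hesitant))
```

In practice a threshold like this would need to be tuned on held-out data; a low score only suggests that a translation deserves a closer look.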

December 16, 2022 · 885 words · David Dale, Elena Voita, Loïc Barrault, Marta R. Costa-jussà
Biomedical image analysis competitions: The state of current participation practice

Abstract: The number of international benchmarking competitions in ML is increasing. A survey was conducted to understand the development of algorithms in biomedical imaging. 70% of participants were motivated by knowledge exchange and 16% by prize money. 80 working hours were spent on method development, and 32% didn't have enough time. 25% perceived infrastructure to be a bottleneck. 94% of solutions were deep learning-based, and 84% were based on standard architectures. 43% of data samples were too large to process at once, which was addressed by patch-based training, downsampling, and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by 37%, 50% performed ensembling, and 48% applied postprocessing steps.
Paper Content
Purpose: Validation of biomedical image analysis algorithms is conducted through challenges....

December 16, 2022 · 988 words · Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee and 350 others
Fake it till you make it: Learning(s) from a synthetic ImageNet clone

Abstract: Stable Diffusion is a large-scale image generation model that can create realistic images from a simple text prompt. This paper explores whether real images are necessary for training image prediction models. The paper tests the ability of Stable Diffusion to generate synthetic clones of ImageNet and measures their usefulness for training classification models....

December 16, 2022 · 1136 words · Mert Bulent Sariyildiz, Karteek Alahari, Diane Larlus, Yannis Kalantidis
Teaching Small Language Models to Reason

Abstract: Chain of thought prompting improves the reasoning capabilities of large language models. These reasoning capabilities only appear in models with over 100 billion parameters. Knowledge distillation can transfer reasoning capabilities to models with fewer than 100 billion parameters. Experiments show improved task performance across arithmetic, commonsense and symbolic reasoning datasets. Accuracy of T5 XXL on GSM8K improves from 8....

December 16, 2022 · 864 words · Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn
How to disagree well: Investigating the dispute tactics used on Wikipedia

Abstract: Disagreements are studied from two perspectives: toxicity and argument structure. The proposed framework unifies these perspectives and includes dialogue acts such as asking questions and providing clarification. The framework includes a preferential ordering of rebuttal tactics. 213 disagreements from Wikipedia Talk pages are annotated to investigate the research questions. Models are developed for multilabel prediction of dispute tactics in an utterance. An auxiliary task is used to incorporate the ordering of rebuttal tactics.
Paper Content
Introduction: Disagreements are common in online communication. Debate and disagreement can lead to better supported beliefs. People can be biased and not consider evidence. NLP research has looked at detecting negative aspects of online disagreements. Argumentation mining looks at identifying argument structures and inferring argument quality. Real world disagreements contain both well-structured arguments and attacks. The paper proposes a framework of dispute tactics consisting of rebuttal and coordination strategies. 213 disputes are annotated with dispute tactics. A lower mean rebuttal level in a disagreement is correlated with less constructive dispute resolutions. People use a range of rebuttal levels more often than adhering to only one. Models are developed to predict the dispute tactics used in an utterance. The annotations can be used to improve predicting whether a dispute will be resolved without escalating to a moderator.
Online disagreements: Wikipedia Talk pages are used for NLP studies and to coordinate edits and resolve disputes. WikiDisputes is a dataset of Talk page discussions tagged as “disputes”. Wikipedia recommends Graham’s hierarchy of disagreement as a guide for constructive dispute resolution. Graham’s hierarchy has 7 levels of disagreement, ranging from name-calling to refuting the central point. Tang and Wang used this taxonomy to analyse the rationality of online discussions. Walker et al....

December 16, 2022 · 985 words · Christine de Kock, Tom Stafford, Andreas Vlachos