arxiv-summary: AI-summarized AI papers

Dissociating language and thought in large language models: a cognitive perspective

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: LLMs can generate coherent, grammatical and seemingly meaningful text. LLMs are capable of performing tasks that require abstract knowledge and reasoning. LLMs show impressive performance on tasks requiring formal linguistic competence, but fail on tasks requiring functional competence.

Paper Content

Introduction: Alan Turing proposed the Turing test to determine if an agent is a human or a machine....

Msanii: High Fidelity Music Synthesis on a Shoestring Budget

Abstract: Combines mel spectrograms, diffusion models, and neural vocoders to synthesize long-context, high-fidelity music. Synthesizes 190 seconds of stereo music at 44.1 kHz without concatenative synthesis, cascading architectures, or compression techniques. First work to successfully employ a diffusion-based model for synthesizing long music samples at high sample rates. Demo and code available online.

Paper Content

Introduction: Music is a universal language that connects people from diverse cultures....

Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning

Abstract: Drug combination therapy is a well-established strategy for disease treatment. Computational approaches, specifically deep learning models, can be used to discover synergistic combinations. Data from various datasets was collected and used to generate informative representations and features. A message-passing graph was built to propagate information, with graph structure learning added for flexibility. State-of-the-art results were achieved in comparison with other deep learning-based methods....

Recent advances in artificial intelligence for retrosynthesis

Abstract: Retrosynthesis is a cornerstone of organic chemistry. AI has revolutionized retrosynthesis. This review provides a taxonomy of existing methods and a comparison of their performance. Popular databases and platforms for retrosynthesis are introduced. Discussion of promising research directions is included.

Paper Content

Single-step retrosynthesis methods: Single-step retrosynthesis can be divided into two categories: selection-based and generation-based. Selection-based methods use chemical knowledge to make predictions, but have limited generalization ability. Generation-based methods generate reactants directly and have wider generalization ability.

Selection-based methods: Two types of selection-based retrosynthesis methods: reactant selection and template selection.

Generation-based methods: Generation-based retrosynthesis methods rely on no chemical knowledge. Divided into template-free methods and semi-template methods. Semi-template methods divide retrosynthesis into two steps: identify reaction center and complete synthons to reactants. Template-free methods formulate retrosynthesis as a sequence generation problem. Sequence can be a series of SMILES tokens or molecular edit actions on the molecule graph. Modeled as a sequence-to-sequence translation problem. Encoder-Decoder architecture based on LSTM cells. Transformer-based network to train the forward reaction prediction and retrosynthesis simultaneously. Data augmentation strategies classified into four categories. Pretrain tasks divided into two types: same/similar content and modified output. Transformer performs well on the retrosynthesis task.

Multi-step retrosynthesis methods: Most molecules require more than one step to synthesize....
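The template-free formulation in this summary treats a molecule's SMILES string as a token sequence and retrosynthesis as translation from product tokens to reactant tokens. A minimal sketch of the tokenization step follows. The regular expression is a deliberately simplified assumption, `tokenize_smiles` is a hypothetical helper, and the esterification pair is a hand-picked illustration; none of these come from the review itself.

```python
import re

# Simplified SMILES token pattern (an assumption for illustration only):
# bracket atoms, the two-letter halogens Br and Cl, two-digit ring closures
# such as %12, and otherwise any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens for a sequence model."""
    tokens = SMILES_TOKEN.findall(smiles)
    # The tokens must reassemble into the input, otherwise something was dropped.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


# Illustrative source/target pair: ethyl acetate (product) and
# acetic acid + ethanol (reactants, separated by '.').
product = "CCOC(C)=O"
reactants = "CC(=O)O.CCO"

source_tokens = tokenize_smiles(product)    # input to the encoder
target_tokens = tokenize_smiles(reactants)  # sequence the decoder should emit
```

A sequence-to-sequence model would be trained on many such token pairs. Production tokenizers typically use a more complete pattern than the one assumed here.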

YOLOv6 v3.0: A Full-Scale Reloading

Abstract: YOLO community has released two versions of YOLOv6. YOLOv6 v3.0 has been released for Chinese New Year 2023. YOLOv6-N has 37.5% AP on COCO dataset at 1187 FPS. YOLOv6-S outperforms other mainstream detectors at same scale. YOLOv6-M/L have better accuracy performance than other detectors. YOLOv6-L6 has state-of-the-art accuracy in real-time. Code is available on GitHub.

Paper Content

Introduction: YOLO series is popular for its balance between speed and accuracy. YOLOv4 reorganized the detection framework into several parts. YOLOv6 has a Bi-directional Concatenation (BiC) module, a SimCSPSPPF Block, an anchor-aided training (AAT) strategy, a deeper backbone and neck, and a self-distillation strategy.

Method

Network design: BiC module applied to top-down pathway of PAN brings AP improvements on YOLOv6-S/L. BiC module not applied to bottom-up pathway due to confusion for detection heads. BiC module gives impressive boost to performance of small object detection. Different types of SPP Blocks explored, SimCSPSPPF blocks introduced for better accuracy-efficiency trade-off.

Anchor-aided training: AAT brings 0....

Designing losses for data-free training of normalizing flows on Boltzmann distributions

Abstract: Generating a Boltzmann distribution in high dimension has been achieved with Normalizing Flows. Current implementations rely on accurate training data. There is an incentive to train models with incomplete or no data. Standard losses based on Kullback-Leibler divergences have limitations. Strategies to alleviate these issues have been proposed. Imperfect pre-trained models can be further optimized in the absence of training data....
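The two Kullback-Leibler losses usually contrasted in this setting can be written out explicitly. The notation is an assumption for illustration and is not taken from the paper: $q_\theta$ is the density defined by the flow, and $p(x) = e^{-\beta U(x)}/Z$ is the Boltzmann target with energy $U$, inverse temperature $\beta$, and normalizing constant $Z$.

```latex
% Forward KL: an expectation under the target p, so it needs training samples from p.
\mathrm{KL}(p \,\|\, q_\theta)
  = \mathbb{E}_{x \sim p}\big[\log p(x) - \log q_\theta(x)\big]

% Reverse KL: an expectation under the model q_\theta, so it needs only
% model samples and energy evaluations. \log Z does not depend on \theta.
\mathrm{KL}(q_\theta \,\|\, p)
  = \mathbb{E}_{x \sim q_\theta}\big[\log q_\theta(x) + \beta\, U(x)\big] + \log Z
```

Under these assumptions only the reverse form can be minimized without data. A commonly cited drawback of that form is that it can concentrate on a subset of the target's modes.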

Guiding Text-to-Image Diffusion Model Towards Grounded Generation

Abstract: Goal: Augment pre-trained text-to-image diffusion model with open-vocabulary object grounding. Contribution: Insert grounding module into existing diffusion model, automatic pipeline for constructing dataset.

Paper Content

Introduction: Text-to-image generative models have strong semantic correspondence between vision and language. Models lack ability to ground objects within generated images. Paper aims to augment existing text-to-image diffusion model with ability to generate photorealistic images and segmentation masks. Challenges include establishing visual-language correspondence and open-vocabulary grounding. Automatic pipeline developed to construct {image, segmentation, text prompt} triplets. Novel architecture proposed to segment any visual entity mentioned in text prompt. Evaluation protocols initiated to validate effectiveness of open-vocabulary grounding.

Related work: Image generation is a challenging task in computer vision. Generative adversarial networks (GANs), variational autoencoders (VAEs), flow-based models and autoregressive models (ARMs) have made progress. Diffusion Probabilistic Models (DMs) demonstrate state-of-the-art generation quality. Visual grounding is used to understand natural language queries and find target objects in an image.

Methodology: Aim to introduce a knowledge induction procedure to convert an existing text-to-image diffusion model for grounded generation....

Tracr: Compiled Transformers as a Laboratory for Interpretability

Abstract: Interpretability research aims to build tools to understand ML models. We propose to build transformer models manually as a testbed for interpretability research. Tracr is a “compiler” that translates human-readable programs into weights of a transformer model. Tracr is used to create ground truth transformers that implement programs such as computing token frequencies, sorting, and Dyck-n parenthesis checking....
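To make the target behaviours concrete, here is a plain-Python reference for two of the programs named in this summary: per-position token frequencies and Dyck-n bracket checking. This is not Tracr or RASP code. The function names and the bracket alphabet are assumptions chosen for illustration, and the sketch only describes the input/output behaviour a compiled transformer would be expected to reproduce.

```python
from collections import Counter


def token_frequencies(tokens):
    """For each position, count how often that position's token occurs in the sequence."""
    counts = Counter(tokens)
    return [counts[t] for t in tokens]


# Hypothetical bracket alphabet for Dyck-3: closing bracket -> matching opener.
PAIRS = {")": "(", "]": "[", "}": "{"}


def is_dyck(tokens, pairs=PAIRS):
    """Return True if the bracket sequence is balanced and properly nested."""
    openers = set(pairs.values())
    stack = []
    for t in tokens:
        if t in openers:
            stack.append(t)
        elif t in pairs:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != pairs[t]:
                return False
        else:
            return False  # token outside the bracket alphabet
    return not stack  # every opener must have been closed
```

For example, `token_frequencies(list("hello"))` counts the repeated `l` twice at both of its positions, and `is_dyck(list("([)]"))` is rejected because the brackets interleave rather than nest.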

Taking Search to Task

Abstract: Tasks in information retrieval have been discussed for a long time. Understanding and addressing users’ tasks is more important than ever. This paper provides perspectives on tasks in IR, bottlenecks, and how to move forward. A tree-like structure is presented to ground ongoing and future research.

Paper Content

Introduction: Scholars have argued for the importance of considering task information in information retrieval. Several attempts have been made to understand search tasks and use task knowledge to better provide support. Search services have incorporated spatial and temporal information to understand or expand a query. This paper provides a new foundation for task-related information in IR applications. It outlines paths and perspectives for capturing task-related information and using it in IR applications.

A brief history of tasks in IR: Task is a set of physical, cognitive, and affective actions to achieve goals....

Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information

Abstract: It is common to have multiple annotators label a text and to derive ground truth labels from the majority agreement among annotators. NLP systems need to represent people’s diverse voices on subjective matters and predict the level of diversity. The paper examines whether the text of the task and annotators’ demographic background information can be used to estimate the level of disagreement.

Paper Content

Introduction: Supervised AI systems are trained on annotated datasets with labels determined by consensus among multiple annotators....
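One simple way to put a number on the disagreement described in this summary is the share of annotators who depart from the majority label. The sketch below works under that assumption only. This summary does not state which disagreement measure the paper uses, and `disagreement_level` is a hypothetical helper.

```python
from collections import Counter


def disagreement_level(labels):
    """Fraction of annotators whose label differs from the most common label.

    0.0 means every annotator agreed; larger values mean more disagreement.
    """
    if not labels:
        raise ValueError("need at least one annotation")
    # Size of the largest group of annotators who gave the same label.
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)
```

With four annotators split evenly between two labels the function returns 0.5, while a unanimous item returns 0.0. A model in this setting would take the text (and, per the summary, annotator demographics) as input and predict such a score.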