arxiv-summary: AI-summarized AI papers

TrojanPuzzle: Covertly Poisoning Code-Suggestion Models

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Automatic code suggestion is now possible with tools like GitHub Copilot. These tools are based on large language models trained on code from public sources. Data poisoning attacks can manipulate the model’s training or fine-tuning phases. Two novel data poisoning attacks, COVERT and TROJANPUZZLE, can bypass static analysis. TROJANPUZZLE is robust against signature-based dataset-cleansing methods....

Training trajectories, mini-batch losses and the curious role of the learning rate

Abstract: Stochastic gradient descent is important for deep learning applications. Loss functions for large networks with large amounts of data are non-convex. Loss for fixed mini-batches can be modeled by a quadratic function. A simple model and geometric interpretation can explain the relationship between gradients of mini-batches and full batches. Averaging two points a few steps apart can improve accuracy....

HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling

Abstract: Volumetric scene representations enable photorealistic view synthesis for static scenes. Existing methods fail to simultaneously achieve real-time performance, small memory footprint, and high-quality rendering for challenging real-world scenes. HyperReel is a novel 6-DoF video representation. HyperReel has two core components: a ray-conditioned sample prediction network and a compact and memory efficient dynamic volume representation....

Teaching Computer Vision for Ecology

Abstract: Computer vision can automate the analysis of raw imagery from sensors. Computer vision is rarely taught to ecologists. This paper discusses the experience of teaching a diverse group of ecologists to prototype and evaluate computer vision systems in a summer workshop.

Paper Content

Introduction:
Manual annotation of images and videos is time consuming
Computer vision algorithms can automate this process
Computer vision is important for ecology due to climate change
Ecologists need to understand and apply computer vision methods

Related work:
Teaching machine learning, deep learning, and computer vision to computer science students
Teaching machine learning to cross-disciplinary audiences such as non-CS undergraduates, business students, artists, materials scientists, and biologists
Teaching computer vision to ecologists with background knowledge in statistics and programming
Workshop in which researchers build prototypes using their own research data, not a traditional classroom environment

The CV4E workshop:
CV4E Workshop held at Caltech from August 1-19, 2022
Program designed to train ecologists to use computer vision
Application process included project proposal, personal statement, programming example, letter of reference, and CV
Selection process included reviews from machine learning and ecology communities
All participants funded for travel, room, and board
Participants met with instructors to finalize project plans and learn Python
Lectures, Invited Speakers, Reading Groups, Work Time, and Group Updates during workshop
All 18 participants had trained models by end of workshop
Slack workspace and ongoing projects formed community beyond workshop

Lessons learned:
Enforce structured Python preparation before workshop
Start simple when building computer vision system
Collect similar projects in working groups
Mix experience levels in working groups
Make unambiguous infrastructure recommendations
Avoid deep learning library wrappers
Avoid Jupyter Notebooks
Make sure GPUs are available

Educational techniques:
Guided troubleshooting to hone machine learning skills
Pair pseudocoding to help participants write code
Goal statements to help participants track progress
Contextualized lectures to keep topics grounded in applications

Conclusion:
18 participants in the inaugural 2022 CV4E Workshop
11 participants from 7 different states in the US, 5 from European countries, 2 from Canada
Participants from diverse academic backgrounds
7 main categories of projects
Using annotation tools to label image or audio data
Common Unix commands
Tools like nano, tmux, and screen
SSH command and SSH keys
Tools like scp or rsync for transferring files
Using GitHub for tracking changes made to code
Interacting with web interfaces of cloud computing providers
Creating and managing virtual environments
Libraries and command-line tools like OpenCV, ImageMagick, and FFmpeg
Classes and objects, inheritance, encapsulation, polymorphism
Common data structures and their methods
Mutable and immutable objects
In-place operations
Concept of generalization
Errors vs....

Reprogramming Pretrained Language Models for Protein Sequence Representation Learning

Abstract: Machine Learning-guided solutions have made progress in protein learning tasks. Data availability is a constraint for success in scientific discovery tasks. Deep learning models pretrained on millions of protein sequences have shown promise. Representation Learning via Dictionary Learning (R2DL) is an end-to-end representation learning framework. R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences....

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Abstract:
Introducing a language modeling approach for text to speech synthesis
Training a neural codec language model using discrete codes
Regarding TTS as a conditional language modeling task
Pre-training stage scales up TTS training data to 60K hours of English speech
VALL-E can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording
VALL-E outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity
VALL-E preserves the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis

Paper Content

Introduction:
Speech synthesis has advanced through neural networks and end-to-end modeling in the last decade
Current TTS systems use a pipeline with an acoustic model and a vocoder using mel spectrograms as intermediate representations
High-quality clean data from recording studios is needed for advanced TTS systems
Existing work leverages speaker adaptation and speaker encoding methods to tackle the zero-shot TTS problem
Text language models have improved notably in recent years as their training data has grown
VALL-E is the first language model based TTS framework leveraging large, diverse, and multi-speaker speech data
VALL-E generates acoustic tokens conditioned on the acoustic tokens of the 3-second enrolled recording and the phoneme prompt
VALL-E is trained with LibriLight, a corpus consisting of 60K hours of English speech with over 7000 unique speakers
VALL-E significantly outperforms the state-of-the-art zero-shot TTS system on LibriSpeech and VCTK
VALL-E is able to provide diverse outputs with the same input text and keep the acoustic environment and speaker’s emotion of the acoustic prompt

Related work:
Cascaded TTS systems use a pipeline with an acoustic model and a vocoder
End-to-end TTS models jointly optimize the acoustic model and vocoder
Zero-shot multi-speaker TTS techniques are used to customize a TTS system to an arbitrary voice
Speaker adaptation and speaker encoding approaches are used
Advanced speaker embedding models can be employed
An advanced but complex speaker encoder can be designed
Diffusion model based TTS is extended to zero-shot TTS
Audio codec codes are used as intermediate representations
Self-supervised learning is used in speech understanding and speech-to-speech generation
HuBERT codes, VQVAE codes, and a speaker encoder are combined
Audio codecs are used to synthesize speech without training a vocoder
µ-law transformation is used to quantize audio
Vector quantization is used for feature extraction
Neural codec models are used to encode waveform into discrete acoustic codes

Vall-e 4....
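
The related-work notes above mention that a µ-law transformation is used to quantize audio. As a rough illustration only, here is a minimal sketch of standard µ-law companding in Python. It assumes samples normalized to [-1, 1] and µ = 255 (256 levels); the function names are hypothetical, and this is not taken from the VALL-E paper or any codec implementation.

```python
import math

MU = 255  # assumed companding parameter (gives 256 quantization levels)


def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer code in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range samples
    # Logarithmic compression: small amplitudes get finer resolution.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest level.
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(q, mu=MU):
    """Map an integer code in [0, mu] back to an approximate sample."""
    y = 2.0 * q / mu - 1.0  # back to [-1, 1]
    # Inverse of the compression: (1 + mu)^|y| - 1, scaled by 1 / mu.
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


sample = 0.5
code = mu_law_encode(sample)
restored = mu_law_decode(code)
print(code, round(restored, 4))
```

The round trip is lossy: the decoded value is close to the input but not identical, with the error growing for larger amplitudes.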

DepthP+P: Metric Accurate Monocular Depth Estimation using Planar and Parallax

Abstract: Current self-supervised monocular depth estimation methods rely on estimating a rigid-body motion representing camera motion. These methods suffer from the scale ambiguity problem. DepthP+P is a method that learns to estimate outputs in metric scale. DepthP+P aligns two frames using a common ground plane to remove the effect of the rotation component....

Semantic match: Debugging feature attribution methods in XAI for healthcare

Abstract: AI tools for healthcare have sparked debate around adoption of the technology. Explainable AI (XAI) is seen as a way to make AI devices more transparent and trustworthy. Some have expressed concerns about the reliability of XAI techniques, particularly feature attribution methods. Feature importance can be used reliably when low-level features come with clear semantics, as in tabular data such as Electronic Health Records (EHRs)....

Explain to Me: Towards Understanding Privacy Decisions

Abstract: Privacy assistants help users manage their online privacy. Tasks include detecting privacy violations and recommending sharing actions. It is important for privacy assistants to explain their decisions to users. This paper develops a methodology to create explanations of privacy. The methodology is based on identifying topics, providing explanation schemes, and generating them automatically....

InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval

Abstract:
InPars introduced a method to use LLMs in information retrieval tasks
InPars-v2 uses open-source LLMs and existing rerankers to generate synthetic query-document pairs
A BM25 retrieval pipeline followed by a monoT5 reranker finetuned on InPars-v2 data achieves new state-of-the-art results on the BEIR benchmark
Code, synthetic data, and finetuned models are open sourced

Paper Content

Introduction and background:
Data augmentation is a tool to improve AI models when there is not enough in-domain training data
Previous work used LLMs to generate synthetic training data for information retrieval models
Bonifacio et al....
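
The abstract above refers to a BM25 retrieval pipeline as the first stage before reranking. For readers unfamiliar with BM25, the following is a minimal sketch of the scoring function in plain Python. It is not the authors' implementation: the `bm25_scores` helper, the toy documents, the whitespace tokenization, the parameter values `k1=1.2` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document for a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF (assumed variant; always non-negative).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical), tokenized by whitespace.
docs = [
    "dense retrieval with synthetic queries".split(),
    "monocular depth estimation from video".split(),
    "synthetic queries for training rerankers".split(),
]
scores = bm25_scores("synthetic queries".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

In a two-stage pipeline of the kind the abstract describes, the top-ranked documents from this first stage would then be passed to a reranker for rescoring.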