arxiv-summary: AI-summarized AI papers

Unlimited-Size Diffusion Restoration

Abstract: Diffusion models are used for zero-shot image restoration. Diffusion models are pre-trained and do not require finetuning. Current methods only discuss how to deal with fixed-size images. This paper focuses on how to use diffusion-based zero-shot IR methods to deal with any size. Mask-Shift Restoration and Hierarchical Restoration are proposed to address local incoherence and out-of-domain issues. Code is available on GitHub.
Paper Content
Introduction: Recent progress in diffusion models has improved Image Restoration tasks. Diffusion-based IR methods can be divided into supervised and zero-shot. Zero-shot methods only need pre-trained off-the-shelf diffusion model. Difficulties in applying zero-shot IR methods to arbitrary output size. Proposed Mask-Shift Restoration (MSR) to solve boundary artifacts. Proposed Hierarchical Restoration (HiR) to address lack of global semantics. MSR and HiR are parameter-free and training-free.
Preliminaries
Diffusion models: Diffusion models have diverse interpretations....

Collage Diffusion

Abstract: Text-conditional diffusion models generate high-quality, diverse images. Users can control image generation by defining a collage. Collage Diffusion modifies text-image cross-attention with the layers’ alpha masks. Collage Diffusion learns specialized text representations per layer. Layer-based controls provide fine-grained control over the final output. Collage Diffusion generates globally harmonized images.
Paper Content
Introduction: Diffusion-based text-conditional image generation can generate plausible images from a text prompt. Recent work provides new ways to specify desired output, such as sketching, segmentation masks, and reference images. This paper seeks to give users precise control over image output when creating scenes with a specific desired spatial arrangement. Users can make a collage of images to express artistic intent. Collage Diffusion generates novel, high-quality images that respect the scene composition and object appearance. Collage input enables per-layer control mechanisms to control the harmonization-fidelity tradeoff.
Problem definition and goals: Goal is to generate high-quality images that respect desired scene composition. User describes desired intent with a collage. Collage consists of a text string and a sequence of layers. Output image should be globally harmonized and have appearance fidelity. Diffusion-based techniques used to constrain spatial layout and appearance of objects.
Related work: Traditional graphics techniques can be used to flatten collage layers into a single image. Diffusion-based image harmonization techniques can be used to improve the visual quality of the image. Adding noise to the image can lead to a loss of spatial and appearance fidelity. Collage Diffusion seeks to better maintain the spatial and appearance fidelity of the initial collage. Existing techniques define spatial layouts in terms of segmentation maps. Collage Diffusion uses a collage as an intuitive way to specify spatial composition. Collage Diffusion aims to preserve visual characteristics of the input layers. Collage Diffusion can be framed as a constrained form of image stylization. Image-to-image approaches struggle to constrain scene composition for collage-conditional generation. Layered image and video editing is a well-established technique in traditional computer graphics.
Collage diffusion: Text-conditioned diffusion models can be used to perform image harmonization....
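
The collage input described above (a text string plus a sequence of layers) and the traditional flattening step mentioned under related work can be sketched as follows. This is a minimal illustration and not the paper's code: the `Layer` and `Collage` names, their fields, and the grayscale nested-list image representation are assumptions made for the example. Flattening uses the standard alpha "over" operator, which is the baseline that Collage Diffusion aims to improve on, not the method itself.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]  # grayscale pixels in [0, 1]; an assumption for brevity


@dataclass
class Layer:
    """One collage layer (hypothetical fields): pixels, alpha mask, text label."""
    pixels: Image
    alpha: Image  # 1.0 = fully opaque, 0.0 = fully transparent
    text: str


@dataclass
class Collage:
    """A text string plus a back-to-front sequence of layers."""
    prompt: str
    layers: List[Layer]


def flatten(collage: Collage) -> Image:
    """Composite the layers back to front with the alpha 'over' operator."""
    first = collage.layers[0]
    height, width = len(first.pixels), len(first.pixels[0])
    canvas = [[0.0] * width for _ in range(height)]
    for layer in collage.layers:
        for y in range(height):
            for x in range(width):
                a = layer.alpha[y][x]
                canvas[y][x] = a * layer.pixels[y][x] + (1.0 - a) * canvas[y][x]
    return canvas


background = Layer(pixels=[[0.2, 0.2]], alpha=[[1.0, 1.0]], text="a table")
foreground = Layer(pixels=[[0.8, 0.8]], alpha=[[1.0, 0.0]], text="a bowl")
flat = flatten(Collage(prompt="a bowl on a table", layers=[background, foreground]))
```

In this toy case the foreground covers only the first pixel, so the second pixel keeps the background value.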

Almanac: Knowledge-Grounded Language Models for Clinical Medicine

Abstract: Large-language models have been used in natural language tasks. Adoption of these models in real-world settings has been limited due to incorrect and toxic statements. This paper explores the ability of large-language models to streamline medical guidelines and recommendation referencing. Improved factual grounding, helpfulness, and safety are demonstrated in clinical scenarios.
Paper Content
Introduction: Language model pre-training is a powerful training paradigm in NLP. Performance improvements have been observed to scale with model and dataset size. LLMs can be prone to generating factually invalid statements. LLMs can reproduce social biases and generate statements reinforcing stereotypes. Different ways of steering LLM outputs to align with user intent have been explored. LLMs have been used in transformative applications. LLMs are at risk of adversarial attacks. Almanac is a framework to explore the role of LLMs in the clinical workflow.
Related work: Pre-training transformers on scientific and biomedical corpora has improved performance on biomedical tasks. Smaller domain-specific language models can be beneficial even with limited data. Large language models are prone to hallucinations and biases. Leveraging language models for language understanding and modeling capabilities can improve accuracy of question answering. External tools can be used to retrieve knowledge and improve clinically useful tasks.
Results: Almanac outperforms its counterpart in factuality and has evenly matched performances in completeness and safety. ChatGPT struggles to provide references: 1 correct, 15 invalid, and 4 incorrect. Almanac is able to correctly reference various sources for fact-checking.
Dataset: Task of medical question answering is evaluated. Existing datasets are not sufficient for capturing clinical scenarios. ClinicalQA is a novel benchmark of clinical questions. Summary statistics and samples are provided in tables. ClinicalQA can serve as a benchmark for LM-based clinical decision-making support systems. Future work will expand dataset to include more varied examples and multi-modal inputs.
Architecture: Almanac consists of many components working together to achieve accurate document retrieval, reasoning, and question-answering....

Methods and measures for investigating microscale motility

Abstract: Motility is important for survival and diversification. Novel technologies, analytical frameworks and theoretical methods can help us understand microscale motility. Overview of experimental, analytical and mathematical methods used to study microscale motility. Identify transferable techniques, challenges and future directions in the field.
Paper Content
Introduction: Motility is important for organisms to find resources, evade predators, and locate suitable habitats....

EvoPrompting: Language Models for Code-Level Neural Architecture Search

Abstract: Language models are used as mutation and crossover operators for an evolutionary neural architecture search algorithm. Combination of evolutionary prompt engineering and soft prompt-tuning (EvoPrompting) produces diverse and high-performing models. EvoPrompting is successful at designing accurate and efficient neural network architectures across a variety of machine learning tasks.
Paper Content
Introduction: Scaling of Transformers has produced language models with impressive performance. Language models can learn how to code, do math, and solve reasoning problems. Language models remain limited in solving complex problems and creating novel solutions. EvoPrompting improves ability to propose novel and diverse solutions to complex reasoning problems. EvoPrompting uses evolutionary search to create and curate data to improve LM in-context prompting examples. Few-shot prompting with EvoPrompting enables LMs to create architectures that outperform those designed by human experts. EvoPrompting discovers novel graph neural network architectures that outperform current state-of-the-art.
Related work: Transformer models are popular for natural language systems. Transformer models can be used to write code, do math, and solve reasoning problems. Brown et al....

Monocular Depth Estimation using Diffusion Models

Abstract: Monocular depth estimation is formulated using denoising diffusion models. Innovations are introduced to address problems arising from noisy, incomplete depth maps. Pre-training is leveraged to cope with limited availability of data. DepthGen model achieves SOTA performance on the indoor NYU dataset and near SOTA results on the outdoor KITTI dataset. DepthGen naturally represents depth ambiguity and has zero-shot performance combined with depth imputation....

Membership Inference Attack for Beluga Whales Discrimination

Abstract: Main challenge in animal ecology is to re-identify and discriminate between known and unknown individuals. Novel approach proposed to tackle this challenge is based on Membership Inference Attacks (MIAs). Experiments conducted on three benchmark datasets, two neural network architectures and three MIAs. Ensemble MIA strategy designed to increase attack accuracy and reduce false positive rate. Research on privacy attacks can be used to address practical challenges in animal ecology.
Paper Content
Introduction: Animal ecology research requires re-identifying individual animals. Re-ID is difficult and often requires extensive training and experience. Tagging and photo-ID are used to re-ID animals. Tagging is intrusive and expensive; photo-ID is non-invasive and cheaper. Photo-ID has practical and methodological challenges. Computer vision techniques are used to standardize and automate the process. Feature engineering is used to select or transform raw data into informative features. Feature engineering requires programming experience and familiarity with the species. Deep learning systems use large data volumes to automatically learn discriminative features. Convolutional Neural Networks (CNNs) have achieved state-of-the-art results. CNNs lack robustness when deployed in real-world applications. Membership Inference Attacks (MIAs) can be used to discriminate between known and unknown individuals. This paper proposes a novel approach for whale discrimination through images. Experiments have been conducted with three state-of-the-art MIAs. Ensemble MIA combines the outputs of different MIAs to increase accuracy and decrease false positive rate.
Related work: Review of related work on re-identification and discrimination of marine mammals. Background on MIAs.
Automated photo identification of marine mammals: Research on individual identification of cetaceans using natural markings began in the 1970s. Notches in the dorsal fin or fluke are used for identification. Image metadata (annotations describing the animal characteristics) can be used to reduce the number of possible matches. Machine learning techniques (neural networks, Bayesian classifiers, decision trees and k-nearest neighbors) were applied on 869 pictures of 223 Commerson’s dolphins. Decision tree classifier correctly identified 90% of the individuals. SPIR system developed to study presence of Risso’s dolphins. Preprocessing of input image to extract dorsal fin segmentation. Feature extraction using Speeded Up Robust Feature (SURF) and Scale-Invariant Feature Transform (SIFT). NNPool proposed to automatically discriminate unknown vs....
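
The ensemble idea from the introduction of this summary (combining the outputs of several MIAs to raise accuracy and lower the false positive rate) can be illustrated with a small voting sketch. The summary does not say how the outputs are combined, so the unanimity rule below is an assumption, as are all function names. The two threshold attacks are simple stand-ins chosen for illustration, not the three MIAs evaluated in the paper.

```python
import math
from typing import Callable, Optional, Sequence

# An attack looks at the target model's class-probability vector for one image
# and returns True when it believes the image shows a known (training) individual.
Attack = Callable[[Sequence[float]], bool]


def max_confidence_attack(threshold: float) -> Attack:
    """Flag as known when the top class probability is high (illustrative attack)."""
    return lambda probs: max(probs) >= threshold


def entropy_attack(threshold: float) -> Attack:
    """Flag as known when the prediction entropy is low (illustrative attack)."""
    def attack(probs: Sequence[float]) -> bool:
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        return entropy <= threshold
    return attack


def ensemble_is_known(attacks: Sequence[Attack],
                      probs: Sequence[float],
                      min_votes: Optional[int] = None) -> bool:
    """Combine attack outputs by voting; the default requires unanimity (assumed rule)."""
    votes = sum(1 for attack in attacks if attack(probs))
    required = len(attacks) if min_votes is None else min_votes
    return votes >= required


attacks = [max_confidence_attack(0.9), entropy_attack(0.3)]
confident = [0.97, 0.02, 0.01]      # both attacks vote "known"
borderline = [0.91, 0.045, 0.045]   # only the confidence attack votes "known"
```

Requiring every attack to agree rejects the borderline prediction, which is the kind of trade that lowers false positives; passing `min_votes=1` would accept it instead.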

Learning Hidden Markov Models Using Conditional Samples

Abstract: This paper examines the computational complexity of learning a Hidden Markov Model (HMM). It proposes an interactive access model, in which the algorithm can query for samples from the conditional distributions of the HMMs. This model enables computationally efficient learning algorithms, bypassing cryptographic hardness. Algorithms are presented for two settings: one with query access to exact conditional probabilities, and one with samples from the conditional distributions....
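
To make the interactive access model concrete, the sketch below implements a toy oracle for a known HMM. It is only an illustration of what a learner would be allowed to query, not the paper's learning algorithm. The class name, method names, and example parameters are assumptions. `next_distribution` corresponds to the setting with exact conditional probabilities and `sample_next` to the setting with conditional samples.

```python
import random
from typing import List, Sequence


class ConditionalSampleOracle:
    """Toy HMM that answers conditional next-observation queries (illustrative)."""

    def __init__(self, initial: List[float], transition: List[List[float]],
                 emission: List[List[float]], seed: int = 0) -> None:
        self.initial = initial        # initial[i] = P(h_1 = i)
        self.transition = transition  # transition[i][j] = P(h_{t+1} = j | h_t = i)
        self.emission = emission      # emission[i][o] = P(x_t = o | h_t = i)
        self.rng = random.Random(seed)

    def next_distribution(self, prefix: Sequence[int]) -> List[float]:
        """Exact P(x_{t+1} = o | x_1..x_t = prefix), computed by forward filtering."""
        n = len(self.initial)
        belief = list(self.initial)  # distribution over the current hidden state
        for obs in prefix:
            # Condition the belief on the observed symbol.
            weighted = [belief[i] * self.emission[i][obs] for i in range(n)]
            total = sum(weighted)
            if total == 0.0:
                raise ValueError("prefix has probability zero under this HMM")
            posterior = [w / total for w in weighted]
            # Advance the hidden state by one step.
            belief = [sum(posterior[i] * self.transition[i][j] for i in range(n))
                      for j in range(n)]
        n_obs = len(self.emission[0])
        return [sum(belief[i] * self.emission[i][o] for i in range(n))
                for o in range(n_obs)]

    def sample_next(self, prefix: Sequence[int]) -> int:
        """Draw one sample of the next observation given the prefix."""
        dist = self.next_distribution(prefix)
        return self.rng.choices(range(len(dist)), weights=dist, k=1)[0]


oracle = ConditionalSampleOracle(
    initial=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.2, 0.8]],
    emission=[[0.7, 0.3], [0.1, 0.9]],
)
exact = oracle.next_distribution([0])   # exact-probability query
sample = oracle.sample_next([0, 1])     # conditional-sample query
```

A learner in this model would choose the prefixes adaptively, which is what distinguishes it from learning from passively drawn sequences.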

Is Japanese CCGBank empirically correct? A case study of passive and causative constructions

Abstract: Japanese CCGBank is used to develop Japanese CCG parsers. Linguistic validity of Japanese CCGBank needs to be verified. This paper focuses on analysis of passive/causative constructions in Japanese CCGBank. Together with ccg2lambda, Japanese CCGBank yields wrong predictions for nested passives and causatives.
Paper Content
Introduction: Process of generating wide-coverage syntactic parsers from treebanks established in 1990s. Belief at the time that formal syntactic theories too inflexible to describe real texts. Theoretical development of formal grammars and emergence of linguistically-oriented treebanks dispelled misconception. Combinatory Categorial Grammar (CCG) and CCGbank gave rise to CCG parsers. Research on Japanese syntax and parsers impacted by CCG. Japanese CCGbank generated from Kyoto Corpus by automatic conversion. Syntactic structures of CCG have more elaborated information than CFG. CCGBank serves as training and evaluation data for CCG parsers. Research from perspective of formal syntax conducted regarding adequacy of syntactic structures in treebanks. This paper assesses syntactic structures exhibited by Japanese CCGbank from viewpoint of theoretical linguistics.
Passive and causative constructions in Japanese: Japanese passives and causatives are described in a standard Japanese CCG. Ga-marked noun phrases in passive sentences correspond to ni-marked or o-marked noun phrases in active sentences. Syntactic structures of left-side sentences of (1) and (2) are shown in Figure 1. Semantic representations of words are defined using event semantics. Passive and causative suffixes know the argument structure of their first argument. NP_ga corresponds to NP_o or NP_ni in passive constructions, and NP_ni|o corresponds to NP_ga in causative constructions. Validity of analysis can be verified by inference data on various constructions including passives and causatives.
ccg2lambda and the S\S analysis: Analysis of Japanese CCGBank relies on two CCG parsers. Lexical assignments for left-side sentences of (1) and (2) are shown in Figure 2. Semantic representation of two-place predicate homera is given. Relations between Agent and Theme are relativized by higher-order variables. Semantic representation of right-side of (1) is a standard neo-Davidsonian representation. Semantic representation of hasira-se is obtained using a semantic template. Semantic representation of hasira-sera-re is obtained by applying (13) to (17). Error occurs because passive suffix assumes first argument is given Theme and second argument is given Agent.
Conclusion: Syntactic analysis of Japanese CCGBank produces false predictions for passive and causative nesting. Standard analysis correctly explains all inferences. Burden of proof is on CCGBank side. Need for outreach to linguistic community to keep treebanks and parsers sound.

In-Context Instruction Learning

Abstract: LLMs enable zero-shot task generalization. Instruction learning has been approached as a fine-tuning problem. In-Context Instruction Learning (ICIL) improves zero-shot task generalization. ICIL uses a single fixed prompt to evaluate all tasks.
Paper Content
Introduction: LLMs can adapt to target tasks during inference. LLMs have emergent capabilities, including the ability to generalize to unseen tasks by following instructions. Instruction learning methods have been proposed to improve this ability. In-Context Instruction Learning (ICIL) involves learning to follow instructions during inference. ICIL uses a prompt that consists of multiple cross-task demonstrations. ICIL is a zero-shot learning method. ICIL significantly enhances the zero-shot task generalization performance of various pretrained LLMs. ICIL improves the zero-shot instruction-following ability of LLMs. LLMs learn the correspondence between the answer choice included in the instruction and output of each demonstration during inference.
In-context instruction learning: ICIL consists of cross-task demonstrations. Demonstrations are a concatenation of instruction, input, and output instance. Fixed demonstration set is constructed to evaluate various tasks in a zero-shot manner. Advantages of applying ICIL during inference of LLMs mentioned.
Demonstration set construction: Filter tasks using heuristics. Sample K tasks from N tasks. Heuristics include task type, answer choice overlap, demonstration length, and demonstration ordering.
In-context instruction learning during inference: ICIL uses a single fixed prompt to adapt to different tasks. ICIL improves zero-shot task generalization performance for various LLMs. ICIL also assists LLMs for zero-shot generalization after instruction tuning or RLHF. Model-generated demonstration set is effective for ICIL.
Experiments
Experimental setup: Constructed demonstrations for ICIL from English training tasks of SUPER-NATURALINSTRUCTIONS (SUPERNI) benchmark. Used held-out tasks from SUPERNI for testing, consisting of 119 tasks across 12 different categories. Selected SUPERNI as evaluation benchmark because it offers diverse set of tasks with varying levels of complexity. Evaluated 4 LLMs with various model sizes, including GPT-3, OPT, GPT-NeoX, and GPT-J.
Results: Pretrained LLMs benefit from ICIL. ICIL increases performance of pretrained LLMs by over 50%. LLMs with ICIL outperform LLMs with many more parameters. ICIL gain is comparable to instruction tuning. ICIL improves performance of LLMs fine-tuned through instruction tuning or RLHF. Irrelevant ICIL does not harm performance much.
Analysis: ICIL significantly improves the zero-shot task generalization performance of both pretrained and instruction-fine-tuned LLMs. Constructing the demonstration set with classification tasks is important for ICIL. LLMs learn the correspondence between answer choice in the instruction and the label of the demonstrations during ICIL. ICIL reinforces the correspondence between the instruction and the label of the demonstrations during inference. ICIL does not require any backpropagation and uses the pretrained model checkpoint without any gradient update. Increasing the number of demonstrations improves the performance. Ordering the demonstrations by the number of answer choices reduces the variance. Answer choice overlap between demonstrations harms the performance. ICIL is effective for machine-generated demonstrations. Performance of ICIL is comparable to adaptive in-context learning methods. There is still a large gap between ICIL and few-shot in-context learning.
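
The prompt construction summarized above (filter tasks, sample K demonstrations, concatenate instruction, input, and output, then reuse one fixed prompt for every target task) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Demonstration` fields, the "Instruction / Input / Output" text template, the classification-only filter, and the ascending sort direction are all illustrative choices.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Demonstration:
    """One cross-task demonstration (field names are illustrative)."""
    instruction: str
    task_input: str
    output: str
    num_choices: int         # answer choices listed in the instruction
    is_classification: bool  # used for the task-type heuristic

    def render(self) -> str:
        # Concatenate instruction, input, and output instance.
        return (f"Instruction: {self.instruction}\n"
                f"Input: {self.task_input}\n"
                f"Output: {self.output}")


def build_fixed_prompt(pool: List[Demonstration], k: int, seed: int = 0) -> str:
    """Filter the pool, sample K demonstrations, order them, and join them."""
    candidates = [d for d in pool if d.is_classification]
    chosen = random.Random(seed).sample(candidates, k)
    # Order by number of answer choices (ascending direction is an assumption).
    chosen.sort(key=lambda d: d.num_choices)
    return "\n\n".join(d.render() for d in chosen)


def make_query(fixed_prompt: str, instruction: str, task_input: str) -> str:
    """Prepend the same fixed prompt to any target task; the LLM completes 'Output:'."""
    return f"{fixed_prompt}\n\nInstruction: {instruction}\nInput: {task_input}\nOutput:"


pool = [
    Demonstration("Is the review positive or negative?", "Great film.", "positive", 2, True),
    Demonstration("Label the premise pair as entailment, neutral, or contradiction.",
                  "A dog runs. An animal moves.", "entailment", 3, True),
    Demonstration("Summarize the passage.", "Long text ...", "Short text.", 0, False),
]
fixed_prompt = build_fixed_prompt(pool, k=2)
query = make_query(fixed_prompt, "Is the sentence grammatical? Answer yes or no.", "He go home.")
```

Because `fixed_prompt` is built once and never depends on the target task, the same string is reused for every evaluation task, and no gradient update is involved.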