We discuss the strengths and constraints of the method and compare it to computational methods such as machine learning approaches. An accompanying command-line tool to compute these polynomials is provided. We show proof of concept of this approach and demonstrate its potential application to other biological systems.

Estimation of statistical associations in microbial genomic survey count data is fundamental to microbiome research. Experimental limitations, including count compositionality, low sample sizes and technical variability, obstruct standard application of association measures and require data normalization prior to statistical estimation. Here, we investigate the interplay between data normalization, microbial association estimation and available sample size by leveraging the large-scale American Gut Project (AGP) survey data. We analyze the statistical properties of two prominent linear association estimators, correlation and proportionality, under different sample scenarios and data normalization schemes, including RNA-seq analysis workflows and log-ratio transformations. We show that shrinkage estimation, a standard statistical regularization technique, can universally improve the quality of taxon-taxon association estimates for microbiome data. We find that large-scale association patterns in the AGP data can be grouped into five normalization-dependent classes. Using microbial association network construction and clustering as downstream data analysis examples, we show that variance-stabilizing and log-ratio approaches enable the most taxonomically and structurally coherent estimates. Taken together, the findings from our reproducible analysis workflow have important implications for microbiome studies in multiple stages of analysis, particularly when only small sample sizes are available.

In eukaryotes, the 5'-3' co-translational degradation machinery follows the last translating ribosome, providing an in vivo footprint of its position.
Thus, 5' monophosphorylated (5'P) degradome sequencing, in addition to informing about RNA decay, also provides information regarding ribosome dynamics. Multiple experimental methods have been developed to investigate the mRNA degradome; however, computational tools for their reproducible analysis are lacking. Here, we present fivepseq, an easy-to-use application for analysis and interactive visualization of 5'P degradome data. This tool performs both metagene- and gene-specific analysis, and enables easy investigation of codon-specific ribosome pauses. To demonstrate its ability to provide new biological information, we investigate gene-specific ribosome pauses in Saccharomyces cerevisiae after eIF5A depletion. In addition to identifying pauses at expected codon motifs, we identify multiple genes with strain-specific degradation frameshifts. To show its wide applicability, we investigate the 5'P degradome from Arabidopsis thaliana and discover both motif-specific ribosome protection associated with particular developmental stages and generally increased ribosome protection at termination level associated with age. Our work shows how the use of improved analysis tools for the study of the 5'P degradome can significantly increase the biological information that can be derived from such datasets and facilitate its reproducible analysis.

Fungal secondary metabolites (SMs) are an important source of numerous bioactive compounds largely applied in the pharmaceutical industry, as in the production of antibiotics and anticancer medications. The discovery of novel fungal SMs can potentially benefit human health. Identifying biosynthetic gene clusters (BGCs) involved in the biosynthesis of SMs can be a costly and complex task, especially due to the genomic diversity of fungal BGCs. Previous studies on fungal BGC discovery have a limited scope, which can restrict the discovery of new BGCs. In this work, we introduce TOUCAN, a supervised learning framework for fungal BGC discovery.
Unlike previous methods, TOUCAN is capable of predicting BGCs on amino acid sequences, facilitating its use on newly sequenced and not yet curated data. It relies on three main pillars: rigorous selection of datasets by BGC experts; combination of functional, evolutionary and compositional features coupled with outperforming classifiers; and robust post-processing methods. TOUCAN's best-performing model yields an F-measure of 0.982 on BGC regions in the Aspergillus niger genome. Overall results show that TOUCAN outperforms previous approaches. TOUCAN focuses on fungal BGCs but can be easily adapted to expand its scope to process other species or include new features.

Pancreatic islet β-cell failure is key to the onset and progression of type 2 diabetes (T2D). The advent of single-cell RNA sequencing (scRNA-seq) has opened the possibility to determine transcriptional signatures specifically relevant for T2D at the β-cell level. Yet, applications of this technique have been underwhelming, as three independent studies failed to show shared differentially expressed genes in T2D β-cells. We performed an integrative analysis of the available datasets from these studies to overcome confounding sources of variability and better highlight common T2D β-cell transcriptomic signatures. After removing low-quality transcriptomes, we retained 3046 single cells expressing 27 931 genes. Cells were integrated to attenuate dataset-specific biases, and clustered into cell type groups. In T2D β-cells (n = 801), we found 210 upregulated and 16 downregulated genes, identifying key pathways for T2D pathogenesis, including defective insulin secretion, SREBP signaling and oxidative stress. We also compared these results with previous data of human T2D β-cells from laser capture microdissection and diabetic rat islets, revealing shared β-cell genes.
Overall, the present study encourages the pursuit of single β-cell RNA-seq analysis, preventing presently identified sources of variability, to identify transcriptomic changes associated with human T2D and underscores specific traits of dysfunctional β-cells across different models and techniques.

DNA methylation is a stable epigenetic modification, extremely polymorphic and driven by stochastic and deterministic events. Most of the current techniques used to analyse methylated sequences identify methylated cytosines (mCpGs) at a single-nucleotide level and compute the average methylation of CpGs in the population of molecules. Stable epialleles, i.e. CpG strings with the same DNA sequence containing a discrete linear succession of phased methylated/non-methylated CpGs in the same DNA molecule, cannot be identified due to the heterogeneity of the 5'-3' ends of the molecules. Moreover, these are diluted by random unstable methylated CpGs and escape detection. We present here MethCoresProfiler, an R-based tool that provides a simple method to extract and identify combinations of methylated phased CpGs shared by all components of epiallele families in complex DNA populations. The methylated cores are stable over time, evolve by acquiring or losing new methyl sites and, ultimately, display high information content and low stochasticity.
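
The contrast drawn above between per-CpG average methylation and phased epialleles can be made concrete with a small Python sketch on toy data. This is not MethCoresProfiler's algorithm. The function names, the encoding of each molecule as a string of "1" (methylated) and "0" (unmethylated) CpGs, and the simplified notion of a "shared core" (positions methylated in every methylated read) are all assumptions made for illustration.

```python
from collections import Counter


def average_methylation(reads):
    """Per-CpG fraction of methylated molecules (the bulk-average view)."""
    n = len(reads)
    return [sum(r[i] == "1" for r in reads) / n for i in range(len(reads[0]))]


def epiallele_counts(reads):
    """Count each phased methylation string, one string per molecule."""
    return Counter(reads)


def shared_core(reads):
    """Hypothetical 'core': CpG positions methylated in every given read."""
    return [i for i in range(len(reads[0])) if all(r[i] == "1" for r in reads)]


# Toy population: 4 CpG sites per molecule, all reads of equal length.
reads = ["1010", "1010", "1110", "1011", "0000", "0000"]

# Restrict the core search to molecules carrying at least one methyl group.
methylated = [r for r in reads if "1" in r]

avg = average_methylation(reads)   # averages hide which sites co-occur
counts = epiallele_counts(reads)   # phased strings keep that information
core = shared_core(methylated)     # sites shared by all methylated reads

print(avg)
print(counts.most_common())
print(core)
```

In this toy population, the per-site averages alone do not reveal that sites 0 and 2 are always methylated together on the same molecule, whereas the phased strings do.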