BH Academic Directory

Geny: a genotyping tool for allelic decomposition of killer cell immunoglobulin-like receptor genes

23. 12. 2024.

0

Qinghui Zhou, Mazyar Ghezelji, Ananth Hari, M. K. Ford, Connor Holley, S. C. Sahinalp, Ibrahim Numanagić

Introduction: Accurate genotyping of Killer cell Immunoglobulin-like Receptor (KIR) genes plays a pivotal role in enhancing our understanding of innate immune responses, disease correlations, and the advancement of personalized medicine. However, due to the high variability of the KIR region and high level of sequence similarity among different KIR genes, the generic genotyping workflows are unable to accurately infer copy numbers and complete genotypes of individual KIR genes from next-generation sequencing data. Thus, specialized genotyping tools are needed to genotype this complex region. Methods: Here, we introduce Geny, a new computational tool for precise genotyping of KIR genes. Geny utilizes available KIR allele databases and proposes a novel combination of expectation-maximization filtering schemes and integer linear programming-based combinatorial optimization models to resolve ambiguous reads, provide accurate copy number estimation, and estimate the correct allele of each copy of genes within the KIR region. Results & Discussion: We evaluated Geny on a large set of simulated short-read datasets covering the known validated KIR region assemblies and a set of Illumina short-read samples sequenced from 40 validated samples from the Human Pangenome Reference Consortium collection and showed that it outperforms the existing state-of-the-art KIR genotyping tools in terms of accuracy, precision, and recall. We envision Geny becoming a valuable resource for understanding immune system response and consequently advancing the field of patient-centric medicine.

Biologically-informed Killer cell immunoglobulin-like receptor (KIR) gene annotation tool

21. 10. 2024.

1

M. K. Ford, Ananth Hari, Qinghui Zhou, Ibrahim Numanagić, S. Cenk Sahinalp

Bioinformatics

Summary: Natural killer (NK) cells are essential components of the innate immune system, with their activity significantly regulated by Killer cell Immunoglobulin-like Receptors (KIRs). The diversity and structural complexity of KIR genes present significant challenges for accurate genotyping, essential for understanding NK cell functions and their implications in health and disease. Traditional genotyping methods struggle with the variable nature of KIR genes, leading to inaccuracies that can impede immunogenetic research. These challenges extend to high-quality phased assemblies, which have been recently popularized by the Human Pangenome Consortium. This paper introduces BAKIR (Biologically-informed Annotator for KIR locus), a tailored computational tool designed to overcome the challenges of KIR genotyping and annotation on high-quality, phased genome assemblies. BAKIR aims to enhance the accuracy of KIR gene annotations by structuring its annotation pipeline around identifying key functional mutations, thereby improving the identification and subsequent relevance of gene and allele calls. It uses a multi-stage mapping, alignment, and variant calling process to ensure high-precision gene and allele identification, while also maintaining high recall for sequences that are significantly mutated or truncated relative to the known allele database. BAKIR has been evaluated on a subset of the HPRC assemblies, where BAKIR was able to improve many of the associated annotations and call novel variants. BAKIR is freely available on GitHub, offering ease of access and use through multiple installation methods, including pip, conda, and singularity container, and is equipped with a user-friendly command-line interface, thereby promoting its adoption in the scientific community. Availability and implementation: BAKIR is available at github.com/algo-cancer/bakir. Supplementary information: Supplementary data are available at Bioinformatics online.

Diagnostics of viral infections using high-throughput genome sequencing data

23. 9. 2024.

5

Haochen Ning, Ian Boyes, Ibrahim Numanagić, Michael Rott, Li Xing, Xuekui Zhang

Briefings Bioinform.

Plant viral infections cause significant economic losses, totalling $350 billion USD in 2021. With no treatment for virus-infected plants, accurate and efficient diagnosis is crucial to preventing and controlling these diseases. High-throughput sequencing (HTS) enables cost-efficient identification of known and unknown viruses. However, existing diagnostic pipelines face challenges. First, many methods depend on subjectively chosen parameter values, undermining their robustness across various data sources. Second, artifacts (e.g. false peaks) in the mapped sequence data can lead to incorrect diagnostic results. While some methods require manual or subjective verification to address these artifacts, others overlook them entirely, affecting the overall method performance and leading to imprecise or labour-intensive outcomes. To address these challenges, we introduce IIMI, a new automated analysis pipeline using machine learning to diagnose infections from 1583 plant viruses with HTS data. It adopts a data-driven approach for parameter selection, reducing subjectivity, and automatically filters out regions affected by artifacts, thus improving accuracy. Testing with in-house and published data shows IIMI’s superiority over existing methods. Besides a prediction model, IIMI also provides resources on plant virus genomes, including annotations of regions prone to artifacts. The method is available as an R package (iimi) on CRAN and will integrate with the web application www.virtool.ca, enhancing accessibility and user convenience.

Biologically-informed Killer cell immunoglobulin-like receptor (KIR) gene annotation tool

22. 8. 2024.

1

M. K. Ford, Ananth Hari, Qinghui Zhou, Ibrahim Numanagić, S. C. Sahinalp

bioRxiv

Natural killer (NK) cells are essential components of the innate immune system, with their activity significantly regulated by Killer cell Immunoglobulin-like Receptors (KIRs). The diversity and structural complexity of KIR genes present significant challenges for accurate genotyping, essential for understanding NK cell functions and their implications in health and disease. Traditional genotyping methods struggle with the variable nature of KIR genes, leading to inaccuracies that can impede immunogenetic research. These challenges extend to high-quality phased assemblies, which have been recently popularized by the Human Pangenome Consortium. This paper introduces BAKIR (Biologically-informed Annotator for KIR locus), a tailored computational tool designed to overcome the challenges of KIR genotyping and annotation on high-quality, phased genome assemblies. BAKIR aims to enhance the accuracy of KIR gene annotations by structuring its annotation pipeline around identifying key functional mutations, thereby improving the identification and subsequent relevance of gene and allele calls. It uses a multi-stage mapping, alignment, and variant calling process to ensure high-precision gene and allele identification, while also maintaining high recall for sequences that are significantly mutated or truncated relative to the known allele database. BAKIR has been evaluated on a subset of the HPRC assemblies, where BAKIR was able to improve many of the associated annotations and call novel variants. BAKIR is freely available on GitHub, offering ease of access and use through multiple installation methods, including pip, conda, and singularity container, and is equipped with a user-friendly command-line interface, thereby promoting its adoption in the scientific community.

Computational pharmacogenotype extraction from clinical next-generation sequencing

4. 7. 2023.

9

Tyler Shugg, Reynold C. Ly, Wilberforce Osei, Elizabeth J. Rowe, Caitlin A. Granfield, T. Lynnes, Elizabeth B. Medeiros, Jennelle C Hodge, Amy M. Breman et al.

Frontiers in Oncology

Background: Next-generation sequencing (NGS), including whole genome sequencing (WGS) and whole exome sequencing (WES), is increasingly being used for clinical care. While NGS data have the potential to be repurposed to support clinical pharmacogenomics (PGx), current computational approaches have not been widely validated using clinical data. In this study, we assessed the accuracy of the Aldy computational method to extract PGx genotypes from WGS and WES data for 14 and 13 major pharmacogenes, respectively. Methods: Germline DNA was isolated from whole blood samples collected for 264 patients seen at our institutional molecular solid tumor board. DNA was used for panel-based genotyping within our institutional Clinical Laboratory Improvement Amendments- (CLIA-) certified PGx laboratory. DNA was also sent to other CLIA-certified commercial laboratories for clinical WGS or WES. Aldy v3.3 and v4.4 were used to extract PGx genotypes from these NGS data, and results were compared to the panel-based genotyping reference standard that contained 45 star allele-defining variants within CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP3A4, CYP3A5, CYP4F2, DPYD, G6PD, NUDT15, SLCO1B1, TPMT, and VKORC1. Results: Mean WGS read depth was >30x for all variant regions except for G6PD (average read depth was 29 reads), and mean WES read depth was >30x for all variant regions. For 94 patients with WGS, Aldy v3.3 diplotype calls were concordant with those from the genotyping reference standard in 99.5% of cases when excluding diplotypes with additional major star alleles not tested by targeted genotyping, ambiguous phasing, and CYP2D6 hybrid alleles. Aldy v3.3 identified 15 additional clinically actionable star alleles not covered by genotyping within CYP2B6, CYP2C19, DPYD, SLCO1B1, and NUDT15. Within the WGS cohort, Aldy v4.4 diplotype calls were concordant with those from genotyping in 99.7% of cases. When excluding patients with CYP2D6 copy number variation, all Aldy v4.4 diplotype calls except for one CYP3A4 diplotype call were concordant with genotyping for 161 patients in the WES cohort. Conclusion: Aldy v3.3 and v4.4 called diplotypes for major pharmacogenes from clinical WES and WGS data with >99% accuracy. These findings support the use of Aldy to repurpose clinical NGS data to inform clinical PGx.

Codon: A Compiler for High-Performance Pythonic Applications and DSLs

17. 2. 2023.

25

Ariya Shajii, Gabriel Ramirez, Haris Smajlović, Jessica Ray, Bonnie Berger, Saman P. Amarasinghe, Ibrahim Numanagić

International Conference on Compiler Construction

Domain-specific languages (DSLs) are able to provide intuitive high-level abstractions that are easy to work with while attaining better performance than general-purpose languages. Yet, implementing new DSLs is a burdensome task. As a result, new DSLs are usually embedded in general-purpose languages. While low-level languages like C or C++ often provide better performance as a host than high-level languages like Python, high-level languages are becoming more prevalent in many domains due to their ease and flexibility. Here, we present Codon, a domain-extensible compiler and DSL framework for high-performance DSLs with Python's syntax and semantics. Codon builds on previous work on ahead-of-time type checking and compilation of Python programs and leverages a novel intermediate representation to easily incorporate domain-specific optimizations and analyses. We showcase and evaluate several compiler extensions and DSLs for Codon targeting various domains, including bioinformatics, secure multi-party computation, block-based data compression and parallel programming, showing that Codon DSLs can provide benefits of familiar high-level languages and achieve performance typically only seen with low-level languages, thus bridging the gap between performance and usability.

Sequre: a high-performance framework for secure multiparty computation enables biomedical data sharing

11. 1. 2023.

21

Haris Smajlović, Ariya Shajii, Bonnie Berger, Hyunghoon Cho, Ibrahim Numanagić

Genome Biology

Secure multiparty computation (MPC) is a cryptographic tool that allows computation on top of sensitive biomedical data without revealing private information to the involved entities. Here, we introduce Sequre, an easy-to-use, high-performance framework for developing performant MPC applications. Sequre offers a set of automatic compile-time optimizations that significantly improve the performance of MPC applications and incorporates the syntax of the Python programming language to facilitate rapid application development. We demonstrate its usability and performance on various bioinformatics tasks, showing up to 3–4 times increased speed over the existing pipelines with 7-fold reductions in codebase sizes.

An efficient genotyper and star-allele caller for pharmacogenomics

1. 1. 2023.

17

Ananth Hari, Qinghui Zhou, Nina Gonzaludo, J. Harting, S. Scott, X. Qin, S. Scherer, S. C. Sahinalp, Ibrahim Numanagić

Genome Research

High-throughput sequencing provides sufficient means for determining genotypes of clinically important pharmacogenes that can be used to tailor medical decisions to individual patients. However, pharmacogene genotyping, also known as star-allele calling, is a challenging problem that requires accurate copy number calling, structural variation identification, variant calling, and phasing within each pharmacogene copy present in the sample. Here we introduce Aldy 4, a fast and efficient tool for genotyping pharmacogenes that uses combinatorial optimization for accurate star-allele calling across different sequencing technologies. Aldy 4 adds support for long reads and uses a novel phasing model and improved copy number and variant calling models. We compare Aldy 4 against the current state-of-the-art star-allele callers on a large and diverse set of samples and genes sequenced by various sequencing technologies, such as whole-genome and targeted Illumina sequencing, barcoded 10x Genomics, and Pacific Biosciences (PacBio) HiFi. We show that Aldy 4 is the most accurate star-allele caller with near-perfect accuracy in all evaluated contexts, and hope that Aldy remains an invaluable tool in the clinical toolbox even with the advent of long-read sequencing technologies.

Design and performance of a long-read sequencing panel for pharmacogenomics

26. 10. 2022.

0

M. van der Lee, Loes Busscher, R. Menafra, Qinglian Zhai, Redmar R. van den Berg, S. Kingan, Nina Gonzaludo, T. Hon, Ting Han et al.

bioRxiv

Pharmacogenomics (PGx)-guided drug treatment is one of the cornerstones of personalized medicine. However, the genes involved in drug response are highly complex and known to carry many (rare) variants. Current technologies (short-read sequencing and SNP panels) are limited in their ability to resolve these genes and characterize all variants. Moreover, these technologies cannot always phase variants to their allele of origin. Recent advances in long-read sequencing technologies have shown promise in resolving these problems. Here we present a long-read sequencing panel-based approach for PGx using PacBio HiFi sequencing. A capture-based approach was developed using a custom panel of clinically relevant pharmacogenes including up- and downstream regions. A total of 27 samples were sequenced and panel accuracy was determined using benchmarking variant calls for 3 Genome in a Bottle samples and GeT-RM star(*)-allele calls for 21 samples. The coverage was uniform for all samples with an average of 94% of bases covered at >30×. When compared to benchmarking results, accuracy was high with an average F1 score of 0.89 for INDELs and 0.98 for SNPs. Phasing was good with an average of 68% of the target region phased (compared to ~20% for short reads) and an average phased haploblock size of 6.6 kbp. Using Aldy 4, we compared our variant calls to GeT-RM data for 8 genes (CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP3A4, CYP3A5, SLCO1B1, TPMT), and observed highly accurate star(*)-allele calling with 98.2% concordance (165/168 calls), with only one discordance in CYP2C9 leading to a different predicted phenotype. We have shown that our long-read panel-based approach results in high accuracy and target phasing for SNVs as well as for clinical star(*)-alleles.

Aldy 4: An efficient genotyper and star-allele caller for pharmacogenomics

15. 8. 2022.

0

Ananth Hari, Qinghui Zhou, Nina Gonzaludo, J. Harting, Stuart A. Scott, S. C. Sahinalp, Ibrahim Numanagić

bioRxiv

High-throughput sequencing provides sufficient means for determining genotypes of clinically important pharmacogenes that can be used to tailor medical decisions to individual patients. However, pharmacogene genotyping, also known as star-allele calling, is a challenging problem that requires accurate copy number calling, structural variation discovery, variant calling and phasing within each pharmacogene copy present in the sample. Here we introduce Aldy 4, a fast and efficient tool for genotyping pharmacogenes that utilizes combinatorial optimization for accurate star-allele calling across different sequencing technologies. Aldy 4 adds support for long reads and ships with a novel phasing model and improved copy number and variant calling models. We compare Aldy 4 against the current state-of-the-art star-allele callers on a large and diverse set of samples and genes sequenced by various sequencing technologies, such as whole-genome and targeted Illumina sequencing, barcoded 10X Genomics and PacBio HiFi. We show that Aldy 4 is the most accurate star-allele caller with near-perfect accuracy in all evaluated contexts. We hope that Aldy remains an invaluable tool in the clinical toolbox even with the advent of long-read sequencing technologies. Availability: Aldy 4 is available at https://github.com/0xTCG/aldy.

Ibrahim Numanagić
