Staff Scientist, National Institutes of Health
Research Field: Oncology
Efficient and consistent string processing is critical in the era of exponentially growing genomic data. Locally Consistent Parsing (LCP) addresses this need by partitioning an input genome string into short, exactly matching substrings (called "cores"), ensuring consistency across partitions. Labeling the cores of an input string consistently not only provides a compact representation of the input but also enables the reapplication of LCP to refine the cores over multiple iterations, providing a progressively longer and more informative set of substrings for downstream analyses. We present the first iterative implementation of LCP with Lcptools and demonstrate its effectiveness in identifying cores with minimal collisions. Experimental results show that the number of cores at the i-th iteration is O(n/c^i) for c ~ 2.34, while the average length and the average distance between consecutive cores are O(c^i). Compared to the popular sketching techniques, LCP produces significantly fewer cores, enabling a more compact representation and faster analyses. To demonstrate the advantages of LCP in genomic string processing in terms of computation and memory efficiency, we also introduce LCPan, an efficient variation graph constructor. We show that LCPan generates variation graphs more than 10x faster than vg, while using more than 13x less memory.
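The scaling reported above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of Lcptools: the function names are hypothetical, and the constant factor hidden in the O(n/c^i) bound is assumed to be 1 purely for illustration.

```python
def expected_cores(n: int, level: int, c: float = 2.34) -> float:
    """Rough core-count estimate n / c**level.

    Assumption: the constant hidden in O(n / c^i) is taken to be 1.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    return n / c ** level


def levels_until(n: int, target: float, c: float = 2.34) -> int:
    """Smallest LCP level at which the estimate drops to `target` or below."""
    if target <= 0:
        raise ValueError("target must be positive")
    level = 0
    while expected_cores(n, level, c) > target:
        level += 1
    return level


if __name__ == "__main__":
    n = 3_000_000_000  # roughly the length of a human genome, in bases
    for level in range(1, 9):
        print(level, round(expected_cores(n, level)))
```

Under these assumptions, each additional iteration shrinks the estimated number of cores by a factor of about 2.34, while the average core length grows by the same factor.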
Tumor evolution is driven by various mutational processes, ranging from single-nucleotide variants (SNVs) to large structural variants (SVs) to dynamic shifts in DNA methylation. Current short-read sequencing methods struggle to accurately capture the full spectrum of these genomic and epigenomic alterations due to inherent technical limitations. To overcome this, we introduce an approach for long-read sequencing of single-cell derived subclones, and use it to profile 23 subclones of a mouse melanoma cell line, characterized by distinct growth phenotypes and treatment responses. We develop a computational framework for harmonization and joint analysis of different variant types in the evolutionary context. Uniquely, our framework enables detection of recurrent amplifications of putative driver genes, generated by independent SVs across different lineages, suggesting parallel evolution. In addition, our approach revealed gradual and lineage-specific methylation changes associated with aggressive clonal phenotypes. We also show that our set of phylogeny-constrained variant calls, along with openly released sequencing data, can be a valuable resource for the development of new computational methods.
Most human cancers arise from somatic alterations, ranging from single nucleotide variations to structural variations (SVs) that can alter the genomic organization. Pathogenic SVs are identified in various cancer types and subtypes, and they play a crucial role in diagnosis and patient stratification. However, the studies on structural variations have been limited due to biological and computational challenges, including tumor heterogeneity, aneuploidy, and the diverse spectrum of SVs from simpler deletions and focal amplifications to catastrophic events shuffling large fragments from one or multiple chromosomes. Long-read sequencing provides the advantage of improved mappability and direct haplotype phasing. Yet, no tool currently exists to comprehensively analyze complex rearrangements within the cancer genome using long-read sequencing. Here, we present Severus, a tool for somatic SV calling and complex SV characterization using long reads. Severus first detects individual SV junctions from phased split alignments, then constructs a phased breakpoint graph to cluster junctions into complex rearrangement events. We first benchmarked the somatic SV calling performance using six tumor/normal cell line pairs (HCC1395, H1437, H2009, HCC1937, HCC1954, Hs578T). We sequenced all cell lines with Illumina, ONT, and PacBio HiFi. We then established a set of high-confidence calls supported by multiple technologies and tools. Severus consistently had the highest F1 scores compared to the HiFi, ONT, and Illumina methods against this high-confidence SV call set. We then extended our analysis to complex SVs. Severus accurately detected the complex events reported for these cell lines, including chromothripsis, chromoplexy, and templated insertion cycles/chains (TICs). We then compared Severus’ performance with Jabba and Linx, two widely used tools for complex SV calling in short-read sequencing. Our comparison revealed that Severus showed higher agreement with Linx, while Jabba failed to detect most of the SV clusters identified by both Severus and Linx. Severus also outperformed the other tools in characterizing complex reciprocal translocations and TICs. Most of the junctions in complex SVs called by either of the other tools but not by Severus were either simple SVs with a single long-read junction or were not present in long-read sequencing. In contrast, Severus effectively resolved overlapping SVs by utilizing long-read connectivity, allowing for more accurate clustering of smaller genomic segments. We have also applied Severus to seventeen pediatric leukemia cases. Severus identified two chromoplexy events and two cryptic translocations, which were missed by FISH and karyotype panels and were incomplete in Illumina SV calls; these findings were further validated by RNA-seq. This highlights the potential of the long-read whole genome sequencing approach for diagnosing complex cases driven by SVs.
Ayse Keskus, Asher Bryant, Tanveer Ahmad, Anton Goretsky, Byunggil Yoo, Sergey Aganezov, Ataberk Donmez, Lisa A. Lansdon, Isabel Rodriguez, Jimin Park, Yuelin Liu, Xiwen Cui, Joshua Gardner, Brandy McNulty, Samuel Sacco, Jyoti Shetty, Yongmei Zhao, Bao Tran, Giuseppe Narzisi, Adrienne Helland, Daniel Cook, Pi-Chuan Chang, Alexey Kolesnikov, Andrew Carroll, Erin Molloy, Chengpeng Bi, Adam Walter, Margaret Gibson, Irina Pushel, Erin Guest, Tomi Pastinen, Kishwar Shafin, Karen Miga, Salem Malikic, Chi-Ping Day, Nicolas Robine, Cenk Sahinalp, Michael Dean, Midhat S. Farooqi, Benedict Paten, Mikhail Kolmogorov. Severus: A tool for detecting and characterizing complex structural variants in cancer using long-read sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 2848.
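The idea of grouping individual SV junctions into complex rearrangement events can be illustrated with a toy connected-components sketch. This is not Severus's algorithm or API: it ignores phasing entirely, and the junction representation, the function name cluster_junctions, and the max_gap threshold are all assumptions made for illustration. Two junctions are linked whenever any of their breakpoints lie on the same chromosome within max_gap bases.

```python
from itertools import combinations

# A junction is a pair of breakpoints; a breakpoint is (chromosome, position).
# This representation is an assumption made for the sketch.


def cluster_junctions(junctions, max_gap=50_000):
    """Group junction indices into clusters using union-find.

    Junctions are linked if any two of their breakpoints share a
    chromosome and lie within `max_gap` bases (hypothetical threshold).
    """
    parent = list(range(len(junctions)))

    def find(i):
        # Find the cluster representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def close(a, b):
        return a[0] == b[0] and abs(a[1] - b[1]) <= max_gap

    for i, j in combinations(range(len(junctions)), 2):
        if any(close(a, b) for a in junctions[i] for b in junctions[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(junctions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    toy = [
        (("chr1", 1_000), ("chr5", 2_000)),
        (("chr5", 2_500), ("chr8", 100)),
        (("chr2", 10), ("chr2", 90_000_000)),
    ]
    print(cluster_junctions(toy))
```

In the toy input, the first two junctions share nearby breakpoints on chr5 and end up in one cluster, while the third stands alone.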
Melanoma, a highly heterogeneous cancer, evolves through a complex interplay of genetic alterations, including both single nucleotide variants (SNVs) and structural variants (SVs). To study the evolutionary trajectory of melanoma, we established a model system composed of 24 single-cell-derived clonal sublines (C1-C24) from the M4 melanoma model, developed in a genetically engineered hepatocyte growth factor (HGF)-transgenic mouse. While SNVs have been extensively used to construct phylogenetic trees using Trisicell (Triple-toolkit for single-cell intratumor heterogeneity inference), a tool that analyzes intratumor heterogeneity and single-cell RNA mutations, the role and timing of SVs in melanoma evolution remain less well understood. This study integrates SV data with an SNV-driven phylogeny to investigate whether SV patterns align with SNV-based evolutionary trajectories in the mouse melanoma model, providing insights into the functional impact of SVs during tumor progression. We performed long-read sequencing on the 24 clonal sublines and detected SVs using Severus, a tool optimized for phasing in long-read sequencing. The SVs were mapped to the SNV-driven phylogeny using R and classified as either concordant (aligning with the SNV-based tree) or discordant (deviating from the SNV phylogeny). Gene ontology enrichment analysis revealed that concordant SVs were significantly enriched in genes associated with the hepatocyte growth factor receptor signaling pathway and the negative regulation of peptidyl-threonine phosphorylation, both of which represent core drivers of tumor progression. In contrast, discordant SVs were associated with a broader range of functional pathways, including the positive regulation of antigen receptor-mediated signaling and the regulation of natural killer cell-mediated cytotoxicity, though the exact mechanisms underlying these associations remain unclear. By integrating these SVs with an established SNV-driven phylogeny, this study highlights the distinct and critical roles SVs play in melanoma evolution. Concordant SVs appear to drive core oncogenic processes, while discordant SVs may contribute to other aspects of tumor evolution. These findings underscore the importance of considering SVs alongside SNVs to fully capture the complexity of melanoma evolution. Ongoing investigations will continue to explore the functional implications of these SVs and how the gene disruption patterns they cause shape the evolutionary trajectory of melanoma, offering potential targets for future therapeutic strategies.
Xiwen Cui, Ayse G. Keskus, Salem Malikic, Yuelin Liu, Anton Goretsky, Chi-Ping Day, Farid R. Mehrabadi, Mikhail Kolmogorov, Glenn Merlino, S. Cenk Sahinalp. Integrating structural variants and single nucleotide variants to uncover evolutionary trajectories in melanoma [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 3898.
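One way to picture the concordant/discordant distinction in the melanoma SV abstract above is a clade test on the SNV tree. The sketch below is an assumption, not the authors' procedure (which was carried out in R and is not detailed in the abstract): an SV is called concordant when the set of sublines carrying it exactly matches the leaf set under some node of the tree, and discordant otherwise. The tree encoding and the function names are hypothetical.

```python
def clade_leaf_sets(tree, root):
    """Return the set of leaf sets, one per node of a rooted tree.

    `tree` maps each internal node to its list of children; nodes that
    do not appear as keys (or map to an empty list) are leaves.
    """
    clades = set()

    def visit(node):
        children = tree.get(node, [])
        if not children:
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in children))
        clades.add(leaves)
        return leaves

    visit(root)
    return clades


def classify_sv(carriers, clades):
    """Label an SV by whether its carrier sublines form a clade (assumed rule)."""
    return "concordant" if frozenset(carriers) in clades else "discordant"


if __name__ == "__main__":
    # Toy tree over three sublines; node names are illustrative only.
    toy_tree = {"root": ["A", "C3"], "A": ["C1", "C2"]}
    clades = clade_leaf_sets(toy_tree, "root")
    print(classify_sv({"C1", "C2"}, clades))
    print(classify_sv({"C2", "C3"}, clades))
```

A stricter or looser rule (for example, tolerating a missed call in one subline) would change which SVs count as discordant, so the exact-match criterion here should be read as one simple choice among several.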
Melanoma is the most serious form of skin cancer, developed by the malignant evolution of melanocytes. Malignant melanoma incidence is increasing faster than most other cancers. While stage zero melanoma is highly treatable, survivability dramatically decreases in its advanced stages. Melanoma has been shown to be one of the most heterogeneous cancers from RNA and exome analyses by The Cancer Genome Atlas and other groups. A better understanding of the key genomic and epigenomic events that characterize the diverse subclonal populations in melanoma may reveal key insights into what drives its progression and therapeutic resistance. In this study, we leveraged Nanopore long-read sequencing to study the evolution of the mouse B2905 melanoma cell line. Twenty-four distinct clonal sublines were derived in vitro from single cells of the cell line, and the genetically homogeneous population from each subline was sequenced using PromethION R10 flow cells. Enabled by long reads to perform haplotype phasing and accurate structural variation detection, our goal is to integrate small and structural variants to better our understanding of melanoma evolution, and build upon prior analyses of short-read sequenced sublines. We employed multiple SNV calling approaches, including DeepVariant and Clair, in order to provide highly accurate variants for phylogeny reconstruction using Trisicell. We performed structural variant calling with our cancer somatic structural variant (SV) caller Severus as well as copy-number alteration (CNA) analysis with our method Wakhan. Lastly, we placed SNVs, SVs, and CNAs on our reconstructed phylogeny to examine the progression of different types of variants during subline evolution. We identified approximately 560k unique SNVs and around 2,400 unique SVs. The majority of SNVs are either clonal (19%) or private (73%); however, a meaningful fraction of subclonal variants were available for phylogenetic tree reconstruction. SVs are distributed across the phylogenetic tree branches similarly to SNVs. We identified loss of heterozygosity (LOH) events throughout the subline evolution as well as subclonal CNAs resulting from chromosomal translocations. We find clonal and subclonal evidence of densely clustered SNVs and SVs, resembling kataegis; however, our analysis of mutational signatures did not reveal APOBEC-mediated mutations. By analyzing mutational signatures within individual branches of the phylogenetic tree, we observed relative timing of different mutational processes, such as early clonal signatures of UV damage. By incorporating structural variations, copy number changes, and small variant data in the phylogenetic reconstruction, our analysis offers a better characterization of the genetic landscape of subclonal evolution in melanoma.
Anton Goretsky, Yuelin Liu, Ayse Keskus, Tanveer Ahmad, Salem Malikic, Glenn Merlino, Chi-Ping Day, Erin Molloy, S. Cenk Sahinalp, Mikhail Kolmogorov. Nanopore sequencing of single-cell derived sublines provides insights into melanoma heterogeneity and evolution [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 7497.
Cancer progression is an evolutionary process driven by the selection of cells adapted to gain growth advantage. We present the first formal study on the adaptation of gene expression in subclonal evolution. We model evolutionary changes in gene expression as stochastic Ornstein–Uhlenbeck processes, jointly leveraging the evolutionary history of subclones and single-cell expression data. Applying our model to sublines derived from single cells of a mouse melanoma revealed that sublines with distinct phenotypes are underlain by different patterns of gene expression adaptation, indicating non-genetic mechanisms of cancer evolution. Interestingly, sublines previously observed to be resistant to anti-CTLA-4 treatment showed adaptive expression of genes related to invasion and non-canonical Wnt signaling, whereas sublines that responded to treatment showed adaptive expression of genes related to proliferation and canonical Wnt signaling. Our results suggest that clonal phenotypes emerge as the result of specific adaptivity patterns of gene expression.
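For readers unfamiliar with Ornstein–Uhlenbeck (OU) processes, the sketch below simulates a single trajectory of dX = theta * (mu - X) dt + sigma dW with the Euler–Maruyama scheme. It is only an illustration of the process itself, not the model fitted in the study, which works along a subclonal phylogeny; the function name and the reading of mu as an "optimal expression level" are assumptions.

```python
import random


def simulate_ou(x0, mu, theta, sigma, dt, steps, rng=None):
    """Simulate one OU trajectory with the Euler-Maruyama scheme.

    x0    : starting value (e.g. ancestral expression level; illustrative)
    mu    : value the process is pulled toward (assumed "optimum")
    theta : strength of the pull toward mu
    sigma : scale of the random fluctuations
    """
    rng = rng or random.Random(0)  # fixed seed by default for repeatability
    path = [x0]
    x = x0
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        # Deterministic pull toward mu plus scaled Gaussian noise.
        x += theta * (mu - x) * dt + sigma * (dt ** 0.5) * noise
        path.append(x)
    return path


if __name__ == "__main__":
    trajectory = simulate_ou(x0=0.0, mu=1.0, theta=1.0, sigma=0.3, dt=0.1, steps=100)
    print(trajectory[-1])
```

With sigma set to 0 the trajectory relaxes smoothly toward mu; larger theta corresponds to a stronger pull, which is the intuition behind interpreting it as adaptation strength.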