Proceedings of International Conference on Applied Innovation in IT
2023/11/30, Volume 11, Issue 2, pp.59-65
Scrutinised and Compared: HVG Identification Methods in Terms of Common Metrics
Nadiia Kasianchuk, Yevhenii Kukuruza, Vladyslav Ostash, Anastasiia Boshtova, Dmytro Tsvyk and Matvii Mykhailichenko
Abstract: Highly variable gene (HVG) identification plays a critical role in unravelling gene expression patterns and understanding cellular heterogeneity in single-cell RNA-sequencing (scRNA-seq) data. A plethora of software packages have been developed for this purpose; however, their comparative performance has yet to be explored. This study addresses this gap by independently evaluating 22 methods from 9 different packages to provide a comprehensive assessment of HVG identification methods. To this end, a set of common metrics was employed, namely overlap with highly and lowly expressed genes, runtime, and clustering indices (e.g., Calinski-Harabasz, Davies-Bouldin, and ROGUE). The results reveal substantial disparities not only between different methods but also in the performance of a single method across diverse datasets. In particular, the dimensionality of the data, the presence of spike-ins, and background noise are among the key factors influencing the results. These variations underscore the significant impact of dataset characteristics on analysis outcomes, so the nature of the data must be taken into account consistently. The study emphasises the urgent need for a standardised, data-driven assessment framework to ensure reliable and effective scRNA-seq analyses. This work serves as a valuable resource for both scRNA-seq software developers and experimental researchers seeking optimal methods for their investigations.
Keywords: Highly Variable Genes, Single-Cell RNA-Sequencing, Differential Expression, Heterogeneity Analysis, Cellular Heterogeneity, Cellular Diversity.
DOI: 10.25673/112994
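As an illustration of the first metric named in the abstract (overlap of the selected HVGs with highly and lowly expressed genes), a minimal sketch is given below. It is not one of the 22 evaluated methods: the dispersion-based selection (variance divided by mean), the toy count table, and the helper names `top_k` and `overlap_fraction` are all assumptions introduced here for illustration only.

```python
from statistics import mean, pvariance


def top_k(scores, k):
    """Return the set of k gene names with the highest score."""
    ranked = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return set(ranked[:k])


def overlap_fraction(selected, reference):
    """Fraction of the selected genes that also appear in the reference set."""
    if not selected:
        return 0.0
    return len(selected & reference) / len(selected)


# Hypothetical toy data: gene -> expression counts across six cells.
counts = {
    "geneA": [0, 12, 0, 15, 1, 14],     # moderate mean, strongly variable
    "geneB": [50, 52, 49, 51, 50, 48],  # high mean, stable
    "geneC": [1, 1, 2, 1, 1, 2],        # low mean, stable
    "geneD": [0, 30, 0, 28, 0, 32],     # high mean, strongly variable
}

# Mean expression per gene, and a simple dispersion score (variance / mean).
# The dispersion score is an assumed stand-in for a real HVG method.
mean_expr = {gene: mean(values) for gene, values in counts.items()}
dispersion = {
    gene: pvariance(values) / mean_expr[gene] for gene, values in counts.items()
}

k = 2
hvgs = top_k(dispersion, k)                                   # candidate HVGs
high = top_k(mean_expr, k)                                    # most expressed
low = top_k({gene: -m for gene, m in mean_expr.items()}, k)   # least expressed

overlap_high = overlap_fraction(hvgs, high)
overlap_low = overlap_fraction(hvgs, low)
print(f"HVGs: {sorted(hvgs)}")
print(f"overlap with highly expressed genes: {overlap_high:.2f}")
print(f"overlap with lowly expressed genes:  {overlap_low:.2f}")
```

A large overlap with either extreme would suggest that a selection is driven by expression level rather than by biological variability, which is presumably why such an overlap is useful as a common metric across methods.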