Proceedings of International Conference on Applied Innovation in IT
2024/03/07, Volume 12, Issue 1, pp. 79-87

Computational Breakthroughs in Aquatic Taxonomy: The Role of Deep Learning and DNA Barcoding


Nadiia Kasianchuk, Sofiia Harkava, Sofiia Onishchenko, Olesia Solodka, Daria Shyshko, Eduard Siemens, Halina Falfushynska and Taras Ustyianovych


Abstract: Aquatic ecosystems are crucial to maintaining environmental equilibrium and sustaining human well-being. However, the traditional manual methods used in hydrobiological research are limited in how comprehensively they can characterize these intricate ecosystems. Data science, machine learning, and deep learning techniques offer opportunities to overcome these limitations and unlock new insights into aquatic environments. This study highlights the impact of computational tools in areas such as taxonomic identification, metagenomic sequence analysis, and water quality prediction. Deep learning techniques have demonstrated superior accuracy in classifying organisms, including those previously unidentified by conventional methods. In metagenomic sequence analysis, machine learning aids in assembling DNA sequences, aligning them against known databases, and addressing challenges related to sequence repeats, errors, and missing data. Furthermore, predictive models have been developed to forecast water quality outcomes, such as eutrophication events and heavy metal concentrations. These advancements support informed conservation measures and a deeper understanding of the relationships within aquatic ecosystems. However, challenges persist, including data quality issues, limited model interpretability, and the need for robust training datasets. Data integration strategies designed specifically for environmental and genomic studies are therefore necessary: data fusion and imputation can help address data scarcity and provide a comprehensive view of hydrobiological processes. As the study of aquatic ecosystems continues to evolve, the synergy between computational methods and traditional hydrobiological techniques holds considerable potential. By leveraging data science and current computational technologies, researchers can monitor changes in biodiversity and develop informed strategies for sustainable management amidst global environmental shifts.
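
To make the idea of sequence-based taxonomic identification more concrete, the following is a minimal sketch and not the method of this study. It assumes that a DNA barcode can be summarized by its k-mer counts and assigned to the reference whose profile is most similar. The taxon labels, the sequences, and the helper names (kmer_profile, cosine, classify) are hypothetical and chosen only for illustration; the deep learning classifiers discussed in the paper would replace the similarity step with a trained model.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count overlapping k-mers, skipping windows that contain a non-ACGT base."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= set("ACGT"))


def cosine(a, b):
    """Cosine similarity between two k-mer count profiles (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(query, references, k=3):
    """Return the label of the reference sequence most similar to the query."""
    q = kmer_profile(query, k)
    return max(references, key=lambda label: cosine(q, kmer_profile(references[label], k)))


# Hypothetical toy references; real barcodes are hundreds of bases long.
REFERENCES = {
    "taxon_A": "ACGTACGTACGTTTGACC",
    "taxon_B": "GGGCCCGGGCCCAAATTT",
}

# The query contains one ambiguous base (N), standing in for missing data.
print(classify("ACGTACGNACGTTTGACC", REFERENCES))
```

Skipping windows with ambiguous bases is the simplest way to tolerate missing data in this sketch; the imputation strategies mentioned in the abstract would instead try to fill such gaps.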

Keywords: Aquatic Ecosystems; Deep Learning; Taxonomic Identification; Metagenomic Sequences; Environment Modeling; Machine Learning

DOI: 10.25673/115645; PPN 1884680585





DOI: http://dx.doi.org/10.25673/115729


        

         Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0




           ISSN 2199-8876
           Publisher: Anhalt University of Applied Sciences
