Proceedings of the International Conference on Applied Innovations in IT
2023/11/30, Volume 11, Issue 2, pp. 81-89

Dynamic Topic Modelling of Online Discussions on the Russian War in Ukraine


Taras Ustyianovych, Nadiia Kasianchuk, Halina Falfushynska, Solomiia Fedushko and Eduard Siemens


Abstract: Robust end-to-end ML processes are crucial for delivering an accurate and reliable system for real-time text data inference. In this paper, we present an approach to building machine learning operations (MLOps) and an observability application for topic modelling of online social media discussions, here applied to topics and threads related to the Russian war in Ukraine. Splunk Enterprise, with its knowledge discovery, dashboarding, and alerting capabilities, is the main tool and platform used throughout this research. The dataset comprises 30 GB of social media text data from the Russian social network VKontakte, covering January 2022 to May 2023. The main tasks were text mining and topic modelling, which we performed over the observation period using Python frameworks, mainly gensim for text processing and MLflow for experiment management and logging. The Splunk architecture allowed us to ingest and analyse the results and predictions of ML experiments for dynamic topic modelling, and served as an MLOps solution. The designed set of five dashboards played a crucial role in determining the optimal model hyperparameters (number of topics, a priori belief on the document-topic distribution, total number of corpus passes) and in detecting drift, which occurred roughly every two to three weeks depending on the phase of the war. Our application assisted with text analysis, revealing how events on the battlefield influenced social media discussions and which post attributes contributed to high user engagement. With our setup, we found that antiwar hashtags were used to promote misleading content that actually supported the war against Ukraine. The analysis of the discussions shows that the usage of adjectives has decreased since the start of the war, whereas the usage of nouns and verbs has increased. Information distortion has been steadily present in the content, leading to bias and misleading data in social media discussions.
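
The abstract does not state how drift was measured. As an illustration only, one common way to flag drift in a dynamic topic model is to compare the topic-share distribution of consecutive time windows with the Jensen-Shannon divergence. The sketch below assumes weekly windows, a three-topic model, and a threshold of 0.1; the function names, the sample numbers, and the threshold are all hypothetical and are not taken from the paper.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    # Midpoint distribution; it is non-zero wherever p or q is non-zero.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def detect_drift(window_topic_shares, threshold=0.1):
    """Return indices of windows whose topic mix differs from the
    previous window by more than `threshold` (an assumed cut-off)."""
    drifted = []
    for i in range(1, len(window_topic_shares)):
        previous, current = window_topic_shares[i - 1], window_topic_shares[i]
        if js_divergence(previous, current) > threshold:
            drifted.append(i)
    return drifted


# Hypothetical per-week topic shares (each row sums to 1).
weekly = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.20, 0.30, 0.50],
]
print(detect_drift(weekly))
```

In a setup like the one described, the per-window shares would come from the trained topic model's document-topic output, and the flagged windows could feed an alert or a dashboard panel. The threshold would need tuning on real data.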

Keywords: Machine Learning Operations (MLOps), Social Media Discussions, Russian War in Ukraine, Splunk Enterprise, Latent Dirichlet Allocation (LDA).

DOI: 10.25673/112997





 

DOI: http://dx.doi.org/10.25673/115729


        

         Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0




           ISSN 2199-8876
           Publisher: Edition Hochschule Anhalt
           Location: Anhalt University of Applied Sciences
           Email: leiterin.hsb@hs-anhalt.de
           Phone: +49 (0) 3496 67 5611
           Address: Building 01 - Red Building, Top floor, Room 425, Bernburger Str. 55, D-06366 Köthen, Germany
