Proceedings of the International Conference on Applied Innovations in IT
2023/03/09, Volume 11, Issue 1, pp. 113-118

KNN-Based Algorithm of Hard Case Detection in Datasets for Classification


Anton Okhrimenko and Nataliia Kussul


Abstract: Machine learning models for classification are designed to find the best way to separate two or more classes. When classes overlap, there is no way to separate such data cleanly: any ML algorithm will fail to correctly classify the data points that are surrounded, in the feature space, by a significant number of data points from another class. However, being able to find such hard cases in a dataset makes it possible to apply a different set of rules to them than to normal data samples. In this work, we introduce a KNN-based algorithm for detecting data points and subspaces for which the classification decision is ambiguous. The algorithm is described in detail and demonstrated on an artificially generated dataset. Possible use cases are also discussed, including dataset quality assessment, custom ensemble strategies and data sampling modifications. The proposed algorithm can be used throughout the full cycle of machine learning model development, from forming the training dataset to model inference in real-world use.
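The abstract does not spell out the exact detection rule, so the following is only a minimal sketch of the general KNN idea under stated assumptions, not the authors' algorithm. It assumes a point counts as a hard case when more than `threshold` of its `k` nearest neighbours (brute-force Euclidean search) belong to other classes. The function names (`hard_case_scores`, `detect_hard_cases`, `hard_case_share_per_class`) and the per-class share used as a dataset-quality summary are hypothetical illustrations. Subspace detection is not covered.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (brute-force Euclidean search, excluding i)."""
    candidates = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    candidates.sort()  # ties are broken by index, which keeps the result deterministic
    return [j for _, j in candidates[:k]]


def hard_case_scores(points: Sequence[Point], labels: Sequence[Hashable], k: int = 5) -> List[float]:
    """For every point, the fraction of its k nearest neighbours that carry a different label."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i in range(len(points)):
        neighbours = k_nearest(points, i, k)
        foreign = sum(1 for j in neighbours if labels[j] != labels[i])
        scores.append(foreign / k)
    return scores


def detect_hard_cases(points: Sequence[Point], labels: Sequence[Hashable],
                      k: int = 5, threshold: float = 0.5) -> List[bool]:
    """Hypothetical rule: flag a point if its neighbourhood is dominated by other classes."""
    return [score > threshold for score in hard_case_scores(points, labels, k)]


def hard_case_share_per_class(labels: Sequence[Hashable], flags: Sequence[bool]) -> Dict[Hashable, float]:
    """Illustrative dataset-quality summary: share of flagged points within each class."""
    totals = Counter(labels)
    flagged = Counter(lab for lab, flag in zip(labels, flags) if flag)
    return {cls: flagged[cls] / n for cls, n in totals.items()}


if __name__ == "__main__":
    # Toy data: two well-separated clusters plus one class-0 point placed inside the class-1 cluster.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
    ys = [0, 0, 0, 0, 1, 1, 1, 1, 0]
    flags = detect_hard_cases(pts, ys, k=3, threshold=0.5)
    print([i for i, f in enumerate(flags) if f])  # [8]
    print(hard_case_share_per_class(ys, flags))   # {0: 0.2, 1: 0.0}
```

In the toy run, the injected point has no same-class neighbours (score 1.0), so it is flagged. The class-1 points each have the injected point among their three nearest neighbours (score 1/3), so they stay below the threshold. `k` and `threshold` control how aggressively overlap is flagged. The brute-force search is O(n²), so a spatial index would be needed for realistic dataset sizes.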

Keywords: KNN, Dataset Quality Assessment, Imbalanced Datasets, Hard Cases.

DOI: 10.25673/101926





DOI: http://dx.doi.org/10.25673/115729


        

         Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0




           ISSN 2199-8876
           Publisher: Edition Hochschule Anhalt
           Location: Anhalt University of Applied Sciences
           Email: leiterin.hsb@hs-anhalt.de
           Phone: +49 (0) 3496 67 5611
           Address: Building 01 - Red Building, Top floor, Room 425, Bernburger Str. 55, D-06366 Köthen, Germany
