Proceedings of International Conference on Applied Innovation in IT 2023/03/09, Volume 11, Issue 1, pp. 113-118
KNN-Based Algorithm of Hard Case Detection in Datasets for Classification
Anton Okhrimenko and Nataliia Kussul
Abstract: Machine learning models for classification are designed to find the best way to separate two or more classes. When classes overlap, such data cannot be cleanly separated. Any ML algorithm will fail to correctly classify data points that are surrounded by a significant number of points from another class in the feature space. However, being able to find such hard cases in a dataset makes it possible to handle them with a different set of rules than ordinary data samples. In this work, we introduce a KNN-based algorithm for detecting data points and subspaces for which the classification decision is ambiguous. The algorithm is described in detail, along with a demonstration on an artificially generated dataset. Possible use cases are also discussed, including dataset quality assessment, custom ensemble strategies, and data sampling modifications. The proposed algorithm can be used throughout the full cycle of machine learning model development, from building the training dataset to real-world model inference.
Keywords: KNN, Dataset Quality Assessment, Imbalanced Datasets, Hard Cases.
DOI: 10.25673/101926
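The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea it describes: flagging points whose nearest neighbours mostly belong to another class. It should not be read as the authors' method. The function names `hard_case_scores` and `find_hard_cases`, the Euclidean metric, and the parameters `k` and `threshold` are all assumptions made for illustration.

```python
import math


def hard_case_scores(points, labels, k=5):
    """For each point, return the fraction of its k nearest neighbours
    (Euclidean distance, the point itself excluded) that carry a different label.

    Illustrative assumption only; not the procedure from the paper.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i in range(n):
        # Brute-force neighbour search; ties are broken by index.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        other = sum(1 for j in neighbours if labels[j] != labels[i])
        scores.append(other / k)
    return scores


def find_hard_cases(points, labels, k=5, threshold=0.5):
    """Return indices of points whose score is at least `threshold` (hypothetical cutoff)."""
    scores = hard_case_scores(points, labels, k)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy data: two well-separated clusters, plus one class-1 point
# placed inside the class-0 cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
hard = find_hard_cases(points, labels, k=3, threshold=0.5)
```

In this toy example only the class-1 point sitting inside the class-0 cluster is flagged. The brute-force search is quadratic in the number of points, so a real dataset would call for an indexed nearest-neighbour search instead.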
DOI: http://dx.doi.org/10.25673/115729
Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0
ISSN 2199-8876
Publisher: Edition Hochschule Anhalt
Location: Anhalt University of Applied Sciences
Email: leiterin.hsb@hs-anhalt.de
Phone: +49 (0) 3496 67 5611
Address: Building 01 - Red Building, Top floor, Room 425, Bernburger Str. 55, D-06366 Köthen, Germany
Except where otherwise noted, all works and proceedings on this site are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.