Efficient approaches for binary classification on extremely imbalanced datasets

PEREIRA, Leandro Duarte

PEREIRA, Leandro Duarte; http://lattes.cnpq.br/6913225650128189

URI: https://repositorio.unifei.edu.br/jspui/handle/123456789/4314

Date: 2025-09-23

Abstract:

Binary classification on extremely imbalanced datasets is a challenge across multiple domains, as the very low prevalence of the minority class (<1%) compromises both predictive performance and model reliability. Although the literature contains a considerable number of studies on class imbalance, the scenario of extreme imbalance still requires in-depth investigation. In this context, this thesis develops two complementary research fronts. First, a Systematic Literature Review (SLR) was conducted following a rigorous protocol of selection and quality criteria, through which 22 primary experimental studies were analyzed across 52 datasets. The results indicate that combined approaches achieve superior performance in several scenarios, particularly oversampling techniques paired with ensembles, and especially the combination of Random Forest (RF) with methods derived from the Synthetic Minority Oversampling Technique (SMOTE). Second, the thesis proposes an innovative approach based on Design of Experiments (DOE) for generating synthetic datasets under extreme class imbalance. The framework enables controlled manipulation of six critical factors (feature dimensionality, sample size, imbalance ratio, response function type, decision threshold, and error variability), allowing systematic and replicable experimentation. Experiments with Random Forest combined with SMOTE demonstrated the usefulness of the framework for analyzing main effects and interactions, with Analysis of Variance (ANOVA) identifying feature dimensionality and error variability as relevant to classifier behavior. Together, the findings of the Systematic Literature Review and the proposed experimental framework contribute in an integrated manner to advancing knowledge and fostering the development of more robust methods for binary classification under extreme imbalance.
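As a rough illustration of how the six factors named in the abstract could be arranged into an experimental plan, the sketch below enumerates a two-level full factorial design in Python. The abstract does not state the design type, the factor levels, or how the datasets are generated, so the two-level full factorial, every level value, and the names `FACTORS`, `full_factorial`, and `expected_minority_count` are assumptions made for illustration and are not taken from the thesis.

```python
import itertools

# Hypothetical two-level settings for the six factors listed in the abstract.
# The levels actually used in the thesis are not given in this record.
FACTORS = {
    "n_features": (5, 50),                   # feature dimensionality
    "n_samples": (2_000, 20_000),            # sample size
    "minority_fraction": (0.001, 0.01),      # imbalance ratio, minority share <= 1%
    "response_function": ("linear", "nonlinear"),
    "decision_threshold": (0.3, 0.5),
    "error_sd": (0.1, 1.0),                  # error variability
}


def full_factorial(factors):
    """Return one dict per run, covering every combination of factor levels."""
    names = list(factors)
    return [
        dict(zip(names, levels))
        for levels in itertools.product(*factors.values())
    ]


def expected_minority_count(run):
    """Expected number of minority-class rows in the dataset for this run."""
    return round(run["n_samples"] * run["minority_fraction"])


design = full_factorial(FACTORS)
print(len(design))  # one run per combination of the six two-level factors

# The run with the fewest expected minority rows shows how little signal
# a classifier may have to learn from under extreme imbalance.
smallest = min(design, key=expected_minority_count)
print(smallest["n_samples"], expected_minority_count(smallest))
```

Each dictionary in `design` describes one synthetic dataset configuration. In a study of this kind, a model such as Random Forest with SMOTE would be trained on the data generated for each run, and the resulting scores would then be analyzed with ANOVA to estimate main effects and interactions.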