Abstract:
Hyperparameter tuning is decisive for both predictive performance and computational cost in
machine learning models. In binary classification with Extreme Gradient Boosting, tuning is
inherently multiobjective: predictive quality must be maximized while execution time is
minimized. This work proposes a methodological framework that integrates Design of
Experiments, Response Surface Methodology, Factor Analysis, and the Normal Boundary
Intersection method to guide the selection of Extreme Gradient Boosting hyperparameters
under a fixed evaluation budget. The initial exploration is conducted through a fractional face-centered
central composite design, totaling 88 configurations. The observed responses
(accuracy, precision, recall, specificity, and runtime) are collected under a reproducible
protocol and summarized through factor scores obtained via principal component analysis with
Varimax rotation. These scores define quality and cost objective functions, reduce redundancy
among metrics, and support an interpretable assessment of trade-offs. Quadratic response-surface
models are then fitted to the objective functions and used by Normal Boundary
Intersection to sample an approximately uniform Pareto frontier; candidate solutions are re-evaluated
on the real model, and a final compromise configuration is selected. The proposed
approach is benchmarked, under equivalent evaluation budgets, against grid search, random
search, Bayesian optimization, and Hyperopt. Results show substantial computational savings:
the method achieves an average runtime of 0.078 s per fold and an average total runtime of
158 s, reducing total time by 9% to 71% relative to the benchmarks while maintaining high and
stable predictive performance across replications. As external validation, the proposed pipeline
was replicated on two additional datasets with contrasting profiles (balanced and highly
imbalanced), reproducing the cost–quality trade-off observed in the canonical benchmark and
reinforcing the multivariate stability of the method. Overall, integrating Design of Experiments
and Normal Boundary Intersection provides a parsimonious, interpretable, and replicable
alternative for multiobjective hyperparameter tuning, with potential applicability to other
Gradient Boosting Decision Tree families and cost-constrained settings.
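The metric-reduction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic stand-ins for the 88 design runs, the metric structure (four correlated quality metrics plus runtime) is assumed, and scikit-learn's FactorAnalysis with Varimax rotation is used in place of the paper's principal-component-based scoring.

```python
# Minimal sketch (assumptions noted above): summarize five observed
# responses from the 88-run design into two rotated factor scores,
# interpretable as "quality" and "cost" objectives.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_runs = 88  # number of configurations in the face-centered CCD

# Synthetic latent structure: four quality metrics load on one factor,
# runtime loads on another (purely illustrative data).
quality = rng.normal(size=n_runs)
cost = rng.normal(size=n_runs)
X = np.column_stack([
    quality + 0.1 * rng.normal(size=n_runs),  # accuracy
    quality + 0.1 * rng.normal(size=n_runs),  # precision
    quality + 0.1 * rng.normal(size=n_runs),  # recall
    quality + 0.1 * rng.normal(size=n_runs),  # specificity
    cost + 0.1 * rng.normal(size=n_runs),     # runtime
])

# Standardize, then extract two Varimax-rotated factors.
Z = StandardScaler().fit_transform(X)
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
scores = fa.fit_transform(Z)  # shape (88, 2): one quality, one cost score

print(scores.shape)
```

Each row of `scores` gives the two objective-function values for one hyperparameter configuration; quadratic response surfaces would then be fitted to these scores before applying Normal Boundary Intersection.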