Abstract:
Technological advances have driven a rise in data collection in companies, governments,
and various industrial segments. In this context, techniques for grouping observations and
discriminating between clusters are widely used on datasets with multiple variables, which
calls for specific tools that account for the existing variance-covariance structure.
Accordingly, this work proposes a way to improve the discriminatory power of confidence regions
in the formation and estimation of optimal clusters, using multivariate and experimental
techniques to extract information from correlated datasets in an optimized way. Factor analysis
was used as the exploratory multivariate method: the rotation of the factor loadings was tuned
through a mixture design, and the total-variance-explained functions were then aggregated
through the mean square error. The optimization of this step was performed with the sequential
quadratic programming algorithm. Once the optimal scores are known, a multilevel factorial
design is built to cover all combinations of linkage methods and types of analysis, in order to
find the parameterization with the least variability and thus generate confidence ellipses that
discriminate better between groups. A strategy based on the Kappa and Kendall indicators is
proposed to analyze the levels of agreement and the existence of inversions in cluster formation.
Motivated by the need for strategies to classify substations with respect to voltage sag
phenomena, which cause faults in the distribution of electricity, the method was applied to a
real dataset of power quality indexes from substations located in southeastern Brazil.
Optimal values were found for the rotation of the factor loadings, and the parameterization
“Ward-analysis of covariance” was identified as the ideal strategy for creating the clusters
in this dataset.
As a result, low-variability clusters and precise confidence ellipses were generated to estimate
the voltage sag patterns, improving the discriminatory power of the cluster classification
through the confidence regions. The confirmatory analysis indicated that the “Ward” linkage
was the most robust method for this dataset, even under the influence of disturbances
in the original data.
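
As an illustration only, the final clustering step can be sketched in Python, assuming NumPy and SciPy are available. The synthetic two-dimensional scores, the choice of two clusters, the 95% level, and the helper name confidence_ellipse are assumptions made for this sketch and do not come from the study. The normal-theory ellipse built from the chi-square quantile is one common construction, not necessarily the one used in this work.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import chi2


def confidence_ellipse(points, level=0.95):
    """Hypothetical helper: normal-theory confidence ellipse of a point cloud.

    Returns the center, the semi-axis lengths, and the axis directions
    (eigenvectors of the sample covariance matrix, one per column).
    """
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)          # sample variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix, ascending eigenvalues
    scale = chi2.ppf(level, df=points.shape[1])  # chi-square quantile for the chosen level
    semi_axes = np.sqrt(eigvals * scale)
    return center, semi_axes, eigvecs


# Synthetic stand-in for factor scores (assumption: two well-separated groups).
rng = np.random.default_rng(0)
scores = np.vstack([
    rng.normal(loc=(-3.0, 0.0), scale=0.5, size=(30, 2)),
    rng.normal(loc=(3.0, 0.0), scale=0.5, size=(30, 2)),
])

# Agglomerative clustering with Ward linkage, cut into two clusters.
Z = linkage(scores, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")

# One confidence ellipse per cluster.
ellipses = {
    int(k): confidence_ellipse(scores[labels == k])
    for k in np.unique(labels)
}

for k, (center, semi_axes, _) in ellipses.items():
    print(f"cluster {k}: center={center.round(2)}, semi-axes={semi_axes.round(2)}")
```

Smaller semi-axes correspond to lower within-cluster variability, and ellipses that do not overlap indicate groups that are well discriminated. Replacing method="ward" with another linkage such as "single" or "average" gives a simple way to compare parameterizations on the same scores.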