Abstract:
Decision-making in groups, public or private, is indispensable to the development of
organizations, and the search for mechanisms that support managers more effectively is
fundamental to this goal. Knowing how to transform raw data into knowledge allows these
decisions to be based on data and not purely on intuition. Among the important
decisions made by any organization, the classification and selection of suppliers is an
important practice in industrial engineering. Data Science, in turn, is a rising field that
studies data and how to carry out this transformation of raw data into knowledge. This
research used real data from the suppliers of a company in the aeronautical sector in its
analyses. It was therefore situated between Data Science and the classification and selection of
suppliers, and focused on the problem known as clustering, which is the segmentation of
data into regions as homogeneous as possible when no previous categories exist, aiming to
solve this problem in support of supplier management. In practice, this is done with Data
Science tools known as Machine Learning: algorithms that can be used to segment groups
without an initial classification. The work was developed following the CRISP-DM procedure,
which helps to clarify analysis problems and to structure scientific thinking. Using this
procedure, the general objective of this dissertation was to apply Machine Learning
techniques to support the classification and selection of that organization's suppliers.
There were two specific objectives. The first consisted of analyzing the classic clustering
algorithms to demonstrate their operation and behavior on the real database. The second one
consisted of analyzing those clustering algorithms in search of the one most appropriate to the
supplier base, culminating in the creation and suggestion of a framework that can be used
for future clustering analyses. The clustering models were built and their efficiency was
tested through internal and stability validations, allowing the data to be split into clusters.
The use of CRISP-DM allowed the clustering framework to be proposed