Abstract:
This dissertation uses data science, through unsupervised exploratory analysis,
to confirm what the literature already reports and/or to present new findings about the
influence of infrastructure on Brazilian basic education. School infrastructure is not
limited to the architecture of school buildings; it also encompasses the educational and
administrative environment, equipment, educational resources, practices, curricula and
the teaching and learning process. Data were collected from the open data of the
2019 School Census (Basic Education) and the Basic Education Development Index
(Ideb) for the years 2005-2019. The year 2019 was chosen because it was the last
year in which schools reported results before the influence of the COVID-19 pandemic.
After several data treatment steps and the decision to restrict the analysis to the early
years of elementary education, two analysis methods were applied: Correlogram and
Factor Analysis (FA). To make the results clearer, new attributes representing the
federative entities were created, which made it possible to identify which states and
school profiles are most closely related to growth and good results in the Ideb. For these
correlations, the Sigma matrix of the Gaussian copula was chosen, because it accommodates
both categorical and continuous data and also yielded a positive definite matrix. The
Correlogram produced a square matrix whose attribute relationships were displayed in a
heatmap with a dendrogram. The attributes fell into 4 large groups, each with specific
characteristics and relationships with the federative entities. The first group had a strong relationship with basic
infrastructure; the second group, with the Ideb and the most sophisticated infrastructure;
the third group showed few relationships between the attributes; and the last group had
strong negative correlations and was marked by more precarious infrastructure. After
verifying the compatibility of the database for the application of the FA, it was
estimated that 10 factors would be suitable for this study. Four factors were associated
with the attributes of the Ideb, the focus of this work. Three patterns were also observed
in the attributes that linked good results in the Ideb to different infrastructures, policies
and/or educational proposals: the first group, led by the state of São Paulo, featured basic
sanitation provided by the public service, quality internet for use in learning and
school institutions; the second group, led by the state of Minas Gerais, indicates
an association with flexibility in traditional teaching, with school cycles and non-graded
classes; the third group was marked by complementary activities and specialized
care, represented by the state of Ceará. In contrast to these patterns, schools offering
the Youth and Adult Education (EJA) modality, mainly in the Northeast, tend to have lower results in the Ideb. The
other 6 factors contributed further relevant information, including information on the
correlations and anti-correlations between the federative entities and specific attributes.
These results indicate that data science has much to contribute to the field of education.
Future work is expected to incorporate more data, such as longitudinal studies of the Ideb,
other educational indices such as the Ioeb, and the socioeconomic level of the population.
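
The correlation step summarized above can be illustrated with a minimal Python sketch. It is not the estimation procedure used in the dissertation, which the abstract does not specify: here the Gaussian copula Sigma is approximated by a simple normal-scores (rank-based) estimate, categorical attributes are assumed to be numerically coded, and the function names `copula_sigma`, `is_positive_definite` and `attribute_groups` are hypothetical. The sketch also checks positive definiteness and cuts a hierarchical clustering of the attributes into 4 groups, mirroring the dendrogram grouping.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import norm, rankdata


def copula_sigma(X):
    """Normal-scores estimate of a Gaussian copula correlation matrix.

    X is an (n_schools, n_attributes) numeric array. Categorical
    attributes are assumed to be numerically coded (e.g. 0/1); tied
    values receive average ranks, which is a simplification.
    """
    n = X.shape[0]
    # Rank each column and rescale the ranks into the open interval (0, 1).
    U = rankdata(X, axis=0) / (n + 1)
    # Map the rescaled ranks to standard normal scores.
    Z = norm.ppf(U)
    # Pearson correlation of the normal scores, one row/column per attribute.
    return np.corrcoef(Z, rowvar=False)


def is_positive_definite(S):
    """True if every eigenvalue of the symmetric matrix S is positive."""
    return bool(np.all(np.linalg.eigvalsh(S) > 0))


def attribute_groups(S, k=4):
    """Cut an average-linkage clustering of the attributes into at most k groups."""
    D = 1.0 - S                # strong positive correlation -> small distance
    np.fill_diagonal(D, 0.0)   # remove rounding noise on the diagonal
    link = linkage(squareform(D, checks=False), method="average")
    return fcluster(link, t=k, criterion="maxclust")
```

Because the estimate depends only on ranks, it is unchanged by strictly increasing transformations of an attribute, which is one reason a copula-based correlation suits data that mix differently scaled variables.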