---
title: "Basic Education Assessments: SAEB, ENCCEJA, and ENEM by School"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Basic Education Assessments: SAEB, ENCCEJA, and ENEM by School}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

This vignette covers three basic education assessment datasets available in
educabR. For IDEB, ENEM, and the School Census, see
`vignette("getting-started")`.

```{r setup}
library(educabR)
library(dplyr)
library(ggplot2)
```

## SAEB - Basic Education Assessment System

SAEB (Sistema de Avaliacao da Educacao Basica) is a biennial assessment that
measures student performance in Portuguese and Mathematics across Brazilian
basic education. It is one of the components used to calculate IDEB.

### Available data types

SAEB microdata includes four perspectives:

| Type | Description |
|------|-------------|
| `"aluno"` | Student-level results (scores, responses) |
| `"escola"` | School questionnaire data |
| `"diretor"` | Principal questionnaire data |
| `"professor"` | Teacher questionnaire data |

### Downloading SAEB data

```{r saeb-download}
# Student performance data
saeb_students <- get_saeb(year = 2023, type = "aluno")

# School questionnaire
saeb_schools <- get_saeb(year = 2023, type = "escola")

# Use n_max for exploration
saeb_sample <- get_saeb(year = 2023, type = "aluno", n_max = 5000)
```

### Available years

SAEB is conducted every two years: 2011, 2013, 2015, 2017, 2019, 2021, 2023.

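The biennial schedule is easy to express in code. The sketch below builds the list of years stated above and defines a small helper, `is_saeb_year()`, to check a year before requesting it. The helper is hypothetical and not part of educabR; the package may well perform its own validation.

```{r saeb-year-check}
# Years listed above: every two years from 2011 to 2023
saeb_years <- seq(2011, 2023, by = 2)

# Hypothetical helper (not part of educabR):
# TRUE where `year` matches one of the listed SAEB editions
is_saeb_year <- function(year) {
  year %in% saeb_years
}

is_saeb_year(c(2019, 2020, 2023))
```

A check like this is cheap to run before starting a large download.
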
```{r saeb-years}
# 2021 data is split by education level
saeb_fund <- get_saeb(
  year = 2021,
  type = "aluno",
  level = "fundamental_medio"
)

saeb_infantil <- get_saeb(
  year = 2021,
  type = "aluno",
  level = "educacao_infantil"
)
```

### Example analysis: Score distribution

```{r saeb-analysis}
# Load a sample of student scores
saeb_sample <- get_saeb(2023, type = "aluno", n_max = 10000)

# Distribution of Mathematics proficiency
saeb_sample |>
  filter(!is.na(proficiencia_mt)) |>
  ggplot(aes(x = proficiencia_mt)) +
  geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
  labs(
    title = "SAEB 2023 - Mathematics Proficiency Distribution",
    x = "Mathematics Score",
    y = "Count"
  ) +
  theme_minimal()
```

---

## ENCCEJA - Youth and Adult Education Certification

ENCCEJA (Exame Nacional para Certificacao de Competencias de Jovens e Adultos)
provides certification for elementary and high school equivalency. It covers
four knowledge areas: Natural Sciences, Mathematics, Portuguese, and Social
Sciences.

### Downloading ENCCEJA data

```{r encceja-download}
# Download ENCCEJA microdata
encceja_2023 <- get_encceja(year = 2023)

# Sample for exploration
encceja_sample <- get_encceja(year = 2023, n_max = 5000)
```

### Available years

ENCCEJA data is available from 2014 to 2024.

```{r encceja-structure}
# Explore the data structure
glimpse(encceja_sample)
```

### Example analysis: Participation by state

```{r encceja-analysis}
encceja_2023 <- get_encceja(2023, n_max = 50000)

# Count participants by state and keep the ten largest
participants_by_state <- encceja_2023 |>
  count(sg_uf_prova, sort = TRUE) |>
  head(10)

ggplot(participants_by_state, aes(
  x = reorder(sg_uf_prova, n),
  y = n
)) +
  geom_col(fill = "darkorange") +
  coord_flip() +
  labs(
    title = "ENCCEJA 2023 - Top 10 States by Participation",
    x = "State",
    y = "Number of Participants"
  ) +
  theme_minimal()
```

---

## ENEM by School (2005-2015)

ENEM by School (ENEM por Escola) provides ENEM results aggregated at the school level.
This dataset covers 2005 to 2015 in a single bundled file and was
**discontinued after 2015**.

### Downloading the data

Unlike the functions for other datasets, `get_enem_escola()` has no `year`
parameter; it downloads the entire 2005-2015 dataset at once.

```{r enem-escola-download}
# Download all ENEM by School data (2005-2015)
enem_escola <- get_enem_escola()

# Sample for exploration
enem_escola_sample <- get_enem_escola(n_max = 5000)
```

### Data structure

```{r enem-escola-structure}
glimpse(enem_escola_sample)
```

### Example analysis: School performance trends

```{r enem-escola-analysis}
enem_escola <- get_enem_escola()

# Average scores over time by administrative type
trend <- enem_escola |>
  filter(!is.na(nu_media_tot)) |>
  group_by(nu_ano, tp_dependencia_adm_escola) |>
  summarise(
    mean_score = mean(nu_media_tot, na.rm = TRUE),
    .groups = "drop"
  ) |>
  mutate(
    admin_type = case_when(
      tp_dependencia_adm_escola == 1 ~ "Federal",
      tp_dependencia_adm_escola == 2 ~ "State",
      tp_dependencia_adm_escola == 3 ~ "Municipal",
      tp_dependencia_adm_escola == 4 ~ "Private"
    )
  )

ggplot(trend, aes(x = nu_ano, y = mean_score, color = admin_type)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(
    title = "ENEM Average Score by School Type (2005-2015)",
    x = "Year",
    y = "Average Total Score",
    color = "School Type"
  ) +
  theme_minimal()
```
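
The recode-and-summarise pattern used in the trend analysis can be tried without downloading anything. The sketch below builds a small made-up data frame that borrows the column names from that example (`nu_ano`, `tp_dependencia_adm_escola`, `nu_media_tot`); the values are invented for illustration and are not ENEM results. It assumes the same 1 to 4 coding for administrative type, and the names `toy_enem`, `admin_labels`, and `toy_trend` are illustrative only.

```{r enem-escola-toy}
library(dplyr)

# Toy data: invented values, NOT real ENEM results.
# Column names mirror the ones used in the trend analysis.
toy_enem <- tibble(
  nu_ano = c(2014, 2014, 2014, 2015, 2015, 2015),
  tp_dependencia_adm_escola = c(2, 2, 4, 2, 4, 4),
  nu_media_tot = c(500, 520, 600, NA, 610, 630)
)

# Labels indexed by code (assumed coding: 1 = Federal, ..., 4 = Private)
admin_labels <- c("Federal", "State", "Municipal", "Private")

toy_trend <- toy_enem |>
  # Drop schools with no total score
  filter(!is.na(nu_media_tot)) |>
  # Recode the numeric code into a factor with a fixed level order
  mutate(
    admin_type = factor(
      admin_labels[tp_dependencia_adm_escola],
      levels = admin_labels
    )
  ) |>
  # One mean score per year and administrative type
  group_by(nu_ano, admin_type) |>
  summarise(mean_score = mean(nu_media_tot), .groups = "drop")

toy_trend
```

Storing `admin_type` as a factor with explicit levels keeps the legend order fixed when the result is passed to ggplot2, which a plain character column would sort alphabetically.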