Sam Powers and Michele Claibourn
Approach
We began by pulling data from the Virginia Department of Education (VDOE), using its "Build-a-Table" functions. Our initial analysis pulled grade-level results to build division-level cohorts, as provided in the detailed cohort view. The data presented in the division cohort gaps are a direct representation of the data acquired from VDOE; for some divisions, the data are suppressed because the relevant group contains only a small number of students within the school division.
The overall division gap estimates, as presented in the main summary figures, are based on these same data but are estimated from a simulated student-level data set. The number of relevant students and the pass rates are used to create this data set. For example, if a school division reports that 100 economically disadvantaged 8th graders took the reading SOL and 75 of them passed, the simulated data will contain 75 observations coded as economically disadvantaged, 8th grade, and passing, and 25 observations coded as economically disadvantaged, 8th grade, and not passing. This process generates one simulated data point for each student represented in the summary data provided by VDOE. We use these simulated individual data to estimate a mixed-effects logit model of pass rates in each year, with students nested in their school division. Pass rates are modeled as a function of the relevant characteristic (race, ethnicity, or economic advantage/disadvantage), allowing for both a random intercept and a random coefficient. The model for each gap estimate is
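The expansion from summary counts to simulated student-level records can be sketched as follows. This is a minimal Python illustration rather than the code used for the analysis; the function name `expand_counts` and the record field names are hypothetical. It assumes the number passing is already available as a count; a reported pass rate would first need to be converted to a count.

```python
def expand_counts(division, group, grade, n_tested, n_passed):
    """Return one simulated record per student implied by a summary row.

    All names here are illustrative assumptions, not VDOE field names.
    """
    if not 0 <= n_passed <= n_tested:
        raise ValueError("n_passed must be between 0 and n_tested")
    base = {"division": division, "group": group, "grade": grade}
    # One record coded as passing for every student who passed ...
    passed = [dict(base, passed=1) for _ in range(n_passed)]
    # ... and one record coded as not passing for every other tested student.
    not_passed = [dict(base, passed=0) for _ in range(n_tested - n_passed)]
    return passed + not_passed


# The example from the text: 100 students tested, 75 passed.
rows = expand_counts("Example Division", "economically disadvantaged", 8, 100, 75)
```

Stacking the records from every division, group, and grade would give a student-level data set with the same pass rates as the published summaries.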
\[ P(Y_{ij} = 1 \mid \text{Division}_j) = \operatorname{logit}^{-1}(\alpha_j + x_{ij}\beta_j) \]
where \(Y_{ij} = 1\) indicates that student \(i\) in division \(j\) passed, \(x_{ij}\) represents the demographic group for student \(i\) in division \(j\), \(\alpha_j \sim N(0, \sigma_{\alpha})\), and \(\beta_j \sim N(0, \sigma_{\beta})\).
From the resulting model we estimate predicted pass rates for the two groups of students in each comparison (Black-White, Hispanic-White, Economically Disadvantaged-Advantaged) and plot these estimates in the summary figure. Clicking through the years provides an overview of the reading gaps by division over time.
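As a rough illustration of how predicted pass rates follow from the model, the sketch below applies the inverse logit to a division's intercept and coefficient. It assumes \(x_{ij}\) is coded 0 for the reference group and 1 for the comparison group, and the values of `alpha_j` and `beta_j` are made up for the example; neither the coding nor the numbers come from the fitted models.

```python
import math


def inv_logit(z):
    """Inverse logit: map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_gap(alpha_j, beta_j):
    """Predicted pass rates for both groups in division j, and their gap.

    Assumes x = 0 for the reference group and x = 1 for the comparison group.
    """
    p_reference = inv_logit(alpha_j)            # x_ij = 0
    p_comparison = inv_logit(alpha_j + beta_j)  # x_ij = 1
    return p_reference, p_comparison, p_reference - p_comparison


# Hypothetical division-level values on the log-odds scale.
p_ref, p_cmp, gap = predicted_gap(alpha_j=1.5, beta_j=-1.0)
```

With these illustrative values, a negative coefficient lowers the comparison group's predicted pass rate, so the gap is positive.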
The data we pulled and our code are available on GitHub. We used the lme4 package in R to fit the generalized linear multilevel model; lme4 uses an adaptive Gauss-Hermite approximation to the log-likelihood.