Pieces of the puzzle fitting together


Sorting students into courses by test scores may improve their performance on standardized reading and math tests. Yet such non-random sorting of students will bias value-added estimates that do not control for it.

It's not often that quantitative work in education research is as complementary as two recent working papers: "Does Sorting Students Improve Scores? An Analysis of Class Composition" from NBER and "Does Tracking of Students Bias Value-Added Estimates for Teachers" from Mathematica.

The NBER paper uses two years' worth of test data from more than 9,000 third and fourth graders in 135 Dallas ISD schools to estimate the effects of sorting along a variety of dimensions, such as previous test performance, gifted and talented status, special education status, and limited English proficiency status. In almost three-quarters of the schools, they find evidence of student sorting on at least one characteristic (and in 40 percent, on two). Moreover, they "find strong evidence that sorting students into more homogeneous groups is beneficial, particularly for sorting by previous testing score." This holds for both high- and low-scoring students. Results for sorting along the other dimensions are generally mixed and insignificant.
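
The paper's own test for non-random assignment is more elaborate, but the basic idea can be sketched in a few lines: if students are sorted by prior achievement, classroom means of prior scores within a school will differ more than random assignment would produce. The data and the one-way ANOVA below are purely illustrative assumptions, not the authors' method.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical prior-year scores for students in one school, split across
# four classrooms whose means look suspiciously well-ordered.
classrooms = [rng.normal(loc=mu, scale=10, size=25)
              for mu in (40, 50, 60, 70)]

# One simple flag for non-random sorting: do classroom means of prior
# scores differ more than chance assignment would allow?
f_stat, p_value = stats.f_oneway(*classrooms)
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
# A very small p-value suggests classrooms are grouped by prior achievement,
# i.e., more homogeneous than random assignment would make them.
```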

The Mathematica paper uses pre- and post-test reading and math scores from the 2011-2012 school year for 6,500 District of Columbia Public School students in seventh to tenth grade to estimate four models that quantify and control for sorting bias. Their baseline model is a value-added estimation controlling for pre-test scores (in both subjects) and a variety of student characteristics (free/reduced-price lunch eligibility, special education status, and race/ethnicity). Two other models control for student sorting: (a) by explicitly including variables indicating course enrollment (at the middle and high school levels, sorting is typically into specific, named classes); and (b) by including classroom characteristics (the classroom mean and standard deviation of pre-test scores). The final, full model includes both the explicit track controls and the classroom-level controls.
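
For readers who think in regressions, the four specifications might look roughly like the sketch below. It runs on synthetic data, and every column name (post_math, pre_math, frl, sped, track, class_mean_pre, and so on) is a hypothetical placeholder; it illustrates the general shape of a value-added model with teacher fixed effects, not Mathematica's exact estimation procedure.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 900

# Synthetic stand-in data; every column name here is a hypothetical placeholder.
df = pd.DataFrame({
    "classroom_id": rng.integers(0, 60, n),
    "track": rng.choice(["regular", "honors"], n),
    "pre_math": rng.normal(50, 10, n),
    "pre_read": rng.normal(50, 10, n),
    "frl": rng.integers(0, 2, n),
    "sped": rng.integers(0, 2, n),
    "race": rng.choice(["A", "B", "C"], n),
})
df["teacher_id"] = (df.classroom_id // 3).astype(str)  # ~3 classrooms per teacher
df["post_math"] = df.pre_math + rng.normal(0, 5, n)

# Classroom composition of the incoming class: mean and spread of prior scores.
grp = df.groupby("classroom_id")["pre_math"]
df["class_mean_pre"] = grp.transform("mean")
df["class_sd_pre"] = grp.transform("std")

student = "pre_math + pre_read + frl + sped + C(race)"
specs = {
    "baseline":  f"post_math ~ {student} + C(teacher_id)",
    "track":     f"post_math ~ {student} + C(track) + C(teacher_id)",
    "classroom": f"post_math ~ {student} + class_mean_pre + class_sd_pre + C(teacher_id)",
    "full":      f"post_math ~ {student} + C(track) + class_mean_pre"
                 f" + class_sd_pre + C(teacher_id)",
}
fits = {name: smf.ols(formula, data=df).fit() for name, formula in specs.items()}
# In each fit, the C(teacher_id) coefficients are that model's teacher
# value-added estimates; comparing them across specs shows the sorting bias.
```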

The authors find, in short, that the explicit-track and classroom-characteristic models do correct for sorting bias, but that each significantly changes the results at the teacher level. For example, the average change in a teacher's value-added estimate relative to the baseline ranges from about 20 to 30 percent of a standard deviation. Further, the full model reduces the precision of the estimates (widens their confidence intervals). All specifications lead to small changes in the tails (10th and 90th percentiles) of the value-added distributions.
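
To make the scale of those comparisons concrete, here is the arithmetic on invented numbers: for each teacher, take the difference between the full-model and baseline estimates, express it relative to the spread of the baseline estimates, and check how the 10th and 90th percentiles move.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical value-added estimates for 100 teachers under two specifications.
baseline = rng.normal(0.0, 0.20, 100)
full = baseline + rng.normal(0.0, 0.05, 100)  # the full model nudges each estimate

sd = baseline.std()
avg_change = np.mean(np.abs(full - baseline)) / sd
print(f"Average per-teacher change: {avg_change:.2f} SD of baseline estimates")

# Changes in the tails of the value-added distribution.
for q in (10, 90):
    shift = (np.percentile(full, q) - np.percentile(baseline, q)) / sd
    print(f"Shift at the {q}th percentile: {shift:+.2f} SD")
```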

Taken together, these two papers point to a pretty clear path forward: consider sorting students and, when estimating value-added measures, assume such sorting is common practice and model it explicitly.