TQB: Teacher Quality Bulletin

Sensible, technical advice for school districts on incorporating student learning into teacher evals

School leaders seeking to revamp their teacher evaluation systems are biting into a meaty subject.

Behind the very public questions of whom to evaluate and what to count lies a question most of us can overlook but school districts can't: how should growth in test scores be measured? Making the right call requires an understanding of the different value-added models and which one best suits a district's needs. There may not be one right answer. Darn.

A new study sets out to explain the trade-offs among different value-added models. Mark Ehlert, Cory Koedel, Eric Parsons, and Michael Podgursky explore different ways to build these models and find that some variations make a big difference in teacher evaluations and some don't, though they leave the big decisions to education officials rather than throwing their weight behind one model or another.

The most significant decision facing those who develop teacher evaluations in a state or district is whether they want a "one-step" or "two-step" model, a decision with big policy implications. (See our explanation of the difference between the two and how the one-step model tends to favor teachers in more advantaged schools, while the two-step is more likely to recognize stronger teachers within disadvantaged schools.)
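To make the distinction concrete, here's a minimal sketch in Python on synthetic data. Every name, sample size, and coefficient is invented, and this is a simplified caricature rather than the study's exact specification: the one-step version puts prior scores, student characteristics, and teacher indicators into a single regression, while the two-step version first adjusts scores for student characteristics alone and then averages each teacher's residuals.

```python
# Toy comparison of one-step vs. two-step value-added estimates on
# synthetic data. All variables and coefficients are hypothetical;
# real systems add more controls plus shrinkage.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_teachers = 2000, 25
teacher = rng.integers(0, n_teachers, n_students)       # classroom assignment
p_frl = rng.uniform(0.1, 0.9, n_teachers)               # FRL rate varies by classroom
frl = rng.binomial(1, p_frl[teacher]).astype(float)     # free/reduced lunch flag
prior = rng.normal(-0.5 * frl, 1.0)                     # prior score tracks disadvantage
true_effect = rng.normal(0, 0.3, n_teachers)
score = 0.7 * prior - 0.2 * frl + true_effect[teacher] + rng.normal(0, 1, n_students)

T = np.eye(n_teachers)[teacher]                         # teacher indicator columns

# One-step: prior score, demographics, and teacher indicators
# enter a single regression together.
X1 = np.column_stack([prior, frl, T])
beta1, *_ = np.linalg.lstsq(X1, score, rcond=None)
one_step = beta1[2:]

# Two-step: first adjust scores for prior achievement and demographics
# alone, then average each teacher's residuals.
X2 = np.column_stack([np.ones(n_students), prior, frl])
beta2, *_ = np.linalg.lstsq(X2, score, rcond=None)
resid = score - X2 @ beta2
two_step = np.array([resid[teacher == t].mean() for t in range(n_teachers)])

# The two sets of estimates agree broadly but can rank teachers in
# advantaged vs. disadvantaged classrooms differently.
print(np.corrcoef(one_step, two_step)[0, 1])
```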

Deciding which student characteristics to include in the model (e.g., free/reduced-price lunch status, language status, race) also doesn't seem to have a clear right or wrong answer. Different combinations of these variables yield results that are highly correlated with each other (in other words, removing or adding a variable does not drastically change the outcome of the model). However, these changes do affect rankings. So when two teachers have similar value-added scores, one model may find that Teacher A is more effective while a model with different characteristics included finds that Teacher B is more effective; but, and this is important, neither model is likely to see Teacher A go from highly effective to ineffective.
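For the curious, here is a hypothetical illustration of that stability: the sketch below (plain numpy, invented data and effect sizes) fits the same toy model with and without a free/reduced lunch flag, then compares the resulting teacher estimates and rankings.

```python
# Quick sensitivity check, on made-up data, of how dropping one
# student covariate (an FRL flag) shifts teacher rankings.
import numpy as np

rng = np.random.default_rng(1)
n_students, n_teachers = 2000, 30
teacher = rng.integers(0, n_teachers, n_students)
prior = rng.normal(0, 1, n_students)
frl = rng.integers(0, 2, n_students).astype(float)
effect = rng.normal(0, 0.25, n_teachers)
score = 0.7 * prior - 0.15 * frl + effect[teacher] + rng.normal(0, 1, n_students)

T = np.eye(n_teachers)[teacher]                         # teacher indicators

def teacher_estimates(covariates):
    """OLS teacher effects after controlling for the given covariates."""
    X = np.column_stack(covariates + [T])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    return beta[len(covariates):]

with_frl = teacher_estimates([prior, frl])
without_frl = teacher_estimates([prior])

# Estimates stay highly correlated, but near-tied teachers swap ranks.
print("correlation:", np.corrcoef(with_frl, without_frl)[0, 1])
rank_a = np.argsort(np.argsort(-with_frl))
rank_b = np.argsort(np.argsort(-without_frl))
print("teachers whose rank moved:", int((rank_a != rank_b).sum()))
```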

Finally, these researchers also make the case that using more years of previous test scores makes the estimates only a little more accurate; so little, in fact, that it doesn't merit the trade-off of evaluating fewer grades. It seems that one previous year of data is generally good enough...so out the window goes some advice we've been freely offering states over the years.
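As a rough, hypothetical illustration of why (all parameters below are invented): a second prior-year score adds little once the first is in the model, because both are noisy readings of the same underlying achievement.

```python
# Synthetic check: does a second prior-year score change the
# teacher estimates much? Everything here is hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n_students, n_teachers = 2000, 30
teacher = rng.integers(0, n_teachers, n_students)
ability = rng.normal(0, 1, n_students)                  # latent achievement level
prior2 = ability + rng.normal(0, 0.6, n_students)       # score from two years back
prior1 = ability + rng.normal(0, 0.6, n_students)       # last year's score
effect = rng.normal(0, 0.25, n_teachers)
score = ability + effect[teacher] + rng.normal(0, 0.6, n_students)

T = np.eye(n_teachers)[teacher]                         # teacher indicators

def estimates(priors):
    """OLS teacher effects controlling for the given prior scores."""
    X = np.column_stack(priors + [T])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    return beta[len(priors):]

one_year = estimates([prior1])
two_years = estimates([prior1, prior2])

# The second prior year barely moves the estimates here...
print(np.corrcoef(one_year, two_years)[0, 1])
# ...while requiring it would shrink the set of grades (and teachers)
# that can be evaluated at all.
```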