It's not surprising that many have run with the story that the valuable new study from Morgan Polikoff and Andrew Porter is "another blow to the reliability of measuring teacher effectiveness through test scores," as the Atlanta Journal-Constitution put it. That's just the kind of headline the authors were hoping to provoke with the question they raise in their conclusion, "If VAMs [value-added models using test scores] are not meaningfully associated with either content or quality of instruction, what are they measuring?"
Taking a careful look at a subsample of teachers in the MET study, Polikoff and Porter show that detailed teacher self-reports on the content they actually delivered, observations of teacher performance using the Danielson Framework, and VAM estimates of student learning are almost entirely uncorrelated. But this finding does not necessarily mean that VAMs should be thrown out of teacher evaluation.
First of all, the Chetty et al. blockbuster study of 2012 pretty much demonstrated that effective teachers as identified through VAMs have an impact on the kinds of life outcomes that people care about (e.g., income), so VAMs are measuring something. And as Polikoff and Porter acknowledge, all the measures they use in the study may themselves have problems (e.g., the teacher self-reports on content delivery might be off because the teachers were not trained on how to fill them out). Still, the problem of misalignment that they detail is a serious one. If tests don't measure the content that teachers think they should be teaching, and observational rubrics don't actually identify teachers who promote learning as measured by standardized tests, then it will be difficult for teacher evaluation systems to help teachers get better.
Russ Whitehurst, Matt Chingos, and Katharine Lindquist suggest how to fix one of the myriad issues of misalignment in their recent paper. Looking carefully at teacher evaluation in four districts, they find that teachers assigned low-achieving students are almost three times more likely to receive a low rating on an observational rubric than teachers assigned high-achieving students. VAMs generally attempt to control for such issues, so that teachers won't get a bad score just because of the students they're assigned. Whitehurst et al. suggest that a similar set of controls should be included for classroom observation ratings.
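The control that VAMs apply can be sketched in a few lines. This is a deliberately minimal illustration on simulated data, not any district's actual model (real VAMs use richer covariates and shrinkage): regress current scores on prior scores, then treat a teacher's value-added as her students' average residual, so a classroom of low scorers doesn't automatically drag the estimate down.

```python
# Minimal value-added sketch on simulated data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Three teachers, 30 students each; teacher 2 is assigned the
# lowest-achieving students but is actually the most effective.
n = 30
prior = np.concatenate([rng.normal(0.5, 1, n),    # teacher 0: high achievers
                        rng.normal(0.0, 1, n),    # teacher 1: average
                        rng.normal(-0.5, 1, n)])  # teacher 2: low achievers
teacher = np.repeat([0, 1, 2], n)
true_effect = np.array([0.0, 0.1, 0.3])           # hidden "true" teacher quality
current = 0.8 * prior + true_effect[teacher] + rng.normal(0, 0.3, 3 * n)

# Step 1: control for prior achievement by regressing current on prior.
slope, intercept = np.polyfit(prior, current, 1)
residual = current - (intercept + slope * prior)

# Step 2: a teacher's value-added is the mean residual of her students.
vam = np.array([residual[teacher == t].mean() for t in range(3)])
print(vam.round(2))
```

A raw comparison of mean scores would rank teacher 2 last; after the prior-achievement adjustment, her value-added estimate comes out on top, which is exactly the kind of correction Whitehurst et al. want applied to observation ratings as well.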
Whitehurst et al. correctly note that expecting states and districts to get every aspect of teacher evaluation systems right in such a short period of time is the equivalent of a "Hail Mary pass."