The rise and fall of better teacher evaluation: Who gets the blame?

Kate Walsh January 27, 2022

Showing up in my Twitter feed this week was a tweet from NYT reporter Dana Goldstein, reacting to a newly published working paper finding that the nation's big push to improve teacher evaluation had little to no positive impact.

Dana's tweet: "A group of careful researchers with this major finding that is no longer surprising but still stunning given the policy oxygen this effort consumed."

Given my own reaction to this paper, I no doubt read too much into this tweet, as it is technically accurate. The study was indeed well done and there's no reason to question its findings, however grim. Still, my own tweet, had I written one, would have been more along the lines of: "Once again, schools prove who really is in charge."

There is a long and proud tradition in public schools of successfully resisting efforts to impose new ways of doing things, especially when states or the federal government are the ones asking (or telling). For starters, we human beings don't much like change—especially since it invariably involves admitting that what we've been doing all along has been second rate.

To my recollection, the most egregious example of the power of schools to defeat a great idea is their stubborn indifference to adopting evidence-based early reading instruction over a period of a few decades, but particularly when the feds tried to move practice in the early 2000s. (The post mortem studies on Reading First were no less definitive than this study was on teacher evaluation.) Mysteriously, a generation or two later than was necessary, schools now seem poised to embrace the science of reading. In any case, should the takeaway tweet from that original failed effort have been, "Turns out adoption of scientifically based reading methods doesn't improve kids' reading skills after all"?

Great ideas, even reasonably sensible ideas that are grounded in strong research, still have difficulty overcoming what invariably gets thrown at them. Even remarkably effective vaccines have that problem. (It is not as clear that demonstrably bad ideas face the same miserable odds, but that's an existential question for another day.)

There's little doubt in my mind that this new study will now be held up at any school board meeting or legislative hearing for the next five to 10 years, but not as an example of how botched implementation can ruin even the best-laid plans. No, it will be used as proof positive that more robust teacher evaluation systems just don't work in real schools.

In fact, there is good science behind these new systems. The widespread evaluation reform efforts that occurred as a result of the U.S. Department of Education's own keen interest in this issue were amply preceded by robust experimentation all across the country, including in Washington, D.C.; Harrison, CO; 70 districts under Teacher Incentive Fund grants; and hundreds of schools under the heralded TAP model, to name but a few. A number of places, by no means a majority, did manage to build strong systems and saw real gains as a result, proof that better evaluation can spur great results. Further, the researchers found a correlation between positive results and the unwillingness of places to compromise on key factors, such as including test score gains in ratings, annually observing all teachers, and using at least three rating categories.

I found fascinating a Forbes essay written by a veteran teacher, Peter Greene, who likens the attempt to improve teacher evaluation to something fashioned out of toothpicks and mayonnaise (an analogy I quite liked). He runs through the popular litany of objections to these new teacher evaluation systems and since few, perhaps none, were actually grounded in fact, it reads more like a summary of a successful scaremongering campaign. He asserts, for example, that administrators were being forced to rate teachers differently (lower, he would claim, no doubt) than they would have liked, even though the overwhelming evidence shows that administrators were awarding teachers the same uniformly high ratings as before.

Neither the antipathy of teachers nor that of their unions fully explains why this effort failed. Frankly, teachers didn't have the power to stop the train, but others who did were equally happy to do so. After winning a 2015 Race to the Top grant that had been premised on a pledge by the state to reform its evaluation system, one state superintendent confided to me that she eventually just gave up, not because of resistant teachers unions, but because her school superintendents, pressured by their own school principals, were so resistant.

As we are now witnessing with schools' delayed embrace of evidence-based reading instruction, I remain hopeful over the long haul that more schools will embrace better teacher evaluation systems. Inherent in that hope is a requirement, as has been the case in reading, that at least some of us remain steadfast to the fundamental principles involved here, even when the issue is considered by many to be politically toxic. These systems, however imperfectly (and many of them indeed were far, far from perfect), attempted to address a major flaw in how schools manage their teachers, a flaw with well-documented adverse consequences for students. We can do better.
