TQB: Teacher Quality Bulletin

A fault in our measures? Evidence of bias in classroom observations may raise some familiar concerns

A lot of us can rattle off the possible shortcomings of using value-added test scores to evaluate teachers: The scores vary from year to year. They lack transparency. They cannot control for confounding factors, like broken air conditioning in the classroom or teachers consistently being assigned exceptionally motivated students.

Sad to say, these problems aren't confined to the statistical wizardry behind VAM: it turns out classroom observation scores may suffer from many of the same ills.

New research from Matthew Steinberg of the University of Pennsylvania and Rachel Garrett of the American Institutes for Research uses data from the Measures of Effective Teaching (MET) study to look at how classroom composition relates to teachers' observation scores.

First, they found that teachers assigned to high-performing students were more likely to earn higher observation scores. They also found that some domains of the evaluation instrument used in the study (the Danielson framework) appeared to give teachers undue credit for traits, achievement levels, and other characteristics students arrived with at the start of the school year; domains like "engaging students in learning" and "establishing a culture for learning," which rely heavily on student-teacher interaction, were the primary culprits.

The authors offer two competing hypotheses for these higher scores. They could be a sign of observer bias: a teacher might get a score boost for having an eager, well-behaved class, even if she basically inherited her students that way. Or they could indicate that teachers either perform better, or genuinely become better, when working with a class of higher-achieving students.

Finally, teachers who teach multiple subjects, like most elementary teachers, had observation scores that were less related to their students' incoming achievement than the scores of teachers who teach a single subject (and therefore teach older grades). This difference could arise because single-subject teachers' observations are spread across multiple classrooms, or because teachers who spend more time with one group of students are better able to adjust to their needs.

So, would Steinberg and Garrett's findings hold true elsewhere? After all, the MET study's observation data relied on an unusually robust approach to teacher evaluation, using highly trained off-site evaluators who rated videotaped lessons. In contrast, districts more often rely on in-person observations, typically conducted by principals and assistant principals.

Unfortunately, the findings from more typical on-site evaluations by school administrators may look even worse, according to Whitehurst, Chingos, and Lindquist. These Brookings researchers tackled the observation issue a few years ago and found that outside observers produce more valid observations than school administrators do. Since Steinberg and Garrett's results rest on those more rigorous outside observations, their work could actually understate the problem.

Read more about observations in last month's Teacher Trendline.