RE teacher evaluations: New MET findings out today!

The latest findings from the Gates Foundation's mega-funded research study setting out to identify the multiple measures that should comprise a teacher's evaluation, known as the Measures of Effective Teaching (MET) project, were released today at noon. These findings focus on the contribution that classroom observations can or should make to teacher evaluations.

Some might be plenty disappointed by the relatively weak correlation MET finds between teachers' performance (as measured by their students' performance on a range of tests) and their ratings on classroom observations (using five different instruments, including CLASS and Danielson's), especially since this experiment involved at least four observations a year of each teacher. The observers were highly trained and utterly impartial, as opposed to what generally happens in school districts, where poorly trained principals may get around to conducting a single annual observation of a teacher, often the same teacher with whom they have socialized and worked on a daily basis for years.

In other words, under conditions about as close to perfect as can be had, the correlation was (phew!) positive, but still really weak.

These findings should leave states and school districts squirming about the defensibility of their own evolving evaluation plans.  We now have strong evidence that three quality-control measures are necessary to ensure that evaluations don't steer off course:

  • Perhaps three, but certainly four, observations of a teacher in a year are necessary to correlate consistently with performance, simply because the MET team found huge variations in a teacher's performance from one observation to another, variation connected to the instructional content, not the abilities of the evaluators;
  • Observers need to be not only trained but also certified in their ability to consistently rate the same lesson the same way (which doesn't seem possible without using a video library); and,
  • The district has to employ a method for spot-checking the accuracy of evaluations at random to maintain the system's integrity.

In presenting these findings, the MET team of researchers makes a point of returning to its previous round of results, reminding us of a much more powerful finding from that last go-around: the best measure, next to value-added scores, is surveying students about their own teacher's performance (and yes, the MET team shows these surveys can be designed fairly).

The lack of traction for using such surveys in the year since MET released those findings has been disappointing to the research team, and to us.  We paraphrase the remarks of Tom Kane, the leader of the MET project, on the power behind these survey results, even for children as young as 4th grade (and maybe younger; MET hasn't yet tested younger children):

Is it any wonder that the results from, at best, four observations of a teacher by a few evaluators with little context for the classroom, even though they are adults, would not produce as accurate a rating of teacher performance as 35 evaluators, albeit children, who have the advantage of 180 observations?

A mighty powerful point. Districts, sit up and take notice.

Kate Walsh