District Trendline

Words matter: the language of evaluation ratings

"What's in a name?" Juliet famously asks as she pleads with Romeo to reject his name and become her lover. Of course, what's in a name is the central struggle of the Shakespearean tragedy.

While labels may not change the essence of a person or a thing, those labels can change how that person or thing is perceived. From several studies about principals and evaluation ratings, we know that principals often struggle to give low ratings to teachers, even when they are able to identify which teachers are weaker. While much of this is likely driven by the often high-stakes consequences attached, would changing the labels reduce this hesitancy? Would more principals be willing to assign a teacher a rating of "developing" versus "needs improvement"—even if they were essentially equivalent?

This month, we take a look at evaluation systems in 123 large districts across the country[1] with a specific eye to the terminology used to describe teachers' ratings.

Number of ratings

NCTQ recommends that evaluation systems have at least three ratings to allow for identification of teachers at both ends of the effectiveness spectrum. All but four districts follow this recommendation; the exceptions are Elk Grove Unified School District (CA), Montgomery County Public Schools (MD), Santa Ana Unified School District (CA), and the School District of Philadelphia. The majority of districts in our sample (58 percent) have four final evaluation ratings. Dallas Independent School District is an outlier with seven final evaluation ratings.

Ratings terminology

The 123 districts in our sample use quite a few variations on a theme. There are 17 different labels used to define teachers at the low end of performance and 18 different labels used to define teachers at the high end. Districts are most consistent in how they label teachers in the middle, using 13 different labels among them.

The most common language across all ratings is based on the term "effectiveness", such as ineffective, effective, and highly effective. The table below shows the variety of terms used, with like terms grouped together (e.g. "not meeting standards" and "does not meet standards" are grouped together). For a complete list of evaluation ratings by district, see our Teacher Contract Database.
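For readers who want to try this kind of grouping on their own list of ratings, here is a minimal Python sketch. It is not the method used to build our table; the district names, the `LIKE_TERMS` mapping, and the function names are all hypothetical, and only the "not meeting standards" / "does not meet standards" pairing comes from the text above.

```python
from collections import defaultdict

# Hypothetical mapping from one canonical label to its wording variants.
# Only this single pairing is taken from the article; extend as needed.
LIKE_TERMS = {
    "does not meet standards": {
        "not meeting standards",
        "does not meet standards",
    },
}


def normalize(label):
    """Lowercase and trim a label, then fold known variants together."""
    cleaned = label.strip().lower()
    for canonical, variants in LIKE_TERMS.items():
        if cleaned in variants:
            return canonical
    return cleaned


def group_labels(labels_by_district):
    """Map each normalized label to the set of districts that use it."""
    groups = defaultdict(set)
    for district, label in labels_by_district.items():
        groups[normalize(label)].add(district)
    return dict(groups)


# Made-up sample data, for illustration only.
sample = {
    "District A": "Not Meeting Standards",
    "District B": "does not meet standards",
    "District C": "Ineffective",
}
grouped = group_labels(sample)
```

Counting the keys of `grouped` gives the number of distinct labels after like terms are merged, which is how a tally such as "17 different labels" could be reproduced from a raw list.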

Eighteen districts use different language for teachers depending on where they are in their career, switching up the labels for beginning teachers. For example, in Florida and Virginia districts, early-career teachers receive a rating of "developing," recognizing the steep learning curve of new teachers, while experienced teachers receive a rating of "needs improvement." Baltimore County Public Schools has four ratings for first- and second-year teachers ("ineffective," "developing," "effective," and "highly effective"), but only three ratings ("ineffective," "effective," and "highly effective") for all other teachers.

Finally, two districts in our sample do not give teachers a final rating. Both rate individual aspects of a teacher's performance. In Lewisville Independent School District (TX), instead of receiving a final rating, teachers are judged holistically: at the end of the year, the teacher and the evaluator collaboratively determine whether the teacher has met their goals for the year. In Burlington School District (VT), administrators recommend the teacher for renewal, various types of assistance or supervision, or non-renewal, but do not assign a final evaluation rating.

For your consideration

When considering how teachers are evaluated, consider which terminology is used and what message it sends. Some questions to think about next time you are looking at an evaluation system:

  • What is the evaluation system intending to measure?
  • Does the ratings terminology match what the ratings are meant to convey?
  • Will principals be more comfortable rating someone as "needs improvement" than "unacceptable"?
  • Where is the "acceptable" level set in the system, and how many rating levels fall below or above it?

Visualization of evaluation ratings


[1] Includes the 100 largest districts in the country and the largest district in each state.