Seven ways to make improving teacher evaluation worth the work

Getting teacher evaluation right has the potential to drive significant improvements. The trouble is that so few places—so far—have gotten it right.

To the surprise of few, a recent working paper from the Annenberg Institute at Brown University found almost no positive impact from the teacher evaluation reforms that followed a major push by the U.S. Department of Education under the Obama Administration. Those findings have been getting a lot of attention. Some consider the paper conclusive evidence that these teacher evaluation reforms were ill-advised from the start, while others assert that the commitment on the part of states and districts was half-hearted all along (some may say they were arm-twisted by the feds), and therefore doomed to produce few positive outcomes.

Will this paper be the death knell for rigorous and meaningful teacher evaluation systems? It’s too soon to say, but it’s worth pointing out that the paper never asserts that these systems can’t improve outcomes for both teachers and students, just that it appears remarkably hard to do so. In fact, the authors (Joshua Bleiberg, Eric Brunner, Erica Harbatkin, Matt Kraft, and Matthew Springer) fully acknowledge that some states and school districts did achieve great results; there just weren’t many of them.¹

One thing is for sure: many of the states and districts in the study weren’t paying close attention to the key principles that a number of research studies have established as necessary to produce positive outcomes. It’s likely that the places where teacher evaluation reforms failed did not adhere to all of these research-based principles, either overlooking them entirely or making compromises that effectively neutralized their capacity to do good.

These key principles remain as relevant now as ever, and we spotlight seven of them here.

1. Measure what matters, looking at multiple and frequent measures of teacher performance.

In 2018, NCTQ highlighted four large school districts and two states where evaluation reforms had led to improvement in teacher quality, positive results that the Annenberg working paper confirmed. All six systems annually evaluated all teachers using both objective and subjective measures. That contrasts with the widespread practice among states and districts of exempting large numbers of teachers from yearly evaluation, using only subjective measures (such as teacher observation scores), or not giving significant weight to student learning.

What does this look like in practice?

Require evaluation of all teachers, including experienced or tenured teachers, each year. Annual evaluations provide all teachers with the regular feedback they need to improve, and result in the data needed to make informed personnel decisions (e.g., teacher leadership roles).
Incorporate multiple measures of teacher performance²—including objective measures—into each evaluation to support accuracy and stability of scores over time. These measures can include student growth measures from standardized assessments, classroom observations using a clearly defined rubric, and student surveys (which studies have found to be highly reliable).³ There is also research to support including other student outcomes, such as attendance, as means of measuring teacher performance.⁴ Student learning objectives (SLOs) are another option; however, research suggests they must be standardized across classrooms and require extensive training and oversight to be reliable.⁵

Combining multiple measures of teacher performance improves predictive power and reliability of evaluation scores

The above graph shows that evaluation scores that combine multiple measures, including student achievement gains, classroom observations, and student surveys, have higher predictive power and reliability than any single measure alone. Figure 1: Combining Strengths, from Kane, T. (2012). Capturing the Dimensions of Effective Teaching. Education Next. https://www.educationnext.org/capturing-the-dimens….

Iterate on the system by monitoring outcomes and incorporating teacher and administrator feedback. The initial timing, components, or weighting of those components may need to be adjusted over time to ensure the evaluation system is working for administrators and teachers. Two places that have found sustained success with their systems (Washington, D.C. and Tennessee) both made changes after implementation based on educator feedback, such as decreasing the percentage of a teacher’s evaluation that is based on student test scores.⁶
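As a rough illustration of how component weighting works, the sketch below computes a composite evaluation score as a weighted average and then recomputes it after a reweighting of the kind described above. The function name, the 1-to-4 scale, the component names, and every weight and score are hypothetical and chosen only for illustration; they are not drawn from any district's actual system.

```python
from typing import Dict


def composite_score(measures: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of measures that share a common scale."""
    if set(measures) != set(weights):
        raise ValueError("measures and weights must name the same components")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the total lets callers pass weights that do not sum to 1.
    return sum(measures[name] * weights[name] for name in measures) / total_weight


# Hypothetical teacher; every measure is assumed to be rescaled to 1-4.
teacher = {"student_growth": 3.0, "observation": 3.5, "student_survey": 2.5}

# Hypothetical initial weights, and revised weights that reduce the share
# of the score based on student growth after educator feedback.
initial_weights = {"student_growth": 0.50, "observation": 0.40, "student_survey": 0.10}
revised_weights = {"student_growth": 0.35, "observation": 0.50, "student_survey": 0.15}

if __name__ == "__main__":
    print(round(composite_score(teacher, initial_weights), 3))
    print(round(composite_score(teacher, revised_weights), 3))
```

Keeping the weights in a separate mapping means a system can be reweighted over time without changing how the individual measures are collected.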

Common practices of teacher evaluation systems that saw improvements in teacher quality and student outcomes

From NCTQ’s Making a Difference: Six Places Where Teacher Evaluation Systems are Getting Results, October 2018.

2. Pay careful attention to who is doing the evaluating.

Studies over the last decade have shown that teachers both perceive evaluations to be more meaningful and see greater improvement in their practice when those doing the evaluating have been trained on the observation rubric, have more experience in and knowledge of the setting where teachers are being observed, and are familiar with the content their evaluees are teaching. If an evaluation system is going to provide the feedback teachers trust and need to improve, school and district leaders should:

Make sure evaluators and/or observers get comprehensive training on the observation rubric. Research from the U.S. Department of Education’s Institute of Education Sciences (IES) and the Bill & Melinda Gates Foundation’s Measures of Effective Teaching (MET) study found that teacher observation scores were more reliable and student learning improved when teachers were observed by evaluators who had been trained on the observation rubric. A study of teacher evaluations in Chicago also found that implementing a new, robust teacher evaluation system corresponded with improvements in student outcomes only when observers received “extensive” training and support on the new rubric.⁷
Prioritize evaluators with more experience in the school and setting. A new study of 4,800 teachers matched with 350 evaluators at over 100 schools found that teachers considered the feedback they received more impactful when the evaluator providing it had more experience and longer tenure at their school.⁸
Pair teachers with evaluators who know their subject area. Surveys have also found that teachers are more likely to perceive feedback as valuable and to improve their instructional practice when their evaluators had relevant content-area expertise.⁹ Likewise, principals report feeling less effective as evaluators when they don’t have subject-specific knowledge relevant for the teacher they are evaluating.¹⁰

Consider using high-performing peer observers. School leaders are not the only ones who can observe and provide feedback to teachers; teachers also respond well to observations and feedback from their peers, particularly those with more experience and expertise in their grade or subject.¹¹ More support for the idea of incorporating peer observations and feedback into evaluation systems comes from another recent study of about 100 teachers that paired high-performing teachers with low-performing teachers, finding that the students of the low-performers saw greater academic gains when their teacher was paired with a high-performing mentor.¹²

3. Consider using video observations.

Two of the oft-noted difficulties in conducting multiple classroom observations with each teacher are 1) the time commitment it requires from observers and 2) the strain it can place on in-school professional relationships. Luckily, research that examined the observations, feedback, and attitudes towards that feedback for over 400 teachers found that video observations could be a solution. When teachers recorded videos of their lessons and later watched and reflected upon these videos with their observers, it eased time constraints for administrators and supported more effective feedback discussions, with more positive perceptions of the feedback and process. It also was associated with improved retention for the teachers who used the video observations!¹³

As more research conducted during the pandemic gets published, we should be seeing additional guidance about the effective and ineffective ways to use video in evaluation.

4. Address bias in the system head-on, iterating to make improvements.

Teacher evaluations, particularly observation scores, may be subject to racial and gender biases. An important way to mitigate bias is through including multiple measures of performance in teachers’ summative evaluation scores, but these biases can still exist and can result in unjust outcomes, particularly for teachers of color and for teachers of students of color.

New research out of Chicago does not seem hopeful at first glance: a study found significant bias against Black teachers in observations, and that the bias was explained by the students these Black teachers are more likely to teach (students from low socioeconomic levels, with lower levels of achievement in reading, and with higher frequency of misconduct).¹⁴ But the upside is that it suggests a way to mitigate these biases. In addition to including other measures of teacher quality to offset potential bias in observation scores, the researchers recommend that education leaders in districts with this kind of problem statistically adjust observation scores to account for student characteristics, just as a teacher’s contributions to student learning are adjusted in a value-added measure.
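One simple way to picture that kind of adjustment, offered here as a minimal sketch and not as the researchers' actual model, is to fit a line relating raw observation scores to a classroom characteristic and then remove the part of each score that the characteristic predicts. The example assumes a single hypothetical covariate (the class's average prior reading achievement, in standard deviation units) and invented scores; a real adjustment would involve more student characteristics and far more data.

```python
from statistics import fmean
from typing import List


def adjust_scores(raw_scores: List[float], covariate: List[float]) -> List[float]:
    """Remove the linear association between a classroom covariate and raw scores.

    Fits raw = a + b * covariate by ordinary least squares and returns
    raw - b * (covariate - mean(covariate)). Each adjusted score is the score
    the teacher would be expected to receive with an average classroom,
    and the mean of the adjusted scores equals the mean of the raw scores.
    """
    if len(raw_scores) != len(covariate) or len(raw_scores) < 2:
        raise ValueError("need two equal-length lists with at least two entries")
    x_bar = fmean(covariate)
    y_bar = fmean(raw_scores)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    if sxx == 0:
        # The covariate does not vary, so there is nothing to adjust for.
        return list(raw_scores)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, raw_scores))
    slope = sxy / sxx
    return [y - slope * (x - x_bar) for x, y in zip(covariate, raw_scores)]


# Hypothetical data for five teachers (not taken from the Chicago study).
prior_reading = [-1.0, -0.5, 0.0, 0.5, 1.0]   # class average prior achievement
raw_observation = [2.6, 2.9, 3.0, 3.3, 3.2]   # raw rubric scores on a 1-4 scale

if __name__ == "__main__":
    for raw, adj in zip(raw_observation, adjust_scores(raw_observation, prior_reading)):
        print(f"raw={raw:.2f} adjusted={adj:.2f}")
```

Because the adjustment only removes what the covariate predicts, differences between teachers with similar classrooms are left intact.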

Another, less complex way to mitigate bias in observation scores is to require multiple observations by different observers for each teacher, which can make them a more reliable and accurate measure.¹⁵

5. Tie results of observations and evaluations directly to each teacher’s own customized professional development.

In the six exemplar teacher evaluation systems NCTQ analyzed, each tied the professional development a teacher should pursue to that teacher’s evaluation results, as opposed to giving teachers open-ended choices. This finding is further supported by a meta-analysis of the effectiveness of performance pay systems, which found that performance pay programs paired with professional development result in significantly higher student gains than those that are not.¹⁶

6. Pay great teachers more. A lot more.

For a teacher evaluation system to have a major impact on teacher quality and student learning, it needs to significantly reward high-performing teachers, encouraging them to continue teaching. That is a clear takeaway from a recent meta-analysis of over 40 research studies, which showed positive effects on student achievement, particularly in math, when individual teachers were eligible for performance-based bonuses. Importantly, more significant gains for students were found when the annual incentives for high-performing teachers were above 7.5% of their base pay (or, nationally, on average $5,000 a year).¹⁷ Other research also suggests that 7% of teachers’ base pay would be the minimum to be effective, whereas the most effective performance pay bonus should be 14%.¹⁸
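The threshold is easy to check against a given salary schedule. The sketch below is illustrative arithmetic only: the 7.5% default comes from the meta-analysis cited above, while the function names and the salary and bonus figures are hypothetical.

```python
def bonus_share(base_pay: float, bonus: float) -> float:
    """Return the bonus as a fraction of base pay."""
    if base_pay <= 0:
        raise ValueError("base_pay must be positive")
    return bonus / base_pay


def meets_threshold(base_pay: float, bonus: float, threshold: float = 0.075) -> bool:
    """True when the bonus is strictly above the threshold share of base pay."""
    return bonus_share(base_pay, bonus) > threshold


def threshold_bonus(base_pay: float, threshold: float = 0.075) -> float:
    """Dollar amount a bonus must exceed to clear the threshold."""
    if base_pay <= 0:
        raise ValueError("base_pay must be positive")
    return base_pay * threshold


if __name__ == "__main__":
    base = 60_000  # hypothetical base salary
    print(meets_threshold(base, 5_000))
    print(meets_threshold(base, 3_000))
    print(round(threshold_bonus(base), 2))
```

The same helper can be pointed at the 7% or 14% figures from the other research by passing a different `threshold`.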

Why does a significant monetary incentive for teachers make a difference for students? A review of 120 studies on teacher attrition found that not only were more robust teacher evaluation systems associated with better retention of high-performing teachers, but participation in a performance pay system could decrease the probability of teachers (all teachers, not just high-performing ones) leaving the classroom by 24%, or nearly 15% in high-need schools.¹⁹

Unfortunately, NCTQ research has found that few school districts have adopted performance pay incentives for teachers, and even fewer set the incentives above the threshold research suggests is needed to make an impact.

7. Use teacher evaluation data to provide support for low-performing teachers and, if necessary, to inform decisions about layoffs and dismissal.

Teacher evaluation reforms support improvements in teacher quality in part because they lead to less effective teachers exiting the profession²⁰ or dissuade people who are likely to be less effective from entering the profession.²¹ While the first priority of identifying struggling teachers must be to provide them with targeted professional development and support for improvement, prioritizing student learning means that consistently low-performing teachers should be the first to be considered when layoffs are necessary and should ultimately be exited from the profession.

A study that examined teacher evaluation data and retention for over 20,000 teachers over five years in Chicago found that the implementation of the district’s more rigorous teacher evaluation system increased the likelihood that low-performing teachers exited by 50%. Additionally, the new hires who replaced these teachers were more effective on average, improving the overall quality of the teacher workforce.²² Two different studies of teachers in Washington, D.C. have also reported evidence that incoming teachers are more effective than those exited after implementation of rigorous teacher evaluation.²³

Conclusion

When properly designed and, even more importantly, implemented with fidelity, a good evaluation system should be able to strengthen the teacher workforce by helping all teachers become more effective, motivating effective teachers to stay in the classroom, and informing decisions about who to exit from the classroom. Some research has found that a meaningful evaluation system can even attract individuals to the profession who might not have otherwise considered teaching,²⁴ perhaps because of a more visible commitment on the part of a district to supporting and rewarding great teachers. Without a good system in place, it is difficult for administrators to access the data needed to tackle problems of educational inequities, primarily the tendency of districts to assign their more effective, qualified teachers to more advantaged students.

Ultimately, for a teacher evaluation system to have the positive outcomes for teachers and students that research and real examples prove are possible, it must adhere to these evidence-based practices, involve both teachers and administrators from the start, be evaluated frequently for efficacy, be tested for biases, and be iterated on as necessary. It’s far from easy, and requires both significant funding and dedicated district and school leadership, but the potential impact it can have on students and teachers is worth the investment.

See more research on the components of strong teacher evaluations here.

About the Author

Nicole Gerber

Endnotes

Studies less sweeping in nature than this paper but still relevant have provided similar evidence attesting to such benefits from better teacher eval systems. See: Dee, T., James, J., & Wyckoff, J. (2019). Is Effective Teacher Evaluation Sustainable? Evidence from DCPS. (CEPA Working Paper No.19-09). Retrieved from Stanford Center for Education Policy Analysis: http://cepa.stanford.edu/wp19-09; Papay, J., & Laski, M. (2018). Exploring teacher improvement in Tennessee: A brief on reimagining state support for professional learning. Nashville, TN: Tennessee Education Research Alliance. https://peabody.vanderbilt.edu/TERA/files/Exploring_Teacher_Improvement.pdf; Putman, H., Ross, E., & Walsh, K. (2018). Making a Difference: Six Places Where Teacher Evaluation Systems Are Getting Results. Washington, DC: National Council on Teacher Quality. https://www.nctq.org/publications/Making-a-Difference; Sartain, L., & Steinberg, M. (2021). Can Personnel Policy Improve Teacher Quality? The Role of Evaluation and the Impact of Exiting Low-Performing Teachers. (EdWorkingPaper: 21-486). Retrieved from Annenberg Institute at Brown University: https://doi.org/10.26300/d201-7y89.
Kane, T., Taylor, E., Tyler, J., & Wooten, A. (2011). Identifying effective classroom practices using student achievement data. Journal of Human Resources, 46(3), 587-613; Kane, T. (2012). Capturing the Dimensions of Effective Teaching. Education Next. https://www.educationnext.org/capturing-the-dimensions-of-effective-teaching/; Taylor, E., & Tyler, J. (2012). The effect of evaluation on teacher performance. The American Economic Review, 102(7), 3628-3651; Doan, S., Schweig, J., & Mihaly, K. (2019). The consistency of composite ratings of teacher effectiveness: evidence from New Mexico. American Educational Research Journal, 56(6), 2116-2146.
Wallace, T., Kelcey, B., & Ruzek, E. (2016). What can student perception surveys tell us about teaching? Empirically testing the underlying structure of the Tripod student perception survey. American Educational Research Journal, 53(6), 1834-1868.
Backes, B., & Hansen, M. (2015). Teach For America Impact Estimates on Nontested Student Outcomes. CALDER Working Paper No. 146. https://caldercenter.org/publications/teach-america-impact-estimates-nontested-student-outcomes.
Lin, S., Luo, W., Tong, F., Irby, B. J., Alecio, R. L., Rodriguez, L., & Chapa, S. (2020). Data-based student learning objectives for teacher evaluation. Cogent Education, 7(1), 1713427. https://www.tandfonline.com/doi/full/10.1080/2331186X.2020.1713427; Gill, B., English, B., Furgeson, J., & McCullough, M. (2014). Alternative Student Growth Measures for Teacher Evaluation: Profiles of Early-Adopting Districts. REL 2014-016. Regional Educational Laboratory Mid-Atlantic. https://eric.ed.gov/?id=ED544797; Briggs, D. C., Chattergoon, R., & Burkhardt, A. (2019). Examining the dual purpose use of student learning objectives for classroom assessment and teacher evaluation. Journal of Educational Measurement, 56(4), 686-714. https://doi.org/10.1111/jedm.12233.
Putman, H., Ross, E., & Walsh, K. (2018).
Steinberg, M. P., & Sartain, L. (2015). Does teacher evaluation improve school performance? Experimental evidence from Chicago’s Excellence in Teaching project. Education Finance and Policy, 10(4), 535-572. https://consortium.uchicago.edu/sites/default/files/2018-10/Does%20Teacher%20Evaluation%20Improve-Oct2015-Consortium.pdf.
Kraft, M., & Christian, A. (2021). Can teacher evaluation systems produce high-quality feedback? An administrator training field experiment. American Educational Research Journal. Retrieved from: https://www.edworkingpapers.com/sites/default/files/ai19-62_2.pdf.
Firestone, W., & Donaldson, L. (2019). Teacher evaluation as data use: what recent research suggests. Educational Assessment, Evaluation and Accountability, 31(3), 289-314.
Kraft, M., & Gilmour, A. (2016). Can principals promote teacher development as evaluators? A case study of principals’ views and experiences. Educational Administration Quarterly, 52(5), 711-753. https://scholar.harvard.edu/files/mkraft/files/principals_as_evalutors_3.5_0.pdf.
Firestone, W., & Donaldson, L. (2019).
Papay, J., Taylor, E., Tyler, J., & Laski, M. (2020). Learning Job Skills from Colleagues at Work: Evidence from a Field Experiment Using Teacher Performance Data. American Economic Journal: Economic Policy, 12(1): 359-88. DOI: 10.1257/pol.20170709.
Kane, T., Blazar, D., Gehlbach, H., Greenberg, M., Quinn, D., & Thal, D. (2020). Can Video Technology Improve Teacher Evaluations? An Experimental Study. Education Finance and Policy, 15(3), 397–427. https://doi.org/10.1162/edfp_a_00289.
Steinberg, M. & Sartain, L. (2021). What Explains the Race Gap in Teacher Performance Ratings? Evidence From Chicago Public Schools. Educational Evaluation and Policy Analysis, 43(1), 60–82. https://doi.org/10.3102/0162373720970204.
White, T. (2014). Adding eyes: The rise, rewards, and risks of multi-rater teacher observation systems. Carnegie Foundation for the Advancement of Teaching; Cantrell, S., & Kane, T. J. (2013). Ensuring fair and reliable measures of effective teaching. Bill & Melinda Gates Foundation; Whitehurst, G., Chingos, M., & Lindquist, K. (2015). Getting classroom observations right. Education Next, 15(1), 63-68.
Pham, L. D., Nguyen, T. D., & Springer, M. G. (2021). Teacher Merit Pay: A Meta-Analysis. American Educational Research Journal, 58(3), 527–566. https://doi.org/10.3102/0002831220905580.
Ibid.
Yesilirmak, M. (2019). Bonus pay for teachers, spatial sorting, and student achievement. European Journal of Political Economy, 59, 129-158. https://dx.doi.org/10.1016/j.ejpoleco.2019.02.004.
Nguyen, T., Pham, L., Springer, M., & Crouch, M. (2019). The Factors of Teacher Attrition and Retention: An Updated and Expanded Meta-Analysis of the Literature. (EdWorkingPaper: 19-149). Retrieved from Annenberg Institute at Brown University: https://edworkingpapers.com/ai19-149.
Nguyen, T., Pham, L., Springer, M., & Crouch, M. (2019); Dee, T., James, J., & Wyckoff, J. (2019).
Kraft, M., Brunner, E., Dougherty, S., & Schwegman, D. (2020). Teacher accountability reforms and the supply and quality of new teachers. Journal of Public Economics, 88. https://doi.org/10.1016/j.jpubeco.2020.104212.
Sartain, L., & Steinberg, M. (2021).
Walsh, E., & Dotter, D. (2014). Longitudinal analysis of the effectiveness of DCPS teachers (No. 40185.533). Mathematica Policy Research; District of Columbia Public Schools. (2018, April 20). DC Public Schools continues steady growth on NAEP for ninth year. Retrieved from https://dcps.dc.gov/release/dc-public-schools-continues-steady-growth-naep-ninth-year.
Third Way. (2014). National Online Survey of College Students – Education Attitudes. Washington, D.C.: Third Way. https://thirdway.imgix.net/downloads/national-online-survey-of-college-students-education-attitudes/Third_Way_Educ_Attitudes_Topline_No_Summary_Tables.pdf.
