Identifying Effective Teachers Policy
The state should require instructional effectiveness to be the preponderant criterion of any teacher evaluation.
Although the state requires student performance data to be a factor, Massachusetts does not require that objective evidence of student learning be the preponderant criterion of its teacher evaluations. The state requires districts to either adopt the model system or develop one of their own that is consistent with the state's framework.
Beginning in school year 2013-2014, Massachusetts requires its teacher evaluations to include "multiple measures of student learning, growth and achievement" as one category of evidence. The state defines these measures as student progress on classroom assessments aligned with the state's Curriculum Frameworks; student progress on learning goals; statewide growth measures, including the MCAS Student Growth Percentile and the Massachusetts English Proficiency Assessment (MEPA); and district-determined measures of student learning across a grade or subject. Student feedback is also required.
The summative evaluation includes the evaluator's judgment of the teacher's performance against performance standards and the teacher's attainment of goals set forth in the teacher's plan. Four rating categories must be used: exemplary, proficient, needs improvement and unsatisfactory. To be rated proficient overall, teachers must at least be rated proficient on the "Curriculum, Planning and Assessment" and "Teaching All Students" standards.
In addition to the summative performance rating, an impact rating of high, moderate or low is also determined, based on at least two state or districtwide measures of student learning: the MCAS Student Growth Percentile and the Massachusetts English Proficiency Assessment (MEPA), when available, and additional district-determined measures. The impact rating is separate from the summative performance rating.
Classroom observations are required.
603 CMR 35.00
Require instructional effectiveness to be the preponderant criterion of any teacher evaluation.
Massachusetts falls short by failing to require that evidence of student learning be the most significant criterion. Because the impact rating is kept wholly separate from the summative performance rating, it is not clear that student learning is really a factor in the overall evaluation at all.
The state should either require a common evaluation instrument in which evidence of student learning is the most significant criterion, or it should specifically require that student learning be the preponderant criterion in local evaluation processes. This can be accomplished by requiring objective evidence to count for at least half of the evaluation score or through other scoring mechanisms, such as a matrix, that ensure that nothing affects the overall score more. Whether state or locally developed, a teacher should not be able to receive a satisfactory rating if found ineffective in the classroom.
Ensure that evaluations also include classroom observations that specifically focus on and document the effectiveness of instruction.
Although Massachusetts requires classroom observations, the state should articulate guidelines that ensure that the observations focus on effectiveness of instruction. The primary component of a classroom observation should be the quality of instruction, as measured by student time on task, student grasp or mastery of the lesson objective and efficient use of class time.
Massachusetts was helpful in providing NCTQ with facts that enhanced this analysis. The state asserted that its evaluation framework is structured differently from many other states in that it results in two evaluation ratings: a summative performance rating and an impact rating. The summative performance rating requires multiple measures of student learning, growth and achievement as one category of evidence, and also explicitly requires consideration of the fulfillment of student learning goal(s) as a factor in determining the final rating.
Massachusetts added that student learning/growth/achievement is the sole criterion used to determine an impact rating (a rating of impact on student learning) of low, moderate or high. If a teacher is found to elicit less than a year's student growth in a year's time, based on multiple measures and multiple years of data, the teacher receives an impact rating of low. While this may be coupled with a summative performance rating of proficient, the proficient performance rating would not change the low impact rating. The two ratings work together to determine consequences for that educator, including targeted attention to the area of discrepancy between them. In addition, evaluators who assign a summative performance rating of proficient or above would then be assessed on their effectiveness as evaluators during their subsequent evaluation cycles.
Finally, Massachusetts argued that when taken together, these two ratings ensure that instructional effectiveness is the preponderant criterion of any teacher evaluation given that one rating is based solely on student learning (impact rating) and the other rating requires evidence of student learning in multiple ways, as both a category of evidence and fulfillment of student learning goals (summative performance rating).
Massachusetts does not make it explicit that a teacher cannot be rated effective if he or she does not meet student growth targets. Treating these scores as two different entities, and using the summative evaluation as the basis to make personnel decisions, reinforces the fact that instructional effectiveness is not the most significant criterion in the state's teacher evaluation system. At best, the system is unclear on how to interpret and utilize the two separate ratings. At worst, it appears quite possible to virtually disregard the student impact measure.
Teachers should be judged primarily by their impact on students.
While many factors should be considered in formally evaluating a teacher, nothing is more important than effectiveness in the classroom. Unfortunately, districts have used many evaluation instruments, including some mandated by states, that are structured so that teachers can earn a satisfactory rating without any evidence that they are sufficiently advancing student learning in the classroom. It is often enough that teachers appear to be trying, not that they are necessarily succeeding.
Many evaluation instruments give as much weight, or more, to factors that lack any direct correlation with student performance, such as taking professional development courses, assuming extra duties such as sponsoring a club or mentoring, and getting along well with colleagues. Some instruments hesitate to hold teachers accountable for student progress. Teacher evaluation instruments should include factors that combine both human judgment and objective measures of student learning.
Evaluation of Effectiveness: Supporting Research
Reports strongly suggest that most current teacher evaluations are largely a meaningless process, failing to identify the strongest and weakest teachers. The New Teacher Project's report, "Hiring, Assignment, and Transfer in Chicago Public Schools", July 2007 at: http://www.tntp.org/files/TNTPAnalysis-Chicago.pdf, found that the CPS teacher performance evaluation system at that time did not distinguish strong performers and was ineffective at identifying poor performers and dismissing them from Chicago schools. See also Brian Jacob and Lars Lefgren, "When Principals Rate Teachers," Education Next, Volume 6, No. 2, Spring 2006, pp. 59-69. Similar findings were reported for a larger sample in The New Teacher Project's The Widget Effect (2009) at: http://widgeteffect.org/. See also MET Project (2010). Learning about teaching: Initial findings from the measures of effective teaching project. Seattle, WA: Bill & Melinda Gates Foundation.
A Pacific Research Institute study found that in California, between 1990 and 1999, only 227 teacher dismissal cases reached the final phase of termination hearings. The authors write: "If all these cases occurred in one year, it would represent one-tenth of 1 percent of tenured teachers in the state. Yet, this number was spread out over an entire decade." In Los Angeles alone, over the same time period, only one teacher went through the dismissal process from start to finish. See Pamela A. Riley, et al., "Contract for Failure," Pacific Research Institute (2002).
The finding that the vast majority of districts have no teachers deserving of an unsatisfactory rating is hard to square with our knowledge of most professions, which routinely include individuals who are not well suited to the job. Nor do these teacher ratings correlate with school performance, suggesting that teacher evaluations are not a meaningful measure of teacher effectiveness. For more information on the reliability of many evaluation systems, particularly the binary systems used by the vast majority of school districts, see S. Glazerman, D. Goldhaber, S. Loeb, S. Raudenbush, D. Staiger, and G. Whitehurst, "Evaluating Teachers: The Important Role of Value-Added." The Brookings Brown Center Task Group on Teacher Quality, 2010.
There is growing evidence suggesting that standards-based teacher evaluations that include multiple measures of teacher effectiveness—both objective and subjective measures—correlate with teacher improvement and student achievement. For example see T. Kane, E. Taylor, J. Tyler, and A. Wooten, "Evaluating Teacher Effectiveness." Education Next, Volume 11, No. 3, Summer 2011, pp.55-60; E. Taylor and J. Tyler, "The Effect of Evaluation on Performance: Evidence from Longitudinal Student Achievement Data of Mid-Career Teachers." NBER Working Paper No. 16877, March 2011; as well as H. Heneman III, A. Milanowski, S. Kimball, and A. Odden, "CPRE Policy Brief: Standards-based Teacher Evaluation as a Foundation for Knowledge- and Skill-based Pay," Consortium for Policy Research, March 2006.