Teaching that goes beyond the test? How to measure the many accomplishments of great teachers

Great teachers can have a tremendous positive effect on their students, but identifying which teachers are “great” can be far more complicated than it first appears. Historically, teacher evaluations tended to rate nearly everyone as similarly effective (see TNTP’s Widget Effect829). This helped motivate efforts both to create evaluation systems that better differentiate between teachers and to introduce objective measures of teachers’ contributions to student learning, such as test-score-based value-added measures (VAM).

But teachers are so much more than their contributions to test scores. Is it possible to measure the myriad ways that teachers contribute to other facets of students’ outcomes? And if so, are the same teachers who are effective at raising test scores also effective at contributing to other student outcomes?

Numerous studies have found that much in the same way that we can statistically measure teachers’ contributions to students’ learning gains (as measured by test scores) using controls for student characteristics, past performance, and so on, we can similarly measure teachers’ contributions to students’ non-test outcomes such as attendance, suspensions, high school completion, and more.
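As a rough illustration of the idea, the Python sketch below computes a toy value-added score. It is a minimal sketch under simplifying assumptions, not the model used in any of the studies cited here. It controls only for each student's prior outcome by fitting one least-squares line across all students, then treats a teacher's score as the average amount by which that teacher's students beat or fall short of the prediction. Real VAMs typically add many more controls, such as student characteristics. The function name `simple_value_added` and the record layout are hypothetical. Because the outcome column can hold any number, the same calculation could be run on attendance rather than test scores, which is the basic idea behind non-test VAM.

```python
from collections import defaultdict
from statistics import mean


def simple_value_added(records):
    """Toy value-added estimate (hypothetical helper, for illustration only).

    records: list of (teacher_id, prior_outcome, current_outcome) tuples.
    The outcome can be a test score or a non-test measure such as attendance.
    Assumes prior outcomes are not all identical (otherwise the fit is undefined).
    """
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]

    # Fit current = a + b * prior by ordinary least squares across all students.
    prior_mean, current_mean = mean(priors), mean(currents)
    sxx = sum((x - prior_mean) ** 2 for x in priors)
    sxy = sum((x - prior_mean) * (y - current_mean) for x, y in zip(priors, currents))
    b = sxy / sxx
    a = current_mean - b * prior_mean

    # A student's residual is how far the actual outcome lands above or below
    # the outcome predicted from the prior outcome alone.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (a + b * prior))

    # The teacher's score is the average residual of their students.
    return {teacher: mean(values) for teacher, values in residuals.items()}


if __name__ == "__main__":
    records = [
        ("A", 50, 55), ("A", 70, 75),  # teacher A's students beat the prediction
        ("B", 50, 45), ("B", 70, 65),  # teacher B's students fall short of it
    ]
    print(simple_value_added(records))  # {'A': 5.0, 'B': -5.0}
```

In this toy data both teachers start with identical prior scores, so the difference in their scores reflects only how their students' outcomes changed.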

In this District Trendline, we explore the research on non-test value-added measures (non-test VAM), the implications of the low (though still evident) correlation between test-based VAM and non-test VAM, and the potential opportunities for school districts to use this information.

Evidence of teachers’ effects on outcomes other than test scores

Many studies from the past several years have found that teachers differ measurably when it comes to their impact on school climate, classroom behavior, and student engagement.

School climate

In a Massachusetts-based study of school climate data, researchers found that some teachers contribute more positively to their school climate than others. This analysis is based on a student survey that addresses “topics related to school climate: cultural competency, relationships, participation, emotional safety, physical safety, bullying, instruction, mental health, and discipline.”830 Not only do some teachers contribute more to school climate than others (a teacher’s contribution as measured by the student climate survey is their “school climate VAM”), but the teachers who make greater contributions to school climate also have higher test-based value-added measures (a low but positive correlation of 0.20 to 0.25) and somewhat higher non-test value-added measures (a correlation of 0.10, with non-test VAM based on student absences, suspensions, and grade progression). In this study, teachers who share the race/ethnicity of their students tend to make greater school climate VAM contributions for those students.

Classroom behavior and engagement

A study using data from the National Center for Teacher Effectiveness, which studied math instruction among fourth- and fifth-grade teachers across four districts, found that some teachers had large effects on “students’ self-reported behavior in class, self-efficacy in math, and happiness in class,” but that the teachers who were effective in these areas were not consistently the same teachers who were strong in raising students’ math test scores.831 Similarly, another study using the same data source found that teachers who successfully raised students’ math test scores were less successful at improving students’ engagement in class.832 A third study used data from the Measures of Effective Teaching (MET) project, which included 3,000 teacher volunteers along with videotaped lessons, student surveys, and student test performance data. It found that teachers differ in their effects on students’ performance on “complex open-ended tasks in math and reading, as well as their growth mindset, grit, and effort in class,” but again, the teachers who had a greater effect in these areas were not necessarily those who were most effective at increasing student learning as measured by tests.833

Composite measures: Attendance, suspension, grades and grade progression, high school graduation, and college enrollment

Several studies create “behavioral” or “non-test” measures of teacher effectiveness as a composite of several outcomes, often including absences, suspensions, and grades or grade progression (moving from one grade to the next), among others. The results can get a bit in the weeds, but the bottom line is that many of these composite measures not only capture teachers’ immediate impact on student outcomes beyond test scores but can also predict longer-term outcomes such as high school completion.
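The composite approach can be sketched in the same spirit. The example below is a hypothetical, minimal version that assumes equal weights: it converts each outcome to a standard score (z-score), flips the sign for outcomes where lower is better (such as absences and suspensions), and averages the results. The cited studies may weight or combine outcomes differently, and the function and field names here are illustrative only. A teacher's composite (behavioral) VAM would then be estimated on this index rather than on a single outcome.

```python
from statistics import mean, pstdev


def composite_index(students, higher_is_better):
    """Equal-weight composite of standardized student outcomes (hypothetical).

    students: list of dicts mapping outcome name -> value for one student.
    higher_is_better: dict mapping each outcome name -> True if a higher value
    is a better result (e.g. GPA), False if lower is better (e.g. absences).
    Assumes every outcome varies across students (nonzero standard deviation).
    """
    index = [0.0] * len(students)
    for name, higher in higher_is_better.items():
        values = [s[name] for s in students]
        center, spread = mean(values), pstdev(values)
        sign = 1 if higher else -1
        for i, value in enumerate(values):
            # Standardize, orient so that higher always means better, weight equally.
            index[i] += sign * (value - center) / spread / len(higher_is_better)
    return index


if __name__ == "__main__":
    students = [
        {"absences": 2, "suspensions": 0, "gpa": 3.5},
        {"absences": 10, "suspensions": 1, "gpa": 2.5},
    ]
    direction = {"absences": False, "suspensions": False, "gpa": True}
    print(composite_index(students, direction))
```

With these two made-up students, the one with fewer absences, no suspension, and a higher GPA ends up with a positive index and the other with a negative one.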

One study of ninth grade teachers in North Carolina developed a composite “behavior index” based on teachers’ effects on these factors and found that teachers have differing effects on students’ behavioral outcomes, and that these effects differ from their test-based VAM effects.834 Math teachers with a high score on the behavioral index reduced students’ likelihood of suspension, increased their GPAs, and increased grade progression (but had no effect on student absences).835 The student gains on these behavioral outcomes associated with high-scoring English teachers are smaller, but still meaningful.836

Another study (based on Massachusetts data) found that a one-standard-deviation increase in a teacher’s non-test VAM (based on student absences and grades) was associated with students graduating from high school at higher rates.837 This non-test VAM measure was also correlated with higher student GPAs and a higher rate of grade progression (moving from one grade to the next), but, surprisingly, had no correlation with students’ absences or suspensions.838 This same study found that non-test VAM is associated with college enrollment and with whether a student enrolls in a four-year college, while test-based VAM is more associated with college selectivity.839

A study of Los Angeles Unified School District K-12 data measured teachers’ test-score VAM in math and English, a student behavior VAM (based on a composite of suspensions, attendance, GPA, and grade retention), and a learning skills VAM (based on teachers’ assessment of students’ effort and 14 different learning skills such as self-control and conflict resolution).840 The study found that for elementary teachers, both test-based VAM and behavioral VAM were correlated with students’ high school performance. The study also found that the effects of a high-VAM teacher were cumulative: Giving a student a stronger teacher (by one standard deviation in test-score VAM) every year from third to 12th grade is associated with higher SAT scores and lower dropout rates.841 Further, a stronger teacher based on behavioral VAM is associated with similarly higher SAT scores and substantially lower dropout rates.842

One study looks specifically at teachers’ effects on student attendance, focusing on unexcused absences in middle and high school, where teachers presumably have more influence over students’ decisions to attend class than they would in the elementary grades. The study found that some teachers have higher “attendance value-added,” and that this measure is more strongly correlated with students’ likelihood of taking AP courses and finishing high school than test-based VAM scores are.843

VAMs have different effects for different outcomes and different groups of students

Recent research has added an important level of nuance to the understanding of how different facets of teacher effectiveness influence students’ outcomes.

A Massachusetts study indicates that rather than one VAM being uniformly better than another, different VAM scores are better at predicting different outcomes. For example, teachers’ test-based VAM is more predictive of whether their students will take AP courses, pass AP exams, or enroll in a selective college or university, while non-test VAM (in this case based on student absences, suspensions, grade progression, and grades) is more predictive of graduating high school, taking the SAT, enrolling in any college, and enrolling in a four-year institution.844

Another interesting result of this analysis is that different facets of teacher quality seem to matter more for different groups of students (e.g., those who are lower- versus higher-performing academically). For example, the selectivity of colleges and universities that students attend seems most affected by teachers’ test-based VAM for students in the higher end of the academic distribution, while teachers’ non-test VAM seems to matter more for college quality of those students at the lower end of the academic distribution.845 A study focusing specifically on teachers’ effects on attendance found that teachers with high “attendance VAM scores” have only a small impact on students with high achievement levels or students with high attendance rates, but a much greater effect on both the academics and school retention for students with low attendance and students with low achievement.846

The relationship between test-based and non-test value-added measures

These studies tend to find a small but positive correlation between teachers’ test-based VAM and non-test VAM scores. The correlations are typically considered “low,” meaning that teachers who have higher scores on test-based VAM measures may also have higher scores on non-test VAM, but many teachers do not fit this pattern.847

These positive but weak correlations have several implications. First, there are teachers who are strong in both areas; these teachers are ripe for further observation to develop a better understanding of how teachers support both students’ immediate academic outcomes and longer-term behavioral outcomes, so those practices can be replicated. (For an example of a study that began this exploration with a small set of teachers, see Blazar and Pollard, 2022.848) Second, looking at different facets of teacher quality creates a more holistic measure of teacher effectiveness than looking only at their effect on students’ test scores.849

Policy implications: How to use this data

Give feedback to teachers

Most people who go into teaching choose the profession because they want to help students.850 And teachers want to help students not only perform well on tests but also succeed on longer-term measures. It stands to reason that providing data to teachers on their effects on students’ attendance, course-taking, grades, and other measures that go beyond their effects on test scores could help teachers think more holistically about their impact and how to improve on these various measures.

Pair teachers for mentoring or assign relevant professional development

Past research has found that when one teacher excels in a certain skill and another teacher is weak in that skill, pairing them together for mentoring, even without further directions, can help the weaker teacher improve.851 Similarly, one study of teachers’ non-test outcomes suggests pairing “lower-skilled teachers with programs designed specifically to strengthen their interpersonal relationships with students and their classroom organization,” proposing that targeted coaching or other professional development could help these teachers improve in the specific areas in which they are weaker.852

Learn what makes teachers effective across the board

While most studies found a low correlation between teachers’ non-test VAM and test-based VAM scores, the correlation was still positive, meaning that there are likely some teachers who tend to be stronger on both sets of metrics. Further understanding of these teachers’ multifaceted effectiveness (perhaps using quantitative measures like VAM scores to identify them, and then qualitative measures like observations, surveys, interviews, analysis of lesson plans, and other data to better understand what they’re doing) could lead to great leaps forward in our understanding of effective teaching practices. Some research has started in this direction,853 but there’s lots more to learn.

What we can’t yet use the data for

Teacher hiring

The data probably isn’t there yet. For example, Jackson (2018) says that while there may be room eventually to predict which teachers are more effective, so far it’s not possible to tell from “observable characteristics” (e.g., years of experience, licensure type, other criteria that can be measured easily from a resume) which teachers will be more effective.

Teacher assignment

Some research found that teachers with higher test-based VAM scores are most impactful for students at the higher end of the academic distribution, while teachers with higher non-test VAM scores are most impactful for students at the lower end of the distribution (e.g., those most at risk for dropping out of high school).854 Some researchers have suggested that there may be an opportunity to assign students to the teachers whose skills most align with the supports that will benefit these students. We caution against this approach, especially if it means assigning students who struggle academically to teachers who are less effective at raising student achievement. Doing so, even if this approach might improve other student outcomes such as attendance, risks perpetuating or widening academic gaps for students who most need to gain ground.

Where it’s tricky

High-stakes evaluations for teachers

One implicit goal of this entire line of research has been to consider how to develop a more holistic and robust measure of teacher quality that goes beyond teachers’ impact on near-term test scores. Currently, test-based VAM can only be used for teachers of tested grades and subjects (about 20% of teachers, by one estimate),855 and as this Trendline illustrates, this measure does not fully capture teachers’ effects on their students’ academic outcomes.

But districts should use caution in incorporating non-test VAM into evaluation measures. Districts have put in place safeguards to protect against manipulation of test score data (e.g., checks of the data for suspicious answer patterns, a “chain of custody” for the test materials, procedures to have teachers administer other classes’ assessments), but many of these non-test measures lack comparable safeguards.856 For example, teachers enter their own classroom attendance, they determine their own students’ grades, and so on. While most teachers are unlikely to manipulate the data, there could be a temptation to “game the system” that districts would need to protect against.

One potential remedy for the possibility of gamesmanship is to look at the next year’s data (e.g., teachers’ value-added for their students’ grades and attendance in the following year, since the data suggests that teachers’ effects persist over time). But even if this approach is methodologically sound (many of these studies use it), it may lack face validity among teachers, who may feel they have little control over their students’ outcomes in the following year. So it may be too soon to incorporate non-test VAM into high-stakes evaluations, but the idea is also not worth writing off entirely.

Teacher incentives

Maybe. The concern is that some of the non-test outcomes are “gameable” if teachers are the ones gathering or creating data (e.g., reporting students’ attendance for their class, determining what grade students get); outcomes that could be manipulated by teachers do not lend themselves well to incentives. However, there may be other measures that correlate with these outcomes, like classroom observations and student or parent surveys, that could be used to both identify teachers with stronger non-test VAM and target them for incentives.857

Rather than focusing on outcomes, districts could instead seek to better understand the teaching behaviors that lead to better non-test outcomes. Once those specific teaching behaviors can be identified, then districts could consider providing incentives to engage in those practices. This may be a more actionable way to provide incentives (based on specific teacher actions) than providing incentives for student outcomes.858

Because many of these measures are longer-term outcomes (e.g., graduation rates, college enrollment, and even long-run test-based VAM scores), districts could consider how to use this data to create a bonus for teachers who have been teaching for a longer time and have a track record of improving student outcomes across a range of measures. This would act less as an incentive to change teacher behavior, and more as a retention incentive for high-performing teachers.

About the Author

Hannah Putman

