Educate All Students, Support Public Education

February 2, 2011

Problems With Value Added Assessment by Diane Ravitch

Filed under: Arne Duncan,teacher evaluation — millerlf @ 12:35 pm

January 18, 2011: The Pitfalls of Putting Economists in Charge of Education / Building Bridges Blog

Dear Deborah,

A few weeks ago, Mike Rose posted a list of his New Year’s resolutions. One of them was that we should “make do with fewer economists in education. These practitioners of the dismal science have flocked to education reform, though most know little about teaching and learning.” Mike suggested that so few economists were able to give useful advice about the financial and housing markets that we should now be skeptical about expecting them “to change education for the better.”


I agree with Mike. It is astonishing to realize the extent to which education debates are now framed and dominated by economists, not by educators or sociologists or cognitive psychologists or anyone else who actually spends time in classrooms. My bookshelves are chock-full of books that analyze the teaching of reading, science, history, and other subjects; books that examine the lives of children; books that discuss the art and craft of teaching; books about the history of educational philosophy and practice; books about how children learn.

Now such considerations seem antique. Now we are in an age of data-based decision-making, where economists rule. They tell us that nothing matters but performance, and performance can be quantified, and those who do the quantification need never enter a classroom or think about how children learn.

So the issue of our day is: How do we measure teacher effectiveness? Most of the studies by economists warn that there is a significant margin of error in “value-added assessment” (VAA) or “value-added modeling” (VAM). The basic idea of VAA is that a teacher’s quality can be measured by the test-score gains of his or her students. Proponents of VAA see it as the best way to identify teachers who should get merit pay and teachers who should be fired. Critics say that the method is too flawed to use for high-stakes purposes such as these.
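
To make the basic idea concrete, here is a minimal sketch in Python of the simplest possible value-added calculation: each teacher's average student gain, compared with the average gain across all students. It is an illustration only and not the model used in any study mentioned here; real value-added models typically adjust for student background and other factors. The function name `value_added`, the record layout, and the sample scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Return each teacher's mean student gain minus the overall mean gain.

    `records` is an iterable of (teacher, prior_score, current_score) tuples.
    This is a deliberately naive illustration, not a real VAM specification.
    """
    gains = defaultdict(list)
    for teacher, prior, current in records:
        # A student's "gain" is simply this year's score minus last year's.
        gains[teacher].append(current - prior)

    # Average gain over every student, regardless of teacher.
    overall = mean(g for teacher_gains in gains.values() for g in teacher_gains)

    # A positive result means the teacher's students gained more than average.
    return {teacher: mean(gs) - overall for teacher, gs in gains.items()}


# Hypothetical data: two teachers, two students each.
records = [
    ("Teacher A", 50, 60),
    ("Teacher A", 70, 74),
    ("Teacher B", 40, 42),
    ("Teacher B", 65, 69),
]
scores = value_added(records)
print(scores)
```

Even in this toy case, each teacher's rating rests on a handful of student gains, so one unusual student can move it a long way. That is the kind of instability the margin-of-error warnings are about.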

Last July, the U.S. Department of Education published a study by Mathematica Policy Research, which estimated that even with three years of data, there was an error rate of 25 percent. A few months ago, I signed onto a statement by a group of testing experts, which cautioned that such strategies were likely to misidentify which teachers were effective and which were ineffective, to promote teaching narrowly to the test, and to cause a narrowing of the curriculum.

None of these cautions has stemmed the tide of rating teachers by student test scores and releasing the ratings. Last year, the Los Angeles Times published an online database that rated 6,000 teachers as to their effectiveness (one of them, elementary school teacher Rigoberto Ruelas, committed suicide a few weeks later). New York City is poised to make a public release of the names and ratings of 12,000 teachers, if the courts give the go-ahead (in the first trial, a judge ruled that the data could be released even if it was inaccurate).

This trend did not just happen. It was encouraged by the Obama administration’s Race to the Top, which urged states to develop quantitative measures of teacher effectiveness. Secretary of Education Arne Duncan has issued statements endorsing the publication of teachers’ names and ratings, although few testing experts agree with this practice.

The bulk of studies warn about the inaccuracy and instability of these measures, but the Gates Foundation recently released a study called “Measures of Effective Teaching” (MET) that supports the use of VAA and VAM. As is customary for the Gates Foundation, it hired an impressive list of economists at institutions across the nation to give the gloss of authority to its work. Among its key findings was this one: “Teachers with high value-added on state tests tend to promote deeper conceptual understanding as well.” Ah, said the proponents of measuring teacher quality by the rise and fall of student test scores, this study vindicates these methods and effectively counters all those cautionary warnings.

But now comes a re-analysis of the Gates study by University of California-Berkeley economist Jesse Rothstein, which says that the MET study reached the wrong conclusions and that its data demonstrate that VAA misidentifies which teachers are more effective and is not much better than a coin toss. Even the claim that teachers whose students get high scores on state tests will also get high scores on tests of “deeper conceptual understanding” is flawed, writes Rothstein.

The correlation between the two tests was actually modest: About 29 percent of the teachers in the bottom quintile on the basic skills tests were rated above average on the tests of reasoning and critical thinking; these are the teachers who would be fired if the Gates Foundation had its way.


“Interpreted correctly,” writes Rothstein, the analyses in the Gates report “undermine rather than validate value-added-based approaches to teacher evaluation.” Jesse Rothstein is not just any economist; not only has he studied VAA in the past, but he served as senior economist for President Barack Obama’s Council of Economic Advisers and chief economist at the U.S. Department of Labor. So he is well-equipped to take on the entire stable of Gates-funded economists, mano a mano.


For another take on these issues, I recommend a lively blog debate between economist Dan Goldhaber and teacher John Thompson. Thompson warned Goldhaber that economists should think seriously about the damage their methods will wreak on teachers and children and schools, especially low-income schools, where it is harder to get big test-score gains. In one of Thompson’s perceptive comments, he wrote, “If teachers have a one-sixth, one-fifth, or one-third chance per year of being wrongly indicted as ineffective, none will ever have any peace of mind. Effective teachers with self-respect will flee those schools for lower-poverty schools.” I was reminded of a comment by Rutgers economist Bruce Baker, who asked whether you would buy a car if the salesman assured you that it would explode only once every five times you turned the ignition key.


If we step back a bit, Deborah, don’t you think there is a certain kind of madness in thinking that economists who never set foot in a classroom can create a statistical measure to tell us how best to educate children? It seems some will never be satisfied until they have a technical process to override the judgments of those who work in schools and are in daily contact with teachers and children. I don’t know of any other nation in the world that is so devoted to this effort to turn education into a statistical problem that can be solved by a computer. It is not likely to end well.


Posted by Diane Ravitch


  1. Astonishing that no one has commented on this campaign to do a triage on the teaching profession based on value-added measures that came into education from the work of Dr. William Sanders, a statistician specializing in agricultural genetics. It is useful to restate some rudimentary ideas about genetic engineering not only as the source of methods for evaluating teachers, but as a metaphor operating below a threshold of public and professional discussion in “reengineering education.” Frederick Taylor, a mechanical engineer, is reincarnated as a genetic engineer. Genetics is the study of ways to alter or select traits of plant and animal species. The studies are made in order to perfect ways to propagate superior traits, accelerate genetic improvement, and engineer transformations that incorporate new features (e.g., capacity to resist disease), or new functions (e.g., terminator seeds that grow sterile plants).
    The technologies of genetic engineering also have unintended consequences. Major risks include disturbing a thriving ecological system, harming strengths in existing species, provoking unexpected and toxic reactions to changes introduced into reproductive systems, and fostering resistance to engineered interventions. Other concerns bear on unhealthy concentrations of traits through inbreeding and, perhaps most important, the irreversibility of these processes.
    Although selected techniques of statistical estimation (e.g., mixed model analysis of variance, “percent cumulative norm gain”) have been imported into education, other lessons that might be learned from genetic engineering as a metaphor (and program) in education have been left unexamined. I think this is a nice summer project for you and colleagues, especially since the NYTimes (Robert Pear, May 28, 2011) has just given a heads-up on the application of the same principle to Medicare patients treated in hospitals. All of the “service providers” involved in care during the stay and for 90 days after, from physicians to clerks and cleaners, are included in the cost-per-patient calculation, but in this case there seems to be total indifference to the outcomes for the patient. The method of analysis is called “value-based purchasing,” built on a scheme of tracking the cost to serve each patient. Hospitals will be rewarded for holding down costs (being efficient) and will lose reimbursement if they don’t hit some as yet unknown cut score for being inefficient. The system kicks in this October in 3,100 hospitals. If you need surgery, I suggest you have it before October. Of course, the GATES/DUNCAN/OBAMA administration is doing the value-added thing on teachers for the same bottom-line agenda: cost per pupil for 2-, 4-, or 6-point increments in test scores. Hanushek does these grand inferential leaps through the thin air of statistics with claims that x increments in test scores will boost the economy by a gazillion dollars. Snake oil.

    Comment by Laura H. Chapman — May 31, 2011 @ 10:00 pm | Reply

  2. Gregor Mendel wasn’t a farmer, but he developed statistical analyses of pea-plant genetics. V. Pareto’s principles have been applied to a multitude of human endeavors and have yielded valuable applications. My point is that one doesn’t have to be teacher-trained to observe that “the emperor has no clothes!” There is a causal relationship between effective teachers & good test scores. I was educated in the 40’s & 50’s & the education we received then was equivalent to the IB programs or the “gifted-talented” programs of today. In those days, there was a results-oriented mindset, unlike today, where incompetent teachers hide behind statistical variance to keep their jobs!… Drury H. Bynum, MBA

    Comment by Drury H. Bynum — September 15, 2011 @ 4:14 pm | Reply

  3. […] anything to do with the not so subtle threat that LAUSD administration holds over their heads with value added assessment (45% margin of error), negative teaching evaluations, and overt threats if these teachers […]

    Pingback by LAUSD Encourages Teachers to Cheat — October 31, 2013 @ 7:31 pm | Reply
