
The LA Times/U of Colorado Teacher Ranking Wars: The Sequel


IN E-MAILS AND IN PUBLIC THE LA TIMES AND THE UNIVERSITY OF COLORADO’S NATIONAL EDUCATION POLICY CENTER HAVE SPENT OVER A WEEK TRADING BLOWS

Last week the National Education Policy Center (NEPC) at the University of Colorado, Boulder, released a new study, Due Diligence and the Evaluation of Teachers, that was harshly critical of the LA Times’ controversial “value added” teacher rankings, which the researchers said were “based on unreliable and invalid research.”

More specifically, the Colorado researchers found that, when they evaluated the same LAUSD 5th grade teachers using a slightly different statistical model, more than half of the teachers came out with different rankings from those of the Times when it came to teaching reading. Forty percent came out differently when it came to math.
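
A quick aside for readers who want to see how a “slightly different statistical model” can shuffle that many rankings: below is a rough, self-contained simulation in Python. Every number and variable in it is invented (it is not the Times’ model, the NEPC’s model, or real LAUSD data); it is only meant to illustrate the general mechanism the researchers describe, namely that when students are sorted into classrooms in ways a simple model ignores, adding one more control can move a sizable share of teachers into different quintiles.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: 200 teachers, 30 students each. All numbers are made up.
    n_teachers, n_students = 200, 30
    teacher_effect = rng.normal(0, 0.20, n_teachers)   # "true" effectiveness
    school_factor = rng.normal(0, 1.0, n_teachers)     # classroom/school context the simple model omits

    teacher_id = np.repeat(np.arange(n_teachers), n_students)
    # Non-random sorting: classrooms with a higher school_factor also get stronger students.
    prior = rng.normal(0, 1, n_teachers * n_students) + 0.5 * school_factor[teacher_id]
    score = (0.7 * prior + 0.4 * school_factor[teacher_id]
             + teacher_effect[teacher_id]
             + rng.normal(0, 0.5, n_teachers * n_students))

    def teacher_estimates(controls, y, teacher_id, n_teachers):
        # Regress scores on the controls, then average each teacher's residuals:
        # a bare-bones stand-in for a value-added estimate.
        X = np.column_stack([np.ones(len(y)), controls])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return np.array([resid[teacher_id == t].mean() for t in range(n_teachers)])

    est_simple = teacher_estimates(prior, score, teacher_id, n_teachers)
    est_rich = teacher_estimates(np.column_stack([prior, school_factor[teacher_id]]),
                                 score, teacher_id, n_teachers)

    def quintile(x):
        return np.searchsorted(np.quantile(x, [0.2, 0.4, 0.6, 0.8]), x)

    moved = np.mean(quintile(est_simple) != quintile(est_rich))
    print(f"Teachers whose quintile rating changes between the two models: {moved:.0%}")

The exact share that moves depends entirely on the made-up numbers, but the mechanism is the one at issue here: the two models disagree not because anyone’s arithmetic is sloppy, but because they make different assumptions about what, besides the teacher, is driving the scores.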

Since the new study was released, the back and forth between the LA Times and the NEPC—both public and private—has been unusually acrimonious and accusation-filled.

Among other darts traded, the NEPC/U of Colorado group said in a press release introducing their study that, “the Times now owes its community an acknowledgment of the tremendous weakness of the results reported and an apology for the damage its reporting has done.”

On Monday, in a private exchange of emails (reprinted in full after the jump), LA Times Editor-in-Chief Russ Stanton wrote that “many of your arguments are, at best, intellectually dishonest,” pointing to an email from U of Colorado lead researcher Derek Briggs to Times reporter Jason Felch.

The punch-throwing actually began the day before the NEPC study was officially released, when the Times, having gotten an early copy of the report, ran a story with the following headline:

Separate study confirms many Los Angeles Times findings on teacher effectiveness

The story itself pointed to some differences between the Times’ own analysis and the Colorado study, but portrayed them as minimal; according to the U of Colorado researchers, they were anything but.

Then the last line of the Times’ article carried the following not-terribly-subtle barb:

[The NEPC] study was partly funded by the Great Lakes Center for Education Research and Practice, which is run by the heads of several Midwestern teachers unions and supported by the National Education Assn., the largest teachers union in the country.

The Times didn’t come right out and say that the NEPC researchers were carrying water for various value-added-loathing teachers’ unions. But they mightily implied it.

The NEPC responded to the Times article with an angry press release that included a point-by-point refutation of the Times’ story on their study. The press release read in part:

Yesterday, on Monday February 7, 2011, the Times published a story about this new study. That story included false statements and was generally misleading. Accordingly, along with this study’s release, we are publishing a “Fact Sheet” about the Times’ new article.

“We expected the LA Times to fight back or be defensive,” said Bill Mathis, NEPC’s managing director. “What we didn’t expect is that preposterous headline, and everything that followed.”

On Monday of this week, the Times delivered the next volley in the conflict in the form of an unsigned Readers’ Representative Journal post, which basically stood by all of the paper’s earlier conclusions. Near the end of the “journal” there was this:

Finally, a major source of funds for the policy center is the Great Lakes Center for Education Research and Practice, a foundation set up by the National Education Association and six major Midwestern teacher unions affiliates. The NEA was one of the teacher union groups that backed an unsuccessful call for a boycott of The Times when “Grading the Teachers” was first published.

Although the policy center presents itself as a source of neutral scientific research, the language of their public statements has been anything but dispassionate, including a call for The Times not only to remove “Grading the Teachers” from our website, but to “apologize” for our work.

Alrighty then.

According to NEPC Publications Director Alex Molnar, the NEPC/U of Colorado team will post another point-by-point reply to the Times’ Monday salvo.

Also, later this week, I’ll have an analysis by one of the nation’s experts in the field of value-added teacher evaluations, in which he breaks down in simple terms why exactly the rest of us should care about this whole mess.

So stay tuned.

In the meantime, one of the better (and more even-handed) pieces of commentary on the dispute comes from Education Week. It is by Rick Hess, Resident Scholar and Director of Education Policy Studies at the American Enterprise Institute.

He concludes:


“…it strikes me that this is a case where both sides make some valid points. I think Buddin [Richard Buddin is the analyst who did the research for the Times.] could’ve done more to explain his specifications and I would’ve liked to see the Times proceed more carefully, but I also think the Briggs critique is more cautionary than damning.”

Also, there was this particularly strong response in the LA Times comments section [comment #2] from Mike Rose, an award-winning UCLA professor who is an expert in social research methodology as it relates to education.

Rose writes, among other things:

I cringed at the cheap insinuation that the Colorado study is influenced by the source of some of its funding. Shall we consider the vested interest of Mr. Lauter, Mr. Felch, etc. in this project? Or the fact that Thomas Kane, who Mr. Lauter approvingly quotes, is a high-level official at the Gates Foundation, overseeing a project which has invested heavily in Value-Added methods? The point is that there are all kinds of personal, professional, and institutional investments in this debate, so if you’re going to lay them out, lay them all out. And if you suspect a biasing influence, do the reporter’s job of demonstrating it.

But the big, big question for me is how is it that this newspaper moved so strongly toward advocating a particular technology in school reform? The Times is not just editorializing that we need reform, but within its news department is taking a side on a technique. The paper is no longer reporting the news, but creating it and spinning it.

It is a question that deserves further discussion.



FOR THE EMAIL EXCHANGE BETWEEN THE NEPC’s PUBLICATIONS DIRECTOR, ALEX MOLNAR, & THE LA TIMES CLICK HERE


From: alex molnar
Sent: Thursday, February 10, 2011 8:21 AM
To: Stanton, Russ

Subject: LA TIMES TEACHER RATING REPORTING

Dear Mr Stanton:

My name is Alex Molnar. I am the publications director of the National Education Policy Center (NEPC) at the University of Colorado-Boulder. I am writing to demand a correction of the Los Angeles Times February 7 article (http://www.latimes.com/news/local/la-me-teacher-study-20110207,0,2144294.story) that materially misrepresents research published by NEPC.

On February 8, 2011, NEPC released a research brief (http://nepc.colorado.edu/publication/due-diligence) reporting the results of an analysis using the same data as the research that the Times used to create its teacher effectiveness database. Our research results clearly demonstrate that Richard Buddin’s study does not adequately support the uses to which the Times has put it.

The Times, in its February 7 article on our report, materially misrepresents and distorts its findings. A subsequent “for the record” statement published on February 9 makes one factual correction to the February 7 article, but in no way acknowledges or corrects its gross misrepresentation or omissions. It is simply inadequate to the enormity of error the paper has made in its reporting on the NEPC research.
Here are the salient, substantive points that were ignored or misrepresented in the Times February 7 article:
· The NEPC report found evidence that there are important variables correlated with student achievement and teacher assignment that were not included in the analysis.
· NEPC researchers demonstrated that the inclusion of three additional sets of variables in the model Buddin used – (i) a longer history of a student’s test performance, (ii) peer influence, and (iii) school-level factors – leads to dramatic changes in teacher ratings. For reading test outcomes in particular, as many as half of all teachers would be rated differently.
· The NEPC report showed that the Times’ categorization approach is less conservative than an approach based on the principle of a statistical confidence interval. Relative to this more conservative approach, the Times’ approach is likely to lead to a significant number of false positives and false negatives.
· Finally, the NEPC report—contrary to Buddin’s white paper—found that teacher experience and credentials are in fact associated with teacher effectiveness.
Taken together, there is simply no reasonable way to interpret our findings as “confirming” the Times’ conclusions, even in a “broad” sense.
I would have hoped that, in light of the NEPC research, the Times might have given a sober, second look at its own work on the teacher evaluation project and reconsidered the premises on which it was based as well as the conclusions which it reached. I still harbor some hope for that eventual outcome.

As your newspaper reported on January 11 (http://articles.latimes.com/2011/jan/11/local/la-me-ny-schools-20110111), a New York court has held that the public has a right to teacher evaluations even if those evaluations are “unreliable or otherwise flawed.”

This is a very different matter, however, from what the Times has done. The Times is not reporting on teacher rankings or evaluations that have been created by the Los Angeles Unified School District. Instead, the Times has created and published its own rankings of individual teachers and continues to provide public access to those rankings. And now, your newspaper does so with the certain knowledge that it cannot warrant that large numbers of those rankings are not in error.

More immediately, given the fact that the NEPC’s report directly criticized the research that was used by the reporters who carried out the original teacher-rating project, it strikes me as inappropriate, if not unethical, for the newspaper to assign one of those same reporters, Jason Felch, to write about our findings. At the very least, it seems to me, it would have been appropriate to give the assignment to another reporter whose work on the story would not be so obviously compromised by self-interest.

Failing that, however, I would at least expect the article to fully and accurately convey our essential criticism of the work in question. This Mr. Felch did not do.

It is time for the Times to set the record straight.

Sincerely,
Alex Molnar

On Mon, Feb 14, 2011 at 3:44 PM, Stanton, Russ wrote:

Dear Mr. Molnar,

Thank you for your note. We do, as you know, take the accuracy of our reporting very seriously and we have, as you know, published a correction from Mr. Felch’s story of Feb. 7.

I have looked into your other points and decided that no further corrections are warranted. To be perfectly candid, many of your arguments are, at best, intellectually dishonest. By that I am referring to an email that one of your researchers, Derek Briggs, sent to Mr. Felch last week. I won’t trouble with a long explanation of it here, but we have posted two statements about it on our website. You can find them at www.latimes.com/readers.

If you like, I’d be happy to send you the full email that Mr. Briggs sent to Mr. Felch.

If you would like to make some of your other points below within the framework of The Times, I would recommend contacting our Op Ed editor Sue Horton at (213) 237-7920 or at sue.horton@latimes.com.

Sincerely,

Russ Stanton
Editor
Los Angeles Times

From: alex molnar
Date: Mon, Feb 14, 2011 at 7:13 PM
Subject: Re: LA TIMES TEACHER RATING REPORTING
To: Russ.Stanton

Dear Mr. Stanton,

Thank you for your response. I have read the email exchanges between professor Briggs and Mr. Felch to which you refer. Also, you should know that professor Briggs read the email I sent to you prior to my sending it and confirmed to me the accuracy of my substantive statements.

I would be interested in learning what in professor Briggs’ emails provides the basis for your judgment of intellectual dishonesty. The email from professor Briggs, the reply from Mr. Felch, and professor Briggs’ response are embedded in chronological order below.

I would very much appreciate it if you highlighted any element in either of professor Briggs’ emails you consider to be intellectually dishonest or that in your judgment I represented in an intellectually dishonest way in my email to you.

You have made a serious judgment, and it deserves careful attention. I think we both understand how important this matter is.

I look forward to hearing from you.

Sincerely,

Alex Molnar


Professor Briggs’ first email:

From: Derek Briggs
Sent: Saturday, February 05, 2011 12:56 PM
To: Felch, Jason

Hi Jason,

I want to respond to a concern you raised yesterday during our phone conversation regarding our analysis of the precision of value-added estimates. Here is what we wrote in our executive summary in regard to this issue:

“Once the specific value-added estimate for each teacher is bounded by a confidence interval, we find that between 43% and 52% of teachers cannot be distinguished from a teacher of ‘average’ effectiveness. Because the L.A. Times did not use this more conservative approach to distinguish teachers when rating them as ‘effective’ or ‘ineffective,’ it is likely that there are a significant number of false positives (teachers rated as effective who are really average), and false negatives (teachers rated as ineffective who are really average) in the L.A. Times’ rating system.”

In the full report, we illustrated this latter aspect of false positives and false negatives as follows:

“An interesting contrast is to compare the classifications of teachers into three levels of effectiveness when using a 95% confidence interval as above (average, effective, and ineffective) with the classifications that result when the teacher effectiveness distribution is broken into quintiles. How many teachers classified as effective or ineffective under the latter classification scheme would be considered only average under the former, more conservative approach? We consider a teacher with an effect in the bottom two quintiles of the distribution but with a confidence interval that overlaps 0 to be a ‘false negative,’ and the opposite extreme to be an example of a ‘false positive.’ For classifications based on reading outcomes, we find 1,325 false negatives and 2,215 false positives–12.3% and 20.5% of the total sample of teachers for whom these effects could be estimated. For classifications based on math outcomes, we find 693 false negatives and 1,864 false positives–6.4% and 17.2% of the total sample of teachers for whom these effects could be estimated.”

Now your concern, as I understand it, is that these statistics, which are based on the full sample of nearly 11,000 teachers for whom value-added could be estimated, are potentially misleading because readers might assume that they generalize to the way that the LA Times used the results for its website ratings. You note that only teachers with at least 60 students were included in these ratings. Because doing this reduces the sample of teachers considerably, the results above will overstate the numbers and proportions of false positives and false negatives if they are generalized to the L.A. Times’ ratings. Now while I think we used an appropriate frame of reference in what we did and were very explicit about it, I regard this as a reasonable concern. It is also something we can examine empirically, so we did.

When we impose the N60 restriction this reduces our sample of teachers to 5,124 in math and 5,047 in reading. Now we can re-write the last two sentences of the second paragraph above with respect to this sample:

For classifications based on reading outcomes, we find 516 false negatives and 612 false positives–10.2% and 12.1% of the total sample of teachers that would be included in ratings released by the L.A. Times. For classifications based on math outcomes, we find 257 false negatives and 454 false positives–5.0% and 8.9% of the total sample of teachers that would be included in ratings released by the L.A. Times.

So in summary, I agree that use of the N60 criterion by the L.A. Times serves to mitigate the issue of false positives and false negatives. But I stand by our statement in the executive summary of the report that “it is likely that there are a significant number of false positives (teachers rated as effective who are really average), and false negatives (teachers rated as ineffective who are really average) in the L.A. Times’ rating system.” The fundamental point we were making is that the application of a 95% confidence interval to group teachers into three categories will be more conservative than a quintile approach that groups teachers into five. This remains true with or without the N60 restriction.

Derek

Derek Briggs
Associate Professor & Program Chair
Research & Evaluation Methodology
School of Education
University of Colorado, Boulder


Mr. Felch’s response:

Sat, Feb 5, 2011 at 8:30 PM,

Derek

Thanks for your note. I’m grateful for the attention you gave to one of my concerns, but I’m afraid your re-analysis only deepens my feeling that your report — whose broader goal I strongly support — is misleading and will be widely misconstrued.

Let’s start with the title, Due Diligence. It implies that you are doing the due diligence that we failed to do. The truth is, we spent the better part of a year conducting due diligence before making our results public. This included many of the same verification tests you conducted, in addition to several more — running the dataset through several of the commonly used VAMS [Value-Added Models] and comparing the results, and taking the data out into dozens of schools and testing it against the observations of principals, teachers, parents and students.

You never once contacted the Times during the months of preparing your report. In your one email exchange with Dick Buddin, you apparently never asked whether he had conducted tests similar to those in your report, and simply assumed he had not. When I asked you why you hadn’t taken these basic steps, you said you felt Dick’s white paper should stand alone, and that you weren’t writing about the Times’ work. Yet you critique the LA Times’ uses of the data throughout your paper, and raise numerous speculative questions about what Dick might or might not have done and why. All of these questions would happily have been addressed by the Times or Buddin. By failing to ask, I feel you failed in your own due diligence.

The most obvious consequence of this failure is the use of a different dataset to draw conclusions about the strength of our analysis. Your report presents itself as a replication of the Times’ study, yet you later acknowledge that it is not: your data includes 100,000 fewer student observations and 700 fewer teachers. These significant differences could have been resolved — as you told me on the phone, you simply chose not to do so. The result is that it is nearly impossible to know the source of the differences between the two analyses. The tone of your paper implies it is a flaw in our model, yet, as you later acknowledge in passing, it could simply be the differences in the underlying datasets. This does no service to your readers, who are left to guess at the significance of your findings.

Compounding this is your paper’s repeated conflation of Buddin’s analysis with the far smaller dataset that the Times released publicly. You claimed in our talk that your report is about Buddin’s analysis, not the Times’ work. But much of your critique focuses on decisions made by the Times, such as the decision to categorize teachers in quintiles rather than three groups, as you prefer. Buddin’s white paper has no mention of “more effective” or “less effective” teachers. It does not rank teachers or deal with categorizations at all. These decisions were made by the Times after careful deliberation and much internal debate and external consultation. Yet you made no effort to learn about our motives or process, and represent our decisions in your report as haphazard at best.

Had you asked, we would have told you that after Buddin conducted his initial analysis, a large team of reporters and editors at the Times spent months reviewing the results, testing them in various ways and consulting with Buddin and outside experts about how to best present them to the public. For example, to address the very sensitivity issues you raise in your paper, we eventually settled on releasing a far smaller dataset of 6,000 teachers. This was the result of a specification applied by the Times that several experts considered conservative: limiting public ratings to teachers who had 60 or more student observations. On average, teachers in our database had 110 observations, and many had upwards of 170 or more. (By contrast, DC’s IMPACT program allows high-stakes use of teacher data based on approx. 10 observations.)

Nowhere in the paper do you acknowledge these steps the Times took before making teacher data public. The omission is glaring and curious. Indeed, much of the weight of your critique rests on the false assumption that the reliability of the results for your 11,000 teachers can be directly applied to what the Times made public. As your belated re-analysis found, that is clearly not the case.

Unfortunately, this bit of context will not be available in your public report, nor to the dozens of groups who are likely to seize on your report for their own ends. You seem confident readers of your report will navigate these ambiguities without difficulty. I don’t share that confidence, and I would encourage you to revisit your assumption as bloggers, reporters, academics and partisans in the education wars interpret your findings.

In our conversation, your principal complaint seems to be that we did not document more of our due diligence process in Buddin’s white paper. That may be a fair criticism. But it should also have been noted that in addition to a dozen lengthy articles about the subject, the database and the methods paper, we published FAQs and Q&As, video explanations of the strengths and limits of VAMS, and links to academic research. We gave the public and teachers opportunities to comment, fielded hundreds of individual queries from people across the country and spent months on the road explaining our process and results to often hostile audiences.

Despite all the differences I cite above, I continue to believe we share a common goal. Your efforts replicated much of the same due diligence we conducted internally, and reaffirmed many of our own conclusions and cautionary notes. To that extent, it provides a valuable service. Sadly, the takeaway most of your readers will have is a deeply misleading one.

Sincerely,

Jason Felch


Derek Briggs’ reply:

Jason,

I suspect whether our report can/will be construed as misleading is something we will need to agree to disagree about.

I find it very curious that you apparently have done all these sensitivity analyses similar to what we presented and yet until now never felt the need to make this public. So let me ask you this: when you specified alternate, more complex models that controlled for–for example–peer effects, a longer test score history, etc., etc., do you mean to tell us that you found none of the differences that we found? You found no worrisome signs of bias? What was the justification for choosing such a simple model specification relative to other well-known VAM applications (see our comparison in Table 1 on p. 6)? If Dick conducted all those additional analyses, why did he feel no need to mention this anywhere in his white paper? Due diligence is not just about doing these things in the privacy of your office but about making the results publicly known.

And who exactly were these “outside experts” that were consulted? Some of the foremost experts in the world on value-added modeling are Dick’s very colleagues at the RAND Corporation: Dan McCaffrey and J.R. Lockwood. I happen to know for a fact that they had absolutely nothing to do with reviewing his analysis–doesn’t that strike you as a bit telling?

A first principle in scientific research is that procedures should be presented in sufficient detail such that qualified scholars can independently reproduce them. I think we’ve demonstrated quite clearly that Buddin’s white paper fails this principle. But this was just one part of our study. The rest raises important questions about the validity of his model that to date had not been *publicly* acknowledged or addressed by him or you.

Our report was a critique of Buddin’s white paper, not on the decision by the Times to write the story and publish the ratings. We say this explicitly on p. 2 and our narrative is consistent with this. So we saw no reason to delve into the decision-making process that occurred in between his analysis and your publishing of the story. That was outside the scope of what we were trying to do. That said, we do make explicit note of both your FAQ section and the forum for teachers to post their comments (“To the credit of the L.A. Times, teachers have at least been given the opportunity to post responses to their ratings”)–I’m wondering whether you read our report all the way to its conclusion on pp. 18-21?

As a final note, I hope that you and your editors appreciate that it represents a rather significant conflict of interest for you to be covering this story since you are clearly a big part of it. It seems like something of an ethical minefield for a news organization to be navigating.

Sincerely,

Derek

Derek Briggs
Associate Professor & Program Chair
Research & Evaluation Methodology
School of Education
University of Colorado, Boulder
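
A note from me before the comments, for anyone trying to follow the statistical nub of the Briggs/Felch exchange above: the argument over “false positives” and “false negatives” boils down to comparing a quintile rating (roughly the Times’ approach) with a rating that only calls a teacher effective or ineffective when a 95% confidence interval around the estimate rules out “average” (the more conservative approach the NEPC prefers). Here is a small sketch in Python, using entirely invented numbers rather than anyone’s actual data, code, or cutoffs, that shows how a teacher can land in a top or bottom quintile even though the confidence interval around the estimate still overlaps average.

    import numpy as np

    rng = np.random.default_rng(1)

    # Invented inputs: each teacher gets an estimated effect and a standard error,
    # roughly what a value-added model hands back. Not real LAUSD numbers.
    n = 5000
    true_effect = rng.normal(0, 0.15, n)
    se = rng.uniform(0.05, 0.20, n)            # precision varies, e.g. with class size
    estimate = true_effect + rng.normal(0, se)

    # Quintile scheme: the top and bottom two quintiles get labeled effective/ineffective.
    cuts = np.quantile(estimate, [0.2, 0.4, 0.6, 0.8])
    quint = np.searchsorted(cuts, estimate)    # 0..4

    # Confidence-interval scheme: only label a teacher if the 95% CI excludes zero.
    lo, hi = estimate - 1.96 * se, estimate + 1.96 * se
    ci_rating = np.where(lo > 0, "effective",
                         np.where(hi < 0, "ineffective", "average"))

    # Briggs' usage: a "false positive" is a teacher in the top two quintiles whose CI
    # still overlaps zero; a "false negative" is the mirror image at the bottom.
    false_pos = np.mean((quint >= 3) & (ci_rating == "average"))
    false_neg = np.mean((quint <= 1) & (ci_rating == "average"))
    print(f"false positives: {false_pos:.1%}, false negatives: {false_neg:.1%}")

The percentages this prints depend entirely on the invented inputs, so they say nothing about the real LAUSD figures. The point is only that the two labeling schemes answer different questions, which is a large part of why Briggs and Felch can look at the same underlying model and still disagree about how cautious the published ratings should have been.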

9 Comments

  • The real issue here is not which approach to “value added” evaluation is more valid. The issue is how the Los Angeles Times tried to distort the debate.

    Had the Times simply reported the NEPC findings without adding its own absurd spin, allowed the relevant Times editors and Prof. Buddin to respond, and given readers the necessary information to make up their own minds, there would be no controversy now – and the Times would have added some value to the whole value added debate.

    Instead, the Times wrote a misleading story about the NEPC findings. Then, when NEPC and others complained, the Times retreated into petulance and name calling.

    So the Times declares that anyone who disagrees with *how* they evaluated teacher performance doesn’t want any evaluation at all. The Times statement declares that

    “Mr. Molnar’s claim boils down to this: Until a perfect value-added system is developed that everyone agrees upon, nothing should be published. We reject that idea.”

    And later: “For years, school districts around the country, as well as academic experts, have conducted value-added analyses of teacher performance which they have kept secret. With “Grading the Teachers,” we put this information before the public, with ample explanation of the method’s limitations. That, we submit, is exactly what a newspaper should do. Mr. Molnar would like to put this information back behind locked doors. We disagree.”

    In fact, as Derek Briggs states in his e-mail:

    “Our report was a critique of Buddin’s white paper, not on the decision by the Times to write the story and publish the ratings. We say this explicitly on p. 2 and our narrative is consistent with this.”

    But to some of us this kind of immature bunker mentality at the Times has a depressingly familiar ring. When my organization and many others criticized their child welfare coverage, Assistant Managing Editor David Lauter replied by declaring that “presumably” we don’t want news organizations to cover “mismanagement or poor execution of policies.”

    Presumably, then, if anyone criticizes how the Los Angeles Times covers any topic, the newspaper will dive into a bunker and declare that we don’t want the topic covered at all.

    Richard Wexler
    Executive Director
    National Coalition for Child Protection Reform
    http://www.nccpr.org

  • All of this reporting was done, purportedly, to do a public service, to help parents decide which is the best teacher for their child. At least, that was the justification.

    Has this been accomplished? If experts in the field cannot agree about this methodology (although Felch and Song do not qualify as experts), then what is the likelihood that the average parent in LAUSD will understand the nuances of the debate? A great number of these parents don’t even speak English!

    If anything, the LA Times has done a tremendous disservice to the public by continuing to push forth a controversial method to evaluate teachers, and now lots of time and money will be spent further exploring this method when the money should be going to students in the classroom.

  • I read the entire NEPC paper today wearing both the hat of a scientist and as the mother of a child attending a Title 1 public school.

    A key tenet of science is reproducibility of results. That’s what the NEPC scientists tried to do. They didn’t set out to do a hatchet job. They just wanted to see if a more detailed analysis, including more confounding factors, would produce the same results.

    As a scientist, I want to see a sensitivity analysis and theirs unequivocally shows that student assignment is not random. From page 4:

    We then conducted a sensitivity analysis in three stages. In our first stage we looked for empirical evidence that students and teachers are sorted into classrooms non-randomly on the basis of variables that are not being controlled for in Buddin’s value-added model. To do this, we investigated whether a student’s teacher in the future could have an effect on a student’s test performance in the past—something that is logically impossible and a sign that the model is flawed (has been misspecified). We found strong evidence that this is the case, especially for reading outcomes. If students are non-randomly assigned to teachers in ways that systemically advantage some teachers and disadvantage others (e.g., stronger students tending to be in certain teachers’ classrooms), then these advantages and disadvantages will show up whether one looks at past teachers, present teachers, or future teachers. That is, the model’s outputs result, at least in part, from this bias, in addition to the teacher effectiveness the model is hoping to capture. Because our sensitivity test did show this sort of backwards prediction, we can conclude that estimates of teacher effectiveness in LAUSD are a biased proxy for teacher quality.

    As a parent, I have seen why teacher assignment is more strongly correlated with reading than with math. Many schools mix up the students at math time. My daughter’s elementary school holds math class at the same time for all classrooms in the same grade (and in neighboring grades). Teachers work on different units, some ahead, some behind grade level. Kids move to the appropriate classroom for math period based upon a placement test at the beginning of the year and recent math test scores.

    The LAT is way too sensitive here and owes Briggs an apology. Briggs et al went out of their way to say the things that Buddin’s analysis got right and politely stated how a more detailed analysis could come to a different conclusion.

    Look at this quote that they included on page 20.

    There is a danger that debates around the use of test scores to evaluate schools and teachers will involve two extreme, but mistaken, positions. First it is argued that unless value-added models can be shown to lead to perfect classifications of effective and ineffective teachers (however defined), they should not be incorporated into the high-stakes decisions likely to accompany teacher (and school) evaluations. This argument is a false one because, as a number of researchers have pointed out, relative to a status quo for teacher evaluations that most people regard as unacceptable, classifications driven at least in part by value-added estimates do not need to be perfect in order to constitute an improvement.

    Full disclosure: I have never met professor Briggs, but I earned a PhD at U of Colorado’s Joint Institute of Laboratory Astrophysics.

  • “it is likely that there are a significant number of false positives (teachers rated as effective who are really average), and false negatives (teachers rated as ineffective who are really average) in the L.A. Times’ rating system.”
    -from the NEPC press release

    I brought up this exact issue with Beth Schuster, the LA Times K-12 editor, and she had absolutely no idea what I was talking about. She just said they had a statistician and as far as she knows the numbers are all correct. I know editors can’t know every detail about every story, but this value-added story was a big deal and she should have made an effort to understand this concept before publishing.

