Raising the standards of data journalism

By Jon Schwabish  ::  December 19th, 2014


Looking back, 2014 may be known as the Year of Data-Driven Journalism. Journalists’ use of more and better data will probably lead to better stories and a better understanding of important issues, but there are some growing pains (for example, here, here, and here). As a social science researcher, I find that some of these growing pains, well, just pain me.

Take Wednesday night’s post from Nate Silver at FiveThirtyEight on the decision by Sony Pictures to pull the Seth Rogen and James Franco movie The Interview. In it, Silver writes the following:

Both production budgets and Rotten Tomatoes ratings are predictive of box office grosses. A 1 percent gain in Rotten Tomatoes ratings is associated with a roughly $2 million increase in international box office gross (according to a linear regression analysis). And every additional $1 million in production budget translates to about $2 million more at the box office.

This is not merely a “correlation does not equal causation” critique. My more fundamental critique is: “Why are journalists less responsible for citation and documentation than researchers?” Any researcher in any field would be laughed out of the room if he or she were to run a regression without describing the method and at least providing a source for the data. Why are journalists who purport to be “driven by data” held to different standards? Why are we so keen on requiring journalists to source whom they interview, but not the data and methods they use?

I don’t know what data Silver used here (it might be the table of 22 data points shown earlier in the story), but my guess is that he ran a simple linear regression of box office grosses on Rotten Tomatoes ratings (which also leads me to believe that the change he describes above is a percentage point change, not a percent change). A natural question to ask is why use box office grosses rather than profitability in the first place? Isn’t it important to distinguish between two movies that have the exact same Rotten Tomatoes score, but one made a profit of $100 million while the other lost $100 million? That said, the simple correlation also misses an awful lot of other aspects of movie profitability: release date, popularity of the main stars, production costs, type of production (explosions are really expensive), type of movie, film rating, and even weather and sporting events on opening weekend.
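To make the percentage point versus percent distinction concrete, here is a minimal Python sketch. The ratings and grosses in it are invented for illustration and are not the figures from the FiveThirtyEight story; `ols_fit` is a hypothetical helper, and the model is assumed to be a one-variable least-squares fit of gross (in $ millions) on a 0 to 100 rating.

```python
# Hypothetical data, invented purely for illustration.
# These are NOT the 22 observations from the FiveThirtyEight story.
ratings = [20, 35, 50, 65, 80, 95]                 # Rotten Tomatoes score (0-100)
gross = [45.0, 70.0, 110.0, 125.0, 170.0, 190.0]   # box office gross, $ millions


def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = ols_fit(ratings, gross)

# The slope is the change in gross per ONE-POINT change in the rating,
# i.e. a percentage point change (say, 50 -> 51).
base = 50
point_gain = slope * 1

# A 1 PERCENT change in a rating of 50 is only half a point (50 -> 50.5),
# so the same slope implies half the change in gross.
percent_gain = slope * (base * 0.01)

print(f"slope: {slope:.2f} ($ millions per rating point)")
print(f"gain from +1 percentage point at {base}: {point_gain:.2f}")
print(f"gain from +1 percent at {base}: {percent_gain:.2f}")
```

With a rating of 50, the two readings differ by a factor of two, which is why stating the units of a regression coefficient matters. The sketch says nothing about omitted variables such as budget or release date.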

And a quick side note here: people have been yelling at researchers and government agencies for years to provide their data in more accessible ways, but media organizations like FiveThirtyEight continue to publish their data tables in picture formats, which means readers who want to use the data have to type those values in by hand.

So Silver runs a regression—it might be on these mere 22 observations and it might not, he doesn’t say—and we could debate what’s in and what’s not in that regression. But he at least gives us a link to “linear regression analysis,” a link that will surely have model and data documentation, right? Nope. It’s a link to an “Introduction to linear regression analysis” primer page from a professor at Duke University. Not so much in the way of documentation there, eh?

Is the relationship between movie profitability and Rotten Tomatoes scores an important research topic? Not for me, really, and probably not for most folks here at the Urban Institute. But it probably is important for folks in Hollywood and for folks in marketing and advertising firms. These relationships are the sorts of things that help them determine what types of movies to produce, what products to place in those movies, and who is going to star in them.

But Silver, FiveThirtyEight, and other data-driven news organizations also write about issues in which I am interested: inequality, immigration, employment, and the minimum wage, to name a few. These organizations have big platforms—Nate Silver himself has 1.1 million followers on Twitter—and they are wading into important topics and affecting how people think about them. How are we supposed to trust data-driven journalism on big national issues if we can’t trust them on a story about a Seth Rogen movie?

If data-driven journalists want to be more like researchers, they can’t just start making up their own set of rules. I’m not arguing that the rules and ethics around academic and science publishing are correct, or that they even make sense, but they are grounded in decades of experience and debate. Basic responsibilities, like documenting data and explaining the methods, are fundamental tenets of research and should be followed. Does that mean Silver needs to write a National Bureau of Economic Research working paper for every little regression he runs? No, but a little documentation might help. Oh, and a little data too.

Illustration by Tim Meko, Urban Institute

Should you rent or buy? What to consider in housing decisions

By Ellen Seidman  ::  December 19th, 2014

One by-product of the housing boom and bust has been a re-examination by many of the real costs and benefits of the American dream of homeownership. And while some have looked to condense the complicated decision into a simple rule, every family must make a careful decision that takes many factors into account. That’s why [...]


Cost of servicing delinquent mortgages is holding back credit access for many borrowers

By Laurie Goodman and Taz George  ::  December 17th, 2014

This post originally appeared on HousingWire. Anyone following the public discourse about the mortgage market knows that lending is overly tight right now. And those who follow the issue closely understand that a primary driver of this tightness is uncertainty over how Fannie Mae, Freddie Mac and the Federal Housing Administration enforce their underwriting rules. What most [...]



How Congress meddles with the District of Columbia: Marijuana edition

By Richard C. Auxier  ::  December 17th, 2014

Congress’ last-minute agreement to fund the federal government through September includes a provision banning the District of Columbia from regulating and taxing marijuana. District voters overwhelmingly supported legalizing marijuana in a November ballot measure, and the DC Council recently held hearings on the possibility of public sale and taxation of the drug. None of these [...]


Racial/ethnic differences in uninsurance rates under the ACA: Where you live matters

By Lisa Clemans-Cope, Hannah Recht, Matthew Buettgens, and Anna Spencer  ::  December 16th, 2014

Initial estimates suggest that the Affordable Care Act (ACA) has already reduced uninsured rates across all racial/ethnic groups, likely reducing longstanding racial/ethnic differences in health insurance coverage between whites and minorities. However, for poor and near-poor adults living in nonexpansion states—states that have elected not to expand their Medicaid programs under the ACA by January [...]


New measure shows mortgage denial rate is triple traditional estimates

By Laurie Goodman and Wei Li  ::  December 16th, 2014

For more than three decades, researchers and policymakers have used the mortgage denial rate as a measure of general mortgage credit availability and of racial and ethnic disparities in access. But the traditional way of calculating the denial rate doesn’t distinguish between two very different sources of change in access to mortgage credit: the mix [...]


10 ideas for improving data to support healthy communities and end poverty

By Ellen Seidman  ::  December 15th, 2014

This month, the Urban Institute and the Federal Reserve Bank of San Francisco released What Counts: Harnessing Data for America’s Communities, a book of short, accessible essays and a website. What Counts helps answer some of the major questions raised by the prior volume in the series, Investing in What Works for America’s Communities, namely [...]


Collaborating with data for America's communities

By Ellen Seidman  ::  December 12th, 2014

Last week, the Urban Institute and the Federal Reserve Bank of San Francisco released What Counts: Harnessing Data for America’s Communities, a book of short, accessible essays and a website. What Counts helps answer some of the major questions raised by the prior volume in the series, Investing in What Works for America’s Communities, namely [...]



Eight recommendations for the Ryan-Murray Commission

By Margery Turner and Jon Schwabish  ::  December 11th, 2014

Last week, Sen. Patty Murray (D-WA) and Rep. Paul Ryan (R-WI) introduced sensible, bipartisan legislation that could improve federal policy by expanding the availability of data that can fuel actionable research. This is great news. As policy researchers, we believe in the power of evidence to strengthen public policies. But, too often, our work is [...]
