Building better metrics for news



Brian Abelson | @brianabelson
Data Scientist, Enigma | Fellow, Tow Center
Former Knight-Mozilla OpenNews Fellow, The New York Times
Slides: brianabelson.com/scpr-2013

"Pageviews are dead"

Remind you of anything?

Pageviews Above Replacement
(un-juking the stats)



  • What if we could control for promotion when judging performance?


  • From July to August, I collected data on the promotion and performance of over 21,000 articles published on nytimes.com
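One way to read "Pageviews Above Replacement" is as a residual: the pageviews an article actually got, minus the pageviews you would expect given how heavily it was promoted. The slides do not spell out a formula, so the sketch below is an assumption; the function name and the numbers are hypothetical, not NYT data.

```python
def pageviews_above_replacement(actual, expected):
    """Return actual minus expected pageviews for each article.

    `expected` stands in for the output of a promotion-based model
    (an assumption for this sketch; the slides give no exact formula).
    """
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    return [a - e for a, e in zip(actual, expected)]


# Hypothetical numbers for three articles, not taken from the NYT dataset.
actual = [12000, 3000, 45000]
expected = [10000, 5000, 45000]  # what promotion alone would predict
par = pageviews_above_replacement(actual, expected)
```

A positive value means the article did better than its promotion would suggest; a negative value means it did worse.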

Data sources


    Promotional Data:
  • ~200 NYT-related Twitter accounts
  • ~20 NYT-related Facebook accounts
  • ~20 section fronts
  • One homepage
  • One paper

    Metadata:
  • Article type (video, slideshow, interactive, article, blog post)
  • Section (US, World, Art, etc.)

    Performance Data:
  • Pageviews and social media activity for each article

Predicting pageviews


  • Sum all the pageviews for 7 days on the site

  • Use promotional features and article metadata to predict this number

  • Random Forests (a bunch of decision trees combined: the average of their predictions for a number like pageviews, the mode for a category)
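The steps above can be sketched in plain Python. This is a toy, not the model behind these slides: it builds one-split trees on bootstrap samples, splits each at the feature's mean (a deliberate simplification; a real random forest searches for the best split and grows deeper trees), and averages the trees, as is usual for a numeric target. All names and numbers are hypothetical; in practice you would reach for a library implementation.

```python
import random
from statistics import mean


def weekly_pageviews(daily_counts):
    """Sum the first 7 daily pageview counts for one article."""
    return sum(daily_counts[:7])


def fit_stump(rows, targets, feature):
    """Fit a one-split tree on one feature, splitting at the feature's mean."""
    threshold = mean(r[feature] for r in rows)
    left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
    right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
    overall = mean(targets)
    # Fall back to the overall mean if one side of the split is empty.
    return (feature, threshold,
            mean(left) if left else overall,
            mean(right) if right else overall)


def fit_forest(rows, targets, n_trees=50, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Average the predictions of all stumps (regression, so mean not mode)."""
    return mean(left if row[f] <= thr else right
                for f, thr, left, right in forest)


# Hypothetical features per article: [hours on homepage, number of tweets].
rows = [[0, 1], [2, 3], [10, 8], [12, 10]]
daily = [[100] * 7, [200] * 7, [1000] * 7, [1200] * 7]
targets = [weekly_pageviews(d) for d in daily]

forest = fit_forest(rows, targets)
low = predict(forest, [1, 1])    # lightly promoted article
high = predict(forest, [11, 9])  # heavily promoted article
```

The heavily promoted article should come out with the larger prediction, which is the whole point: promotion alone already tells you a lot.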

Variable importance


  • Time on all section fronts
  • Number of unique section fronts
  • Was the article in the paper?
  • Number of NYT-Twitter followers reached
  • Time on homepage
  • Number of NYT-tweets
  • Is the article from Reuters?
  • Is the article from the AP?
  • Max rank on homepage
  • Word count
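A ranking like the one above needs some measure of how much each feature matters. The slides do not say which measure was used; a random forest usually reports its own built-in importances. As a swapped-in illustration, the sketch below uses permutation importance instead: shuffle one feature column and see how much the error grows. The stand-in model, feature names, and numbers are all hypothetical.

```python
import random


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict, rows, targets, seed=0):
    """Return, per feature, the increase in error after shuffling that column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        error = mean_squared_error(targets, [predict(r) for r in shuffled])
        importances.append(error - baseline)
    return importances


def toy_model(row):
    """Hypothetical stand-in model: only time on the homepage matters."""
    hours_on_homepage, word_count = row
    return 500 * hours_on_homepage


# Hypothetical articles: [hours on homepage, word count].
rows = [[0, 300], [2, 1200], [5, 800], [9, 400], [12, 1500]]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets)
```

Because the stand-in model ignores word count, shuffling that column changes nothing and its importance is zero, which matches word count sitting at the bottom of the list.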

So what?


  • Placing promotional data alongside pageviews gives us a better understanding of what the metric actually means.

  • NYT pageviews are actually fairly predictable (my model explains 90% of the variance)

  • Incorporating this approach into your newsroom should be fairly painless with particle. However, you should first ask yourself what you're optimizing for.

  • Predictive analytics can help increase your editorial responsiveness to readers' preferences and the news cycle (see http://fast.qcri.org/).
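The "variance explained" figure quoted above is conventionally the coefficient of determination, R². The slides do not say exactly how it was computed (for example, on held-out data or not), so treat this as a sketch of the standard formula with hypothetical numbers.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` that `predicted` accounts for."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # leftover error
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variance
    return 1 - ss_res / ss_tot


# Hypothetical pageview counts, not from the NYT dataset.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
score = r_squared(actual, predicted)
```

A score of 1 means perfect predictions; a score of 0 means the model does no better than always guessing the average.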

Thanks!



@brianabelson
brianabelson.com
OpenNews