The Economist.com data migration to Drupal

by Moshe Weitzman

The Economist now uses Drupal 6 to serve the vast majority of content pages on its flagship web site, economist.com. The homepage is Drupal powered, along with all articles, channels, comments, and more. The Economist evaluated several open source CMSs and proprietary solutions aimed at media publishers. In the end, it chose Drupal for its vibrant community and the ecosystem of modules that the community produces. The Economist plans to add many social tools to its site over time, and doing so on its previous platform was too slow and inefficient.

The Economist hired Cyrve to migrate its large and volatile dataset to Drupal. With the sponsorship and encouragement of The Economist, Cyrve open sourced its migrate module, which is the heart of its migration methodology. The Economist and Cyrve hope this article helps more sites migrate to Drupal.

Before Drupal

  • 20-30 million page views and 3-4 million unique visitors per month
  • Over 3 million registered users
  • A posting rate of more than one comment per minute
  • Powered by a custom ColdFusion application and an Oracle database

Get intimate with the source data

We usually start by reviewing an article web page and identifying where each piece of data is stored in the 'legacy' system. For The Economist, the most interesting challenges were:

  • The legacy schema attempted to impose an object-oriented design on a relational database. There was a central cms_object table, holding all kinds of content, with content-specific data two degrees of separation away (with a cms_relations table in the middle). This meant that joins were quite complex, even for conceptually simple cases.
  • The text content itself was embedded in an NITF object stored in the database, requiring run-time XML parsing to explode it out into Drupal fields.
  • Character sets were a challenge. Inevitably, source data that is supposed to be in UTF-8 (or some other encoding) is not consistently so, and it took a great deal of trial and error with encoding functions like iconv() to get it right. This is a recurring issue in data migrations.
  • The www.economist.com Drupal site makes heavy use of node reference fields. During a migration, you often need to relate an article to something that does not yet exist in the database (e.g. an article can have several related articles, some of which have not been imported yet). The migrate module has built-in support for this: it creates a stub node when the referenced node does not yet exist, and the stub is filled in properly later, when its information becomes available.
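
The stub-node idea in the last point can be illustrated with a small, language-neutral sketch. The following is written in Python rather than Drupal's PHP, and it is not the migrate module's API: the Node and Importer classes, the id_map dictionary, and the import_article() function are all hypothetical names chosen for illustration. It only shows the general technique, assuming each legacy record has a unique id: resolve a reference to a destination id, create an empty placeholder if none exists, and fill that placeholder in when the real record arrives.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A destination record; a stub has an id but no real content yet."""
    nid: int
    title: str = ""
    related: list[int] = field(default_factory=list)
    is_stub: bool = True


class Importer:
    """Toy importer (hypothetical) that maps legacy ids to destination ids."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}   # destination nid -> node
        self.id_map: dict[str, int] = {}   # legacy id -> destination nid

    def _resolve(self, legacy_id: str) -> int:
        """Return the nid for a legacy id, creating a stub if none exists."""
        if legacy_id not in self.id_map:
            nid = len(self.nodes) + 1
            self.nodes[nid] = Node(nid=nid)
            self.id_map[legacy_id] = nid
        return self.id_map[legacy_id]

    def import_article(self, legacy_id: str, title: str,
                       related_ids: list[str]) -> Node:
        """Import one article, reusing a stub if one was created earlier."""
        node = self.nodes[self._resolve(legacy_id)]
        node.title = title
        # References to articles not yet imported become stubs here.
        node.related = [self._resolve(r) for r in related_ids]
        node.is_stub = False
        return node


# Example: A1 refers to A2 before A2 has been imported.
importer = Importer()
importer.import_article("A1", "First article", ["A2"])
importer.import_article("A2", "Second article", ["A1"])
```

After the first call, A2 exists only as a stub whose id A1 can already reference; the second call fills in the same record rather than creating a new one, so the reference stored on A1 stays valid.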

Break up the project into several distinct "migrations"

A migration represents a flow from one set of source data (typically the output of a database query) to one kind of Drupal destination. Destinations can include nodes, taxonomy terms, users, profiles, comments, or private messages. Here are some of the migrations at economist.com:

  • Articles
  • Issues (in the sense of a periodical)
  • Newspapers (our different publications)
  • Customers (users)
  • User roles
  • Blog posts