erikb

TECH

Caching vs. parsing XML, 4:1 in speed [2004-10-24]

This web page is generated on each access by parsing an XML file and transforming its contents into HTML. All parsing and transformation is done in PHP. This is really nice from a functional point of view. Changes to the look and layout are very easy to make. The strong separation between the content (the XML file) and the view (this HTML page) is valuable because it means the two formats are very loosely coupled - I can change either one without affecting the other. The parsing and transformation code (the controller) forms the glue between the two formats. MVC for you programmers.
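To make the parse-then-transform idea concrete, here is a minimal sketch. The XML layout, the function names parse_entries() and render_entries(), and the use of SimpleXML (PHP 5 and later) are all my assumptions for illustration; the site's real schema and parser are not shown in this post.

```php
<?php
// Hypothetical content format; the site's real XML schema is not shown.
$xml = '<entries>'
     . '<entry date="2004-10-24"><title>Caching vs. parsing XML</title></entry>'
     . '<entry date="2004-10-20"><title>Ham &amp; eggs</title></entry>'
     . '</entries>';

// Content side: parse the XML into a plain PHP array.
function parse_entries($xml) {
  $entries = array();
  $doc = simplexml_load_string($xml);
  foreach ($doc->entry as $entry) {
    $entries[] = array(
      'date'  => (string) $entry['date'],
      'title' => (string) $entry->title,
    );
  }
  return $entries;
}

// View side: turn the array into HTML, escaping all text.
function render_entries($entries) {
  $html = "<ul>\n";
  foreach ($entries as $e) {
    $html .= '  <li>' . htmlspecialchars($e['title'])
           . ' [' . htmlspecialchars($e['date']) . "]</li>\n";
  }
  return $html . "</ul>\n";
}

$entries = parse_entries($xml);
$html = render_entries($entries);
echo $html;
```

Because the view only ever sees the array, the markup in render_entries() can change freely without touching the XML, and the other way around.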

Parsing the XML data for each access is no problem if you have a fast computer. Today almost all computers are fast enough for such a tiny task. Sadly, the server I run on is busy doing a lot of other things - and sometimes you want to have a sound solution that also cares about practical issues and not just the functional ones. So, I decided to test if a cache of the XML contents would be possible. It turns out it was, and easier than expected too.

The functional solution, which parses the XML on every access, produces something like 3-4 pages per second. With the cache, the server can now do about 12 pages per second. Not too bad, especially considering it only took 15 lines of PHP code and there's no loss of functionality (the cache never has to be generated by hand).

define("CACHE", "/some/path/web/cache"); // must be server writable

function parse($filename) {
  if (file_exists(CACHE)) {
    $origchange = filemtime($filename); // when source content was last modified
    $cachechange = filemtime(CACHE);    // when cache was last written

    if ($origchange < $cachechange) {  // cache is newer than source
      $file = fopen(CACHE, 'r'); 
      $contents = fread($file, filesize(CACHE));
      fclose($file);      
      return unserialize($contents); // return contents of cache
    }
  }
  // we do not use the cache for this lookup
  $data = parsefromfile($filename); // read original data from xml, the old way
  
  $file = fopen(CACHE, 'w');
  fwrite($file, serialize($data)); // save to cache
  fclose($file); 
 
  return $data;
}

With this solution the cache is constructed by a simple serialization of the PHP data structure. For each access, if the cache exists and is newer than the XML contents, I read and unserialize the cache. If I update the contents (the XML file), the cache is automatically regenerated on the next access, because the cache now has an older timestamp than the XML file. I need not worry about the cache at all. Works out nicely I think.
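One thing parse() above does not guard against is a visitor reading the cache while another request is halfway through writing it. A possible variation, sketched here as an assumption and not something the site actually runs, is to write to a temporary file and rename it into place. The helper name write_cache_atomically() is made up, and the sketch assumes a POSIX-like filesystem where rename() within one directory replaces the target in a single step.

```php
<?php
// Hypothetical helper, not part of the original code.
// Writes the serialized data to a temp file, then renames it over the cache.
function write_cache_atomically($cachefile, $data) {
  // Temp file in the same directory, so rename() stays on one filesystem.
  $tmp = tempnam(dirname($cachefile), 'cache');
  if ($tmp === false) {
    return false;
  }
  $file = fopen($tmp, 'w');
  if ($file === false) {
    unlink($tmp);
    return false;
  }
  fwrite($file, serialize($data));
  fclose($file);
  if (!rename($tmp, $cachefile)) {
    unlink($tmp); // clean up if the swap failed
    return false;
  }
  return true;
}

// Usage with a throwaway path and sample data (both made up for the example).
$cachefile = sys_get_temp_dir() . '/example.cache';
$data = array('title' => 'Caching vs. parsing XML', 'entries' => array(1, 2, 3));
$ok = write_cache_atomically($cachefile, $data);
$restored = unserialize(file_get_contents($cachefile));
```

Readers then see either the old cache or the complete new one, never a partial file. The three write lines at the end of parse() could be swapped for a call to this helper.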

With the cache, generating this page is almost 4 times faster than with the XML parser. The unserialization itself is a lot faster than that, but with all the other overhead (generation of HTML etc.) this is what I get when I look at the whole process of making this page. If you compare only the cache read vs. the parsing, you would get a factor of 10, if not more.
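If you want to check the cache-vs-parsing comparison on your own machine, a rough micro-benchmark could look like the sketch below. It is only an illustration: the sample XML is generated on the spot, parse_xml_string() is a stand-in for my real parser built on PHP's expat functions, time_calls() is a made-up helper, and the numbers will depend on the document and the server.

```php
<?php
// Build a small sample document; the site's real XML file is not used here.
$xml = '<entries>';
for ($i = 0; $i < 200; $i++) {
  $xml .= '<entry id="' . $i . '"><title>Entry ' . $i . '</title></entry>';
}
$xml .= '</entries>';

// Stand-in parser: turns the XML into a flat array of tag records.
function parse_xml_string($xml) {
  $parser = xml_parser_create();
  xml_parse_into_struct($parser, $xml, $values);
  return $values;
}

// Hypothetical helper: total wall-clock seconds for $runs calls of $callback.
function time_calls($callback, $arg, $runs) {
  $start = microtime(true);
  for ($i = 0; $i < $runs; $i++) {
    call_user_func($callback, $arg);
  }
  return microtime(true) - $start;
}

$parsed = parse_xml_string($xml);
$cached = serialize($parsed); // what the cache file would contain

$runs = 200;
$parsetime = time_calls('parse_xml_string', $xml, $runs);
$cachetime = time_calls('unserialize', $cached, $runs);

printf("parse: %.4f s, unserialize: %.4f s\n", $parsetime, $cachetime);
```

This leaves out the file reads and the HTML generation, so it measures only the two steps being compared, not the whole page.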
