Tom Butler's programming blog

A guide to Transphporm's caching and recent improvements

Better caching

It's been a while since I did a major update to Transphporm, and one area that was lacking was its caching system. The most recent update now allows for 4 different levels of caching. I'm going to explain a bit of what Transphporm does behind the scenes and how you can implement very finely tuned caching in the template system, without needing to rely on HTTP-layer caching systems like Varnish, which are too far removed from the business logic to be useful in a lot of cases.

With HTTP level caches you either require frequent full-page refreshes or need some logic linking the cache layer to the business logic in order to update different blocks of the page at different frequencies. Because the cache layer is far removed from the display logic and business logic, solutions for achieving this are an ugly hackfest that involves mixing concerns at completely different ends of the application.

Transphporm puts all caching concerns in the view layer making it easy to update different sections of the page at different frequencies.

To demonstrate the different types of caching available, here's a basic script for displaying a product page on an e-commerce website using Transphporm. I'll implement each of the available types of caching on this page.

$template = new \Transphporm\Builder('template.xml', 'sheet.tss');

echo $template->output($data)->body;

Where sheet.tss looks like this:

.product .price {
    content: data(getProduct().price);
}
.product .name {
    content: data(getProduct().name);
}
.product .description {
    content: data(getProduct().description);
    format: html;
}

footer .copyright {
    content: "now";
    format: date "Y";
}

Firstly, here's a brief breakdown of each of the caching levels:

0. No caching

This is the default behaviour (and should only be used for testing and low traffic websites). Without any caching, each time this script is run, Transphporm will do the following:

  1. Open and parse sheet.tss into a PHP data structure.
  2. Parse the XML file into a DOMDocument object to perform the transformations on.
  3. Iterate over each of the parsed rules and apply the changes to the XML document.
  4. Call getProduct() in the supplied $data object. This could go off and query the database to fetch the product information.

This is quite a lot of work, and most of it will be identical on every request. To avoid repeating the exact same process on each request, you can enable caching.

1. TSS Caching

This works in a similar way to PHP's opcache. Instead of loading and parsing the TSS file on each request, the result of parsing the TSS file is cached between requests. This is very simple to implement and requires very little extra code. It's recommended for any real project, as there are no downsides and it needs near-zero additional configuration.

To do this, you need to supply an object which implements PHP's ArrayAccess interface and persistently stores the data it's given. I suggest SimpleCache (or at least looking at it as an example), which makes it very easy to cache data in files. You could also easily write a wrapper for Memcached or Redis.
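As a rough illustration of what such an object involves, here is a minimal file-backed store that implements the four ArrayAccess methods. This is not SimpleCache's actual code; the class name FileArrayCache and the one-serialized-file-per-key layout are assumptions made for the sketch:

```php
<?php
// Hypothetical file-backed cache implementing ArrayAccess (not SimpleCache's code).
// Each key is stored as a serialized value in its own file, so data survives
// between requests.
final class FileArrayCache implements ArrayAccess
{
    public function __construct(private string $dir)
    {
        if (!is_dir($dir)) {
            mkdir($dir, 0777, true);
        }
    }

    // Map an arbitrary key to a filesystem-safe file name.
    private function path(mixed $offset): string
    {
        return $this->dir . DIRECTORY_SEPARATOR . md5((string) $offset) . '.cache';
    }

    public function offsetExists(mixed $offset): bool
    {
        return is_file($this->path($offset));
    }

    public function offsetGet(mixed $offset): mixed
    {
        if (!$this->offsetExists($offset)) {
            return null;
        }
        return unserialize(file_get_contents($this->path($offset)));
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        // LOCK_EX avoids two requests interleaving writes to the same file.
        file_put_contents($this->path($offset), serialize($value), LOCK_EX);
    }

    public function offsetUnset(mixed $offset): void
    {
        if ($this->offsetExists($offset)) {
            unlink($this->path($offset));
        }
    }
}
```

An instance of this could be passed to `$template->setCache(new FileArrayCache('./cache'));` in place of SimpleCache. A Memcached or Redis wrapper would implement the same four methods against the remote store instead of the filesystem.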

To enable TSS caching, you can use the following code:

$cache = new \SimpleCache\SimpleCache('./cache');

$template = new \Transphporm\Builder('template.xml', 'sheet.tss');

$template->setCache($cache);

echo $template->output($data)->body;

With the cache set, the following happens when the script runs:

  1. The parsed TSS is loaded from the cache (or parsed and then cached if the file has changed since it was cached).
  2. Parse the XML into a DOMDocument object to perform the transformations on.
  3. Iterate over each of the parsed rules and apply the changes to the XML document.
  4. Call getProduct() in the supplied $data object. This could go off and query the database.
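Step 1 can be pictured as a modification-time check: reuse the parsed rules unless the TSS file is newer than what was cached. The sketch below assumes the cache entry stores the file's mtime alongside the parsed rules, and uses a caller-supplied $parse callable in place of Transphporm's real parser; how Transphporm actually detects changes may differ:

```php
<?php
// Hypothetical sketch: reuse parsed TSS unless the file changed since caching.
// $parse stands in for the real TSS parser.
function loadParsedTss(ArrayAccess $cache, string $file, callable $parse): array
{
    clearstatcache(true, $file);   // make sure filemtime() sees the latest change
    $mtime = filemtime($file);
    $key = 'tss:' . $file;

    if (isset($cache[$key]) && $cache[$key]['mtime'] === $mtime) {
        return $cache[$key]['rules'];          // cache hit: no parsing at all
    }

    $rules = $parse(file_get_contents($file)); // cache miss: parse once...
    $cache[$key] = ['mtime' => $mtime, 'rules' => $rules]; // ...and store it
    return $rules;
}
```

Only the first request after an edit to sheet.tss pays the parsing cost; every other request reads the stored structure.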

2. update-frequency

Transphporm supports the update-frequency property, which makes a rule execute only at set intervals.

When caching is enabled, every time a template is rendered, the rendered HTML is stored in the cache. This is then used instead of loading the blank template each time the page is viewed.

As an example, the copyright year does not need to be updated every single time the page is viewed, as it only changes once a year. To make sure the new year appears on January 1st, the rule can be run daily. Once a day, Transphporm will execute the footer TSS rule and refresh the year in the cached template.

.product .price {
    content: data(getProduct().price);
}
.product .name {
    content: data(getProduct().name);
}
.product .description {
    content: data(getProduct().description);
    format: html;
}

footer .copyright {
    content: "now";
    format: date "Y";
    update-frequency: 1d;
}

With update-frequency: 1d set, the rule for footer .copyright is only run once a day. Each time the template is loaded, the transformations are applied to the output of the last run. Transphporm now does the following:

  1. The parsed TSS is loaded from the cache.
  2. The HTML output from the last run (with the content already replaced, e.g. 2018 in the footer) is loaded from the cache.
  3. Iterate over each of the parsed rules, skipping any which don't need updating. If it's been less than a day since the footer content was replaced, none of the logic for replacing the content (or even for finding the footer .copyright element) will run.
  4. Call getProduct() in the supplied $data object. This could go off and query the database.
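The "does this rule need updating?" check in step 3 can be sketched as two small helpers: one converting an update-frequency value into seconds and one comparing it with the time the rule last ran. Both functions are hypothetical, and supporting the s and h suffixes is an assumption (this post only uses m, d and always):

```php
<?php
// Hypothetical helpers, not Transphporm's API: they only illustrate the idea.
function frequencyToSeconds(string $frequency): int
{
    if ($frequency === 'always') {
        return 0; // 0 means "run on every request"
    }
    // Assumed suffixes: s, m, h and d (the post itself only uses m and d).
    $units = ['s' => 1, 'm' => 60, 'h' => 3600, 'd' => 86400];
    if (!preg_match('/^(\d+)([smhd])$/', $frequency, $match)) {
        throw new InvalidArgumentException("Unrecognised update-frequency: $frequency");
    }
    return (int) $match[1] * $units[$match[2]];
}

// Decide whether a rule is due, given when it last ran (null = never ran).
function ruleNeedsRunning(?int $lastRun, string $frequency, int $now): bool
{
    if ($lastRun === null) {
        return true;
    }
    $interval = frequencyToSeconds($frequency);
    return $interval === 0 || $now - $lastRun >= $interval;
}
```

When ruleNeedsRunning() returns false for footer .copyright, the selector never has to be matched against the document, which is where the saving comes from.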

By applying update-frequency to rules that trigger a database query, the result of the database query can be cached:

.product .price {
    content: data(getProduct().price);
    update-frequency: 30m;
}
.product .name {
    content: data(getProduct().name);
    update-frequency: 30m;
}
.product .description {
    content: data(getProduct().description);
    format: html;
    update-frequency: 30m;
}

footer .copyright {
    content: "now";
    format: date "Y";
    update-frequency: 1d;
}

With update-frequency set on each of the rules that fetch data about the product, the rules for the product information will only run every 30 minutes, and the following happens:

  1. The parsed TSS is loaded from the cache.
  2. The HTML output from the last run (with the content replaced) is loaded from the cache.
  3. Iterate over each of the parsed rules, skipping any which don't need updating.
  4. getProduct() will only be called every 30 minutes, so if it queries the database, the query will only run once every 30 minutes. Otherwise, the cached output from the last run is returned.

With update-frequency you can specify how often each section of the page needs to be refreshed.

In the example above, every rule has an update frequency. However, until the new version of Transphporm, it would still need to load the parsed TSS, iterate over the rules, parse the XML into a DOMDocument object and instantiate all the classes Transphporm needs to run.

3. *New* More intelligent update-frequency

In the new version of Transphporm, a further enhancement is made. Previously, Transphporm would still need to load the parsed TSS, create a DOMDocument object, iterate over the rules to see if they needed updating, and set up and configure a lot of internal objects (loading dozens of classes via the autoloader).

As of the new version, you can use update-frequency to avoid 99% of Transphporm's code and do a very simple key-based cache lookup.

If nothing in the template needs updating, effectively all the code that runs is this:

if ($minimumUpdateFrequencyHasPassed) {
    // do all the processing to generate the template
}
else {
    return file_get_contents('cachedoutput.html');
}

When this happens, 99% of Transphporm's logic is skipped entirely, most of the classes aren't autoloaded and all that happens is a very simple key based cache lookup.

To enable this level of caching, you must set an update-frequency on every rule.

When you visit the page, this is the process that runs:

  1. The minimum update-frequency for this template is read from the cache
  2. If nothing needs updating, return the cached output skipping most of the logic
  3. Otherwise, continue as per cache level 2 and update any blocks which require updating
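The three steps above can be sketched as a single function that stores the rendered output together with the time the earliest rule next becomes due. It assumes every rule's update-frequency has already been converted to seconds (0 for always), and $render stands in for the level 2 processing; the function name and cache layout are hypothetical, not how Transphporm stores things internally:

```php
<?php
// Hypothetical sketch of the "minimum update-frequency" shortcut.
// $intervals: one entry per rule, in seconds (0 = always). Must not be empty.
function renderWithShortcut(ArrayAccess $cache, string $key, array $intervals, int $now, callable $render): string
{
    $entry = $cache[$key] ?? null;
    if ($entry !== null && $now < $entry['nextUpdate']) {
        return $entry['output']; // fast path: nothing is due, skip all processing
    }

    $output = $render();          // slow path: run the rules that are due
    $minimum = min($intervals);   // the soonest any rule becomes due again
    $cache[$key] = ['nextUpdate' => $now + $minimum, 'output' => $output];
    return $output;
}
```

Because the shortcut depends on knowing the smallest interval, a single rule without an update-frequency (or with always) makes the fast path unreachable, which is why every rule needs one for this level of caching.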

4. *New* Per record caching

If we assume that the data object looks like this:

class ProductModel {
    private $pdo;

    public function __construct($pdo) {
        $this->pdo = $pdo;
    }

    public function getProduct() {
        $stmt = $this->pdo->prepare('SELECT * FROM product WHERE id = :id');
        $stmt->execute([
            'id' => $_GET['id']
        ]);

        return $stmt->fetch();
    }
}

and that each product is displayed by a page, e.g. product.php, containing this code:

$cache = new \SimpleCache\SimpleCache('./cache');

$template = new \Transphporm\Builder('template.xml', 'sheet.tss');

$template->setCache($cache);

$data = new ProductModel($pdo);

echo $template->output($data)->body;

then the TSS from the previous example won't work as intended:

.product .price {
    content: data(getProduct().price);
    update-frequency: 30m;
}
.product .name {
    content: data(getProduct().name);
    update-frequency: 30m;
}
.product .description {
    content: data(getProduct().description);
    format: html;
    update-frequency: 30m;
}

footer .copyright {
    content: "now";
    format: date "Y";
    update-frequency: 1d;
}

Even though the data for an individual product is unlikely to change frequently, it's not possible to update the product details only every 30 minutes, because `getProduct` won't always return the same product. If you visited product.php?id=3, the following would happen:

The output of the template (displaying product 3) would be cached, and each time you viewed product.php?id=3 the cached output would be displayed.

However, because Transphporm has been instructed to update the template every 30 minutes, if you then visited product.php?id=4 to display a different product, and the 30 minutes hadn't expired, Transphporm would load the cached template still containing data for the product with the ID 3.

In earlier versions of Transphporm there was no way to avoid this other than use update-frequency: always (or omit update-frequency entirely) to force the rule to run every time the page was viewed.

This meant that every time you viewed a product page, it had to query the database and render the template. And since the product information is not likely to change often, this is a lot of wasted CPU cycles.

A few methods of avoiding this are:

  1. Implementing caching at the HTTP/routing level (e.g. Varnish), but this causes its own set of issues when you want users to see unique content, for example a product page with a shopping basket. The cache hit rate for users viewing a product page with the same items in their basket is tiny, so most page views will need a full reload.

    You also need some logic linking the cache layer to the business logic in order to update different blocks of the page at different frequencies, which means mixing logic from different layers of the application.

  2. Caching in the model. Rather than query the database, the model's getProduct() method could be cached so that the data for the most popular products is kept in memory and not requested from the database. This adds extra logic to the model which needs implementing manually for each method and the whole template still has to be rendered on every page view.
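To show what option 2 involves, here is a rough sketch of a caching wrapper around the model's getProduct(). The class name, the getProduct($id) signature of the wrapped model and the 30-minute TTL are all assumptions for illustration. Note that it only saves the database query: the template is still rendered in full on every request, and every other method that should be cached needs similar wrapper code:

```php
<?php
// Hypothetical sketch of caching inside the model layer (option 2).
class CachingProductModel
{
    public function __construct(
        private ArrayAccess $cache,
        private $inner,          // the real model; assumed to expose getProduct($id)
        private int $ttl = 1800  // 30 minutes, matching the TSS example
    ) {
    }

    public function getProduct(int $id, ?int $now = null)
    {
        $now ??= time();
        $key = 'product:' . $id;
        $entry = $this->cache[$key] ?? null;

        if ($entry !== null && $now - $entry['storedAt'] < $this->ttl) {
            return $entry['value']; // served from cache, no database query
        }

        $value = $this->inner->getProduct($id); // cache miss: query the database
        $this->cache[$key] = ['storedAt' => $now, 'value' => $value];
        return $value;
    }
}
```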

@cacheKey

Transphporm now supports a new processing instruction, @cacheKey, which allows specifying a value to cache the page by. This allows caching of the rendered product template separately for product 1, product 2, product 3, etc.

To enable per-page caching you can add @cacheKey 'value'; to the TSS file. The value can be a string or a lookup in the data object supplied to the template.

The following TSS allows per-product caching:

@cacheKey getId();

.product .price {
    content: data(getProduct().price);
    update-frequency: 30m;
}
.product .name {
    content: data(getProduct().name);
    update-frequency: 30m;
}
.product .description {
    content: data(getProduct().description);
    format: html;
    update-frequency: 30m;
}

footer .copyright {
    content: "now";
    format: date "Y";
    update-frequency: 1d;
}

As long as the model exposes the value you want to cache by (in this case, each product's ID), you can cache per record.

The model would look like this:

class ProductModel {
    private $pdo;

    public function __construct($pdo) {
        $this->pdo = $pdo;
    }

    public function getId() {
        return $_GET['id'];
    }

    public function getProduct() {
        $stmt = $this->pdo->prepare('SELECT * FROM product WHERE id = :id');
        $stmt->execute([
            'id' => $_GET['id']
        ]);

        return $stmt->fetch();
    }
}

This is a crude demonstrative example. In reality, you would want to avoid reading $_GET in the model and instead use $this->id with a corresponding constructor argument.
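A sketch of that refactor might look like the following. The $id constructor argument is the change suggested above; the typed properties and the way product.php builds the model are my own assumptions about how it could be wired up:

```php
<?php
// Sketch of the suggested refactor: the product ID is injected through the
// constructor instead of being read from $_GET inside the model.
class ProductModel
{
    public function __construct(private $pdo, private int $id)
    {
    }

    // Used by "@cacheKey getId();" to select the cached page for this product.
    public function getId(): int
    {
        return $this->id;
    }

    public function getProduct()
    {
        $stmt = $this->pdo->prepare('SELECT * FROM product WHERE id = :id');
        $stmt->execute(['id' => $this->id]);
        return $stmt->fetch();
    }
}

// product.php would then read the request once, e.g.:
// $data = new ProductModel($pdo, (int) ($_GET['id'] ?? 0));
```

Keeping the request parsing in product.php means the model can be reused and tested without a web request.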

With this model and TSS, Transphporm will now do the following on a request to the page, e.g. product.php?id=3:

  1. The parsed TSS is loaded from the cache
  2. Transphporm calls getId() in the model and reads the cached page for the product with the id 3.
  3. If any of the rules need running, the template is parsed into a DOMDocument and any due rules are executed.
  4. Alternatively, return the cached product page as a string and do zero processing.
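The key point of step 2 is that each @cacheKey value gets its own cache slot, so product 3 and product 4 can never overwrite each other. A minimal sketch, assuming the slot is derived by hashing the template, stylesheet and key value together (how Transphporm actually builds its keys may differ, and expiry is left out here because update-frequency already handles it):

```php
<?php
// Hypothetical sketch: one cache slot per @cacheKey value.
function pageCacheKey(string $template, string $sheet, string $cacheKey): string
{
    // "\0" separators stop e.g. ("ab", "c") and ("a", "bc") colliding.
    return md5($template . "\0" . $sheet . "\0" . $cacheKey);
}

function cachedPage(ArrayAccess $cache, string $template, string $sheet, string $cacheKey, callable $render): string
{
    $key = pageCacheKey($template, $sheet, $cacheKey);
    if (!isset($cache[$key])) {
        $cache[$key] = $render(); // first view of this record: render and store
    }
    return $cache[$key];          // later views: plain key lookup
}
```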

The model has to be instantiated and getId() will be called on every page view, but other than that, viewing a product is a straight key-based cache lookup.

If you want to implement a shopping basket that is updated on every page view, you can do this:

@cacheKey getId();

.product .price {
    content: data(getProduct().price);
    update-frequency: 30m;
}
.product .name {
    content: data(getProduct().name);
    update-frequency: 30m;
}
.product .description {
    content: data(getProduct().description);
    format: html;
    update-frequency: 30m;
}

footer .copyright {
    content: "now";
    format: date "Y";
    update-frequency: 1d;
}

.basket .total {
    content: "£", data(getTotal());
    format: decimal 2;
    update-frequency: always;
}

.basket .numitems {
    content: data(getNumItems());
    update-frequency: always;
}

In this example, the shopping basket rules would run on every page view but each product would only be queried from the database every 30 minutes and the footer copyright year would only be updated every day.
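To make the mix of frequencies concrete, here is a rough sketch of which rules from the TSS above would run on a given request. The dueRules function and the hard-coded intervals (1800 seconds for 30m, 86400 for 1d, 0 for always) are illustrative assumptions, not Transphporm internals:

```php
<?php
// Hypothetical sketch: pick the rules that are due on this request.
// $intervals: selector => seconds (0 = always); $lastRun: selector => timestamp.
function dueRules(array $intervals, array $lastRun, int $now): array
{
    $due = [];
    foreach ($intervals as $selector => $interval) {
        $last = $lastRun[$selector] ?? null; // null = rule has never run
        if ($interval === 0 || $last === null || $now - $last >= $interval) {
            $due[] = $selector;
        }
    }
    return $due;
}

$intervals = [
    '.product .price' => 1800,
    '.product .name' => 1800,
    '.product .description' => 1800,
    'footer .copyright' => 86400,
    '.basket .total' => 0,
    '.basket .numitems' => 0,
];
```

If every rule last ran at time 0, a request one minute later runs only the two basket rules; a request 30 minutes later also runs the three product rules, and the footer waits a full day.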

Conclusion

This gives you very fine-grained control over which elements are updated when and under what circumstances:

  • For static pages such as home/about/contact pages, you can use update-frequency to refresh the cache from the database periodically. Most requests will just do a simple cache lookup, and Transphporm will do zero processing other than loading the cached page.
  • For templates that handle multiple records, e.g. product.php?id=3 requiring a different cached page to product.php?id=4, you can use update-frequency and @cacheKey to cache the template per product (or article, category, post, message, etc.).
  • For templates that mix per-user or otherwise unique data with cached data, such as a product page with a shopping basket where the product information is cached between page views but the basket is refreshed every time, you can use update-frequency: always on the elements you wish to refresh on every page view.