Fishtrap

php and other stuff I know

22 November 2011
by nat
0 comments

Mysqlnd_qc Query caching in the client

One of the uses for mysqlnd’s plugin architecture that I found most interesting was query caching on the client.

The traditional MySQL Query Cache

The MySQL query cache is a nice, simple feature which helps speed up repeated queries for the same data. A portion of memory is set aside to hold result sets in a lookup table keyed by the query string. When a SELECT query comes in that the MySQL server has seen before, it finds the query in the cache and, instead of touching the table data or indexes, simply returns the result set it has in memory, which is much faster. This is particularly useful in web applications, where repeated SELECT queries are very common.

Here’s a little script to demo the effect query cache can have

<?php
$mysqli = new mysqli('127.0.0.1', 'root', '', 'war_and_peace');
$mysqli->query('RESET QUERY CACHE');
$querySQL = "SELECT * FROM words WHERE word = 'sausage' ";
$time_start = microtime(true);
if ($results = $mysqli->query($querySQL)) {
 echo "Cold got {$results->num_rows} results in ";
 echo microtime(true) - $time_start,"s",PHP_EOL;
 $results->close();
}
$time_start = microtime(true);
if ($results = $mysqli->query($querySQL)) {
 echo "Server cache got {$results->num_rows} results in ";
 echo microtime(true) - $time_start,"s",PHP_EOL;
 $results->close();
}

Running this script against a completely un-normalised table containing the 600 thousand odd words of War and Peace (thanks Project Gutenberg) gives you

$ php MySQL_QueryCache.php
Cold got 5 results in 0.13019204139709s
Server cache got 5 results in 0.00035595893859863s

Why not put this in the client?

What would make it even faster is if the client did not need to connect to the server at all. If the client had a cache of all queries it performed it could save the time taken for the network round trip and help take load off the server by not even asking the question.

mysqlnd_qc attempts to do this.

There is, however, a major problem: cache invalidation. The traditional server-based cache uses a very simple table-based invalidation algorithm. If a query which alters the data in a table is detected, all cached results that draw on that table are invalidated. This even works for servers which are part of a replication cluster, since the update SQL propagates from the master where the data was updated to the slaves, and the slaves’ caches are then invalidated.
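
To make the table-based idea concrete, here is a toy sketch in PHP. It is not how the MySQL server is implemented: the class name TableInvalidatingCache and its methods are made up for illustration, and the caller has to say which tables each query reads.

```php
<?php
// Toy model of table-based invalidation (hypothetical class, not MySQL's code).
class TableInvalidatingCache
{
    private $results = array(); // query string => cached result set
    private $byTable = array(); // table name => query strings that read it

    // Remember a result set along with the tables the query read from.
    public function store($sql, array $tables, array $rows)
    {
        $this->results[$sql] = $rows;
        foreach ($tables as $table) {
            $this->byTable[$table][$sql] = true;
        }
    }

    // Return the cached rows, or null on a miss.
    public function fetch($sql)
    {
        return isset($this->results[$sql]) ? $this->results[$sql] : null;
    }

    // A write to $table throws away every cached result that read from it.
    public function invalidate($table)
    {
        if (!isset($this->byTable[$table])) {
            return;
        }
        foreach (array_keys($this->byTable[$table]) as $sql) {
            unset($this->results[$sql]);
        }
        unset($this->byTable[$table]);
    }
}

$cache = new TableInvalidatingCache();
$sql = "SELECT * FROM words WHERE word = 'sausage'";
$cache->store($sql, array('words'), array(array('word' => 'sausage')));
var_dump($cache->fetch($sql) !== null); // bool(true)
$cache->invalidate('words');            // e.g. after an UPDATE on words
var_dump($cache->fetch($sql));          // NULL
```

The point to notice is that invalidation is driven entirely by writes the cache gets to see, which is exactly what a client-side cache cannot rely on.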

But what if there are multiple clients? One client could update a table while another client thinks its cache is still valid and so continues to serve from it. The two-client situation is relevant even if you don’t have multiple web servers talking to the same database server. It can occur whenever something other than PHP and mysqlnd connects to the database. For instance, updates made through the command line mysql client (which uses libmysql) would not invalidate other clients’ query caches.

This problem is seemingly insoluble without some complex mechanism for telling every client which cached results are still valid. So the simple answer is to cache for only a few seconds, and to use the caching only where slightly stale data is not hugely important.
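
As a rough illustration of the "only cache for a few seconds" approach, here is a toy time-to-live cache. The class TtlCache is hypothetical and mysqlnd_qc's internals will certainly differ; the clock is passed in explicitly so the behaviour is easy to follow.

```php
<?php
// Toy time-to-live cache (hypothetical class, not mysqlnd_qc's implementation).
class TtlCache
{
    private $ttl;
    private $entries = array(); // query string => array(expiry time, rows)

    public function __construct($ttlSeconds)
    {
        $this->ttl = $ttlSeconds;
    }

    // Store rows at time $now; they expire $ttl seconds later.
    public function store($sql, array $rows, $now)
    {
        $this->entries[$sql] = array($now + $this->ttl, $rows);
    }

    // Return cached rows while they are fresh, otherwise null.
    public function fetch($sql, $now)
    {
        if (!isset($this->entries[$sql])) {
            return null;
        }
        list($expires, $rows) = $this->entries[$sql];
        if ($now >= $expires) {
            unset($this->entries[$sql]); // too old: forget it
            return null;
        }
        return $rows;
    }
}

$cache = new TtlCache(30);
$sql = "SELECT * FROM words WHERE word = 'sausage'";
$cache->store($sql, array(array('word' => 'sausage')), 1000);
var_dump($cache->fetch($sql, 1010) !== null); // bool(true), 10s old
var_dump($cache->fetch($sql, 1031));          // NULL, older than 30s
```

Nothing here ever learns about writes made elsewhere, so a stale answer can be served for up to the full TTL. That is the trade-off being accepted.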

Therefore nothing is cached by default. After installing the extension with a quick

phpize
./configure
make
sudo make install

(obviously you will need to be using mysqlnd to be able to use it)

you will need to prefix every query you want to cache with

"/*qc=on*/ SELECT * from some_table"

which for portability can be written as "/*" . MYSQLND_QC_ENABLE_SWITCH . "*/"

e.g.

<?php
$mysqli = new mysqli('127.0.0.1', 'root', '', 'war_and_peace');
$mysqli->query('RESET QUERY CACHE');
$querySQL = "SELECT * FROM words WHERE word = 'sausage' ";
// first run: neither cache has seen the query yet
$time_start = microtime(true);
if ($results = $mysqli->query($querySQL)) {
    echo "Cold got {$results->num_rows} results in ";
    echo microtime(true) - $time_start,"s",PHP_EOL;
    $results->close();
}
// second run: the same query again, timed as the server cache case
$time_start = microtime(true);
if ($results = $mysqli->query($querySQL)) {
    echo "Server cache got {$results->num_rows} results in ";
    echo microtime(true) - $time_start,"s",PHP_EOL;
    $results->close();
}
// prefix the query with the hint so mysqlnd_qc will cache it
$querySQL = "/*" . MYSQLND_QC_ENABLE_SWITCH . "*/";
$querySQL .= "SELECT * FROM words WHERE word = 'sausage' ";
// run it once, untimed, to fill the client cache
$mysqli->query($querySQL);
// third run: the hinted query again, timed as the client cache case
$time_start = microtime(true);
if ($results = $mysqli->query($querySQL)) {
    echo "Client cache got {$results->num_rows} results in ";
    echo microtime(true) - $time_start,"s",PHP_EOL;
    $results->close();
}

Running this script will show an even better speed improvement.

$ php MySQL_QueryCache_mysqlnd_qc.php
Cold got 5 results in 0.1253068447113s
Server cache got 5 results in 0.00036311149597168s
Client cache got 5 results in 9.8943710327148E-5s

Win Win Win!

This is pretty impressive! It uses the default cache lifetime, or time to live (TTL). The default is 30 seconds and it is set with the ini setting mysqlnd_qc.ttl. Of course, after our 30 seconds are up the server cache is still there, so we can still make use of that. So in a way we can think of mysqlnd_qc as a way of protecting the db server from excessive repeated queries.
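
The TTL can also be chosen per query with a second SQL hint. The sketch below assumes the MYSQLND_QC_TTL_SWITCH constant described in the PHP manual. The helper name qc_hint() is hypothetical, and the fallback strings are only an assumption about what the two constants contain, used so the snippet runs without the extension loaded.

```php
<?php
// Fall back to literal hint strings when the extension is not loaded.
// The fallback values are an assumption about the constants' contents.
$enable = defined('MYSQLND_QC_ENABLE_SWITCH') ? MYSQLND_QC_ENABLE_SWITCH : 'qc=on';
$ttl    = defined('MYSQLND_QC_TTL_SWITCH') ? MYSQLND_QC_TTL_SWITCH : 'qc_ttl=';

// Hypothetical helper: prefix a query with "cache this" and "for N seconds".
function qc_hint($sql, $seconds, $enable, $ttl)
{
    return '/*' . $enable . '*/' . '/*' . $ttl . (int) $seconds . '*/' . $sql;
}

// Ask for this one query to be cached for 5 seconds instead of the default.
echo qc_hint("SELECT * FROM words WHERE word = 'sausage'", 5, $enable, $ttl), PHP_EOL;
```

A short TTL like this suits data that changes often, while the ini setting remains the default for everything else.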

More Options

We are using the default handler, which behaves in a similar way to the query cache in the MySQL server. Other backends are also available, such as APC, memcache and sqlite. Sqlite may well be an excellent choice, even though using another db may seem a little weird: result sets are table data, which fits well with sqlite. There is also the possibility of creating your own user handler, which can use a custom invalidation algorithm. There is an example at http://uk3.php.net/manual/en/mysqlnd-qc.set_user_handlers.php

4 October 2011
by nat
0 comments

Parallel PHP processes with pcntl_fork

Forking the PHP process requires the pcntl extension. This extension is limited to *nix operating systems. Once installed the key function is pcntl_fork(). The manual contains this excellent example of its use

$pid = pcntl_fork();
if ($pid == -1) {
        die('could not fork');
} else if ($pid) {
        // we are the parent
        pcntl_wait($status); //Protect against Zombie children
} else {
        // we are the child
}

If we insert a known delay in there and time it, we can get an idea of whether we are running in parallel.

$pid = pcntl_fork();
if ($pid) {
        sleep(10);
        pcntl_wait($status);
} else {
        sleep(10);
}

Mr-McHughs-MacBook-Pro:$ time php pcntl_fork.php
real	0m10.060s
user	0m0.020s
sys	0m0.025s

Excellent, so it would appear we are running two PHP processes in parallel. Using sleep() in this way, we model an expensive and parallelisable function call, for instance a heavy calculation or an HTTP request. By using a known delay instead of the real call we can work out how long the script should take to execute and check our logic.

What about more than two processes?

The first thing I tried was to put the previous code in a for loop with a couple of obvious alterations

for ($i=0; $i < 5; $i++) {
        $pid = pcntl_fork();
        if ($pid) {
                pcntl_wait($status);
        } else {
                echo 'starting child ',$i,PHP_EOL;
                sleep(10);
                die();
        }
}

The major alteration you will notice is that we exit or die after each child has done its thing. Otherwise each child would carry on through the remaining iterations of the for loop, forking children of its own, and the script would run for a very long time. The other change is that we are no longer doing any work in the parent. There is no real reason for this other than that we would otherwise always have to remember to add one to the number of calls we were expecting to make. Even with these alterations this script has a major problem, one that you get a good idea about if you watch it run.

Mr-McHughs-MacBook-Pro:$ time php pcntl_fork.php
starting child 0
starting child 1
starting child 2
starting child 3
starting child 4
real	0m50.113s
user	0m0.038s
sys	0m0.056s

Watching it in the terminal it becomes obvious that each child only gets started after the previous one has finished. The problem is the call to pcntl_wait($status) in the parent section. What this is doing is waiting for each child to end and hence stopping execution of the parent until it receives a signal to say a child has finished.

The solution is to keep the calls to pcntl_wait in the parent but move them out of the loop that forks each process. The fixed script is:

for ($i=0; $i < 5; $i++) {
        $pid = pcntl_fork();
        if (!$pid) {
                // we are the child
                echo 'starting child ',$i,PHP_EOL;
                sleep(10);
                die();
        }
}
// back in the parent: wait once for each of the 5 children
for ($i=0; $i < 5; $i++) {
        pcntl_wait($status);
}

This way we run all 5 sleep commands in parallel

Mr-McHughs-MacBook-Pro:$ time php pcntl_fork.php
starting child 0
starting child 1
starting child 2
starting child 3
starting child 4
real	0m10.077s
user	0m0.037s
sys	0m0.056s

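The neatest solution is to use an array in the parent to register the pid of each child we create, remove each pid as that child exits, and keep waiting until the array is empty.

<?php
$forks = array();
for ($i=0; $i < 5; $i++) {
        $pid = pcntl_fork();
        if ($pid == -1) {
                die('could not fork');
        } else if ($pid) {
                // we are the parent: register the child's pid
                $forks[$pid] = true;
        } else {
                // we are the child
                echo 'starting child ',$i,PHP_EOL;
                sleep(10);
                die();
        }
}
// wait until every registered child has exited
while (count($forks) > 0) {
        $pid = pcntl_wait($status);
        if ($pid == -1) {
                break; // nothing left to wait for
        }
        unset($forks[$pid]);
}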
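
Keeping track of pids in the parent also makes it possible to find out how each child finished. The following is only a sketch: run_children() is a hypothetical helper, each task is assumed to return an integer exit code, and the demo is skipped when the pcntl extension is not loaded.

```php
<?php
// Sketch: fork one child per task, then reap them and collect exit codes.
// run_children() is a hypothetical helper, not part of the pcntl extension.
function run_children(array $tasks)
{
    $forks = array(); // pid => task key
    foreach ($tasks as $key => $task) {
        $pid = pcntl_fork();
        if ($pid == -1) {
            die('could not fork');
        } else if ($pid) {
            $forks[$pid] = $key; // parent: remember which task this pid runs
        } else {
            exit((int) call_user_func($task)); // child: do the work and leave
        }
    }
    $codes = array();
    while (count($forks) > 0) {
        $pid = pcntl_wait($status); // blocks until any child exits
        if ($pid == -1) {
            break; // no children left
        }
        if (isset($forks[$pid]) && pcntl_wifexited($status)) {
            $codes[$forks[$pid]] = pcntl_wexitstatus($status);
        }
        unset($forks[$pid]);
    }
    ksort($codes); // children finish in any order, so sort by task key
    return $codes;
}

if (function_exists('pcntl_fork')) {
    $codes = run_children(array(
        'ok'   => function () { sleep(1); return 0; },
        'fail' => function () { return 3; },
    ));
    print_r($codes);
}
```

Because the children can finish in any order, the results are keyed by task rather than by arrival order.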

3 October 2011
by nat
0 comments

Extension to find the number of processors with PHP

While working on a script to generate fractals in PHP I wanted to try to run the intensive calculations in parallel. The script I came up with has a fair amount of overhead in spawning and managing each thread, since each one actually writes a file to disk. So the logical optimal number of threads was the number of available processor cores. I was surprised to find there didn’t seem to be an easy way to get the number of processors in PHP. I found a short C script that seemed to do the trick, so I knocked together a quick PHP extension called num_procs. It seems to work on Linux and OS X, which is good enough for me.

The code is on github

To install just do

git clone git://github.com/natmchugh/PHP-Num-Processors.git
cd PHP-Num-Processors
phpize
./configure
make
sudo make install

Then add …

extension=num_procs.so

.. to your php.ini

It exports two functions

$available = num_processors_available();
var_dump($available); // int(2)
$configured = num_processors_configured();
var_dump($configured); //int(2)

By the way, the effect of parallelising the calculations was pretty dramatic: it sped up execution by at least 50%. I intend to write a post on that sometime.
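
As a sketch of how the processor count might be used, the snippet below deals the rows of an image out to one worker per core. The helper rows_for_workers() is made up for illustration, and the fallback of 2 workers when the extension is not loaded is an arbitrary assumption.

```php
<?php
// Use the extension's function when it is loaded, otherwise assume 2 cores.
$workers = function_exists('num_processors_available')
    ? num_processors_available()
    : 2;

// Hypothetical helper: deal $rows image rows out to $workers processes
// round-robin, so each worker gets a similar mix of cheap and costly rows.
function rows_for_workers($rows, $workers)
{
    $plan = array_fill(0, $workers, array());
    for ($row = 0; $row < $rows; $row++) {
        $plan[$row % $workers][] = $row;
    }
    return $plan;
}

foreach (rows_for_workers(8, $workers) as $worker => $rows) {
    echo "worker $worker renders rows ", implode(',', $rows), PHP_EOL;
}
```

Each entry in the plan could then be handed to one forked child.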

Zooming in on the Mandelbrot set

25 September 2011 by nat | 0 comments

Here’s a video I showed a couple of times while talking about escape time fractals in PHP. The video was made by using PHP to generate the Mandelbrot set centred on a point and then decreasing the scale logarithmically.
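
For anyone unfamiliar with the term, a minimal sketch of the escape time calculation follows. It is not the code used for the video. It also assumes that "logarithmically" means multiplying the scale by a constant factor for each frame, and the starting scale and factor are invented values.

```php
<?php
// Escape-time count for the point c = ($cr, $ci): iterate z = z^2 + c from
// z = 0 and count the steps until |z| > 2, giving up after $max steps.
function escape_time($cr, $ci, $max)
{
    $zr = 0.0;
    $zi = 0.0;
    for ($n = 0; $n < $max; $n++) {
        if ($zr * $zr + $zi * $zi > 4.0) {
            return $n;
        }
        $next = $zr * $zr - $zi * $zi + $cr;
        $zi = 2.0 * $zr * $zi + $ci;
        $zr = $next;
    }
    return $max; // never escaped: treat the point as inside the set
}

// Each frame's scale is the previous one times a constant factor, so the
// zoom moves in even steps on a log scale. These values are made up.
$scale = 2.0;
for ($frame = 0; $frame < 3; $frame++) {
    printf("frame %d scale %.3f\n", $frame, $scale);
    $scale *= 0.5;
}
```

Colouring each pixel by its escape count at each frame's scale gives the familiar zooming images.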

12 August 2011
by nat
0 comments

CURL less POST requests in PHP

Libcurl is a great library with a whole host of features. In PHP it is accessible through the php_curl extension. Although it’s fully featured, it has a rather ugly and clumsy API. Added to this, php_curl is often not part of the default install of PHP.

There is another option. As you probably know, in PHP you can make requests with the streams extension and use functions like file_get_contents to retrieve a URL’s contents.

<?php
$page = file_get_contents('http://fishtrap.co.uk');

Since PHP 5.0 you can also set a stream_context to include POST data. This can include files and other multipart form data.
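
For plain form fields, a stream context for a POST might look something like the sketch below. It only covers URL-encoded fields, not file uploads, and http://example.com is a placeholder, so the actual request is left commented out.

```php
<?php
// Build a stream context that turns file_get_contents() into a POST request.
$data = array('foo' => 'bar', 'baz' => 'bat');
$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => http_build_query($data),
    ),
));

// http://example.com is a placeholder URL; uncomment to send the request.
// $page = file_get_contents('http://example.com', false, $context);

// Show the request body that would be sent.
$options = stream_context_get_options($context);
echo $options['http']['content'], PHP_EOL;
```

Multipart bodies for file uploads have to be assembled by hand in the 'content' option, which is where a wrapper starts to earn its keep.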

I created a short script to try to add a more intuitive and usable wrapper around these functions. To use it you just need to do:

<?php
include('StreamsHttpPost.php');
$request = new StreamsHttpPost('http://example.com');
$data = array(
'foo' => 'bar',
'baz' => 'bat',
);
$request->addFile('picture',  'path/to/file');
$page = $request->post($data);
var_dump($request->getResponseCode()); // int(302)

This will then send a POST request as if the user had submitted a form with those fields.

The code is here on github. Comments, pull requests and corrections welcome.