PHP is a fairly forgiving programming language. It doesn’t force you to learn about things like compilation steps, deployment strategies, concurrency, and proper memory management. If you only need a simple web page with some dynamic behaviour, all you have to do is write a PHP script and upload it somewhere!
Of course, as you move on to more complex applications you start to run into some of its limitations. The most jarring one is the fact that there is no straightforward way to run time-consuming parts of your code asynchronously.
For example, let’s say we have a PHP script slow.php which includes a function do_something() that takes a whopping 5 seconds to complete:
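The original code isn’t reproduced in this excerpt, but a minimal sketch of slow.php could look like this (the sleep(5) body is an assumption based on the stated 5-second duration):

```php
<?php
// slow.php – sketch of a deliberately slow function; the sleep(5)
// body is an assumption standing in for 5 seconds of real work.
function do_something(): void
{
    sleep(5);
    error_log('do_something() finished'); // ends up in the server logs
}
```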
We can include slow.php and call do_something() from another script (let’s name this one sync.php):
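A sketch of such a synchronous script might look like the following; the “Hi!”/“Bye!” strings are assumptions, and the inline stub stands in for a require of slow.php so the example is self-contained:

```php
<?php
// sync.php – everything runs in a single synchronous request.
// The real script would use: require __DIR__ . '/slow.php';
// an inline stub keeps this sketch self-contained.
function do_something(): void
{
    sleep(5); // stands in for 5 seconds of heavy work
}

echo "Hi!\n";   // in a browser, nothing renders until the script ends
do_something(); // blocks for ~5 seconds
echo "Bye!\n";
```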
As expected, this gives us the following output:
sync.php is pretty slow. It’s not just do_something() that takes five seconds: everything takes five seconds. Someone who tries to load sync.php doesn’t see anything until the entire script has finished its execution.
This makes the first line in the output more of a log than a notification or
announcement that tells the user what is happening.
Since users probably aren’t as forgiving as PHP, we have to “offload” heavy tasks to something else that can run them in the background. There are various widely used solutions, including:
Long-running PHP processes based on Symfony Messenger, Laravel’s queues or “homemade” code, which can process queued tasks as jobs (or messages). The jobs themselves are processed synchronously, but since these processes run separately from the main application, this doesn’t affect the execution times of HTTP requests.
Gearman is an application framework that helps you distribute tasks to other computers. It’s kind of like the first solution, except that Gearman is so ancient that its website isn’t even served over HTTPS.
But what if you are on shared hosting and can’t use any of these solutions?
If you’re lucky, your hosting provider lets you use functions like
system(), which make it possible to execute Linux commands
from your PHP script.
Here we have another script. Rather than including slow.php and calling the do_something() function directly, it executes the script using exec():
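A sketch of this approach, assuming slow.php sits next to this script and the php binary is on the PATH:

```php
<?php
// Launch slow.php as a separate process instead of include-ing it.
echo "Hi!\n";
// "&" backgrounds the command and "> /dev/null" discards its output,
// so exec() returns without waiting for slow.php to finish.
exec('php ' . __DIR__ . '/slow.php > /dev/null &');
echo "Bye!\n";
```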
Like all other functions in PHP,
exec() is synchronous. But that doesn’t matter,
because we’ve appended > /dev/null & to our command. The trailing & tells the shell to run the command in the background, and > /dev/null discards its output – we’re not interested in it, and thus don’t want to wait for it.
A user who accesses the script above from a web browser therefore only sees the following:
This response appears almost immediately! We can see in the server logs that do_something() finished about 5 seconds after the request.
Many hosting providers disable the use of functions like
exec() for security
reasons. Fortunately, there are other options.
Here we have another script, headers.php, that uses HTTP headers to create the illusion that do_something() is called asynchronously:
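A self-contained sketch of this technique; the response body is an assumption, and do_something() is stubbed inline rather than required from slow.php:

```php
<?php
// headers.php – send the full response and close the connection
// before doing the slow work.
function do_something(): void
{
    sleep(5); // stands in for the slow work in slow.php
}

$output = "Hi!\nBye!\n"; // assumed response body

header('Connection: close');
header('Content-Length: ' . strlen($output));
header('Content-Encoding: none');

echo $output;
flush(); // the browser now considers the response complete

do_something(); // still runs, but the user no longer waits for it
```

On setups using PHP-FPM, the built-in fastcgi_finish_request() function achieves a similar effect more reliably, without hand-crafting headers.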
HTTP headers can be used to provide invisible instructions to browsers or the web server. The Location header, for example, can be used to transparently redirect a browser to another web page. But there are many other possible headers. headers.php includes three such headers:
Connection: close tells the browser that the server wants to close the connection. From the user’s perspective, this means that the browser no longer shows a loading spinner, and the “stop” button turns into a “refresh” button.
Content-Length tells the browser how much data it can expect before the connection can be closed. Note that the value is based on the length of $output. This means that any content that is generated after the “Bye!” will not be visible in the browser.
Web servers are often configured to compress responses for performance reasons. In that case our calculated Content-Length is likely larger than what the server actually sends. Content-Encoding: none disables compression for this request, so that the connection is closed at the right moment.
flush() makes sure that the headers and $output are sent to the browser immediately. The script doesn’t terminate yet, because it still needs to call do_something() – but as far as the user is concerned, it has finished.
The output of headers.php is very similar to that of the previous scripts, even though do_something() was called directly in the script. However, the output generated by that function can only be found in the server logs:
In the first two solutions, do_something() was called as quickly as possible. While this has its advantages, such immediate invocations can easily overload a server and bring everything to a grinding halt.
There’s a third solution that is safer and more popular: cron-based task execution.
cron is a utility on Unix-like systems that lets you schedule commands that need to run periodically, e.g. every minute. However, do_something() needs to be executed for each request – not periodically.
So how does this help us?
Well, we can queue do_something() invocations. The following script (cron-trigger.php in the companion repository) shows how:
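A sketch of such a queueing script; the queue_job() signature, the metadata fields and the file-naming scheme are assumptions:

```php
<?php
// cron-trigger.php – store job metadata instead of doing the work now.
function queue_job(string $task): void
{
    $dir = '/tmp/jobs';
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    // One file per job; uniqid() keeps concurrent requests from colliding.
    file_put_contents($dir . '/' . uniqid('job_', true), json_encode([
        'task'      => $task,
        'queued_at' => time(),
    ]));
}

echo "Hi!\n";
queue_job('do_something'); // returns immediately – no slow work here
echo "Bye!\n";
```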
Each time we receive a request, we store some metadata about the task that we want to execute asynchronously. Here we do that by creating files in /tmp/jobs, a subdirectory of the system’s directory for temporary files. It doesn’t really matter how and where you store the metadata. You can also use a database if you have one!
queue_job() writes “jobs” to a “queue”, but we still need a script that can process them:
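A sketch of what such a processor could look like; the script name is hypothetical, and do_something() is stubbed inline (instantaneous here) instead of being required from slow.php:

```php
<?php
// Hypothetical processor script, executed by cron every minute.
// The real version would `require 'slow.php'` for do_something().
function do_something(): void
{
    // sleep(5); // the real function takes ~5 seconds per job
}

foreach (glob('/tmp/jobs/job_*') as $file) {
    $job = json_decode(file_get_contents($file), true); // job metadata
    unlink($file);  // delete first so a job is never processed twice
    do_something(); // $job could be used to pick the task to run
}
```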
By configuring this script in a so-called crontab, we can make sure that it is executed every minute:
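A matching crontab entry might look like this; the path and script name are assumptions:

```
* * * * * php /path/to/process-jobs.php
```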
Each time the script runs it will look for “jobs” in the /tmp/jobs directory and invoke the do_something() function for each of them. This is done synchronously and consecutively, so if /tmp/jobs contains a large number of jobs, a single run can easily take longer than a minute.
The output in the browser for
cron-trigger.php should look quite familiar by now:
The results of
do_something() should be visible in the cron logs:
Of course, this solution isn’t perfect either. You probably need to make sure that jobs are processed in the correct order, exactly once, and don’t have to wait too long.
Choose your poison. ;-)