Blursed async PHP: Dumb, but also kind of fun

Published: 1 Nov 2021
Written by: Chun Fei Lung

There are decent, battle-tested ways to write asynchronous PHP, but that’s not what this article is about.

Avoid sync at all costs

PHP is a fairly forgiving programming language. It doesn’t force you to learn about things like compilation steps, deployment strategies, concurrency, and proper memory management. If you only need a simple web page with some dynamic behaviour, all you have to do is write a PHP script and upload it somewhere!

Of course, as you move on to more complex applications you start to run into some of its limitations. The most jarring one is the fact that there is no straightforward way to run time-consuming parts of your code asynchronously.

For example, let’s say we have a PHP script slow.php which includes a function do_something() that takes a whopping 5 seconds to complete:

<?php

/**
 * Helper function that makes it easier to see what’s going on.
 *
 * @param string $message Some text that we want to log and display
 */
function show_message(string $message): void
{
    error_log($message);
    echo $message . "\n";
}

/**
 * Something that takes a while to complete.
 *
 * @param string $jobId
 */
function do_something(string $jobId): void
{
    // This makes browsers show the output of this script using a monospaced
    // font
    echo '<pre>';

    show_message("Started  job $jobId at " . date('Y-m-d H:i:s'));

    // This would typically be a more useful task that may take a while to
    // complete
    sleep(5);

    show_message("Finished job $jobId at " . date('Y-m-d H:i:s'));
}

// -----------------------------------------------------------------------------

// $argv is a special variable that contains values that were passed to this
// script if it was called from the command line
if (isset($argv) && count($argv) > 1) {
    $jobId = $argv[1];
    do_something($jobId);
}

We can include slow.php and call do_something() from another script (let’s name this one sync.php):

<?php

require_once 'slow.php';

// Generate a unique-looking ID for our job
$jobId = 'sync-' . bin2hex(random_bytes(5));

echo "<pre>Starting job $jobId at " . date('Y-m-d H:i:s') . "</pre>";

do_something($jobId);

As expected, this gives us the following output:

Starting job sync-113104beaf at 2021-11-01 00:18:49
Started  job sync-113104beaf at 2021-11-01 00:18:49
Finished job sync-113104beaf at 2021-11-01 00:18:54

Unfortunately, sync.php is pretty slow. It’s not just do_something() that takes five seconds: everything takes five seconds. Someone who tries to load sync.php doesn’t see anything until the entire script has finished its execution. This makes the first line in the output more of a log than a notification or announcement that tells the user what is happening.

Since users probably aren’t as forgiving as PHP, we have to “offload” heavy tasks to something else that can run them in the background. There are various widely used solutions, including:

Long-running PHP processes based on Symfony Messenger, Laravel’s queues or “homemade” code, which can process queued tasks as jobs (or messages). This is done synchronously, but since they run separately from the main application, this doesn’t affect the execution times of HTTP requests.
Gearman, an application framework that helps you distribute tasks to other computers. It’s kind of like the first solution, except that Gearman is so ancient that its website isn’t even served over HTTPS.

But what if you are on shared hosting and can’t use any of these solutions?

Running shell commands

If you’re lucky, your hosting provider lets you use functions like exec(), shell_exec() and system(), which make it possible to execute shell commands from your PHP script.

Here we have another script that we’ll name exec.php:

<?php

// Generate a unique-looking ID for our job
$jobId = 'exec-' . bin2hex(random_bytes(5));

echo '<pre>';
echo "Starting async job $jobId at " . date('Y-m-d H:i:s') . "\n";

// exec() lets you execute a command on the system. We can use this to execute
// slow.php in another process. Normally, the script that contains the exec()
// call waits until the command has finished. We can “fix” this by adding
// “> /dev/null &” to the command.
exec("php slow.php $jobId > /dev/null &");

echo 'Bye!';

Rather than including slow.php and calling the do_something() function directly, we execute the script using exec().

Like almost every other function in PHP, exec() is synchronous. But that doesn’t matter, because we’ve appended > /dev/null & to our command. The > /dev/null part tells the shell to discard the command’s output, and the & tells it to run the command in the background. exec() therefore has nothing to wait for and returns right away.
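The command string is the only thing that exec() gets to see, so it is worth building it carefully. The sketch below uses a hypothetical helper, build_background_command(), that is not part of the article's scripts. It escapes both arguments with escapeshellarg() before adding the redirect and the ampersand. The $jobId in exec.php happens to be safe, because it only consists of a fixed prefix and hexadecimal characters, but the same pattern with user input would need this kind of escaping.

```php
<?php

/**
 * Build a shell command that runs a PHP script in the background.
 *
 * Hypothetical helper for illustration. Like exec.php, it assumes that there
 * is a "php" binary on the PATH and that commands are run by a POSIX shell.
 */
function build_background_command(string $script, string $jobId): string
{
    // escapeshellarg() quotes each value so that the shell treats it as a
    // single argument, even if it contains spaces or characters like ";"
    $command = 'php ' . escapeshellarg($script) . ' ' . escapeshellarg($jobId);

    // "> /dev/null" discards the standard output of the script, and "&" tells
    // the shell to run the command in the background
    return $command . ' > /dev/null &';
}
```

In exec.php, the call would then become exec(build_background_command('slow.php', $jobId));.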

A user who accesses the script above from a web browser therefore only sees the following:

Starting async job exec-2418be2f4d at 2021-11-01 00:27:19
Bye!

This response appears almost immediately! We can see in the server logs that do_something() finished about 5 seconds after the request to exec.php was handled:

async-php-apache-1  | Started  job exec-2418be2f4d at 2021-11-01 00:27:19
async-php-apache-1  | Finished job exec-2418be2f4d at 2021-11-01 00:27:24

Closing connections early

Many hosting providers disable the use of functions like exec() for security reasons. Fortunately, there are other options.

Here we have another script, headers.php, that uses HTTP headers to create the illusion that do_something() is called asynchronously:

<?php

require_once 'slow.php';

// Generate a unique-looking ID for our job
$jobId = 'headers-' . bin2hex(random_bytes(5));

$output = '<pre>';
$output .= "Starting async job $jobId at " . date('Y-m-d H:i:s') . "\n";
$output .= 'Bye!';

header('Connection: close');
header('Content-Length: ' . strlen($output));
header('Content-Encoding: none');

echo $output;

flush();

do_something($jobId);

HTTP headers can be used to provide invisible instructions to browsers or the web server. The Location header for example can be used to transparently redirect a browser to another web page. But there are many other possible instructions. headers.php includes three such headers:

Connection: close tells the browser that the server wants to close the connection. From the user’s perspective, this means that the browser no longer shows a loading spinner, and the “stop” button turns into a “refresh” button.
Content-Length tells the browser how much data it can expect before the connection can be closed. Note that the value is based on the length of $output. This means that any content that is generated after the “Bye!” will not be visible in the browser.
Web servers are often configured to compress responses for performance reasons. In this case it means that our calculated Content-Length is likely larger than what the server actually sends. Content-Encoding: none disables compression for this request, so that the connection is closed at the right moment.

Finally, flush() makes sure that the headers and echoed $output are sent to the browser immediately. The script doesn’t terminate yet, because it still needs to do_something() – but as far as the user is concerned, it has finished.
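Whether flush() alone is enough depends on how the server is set up. The following is a minimal sketch of a slightly more defensive variant, under a few assumptions: early_close_headers() and send_response_early() are hypothetical helpers that are not part of headers.php. The sketch also empties PHP's own output buffers, keeps the script alive when the browser disconnects, and calls fastcgi_finish_request() when that function exists, which is only the case under PHP-FPM.

```php
<?php

/**
 * Return the header lines that headers.php sends for a given response body.
 *
 * Hypothetical helper. strlen() counts bytes rather than characters, which is
 * what Content-Length expects.
 */
function early_close_headers(string $body): array
{
    return [
        'Connection: close',
        'Content-Length: ' . strlen($body),
        'Content-Encoding: none',
    ];
}

/**
 * Send the complete response now, so that the script can keep working.
 */
function send_response_early(string $body): void
{
    // Keep running even if the browser disconnects after the response
    ignore_user_abort(true);

    // Discard anything that is still sitting in PHP's output buffers, since
    // it would not be covered by the Content-Length header. The loop stops
    // when no buffers are left or when a buffer refuses to be removed.
    while (ob_get_level() > 0 && @ob_end_clean()) {
    }

    foreach (early_close_headers($body) as $header) {
        header($header);
    }

    echo $body;
    flush();

    // Only available under PHP-FPM, where it finishes the request explicitly
    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request();
    }
}
```

In headers.php, the three header() calls, the echo and the flush() could then be replaced with send_response_early($output);, followed by the call to do_something($jobId) as before.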

The output of headers.php is very similar to that of exec.php:

Starting async job headers-c29a1e216d at 2021-11-01 00:43:48
Bye!

do_something() was called directly in the script. However, the output generated by that function can only be found in the server logs:

async-php-apache-1  | Started  job headers-c29a1e216d at 2021-11-01 00:43:48
async-php-apache-1  | Finished job headers-c29a1e216d at 2021-11-01 00:43:53

Executing jobs using cron

In the first two solutions, do_something() was called as quickly as possible. While this has its advantages, both approaches can easily overload a server and bring everything to a grinding halt when many requests come in at the same time.

There’s a third solution that is safer and more popular: cron-based task execution. cron is a utility on Linux (and other Unix-like systems) that lets you schedule commands that need to run periodically, e.g. every minute. But wait, do_something() needs to be executed for each request – not periodically. So how does this help us?

Well, we can queue do_something() invocations. The following script (cron-trigger.php in the companion repository) shows how we can queue invocations:

<?php

/**
 * Queue a job with $jobId.
 *
 * Most developers use something like MySQL, Redis, or MongoDB to store jobs,
 * but here I’ve chosen to use the filesystem to keep things simple.
 *
 * @param string $jobId
 */
function queue_job(string $jobId): void
{
    // Create a temporary file on the filesystem that tells our cron worker that
    // it should start a job with a certain $jobId
    $directory = '/tmp/jobs';
    if (!is_dir($directory)) {
        mkdir($directory);
    }
    file_put_contents("$directory/$jobId", '');
}

// -----------------------------------------------------------------------------

// Generate a unique-looking ID for our job, prefixed with the current Unix
// timestamp
$jobId = time() . '-cron-' . bin2hex(random_bytes(5));

echo '<pre>';
echo "Scheduling async job $jobId at " . date('Y-m-d H:i:s') . "\n";

queue_job($jobId);

echo 'Bye!';

Each time we receive a request, we store some metadata about the task that we want to execute asynchronously. Here we do that by creating files in /tmp, a directory for temporary files. It doesn’t really matter how and where you store the metadata, as long as it’s virtually instantaneous (although you may also care about other dumb requirements, like reliability). You can also use a database if you have one!

queue_job() writes “jobs” to a “queue”, but we still need a script (cron-worker.php) that can process them:

<?php

require_once 'slow.php';

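function get_queued_jobs(): array
{
    // Nothing has been queued yet if the directory does not exist
    if (!is_dir('/tmp/jobs')) {
        return [];
    }

    // scandir() also returns "." and "..", which are not jobs
    return array_values(array_diff(scandir('/tmp/jobs'), ['.', '..']));
}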

// -----------------------------------------------------------------------------

$jobs = get_queued_jobs();

foreach ($jobs as $jobId) {
    // If two cron executions overlap, the file may no longer exist
    if (!file_exists("/tmp/jobs/$jobId")) {
        continue;
    }

    do_something($jobId);

    // Mark the job as done by removing the file
    unlink("/tmp/jobs/$jobId");
}

echo "Bye!\n";

By configuring this script in a so-called crontab, we can make sure that it is executed every minute:

* * * * * php /var/www/html/cron-worker.php

Each time the script runs it will look for “jobs” in the /tmp/jobs directory and invoke the do_something() function for each of them. This is done synchronously and consecutively, so if /tmp/jobs contains a large number of files, it may take a while before cron-worker.php completes. (Shitty solution: make the script stop itself once it exceeds a certain number of “jobs” or amount of time.)
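The idea of letting the worker stop itself after a certain amount of time could look roughly like this. It is only a sketch: process_jobs_within_budget() is a hypothetical helper, and the check only happens between jobs, so a job that is already running when the time is up still gets to finish.

```php
<?php

/**
 * Process queued jobs until the time budget runs out.
 *
 * Hypothetical helper. $process is whatever should happen for each job, for
 * example a call to do_something() followed by unlink().
 *
 * @return string[] The IDs of the jobs that were processed
 */
function process_jobs_within_budget(array $jobs, callable $process, float $maxSeconds): array
{
    $startedAt = microtime(true);
    $processed = [];

    foreach ($jobs as $jobId) {
        // Stop before starting a new job when we are out of time. The
        // remaining jobs stay in the queue for the next cron run.
        if (microtime(true) - $startedAt >= $maxSeconds) {
            break;
        }

        $process($jobId);
        $processed[] = $jobId;
    }

    return $processed;
}
```

In cron-worker.php, the foreach loop could be replaced with a call that passes get_queued_jobs(), a closure that runs do_something($jobId) and removes the file, and a budget of perhaps 50 seconds, so that there is some room left before cron starts the next run. That number is an assumption and only makes sense if individual jobs are short.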

The output in the browser for cron-trigger.php should look quite familiar by now:

Scheduling async job 1635725000-cron-bef2034ad6 at 2021-11-01 01:03:20
Bye!

The results of do_something() should be visible in the cron logs:

Started  job 1635725000-cron-bef2034ad6 at 2021-11-01 01:04:00
Finished job 1635725000-cron-bef2034ad6 at 2021-11-01 01:04:05
Bye!

Of course, this solution isn’t perfect either. You probably need to make sure that jobs are processed in the correct order and exactly once, and that they don’t have to wait too long.
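“Exactly once” is probably the hardest of these requirements. One common trick, sketched here under some assumptions, is to let a worker claim a job by moving its file out of the queue before it starts working on it. claim_job() and the separate “processing” directory are hypothetical and not part of cron-worker.php. This only helps when two cron runs overlap. It does nothing for a worker that crashes halfway through a job, in which case the file simply stays behind in the “processing” directory.

```php
<?php

/**
 * Try to claim a queued job for the current worker.
 *
 * Hypothetical helper. The "processing" directory is an assumption, and it
 * has to live on the same filesystem as the queue directory.
 *
 * @return bool true if this worker now owns the job, false if another worker
 *              was faster or the job does not exist
 */
function claim_job(string $queueDirectory, string $processingDirectory, string $jobId): bool
{
    if (!is_dir($processingDirectory)) {
        // The @ hides the warning in case another worker creates the
        // directory at the same moment
        @mkdir($processingDirectory, 0777, true);
    }

    // Within one filesystem a rename either happens completely or not at
    // all, so only one of two competing workers can succeed. The @ hides the
    // warning that PHP emits for the worker that loses.
    return @rename("$queueDirectory/$jobId", "$processingDirectory/$jobId");
}
```

A worker would call claim_job('/tmp/jobs', '/tmp/jobs-processing', $jobId) instead of the file_exists() check, skip the job when the result is false, and remove the file from /tmp/jobs-processing once do_something() has finished. The processing directory should not be placed inside /tmp/jobs, because the worker would then mistake it for a job.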

Choose your poison. ;-)