Blursed async PHP: Dumb, but also kind of fun
PHP is a fairly forgiving programming language. It doesn’t force you to learn about things like compilation steps, deployment strategies, concurrency, and proper memory management. If you only need a simple web page with some dynamic behaviour, all you have to do is write a PHP script and upload it somewhere!
Of course, as you move on to more complex applications you start to run into some of its limitations. The most jarring one is the fact that there is no straightforward way to run time-consuming parts of your code asynchronously.
For example, let’s say we have a PHP script slow.php which includes a function do_something() that takes a whopping 5 seconds to complete:
We can include slow.php and call do_something() from another script (let’s name this one sync.php):
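As a rough sketch, assuming do_something() simply sleeps for five seconds to stand in for real work, the two files could look something like this. They are merged into a single script so the example runs on its own, and the echoed messages are placeholders.

```php
<?php
// ---- slow.php (sketch) ----
// do_something() stands in for a heavy task; sleep() simulates 5 seconds of work.
function do_something(): void
{
    sleep(5);
    echo "Something has been done.\n";
}

// ---- sync.php (sketch) ----
// In a real setup this file would start with:
//     require_once __DIR__ . '/slow.php';
$start = microtime(true);

echo "Doing something...\n";
do_something(); // blocks the whole script for ~5 seconds

printf("Finished after %.1f seconds.\n", microtime(true) - $start);
```

Run with php on the command line, the last line reports roughly five seconds of elapsed time: nothing after the do_something() call can happen until it returns.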
As expected, this gives us the following output:
Unfortunately, sync.php is pretty slow. It’s not just do_something() that takes five seconds: everything takes five seconds. Someone who tries to load sync.php doesn’t see anything until the entire script has finished its execution. This makes the first line in the output more of a log than a notification or announcement that tells the user what is happening.
Since users probably aren’t as forgiving as PHP, we have to “offload” heavy tasks to something else that can run them in the background. There are various widely used solutions, including:
- Long-running PHP processes based on Symfony Messenger, Laravel’s queues or “homemade” code, which can process queued tasks as jobs (or messages). This is done synchronously, but since these processes run separately from the main application, this doesn’t affect the execution times of HTTP requests.
- Gearman is an application framework that helps you distribute tasks to other computers. It’s kind of like the first solution, except that Gearman is so ancient that its website isn’t even served over HTTPS.
But what if you are on shared hosting and can’t use any of these solutions?
If you’re lucky, your hosting provider lets you use functions like exec(), shell_exec() and system(), which make it possible to execute Linux commands from your PHP script.
Here we have another script that we’ll name exec.php:
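A minimal sketch of the idea behind exec.php, assuming exec() is enabled, a POSIX shell is available and a php binary is on the PATH. The helper run_in_background() and the CLI script run-slow.php (which would include slow.php and call do_something()) are hypothetical names used for illustration. This version also redirects stderr with 2>&1, which goes slightly beyond the > /dev/null & described in the text.

```php
<?php
// Hypothetical helper: start a shell command without waiting for it.
function run_in_background(string $command): void
{
    // "> /dev/null" discards stdout, "2>&1" sends stderr there as well,
    // and the trailing "&" puts the command in the background, so the
    // shell (and therefore exec()) returns immediately.
    exec($command . ' > /dev/null 2>&1 &');
}

// ---- exec.php (sketch) ----
// run-slow.php is a hypothetical CLI script that includes slow.php and calls
// do_something(); escapeshellarg() makes the path safe to pass to the shell.
// run_in_background('php ' . escapeshellarg(__DIR__ . '/run-slow.php'));
// echo "do_something() is running in the background.\n";
```

Both parts of the suffix matter: without the trailing &, exec() blocks until the command finishes, and if the background process kept writing to the pipe that exec() reads from, exec() would keep waiting for that output.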
Rather than including slow.php and calling the do_something() function directly, we execute the script using exec().
Like all other functions in PHP, exec() is synchronous. But that doesn’t matter, because we’ve appended > /dev/null & to our command. This tells the shell to discard the command’s output (> /dev/null) and to run it in the background (&), so exec() has nothing to wait for and returns right away.
A user who accesses the script above from a web browser therefore only sees the following:
This response appears almost immediately! We can see in the server logs that do_something() finished about 5 seconds after the request to exec.php was handled:
Many hosting providers disable the use of functions like exec() for security reasons. Fortunately, there are other options.
Here we have another script, headers.php, that uses HTTP headers to create the illusion that do_something() is called asynchronously:
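A sketch of the same trick wrapped in a reusable function. The name respond_then_continue() is hypothetical, and the ignore_user_abort(true) call is an addition not described in the text: it keeps the script running even if the browser drops the connection once it has received the response. Whether the connection really closes early still depends on the web server, any proxies in between, and PHP’s output buffering settings.

```php
<?php
// Hypothetical helper: send $output as the complete response, then keep
// working on $task after the browser considers the request finished.
function respond_then_continue(string $output, callable $task): void
{
    // Addition: keep running even if the client disconnects early.
    ignore_user_abort(true);

    header('Connection: close');
    // strlen() counts bytes, which is the unit Content-Length uses.
    header('Content-Length: ' . strlen($output));
    // Meant to stop the server from compressing the response, so the
    // byte count above matches what is actually sent.
    header('Content-Encoding: none');

    echo $output;
    flush();

    // The user already has the full response; the slow part runs now.
    $task();
}

// ---- headers.php (sketch) ----
// require_once __DIR__ . '/slow.php';
// respond_then_continue("Bye!\n", 'do_something');
```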
HTTP headers can be used to provide invisible instructions to browsers or the web server. The Location header, for example, can be used to transparently redirect a browser to another web page. But there are many other possible instructions. headers.php includes three such headers:
- Connection: close tells the browser that the server wants to close the connection. From the user’s perspective, this means that the browser no longer shows a loading spinner, and the “stop” button turns into a “refresh” button.
- Content-Length tells the browser how much data it can expect before the connection can be closed. Note that the value is based on the length of $output. This means that any content that is generated after the “Bye!” will not be visible in the browser.
- Web servers are often configured to compress responses for performance reasons. In this case it means that our calculated Content-Length is likely larger than what the server actually sends. Content-Encoding: none disables compression for this request, so that the connection is closed at the right moment.
Finally, flush() makes sure that the headers and the echoed $output are sent to the browser immediately, unless an output buffer or the web server itself holds on to them. The script doesn’t terminate yet, because it still needs to do_something(), but as far as the user is concerned, it has finished.
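flush() alone doesn’t empty PHP’s own output buffers, which can be active when output_buffering is enabled in php.ini, and a buffer that is never sent defeats the whole trick. A hedged sketch of a more thorough variant follows; the function name send_response_now() is hypothetical, and fastcgi_finish_request() is only called when it exists, which is the case under PHP-FPM.

```php
<?php
// Hypothetical helper: push everything echoed so far to the client.
function send_response_now(): void
{
    // Close and send every active PHP output buffer, innermost first.
    // Stop if a buffer can't be removed, to avoid looping forever.
    while (ob_get_level() > 0) {
        if (!ob_end_flush()) {
            break;
        }
    }

    // Ask the SAPI / web server to send what it has received so far.
    flush();

    // PHP-FPM can end the request explicitly; other SAPIs lack this function.
    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request();
    }
}
```

In a script like headers.php, send_response_now() would take the place of the bare flush() call.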
The output of headers.php is very similar to that of exec.php:
do_something() was called directly in the script. However, the output generated by that function can only be found in the server logs:
In the first two solutions, do_something() was called as quickly as possible. While this has its advantages, starting a heavy task for every single request can easily overload a server and bring everything to a grinding halt.
There’s a third solution that is safer and more popular: cron-based task execution.
cron is a utility on Unix-like operating systems that lets you schedule commands that need to run periodically, e.g. every minute.
But wait, do_something() needs to be executed for each request, not periodically. So how does this help us?
Well, we can queue do_something() invocations. The following script (cron-trigger.php in the companion repository) shows how:
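A minimal sketch of what queue_job() could look like, assuming one JSON file per job in /tmp/jobs. The file naming scheme and payload fields are assumptions; the write-then-rename step is a common way to make sure a worker never picks up a half-written file.

```php
<?php
// Sketch of queue_job(): one JSON file per job in a queue directory.
function queue_job(array $payload, string $dir = '/tmp/jobs'): string
{
    if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create job directory $dir");
    }

    // Start the name with a microsecond timestamp so that sorting file
    // names roughly follows the order in which jobs were queued.
    $name = sprintf('%d-%s', (int) (microtime(true) * 1000000), bin2hex(random_bytes(4)));
    $tmp = "$dir/$name.tmp";
    $final = "$dir/$name.json";

    // Write under a temporary name, then rename: rename() within one
    // filesystem is atomic, so a worker only ever sees complete *.json files.
    if (file_put_contents($tmp, json_encode($payload, JSON_THROW_ON_ERROR)) === false) {
        throw new RuntimeException("Cannot write job file $tmp");
    }
    rename($tmp, $final);

    return $final;
}

// ---- cron-trigger.php (sketch) ----
// queue_job(['task' => 'do_something', 'queued_at' => time()]);
// echo "Your request has been queued.\n";
```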
Each time we receive a request, we store some metadata about the task that we want to execute asynchronously. Here we do that by creating files in /tmp, a directory for temporary files. It doesn’t really matter how and where you store the metadata: you can also use a database if you have one!
queue_job() writes “jobs” to a “queue”, but we still need a script (cron-worker.php) that can process them:
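A matching sketch of the worker, assuming each job is stored as a *.json file in /tmp/jobs whose name starts with a timestamp, so that sorting the names gives the queue order. process_jobs() is a hypothetical name; it takes the job handler as a parameter (in cron-worker.php that would be do_something() from slow.php) and uses flock() so that a new cron run skips the queue while a previous run is still busy. The crontab line in the comment is an example with a placeholder path.

```php
<?php
// Sketch of a cron worker: handle every queued job in $dir, oldest first.
// Example crontab entry (the path is a placeholder), running every minute:
//     * * * * * php /path/to/cron-worker.php
function process_jobs(callable $handler, string $dir = '/tmp/jobs'): int
{
    if (!is_dir($dir)) {
        return 0; // nothing has been queued yet
    }

    // Only one worker at a time: if a previous cron run still holds the
    // lock, skip this run instead of processing the same jobs twice.
    $lock = fopen("$dir/.lock", 'c');
    if ($lock === false) {
        return 0;
    }
    if (!flock($lock, LOCK_EX | LOCK_NB)) {
        fclose($lock);
        return 0;
    }

    $files = glob("$dir/*.json") ?: [];
    sort($files); // timestamp prefix => roughly first in, first out

    $handled = 0;
    foreach ($files as $file) {
        $payload = json_decode((string) file_get_contents($file), true);
        // Delete the job before running it: if the handler crashes, the job
        // is lost rather than repeated (at most once, not at least once).
        unlink($file);
        $handler($payload);
        $handled++;
    }

    flock($lock, LOCK_UN);
    fclose($lock);

    return $handled;
}

// ---- cron-worker.php (sketch) ----
// require_once __DIR__ . '/slow.php';
// process_jobs(fn ($job) => do_something());
```

Deleting each file before calling the handler is a deliberate trade-off in this sketch; deleting it only after the handler succeeds would give at-least-once processing instead, at the cost of possible repeats.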
By configuring this script in a so-called crontab, we can make sure that it is executed every minute:
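Each time the script runs it will look for “jobs” in the /tmp/jobs directory and invoke the do_something() function for each of them. This is done synchronously and consecutively, so if /tmp/jobs contains a large number of files, it can take quite a while before the last one is processed: at five seconds per job, twelve jobs already keep the worker busy for a full minute, which is exactly how often cron starts it.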
The output in the browser for cron-trigger.php should look quite familiar by now:
The results of do_something() should be visible in the cron logs:
Of course, this solution isn’t perfect either. You probably need to make sure that jobs are processed in the correct order, exactly once, and don’t have to wait too long.
Choose your poison. ;-)