On 15 January 1994 a digital social network, De Digitale Stad (DDS, or the Amsterdam Digital City in English), opened its gates to citizens who wished to participate in one of the first (if not the first) online communities in the Netherlands.
The first version of DDS was a simple text-based bulletin board system. The city of Amsterdam originally funded the project for only 10 weeks, as an experiment to bridge the gap between local politics and ordinary citizens. DDS turned out to be an enormous success and thus managed to secure further funding to continue beyond that initial trial.
Fairly soon, a new version of the DDS was developed that took advantage of the then-new Hypertext Transfer Protocol (HTTP) to present itself as an interactive city, with facilities like a post office (email), city squares (directory listings), a café (chat room), and houses (personal home pages).
DDS was an online public space in an era when such services were still mostly limited to universities, libraries, and large companies. Over the years, many communities formed around the fledgling digital city.
Its maintainers assumed that the DDS would be of interest to archaeologists in a distant future, and thus had the foresight to create a full backup of its systems.
This week’s paper discusses how a group of web archaeologists (in a not so distant future) worked on the preservation of the DDS.
Web archiving, as done by the Internet Archive and its Wayback Machine, aims to preserve snippets of the internet as pages or snapshots. This works reasonably well for static pages, but so-called “born-digital material” like the DDS is dynamic: it is generated on the fly by scripts, which may yield different output each time they are run. Images and screenshots do not do such material justice.
The authors of the paper therefore argue that dynamic pages should be preserved by taking as the starting point not the resulting page, but the server that generates it.
Bumps in the road
This is of course easier said than done, even when one has access to full backups. The researchers ran into several issues when they tried to use these backups:
- The original backups were written to Digital Linear Tape (DLT) cartridges, which can only be read with obsolete hardware that is no longer readily available.
- Once the researchers had managed to convert the backups to tarballs that could be processed by contemporary machines, they discovered that the backups included four corrupted files that acted as decompression bombs. These four files had to be excluded from the dataset.
- Little was known about the system on which DDS ran. Files were scattered all over the place, which made it a challenge to find the source code for the various parts of DDS – if it was included in the backup at all.
- What made the search for the right files particularly challenging was that much of today’s terminology had not yet been established. For instance, the source code for the DDS’s avatar generator was only found after the researchers learned from interviews with former users that avatars were called “DoDoS”.
It’s shocking how hard it is to reconstruct even the simplest things after a mere 25 years!
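A decompression bomb is a small compressed file that expands to an enormous size when unpacked. The paper summary does not say how the four offending files were detected. What follows is only a minimal sketch under stated assumptions.

It assumes that the suspect files are gzip-compressed members of a tar archive. It decompresses each such member in bounded chunks and flags the member as soon as its output exceeds a size cap. It also flags any member that cannot be decompressed at all.

The function names `safe_gunzip` and `find_suspect_members`, and the 100 MiB `MAX_OUTPUT_BYTES` limit, are hypothetical. They are not the researchers' actual tooling.

```python
import gzip
import io
import tarfile
import zlib
from typing import List, Optional

# Hypothetical safety limits; the values used for the DDS backups are unknown.
MAX_OUTPUT_BYTES = 100 * 1024 * 1024  # flag anything that inflates past 100 MiB
CHUNK_SIZE = 64 * 1024                # decompress 64 KiB at a time


def safe_gunzip(data: bytes, limit: int = MAX_OUTPUT_BYTES) -> Optional[bytes]:
    """Decompress gzip data in chunks.

    Returns None if the output would exceed `limit` bytes (a likely
    decompression bomb) or if the data is corrupt, truncated or not gzip.
    """
    out = bytearray()
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                out.extend(chunk)
                if len(out) > limit:
                    return None  # stop early instead of inflating everything
    except (OSError, EOFError, zlib.error):
        return None  # BadGzipFile is a subclass of OSError
    return bytes(out)


def find_suspect_members(tar_path: str, limit: int = MAX_OUTPUT_BYTES) -> List[str]:
    """Return the names of .gz members in a tarball that fail safe_gunzip."""
    suspects = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            # Only regular files with a .gz suffix are screened in this sketch.
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            fileobj = tar.extractfile(member)
            if fileobj is None or safe_gunzip(fileobj.read(), limit) is None:
                suspects.append(member.name)
    return suspects
```

Because the read is bounded and streaming, a bomb is caught after at most `limit` bytes of output rather than exhausting memory or disk. Flagged members could then be left out of the real extraction, for example by passing the remaining members to the `members` argument of `TarFile.extractall`.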
Reconstructing the digital city
In 2016 the researchers set out to revive the DDS. They soon split into two groups: one would try to run the original software using emulation, while the other would focus on recreating the user experience by building a replica of the system from scratch with modern technologies.
The first group quickly ran into a plethora of issues. The original version of DDS ran on Sun SPARC machines, which are hard to come by these days. Emulating that hardware, while technically possible, turned out to suffer from awful performance.
The researchers therefore had to resort to manually recompiling DDS’s components for x86-64. This turned out to be incredibly laborious, not least because much of the tacit knowledge about the original system and its dependencies has been lost to time. Some parts only run with “pragmatic patches”, while others (most notably the authentication system) do not work at all.
The other group fared a bit better. Although the back end is clearly not historically accurate, the user interface is virtually indistinguishable from the original version, which is all that really matters! Moreover, the replica is more maintainable and secure, which makes it more suitable as a museum exhibit.
The word “archaeology” suggests that we’re digging for things that are very old. From a technical perspective the DDS is indeed quite ancient, but in human terms it is not: virtually all of the DDS’s former inhabitants are still alive, which means that the DDS’s data raises major privacy issues.
Researchers (or “web archaeologists”) will have to weigh the potential harm their research might cause against the potential gain for society. Moreover, any data must be properly secured, which is nigh impossible with an emulated version of the original system!
- Preservation of digital systems should go beyond their outputs: dynamic behaviour should also be preserved