Digital Incunabula

About the importance of the Archipol project: archiving websites of Dutch political parties
Gerrit Voerman, André Keyzer and Frank den Hollander

 

Introduction
Dutch political parties on the Web
The Archipol project on archiving party websites
International initiatives to archive the Web
Possibilities and problems related to building a Web archive
Conclusion
Contact the authors; Dutch version of this article

 


 

Introduction

"The memory of the Netherlands pulverises". With these words more than twenty major libraries raised the alarm in the spring of 1998. In an advertisement placed in a few national newspapers, these libraries asked for hundreds of millions of guilders in order to restore, photograph and digitise books and newspapers. But while efforts are being made to rescue this "paper memory" and retain it for the future, the "digital memory" is allowed to disappear. This refers not to digital databases, the storage of which has received more attention, as was shown by the establishment of the Nederlands Historisch Data Archief (Dutch Historical Data Archives) at the end of the 1980s, but to the building blocks of the World Wide Web: websites. The World Wide Web (hereafter referred to as the "Web") was introduced in the early 1990s, but as far as is known nowhere in the world have websites yet been systematically stored and made accessible. Here and there a modest beginning has been made with building a Web archive, but that does not alter the fact that a substantial part of this digital inheritance has already been lost. And for the moment it seems as if this process will continue. According to American historians attending a congress in 1998, "Internet threatens to cause a gap in historiography". For future research into the developments of the "virtual" world of the Web and its relationship to "real" society, there will be a lack of source material.

The Web expands at an enormous pace; according to a conservative estimate, over twenty million new pages appear every month. The number of sites is estimated to increase by more than 150,000 each month and is thought to have reached 73 million by the beginning of 2000. This expansion takes place despite the fact that many sites disappear at the same time: the average lifespan of a site is an estimated 75 days. Sites are often hosted at only one location. If, for whatever reason, the supplier decides to discontinue a site, it is lost forever. In this dynamic process of rise and fall, most existing sites are not static either. They keep changing all the time: a few seconds after visiting a site, it may have changed because the supplier added information or a visitor left a message.

Dutch political parties on the Web

"Panta rhei," one could say, quoting the Greek philosopher Heraclitus, who lived in the 6th century BC. On the Web, too, everything is in a state of flux and is subject to continuous change. The same can be said of the websites of political parties. Political parties could be found on the Web at a fairly early stage. In January 1994 GroenLinks (Green Leftwing Party), an environmental-socialist party, was the first party to start a website. In a movement from left to right across the political spectrum the other parties followed suit: Partij van de Arbeid (Labour Party) in November 1994, the left-liberal Democraten 66 (Democrats 66) and Christen Democratisch Appèl (Christian Democratic Party) about the middle of 1995, and the right-liberal Volkspartij voor Vrijheid en Democratie (People's Party for Freedom and Democracy) in the spring of 1997. The last one was the orthodox-Christian Staatkundig-Gereformeerde Partij (Political Reformed Party) in the autumn of 2000. Since they first appeared on the Web, most parties have completely restyled their site two or three times. At present nothing remains of the older versions. Therefore, the first steps taken by the Dutch political parties on the Web cannot be retraced digitally.

On the eve of the general election of May 1998, all parties, with the exception of the SGP, had a website. During the election campaign they did not have many visitors. At most, an estimated 100,000 people visited a party website in the month preceding election day. In the future, however, websites will undoubtedly become more important to the parties. As party members and voters increasingly gain access to the Internet, party sites will become more and more important in the dissemination of information at grassroots level. This is already indicated, for example, by the changing contents of party newsletters. They tend to develop into magazines which no longer carry rather boring information about party affairs, such as lectures and the agenda of party meetings. Nowadays information of this kind is published on the website.

Although most sites tend to have a rather top-down approach, that is, they are aimed at disseminating information, the interactive dimension will probably gain in importance. At a time when parties have great difficulty holding on to and mobilising their members in the traditional way (membership figures of the major parties PvdA, CDA, VVD and D66 are, after all, dropping rapidly), digital forms of participation could very well offer new opportunities, although they should not be regarded as a panacea. The CDA, for example, has taken advantage of this possibility by inviting visitors to contribute to the drafting of the party manifesto for the 2002 national elections. Thus there is every reason to document or, if desired, archive the digital presentations of the parties, just like their printed publications. This will certainly be beneficial to researchers of various disciplines (historians, sociologists, political scientists, communication scientists) and to journalists.

The Archipol project on archiving party websites

Since parties' websites threaten to vanish altogether sooner or later, the Documentatiecentrum Nederlandse Politieke Partijen (Documentation Centre Political Parties), or DNPP, together with the University Library of the University of Groningen, started making preparations for their preservation. This Archipol project is jointly funded by the University of Groningen and the steering committee Innovatie Wetenschappelijke Informatievoorziening (Innovation of Scientific Information Provision). First of all the project is aimed at the websites of the political parties represented in the Dutch Parliament and those of their subsidiary organisations, especially youth organisations, but also at parties which are not represented in Parliament. Research into the latter category is interesting because it is sometimes claimed that, as a result of the low costs involved in a website, the differences between established parties and newcomers are diminishing. At a later stage sites of provincial parties will be considered, as well as those of national politicians. Incidentally, in the latter category few sites can be found as yet. It is striking, on the other hand, that several members of the European Parliament, who work comparatively far away from home, benefit from their own website.

Below some technical and legal aspects of the project will be discussed, but first a general survey will be given of similar initiatives elsewhere.

International initiatives to archive the Web

In the mid-1990s, people in various places started to think about downloading and storing websites. The most ambitious project was developed by the American computer programmer Brewster Kahle. He set up the San Francisco-based Internet Archive which, as its name suggests, has been busy archiving the Internet since the summer of 1996, from newsgroups to homepages. In order to achieve this, "Web crawling robots" are used: programs which find sites by following the external links of other sites and subsequently download them in their entirety. In this way a random picture of the Internet is provided. By September 2000 a billion unspecified Web pages had been gathered, as well as around 16 million news items (mainly HTML files, incidentally).
Around the same time in Australia a modest first effort to store websites was made as well. This so-called Pandora project (Preserving and Accessing Networked Documentary Resources in Australia) was set up by the National Library. Within the scope of Pandora, Australian online publications considered important are being archived, including, among other things, a few websites of Australian political parties.
In 1997 the Royal Library of Sweden started the Kulturarw³ Project. The purpose of this project is to archive the Swedish part of the Internet, that is, all URLs with the extension .se. Several snapshots (random pictures) have since been produced, involving the storage of around 56,000 websites in total. The digital archive has not yet been made accessible to the general public. Meanwhile the Swedish project has been copied by the National Library in Finland.

In the Netherlands two organisations are involved in archiving parts of the Internet. The Koninklijke Bibliotheek (Royal Library) is in the process of setting up a Depot van Nederlandse Elektronische Publicaties (DNEP) (Repository of Dutch Electronic Publications). In addition to offline digital publications such as CD-ROMs, in the future online publications such as electronic journals, books and articles will also be included in this archive. Meanwhile the Nederlands Uitgeversverbond (Dutch Association of Publishers) approved an arrangement which allows the latter publications to be stored in the DNEP. In addition, specific Web documents can be stored in the Repository. Complete sites are not part of the collection, nor archives of discussion lists.
The Internationaal Instituut voor Sociale Geschiedenis (International Institute of Social History) (IISG) is particularly interested in these archives. Action groups and social movements discovered the Internet early on as a cheap and effective instrument to disseminate their views and mobilise support. Just as the IISG used to collect pamphlets and brochures of these groups, it now collects their messages in newsgroups. In 1995 a start was made on building an archive by the name of Occasio, which entailed the storage of nearly one thousand Internet newsgroups from the Association for Progressive Communications. The archive, which contains a lot of information about the civil wars in former Yugoslavia and the revolt against Suharto in Indonesia, comprised around one million messages by the beginning of 2000.

Although the problem has received more attention recently, it has also become clear that the building of website archives is still in its infancy. The projects mentioned above are in fact only in their early stages. Moreover, except for the Swedish project, none of them is aimed in particular at archiving websites. The Swedish project has a number of drawbacks, however. First of all, the method used is fairly rough and only involves storing as many sites as possible once or twice a year. This means that in the intervals a lot of information on important sites will still be lost. Furthermore, the archive has not been made accessible with respect to content. The same can be said about the sites stored by the Internet Archive.

Possibilities and problems related to building a Web archive

In contrast to the American and Swedish initiatives, the DNPP archiving project is aimed at a specific, limited category of websites. Below a brief description will be given of a few aspects of the project.
The archiving project is still in its initial stages. Its basic principle is that the sites are downloaded, saved and stored, catalogued and made accessible using a computerised and labour-saving method. On the basis of party sites already stored, sites are currently being examined for the extent to which they have changed (measured as a percentage of their total size). Depending on the results of this investigation, an archiving standard will be determined. Two "ideal" options are conceivable here: frequent integral archiving of websites versus continuous archiving of all alterations of a site once it has been downloaded. In the first option a site will be downloaded in its entirety at fixed intervals. In the second one all alterations to a downloaded site will be copied and saved to a logfile continuously. Of course all kinds of solutions in between those options are possible.
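
By way of illustration only, the change measurement and the choice between the two options could be sketched as follows. This is a minimal Python sketch under stated assumptions, not the method used in the Archipol project: a stored site is represented as a mapping from file path to file content, files are compared by a content fingerprint, and the function names (digest, diff_snapshots, changed_percentage, choose_strategy) and the 50 per cent threshold are all hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of one file's content."""
    return hashlib.sha256(data).hexdigest()


def diff_snapshots(old, new):
    """Compare two snapshots (dicts mapping file path -> bytes).

    Returns the paths that were added, modified or removed.
    """
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(
        p for p in new if p in old and digest(old[p]) != digest(new[p])
    )
    return {"added": added, "modified": modified, "removed": removed}


def changed_percentage(old, new):
    """Size of added or modified files as a percentage of the new snapshot.

    Assumption: removed files are reported by diff_snapshots but do not
    count towards the percentage, since they have no size in the new copy.
    """
    total = sum(len(data) for data in new.values())
    if total == 0:
        return 0.0
    diff = diff_snapshots(old, new)
    changed = sum(len(new[p]) for p in diff["added"] + diff["modified"])
    return 100.0 * changed / total


def choose_strategy(percentage, threshold=50.0):
    """Hypothetical rule: a heavily changed site is downloaded again in its
    entirety ("integral"); otherwise only the alterations are logged
    ("incremental")."""
    return "integral" if percentage >= threshold else "incremental"


if __name__ == "__main__":
    earlier = {"index.html": b"a" * 60, "news.html": b"b" * 40}
    later = {"index.html": b"a" * 60, "news.html": b"c" * 40}
    pct = changed_percentage(earlier, later)
    print(diff_snapshots(earlier, later), pct, choose_strategy(pct))
```

A fixed threshold is only one of the conceivable solutions in between the two "ideal" options; the list of altered files returned by diff_snapshots is the kind of information that would be written to the logfile in the second option.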

In the future, the short lifespan of hardware and software will lead to problems with regard to storage, maintenance and accessibility of the sites that are part of the archive. Even when hardware, programs and storage formats for text, audio and video have become obsolete, it should still be possible to consult the sites stored in the archive. This means that digital archives have to be transferred and converted to a new generation of information carriers and software and hardware systems. Whether this can be done without (slightly) damaging the authenticity and integrity of digital documents remains to be seen, bearing in mind the current state of technology.
Apart from that, it is by no means certain that the websites and their various versions stored in the digital archive can be made accessible for consultation and research through the Web. Archiving digital files by definition means copying. This will automatically lead to the problem of copyright. On being asked, several political parties expressed their willingness to cooperate in this archiving project. Important though their permission is, it must be checked whether there are others who also have to give their permission. A website is a collection of digital files, consisting of text, audio and visual materials, involving several owners of copyright. As a result it may not become standard practice to make copies stored in the archive available online.

Conclusion

In 1996 the report Preserving Digital Information, compiled by the Commission on Preservation and Access and the Research Libraries Group, appeared in the United States. In this report the authors advocated the creation of a decentralised network of digital archives whose task is to collect and store digital objects such as websites and make them accessible. The importance of a decentralised organisation was emphasised, for "a distributed structure … places archival responsibility with those who presumably care most about and have the greatest understanding of the value of particular digital information objects".

The Archipol project fits in well with this view. Considering its mission and expertise, the DNPP is the obvious organisation to take charge of the storage of websites of Dutch political parties. Although the DNPP can only play a minor part in recording and storing the Web for future reference, it will make a valuable contribution to preserving virtual political culture in the Netherlands.

 


Any comments on this article and the project described are welcome and may be addressed to info@archipol.nl

A Dutch version of this article appeared in De Nieuwste Tijd, 2000, no. 15, 125-131. A slightly adapted version was published in Informatie Professional. Vakblad voor informatiewerkers, 5 (2001), 3 (March), 16-19.
This latter version can also be read at this website: Digitale incunabelen, over het belang van het archiveren van websites van Nederlandse politieke partijen.
