Digital dust?

I recently listened to a two-part BBC radio documentary which was, in turn, inspiring, frightening, and anger-provoking. Before considering the documentary let's start with a bit of judicious hyperbole … at least I hope it's hyperbole. Question: Where can I find the history of the 20th century? Answer: What history of the 20th century? In the UK, way back in the early 1980s, you could feel at the cutting edge of the computing revolution if you were the proud owner of a BBC Model B computer. Our North American cousins should substitute Tandy TRS-80, Commodore PET, or whatever was their favourite electronic box at the time :)

Owners of the BBC Model B would have stored their data either on audio cassette tape (which they would later painfully try to reload back into the computer) or, if they were really cutting edge, they might have owned an external five-and-a-quarter-inch floppy disk drive.

Now fast forward to today and suppose you suddenly find you need the data that was stored on that BBC-format cassette tape or floppy disk. Still have access to that BBC computer? Have a computer-compatible tape deck? Is the tape still usable? Got access to a five-and-a-quarter-inch floppy disk drive? Is the disk still usable? Fast forward another 50 years and what, honestly, are the chances of being able to recover any data?

The data stored by your vintage computer may not have been of profound cultural or national significance, but it's lost, and our example problem is actually one facing the custodians of the world's knowledge bases today.

The digital world is giving us incredible storage and search potential right at our fingertips, or on our computer desktops, but it's an ephemeral world which, according to Jeff Rothenberg, can easily be built on technological quicksand. Note that Rothenberg's paper was written in 1998, before many of the digital options that exist today. Today's leading-edge delivery technology is, oh so quickly, tomorrow's junk but, as we are now finding out, even junk matters.

A book written in the 16th century can still be read today, whereas a floppy disk full of data from the mid-1980s requires a major investment in data recovery. So it's reassuring to think that the world's libraries at least are 'on the case'.

But the libraries' track record isn't actually so good.

Indeed, according to Nicholson Baker's 2002 book Double Fold: Libraries and the Assault on Paper, libraries contributed significantly to the problem. Baker described how libraries, facing major storage problems, became enraptured by the archive potential of new technologies like microfilm and embarked on a “slum clearance” process which ran from the pre-war era until the early 1990s. Major libraries like the US Library of Congress, according to Baker, led the way in the destruction or 'sell-off' of original prints. Why? They assumed that the new technology (microfilm) was new, shiny and better for researchers, a debatable point because a reel of film is sequential and therefore discourages browsing. Baker asserts that a second wave of destruction has now begun with the digitisation of books; he argues passionately for print due to its inherent durability and longevity.

Baker was probably a little too hard on modern libraries but, nevertheless, he made a valuable contribution to raising our awareness of a very serious issue. However, I'm not convinced about the longevity of modern print media, which may use poor-quality paper and ink that can turn information to dust. Richard J Cox's Vandals in the Stacks? A Response to Nicholson Baker's Assault on Libraries takes issue with Baker's key assertions and his methodology.

New technology brings progress and advantages on the one hand but creates new problems on the other. Think, for instance, of the paperback book you want to lend to a friend. At the moment you hand over the artefact and your friend duly reads said book at their convenience.

Now think of an ebook equivalent in 10 years' time. What do you mean you want to share a 'book' with a friend?

Digital rights backed by legislation will do their best to make such sharing difficult. Why? Because of course the ebook text is only being licensed to you and no one else, and digital rights management systems aim to ensure that remains the case. The technology is not the problem here; arguably, the problem is the attempt to establish the conditions for further economic exploitation of the information the technology encapsulates.

Despite technological advances, the ebook equivalents of the iPod (irony deliberate) are unlikely to take off until publishers finally learn that the ebook has to have advantages (to the users) greater than a paper equivalent. Restricting use and compromising data persistence may temporarily help publishers sleep easily in their beds at night, but to the detriment of users. In a recent Auricle article, Come in book number 3! … your time is up, I described how Sony's Librié could have been an object of desire. Who wants a 'book', however, which sits there like a timebomb waiting to self-detonate when its allotted time is up? Ok … more hyperbole, you just won't be able to read the 'book' any more when its allotted time is up.

But what of the Web?

If anything that's even more ephemeral.

All users of the Web will have experienced that moment when they find a link is dead: a site that was available only last week is no longer available and its information has disappeared, perhaps forever, into the ether. On the one hand the Internet can provide access to a previously unimaginable quantity of data and information. The transient nature of much of that information (or the sites on which it is hosted) can, however, create intense feelings of insecurity for those whose occupation or interests rely on the persistence of such resources. It's not unusual now for the Web to become the primary source of reference and, unlike a book or periodical, there's no ISBN, no ISSN, or back catalogue to refer to. Or is there?

First, consider Lots of Copies Keep Stuff Safe (LOCKSS), a collaboration involving Stanford University, the National Science Foundation and Sun Microsystems. LOCKSS maintains an infinite cache (one which is never flushed) of electronic journal articles, with the cache being replicated by participating libraries. LOCKSS also has a self-repair mechanism which maintains the integrity of the data at each site, so that digital content is preserved and, if one site fails, the others can continue to provide access and repair the compromised site. For more information on LOCKSS visit http://lockss.stanford.edu. The article Preserving today's scientific record for tomorrow on the British Medical Journal web site also provides a good overview, and it is from there that I take this quote:

“For librarians whose mission is to transmit today's intellectual, cultural, and historical output to the future, it's fast becoming a nightmare.”

The LOCKSS initiative is also interesting because it recognizes that only by having many copies of a work in circulation is there a hope of preserving the data and information within. Common sense you may think, but compare this to the extreme measures taken to preserve 'rare' knowledge artefacts, e.g. restricted access, environmental control of light, temperature, and humidity. Arguably, the 'lots of copies' approach is the only one which makes sense in the digital world, but of course the challenge is in deciding who has the right to make, archive and disseminate such copies. One view could see data, information and knowledge as belonging to all and dissemination as the right of all, whereas another view perceives data, information and knowledge as commercially valuable and would seek to restrict copying and dissemination.
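
The 'lots of copies' idea, and the self-repair it makes possible, can be illustrated with a toy sketch. This is not the LOCKSS protocol itself, which is considerably more sophisticated; it is a minimal illustration which assumes each library holds a copy of the same article and that a simple majority of the copies are intact. The function names, library names and sample text are all hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 fingerprint of one library's copy."""
    return hashlib.sha256(content).hexdigest()


def repair(copies: dict[str, bytes]) -> list[str]:
    """Overwrite any copy that disagrees with the majority; return the repaired sites.

    Assumption (not taken from LOCKSS): a simple majority vote decides which
    version is correct. Without a strict majority nothing is changed.
    """
    votes = Counter(digest(c) for c in copies.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        return []  # no strict majority, so do not guess
    good = next(c for c in copies.values() if digest(c) == winner)
    repaired = [site for site, c in copies.items() if digest(c) != winner]
    for site in repaired:
        copies[site] = good
    return repaired


# Hypothetical libraries, one of which holds a damaged copy.
copies = {
    "library_a": b"Preserving the scientific record",
    "library_b": b"Preserving the scientific record",
    "library_c": b"Preserving the sc\x00entific record",
}
repaired_sites = repair(copies)
print(repaired_sites)
```

Note that the sketch only works because there are several copies to compare: with a single copy, or with two copies that disagree, there is no way to tell which version is the damaged one.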

Second, consider the Internet Archive which, to quote:

“… is working to prevent the Internet — a new medium with major historical significance — and other 'born-digital' materials from disappearing into the past. Collaborating with institutions including the Library of Congress and the Smithsonian, we are working to preserve a record for generations to come … with the purpose of offering permanent access for researchers, historians, and scholars to historical collections that exist in digital format.”

The Internet Archive has created The Wayback Machine, which provides a tool for scholars and researchers to view Web sites as they were, not just how they are now. For example, entering “http://www.bath.ac.uk” into the Wayback Machine will enable comparison of the University of Bath's changing Web site from April 1997 onwards. Now think of a researcher doing the same in, say, 50 years. The Wayback Machine, or any similar initiative, could undoubtedly make some governments, institutions or even individuals nervous, since a permanent publicly accessible record or information archive is sometimes not desired, e.g. ongoing access to weapons building information post September 11, 2001. In reality it's easy for web sites to exclude themselves from the attentions of The Wayback Machine or to have their entries removed. Of course, it's perhaps dangerous to assume that The Wayback Machine, or its equivalent, will itself exist in 50 years' time.

What's the relevance to e-learning? Well, assuming that the juggernaut of digitization is now unstoppable, we have to assume that, in one form or another, the intranet/internet is going to become the primary disseminator of learning artefacts/material/resources … you can even use the 'o' word if you like :) So are these artefacts to be considered ephemeral, disposable, of no historical significance? What about version control and quality assurance? As knowledge marches forward are we to digitally pulp what was once believed to be true but has now been disproved?

And what about those online discussions, real time chats, instant messaging, weblogs (such as Auricle)? What happens when all of these database-driven web sites are declared obsolete, 'off message', or whatever?

And what about distributed learning artefacts/materials/resources where there is no one centralized repository or system and which depend on syndicated 'feeds' via RSS/Atom etc? And what about aggregations of learning resources etc that are formed from the outputs of multiple Web services? Undoubtedly, some will see the only solution as being managed centralized repositories with guaranteed backup. The counter argument will be that it is only via diversity and widespread dissemination that we can optimize the opportunities for artefact survival.

Even should the digital artefacts survive there is no guarantee we, or future generations, will be able to make sense of them. Back to our BBC computer example. Let's say you've copied the data on your five-and-a-quarter-inch disk to a modern USB memory stick. Load it into, say, Windows XP or any Linux distribution and what use can you make of it? Absolutely zilch! You need the original hardware and software which made use of this data. Haven't got a BBC computer? Haven't got the software? … Oh dear!
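
To see why surviving bytes are not the same as surviving information, consider this small sketch. The four bytes are an arbitrary made-up example, not data from a real BBC disk; the point is only that the very same bytes give quite different answers depending on which format the reader assumes they were written in.

```python
import struct

# Four arbitrary bytes standing in for data recovered from an old disk
# (a hypothetical example, not a real BBC file format).
raw = bytes([0x00, 0x00, 0x80, 0x3F])

# The same bytes read under three different assumptions about the format.
as_little_endian_int = struct.unpack("<I", raw)[0]
as_big_endian_int = struct.unpack(">I", raw)[0]
as_little_endian_float = struct.unpack("<f", raw)[0]

print(as_little_endian_int)    # 1065353216
print(as_big_endian_int)       # 32831
print(as_little_endian_float)  # 1.0
```

Without the software, or at least a description of the format, there is nothing in the bytes themselves to say which of these readings, if any, is the one that was intended.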

Just such a quandary faced what was once a BBC flagship project called the Domesday Project. In the UK in 1986 the Domesday Project was a relatively big event. It mobilized communities and nearly a million people across the UK to gather local data, all of course archived in that leading-edge technology of the time, a pair of interactive video discs. A couple of years ago the BBC found it could no longer access these discs and so GBP 2.5 million and a whole lot of community effort was about to go to waste. For an account of some of the heroic efforts needed to save it have a look at the wonderfully named CAMiLEON site. The Domesday interactive video discs weren't digital artefacts, but the issues are still highly relevant to this Auricle article. As well as what the CAMiLEON site has to say about the importance of emulation to preservation, Stewart Granger's D-Lib Magazine article (October 2000) Emulation as a Digital Preservation Strategy is still worth a read. Stewart was project co-ordinator of the CAMiLEON Project.

So let's finish off where I started, with those two excellent BBC radio programmes called Losing the Past. The two programmes pose a possible future where the 20th century could be a new dark age in which future generations won't have access to the knowledge artefacts which would enable them to make sense of what we did.

To quote from the BBC Losing the Past site:

“In our headlong rush to go digital much of our past is becoming just meaningless code of 0s and 1s. A substantial amount of material stored on computers, magnetic tape and even CDs is no longer accessible due to rapid deterioration and obsolescence. The average life of a tape is fifteen years, a CD twenty, computer systems and software far less.”

There's also some really frightening stuff about how governments as custodians of, for instance, irreplaceable census data haven't been doing a very good job. The programme also raises concerns about the care (or lack of it) which digital data with military/political embarrassment potential may or may not be getting. It becomes possible, therefore, for the past to be erased and thus history effectively changed. Those lost emails suddenly take on a new significance. And what about that Freedom of Information Act, recently implemented in the UK? It so easily could become the freedom of information we still have available or are not prepared to give you because we will declare business imperatives or confidentiality reasons. And what about the stuff which will now be passed orally with no written audit trail?

We are entering an era in which the issues related to freedom of information, preservation of information and access to information will no longer be the province of just a few special interests or specialists. The growth of the Internet as a medium for communication and information dissemination brings the debate out into the open, as one which will affect us all.

Further reading:
Electronic Trail Goes Cold, Mark Tran, The Guardian, March 7, 2002
