Category Archives: Archives

Preserving the digital

From physical to digital to…?

At Bristol Culture we aim to collect, preserve and create access to our
collections for use by present and future generations. We are increasingly dealing with digital assets amongst these collections – from photographs of our objects, to scans of the historical and unique maps and plans of Bristol, to born-digital creations such as 3D scans of our Pliosaurus fossil. We are also collecting new digital creations in the form of video artwork.

Photo credit Neil McCoubrey

One day we won’t be able to open these books because they are too fragile – digital will be the only way we can access this unique record of Bristol’s history, so digital helps us preserve the physical and provides access. Inside are original plans of Bristols most historic and well-known buildings including the Bristol Hippodrome, which require careful unfolding and digital stitching to reproduce the image of the full drawing inside.

Plans of the Hippodrome, 1912. © Bristol Culture

With new technology comes new opportunities to explore our specimens and this often means having to work with new file types and new applications to view them.  

This 3D scan of our Pliosaurus jaw allows us to gain new insights into the behavior and biology of this long-extinct marine reptile.

Horizon © Thompson & CraigheadThis digital collage by Thompson & Craghead features streaming images from webcams in the 25 time zones of the world. The work comes with a Mac mini and a USB drive in an archive box and can be projected or shown on a 42″ monitor. Bristol Museum is developing its artist film and video collection and now holds 22 videos by artists including Mariele Neudecker, Wood and Harrison, Ben Rivers, Walid Raad and Emily Jacir ranging from documentary to structural film, performance, web-based film and video and animation, in digital, video and analogue film formats, and accompanying installations.

What could go wrong?

So digital assets are helping us conserve our archives, explore our collections and experience new forms of art, but how do we look after those assets for future generations?

It might seem like we don’t need to worry about that now but as time goes by there is constant technological change; hardware becomes un-usable or non-existent, software changes and the very 1s and 0s that make up our digital assets can be prone to deteriorating by a process known as bitrot!.  Additionally, just as is the case for physical artifacts, the information we know about them including provenance and rights can become dissociated.  What’s more, the digital assets can and must multiply, move and adapt to new situations, new storage facilities and new methods of presentation. Digital preservation is the combination of procedures, technology and policy that we can use to help us prevent these risks from rendering our digital repository obsolete. We are currently in the process of upskilling staff and reviewing how we do things so that we can be sure our digital assets are safe and accessible.

Achieving standards

It is clear we need to develop and improve our strategy for dealing with these potential problems, and that this strategy should underline all digital activity where the result of that activity produces output which we wish to preserve and keep.  To rectify this, staff at the Bristol Archives, alongside Team Digital and Collections got together to write a digital preservation policy and roadmap to ensure that preserved digital content can be located, rendered (opened) and trusted well into the future.

Our approach to digital preservation is informed by guidance from national organisations and professional bodies including The National Archives, the Archives & Records Association, the Museums Association, the Collections Trust, the Digital Preservation Coalition, the Government Digital Service and the British Library. We will aim to conform to the Open Archival Information System (OAIS) reference model for digital preservation (ISO 14721:2012). We will also measure progress against the National Digital Stewardship Alliance (NSDA) levels of digital preservation.

A safe digital repository

We use EMu for our digital asset management and collections management systems. Any multimedia uploaded to EMu is automatically given a checksum, and this is stored in the database record for that asset. What this means is that if for any reason that file should change or deteriorate (which is unlikely, but the whole point of digital preservation is to have a mechanism to detect if this should happen) the new checksum won’t match the old one and so we can identify a changed file.

Due to the size of the repository, which is currently approaching 10Tb, it would not be practical to this manually, and so we use a scheduled script to pass through each record and generate a new checksum to compare with the original. The trick here is to make sure that the whole repo gets scanned in time for the next backup period because otherwise, any missing or degraded files would become the backup and therefore obscure the original. We also need a working relationship with our IT providers and an agreed procedure to rescue any lost files if this happens.

With all this in place, we know that what goes in can come back out in the same state -so far so good. But what we cant control is the constant change in technology for rendering files – how do we know that the files we are archiving now will be readable in the future? The answer is that we don’t unless we can migrate from out of date file types to new ones. A quick analysis of all records tagged as ‘video’ shows the following diversity of file types:

(See the stats for images and audio here).  The majority are mpeg or avi, but there is a tail end of various files which may be less common and we’ll need to consider if these should remain in this format or if we need to arrange for them to be converted to a new video format.

Our plan is to make gradual improvements in our documentation and systems in line with the NDSA to achieve level 2 by 2022:

 

The following dashboard gives an idea of where we are currently in terms of file types and the rate of growth:

Herding digital sheep

Its all very well having digital preservation systems in place, but the staff culture and working practices must also change and integrate with them.

The digitisation process can involve lots of stages and create many files

In theory, all digital assets should line up and enter the digital repository in an orderly and systematic manner. However, we all know that in practice things aren’t so straightforward.

Staff involved in digitisation and quality control need the freedom to be able to work with files in the applications and hardware they are used to without being hindered by rules and convoluted ingestion processes. They should to be allowed to work in a messy (to outsiders) environment, at least until the assets are finalised. Also there are many other environmental factors that affect working practices including rights issues, time pressures from exhibition development, and skills and tools available to get the job done. By layering new limitations based on digital preservation we are at risk of designing a system that wont be adopted, as illustrated in the following tweet by @steube:

So we’ll need to think carefully about how we implement any new procedures that may increase the workload of staff. Ideally, we’ll be able to reduce the time staff take in moving files around by using designated folders for multimedia ingestion – these would be visible to the digital repository and act as “dropbox” areas which automatically get scanned and any files automatically uploaded an then deleted. For this process to work, we’ll need to name files carefully so that once uploaded they can be digitally associated with the corresponding catalogue records that are created as part of any inventory project. Having a 24 hour ingestion routine would solve many of the complaints we hear from staff about waiting for files to upload to the system.

 

Automation can help but will need a human element to clean up and anomalies

 

Digital services

Providing user-friendly, online services is a principle we strive for at Bristol Culture – and access to our digital repository for researchers, commercial companies and the public is something we need to address.

We want to be able to recreate the experience of browsing an old photo album using gallery technology. This interactive uses the Turn JS open source software to simulate page turning on a touchscreen featuring in Empire Through the Lens at Bristol Museum.

Visitors to the search room at Bristol Archives have access to the online catalogue as well as knowledgeable staff to help them access the digital material. This system relies on having structured data in the catalogue and scripts which can extract the data and multiemdia and package them up for the page turning application.

But we receive enquiries and requests from people all over the world, in some cases from different time zones which makes communication difficult. We are planning to improve the online catalogue to allow better access to the digital repository, and to link this up to systems for requesting digital replicas. There are so many potential uses and users of the material that we’ll need to undertake user research into how we should best make it available and in what form.

 

Getting an archival tree-view to sort properly online

The digital team at Bristol Culture face new challenges every day, and with diverse collections come a diverse range of problems when it comes to publishing online. One particularly taxing issue we encountered recently was how to represent and navigate through an archives collection appropriately on the web.

Here’s what Jayne Pucknell, an archivist at the Bristol Record Office, has to say:

“To an archivist, individual items such as photographs are important but it is critical that we are able to see them within their context. When we catalogue a collection, we try to group records into series to reflect their provenance, and the original order in which they were created. These series or groups are displayed as a hierarchical ‘tree view’ which shows that arrangement.”

So far so good – we needed to display this tree-view online, and it just so happens there is a useful open source jquery plugin to help us achieve that, called jsTree.

Capture

The problem we found when we implemented this online, was that the tree view did not display the archive records in the correct order. The default sort was the order in which the records had been created, and although we were able to apply a sort to the records in our source database (EMu), we were unable to find a satisfactory sorting method that returned a numerical sort for the records based on their archival reference number. This is because the archival reference number is made up from a series of sub-numbers reflecting sub collections.

So this gave us a challenge to fix, and the opportunity to fix it was possible because of the EMu API and programming  in between the source database and collections online.  The trick was to write a php function that could reorder the archive tree before it was displayed.

Well, we did that and here’s a breakdown of what that function does:

The function takes 2 arguments – the archival number as a text string, and the level in the archive as an integer.

1.) split the reference number into an its subnumbers
2.) construct a new array from the subnumbers
3.) perform a special sort on the new array that takes into account each subnumber in turn

in theory that’s it – but looking at the code in hindsight there are a whole heap of complexities that would take longer to articulate here than just to past in the code, so lets make it open source and leave you to delve if you wish – here’s the code on Github

Another subtle complexity in this work is described further by Jayne:

“You may search and find an individual photograph and its catalogue entry will explain the specific content of that image, but to understand its wider context it is helpful to be able to consider the collection as a whole. Or you may search and find one photograph of interest but then want to explore other items which came in with that photograph. By displaying the hierarchy, you are more easily able to navigate your way through the whole collection.”

Because of the way our collections online record pages are built – a record does not immediately contain links to all its parents or children. This is problematic when building the archives tree as ideally we wish each node to link to the parent or child depicted. We therefore needed a way to get the link for each related record whilst constructing the tree. Luckily we maintain the tree structure in EMu via the parent field.

The solution was to query the parent field and get the children of that parent, then loop through each child record and add a node to the tree. This process could be repeated up the parents until a record with no parents was reached and this would then become the root node. Because the html markup was the same for each node, this process could be written as a set of functions:

1.) has_parent: take a record number and perfom a  search to see if it has a parent, if it does return the parent id.

2.) return_children: take a record number, search for its child records and return them as an array

2.) child_html: take an array of child records and construct the links for each in html

Taking advice from Jonathan Ainsworth from the University of Leeds Special Collections, who went through similar issues when building their online pages, we decided not to perform this recursively due to the chance of entering an infinite loop or incurring too much processing time. Instead I decided to call the functions for a set number of levels in the tree – this works as we did not expect more than seven levels. The thing to point out is that when you land on a particular record, the hierarchical level could be anything, but the programmed function to build the tree remains the same.

Here’s the result – using some css and the customisable features in jsTree we can indicate which is the selected record by highlighting. We also had to play around with the jsTree settings to enable the selected record to appear, by expanding each of its parent nodes in turn – to be honest it all got a bit loopy!

Capture

….here’s the link to this record on our Collections Online.

Hope this is of use to anyone going through similar issues – on the face of it the problem is a simple one, but as we are coming to learn in team digital – nothing is really ever just simple.