
Preserving the digital

From physical to digital to…?

At Bristol Culture we aim to collect, preserve and create access to our
collections for use by present and future generations. We are increasingly dealing with digital assets amongst these collections – from photographs of our objects, to scans of the historical and unique maps and plans of Bristol, to born-digital creations such as 3D scans of our Pliosaurus fossil. We are also collecting new digital creations in the form of video artwork.

Photo credit Neil McCoubrey

One day we won’t be able to open these books because they are too fragile – digital will be the only way we can access this unique record of Bristol’s history, so digital helps us preserve the physical as well as providing access. Inside are original plans of Bristol’s most historic and well-known buildings, including the Bristol Hippodrome, which require careful unfolding and digital stitching to reproduce the image of the full drawing inside.

Plans of the Hippodrome, 1912. © Bristol Culture

With new technology come new opportunities to explore our specimens, and this often means having to work with new file types and new applications to view them.

This 3D scan of our Pliosaurus jaw allows us to gain new insights into the behaviour and biology of this long-extinct marine reptile.

Horizon © Thompson & Craighead

This digital collage by Thompson & Craighead features streaming images from webcams in the 25 time zones of the world. The work comes with a Mac mini and a USB drive in an archive box, and can be projected or shown on a 42″ monitor. Bristol Museum is developing its artist film and video collection and now holds 22 videos by artists including Mariele Neudecker, Wood and Harrison, Ben Rivers, Walid Raad and Emily Jacir, ranging from documentary to structural film, performance, web-based film and video, and animation, in digital, video and analogue film formats, with accompanying installations.

What could go wrong?

So digital assets are helping us conserve our archives, explore our collections and experience new forms of art, but how do we look after those assets for future generations?

It might seem like we don’t need to worry about that now, but time brings constant technological change: hardware becomes unusable or unobtainable, software changes, and the very 1s and 0s that make up our digital assets can deteriorate through a process known as bit rot. Additionally, just as with physical artefacts, the information we hold about them, including provenance and rights, can become dissociated. What’s more, digital assets can and must multiply, move and adapt to new situations, new storage facilities and new methods of presentation. Digital preservation is the combination of procedures, technology and policy that helps us prevent these risks from rendering our digital repository obsolete. We are currently upskilling staff and reviewing how we do things so that we can be sure our digital assets are safe and accessible.

Achieving standards

It is clear we need to develop and improve our strategy for dealing with these potential problems, and that this strategy should underpin all digital activity whose output we wish to preserve and keep. To address this, staff at Bristol Archives, alongside Team Digital and Collections, got together to write a digital preservation policy and roadmap to ensure that preserved digital content can be located, rendered (opened) and trusted well into the future.

Our approach to digital preservation is informed by guidance from national organisations and professional bodies including The National Archives, the Archives & Records Association, the Museums Association, the Collections Trust, the Digital Preservation Coalition, the Government Digital Service and the British Library. We will aim to conform to the Open Archival Information System (OAIS) reference model for digital preservation (ISO 14721:2012). We will also measure progress against the National Digital Stewardship Alliance (NDSA) levels of digital preservation.

A safe digital repository

We use EMu for our digital asset management and collections management systems. Any multimedia uploaded to EMu is automatically given a checksum, which is stored in the database record for that asset. This means that if the file should ever change or deteriorate (unlikely, but the whole point of digital preservation is having a mechanism to detect it if it happens), a freshly computed checksum won’t match the stored one, so we can identify the changed file.
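
EMu handles this internally, but as a rough sketch of the idea, computing a fixity checksum for an asset might look like the following (the algorithm choice and the way the value is stored against a record are our illustrative assumptions, not EMu’s implementation):

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a checksum by streaming the file in chunks, so large
    multimedia assets don't need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# On ingest, the checksum would be stored alongside the asset record,
# e.g. record["checksum"] = file_checksum(record["media_path"])
```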

Due to the size of the repository, currently approaching 10TB, it would not be practical to do this manually, so we use a scheduled script to pass through each record and generate a new checksum to compare with the original. The trick is to make sure the whole repository gets scanned before the next backup period, because otherwise any missing or degraded files would be backed up over the good copies and obscure the originals. We also need a working relationship with our IT providers and an agreed procedure to rescue any lost files if this happens.
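
The logic of the scheduled sweep is roughly the following sketch (reusing file_checksum from above; the record shape and the report are illustrative assumptions, not our production script):

```python
def verify_repository(records):
    """Recompute each asset's checksum and flag any mismatches.
    `records` is an iterable of dicts holding a 'media_path' and the
    checksum stored at ingest; mismatches are collected for follow-up
    with our IT providers before the next backup cycle."""
    failures = []
    for record in records:
        current = file_checksum(record["media_path"])
        if current != record["checksum"]:
            failures.append(record["media_path"])
    return failures
```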

With all this in place, we know that what goes in can come back out in the same state – so far so good. But what we can’t control is the constant change in technology for rendering files: how do we know that the files we are archiving now will be readable in the future? The answer is that we don’t, unless we can migrate from out-of-date file types to new ones. A quick analysis of all records tagged as ‘video’ shows the following diversity of file types:

(See the stats for images and audio here.) The majority are MPEG or AVI, but there is a tail end of rarer file types, and we’ll need to consider whether these should remain in their current formats or be converted to a newer video format.
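
The breakdown itself is straightforward to produce; a minimal sketch of how one might tally extensions for records tagged as video (the input list of paths is an assumption about how the data comes out of the catalogue):

```python
from collections import Counter
from pathlib import Path

def extension_counts(media_paths):
    """Tally file extensions, e.g. to spot the long tail of rare
    video formats that may need migration."""
    return Counter(Path(p).suffix.lower() for p in media_paths)

# Hypothetical output: Counter({'.mpg': 120, '.avi': 85, '.flv': 3})
```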

Our plan is to make gradual improvements in our documentation and systems in line with the NDSA to achieve level 2 by 2022:

 

The following dashboard gives an idea of where we are currently in terms of file types and the rate of growth:

Herding digital sheep

It’s all very well having digital preservation systems in place, but staff culture and working practices must also change and integrate with them.

The digitisation process can involve lots of stages and create many files

In theory, all digital assets should line up and enter the digital repository in an orderly and systematic manner. However, we all know that in practice things aren’t so straightforward.

Staff involved in digitisation and quality control need the freedom to work with files in the applications and hardware they are used to, without being hindered by rules and convoluted ingestion processes. They should be allowed to work in a messy (to outsiders) environment, at least until the assets are finalised. There are also many other environmental factors that affect working practices, including rights issues, time pressures from exhibition development, and the skills and tools available to get the job done. By layering on new limitations in the name of digital preservation, we risk designing a system that won’t be adopted, as illustrated in the following tweet by @steube:

So we’ll need to think carefully about how we implement any new procedures that may increase the workload of staff. Ideally, we’ll reduce the time staff spend moving files around by using designated folders for multimedia ingestion – these would be visible to the digital repository and act as “dropbox” areas which are scanned automatically, with any files uploaded and then deleted. For this process to work, we’ll need to name files carefully so that once uploaded they can be digitally associated with the corresponding catalogue records created as part of any inventory project. Having a 24-hour ingestion routine would solve many of the complaints we hear from staff about waiting for files to upload to the system.
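
A minimal sketch of the kind of “dropbox” ingestion we have in mind, assuming filenames carry the catalogue object number as a prefix (e.g. M1234_front.tif); the folder path, naming pattern and upload call are all placeholders, not a built system:

```python
import re
from pathlib import Path

DROPBOX = Path("/mnt/ingest")  # designated folder visible to the repository
PATTERN = re.compile(r"^([A-Z]+\d+)_")  # object number prefix, e.g. M1234_

def scan_and_ingest(upload):
    """Upload any files whose names match a catalogue object number,
    then delete them so the folder stays empty between scheduled runs."""
    for path in DROPBOX.iterdir():
        if not path.is_file():
            continue
        match = PATTERN.match(path.name)
        if not match:
            continue  # leave oddly named files for a human to check
        upload(object_number=match.group(1), media_path=path)
        path.unlink()
```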

 

Automation can help but will need a human element to clean up any anomalies

 

Digital services

Providing user-friendly, online services is a principle we strive for at Bristol Culture – and access to our digital repository for researchers, commercial companies and the public is something we need to address.

We want to be able to recreate the experience of browsing an old photo album using gallery technology. This interactive, featured in Empire Through the Lens at Bristol Museum, uses the open source Turn JS software to simulate page turning on a touchscreen.

Visitors to the search room at Bristol Archives have access to the online catalogue as well as knowledgeable staff to help them access the digital material. This system relies on having structured data in the catalogue, and scripts which can extract the data and multimedia and package them up for the page-turning application.
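
The packaging step is conceptually simple; a sketch, assuming the catalogue export gives us an ordered list of page scans per album (the manifest format here is our own illustration, not part of Turn JS):

```python
import json
import shutil
from pathlib import Path

def package_album(title, page_images, out_dir):
    """Copy page scans into a folder and write a manifest that the
    page-turning front end can read in page order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pages = []
    for i, src in enumerate(page_images, start=1):
        dest = out / f"page_{i:03d}{Path(src).suffix}"
        shutil.copy(src, dest)
        pages.append(dest.name)
    (out / "manifest.json").write_text(
        json.dumps({"title": title, "pages": pages}, indent=2)
    )
```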

But we receive enquiries and requests from people all over the world, in some cases from different time zones, which makes communication difficult. We are planning to improve the online catalogue to allow better access to the digital repository, and to link this up to systems for requesting digital replicas. There are so many potential uses and users of the material that we’ll need to undertake user research into how best to make it available and in what form.

 

Culture KPIs

There are various versions of the common saying that ‘if you don’t measure it, you can’t manage it’ – see Zak Mensah’s (Head of Transformation at Bristol Culture) tweet below. As we’ll explain, we’re doing a good job of collecting a significant amount of Key Performance Indicator data; however, there remain areas of our service that don’t have KPIs and are not being ‘inspected’ (which usually means they’re not being celebrated). This blog is about our recent sprint to improve how we do KPI data collection and reporting.

The most public face of Bristol Culture is the five museums we run (including Bristol Museum & Art Gallery and M Shed), but the service is much more than its museums. Our teams include, among others: the arts and events team (responsible for the annual Harbour Festival as well as the Cultural Investment Programme, which funds over 100 local arts and cultural organisations in Bristol); Bristol Archives; the Modern Records Office; Bristol Film Office; and the Bristol Regional Environmental Recording Centre, which is responsible for wildlife and geological data for the region.

Like most organisations we have KPIs and other performance data that we need to collect every year in order to meet funding requirements, e.g. the ACE NPO Annual Return. We also collect lots of performance data beyond this, but we don’t necessarily have a joined-up picture of how each team is performing and how we are performing as a whole service.

Why KPIs?

The first thing to say is that they’re not a cynical tool to catch out teams for poor performance. The operative word in KPI is ‘indicator’; the data should be a litmus test of overall performance. The second thing is that KPIs should not be viewed in a vacuum. They make sense only in a given context; typically comparing KPIs month by month, quarter by quarter, etc. to track growth or to look for patterns over time such as busy periods.

A great resource we’ve been using for a few years is the Service Manual produced by the Government Digital Service (GDS) https://www.gov.uk/service-manual. They provide really focused advice on performance data. Under the heading ‘what to measure’, the service manual specifies four mandatory metrics to understand how a service is performing:

  • cost per transaction – how much it costs … each time someone completes the task your service provides
  • user satisfaction – what percentage of users are satisfied with their experience of using your service
  • completion rate – what percentage of transactions users successfully complete
  • digital take-up – what percentage of users choose … digital services to complete their task
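
To make the first of these concrete: if, say, an image ordering service cost £5,000 to run over a quarter in which 1,000 orders were completed, its cost per transaction would be £5,000 ÷ 1,000 = £5 (the figures are purely illustrative, not ours).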

Added to this, the service manual advises that:

You must collect data for the 4 mandatory key performance indicators (KPIs), but you’ll also need your own KPIs to fully understand whether your service is working for users and communicate its performance to your organisation.

Up until this week we were collecting the data for the mandatory KPIs, but it was somewhat buried in very large Excel spreadsheets or scattered across different locations. For example, our satisfaction data lives on a SurveyMonkey dashboard. Of course, spreadsheets have their place, but to get more of our colleagues in the service taking an interest in our KPI data we need to present it in a way they can understand more intuitively. Again, not wanting to reinvent the wheel, we turned to the GDS to see what they were doing. The service dashboard they publish online has two headline KPI figures, followed by a list of departments which you can click into to see KPIs at a department level.

Achieving a new KPI dashboard

As a general rule, we prefer to use open source and openly available tools to do our work, which means not being locked into any single product. This also allows us to be more modular in our approach to data, giving us the ability to switch tools or upgrade various elements without affecting the whole system. When it comes to analysing data across platforms, the challenge is how to get the data from the point of capture to the analysis and presentation technology – and when to automate versus when to manipulate data manually. Having spent the last year shifting away from using Excel as a data store and moving our main KPIs to an online database, we now have a system which can integrate with Google Sheets in various ways to extract and aggregate the raw data into meaningful metrics. Here’s a quick summary of the various integrations involved:

Data capture from staff using online forms: Staff across the service are required to log performance data, at their desks, and on the move via tablets over wifi. Our online performance data system provides customised data entry forms for specific figures such as exhibition visits. These forms also capture metadata around the figures such as who logged the figure and any comments about it – this is useful when we come to test and inspect any anomalies. We’ve also overcome the risk of saving raw data in spreadsheets, and the bottleneck often caused when two people need to log data at the same time on the same spreadsheet.

Data capture directly from visitors: A while back we moved to online, self-completed visitor surveys using SurveyMonkey, and these prompt visitors to rate their satisfaction. We wanted the daily percentage of satisfied feedback entries to make its way to our dashboard and to be aggregated (combined across sites, then condensed into a single representative figure). This proved subtly challenging and had the whole team scratching our heads at various points, wondering whether an average of averages actually meant anything, and how it could be filtered by a date range, if at all.
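
The heart of the head-scratching is that a plain average of daily percentages treats a day with 3 responses the same as a day with 300. A sketch of one way to resolve this, weighting each day by its response count (the field names and figures are illustrative):

```python
def overall_satisfaction(daily):
    """Combine daily satisfaction figures into one representative
    percentage, weighting each day by its number of responses rather
    than averaging the daily percentages directly."""
    satisfied = sum(d["responses"] * d["pct_satisfied"] / 100 for d in daily)
    responses = sum(d["responses"] for d in daily)
    return 100 * satisfied / responses if responses else None

days = [
    {"responses": 300, "pct_satisfied": 90},  # busy Saturday
    {"responses": 3, "pct_satisfied": 33},    # quiet Monday
]
# A naive average of averages gives 61.5%; weighting by response
# counts gives roughly 89.4%, which better represents visitors.
print(overall_satisfaction(days))
```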

Google Analytics:  Quietly ticking away in the background of all our websites.

Google Sheets as a place to join and validate data: It is a piece of cake to suck data from Google Sheets into Data Studio, provided it’s in the right format. We needed a few tricks to bring data into Google Sheets in the first place, however, including Zapier, Google Apps Script and Sheets add-ons.

Zapier: gives us the power to integrate visitor satisfaction from SurveyMonkey into Google Sheets.

Google Apps Script: We use this to query the API on our data platform and then perform extra calculations, such as working out conversion rates of exhibition visits vs museum visits. We also really like the record-macro feature, which we can use to automate any calculations after bringing in the data. Technically it is possible to either push or pull data into Google Sheets – we opted for a pull because this gives us control via Google Sheets rather than waiting for a scheduled push from the data server.

Google Sheets formulae: We can join museum visits and exhibition visits in one sheet using the SUMIFS function, and then use this to work out a daily conversion rate. This can then be aggregated in Data Studio to get an overall conversion rate, filtered by date.
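
As a hypothetical layout (not our actual sheet): with dates in column A, a category label in column B and daily counts in column C, a formula such as =SUMIFS(C:C, A:A, $E2, B:B, "exhibition") would total one day’s exhibition visits, and dividing by the equivalent museum-visits SUMIFS gives that day’s conversion rate.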

Sheets Add-Ons: We found a nifty add-on for integrating sheets with Google Analytics. Whilst it’s fairly simple to connect Analytics to Data Studio, we wanted to combine the stats across our various websites, and so we needed a preliminary data ‘munging’ stage first.

Joining the dots…

1.) Zapier pushes the satisfaction score from SurveyMonkey to Sheets.

2.) A Google Sheets add-on pulls Google Analytics data into Sheets, combining figures across many websites in one place.

3.) Online data forms save data directly to a web database (MongoDB) – see the sketch after this list.

4.) The performance platform displays raw and aggregated data to staff using ChartJS.

5.) Google Apps Script pulls in performance data to Google Sheets.

6.) Google Data Studio brings in data from Google Sheets, and provides both aggregation and calculated fields.

7.) The dashboard can be embedded back into other websites including our performance platform via an iframe.

8.) Good old Excel and some VBA programming can harness data from the performance platform.
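
A rough sketch of step 3 under stated assumptions – pymongo, a database and collection named here for illustration, and a flat record shape that is not our actual schema:

```python
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
figures = client["performance"]["figures"]

# What an online form submission might store: the figure itself plus
# metadata (who logged it, when, and any comment) so anomalies can be
# inspected later.
figures.insert_one({
    "metric": "exhibition_visits",
    "site": "M Shed",
    "value": 412,
    "date": datetime(2018, 5, 12, tzinfo=timezone.utc),
    "logged_by": "front-of-house",
    "comment": "includes school group of 30",
})
```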

Technologies involved in gathering and analysing performance data across museums.

Data Studio

We’ve been testing out Google Data Studio over the last few months to get a feel for how it might work for us. It’s definitely the cleanest way to visualise our KPIs, even if what’s going on behind the scenes isn’t quite as simple as it looks on the outside.

There are a number of integrations for Data Studio, including lots of third party ones, but so far we’ve found Google’s own Sheets and Analytics integrations cover us for everything we need. Within Data Studio you’re somewhat limited to what you can do in terms of manipulating or ‘munging’ the data (there’s been a lot of munging talk this week), and we’re finding the balance between how much we want Sheets to do and how much we want Data Studio to do.

At the beginning of the sprint we set about looking at Bristol Culture’s structure and listing five KPIs each for 1.) the service as a whole; 2.) the three ‘departments’ (Collections, Engagement and Transformation); and 3.) each team underneath them. We then listed what the data for each team’s KPIs would be. Our five KPIs are:

  • Take up
  • Revenue
  • Satisfaction
  • Cost per transaction
  • Conversion rate

Not every team will have all five KPIs, but the data we already collect actually covers most of them for all teams.

Using this structure we can then create a Data Studio report for each team, department and the service as a whole. So far we’ve cracked the service-wide dashboard and have made a start on department and team-level dashboards, which *should* mean we can roll out in a more seamless way. Although those could be famous last words, couldn’t they?

Any questions, let us know.

 

 

Darren Roberts (User Researcher), Mark Pajak (Head of Digital) &  Fay Curtis (User Researcher)