In the conclusion to our series of discussions on digital preservation, we turn a spotlight on sustainability and preservation and look at efforts being made today to ensure continuing access to digital material into the future. And we will be investigating how a file format originally developed in the space sector in Europe and the USA in the 1970s for the sharing of astronomical data can be used by archives for sustainable long-term preservation, to protect digital collections for centuries against obsolescence and attack.
What are the current issues with long-term digital preservation?
We sought an expert opinion on digital preservation today from a colleague at the National Library of Norway. Svein Arne Brygfjeld is currently special advisor on artificial intelligence and technical international collaboration for the Library, where he has also held responsibility for digital strategies and development.
He explained that in today’s world, where information is to a large extent produced and distributed digitally, this information must also be preserved digitally. Noting that in recent years, practitioners in large-scale digital collections have established good examples of well-functioning systems and organisations, he maintains that the main challenges now are related to the scale of systems and organisational skills.
From an organisational point of view, this translates to the need for dedicated teams with diverse skills to take responsibility for preservation and, essentially, for the overall care of digital content to be elevated to the highest organisational level.
Concurrently, from the perspective of technology and methodology, he pointed towards three key concepts, amounting to: keeping things simple; having a few core principles for formats, identification, metadata and storage structure; and using general purpose data centre technology.
FITS for the future?
And that is where FITS comes into the picture. The Flexible Image Transport System (FITS) was developed by astronomers in Europe and the USA in the late 1970s for the interchange of data between observatories and was brought under the auspices of the International Astronomical Union (IAU) in 1982. FITS is still in widespread use today by the scientific community as an open format to store data, share it between different systems and preserve information. (Data from major international programmes, such as the NASA/ESA Hubble Space Telescope, and ESA missions including Herschel, Integral, XMM-Newton and SOHO, are released in the FITS format.)
You can inspect a FITS file with nothing more than a plain text or hex editor, because at its simplest it is a sequence of bytes laid out according to a published standard. FITS uses a standardised structure built from blocks of 2880 bytes: the file begins with a header of human-readable ASCII text, made up of 80-character keyword records, which sets out the rules for all the information that follows. This is known as self-documentation and gives access to the content and structure of the digital object. These instructions for reading and processing the data can be used in centuries to come to decode the file.
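To make the idea of self-documentation concrete, here is a minimal sketch in Python that reads a FITS header using only the standard library. It is an illustration under stated assumptions rather than a reference reader: the function names `read_primary_header` and `make_demo_header` are our own, and the demo header is built in memory so that the example runs without a real FITS file.

```python
import io

BLOCK_SIZE = 2880   # FITS files are organised in 2880-byte blocks
CARD_SIZE = 80      # each header record ("card") is 80 ASCII characters


def read_primary_header(stream):
    """Return the header cards as strings, up to and including END."""
    cards = []
    while True:
        block = stream.read(BLOCK_SIZE)
        if len(block) < BLOCK_SIZE:
            raise ValueError("file ended before an END card was found")
        for offset in range(0, BLOCK_SIZE, CARD_SIZE):
            card = block[offset:offset + CARD_SIZE].decode("ascii")
            cards.append(card.rstrip())
            # The keyword occupies the first 8 characters of a card.
            if card[:8].rstrip() == "END":
                return cards


def make_demo_header():
    """Build a minimal header block with no data array (illustrative only)."""
    entries = [
        ("SIMPLE", "T", "conforms to the FITS standard"),
        ("BITPIX", "8", "bits per data value"),
        ("NAXIS", "0", "no data array follows"),
    ]
    cards = [f"{key:<8}= {value:>20} / {comment}" for key, value, comment in entries]
    cards.append("END")
    # Pad each card to 80 characters and the whole header to 2880 bytes.
    text = "".join(card.ljust(CARD_SIZE) for card in cards)
    return text.ljust(BLOCK_SIZE).encode("ascii")


demo_cards = read_primary_header(io.BytesIO(make_demo_header()))
for card in demo_cards:
    print(card)
```

To try this on a real file, you would pass `open(path, "rb")` in place of the in-memory stream. In practice most people would use an established FITS library; the point here is only that the header can be decoded with a few lines of general-purpose code.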
This marks a difference between preservation in FITS and other approaches that rely on constant migration, creating new documents every time. The metadata mapped in a FITS file can nevertheless be exported from FITS into any currently used standard file format for viewing.
The FITS format therefore guarantees preservation, sustainable access and use of digital objects for an unlimited time, making it a perfect ally in long-term digital preservation, and resolving the tension we have previously noted between the differing requirements of digitisation for access or preservation.
But what exactly is it?
If that all sounds rather esoteric, our Archives team has come up with a more succinct definition for non-specialists! Essentially FITS is a perennial file format that, unlike a jpg or pdf file, you cannot see (without using a viewer, such as the NASA-created FITS Liberator) but that acts as a container for code including all the information pertaining to a file, which can be converted into a jpg or pdf file, or any other standard file type that might be used in the future.
What do we mean by sustainability?
So how can such technologies in digital preservation guarantee sustainability, in terms of preservation, access and environmental footprint?
A useful tool for assessing the suitability of digital formats for preserving digital material is the set of seven sustainability factors listed by the US Library of Congress (LOC), ranging from disclosure, adoption and transparency to external dependencies (you can find a link to the full technical specification for these sustainability factors below). Our colleague from the Vatican Library, Dr Paola Manoni, stressed the self-documentation and backward compatibility of FITS as fundamental principles of digital preservation: as the list points out, “digital objects that are self-documenting are likely to be easier to sustain over the long term and less vulnerable to catastrophe than data objects that are stored separately from all the metadata needed to render the data as usable information or understand its context”. And the LOC goes on to confirm that “FITS was designed with an eye towards long-term archival use, and the maxim once FITS, forever FITS or once FITS, always FITS has guided the IAU to ensure that the format is backwards compatible as new features are added”, meaning that if you have a tool that can read FITS today, you can read FITS files from 40 years ago.
Our previous article on IIIF touched on sustainable access, meaning that information becomes available and remains accessible. Brygfjeld elaborates, reminding us that the purpose of long-term digital preservation is to give access to information and knowledge over time. Information in digital form is fragile for many reasons, and having central repositories take responsibility for long-term user access may contribute to a longer life for information and knowledge in society. Digital services can also contribute to more democratic access to information and knowledge, as they may be far more accessible than physical collections.
As for the environmental aspect, many factors have to be considered, such as the re-use of energy, the practical construction of data centres, choices on preservation strategies (including data formats, what to preserve and more) and the expected lifetime of equipment. While all of these leave a carbon footprint, the selection of perennial solutions such as FITS should involve a one-off ‘downpayment’. And if that means that keeping multiple digital copies of information can be avoided, reducing digital pollution and saving server space, that brings its own positive environmental effect which can be offset against this investment.
How are the ESA Archives using FITS?
The ESA Archives is, perhaps unsurprisingly, an advocate of FITS and is using a FITS-compatible platform called OMNES to preserve its digital material. (OMNES is able to store and export preservation metadata as XML files according to PREMIS preservation standards, to trace conversion into FITS format, and to store information over time using the FITS format.) We hope that this article may also encourage others in the sector to adopt FITS for their digital preservation needs.
Collaboration as enabler
This brings us back to Brygfjeld’s point about organisational skills. With the increasing development of digital technology, adoption rates will vary between organisations, as a result of differing strategic decisions, institutional frameworks, resources, funding and staff competence. Inevitably, some institutions may take leading roles and to some extent define practice, and as such they may not have many others to collaborate with.
But on the other hand, he emphasises that the sharing of expertise and experience is essential to develop understanding and best practice. This is the way in which what he calls ‘memory institutions’ have always worked, with many examples of fruitful informal and formal collaboration.
With that in mind, we thank everyone who has given their time to share their thoughts with us. And we look forward to playing our continued part in this community of memory.
Read more about FITS
Library of Congress sustainability factors
Library of Congress format description for FITS
NASA FITS Support Office
IAU FITS Working Group