Page 2: Digital Preservation

I’ve been thinking about file formats lately, specifically in the context of digital preservation. (For reference, I’m writing this section on January 5, 2021.) Now that we have a system for saving files, which formats should we save them in? I’ve taken a look at archival recommendations such as those from the Smithsonian Institution Archives. Here are some thoughts on two data types.

Images

When I save photos from around the web I usually convert them to HEIF to conserve storage space. But HEIF isn’t a fully open format: it’s an ISO standard, but the HEVC compression it usually relies on is covered by patents. There are plenty of arguments about open vs. proprietary formats and which ones are likely to have the longest support. In my research so far, open formats seem to be preferred.

The Smithsonian Institution Archives lists TIFF as the preferred preservation format, with acceptable formats like JPG, DNG, PNG, and JP2 (JPEG 2000). TIFF, or Tagged Image File Format, is typically saved uncompressed or with lossless compression, and modern operating systems index TIFF tags as metadata for file search.

TIFF is great for exporting original images, as in graphic design and photography. But converting is different from exporting. If I started converting saved images to TIFF instead of HEIF, I don’t think the result would be the same as an original-quality TIFF image. You’d be going from lossy compression to a lossless format, and that can’t bring back the detail the lossy compression already threw away; it usually just makes the file bigger. So for saving images we can stick with JPG and PNG.
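To make that point concrete, here is a minimal Python sketch. It is only a simulation under loud assumptions: `lossy_encode` stands in for a lossy format by rounding made-up pixel values, and `lossless_roundtrip` stands in for saving to a lossless format by using zlib. Neither is real JPEG, HEIF, or TIFF code, and the function names are mine.

```python
import zlib


def lossy_encode(samples, step=16):
    """Simulate lossy compression: round each 0-255 sample down to a multiple of step."""
    return bytes((s // step) * step for s in samples)


def lossless_roundtrip(data):
    """Stand-in for saving to a lossless format: compress, then decompress."""
    return zlib.decompress(zlib.compress(data))


# Made-up "pixel" values standing in for an original image.
original = bytes(range(0, 256, 5))

lossy = lossy_encode(original)         # detail is discarded here
converted = lossless_roundtrip(lossy)  # "converting" the lossy copy to a lossless format

print(converted == lossy)     # the lossless step keeps exactly what it was given
print(converted == original)  # but the detail lost earlier does not come back
```

The lossless step faithfully preserves whatever it receives, which in this case is the already-degraded copy.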

If you have an iPhone 12 Pro or iPhone 12 Pro Max, you could start taking your photos in Apple’s new ProRAW format. This is based on DNG and is the closest we’ll get to an uncompressed photo using a smartphone. You could keep the ProRAW versions as the original preservation format and share JPG versions as the access format.

Text

For text documents, presentations, and spreadsheets, PDF is listed as the preferred format. Acceptable formats include TXT and RTF. I see the benefit of PDF in my workflow: saving an article as a PDF gives you the original document complete with images, whereas using my shortcuts to save an article only gives me the text without images.

However, saving an article to TXT or Apple Notes gives me something a PDF cannot. At the top of each saved article I like to list certain data about it: Title, Author, Date Written, Date Saved, and URL. I haven’t yet discovered whether it’s possible to include that in a PDF using shortcuts, for example as a first page, while maintaining the original layout. I want automation instead of having to manually edit each PDF I save. As an alternative we could save both formats because text files barely take up space.
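For the TXT side, the header is easy to generate automatically. Here is a rough sketch in Python rather than the Shortcuts app, so treat it as an illustration of the idea and not my actual shortcut. The function names, the sample article, and the example.com URL are all made up.

```python
from datetime import date


def build_header(title, author, date_written, url, date_saved=None):
    """Return the metadata block that goes at the top of a saved article."""
    # Default to today's date if no save date is given.
    date_saved = date_saved or date.today().isoformat()
    fields = [
        ("Title", title),
        ("Author", author),
        ("Date Written", date_written),
        ("Date Saved", date_saved),
        ("URL", url),
    ]
    return "\n".join(f"{name}: {value}" for name, value in fields)


def save_article(path, body, **metadata):
    """Write the header, a blank line, then the article text to a UTF-8 TXT file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_header(**metadata) + "\n\n" + body)


# Hypothetical article details, for illustration only.
header = build_header(
    title="Example Article",
    author="Jane Doe",
    date_written="2021-01-04",
    url="https://example.com/article",
    date_saved="2021-01-05",
)
print(header)
```

Calling `save_article("example.txt", "Article text goes here.", title=..., author=..., date_written=..., url=...)` would write the same header followed by the body.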

Conclusion

For now I think formats like HEIF and TXT are acceptable. It would be different if I worked as an actual archivist, but the files I save are for personal use. Ultimately, these are the types of questions you have to ask yourself with digital preservation; I believe they call it “risk assessment”. Do you plan to pass on your files to friends and family members? Do you want a historian in 2100 to be able to access your files?
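If you want a quick starting point for that kind of assessment, a small script can tally your files by how the formats were ranked above. This is a minimal sketch: the table only contains the formats named in this post (the real archival lists are longer), the file names are invented, and “unlisted” just means the format wasn’t mentioned here, not that it’s bad.

```python
from collections import Counter
from pathlib import PurePath

# Only the formats named in this post; an assumption-laden shortlist,
# not the full archival recommendations.
TIERS = {
    ".tif": "preferred", ".tiff": "preferred", ".pdf": "preferred",
    ".jpg": "acceptable", ".jpeg": "acceptable", ".dng": "acceptable",
    ".png": "acceptable", ".jp2": "acceptable",
    ".txt": "acceptable", ".rtf": "acceptable",
}


def tier_for(filename):
    """Look up a file's tier by its extension, ignoring case."""
    suffix = PurePath(filename).suffix.lower()
    return TIERS.get(suffix, "unlisted")


def summarize(filenames):
    """Count how many files fall into each tier."""
    return Counter(tier_for(name) for name in filenames)


# Invented file names; .heic is the extension HEIF photos commonly use.
files = ["scan.TIFF", "photo.heic", "article.pdf", "notes.txt", "cat.jpg"]
summary = summarize(files)
print(summary)
```

A large “unlisted” count is a hint about which files might deserve a second copy in a more widely recommended format.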

This guide has been a work in progress since I created it in 2018, and I’ll keep it updated as I make changes. And if you’ve created your own personal information management system, mention it in the comments. I’d love to read about it.
