
Writing a data management plan: file format migration
At some point during your research you may need to convert or migrate your data files from one format to another. Reasons include getting a new computer or new software, sharing data with someone who uses different software, working on a shared platform instead of your own PC, or simply ensuring that your data can still be read and used in the future.
Some information may be lost when migrating from one file format to another, so it is important to understand what is at risk for the type of data you are working with.
Potential risks of loss or corruption on conversion or migration to new media
- Word processed files: fonts, text formatting, headers, footers, footnotes, links to other documents.
- Numeric files: special characters (such as quotation marks), end-of-line characters, the last characters in rows (due to row-length limits) and the last rows (due to limits on the number of rows).
- Database files: as above, but also relationships between items within a table and between tables.
- Image files: loss of layers, colour fidelity, resolution, and so on.
- Multimedia: as above, plus sound quality; pay particular attention to frame rates, codecs and wrappers.
- File sizes may change and even become surprisingly large.
Before you begin, read up on both the format you are converting from and the format you are converting to; at the very least, look them up on the web.
Immediately after conversion, check the integrity of the converted files as thoroughly as possible: for example, count rows and columns, test functionality and test export. 'Eyeball' the data too.
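The Python sketch below shows one way of counting rows and columns. It compares two delimited text files and assumes that both the original and the converted data can be exported to CSV in UTF-8 (for example, from the old and the new software). The function names `table_shape` and `compare_shapes` are illustrative only. The check catches structural losses such as dropped rows or truncated columns. It complements eyeballing the data but does not replace it.

```python
import csv


def table_shape(path, delimiter=","):
    """Return (row_count, set_of_column_counts) for a delimited text file.

    Assumes the file is UTF-8 encoded; adjust `encoding` if yours is not.
    """
    rows = 0
    widths = set()
    # newline="" lets the csv module handle \n, \r\n and quoted line breaks itself
    with open(path, newline="", encoding="utf-8") as f:
        for record in csv.reader(f, delimiter=delimiter):
            rows += 1
            widths.add(len(record))  # a consistent table yields a single width
    return rows, widths


def compare_shapes(original, converted, delimiter=","):
    """Print and return whether two delimited files have the same rows and columns."""
    before = table_shape(original, delimiter)
    after = table_shape(converted, delimiter)
    if before != after:
        print(f"MISMATCH: {original} has {before}, {converted} has {after}")
        return False
    print(f"OK: {before[0]} rows, column counts {sorted(before[1])}")
    return True
```

For example, `compare_shapes("survey_original.csv", "survey_converted.csv")` (hypothetical file names) reports a mismatch if rows were lost or rows were cut short. The files are read with `newline=""`, so the `csv` module copes with both Unix and Windows line endings and with quoted fields that contain commas. Those differences alone therefore do not trigger a false mismatch.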
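A checksum tool can be used to confirm that a file copied from one medium to another is bit-for-bit identical to the original. Checksums cannot verify a conversion, however. They will not match if the file format has changed, or if the file's bytes were altered when it was moved between computing platforms (for example, when the line endings in a text file are converted).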
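Here is a minimal sketch of such a checksum comparison, using SHA-256 from Python's standard `hashlib` module. The choice of algorithm and the helper names `file_checksum` and `verify_copy` are assumptions for illustration, not a prescribed tool. Matching digests show that a copy is bit-for-bit identical to its source. A converted file will not match, even if its content survived intact.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in 1 MiB chunks so large files fit in memory."""
    digest = hashlib.new(algorithm)
    # Binary mode: hash the exact bytes on disk, with no newline translation
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source, copy, algorithm="sha256"):
    """Return True if `copy` is bit-for-bit identical to `source`."""
    return file_checksum(source, algorithm) == file_checksum(copy, algorithm)
```

In practice you would record the digest before copying and compare it afterwards, e.g. `verify_copy("D:/project/data.csv", "E:/backup/data.csv")` (hypothetical paths). A text file whose line endings were changed from `\n` to `\r\n` during transfer produces a different digest, even though it looks identical when opened.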