
Writing a data management plan: text vs binary formats
File types are based on either text or binary encoding.
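Text files are machine-readable through a character encoding standard such as ASCII or Unicode (for example, UTF-8). Binary files can only be read by software that understands their particular structure, and the format may be proprietary. Executable programs are typically stored as binary files. Some formats mix text and binary content: Rich Text Format (.rtf) files, for instance, are mostly text markup but can embed binary data such as images.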
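To make the distinction concrete, the sketch below uses a simple heuristic to guess whether some bytes are text or binary. It treats data as text if it decodes cleanly under a given encoding (UTF-8 by default, which covers plain ASCII) and contains no NUL bytes. The function name `looks_like_text` is hypothetical, and the check is only a rough assumption: UTF-16 text, for example, contains NUL bytes and would be misclassified.

```python
def looks_like_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Heuristic: treat bytes as text if they decode cleanly and contain no NUL bytes."""
    if b"\x00" in data:
        return False  # NUL bytes are common in binary formats and rare in plain text
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False  # byte sequence is not valid in the chosen encoding
    return True


# Text: characters are mapped to bytes through an encoding standard (here UTF-8).
csv_bytes = "id,name\n1,Ana\n".encode("utf-8")
print(looks_like_text(csv_bytes))

# Binary: the start of a PNG image (signature plus chunk header).
png_bytes = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
print(looks_like_text(png_bytes))
```

The first call reports the CSV bytes as text; the PNG bytes fail both because of the NUL bytes and because `\x89` is not a valid UTF-8 start byte.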
A great advantage of creating or saving research data in a text format is that such files are human-readable: they can be opened in a plain text editor such as Windows Notepad. They can also be opened on any operating system and by a wide range of applications. As a result, text files are among the least likely formats to become obsolete over time, which makes them a good choice for sharing and long-term preservation.
Well-known file extensions for plain-text formats include .txt, .csv, .dat, .asc, .por, .html and .xml.
Most software applications offer export or exchange formats that allow a text-formatted file to be created for importing into another program.
A typical example is Microsoft Excel: using the Save As command, spreadsheet data can be saved in comma-delimited format (.csv, for comma-separated values). The row and column structure is preserved through commas and line breaks. However, each worksheet must be saved as a separate .csv file, and any text formatting or macros in the native format will be lost on conversion.
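The same constraints can be illustrated outside Excel. The sketch below assumes a hypothetical in-memory workbook, a dictionary mapping each worksheet name to a list of rows, and writes every worksheet to its own .csv file with Python's standard `csv` module. Only cell values survive; there is nowhere in a CSV file to store fonts, colours or macros. The function `export_sheets_to_csv` and the sample data are illustrative assumptions, not part of any Excel API.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical workbook: each worksheet is a list of rows of cell values.
# Formatting and macros have no representation here, just as in CSV.
workbook = {
    "Survey": [["id", "age", "region"], [1, 34, "North"], [2, 41, "South"]],
    "Codebook": [["variable", "label"], ["age", "Age in years"], ["region", "Region, as reported"]],
}


def export_sheets_to_csv(sheets: dict, out_dir: Path) -> list:
    """Write each worksheet to its own .csv file, since a CSV file holds one table."""
    paths = []
    for name, rows in sheets.items():
        path = out_dir / f"{name}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as f:
            # Commas separate columns, line breaks separate rows; values that
            # contain a comma are quoted so the structure is preserved.
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for p in export_sheets_to_csv(workbook, Path(tmp)):
        print(p.name)
        print(p.read_text(encoding="utf-8"))
```

Because the label "Region, as reported" contains a comma, the `csv` module wraps it in double quotes so that it is still read back as a single cell.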