
Writing a data management plan: data normalisation
Depending on your research field, normalisation can mean different things. Two distinct meanings are described below; each may be relevant to your own research data management practice.
- Statistical normalisation - applying a formula or algorithm to transform values into numbers that can be compared on a like-for-like basis (apples with apples) or analysed with a chosen statistical model. A typical example is taking the logarithm of a variable so that a skewed distribution becomes approximately normal (for example, so it displays as a normal curve in a chart). A minimal sketch of a log transform appears after this list.
- Database normalisation - following a set of relational database design rules that make a database more robust by eliminating duplication and inconsistency: for example, breaking large tables up into smaller, related tables and linking fields between them through a 'key' or common ID. Reducing this complexity lowers the chance of anomalies occurring in the data and makes the database more flexible in how it can be used. A two-table sketch follows the log-transform example below.
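For the statistical meaning, here is a minimal sketch in Python. The sample values and variable name are hypothetical, not from the text above; the sketch applies a log transform to a right-skewed sample and reports a simple skewness measure before and after.

```python
# A minimal sketch of statistical normalisation: a log transform applied to
# hypothetical right-skewed values so they sit on a more symmetric scale.
import numpy as np

# Hypothetical right-skewed sample: most values small, a few very large.
incomes = np.array([18_000, 22_000, 25_000, 30_000, 41_000, 250_000, 900_000])

log_incomes = np.log(incomes)  # natural logarithm of each value


def skew(x):
    # Pearson's second skewness coefficient: 3 * (mean - median) / std.
    # Values near 0 indicate a roughly symmetric distribution.
    return 3 * (x.mean() - np.median(x)) / x.std()


print("skewness before log transform:", round(float(skew(incomes)), 2))
print("skewness after log transform :", round(float(skew(log_incomes)), 2))
```

Running this shows the skewness measure dropping sharply after the transform, which is what makes the log-transformed values better suited to models that assume roughly normal data.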
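For the database meaning, the sketch below uses Python's built-in sqlite3 module. The table and column names (authors, publications, author_id) are illustrative assumptions, not from the text above; the point is that author details live in one table, publications in another, and a common ID links them, so each fact is stored exactly once.

```python
# A minimal sketch of database normalisation using Python's built-in sqlite3:
# instead of one flat table repeating author details per publication, the
# data is split into two tables linked by a common ID (a foreign key).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Author details are stored once, in their own table.
cur.execute("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    )""")

# Each publication references an author by ID rather than repeating the name.
cur.execute("""
    CREATE TABLE publications (
        pub_id    INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(author_id)
    )""")

cur.execute("INSERT INTO authors VALUES (1, 'A. Researcher')")
cur.executemany(
    "INSERT INTO publications (title, author_id) VALUES (?, ?)",
    [("First study", 1), ("Follow-up study", 1)],
)

# The common ID joins the tables back together; because the author's name
# exists in exactly one place, it cannot drift out of sync across rows.
for row in cur.execute("""
        SELECT p.title, a.name
        FROM publications AS p
        JOIN authors AS a ON a.author_id = p.author_id"""):
    print(row)

conn.close()
```

In the flat design, correcting a misspelt author name would mean updating every publication row; in the normalised design it is a single update, which is exactly the kind of inconsistency the rules are meant to eliminate.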
You can find more information about both types of normalisation in the recommended resources unit. It is also worth exploring the literature in your own field to see how normalisation is generally interpreted and practised.