Welcome

Transcript 15

One of the key issues I've had in my research is that I'm dealing with crime data. Now, crime data is specifically listed as one of the classes of sensitive data under the Data Protection Act. So, if that crime data might in some way disclose details of an individual, then you might find yourself in breach of the Data Protection Act, in breach of confidentiality, and so on.

So you need to be very careful, and think about how you are going to use that data. And obviously a key issue for the police, if they were going to agree to give me their data, is that they had to be confident that I was going to keep their data safe, and again that I wouldn't disclose details about individual locations of crimes or individual people who might have been offenders or victims of crimes who didn't want to disclose themselves. Never mind the sort of legal side of the Data Protection Act as well.

So, one of the things I've had to think very carefully about is how can I get the data at that specific location which I need because I want the flexibility of being able to basically aggregate it, shape it, map it [at] whatever scale I want to as part of my analysis stage, and how can I then make the data suppliers -- in this case the local police force -- confident that when I finally get around to publishing that data it would be suitably anonymised. So, what I did with the police was to write a very detailed project proposal which set out clearly what the research was that I was going to do with them, what I was hoping to do with the data that they were going to give me, why I wanted the data that I wanted. And another thing that I also did was to ask for the minimum data that I actually needed. Now, I am interested in locations that vandalism happens at, not interested in other crimes, so I didn’t ask for details of other crimes, I just asked for data about the crimes I was interested in.

I am also not directly interested (because I am particularly looking at aspects of place in my research) in who actually commits the offences or who is the victim, so I didn't need any names of offenders or victims. I did need the location, and the police happen to actually record a specific location, which they call LOCAS and record as a grid reference, an easting and a northing, which will actually give you a precise point you can locate on the map. They also have the full address and quite often the postcode. So, given that I could get for almost every crime a location and/or a postcode, that meant that I didn't actually need to get the full address from them. So, that had the advantage that they knew that I was getting from them the minimum amount of data. And that was the minimum amount of data that I felt I could use. And I think because I was able to set out, well, this is the minimum amount of data that I can use, they were more confident about giving me the data. I didn't just say 'give me everything that you've got'; I said actually what I specifically need is this information, can I keep with you, like that.
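[Editorial illustration] The data-minimisation idea described above can be sketched in code. This is only a hedged illustration and not the speaker's actual workflow: the field names (`crime_type`, `crime_number`, `easting`, `northing`, `postcode`, and so on) and the sample records are hypothetical, since the transcript does not describe the format of the police data.

```python
# Hypothetical field names: the real police export format is not described
# in the transcript. Only the fields the research needs are listed here.
NEEDED_FIELDS = ("crime_number", "easting", "northing", "postcode")


def minimise(records, crime_type="vandalism"):
    """Keep only records of one crime type, and only the needed fields."""
    return [
        {field: rec[field] for field in NEEDED_FIELDS}
        for rec in records
        if rec["crime_type"] == crime_type
    ]


# Made-up sample records, for illustration only.
sample = [
    {"crime_number": "A1", "crime_type": "vandalism", "easting": 259012.0,
     "northing": 665431.0, "postcode": "G1 1AA", "address": "1 Example St",
     "offender_name": "(not needed)"},
    {"crime_number": "A2", "crime_type": "theft", "easting": 259400.0,
     "northing": 665900.0, "postcode": "G1 1AB", "address": "2 Example St",
     "offender_name": "(not needed)"},
]

minimal = minimise(sample)
```

The result holds a single vandalism record with no name or full address, which mirrors the request made to the data supplier: the crime type of interest, a location, and nothing more.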

A second thing I did was make it very clear that I was going to store the data securely. Now, I do that by using the university service where there are regular backup systems and secure data storage systems. I also make use of an encryption programme, which is an off-the-shelf encryption package called Pretty Good Privacy, or PGP Desktop, which is one of the industry standards for encryption packages and produces a sort of virtual encrypted disk for you, which is a file which you can sort of keep in various places. And I was able to detail that and explain that, which means that the data is actually encrypted and has a second level of password, beyond my university password. So they could be confident that the data was going to be secure.

I also set out the backup systems that would be in place, because some data I keep on my laptop, but also a backup at home. So I made clear what backup systems were going to be used. I also made clear the sort of levels of data that I was going to keep where. So, some data I have is very detailed and it does have the location of each crime; it also has a crime number. So if I ever had any particularly surprising results, I could go back and say 'This particular record is giving me these very odd results, is there something odd about it? Can you tell me a bit more?' and I have a named contact I can go back to. Again, having a named contact at the organisation you are taking your data from, especially if it is something like sensitive and precise data, I think is very helpful. So, if you should run into any problems you've got someone to go back to. And if necessary, if you have enough of the data to start with, you could always go back and ask for more detail if you really find you need it. So, in some ways you need to sort of be starting at asking for enough data that you can get what you need, but not so much data that they start thinking why do you need that level of information.

So, the third thing I did was to guarantee that before I showed the data to anyone else, or even before it gets taken off the most secure place within the university data system, I would anonymise it in some way, and I have anonymised it in a couple of different ways. One is I've taken out the actual grid location completely, and just given it a unique identifier. And also for each of these grid locations, given them instead a point that is higher up, so I've said well this particular point location is in this hundred-metre grid I've created, or this output area, or whatever, so I've stored those, but I don't anymore store the location. I've also already aggregated the data up to a certain level, so you can no longer actually see the precise house or location where it is occurring. I've also used a data smoothing technique, called kernel density estimation, which effectively, rather than having exactly where the point is, spreads the points out a bit. It sort of fuzzifies the data, to put it in a sort of non-technical way. So, where you've got the biggest concentrations of crime, you will get a sort of greater weight, which if you map it becomes darker; these are known as heat maps. Where you've got lesser concentrations of crime, you get sort of a lighter density, so when you map that it looks like a lighter colour. And it has the effect of kind of weighting the data towards where the concentrations are, and effectively it also moves the points about a bit; I mean it also displays the data at a grid level rather than at individual point level. So, again that anonymises it.
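[Editorial illustration] The two anonymisation approaches described above, aggregation to a grid and kernel smoothing, can be sketched as follows. This is a minimal sketch under stated assumptions, not the speaker's actual method or software: the coordinates are made up, the function names are hypothetical, and the Gaussian kernel and the 150-metre bandwidth are illustrative choices that the transcript does not specify. Only the hundred-metre grid size comes from the transcript.

```python
import math
from collections import Counter

CELL = 100  # grid cell size in metres (the transcript mentions a hundred-metre grid)


def to_cell(easting, northing, cell=CELL):
    """Return the south-west corner of the grid cell containing a point."""
    return (int(easting // cell) * cell, int(northing // cell) * cell)


def aggregate(points, cell=CELL):
    """Count points per grid cell; the exact coordinates are not kept."""
    return Counter(to_cell(e, n, cell) for e, n in points)


def kernel_density(points, centres, bandwidth=150.0):
    """Gaussian kernel density estimate evaluated at the given cell centres.

    The bandwidth (in metres) is an assumed value chosen for illustration.
    """
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    density = {}
    for cx, cy in centres:
        total = 0.0
        for e, n in points:
            dist_sq = (cx - e) ** 2 + (cy - n) ** 2
            total += math.exp(-dist_sq / (2 * bandwidth ** 2))
        density[(cx, cy)] = norm * total
    return density


# Made-up easting/northing pairs, for illustration only.
points = [
    (259012.0, 665431.0),
    (259048.0, 665477.0),
    (259051.0, 665402.0),
    (259390.0, 665810.0),
]

counts = aggregate(points)
centres = [(x + CELL / 2, y + CELL / 2) for x, y in counts]
density = kernel_density(points, centres)
```

Here three of the four points fall in the same grid cell, so that cell has the highest count and the highest smoothed density. A map drawn from `counts` or `density` shows where crime concentrates without revealing any individual point location.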

So, I am anonymising the data by sort of either smoothing or aggregation, and that's a way of sort of dealing with sensitive data. And what I did for the police was, although I didn't have their data yet, I would take a dataset which would behave a bit like the police data, and the dataset that I used in that case is postcode data where you get individual locations of postcodes, and I showed what sort of smoothing and aggregation techniques would look like. So I would use a series of maps which sort of said 'this is what it looks like as a point, and this is then what happens when I sort of map it like points and I map it using these various aggregation techniques'. I also showed them the scale I was likely to show the data at, which I think is about 1:85,000 or possibly 1:50,000, but I think I've been generally using 1:85,000 for the area I was looking at, so they could actually get a view of what that data would look like once I'd transformed their location points. And once I had all that information, how I was going to store the data securely, how I was going to take steps to anonymise it, what that might look like when it was anonymised, I put that all in the application and presented it to the data manager. The data manager said 'fine, you can have the data'. And actually the police officer I had contacted was quite surprised how easily they had agreed, but I think actually the fact that I had all that detail that says this is how I'm going to store your data, this is the sort of thing I might do with your data, this is what it might look like, I think then gave them the confidence that I would be sort of looking after the data securely and thinking about how I would maintain it.

The other thing they were quite pleased about, and this is another requirement of the Data Protection Act, is that I also stated how long I would hold the very detailed confidential data; in my case I said I would hold it for up to 5 years after the research project had finished. And they felt it was very important that I had actually thought about how long I would actually be holding that data for. I've said I would hold the aggregated data potentially much longer than that, but for the actual very detailed, potentially sensitive, disclosive data, again I made it very clear how long I'd be holding that data. I also made it clear who else would potentially get to see that, which in my case obviously is my supervisors, and that's been very important because I've been discussing various levels of analysis. I haven't had any concern about showing my supervisors very detailed information some of the time, because my data suppliers knew that potentially I would be discussing this with my supervisors and they potentially would have access to this very detailed data, although they never have. But it means I don't have to worry when I'm doing initial maps of the data that I might be showing them exactly where some of these crimes are located, so when we're having discussions about how the research is going I don't have to be quite so careful about the information I put up.

And on my own laptop, even though a lot of the data I am dealing with isn't sensitive, anything I think might be sensitive I leave in another encrypted file. It just makes me feel happier and safer that if for some reason my laptop got stolen or lost or whatever, there is no danger of anything going out that I think is in any way disclosive. So, that all keeps it sort of… I am confident that I am not going to breach any rules; I am not going to upset my data supplier, and they're confident I can use their data.

Creative Commons Licence
PILOT - Writing a data management plan by Edina, University of Edinburgh modified by Marion Kelt, GCU is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at http://datalib.edina.ac.uk/mantra/introduction.html