
Data Research in Technology and Physical Sciences

Audience

This page is intended for students and researchers in the University of Cambridge Schools of Technology and Physical Sciences whose research involves analysing personal data that has been collected indirectly. There is a separate page describing survey methods such as questionnaires and interviews. This page is part of a larger set of research guidance pages on work with human participants.

 

Ethical review guidance

This page gives general guidance on the conduct of data research. The issues covered in the sections below are particularly relevant to ethical review.

 

Practicalities

If you are directly collecting data on human behaviour, those providing the data should be aware that you are doing so. In the case of surveys, this is already clear. However, data research often makes use of data that was originally collected by other people, in which case you must be sure that your own use of the data is consistent with the terms under which it was originally collected. This is discussed below in the section on Data obtained via third parties.

If data is collected as a side effect of other actions, for example by people using a piece of research software, the situation is not so straightforward. This is discussed in a separate page on Instrumented software releases.

Network monitoring

Monitoring of network traffic is subject to legal constraints in the UK, and research making direct use of network traffic may be subject to the Telecommunications (Lawful Business Practice) (Interception of Communications) Regulations 2000. Whatever the legal status of the means by which the data has been collected, it will still be subject to the constraints described below in the section on Data Security.

If network traffic is being monitored on an official university network, the project may be subject to further constraints defined in the terms of use for the UK Joint Academic Network (JANET). For further information, please see the JANET Policy on Research Use of Network Traffic Data.

Data obtained via third parties

Data that has been provided by a third party (e.g. market research, student records, customer data, system usage data) is likely to be subject to the terms of the Data Protection Act. You should ensure, at the time you receive the data, that the person supplying it is doing so within the terms allowed by the Act.

It is increasingly common to carry out research using data that has been 'scraped' automatically from websites. However, this should only be done with permission from the administrators of the site concerned. Note that most social networking sites (Facebook, Twitter, etc.) have terms of use that explicitly forbid this kind of research unless prior permission has been obtained.
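As a minimal sketch of one preliminary check, the Python below uses the standard library's urllib.robotparser to test whether a site's robots.txt excludes a path for a given crawler. The robots.txt content, the user agent name research-bot and the example.org URLs are all hypothetical. A permissive robots.txt is not a substitute for permission from the site administrators or for reading the terms of use; it only shows what the site has asked automated clients not to fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice it would be fetched from the site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /profiles/
Allow: /public/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt does not exclude url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/public/posts",
        "https://example.org/profiles/42",
    ):
        # "research-bot" is an assumed user agent name for illustration.
        print(url, may_fetch(SAMPLE_ROBOTS_TXT, "research-bot", url))
```

A result of False means the site has asked crawlers to stay away from that path. A result of True still leaves the question of permission open.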

Data Security

All data on human behaviour should be kept secure and distributed only to known individuals, consistent with the terms on which the data was obtained. In particular, such data should never be placed on websites or distributed to mailing lists.

Data that is not anonymised, or could be used to identify specific classes of people, requires special precautions. You should consult departmental system administrators to ensure that storage locations on departmental machines and servers are appropriately secure against access by anyone other than the researcher needing to use this data.

Data that is not anonymised should not be stored outside the department, on personal laptops, or carried on removable media, without encryption. (There are several options by which data files can be securely encrypted before copying to such devices.) Remember that many applications may create working files or other disk traces from which data could be recovered. In the case of very sensitive information, physical destruction of disk drives and other media at the end of the project may be appropriate.

Anonymity

If at all possible, you should arrange for data to be provided to you in an anonymised format. Anonymised data should not include names, addresses, email addresses, or dates of birth.
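Where data arrives with such fields still present, they can be removed before analysis. The following is a minimal sketch, assuming the data is a CSV file with a header row; the column names in DIRECT_IDENTIFIERS and the sample record are hypothetical and would need to match the real dataset. Removing these columns does not by itself make data anonymous, since the remaining columns may still identify people when combined with other sources.

```python
import csv
import io

# Hypothetical column names for the direct identifiers listed above.
DIRECT_IDENTIFIERS = {"name", "address", "email", "date_of_birth"}


def strip_direct_identifiers(csv_text: str) -> str:
    """Return csv_text with every direct-identifier column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [
        field
        for field in (reader.fieldnames or [])
        if field.lower() not in DIRECT_IDENTIFIERS
    ]
    out = io.StringIO()
    # extrasaction="ignore" silently drops the columns that are not in `kept`.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample record for illustration only.
SAMPLE = (
    "name,email,date_of_birth,department,sessions\n"
    "A. Example,a@example.org,1990-01-01,Physics,12\n"
)

if __name__ == "__main__":
    print(strip_direct_identifiers(SAMPLE))
```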

It should not be possible to combine anonymised data with other data in a way that would make the subjects identifiable, for example by combining office numbers with a university directory, or student registration codes with exam entries.
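One simple way to look for this risk is to count how many records share each combination of indirectly identifying fields; a combination held by only one or two people is easy to match against an outside source such as a directory. The sketch below is a k-anonymity-style count, which is a common heuristic and not a method prescribed by this guidance. The field names, the sample records and the threshold k are hypothetical, and passing this check does not rule out more sophisticated attacks.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records; "office" and "year" stand in for fields that could be
# matched against a directory or registration list.
RECORDS = [
    {"office": "A1", "year": 2, "score": 61},
    {"office": "A1", "year": 2, "score": 70},
    {"office": "B7", "year": 3, "score": 55},
]

if __name__ == "__main__":
    # Any combination reported here belongs to fewer than k people.
    print(small_groups(RECORDS, ["office", "year"], k=2))
```

Combinations flagged in this way might be generalised (for example, by recording a building instead of an office number) or withheld before the data is shared.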

There are sophisticated statistical inference attack techniques that can be used to 'de-anonymise' data. If your data is going to be placed in a public archive, it would be wise to consult an expert in these techniques, to ensure that privacy cannot be compromised.

Identifying specific classes of people

As a general matter of research ethics, if the data you are studying may lead to research outcomes affecting an identifiable class of people, an explicit ethical review should be requested. This is more likely to arise in social science or clinical research, but take care if your own research questions are likely to involve technology use by groups that are sensitive in any way. Examples include:

  • genealogy of war criminals
  • politicians and sex offenders
  • estimating genetic potential for disease in an identifiable class of living people

 

Authorship, extension and corrections to this page

The initial version of this page was drafted by Alan Blackwell. If you wish to give feedback on the page, suggest corrections or provide further information, please see the page on guidance feedback.