
Survey and Data Research in Technology and Physical Sciences

These pages are still under development, and comments are welcome. Send to

 

Audience

This page is intended for use by students and researchers in the University of Cambridge Schools of Technology and Physical Sciences whose research involves collecting data from people outside their own research team, but does not involve experiments or field work. It is part of a larger set of research guidance pages on work with human participants.

 

Ethical review guidance

If you are using this page to identify potential concerns during review of a proposed research project, please refer to the following points:

  • Recruitment
  • Anonymity
  • Data retention
  • Incentives and Compensation
  • Permission

 

The context of survey methods

Later in this page {INSERT LINK}, we provide some more general introductory advice …

 

Definitions

Any study where you collect data by asking people questions is a survey. This can be conducted using paper questionnaires, email, web-based survey forms, or occasionally by telephone or in public spaces. The people who participate in a survey are generally called 'respondents'.

A study in which you collect information about people and their activities without specifically asking them questions is described as data research.

A study in which you ask people to keep records of their daily life, or their usage of some technology, is described as a diary study.

 

Practicalities - Surveys

Questionnaire design

Questionnaires generally include a combination of closed questions (predetermined responses, either yes/no or multiple choice), Likert scales to indicate strength of agreement with a statement, and open questions (free text, which must be coded for analysis).

It is easy to make serious errors when you first attempt to design a questionnaire. There are many textbooks and online guides - make use of them. If possible, ask an expert to review a prototype of your questionnaire, and try it out in advance with several pilot respondents. Typical traps include biased questions, ambiguous questions, poor 'guard' logic, inconsistent response formats, failure to anticipate some valid answers, or reasons for not giving an answer.

Recruitment

Who do you want to respond to your survey, and is this sample expected to be representative of a larger group? Most surveys are initiated from some database or email list. You should ensure that using it in this way is consistent with the terms of use, including any Data Protection Act considerations. You need to check this with the owner of the list.

It is possible to recruit directly by telephone or pedestrian samples. These approaches are stressful and time consuming, and should only be attempted with expert guidance and preparation.

Incentives and compensation

Not everyone you ask to complete a survey will do so. It is reasonably common to encourage survey responses by offering a gift or other incentive to randomly selected respondents. This often requires that you collect contact details, which raises the issues of anonymity discussed below.

Anonymity

Most surveys are anonymous - they do not record either the name of the respondent, or the name of any institution that the respondent represents. This can be inconvenient if you realise that you need more data after collecting responses (for clarification, to compensate for errors in the survey design, or to investigate subsequent research questions). Nevertheless, we advise making surveys anonymous whenever possible.

If it is essential to contact respondents subsequently, it may be acceptable to request an email address, but this should be optional. If email addresses are collected, your data will then be subject to the terms of the Data Protection Act.

Many surveys incorporate demographic data (age, gender, education etc). This should be minimised - you should not collect any demographic data unless it is related to a specific research question. Demographic data may include personal details that would bring your research within the terms of the Data Protection Act, in which case precautions noted below must be taken.

Tools

There is a range of tools for administering online surveys. Before choosing one, check whether you can extract all your data from it, and whether it limits the number of responses. The most popular at the time of writing, SurveyMonkey, does have a limit. SurveyBob has no limit, but displays advertising on some pages.

Data Retention

If survey responses do not include any personal data, then the data may be retained. If they do contain personal data, then they fall within the terms of the Data Protection Act. Personal data should be kept secure (see data security below). Data that would allow a respondent to be identified should be kept in a separate place throughout the research project, with an anonymised code used during analysis work and at publication time. It is good practice to destroy any personal data after a stated period of time.
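The separation of identifying data from analysis data can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `pseudonymise`, the `email` field and the `respondent_code` field are all hypothetical, and the identity key it returns would still need to be stored securely and destroyed after the stated period.

```python
import secrets

def pseudonymise(responses, id_field="email"):
    """Split responses into analysis rows and a separate identity key.

    `responses` is a list of dicts; `id_field` names the identifying field.
    Both are illustrative assumptions, not part of any survey tool.
    """
    analysis_rows = []   # used during analysis and at publication time
    identity_key = {}    # kept in a separate, secure place
    for row in responses:
        # Random code, so it cannot be derived from the respondent's identity.
        code = "R-" + secrets.token_hex(4)
        while code in identity_key:
            code = "R-" + secrets.token_hex(4)
        identity_key[code] = row[id_field]
        cleaned = {k: v for k, v in row.items() if k != id_field}
        cleaned["respondent_code"] = code
        analysis_rows.append(cleaned)
    return analysis_rows, identity_key

# Made-up example responses.
responses = [
    {"email": "a@example.org", "q1": "yes"},
    {"email": "b@example.org", "q1": "no"},
]
analysis_rows, identity_key = pseudonymise(responses)
```

Only `analysis_rows` would be shared with collaborators or used in analysis scripts. Free-text answers may still contain identifying details, which this sketch does not detect.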

Informed consent

In general, voluntary completion of a questionnaire or interview can be taken as consent for this data to be used in research. Nobody should ever be compelled to participate in a research survey (for example, students should not be required to participate in research as a condition of course grading). You may wish to assure respondents that no personal data is collected or, if it is collected, that it will not be published and will be destroyed.

 

Advice on Survey Validity

Sampling, Response Rate and Selection

In presenting your research, you will want to claim that your results apply to a larger number of people than those who responded - that you had a representative sample. How can you justify that your recruitment database was genuinely representative? Only a subset of those in the database will have responded (often between 5% and 50%). Can you be sure that those who didn't respond would have given the same answers? What are the reasons they didn't respond, and might these be related to any of the questions?
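One simple first check on these questions is to compare response rates across groups that you can identify in the recruitment list. The sketch below assumes you know how many people in each group were invited and how many responded. The group labels and counts are made up for illustration, and the function name is hypothetical.

```python
def response_rates(invited, responded):
    """Return the overall response rate and the rate for each group.

    `invited` and `responded` map a group label to a head count.
    """
    overall = sum(responded.values()) / sum(invited.values())
    by_group = {
        group: responded.get(group, 0) / count
        for group, count in invited.items()
        if count > 0
    }
    return overall, by_group

# Made-up counts for two groups in a recruitment list.
invited = {"undergraduate": 400, "postgraduate": 100}
responded = {"undergraduate": 40, "postgraduate": 35}
overall, by_group = response_rates(invited, responded)
```

Here the overall rate is 15%, but the two groups respond at very different rates (10% and 35%), so the respondents over-represent postgraduates relative to the list. A difference of this kind does not prove that the answers are biased, but it is a reason to report results by group or to qualify any claim of representativeness.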

Coding and Analysis

Closed questions can be used as a basis for statistical comparisons, either investigating differences between groups within your sample, or correlations between responses to questions. Survey responses are generally not very sensitive measures, so choosing an appropriate statistical technique may not be straightforward.
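As one illustration of a comparison between groups, a yes/no question answered by two groups can be tested with a Pearson chi-square test of independence. This is a sketch of one common technique, not a recommendation for every survey. The counts are made up, the function name is hypothetical, and no continuity correction is applied. Small expected counts would call for a different test.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With one degree of freedom, the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: two respondent groups. Columns: answered yes / answered no (made-up counts).
stat, p_value = chi_square_2x2([[30, 20], [15, 35]])
```

A small p-value suggests that the two groups answered differently, but it says nothing about why, and it does not address the sampling concerns described above.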

Single value statistics have little research relevance unless they can be related to an external comparison or prior hypothesis. (30% of respondents said they liked your product - but what would they have said about a different product?)

Where your survey included open questions, how will you draw conclusions about patterns or trends across the answers? This will involve creating a set of coding categories, assigning each answer to one or more categories, and dealing with answers that fall outside the coding scheme, are ambiguous, and so on. You should probably get a second person to re-code the same data, and carry out a statistical inter-rater reliability analysis.
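One widely used inter-rater reliability statistic is Cohen's kappa, which compares the observed agreement between two coders with the agreement expected by chance. The sketch below assumes that each coder assigns exactly one category to each answer. Schemes that allow several categories per answer need a different measure. The category names and codings are made up.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one category per answer."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of answers")
    n = len(coder_a)
    # Proportion of answers on which the two coders agree.
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)

# Made-up codings of six open answers by two coders.
coder_a = ["usability", "cost", "cost", "other", "usability", "cost"]
coder_b = ["usability", "cost", "other", "other", "usability", "usability"]
kappa = cohens_kappa(coder_a, coder_b)  # 0.52 for this data
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance. Low values usually indicate that the coding categories need clearer definitions before the analysis continues.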

It may be the case that you did not have prior research hypotheses relating to some open questions. In this case, it can be valuable to follow a rigorous process by which codes and potential theoretical concerns are derived from the collected data (for example, Grounded Theory methods). However, these are time-consuming. It is unwise to collect large amounts of verbal data without having a firm plan in advance of how it will be analysed.

 

Practicalities: Data Research

If you are directly collecting data on human behaviour, those providing the data should be aware that you are doing so. In the case of surveys, this is already clear, as described above.

Data research often makes use of data that was originally collected by other people, in which case it is important to be sure that your own use of the data is consistent with the terms under which it was originally collected. This is discussed below under "Data obtained via third parties".

If data is collected as a side effect of other actions, for example by people using a piece of research software, the situation is not so straightforward. This is discussed below under "Instrumented software releases".

Data obtained via third parties

Data that has been provided by a third party (e.g. market research, student records, customer data, system usage data) is likely to be subject to the terms of the Data Protection Act. You should ensure, at the time you receive the data, that the person supplying it is doing so within the terms allowed by the Act.

It is increasingly common to carry out research using data that has been 'scraped' automatically from websites. However, this should only be done with permission from the administrators of the site concerned. Note that most social networking sites (Facebook, Twitter etc) have terms of use that explicitly forbid this kind of research, unless prior permission has been obtained.
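Permission has to come from the site's administrators and terms of use, and no automated check can replace it. As a minimum technical courtesy once permission is in place, a scraper can also honour the site's robots.txt file. The sketch below uses Python's standard `urllib.robotparser` on an inline example file. The user agent name `research-bot` and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, user_agent, url):
    """Report whether the given robots.txt text permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Made-up robots.txt content; a real scraper would read the site's own file.
example_robots = """User-agent: *
Disallow: /members/
"""

ok_public = allowed_by_robots(
    example_robots, "research-bot", "https://example.org/about")
ok_members = allowed_by_robots(
    example_robots, "research-bot", "https://example.org/members/list")
```

In this example the public page is allowed and the members area is not. A page being allowed by robots.txt does not mean that the site's terms of use permit research use of its content.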

Instrumented software releases

It is reasonably common in technology research to release a piece of experimental software, and collect research data based on the software usage. This does not directly correspond to well-established categories in ethical guidance for research. However, it is clear that it raises the same issues to be considered and addressed.

Users of instrumented software should be clearly informed that usage data will be collected, and should be asked for their informed consent to that collection. Distribution of software for research purposes raises a number of other issues (e.g. warranty, liability, duration of licence, intellectual property) that should be reviewed by the University's legal department. In some cases, data will be collected from people other than those who provided the original consent (for example, consent may have been provided by a research collaborator within the organisation that is hosting the software).
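In software terms, the consent requirement means that collection is off by default and nothing is recorded until the user agrees. The class below is a minimal sketch of that idea. The class and method names are hypothetical, and the choice to discard stored events when consent is withdrawn is an assumption of this example rather than a stated requirement.

```python
import time

class UsageLogger:
    """Records usage events only after the user has given explicit consent."""

    def __init__(self):
        self.consent_given = False   # default: no data collection
        self.events = []

    def record_consent(self, agreed):
        self.consent_given = bool(agreed)
        if not self.consent_given:
            self.events.clear()      # assumption: withdrawal discards stored events

    def log(self, event_name):
        if not self.consent_given:
            return False             # nothing is stored without consent
        self.events.append({"event": event_name, "time": time.time()})
        return True

logger = UsageLogger()
before = logger.log("open_file")     # ignored: consent not yet given
logger.record_consent(True)
after = logger.log("open_file")      # stored
```

A real release would also need to present the consent text to the user and record when and how consent was given. This sketch leaves those steps out.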

Data Security

All data on human behaviour should be kept secure, and should be distributed only to known individuals as consistent with the terms on which the data was obtained (i.e., data like this should never be placed on websites, or distributed to mailing lists).

Data that is not anonymised, or could be used to identify specific classes of people, requires special precautions. You should consult departmental system administrators to ensure that storage locations on departmental machines and servers are appropriately secure against access by anyone other than the researcher needing to use this data.

Data that is not anonymised should not be stored outside the department, on personal laptops, or carried on removable media, without encryption. (There are several ways in which data files can be securely encrypted before they are copied to such devices.) Remember that many applications may create working files or other disk traces from which data could be recovered. In the case of very sensitive information, physical destruction of disk drives and other media at the end of the project may be appropriate.

Anonymity

If at all possible, you should arrange that data is provided to you in an anonymised format. Anonymised data should not include names, addresses, email addresses, or date of birth.

It should not be possible to combine anonymised data with other data in a way that would make the subjects identifiable, for example by combining office numbers with a university directory, or student registration codes with exam entries.

There are sophisticated statistical inference attack techniques that can be used to 'de-anonymise' data. If your data is going to be placed in a public archive, it would be wise to consult an expert in these techniques, to ensure that privacy cannot be compromised.
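A basic first check, well short of the expert review suggested above, is to count how many records share each combination of indirectly identifying fields. This is the idea behind k-anonymity, a term introduced here for illustration and not used elsewhere on this page. The field names and records below are made up, and the function name is hypothetical.

```python
from collections import Counter

def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    values for all of the listed quasi-identifier fields."""
    if not records:
        raise ValueError("no records to check")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())

# Made-up records with no names, but with fields that could be combined
# with a directory to narrow down who each person is.
records = [
    {"department": "Physics", "year": 1, "score": 4},
    {"department": "Physics", "year": 1, "score": 2},
    {"department": "Chemistry", "year": 3, "score": 5},
]
k = smallest_group_size(records, ["department", "year"])
```

Here the result is 1, because only one record has the combination Chemistry and year 3, so that person could be singled out by anyone who knows the department and year. Passing this check does not make data safe against the inference attacks mentioned above.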

Identifying specific classes of people

As a general matter of research ethics, if the data you are studying may lead to research outcomes affecting an identifiable class of people, an explicit ethical review should be requested. This is more likely to arise in social science or clinical research, but take care if your own research questions are likely to involve technology use by groups that are sensitive in any way. Examples include:

  • genealogy of war criminals
  • politicians and sex offenders
  • estimating genetic potential for disease in an identifiable class of living people

 

References