Features of good-quality data

"The fact exists - your knowledge is only as good as the data collected." (Stratagem Group, 2008)

It is always possible to misinterpret 'good' data. This will, in turn, impact on the information and knowledge that ensues. What is more likely though is that the information and knowledge prove to be faulty because the data collected is itself erroneous.

There are three aspects to consider when looking at the legitimacy of data:

  • accuracy
  • reliability
  • validity


Accuracy

This is a term often linked to quantitative (objective) data measurements. For example, measuring the weight of a commodity such as rice using an appropriate balance would be considered to give accurate measurements. If, however, the balance only measured in 0.5 kilo intervals, forcing you to judge between these intervals, you would not consider your measurements to be accurate.

In the analysis of data relating to pupil attainment and progress, you are almost entirely reliant on decisions made by others. For example, the design of a test used to measure attainment will influence the accuracy of the results. In most cases, you will need to take the accuracy of data produced by external organisations on trust, but be aware that we may attribute precision to quantitative data that is in fact based on complex statistical analysis. A good example of this in education is the comparison of schools' value-added (VA) scores, which are based on a complex statistical analysis of pupils' attainment, measured at different times by different tests.


Reliability

This is "the extent to which a test, a method or a tool gives consistent results across a range of settings, and if used by a range of researchers" (Scaife, 2004). With quantitative data, reliability is often easy to discern. If the balance above gives the same reading every time it is used with the same standard weight, then the reading (data) it gives is reliable. If it does not, then the data is unreliable. However, in the evaluation of school effectiveness, we often have to compare the mean or average of measurements of attainment or progress in a school with national data. In these situations an understanding of the reliability of the comparison is vital.

                          A       B
                          5.4     6.6
                          5.2     4.9
                          5.3     5.3
                          5.1     5.7
                          5.5     4.0
Mean                      5.3     5.3
95% confidence interval   ±0.16   ±0.96

Consider the two sets of numbers in the table above. In it, A and B both have the same mean of 5.3, but which mean is the more reliable?

Common sense immediately suggests that A is the more reliable, because the values in column A only vary between 5.1 and 5.5, while those in column B vary between 4.0 and 6.6.

This intuitive judgement can be expressed mathematically by calculating the 95% confidence interval for each mean.

  • In column A, we are 95% confident that the mean lies between 5.3 + 0.16 = 5.46 and 5.3 – 0.16 = 5.14. 
  • In column B we are 95% confident that the mean lies between 5.3 + 0.96 = 6.26 and 5.3 – 0.96 = 4.34. 
  • If we have another value of, say, 5.48 we would say that it was significantly different from the mean of A but not significantly different from the mean of B.
  • This is because the difference from the mean (ie +0.18) is greater than the 95% confidence interval of mean A but not of mean B.
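The comparison above can be checked with a short calculation. This is a minimal Python sketch, assuming the half-width of the interval is computed with the normal approximation 1.96 × s/√n (a sample standard deviation and a normal critical value); the table's half-widths were presumably computed with a slightly different convention, so the exact figures differ a little, but column A's interval comes out much narrower than column B's either way, and the significance check for the value 5.48 behaves as described in the bullets:

```python
from math import sqrt
from statistics import mean, stdev

def ci95(values):
    """Half-width of an approximate 95% confidence interval for the
    mean: 1.96 * s / sqrt(n), using the sample standard deviation."""
    return 1.96 * stdev(values) / sqrt(len(values))

a = [5.4, 5.2, 5.3, 5.1, 5.5]
b = [6.6, 4.9, 5.3, 5.7, 4.0]

for name, col in (("A", col_a := a), ("B", b)):
    print(f"{name}: mean {mean(col):.1f}, 95% CI ±{ci95(col):.2f}")

# The significance check: is 5.48 further from each mean than the
# half-width of that mean's confidence interval?
print(abs(5.48 - mean(a)) > ci95(a))  # True: significantly different from A
print(abs(5.48 - mean(b)) > ci95(b))  # False: not significantly different from B
```

A value is judged significantly different from a mean only when its distance from the mean exceeds the interval's half-width, which is exactly the reasoning in the bullets above.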

Tests of significance, such as those performed on school data in the RAISEonline report, use the 95% confidence level because that is the level at which differences are conventionally judged to be significant.

Statistical significance is a key concept when reading school performance data. It enables us to identify differences from benchmarks, such as the national averages, that can be confidently taken to be meaningful. However, differences that do not reach the level of statistical significance might still be educationally important because statistical significance is only a convention.

How do you ascertain reliability in terms of qualitative data? The bottom line, in many cases, is that a true answer does not actually exist. That does not mean that reliability has no relevance. Reliability is, after all, "only the probability of obtaining the same results again if the measure were to be duplicated" (Oppenheim, 1998).

Reliability with qualitative data can be tested by a number of means. For example, we can ask the same question in different ways or re-interview with different procedures to get at the same data. Respondents to a questionnaire might say one thing about what goes on in their classrooms but observation of actual practice may reveal something completely different. The reliability of the data obtained in the questionnaire could be questioned, but equally the validity of the original questions could be at fault.

Validity

As an example, let us propose that we ignore national curriculum assessment results as a measure of pupils' ability for the purpose of setting classes.

This may not be as inappropriate as it sounds, and it raises the debate about whether that measure is valid for the purpose proposed. Suppose instead we used pupils' height to organise them into classes. Not a good idea, you might say: height has nothing to do with ability, so it is not a valid measurement for setting classes. Now look at the table below, which shows new AIDS cases reported by the US Centers for Disease Control and Prevention.

Year             1991     1992     1993
Reported cases   43,672   45,472   103,691

US Centers for Disease Control and Prevention


Validity of data

Based on the information in the table above, is it valid to say from this data there was an AIDS epidemic in 1993?

All things being equal (disregarding questions about the definition of what it means to have AIDS, or the data-collection techniques used), the unwary reader might say yes. In fact, you cannot judge validity until those questions have been answered. The reality is that the definition of what it means to have AIDS was expanded, and this accounted for the increase. A shift in our knowledge of AIDS called into question the validity of the data-collection methods.

What about the reliability of the data? Again, it is not possible to say. We do not know how it was collected, by whom, and whether it was confirmed by other means. We may have to make the assumption that it is reliable but this is not the same as accepting that it is reliable. Finally, it is impossible to determine whether the data is accurate, as it is based on complex systems for the collection and aggregation of official government statistics.

To conclude this section, we will return to our national curriculum assessment data. The table below shows the percentage of pupils gaining national curriculum assessment Level 2 or above at age 7, and Level 4 or above at age 11, in maths. We will then ask some more questions.

Year   Maths – Age 7   Maths – Age 11
2000   90              72
2001   91              71
2002   90              73
2003   90              73
2004   90              74
2005   91              75
2006   90              76
2007   90              77

Combination of data from DCSF (2007, 2008a) and OECD (2008)

Questions on the data

How accurate is this data? We know it is totalled from actual SAT results, so in that sense we can assume a degree of accuracy in the numbers.

How reliable is the data? We cannot say with any real certainty. If we had the 95 per cent confidence interval for each annual figure, we could perform statistical tests to evaluate the significance of the differences from year to year. We might assume the data is reliable, but unless we know that all schools treated their pupils in the same way (for example, that no 'practising for the test' took place), we cannot be certain that it is.

How valid is the data? Validity as we have seen is a property of the data so we need to define what it is we want to claim as valid.

Is there a general trend in improvement in results? Yes, there appears to be at age 11 if we assume that the data is reliable. However, to be certain we would need to perform further statistical tests.

Are KS1 pupils more able than KS2 pupils? Ignoring the question of whether these tests indicate anything about ability, the answer is no, because comparing pupils of different ages is like comparing apples and pears.

Did KS2 pupils have a harder paper than KS1 pupils? No, this is a completely subjective statement and cannot be validated simply by looking at this data.

Is it valid to say KS2 teachers are less effective than KS1 teachers? No, because the children are of different ages and the tests were different. We are not comparing like with like.



8: Reflections on data

This section has attempted to highlight the importance of data, its collection, interpretation and limitations.

You might find it helpful at this point to summarise your own thinking on the issues raised so far in relation to data's:

  • importance
  • collection methods
  • interpretation issues
  • limitations