Data analysis


The content for this section is sub-divided into the following areas:

Measures of location	Measures of spread	Other summary statistics	Time series
Quality assurance	Correlation and regression	Estimation

Measures of location

Foundation tier	Higher tier	Notes	Resources
Mean, median and mode for raw data.	Use of change of origin when calculating the mean. Effect on the average of changes in the sample, eg the addition or withdrawal of a sample member.	eg the mean of the numbers 1003, 1005, 1006 and 1009 is equal to 1000 plus the mean of 3, 5, 6 and 9.
Mean, median and mode for discrete frequency distributions. Modal class for grouped frequency distributions. Median for grouped frequency distributions. Mean for grouped frequency distributions.		Graphical methods of obtaining the median will be acceptable. Candidates may make use of a linear change of scale when calculating the mean.
Advantages and disadvantages of each of the three measures of location in a given situation.	Reasoned choice of a measure of location appropriate to the nature of the data and the purpose of the analysis.
	Geometric mean.
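The location calculations in this table can be checked with a short Python sketch. The function names are illustrative and do not come from the specification. Grouped data are assumed to be stored as hypothetical (lower, upper, frequency) tuples. The grouped mean is estimated from class midpoints. The grouped median is estimated by linear interpolation at position n/2 within the median class; some texts use (n + 1)/2, so answers can differ slightly.

```python
import math

def mean_by_change_of_origin(values, origin):
    """Mean via a change of origin: average the deviations from `origin`, then add it back."""
    deviations = [v - origin for v in values]
    return origin + sum(deviations) / len(deviations)

def mean_after_adding(old_mean, n, new_value):
    """Mean of a sample of n values after one extra member joins it."""
    return (old_mean * n + new_value) / (n + 1)

def mean_after_removing(old_mean, n, removed_value):
    """Mean of a sample of n values after one member is withdrawn."""
    return (old_mean * n - removed_value) / (n - 1)

def grouped_mean(classes):
    """Estimated mean of a grouped frequency distribution using class midpoints.

    `classes` is a list of (lower, upper, frequency) tuples (an assumed layout).
    """
    total = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / total

def grouped_median(classes):
    """Estimated median by linear interpolation at position n/2 in the median class."""
    half = sum(f for _, _, f in classes) / 2
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= half:
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("empty distribution")

def geometric_mean(values):
    """The n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))
```

For example, `mean_by_change_of_origin([1003, 1005, 1006, 1009], 1000)` returns 1005.75, which matches the worked example in the notes. The standard library also provides `statistics.geometric_mean`, which computes the same quantity by averaging logarithms.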

Measures of spread

Foundation tier	Higher tier	Notes	Resources
Range.
Quartiles for discrete data. Quartiles and percentiles for grouped frequency distributions.	Deciles.	Graphical methods will be accepted.
Interquartile range for discrete and continuous data.	Interpercentile ranges.
	Variance and standard deviation.	Divisor n. To include grouped frequency distributions. Efficient use of a calculator should be encouraged.
Advantages and disadvantages of each of these measures of spread.
Construction of box and whisker plots. Use of box and whisker plots to identify outliers.		An outlier is defined as an observation less than Q1 - 1.5 (Q3 - Q1) or greater than Q3 + 1.5 (Q3 - Q1), where Q1 and Q3 are the lower and upper quartiles respectively.
	Calculation and interpretation of standardised scores.	Only general interpretation is expected.
Use of tabulated data, diagrams, measures of location and measures of spread to compare data sets.		Use of standardised scores to compare values from different frequency distributions.
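The measures of spread in this table can be sketched in Python as follows. The function names are hypothetical. Quartiles and deciles use the standard library's `statistics.quantiles` with `method="exclusive"`, which follows the (n + 1)p position rule. Other quartile conventions give slightly different values. Variance and standard deviation use divisor n (`pvariance`, `pstdev`), as the notes require. Outliers are flagged with the 1.5 × IQR rule defined in the notes.

```python
import math
from statistics import pstdev, pvariance, quantiles

def quartiles(values):
    """(Q1, median, Q3) using the (n + 1)p position rule."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return q1, q2, q3

def deciles(values):
    """The nine cut points D1..D9, using the same position rule."""
    return quantiles(values, n=10, method="exclusive")

def spread_summary(values):
    q1, q2, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # outlier if below Q1 - 1.5(Q3 - Q1)
    upper_fence = q3 + 1.5 * iqr   # outlier if above Q3 + 1.5(Q3 - Q1)
    return {
        "range": max(values) - min(values),
        "quartiles": (q1, q2, q3),
        "iqr": iqr,
        "variance": pvariance(values),  # divisor n
        "sd": pstdev(values),           # divisor n
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }

def grouped_variance(midpoints, frequencies):
    """Variance (divisor n) of a grouped frequency distribution using class midpoints."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n

def standardised_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd
```

For the data 2, 3, 3, 4, 5, 5, 6, 7, 8, 9, 30, the quartiles are 3, 5 and 8 and the IQR is 5. The value 30 lies above the upper fence of 15.5, so it is flagged as an outlier. Standardised scores can then compare values taken from different distributions, for example a mark of 65 in a test with mean 50 and standard deviation 10 gives a score of 1.5.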

Other summary statistics

Foundation tier

Higher tier

Notes

Resources

Simple index numbers.

Weighted index numbers.

Chain base numbers.

General Index of Retail Prices (RPI).

Weighted index.

Crude rates.

Standardised rates.

For example, birth, death, unemployment.
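The index numbers and rates listed here reduce to simple arithmetic, sketched below in Python. The function names are hypothetical. The weighted index is taken to be a weighted mean of price relatives, which is one common form. Rates are expressed per 1000. The standardised rate uses direct standardisation: each group's rate is weighted by an assumed standard population. The code does not reproduce how the official RPI is compiled.

```python
def simple_index(value, base_value):
    """Value as a percentage of the base-period value."""
    return value * 100 / base_value

def weighted_index(price_relatives, weights):
    """Weighted mean of simple index numbers (price relatives)."""
    return sum(i * w for i, w in zip(price_relatives, weights)) / sum(weights)

def chain_base_indices(values):
    """Each period's value as a percentage of the previous period's value."""
    return [v * 100 / prev for prev, v in zip(values, values[1:])]

def crude_rate(events, population, per=1000):
    """Events (eg births, deaths) per `per` members of the whole population."""
    return events * per / population

def standardised_rate(group_rates, standard_population):
    """Group rates (per 1000) weighted by the group sizes of a standard population."""
    weighted = sum(r * p for r, p in zip(group_rates, standard_population))
    return weighted / sum(standard_population)
```

For example, prices of 200, 220 and 231 in successive years give chain base index numbers of 110 and 105. A town with 45 deaths among 3000 people has a crude death rate of 15 per 1000.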

Time series

Foundation tier

Higher tier

Notes

Resources

Drawing a trend line by eye and using it for prediction.

Evaluating and plotting appropriately chosen moving averages.

Trend lines will not be required to pass through the mean.

Identification of seasonal variation.

Trend line based on moving averages.

Seasonal effect at a given data point.

Average seasonal effect.

Prediction of future values.

Only graphical methods will be expected.
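Only graphical methods are expected, but a numerical sketch in Python shows what the graph represents. Several simplifications are assumptions made for illustration. The moving average has an odd number of points k, so each average sits on a data point; quarterly data with 4-point averages would need centring, which is not handled here. The trend line is a straight line through the first and last moving averages, standing in for a line drawn by eye. All names are hypothetical.

```python
from collections import defaultdict

def moving_averages(values, k):
    """k-point moving averages, each paired with the (0-based) time it is plotted at."""
    return [((2 * i + k - 1) / 2, sum(values[i:i + k]) / k)
            for i in range(len(values) - k + 1)]

def seasonal_effects(values, k, period):
    """Seasonal effect at each data point (actual value minus trend), grouped by season."""
    if k % 2 == 0:
        raise ValueError("this sketch needs an odd k; even k requires centring")
    effects = defaultdict(list)
    for centre, trend in moving_averages(values, k):
        t = int(centre)
        effects[t % period].append(values[t] - trend)
    return effects

def average_seasonal_effects(values, k, period):
    """Mean seasonal effect for each season."""
    effects = seasonal_effects(values, k, period)
    return {s: sum(e) / len(e) for s, e in sorted(effects.items())}

def predict(values, k, period, t):
    """Extend a straight trend line to time t, then add that season's average effect."""
    mas = moving_averages(values, k)
    (t0, y0), (t1, y1) = mas[0], mas[-1]
    slope = (y1 - y0) / (t1 - t0)
    trend = y1 + slope * (t - t1)
    return trend + average_seasonal_effects(values, k, period)[t % period]
```

With three terms per year and the values 12, 11, 10, 15, 14, 13, 18, 17, 16, the 3-point moving averages rise steadily from 11 to 17. The average seasonal effects are +2, 0 and -2, and the predicted next value is 21.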

Quality assurance

Foundation tier	Higher tier	Notes	Resources
	Plotting sample means, medians or ranges over time to check consistency and accuracy against a target value.	To include looking for indications that the process is off target or that its variability is increasing.
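The values plotted on such a chart can be tabulated with a small Python sketch. The acceptable distance from the target (`mean_tolerance`) and the largest acceptable range (`max_range`) are hypothetical thresholds chosen by the user; the specification does not define them. The function name is also illustrative.

```python
from statistics import mean

def monitor(samples, target, mean_tolerance, max_range):
    """Summarise each sample taken over time and flag possible problems.

    off_target: the sample mean is further than mean_tolerance from the target.
    too_variable: the sample range exceeds max_range.
    """
    report = []
    for i, sample in enumerate(samples, start=1):
        m = mean(sample)
        r = max(sample) - min(sample)
        report.append({
            "sample": i,
            "mean": m,
            "range": r,
            "off_target": abs(m - target) > mean_tolerance,
            "too_variable": r > max_range,
        })
    return report
```

Plotting the "mean" and "range" columns against the sample number gives the charts described above. The flags mark points that a reader of the chart would query.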

Correlation and regression

Foundation tier	Higher tier	Notes	Resources
Scatter diagrams. Recognition by eye of positive correlation, negative correlation and lack of correlation.
The distinction between correlation and causality.
	Spearman’s rank correlation coefficient as a measure of agreement; its calculation and the limitations of its interpretation.	Includes the case of tied ranks. Calculations for large samples will not be expected. The formula for Spearman’s rank correlation coefficient will be given.
Fitting a straight line by eye through the mean point of the plotted points on a scatter diagram.	Obtaining the equation of the fitted line in the form y = mx + c; the interpretation of m and c. Non-linear data.	Includes discussion of whether such a straight line is appropriate. A ‘suggested’ relationship will be given.
Interpolation and extrapolation.		Including the dangers of inappropriate extrapolation.
Interpretation of bivariate data presented in the form of a scatter diagram.		Comparison of the degree of correlation between two or more pairs of data sets with reference to scatter diagrams and/or rank correlation coefficients.
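A Python sketch of the correlation and line-fitting steps is given below. The function names are hypothetical. Tied values receive the mean of the ranks they would occupy, and the coefficient is computed with the usual formula 1 - 6Σd²/(n(n² - 1)). When ties are present this formula is an approximation to the exact coefficient. The gradient m is passed in, as if judged by eye. The code then finds c so that the line passes through the mean point (x̄, ȳ).

```python
def average_ranks(values):
    """Rank values 1..n (1 = smallest); tied values share the mean of their positions."""
    order = sorted(values)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1
        count = order.count(v)
        ranks[v] = first + (count - 1) / 2
    return [ranks[v] for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation coefficient, 1 - 6Σd² / (n(n² - 1))."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def line_through_mean(xs, ys, gradient):
    """Return (m, c) so that y = mx + c has the given gradient and passes through (x̄, ȳ)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return gradient, y_bar - gradient * x_bar

def is_extrapolation(xs, x):
    """True when x lies outside the observed range, where predictions are less reliable."""
    return not (min(xs) <= x <= max(xs))
```

For x = 1, 2, 3, 4, 5 and y = 2, 4, 4, 8, 10, the two tied 4s each receive rank 2.5, and the coefficient is 0.975. A line with gradient 2 through the mean point (3, 5.6) has c = -0.4.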

Estimation

Foundation tier

Higher tier

Notes

Resources

Estimation of population mean from a sample.

Estimation of a population proportion from a sample; the use of this method of estimation in opinion polls.

Variability in estimates from different samples and the effect of sample size.

Estimation of population size based on the capture/recapture method.

An elementary quantitative appreciation of appropriate sample size.

Higher tier: eg to include the concept that halving the variability in an estimate requires four times the sample size.
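The estimation ideas in this section are sketched in Python below. The function names are hypothetical. Capture/recapture uses the Lincoln-Petersen estimate N ≈ M × n / m, which assumes the population is closed and that marked and unmarked members are equally likely to be caught. The sample-size result is shown through the standard error σ/√n: multiplying n by four divides the standard error by two. A seeded simulation illustrates how estimates vary from sample to sample.

```python
import math
import random
from statistics import mean, pstdev

def estimate_proportion(sample, condition):
    """Proportion of the sample meeting `condition`, used as an estimate for the population."""
    return sum(1 for x in sample if condition(x)) / len(sample)

def capture_recapture(marked_first, caught_second, marked_in_second):
    """Lincoln-Petersen estimate of population size: N ≈ M * n / m."""
    return marked_first * caught_second / marked_in_second

def standard_error_of_mean(sd, n):
    """Spread of the sample mean: quadrupling n halves this value."""
    return sd / math.sqrt(n)

def spread_of_sample_means(population, n, repeats, seed=0):
    """Standard deviation of the means of `repeats` random samples of size n."""
    rng = random.Random(seed)
    means = [mean(rng.sample(population, n)) for _ in range(repeats)]
    return pstdev(means)
```

For example, suppose 40 animals are marked and released, and a later catch of 50 contains 10 marked animals. The estimated population is then 200. With a population standard deviation of 10, the standard error is 2 for a sample of 25 and 1 for a sample of 100.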