ASTR 3130, Majewski [FALL 2012]. Lecture Notes

## ERROR ANALYSIS, PART II: RANDOM AND SYSTEMATIC ERRORS

REFERENCE:

L. Lyons, A Practical Guide to Data Analysis for Physical Science Students, Chapter 1, pp. 1-15.

### What is error analysis? Why do it?

• We rarely find "truth" by performing an experiment once.
• Instead, we repeat experiments with refined techniques and methods

--> and thereby approach truth asymptotically (or at least gain confidence that we do).
• "ERROR" = Difference between calculated / observed value and truth value.
• Of course, we do not usually know "truth" (or there would be no reason to do experiment).
• So we need to come up with ways to determine from the data themselves how much confidence to have in the results (reliability).
• This is the subject of Error Analysis.
• Error analysis is a required -- and very important -- part of scientific research.
• It's not enough to give the results of an experiment --

we must also give some assessment of the reliability of the result we report.
• This alerts the user of the information to how much trust to put in the result.

• We often do this by actually giving an estimate of the level of unreliability in the experiment.

This is done by quantifying an uncertainty in the result.

• Quoting this uncertainty is an expected part of your stated results, and it is the only way to assess whether a theory is proven or expectations are met.
• We typically give a quantitative assessment of the result of an experiment as a number appended with a "+/-" value that accounts for the uncertainty on the result (in the same units):

e.g., in the form y +/-ε , where ε is the uncertainty.

Sometimes we see the form y +/-ε +/-δ , where ε is the contribution to the uncertainty from random errors and δ is the contribution from systematic errors (see below).

• We also display errors graphically with use of error bars in plots showing our data:

### Example from Lyons:

You do an experiment to measure the acceleration due to gravity and find

g = 9.70 m/s² +/- 0.15 m/s².

Here you are stating that the uncertainty in your result is 0.15 m/s².

Now, the actual known value of g, found after numerous experiments over the centuries, is

g = 9.81 m/s².

What do we make of the discrepancy between these values? Consider three possibilities:

1. Since we said that the uncertainty in our experiment was 0.15 m/s², the two values of g are actually in agreement, given the quoted reliability (uncertainty) of our experiment.

g = 9.70 m/s² +/- 0.15 m/s² is consistent with 9.81 m/s².

2. However, if our quoted uncertainty had been +/-0.01 m/s², then our measured result would be greatly at odds (by 11 times the quoted uncertainty) with the accepted value.

g = 9.70 m/s² +/- 0.01 m/s² is NOT consistent with 9.81 m/s².

In this case, one of several things might be going on:

• We have greatly underestimated the real uncertainty in our experiment.

• We have estimated the uncertainty of our experiment properly, but somehow our experiment was biased with an unexpected systematic error to give a value offset from the norm (see description of systematic errors below).

• We have estimated the uncertainties properly and our experiment is giving a proper value; hence we have made a discovery with a significance of 11 times the uncertainty of our experiment (which is quite large).

(Perhaps the gravity we measured locally has been skewed by local geology beneath the test apparatus or as a result of our altitude.)

3. Let's say that we had evaluated the true uncertainty in our experiment to be +/-5 m/s². Our result is now consistent with the true value of g, but the accuracy of our measurement is so poor that it is incapable of distinguishing even significant differences between what we measure and the real value of g, and our experiment is not really very useful.

g = 9.70 m/s² +/- 5 m/s² is consistent with 9.81 m/s²,

but is consistent with a huge range of other possible values as well.

In this case, getting as close as we did to 9.81 m/s² would have been a matter of good luck, because we might just as well have measured 7.1 or 13.9 m/s².

As this example demonstrates, how we evaluate our experiment greatly depends on the numerical estimate of the uncertainty (accuracy) of our results.

Now consider the above three cases where the quoted uncertainties had been correctly determined (i.e., +/-0.15, +/-0.01 and +/-5 m/s², respectively), but we never cited those uncertainty values:

No one would be able to judge the meaning/significance of your experimental result.
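The comparisons above can be quantified by expressing the discrepancy in units of the quoted uncertainty. A minimal Python sketch using the numbers from the example (this illustration is ours, not from Lyons):

```python
# Discrepancy between a measured value and the accepted value,
# expressed in units of the quoted uncertainty ("number of sigma").
g_accepted = 9.81  # m/s^2
g_measured = 9.70  # m/s^2

for uncertainty in (0.15, 0.01, 5.0):  # the three cases discussed above
    n_sigma = abs(g_measured - g_accepted) / uncertainty
    print(f"+/- {uncertainty}: discrepancy = {n_sigma:.2f} sigma")
```

A discrepancy of about 1 sigma or less (cases 1 and 3) means the results are consistent; the 11-sigma discrepancy of case 2 signals either a problem or a discovery.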

### Types of "Error"

• Illegitimate Errors
• These are outright mistakes or blunders in computation, measurement or recording of a result.
• These are dealt with by redoing the previously erroneous operation correctly.
• This is a source of errors one typically does not discuss in reporting scientific results:

Your colleagues (and professors!) expect that if you are reporting a result to them you are competent enough to do the corresponding computations and measurements correctly and that you have checked your work before reporting them to others!
• Systematic Errors
• Systematic errors can arise from:
• Faulty calibration of equipment.
• Faulty use of equipment.
• Bias of the observer (e.g., how one observer's eyes perceive a star against the reticle in an eyepiece compared to another observer's perception, or how one person reads the needle on a dial compared to another).
• Poor assumptions of how to do experiment properly or how to analyze/interpret the results.
• Systematic errors are usually not easy to detect!
• Have to sense there is a problem (e.g., from theory or by seeing differences between different experimental apparatuses).
• Have to assess experimental conditions or techniques for possible errant influences.
• A key to systematic errors is that they are reproducible:

When the experiment is repeated in the same way, you always get an answer that is offset systematically from truth in the same way.
• We refer to this systematic offset as a bias in the experiment.

• Examples of systematic error:
• HST spherical aberration: Test rig at wrong distance, mirror ground incorrectly.
• Using a metal measuring stick ruled at 25° C at a temperature of 0° C.
• The history of measuring Hubble's constant.
• Ideally systematic errors should be absent, but they should always be checked for.

• A key way to do this is to test your apparatus on something where you know the correct answer.

For example, if you suspect your ohmmeter is measuring resistances incorrectly, you could measure some resistors with known values.

Or to check your wristwatch, you could compare its reported time against that from the atomic clock at the U.S. Naval Observatory.

• Calibration of your experimental apparatus is a key part of experimentation that is intimately related to whether it will bias your results.

For example, if you find your wristwatch is running 10 minutes fast, then to use your watch to obtain accurate time you could either:

• calibrate (i.e. correct) your watch against an accurate timepiece (e.g., the atomic clock), or

• you could remember that your watch is biased to be 10 minutes fast and always account for (i.e., subtract) that bias every time you get the time from your watch.
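The second option - remembering a known bias and always subtracting it - can be sketched in a few lines of Python (the 10-minute offset is the hypothetical one from the example):

```python
# Correct a watch known to run fast by a fixed (systematic) bias.
WATCH_BIAS_MINUTES = 10  # hypothetical: the watch reads 10 minutes ahead

def true_minutes(watch_reading_minutes):
    """Subtract the known bias to recover the accurate time (in minutes past midnight)."""
    return watch_reading_minutes - WATCH_BIAS_MINUTES

# If the watch reads 9:40 (580 minutes past midnight), the true time is 9:30.
print(true_minutes(580))  # -> 570
```

The point is that a systematic error, once measured, can be removed exactly; this is what calibration does.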

• Random Errors (Statistical Errors)
• These are fluctuations in observations that lead to different results each time the same experiment is performed.
• They are irreproducible - you don't get exactly the same answer each time the experiment is performed in the same way.
• Random errors are an indefiniteness coming from either:
• An inability of your measuring device to give infinitely accurate answers.
• Nature: Fluctuations that occur in observations of a small sample drawn from a large population.
• The effect of random errors is to produce a spread of answers around some mean value (which is hopefully near the correct value).

For example, note the spread out distribution of results of repeated experiments shown below; random errors create the spread, and result in the displacement of, e.g., the highlighted measurement from the mean value:

##### From http://www.uiah.fi/projects/metodi/evirheet.gif.
Note that in the case shown, the mean value of the distribution is actually offset from the correct value that should have been obtained, so there is also apparently a systematic error affecting the results of the experiment.

• As we will see below, we can get some idea of the size of random errors affecting our experiment by simply analyzing the spread in the distribution of answers we get when we repeat the experiment.

• Examples of sources of random errors:
• Making mm measurements with a meterstick ruled only in cm increments.
• Counting # of photons received from a weak source in a relatively short time span (e.g., 1 sec).
• While systematic errors are often hard to overcome, we often have an easier time with random errors, which we can do something about:
• Improve precision of equipment.
• Count more events (e.g., expand the time interval to count photons arriving to 100 sec).
• Repeat experiment and average results (the more the better).
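For photon counting, the benefit of a longer interval follows from counting statistics: for N counted events the relative random error scales as 1/sqrt(N) (a standard result we state here without proof; the source rate below is an arbitrary assumed number):

```python
import math

rate = 50.0  # hypothetical source brightness, in photons per second

for t in (1, 100):  # counting intervals in seconds
    n = rate * t                    # expected number of counts
    rel_error = 1.0 / math.sqrt(n)  # fractional (relative) random error
    print(f"{t:>4} s: ~{n:.0f} counts, relative error ~ {100 * rel_error:.1f}%")
```

Expanding the counting interval from 1 s to 100 s collects 100 times more photons and shrinks the relative random error by a factor of 10.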

### Examples: Systematic vs. Random Error

• Thought experiment: Imagine a wristwatch with only hour and minute hands but no second hand and running 2 minutes late.
• In this case, what is the character of the expected systematic and random errors?
• Accuracy vs. Precision - THEY ARE NOT THE SAME!!
• Accuracy - How close experiment comes to true value.

Has primarily to do with Systematic Errors.
• Precision - How exactly the result is determined, without reference to what the result means or how it compares to "truth".

Has primarily to do with Random Errors.
• Example distributions of measurements in an experiment (where x0 denotes the "truth value sought"):

(a) This distribution shows primarily random errors (i.e. the mean value of many trials of the experiment gives a result that is accurate, but the individual measures themselves are not very precise).

(b) This distribution shows primarily systematic errors (i.e. the experiment is able to give very precise results but they are not very accurate).

(c) This distribution shows both random & systematic errors (the experiment is giving results that are neither precise nor accurate).

• Example: The Hubble Space Telescope mirror was ground very precisely but to an inaccurate shape. This resembles distribution (b) above.

• Here is another example -- shots fired at a target -- that vividly demonstrates the concepts of precision, accuracy and bias:

##### From http://www.behav.org/ecol/wildlife/w_05_design.htm.
We often want to describe the precision of an experiment or a piece of equipment. This can be done in several ways:

• Absolute Precision
• Magnitude of uncertainty given in same units as result.

e.g. 30 km/s +/- 3 km/s
• Relative Precision
• This is a fractional uncertainty, often given as a percentage.

e.g. (3 km/s) / (30 km/s) = 10 % error
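Converting between absolute and relative precision is a one-line calculation; using the example numbers above:

```python
value = 30.0           # km/s, the measured result
abs_uncertainty = 3.0  # km/s, absolute precision (same units as the result)

rel_uncertainty = abs_uncertainty / value  # relative (fractional) precision
print(f"{100 * rel_uncertainty:.0f}% error")  # -> 10% error
```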

From Lyons: A good experimentalist is one who minimizes and realistically estimates the random errors of his apparatus, while reducing the effect of systematic errors [as much as possible].

#### Estimating the Truth - Simple Cases

• In doing a measurement or experiment we hope to approach the truth value asymptotically (we'll call that value μ) by beating down random errors (and hoping systematics are not present).
• Normally we repeat experiments and get a distribution of results xi meant to get at the value of μ (like the distributions we have shown above).
• In error analysis, we use the xi to estimate truth value and to assess the reliability of the result.
• An estimator of the truth value is also called the "expectation" of the true value.
• There are often a variety of estimators possible to give an expectation of the truth.

• A simple estimator you know is the mean value of the N measures: x̄ = (x1 + x2 + ... + xN) / N = (1/N) Σ xi.
• The hope is that we can use averaging to approach the truth: as N grows, x̄ approaches μ.
• But the mean value of the many trials of an experiment may not be a good estimator if the distribution of xi is not symmetric.
• For example, look at the mean value of this distribution, which is affected by the presence of one experiment trial that gave a very high ("outlier") result:

(NOTE: There are ways to deal with outliers during averaging processes by iteratively identifying and "throwing out" outliers.)

• A more reliable assessment of the truth might be the "mode" (the most common xi value) --- indicated below by xmax.
• BUT - The precision of this estimator depends on the "bin size" of how you "histogram" the data, because this sets the "resolution" spacing of xi.
• That is, the precision in the modal value can be no finer than the selected bin size.

• Note that the size of bins does have some influence in determining the modal value, e.g.:

• One might want to make the bins narrow to improve resolution, but making the bins so fine that all bins have either 1 or 0 counts in them is of no use.

• Using the mode as an estimator requires both a large number of measures and an appropriate "bin-size" to overcome random fluctuations that could make any particular bin the "highest" by chance.
• The "median" is an estimator that is more robust to outliers and is useful when you don't have a very large number of xi.

• The median is defined as a value (x1/2) for which half of observations/measures lie above and half below:
• Example of mean, median and mode for a particular distribution x:
• xmax, x1/2, and x̄ can each be used as an estimator of μ, but each is best used under different circumstances:
• x̄: The mean is useful when you have small N (few measures).
• xmax: The mode is useful when you have very many measures.
• x1/2: The median is useful when you have a moderate number of measures (say, N >~ 10).
• Also when it is apparent that there are "outliers" in the distribution.

But these rules are not firm.
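The behavior of these estimators is easy to compare on a toy data set. A sketch using Python's statistics module, with one hypothetical outlier trial like the one pictured above (the numbers are made up for illustration):

```python
import statistics

# Eight hypothetical trials of an experiment; the last one is an outlier.
x = [9.6, 9.7, 9.7, 9.8, 9.7, 9.8, 9.6, 14.0]

print(statistics.mean(x))    # pulled well away from the bulk by the outlier
print(statistics.median(x))  # robust: half the measures above, half below
print(statistics.mode(x))    # most common value (sensitive to how finely values are "binned")
```

Here the mean lands above every measure but the outlier, while the median and mode stay with the bulk of the trials.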

#### Characterizing the Uncertainty in the Truth

Note, the word "error" is sometimes used for the word "uncertainty".

• Some scientists abhor this usage, but MANY others (like me!) were raised routinely using the word "error" to mean "uncertainty".

• As John Taylor says in his An Introduction to Error Analysis, the words error and uncertainty can be treated as interchangeable in the current context.

Characterizing the "width" of the distribution leads to an estimation of the "uncertainty" in μ.

There are various ways of doing this.

Assume that our preferred estimator of μ is the mean value of our multiple experiments:

• Deviation (for one measure): di = xi - x̄
• Mean Deviation: (1/N) Σ |xi - x̄|
• Variance (easier to calculate): σ² = (1/N) Σ (xi - x̄)²
• Standard Deviation = "Root Mean Square": σ = sqrt[ (1/N) Σ (xi - x̄)² ]
• NOTE - when N is not large, one should use the Sample Standard Deviation: s = sqrt[ (1/(N-1)) Σ (xi - x̄)² ]
• This is because the true standard deviation should have been measured against the truth value μ, whereas we do not know μ and are measuring a dispersion against an estimator of μ, namely the mean value of x.

In this case, the number of degrees of freedom (NDOF) in the situation has been lowered by 1, leaving only N - 1 as our NDOF.

(Think about it this way: if N = 1, you would estimate μ = x1 and would find an unrealistic σ = 0....

... at least with the s definition above, the standard deviation is indeterminate.)

• Note that as N approaches infinity, the sample standard deviation approaches the standard deviation, since 1 / (N - 1) approaches 1 / N.
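Both definitions are available in Python's statistics module, which makes the N vs. N - 1 distinction easy to check on a small hypothetical data set:

```python
import statistics

x = [9.6, 9.8, 9.7, 9.9, 9.5]  # five hypothetical trials (mean = 9.7)

sigma = statistics.pstdev(x)  # "population" form: divides by N
s = statistics.stdev(x)       # sample form: divides by N - 1

print(sigma, s)  # s is always a bit larger; the two converge as N grows
```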

It is most customary to quote the random errors of an experiment as gauged by the standard deviation or the sample standard deviation.

• When you see or give values in a form such as X +/- Y, generally the Y value (the error or uncertainty) is the standard deviation or sample standard deviation. Unless stated explicitly otherwise, it is typical to assume this is what is meant.
• Under these circumstances, the Y value is often referred to as the "1-sigma" (1σ) error (the reason for this will become evident in another lecture).
• In plots of the values X, it is customary to plot error bars that are the size of "1-σ"; that is, the error bar extends 1-σ from X up to X+Y, and 1-σ down to X-Y.
• On occasion you may find people giving other-sized error bars (e.g., "2-σ"), but unless explicitly stated, generally 1-σ are assumed.

#### A Final Comment on Characterizing the Spread in a Distribution

In the above discussions we have assumed that the spreads in the distributions of results from multiple trials of the same experiment arose completely from random errors.

• In this case, the spreads we have calculated are giving us an estimation of the uncertainty in our estimator of the truth value.

Then we can say that the values of σ or s are the uncertainty in the mean, for example.

But sometimes we are measuring something that has an intrinsic spread in values, e.g., the height of students taking ASTR 3130.

• In this case, because not everyone in the class has the same height, we will measure a spread of values if we measure the height of everyone in the class, even if those measurements are infinitely precise.

• The measured spread will have contributions not only from the intrinsic spread of people in the class (which is relatively large), but also due to errors in our ability to measure the height of everyone in the class accurately (which is presumably much smaller).

• It is important to remember that in the case where there is a true intrinsic spread in what we are measuring, then the values of σ or s are not a measure of the uncertainty in the mean, but are measures (mostly) of the intrinsic spread of heights.

The error in the mean will actually be much smaller than σ or s, and this is a more useful descriptor of the uncertainty in the result (if the result is to determine the mean value of the experiment and not its intrinsic spread).

• One can prove that this is the case by thinking about what would happen to our value of σ or s if we substantially increased the number of students in the class (say by a factor of 10 or 100).

The spread of heights (σ or s) will be relatively unchanged, even though we know we should be getting a better estimate of the mean height of someone of college age as N substantially increases.
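This thought experiment is easy to simulate. The sketch below draws hypothetical student heights from a normal distribution (the 170 cm mean and 10 cm intrinsic spread are made-up numbers) and shows that the measured spread s stays near 10 cm as N grows, even as the mean is pinned down better and better:

```python
import random
import statistics

random.seed(42)  # make the demonstration reproducible

def simulated_heights(n):
    """n hypothetical student heights (cm) with a 10 cm intrinsic spread."""
    return [random.gauss(170.0, 10.0) for _ in range(n)]

for n in (10, 100, 10000):
    h = simulated_heights(n)
    print(f"N = {n:>5}: s = {statistics.stdev(h):5.2f} cm, "
          f"mean = {statistics.mean(h):6.2f} cm")
```

However many students we measure, s hovers near the intrinsic 10 cm; it is the estimate of the mean, not the spread, that improves with N.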

• We will discuss in a future lecture how to disentangle from the measured spread the contribution caused by errors in measurement and the contribution from the intrinsic spread.

Random & systematic error figure from Lyons Practical Guide to Data Analysis for Physical Science Students; 1991: Cambridge University Press; Cambridge. All other material copyright © 2002,2006,2008,2012 Steven R. Majewski. All rights reserved. These notes are intended for the private, noncommercial use of students enrolled in Astronomy 313 and Astronomy 3130 at the University of Virginia.