ASTR 3130, Majewski [FALL 2012]. Lecture Notes
ERROR ANALYSIS, PART II: RANDOM AND SYSTEMATIC ERRORS
REFERENCE:
L. Lyons, A Practical Guide to Data Analysis for Physical Science Students,
Chapter 1, pp. 1-15.
What is error analysis? Why do it?
 We rarely find "truth" by performing an experiment once.
 Repeat experiments with refined techniques and methods
→ Approach truth asymptotically (or at least we have confidence that we do).
 "ERROR" = Difference between calculated / observed value and truth value.
 Of course, we do not usually know "truth" (or there would be no reason
to do experiment).
 So we need to come up with ways to determine from the data
themselves how much confidence to have in the results (reliability).
 This is the subject of Error Analysis.
 Error analysis is a required (and very important) part of scientific research.
 It is not enough to give the result of an experiment;
we must also give some assessment of the reliability of that
result.
This alerts the user of the information to how much trust to put in the
result.
 We often do this by giving an estimate of
the level of unreliability in the experiment.
This is done by quantifying an uncertainty in the result.
 This expected part of your stated results is the only
way to assess whether a theory is proven or expectations are met.
 We typically give a quantitative assessment of the result of an experiment
as a number appended with a "±" value that accounts for the uncertainty on the result (in the same
units):
e.g., in the form y ± ε, where ε is the uncertainty.
Sometimes we see the form y ± ε ± δ, where ε is the contribution
to the uncertainty from random errors and δ is the contribution from systematic errors
(see below).
 We also display errors graphically with use of error bars in plots showing our data:
Examples of the use of error bars in plotting data.
From http://www.upscale.utoronto.ca/PVB/Harrison/ErrorAnalysis/Graphical.html and
http://support.dundas.com/OnlineDocumentation/WebChart2005/ChartType_Images/ErrorBarsChart.png.
Example from Lyons:
You do an experiment to measure the acceleration due to gravity and find
g = 9.70 m/s^{2} ± 0.15 m/s^{2}.
Here you are stating that the uncertainty in your result is 0.15 m/s^{2}.
Now, the actual known value of g found after numerous experiments over the centuries, is
g = 9.81 m/s^{2}.
What do we make of the discrepancy between these values? Consider three possibilities:
 Since we said that our uncertainty in our experiment was 0.15 m/s^{2}, the
two values of g are actually in agreement, given the quoted reliability (uncertainty)
in our experiment.
g = 9.70 m/s^{2} ± 0.15 m/s^{2} is consistent with 9.81 m/s^{2}.
 However, if our quoted uncertainty had been ±0.01 m/s^{2}, then our measured
result would be greatly at odds (by 11 times the quoted uncertainty in our experiment)
with the accepted value.
g = 9.70 m/s^{2} ± 0.01 m/s^{2} is NOT consistent with 9.81 m/s^{2}.
In this case, one of several things might be going on:
 We have greatly underestimated the real uncertainty in our experiment.
 We have estimated the uncertainty of our experiment properly, but somehow our
experiment was biased with an unexpected systematic error
to give a value offset from the norm (see description of systematic errors below).
 We have estimated the uncertainties properly and our experiment is giving a proper
value; hence we have made a discovery with a significance of 11 times the uncertainty
in our experiment (which is quite large).
(Perhaps the gravity we measured locally has been skewed by local geology
beneath the test apparatus or as a result of our altitude.)
 Let's say that we had evaluated the true uncertainty in our experiment to be ±5 m/s^{2}.
Our result is now consistent with the true value of g, but the accuracy of our measurement
is so poor that it is incapable of really distinguishing even significant differences between what
we measure and the real value of g, and our experiment is not really very useful.
g = 9.70 m/s^{2} ± 5 m/s^{2} is consistent with 9.81 m/s^{2},
but is consistent with a huge range of other possible values as well.
In this case, getting as close as we did to 9.81 m/s^{2} would have been good luck, because
we might just as well have measured 7.1 or 13.9 m/s^{2}.
As this example demonstrates,
how we evaluate our experiment greatly depends on the numerical estimate of the uncertainty
(accuracy) of our results.
Now consider the above three cases where the quoted uncertainties had been correctly determined
(i.e., ±0.15, ±0.01 and ±5 m/s^{2}, respectively), but we
never cited those uncertainty values:
No one would be able to judge the meaning/significance of your
experimental result.
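The reasoning in the g example can be sketched numerically: the discrepancy between a measured value and an accepted value, expressed in units of the quoted uncertainty, is what tells us whether the two are consistent. A minimal sketch (the 2σ consistency threshold is an illustrative choice, not from the notes):

```python
# Compare a measurement (value, uncertainty) against an accepted value by
# expressing the discrepancy in units of the quoted uncertainty.
def discrepancy_in_sigma(measured, uncertainty, accepted):
    return abs(measured - accepted) / uncertainty

g_true = 9.81
for unc in (0.15, 0.01, 5.0):   # the three quoted uncertainties from the example
    n_sigma = discrepancy_in_sigma(9.70, unc, g_true)
    # A common (illustrative) rule of thumb: call it consistent if within ~2 sigma
    verdict = "consistent" if n_sigma < 2 else "inconsistent"
    print(f"+/- {unc}: discrepancy = {n_sigma:.1f} sigma -> {verdict}")
```

The middle case reproduces the "11 times the quoted uncertainty" discrepancy discussed above.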
Types of "Error"
 Illegitimate Errors
 These are outright mistakes or blunders in computation, measurement or recording of a result.
 These are dealt with by redoing the erroneous operation correctly.
 This is a source of errors one typically does not discuss in reporting scientific results:
Your colleagues (and professors!) expect that if you are reporting a result to them you are competent
enough to do the corresponding computations and measurements correctly and that you have checked
your work before reporting them to others!
 Systematic Errors
 Systematic errors can arise from:
 Faulty calibration of equipment.
 Faulty use of equipment.
 Bias of the observer (e.g., how one observer's eyes perceive a star against the reticle in an
eyepiece compared to the perception of another observer, or how one person reads the needle on a dial
compared to another).
 Poor assumptions about how to do the experiment properly or how to analyze/interpret
the results.
 Systematic errors are usually not easy to detect!
 Have to sense there is a problem (e.g., from theory or by seeing differences between
different experimental apparatus).
 Have to assess experimental conditions or techniques for possible errant influences.
 A key to systematic errors is that they are reproducible:
When the experiment is repeated in the same way, you always get
an answer that is offset systematically from truth in the same way.
 We refer to this systematic offset as a bias in the experiment.
 Examples of systematic error:
 HST spherical aberration: Test rig at wrong
distance, mirror ground incorrectly.
 Using a metal measuring stick ruled at 25^{o} C
at a temperature of 0^{o} C.
 The history of measuring Hubble's constant.
 Ideally systematic errors should be absent, but they should always be checked for.
 A key way to do this is to test your apparatus on something where
you know the correct answer.
For example, if you suspect your ohmmeter is measuring resistances incorrectly,
you could measure some resistors with known values.
Or to check your wristwatch,
you could compare its reported time against that from the atomic clock at the U.S. Naval Observatory.
 Calibration of your experimental apparatus is a key part of experimentation
that is intimately related to whether it will bias your results.
For example, if you find your wristwatch is running 10 minutes fast, then to use your
watch to obtain accurate time you could either:
 calibrate (i.e. correct) your watch against an accurate timepiece (e.g.,
the atomic clock), or
 you could remember that your watch is biased to be 10 minutes fast and
always account for (i.e., subtract) that bias every time you get the time from your watch.
 Random Errors (Statistical Errors)
 These are fluctuations in observations that lead to different
results each time the same experiment is performed.
 These are irreproducible: you don't get exactly the same answer each time
the experiment is performed in the same way.
 Random errors are an indefiniteness coming from either:
 An inability of your measuring device to give
infinitely accurate answers.
 Nature: Fluctuations that occur in observations of a
small sample drawn from a large population.
 The effect of random errors is to produce a spread of answers around some mean
value (which is hopefully near the correct value).
For example, note the spread-out distribution of results of repeated experiments shown below;
random errors create the spread, and result in the displacement of, e.g., the
highlighted measurement from the mean value:
From http://www.uiah.fi/projects/metodi/evirheet.gif.
Note that in the case shown, the mean value of the distribution is actually offset from
the correct value that should have been obtained, so there is apparently also
a systematic error affecting the results of the experiment.
 As we will see below, we can get some idea of the size of random errors
affecting our experiment by simply analyzing the spread in the distribution
of answers we get when we repeat the experiment.
 Examples of sources of random errors:
 Making mm measurements with a meterstick ruled only in cm increments.
 Counting # of photons received from a weak source in a relatively
short time span (e.g., 1 sec).
 While systematic errors are often hard to overcome, we often have
an easier time with random errors, which we can do something
about:
 Improve precision of equipment.
 Count more events (e.g., expand the time interval to count photons arriving to 100 sec).
 Repeat experiment and average results (the more the better).
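The photon-counting example follows the Poisson counting rule, under which the uncertainty on N counted events is about sqrt(N), so the relative error shrinks as 1/sqrt(N) when we count longer. A quick sketch (the count rate of 50 photons/s is an assumed number for illustration):

```python
import math

# Poisson counting: the uncertainty on N counted events is ~ sqrt(N),
# so the relative (fractional) error is sqrt(N)/N = 1/sqrt(N).
def relative_count_error(rate_per_s, exposure_s):
    n = rate_per_s * exposure_s      # expected number of counted photons
    return math.sqrt(n) / n          # fractional uncertainty

rate = 50.0                              # assumed photon rate (photons/s)
short = relative_count_error(rate, 1)    # 1-second count
long = relative_count_error(rate, 100)   # 100-second count
print(f"1 s:   {short:.1%} relative error")
print(f"100 s: {long:.1%} relative error")  # 10x smaller: a sqrt(100) improvement
```

This is why expanding the counting interval from 1 sec to 100 sec, as suggested above, buys a factor-of-ten improvement in relative precision.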
Examples: Systematic vs. Random Error
 Thought experiment: Imagine a wristwatch with only hour and minute hands but no
second hand and running 2 minutes late.
 In this case, what is the character of the expected
systematic and random errors?
 Accuracy vs. Precision: THEY ARE NOT THE SAME!!
 Accuracy: How close the experiment comes to the true value.
Has primarily to do with Systematic Errors.
 Precision: How exactly the result is determined, without
reference to what the result means or how it compares to "truth".
Has primarily to do with Random Errors.
 Example distributions of measurements in an experiment
(where x_{0} denotes the "truth value sought"):
(a) This distribution shows random errors
(i.e. the mean value of many trials of the experiment gives a
result that is accurate but the individual measures themselves are not very precise).
(b) This distribution shows primarily systematic errors
(i.e. the experiment is able to give very precise results
but they are not very accurate).
(c) This distribution shows both random &
systematic errors (the experiment is giving results that are neither precise nor accurate).
 Example: The Hubble Space Telescope mirror was ground very precisely
but to an inaccurate shape. This resembles distribution (b) above.
 Here is another example, shots fired at a target, that vividly demonstrates the concepts of
precision, accuracy and bias:
From http://www.behav.org/ecol/wildlife/w_05_design.htm.
We often want to describe the precision of an experiment or a piece of equipment. This can be
done in several ways:
 Absolute Precision
 Magnitude of uncertainty given in same units as result.
e.g. 30 km/s ± 3 km/s
 Relative Precision
This is a fractional uncertainty, often given as
a percentage.
e.g. (3 km/s) / (30 km/s) = 10 % error
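Absolute and relative precision are straightforward to convert between; a small sketch using the 30 km/s example:

```python
# Convert between absolute and relative precision for a quoted result.
def relative_precision(value, absolute_uncertainty):
    return absolute_uncertainty / abs(value)   # fractional uncertainty

def absolute_precision(value, relative_uncertainty):
    return abs(value) * relative_uncertainty   # back to the result's own units

v, unc = 30.0, 3.0                             # 30 km/s +/- 3 km/s
frac = relative_precision(v, unc)
print(f"{v} km/s +/- {unc} km/s -> {frac:.0%} relative error")  # 10%
print(f"{absolute_precision(v, frac)} km/s")                    # back to ~3 km/s
```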
From Lyons: A good experimentalist is one who minimizes and realistically estimates the random
errors of his apparatus, while reducing the effect of systematic errors [as much as possible].
Estimating the Truth: Simple Cases
 In doing a measurement or experiment we hope to asymptote to truth value (we'll call that μ) by beating down random errors (and hoping systematics are not present).
 Normally we repeat experiments and get a distribution of results
x_{i} meant to get at the value of μ (like the distributions we have shown above).
 In error analysis, we use the x_{i} to estimate truth
value and to assess the reliability of the result.
 An estimator of the truth value is also called the "expectation" of the true
value.
 There are often a variety of estimators possible to give an expectation
of the truth.
 A simple estimator you know is the mean value:
x̄ = (1/N) Σ_{i=1}^{N} x_{i}
 The hope is that we can use averaging to approach the truth:
x̄ → μ as N → ∞
 But the mean value of the many trials of an experiment
may not be a good estimator if the distribution x_{i} is not
symmetric.
For example, look at the mean value of this distribution, which is affected by the
presence of one experiment trial that gave a very high ("outlier") result:
(NOTE: There are ways to deal with outliers during averaging processes
by iteratively identifying and "throwing out" outliers.)
 A more reliable assessment of the truth might be the "mode" (the most common
x_{i} value)  indicated below by x_{max}.
 BUT  The precision of this estimator depends on the
"bin size" of how you "histogram" the data, because this sets
the "resolution" spacing of x_{i}.
That is, the precision in the modal value can be no finer than the selected bin size.
 Note that the size of bins does have some influence in determining the modal
value, e.g.:
 One might want to make the bins narrow to improve resolution, but
making the bins so fine that all bins have either 1 or 0 counts in them is
of no use.
 Using the mode as an estimator requires both a large number of measures and an
appropriate "binsize" to overcome random fluctuations
that could make any particular bin the "highest" by chance.
 The "median" is an estimator that is more robust to outliers and is useful when
you don't have a very large number of x_{i}.
 The median is defined as a value (x_{1/2}) for which
half of observations/measures lie above and half below:
 Example of mean, median and mode for a particular distribution x:
 x̄, x_{max}, and x_{1/2} can each be used as an
estimator of μ, but each is best used under different circumstances:
 x̄: Useful when you have small N (few measures).
 x_{max}: Useful when you have very many measures.
 x_{1/2}: The median is useful when you have a moderate
number of measures (say, N >~ 10),
and also when it is apparent that there are "outliers" in the distribution.
But these rules are not firm.
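These three estimators are easy to compare on a small sample using Python's standard statistics module (the data values are invented for illustration; note that the mode here is taken over the raw values rather than over histogram bins):

```python
import statistics

# A small set of repeated measurements with one high outlier.
x = [10.1, 10.2, 10.2, 10.3, 10.2, 10.4, 10.1, 13.0]

mean = statistics.mean(x)      # sensitive to the outlier
median = statistics.median(x)  # robust: half the values lie on each side
mode = statistics.mode(x)      # most common value (needs repeats to be meaningful)

print(f"mean   = {mean:.4f}")  # dragged upward by the 13.0 outlier
print(f"median = {median}")
print(f"mode   = {mode}")
```

Here the median and mode agree with the bulk of the measurements, while the mean is pulled toward the outlier, as in the distribution shown earlier.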
Characterizing the Uncertainty in the Truth
Note, the word "error" is sometimes used for the word "uncertainty".
 Some scientists abhor this usage, but MANY others (like me!) were
raised routinely using the word "error" to mean "uncertainty".
 As John Taylor says in his An Introduction to Error Analysis,
the words error and uncertainty
can be treated as interchangeable in the current context.
Characterizing the "width" of the distribution leads to an estimation of the
"uncertainty" in μ.
There are various ways of doing this.
Assume that our preferred estimator of
μ is the mean value of our multiple experiments:
 Deviation (for one measure): d_{i} = x_{i} - x̄
 Mean Deviation: (1/N) Σ_{i=1}^{N} |x_{i} - x̄|
 Variance (easier to calculate): σ^{2} = (1/N) Σ_{i=1}^{N} (x_{i} - x̄)^{2}
 Standard Deviation = "Root Mean Square": σ = sqrt[ (1/N) Σ_{i=1}^{N} (x_{i} - x̄)^{2} ]
 NOTE: when N is not large, one should use the
Sample Standard Deviation: s = sqrt[ (1/(N-1)) Σ_{i=1}^{N} (x_{i} - x̄)^{2} ]
This is because the true standard deviation should have
been measured against the truth value μ, whereas we do not know μ and
are measuring a dispersion against an estimator of μ, namely the mean
value of x.
In this case, the number of degrees of freedom (NDOF) in the situation
has been lowered by 1, leaving only N - 1 as our NDOF.
(Think about it this way: if N = 1, you would set the estimator of μ equal to x_{1}
and would find an unrealistic σ = 0...
... at least with the s definition above, the standard deviation is indeterminate.)
 Note that as N approaches infinity, the sample standard deviation approaches the standard
deviation, since 1/(N - 1) approaches 1/N.
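These formulas map directly onto Python's standard statistics module: pstdev implements σ (dividing by N) and stdev implements the sample standard deviation s (dividing by N - 1). A quick sketch with invented measurements:

```python
import math
import statistics

x = [9.6, 9.7, 9.8, 9.7, 9.9]
mean = statistics.mean(x)

# sigma: divide the summed squared deviations by N
sigma = math.sqrt(sum((xi - mean) ** 2 for xi in x) / len(x))
# s: divide by N - 1 (the sample standard deviation)
s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (len(x) - 1))

assert abs(sigma - statistics.pstdev(x)) < 1e-12  # population form
assert abs(s - statistics.stdev(x)) < 1e-12       # sample form
print(f"sigma = {sigma:.4f}, s = {s:.4f}")        # s > sigma for small N
```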
It is most customary to quote the random errors of an experiment as
gauged by the standard deviation or the sample standard deviation.
 When you see or give values in a form such as X ± Y, generally
the Y value (the error or uncertainty)
is the standard deviation or sample standard deviation.
Unless stated explicitly otherwise, it is typical to assume this is
what is meant.
 Under these circumstances, the Y value is often referred to as
the "1-sigma" (1σ) error (the reason for this will become evident in another lecture).
 In plots of the values X, it is customary to plot error bars
that are the size of "1σ"; that is, the error bar extends 1σ from
X up to X+Y, and 1σ down to X-Y.
 On occasion you may find people giving other-sized error bars (e.g.,
"2σ"), but unless explicitly stated, 1σ is generally assumed.
A Final Comment on Characterizing the Spread in a Distribution
In the above discussions we have assumed that the spreads in the distributions of results
from multiple trials of the same experiment arose completely from random errors.
 In this case, the spreads we have calculated are giving us an estimation of the uncertainty
in our estimator of the truth value.
Then we can say that the values of σ or s are the uncertainty in the mean, for example.
But sometimes we are measuring something that has an intrinsic spread in values, e.g., the height
of students taking ASTR 3130.
 In this case, because not everyone in the class has the same height, we will measure
a spread of values if we measure the height of everyone in the class, even if those
measurements are infinitely precise.
 The measured spread will have contributions not only from the intrinsic spread
of people in the class (which is relatively large), but also due to errors in our ability to
measure the height of everyone in the class accurately (which is presumably much smaller).
 It is important to remember that in the case where there is a true intrinsic spread
in what we are measuring, then the values of σ or s are not a measure of
the uncertainty in the mean, but are measures (mostly) of the intrinsic spread of heights.
The error in the mean will actually be much smaller than σ or s, and it is a more useful
descriptor of the uncertainty in the result (if the goal is to
determine the mean value of the experiment and not its intrinsic spread).
 One can prove that this is the case by thinking about what would happen to our
value of σ or s if we substantially increased the number of students in the
class (say by a factor of 10 or 100).
The spread of heights (σ or s) will be relatively unchanged, even though we know we should be
getting a better estimate of the mean height of someone of college age as N substantially
increases.
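The thought experiment can be simulated: draw heights from a fixed population, increase N, and watch the sample standard deviation stay roughly constant while the error in the mean shrinks. A sketch with invented population numbers (mean 175 cm, spread 10 cm), using s/sqrt(N) as the standard error of the mean, a formula assumed here and derived in a later lecture:

```python
import random
import statistics

random.seed(42)  # reproducible draws

# Assumed population: heights ~ Gaussian(175 cm, 10 cm)
def sample_heights(n):
    return [random.gauss(175, 10) for _ in range(n)]

for n in (100, 10000):
    h = sample_heights(n)
    s = statistics.stdev(h)   # stays near the intrinsic 10 cm spread
    sem = s / n ** 0.5        # standard error of the mean: shrinks as 1/sqrt(N)
    print(f"N={n:5d}: spread s = {s:.2f} cm, error in mean = {sem:.3f} cm")
```

Growing the class by a factor of 100 leaves s essentially unchanged but shrinks the error in the mean by a factor of ten, exactly the behavior argued above.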
 We will discuss in a future lecture how to disentangle from the measured spread the contribution
caused by errors in measurement and the contribution from the intrinsic spread.
Random & systematic error figure from Lyons Practical Guide to
Data Analysis for Physical Science Students; 1991: Cambridge
University Press; Cambridge. All other material
copyright © 2002,2006,2008,2012 Steven R.
Majewski. All rights reserved. These notes are intended for the private,
noncommercial use of students enrolled in Astronomy 313 and Astronomy 3130 at the
University of Virginia.
