ASTRONOMY 3130: Guide to Making Proper Scatterplots
This webpage has borrowed figures and content from the following websites, which are also
In this class, and very often in a scientific career, we will need to
understand how various measurements vary as a function of some other variable.
The goal is typically to see whether two properties are correlated, revealing
a dependency that demonstrates a physical connection.
This is best done by plotting the data in what is called a "scatterplot",
which shows how one variable varies as a function of the other.
Once a pattern emerges that suggests a correlation/dependency, it is typical
to look for a fit of the trend that perhaps reveals a fundamental
This webpage lays out the standard expectations for doing a proper scatterplot.
I will expect the conventions laid out here to be followed in your lab reports.
I. BASIC RULES FOR PLOTTING SCATTERPLOTS
You should ensure that you follow these basic rules when plotting a graph:
- The independent variable is always plotted along the abscissa and the dependent
variable is plotted on the ordinate.
Be clear on the definitions of the above terms:
- The independent variable is the variable that you control and adjust.
For example, in Lab 1 this could be the magnification, which you are
changing from high to low as you change eyepieces from short to large
- The dependent variable is the variable that you don't know before the
experiment begins and is the one you are measuring.
For example, in Lab 1 this would be the fields of view that you measure for each
If x is the independent variable, the dependent variable is f(x).
Examples of the proper orientation of variables on a plot: The dependent variable is
on the ordinate (y-axis). From
- If you have ordered pairs of points to plot, the abscissa is the
element of the ordered pair that corresponds to the horizontal axis
(sometimes called the "x-axis").
- When you have ordered pairs of points to plot, the ordinate is the
element of the ordered pair that is plotted along the vertical axis (sometimes called the
"y-axis").
Plot numerical scales along each axis.
Your plot should have regularly spaced tickmarks and numerical scales along each axis. You do not have
to give a number for each tickmark, but you should give enough numbers that the reader
can easily interpret the plot.
Often one can make neater plots by having your "major" tick mark intervals
subdivided into "minor" tick mark intervals; one should only label
the major tickmarks.
Use of major and minor tick marks.
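As a minimal sketch of major and minor tick marks, assuming Python with matplotlib (this page does not prescribe a plotting package, and the data values and tick spacings below are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

# Invented example data (not from any lab)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fig, ax = plt.subplots()
ax.plot(x, y, "o")  # points only, no connecting lines

# Major ticks: every 1 unit in x and every 2 units in y; these carry the numbers
ax.xaxis.set_major_locator(MultipleLocator(1.0))
ax.yaxis.set_major_locator(MultipleLocator(2.0))

# Minor ticks subdivide each major interval; matplotlib leaves them unlabeled by default
ax.xaxis.set_minor_locator(MultipleLocator(0.2))
ax.yaxis.set_minor_locator(MultipleLocator(0.5))

# Draw the major ticks longer than the minor ticks so they are easy to tell apart
ax.tick_params(which="major", length=7)
ax.tick_params(which="minor", length=3)

fig.canvas.draw()  # force the figure to render
```

The spacings passed to `MultipleLocator` are a judgment call; pick values that give a handful of labeled major ticks per axis.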
Make good use of available space.
The range of the numerical scale for each axis should be sensible -- large enough to include
all data of interest, but small enough that there aren't large sections of the plot that
are empty, serve no purpose, and have the net effect of compressing the available space
for the plotted data. The plot itself should be large enough and neatly
drawn so as to be easily legible.
Three examples of scatterplots making poor use of space and one (the last one) with a good
use of space.
Despite the admonition to make good use of space, it is often helpful to include the origin
(0,0) in the plot, provided that doing so does not leave most of the plot area empty.
Sometimes you will have spurious outlier points that are very distant from the main
group. Rather than squeeze most of the points into a small area of the plot just so you
can include one extreme outlier, it is acceptable to leave the outlier
off the plot and indicate its presence with an arrow, placed just inside
the edge of the plot nearest the point. Generally, any such point is
ignored in the estimation of any linear trend from the data (see below).
Example of a scatterplot showing an obviously suspicious outlier.
Example of a scatterplot (right hand panel) where some points are so far from the main
group of points that their positions are only shown by arrows to indicate the direction
they lie (and a label stating the abscissa point corresponding to that point, so the reader
can determine how far from the plot the point lies).
From Munoz et al. (2005, Astrophysical Journal Letters, ApJ, 631, L137,
Always label what variable is plotted along each axis. These labels should
also make clear what units are being used for the variables being plotted.
For example, in Lab 1 you will be making a plot that should have on the
ordinate a label of "field of view (arcminutes)". If you do not put the units
it will be impossible for the reader to know whether you are talking about
degrees, arcminutes, radians, etc.
In the "Car Wash" plot above, you can see that the revenue is in "$". In the
golfing plots you see the units are speed in mph and distance in yards.
Plot your (x, f(x)) pairs on the plot using points or other clear symbols.
If you are plotting multiple data sets on the same plot, make sure each set has a
unique and easily distinguishable symbol (or color) for all its points.
Include a legend
in the plot that explains what the different symbols mean.
Example of a scatterplot (showing what relationship I cannot understand!)
showing two sets of points and best fit lines. The
"legend" in the upper left hand diagram explains the points and the lines associated
with each set of points.
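A short sketch of two data sets on one plot, each with its own symbol and color plus a legend, assuming Python with matplotlib. The data values and the names "data set A" and "data set B" are placeholders invented for this example:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Invented example data: two sets measured at the same abscissa values
x = [1, 2, 3, 4, 5]
set_a = [1.2, 2.1, 2.9, 4.2, 5.1]
set_b = [0.8, 1.1, 1.7, 2.0, 2.6]

fig, ax = plt.subplots()
# A different marker shape AND color for each set, so the plot
# remains readable even when printed in black and white
ax.plot(x, set_a, "o", color="tab:blue", label="data set A")
ax.plot(x, set_b, "s", color="tab:red", label="data set B")

# The legend picks up the label= text given for each set
legend = ax.legend(loc="upper left")
```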
Do not use a bar graph!
Many commercial plotting packages are made with business, not science, applications in mind,
and their default is to plot vertical bar graphs, not point plots. Unless you are intentionally
trying to make a histogram you should not use a bar graph.
Use of a bar graph when a scatterplot is appropriate. Do NOT do this!!
If you know the uncertainties in your measurement, plot the points with appropriate
error bars showing the uncertainties in both the downward and upward directions.
The error bars are made from the center of the point with a length equal to the estimated
uncertainty in each direction. For example, in the figure below, the uncertainty in the
ordinate is 0.2 units, so there is an upward error bar of 0.2 units length and a
downward error bar of the same length.
Error bar construction from http://www.upscale.utoronto.ca/PVB/Harrison/ErrorAnalysis/Graphical.html .
Example of a scatterplot with the uncertainties in the ordinate shown as errorbars.
Drawing only vertical error bars is
typically done when you very accurately know, or set, the abscissa values, so that
their uncertainty is negligible. Note in this case that each point has a different
Example of a scatterplot with the uncertainties in both directions shown as error bars.
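The error-bar conventions above can be sketched as follows, assuming Python with matplotlib. The numbers are invented stand-ins for Lab 1 measurements, not real data; only the ordinate label "field of view (arcminutes)" comes from this page, and the title is a placeholder of my own:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Invented values for illustration only
magnification = [50, 100, 150, 200]   # set by the choice of eyepiece
fov = [60.0, 31.0, 19.0, 15.5]        # measured field of view
fov_err = [3.0, 2.0, 1.5, 1.0]        # a different uncertainty for each point

fig, ax = plt.subplots()
# yerr draws a bar of the given length both above and below each point.
# Only vertical bars are drawn here because the abscissa is assumed to be
# known accurately; pass xerr=... as well when it is not.
bars = ax.errorbar(magnification, fov, yerr=fov_err, fmt="o", capsize=3)

ax.set_xlabel("magnification")
ax.set_ylabel("field of view (arcminutes)")  # always state the units
ax.set_title("Field of view versus magnification (invented data)")
```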
Do NOT connect the dots!!
Many plotting programs have a feature that connects adjacent points with lines.
Do not play "connect the dots"! This rarely conveys useful information.
The plot below is NOT what to do:
Example of a BAD scatterplot, which, among numerous other problems (discussed below),
connects the dots.
In most cases your data are not precise enough to be considered "absolute" -- that is, the
measurements always have uncertainties (random errors) which scatter them from any
actual perfect global trend. This means that any trend from one individual point to the next
one is not very meaningful and we are only concerned with estimating from the entire ensemble
of points what the global trend is. So it is totally worthless and meaningless
to "connect the dots" on a graph. Instead...
Draw a line of best fit.
Most commonly (but not always) we will be seeing a linear correlation between dependent
and independent variable. You should estimate this linear trend by plotting a line
that passes through or near as many data points as you can. (If you suspect the correlation
is not linear, you can draw a curve of the shape that you expect for the correlation.)
Example of a GOOD scatterplot of the same data
as shown just above, which, among numerous other positive aspects (discussed below),
fits the points with a mean trend line.
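One common way to make "a line that passes through or near as many data points as you can" precise is an unweighted least-squares fit. The sketch below is one possible approach in plain Python, not a method this page requires; the function name `fit_line` and the data are invented for illustration:

```python
def fit_line(x, y):
    """Unweighted least-squares straight line: y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations in x, and of cross-products of deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data scattered about a straight line
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
slope, intercept = fit_line(x, y)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
# prints: slope = 1.99, intercept = 1.04
```

The fitted line uses every point at once, which is exactly why it is more meaningful than the point-to-point segments produced by connecting the dots.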
Beware of outliers and influential observations when drawing a best fit line and
interpreting your results.
Often you will have outlier measurements -- anomalous results that are simply way off the
trend of the rest of the points (even if their formal uncertainty is small). An example
of this was already shown above. Generally
these happen for unknown reasons, but typically it means that something just went plain
wrong in the experiment (e.g., a human error of some kind).
It is generally considered acceptable to ignore
truly suspicious points in your linear fit; but you should clearly mark such
ignored points (for example, by circling them).
Example of a scatterplot showing an obviously suspicious outlier, which is circled
and ignored in the fitting of the best fit (regression) line.
In many plotting routines this line can be more accurately calculated using linear
regression. Such routines not only more accurately calculate a best fitting line,
but they can automatically and objectively reject outliers (e.g., through
"iterative n-sigma rejection") and
can weight the points that influence the fit by their uncertainties.
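A rough sketch of what such a routine might do, in plain Python: a straight-line fit weighted by 1/sigma^2, repeated while rejecting points that lie more than n sigma from the current fit. Here "sigma" for the rejection step is assumed to be the RMS of the uncertainty-normalized residuals; real packages differ in that definition. The function names and data are invented for this example:

```python
import math

def weighted_fit(x, y, sigma):
    """Least-squares line with each point weighted by 1 / sigma**2."""
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def subset_fit(x, y, sigma, keep):
    """Fit using only the points whose indices are listed in keep."""
    return weighted_fit([x[i] for i in keep], [y[i] for i in keep],
                        [sigma[i] for i in keep])

def clipped_fit(x, y, sigma, n_sigma=3.0, max_iter=10):
    """Iteratively refit, dropping points beyond n_sigma times the RMS scatter."""
    keep = list(range(len(x)))
    for _ in range(max_iter):
        slope, intercept = subset_fit(x, y, sigma, keep)
        # Residuals from the current fit, in units of each point's uncertainty
        z = [(y[i] - (slope * x[i] + intercept)) / sigma[i] for i in keep]
        rms = math.sqrt(sum(zi * zi for zi in z) / len(z))
        new_keep = [i for i, zi in zip(keep, z) if abs(zi) <= n_sigma * rms]
        # Stop when nothing more is rejected, or too few points would remain
        if len(new_keep) == len(keep) or len(new_keep) < 3:
            break
        keep = new_keep
    slope, intercept = subset_fit(x, y, sigma, keep)
    return slope, intercept, keep

# Invented data: y = 2x + 1 with small scatter and one gross outlier at index 6
x = [float(i) for i in range(12)]
y = [2.0 * xi + 1.0 + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[6] += 10.0
sigma = [0.2] * 12
slope, intercept, keep = clipped_fit(x, y, sigma)
```

With very few points a single outlier inflates the scatter so much that it can never exceed 3 sigma, so automatic rejection is no substitute for looking at the plot. Any rejected point should still be marked on the graph.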
Also be aware of influential observations in a scatter diagram. These are isolated points
separated by a wide horizontal separation from the majority of other points. These isolated
points may have no better precision than the other points but, by virtue of their placement
in the plot, can have
undue influence on a fit or on the interpretation of the overall trend.
Example of a scatterplot showing several points on the right that have a great
influence on the overall position of the best fit line.
A point is an "influential observation" if, when the point is removed
from the plot or relocated, the position of the
best fit line changes substantially.
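The removal test just described can be sketched in plain Python: refit the line with each point left out in turn and see how much the slope moves. The function names and the data are invented for illustration, and the fit is an ordinary unweighted least-squares line:

```python
def fit_line(x, y):
    """Unweighted least-squares straight line: y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def slope_changes(x, y):
    """Change in fitted slope when each point in turn is left out."""
    full_slope, _ = fit_line(x, y)
    changes = []
    for i in range(len(x)):
        # Refit without point i
        s, _ = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        changes.append(s - full_slope)
    return full_slope, changes

# Invented data: a flat cluster at small x plus one isolated point at x = 20
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [2.0, 2.1, 1.9, 2.0, 2.1, 8.0]
full_slope, changes = slope_changes(x, y)
most_influential = max(range(len(x)), key=lambda i: abs(changes[i]))
print(most_influential)  # prints 5, the index of the isolated point
```

In this example the slope drops from about 0.34 to about 0.01 when the isolated point is removed, while removing any point in the cluster barely changes it.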
Put a title above the graph or make a descriptive caption for it (beneath the figure).
For example, a plot in Lab 1 should have the title "Field of view as a function
II. EXAMPLES OF GOOD AND BAD PLOTS
Below we have two plots of the same data, with one showing a BAD graph with
a number of problems we already discussed, and the other showing how to do things properly.
The above plot has violated at least four of the rules listed above, and even if the
data points are correctly plotted, you can't interpret what they mean:
- There is no title or caption. What is this a plot of?
- There are no axis labels. What are those numbers representing?
- There are also no units on either axis. Again, what are these numbers and on what scale?
- There is no reason to "connect the dots" -- these lines have no meaning and their
slopes are clearly dictated by the random errors that are scattering the points away from
the main trend.
- No error bars are plotted.
Here is a better version of the same plot (though still without error bars):
Here is another example of a bad/good pair of plots of the same data. Can you identify
the problems in the "bad" one?
From [can't find source...].
III. HOW TO PLOT GOOD SCIENCE GRAPHS IN EXCEL
It's perfectly acceptable to hand in neatly hand-drawn plots for this class (but do them in
On the other hand,
there are many commercially made plotting programs one can use to make good scatterplots,
but often students are most familiar with Excel.
I found this video, which shows
how to use Excel to make reasonably good plots.
Aug 2012 by srm4n