Thursday, March 12, 2009

What does uncertainty really mean?

In science, we try to report the results of our research in careful terms so as not to over- or under-state the implications. In particular, we must acknowledge that all measurements have some associated "error." What the layperson usually thinks of as error is that a mistake was made. This is not the meaning of error when scientists speak with statistics.

I think a relatively simple example is best. If you wanted to record the temperatures in the shade under your porch compared with the temperatures in the sun every Saturday over the course of a year, you might end up with 104 observations. 52 of them might be of the temperature under the porch and the other 52 would be of the temperature in the sun. Let's say you took those observations at about 5:00 in the evening each Saturday. For this thought experiment, you're using two simple dial-read thermometers for the measurements (like in this image).

Let's examine some of the sources of uncertainty (error).
  1. The dial is graduated to 2 degrees F of precision. That is, each small mark indicates 2 degrees F. You can guess that about half-way between two of the small marks is one degree. You cannot be much more precise than that.
  2. Each mark has some width. Should the temperature be read as 52 degrees F on the right edge, the left edge, or the center of the mark? Were you being consistent in the reading every Saturday, and for both thermometers?
  3. The needle of the thermometer does not stay a constant size; in warmer weather, it expands slightly and it contracts slightly in cold weather. Does this matter? It depends on how precise you want to be.
  4. Are you reading the thermometers at the exact same time every Saturday?
  5. This particular thermometer is on a swivel mount; is it always in the same spot?
  6. How well calibrated are your thermometers?
  7. Do both thermometers record the same temperature when placed in the same location for the same amount of time?
Each of the seven points above adds some amount of uncertainty (noise, error, whatever you want to call it) to every measurement, and we need to understand that uncertainty and report it. All of these errors must be accounted for, but we don't exactly remove the errors/noise when we report the temperature trends for the year; we report the errors along with the measurements. There are various ways to make sure we get good estimates for all of the various errors. I don't have space for a class on statistical analysis or error propagation in scientific analysis. If you're truly interested, I can recommend a bunch of books to read.
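To give a flavor of what error propagation looks like, here is a minimal sketch of one common rule: independent random errors are combined as a root-sum-of-squares ("in quadrature"). The contribution names and sizes are invented for illustration, and the sketch assumes the errors are independent and all expressed as 1-sigma values in degrees F. Systematic offsets, such as a miscalibrated thermometer, are usually corrected separately rather than folded in this way.

```python
import math

def combine_in_quadrature(uncertainties):
    """Combine independent uncertainties (same units) as a root-sum-of-squares."""
    return math.sqrt(sum(u * u for u in uncertainties))

# Hypothetical 1-sigma contributions, in degrees F, for a single reading.
# These numbers are made up for illustration; they are not measured values.
contributions = {
    "dial resolution": 1.0,
    "reading position on the mark": 0.5,
    "calibration": 0.5,
}

total = combine_in_quadrature(contributions.values())
print(f"combined uncertainty: +-{total:.2f} degrees F")
```

Notice that the largest contribution dominates the total, which is why it usually pays to improve the worst source of error first.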

Now, the measurements of a single yard do not necessarily say a whole lot about the temperatures in your neighborhood. Let's assume everyone in the neighborhood took the same measurements for their backyard and we wanted to report the neighborhood's average temperatures for each week of the year. Every reading is going to be different and when compiled together, there will be an uncertainty associated with each average. For example, we might report the neighborhood temperature as:
week 9; 45 +- 7 degrees F
week 10; 46 +- 9 degrees F
week 11; 42 +- 12 degrees F
week 12; 49 +- 3 degrees F

That +-7 degrees F for week 9 has some very important implications, and it's terribly imprecise at the moment. Is that 1-, 2-, or 3-sigma standard deviation? Did you include the systematic corrections? Did you account for all of the errors? How did you calculate this 7 degrees?
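As one possible answer to "how did you calculate this?", here is a rough sketch that takes a set of neighborhood readings for a single week and reports the mean and the sample standard deviation. The readings are hypothetical, and the sketch only captures the spread between yards; it does not include any of the instrument errors or systematic corrections mentioned above.

```python
import statistics

# Hypothetical readings (degrees F) from nine different yards in one week.
week_9_readings = [38, 41, 43, 44, 45, 46, 47, 49, 52]

mean = statistics.mean(week_9_readings)
sigma = statistics.stdev(week_9_readings)  # sample standard deviation (n - 1)

# State which sigma level is being reported so the number is not ambiguous.
print(f"week 9; {mean:.0f} +- {sigma:.0f} degrees F (1-sigma)")
```

The important habit is the label at the end: a reported uncertainty without its sigma level leaves the reader guessing.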

I just threw out a bunch of terms that may not be familiar. Let me define one of them. What is standard deviation? It's a measure of the variability of the observations. If the measurements are roughly normally distributed, about 68.3% of them fall within one standard deviation of the mean value. Let's say that everyone was so careful in their measurements and calibration that the +-7 degrees F for week 9 is the 3-sigma range (three standard deviations). That means that 99.7% of the neighborhood measurements were within 7 degrees of the mean value, 45 degrees F. If the +-7 were instead a 1-sigma value, only 68.3% of the measurements would be within 7 degrees of the mean.
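The 68.3% and 99.7% figures come from the normal (bell curve) distribution. As a quick check, and assuming the measurements really are normally distributed, the fraction within k standard deviations of the mean can be computed with the error function:

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k}-sigma: {100 * fraction_within(k):.1f}%")

# If +-7 degrees F is a 3-sigma range, one standard deviation is 7/3 degrees F.
one_sigma = 7 / 3
print(f"1-sigma for week 9 would be about {one_sigma:.1f} degrees F")
```

So the same "+-7" describes a much tighter set of measurements if it is 3-sigma than if it is 1-sigma.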

If you were to continue these measurements over a decade, you'd be able to say something about the long-term change and the short-term variability of the temperatures in your neighborhood. If you do this right, you might notice a pattern. Here's a good discussion of long-term change vs. short-term variability; I think it's written by someone who knows how to educate people.
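Here is a small sketch of the difference between long-term change and short-term variability. The data are synthetic: I assume a slow trend of 0.05 degrees F per year buried under year-to-year noise with a 1-sigma scatter of 1.5 degrees F, over 50 yearly averages. None of these numbers come from real measurements. A straight-line least-squares fit is just one simple way to estimate the trend.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic yearly averages: assumed slow trend plus random year-to-year noise.
years = list(range(50))
true_trend = 0.05  # degrees F per year (assumption for illustration)
temps = [50.0 + true_trend * y + rng.gauss(0.0, 1.5) for y in years]

def least_squares_slope(xs, ys):
    """Slope of the best-fit straight line through the points (xs, ys)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

slope = least_squares_slope(years, temps)
intercept = statistics.fmean(temps) - slope * statistics.fmean(years)

# Short-term variability: how far individual years sit from the fitted line.
residuals = [t - (intercept + slope * y) for y, t in zip(years, temps)]
scatter = statistics.stdev(residuals)

print(f"fitted trend: {slope:+.3f} degrees F per year")
print(f"year-to-year scatter about the trend: {scatter:.2f} degrees F")
```

Any single year can easily be a degree or two off the line, yet the fit still picks up the slow change because it uses all of the years at once.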

What's the point of this terribly long and boring discussion? Well, scientists report the errors/uncertainty in their data as a method of limiting the strength of their conclusions. When they report that 99.7% of their measurements fall within a range and that range indicates such-and-such, that means there's only a 3/1000 chance of something outside that range being important to their conclusions.

Lots of people willingly or ignorantly seem to see scientific "uncertainty" or "error" or any other scientific language as a way to wiggle out of doing anything they don't want to do.

Obviously, scientists are only human; some may be sloppy and some may be dishonest. That's what peer review and reproduction of results are for. Sometimes fraud slips through (and it's always reported as if this happens regularly), sometimes mistakes are made, but those are usually corrected rather quickly and are an embarrassment to the journal publishing the incorrect results. Often models are only approximations to reality. That's fine; they give us a way of understanding complex situations, not a way of saying exactly what will happen.

Let's go back to measuring temperatures. Let's say that you were told that over the past 100 years, the mean temperature of the earth's lower atmosphere has increased by 0.75 degrees C, and that the data plotted here are plotted with a 95% confidence interval. (Note that these are just measurements, not models or approximations. The error bars are from the kinds of errors I discussed above.)

You are also told that this increase is very likely (>90% to >99%) to have been caused by human activity.

What does this mean?

First, what does that 95% confidence interval mean? It means that those little grey bars on the plot here are 2-sigma error bars. It essentially means that 19 of 20 observations were within the range of the bar-ends. Notice that while the errors were relatively large early in the measurements, the old measurements rarely overlap with the new measurements. That is, even if you were to take the warmest measurements from 1850 and the coldest measurements from 2000, we're still significantly warmer.
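For the curious, the "2-sigma" shorthand for a 95% interval can be checked directly, and the overlap argument can be written down as a one-line test. This is only a sketch: it assumes normally distributed errors, and the two example values are hypothetical temperature anomalies in degrees C, not numbers read off the plot.

```python
from statistics import NormalDist

# Two-sided 95% interval: 2.5% is left over in each tail.
z_95 = NormalDist().inv_cdf(0.975)
print(f"95% interval half-width: {z_95:.2f} sigma")

def intervals_overlap(mean_a, half_a, mean_b, half_b):
    """True if the ranges mean_a +- half_a and mean_b +- half_b overlap."""
    return abs(mean_a - mean_b) <= half_a + half_b

# Hypothetical anomalies (degrees C) with 95% half-widths, for illustration only.
early = (-0.4, 0.2)   # older measurement, larger error bar
recent = (0.4, 0.1)   # newer measurement, smaller error bar
print("error bars overlap:", intervals_overlap(*early, *recent))
```

When the bars do not overlap, the difference between the two values is larger than the combined uncertainty can plausibly explain.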

What does it mean that there's a 90% probability that human activity has caused this warming? It means that after all the known errors in the measurements and in the models have been accounted for, we're still at least 90% confident that this warming is due to human activity. It means that all of the other possible sources of error account for less than a 10% chance that global climate change is not mostly human-induced. It means that those who advocate doing nothing to slow down CO2 emissions are counting on, at best, a 1 in 10 chance of being right.

When scientists speak in terms of uncertainty, they're being careful to note that there is noise associated with all measurements. When laypeople glom on to this language as a way to avoid doing anything because "there's uncertainty" or "there are errors in the data", they're being the opposite of careful. They're claiming that a 1 in 10 chance of not causing 1000 or more years of difficulties for our posterity is the right bet to make.

By the way, there is not a single climate model out there that can reproduce the observed mean temperature increases without including anthropogenic sources. None. There is no controversy among scientists. There may be controversy among politicians, media, and special interests, but no honest climate scientist thinks that humans are not a major cause of global climate change.

Who would take their children or grandchildren on a flight with an airline if they knew that it had a record of crashing nine times for every ten flights it operated? Who would take a flight with an airline if it even had a 1 in 10 chance of crashing?

So, why are we doing this with the future?


Jennifer said...

I didn't realize there were so many variables to consider. In my nutrition research recently, I have thought a lot about data values and have noticed the results vastly differ depending on who performs the study. I'm sure there is some degree of this in your field too. What is your recommendation for discovering which study is more valid than another?

I am Moses. said...

Those were just a few of the possible sources of noise; there will always be sources that, for one reason or another, are considered unimportant, and there could be many sources of noise that aren't even considered.

As far as being able to decide whether one study is more valid than another... In my field, I first look at the methodology. If they have math that makes sense and is applicable to the problem, I'm more likely to trust their work. If they don't show any math, then I need to dig further into the work and what it's built on to understand where they get their results. Often I'll have to go through several levels of references before I can find the physics behind an argument. Sometimes, the physics is not directly applicable to the problem in the paper of interest because of certain simplifications. In either case, if the work is not in my specialty, I have to rely on the idea that the reviewers were reviewing work in their specialty and that it wouldn't have been published if it wasn't a valid interpretation of data or valid application of a model. In the case of competing publications, it becomes difficult, and I just have to dig deeper until I understand both arguments well enough to agree with one over the other...

In the case of nutrition science, I would expect that there's a lot more statistical analysis than anything else. In this case, you need to understand statistics pretty well to understand how the conclusions will apply to any particular application of the theory. Often, a different use of statistics will lead to different conclusions, and you have to choose, based on your knowledge of the statistics involved, which approach makes more sense.

Let's go back to your post about the sine wave of doom. There are no statistics in this guy's post, there's no research, and there are no references, so there is no way to know that this guy knows what he's talking about (from this one post; knowledge of the author is a good way of deciding whether you should start reading with a lenient or strict view of their work).

So, we go by anecdote. Does this sine wave happen to you? Does it happen to your friends?

Sorry, but anecdotes are not data, they're informal answers to leading questions. Anecdotes cannot be used as valid answers to scientific questions, which is why statistics is necessary.

I'm not saying this guy is wrong, just that without any other evidence, I'd consider his post in light of what it is, an opinion, not science. I'm not even sure he's trying to DO science or report on science, so don't take this as a criticism of this person or his posts, just as an example of non-science.

Anyway, anecdotes taken as data/evidence is what happens when you hear about Washington DC having the coldest winter on record, and how that must mean global climate change is a hoax. Not really. That's one anecdotal report with no context and no understanding of what long-term trend vs. short-term variability is all about.

deborah said...

Hmm...I find it interesting that in the field of education one of the ways of documenting a child's progress, let's say in math or reading, is to make anecdotal records of the behavior you are observing on a regular basis. These notes do not stand alone but are used along with testing records to try to provide a more accurate picture of the child's progress, etc. I can see how in science it might just muddy the water.

I am Moses. said...

Hi, Debbie.

While a single anecdote is not at all useful in science, giving an example of continually observed behavior is another beast. If you observe (and record) certain advances in a child's reading or math ability, it can be useful to give examples to illustrate that. Especially useful, I would guess, would be a before-and-after comparison that is used to illustrate the advances you've observed (and recorded).