Confidence Limits in Temperature Measurements

Date: May 01, 2003

G. Raymond Peacock, Inc.

Southampton, PA 18966-3836



The R&R (Repeatability and Reproducibility) of temperature-measuring devices applies to quantitative thermal imagers as well as the most precise temperature sensors used in standards calibration laboratories. Once you understand what’s involved with R&R and how it can affect the results of your measurements, you will think about real temperature and temperature difference measurements in a new way. The links to calibration and traceability are then relatively easy steps to take. The significance of calibration of a temperature-measuring thermal imager and the likely uncertainty of results in the field begin to make real sense. A better understanding of these measurement fundamentals can help you relate measurement results and their confidence limits.


Measurement science, called metrology by some, is a very precise discipline. It is best known for its use in national standards laboratories, such as NIST (National Institute of Standards and Technology) in the USA and NRC (National Research Council) in Canada. People who practice measurement science are usually pushing the limits of their available technology to get the smallest measurement uncertainties possible. However, just because Thermographers are not, or don’t think they are, pushing those limits when measuring temperatures, it does not follow that they can neglect good measurement practice in their work. Measurements are measurements regardless of who makes them, and their value depends on the understanding and care applied when they are made. If you report measurements, you are in the measurements business, and you should understand not only your equipment and all the lore of thermography, but also measurement science and the use of statistics. Actually, with the software and compact computers available today, the statistics are the easy part. The hard part is deciding to follow established good measurement practices.

The object of this paper is to review some simple measurement science concepts, show how they can be used in making thermographic temperature measurements, and describe what needs to be reported about the data taken and about the people and equipment making the measurements. An instrument reading of temperature needs to be well understood, and sometimes challenged, by the person responsible for the measurement; otherwise the value of, and confidence in, the measured values are greatly diminished. Confidence is, after all, one of the keys to customer satisfaction. If customers are confident that you are doing your job correctly, your relationship will grow. Similarly, confidence in measurement results is a key to self-assurance; further, it can be quantified, or not, as part of the measurement practices followed.

It is also critical to realize that better measurement practices are an integral part of ISO 9000 and of all modern statistical process control and maintenance reliability practices. The quality assurance wheel is still turning, even though it doesn’t get much press. Its impact will increase rather than decrease in the future, if for no other reason than increasing global competition.

Basic Measurement Concepts

Measurement results can never be better than the basic measurement capability of a given measurement device. That capability is often overstated, by implication, when reported results carry too many significant figures. If a result is an average of, say, six measurements that mathematically works out to 23.33 °C and that precise value is reported, it implies that we have a measurement capability of 0.03 °C! We may be able to see a 0.1 °C temperature value, but certainly not 0.03 °C! So use common sense when reporting result values, and do not imply that you have more capability than you really do.

As an example, typical rulers used in carpentry are graded in 1/16th inch intervals. If one claimed a measurement capability of 1/64th of an inch with such a device, far better than its minimum measurement resolution, it would of course not be believed because it is impossible to achieve. Furthermore, anything less than 1/16th is suspect because that value appears to be the basic calibration limit of the device. We don’t usually have rulers certified and calibrated at the 1/16th inch level, but there are indeed gauge blocks and precision gauges used by machinists that are not only certified, but carry a correction in a certificate as a function of the block’s temperature, to correct for any expansion or contraction. Typically such blocks and gauges measure to within 1/10,000th of an inch or thereabouts. However, someone who used them would not claim measurement capability to within 1/100,000th of an inch.

What about thermal imager temperature resolution? We can usually see 1 °C or, on some units, 0.1 °C resolution. Does that imply a measurement capability? Some manufacturers suggest, by implication, that it does, when in fact it does not. Most thermal imagers have, as a minimum, about a 2% accuracy specification, or something closer to about 2 or 3 °C calibration uncertainty. Clearly, such devices differ, as measurement devices, from common rulers. They have a calibration limit that is larger than the temperature resolution of the device. So, what would be the minimum believable temperature resolution? This depends on a few other things, but it is certainly no better than the manufacturer’s calibration specification. We’ll get back to those later.

Typically the result of careful measurements is reported as a number, plus or minus an uncertainty value or a standard deviation value for the data set used to calculate the estimate. Say, for discussion’s sake, that the average measured value is 87.677777 °C ± 1.833 °C, where the 1.833 °C is further specified as the estimated standard deviation. The technically correct way to report these values for an instrument having a fundamental measurement capability of ±2 °C would be to round them to the nearest increment of that capability, or:
88 °C ± 2 °C.
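The rounding rule above can be sketched in a few lines of Python. This is only an illustration: the function name `report` and the `capability_c` parameter are hypothetical, and the rule that the reported uncertainty is never smaller than the instrument capability is an assumption drawn from the discussion above, not a formal standard.

```python
import math

def report(value_c, std_dev_c, capability_c):
    """Round a mean and its standard deviation to the instrument capability.

    capability_c is the smallest increment the instrument can credibly
    resolve (an assumed input, e.g. 2 for a +/-2 degC imager).
    """
    # Round the mean to the nearest multiple of the capability.
    # floor(x + 0.5) avoids Python's round-half-to-even behaviour.
    rounded_value = math.floor(value_c / capability_c + 0.5) * capability_c
    # Round the spread the same way, but never report an uncertainty
    # smaller than the capability itself (assumption, see lead-in).
    rounded_unc = max(
        capability_c,
        math.floor(std_dev_c / capability_c + 0.5) * capability_c,
    )
    return rounded_value, rounded_unc

value, unc = report(87.677777, 1.833, 2)
print(f"{value:g} °C ± {unc:g} °C")  # prints "88 °C ± 2 °C"
```

The point of the sketch is that the number of digits reported follows from the instrument, not from the arithmetic of the average.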

Since a thermal imager is an expensive, complex temperature measurement device, it is a primary, essential requirement that its calibration uncertainty be well known and traceable, and that its measurement stability be known, usually through a calibration history record. An expensive instrument without regular, periodic checks of one of its key capabilities is a wasted resource. If it is a prime source of income or plant evaluation, then you need to be sure that it functions at its best at all times.

Many Thermographers will fall prey to the argument that is often made that they are not really measuring temperatures; they are measuring temperature differences in a scene. Therefore absolute calibration of a thermal imager is not a problem or concern. To any paranoid ear, that sounds like an excuse for not understanding how an instrument functions. The fact is, like so many “sales pitches”, there is an appeal to the argument, but it is seldom true.

There are two very important aspects of instrument performance that bear on the subject of calibration stability and uncertainty whether measuring temperatures or temperature differences:

1. The error in a measured temperature level depends on errors in both the instrument zero and gain values, whereas the error in a temperature difference depends on the error in the calibration gain and not on the zero level. If the calibration gain is off, then there will be a temperature level sensitivity in measurements of true temperatures and of temperature differences or gradients. For instance, say the temperature difference in a scene between two points is 20 °C. One part is at 120 °C, the other at 140 °C. Now suppose that the system zero calibration has shifted by 30 °C. In that case the difference is still 20 °C. (Most people associate calibration errors with zero shifts, but in fact gain shifts are just as likely to occur and are the source of serious measurement errors.) If the system gain has shifted, the difference will vary according to the amount of the gain shift. Take the same example where the output is related to the input by a typical linear relationship, and assume that the bias is 0.0 and the gain is 10.0:

Output = gain × Input + bias

| Factors (bias = 0, gain = 10) | Inputs Before | Outputs Before | Output Difference | Inputs After | Outputs After | Output Difference |
|---|---|---|---|---|---|---|
| Change zero by +10 °C | 12, 14 | 120 °C, 140 °C | 20 °C | 12, 14 | 130 °C, 150 °C | 20 °C |
| Change gain by +10% | 12, 14 | 120 °C, 140 °C | 20 °C | 12, 14 | 132 °C, 154 °C | 22 °C |

Table 1. Output effects of zero and gain changes.

If the gain shifts by +10%, a 20 °C difference will look like a 22 °C difference. It’s actually a lot worse than the example given, because thermal imager calibration is noticeably non-linear, and gain calibration errors result in much larger temperature errors.

2. Knowing that your calibration is good under the fixed, stable conditions of a calibration environment is not enough. If you measure the same object in an hour or a day or a month from now, chances are very good that the measurement conditions, and possibly even the person making the measurements will not be the same. You need to know the calibration stability and the effect of each of the variables that can influence the measurement results. You need to be aware of how a measuring instrument behaves when conditions that could influence its measurements change.
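The zero and gain example in item 1 can be checked with a short Python sketch. It assumes the same idealized linear model (Output = gain × Input + bias) and the same inputs of 12 and 14; the function names are hypothetical, and a real imager response is non-linear, so the numbers only illustrate the principle.

```python
def output(inp, gain=10.0, bias=0.0):
    """Idealized linear instrument model: Output = gain * Input + bias."""
    return gain * inp + bias

INPUTS = (12.0, 14.0)  # the two scene points from the example

def difference(gain, bias):
    """Indicated temperature difference between the two scene points."""
    low, high = (output(x, gain, bias) for x in INPUTS)
    return high - low

nominal = difference(gain=10.0, bias=0.0)        # calibrated instrument
zero_shifted = difference(gain=10.0, bias=10.0)  # zero off by +10 degC
gain_shifted = difference(gain=11.0, bias=0.0)   # gain off by +10 %

print(nominal, zero_shifted, gain_shifted)  # prints "20.0 20.0 22.0"
```

A zero shift cancels when two readings are subtracted; a gain shift does not, which is why difference measurements still depend on calibration.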

One set of “simple” tests for stability and calibration checking is given for spot radiation thermometers in ASTM Standard E1256. It’s a good starting point for testing the stability of thermal imagers, although more complete practices are needed. Work on them has begun in ASTM Subcommittee E20.02, Radiation Thermometry. So, for a temperature differential measurement made at one point in time to be compared fairly with another, the instrument must be in calibration during both sets of measurements, and the effects of the likely influencing factors, which may be different each time, must be known and any corrections carefully made.

Making absolute temperature measurements is yet another step in complexity, but it rests on the very same basics as a differential temperature measurement. Knowing that your calibration is correct is but the first step. You also need to know the effect of the major influencing factors involved in making practical measurements. You need to learn about and understand measurement science. It stands to reason that if you are reporting measurements, you should understand their believability. Also, your calibration checking procedures need to have a root source that is at least 4 times better than the calibration uncertainty you are seeking.

As an example, consider the case where one uses the boiling point of water as a “reality check” on one’s equipment calibration. Unless one uses a traceably calibrated thermometer to verify the boiling temperature of the water, one must be careful to correct for local air pressure, since the boiling point is pressure sensitive. Normal weather-related atmospheric air pressure variations introduce about a 0.8 °C uncertainty in the boiling point, and the altitude at which the water is boiling can introduce an even larger error. The boiling point of water changes about –1 °C for each 355 meters increase in altitude. In fact, a boiling point apparatus makes a pretty good altitude meter. You need to know how your instrument calibration is established and maintained. If you use boiling water without an independent reference temperature sensor to indicate the actual boiling point, you can expect your system calibration to have an uncertainty of at least 3 °C, assuming you correct for altitude effects. It’s more if you don’t!
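As a rough illustration of the altitude correction, the sketch below applies the rule of thumb quoted above (about –1 °C per 355 meters). The function name and the 100 °C sea-level value are assumptions for illustration; the linear rule is only an approximation, and it does not account for weather-related pressure changes.

```python
METERS_PER_DEG_C = 355.0  # rule of thumb from the text: about -1 degC per 355 m

def approx_boiling_point_c(altitude_m, sea_level_bp_c=100.0):
    """Approximate boiling point of water at a given altitude.

    Linear rule of thumb only; an independent, traceably calibrated
    thermometer is still the better reference.
    """
    return sea_level_bp_c - altitude_m / METERS_PER_DEG_C

for altitude_m in (0, 355, 1600):
    bp = approx_boiling_point_c(altitude_m)
    print(f"{altitude_m:5d} m: about {bp:.1f} °C")
```

Even with this correction applied, the uncertainty discussed above still remains, because the correction itself is approximate.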

Now, assuming that you have a calibrated instrument and go into the field and make a temperature difference measurement, is one measurement enough? How many is enough? Do you know the ambient temperature, the atmospheric humidity level, the solar intensity, the temperature of the objects surrounding your measurement spot, who is operating the unit, and what the various instrument settings are? Good, glad you do. Do you also know the impact each of these factors has on your resulting measurement values? Unless the manufacturer of the equipment provides that relationship, you will need to test the equipment yourself, or have it tested by a qualified laboratory. We recommend that you use established practices such as those in ASTM E1256.

How well do you know the thermal settling time of your imager, say, when leaving an air-conditioned vehicle and walking into an area at an ambient that is 30 °F hotter? How does your thermal imager correct for the fact that it stabilizes (in one or two hours more or less) at a temperature that is 30 °F hotter than the temperature at which its calibration is certified? Do you know? If you don’t, you could be making significant measurement errors.

Characterizing the performance sensitivity of an imager is a matter for experts, especially the equipment manufacturers. If suppliers expect you to believe that instruments have a certain measurement capability, they should be following the same basic measurement principles that you need to follow in reporting results. They know, or should know, and be able to explain to you, the measurement capability details of their equipment in numerical terms. You may have to request the information because it is usually not included as part of the equipment specifications. In fact, the specifications produced by most imager makers are often vague and incomplete, leaving much to the imagination of the user. Part of the problem with imager measurement specifications is that the devices were developed as imaging devices and not as quantitative measurement devices. The only measurement specifications that are of value for understanding measurement capabilities are those that come complete with uncertainty values at stated confidence levels under stated conditions of measurement. For example, temperature calibration is often expressed as accuracy. The preferred technical term is uncertainty, not accuracy, and it should be expressed in the same terms that NIST uses in expressing measurement uncertainty. NIST’s booklet, NIST Technical Note 1297: Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, includes an explanation of how they use the term. The booklet can be downloaded from the Web and is also available free by mail.

Basic Measurement Statistics

Measurements of objects having temperature variations, made with devices that are slightly imperfect, require that an average measurement be determined. Individual measurement results are, in reality, samples from the range of possible values that the instrument reports. There is a true average value and some variability about that average. If we take only one measurement, we could be anywhere within the range of possible values. However, if the factors causing the fluctuations are random, then the effect of making additional measurements is well known and explained in simple statistics. An excellent reference to both measurement statistics and temperature measurement and calibration is the book Traceable Temperatures by J.V. Nicholas and D.R. White (John Wiley & Sons). Some of the important terms used in statistics are defined in Table 2 below. Please note that a major shift has occurred in US industry over the last 10 years or so. Measurement practices are being tightened up in all industries as a way to help improve global competitiveness. The techniques and resources are well established, since they have been practiced without interruption by the military, power-generation and aerospace industries since the 1950s.


| Term | Definition or Source |
|---|---|
| Mean or Average | $T_{av} = (T_1 + T_2 + \dots + T_n)/n$ |
| Estimated Standard Variance | $s^2 = \{(T_1 - T_{av})^2 + (T_2 - T_{av})^2 + \dots + (T_n - T_{av})^2\}/(n-1)$ |
| Standard Uncertainty | $u_c = \sqrt{s^2}$ |
| Coverage Factor | $k$, from a table of t-values vs. $(n-1)$ and $p$ |
| Expanded Uncertainty | $U = k \cdot u_c$ |

Table 2. Measurement Terms & Statistics
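The quantities in Table 2 can be computed with Python’s standard library, as in the minimal sketch below. The six readings are made-up illustration data, the function name `summarize` is hypothetical, and the small lookup table holds standard two-sided Student t-values for p = 95% only (an assumed choice of confidence level).

```python
import math
import statistics

# Two-sided Student t-values for p = 95 %, keyed by degrees of freedom (n - 1).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def summarize(readings):
    """Return mean, variance, standard uncertainty and expanded uncertainty."""
    n = len(readings)
    t_av = statistics.mean(readings)      # Mean or Average
    s2 = statistics.variance(readings)    # sample variance, divides by (n - 1)
    u_c = math.sqrt(s2)                   # Standard Uncertainty as in Table 2
    k = T_95[n - 1]                       # coverage factor from the t-table
    return t_av, s2, u_c, k * u_c         # last item is Expanded Uncertainty U

# Hypothetical repeated readings of one target, in degC.
readings = [86.0, 88.0, 87.0, 89.0, 90.0, 88.0]
t_av, s2, u_c, big_u = summarize(readings)
print(f"mean = {t_av:.1f} °C, s^2 = {s2:.2f}, u_c = {u_c:.2f} °C, U = {big_u:.2f} °C")
```

Note how large the coverage factor is for small samples: with only two or three readings, the expanded uncertainty grows quickly.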


Confidence Limits and Levels

The resulting confidence limits, the real object of this paper, and level of confidence are directly related as shown in Table 3. They are based on the variability in measurement results. The confidence limits are related to the size of the standard variance and uncertainty. The confidence level results from the statistics of random errors and describes the percentage of readings that will be within the desired confidence limits.

So, how does one achieve the confidence levels in temperature and temperature gradient measurements with a quantitative thermal imager?  It is a big question and one that cannot be answered quickly or easily because of the many factors involved. However, the steps to obtain the limits are rather straightforward and can be easily outlined. There are two big steps with lots of little details to be acquired in the first.

Step 1: Determine the confidence level that you can achieve in measurements in the field. That involves knowing your equipment’s calibration uncertainty and its likely measurement uncertainty under less than ideal conditions. We’ve touched on that, but there is also a series of tests called R&R tests to measure the influence of the equipment operator(s) on the measurement results. If properly done, the field variability sensitivity and the operator influences can be grouped together in one set of tests.

Step 2: Determine the confidence level that your customer requires. If the two levels do not match at the outset, you could be in trouble or in roses, depending on which is larger.

If you are in “trouble”, there are two options:

Option 1: If the customer requests smaller measurement uncertainty or better capabilities than you can deliver, you could explain that your measurement capabilities are as you have measured and documented them, and that they represent a realistic appraisal of the capability of state-of-the-art equipment and trained operators.
This option, of course, assumes two things: first, that your conclusions are true and backed by documentation, and second, that the customer may be seeking unrealistic measurement performance. You should be able to convince the customer that you are competent and request that similar documentation be provided by any competitor. It doesn’t always work, partly because some customers refuse to become better educated, and partly because the requirements sometimes really are beyond your capability.

Option 2: If the customer really needs measurements better than your best capabilities, you could undertake improving them. Having assessed your present capability carefully, you would have a very good idea of where to begin such improvements and what the cost tradeoffs would be.

But there is one more step to be considered before you have a complete understanding of your measurement capabilities or confidence limits, i.e., the combined effects of instrument errors, operator skill and measurement condition influences.

Repeatability and Reproducibility

One way the overall effects of instrument calibration uncertainties and other variances due to operators can be determined is through a set of controlled R & R tests, or Repeatability and Reproducibility tests. The basics of R&R testing lie in statistical results from controlled tests.

There is a well-defined formalism used, for example, by The Automotive Industry Action Group, AIAG. They are one of the biggest driving forces (no pun intended) in improving production quality in North America and have published a series of booklets and practices recommended for measurements and measurement devices. Any company expecting to do business with a major auto manufacturer or their suppliers in the USA, Canada or Mexico must follow these practices in order to be a minimally acceptable supplier.

Included in AIAG’s basic measurement quality assurance are R&R measurement procedures for testing equipment and operators. Although written primarily for dimensional gauging (a significant portion of automobile production quality requirements), the practices are applicable to any measuring device. Within the automotive industry, this type of testing is often called GRR, standing for Gauge R&R. The handbooks and sample data sheets are available at modest fees from the AIAG and some of the software vendors to the industry.

Within the semiconductor industry, a very similar need was evident. They worked with their own research and production resources and NIST to develop a measurement practices policy that follows the same methodology as the AIAG’s. The resulting measurement practices handbook is freely available on the Web and can be viewed and downloaded from the NIST web site. The version of the Handbook on the NIST web pages is integrated with the Dataplot statistical software. In order to use Dataplot from the Handbook, it must be downloaded and installed on your computer.

The basic procedure for R&R testing is also straightforward. One starts with a calibrated measurement device and has an operator measure a variety of objects, usually about three to five, whose values differ from one another but are not necessarily known. The only requirement is that they do not change during the tests. Several operators using the same objects perform the same set of measurements, usually with only one instrument shared among them. That corresponds to one testing round. Then the round is repeated, usually two to five times.

If different environmental conditions are likely to affect the results, then one or more of the objects can be in different real or simulated environments. The key is to have each operator or “appraiser” repeat the same measurement more than once, usually a minimum of two or three times. Each of the operators measures the same objects. This enables one to statistically evaluate and separate the effects of the operator, the effects of the equipment and the effects of the environment. It also enables one to determine the statistics related to the combined effects of all the major influencing factors.

There are numerous software packages on the market as well as a complex, but free package, Dataplot, from NIST that will not only help one set up R&R tests, but can also guide one through the test steps and calculate the resulting statistics from the measurement data.
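To make the idea concrete, here is a deliberately simplified Python sketch of how repeatability and reproducibility can be separated from such a test. It is not the AIAG average-and-range or ANOVA procedure and ignores operator-by-object interaction; the function name, the data layout and the readings are all hypothetical. It assumes every operator measures every object the same number of times.

```python
import math
import statistics

def simple_rr(readings):
    """readings[operator][object] -> list of repeated trials (degC)."""
    cell_variances = []   # spread of repeats within each operator/object cell
    operator_means = []   # overall average reported by each operator
    per_operator = 0      # number of readings each operator contributed
    for by_object in readings.values():
        all_trials = []
        for trials in by_object.values():
            cell_variances.append(statistics.variance(trials))
            all_trials.extend(trials)
        operator_means.append(statistics.mean(all_trials))
        per_operator = len(all_trials)
    # Repeatability: pooled within-cell variance (equipment variation).
    repeat_var = statistics.mean(cell_variances)
    # Reproducibility: variance between operator averages, less the part
    # that repeatability alone would explain; floored at zero.
    repro_var = max(0.0, statistics.variance(operator_means)
                    - repeat_var / per_operator)
    return {
        "repeatability": math.sqrt(repeat_var),
        "reproducibility": math.sqrt(repro_var),
        "combined": math.sqrt(repeat_var + repro_var),
    }

# Hypothetical test: two operators, three targets, two trials each.
data = {
    "operator_A": {"t1": [50.2, 50.6], "t2": [80.1, 79.7], "t3": [120.4, 120.0]},
    "operator_B": {"t1": [51.0, 51.4], "t2": [81.2, 80.6], "t3": [121.1, 121.5]},
}
for name, value in simple_rr(data).items():
    print(f"{name}: {value:.2f} °C")
```

For reportable results, the published AIAG or NIST handbook procedures, or software that implements them, should be used instead of a sketch like this.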


Conclusions

    1. Temperature measurements made with thermal imagers are like any other measurement; they have built-in errors.
    2. There are well-established methods for assessing such errors and reporting measurement results with confidence limits to meet the user’s expected measurement confidence levels.
    3. It is in your best interest to begin to practice good measurement science in order to responsibly qualify your measurement capability and measurement results with confidence factors that can enable you to meet the expectations of your customers.

If you don’t follow good measurement practices you will lose to the supplier that does.


References

ASTM Standard E1256, Standard Test Methods for Radiation Thermometers (Single Waveband Type). W. Conshohocken, PA: American Society for Testing and Materials, 1995. (Available on the web for a small fee.)

Nicholas, J. V. and D. R. White, Traceable Temperatures, Second Ed., John Wiley & Sons, Ltd., 2001.

Taylor, B. N. and C. E. Kuyatt, NIST Technical Note 1297: Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, National Institute of Standards and Technology, US Dept. of Commerce, Gaithersburg, MD, 1994. (Available on the web.)

AIAG-Automotive Industry Action Group, MSA-3 Measurement Systems Analysis (MSA) Third Edition for Automotive QS-9000 Suppliers, 2002. (Can be purchased by telephone from the AIAG Customer Service department at (248) 358-3003 or on the web.) Automotive Industry Action Group (AIAG), 26200 Lahser Road, Suite 200, Southfield, MI 48034.

Engineered Software, Inc., Measurement Assurance (software supporting the analytical techniques detailed in the Automotive Industry Action Group (AIAG) Measurement System Analysis Manual), Engineered Software, Inc., 43737 Timberview Drive, Belleville, MI 48111. (Available on the web.)

Croarkin, C., Editor, Measurement Process Characterization, Chapter 2 in the NIST/SEMATECH e-Handbook of Statistical Methods, National Institute of Standards and Technology, US Dept. of Commerce, Gaithersburg, MD. (Available on the web.)

Filliben, J. J. and A. Heckert, Dataplot (a free, public domain, multi-platform {Unix, VMS, Linux, Windows 95/98/ME/XP/NT/2000, etc.} software system for scientific visualization, statistical analysis, and non-linear modeling, with GUI interface by R. R. Lipman), National Institute of Standards and Technology, US Dept. of Commerce, Gaithersburg, MD, 1978-2002. (Available on the web.)

