Proxy Bias: Simple Explanation + Example

Proxy bias occurs when the proxy variable used is systematically different from the variable of interest. A proxy (or surrogate) variable is a variable related closely enough to the variable of interest to be used as its substitute.

But why use a proxy in the first place?

One reason is that the variable of interest cannot be measured directly. A good example is hidden (also called latent) constructs. These are subjective concepts such as:

  • Intelligence
  • Loyalty
  • Job satisfaction
  • Quality of life

They are said to be hidden because we cannot measure them directly, so we need substitutes. For example, IQ or school grades can serve as proxies for intelligence.

Other reasons why we might need a proxy include cases where:

  • the variable of interest is expensive to measure (in terms of time or money)
  • the variable of interest has a lot of missing values (and therefore cannot be used in statistical analyses)

What constitutes a good proxy?

A good proxy variable has two attributes:

  1. It should not be biased: a good proxy is NOT required to have the same values as the variable of interest, as it is not a direct measure of it. So it is perfectly fine for a proxy to have some variability (or noise), as long as its errors do not lean systematically in one direction.
  2. It should be highly predictive of the variable of interest: the relationship between the two may not be linear, so a high correlation coefficient is not required. However, a good proxy SHOULD predict the variable of interest with high accuracy.
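The difference between noise and bias can be made concrete with a small simulation. The sketch below is only an illustration on made-up data: the variable names, the noise level and the +5 shift are all assumptions chosen for the example. It builds one proxy whose errors average out to zero and another whose errors are shifted in one direction, then compares them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical variable of interest (in real studies this is not observed).
truth = rng.normal(loc=50.0, scale=10.0, size=n)

# Proxy A: noisy, but its errors average out to zero (unbiased).
proxy_unbiased = truth + rng.normal(0.0, 3.0, size=n)

# Proxy B: same amount of noise, plus an assumed systematic shift of +5.
proxy_biased = truth + 5.0 + rng.normal(0.0, 3.0, size=n)


def summarize(proxy, truth):
    """Return the mean error (bias) and the correlation with the truth."""
    error = proxy - truth
    return {
        "mean_error": float(error.mean()),
        "correlation": float(np.corrcoef(proxy, truth)[0, 1]),
    }


unbiased_summary = summarize(proxy_unbiased, truth)
biased_summary = summarize(proxy_biased, truth)

print("unbiased proxy:", unbiased_summary)
print("biased proxy:  ", biased_summary)
```

Both proxies track the variable of interest about equally well, yet only the first has a mean error close to zero. In other words, a proxy can be highly predictive and still be biased, which is why the two attributes are listed separately.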

Example of proxy bias

Can caregivers be used as proxies for dementia patients to assess their quality of life?

Let’s find out!

As dementia involves a decline in memory and reasoning capabilities, assessing the patients’ health-related quality of life by using their own ratings may not be valid. So, using a proxy such as a caregiver may be a better way to assess the quality of life of these patients.

Figure: caregiver's ratings influenced only by the patient's quality of life (the ideal case).

The problem is that the quality of life reported by the caregiver may be affected by things other than the quality of life of the patient.

Arons et al. set out to study whether the caregivers’ own characteristics can influence their ratings of the quality of life of patients. In this case, proxy bias is present if caregivers are shown to project their own situation and opinions onto their ratings.


The study found that the caregiver’s quality of life, financial situation, ability to do things for fun and age influenced their ratings of the patient’s quality of life. However, because the effect of these characteristics was relatively small, the researchers concluded that the bias is present but minor.
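One way to picture this kind of check is a regression of the caregiver's rating on both the patient's quality of life and the caregiver's own quality of life. The sketch below runs on simulated data and is not the analysis or the data of Arons et al. The variable names, the effect sizes (1.0 and 0.2) and the assumption that the patient's true quality of life is known are all hypothetical, chosen only to show the idea.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Simulated, standardized scores (hypothetical; known only because we simulate).
patient_qol = rng.normal(0.0, 1.0, size=n)
caregiver_qol = rng.normal(0.0, 1.0, size=n)

# Assumed data-generating process: the rating mostly reflects the patient,
# but the caregiver's own quality of life leaks in with a small weight.
rating = 1.0 * patient_qol + 0.2 * caregiver_qol + rng.normal(0.0, 0.5, size=n)

# Ordinary least squares: rating ~ intercept + patient_qol + caregiver_qol.
X = np.column_stack([np.ones(n), patient_qol, caregiver_qol])
coef, *_ = np.linalg.lstsq(X, rating, rcond=None)
intercept, b_patient, b_caregiver = coef

print(f"effect of patient's QoL:   {b_patient:.2f}")
print(f"effect of caregiver's QoL: {b_caregiver:.2f}")
```

A coefficient on the caregiver's own quality of life that is clearly different from zero means the rating depends on something other than the patient, which is the signature of proxy bias. Its size relative to the patient's coefficient gives a rough sense of whether the bias is minor or substantial.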

How to avoid proxy bias

Detecting and quantifying proxy bias is challenging in most cases. Avoiding it is therefore not straightforward, and there is no one-size-fits-all solution.

If we have an understanding of the causal structure of this bias, some statistical techniques can be useful to correct for or limit its effects.

In any case, we should pay close attention to how we interpret the results of a study that is subject to proxy bias.
