Who, what & why?
Few people who use or follow wearable tracking tech will have failed to hear that Fitbit is facing multiple lawsuits in the US alleging a lack of accuracy in the heart rate (HR) monitoring function of devices such as the Fitbit Surge™ and Charge HR™. Whilst most people would agree that activity trackers can help quantify and motivate more physical activity in the general population, you would think that by paying extra for the inclusion of a heart rate sensor you ought to be getting something that is reasonably accurate and provides added value.
However, according to researchers Edward Jo, PhD and Brett A. Dolezal, PhD of California State Polytechnic University, Pomona, this is apparently not the case. They were commissioned as expert witnesses to test Fitbit devices against a gold standard ECG criterion measure and report on their findings.
What did they do?
They conducted studies with 43 healthy young adults who had to wear a Fitbit Surge™ on one wrist and a Fitbit Charge HR™ on the other wrist, as well as a Zephyr Bioharness™, which provided ECG-accurate R-R (individual heartbeat) intervals.
The subjects had to perform a range of resting, low and higher intensity activities including running, stair climbing and plyometrics in both outdoor and lab conditions whilst being simultaneously monitored by all 3 devices.
In total they took over 250,000 readings, which they analysed using four different techniques:
- Correlation.
This is a very commonly used statistic that indicates whether two variables track in the same direction, i.e. whether they go up or down together. However, it doesn’t tell you how close together they are.
- Paired sample T-Test.
This tells you how well the averages of two sets of data compare. Again it needs the caveat that this doesn’t tell you how far apart the actual scores are, only how well their averages compare.
- Bland-Altman plot.
This technique tells you about the degree of bias between two sets of data, both from an average and an individual data point of view. In this respect it is more revealing than either correlation or the sample T-Test.
- Absolute differences.
This is simply the size of the gap between two simultaneous readings, regardless of direction. For example, if one device recorded 125 bpm whilst the other recorded 130 bpm, the absolute difference would be 5 bpm.
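To make the four techniques concrete, here is a minimal Python sketch that computes a summary number for each of them from paired readings. It is an illustration only, not the authors' analysis code: the function name `agreement_stats`, the sample readings and the choice of 1.96 standard deviations for the limits of agreement (the usual Bland-Altman convention) are all assumptions made for this example. Differences are taken as device minus reference, so a negative bias means the device reads low.

```python
import math
from statistics import fmean, stdev


def agreement_stats(reference, device):
    """Summarise agreement between paired HR readings (bpm).

    Hypothetical helper for illustration. Differences are device minus
    reference, so a negative bias means the device under-reports.
    """
    if len(reference) != len(device):
        raise ValueError("readings must be paired one-to-one")
    n = len(reference)
    if n < 3:
        raise ValueError("need at least 3 paired readings")

    diffs = [d - r for r, d in zip(reference, device)]

    # 1. Pearson correlation: do the two series rise and fall together?
    mean_ref, mean_dev = fmean(reference), fmean(device)
    cov = sum((r - mean_ref) * (d - mean_dev) for r, d in zip(reference, device))
    ss_ref = sum((r - mean_ref) ** 2 for r in reference)
    ss_dev = sum((d - mean_dev) ** 2 for d in device)
    if ss_ref == 0 or ss_dev == 0:
        raise ValueError("correlation is undefined for constant readings")
    correlation = cov / math.sqrt(ss_ref * ss_dev)

    # 2. Paired-sample t statistic: is the average difference non-zero?
    bias = fmean(diffs)
    sd_diff = stdev(diffs)
    if sd_diff == 0:
        t_stat = 0.0 if bias == 0 else math.copysign(math.inf, bias)
    else:
        t_stat = bias / (sd_diff / math.sqrt(n))

    # 3. Bland-Altman summary: mean bias and 95% limits of agreement.
    limits = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)

    # 4. Mean absolute difference: typical size of the gap, sign ignored.
    mean_abs_diff = fmean([abs(x) for x in diffs])

    return {
        "correlation": correlation,
        "t_statistic": t_stat,
        "bias": bias,
        "limits_of_agreement": limits,
        "mean_abs_diff": mean_abs_diff,
    }


# Made-up readings, not data from the study.
reference = [60, 90, 120, 140, 160]
device = [62, 88, 115, 128, 150]
stats = agreement_stats(reference, device)
print(stats)
```

With these made-up numbers the correlation comes out above 0.99, yet the device still reads 5.4 bpm low on average with a mean absolute difference of 6.2 bpm, which is exactly why correlation on its own is not enough.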
A standard error of estimate of 5 bpm, correlation > 0.9 and mean bias of 3 bpm is considered an acceptable agreement between two heart rate monitoring devices.
In spite of a reasonable 0.85 overall correlation between the ECG reference and the Charge HR™, the chart below shows how much the readings could differ at different exercise intensities (an ideal match between the two would be a straight line with no cloud of data points above and below):
At a moderate exercise intensity of 140 bpm as measured by the ECG, the Fitbit indicated anywhere between 80 and 180 bpm. Correlation at this intensity and higher was only 0.48, with a mean absolute difference of 15.5 bpm and a mean bias of -12.5 bpm, i.e. the Charge HR™ was consistently under-reporting the actual heart rate.
At resting, and low intensity levels, the mean bias was reduced to almost zero, but the mean absolute difference was still 8.9 bpm. If we look at the chart again, a healthy resting HR of 60 bpm could be reported as anywhere between 55 and 80 bpm.
Results for the Fitbit Surge™ were worse, with a mean absolute difference of 22.8 bpm and bias of -20.8 bpm at exercise intensities and 8.2 bpm and -1.9 bpm respectively at rest & low intensity (below 132 bpm).
The mean bias and wide limits of agreement (LoA) for the combined results are illustrated in the Bland-Altman plot below:
Since both devices use Fitbit’s PurePulse technology, it’s surprising to see how much the devices differed when compared against each other with time-synchronized data. For example, when one model showed an HR of 150 bpm, the other model, worn on the opposite wrist of the same user at the same time, could show anywhere from 80 to 180 bpm.
Overall, both devices failed to meet the validation criteria:
- Standard error of estimate was 17.2 bpm vs requirement of 5 bpm
- Correlation of 0.88 vs requirement of 0.9
- Mean bias of -8.9 bpm vs requirement of ±3 bpm
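The comparison above can be written as a simple check. This is a hedged sketch: the function name `check_validation_criteria` is made up for this example, and treating the 5 bpm and ±3 bpm limits as inclusive while requiring correlation to be strictly above 0.9 is an assumption based on how the criteria are worded earlier.

```python
def check_validation_criteria(see_bpm, correlation, mean_bias_bpm):
    """Return pass/fail for each agreement criterion (hypothetical helper).

    Thresholds follow the criteria quoted in the article:
    standard error of estimate of 5 bpm, correlation > 0.9,
    mean bias within +/-3 bpm. Inclusivity of the limits is assumed.
    """
    return {
        "standard_error": see_bpm <= 5.0,
        "correlation": correlation > 0.9,
        "mean_bias": abs(mean_bias_bpm) <= 3.0,
    }


# Combined results reported for the two Fitbit devices.
result = check_validation_criteria(see_bpm=17.2, correlation=0.88, mean_bias_bpm=-8.9)
for name, passed in result.items():
    print(f"{name}: {'pass' if passed else 'fail'}")
```

Fed the reported figures, all three checks come back as failures.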
What does it mean?
The authors observed that although the bias towards under-reporting exercise heart rates was not systematic, the data was extensively dispersed and the wide limits of agreement created this effect. They also noted that the Fitbit devices failed to produce a reading on a number of occasions.
They concluded that the PurePulse™ technology used in these devices does not accurately record or report heart rate, and is more unreliable at higher heart rates.
It’s certainly disappointing that, with so many people now using this kind of activity tracker and Fitbit being the leading brand, pulse rate measurement is not a good deal more accurate. If you are exercising to heart rate zones, it is important to know your actual current heart rate to an accuracy of better than ±5 bpm, and all the more so if you are on a program set by a physician or cardiologist. Under-reporting is especially concerning as you may be exercising at an intensity substantially higher than that prescribed, which is potentially dangerous of course.
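As a rough illustration of what under-reporting means in practice, the arithmetic below backs out an average "true" heart rate from a displayed value and the mean exercise biases reported above (-12.5 bpm for the Charge HR™, -20.8 bpm for the Surge™). The function name is hypothetical, and this is only an average: because the readings were so widely dispersed, an individual reading cannot be corrected this way.

```python
def estimated_actual_hr(displayed_bpm, mean_bias_bpm):
    """Estimate the average true HR behind a displayed value.

    Bias is defined as device minus reference, so a negative bias
    (under-reporting) means the true HR is higher than displayed.
    Illustrative only; it ignores the large scatter around the mean.
    """
    return displayed_bpm - mean_bias_bpm


target = 140  # bpm shown on the wrist while exercising to a zone
charge_hr = estimated_actual_hr(target, -12.5)
surge = estimated_actual_hr(target, -20.8)
print(f"Charge HR shows {target} bpm, average actual about {charge_hr:.1f} bpm")
print(f"Surge shows {target} bpm, average actual about {surge:.1f} bpm")
```

In other words, someone holding a displayed 140 bpm could, on average, be working well above the ±5 bpm accuracy mentioned above.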
These devices also seem to be a very long way from the better than 1% precision needed for heart rate variability (HRV) measurement. By way of comparison, the validation study performed by the University of Sydney on the ithlete Finger Sensor found an almost perfect correlation of better than 0.99 with the ECG reference. They also found a mean error of 0.05% at the resting heart rates for which this sensor was intended.
We hope that this report will stimulate Fitbit and others to renew efforts to bring accurate devices to consumers which can be used during exercise, at rest and eventually for HRV.
- Validation of the Fitbit® Surge™ and Charge HR™ Fitness Trackers. Authors: Edward Jo, PhD and Brett A. Dolezal, PhD
- Heathers, J.A.J., Smartphone-enabled pulse rate variability: An alternative methodology for the collection of heart rate variability in psychophysio…, International Journal of Psychophysiology (2013)
Very interesting. Thank you for bringing this to our attention. I have two devices and will measure them against each other: a Tomtom Multisport & my iThlete of course.
Ideally you need to compare both devices against a reference which you know to be good. In research settings this is usually an ECG, but most people don’t have access to that, so a validated alternative like a Polar HRM is good. You also need to be careful that you are comparing at the same time, and measuring over the same period of time. Most HRMs report an average over 5-7 s, whereas the ithlete HR is of course for the duration of the measurement, i.e. 55 s.
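If you can export timestamped readings from both devices, one way to put them on an equal footing is to average each over the same fixed windows before comparing. The sketch below assumes each device gives a list of (seconds, bpm) pairs on a shared clock; the function names, the sample data and the 55 s default window are illustrative choices, not part of any device's software.

```python
from collections import defaultdict
from statistics import fmean


def window_means(samples, window_s):
    """Average (timestamp_seconds, bpm) samples within fixed windows."""
    buckets = defaultdict(list)
    for t, bpm in samples:
        buckets[int(t // window_s)].append(bpm)
    return {w: fmean(values) for w, values in buckets.items()}


def paired_window_means(samples_a, samples_b, window_s=55):
    """Return (mean_a, mean_b) for every window both devices covered."""
    means_a = window_means(samples_a, window_s)
    means_b = window_means(samples_b, window_s)
    common = sorted(means_a.keys() & means_b.keys())
    return [(means_a[w], means_b[w]) for w in common]


# Made-up readings: (seconds since start, bpm)
device_a = [(0, 60), (10, 62), (60, 70)]
device_b = [(5, 61), (58, 71), (130, 80)]
pairs = paired_window_means(device_a, device_b)
print(pairs)
```

Windows that only one device covered are dropped, so a device that fails to produce a reading for a while does not get compared against nothing.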
Hope this helps!
Consumer Reports tests heart rate monitors.
Have you seen tests on Fitbit HR accuracy, if so can you post a link for others who are interested?
I always train with a chest strap HRM to get an accurate measure. Unless the Fitbit is positioned properly up the wrist during training, it will under-report.
I only use the Fitbit heart rate measure to assess resting heart rate, which I find much more accurate.
Thanks Guy. I guess that would be a lot of people’s experience – optical pulse sensors really need to be still to get a good read on the small fluctuations in light that occur due to changes in blood volume (pulse) rather than relative motion between wrist & device.
I was hoping that the resting measurements of the Fitbits tested would be quite accurate compared to those during exercise, but they were still some way off, and a difference of as little as 2 bpm is significant when tracking changes in fitness.
I have a Fitbit Charge HR, and the heart rate readings on it are way off.
For instance, I like to do mountain biking and cycling up steep ascents etc. I track with my Cyclemeter app on my iPhone and wear a Wahoo TICKR HRM; this will report, say, average HR 145 and max 163, while the Fitbit will say average HR 72 and max 120. I’ve had a few optical wrist-based HRMs and they are all rubbish.
Do you have any research on Garmin ‘elevate’? It’s in their newer 235 and 735 models.
There’s a review of the latest Garmin ‘elevate’ optical HR here on DC Rainmaker’s blog. DC is a thorough, experienced fellow, and his results show that during steady cycle & run workouts, the results are really quite a good match for those obtained with a chest strap. Results are less good during intervals, where there are lags in the optical data, and Garmin are honest enough to say that the optical HR is not suitable for swimming (or HRV). Overall it does look to be one of the best optical HR performances, and certainly better than both the Fitbit HR and Apple Watch.
Fitbit is now also being sued over its sleep tracking feature, which is allegedly inaccurate as well.
In a further development in April 2017, three prominent app developers have been told to stop making misleading claims about the accuracy of smartphone camera-based pulse sensors http://www.businessofapps.com/new-york-attorney-general-investigates-misleading-health-app-claims-hands-out-30k-in-penalties
I compared my fitbit charge HR to my carotid pulse and find it to be off by as much as 20bpm when lifting weights (not at all challenging). When I’m running it seems to be reasonably accurate. It seems to fail when biking, often giving no reading whatsoever.
It would be interesting to see the calorie accuracy.
Thanks for the comment. Although optical pulse sensors are making some progress, the fact remains that the back of the wrist is just not a good place to sense bloodflow. It’s also very sensitive to motion which disturbs the accuracy of the HR value.
For calorie burn, this is an individual thing and would really need to be individually calibrated. There is a good discussion on a Fitbit community forum here:
A couple of years ago I had an EKG test and had my Fitbit Surge on. The heart rate on my Surge matched the rate on the hospital’s EKG machine.
Yup, I often would be hiking and knew my heart rate was over 140 and the Fitbit Versa told me it was 110 or something. Similarly, doing yoga it would say 132 when it was under 100… useless. Bought a new Luxe hoping it would be better, but that probably won’t do the trick if it’s a Fitbit-wide problem.