Is e-racing a level playing field?

In the virtual world, winning and losing is all down to the data. Are the avatars up the road beating yours fair and square? David Bradford gauges the state of play

In many ways, e-racing has been our saviour during the pandemic. Having a competitive outlet to replace all the cancelled outdoor events has kept us sane, fit and distracted from lockdown monotony. 

But anyone who’s developed an e-racing habit will attest that it comes with a certain level of doubt and distrust: are the avatars ahead of mine really putting out more power and deserving their place ahead of me in the results? 


We decided to conduct a survey to find out how CW readers feel about the state of fair play in the virtual realm. Of the more than 800 who responded, 96 per cent said they doubted e-racing was a level playing field.

This was deemed to be no big deal by the 45 per cent who race indoors purely as a fun way to test themselves, but for those more serious about e-racing who want to rate themselves against others, fairness matters. It was time to find out why e-racing is so plagued with dubious performances, and whether anything can be done to make the results as realistic as the virtual parcours.

Stefan Abram on Tacx Neo Bike

(Image credit: Daniel Gould)

Is your trainer telling the truth? 

The chief limitation on e-racing fairness is the accuracy of equipment being used – currently a very mixed bag. In community racing, Zwift – understandably wanting to keep the platform as accessible as possible – allows riders to compete using anything from the most advanced smart-trainer to a basic wheel-on ‘dumb’ trainer combined with a speed sensor.

In the latter case, Zwift estimates power based on wheel speed – a method with a relatively low degree of accuracy. Using a power meter provides greater accuracy, but even then, there are discrepancies between different types and brands, as we will see. The greatest assurance of accuracy comes from using a direct-drive smart-trainer or smart-bike.

In our survey, 61 per cent said they use this type of equipment for their e-racing. Less than five per cent said they relied on a speed sensor and estimated power. No surprise, then, that three-quarters of those surveyed told us they believe their e-racing power is accurate – they’ve invested in serious equipment and are doing their best to play fair. But what about the far wider pool of riders we’re going up against?

This isn’t primarily a question of deliberate cheating. Most instances of inaccurate power measurement in e-racing are inadvertent, resulting from faulty or uncalibrated equipment. Even among the most sophisticated smart-trainers and power meters, accuracy (or ‘trueness’) varies from one device to the next. 

Beat Mueller from the Swiss Federal Office for Sport disclosed to me that he and his team had recently tested 350 power meters from 14 different companies as they sought to establish a level playing field for the Swiss E-racing National Championships. He agreed to share the results with me, albeit in anonymised form. They reveal a scattergun distribution of power readings, with wide divergence from the true figure across almost all the brands – typically by two or three per cent, but by as much as 10 per cent in the worst cases. Such was the inconsistency that the Swiss opted for outdoor testing via an ‘Engine Check’ analysis app for their e-racing qualification. 

Keen to better understand the apparently haphazard accuracy of trainers and power meters, I turn to track ace and self-confessed power geek Dan Bigham. “First, it’s worth explaining exactly what power is, so as to understand what all these different units are trying to achieve,” says Bigham. “This can be broken down into two components, torque and speed. Speed is easy to measure and so is not the problem. The big problem comes from torque measurement, which every power meter on the market is trying to do – but in multiple different ways.” The different forms, Bigham explains, are strain gauges, piezoelectric transducers and, in smart-trainers, electric motors.
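Bigham's breakdown of power into torque and speed can be put into numbers. The following is a minimal Python sketch, not any manufacturer's method: it uses the standard relation that power equals torque multiplied by angular speed, and the 30 N·m torque, 90 rpm cadence and three per cent torque error are illustrative assumptions rather than figures from the interview.

```python
import math


def power_watts(torque_nm: float, cadence_rpm: float) -> float:
    """Power (W) = torque (N·m) x angular speed (rad/s)."""
    angular_speed = cadence_rpm * 2 * math.pi / 60  # convert rpm to rad/s
    return torque_nm * angular_speed


# Illustrative values only (assumptions, not measurements).
true_power = power_watts(30.0, 90.0)            # roughly 283 W
misread_power = power_watts(30.0 * 1.03, 90.0)  # torque over-read by 3%

print(f"true: {true_power:.1f} W, reported: {misread_power:.1f} W")
```

Because cadence is easy to measure, any percentage error in the torque reading passes straight through to the power figure, which is why torque is where the trouble lies.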

“Each method has various issues, with various ways of controlling or correcting them. Some manufacturers do really well and calibrate well at the factory, others don’t.” And even a power meter that is accurate when fresh from the factory may not stay that way. “How well the end user treats it, how well they fit it, whether they calibrate or zero-offset it, whether they use it in changing temperatures – all these factors create loads of issues.”

Bigham explains that just a few degrees of temperature change is enough to throw out your power measurement, and he clarifies that calibration is not the same as zero offset, even though the terms are often (wrongly) used interchangeably. If using a power meter, it should be zero offset before every ride; this is basically a reset, similar to zeroing a scale. Calibration (or ‘slope’) involves hanging a known weight off the crank – but not all power meters allow this procedure.

Speaking of cranks, some power meters measure from just one side, presenting further accuracy issues. “Let’s say you’re using a single-sided power meter that over-reads by five per cent,” says Bigham, “and, added to that, you push harder with that leg because you have a 55/45 imbalance. Suddenly, you’re doubling down and now you’re dealing with a huge error source.”
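To see how the two errors in Bigham's example stack up, here is a rough Python sketch. It assumes the single-sided meter simply doubles the power of the leg it measures to estimate the total; the 300-watt true output is a made-up figure for illustration.

```python
def single_sided_reading(total_power: float,
                         measured_leg_share: float,
                         meter_error: float) -> float:
    """Power reported by a one-sided meter that doubles the measured leg.

    measured_leg_share: fraction of total power from the measured leg (0.55 = 55%).
    meter_error: fractional over-read of the meter itself (0.05 = 5%).
    """
    measured_leg_power = total_power * measured_leg_share
    return 2 * measured_leg_power * (1 + meter_error)


true_total = 300.0  # hypothetical true output in watts
reported = single_sided_reading(true_total, 0.55, 0.05)
overread_pct = (reported / true_total - 1) * 100

print(f"reported {reported:.1f} W, over-read {overread_pct:.1f}%")
```

Under these assumptions the errors multiply rather than add: a five per cent over-read combined with a 55/45 imbalance yields a reported figure about 15.5 per cent too high.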

What about the different ways in which trainers produce resistance? Is that another area of significant divergence and potential unfairness? “Yes, and it’s something nobody really reports on,” says Bigham. “What you’re talking about here is torque distribution or inertia.” He explains that, in the real world, if you’re riding uphill at a slow pace, there is little kinetic energy “stored in the system”, meaning that if you stop pedalling, you stop moving almost immediately, whereas if you’re pelting along on the flat you’re carrying lots of forward momentum (inertia) and you can coast along for a very long way. According to Bigham, it matters a great deal how a trainer replicates these pedalling dynamics because it massively alters the demands on the rider’s muscles. “No power measurement system makers talk about inertia and torque distribution, but there is a huge difference across the market.”

In just a few minutes talking to Bigham, I’m overwhelmed by the almost countless variables, each power meter and smart-trainer measuring in a different way, each with its own fallibility. And that’s before you consider the susceptibility to manipulation (see rider comments below). In an ideal world, all these variables and hacks would be continually checked and controlled, with continuous calibration. But like most ideals, this one ain’t gonna happen any time soon!

Most power meters and smart-trainers seem to be marketed with a claimed accuracy figure, typically +/- one per cent – doesn’t this offer some solid reassurance? Not according to Bigham, who believes this figure is vague and largely unhelpful. “It’s something that annoys me,” he says. “They should be reporting precision and accuracy in all the different torque ranges, cadence ranges, temperature ranges, etc. But none of them are doing that.”

Stefan Abram on Zwift

(Image credit: Daniel Gould)

Fliers and sand-baggers

It doesn’t matter how accurate your own set-up is if others in the race are failing, deliberately or accidentally, to play by the rules. Even if these riders don’t cheat you out of your rightful position, they may interfere with the racing and alter the dynamics. As a case in point, in the second race of the CW Winter Lockdown RR Series, a C-category rider on faulty equipment rode off the front, averaging over 500 watts while seriously interfering with how the race played out behind him. Such riders are branded ‘fliers’.

Another problem is riders entering the wrong category, either by mistake or deliberately because they want an easier race – these are known as sandbaggers. Thankfully ZwiftPower does a good job of filtering out riders who have raced under the wrong category or exceeded power limits for their category. Any serious Zwifter knows to pay attention only to post-filtered results on ZwiftPower. Even so, the system of categorisation is something of a blunt instrument and can itself seem unfair. For example, a low-end A rider who enters a hard, high-calibre A race may stand no chance of even holding a wheel, yet is effectively barred from dropping down a level. 

How can Zwift address these issues to help everyone compete at the level that’s right for them? “Machine learning is something we have been looking at,” says Zwift marketing manager Chris Snook. “This would help us, from community racing all the way up to the pro level, allowing us to build up a digital passport on riders.” The company is also working to overhaul ZwiftPower, which started life as a third-party community site, to improve its integration with the platform, but it’s a big job that will take some time to complete.

Weight and height doping

In virtual racing just as in real life, rider weight is a key determinant of performance – even more so on Zwift, in fact, because height and weight determine your simulated aerodynamic drag as well as the virtual gravitational effect. Unlike in real life, your e-racing weight and height are whatever figure you punch into Zwift. Fair play relies on rider honesty.

In our survey, 84 per cent told us their e-racing weight is accurate to within 1kg, while 15 per cent confessed to a degree of uncertainty or mild massaging of the figures. Of course, the accuracy of your bathroom scales plays a role here too. Personally I try to keep my weight accurate on Zwift, but I don’t step on the scales more than once every couple of months – not least because I’m aware that obsessive weight-checking is an unhealthy habit. Compelling riders to focus on their weight more than they ordinarily would is not without risks. For the same reason, I’m uncomfortable with the very skinny avatar physique I’ve noticed on Zwift. “Avatars are currently limited to three different shapes – small, medium and large,” says Zwift’s Chris Snook. “Those are based on BMI... Ultimately we want to increase the level of avatar customisation, to allow riders to tweak their avatar to look more like [their actual body shape].”

The extent to which ‘weight doping’ matters depends on your point of view. In community racing, which is meant to be ‘just for fun’, those who lie about their weight are only really cheating themselves. At higher levels of e-racing, weight verification is required.

Performance verification at the top level

While the “it doesn’t matter, it’s just for fun” attitude may cut a certain amount of mustard in community racing, at the top end of the sport, e-racing is becoming serious, with highly competitive leagues, invitation-only events and swelling prize pots. At the top level, data accuracy certainly does matter – and the growth of e-racing as a serious discipline depends on adequate performance verification and transparency.

All riders in the Zwift Racing League Premier Division have to be approved by Zwift Accuracy and Data Analysis (ZADA) – effectively, Zwift’s in-house anti-doping body. Approval means submitting pre-race data: riders must film themselves completing the ‘Three Sisters’ course on Zwift, including four ‘best effort’ tests. This gives ZADA benchmarks by which to assess the legitimacy of future performances. In addition, riders must submit a weight verification video, first showing that their scales are working accurately, within 24 hours before each race. Height must be verified in a similar way.

In all races subject to ZADA rules, riders must also dual-record their performance, i.e. record with two power sources: the primary smart-trainer as well as a power meter. If the difference between the two power traces submitted is deemed too big, a rider may be disqualified. In Premier Division races, ZADA claims to conduct a detailed check of the performance data of the top three finishers and two randomly selected riders. Even with all these controls in place, there is no guarantee that every competitor’s power is spot-on accurate.
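As a rough illustration of what a dual-recording comparison might involve, the Python sketch below compares the average power of two traces. This is not ZADA's actual procedure, which is not described here: the function names, the sample traces and the five per cent tolerance are all hypothetical.

```python
from statistics import mean


def trace_discrepancy_pct(primary: list[float], secondary: list[float]) -> float:
    """Percentage by which the secondary average power differs from the primary."""
    if not primary or len(primary) != len(secondary):
        raise ValueError("traces must be non-empty and of equal length")
    return (mean(secondary) - mean(primary)) / mean(primary) * 100


def within_tolerance(primary: list[float], secondary: list[float],
                     tolerance_pct: float = 5.0) -> bool:
    """True if the two sources agree to within a (hypothetical) tolerance."""
    return abs(trace_discrepancy_pct(primary, secondary)) <= tolerance_pct


# Made-up samples in watts: smart-trainer (primary) and two power meter traces.
trainer = [300.0, 310.0, 290.0, 305.0]
meter_ok = [294.0, 303.0, 285.0, 298.0]
meter_high = [330.0, 341.0, 319.0, 336.0]

print(within_tolerance(trainer, meter_ok))    # sources roughly 2% apart
print(within_tolerance(trainer, meter_high))  # sources roughly 10% apart
```

A check like this can only show that two devices agree with each other; it cannot prove that either of them is reading the true figure.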

“There will never be a level playing field,” says Rhys Howell, manager of e-racing team Canyon ZCC, “until there’s standardisation as to how watts are measured and/or everyone’s using the same equipment. All the other measures aimed at achieving fairness are basically Polyfilla.” Short of having all the riders in one place on the same, verified equipment – arena-based contests may be the solution for top-level e-racing post-coronavirus – what would standardisation look like? “Zwift would need to reduce the amount of equipment that is allowed,” says Howell, “but that would require a lot of testing.”

On the subject of testing, there is nothing to prevent competitors and teams from doing their own experimentation to ascertain which combination of primary and secondary power sources produces the highest power figures while being consistent enough to satisfy dual-recording rules. Howell is phlegmatic about such practices. “I see it as legitimate hacking,” he says. “You’re taking the rules right up to the limit, it’s very Team Ineos.”

At the top level of e-racing, competitors have to upload the data from their secondary power source after each race so that Zwift can, if necessary, check that it corroborates the data from the primary power source. There are currently four riders serving suspensions for Zwift racing infringements. All of them have been deemed guilty of the same category of offence, namely “fabrication or modification of data”.

The implication is that these riders deliberately edited files in order to ensure that the power from the secondary source was sufficiently close to the primary recording. All of the riders have protested their innocence. Without going into the minute details, these cases raise several head-scratcher questions; not least, why would a rider try to cheat in such a clumsy and blatant way? It’s at least theoretically possible that a rider could alter a file by mistake, or do so carelessly, not intending to cheat, and end up sanctioned. While it’s good to see ZADA taking a zero-tolerance approach, the technical complexity of dual-recording carries a risk of wrongful bans.

E-quipment inequality 

Just like in the real world, there is a huge range of virtual bikes, wheels and kit available on Zwift, each with different performance characteristics – available to those who can afford it in virtual credit. Zwift is meritocratic in the sense that experience (XP) points and drops (Zwift’s virtual currency), with which to access better tech, are earned through distance ridden and calories burned – Zwift rewards time on the platform. According to zwiftinsider.com, the fastest road bike/wheelset combo on Zwift is about a minute faster over a flat 40km course compared to the entry-level (free) kit – a meaningful but not unrealistic advantage. Whether or not you approve of ‘material’ advantages extending into the virtual world probably depends on your point of view.

More straightforwardly unfair is riders accessing the best kit through foul means rather than earning the right to it – such as when, in 2019, Cameron Jeffers tricked the platform into granting him the Tron bike, which he then used to win the inaugural British eRacing Championships, before getting caught and banned. This type of ‘kit doping’ skulduggery is going to be difficult for Zwift to completely eradicate but it’s probably not a major concern, in the scheme of things, for most of us.

So what's the state of play? 

As we have seen, e-racing has a long way to go before the virtual playing field is anything like level. While Zwift is making progress at the top level of racing, the verification processes used require riders and teams to invest significant amounts of time and money, and it would be unrealistic to expect these controls to drip down to grassroots ‘community’ e-racing. So, we’re going to have to continue to take certain performances with a pinch of salt, accept a margin of error and trust our rivals for the foreseeable future – until someone invents an unfailingly accurate, tamper-proof smart-trainer that doubles as an incorruptible set of scales... and that’s a big ask.

‘Power manipulation is rife’: Elite Zwift racer speaks out

A top-level US e-racer contacted CW via our survey to express his concern that some riders are exploiting tech weaknesses to make dubious gains. He offered the following list of potential illicit gains:

Sticky watts: “This is a communication glitch between power source and Zwift, where power gets ‘stuck’ at an artificially high level for a few seconds at a time. My concern is that some riders are exploiting this glitch to their advantage. A direct wired connection from trainer to computer would probably solve this problem.”

Trainer miscalibration: “Many flywheel trainers can be manipulated to give an extra 20-50 watts, which can’t be detected without taking a deep dive into the FIT files. I believe this is a very common hack. In my view, Zwift needs to insist on continuous calibration (as available in the Wahoo Kickr V5).”

Crank length adjustment: “If a rider sets a (dishonest) longer crank length for their secondary power source, they can increase the reported wattage. It’s a cheat that shows up in the FIT file, so could be detected.”

Adjusting the slope: “Misadjusting the slope of the secondary source (power meter) can give a steeper power curve, conferring greater advantage as the watts increase. Again, continuous calibration would rule this out.”
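The crank length and slope hacks described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of any real power meter's firmware: it treats reported power as scaling linearly with the configured crank length (as it would for a meter that measures pedal force) and with a slope multiplier, and all the numbers are invented for illustration.

```python
def reported_power(true_power: float,
                   actual_crank_mm: float = 172.5,
                   configured_crank_mm: float = 172.5,
                   slope_factor: float = 1.0) -> float:
    """Simplified model: reported power scales with crank ratio and slope."""
    crank_ratio = configured_crank_mm / actual_crank_mm
    return true_power * crank_ratio * slope_factor


# Honest settings versus a crank length entered 2.5 mm too long.
honest = reported_power(300.0)
long_crank = reported_power(300.0, configured_crank_mm=175.0)

# A slope set 3% too steep gains more watts the harder the rider pushes.
gain_at_200 = reported_power(200.0, slope_factor=1.03) - 200.0
gain_at_400 = reported_power(400.0, slope_factor=1.03) - 400.0

print(f"crank hack: {honest:.1f} W -> {long_crank:.1f} W")
print(f"slope hack: +{gain_at_200:.1f} W at 200 W, +{gain_at_400:.1f} W at 400 W")
```

In this model a multiplicative error is worth twice as many watts at 400 watts as at 200, which matches the racer's point that the advantage grows as the watts increase.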

This feature was originally published in the 25 March 2021 print edition of Cycling Weekly magazine.