Can Android Anti-malware rankings be trusted?

With hundreds of Android security apps on the market, Android owners and tech journalists tend to rely on the rankings of independent testing labs to tell us which products are most effective at malware detection. You'll usually find the same products in the top tier of tests: the likes of Avast, F-Secure, Kaspersky, McAfee, and so forth. Is this a coincidence? Maybe not...

As a reviewer, I’ve often wondered how much I can trust these rankings. And here's what I've come to conclude: About as much as you'd trust an advertisement. As one vendor I talked to put it: "Participating in these tests is generally seen as a function of PR/marketing."

So far I've come across only three independent testing labs that release comparative reports about mobile security apps, all of them for Android at the moment. With each new test it's clear the testing methodology is improving, but the tests are still painfully, and perhaps fundamentally, flawed.

First, let's break down the existing testing labs that issue public comparative tests:

1. AV-Test from Germany has published two comparative tests for Android security suites so far: November 2011 (7 products tested) and March 2012 (41 products tested).

2. AV-Comparatives in Austria has published three: August 2010 (4 products tested), August 2011 (9 products tested), and September 2012 (13 products tested).

3. West Coast Labs in California has published two, both in October 2011: a certification for NetQin only, and a comparative test commissioned by NetQin (8 products tested).

Broadly speaking, each lab assembles a collection of alleged Android malware samples, and either manually installs them onto Android smartphones – AV-Comparatives has hundreds of volunteers doing this – or simply runs them in an Android emulator. Finally, the lab gives a grade based on the percentage of samples an AV engine detects. I say "alleged" malware because the Android anti-malware community has yet to agree on how to define malware – should you call a free Android game with too many ads malware? Or an app that spams? Or a penetration testing app? But defining mobile malware – that's another story!
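To make that grading arithmetic concrete, here's a minimal sketch of how a detection percentage could be worked out. It's my own illustration, not any lab's actual scoring code: the function name `detection_rate` and the sample identifiers are hypothetical, and the labs may well weight or filter samples in ways this doesn't capture.

```python
def detection_rate(verdicts):
    """Return the percentage of samples an engine flagged as malicious.

    `verdicts` maps a sample identifier to True if the engine flagged it.
    (Hypothetical structure, used only for illustration.)
    """
    if not verdicts:
        raise ValueError("no samples in the test set")
    detected = sum(1 for flagged in verdicts.values() if flagged)
    return 100.0 * detected / len(verdicts)


# Made-up test run: four samples, three of them detected.
example = {
    "sample-1": True,
    "sample-2": True,
    "sample-3": False,
    "sample-4": True,
}
print(f"{detection_rate(example):.1f}%")  # prints "75.0%"
```

Notice that the grade depends entirely on which samples go into the set. Swap out one sample and the same engine gets a different score, which is why the choice of samples matters so much.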

Questioning the malware samples

As far as testing goes, one of the biggest problems that no one has managed to resolve yet is coming up with a credible set of malware samples. Ideally, you want real-life, zero-day malware that's freshly discovered – after all, you want an app that protects against real threats, not theoretical ones, right? In PC tests, most of the samples come from the wild, which mitigates potential bias.

But in Android tests, that's a lot harder than it sounds because of the short lifespan of most Android malware (days and weeks versus months and years on Windows).

The labs get their samples from several sources. AV-Test plucks them from app stores, private sources, and vendors. AV-Comparatives does the same, but told me it also pulls malware from honeypots. A honeypot is a deliberately exposed system set up to attract attacks, so that the malware involved can be captured and studied.

But while it's relatively easy to identify zero-days and live samples on Windows, that level of sophistication hasn't come to Android yet. How do you verify a malware sample is legit? What if someone's tampering with them? It sounds wild, but here's what I've heard happens, though it's out of my realm to confirm. An unethical hacker might take an actual sample, run it through an APK disassembling tool (such tools are freely available, by the way), and artificially tweak it a bit before submitting it to obvious places for the labs to "discover." As a result, the tampered sample could slip past Vendor A's detection engine, but not Vendor B's. The conspiracy theory is that vendors do this to trip each other up. Of course, the losers are those of us who rely on the rankings.

Maik Morgenstern from AV-Test gave a knowing sigh when I asked him about the tampering speculation. "Sometimes we receive files that have been unpacked or analysed. It's the same with Windows, but in that case we have more experience identifying tampering," Morgenstern said. "We can’t really do anything about it, but we try to compare everything to existing samples within the same family."

Paying to participate

It gets darker. Guess who funds these tests? The vendors themselves.

Speaking off the record, one vendor explained how payment negotiations usually work. First, a testing lab releases a public test, and vendors dispute the results. The lab then offers vendors unhappy with their results a series of commissioned private retests, during which it shows the vendor every malware sample its product failed to detect, along with every clean file it wrongly flagged. The retests ultimately lead up to another public test. In practice, this means that when the next public test takes place (and the schedule of these tests is public as well), the vendor will be armed with at least some of the malware samples that might be used. Ideally the public test uses a fresh batch of malware samples, but in reality that's rarely the case, I'm told.

The tests aren't cheap to participate in either – one vendor I talked to had forked out $26,000 (£16,000) to get a West Coast Labs CheckMark Certification, and pays the same amount to renew it each year. Another company had to pay about $7,000 (£4,400) to be included in AV-Comparatives' public ranking, while AV-Test charges $1,300 for internal malware detection certifications, and plans to charge around $3,000 for inclusion in a bi-monthly public test beginning next year.

These issues only scratch the surface of the challenges facing Android anti-malware testing labs.

Meanwhile, as the tests continue to improve – and they inevitably will – how should consumers and journalists interpret existing results?

"It's all about consistency," said Roel Schouwenberg, a senior virus analyst at Kaspersky Lab and board member of the Anti-Malware Testing Standards Organisation (AMTSO). "You'll always see differences in the results because every lab uses a different sample set, so look for which company is always top dog. If someone's always doing well, that says something. If someone's always doing poorly, that says something else."
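If you wanted to apply that advice to a handful of published reports, one rough way is to line up each product's scores across tests and look at its worst result and its spread rather than any single ranking. The sketch below is just my illustration of that idea: the vendor names and scores are invented, and `summarise` is a hypothetical helper, not something any lab or AMTSO publishes.

```python
from statistics import mean


def summarise(scores_by_vendor):
    """Rank vendors by consistency across several tests.

    `scores_by_vendor` maps a vendor name to a list of detection
    percentages, one per test. Returns (vendor, average, worst, spread)
    tuples, sorted so the vendor with the best worst-case score is first.
    """
    rows = []
    for vendor, scores in scores_by_vendor.items():
        worst = min(scores)
        spread = max(scores) - worst
        rows.append((vendor, mean(scores), worst, spread))
    # Sort by worst score first, then by average, both descending.
    return sorted(rows, key=lambda row: (row[2], row[1]), reverse=True)


# Invented scores from three imaginary tests.
example_scores = {
    "Vendor A": [95, 93, 96],
    "Vendor B": [99, 70, 88],
    "Vendor C": [60, 65, 58],
}

for vendor, average, worst, spread in summarise(example_scores):
    print(f"{vendor}: average {average:.1f}, worst {worst}, spread {spread}")
```

In this made-up data, Vendor B has the single best result but also a bad day, so it lands behind the steadier Vendor A. That's the kind of pattern a one-off ranking hides.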