Earlier this morning, as part of a story on Cylance’s claims that AV-Comparatives was using deceptive testing methodologies and pirated software, Salted Hash revealed details on a new test Cylance commissioned with AV-TEST.
We reached out to all of the vendors Cylance selected for the test – including Kaspersky, Trend Micro, Sophos, Symantec, and McAfee – and asked each of them for reactions and comments.
Symantec declined to answer questions; instead, the company offered a single comment stating it was not aware of the test and encouraged us to talk to AV-TEST. McAfee (Intel Security) never responded. Along with our questions, we shared the executive summary of the Cylance / AV-TEST report, which details the new testing criteria.
Each vendor, as well as AV-TEST, provided long, detailed answers to our questions. Salted Hash felt that summarizing or editing them for length wouldn’t help anyone following this story, so we’ve published their answers in full. As a result, the article is much longer than originally anticipated, but we’re okay with that. AV-TEST’s remarks can be found on page two.
To recap, AV-TEST and Cylance co-developed the four test cases used in the report. Cylance says it took six months to develop the testing process, and AV-TEST will soon add these methods to its ongoing antivirus testing.
The outline below was published earlier today by Salted Hash:
“The first test case (the holiday test) will focus on products seven days out of date, with no access to the internet – denying them the ability to check cloud-based sources or apply updates. The second test involves executable files (malware) created by AV-TEST to simulate certain types of attack. During the third test, AV-TEST disables URL filtering and determines how well a product can detect drive-by malware and other web-based threats. Finally, the fourth test looks at false positives. AV-TEST conducted this scenario by downloading 38 common applications and recording any blocks or warning messages.”
Were you aware of this test before it was run? Specifically, were the methods, test variables, and custom malware shared with you at any point before or after?
Trend Micro was not informed of this test and we were not sent information on the test methodologies used prior to or during the test.
No, we were not aware that this test had been commissioned and no information was provided by AV-Test. We were given no opportunity to comment on the test setup or to review the methodology or results. We understand that this test did not involve our next-generation endpoint protection offering – Sophos Central Endpoint Protection Advanced with Intercept X.
We didn’t have a chance to review the test’s methodology or provide feedback, so we can’t comment on this. We encourage all security vendors to participate in independent tests that offer a transparent methodology mutually accepted by all parties.
After reading the summary and the test settings, do you feel this was a valid test against your product? If not, can you explain why?
It appears they tried to run a number of different scenarios, which is good, but as we don’t have access to the detailed test methodologies used in each of them, it is difficult for us to comment on whether they were robust or not.
The only visibility we have of the methodology is what you provided, and it doesn’t appear that the use cases necessarily reflect the real-world range of threats and situations faced by customers.
The report also mentioned that some of the protection capabilities in our product were turned off, which isn’t an accurate reflection of how our products are used. The key strength of our products is that we deliver a cross-generational blend of threat protection techniques, including machine learning and behavioral analysis alongside more classic techniques such as AV and web reputation, which protects customers from a broad range of threats.
We do not feel these were valid tests.
The “advanced attack” scenario looks artificial because real world advanced attacks typically don’t use executable malware. We would like to see tests of representative real world advanced attacks that exploit legitimate software and use email, documents and websites as part of the attack.
The “malware distributed by websites” test also appears artificial, as it does not test against real infected websites. We note that in this test the URL scanning features of competitors have been turned off, yet in the real world this technology stops more threats than any other. URL scanning is not included as a feature in Cylance.
The selection of the “holiday” scenario seems designed to highlight a strength of Cylance because their technology depends less on a live internet connection than others. In reality, the scenario where a customer signs on and becomes infected before an update can occur seems very rare. Furthermore, we suspect that the test did not involve real infected websites which would have been blocked by URL filtering technology that is used by vendors such as Sophos but not by Cylance.
In the case of a commissioned test (a test fully controlled by a certain vendor), it is up to that vendor to provide feedback. Usually this process includes providing a participating vendor with samples, product logs, and proofs of failure so the participating vendor can analyze and change the results should an error be discovered. Such errors are quite frequent.
We were not aware of being included in this test, and the methodology was not shared with us. Also, we were not given the opportunity to validate the results and provide our comments during and after the test. Our position is that a test of this kind may offer only academic value. It does not reveal the true behavior of a product on a real customer’s machine.
What are your thoughts about the tests developed by Cylance and AV-TEST overall?
AV-Test is a respected testing organization, and Trend actively participates in its public testing. This test, however, was a private test commissioned by Cylance, and as such Cylance was able to choose use cases and variables that show its product in the best possible light but don’t necessarily reflect the reality for most customers.
That is why we participate in the AV-Test public tests, because they are fully independent and designed by AV-Test, without influence from vendors – we feel they give customers unbiased testing results. Cylance has only participated in the public test once, and did not have a strong showing.
Cylance has paid to highlight its technology, which is all about scanning of executable files.
It is not uncommon for vendors (including Sophos) to commission third-party tests. However, what is unusual is for AV-Test to conduct such “marketing influenced” assessments. We, like other vendors, rely on the independence of testers such as AV-Test to provide neutral territory, and we are surprised to see AV-Test depart from this neutral position and run tests that are biased toward highlighting one particular vendor’s technology.
We are also dissatisfied that AV-Test has tested on a traditional basis of scanning malicious executables offline rather than simulating real world scenarios that would far better reflect the performance that prospective customers would actually see in today’s highly connected IT environment.
Based on what we have learned from you, the methodology limits the functionality of our product by excluding some of the core protection technologies from the technology stack. In particular, database updates and cloud access were turned off in Test Case 1, and the database of malicious URLs was not used in Test Case 3. This scenario is far from the normal operation of our product in a real-life situation.
We are glad to see that even in these tough, unrealistic conditions our product showed impressive performance on its own. But the environment chosen does not, in our opinion, allow accurate comparison between different solutions.
We encourage all security vendors to work together with independent testing institutions to discuss and participate in industry-wide product evaluations. We support the introduction of new testing routines, but our position is that these tests should reflect real-life scenarios as closely as possible. A no-disclosure, no-feedback testing approach is far from ideal.
Do you have any additional thoughts or comments to share?
Trend Micro has been committed to continuous innovation over its 28+ years in the security business. We promote a cross-generational approach that incorporates the latest technologies like high-fidelity machine learning now supported in our corporate endpoint product (OfficeScan XG) as well as traditional detection technologies (like web reputation) that allow our customers to block threats during any point in the infection chain.
Isolating or disabling any one of these areas during a test inhibits a security product from fully protecting against a threat. Defense in depth is a proven way of ensuring a threat can be detected and blocked as it attempts to infect an organization.
Customers should not base buying decisions on a single, vendor commissioned test. They should look at a variety of independent tests.
To protect well against today’s advanced threats, good endpoint protection products need to use a combination of advanced technologies, not just one. As well as file based malware scanning, Sophos uses technologies such as exploit prevention, anti-ransomware, run time behavior protection, HIPS, web and malicious URL scanning, email and malicious document scanning, application control, and device control.
We will never stop innovating and looking for ways we can improve the protection we offer our customers, for example through our announced acquisition of Invincea with its advanced machine learning technologies.
We would like to encourage the testing community to continue to improve their tests to better reflect the real-world environment of web and email borne threats faced by our customers and we are pleased to see some testers making progress on that difficult task.
We are huge advocates of independent and thorough testing to expose gaps in our own protection as well as our competitors’, because such tests help us all improve the protection we offer to customers. Unfortunately, this does not seem to be an example of such testing. It seems out of character for AV-Test.
More details on Test Case 1: This test case could also be categorized as a behavior/proactive test. It has a significant methodology problem and does not simulate the described use case accurately.
Even if the system has been turned off for a period of time, when it starts the product connects to the cloud service to check all new files on the system and begins a database update.
This means that even if a new sample is introduced onto the system, not only can its reputation be verified via the cloud, but new detections can also be made based on machine learning algorithms, assisted by cloud-based data. Turning off cloud access is not a realistic scenario.
AV-TEST responded to our questions; their full remarks are below:
Question: Cylance has said that the Feb 2 test you did with them might be the standard going forward at AV-TEST, can you confirm this? Also, some of the vendors included in the Cylance test said the parameters used were a bit unfair, and they were surprised to see AV-TEST conduct “marketing influenced testing”. Do you have any comments or statements in response?
It is important to disclose that the test was commissioned and by whom. It is also important to clearly outline the methodology and what the purpose is. This has been done by us. We are even explaining the caveats of some of the test cases and point out that they may not represent a usual/common case.
Question: The test states that you feel you were aligned with AMTSO testing standards (based on the public documents), have you discussed this test with anyone after Cylance helped create it, or were they the only AMTSO member (outside of yourself) to have any input?
Usually, commissioned tests are not discussed with a third party before publication. Also, AMTSO guidelines are really just guidelines and are not mandatory.
We agree that the “Fundamental Principles of Testing” are a good baseline to follow, which we did. For certain specific tests there are documents that give advice on what to consider when testing those cases. However, they are not designed as step-by-step descriptions of how a test should be run. And they are not meant to prevent testing labs from running tests differently or from coming up with new tests.
In fact the test cases we used are not really new or unique:
- Test Case 1 (Holiday Test) is similar to a test that another testing lab regularly performs (the RAP test of Virus Bulletin). The Chief of Operations of that lab is chairman of the AMTSO board, and we haven’t heard complaints about this type of test from AMTSO before.
In the past, there was a controversy implying that next-gen products just use multi-scanning services like VirusTotal to identify files as malicious. This test case, being an offline test, shows that this is not the case for Cylance.
- Test Case 2 (Simulated Attacks) is covered by another AMTSO paper as described above.
- Test Case 3 (URL Test) is something that is not directly covered by any AMTSO paper.
- Test Case 4 (False Positive Test) is a standard false positive test that we are using in our regular testing as well.
We agree that Test Case 1 and Test Case 3 are not representative of the real-world performance of products. They highlight certain technical aspects, as we clearly outlined in the document: the test shows how products perform when disconnected from the Internet, or when multiple technologies are disabled.
The tests are primarily showing that Cylance is able to deliver a similar level of protection as the other products, even without up-to-date signatures or cloud connection (or by querying VirusTotal). We are also pointing out in the report that the other products are able to provide this level of protection, as shown in our regular tests, when they have updated signatures and can query the cloud.
Whether this is relevant should be decided by users themselves. We are not making this decision; instead, we provide all the information to enable users to make it.