Can you trust the NSS Labs report touting the benefits of IE8? (Update)

Last week, I wrote an article in which I questioned the use of an NSS Labs report to push the security of Microsoft’s Internet Explorer 8. I called it “misleading to say the least” and said that the report “appears biased and questionable at best.” This sparked comments online and offline, as well as a conversation with Rick Moy, the President of NSS Labs. This update addresses the conversation with Rick and serves as a wrap-up to the previous article.

On March 19, Microsoft released Internet Explorer 8. One of the benefits touted by the software giant was security. To back the claim, Microsoft included a link to, and information from, an NSS Labs report detailing the results of a security audit of the browser. However, neither the report nor the press release from Microsoft mentions that Microsoft paid to have the test performed.

The NSS Labs report shows that IE8, on its own, is effective 69 percent of the time against socially engineered malware. Firefox (v. 3.0.7) earned a 30 percent effectiveness rate, while Safari (v. 3) earned 24 percent, Google’s Chrome (v. 1.0.154) earned 16 percent, Opera (v. 9.64) earned 5 percent, and IE7 earned 4 percent.

In my original article I questioned several aspects of the report: the testing sample size, the inability of anyone to repeat the testing (peer review), the fact that the version of Safari tested was outdated, and the failure to account for the layered defenses offered in the browsers. I singled out defenses available for integration in Firefox (NoScript), antivirus integration in Safari, sandboxing in Chrome, and block lists within Opera.

“This was not the end-all-be-all test of browser security as a whole. The focus of this test was socially-engineered malware only. That means a user must acknowledge he/she wants the download. Exploits (drive by downloads) and phishing are two very different and significant threats that we excluded for scope reasons. We say this in the report test methodology, but perhaps it deserves a more prominent position earlier on. (Page 9 of the NSS report, Section 5.2.1) These are important things to analyze. Expect other interesting reading from us on these topics in the future,” Moy wrote in his email to The Tech Herald.

“As far as the ‘layered defense’ comment, you are right if you're asking a different question. But, the first rule of any scientific testing is to limit variables. And one way of doing that is by limiting the question you are looking to have answered by the test. In this case, the question was, ‘What inherent protection is offered by each browser against socially engineered malware (via their reputation systems)?’ If we begin to include third party security add-ons, we need to do so for all browsers. And that is a different question. ‘How much protection against socially engineered malware do the following add-ons provide?’ or ‘Which add-on provides the most protection against socially engineered malware?’ But those are different questions - for a different time.”

Moy wanted to be clear that NSS was not suggesting that end users should go without antivirus software, but that the “point of this test was to see how good each browser's ‘reputation-based’ protections were against Socially Engineered Malware as an added layer of protection.”

“Also, if Mozilla or Google or Apple publicly states, ‘We don't care about Socially Engineered Malware. We rely on AV companies to provide that protection.’ then users know what to expect. But if they claim to protect against Socially Engineered Malware, they should. And testing their capabilities is valid. We don't expect 100%, but users should have facts to base their decisions upon,” he said.

The point of the test is not the issue. Not entirely. The issue I had originally was the use of the test to instill fear in end users. That is how it was pitched by Microsoft. Only after digging into the testing did I take issue with the other aspects mentioned.

If there is one aspect of the “point of the test” that I must focus on, it is that browsers and end users will face far more threats online than a selective list of malicious links could ever show. Because of this, the test overall is still slanted. No test can measure every threat that could be presented to a user via a browser, and no browser can truly defend against them all.

With regard to my mention of NoScript and sandboxing, Moy said, “You also mention [NoScript] for Firefox, sandboxing Chrome, etc. However, using [NoScript] would have absolutely no impact on the test since it was socially engineered malware, and not Clickjacking or drive-by downloads (browser exploits with malware payloads). Same goes for sandboxing in Chrome, which is about protecting crashes in one tab from affecting the other. But most importantly, the fact is that the broader public adoption of plugins like NoScript is quite low. And they often break websites.”

NoScript did break many sites in earlier versions. These days, thanks to improved user controls, code fixes, and other additions, NoScript breaks a website no more than poor coding would. The idea that sandboxing in Chrome protects against crashes is important to note as well, because most browser-based exploits trigger a crash that allows code to be executed. If the crash is prevented or confined to one part of the browser, these types of attacks are hampered, if not stopped completely.

Again, this type of malware might have been skipped by NSS during the test, but you cannot ignore its existence or the protections Chrome offers against it. The standout theme in this test is malware that the user has to confirm they want to install. A perfect example is malware presented when you attempt to watch a video: the various “fake codec” delivery methods.

Some browsers will catch this type of attack, depending on how it is offered; others will not. However, if a user is using layered protections, including add-ons like NoScript, Google’s sandboxing, or the security offered by Opera and Safari, along with AV protection, the attack will fail.

The browser is only a means to infect. It should never be seen as a single method of defense. Testing browsers as such, and then using the vendors’ own marketing against them to validate the test, only runs things in a circle. While the browser vendors each reference and even pitch security in various forms, at no time do any of them discount the need for layered protection.

In our email discussion, Mr. Moy explained the base set of 492 sites used in testing. I asked him, “The 492 samples, where there were 154,702 results, that is using the same subset of samples over and over, correct?”

He said, “Sort of. Many were removed within a couple days because the site was taken down. So some URLs were counted once, some twice, some much more often. Here's a rough example: At t=0 we tested 100 URLs. 2 hrs later, 20 of the URLs were dead, and 25 new ones were added. So we tested the 80 URLs from the original run again, and the 25 new ones. We repeated this process for 12 days, 24x7.”
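The rolling process Moy describes can be sketched in a few lines of Python. This is only an illustration built on the rough numbers from his example (100 starting URLs, 20 dying and 25 arriving between runs); the function name and the assumption that the oldest URLs die first are hypothetical, and none of this is NSS Labs data or code.

```python
# Hypothetical illustration of a rolling URL test. The numbers are the
# rough ones from Moy's example, not actual NSS Labs data.

def rolling_test(rounds, initial, dead_per_round, new_per_round):
    """Return (unique_urls, total_results) for one browser.

    Each round, every live URL is tested once. Between rounds, the oldest
    `dead_per_round` URLs go offline (an assumption made for simplicity)
    and `new_per_round` fresh URLs are added.
    """
    live = list(range(initial))       # ids of URLs currently online
    next_id = initial                 # id to assign to the next new URL
    seen = set()                      # every URL tested at least once
    total_results = 0

    for _ in range(rounds):
        seen.update(live)
        total_results += len(live)    # one result per live URL per round
        live = live[dead_per_round:]  # oldest URLs are taken down
        live.extend(range(next_id, next_id + new_per_round))
        next_id += new_per_round

    return len(seen), total_results


unique, results = rolling_test(rounds=3, initial=100,
                               dead_per_round=20, new_per_round=25)
print(unique, results)  # 150 unique URLs, 315 recorded results
```

Even after three runs, the count of recorded results is more than double the count of unique URLs, and the gap keeps widening with every additional run.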

This clears up how they arrived at the sample set of 492 sites, as the report itself leads the reader to assume more. It states that they started with a collection of 60,000 domains; this was narrowed down for various reasons to 1,779 URLs, and ultimately to a testing set of just 492 sites. Later it shifts and says that the recorded results were based on 154,702 results. This can baffle most laymen, and security people will see it as an apparent attempt to boost image rather than validity. The 154,702 results are not what matters when questioning the size of the sample pool; the 492 samples are the issue.

Moy explained the small sample size in his email.

“First, a single Malware sample is often re-circulated and posted on hundreds of sites. So when we say 492 malware samples, we are really talking about thousands of sites.  Second, we felt we should discard any domain that represented more than 10% of the malware. A couple domains stood out, and [they] were pruned accordingly. However this favored Firefox, Chrome, Safari, and Opera since none were blocking for the URLs in this domain. Third, we ran this test over a period of 12 days and collected [154,702] results (sampling every 2 hours). Fourth, we clearly state (twice) that the margin of error is 3.76%. This is a conservative approach where we based our margin of error on the sample size, not the number of URLs.”

“We started with over 60,000 URLs. But since the average lifespan of these sites is just over 2 days, many of the sites were removed before the URLs were added to our test harness (every 2 hours). Also, many of the URLs crashed various browsers - Opera being the most exploited - so we were forced to remove them from our test set. Exploit testing was not in scope for this project. Finally, and perhaps most importantly, we wanted to focus on validated samples. The antimalware testing standards organization (AMTSO) of which we are active members stress the importance of using real malware. We took great pains to run all samples through TWO sandboxes and further confirmed maliciousness with two desktop AV products. So, smaller than tests of Millions of malware samples. But an extremely high confidence level in terms of quality,” Moy wrote.

As a post on an Opera blog points out, “Out of the 492 final tests, the same site could have up to 10% of the URLs, meaning that in a "worst case scenario", 10 unique sites were tested! If a browser did particularly well on one of these sites making up more than 10% of the test, their score would obviously be inflated (the report mentions that a number of sites were pruned after reaching their limit).”

In the end, there were 492 sites. It does not matter that they produced 154,702 results. A lab such as NSS could have used its resources and vendor connections to guarantee a live base set of several thousand sites that met its criteria.
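A rough sketch of why the choice of sample count matters: the textbook worst-case margin of error for a proportion shrinks with the square root of the sample size. The 95 percent confidence level (z = 1.96) and the simple-random-sample formula below are my assumptions, not the report's stated method, so this is not a reconstruction of the 3.76 percent figure Moy cites, which may rest on a different confidence level or calculation.

```python
import math


def margin_of_error(n, z=1.96, p=0.5):
    """Worst-case margin of error for a proportion.

    Assumes a simple random sample of n independent observations,
    which repeated tests of the same URL are not.
    """
    return z * math.sqrt(p * (1 - p) / n)


for label, n in [("unique samples", 492), ("recorded results", 154_702)]:
    print(f"{label}: n={n:,}, margin of error = {margin_of_error(n):.2%}")
```

Under these assumptions the margin works out to roughly 4.4 percent for 492 samples and roughly 0.25 percent for 154,702 results. The second figure only holds if every result is an independent observation, and repeated runs against the same URL are not.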

In my original article I also mentioned that Safari’s score should not be counted. “…this is because a new version of Safari was released before the NSS Labs test. Safari 4 was released to the public on February 24, 2009. NSS Labs says that it conducted its testing from February 26 to March 10, 2009. Why then, would it intentionally pick the older browser when testing features against those of Internet Explorer 8?”

Moy addressed this point as well. “…we had setup the test harness and run stability and smoke tests since early February, and changing at the last minute was not practical. Nor would it reflect security offered to users in a shipping/GA version. We are happy to look at Safari in a later test.”

As a note, when I wrote those statements I was unaware that the version of Opera tested had not yet been released during the testing period referenced in the NSS report.

When asked about the fact that Microsoft sponsored the test, Moy said, “The fact that Microsoft sponsored the report is noted but less relevant. A study this big costs a lot of time and money to do properly. The truth is most organizations can't afford to do it right. We had developed the live testing methodology prior to getting the contract from Microsoft.”

Lastly, I asked why the samples were not made available for peer review. Moy responded, “The test methods are absolutely in the report in great detail. We are an independent lab, and frankly have never ever released samples used in the test. Not because we're afraid they're wrong, but because it guarantees the ability to perform valid tests in the future.”

“In fact this is where a lot of our credibility comes from in the industry. Let me explain. Other test labs (e.g. Westcoast labs, Virus Bulletin, AV-Test, AV-Comparatives, and ICSA, etc.) all give samples to the vendors. The vendors typically strong arm the labs to do so as a way to 'ensure the samples are valid'. Ahem. What happens is the vendors add detection for these samples to the test and voila, over a period of time you see AV getting 99.x% detection rates, which we know is not real-world.”

“But this is also our IP, so we don't want other labs who haven't made the investment duplicating it. After all we're trying to be a for-profit company…There is a lesser argument of protecting the public too, but I don't want to play that card. Actually, Google, Firefox, and Safari obfuscate their list of bad URLs in their product. And you may wish to know Google said they did not design SafeBrowsing to stop socially engineered malware. I can point you to the PR person there if you wish. Nobody from Mozilla or Apple would return our inquiries before or after report release,” Moy added.

The defense that the design of the testing environment, the list compilations, and the samples are IP is a valid reason for not opening them to public peer review. Mr. Moy offered me a chance to view screen captures and other recorded tests over WebEx. I have not taken him up on the offer, but I would assume any vendor or IT person who wants to look at them can request this.

Yet the claim that credibility comes from never giving in to the vendor pressure other labs face holds little water when you consider that the test was paid for by a browser vendor. Yes, tests cost money. But because a vendor paid for this one, the methods and samples should be public, if for no other reason than to make NSS look good, given that its client won hands down over the others.

The lists for SafeBrowsing were not targeted at malware delivery, but they have been used to help block such sites and are still valid considering the scope of malware online. While not all methods and variants of malware delivery were tested, even within the socially engineered kind, the block list does include sites and other measures aimed at this level of delivery.

I want to stress that I do not think the testing NSS Labs performed is invalid. I had some issues with the test, but what I really took issue with was how the results were used and the way they were presented by Microsoft. Using fear to pitch a browser, no matter who does it, is wrong.

I was happy to have the email discussion with Mr. Moy and to post his comments in an article as a rebuttal to my original story. With that said, I respectfully disagree with some of his answers. Instead of slamming the original article with marketing-based defensive comments, Mr. Moy emailed me directly to discuss things, and I can honestly say that is the most impressive response I have ever had to an article.
