Blacklists and Cross-Site Scripting

Microsoft gets a lot of criticism over Internet Explorer not being “standards-compliant.” However, it’s actually not so simple, for a variety of reasons. One of them is that the web itself is not very standards-compliant — while IE8 has a standards-compliant-browser mode, it has to offer an IE7 rendering fallback mode because most web sites don’t render properly if you strictly interpret XHTML. (Opera and Firefox violate the standards in the same way for the same reason.)

However, another is that sometimes doing things the “right” way can be bad for security. To prevent cross-site scripting attacks, many websites implement a blacklist: they search for specific “bad” data and refuse to show it. Others sit behind a protective appliance that filters out “bad” data before it even reaches the web server. This is not the proper approach (you should allow a whitelist of known-good data rather than look for badness, which comes in many forms), but it is nevertheless common. A blacklist will filter out obvious attacks, like a user putting this into a message post:

<script>alert("This is some script!");</script>

However, it’s not so likely to catch, say, this:

¼óãòéðô¾áìåòô¨¢Ôèéó éó óïíå ïâæõóãáôåä óãòéðô¡¢©»¼¯óãòéðô¾

So, what the heck is that? It’s essentially the same script (the alert text now reads “This is some obfuscated script!”), written in 7-bit ASCII but with the high-order bit of each byte set, which turns every byte into a different character when read as 8-bit text. If you were running a blacklist checking for, say, <script> tags, this would sail right through. Likewise, a filtering appliance will not see anything wrong with it.

However, if this is displayed on a web page with the encoding set to US-ASCII (e.g. a page with <meta http-equiv=content-type content='text/html; charset=us-ascii'> on it, which an attacker may also be able to inject given the right circumstances), Internet Explorer will ignore the high-order bit and render the text as 7-bit ASCII, causing the script to execute! Other browsers, however, will be safe due to their non-standards-compliance. They don’t render 7-bit ASCII strictly; instead they take the presence of an 8th bit to indicate that you really “meant” an 8-bit encoding such as ISO-8859-1, and thus show only the gibberish characters above.
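The bypass described above can be reproduced offline. The following is a minimal Python sketch, not a model of any particular filter or browser: the helper names (set_high_bit, strip_high_bit, naive_blacklist_allows) are made up for illustration, the blacklist is a deliberately simple substring check, and Python's "latin-1" codec stands in for a renderer that treats the bytes as 8-bit text.

```python
def set_high_bit(data: bytes) -> bytes:
    """Set the 8th bit of every byte (what the attacker does)."""
    return bytes(b | 0x80 for b in data)


def strip_high_bit(data: bytes) -> bytes:
    """Keep only the low 7 bits of every byte (a strict 7-bit reading)."""
    return bytes(b & 0x7F for b in data)


def naive_blacklist_allows(data: bytes) -> bool:
    """Hypothetical blacklist: reject anything containing '<script'."""
    return b"<script" not in data.lower()


payload = b'<script>alert("This is some obfuscated script!");</script>'
obfuscated = set_high_bit(payload)

# Read as 8-bit text, the bytes are harmless-looking gibberish.
as_latin1 = obfuscated.decode("latin-1")

# Read with the high bit discarded, the original script reappears.
as_seven_bit = strip_high_bit(obfuscated)

print(naive_blacklist_allows(payload))     # plain payload is caught
print(naive_blacklist_allows(obfuscated))  # obfuscated payload is not
print(as_latin1)
print(as_seven_bit.decode("ascii"))
```

The filter and the renderer disagree about what the bytes mean, and that disagreement is the whole attack: the blacklist inspects one interpretation while the browser executes another.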

Standards compliance is not an unalloyed good — the standards are documents on paper, and don’t always consider their own security implications. They were written to tell people how to do things, not how not to do them. Real browser behavior is based on a combination of standards and precedent. There are few real-world reasons why rendering US-ASCII as US-ASCII and not ISO-8859-1 is important — on non-malicious pages, you should get basically the same output. However, trying to do the “right” thing can open up a security vulnerability. Due to this and the compatibility issues, I think that Microsoft’s attempt to make IE8 the first standards-compliant browser is not actually going to work out — my guess is that when it comes time to release it, they’ll make the IE7-like rendering mode the default, with standards-compliant mode only an option.

So, as a web developer, how can you defend against attacks like the above? You could look for “<script>” with the high bit set as well, but there are dozens of other encodings out there, and as RSnake’s XSS Cheat-Sheet shows, there are dozens of bad things you can encode in them. What you have to do instead is use regular expressions to allow only a limited subset of good user input. For fields like ZIP code, this is easy (allow digits only, plus the - character if you want ZIP+4), but with general message posts, it can be harder. Letters, numbers, common punctuation marks, spaces, and carriage returns may be enough. If you need to allow HTML tags, it’s best to go in multiple passes: match the tags you want to allow (like bold and italics) and replace them with a custom marker, then HTML-encode the entire message, and finally replace the custom markers with the allowed (unencoded) tags. It’s still not 100% effective in all cases, but it’s a lot safer than any blacklist can be.
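The whitelist approach and the three-pass tag handling described above might look roughly like this in Python. This is a sketch under stated assumptions, not a vetted sanitizer: the function names, the choice of <b> and <i> as the only allowed tags, and the marker format are all illustrative. The marker embeds a random token per call so that user input cannot contain a valid marker in advance.

```python
import html
import re
import secrets

# Whitelist for ZIP or ZIP+4. [0-9] is used instead of \d because
# \d also matches non-ASCII digits in Python 3 string patterns.
ZIP_RE = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")

# Only bare <b>, </b>, <i>, </i> are allowed (no attributes).
ALLOWED_TAG_RE = re.compile(r"<(/?)(b|i)>", re.IGNORECASE)


def is_valid_zip(value: str) -> bool:
    """Accept the field only if the whole value matches the whitelist."""
    return ZIP_RE.fullmatch(value) is not None


def sanitize_message(text: str) -> str:
    """Allow bold/italic tags and HTML-encode everything else."""
    token = secrets.token_hex(8)  # unpredictable per-call marker token

    # Pass 1: swap allowed tags for markers that encoding leaves intact.
    marked = ALLOWED_TAG_RE.sub(
        lambda m: f"[[{token}:{m.group(1)}{m.group(2).lower()}]]", text
    )

    # Pass 2: HTML-encode the entire message, markers included.
    escaped = html.escape(marked, quote=True)

    # Pass 3: turn the markers back into the allowed, unencoded tags.
    marker_re = re.compile(r"\[\[" + token + r":(/?)(b|i)\]\]")
    return marker_re.sub(r"<\1\2>", escaped)


print(is_valid_zip("12345-6789"))
print(sanitize_message("<b>hi</b> <script>alert(1)</script>"))
```

Encoding everything by default and restoring only what was explicitly matched means an unrecognized tag or encoding trick ends up as inert text, which is the opposite failure mode from a blacklist.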
