Aug 24 2008

DefCon 16, Day 1

Posted by Grant Bugher

Having finished up with the BlackHat briefings, it was time to go on to DefCon.  While many of the speakers from BlackHat stay on for DefCon, there are also a lot of DefCon-only presentations, usually with a more attack-oriented focus (in keeping with DefCon’s nature as a hacker convention rather than a security conference like BlackHat).

The day began with Hacking E.S.P. (Educational Software Packages.)  Schools, by their nature, have sensitive PII data — transcripts, schedules, billing information, etc.  A lot of this data is either stored directly in web-based educational software used by students, or is stored in other systems students access… probably with the same password.  Overall, though, this was a pretty typical application service provider hacking presentation — many of the schools they investigated used the same software on their sites, and that software was often woefully bad: Passwords sent “encrypted” in Base64 encoding — and not even that if JavaScript is turned off.  Trivial session stealing via Hamster-style sidejacking, with the added bonus that the Session ID is set before login so you can steal a session ID then wait for someone to use it.  Copious cross-site scripting vulnerabilities to allow for cookie stealing.
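To see why Base64 counts as an encoding rather than encryption, here is a minimal Python sketch. The password is invented for illustration and does not come from the talk; the point is only that decoding needs no key.

```python
import base64

# Hypothetical password, invented for illustration only.
password = "hunter2"

# What a login form "protecting" the password with Base64 would put on the wire.
encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")

# Anyone who sniffs the request can reverse it; no key or secret is involved.
recovered = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(recovered == password)
```

Any standard library can undo the encoding, so a sniffer on the same network sees the password as easily as if it had been sent in the clear.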

Generally someone would have to have a login on such a system to be able to exploit these things.  However, the username/password scheme is often helpfully revealed on the front page, and some schools even allow you to create your own account on the system. Google showed 34,000 instances of this one flawed software package alone.  Considering that schools account for 34% of data breaches, this sort of buggy software is probably commonplace.

The second presentation I attended was about Adobe local shared objects.  In short, these are Flash cookies.  Just as your browser will store small data items (cookies) for a website, and return those items to the website when asked, Adobe Flash has a similar mechanism for Flash applets.  However, since these are stored by Flash and not by the browser, your browser doesn’t manage them — there is no indication to the user what data is being stored, and the data is not removed when you delete cookies or “private data” in your browser.  Ad networks have used these to “back up” your cookies — if you delete them, they are restored from a Flash local shared object when you next visit a site with the ads on it.  These are also hard to filter for systems like Privoxy and other anonymizers, because Flash uses a proprietary encoding for its XML RPC calls, in which the local shared objects are embedded.

On the bright side, there is a Flash applet on the Adobe site called Flash Settings Manager that will let you delete these objects and put in settings to manage them.  On the not-so-bright side, this is in-band signaling (i.e. Flash is configured by a Flash applet), so any advertiser can override your settings later.  Also, as you may recall from the RIA presentation at Black Hat that I discussed earlier, there are a lot of other local storage mechanisms besides this one in Flash — Silverlight, HTML 5, and other frameworks also have local storage that is outside your browser’s ability to manage.

I next attended a presentation about vulnerabilities in TOR, the onion-routing anonymity provider originally developed by the Department of the Navy and until recently maintained by the EFF.  TOR has become quite popular, at this point containing 1,500 relays and 200,000 users.  However, over the last several years, it’s seen several vulnerabilities that have threatened the anonymity it provides:

  • 2004: Error in how AES counter mode was used resulted in cryptography with only 16 bits of entropy.
  • 2005: A relay cell length overflow could be used to force an exit node to send contents of memory
  • 2005: Diffie-Hellman handshake bug in OpenSSL didn’t check for trivial keys, so a malicious entry node could mount a man-in-the-middle attack
  • 2006: By running several fast TOR servers, an attacker could end up as both entry & exit node for a user, thus compromising anonymity and potentially finding hidden services.  The fix for this was the addition of “entry guards” — trusted entry nodes that are re-used by users.
  • 2006: Clients could create or extend channels even if server mode was turned off
  • 2007: “Stable” or “Guard” status, normally applied to the top n nodes, could be stolen by malicious nodes by claiming high uptime and bandwidth.  The fix for this was to put in thresholds above which a node always gets guard status, rather than making it a top n calculation.
  • 2007: XSRF attacks by web sites could make use of the TOR control port
  • 2008: Nodes could be made to connect to their own public IPs
  • 2008: The Debian OpenSSL PRNG flaw compromised 300 of the 1,500 relay identity keys, and 3 of the 6 directory authority keys.  If one more authority key had been compromised, someone could have taken control of the network
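The 2006 entry-and-exit problem, and why entry guards help, can be made concrete with a back-of-the-envelope model. The Python sketch below is my own simplification, not TOR's actual path-selection algorithm: it assumes the attacker's relays are picked with probability `f` at each position, that circuits are built independently, and that a guarded client uses a single guard. The function names are hypothetical.

```python
def p_compromised_circuit(f: float) -> float:
    """Chance that one circuit has attacker relays at both entry and exit."""
    return f * f


def p_compromised_without_guards(f: float, circuits: int) -> float:
    """Chance that at least one of `circuits` circuits is compromised when
    both ends are re-chosen independently for every circuit."""
    return 1.0 - (1.0 - f * f) ** circuits


def p_compromised_with_guards(f: float, circuits: int) -> float:
    """Simplified guard model: the entry is chosen once and reused, so the
    client is exposed only if that one guard is malicious (probability f)
    and at least one of the exits is malicious too."""
    return f * (1.0 - (1.0 - f) ** circuits)


if __name__ == "__main__":
    f = 0.10  # assumed share of selection weight held by the attacker
    for n in (1, 100, 1000):
        print(n, p_compromised_without_guards(f, n), p_compromised_with_guards(f, n))
```

Under these assumptions the unguarded probability creeps toward 1 as a user builds more circuits, while the guarded probability can never exceed `f`. That is the intuition behind reusing a small set of entry nodes.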

There are still some outstanding issues in TOR:

  • You can build infinite-length circuits and use them as a DOS multiplier
  • Snooping on exit relays works — if someone uses an insecure protocol that gives away their identity (like POP… or even HTTP depending on what they send), TOR won’t necessarily protect them.  This isn’t a bug in TOR, but just the nature of what it does — no software package will give totally anonymous communication if the communication itself gives your identity to the recipient.
  • People who run relays are unknowns — there is no way to know how many are malicious.  However, TOR depends on having a large, diverse set of servers, so making more restrictions on who can run servers might actually lower, rather than raise, the network’s security.
  • Exit relays sometimes end up in restricted space (e.g. behind China’s firewall) — which means TOR users get restricted, too.
  • Many users of TOR toggle it on and off during a single browser session.  However, a JavaScript refresh attack on one of the non-TOR sessions can sometimes retrieve data from the previous TOR session.
  • Firefox bugs leak data, and that data doesn’t go through TOR, since on Windows it works as an HTTP proxy.  Users can work around this by proxying their entire network stack through a VPN connection like JanusVM.
  • It’s possible to block access to TOR.  If an adversary (say, the Chinese government) filters out the directory authorities, the download site, and all the relays, it’s very hard to get on.
  • If you can see both input and output (by running many, many nodes, or having a massive filtering apparatus at the Tier3 ISPs — FBI, maybe?) traffic confirmation is easy.  (i.e. if I already suspect you, specifically, of doing something, I can confirm you did it much more easily than I can “go fishing” for people doing unknown bad things.)  Defensive dropping or adaptive padding would help with this, but would raise TOR’s latency.
  • You can fingerprint websites based on the size & response time of the pages and tell what people are doing via traffic analysis.
  • A congestion attack by a website can find TOR nodes, and coupled with latency analysis on routers, can find the person communicating.
  • Data retention laws in many countries are resulting in data being stored that could make traffic analysis easier.
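Traffic confirmation, as described in the list above, boils down to asking whether the traffic leaving a suspect looks like the traffic arriving at a destination. The Python sketch below shows the idea in its crudest form: correlate packet counts per time window on both sides. The packet counts are invented and the `pearson` helper is my own; real attacks are far more sophisticated than this.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Invented packet counts per one-second window.
suspect_upstream = [3, 40, 2, 0, 55, 1, 30, 0]   # seen at the suspect's ISP
exit_flow_a = [2, 38, 3, 1, 52, 0, 29, 1]        # flow seen leaving an exit relay
exit_flow_b = [20, 18, 22, 19, 21, 20, 18, 22]   # an unrelated steady flow

score_a = pearson(suspect_upstream, exit_flow_a)
score_b = pearson(suspect_upstream, exit_flow_b)
print(score_a, score_b)
```

The bursty flow lines up with the suspect's traffic and the steady one does not. Padding and dropping schemes work by making every flow look like the steady one, which is exactly why they cost bandwidth and latency.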

So, with all these problems in TOR, does that mean we shouldn’t use it?  Not at all!  The known vulnerabilities currently outstanding would apply to any low-latency mix network.  They’re not bugs in TOR, they’re limitations in this approach to anonymity, which remains better than any other approach to anonymity known.  TOR isn’t perfect, it’s just better than everything else.  Now, there may be better approaches to specific problems (e.g. there is one particular adversary you want to hide from, not just people in general), but for general anonymity, you still can’t beat TOR, even with its flaws.

I unfortunately missed much of strace and RSnake’s presentation on Google Gadgets.  In short, gadgets are pieces of HTML and JavaScript code hosted on third-party sites and brought into a Google-owned namespace.  Though this namespace doesn’t have direct access to Google cookies, the fact remains that it’s loading unknown JavaScript onto a Google page — it’s basically XSS-by-design.  Gadgets can communicate with or post to other users and gadgets on Google, and it turns out to be pretty easy to sneakily force a user to install a gadget onto their Google homepage.  If someone could crack a server hosting a trusted gadget, they could take control of the data of many Google users simultaneously.  Most of these vulnerabilities would apply to any gadget-based architecture, such as the Live start page, or Facebook’s apps, too.

The next presentation I attended was “Satan is on my Friends List,” about attacking social networks.  In short, social networks are full of promiscuous and pervasive trust relationships, which results in a large number of business logic flaws.  These attacks aren’t on code vulnerabilities like SQL injection, but rather just exploiting how the systems work.

Sites often don’t protect “non-sensitive” operations, like logging out or adding friends, from XSRF attacks.  Thus, it’s possible to craft comments that log out anyone who views them… which makes it rather hard to delete them (since you have to be logged in to delete a comment.)  XSRFs can be put into image links, meta refresh tags, IFRAMEs, etc.
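The usual defense is to require an unguessable per-session token on every state-changing request, including the "non-sensitive" ones like logging out. Here is a minimal Python sketch using only the standard library. The key handling, session IDs, and function names are hypothetical and not taken from any site discussed in the talk.

```python
import hmac
import secrets

# Hypothetical per-deployment secret; a real site would load this from config.
SERVER_KEY = secrets.token_bytes(32)


def issue_token(session_id: str) -> str:
    """Derive the anti-forgery token embedded in forms served to this session."""
    return hmac.new(SERVER_KEY, session_id.encode("utf-8"), "sha256").hexdigest()


def is_valid(session_id: str, submitted: str) -> bool:
    """Constant-time check that the submitted token belongs to this session."""
    expected = issue_token(session_id)
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))


def handle_logout(session_id: str, submitted_token) -> bool:
    """Return True only if the logout request carries a valid token."""
    if submitted_token is None:
        return False  # e.g. a request triggered by an image link in a comment
    return is_valid(session_id, submitted_token)
```

A request forged through an image link or meta refresh carries the victim's cookie but cannot know the token, so the logout (or friend request) is refused.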

In addition, social networks are ideal platforms for social engineering attacks.  Build a plausible profile for someone else using social and public sources, and then friend a few dozen people who are known to friend everyone right back to build a “respectable” number of connections.  Join groups, and start friending real associates of the person being impersonated.  At that point, you have a web site that can be used to confirm your identity as someone else.

The Facebook and OpenSocial APIs for integrated social apps are also good avenues for attack.  They have convenient APIs and execute arbitrary code by design — the social networks don’t care about application malware, as it’s on a second domain and they’re legally protected by their EULA.  However, if you widely distribute an app based on some current meme, get hundreds of users, then replace that app in-place with malware, you have an instant social network botnet.  You can use them as open redirects, put phishing items on their social network page, etc.  Also, applications have all the access that friends do — just the data disclosure may be enough for identity theft, impersonation, or at least some minor mayhem.  They can publish to a user’s profile to infect others, too.

Unfortunately, the fixes for these issues are just what the social network sites don’t want to hear — less external content, reduced API functionality, no opt-in security models, review and lifetimes for applications, etc.  Thus, these vulnerabilities are probably here to stay.

The last presentation I attended was by Errata Security, about interesting penetration tests.  Modern penetration tests are “supposed to be boring” — they’re often done for the purpose of meeting compliance objectives, so companies are mostly interested in meeting a checklist, not in security.  They want to be secure against likely attacks and “script kiddies,” but are not interested in making the kinds of expensive changes required to defend against a determined, well-funded adversary.  The main exceptions are government agencies and Wall Street, who know they’re the targets for those determined adversaries.

Maynor & Graham walked through a couple of interesting things they did as part of penetration testing.  One of these involved hacking the well-firewalled network of a company that was based at a secure facility, one where they could not simply walk onto the premises.  Instead, they wired an iPhone to a UPS battery and put it into the original iPhone packaging, then mailed it to the company.  With a UPS battery, an iPhone can run for 5 straight days with the WiFi on.  They modified the phone to add tcpdump and APlogger, and added a cron job that would send an SSH tunnel out to their computers every hour over the AT&T EDGE connection.  The result was a WiFi sniffer & endpoint inside the “secure facility” from which they could scan the internal network and run Metasploit to break into things.  An iPhone, after jailbreak has been run, is essentially a tiny BSD box — a perfectly suitable hacking platform.  Who thinks about their network being hacked by a cardboard box in the mailroom?

They also built a better phishing site.  Even security-aware people who are looking for phishing sites look for a valid SSL certificate bound to the site and signed by a trusted authority.  However, all it takes to get a real SSL certificate signed is about $2,700.  Start a business, register with Dun & Bradstreet to get a credit rating, then apply for a real certificate from VeriSign or Thawte.  You can even sign ActiveX controls and require users to install them as a “security feature.”  Since so many banks and companies outsource their applications or their HR and IT infrastructure, a phishing site with a good certificate is often indistinguishable from an outsourced site.  Just send someone at the company an email saying that the company has changed 401(k) providers, and they need to go to this outsourced site and sign up.

As is customary with DefCon, there wasn’t much talk about how to prevent these vulnerabilities.  However, it gives you something to think about, and it’s often very hard to guard against clever attacks against business logic flaws.  There’s no substitute for good threat modeling and flexible thinking.

May 16 2008

Charter Communications Using Ad Replacer

Posted by Grant Bugher

A story in the New York Times tells us that Charter Communications (the United States’s fourth-largest cable company) is going to start tracking user behavior and using it to sell ads.  They spin this as a potential problem because of privacy implications — it means that the cable company is watching your web surfing so it knows what ads to show you.  While they say it will be anonymous (i.e. they only know that a specific tracking cookie is associated with one user, but not who the user is), when it comes to an ISP this simply isn’t true — they do know who you are (due to billing information) and if they were not-so-politely asked (i.e. with a subpoena) they would be able to associate your tracking cookie with you as the individual user.  As a matter of policy they don’t associate the tracking profiles with individual users’ personal information and share it with their advertising partner — but they have the data, which means law enforcement can have the data.

However, all the discussion about privacy in the article is, in my opinion, a secondary issue.  As I’ve discussed before, using an ad replacer has other effects that may be much more serious.  It means Charter is now mounting a man-in-the-middle attack on all its customers and editing the web pages they view.  Thus, if there are any security flaws in the NebuAd software (like, say, a cross-site scripting vulnerability as we saw with Barefruit in a previous post), they are now embedded in every web site viewed by every Charter customer.  When you’re a large ISP like Charter, this makes it worthwhile for hackers to try to attack the system — being able to steal the bank account passwords of every Charter customer at a given bank is almost as good as being able to do it to all customers of the bank.  It may only be 10% of people, but 10% of everyone is still a lot of people.  In addition, Charter customers are no longer contributing to the revenue of the web sites they visit (which could be interpreted as an attack on those websites by Charter — they just stole all their revenue.)  I don’t much expect Charter to care, nor their customers, but the more ad replacers that are out there, the less advertising is able to support web sites.

So, what to do if you’re a Charter customer?  Well, you can opt out of the tracking system by setting a cookie, which means the ads you’re served will not be targeted.  However, the ads probably will still be replaced, so you’re still not helping pay for the web sites you visit.  And chances are that Charter could still come up with a record of all your web surfing if they were served a subpoena.  If you want to avoid that, the only choice is using an encrypted tunnel and mix network like TOR (which law enforcement has probably at least partially compromised, but this puts them in a situation like the Allies after they broke the Enigma machine — if they use evidence from a TOR compromise to prosecute you, then they give away that they’ve compromised the network and criminals will stop using it.  Thus, you’d need to do something pretty serious for them to be willing to admit they know about it.)  And what to do if you’re an advertiser-supported website?  Not much.  You can lobby for net neutrality laws, or ban Charter customers outright (which will hurt you more than it hurts them.)  However, I would expect Google, DoubleClick, and other ad networks to start working on obfuscating their ads soon if more major ISPs embrace ad replacement.

Apr 10 2008

Surveillance and Ubiquity

Posted by Grant Bugher

HexView has an article about tracking vehicles with RFID tire pressure monitors. The devices are found in tires and transmit tire pressure to the engine control module, which sounds innocuous enough, but to prevent modules from reading neighboring cars’ tires by accident, they also transmit a unique ID. Thus, you can follow a car around town based on its ID, turning tire pressure monitors into tracking devices.

RFID devices are becoming more and more common, and this trend will continue — they’re too convenient for many purposes for the security risks around them to stop them. You may not want every consumer good you buy to be tagged with an ID that lets people watch your shopping from 100 yards away, but the scenario of being able to check out at the grocery store by instantaneously scanning every item in your cart simultaneously is too compelling for people to resist.

Bruce Schneier has a post on the ineffectiveness of security cameras, but while calling them ineffective it does note that criminals moved their crimes to somewhere the cameras couldn’t see. This may be “ineffective” for a government camera system designed to deter crime, but it’s precisely what privately-owned security cameras are meant to do — make a target unappealing so criminals go elsewhere. This actually shows that cameras do deter crime… but only where they can see it.

However, both of these technologies can have pernicious effects, too. The HexView article points out that you could use the RFID tire monitors to commit murder — set a bomb with a radio trigger that goes off when the “right” car drives over it. It would also be just as useful to private investigators spying on citizens as it is to law enforcement chasing down criminals. And speaking of law enforcement, these cameras create a dangerous imbalance in their favor — the camera evidence is all under their control, and thus can come up when needed to prove a perpetrator’s guilt yet be conveniently lost in cases of police brutality, abuse of power, corruption etc.

This is an interesting time for surveillance — police and government surveillance ability is skyrocketing (London is practically blanketed in cameras at this point, as the British seem much less uncomfortable with them than Americans are) but it is still largely in the hands of authority figures. This is dangerous because of how fast the change is coming — our criminal laws and sentencing structures are based on the principle that most criminals get away with it. A $75 fine for speeding seems pretty reasonable, but what if that fine were levied every time a car hit 1 mph over the speed limit? Most of us would get fined a dozen times a day, every day, despite not even meaning to speed, because our behaviors are based on the idea that we probably won’t get caught and that, even if we are, police are unlikely to punish us for very minor transgressions. If people were caught for speeding every time, and fined every time, a $75 fine would be absurd — the fine could probably be under $1 and still bring in a few hundred dollars a month from every citizen. What is the right legal structure here? I can see two possibilities:

  • Raise the speed limits to the speeds we really think no one should exceed, and continue to fine every time.  Maybe you should get charged every time you exceed, say, 85 on a highway or 55 on a city street.  Set them high enough that there’s no leeway required.
  • Leave the speed limits where they are but set the fine really low, say $0.25 per minute of speeding.  This makes speeding discretionary — you can obey the law, or not, but if you choose not to you pay a penalty.  This is a fundamental change in the whole idea of crime and punishment, and itself has some pernicious consequences — it means that a certain income level can render you “above the law,” which is not a good thing.  Obviously some crimes (such as murder) should not be treated as discretionary, but for traffic violations it could make sense.
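To put rough numbers on this, here is the arithmetic as a small Python sketch. The dozen-violations-a-day figure and the $75, $1, and $0.25 amounts are the ones used above; the 30-day month and the 20 minutes of speeding per day are my own assumptions for illustration.

```python
DAYS_PER_MONTH = 30  # assumption: a round 30-day month


def monthly_cost_cents(fine_cents: int, units_per_day: int) -> int:
    """Monthly total for a fine charged `units_per_day` times every day."""
    return fine_cents * units_per_day * DAYS_PER_MONTH


# A $75 fine enforced on every one of a dozen daily violations.
always_enforced_75 = monthly_cost_cents(7500, 12)   # 2,700,000 cents = $27,000

# A $1 fine under the same perfect enforcement.
always_enforced_1 = monthly_cost_cents(100, 12)     # 36,000 cents = $360

# $0.25 per minute, assuming 20 minutes of speeding per day.
per_minute_plan = monthly_cost_cents(25, 20)        # 15,000 cents = $150

for label, cents in [
    ("$75 per violation", always_enforced_75),
    ("$1 per violation", always_enforced_1),
    ("$0.25 per minute", per_minute_plan),
]:
    print(f"{label}: ${cents / 100:,.2f} per month")
```

Under perfect enforcement, today's fine would cost a typical driver more than most people earn, while a fine of a dollar or less still adds up to a few hundred dollars a month.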

It’s not just traffic laws that are like this; consider the War on Drugs.  If every person who ever smoked marijuana went to prison, we would have a nation of felons — there’d be few people left who could vote, get security clearances, hold most jobs, etc.  The RIAA lawsuits against file-sharers are a good example of what happens when technology that catches everyone gets used to enforce laws designed under the assumption that only the worst and most flagrant criminals will be caught — people being hit by millions of dollars in fines for using technology to do something that wouldn’t even raise an eyebrow if done by old, physical means (e.g. posting a song on BitTorrent vs. handing it to a friend on a cassette tape.)

A surveillance society needs a different kind of jurisprudence — one that sets punishments that fit the crime even if applied every time.  On the bright side, actually doing this would lower crime rates tremendously due to the psychology of criminals.  Escalating punishments does little to deter crime because criminals are risk-seekers — they do not expect to get caught.  Even a small punishment can be a strong deterrent if applied every time — if criminals are usually caught, such that all criminals have some first-hand experience with being caught and punished, it would break that expectation.  On the not-so-bright side, a surveillance society must have very liberal laws to avoid being a police state — our current legal system, applied to everyone every time, would result in tyranny.  We all break 10 laws a day; it’s only sloppy enforcement that allows us to live our lives.  Unfortunately, the technology for ubiquitous enforcement will arrive well before the legal changes needed to make it livable.

What’s interesting to me is what will happen when surveillance becomes even more common: that is, when it is no longer monopolized by authority.  This has already started with cellular phones.   Almost everyone carries around a device which, while primarily for communication, contains a camera and often a voice recorder and videocamera as well.  Everyone is equipped to carry out impromptu surveillance at any time.  Devices like these glasses from ThinkGeek (found via BoingBoing) coupled with the rapidly falling cost of storage capacity will change this to everyone actually carrying out impromptu surveillance all the time.  This will have a chilling effect on human behavior at first — would you act differently if you knew everyone around you was videotaping everything you did?  Everything you say will, indeed, be able to be used against you, and not just in a court of law.  However, look at what young people put on MySpace and Facebook these days — the next generation does not have the assumption of privacy.  They’ve grown up in a world where they know everything goes on a permanent record, and have simply accepted it.  Sure, they’ll be occasionally shocked by it (e.g. the first time their party photos on MySpace disqualify them from a job), but the knowledge of permanence has not stopped them from sharing themselves, and eventually the rest of us will adjust, too.

Consider what the democratization of surveillance does to government power.  When we’re all recording, someone is watching the watchers.  Corruption, abuse of power, etc. all rely on the fact that authority figures can get away with crimes because they are more reliable witnesses in court than their victims are.  When everything is on the record — and not just the official record, but everyone’s record — police and government officials become compelled to act within the law.  While this may not be much of an impediment in truly totalitarian societies like China where the courts are as corrupt as everyone else, it’s a very strong bulwark of freedom in any society with an independent judiciary and a liberal tradition like the United States and Europe.  This is the next generation of surveillance — everyone sucking in light and sound from their glasses, or lapel pens, or even contact lenses, recording every moment of their lives on multi-terabyte devices that fit in their pockets.  It’s probably only 5-7 years away, and it washes away the current problems of a surveillance society and replaces them with new ones.

I think this cycle will continue for some time.  After all, once we’re past the era of democratized surveillance, computer graphics and artificial intelligence technology will improve to the point that ordinary people can modify their recordings to create perfect video of events that never happened, indistinguishable from the real thing.  What happens to recordings in law courts then, when they cease to be reliable evidence and become hearsay?  Tapes will become the new eyewitnesses, known to be unreliable and requiring corroboration from others.  When it becomes truly easy to make forged video, perhaps we will have emerged from the other side of the surveillance society — why bother to record anything when there’s no way to tell if it’s real?  Sometimes the only way out is through.

Jan 24 2008

IP Addresses: Personally Identifiable Information?

Posted by Grant Bugher

Peter Schaar, Germany’s Commissioner of Data Protection and head of the European Union’s privacy working group, has stated that information identified only by IP address must be considered personally identifiable information. As the AP article points out, this could have rather serious implications for search engines and many other electronic businesses, and RSnake is concerned about it messing up the entire advertising business model of the Internet.

First, for those not working in the information security industry: something being classified as personally identifiable information (PII) is a big deal. If data is PII, you are liable for damages if the data is ever released, and you are required by statute to take significant and often expensive measures to protect it. If you’re a public corporation, Sarbanes-Oxley requires you to do all sorts of things to protect the data (e.g. encryption.) If your company takes credit card payments, the Payment Card Industry Data Security Standard requires you to do even more (e.g. physical protection of the hardware the data sits on, specific firewall/router configurations, etc.) Most large companies have their own standards for how PII must be protected that combine or even go beyond the regulatory and industry requirements. Overall, the required protections around PII are onerous enough that companies strive to minimize how much PII they have at all — it’s often cheaper and easier to just delete the data than to protect it the way you need to protect it. Companies must make the decision of “How much business value do we get out of storing, say, our customers’ addresses, and does it exceed the cost of protecting that data?” Often the answer is no.

On the surface, calling IP addresses PII is ridiculous. IP addresses are found on every packet anyone sends on the Internet; if IP addresses count as a personal identification, then logging basically anything about Internet traffic makes the logs PII. It takes a label currently applied only to a small amount of high-value data and applies it to something that everyone everywhere logs; it seems absurd. But as I think about it more, I’ve come to realize that Schaar has a point.

The EU is much more aggressive about privacy law than the United States. The United States Constitution guarantees privacy from the government through the Fourth and Fifth Amendments; this sharply limits what the government can collect on you and what it can do with the data it does collect. However, there is no Constitutionally or legislatively defined general right to privacy — anyone can collect whatever data they want, so long as they’re not a branch of government. This is usually an adequate protection against government abuse, but it does mean the private sector can accumulate a frightening amount of data about you, and that could be prone to abuse as well. EU nations, on the other hand, often have a general right to privacy, and many kinds of data collection that are routine in the United States are illegal there; in addition, even where the data can be stored, sharing it with any third party without express user consent is almost always illegal.

If IP addresses are PII, what really happens? It requires changing a lot of current practices, but this is not the same as breaking scenarios. Remember, the privacy issue isn’t with transmitting or using IPs — it’s with storing them or sharing them with a third party.

  • Currently search engines like Google use your IP to identify where you are geographically, so as to establish search profiles for regions and target ads. They store the first 24 bits of your IP (dropping the last octet) as a proxy for location. They would need to switch to storing a different proxy for location (e.g. latitude and longitude), though they could still base this proxy on your IP.
  • Pay-per-click ad networks would still function. When an ad is clicked, the ad network records the click (so as to be able to bill the advertiser), then issues a 301 redirect to the advertiser, who also records the click (to know it happened and the ad was effective.) These records would need to leave out IP, or be protected as PII. Lacking IP, however, would make detecting and preventing click fraud (spoofed clicks, or many clicks from the same person) much more difficult. Currently a skilled fraudster can evade IP-based click-fraud prevention, but losing even that would make click fraud easy. Also, without IP addresses, the ad networks would have a hard time proving to advertisers that clicks were real if an advertiser chose to sue them. Large ad networks would probably have to just eat the cost of protecting their logs as PII.
  • Contrary to RSnake’s comment, I do not think this would affect embedded content. Embedded content comes in two forms — content linked to on a page, which your browser loads (objects), and content retrieved by the server and displayed on the page (mashups.)
    • In the object case (e.g. viewing a YouTube video on someone’s web page), the web site owner is not leaking your IP to the third party — you are. The web site is not sending your IP to YouTube at all; your web browser is sending it in response to a link tag in the page.
    • In the mashup case (e.g. web pages that get data from an API, like Facebook pages, pages embedding Google Maps, etc.), the web site owner is also not leaking your IP to the third party. You access the site, and then the site accesses the third party not as you, but as itself. The site leaks its own IP, not the customer’s. No PII is released.
  • Sites that do user tracking (via logins, or by simply recognizing users between sessions) would be unaffected; they use cookies, not IP, to track users. Most ad networks work this way, too.
  • The biggest change, though, is to simple website logs. Currently every time you access any web page, it makes a note in a log of your IP and which site you accessed, which is used for statistical analysis, forensics, etc. Even this blog is doing it; with most web providers you can’t even turn this logging off if you want to. Sites will either have to stop doing this or take substantial steps to protect the logs (or else be subject to significant statutory liability if they don’t.) Not keeping logs is, from a security perspective, very dangerous — if something happens, you have no idea what happened and thus may not be able to fix it.
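
To make the "first 24 bits" idea from the search engine bullet concrete, here is a minimal sketch of dropping the last octet of an IPv4 address before it is stored. The function name and the sample addresses are my own illustrations, not anything a search engine is known to run.

```python
import ipaddress

def truncate_ipv4(address: str, keep_bits: int = 24) -> str:
    """Zero the host bits of an IPv4 address, keeping only the first keep_bits."""
    # strict=False lets ip_network() accept an address with host bits set
    # and mask them off for us.
    network = ipaddress.ip_network(f"{address}/{keep_bits}", strict=False)
    return str(network.network_address)

# 203.0.113.77 is a documentation-range address used purely as an example.
print(truncate_ipv4("203.0.113.77"))
```

Note that the truncated value still narrows a user down to one of 256 addresses, which is why it works as a rough proxy for location.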

However, despite all that cost and difficulty, when you think about it… IP addresses really are personally identifying. If you have an always-on broadband ‘net connection, your IP address changes very rarely (maybe only once in several months), so all your web traffic everywhere, complete with your search queries, emails, etc., can be tied together with that number. Your ISP can connect that number to your name, address, etc. If you’re at a corporation, the IP is tied to a corporate gateway or proxy… which has logs tying each communication (based on date and time) to your desktop’s IP, which once again likely uniquely identifies you (unless you always compute from a shared machine.)

IP is a unique identifier for confirming identity, but not so much for initially finding it. In other words, if someone attacks my website, and I have only their IP address, it may not do me much good in finding out who they are unless I can get someone with subpoena powers to get it from the ISP. However, if I suspect a specific person of something, I can probably find out their IP and check it against my attacker’s IP, thus confirming their identity. Likewise, if I am an ad network or search engine with a lot of IP data, I don’t know who you are based on your IP, but the commonality in IPs between all the data I have may enable me to figure it out based on data aggregation.

I think this is a case where something is considered ridiculous merely because it changes things. Yes, a lot of business models and current practices would have to change if IP-as-PII became the default assumption. Yes, it would make some security people’s jobs harder, and cause web providers to incur a lot of costs. But does that mean it’s wrong? Perhaps what it means is that current businesses & web sites under-value their users’ privacy, and are freeloading while providing inadequate protections. It’s a different world if we have to discard IPs or protect them as PII, but I’m not convinced it’s a worse one.

Dec 10 2007

Anonymity with TOR and its limits

Posted by Grant Bugher

The post at the Unwired Video Blog about TOR has been getting a lot of publicity, having been linked to by both Lifehacker and Boing Boing. It provides a quick overview of TOR, how it works, and how to use it to browse the Web anonymously.

This is a good thing; people using services like this does help protect their privacy and anonymity, and due to how TOR works the more people that use it, the more secure it becomes (indeed, the Navy, who developed TOR, released it publicly because they realized that if only the military used it, it was worthless.) Most of all, if normal, everyday people value and use anonymity and privacy services, it shows policymakers that anonymity is a social good desirable to all and not something that people only want when they have something to hide.

The argument against anonymity, however, is that it can be used to cover up crimes. Will something like TOR protect criminals? How do we track down a malicious hacker if the attack comes from a TOR node?

There are actually quite a few ways TOR leaks information, and really they’re all centered around one idea — one cannot simply “be secure,” one must be secure from something. TOR protects against some attacks and adversaries but not against others.

TOR (“The Onion Router”) provides anonymity by encrypting your traffic multiple times and routing it through TOR nodes. Loading the Amazon.com home page through TOR would look like this:

  1. Your computer contacts two TOR nodes, which I’ll call A and B, and requests their public keys.
  2. Your computer forms the web request to Amazon.com.
  3. It encrypts the web request in key B, then encrypts the result (along with the address of B) in key A.
  4. The packet is sent to node A.
  5. Node A, which has the private portion of key A, decrypts the packet. Inside is another address (that of B) and an encrypted blob. Thus, Node A knows who you are, but it doesn’t know what you’re transmitting, or who you’re sending it to. It forwards the blob to Node B.
  6. Node B, which has the private portion of key B, decrypts the packet. Inside is your transmission to Amazon.com, which by its nature says where it should be sent. Thus, Node B knows what you’re transmitting, and who you’re transmitting it to, but has no idea who you are or where the packet came from. Node B sends the packet to Amazon.com.
  7. Amazon.com gets the packet and replies, sending the reply to Node B. Note that Amazon.com, like Node B, has no idea who you are or where the packet came from.
  8. Node B gets the reply and forwards it to Node A.
  9. Node A gets the reply and forwards it to you.

This is simplified; there’s additional encryption so the nodes can’t all read the reply as it makes its way back to you, and there can be more than two nodes in the chain (in which case the intermediate nodes know even less about the transmission than A and B above.) However, the above is the simplest case, and shows how much each part of the chain knows and doesn’t know.
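
The layering in steps 3 through 6 can be sketched in a few lines. This is a toy under loud assumptions: the "cipher" is a SHA-256 keystream XOR standing in for real public-key encryption, the keys and node names are hypothetical, and none of this resembles TOR's actual wire format. It only shows what each node can and cannot see.

```python
import hashlib
import json
from typing import Tuple

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Toy stand-in, NOT real crypto."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def wrap(key: bytes, next_hop: str, payload: bytes) -> bytes:
    """Seal a payload plus the address of the next hop under one node's key."""
    layer = json.dumps({"next": next_hop, "payload": payload.hex()}).encode()
    return toy_cipher(key, layer)

def peel(key: bytes, packet: bytes) -> Tuple[str, bytes]:
    """Remove one layer: return the next hop and the still-sealed inner payload."""
    layer = json.loads(toy_cipher(key, packet).decode())
    return layer["next"], bytes.fromhex(layer["payload"])

# Hypothetical keys; in the description above these would be public/private pairs.
KEY_A, KEY_B = b"hypothetical key A", b"hypothetical key B"
request = b"GET / HTTP/1.1\r\nHost: www.amazon.com\r\n\r\n"

# Client: seal the request for B first, then seal that (plus B's address) for A.
onion = wrap(KEY_A, "node-b", wrap(KEY_B, "www.amazon.com", request))

# Node A learns only the next hop; the inner blob is opaque to it.
hop_after_a, blob_for_b = peel(KEY_A, onion)

# Node B learns the destination and the request, but not who built the onion.
hop_after_b, delivered = peel(KEY_B, blob_for_b)
print(hop_after_a, hop_after_b, delivered == request)
```

Adding a third node is just one more `wrap` call on the client side, which is why longer chains cost the sender very little.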

The primary adversary TOR is designed to protect against is the actual site you’re browsing. It hides your IP address (which, with a subpoena or some social engineering to your ISP, can be tied to you personally) from the target site, so that the site does not know who is visiting it. The obvious counter to this, though, is for the site to apply a cookie to your browser when you visit it, such that it “recognizes” you on subsequent visits. TOR alone will not protect against this, which is why TOR is almost always packaged with Privoxy, a filtering proxy that runs on your own computer, examines all your web traffic, and strips out data that can be used to identify you. Here’s the first weakness in TOR — it can only strip out so much.

Web traffic is stateless; each web request is not tied to any other in any persistent way. When you load a web page, your browser nearly-simultaneously requests the page and all the images, media, embedded frames, ads, etc. on the page. When you click a link, the server has entirely forgotten who you are — it’s a totally different page load. This would make any kind of integrated experience impossible (the web was originally designed just to serve up static reference pages, not implement shopping carts), but for cookies. The web server sets a session cookie (a cookie that is deleted when you close your browser) when you load the first page, and uses that to track your movement through the site. There’s nothing menacing about this — it is not “tracking” in any Big Brother-ish way, it’s just linking all your page loads together to provide a sense of state or flow — and the web doesn’t work without it. Thus, Privoxy has to let these session cookies through. This can leak a little bit of information about you.

How is this anonymity defeated? In the simplest case, the user leaks the data on their own! A web request contains a decent amount of information about you (what browser you’re using, your operating system) as well as any cookies the user sends. Privoxy strips most of this, but if the user uses TOR alone, then the fact that the endpoint can’t see your IP may be irrelevant — you just told it who you are anyway. Likewise, browsing a site that requires login (like a webmail provider) through TOR is plain silly — it’s obvious who you are, you just logged into the site. This is true even if you’re worried about investigation not from the site but from authorities, spies, or other hackers — the webmail site logs that you connected, and it probably logs everywhere you’ve ever connected from. Thus, using a site with login through TOR only provides anonymity if you never use that site except through a TOR connection. Otherwise, your communication can be correlated over time.

Also, it’s possible to make an end run around TOR. If someone simply hacks into your computer (or seizes it, in the case of legal authorities), they don’t need to have logs from the other end to know what you’ve been doing; your own computer probably has records of your activity. Deleting them usually does little good in the legal case — modern data recovery can get deleted data quite easily. To avoid this, it’s necessary to have a computer that simply doesn’t keep any records — and this is hard to do in a normal Windows or Linux system (for one, they both arbitrarily swap portions of memory to disk during normal use.) For better anonymity, it’s necessary to boot a LiveCD environment (i.e. an OS with no hard drive or writeable media), where all evidence is destroyed when the computer is powered down. But unless your oppressive dictatorial nation’s secret police are after you, this is probably excessive when it comes to normal protection of privacy and anonymity.

Finally, there’s one more attack through TOR that’s more troubling than either of the above — anyone can be a TOR node. “Node B” (the exit node) in the example above gets to see your traffic, in both directions — it just doesn’t know your IP address. You are, thus, trusting a complete stranger somewhere on the Internet with your traffic, complete with the ability to carry out man-in-the-middle attacks on you (i.e. maybe he doesn’t forward your traffic to Amazon.com at all, but rather a fake site of his own design; or maybe he edits the traffic to add a virus or Trojan.) This actually happens; Bruce Schneier linked to some logs of a TOR exit node trying to carry out a MitM on an SSL session. So while TOR protects your anonymity, it may actually risk your privacy — it’s very hard to carry out MitM attacks on random Internet users, but doing it as a malicious TOR exit node is comparatively easy.

Another thing to consider: there are only so many TOR exit nodes. There are few enough of them that if, say, the NSA, or the RIAA/MPAA, wanted to, they could set up hundreds of exit nodes, all of which spy on traffic, and have a set of nodes large enough to comprise a substantial portion of the TOR network. If one agency controlled, say, 10% of the exit nodes, their ability to figure out who you are would be pretty significant. If they controlled normal nodes as well (even easier), they might even get lucky and get both the incoming and exit communications on their hostile network, allowing them to completely monitor your traffic.
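
A rough back-of-the-envelope sketch of that last scenario, assuming (and this is only my assumption) that entry and exit nodes are picked independently and uniformly, and that the adversary holds the same fraction of both:

```python
def both_ends_hostile(fraction_hostile: float) -> float:
    """Chance that one circuit has a hostile entry node AND a hostile exit node."""
    return fraction_hostile ** 2

def at_least_one_bad_circuit(fraction_hostile: float, circuits: int) -> float:
    """Chance that at least one of several independent circuits is fully observed."""
    return 1 - (1 - both_ends_hostile(fraction_hostile)) ** circuits

for circuits in (1, 10, 100):
    print(circuits, round(at_least_one_bad_circuit(0.10, circuits), 3))
```

Under those assumptions a single circuit is fully observed only 1% of the time, but the odds climb quickly for someone who builds many circuits over weeks of browsing.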

TOR is meant to protect your anonymity from the site you’re browsing. It does this pretty well, as long as you’re reasonably careful, don’t browse sites that require you to identify yourself, and use a cookie-filtering proxy like Privoxy. However, it is not meant to provide privacy from TOR node operators, and thus it does not. You can have privacy, or anonymity, but you can’t have both at the same time in a perfect fashion. (Even using open WiFi access points with an obscured MAC address provides anonymity but not privacy — the operator of the access point can do everything a TOR exit node operator can do to you and more.)

Overall, it’s a valuable tool, but if someone wants to track you down badly enough, and they have the resources or authority, they can still do so. This is why criminals aren’t out there committing heists with TOR every day; if you do something bad enough, it won’t protect you. Of course, most computer criminals aren’t caught due to malicious TOR exit nodes or anything so arcane — they’re caught because they brag about their accomplishments, or because investigators follow the money. Even hackers that excel in covering digital tracks thankfully usually have no experience in money laundering.

Nov 28 2007

Why Hackers Love Wi-Fi

Posted by Grant Bugher

Hackers love wireless networking. At DefCon 15, it was easy to predict which sessions would have lines running out the door and require getting there well in advance for a seat - it was the sessions with “wireless” or “Wi-Fi” in the title. The Wireless Village was very popular, and many of the hacking contests involved wireless access points.

Why do hackers love wireless networks? Really, there are two reasons, and those two together have some scary implications for risk on the modern Internet.

1.) Wireless Networks Use Shared Media

Back in the 80’s and 90’s, most wired Ethernet networks were based on shared media topologies. In principle, when you plugged into an Ethernet network and sent a packet, the packet on the wire (the actual electrical impulses) went to every other machine on the network. Hubs were simple repeaters, broadcasting everything they received. Only when your signals reached the router at the Internet edge were they actually intelligently processed. Thus, every computer on the LAN got every packet - the network cards just threw away any packets whose destination address specified another computer. However, a hacker wanting to eavesdrop on others had an easy job - just toggle the network card into “promiscuous mode” (a hard task on some network cards and OSs, but completely trivial on others) and it will receive every packet, giving you a god’s-eye view into the network. Protocols were mostly unencrypted then, too - so you saw everyone’s email, their passwords as they logged into Telnet or IMAP, etc. You could also spoof traffic - since you saw the packets sent by others, you could simply send responses back claiming to be the recipient. So long as your response arrived before the real one, yours would be accepted and the actual response discarded as out of sequence. It was the golden age of network-protocol hacking. Such easy access to passwords made other types of hacking easy, too - once you had the password to someone’s UNIX account or email box, there was a very good chance it would work on all their other accounts, too.

Then it all changed. Shared media has significant disadvantages as it scales - since everyone is dumping packets onto what essentially amounts to a single wire, collisions occur when two systems transmit simultaneously. Both then have to back off, slow down, and retransmit their garbled packets. The packets are tiny (Ethernet frames are normally restricted to 1500 bytes or less), but if you have 100 systems communicating at once, collisions can become quite frequent. Plus, even in the late 90’s people were not totally unaware of the security risks - the fact that any student could read all the network traffic of everyone else in their dorm was not considered desirable by universities, for instance. Thus, Ethernet was converted over to switched media. Switches, unlike hubs, do not treat all ports as equal. Instead, they remember which ports they have received traffic from an address on, and only forward traffic to an address to those ports. Traffic is only broadcast to all ports when a switch has no idea for which port it is intended, or when a packet is actually marked as a broadcast. Now, when you put your Ethernet card in promiscuous mode, all you hear is traffic meant for you - everything else has been blocked by the switch. Suddenly, packet sniffers went dead - there was nothing to see anymore. Ethernet became a lot more secure.
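
The learning behavior described above fits in a dozen lines. This is a minimal sketch of the idea only, with made-up MAC strings and port numbers; real switches also age out table entries and handle VLANs, none of which is modeled here.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningSwitch:
    def __init__(self, port_count: int) -> None:
        self.ports = list(range(port_count))
        self.mac_table: Dict[str, int] = {}  # source address -> port it was seen on

    def handle_frame(self, in_port: int, src: str, dst: str) -> List[int]:
        """Return the ports this frame is sent out of."""
        self.mac_table[src] = in_port  # remember where the sender lives
        if dst == BROADCAST or dst not in self.mac_table:
            # Unknown destination or true broadcast: flood every other port.
            return [p for p in self.ports if p != in_port]
        out_port = self.mac_table[dst]
        return [] if out_port == in_port else [out_port]

switch = LearningSwitch(port_count=4)
first = switch.handle_frame(0, "aa:aa", "bb:bb")   # bb:bb not learned yet
reply = switch.handle_frame(1, "bb:bb", "aa:aa")   # aa:aa was learned on port 0
later = switch.handle_frame(0, "aa:aa", "bb:bb")   # bb:bb now known on port 1
print(first, reply, later)
```

A sniffer sitting on port 3 sees the very first frame, when the switch has to flood, and then nothing more of the conversation.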

But wireless changes things again. Wireless networks are shared media, and they are shared inherently, in a way that cannot be changed. Radio waves fly in all directions. There is no way for your laptop to transmit only to another laptop or an access point - all radio is broadcast. Thus, when you sit down in a coffee shop and turn on wireless, you begin broadcasting everything to everyone within range (about a mile, for attackers who have good antennas and high-power network cards.) The shared media nature can be mitigated somewhat via cryptography - if all the traffic to the access point is encrypted, it hardly matters if someone can eavesdrop since they can’t understand it anyway. But open access points are, by their nature, open - they’re either not encrypted at all, or they’re encrypted in such a way that everyone is using the same key. Once the hacker has the key (either by cracking it, which is not hard on most Wi-Fi networks, or by simply paying as a legitimate user of the wireless hotspot), they can read all the traffic just like in the hub-based glory days of old.

There are solid wireless encryption systems. A network based on WPA2 with a strong passcode is quite secure, about as good as a wired connection (keeping in mind that “as good as a wired connection” is not an absolute guarantee of safety, either.) Modern encryption systems like AES coupled with 802.1x certificate-based authentication can make a well-engineered corporate wireless LAN quite safe.

But hackers don’t love well-engineered corporate wireless LANs. They love the terrible ones in coffee shops and bookstores and your house. On these networks, they can listen to all traffic, they can spoof traffic, and they can even kick people off and hijack their connections, or edit their connections on the fly. The “airpwn” attack from a DefCon 2-4 years ago was particularly amusing; using two wireless cards, it would sniff everyone’s HTTP traffic on one connection, then on the other card spoof responses to all requests for images, substituting other images (such as the hacker group’s logo, or more unsavory fare like the infamous goatse.cx site; that is not a hyperlink on purpose, do not navigate to that URL as it is not safe for work or, indeed, for anywhere else.) The result was that one laptop at a security conference was able to dynamically edit the HTTP streams of everyone else there - hundreds of people. That’s the kind of power a hacker can have on a shared-media network. In addition, on these sorts of networks, it’s trivially easy to hijack sessions. This means that on any site that uses HTTPS for authentication only, but then HTTP for the actual service (a category that includes all of the Google apps like GMail, as well as all the Yahoo! and Windows Live services), a hacker gains full access to your account if they overhear any of your wireless traffic.

The only truly safe way to use a public wireless hotspot is to use it only to VPN to a network you trust. Anything else is dangerous.

2.) Wireless Networks Provide Plausible Deniability

The legal system is not terribly friendly to hackers. Even innocuous and non-destructive activity, when applied to networks you don’t own, is often illegal. Now, for the most part hackers don’t worry overmuch about getting caught - if you don’t cause more than $5,000 in damages, the FBI won’t get involved, and the average local police department is about as capable of investigating sorcery as computer crime. However, when a hacker does worry about legal prosecution, a public wireless network is the next best thing to Siberia for where to commit a crime from.

When you do anything on the Internet, a host of servers are recording your activity based on your IP address. IP address, however, is not necessarily long-lived. Depending on how you access the Internet, your IP address might change every time you plug your computer in, or reboot, or move from building to building. Thus, investigators must be able to tie the IP address they know committed a crime to a specific, physical person.

With wireless, this is a problem. All the sites being attacked don’t see the IP address of the hacker - they see the IP address of the wireless access point. Thus, they have to subpoena the owner of the access point and demand to know who was using it. In the case of a well-designed corporate wireless LAN, they can check their logs to see which 802.1x certificate was using that IP at that time, and uniquely identify you. But in the case of a public hotspot, there probably aren’t any logs at all! They’re completely incapable of giving you up. And even should someone who was there say “I saw a shifty guy in the corner using a laptop!” to the police, that’s not going to be enough evidence. And if there are logs, they will tie your traffic to your MAC address, a unique code assigned to your network card at the factory.

Most people think a MAC address cannot be changed, so that it uniquely identifies your network card, and that if the police get a hold of your network card, they’ve caught you. This is actually totally untrue. Many network cards will allow you to change the MAC address to whatever you want (in Windows, it’s on Connection Properties -> Configure -> Advanced -> Physical Address), though this is entirely up to the network driver. Many Windows drivers block this functionality, thinking that users don’t need it. On Linux, however, the network drivers have been written by geeks, who operate under the impression that users need everything. Thus, on Linux systems changing your MAC address is as simple as typing one command (“macchanger -m 00:11:22:33:44:55 eth0”), and you can even configure the network stack to give you a new, random MAC address every time you connect to a network.
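
For a sense of what a "random MAC address" looks like, here is a small sketch that only generates one. The function name is my own, and actually applying the address to an interface is OS- and driver-specific, so that step is deliberately left out.

```python
import secrets

def random_mac() -> str:
    """Generate a random unicast, locally administered MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    # Set the locally administered bit (0x02) and clear the multicast bit (0x01)
    # in the first octet, so the address is valid for a single station and does
    # not claim to belong to any manufacturer.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)

print(random_mac())
```

The two bit tweaks matter: an address with the multicast bit set would be rejected as a source address by most equipment.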

As a result, a trail that leads to a wireless hotspot is basically a dead end for investigators. They get nothing but a fake MAC address that could correspond to any computer within a 1-mile radius - the hacker might not have even been in the building. Hard to get “beyond a reasonable doubt” out of that.

And those are why hackers love wireless networking. It’s like the 80’s phone networks, where a hacker can be a ghost in the machine, undetectable, and with tremendous power. It’s a dangerous place.

You might wonder, if wireless networks are so anonymous, how hackers ever get caught. Actually, there are three main ways:

  1. They get stupid, and brag about what they did.
  2. They get stupid, and while performing their illegal activities they also do something that identifies them, like log into their email account.
  3. Investigators follow the money. We don’t catch you breaking into the bank, we see where you sent the money to. We don’t catch you stealing credit card numbers, we catch you using them.

Luckily for those of us in the business of investigating and preventing computer crime, wireless networks won’t save criminals from their own stupidity, and you can’t send cash through the airwaves.

Nov 06 2007

Secure P2P for Pirates

Posted by Grant Bugher

According to a recent Reuters article, the unrepentant pirates of Sweden’s The Pirate Bay are working on developing their own peer-to-peer networking system.  It turns out that this is a relatively fascinating security problem, even though in this case it’s the criminals needing the security, vs. the law-abiding companies trying to break it — a bit of a reversal, to say the least.

Currently, the Pirate Bay is probably the world’s most popular BitTorrent tracker for downloading pirated media, receiving 1.5 million unique visitors a day.  With a quick trip to the Pirate Bay, you can quickly acquire any piece of music, any episode of any recent television show (usually within a couple hours of its first airing), any movie (generally while it’s still in theaters), etc.  Membership is required to enforce ratios (i.e. ensure you upload as well as download), but is free and open to all.  However, they’re unsatisfied with the BitTorrent protocol for a variety of reasons — chiefly the legal risk that their “customers” take.  Downloading from the Pirate Bay via BitTorrent runs two risks — first, that a copyright holder will grab your IP address and send a cease-and-desist order to your ISP, or worse, a subpoena which under the DMCA in the United States could carry a fine of tens of thousands of dollars, and second, that your ISP itself will cancel your subscription for using too much upstream bandwidth.  Comcast, in particular, is notorious for doing this without being willing to admit how much “too much” is, even as they cut you off for using it.

BitTorrent is an ingenious protocol.  The idea is to prevent massive load on single servers for downloading popular files by ensuring that everyone who downloads the file also shares it with others, even as the download occurs.  You don’t need the entire file to start sharing it — you register with a BitTorrent “tracker” (like The Pirate Bay) as working on a file, and all the other peers who either have or want that file are notified of your existence.  Peers then communicate with each other, swapping whatever parts of the file they have for the parts they don’t.  Thus, everyone’s upload bandwidth is being used at the same time as the download, unlike some previous P2P protocols.  This is used for many legal purposes — for one, Blizzard’s World of Warcraft uses it to update the game, to get around the obvious difficulty of having about 4 million of its 6 million subscribers all trying to download a 450-meg content update on the same day.  Thanks to BitTorrent, these updates go smoothly every time.
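
The swapping can be sketched as a toy simulation. Everything here is a simplification of my own: peers are just names, pieces are just integers, and each peer grabs the lowest-numbered missing piece from every other peer once per round. Real clients choose pieces and partners far more cleverly.

```python
from typing import Dict, Set

def swap_round(have: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """One round: each peer takes one missing piece from every peer that offers one."""
    after = {peer: set(pieces) for peer, pieces in have.items()}
    for receiver in have:
        for sender, offered in have.items():
            if sender == receiver:
                continue
            # Only pieces the sender held at the start of the round can be sent.
            wanted = sorted(offered - after[receiver])
            if wanted:
                after[receiver].add(wanted[0])
    return after

# A four-piece file: one full seed and two peers holding one piece each.
swarm = {"seed": {0, 1, 2, 3}, "alice": {0}, "bob": {3}}
after = swap_round(swarm)
print(after)
```

In this run alice gets piece 3 from bob rather than from the seed, which is the whole point: partial downloaders are already uploading.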

The problem, however, comes when the files being shared are illegal.  In the United States, uploading copyrighted media can result in rather substantial fines and statutory damages, and the RIAA and MPAA are actively suing people by the thousand to get them charged.  People want to download copyrighted media, so sites like the Pirate Bay exist.  But RIAA and MPAA agents can connect to these trackers, too — they’re open to all — and the tracker shares everyone’s IP address with them.  Since with BitTorrent, downloading and uploading go hand in hand, there’s no way to download copyrighted material without not only breaking the law but also advertising your IP to anyone who wants it.  There are blacklists of known RIAA/MPAA peers that will protect a pirate from the most ham-fisted detection, but it would be trivial for the copyright holders to evade this sort of blocking.  The Pirate Bay itself is largely immune to prosecution — they are located in Sweden, where copyright law subjects them to at worst a $300 fine every time they’re arrested (which has happened more than once.)  For the most part, legal threats just amuse them.  However, they’re concerned about their downloaders — as without people sharing files, they cannot exist.

In addition to the legal issues, there is the issue with ISPs.  “Unlimited” low-cost home broadband survives because people generally use only the tiniest fraction of their upstream bandwidth.  Comcast allocates me, and everyone else in my area, 384 kbit/sec.  If I used this bandwidth at full utilization for an entire month, I’d have uploaded 118 gigabytes.  This is actually quite a lot — by way of comparison, playing World of Warcraft 24/7 for an entire month would use only 1.2 gigabytes, or 1% as much.  This is fine by Comcast, because most of their users are only surfing the web, using only a few hundred kilobytes per month.  If everyone used their entire allotment of 118 gigabytes, Comcast would have to raise rates tremendously — from the current $50 or so per month to probably 5 times as much (or more.)  Compare business Internet rates (which assume you are hosting servers, and thus upload a lot) with residential ones (which assume you almost always download and upload very little) to see the difference. Instead, the many light users subsidize the few heavy users.  BitTorrent, in which everyone helps take load off servers by uploading everything they download, often many times over, threatens this model — if everyone uploads, Internet rates will have to go way up.
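
The 118 gigabyte figure is easy to check. This sketch assumes a 30-day month, 1 kbit = 1024 bits, and gigabytes counted as 2**30 bytes; with decimal units the answer comes out a little different but in the same neighborhood.

```python
def monthly_upload_gib(kbit_per_sec: float, days: int = 30) -> float:
    """Data uploaded at a constant rate, in binary gigabytes (2**30 bytes)."""
    bytes_per_sec = kbit_per_sec * 1024 / 8      # assumes 1 kbit = 1024 bits
    total_bytes = bytes_per_sec * days * 24 * 60 * 60
    return total_bytes / 2**30

print(round(monthly_upload_gib(384), 1))
```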

Thus, ISPs often try to stop BitTorrent and other peer-to-peer systems.  They use copyright as an excuse, but really, they don’t care about copyright — they care about cost.  Your downloading costs very little.  Your uploading to other customers on the same ISP costs very little.  Your uploading to the Internet costs them quite a lot by comparison.  The most primitive way they’ve tried this is simple port-blocking — they ban connections to the port TCP/6881 (BitTorrent’s default) on all their customers’ PCs.  This doesn’t work very well — for one, it’s obvious (BitTorrent simply fails to function), and for another, BitTorrent doesn’t need to use any port in particular.  Due to the tracker, other peers can find you no matter what port you choose, so simply changing the default in your BitTorrent client gets around this.  Slightly less primitive is “traffic shaping” — the ISP slows traffic to the default port, or it inspects all traffic for BitTorrent headers and slows any packets showing them.  (The latter approach is much more expensive for the ISP, since it requires a deep inspection firewall on all traffic.)  Once again, changing port is easy.  In addition, some BitTorrent clients have added a header encryption feature to evade traffic shaping — this limits which peers are usable (specifically, to only other peers that support the header encryption), but it defeats the header inspection.  Comcast has recently been using the Sandvine intelligent traffic management system, which has caused some controversy since it actually impersonates the user and sends forged traffic on their behalf, in a further attempt to limit BitTorrent and other P2P traffic.
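
Here is a sketch of the difference between the two detection approaches. The BitTorrent peer handshake begins with a length byte of 19 followed by the string "BitTorrent protocol", which is what a header-inspecting filter can key on. The blocklist, the port numbers, and the XOR "obfuscation" are placeholders of my own; the XOR is only a stand-in for header encryption, not how any client actually implements it.

```python
HANDSHAKE_PREFIX = bytes([19]) + b"BitTorrent protocol"

# Illustrative blocklist: whichever port the ISP has decided to ban.
ISP_BLOCKLIST = frozenset({6881})

def blocked_by_port(dst_port: int, blocked_ports: frozenset = ISP_BLOCKLIST) -> bool:
    """Port-blocking: looks only at the destination port number."""
    return dst_port in blocked_ports

def flagged_by_inspection(payload: bytes) -> bool:
    """Header inspection: looks at the first bytes of the payload, on any port."""
    return payload.startswith(HANDSHAKE_PREFIX)

# Prefix + 8 reserved bytes + placeholder 20-byte info hash and peer ID.
handshake = HANDSHAKE_PREFIX + bytes(8) + bytes(20) + bytes(20)

# Stand-in for header encryption: any transformation that hides the prefix.
obfuscated = bytes(b ^ 0x5A for b in handshake)

print(blocked_by_port(51413))             # moved to another port
print(flagged_by_inspection(handshake))   # plain handshake, any port
print(flagged_by_inspection(obfuscated))  # hidden header
```

The port check is nearly free for the ISP, while the payload check has to run against every connection, which matches the cost difference noted above.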

The above problems are inherent to BitTorrent, and at first, they seem inherent to all peer-to-peer systems.  However, the buccaneers of the Pirate Bay have come up with a rather ambitious plan to improve on BitTorrent, developing their own protocol to better suit their needs.  They’re still working on the specification (there’s a wiki up for suggestions), but I find the security and privacy issues they need to overcome interesting.  At first glance, it seems the problems they must solve are the following:

  • How can people upload pirated files without their IP addresses being detected by groups like the MPAA and RIAA?
  • How can people hide the use of a file-sharing application so their ISP does not detect it and cut them off?

But that’s actually rather short-sighted, and the suggestions on the wiki seem to indicate that they’ve realized that, too.  Creating a new peer-to-peer protocol to replace BitTorrent for pirates requires not looking at the current attacks, but rather at the threats themselves.  The problem they really want to solve is simply to defend against these two threats:

  • Legal prosecution for uploading pirated files
  • ISP retribution for uploading large amounts of data

This is rather different!  What they want to avoid is not detection per se, but rather the current consequences of that detection.  In addition, they seek to address several technical/functional shortcomings of the BitTorrent protocol while they’re at it (such as that the tracker software does not scale to their traffic volume, and that upload bandwidth use in BitTorrent is suboptimal — many peers are not uploading anything.)

Right now, ISPs face no legal liability for transferring all this pirated media, since they are only content-indifferent carriers.  Thus, a system that allowed users to also be content-indifferent carriers (i.e. sharing data they did not choose to download as well as the files they acquire on purpose) might provide some legal protection.  The problem is that right now, users are from a legal standpoint sharing media they have, not simply transmitting media.  Thus, a system of “reflector nodes”, in which the idle upload bandwidth mentioned above is filled with data relayed from other peers, might work.  The ideal from an anonymity perspective would be onion routing, as performed by the TOR Project.  Unfortunately, this causes a serious growth in bandwidth requirements for all peers — basically defeating the purpose of BitTorrent.  Some balance must be found between true anonymity, as can be provided by a high-latency encrypted mix network with traffic-analysis resistance like TOR, and simple obfuscation, or even juggling around what is transmitted to be able to stick to the letter of the law while violating its spirit.  No one would believe that pirates don’t mean to transmit pirated software and that the mix network just makes it look that way, but it doesn’t matter whether anyone believes it so long as the opposite can’t be proven beyond a reasonable doubt in a court of law.

Avoiding ISP retribution is a bit harder.  You can encrypt and use random ports, thus making detection far more difficult.  However, this causes a problem: if everyone does this, and everyone uses P2P, then the ISP can no longer throttle anyone and everyone’s Internet rates go up!  This is hardly the desired outcome.  An ISP administrator has contributed some novel suggestions regarding changing the protocol to help ISPs save costs.  If the peer-to-peer system would deliberately prioritize other peers on the same ISP (ideally using WHOIS/ARIN data, though even simple CIDR subnets would help) for uploads, it could drastically reduce the ISP’s costs.  Napster provides a good example: during its heyday, when pirated Napster transfers were killing college networks, the company worked with universities to institute just this type of solution.  The Napster client would look for other users at the same university to share with, only going to the Internet when this failed.  This type of solution, which fights not the method by which ISPs hurt P2P but rather their motivation for doing so, is bound to work better.  It’s a good example of thinking about the threat, not about the particular vulnerability.  In addition, it’s probably the only way to fight things like Sandvine.  Due to the way it works, Sandvine can’t be stopped by a BitTorrent client unless the client goes to full encryption, with all the negative effects that has.  Lightweight ways to evade it require patching the TCP/IP stack and altering RFC-mandated behavior, which is doable by people willing to hack their OS but not something you can just bundle into your P2P software.
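
As a rough illustration of the same-ISP preference described above, here is a minimal Python sketch.  The function name `prioritize_peers` and the hard-coded prefix list are assumptions made up for this example; a real client would need to obtain prefix data from somewhere such as WHOIS/ARIN, which is not shown here.

```python
import ipaddress


def prioritize_peers(my_ip, peers, isp_prefixes):
    """Return peers reordered so that peers sharing an ISP prefix with my_ip come first.

    isp_prefixes is a hypothetical list of CIDR blocks belonging to ISPs;
    how that list is obtained is outside the scope of this sketch.
    """
    networks = [ipaddress.ip_network(prefix) for prefix in isp_prefixes]
    me = ipaddress.ip_address(my_ip)
    # Keep only the prefixes that contain our own address.
    my_networks = [net for net in networks if me in net]

    def is_local(peer):
        addr = ipaddress.ip_address(peer)
        return any(addr in net for net in my_networks)

    # sorted() is stable, so the original order is preserved within each group.
    # False sorts before True, so local peers (not is_local == False) come first.
    return sorted(peers, key=lambda peer: not is_local(peer))


if __name__ == "__main__":
    candidates = ["203.0.113.9", "10.1.2.3", "198.51.100.7", "10.1.200.4"]
    print(prioritize_peers("10.1.5.5", candidates, ["10.1.0.0/16", "198.51.100.0/24"]))
```

The client would then try the peers at the front of the list first, only going out to the wider Internet when the local ones cannot supply the data.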

Another issue that the Pirate Bay has is with fake files.  Sometimes, a user (either an RIAA/MPAA shill or just someone who likes being obnoxious) will upload a file of approximately the right size, with a filename matching something new and popular (like a just-released movie or album), that contains no data or garbage data.  With nothing but the filename to go on, users download the fakes, causing the seed count to go up and making the fake appear even more “realistic” on the tracker, and hundreds of gigabytes of bandwidth are wasted.  Currently, the only thing to be done about this is to look at the uploader and ensure he is someone trusted, but identity is impossible to verify.  Some sort of digital signature/PKI system, which would let downloaders confirm that a file really came from a known uploader, would be very helpful here.

Overall, it will be very interesting to see what they come up with.  Like all open-source projects, it may or may not actually get off the ground, and pirates are of course not well-known for their altruistic contributions.  However, it’s not likely the BitTorrent creators (who don’t get any money from pirates) will work on these problems, so it falls to people like the Pirate Bay to try.  Even if you don’t want pirated media, the resultant system could be useful for a host of purposes — the same technologies being used for fighting piracy and cutting ISP bills in the United States are used for hunting down dissidents and limiting free access to information in totalitarian nations.  In addition, a sufficiently large peering system with deep storage and forced reflectors (i.e. people sharing data they did not specifically choose to download or share) could result in a sort of distributed information well in which any human knowledge could be stored for easy access and rendered almost indestructible.  Criminals have been putting legitimate technologies to underhanded uses for centuries — an illegitimate technology can be put to beneficial uses as well.

Nov 02 2007

Do Not Track Lists: Good Luck With That

Posted by Grant Bugher

The New York Times reports that people will be able to sign up for “do-not-track” lists to prevent online advertisers from monitoring their activities.  It is not clear from the article if they’re expecting a government solution, along the lines of the National Do Not Call Registry for telemarketers, or merely solutions from ISPs and advertisers themselves.

Unfortunately, there is a slight problem with either solution: it’s pretty much impossible.

First, a bit about how ad networks work.  Whenever your browser loads a page with a banner or text ad on it, the page contains a link to the ad network’s web server telling it to load the ad.  As it does with any site, your browser first checks to see if it has a cookie recorded for that site.  If it’s the first time you’ve ever visited that ad network, then it does not; if you have visited before, then there is a unique ID number for you in the cookie.  The browser then sends a request to the ad network, along with a cookie (if any) and a referrer header (saying what page the ad was loaded from.)

The ad network site then looks up the ID in the cookie.  This ID is linked with a list of all the referrer headers it’s ever received from you — this is the “tracking” component.  It adds the new referrer header to the list, and then uses the list to try to puzzle out what sort of things you like and pick the ad it thinks you’re most likely to click on.  It then returns that ad.  If no cookie was received from you, it also creates an ID for you and sends that so as to set the cookie for next time.
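
The request flow described above can be sketched in a few lines of Python.  This is a toy model, not any real ad network’s code: the `AdNetwork` class, its keyword-matching `pick_ad` method, and the example URLs are all invented for illustration.

```python
import uuid


class AdNetwork:
    """Toy model of the cookie-plus-referrer tracking described in the text."""

    def __init__(self, ads_by_topic, default_ad):
        self.history = {}               # cookie ID -> list of referrers seen
        self.ads_by_topic = ads_by_topic
        self.default_ad = default_ad

    def serve(self, cookie_id, referrer):
        """Handle one ad request and return (ad, cookie ID the browser should store)."""
        if cookie_id is None or cookie_id not in self.history:
            # No (known) cookie: this looks like a brand-new visitor.
            cookie_id = uuid.uuid4().hex
            self.history[cookie_id] = []
        # The "tracking" component: remember where the ad was loaded from.
        self.history[cookie_id].append(referrer)
        return self.pick_ad(self.history[cookie_id]), cookie_id

    def pick_ad(self, referrers):
        # Hypothetical targeting: count how many referrers mention each topic.
        scores = {topic: sum(topic in ref for ref in referrers)
                  for topic in self.ads_by_topic}
        best = max(scores, key=scores.get, default=None)
        if best is None or scores[best] == 0:
            return self.default_ad
        return self.ads_by_topic[best]


if __name__ == "__main__":
    net = AdNetwork({"cars": "car ad", "garden": "garden ad"}, "generic ad")
    ad, cookie = net.serve(None, "http://example.com/cars/review")
    print(ad, cookie)
```

Note that the only thing the server ever stores is a random ID and a list of referrers, which is exactly why deleting the cookie makes you look like a new visitor.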

That’s pretty much all it does.  There are variants, which also use script to inspect the pages you linked from and use that to make better predictions of what you want to see ads for, but the overall effect is the same.  The ad network doesn’t know who you are, or any demographic info about you; all it knows is that some person with a random ID has visited a specific list of sites.  In addition, there’s a simple way to dump all that tracking information: tell your browser to delete all the cookies (or just the ones for ad networks.)  Whenever you do this, the ad networks will all think you’re a “new” person and provide you with a new ID number.

So, how do we stop the ad tracking (should you even really want to)?  I can see a few possibilities, but all have some significant difficulties associated with them:

1.) Set a cookie that essentially sets your ID as “don’t track me, use random ads instead.”  Whenever you visit an ad network, this “do-not-track” ID is sent, and the ad network sends you back a random ad without bothering to record your referrer.  Issues: due to the same-site rule, this cookie must be set by each ad network itself.  So there’s no common registry — you have to opt out with each ad network, and then trust each ad network to continue to obey the opt-out.

2.) Install an app or modify the browser to dump cookies.  Works great; no more tracking.  Issues: also breaks half of the Web.  If you allow even per-session cookies, some limited tracking is possible, and if you don’t allow session cookies, you break pretty much all of the Web.

3.) Have your ISP scan all your web traffic, find cookies that are going to ad networks, and strip only those.  This makes the web work normally while killing ad networks.  Issues: requires all the ISPs offering this sort of technology to keep track of every ad network in the world so they know which cookies to block.  What about single-site ad networks? (e.g. the New York Times tracking which articles on their site you read and targeting ads based on those.)  There are probably tens of thousands of them.

Also, the above three examples are only pointing out issues when ad networks are not malicious – that is, they want to allow you to opt out if you so desire.  If they are hostile, then they can work around any of the above options.  They can simply disregard the do-not-track cookies and set a different ID, or track you via codes embedded in image tags.  The latter method is inferior, since it does not persist across sessions (it forgets who you are whenever you close your browser) without the cooperation of the actual sites the ads are on, but it does still allow some tracking capability.  Affiliate networks are constantly advertising and improving their “cookieless traffic” capabilities.

Of course, if the government cares to get involved, it can simply mandate that all ad networks offer an opt-out, and pursue legal action against any who don’t, or who evade their own opt-out systems.  However, what it can’t do is offer a centralized list like the Do Not Call Registry.  After all, the ad networks do not know who you are – they only know you are some random ID number who has visited various sites in the past.  Thus, they have no way to check against a list and see if you’re on it.  And since cookies can only be sent to the site they came from, the government site can’t set some kind of master “do-not-track” cookie — your browser would refuse to send the cookie to any ad networks!
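
To make the cookie-scoping argument concrete, here is a simplified Python sketch of how a browser decides which cookies to send.  Real browsers apply more rules than this (paths, secure flags, and so on); the function names and the example domains are assumptions for illustration only.

```python
def cookie_applies(cookie_domain, request_host):
    """Simplified domain match: a cookie goes only to its own domain or a subdomain of it."""
    cookie_domain = cookie_domain.lower().lstrip(".")
    request_host = request_host.lower()
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


def cookies_to_send(jar, request_host):
    """Return the name/value pairs from jar that would accompany a request to request_host.

    jar maps (domain, name) -> value, a made-up structure for this sketch.
    """
    return {name: value
            for (domain, name), value in jar.items()
            if cookie_applies(domain, request_host)}


if __name__ == "__main__":
    jar = {
        ("donottrack.example.gov", "optout"): "1",   # hypothetical registry cookie
        ("adnet.example", "id"): "abc123",           # hypothetical ad network cookie
    }
    # The ad network only ever sees its own cookie, never the registry's.
    print(cookies_to_send(jar, "ads.adnet.example"))
```

The registry’s opt-out cookie never reaches the ad network, so a single central cookie cannot do the job.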

However, before instituting a system like this at all, we should perhaps consider the unintended consequences.  The reason that ad networks institute tracking is that targeted ads are more valuable to advertisers than random ones.  A car company would rather show ads to car buffs than to people who don’t drive, and it will pay more for ads it knows are going to interested parties.  Thus, if ad networks cannot target ads with tracking, they will have to charge less for ads.  This means that sites will get paid less per ad for placing ad network links on their sites.  Therefore, eliminating ad network tracking means sites will have to carry more ads to earn the same revenue.  Is “more ads” really what we want here?  Are we willing to accept more ads to ditch the tracking?  How big a privacy threat is this, anyway?  There are people I don’t want to track my web surfing, certainly, but DoubleClick and aQuantive are not the people I’m thinking of here.  Perhaps what we need is not a way to opt out of ad tracking, but more limits on who can get that data?  Were ad tracking data illegal to resell and not admissible in court, would we care about it at all?  I’m not sure that I would.

Of course, much of this is moot if instead of opting out of the tracking systems, you just “opt out” of the ad networks altogether, either with a plugin like AdBlock (which advertisers hate) or a custom hosts file.  It doesn’t get 100% of the networks, of course, but it sure gets a lot of them.
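
For the hosts file approach, the entries are simple enough to generate with a few lines of Python.  This is only a sketch: the function name `hosts_entries` and the domains are placeholders, and pointing the names at `0.0.0.0` is just one common choice of sink address.

```python
def hosts_entries(ad_domains, sink="0.0.0.0"):
    """Build hosts-file lines that point each listed hostname at a sink address."""
    # De-duplicate and sort so the output is stable from run to run.
    return "\n".join(f"{sink} {domain}" for domain in sorted(set(ad_domains)))


if __name__ == "__main__":
    # Placeholder hostnames; a real list would name actual ad servers.
    print(hosts_entries(["ads.example.net", "ad.example.com", "ads.example.net"]))
```

A hosts file matches exact hostnames only, with no wildcards, which is one reason this approach never catches every network.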