Anonymity with TOR and its limits

The post at the Unwired Video Blog about TOR has been getting a lot of publicity, having been linked to by both Lifehacker and Boing Boing. It provides a quick overview of TOR, how it works, and how to use it to browse the Web anonymously.

This is a good thing. Using services like this does help people protect their privacy and anonymity, and because of how TOR works, the more people who use it, the more secure it becomes. (Indeed, the Navy, which developed TOR, released it publicly because they realized that if only the military used it, it was worthless.) Most of all, if normal, everyday people value and use anonymity and privacy services, it shows policymakers that anonymity is a social good desirable to all and not something that people only want when they have something to hide.

The argument against anonymity, however, is that it can be used to cover up crimes. Will something like TOR protect criminals? How do we track down a malicious hacker if the attack comes from a TOR node?

There are actually quite a few ways TOR leaks information, and really they’re all centered around one idea: one cannot simply “be secure”; one must be secure from something. TOR protects against some attacks and adversaries but not against others.

TOR (“The Onion Router”) provides anonymity by encrypting your traffic multiple times and routing it through TOR nodes. Loading the Amazon.com home page through TOR would look like this:

  1. Your computer contacts two TOR nodes, which I’ll call A and B, and requests their public keys.
  2. Your computer forms the web request to Amazon.com.
  3. It encrypts the web request in key B, then encrypts the result (along with the address of B) in key A.
  4. The packet is sent to node A.
  5. Node A, which has the private portion of key A, decrypts the packet. Inside is another address (that of B) and an encrypted blob. Thus, Node A knows who you are, but it doesn’t know what you’re transmitting, or who you’re sending it to. It forwards the blob to Node B.
  6. Node B, which has the private portion of key B, decrypts the packet. Inside is your transmission to Amazon.com, which by its nature says where it should be sent. Thus, Node B knows what you’re transmitting, and who you’re transmitting it to, but has no idea who you are or where the packet came from. Node B sends the packet to Amazon.com.
  7. Amazon.com gets the packet and replies, sending the reply to Node B. Note that Amazon.com, like Node B, has no idea who you are or where the packet came from.
  8. Node B gets the reply and forwards it to Node A.
  9. Node A gets the reply and forwards it to you.

This is simplified; there’s additional encryption so the nodes can’t all read the reply as it makes its way back to you, and there can be more than two nodes in the chain (in which case the intermediate nodes know even less about the transmission than A and B above). However, the above is the simplest case, and shows how much each part of the chain knows and doesn’t know.
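To make the layering concrete, here is a minimal Python sketch of the two-node case. It is a toy under stated assumptions, not how TOR is actually implemented: `toy_cipher` is a hypothetical XOR keystream that stands in for real public-key encryption and is not secure, each node's key pair is collapsed into a single shared key, and the destination is prepended to the request with a `|` separator purely for illustration. The names `Node`, `wrap`, and `peel` are made up for this example.

```python
import hashlib
from dataclasses import dataclass


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. A toy stand-in, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    # XOR is its own inverse, so the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class Node:
    name: str
    key: bytes  # hypothetical: stands in for the node's public/private key pair

    def peel(self, packet: bytes) -> tuple[str, bytes]:
        """Strip this node's layer and return (next hop, remaining payload)."""
        plain = toy_cipher(self.key, packet)
        next_hop, _, payload = plain.partition(b"|")
        return next_hop.decode(), payload


def wrap(request: bytes, destination: str, node_a: Node, node_b: Node) -> bytes:
    """Encrypt for B first, then wrap that (plus B's address) for A."""
    inner = toy_cipher(node_b.key, destination.encode() + b"|" + request)
    return toy_cipher(node_a.key, node_b.name.encode() + b"|" + inner)


node_a = Node("A", b"key-of-node-A")
node_b = Node("B", b"key-of-node-B")
request = b"GET / HTTP/1.1"

packet = wrap(request, "amazon.com", node_a, node_b)

# Node A learns only the next hop; the payload is still an opaque blob.
hop_after_a, blob = node_a.peel(packet)

# Node B learns the destination and the request, but never saw the sender.
hop_after_b, recovered = node_b.peel(blob)
```

The point of the sketch is the asymmetry in knowledge: after `node_a.peel`, the only readable thing is the name of B, and only `node_b.peel` exposes the destination and the request itself.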

The primary adversary TOR is designed to protect against is the actual site you’re browsing. It hides your IP address (which, with a subpoena or some social engineering to your ISP, can be tied to you personally) from the target site, so that the site does not know who is visiting it. The obvious counter to this, though, is for the site to set a cookie in your browser when you visit it, such that it “recognizes” you on subsequent visits. TOR alone will not protect against this, which is why TOR is almost always packaged with Privoxy, a filtering proxy that runs on your own computer, examines all your web traffic, and strips out data that can be used to identify you. Here’s the first weakness in this setup: Privoxy can only strip out so much.

Web traffic is stateless; each web request is not tied to any other in any persistent way. When you load a web page, your browser nearly-simultaneously requests the page and all the images, media, embedded frames, ads, etc. on the page. When you click a link, the server has entirely forgotten who you are — it’s a totally different page load. This would make any kind of integrated experience impossible (the web was originally designed just to serve up static reference pages, not implement shopping carts), but for cookies. The web server sets a session cookie (a cookie that is deleted when you close your browser) when you load the first page, and uses that to track your movement through the site. There’s nothing menacing about this — it is not “tracking” in any Big Brother-ish way, it’s just linking all your page loads together to provide a sense of state or flow — and the web doesn’t work without it. Thus, Privoxy has to let these session cookies through. This can leak a little bit of information about you.
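The way a session cookie stitches otherwise unrelated requests together can be sketched in a few lines of Python. This is an illustration only, not any real site's code: the `sessions` store, the `handle_request` function, and the cookie name `sid` are all hypothetical.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory store: session id -> paths requested in that session.
sessions: dict[str, list[str]] = {}


def handle_request(path: str, cookie_header: str = "") -> tuple[str, str]:
    """Handle one request; return (session id, Set-Cookie value or "")."""
    cookie = SimpleCookie(cookie_header)
    sid = cookie["sid"].value if "sid" in cookie else None
    set_cookie = ""
    if sid not in sessions:
        # No usable cookie: the server cannot tell this visitor from a new one.
        sid = secrets.token_hex(8)
        sessions[sid] = []
        # No Expires/Max-Age attribute, so the browser discards it on exit.
        set_cookie = f"sid={sid}; Path=/"
    sessions[sid].append(path)
    return sid, set_cookie


# First page load: the server hands out a session cookie.
sid1, header = handle_request("/")

# The browser sends the cookie back, so the second load joins the same session.
sid2, _ = handle_request("/cart", cookie_header=f"sid={sid1}")

# With the cookie stripped, the same visitor looks like a stranger.
sid3, _ = handle_request("/cart")
```

After these three calls, `sessions[sid1]` holds both `/` and `/cart`, while the cookie-less request lands in a fresh session. That is the trade-off a filtering proxy faces: block the cookie and the site loses its sense of flow, allow it and every page load in that session is linked.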

How is this anonymity defeated? In the simplest case, you leak the data on your own! A web request contains a decent amount of information about you (what browser you’re using, your operating system) as well as any cookies your browser sends. Privoxy strips most of this, but if you use TOR alone, then the fact that the endpoint can’t see your IP may be irrelevant: you just told it who you are anyway. Likewise, browsing a site that requires login (like a webmail provider) through TOR is plain silly. It’s obvious who you are, because you just logged into the site. This is true even if you’re worried about investigation not from the site but from authorities, spies, or other hackers: the webmail site logs that you connected, and it probably logs everywhere you’ve ever connected from. Thus, using a site with login through TOR only provides anonymity if you never use that site except through a TOR connection. Otherwise, your communication can be correlated over time.
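As a rough picture of what a filtering proxy does to an outgoing request, here is a small Python sketch. It does not reproduce Privoxy's actual rules or configuration; the header lists, the replacement values, and the `scrub_headers` function are assumptions chosen for illustration, and the sample browser string is made up.

```python
# Illustrative choices only, not Privoxy's real defaults.
DROP = {"cookie", "referer"}
GENERIC = {"user-agent": "Mozilla/5.0", "accept-language": "en-US"}


def scrub_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers with identifying values removed or blanded."""
    scrubbed = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered in DROP:
            continue  # drop the header entirely
        if lowered in GENERIC:
            value = GENERIC[lowered]  # replace with a common, uninformative value
        scrubbed[name] = value
    return scrubbed


request_headers = {
    "Host": "www.amazon.com",
    "User-Agent": "ExampleBrowser/1.0 (ExampleOS 5.1)",
    "Accept-Language": "fr-CA",
    "Referer": "http://example.org/page",
    "Cookie": "tracking_id=abc123",
}

clean = scrub_headers(request_headers)
```

Note what a filter like this cannot touch: a username and password typed into a login form travel in the request itself, and removing them would break the login. No amount of header scrubbing helps once you have told the site who you are.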

Also, it’s possible to make an end run around TOR. If someone simply hacks into your computer (or seizes it, in the case of legal authorities), they don’t need to have logs from the other end to know what you’ve been doing; your own computer probably has records of your activity. Deleting them usually does little good in the legal case, since modern data recovery can get deleted data quite easily. To avoid this, it’s necessary to have a computer that simply doesn’t keep any records, and this is hard to do in a normal Windows or Linux system (for one, they both arbitrarily swap portions of memory to disk during normal use). For better anonymity, it’s necessary to boot a LiveCD environment (i.e. an OS with no hard drive or writeable media), where all evidence is destroyed when the computer is powered down. But unless your oppressive dictatorial nation’s secret police are after you, this is probably excessive when it comes to normal protection of privacy and anonymity.

Finally, there’s one more attack through TOR that’s more troubling than either of the above: anyone can be a TOR node. “Node B” (the exit node) in the example above gets to see your traffic, in both directions; it just doesn’t know your IP address. You are, thus, trusting a complete stranger somewhere on the Internet with your traffic, complete with the ability to carry out man-in-the-middle attacks on you (i.e. maybe he doesn’t forward your traffic to Amazon.com at all, but rather to a fake site of his own design; or maybe he edits the traffic to add a virus or Trojan). This actually happens; Bruce Schneier linked to some logs of a TOR exit node trying to carry out a MitM on an SSL session. So while TOR protects your anonymity, it may actually risk your privacy. It’s very hard to carry out MitM attacks on random Internet users, but doing it as a malicious TOR exit node is comparatively easy.

Another thing to consider: there are only so many TOR exit nodes. There are few enough of them that if, say, the NSA, or the RIAA/MPAA, wanted to, they could set up hundreds of exit nodes, all of which spy on traffic, and have a set of nodes large enough to comprise a substantial portion of the TOR network. If one agency controlled, say, 10% of the exit nodes, their ability to figure out who you are would be pretty significant. If they controlled normal nodes as well (even easier), they might even get lucky and get both the incoming and exit communications on their hostile network, allowing them to completely monitor your traffic.
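A back-of-the-envelope calculation shows why this adds up over time. The Python sketch below assumes, purely for illustration, that the entry and exit nodes of each circuit are picked uniformly and independently, and that the adversary's share of each is known; actual path selection is more involved, so treat the numbers as rough intuition rather than a measurement. Both function names are hypothetical.

```python
def p_both_ends(entry_fraction: float, exit_fraction: float) -> float:
    """Chance that one circuit has a hostile entry node AND a hostile exit node."""
    return entry_fraction * exit_fraction


def p_at_least_once(per_circuit: float, circuits: int) -> float:
    """Chance that at least one of several independent circuits is fully watched."""
    return 1 - (1 - per_circuit) ** circuits


# An agency running 10% of entry nodes and 10% of exit nodes:
single = p_both_ends(0.10, 0.10)         # about 1 in 100 per circuit
over_time = p_at_least_once(single, 100)  # roughly 0.63 over 100 circuits
```

Under these assumptions, a 1-in-100 chance per circuit sounds small, but someone who builds a hundred circuits has better than even odds that at least one of them was visible at both ends.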

TOR is meant to protect your anonymity from the site you’re browsing. It does this pretty well, as long as you’re reasonably careful, don’t browse sites that require you to identify yourself, and use a cookie-filtering proxy like Privoxy. However, it is not meant to provide privacy from TOR node operators, and thus it does not. You can have privacy, or anonymity, but you can’t have both at the same time in a perfect fashion. (Even using open WiFi access points with an obscured MAC address provides anonymity but not privacy: the operator of the access point can do everything a TOR exit node operator can do to you and more.)

Overall, it’s a valuable tool, but if someone wants to track you down badly enough, and they have the resources or authority, they can still do so. This is why criminals aren’t out there committing heists with TOR every day; if you do something bad enough, it won’t protect you. Of course, most computer criminals aren’t caught due to malicious TOR exit nodes or anything so arcane — they’re caught because they brag about their accomplishments, or because investigators follow the money. Even hackers that excel in covering digital tracks thankfully usually have no experience in money laundering.

