Archive for December, 2007
A Bit About DNS
The Domain Name System is generally taken for granted. You put in a name, like perimetergrid.com, and you get back an IP address (at the time of this post, 66.33.198.185.) The addresses change sometimes, but it just works.
However, it’s taken for granted so often that sometimes big security consequences lurk within. I’m not going to talk about DNS rebinding — that’s a relatively big issue, and a relatively new attack (and if you want to know a ton about it, you should read Dan Kaminsky’s blog; he’s a master of weird DNS tricks and a great speaker.) No, I’m talking about much simpler things.
But first, a bit about how a DNS lookup works. Imagine you’re sitting on your home computer, with Comcast as your ISP. You want to use Google’s search engine. You type “www.google.com” into your web browser. What happens?
- Your computer checks its hosts file and local DNS cache to see if it happens to know the IP address for “www.google.com.” If it does, it makes sure that the TTL (Time To Live) on the cache entry hasn’t expired. Assuming there’s a current entry, you’re done here — the web browser connects to the cached IP. But if not…
- Your computer has a record of Comcast’s DNS server IPs. It got these along with your IP when you connected to the Internet, and there are always at least two. It contacts one of these IPs (at the time of this post, 68.87.69.146) and asks for www.google.com in a UDP packet.
- Comcast’s DNS server has a cache, too. If it happens to know www.google.com, it returns it to you and you’re done.
- If it doesn’t, it needs to find a DNS server that does. It looks at the domain (google.com) and checks to see if it knows the authoritative DNS for that name. If so, it forwards the request there. If not, it goes up another level to the top-level domain (.com) — every DNS server knows the IPs of a small core of root name servers (coordinated through ICANN), which in turn point it to the name servers for each top-level domain, and so it forwards the request there. It does the forwarding itself rather than telling you to do it so that it can cache the result to improve performance next time.
- The top-level domain name server receives a request for www.google.com. These top-level servers don’t cache and don’t know any subdomains — so all it can tell you is the authoritative DNS for google.com. In the case of large companies like Google, this will be one of Google’s own servers. For smaller websites like this one, it will be a server owned by the ISP hosting the site. (Of course, it doesn’t tell you, since you didn’t ask it — it tells the domain name server that forwarded it the request.)
- The request now reaches the DNS for google.com, and asks for the IP for the server “www.” Google’s DNS knows this, and returns it — finally your browser can load the page you want.
That whole process happens nearly instantaneously — sometimes a dozen times to load a single web page. Every time a name is returned, it is accompanied by a TTL, which tells the servers (and your PC) how long they are allowed to cache the results before asking again (due to load-balancing and dynamic IPs, DNS results aren’t usually good forever, and sometimes are only good for a few minutes.) In addition, DNS can return three major kinds of records:
- A records: This simply says “this IP goes with this server.” They are the “normal” DNS responses.
- MX records: These state what the mail server is for a domain. This is why you can send mail to someone at a top-level domain without having to know the name of their actual mail server — DNS will ask the domain what the actual mail server is.
- CNAME records: Standing for “canonical name”, this tells the requesting server that the name it’s asked for (say “ftp” or “www”) is just an alias for another server, and it returns the DNS name of the actual server.
(DNS can actually return at least 63 other kinds of records, but they’re beyond the scope of “a bit about DNS” and some are barely ever used at all.)
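The lookup chain and the TTL-based caching described above can be sketched as a toy simulation. This is only an illustration, not how any real resolver is implemented: the class name, the name server ns1.google.example, the address 192.0.2.10 (a documentation address) and the 300-second TTL are all made up for the example, and the "servers" are plain dictionaries.

```python
import time

# Hypothetical data standing in for real servers (all values are assumptions).
TLD_SERVERS = {"com": {"google.com": "ns1.google.example"}}
AUTHORITATIVE = {
    "ns1.google.example": {"www.google.com": ("192.0.2.10", 300)},  # (IP, TTL in seconds)
}


class CachingResolver:
    """Toy stand-in for an ISP resolver: answer from cache or walk the chain."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.cache = {}            # name -> (ip, expiry time)
        self.upstream_queries = 0  # how often we had to leave the cache

    def resolve(self, name):
        hit = self.cache.get(name)
        if hit and hit[1] > self.clock():
            return hit[0]  # cached entry whose TTL has not expired
        # Cache miss: ask the TLD servers who is authoritative for the domain...
        labels = name.split(".")
        domain, tld = ".".join(labels[-2:]), labels[-1]
        name_server = TLD_SERVERS[tld][domain]
        # ...then ask that authoritative server for the record itself.
        ip, ttl = AUTHORITATIVE[name_server][name]
        self.upstream_queries += 1
        self.cache[name] = (ip, self.clock() + ttl)
        return ip


# A fake clock makes TTL expiry easy to demonstrate.
now = [0.0]
resolver = CachingResolver(clock=lambda: now[0])
first = resolver.resolve("www.google.com")   # walks the chain
second = resolver.resolve("www.google.com")  # served from cache
now[0] += 301                                # let the TTL run out
third = resolver.resolve("www.google.com")   # walks the chain again
print(first, resolver.upstream_queries)
```

Only the first and third lookups leave the cache; the second is answered locally, which is the whole point of the TTL.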
The system was not designed for security. It does not have much in the way of security features (though there is an effort to design a new, secure DNS.) However, it does have at least two major security options — you can restrict who is permitted to perform a “zone transfer” (essentially a special DNS lookup that tells the authoritative DNS for a domain “tell me everything you know about the domain,” meant to be used for DNS servers that mirror each other), and you can restrict who is permitted to perform a recursive lookup. What would happen if you left those features off?
With zone transfers, it’s relatively obvious (at least compared to the recursion restriction) — if a hacker could ask for a zone transfer, they would know the names and IP addresses of every server in the domain. This can tell you a lot about how the internal network is organized! It tells you what targets to attack, how many servers there are, sometimes what the servers are for (if they have descriptive names), etc. It’s enough information that trying a zone transfer is usually the first step in information gathering for an attack — it almost never works, but if it does, it’s a wealth of information.
Recursive lookups are not so obvious in their security implications. A recursive lookup is what your ISP’s DNS does for you when you try to go to a website — it sees if it has the target in its cache, and if not, forwards the request up and down the chain until it gets it. It won’t get stuck in infinite loops or anything — if the top-level domain name servers or the reported authoritative DNS say the name isn’t found, the lookup stops. In a proper configuration, only people within the domain can perform a recursive lookup of a site outside the domain; people outside should only be able to look up sites within the domain. If I’m not on Google’s network, how would my request for anything other than a Google machine end up at their DNS server? It shouldn’t happen.
So what’s the harm if it does? Actually, a lot. There are two implications here: DoS amplification and cache poisoning.
In DoS amplification, a hacker wants to take down some remote IP address. DNS lookups are tiny UDP packets (about 75 bytes), and require no authentication — a DNS server will respond to anyone, anytime. If the hacker sets up a program to forge UDP packets claiming to be “from” his target to a DNS server, it will respond with packets “to” the target that are potentially much, much larger than his little UDP packet. If he has a malicious DNS server to make the lookups to, there could potentially be a 4 kilobyte response to a simple lookup. Due to the caching, any DNS server enlisted in this scheme will only hit the malicious DNS once — after that, it will amplify 75 bytes into 4000 every time it’s asked! With this, one hacker on a 384 Kbps cable modem uplink can bring down a web server with a 20-megabit line. While this isn’t an attack on the person running the badly-configured DNS server, it’s not the kind of public service a network admin wants to provide.
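The arithmetic behind that claim is easy to check. The sketch below uses only the figures from the paragraph above and makes the simplifying assumption that packet header overhead and packet loss can be ignored.

```python
REQUEST_BYTES = 75     # size of the forged query, per the text
RESPONSE_BYTES = 4000  # size of the oversized answer, per the text
UPLINK_KBPS = 384      # attacker's upstream bandwidth, per the text

# Each byte the attacker sends becomes this many bytes at the victim.
amplification = RESPONSE_BYTES / REQUEST_BYTES

# Traffic arriving at the victim if the whole uplink is spent on forged queries.
flood_mbps = UPLINK_KBPS * amplification / 1000

print(f"amplification factor: {amplification:.1f}x")
print(f"traffic at the victim: {flood_mbps:.1f} Mbit/s")
```

Under these assumptions the amplification is a bit over fifty-fold, which is enough to turn a 384 Kbps uplink into roughly 20 Mbit/s of traffic at the target.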
Cache poisoning, however, is an attack on the company with the badly-configured DNS server. This time, the hacker sets up his own google.com — or more likely, his own citibank.com or amazon.com. It’s probably a proxy of the real thing, only with a bit of spying going on to steal information, or site modification to install malware. He then issues a huge number of DNS lookups to the target DNS server, all requesting the site he wants to hijack, like google.com.
For each one, the target DNS dutifully goes to the top-level domain servers, gets the address of google.com’s authoritative DNS server, and asks it for the record. Google will reply, and its reply carries a sequence number (the 16-bit DNS transaction ID) that lets the DNS know which request this response matches.
However, at the same time the hacker sends all his requests, he also sends a similarly huge number of replies, all spoofed to be “from” the authoritative DNS being queried, with random sequence numbers. This makes use of a “birthday attack” — guessing a single 16-bit sequence number is very hard, but if you make 300 guesses and there are 300 right answers, your chances of hitting one are actually 50%. 800 packets is effectively certainty. This isn’t quite as easy as it sounds, since the attacker’s replies have to get there before the real ones. But they’re 75-byte UDP packets, so they’re fast, and a determined attacker will sometimes DoS the real DNS at the same time to slow it down.
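As a rough check on those figures, the classic birthday approximation can be evaluated directly. This is a sketch under two assumptions: that the sequence numbers are uniformly random 16-bit values, and that the textbook collision formula is an adequate model of the race between spoofed and real replies. The function name is illustrative only.

```python
import math


def birthday_probability(n: int, space: int = 2**16) -> float:
    """Classic birthday approximation: chance of at least one collision
    among n random values drawn from `space` possibilities."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))


for packets in (300, 800):
    print(packets, round(birthday_probability(packets), 3))
```

With this model, 300 packets land close to even odds and 800 packets are very nearly certain to produce a match.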
Now the target’s DNS server has cached a false record for google.com or whatever other domain was chosen — and that record probably has an insanely long TTL (say, weeks) so another lookup won’t happen for quite a while. Every time a legitimate user of the domain tries to go to the chosen site, they get directed, transparently, to the hacker. It says “google.com” in the address bar of their browser, because the browser knows that’s where they are — it trusts DNS, and the user doesn’t want to know about IP addresses anyway.
Correct configuration of DNS servers is vital to the secure operation of the Internet — the smallest errors have huge consequences, because DNS is the trusted foundation on which all other communication relies.
Flash and the Same-Origin Policy
Web browsers protect the user from attacks largely through the same-origin policy: any code from one web site (such as HTML pages or JavaScript) is not permitted to interact with any code from another web site. I can make a web page that embeds a Hotmail window in the middle of it (with an IFRAME), and you’ll see your Hotmail in the window — but my page’s script is not permitted to read or write what’s in that window. Without the protection of the same-origin policy, using the Web for commerce or anything at all sensitive would be impossible — you would never know when a site you access might try to log into your bank, or your email, without you even noticing. Since your browser supplies your cookies to sites automatically, if a site could script against another, it could do so as you with all your privileges.
Luckily, all browsers do enforce the same-origin policy. But what about things that aren’t browsers, but live inside them? A lot of rich content on the web, including many advertisements and all the videos on YouTube, is actually made up of Adobe Flash applications. Others are Java applets. Though they’re embedded in your browser, they’re not the browser itself — so the browser can’t enforce the same-origin policy on them.
However, we’re still safe, because Adobe and Sun have been smart enough to build the same-origin policy into the Flash and Java runtimes. They also check to make sure an applet can communicate only with the site it came from, not other sites.
However, Flash and Java provide many things a regular browser cannot, and they’re frequently used in enterprise applications where cross-site communication is desirable. In addition, sometimes you actually want other sites to be able to script against you. If you’re a public web service and you want your components to work in mashups, you have to allow some cross-site access.
To enable this, Flash allows web servers to place a file called crossdomain.xml on their server, which contains XML that looks something like this:
<cross-domain-policy>
<allow-access-from domain="www.domain1.com" />
<allow-access-from domain="www.domain2.com" />
</cross-domain-policy>
With this code present on a site, that site becomes accessible to any Flash application on domain1.com or domain2.com. Wildcards are allowed, including putting in the domain “*”, which means any site on the Internet can script against it. This is a legitimate thing to do if your site is a public API without authentication (e.g. Google Maps.)
However, it’s quite dangerous to grant this access to a site that isn’t fully trusted. It is critical that crossdomain.xml not allow any more sites than necessary, because of the risk that relaxing the same-origin policy entails. If, say, an online bank were foolish enough to allow “*” or some easily-manipulated domain (i.e. one with a lot of user-uploaded content, like a social network or a forum), then anyone able to add content to that domain could upload a Flash applet that would connect to the bank as the user, using the user’s cookies, and perform whatever tasks it wanted — invisibly, with no sign to the user that anything is going on. (Just because Flash apps usually have an appearance onscreen and are used to render graphics doesn’t mean they have to be; a Flash app is just a program.)
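One way to keep an eye on this is to scan a policy file for wildcard entries before deploying it. The sketch below is a minimal illustration using Python’s standard XML parser; the policy contents, the domain names and the function name are all invented for the example, and a wildcard is not automatically wrong (a public, unauthenticated API may want one).

```python
import xml.etree.ElementTree as ET

# Hypothetical policy file contents, made up for this example.
POLICY = """<cross-domain-policy>
  <allow-access-from domain="www.domain1.com" />
  <allow-access-from domain="*.example-forum.com" />
  <allow-access-from domain="*" />
</cross-domain-policy>"""


def risky_domains(policy_xml: str) -> list:
    """Return every allowed domain that contains a wildcard."""
    root = ET.fromstring(policy_xml)
    return [
        element.get("domain", "")
        for element in root.iter("allow-access-from")
        if "*" in element.get("domain", "")
    ]


flagged = risky_domains(POLICY)
print(flagged)
```

Anything this flags deserves a second look: either the site really is a public API, or the entry is broader than it needs to be.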
However, there are two serious problems with this method of relaxing the same-origin policy — and either of these can allow a malicious website to “relax” the policy of another site against its will. In each case, it involves combining another well-known attack (cross-site scripting in one case, DNS rebinding in the other) with the Flash security model to produce a vulnerability.
Cross-site scripting is the term for any vulnerability where user input is echoed back to the user (either the same user or a different user) from the site without proper sanitization. For instance, if I ask the user for his name, and he answers “<script language="JavaScript">alert("Hello!");</script>”, and from this I create a web page that says “Hello (name)” to him, he’ll get a pop-up on screen, because the “name” is actually code that, when it comes from the web server, is executed. This is a problem for the same-origin policy because that code comes from the web server — instead of just popping up a dialog, it could have manipulated the web site and performed tasks on the user’s behalf! According to the same-origin policy, it’s “safe.” Now, in this scenario this sounds pretty harmless — after all, he’s attacking himself — but what if another page on the site allows any user to see a list of everyone’s name? Or what if it’s a forum and the user’s name is displayed on every post? Now that attacker’s code is running for everyone, in each case as themselves and able to take actions on their behalf.
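The usual fix is to escape user input before echoing it into a page. Here is a minimal sketch of the “Hello (name)” case using the standard library’s html.escape; the greeting function is a hypothetical stand-in for whatever template the site really uses.

```python
import html


def greeting(name: str) -> str:
    """Build the 'Hello (name)' fragment with the name HTML-escaped."""
    return "<p>Hello " + html.escape(name) + "</p>"


attack = '<script language="JavaScript">alert("Hello!");</script>'
safe = greeting(attack)
print(safe)
```

After escaping, the browser displays the angle brackets as text instead of treating the name as a script element.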
How does this relate to Flash? Well, the crossdomain.xml file can be located anywhere on the server — the Flash applet chooses where to load it from. So if I can use a cross-site scripting attack to make a file on the web server that looks kind of like a crossdomain.xml file, I can tell my malicious Flash applet to load and apply that policy. (The filename doesn’t have to be crossdomain.xml — it can be kittens.jpg if I want. As long as it’s on that server, it can grant access to that server if the Flash applet knows where to find it.) There’s a good illustration of this attack on the Hardened PHP Project.
The other, and far scarier, attack is to use DNS Rebinding. The same-origin policy means you can only load from the site with the same domain name, like perimetergrid.com. But pages really load from IP addresses, not names. The DNS system translates names to addresses. However, the DNS system is federated — there’s no one master library of all DNS names. When you try to go to a site, your computer asks your ISP’s DNS server for its IP address, and the request is forwarded to the authoritative DNS server for that name. That DNS server then replies with the IP. If I wanted to (I don’t), I could set up my own DNS server that I control, have the root DNS point perimetergrid.com there, and then have my DNS server respond to any lookup of my site with any IP I wanted. The web browser would then load that page, and proudly display it as “perimetergrid.com,” because it trusts DNS.
Malicious DNS servers can do many horrible things. Imagine this process:
- I load up a web forum. Someone has made a post with an embedded Flash applet.
- The Flash applet loads from evil.com, and then it gets a crossdomain policy from evil.com (IP 6.6.6.6. Note that I don’t mean the real web site “evil.com”, but am just using the name to stand in for some malicious site.)
- Evil.com, which is a malicious site with a malicious DNS server, responds with a crossdomain.xml that allows script from “*”. Now the Flash applet is allowed to script against evil.com.
- The Flash applet now tries to load a web page from evil.com again. However, the malicious DNS server instead returns the IP address of your mail server, or your bank, or somesuch.
- The Flash applet on your computer loads the page right up and can script against it. After all, it’s just evil.com, which the policy it fetched earlier says is safe for scripting. It gets the data it wants, using your credentials, without your knowledge.
- The Flash applet sends the data back to evil.com — which this time returned its real IP address so it could receive the communication.
This is really hard to defend against. We assume that DNS can be trusted, but DNS was not designed with security in mind and will never be secure. Evil.com could even have returned IP addresses inside your local network, behind your firewall, and the browser and Flash applet would dutifully access them.
There are some techniques called “DNS pinning” that help mitigate this, by not allowing the DNS to return different IPs in rapid succession. The problem is that they break load-balancing — when you access a major online property with hundreds of servers, your request probably really is handled by many servers, all of which have the same name. Breaking this attack also breaks Google and Microsoft and Facebook.
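The rebinding trick and the pinning defense can both be shown with a toy model. Everything here is simulated: the class names are invented, 6.6.6.6 is the placeholder attacker address used above, and 10.0.0.5 is an assumed internal address standing in for the victim. Real pinning implementations are more involved (they have to decide how long to keep a pin, for one).

```python
class RebindingDNS:
    """Toy malicious DNS: the first answer is the attacker's own IP,
    every later answer is the victim's IP."""

    def __init__(self, attacker_ip, victim_ip):
        self.answers = [attacker_ip, victim_ip]
        self.calls = 0

    def lookup(self, name):
        ip = self.answers[min(self.calls, len(self.answers) - 1)]
        self.calls += 1
        return ip


class PinningClient:
    """Remembers the first IP seen for each name and ignores later changes."""

    def __init__(self, dns):
        self.dns = dns
        self.pins = {}

    def resolve(self, name):
        if name not in self.pins:
            self.pins[name] = self.dns.lookup(name)
        return self.pins[name]


# Without pinning, the same name silently moves to the victim's address.
dns = RebindingDNS("6.6.6.6", "10.0.0.5")
unpinned = [dns.lookup("evil.com"), dns.lookup("evil.com")]

# With pinning, the second request still goes to the original address.
client = PinningClient(RebindingDNS("6.6.6.6", "10.0.0.5"))
pinned = [client.resolve("evil.com"), client.resolve("evil.com")]
print(unpinned, pinned)
```

The same pin that blocks the attack would also stop a legitimate site from handing out a different server on the next lookup, which is the load-balancing problem described above.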
Luckily, Adobe is aware of the issues and in Flash 9 has some mitigations proposed, including forcing socket access to get cross-domain policy by IP rather than by name. There’s a full whitepaper about it on Adobe’s site that’s a good read; Adobe is quite security conscious and has a mature security model for Flash, but it’s just very hard to stop these sorts of design flaws in the web. Restricting socket access will help a lot — at least a malicious app won’t be able to port-scan behind your firewall and perform network attacks, though it could still browse web pages. This is an arms race between attackers and software companies that will continue quite a while.
The post at the Unwired Video Blog about TOR has been getting a lot of publicity, having been linked to by both Lifehacker and Boing Boing. It provides a quick overview of TOR, how it works, and how to use it to browse the Web anonymously.
This is a good thing; people using services like this does help protect their privacy and anonymity, and due to how TOR works the more people that use it, the more secure it becomes (indeed, the Navy, who developed TOR, released it publicly because they realized that if only the military used it, it was worthless.) Most of all, if normal, everyday people value and use anonymity and privacy services, it shows policymakers that anonymity is a social good desirable to all and not something that people only want when they have something to hide.
The argument against anonymity, however, is that it can be used to cover up crimes. Will something like TOR protect criminals? How do we track down a malicious hacker if the attack comes from a TOR node?
There are actually quite a few ways TOR leaks information, and really they’re all centered around one idea — one cannot simply “be secure,” one must be secure from something. TOR protects against some attacks and adversaries but not against others.
TOR (“The Onion Router”) provides anonymity by encrypting your traffic multiple times and routing it through TOR nodes. Loading the Amazon.com home page through TOR would look like this:
- Your computer contacts two TOR nodes, which I’ll call A and B, and requests their public keys.
- Your computer forms the web request to Amazon.com.
- It encrypts the web request in key B, then encrypts the result (along with the address of B) in key A.
- The packet is sent to node A.
- Node A, which has the private portion of key A, decrypts the packet. Inside is another address (that of B) and an encrypted blob. Thus, Node A knows who you are, but it doesn’t know what you’re transmitting, or who you’re sending it to. It forwards the blob to Node B.
- Node B, which has the private portion of key B, decrypts the packet. Inside is your transmission to Amazon.com, which by its nature says where it should be sent. Thus, Node B knows what you’re transmitting, and who you’re transmitting it to, but has no idea who you are or where the packet came from. Node B sends the packet to Amazon.com.
- Amazon.com gets the packet and replies, sending the reply to Node B. Note that Amazon.com, like Node B, has no idea who you are or where the packet came from.
- Node B gets the reply and forwards it to Node A.
- Node A gets the reply and forwards it to you.
This is simplified; there’s additional encryption so the nodes can’t all read the reply as it makes its way back to you, and there can be more than two nodes in the chain (in which case the intermediate nodes know even less about the transmission than A and B above.) However, the above is the simplest case, and shows how much each part of the chain knows and doesn’t know.
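The layering in the two-node case can be sketched in a few lines. This is a toy, not TOR’s actual protocol: the “cipher” is a throwaway XOR keystream that is not secure, it uses shared symmetric keys where the steps above use public keys, and the key values, the node-b.example address and the JSON envelope are all assumptions made for illustration.

```python
import hashlib
import json


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 counter keystream. NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(key: bytes, next_hop: str, payload: bytes) -> bytes:
    """Seal one layer: the next hop's address plus an opaque payload."""
    layer = json.dumps({"next": next_hop, "payload": payload.hex()}).encode()
    return keystream_xor(key, layer)


def unwrap(key: bytes, blob: bytes):
    """Peel one layer, returning (next hop, inner payload)."""
    layer = json.loads(keystream_xor(key, blob).decode())
    return layer["next"], bytes.fromhex(layer["payload"])


KEY_A = b"hypothetical key for node A"
KEY_B = b"hypothetical key for node B"

request = b"GET / HTTP/1.1\r\nHost: www.amazon.com\r\n\r\n"
inner = wrap(KEY_B, "www.amazon.com", request)  # only B can open this
onion = wrap(KEY_A, "node-b.example", inner)    # only A can open this

# Node A learns where to forward, but sees only an opaque blob.
hop_after_a, blob_for_b = unwrap(KEY_A, onion)

# Node B learns the destination and the request, but not who sent it.
hop_after_b, seen_by_b = unwrap(KEY_B, blob_for_b)
print(hop_after_a, hop_after_b)
```

Node A never handles the plaintext request, and nothing Node B unwraps contains the sender’s address, which is exactly the split of knowledge the list describes.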
The primary adversary TOR is designed to protect against is the actual site you’re browsing. It hides your IP address (which, with a subpoena or some social engineering to your ISP, can be tied to you personally) from the target site, so that the site does not know who is visiting it. The obvious counter to this, though, is for the site to apply a cookie to your browser when you visit it, such that it “recognizes” you on subsequent visits. TOR alone will not protect against this, which is why TOR is almost always packaged with Privoxy, a filtering proxy that runs on your own computer, examines all your web traffic, and strips out data that can be used to identify you. Here’s the first weakness in TOR — it can only strip out so much.
Web traffic is stateless; each web request is not tied to any other in any persistent way. When you load a web page, your browser nearly-simultaneously requests the page and all the images, media, embedded frames, ads, etc. on the page. When you click a link, the server has entirely forgotten who you are — it’s a totally different page load. This would make any kind of integrated experience impossible (the web was originally designed just to serve up static reference pages, not implement shopping carts), but for cookies. The web server sets a session cookie (a cookie that is deleted when you close your browser) when you load the first page, and uses that to track your movement through the site. There’s nothing menacing about this — it is not “tracking” in any Big Brother-ish way, it’s just linking all your page loads together to provide a sense of state or flow — and the web doesn’t work without it. Thus, Privoxy has to let these session cookies through. This can leak a little bit of information about you.
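The distinction a filter has to draw can be sketched with Python’s standard cookie parser. The rule used here (a cookie with neither Expires nor Max-Age is a session cookie) is the usual browser convention; the function name and the sample header values are invented, and this is not a description of Privoxy’s actual logic.

```python
from http.cookies import SimpleCookie


def is_session_cookie(set_cookie_value: str) -> bool:
    """True if no cookie in the header value carries Expires or Max-Age,
    meaning the browser will drop it when it closes."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return all(
        not morsel["expires"] and not morsel["max-age"]
        for morsel in jar.values()
    )


session = is_session_cookie("sid=abc123; Path=/")
persistent = is_session_cookie("tracker=xyz789; Path=/; Max-Age=31536000")
print(session, persistent)
```

A filter built on this rule would let the first cookie through and strip the second, at the cost of whatever the session cookie itself reveals.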
How is this anonymity defeated? In the simplest case, the user leaks the data on their own! A web request contains a decent amount of information about you (what browser you’re using, your operating system) as well as any cookies the user sends. Privoxy strips most of this, but if the user uses TOR alone, then the fact that the endpoint can’t see your IP may be irrelevant — you just told it who you are anyway. Likewise, browsing a site that requires login (like a webmail provider) through TOR is plain silly — it’s obvious who you are, you just logged into the site. This is true even if you’re worried about investigation not from the site but from authorities, spies, or other hackers — the webmail site logs that you connected, and it probably logs everywhere you’ve ever connected from. Thus, using a site with login through TOR only provides anonymity if you never use that site except through a TOR connection. Otherwise, your communication can be correlated over time.
Also, it’s possible to make an end run around TOR. If someone simply hacks into your computer (or seizes it, in the case of legal authorities), they don’t need to have logs from the other end to know what you’ve been doing; your own computer probably has records of your activity. Deleting them usually does little good in the legal case — modern data recovery can get deleted data quite easily. To avoid this, it’s necessary to have a computer that simply doesn’t keep any records — and this is hard to do in a normal Windows or Linux system (for one, they both arbitrarily swap portions of memory to disk during normal use.) For better anonymity, it’s necessary to boot a LiveCD environment (i.e. an OS with no hard drive or writeable media), where all evidence is destroyed when the computer is powered down. But unless your oppressive dictatorial nation’s secret police are after you, this is probably excessive when it comes to normal protection of privacy and anonymity.
Finally, there’s one more attack through TOR that’s more troubling than either of the above — anyone can be a TOR node. “Node B” (the exit node) in the example above gets to see your traffic, in both directions — it just doesn’t know your IP address. You are, thus, trusting a complete stranger somewhere on the Internet with your traffic, complete with the ability to carry out man-in-the-middle attacks on you (i.e. maybe he doesn’t forward your traffic to Amazon.com at all, but rather a fake site of his own design; or maybe he edits the traffic to add a virus or Trojan.) This actually happens; Bruce Schneier linked to some logs of a TOR exit node trying to carry out a MitM on an SSL session. So while TOR protects your anonymity, it may actually risk your privacy — it’s very hard to carry out MitM attacks on random Internet users, but doing it as a malicious TOR exit node is comparatively easy.
Another thing to consider: there are only so many TOR exit nodes. There are few enough of them that if, say, the NSA, or the RIAA/MPAA, wanted to, they could set up hundreds of exit nodes, all of which spy on traffic, and have a set of nodes large enough to comprise a substantial portion of the TOR network. If one agency controlled, say, 10% of the exit nodes, their ability to figure out who you are would be pretty significant. If they controlled normal nodes as well (even easier), they might even get lucky and get both the incoming and exit communications on their hostile network, allowing them to completely monitor your traffic.
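To put a rough number on “getting lucky,” assume (and this is a simplification) that entry and exit nodes are picked uniformly and independently, and that the adversary controls the same fraction of both. The function name and the choice of 100 circuits are illustrative.

```python
def both_ends_observed(fraction: float, circuits: int = 1) -> float:
    """Chance that at least one of `circuits` independent circuits has both
    its entry and its exit node controlled by the same adversary."""
    per_circuit = fraction * fraction
    return 1.0 - (1.0 - per_circuit) ** circuits


single = both_ends_observed(0.10)
hundred = both_ends_observed(0.10, circuits=100)
print(f"one circuit: {single:.1%}, one hundred circuits: {hundred:.1%}")
```

Under these assumptions a single circuit is compromised at both ends only about 1% of the time, but a user who builds many circuits over weeks of browsing faces much worse odds.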
TOR is meant to protect your anonymity from the site you’re browsing. It does this pretty well, as long as you’re reasonably careful, don’t browse sites that require you to identify yourself, and use a cookie-filtering proxy like Privoxy. However, it is not meant to provide privacy from TOR node operators, and thus it does not. You can have privacy, or anonymity, but you can’t have both at the same time in a perfect fashion. (Even using open WiFi access points with an obscured MAC address provides anonymity but not privacy — the operator of the access point can do everything a TOR exit node operator can do to you and more.)
Overall, it’s a valuable tool, but if someone wants to track you down badly enough, and they have the resources or authority, they can still do so. This is why criminals aren’t out there committing heists with TOR every day; if you do something bad enough, it won’t protect you. Of course, most computer criminals aren’t caught due to malicious TOR exit nodes or anything so arcane — they’re caught because they brag about their accomplishments, or because investigators follow the money. Even hackers that excel in covering digital tracks thankfully usually have no experience in money laundering.
New Legislation: SAFE and PRO IP
There has been some controversy over two new security-related bills in the United States Congress right now: the SAFE Act and PRO IP.
The SAFE Act (Secure Adolescents From Exploitation Online; another case where the acronym almost certainly came first) aims to protect children and teenagers from exploitation by increasing enforcement of child pornography laws. Not, on the surface of it, a bad thing. The controversy comes from its means: it requires anyone operating an internet service to report not just actual child pornography, but also fully-clothed minors in “lascivious poses” (whatever that means) and any “drawing, cartoon, sculpture, or painting” consisting of an obscene depiction of minors. This troubles people for two reasons: first of all, due to the vagueness of what is prohibited (can you tell if a drawing, cartoon, sculpture, or painting is of a 17-year-old or an 18-year-old?), and second, because of the apparent requirement that providers monitor all their traffic in order to make these reports.
According to C|Net News, the monitoring requirement would apply to anyone providing an open Wi-Fi node, such as coffee shops, restaurants, and even homes that simply don’t choose to encrypt their Wi-Fi, in addition to social networking sites, web-based email providers, domain name registrars, etc. Were the bill interpreted in this way, this would place an impossible burden on any provider of connectivity — there is no automated way to scan the traffic of all your subscribers for vaguely-defined unlawful depictions of fictional minors; you would need to have a person manually inspect all the traffic, which is obviously impossible at any scale (not to mention a terrible privacy invasion.)
However, I think that this is an overly alarmist reading of the bill. It’s certainly not the author’s intent (indeed, Rep. Rick Lampson’s office has responded to the C|Net article) for the bill to apply to every small Wi-Fi provider, though author’s intent is often beside the point once a law is passed. More importantly, though, the bill does not mandate surveillance or detection at all — it mandates reporting if child pornography (or something that kind of sort of looks like it) is detected. In other words, it forbids finding out about illegal activity and looking the other way; it does not mandate actually looking for it. I think that Ars Technica has a much more balanced article about the bill. Overall, I think it’s feel-good “for the children” legislation that won’t accomplish much (ISPs are already required by law to report child pornography if they detect it; this just raises the penalties and expands the definition), and that prohibiting fictional depictions of children where no actual children are involved is a poor idea from a legal standpoint (since it is very open to abuse by subjective interpretations of judges, prosecutors, and jurors), but that this bill, if it passes — which is likely — will not impose a serious technical burden on service providers.
Meanwhile, the Electronic Frontier Foundation reports on the PRO IP Act (“Prioritizing Resources and Organization for Intellectual Property (PRO IP) Act of 2007” — doesn’t anyone ever just name a bill and then come up with the acronym anymore?), which aims to fight copyright infringement in the typical ineffective way, presumably to shore up the music industry’s failing business model. It increases penalties for peer-to-peer file sharing from their current ridiculous levels (which build animosity toward the recording industry via outlandish million-dollar damages levied against ordinary university students) to new even more ridiculous levels, while also creating a new $25 million federal bureaucracy to step up copyright enforcement.
Having a copyright system is important. However, you would think that by now the music industry would realize that if suing customers for $250,000 does not stop piracy, the problem is not that they’re not suing them for enough money, and stepping up the penalties will have no effect. People believe either a.) that they’re not doing anything wrong or illegal, or b.) that they’re extremely unlikely to get caught (this latter belief being true.) In order to change this, they’ll need to either offer a legal alternative that at least approaches the convenience and usability of illegal downloading (which you would think would not be a high bar — BitTorrent is not very convenient) and is affordable for broad categories of consumers, or they’ll need to decrease the penalties while increasing the percentage of people who get caught.
With regard to the former, coming up with a pricing model seems to be their stumbling block. Some customers buy several CDs a month, spending $100 or more on music. These customers would love a monthly-fee option, and would pay a substantial amount for unlimited downloads. Other customers buy one CD in a great while, and a subscription model is terrible for them — and thus they prefer individual song downloads like iTunes. All customers hate DRM, as it prevents them from using music in ways we now take for granted (e.g. playing on multiple devices.) What the music industry is doing now is akin to the government trying to win the War on Drugs by dropping defoliant in Colombia while doing nothing to reduce local demand — if the demand for illegal material exists, an infrastructure will spring up to fill it.
With regard to the latter, the recording industry faces a backlash when it imposes penalties that vastly outstrip the perceived seriousness of the crime. People have an idea of what fair use entails, and anything you could do with a tape recorder in the 1980s pretty much fits in that category. Thus, multi-million-dollar prosecutions of parents and students seem grossly unfair. However, people also know that “everyone” shares files, yet we only occasionally hear about these huge lawsuits, and thus people assume it won’t happen to them. The only people who believe they’ll get caught for file-sharing are those who already have been. However, if being caught file-sharing leads to financial ruin, the number of people caught must of necessity be only a very small percentage. If university students got caught file-sharing by the thousand and got fined $100 for it, they might consider legal alternatives a better option after a fine or two.
All this said, I think the future will eventually be in DRM-free downloads, and that that future will result in less profit both for recording companies (which may die entirely) and for hit artists (though it will result in substantially more profit for well-known local and regional acts, or less-popular national acts, which currently get almost nothing from the “star” system of the recording industry.) It’s understandable that the recording industry and the most-successful recording artists want to fight this future, but I don’t see any way that continuously stepping up penalties for actions taken by half the American population is going to do it.
As for creating a new federal bureaucracy to fight copyright infringement, having law enforcement involved in what is essentially a civil matter (as copyright should be) is always dangerous, because it eliminates risk and return from the equation. When something is a civil matter, the injured party must decide that it’s worth its while to pursue a given enforcement action. Industrial-scale piracy would certainly be worth a lawsuit; a university student running Kazaa probably isn’t. However, when the injured party can simply ask the government to use taxpayer dollars to go after infringers, then why not go after everyone? It doesn’t cost them anything; instead, we get to pay for it.
DRM is a dead end; as a trusted-client problem, it is unsolvable. I think this “get tough” legislative approach is a dead end as well.
Over at Schneier on Security, Bruce Schneier has a post today about securing data on disk. Encryption is often sold as a panacea for all security problems — which it’s not — but keeping people from reading your data if they steal your laptop is one thing encryption is really good at, and it’s an area where the real complexities of encryption (key management, key rotation, public key infrastructure) aren’t terribly important and can be safely neglected.
Schneier mentions Microsoft’s BitLocker in passing, and I wanted to add some detail. BitLocker is a whole-disk encryption system built into Windows Vista that integrates with the Trusted Platform Module if one is available (the TPM is a smart chip on the mainboard that stores keys and performs secure cryptographic operations.) You tell BitLocker to encrypt your drive, and then choose one of several options for how to store the key. The simplest mode simply prevents someone from mounting the drive in another system or operating system, by storing the key in the TPM and retrieving it automatically on boot (this actually does make it significantly harder to get at the data on the disk without your password.) More complex modes store the key in the TPM and require either a PIN code from you or a certificate stored on a USB key to extract it. Thus, on booting your PC you enter your PIN or insert the USB key, and the drive is unlocked.
The PGP product Schneier advocates encrypts the drive similarly to BitLocker, though rather than storing the key in the TPM it relies on a user-supplied passphrase to decrypt the key. While this is theoretically less secure (with the TPM, even the encrypted key is stored in tamper-resistant hardware and difficult to access), in practice it makes little difference — it’s still quite secure, and unlike BitLocker will let you encrypt other drives.
However, one feature BitLocker has and PGP lacks is key escrow. Now, this is normally thought of by privacy activists as an anti-feature, remembering the Clipper Chip fiasco of the mid-1990s. However, the purpose of BitLocker’s key escrow is not to give a back-door key to the government, but rather to make the system palatable for enterprise deployment. Large corporations have traditionally been unwilling to embrace whole-disk encryption products like PGP even on laptops carrying sensitive data, for fear that the person with the key will forget the passphrase or simply leave the company and refuse to disclose it. Escrowing the BitLocker keys with the domain controller, such that appropriate corporate officers can retrieve them, makes BitLocker “safe” for corporate use. If you’re not a domain member (i.e. it’s your home computer), then the keys aren’t escrowed with anyone else; there’s no government back-door.
Schneier rightly points out that an issue with any sort of whole-drive encryption is that it does not protect your data from government subpoena. If the government seizes your computer as evidence, they can (in the United States at least) subpoena the keys, and if you don’t turn them over you can be fined or jailed for contempt of court. This is not an issue for most (legal) data, but if you have something to hide from everyone, there are solutions other than the one Schneier posits (“just don’t keep data on your laptop that you don’t want subpoenaed.”) One option is the open-source disk encrypter TrueCrypt.
The problem with encrypted data on your disk is that it’s really obvious. It is not plausible to say “Oh, I don’t have any encrypted data” if served with a subpoena. For one, you probably have encryption software on your computer, and links to data that can’t be followed without decryption. But besides that, encrypted data is provably, mathematically distinguishable from almost everything else. Encrypted data consists of a binary blob with a uniform distribution across its entire data space — that is, any given byte is just as likely to be 00 as it is to be 01, 02, 03, or any other value. If you plotted it on a histogram, given enough data the graph would be approximately flat (subject to the variation and “clumpiness” always present in random data) and there would be no more repetition than expected by random chance. This is unlike every other type of data — executable programs, graphics, sound, word processor files, spreadsheets, etc. all have their own characteristic histograms and repeated patterns. Even compressed files have specific, recognizable headers and certain characteristic patterns (though they come closest to looking like encrypted data, since they have high entropy.) Thus, encrypted data stands out because it is “more random” than any other data on your hard drive. Since no one keeps large blobs of totally random noise on their hard drive, if one is found, it’s pretty certain to be encrypted data, and the courts know this (or at least can be convinced of it by expert witnesses.)
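To make the histogram argument concrete, here is a minimal Python sketch that measures how "flat" a blob's byte distribution is, using Shannon entropy in bits per byte. It is an illustration rather than a forensic tool: `os.urandom` output stands in for ciphertext (an assumption, since no real cipher is involved), and the function name `byte_entropy` and the sample sentence are made up for this example.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)  # histogram: byte value -> number of occurrences
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Stand-in for encrypted data: 64 KB from the OS random source (assumption).
noise = os.urandom(1 << 16)

# Stand-in for ordinary data: repetitive English text.
text = b"the quick brown fox jumps over the lazy dog " * 500

print(f"random blob: {byte_entropy(noise):.3f} bits/byte")
print(f"plain text:  {byte_entropy(text):.3f} bits/byte")
```

The random blob should score very close to the 8 bits/byte maximum, while the text scores far lower. As noted above, compressed files also score high on this measure, so a real investigator would look at headers and structure as well.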
TrueCrypt has the feature of being able to place an encrypted volume inside an encrypted volume. Combined with the fact that it pads encrypted volumes with random noise, this leads to the ability to have plausible deniability of encrypted data. Essentially, it works as follows:
- You create a TrueCrypt volume on your hard drive with a specified size, say 10 GB. TrueCrypt reserves that much space, and fills it with random noise.
- You create a second TrueCrypt volume, with a different key, inside the first volume, with a smaller specified size, say 2 GB. TrueCrypt takes that space and fills it with different random noise.
- When you want to access encrypted data, you mount both volumes. You put really secret stuff on the inner volume, and moderately secret stuff (e.g. pirated MP3s) on the outer volume.
Now, if someone gets your laptop, they can see that you have TrueCrypt installed, and that there is a 10GB encrypted volume (as there’s a 10GB blob of random noise on your hard drive.) They force you to give them the key, and you do so. This unlocks the outer volume, revealing its encrypted files. However, there is no sign that the inner volume exists. Unless you know it’s there, and know the key, there is no way to distinguish the random noise of its encrypted files from the random noise TrueCrypt filled the outer volume with anyway. There could be a dozen encrypted volumes, or none — it’s impossible for anyone to know, and indeed, most people without a security mindset would never even think of such a thing.
Now, there are drawbacks to this technology. If you mount the outer volume but not the inner one, neither TrueCrypt nor your operating system knows about the inner volume, either! This means that writing files to the outer volume may overwrite and destroy the inner volume if you’ve not mounted it. This isn’t a major problem, but it is inconvenient, especially if you have many volumes (as you need to type in the different passphrases and addresses of all of them every time you want to write to any of them.) And no automation will help you, because having any would defeat the purpose — the existence of automation scripts would tip off a smart forensic investigator that your outer volume contains inner volumes.
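The idea and its main hazard can be shown with a toy model in Python. This is emphatically not how TrueCrypt is implemented: the "cipher" is a throwaway XOR keystream built from SHA-256, and the container size, the offset, and the keys are hypothetical values chosen for illustration. It only shows that hidden ciphertext sits inside what is otherwise random filler, and that a write made without knowledge of the hidden region destroys it.

```python
import hashlib
import os


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustrative only, not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XOR with the keystream (the operation is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


CONTAINER_SIZE = 4096   # hypothetical, tiny "outer volume"
HIDDEN_OFFSET = 3072    # hypothetical location of the hidden data

# 1. A fresh container is nothing but random filler.
container = bytearray(os.urandom(CONTAINER_SIZE))

# 2. Hidden ciphertext replaces part of that filler.
secret = b"really secret stuff"
end = HIDDEN_OFFSET + len(secret)
container[HIDDEN_OFFSET:end] = xor_crypt(b"inner key", secret)

# 3. Anyone holding the inner key and offset can read it back.
recovered = xor_crypt(b"inner key", bytes(container[HIDDEN_OFFSET:end]))

# 4. A large write made through the outer volume alone, with no idea the
#    hidden region exists, overwrites it.
container[0:CONTAINER_SIZE] = xor_crypt(b"outer key", b"x" * CONTAINER_SIZE)
damaged = xor_crypt(b"inner key", bytes(container[HIDDEN_OFFSET:end]))

print("before outer write:", recovered == secret)
print("after outer write: ", damaged == secret)
```

Step 4 is the drawback described above: nothing in the outer layer records where the hidden data lives, so nothing stops the outer layer from writing over it.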
It’s an interesting solution to the problem of plausible deniability — using steganography to hide encrypted data in encrypted data. Admittedly, Schneier’s solution (just don’t have the data at all) is even safer, but sometimes that’s not good enough.