Peter Schaar, Germany’s Commissioner of Data Protection and head of the European Union’s privacy working group, has stated that information identified only by IP address must be considered personally identifiable information. As the AP article points out, this could have rather serious implications for search engines and many other electronic businesses, and RSnake is concerned that it could upend the entire advertising business model of the Internet.
First, for those not working in the information security industry: something being classified as personally identifiable information (PII) is a big deal. If data is PII, you are liable for damages if the data is ever released, and you are required by statute to take significant and often expensive measures to protect it. If you’re a public corporation, Sarbanes-Oxley requires you to do all sorts of things to protect the data (e.g. encryption). If your company takes credit card payments, the Payment Card Industry Data Security Standard requires you to do even more (e.g. physical protection of the hardware the data sits on, specific firewall/router configurations, etc.). Most large companies have their own standards for how PII must be protected that combine or even go beyond the regulatory and industry requirements. Overall, the required protections around PII are onerous enough that companies strive to minimize how much PII they hold at all; it’s often cheaper and easier to just delete the data than to protect it the way you need to protect it. Companies must ask, “How much business value do we get out of storing, say, our customers’ addresses, and does it exceed the cost of protecting that data?” Often the answer is no.
On the surface, calling IP addresses PII is ridiculous. IP addresses are found on every packet anyone sends on the Internet; if IP addresses count as personal identification, then logging basically anything about Internet traffic makes the logs PII. It takes a label currently applied only to a small amount of high-value data and applies it to something that everyone everywhere logs; it seems absurd. But as I think about it more, I’ve come to realize that Schaar has a point.
The EU is much more aggressive about privacy law than the United States. The United States Constitution guarantees privacy from the government through the Fourth and Fifth amendments; this sharply limits what the government can collect on you and what it can do with the data it does collect. However, there is no Constitutionally or legislatively defined general right to privacy: anyone can collect whatever data they want, so long as they’re not a branch of government. This is usually an adequate protection against government abuse, but it does mean the private sector can accumulate a frightening amount of data about you, and that could be prone to abuse as well. EU nations, on the other hand, often have a general right to privacy, and various kinds of data collection that are routine in the United States are often illegal there; in addition, even where the data can be stored, sharing it with any third party without express user consent is almost always illegal.
If IP addresses are PII, what really happens? It would require changing a lot of current practices, but that is not the same as breaking the scenarios those practices support. Remember, the privacy issue isn’t with transmitting or using IPs; it’s with storing them or sharing them with a third party.
- Currently search engines like Google use your IP to identify where you are geographically, so as to establish search profiles for regions and target ads. They store the first 24 bits of your IP (dropping the last octet) as a proxy for location. They would need to switch to storing a different proxy for location (e.g. latitude and longitude), though they could still base this proxy on your IP.
- Pay-per-click ad networks would still function. When an ad is clicked, the ad network records the click (so as to be able to bill the advertiser), then issues a 301 redirect to the advertiser, who also records the click (to know it happened and the ad was effective). These records would need to leave out the IP, or be protected as PII. Lacking the IP, however, would make detecting and preventing click fraud (spoofed clicks, or many clicks from the same person) much more difficult. Currently a skilled fraudster can evade IP-based click-fraud prevention, but losing even that would make click fraud easy. Also, without IP addresses, the ad networks would have a hard time proving to advertisers that clicks were real if an advertiser chose to sue them. Large ad networks would probably have to just eat the cost of protecting their logs as PII.
- Contrary to RSnake’s comment, I do not think this would affect embedded content. Embedded content comes in two forms: content linked to on a page, which your browser loads (objects), and content retrieved by the server and displayed on the page (mashups).
  - In the object case (e.g. viewing a YouTube video on someone’s web page), the web site owner is not leaking your IP to the third party; you are. The web site is not sending your IP to YouTube at all; your web browser is sending it in response to a link tag in the page.
  - In the mashup case (e.g. web pages that get data from an API, like Facebook pages, pages embedding Google Maps, etc.), the web site owner is also not leaking your IP to the third party. You access the site, and then the site accesses the third party not as you, but as itself. The site leaks its own IP, not the customer’s. No PII is released.
- Sites that do user tracking (via logins, or by simply recognizing users between sessions) would be unaffected; they use cookies, not IP, to track users. Most ad networks work this way, too.
- The biggest change, though, is to simple website logs. Currently every time you access any web page, the server makes a note in a log of your IP and which page you requested, which is used for statistical analysis, forensics, etc. Even this blog is doing it; with most web providers you can’t even turn this logging off if you want to. Sites will either have to stop doing this or take substantial steps to protect the logs (or else be subject to significant statutory liability if they don’t). Not keeping logs is, from a security perspective, very dangerous: if something happens, you have no idea what happened and thus may not be able to fix it.
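To make the log and location-proxy points above a little more concrete, here is a rough Python sketch of two ways a site might avoid storing raw IPs: dropping the host bits (the "first 24 bits" idea from the search engine example) and swapping the address for a keyed hash so repeat visitors can still be counted. The function names, the /48 prefix for IPv6, the 16-character digest length, and the assumption that the client IP is the first field of each log line are all mine, for illustration only. Whether either technique is enough to take a log out of PII territory is a legal question this sketch does not answer.

```python
import hashlib
import hmac
import ipaddress


def truncate_ip(ip: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an address, keeping only the network prefix.

    The /24 default mirrors "dropping the last octet"; the /48 default
    for IPv6 is an assumption made for this sketch.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def pseudonymize_ip(ip: str, secret_key: bytes) -> str:
    """Replace an address with a keyed hash (HMAC-SHA256, shortened).

    The same IP always maps to the same token under one key, so
    "many clicks from the same source" can still be counted without
    the raw address being written to disk.
    """
    canonical = str(ipaddress.ip_address(ip)).encode("ascii")
    return hmac.new(secret_key, canonical, hashlib.sha256).hexdigest()[:16]


def scrub_log_line(line: str, secret_key: bytes) -> str:
    """Rewrite a log line whose first space-separated field is the client IP."""
    ip, sep, rest = line.partition(" ")
    return pseudonymize_ip(ip, secret_key) + sep + rest
```

The trade-off is the one described above: truncation throws away most of what makes IP-based click-fraud detection and forensics work, while a keyed hash keeps the "same source" signal but is only as private as the key, so the key itself would need protecting (and rotating) much as the raw logs would.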
However, despite all that cost and difficulty, when you think about it… IP addresses really are personally identifying. If you have an always-on broadband ‘net connection, your IP address changes very rarely (maybe only once in several months), so all your web traffic everywhere, complete with your search queries, emails, etc., can be tied together with that number. Your ISP can connect that number to your name, address, etc. If you’re at a corporation, the IP is tied to a corporate gateway or proxy… which has logs tying each communication (based on date and time) to your desktop’s IP, which once again likely uniquely identifies you (unless you always compute from a shared machine.)
IP is a unique identifier for confirming identity, but not so much for initially finding it. In other words, if someone attacks my website, and I have only their IP address, it may not do me much good in finding out who they are unless I can get someone with subpoena powers to get it from the ISP. However, if I suspect a specific person of something, I can probably find out their IP and check it against my attacker’s IP, thus confirming their identity. Likewise, if I am an ad network or search engine with a lot of IP data, I don’t know who you are based on your IP, but the commonality in IPs between all the data I have may enable me to figure it out based on data aggregation.
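The aggregation point is easy to see in a toy example. The sketch below groups records from unrelated sources by IP; none of the records names anyone, but once they share a key they read as one profile. The records, source names, and the `link_by_ip` helper are made up for illustration, and the addresses come from ranges reserved for documentation.

```python
from collections import defaultdict

# Hypothetical records: (ip, source, detail). No record contains a name.
SAMPLE_RECORDS = [
    ("198.51.100.7", "search", "query: dentists near springfield"),
    ("198.51.100.7", "ad-click", "ad: local used-car dealership"),
    ("203.0.113.20", "search", "query: python tutorial"),
    ("198.51.100.7", "web-log", "GET /alumni/class-of-1995"),
]


def link_by_ip(records):
    """Group records from different sources under the IP they share."""
    profiles = defaultdict(list)
    for ip, source, detail in records:
        profiles[ip].append((source, detail))
    return dict(profiles)


if __name__ == "__main__":
    for ip, events in link_by_ip(SAMPLE_RECORDS).items():
        print(ip, len(events), "linked records")
        for source, detail in events:
            print("   ", source, "|", detail)
```

Each row on its own is fairly harmless; the join on IP is what turns scattered rows into something that starts to describe a particular person.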
I think this is a case where something is considered ridiculous merely because it changes things. Yes, a lot of business models and current practices would have to change if IP-as-PII became the default assumption. Yes, it would make some security people’s jobs harder, and cause web providers to incur a lot of costs. But does that mean it’s wrong? Perhaps what it means is that current businesses and web sites under-value their users’ privacy, and are freeloading while providing inadequate protections. It’s a different world if we have to discard IPs or protect them as PII, but I’m not convinced it’s a worse one.

January 24th, 2008 at 9:15 pm
I found your site on technorati and read a few of your other posts. Keep up the good work. I just added your RSS feed to my Google News Reader. Looking forward to reading more from you.
Aaron Wakling