IP Addresses: Personally Identifiable Information?

Peter Schaar, Germany’s Commissioner of Data Protection and head of the European Union’s privacy working group, has stated that information identified only by IP address must be considered personally identifiable information. As the AP article points out, this could have rather serious implications for search engines and many other electronic businesses, and RSnake is concerned that it could upend the entire advertising business model of the Internet.

First, for those not working in the information security industry: something being classified as personally identifiable information (PII) is a big deal. If data is PII, you are liable for damages if the data is ever released, and you are required by statute to take significant and often expensive measures to protect it. If you’re a public corporation, Sarbanes-Oxley requires you to do all sorts of things to protect the data (e.g. encryption). If your company takes credit card payments, the Payment Card Industry Data Security Standard requires you to do even more (e.g. physical protection of the hardware the data sits on, specific firewall/router configurations, etc.). Most large companies have their own standards for how PII must be protected that combine or even go beyond the regulatory and industry requirements. Overall, the required protections around PII are onerous enough that companies strive to minimize how much PII they hold at all; it’s often cheaper and easier to just delete the data than to protect it the way you need to protect it. Companies must ask, “How much business value do we get out of storing, say, our customers’ addresses, and does it exceed the cost of protecting that data?” Often the answer is no.

On the surface, calling IP addresses PII is ridiculous. IP addresses are found on every packet anyone sends on the Internet; if an IP address counts as personal identification, then logging basically anything about Internet traffic makes the logs PII. That takes a label currently applied only to a small amount of high-value data and applies it to something that everyone everywhere logs, which seems absurd. But as I think about it more, I’ve come to realize that Schaar has a point.

The EU is much more aggressive about privacy law than the United States. The United States Constitution guarantees privacy from the government through the Fourth and Fifth Amendments; this sharply limits what the government can collect on you and what it can do with the data it does collect. However, there is no Constitutionally or legislatively defined general right to privacy: anyone can collect whatever data they want, so long as they’re not a branch of government. This is usually an adequate protection against government abuse, but it does mean the private sector can accumulate a frightening amount of data about you, and that could be prone to abuse as well. EU nations, on the other hand, often have a general right to privacy, and various kinds of data collection that are taken for granted in the United States are often illegal there; in addition, even where the data can be stored, sharing it with any third party without express user consent is almost always illegal.

If IP addresses are PII, what really happens? A lot of current practices would have to change, but that is not the same as making existing scenarios impossible. Remember, the privacy issue isn’t with transmitting or using IPs; it’s with storing them or sharing them with a third party.
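To make the "using versus storing" distinction concrete, here is a minimal Python sketch of one way a site could still serve a request using the full IP but write only a truncated form to its logs. This is my own illustration, not something Schaar or the AP article proposes; the function names (truncate_ip, log_line) and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions, and truncation only reduces how identifying the stored value is. Whether it would satisfy any particular regulator is a separate question.

```python
import ipaddress


def truncate_ip(raw: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits so the stored value names a network, not one machine.

    The prefix lengths are illustrative assumptions, not a legal standard.
    """
    addr = ipaddress.ip_address(raw)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def log_line(raw_ip: str, path: str) -> str:
    """Build the line that gets written to disk; the full IP is never stored."""
    return f"{truncate_ip(raw_ip)} GET {path}"


if __name__ == "__main__":
    # Addresses come from the ranges reserved for documentation.
    print(log_line("203.0.113.77", "/search?q=example"))
    print(truncate_ip("2001:db8:abcd:1234::1"))
```

The full address is still available in memory while the request is handled, so nothing about transmitting or using it changes; only what lands on disk does.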

However, despite all that cost and difficulty, when you think about it… IP addresses really are personally identifying. If you have an always-on broadband ‘net connection, your IP address changes very rarely (maybe only once in several months), so all your web traffic everywhere, complete with your search queries, emails, etc., can be tied together with that number. Your ISP can connect that number to your name, address, etc. If you’re at a corporation, the IP is tied to a corporate gateway or proxy… which has logs tying each communication (based on date and time) to your desktop’s IP, which once again likely uniquely identifies you (unless you always compute from a shared machine).
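The corporate-proxy case can be sketched in a few lines of Python. Everything here is hypothetical: the ProxyEntry record, the sample log, the find_desktop helper, and the five-second matching window are assumptions chosen for illustration, and real proxy logs have their own formats. The point is only that a destination plus a date and time is enough to recover the internal desktop address.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class ProxyEntry(NamedTuple):
    """One hypothetical proxy log record."""
    timestamp: datetime
    internal_ip: str
    destination: str


# Made-up sample data; outside sites see only the gateway's public IP.
PROXY_LOG = [
    ProxyEntry(datetime(2008, 1, 22, 9, 15, 2), "10.0.4.17", "search.example.com"),
    ProxyEntry(datetime(2008, 1, 22, 9, 15, 40), "10.0.7.33", "news.example.org"),
    ProxyEntry(datetime(2008, 1, 22, 9, 16, 5), "10.0.4.17", "mail.example.net"),
]


def find_desktop(
    destination: str,
    seen_at: datetime,
    tolerance: timedelta = timedelta(seconds=5),
) -> Optional[str]:
    """Return the internal IP that contacted `destination` around `seen_at`."""
    for entry in PROXY_LOG:
        close_in_time = abs(entry.timestamp - seen_at) <= tolerance
        if entry.destination == destination and close_in_time:
            return entry.internal_ip
    return None  # no matching record within the tolerance window


if __name__ == "__main__":
    # An outside site saw a request from the gateway at 09:15:04.
    print(find_desktop("search.example.com", datetime(2008, 1, 22, 9, 15, 4)))
```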

IP is a unique identifier for confirming identity, but not so much for initially finding it. In other words, if someone attacks my website, and I have only their IP address, it may not do me much good in finding out who they are unless I can get someone with subpoena powers to get it from the ISP. However, if I suspect a specific person of something, I can probably find out their IP and check it against my attacker’s IP, thus confirming their identity. Likewise, if I am an ad network or search engine with a lot of IP data, I don’t know who you are based on your IP, but the commonality in IPs between all the data I have may enable me to figure it out based on data aggregation.
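Both halves of that argument are easy to sketch in Python. The logs, addresses, and function names below (build_profiles, confirms_suspect) are invented for illustration and do not describe how any real ad network or search engine works. The first function shows aggregation: separate datasets that share nothing but an IP collapse into one profile. The second shows confirmation: a single IP is useless for discovery but trivially checks a suspicion.

```python
from collections import defaultdict

# Hypothetical records of (ip, event) from two unrelated services.
SEARCH_LOG = [
    ("198.51.100.23", "symptoms of flu"),
    ("192.0.2.8", "cheap flights"),
    ("198.51.100.23", "pizza delivery springfield"),
]
AD_LOG = [
    ("198.51.100.23", "ad clicked: local gym"),
    ("192.0.2.8", "ad clicked: luggage"),
]


def build_profiles(*logs):
    """Merge any number of (ip, event) logs into one event list per IP."""
    profiles = defaultdict(list)
    for log in logs:
        for ip, event in log:
            profiles[ip].append(event)
    return dict(profiles)


def confirms_suspect(suspect_ip, attacker_ips):
    """Check a known suspect's IP against the IPs seen in an attack."""
    return suspect_ip in attacker_ips


if __name__ == "__main__":
    for ip, events in build_profiles(SEARCH_LOG, AD_LOG).items():
        print(ip, events)
    print(confirms_suspect("198.51.100.23", {"198.51.100.23", "203.0.113.5"}))
```

No name appears anywhere in the data, yet the merged list for one address already hints at a town, a health concern, and a habit, which is the aggregation risk in miniature.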

I think this is a case where something is considered ridiculous merely because it changes things. Yes, a lot of business models and current practices would have to change if IP-as-PII became the default assumption. Yes, it would make some security people’s jobs harder, and cause web providers to incur a lot of costs. But does that mean it’s wrong? Perhaps what it means is that current businesses & web sites under-value their users’ privacy, and are freeloading while providing inadequate protections. It’s a different world if we have to discard IPs or protect them as PII, but I’m not convinced it’s a worse one.

