Own Your Data(base)
It's been a terrible month for user data security. Epsilon and Sony, both high-profile and data-rich companies, have been breached and revealed sensitive personal data to hackers. In Sony's case, the 77 million users affected weren't even notified that their names, addresses and potentially credit card data were compromised until six days after the attack. Many speculate that in the rush to get out new product features, Sony neglected to carefully think through their security model for protecting the valuable user data they stored. Clearly, the current system of data storage and retrieval is broken. As today's New York Times reports, there is currently no U.S. federal law regulating data theft, penalties, and notification requirements, so states are left to determine their own protocols. Companies have little downside to collecting troves of information, since the penalties for losing it are unclear while the benefits of having it are potentially great. Meanwhile, consumers have little or no control over what happens to the growing amount of personal information that they give or leave behind as they interact online and in person with well-connected businesses.
So what's the solution? I've been following the Diaspora project since it was first conceived about a year ago. The still-unreleased project is intended to be an open-source, peer-to-peer social network that allows users to own their own data. The gist of it is that instead of posting all of your pictures, comments and information to Facebook, where you lose control of it, you post it to your own Diaspora "seed", which then makes it available to users who are allowed to request it. When I create a network of friends with Diaspora, somewhere on my computer is a list of the computers that hold my friends' information, and my computer connects and retrieves that info (all behind the scenes) when I want to interact with my friends. Facebook, on the other hand, keeps all the information on its own computers and simply allows me to see it when I'm connected to someone. The difference is subtle, but very important. If I want to remove my information from Facebook, I can request that they delete it, but I have no way of knowing exactly how they do it. Could it still be retrieved from offline backups if subpoenaed for a court case? Is it still being used to create advertising profiles for members of my family?
Storing information in your own Diaspora seed, on the other hand, ensures that you have complete control (yes, technically other users could still save the information as they browse) and can change access confidently and fully at any time. You know who is seeing your data, what they are requesting, and what is being provided. The same distributed model would be ideal, from a consumer's point of view, for data of all types. Imagine if users could create their own "data silo" that any company could plug into in order to interface and interact with that user. When I went to the store, if I wanted to join the discount club, I could give the store access to my data silo, and it could store there the information it wanted to keep about me. When I went to the doctor, I could give them my data silo address and they could save my medical records to my system. Basically, any company that needed to preserve information about me as a consumer, patient or customer could use my own silo to do so. The silo itself would ensure that each company could see only the information it had provided or the information I had chosen to let it see. Later, if I decided I no longer wanted to participate in the shopper club, or I switched doctors and no longer wished to give my old doctor access to my history, I could revoke access or change permissions. It would be my data, my decisions, and my control.
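At its core, the silo idea is a small access-control model. Here is a minimal Python sketch of it, assuming a hypothetical in-memory `DataSilo` class (no networking, persistence, or encryption, and nothing taken from Diaspora or any real product). Each company writes what it wants to remember into its own space, can read only what it wrote plus any profile fields the owner explicitly shared, and loses all access once the owner revokes it.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a company asks for data it is not allowed to see."""


@dataclass
class DataSilo:
    """Hypothetical in-memory personal data silo (illustration only)."""

    # Profile fields entered by the owner, e.g. {"name": ..., "address": ...}
    owner_data: dict = field(default_factory=dict)
    # Records each company has written: {company: {key: value}}
    records: dict = field(default_factory=dict)
    # Owner-approved profile fields per connected company: {company: {field, ...}}
    shared: dict = field(default_factory=dict)

    def connect(self, company, shared_fields=()):
        """Owner lets a company plug in, optionally sharing some profile fields."""
        self.shared[company] = set(shared_fields)
        self.records.setdefault(company, {})

    def revoke(self, company):
        """Owner cuts a company off; records it wrote stay in the owner's silo."""
        self.shared.pop(company, None)

    def write(self, company, key, value):
        """A connected company saves something it wants to remember about the owner."""
        self._require_connected(company)
        self.records[company][key] = value

    def read(self, company, key):
        """Return the company's own record, or a profile field the owner shared."""
        self._require_connected(company)
        if key in self.records[company]:
            return self.records[company][key]
        if key in self.shared[company] and key in self.owner_data:
            return self.owner_data[key]
        raise AccessDenied(f"{company} may not read {key!r}")

    def _require_connected(self, company):
        # A company counts as connected only while it has an entry in `shared`.
        if company not in self.shared:
            raise AccessDenied(f"{company} is not connected to this silo")
```

For example, after `silo.connect("grocer", shared_fields=["name"])`, the grocer can write and read its own `club_points` record and read the owner's `name`, but asking for `address` or for a record the doctor wrote raises `AccessDenied`. After `silo.revoke("grocer")`, every call from the grocer fails. One simplification worth noting: revoked records stay in the silo, so reconnecting a company restores its old records. A real system might let the owner delete or hide them instead.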
Of course, this setup has security implications as well: users tend to be worse than big companies at keeping their systems up to date and securing physical access. However, this is a solvable problem. Open-source tools could be created to set up the data stores, and users who didn't want to host their own silos could let a professional service do it for them. The data on the silo would remain encrypted, so the service would not have access to the actual information; it would simply manage the logistics. For users who ran their own silos, updates would be applied regularly and automatically by default, and systems with known security vulnerabilities could be temporarily taken offline until the patch was applied. In the end, perhaps the biggest benefit of a system like this would be the decentralization of user information. Sony, with 77 million records and credit card numbers, is a pretty big target for a determined group of hackers. Data siloing would spread out the information and avoid huge concentrations of it in one place. Instead of storing credit card information and user addresses, Sony would just store pointers to its customers' silos. When a breach was discovered, an automated process could quickly notify every affected silo to revoke Sony's access. Breaching one computer is much easier than breaching 77 million, so the data would be far safer.
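The breach scenario can be sketched the same way. Below is a minimal Python model, assuming hypothetical `Silo` and `Merchant` classes and an in-memory `directory` dict that stands in for looking silos up over the network. The merchant keeps only a pointer (the silo address) and a random access token per customer. When it discovers a breach, it asks every silo to revoke its token, so any stolen tokens stop working. Encryption and real networking are deliberately left out.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Silo:
    """Hypothetical customer silo that hands out revocable access tokens."""

    address: str
    data: dict = field(default_factory=dict)
    # Currently valid tokens: {company: token}
    tokens: dict = field(default_factory=dict)

    def grant(self, company):
        """Issue a fresh random token to a company and remember it."""
        token = secrets.token_hex(16)
        self.tokens[company] = token
        return token

    def read(self, company, token, key):
        """Return a data field only if the company presents its current token."""
        expected = self.tokens.get(company)
        if expected is None or not secrets.compare_digest(expected, token):
            raise PermissionError(f"{company} has no valid access to {self.address}")
        return self.data[key]

    def revoke(self, company):
        """Invalidate a company's token; returns True if one was active."""
        return self.tokens.pop(company, None) is not None


@dataclass
class Merchant:
    """Hypothetical merchant that stores pointers and tokens, not customer data."""

    name: str
    # {silo address: access token}; the only customer-related data the merchant holds
    pointers: dict = field(default_factory=dict)

    def enroll(self, silo):
        self.pointers[silo.address] = silo.grant(self.name)

    def handle_breach(self, directory):
        """Ask every customer's silo to revoke this merchant; return how many did."""
        revoked = 0
        for address in self.pointers:
            # `directory` stands in for looking the silo up over the network.
            if directory[address].revoke(self.name):
                revoked += 1
        return revoked
```

In this model, an attacker who copies the merchant's `pointers` gets addresses and tokens but no card numbers. Once `handle_breach` runs, each of those tokens is rejected by its silo. A real deployment would also need authenticated revocation requests, so that an attacker couldn't revoke (or forge) access on the merchant's behalf.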
Ultimately, companies need to be held more accountable for the information they collect and retain. Giving control back to users would shift much of that burden to the people who best understand the importance of securing their own information. Finally, there's something in it for the merchants too: if consumers are confident that they have full control over what a company knows about them at any given time, they're much more likely to share info. If I know that my grocery store will give me better prices if it knows my Amazon history, I might just share it with them, as long as I can pull that access back at any time. That's good for everyone.

