Nice Ride and user privacy – crossing the line

I’m a really big fan of Nice Ride, the bike-sharing program we have here in the Twin Cities. It’s a great way to encourage cycling (especially for beginners) and exploration of the cities; there are so many wonderful little things you miss when you’re in a car or riding the bus. That’s why I was disappointed when Nice Ride disclosed rider data to the public without removing a field that can be used to identify individual riders.

Privacy has been in the Minnesota news recently, after it was discovered that the Minneapolis police department was scanning license plates and using that information to compile a database of driver activity (such as where and when a car was spotted). The mere existence of such a database is disturbing, but unfortunately it is not news to those of us who follow the advancing deployment of technology. What was worse was that this data was semi-public: anyone could request the locations where a particular license plate had been observed, and the police would provide that data. Since this story broke, efforts have been made to reduce the overall scale of the database, and to monitor and/or restrict public access to it.

Nice Ride, on the other hand, apparently has no qualms about publishing their entire database, complete with a unique subscriber ID. This ID allows anyone with a copy of the database to track an individual user’s activity throughout the Nice Ride system. That is useful information for Nice Ride employees, who can use it to figure out how individual riders use the bikes and to better serve their customers. But releasing this data to the public means that a subscriber ID can easily be linked to an actual person, exposing that individual’s entire ride history. There are many conclusions one can draw about individual Nice Ride users by manipulating this data (and combining it with other data), so let’s take a look!

I’d like to start out by describing the easiest ways to correlate a subscriber ID with an actual user, but I don’t really have the heart to publish a thorough methodology. Making that kind of identification easy is exactly what I’m opposed to, and it is my main grievance with the irresponsible publication of this data. I did not personally use Nice Ride this year, so I don’t even have a subscriber ID in the system. But if you use social media, can you remember tweeting or updating your Facebook status when you rode a Nice Ride bike? Remember someone else who did? Know of any way to find that post again, along with the date and time it was published? Well, that’s one way to start. (Again, I apologize for not writing more on this, but I’m trying not to go too in-depth.) Simple observation is the other obvious way: you saw that cute girl get on a Nice Ride bike at a certain date, place, and time, and while you don’t have her name, Nice Ride has now told you everywhere she has ridden a shared bike.

Once you match a single person to a subscriber ID, the floodgates are open. You get the start date, time, and location of every single ride, plus the same details for each destination. It’s also trivial to glance at any person’s data and see whether another user checked out a bike from the same location within the same timeframe, potentially revealing the subscriber ID of a known acquaintance, spouse, etc.

Or, to take an example from the Minneapolis Bike Love forum:

Let’s say I take a bike out every morning near my house and ride it to work. My ex-wife knows I do this. She uses this information to figure out my subscriber ID because I am the only one who daily takes that bike from there and rides to the location near my work. Using my ID she looks at my other activity. She sees that I am riding places in the middle of the day. She sees that I am riding places when I told her I was out of town. She sees that I am riding around when I told her I was too sick to take the kids. She sees that I am riding to a place where I spent Saturday night and ride away the next morning. I just do not want her knowing that shit and I did not pay NiceRide to tell her.

The bottom line is that publishing this data is irresponsible and potentially dangerous. Bike-share programs in other cities also publish the exact same data (in addition to cool charts), but without the subscriber ID. I support the great things that Nice Ride does in order to make biking more accessible to beginners and those who prefer to avoid the hassle of bike maintenance. But they seriously need to remove just one field before publishing their data.
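To show how small the fix is, here is a minimal Python sketch that strips one identifying column from a CSV export before it is published. It assumes the trip data is a plain CSV file and that the identifying column is called `subscriber_id`; that column name and the function name `strip_identifier` are hypothetical, since the exact layout of Nice Ride’s files isn’t described here.

```python
import csv
import io

# Hypothetical column name; the real export may label this field differently.
ID_FIELD = "subscriber_id"


def strip_identifier(csv_text: str, field: str = ID_FIELD) -> str:
    """Return the CSV text with the given identifying column removed from every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep every column except the identifier, in the original order.
    kept = [name for name in (reader.fieldnames or []) if name != field]
    out = io.StringIO()
    # extrasaction="ignore" silently drops the identifier value from each row dict.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "start_station,end_station,subscriber_id\nA,B,1001\nC,D,1002\n"
    # Prints the same rows with only the two station columns.
    print(strip_identifier(sample), end="")
```

Everything else in the file (times, stations) passes through untouched, which is the point: the public would still get the data for maps and charts, just without the one field that ties rides together.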

Update as of 12/8/2012:

Of course, there’s one more thing that I neglected to mention in the above post. If you go to Nice Ride’s sign-up page, you’re presented with the user agreement at the bottom. About two-thirds of the way through that document, the section on “Confidential Information” (which, as far as I can tell, is the only part of the user agreement related to privacy) refers the user to the Privacy Policy on the website.

Now, most modern websites have some sort of Privacy Policy governing data that is submitted or stored via the website, so pointing riders to it is kind of sloppy: subscriber IDs, check-in times, station locations, etc. are obviously not submitted via the website. And ignoring that oversight, most of the Privacy Policy is relatively standard boilerplate, even the section that reads:

We may share aggregated demographic information (data that cannot identify any individual person) with our partners and sponsors.

The data they have published is not aggregated (and can potentially be used to identify individuals), and they are not providing it only to partners and sponsors, but to the public. There are good reasons to publish it (so other data nerds can make maps and track behavior). Even if Nice Ride removed the subscriber ID, they would still not be in technical compliance with their policy (because the trip records would still not be aggregated), but they would remove the possibility of identifying users, which is all I really care about.
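For contrast with the per-trip files, “aggregated” data in the sense of that policy might look something like the following minimal Python sketch, which reduces trip records to counts per station pair per day. The column names (`start_time`, `start_station`, `end_station`), the function name `aggregate_trips`, and the threshold of 5 are all hypothetical placeholders. Dropping rare station pairs is an extra precaution assumed here, because a route that only one person rides each morning is effectively an individual record even without an ID.

```python
import csv
import io
from collections import Counter


def aggregate_trips(csv_text: str, min_count: int = 5) -> dict:
    """Count trips per (start station, end station, day), keeping no per-rider fields.

    Groups seen fewer than min_count times are dropped, since a count of 1
    still describes a single person's trip. The default of 5 is an arbitrary
    placeholder, not a recommended value.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumes timestamps begin with an ISO date, e.g. "2012-06-01 08:15".
        day = row["start_time"][:10]
        counts[(row["start_station"], row["end_station"], day)] += 1
    # Suppress small groups before anything is published.
    return {key: n for key, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = (
        "start_time,start_station,end_station\n"
        "2012-06-01 08:15,A,B\n"
        "2012-06-01 17:40,A,B\n"
        "2012-06-01 09:00,C,D\n"
    )
    # With min_count=2, only the A-to-B pair (2 trips) survives.
    print(aggregate_trips(sample, min_count=2))
```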

And finally, Nice Ride published a similar dataset in 2011, but that one included Date of Birth, Gender, and ZIP Code, making it very easy to identify people. It doesn’t appear that they did much about this oversight (other than properly redacting those fields in 2012), as Minneapolis Mayor RT Rybak’s subscriber ID appears to be in use in both the 2011 and 2012 data sets (though either he stopped using Nice Ride in May 2012 or was assigned a new subscriber ID; the former wouldn’t surprise me, considering he’s an avid cyclist and probably prefers his own bike). It would have been smart to reassign subscriber IDs after that inadvertent disclosure.

And if you’re wondering, I did email the Director of IT for Nice Ride before publishing this, and he was unconcerned about the privacy implications of publishing the data. I didn’t tell him specifically about the privacy policy violations mentioned in this update, because I only thought of that angle after he stopped replying to my emails. The EFF sent me a form letter telling me to contact my local bar association, and a reporter from the Star Tribune couldn’t come up with an angle that would appeal enough to readers.

If anyone has any ideas on how to get this resolved (either updating their policy to state that they will share ride data about users, or to stop publishing the subscriber ID field), please let me know and share the link to this post. Thanks!
