This Startup Wants to Build a “GitHub for Data”


A startup called Gretel wants to build a “GitHub for data” so developers can safely access sensitive data.

Often, developers don't need full access to a bank of user data; a portion or a sample is enough to work with. In many cases, developers could make do with data that merely looks like real user data.

This so-called "synthetic data" is essentially artificial data that looks and behaves just like real sensitive user data. Gretel uses machine learning to categorize the data (names, addresses and other customer identifiers) and attach as many labels to it as possible. Once the data is labeled, access policies can be applied to it. Then the platform applies differential privacy, a technique for anonymizing vast amounts of data, so that it's no longer tied to customer information.
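Gretel's actual models and API aren't described here, but the label-then-policy step can be sketched with a toy, regex-based classifier. Everything below (the pattern names, `label_record`, `redact`, the policy shape) is a hypothetical illustration, not Gretel's interface; a real system would use trained models rather than hand-written regexes.

```python
import re

# Hypothetical PII detectors; stand-ins for trained ML classifiers.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}-\d{4}\b"),
    "name": re.compile(r"\b(Alice|Bob)\b"),  # stand-in for a name detector
}

def label_record(record):
    """Attach as many PII labels as possible to each field of a record."""
    labeled = {}
    for field, value in record.items():
        labels = [name for name, pat in PII_PATTERNS.items() if pat.search(value)]
        labeled[field] = {"value": value, "labels": labels}
    return labeled

def redact(labeled_record, policy):
    """Apply an access policy: mask any field carrying a denied label."""
    out = {}
    for field, info in labeled_record.items():
        if any(lbl in policy["deny"] for lbl in info["labels"]):
            out[field] = "<REDACTED>"
        else:
            out[field] = info["value"]
    return out

record = {"customer": "Alice", "note": "call 555-1234", "city": "Omaha"}
labeled = label_record(record)
safe = redact(labeled, {"deny": {"name", "phone"}})
```

The point of labeling first is that the same labeled dataset can serve many policies: one team's policy might deny only names, another's everything but coarse geography.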


More Apps Should Use Differential Privacy


The news app Tonic is different from most news apps because it uses differential privacy. More apps should do the same.

Before your eyes cross, consider a real-life example Cyphers gave me: the census. The government holds a lot of aggregate data about its citizens, and it probably wants to share demographic information from that set without revealing anything about any one individual. Say you live in a small census block with only one or two people: it wouldn't take a genius to figure out personal information about you, given the right parameters. Differential privacy is a way to summarize that data without putting any single individual at risk.
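The census example maps onto the classic Laplace mechanism: before releasing a count, add noise scaled to how much one person could change it, so the released number can't pin down any single resident. A minimal sketch, assuming a simple counting query (the function names and epsilon value are illustrative, not any agency's actual method):

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) via inverse-CDF on a uniform draw."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon, rng):
    """Release a count with Laplace noise. A counting query has
    sensitivity 1: adding or removing one person changes it by at most 1,
    so noise with scale 1/epsilon gives epsilon-differential privacy."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
# Tiny census block with 2 residents: the released count is noisy, so an
# observer can't tell whether any particular person was included.
noisy = private_count(2, epsilon=1.0, rng=rng)
```

The noise is small relative to large aggregates but large relative to a one- or two-person block, which is exactly the trade-off the census example describes: useful summaries, deniability for individuals.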