Apple has shared a new white paper on its Machine Learning Journal website that details how the company improves its products with iOS analytics while protecting user data with differential privacy. Critics maintain that Apple trails competitors in machine learning because it goes to great lengths to keep user data private.
But Apple has consistently shown that it can balance machine learning with privacy, using a technique called differential privacy. We’ve talked a bit about how it works before, but this latest paper has additional information. A PDF version is also available with more detail than the journal article.
Learning With Privacy at Scale
When you opt in to iOS analytics, Apple assigns each event a privacy parameter called epsilon. The value of epsilon depends on the type of information: certain data, like the most-used emojis, don’t need to be kept as private as other data. After an event is generated on a device, the data is immediately privatized and temporarily stored on the device. It isn’t sent to Apple’s servers right away; when it is transmitted depends on device conditions.
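To make the role of epsilon concrete, here is a minimal sketch that assumes classic binary randomized response, a textbook differential privacy mechanism. It is not taken from Apple's paper, and the function names are hypothetical. It shows the general trade-off: a larger epsilon means the true value is reported more often, so there is less noise and less privacy.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def privatize_bit(bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability e^eps / (1 + e^eps), otherwise flip it."""
    return bit if rng.random() < keep_probability(epsilon) else not bit


def estimate_true_fraction(reports, epsilon: float) -> float:
    """Debias the observed share of True reports to estimate the real share.

    Requires epsilon > 0; at epsilon == 0 the reports carry no information.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    # observed = f * p + (1 - f) * (1 - p), solved for f
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(1)
    truth = [i < 3000 for i in range(10000)]  # 30% of users have the bit set
    for eps in (0.5, 2.0, 4.0):
        reports = [privatize_bit(b, eps, rng) for b in truth]
        print(eps, round(estimate_true_fraction(reports, eps), 3))
```

No single report reveals much about one user, yet the aggregate estimate gets closer to the real share as more reports arrive.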
The system then randomly samples from the privatized records and sends the sample to the server. The records don’t include device identifiers or timestamps of when the event was generated. The communication between device and server is encrypted using TLS.
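The sampling step could look roughly like the sketch below. The record layout, the field names (`use_case`, `payload`, `device_id`, `timestamp`), and the function name are all hypothetical, chosen only for illustration. The idea is that a random subset is chosen and only whitelisted fields ever leave the device.

```python
import random

# Hypothetical field names: only these are allowed to leave the device.
ALLOWED_FIELDS = ("use_case", "payload")


def select_for_upload(stored_records, sample_size, rng):
    """Pick a random subset of privatized records and keep only allowed fields."""
    count = min(sample_size, len(stored_records))
    chosen = rng.sample(stored_records, count)
    # Rebuild each record from the whitelist so identifiers and timestamps
    # cannot slip through, even if the local store happens to hold them.
    return [{field: record[field] for field in ALLOWED_FIELDS} for record in chosen]


if __name__ == "__main__":
    rng = random.Random(0)
    stored = [
        {"device_id": "local-only", "timestamp": i, "use_case": "emoji", "payload": [i]}
        for i in range(10)
    ]
    for record in select_for_upload(stored, 3, rng):
        print(record)
```

Using a whitelist rather than deleting known-sensitive keys means a field added later is excluded by default.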
Different algorithms are used to process the data: the Private Count Mean Sketch algorithm (CMS), the Private Hadamard Count Mean Sketch algorithm (HCMS), and the Sequence Fragment Puzzle (SFP) algorithm. Each algorithm does something different, and you can read more in the paper.
The paper also describes how the algorithms discover popular emojis, discover new words, and identify websites with high energy and memory usage in Safari. For emojis, the CMS algorithm uses the parameters m=1024, k=65,536, and ε=4 with a dictionary size of 2,600 emojis.
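To give a feel for what m, k, and epsilon control, here is a toy sketch of a count mean sketch with privatized reports. In it, m is the length of the vector each device sends, k is the number of hash functions, and epsilon sets how often bits are flipped. This follows the general shape of the technique as I understand it. The SHA-256 based hash, the function names, and the tiny parameters are assumptions for illustration, not Apple's implementation or its deployed values.

```python
import hashlib
import math
import random


def hash_index(item: str, j: int, m: int) -> int:
    """Stand-in for the j-th hash function, mapping an item into [0, m)."""
    digest = hashlib.sha256(f"{j}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def client_privatize(item, epsilon, m, k, rng):
    """Device side: encode the item as a +/-1 vector and flip bits at random."""
    j = rng.randrange(k)                # pick one of the k hash functions
    vector = [-1] * m
    vector[hash_index(item, j, m)] = 1  # one-hot position for this item
    flip_p = 1.0 / (1.0 + math.exp(epsilon / 2.0))
    noisy = [-b if rng.random() < flip_p else b for b in vector]
    return noisy, j


def build_sketch(reports, epsilon, m, k):
    """Server side: debias each report and add it to row j of a k x m sketch."""
    c = (math.exp(epsilon / 2.0) + 1.0) / (math.exp(epsilon / 2.0) - 1.0)
    sketch = [[0.0] * m for _ in range(k)]
    for noisy, j in reports:
        row = sketch[j]
        for i, b in enumerate(noisy):
            row[i] += k * (c / 2.0 * b + 0.5)
    return sketch


def estimate_count(item, sketch, n, m, k):
    """Average the item's cell across all rows and correct for collisions."""
    total = sum(sketch[l][hash_index(item, l, m)] for l in range(k))
    return (m / (m - 1.0)) * (total / k - n / m)


if __name__ == "__main__":
    rng = random.Random(42)
    m, k, eps = 64, 16, 4.0  # toy values, far smaller than a real deployment
    data = ["smile"] * 1500 + ["joy"] * 1000 + ["thumbs_up"] * 500
    reports = [client_privatize(d, eps, m, k, rng) for d in data]
    sketch = build_sketch(reports, eps, m, k)
    for name in ("smile", "joy", "thumbs_up"):
        print(name, round(estimate_count(name, sketch, len(reports), m, k)))
```

The server never sees which emoji a given device used, only a noisy vector, yet the estimated counts recover the overall popularity ranking.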
To figure out which websites have high energy and memory usage, the HCMS algorithm uses the parameters m=32,768, k=1024, and ε=4 with a dictionary size of 250,000 web domains. Finally, the SFP algorithm is used to discover new words, which improves autocorrection.