We may also use your personal information for account and network security purposes, including in order to protect our services for the benefit of all our users, and pre-screening or scanning uploaded content for potentially illegal content, including child sexual exploitation material.
PhotoDNA, developed by Microsoft, is primarily used to detect child pornography. It works by computing a distinctive hash that represents the image, and that hash is designed to be resistant to alterations such as resizing and minor color changes. To do this, it converts the image to grayscale, resizes it, breaks it into a grid, and looks at the intensity gradients, or edges, in each cell.
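The steps above can be sketched in Python. PhotoDNA itself is proprietary, so this is not its algorithm. It is a toy robust hash that follows the same outline, and the fixed output size (24x24), the 6x6 grid, and the four gradient sums per cell are assumptions chosen for illustration. The input is a list of rows of (r, g, b) tuples, so the sketch needs no imaging library.

```python
# A toy "robust image hash" following the outline described above:
# grayscale -> resize -> grid -> intensity gradients (edges).
# NOT PhotoDNA (which is proprietary): the 24x24 size, the 6x6 grid and
# the four gradient sums per cell are illustrative assumptions.
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255


def to_grayscale(img: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    # ITU-R BT.601 luma weights turn each RGB pixel into one intensity value.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]


def resize(gray: List[List[float]], size: int) -> List[List[float]]:
    # Nearest-neighbour resize to size x size, so every image yields a
    # hash of the same length regardless of its original dimensions.
    h, w = len(gray), len(gray[0])
    return [[gray[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def robust_hash(img: Sequence[Sequence[Pixel]], size: int = 24, grid: int = 6) -> List[float]:
    g = resize(to_grayscale(img), size)
    cell = size // grid
    features: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            # Sums of positive and negative horizontal/vertical intensity
            # differences between neighbouring pixels inside this cell.
            dx_pos = dx_neg = dy_pos = dy_neg = 0.0
            for y in range(gy * cell, (gy + 1) * cell):
                for x in range(gx * cell, (gx + 1) * cell):
                    dx = g[y][x + 1] - g[y][x] if x + 1 < size else 0.0
                    dy = g[y + 1][x] - g[y][x] if y + 1 < size else 0.0
                    dx_pos += max(dx, 0.0)
                    dx_neg += max(-dx, 0.0)
                    dy_pos += max(dy, 0.0)
                    dy_neg += max(-dy, 0.0)
            features.extend([dx_pos, dx_neg, dy_pos, dy_neg])
    return features
```

Because the image is resized to a fixed size first, a 48x48 image and a 96x96 copy made by doubling every pixel produce identical hashes in this sketch. And since the features are differences between neighboring pixels, a uniform brightness shift leaves them unchanged apart from floating-point rounding and clipping at 255.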
Microsoft eventually donated PhotoDNA to the National Center for Missing & Exploited Children (NCMEC). It’s used by companies like Facebook, Twitter, Google, and others. Basically, it works by creating a hash of a photo or video, comparing it to known child pornography hashes in NCMEC’s Child Victim Identification Program database, and seeing if there’s a match.
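Once an image has been reduced to a fixed-length hash, checking it against a database is a comparison of numbers rather than an inspection of what the picture shows. Because robust hashes of altered copies are close rather than identical, a matcher would look for the nearest known hash within a distance threshold. The sketch below assumes an L1 (sum of absolute differences) distance, made-up entry IDs, and an arbitrary threshold; NCMEC’s actual database format and PhotoDNA’s matching rules aren’t described here, so these are placeholders.

```python
# Hypothetical matcher: compares a candidate hash against known hashes.
# The entry IDs, the L1 distance metric and the threshold are assumptions
# for illustration, not the real PhotoDNA/NCMEC matching process.
from typing import Mapping, Optional, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def find_match(candidate: Sequence[float],
               known: Mapping[str, Sequence[float]],
               threshold: float) -> Optional[str]:
    # Return the ID of the closest known hash within `threshold`, else None.
    best_id: Optional[str] = None
    best_dist = float("inf")
    for hash_id, known_hash in known.items():
        d = l1_distance(candidate, known_hash)
        if d <= threshold and d < best_dist:
            best_id, best_dist = hash_id, d
    return best_id


known_hashes = {"entry-1": [10, 0, 5, 5], "entry-2": [100, 100, 0, 0]}
print(find_match([11, 0, 5, 4], known_hashes, threshold=5))     # entry-1
print(find_match([50, 50, 50, 50], known_hashes, threshold=5))  # None
```

The matcher only learns "this is close to known entry X" or "no match"; it never interprets what an unmatched image depicts.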
Now, the idea of companies scanning user content is a bit concerning, even when it’s used for good in cases like this, and especially in Apple’s case given its stance on privacy. According to Microsoft’s PhotoDNA FAQ, images are instantly converted into a secure hash that can’t be reverse engineered, and PhotoDNA is used specifically for child pornography and can’t be used to scan for other content.
One possibility is that Apple’s scanning is done by the photoanalysis daemon. This is the background process that detects people, objects, and so on in your photos, and it runs locally on your device. That could be implied by the phrase “pre-screening.” But the phrase “uploaded content” trips me up. Is Apple scanning files in iCloud Drive, or just iCloud Photos? “Scanning” is also an issue. I use it because that’s the word Apple uses, but I don’t think comparing hashes is the same as “scanning.”
On its iCloud security overview page, Apple says that customer data is encrypted in transit and on iCloud servers. This applies to both the Photos app and content stored in iCloud Drive. So I’m wondering at which point Apple scans content. Presumably, it happens before the content gets encrypted.
We know that Apple also stores the encryption keys in the cloud, according to the company’s Legal Process Guidelines [PDF]. This is how Apple can help you reset your password and provide law enforcement with data under a subpoena. Further, Apple’s legal terms for iCloud say the following (emphasis added):
C. Removal of Content
You acknowledge that Apple is not responsible or liable in any way for any Content provided by others and has no duty to pre-screen such Content. However, Apple reserves the right at all times to determine whether Content is appropriate and in compliance with this Agreement, and may pre-screen, move, refuse, modify and/or remove Content at any time, without prior notice and in its sole discretion, if such Content is found to be in violation of this Agreement or is otherwise objectionable.
How can Apple determine whether content is appropriate without scanning iCloud content, even though that content is encrypted? Or, as I mentioned, maybe it’s doing all of this scanning during upload, before encryption. Don’t get me wrong, I have no problem with Apple scanning for child abuse content, but I’d like to know the extent of Apple’s knowledge of general customer content.
The word “appropriate” needs to be defined. As we saw in 2012, Apple deleted a screenwriter’s script attached to an email because it described a character viewing a porn ad on his computer:
AND THEN I SAW IT — a line in the script, describing a character viewing an advertisement for a pornographic site on his computer screen. Upon modifying this line, the entire document was delivered with no problem.
I reached out to Apple with questions, and I’ll update this article if I get a response.
Note: A reader points out a crucial detail below: end-to-end encryption for iCloud content is only turned on if you enable two-factor authentication, and even then it doesn’t include iCloud Drive or the Photos app. I could be wrong, but I think end-to-end encryption is different from “encrypted in transit and at rest on the server.”
Hany Farid, one of the people who helped develop PhotoDNA, wrote an article for Wired saying:
Recent advances in encryption and hashing mean that technologies like PhotoDNA can operate within a service with end-to-end encryption. Certain types of encryption algorithms, known as partially or fully homomorphic, can perform image hashing on encrypted data. This means that images in encrypted messages can be checked against known harmful material without Facebook or anyone else being able to decrypt the image. This analysis provides no information about an image’s contents, preserving privacy, unless it is a known image of child sexual abuse.
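Farid’s point about computing on encrypted data can be made concrete with a toy example of a partially homomorphic scheme. The sketch below uses Paillier, a well-known additively homomorphic cryptosystem, chosen here purely for illustration; nothing in this article says Farid, Facebook, or Apple uses it. It computes a sum and a weighted sum (dot product) directly on ciphertexts, and only the holder of the private key can decrypt the result. The tiny primes keep the numbers readable and provide no security, and running a full image hash under encryption would be far more involved than this.

```python
# Toy Paillier cryptosystem: additively (partially) homomorphic encryption.
# The tiny primes make it completely insecure; it only demonstrates that
# arithmetic can be done on ciphertexts without decrypting them.
import math
import random


def keygen(p: int, q: int):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu = L(g^lam mod n^2)^-1 mod n, where L(u) = (u - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)  # not cryptographically secure; toy only
    while math.gcd(r, n) != 1:  # r must be invertible mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    # E(a) * E(b) mod n^2 decrypts to (a + b) mod n.
    n, _ = pub
    return (c1 * c2) % (n * n)


def encrypted_dot(pub, enc_x, weights) -> int:
    # The product of E(x_i)^w_i decrypts to sum(w_i * x_i) mod n.
    # Weights are plaintext, non-negative integers held by the server.
    n, _ = pub
    n2 = n * n
    acc = 1  # 1 is a valid (trivial) encryption of 0
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc


pub, priv = keygen(293, 433)
print(decrypt(pub, priv, add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))))  # 42
enc_x = [encrypt(pub, v) for v in [3, 1, 4, 1, 5]]
print(decrypt(pub, priv, encrypted_dot(pub, enc_x, [2, 7, 1, 8, 2])))  # 35
```

The server that multiplies the ciphertexts never sees 20, 22, or the vector [3, 1, 4, 1, 5]; it only handles opaque numbers, and the correct results appear after decryption with the private key.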
I think homomorphic encryption is still relatively new in practice, and I doubt Apple currently uses it. But I could see Apple adopting it, because it lets you perform computations on encrypted data without decrypting it, which fits the same privacy-preserving goal as Apple’s use of differential privacy. I look forward to the iOS 13 security guide to see if I can glean some insight. Or, as I mentioned earlier, maybe Apple just scans for this content before it gets uploaded and encrypted.