An Apple Developer Explains How iCloud is Broken

Frustrated with iCloud, Apple’s developer community speaks up en masse

A decade ago, Max Seelemann was a youngster in school when he launched the Ulysses writer’s tool and The Soulmen. In a previous WWDC interview, he told TMO’s Dave Hamilton how the OS X app has evolved and how smart technical changes led to the Daedalus editing app for the iPad. He also gave us a preview of the soon-to-be-rewritten Ulysses III.

Later, during the development process of Ulysses III, severe syncing issues arose with the use of Apple's iCloud APIs for developers. After Ulysses III was released this spring, in the process of reviewing the app for TMO, these issues were discussed, and so I asked Mr. Seelemann if he'd share his story with us. Here's the interview.

________________________

John Martellaro: Would you give us some background on this iCloud API issue, to put it all into context?

Max Seelemann: Ulysses III is designed as a document-based shoebox app: a single library of documents which is stored transparently in iCloud. Saving and syncing are supposed to be completely invisible to the user. There are no filenames, no open window and no save dialogues. While iCloud’s document store is a natural fit, our design did not allow us to use common conveniences like NSDocument. The standard components are modeled too closely after classic document-based applications. In addition, they were too inflexible and heavyweight for our needs.

JM: How did the first problems crop up?

MS: It was clear early on that we had to write our own iCloud layer using the low-level NSFileCoordinator API. But by doing so, we had to take care of everything the system would normally do for us: ensuring lossless saving, interacting with iCloud, adopting Versions, showing all available documents to the user and so on. Additionally, our design required that we store metadata for manual ordering and have a cloud-safe way of moving, copying and deleting lots of files at once.

This is a very complex undertaking as there are a lot of situations to handle: Think of conflicting changes (edit a document on two Macs), incompatible operations (move the same sheet to two different folders) and stuff that must be coordinated (you can’t save a document while it’s being moved). If you want it to be fast and modern, you will want to use multicore processors and put it all on background threads.
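To make the coordination problem concrete, here is a minimal Python sketch. It is not The Soulmen's code. It serializes operations per document so that a save and a move on the same sheet can never overlap, while different sheets still proceed in parallel on background threads. The class name `DocumentCoordinator` and its methods are hypothetical, and a real OS X implementation would go through NSFileCoordinator rather than in-process locks.

```python
import threading
from collections import defaultdict


class DocumentCoordinator:
    """Hypothetical helper: run at most one operation per document at a time."""

    def __init__(self):
        # Guards the lock table itself, so two threads cannot create
        # two different locks for the same document.
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def _lock_for(self, doc_id):
        with self._guard:
            return self._locks[doc_id]

    def run(self, doc_id, operation):
        """Run operation() while holding the lock for doc_id."""
        with self._lock_for(doc_id):
            return operation()
```

A save and a move for the same sheet would both be submitted through `run` with the same `doc_id`, so whichever arrives second waits for the first to finish. This only covers a single process; conflicts between two Macs still need separate handling.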

To give an idea of how much work that was, here are some figures: The storage layer, as it ships in Ulysses, took two engineers half a year of work each. It consists of tens of thousands of lines of code and almost the same amount in tests. Clearly nothing to be done casually at home.

JM: Where do Apple's iCloud APIs start to break down?

MS: The most problematic part for us was understanding Apple’s NSFileCoordinator APIs and the many issues we had with them. The thing is: it looks simple, the methods are certainly simple and the documentation is written in a simple way -- but its use is anything but simple. In retrospect, we get the impression that the whole system has been tested for Apple’s standard use only. That is, a user-managed single layer of folders, with occasional opening, saving and renaming of a few monolithic files. Pages, Keynote, traditional document-based apps. However, while the underlying APIs are designed and documented for broader use and more advanced situations, they soon start to fail when used that way.

To give an example: there are notifications for the addition and deletion of a file inside an observed folder (NSFilePresenter). They are there, they are documented, but they are never actually called. It might be our fault, but we haven’t seen a single invocation of one of those notifications. We eventually built our own detection for added and deleted files. There is a secondary way to get updates on the contents of an iCloud directory (via Spotlight), most certainly the one Apple is using for the standard iCloud open window. And so the notification system just fell behind.
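One simple way to build that kind of detection is to compare two listings of the same folder taken at different times. The Python sketch below only illustrates the idea and does not describe how Ulysses does it; the function names `snapshot` and `diff_snapshots` are made up for this example.

```python
import os


def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def diff_snapshots(before, after):
    """Return (added, deleted) as sorted lists of relative paths."""
    added = sorted(after - before)
    deleted = sorted(before - after)
    return added, deleted
```

Calling `snapshot` periodically and passing consecutive results to `diff_snapshots` yields the files that appeared or vanished in between. A moved file shows up as one deletion plus one addition, so telling moves apart from unrelated changes would take extra bookkeeping, such as a stable document identifier.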

JM: What has been your biggest challenge so far?

MS: The biggest problems we have had to date, and the ones we doubt we'll be able to solve completely, are with the implementation outside of the developer framework. In addition to what developers are exposed to, iCloud’s document syncing consists of multiple system processes working in the background.

Since this is all undocumented, I can only give our current understanding: There’s ubd, the process that actually talks to Apple’s cloud servers, gets update notifications, splits and transfers files, etc. librariand is the process that maintains the list of currently available files, the sync states and ongoing transfers, and exposes all of this to Spotlight. And, last but not least, filecoordinationd. This is the file locking mechanism that makes sure only one app is writing to a file at a time. It’s also the system part of the previously discussed file notification system.

Now, as soon as there are deep hierarchies, lots of files and lots of changes, things start to fail. We have seen scenarios with each of the three subsystems broken, sometimes even beyond system reboots and iCloud deactivation/reactivation. Add a few hundred packages, open some of them, run background processes and start moving stuff around -- you can be sure that file coordination will break. In our experience it will just randomly stop reporting file changes. It may also happen that file accesses are never granted. Folder deletions have to be tried, aborted and retried in some cases.

A good part of these issues might be our fault, some certainly are. During development, we found and fixed hundreds of latent misuses in our code. The problem is that there is no documentation on how these APIs work underneath. Everything is kept to simple scenarios and accesses to single files. This makes sense to some extent, because that’s what Apple’s default implementations are doing.

After a couple of months trying to get it right, we eventually decided to implement a fallback mechanism. In case our app stops receiving notifications from file coordination, we will manually parse the file changes from what we get from Spotlight. Those updates are not as exact and as instantaneous, but they're much better than just doing nothing. We figured that both mechanisms failing at the same time would be quite rare.
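The fallback described here can be pictured as a reconciliation step: keep the state built from the primary notifications, and whenever a full listing arrives from the secondary source, treat any difference as a change the primary channel missed. The following Python sketch works under that assumption. The class `LibraryState` and its methods are hypothetical, and the listing passed to `reconcile` stands in for whatever Spotlight reports.

```python
class LibraryState:
    """Hypothetical tracker for the set of documents the app knows about."""

    def __init__(self, paths=()):
        self.known = set(paths)

    def apply_notification(self, added=(), deleted=()):
        """Apply a change delivered by the primary notification channel."""
        self.known |= set(added)
        self.known -= set(deleted)

    def reconcile(self, listing):
        """Compare against a full listing from the secondary source.

        Returns (missed_added, missed_deleted): changes the primary
        channel never reported. The listing then becomes the new truth.
        """
        current = set(listing)
        missed_added = sorted(current - self.known)
        missed_deleted = sorted(self.known - current)
        self.known = current
        return missed_added, missed_deleted
```

While the primary channel works, `reconcile` returns two empty lists and costs little. Once notifications stop arriving, the same call recovers the missed additions and deletions, though only as often and as precisely as the secondary listing is refreshed.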

JM: How are things going now, after all the work you put in?

MS: A few weeks into the release, our implementation does seem to work pretty well. We have heard only a few users report iCloud issues, with most of them being solvable by a system reboot or turning iCloud’s document syncing off and back on. Synchronization completely failing beyond system reboots happened to only a handful of users. We recently learned about some undocumented command line tools that seem to be usable to beat even the hardest hangs.

I want to make clear that iCloud, once it works, is absolutely stunning. It really is magical to see changes come across devices in a matter of seconds. Getting there has been a tough job for us. All we wish is that the overall iCloud system will get more reliable and stable. For now, we don’t need new features. Getting the current ones working smoothly with intense usage like ours would be perfect.

JM: Thanks so much for helping us understand these developer issues. I'm betting that lots of other developers will be descending on Apple at WWDC in a few weeks to discuss the very same things.

___________________

Here's some additional, excellent background on this developer issue: "Frustrated with iCloud, Apple’s developer community speaks up en masse"