Apple researchers released Pico-Banana-400K, a curated set of roughly 400,000 images for training AI to edit photos from text prompts. The goal is to improve editing of real photos, not just synthetic demos.
Apple frames the dataset as a fix for weak training data. Models such as GPT-4o can make flashy edits, but accuracy drops when tasks need precise control. Apple says training on real-world images with cleaner labels should narrow that gap.
Inside the dataset
Pico-Banana-400K spans 35 edit types across eight categories. It covers basics like color and exposure changes and harder jobs like turning a portrait into a Pixar-style or LEGO-style character. Every item passed through Apple’s own quality checks.
Apple split the release into three parts. There are 258,000 single-edit examples for baseline training. Another 56,000 preference pairs compare a good edit to a failed one. A final 72,000 multi-turn sequences show how an image changes across several edits.
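As a rough illustration of how the three parts differ, the records could be modeled as below. This is a minimal sketch: the class and field names are hypothetical, not the dataset's actual schema, and the sizes are the rounded figures quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class SingleEdit:
    """One source image, one instruction, one accepted edit."""
    source_image: str
    instruction: str
    edited_image: str


@dataclass
class PreferencePair:
    """Same source and instruction, with a good edit and a failed one."""
    source_image: str
    instruction: str
    chosen_image: str    # the edit that passed
    rejected_image: str  # the edit that failed


@dataclass
class MultiTurnSequence:
    """A source image followed by several edits applied in order."""
    source_image: str
    turns: list[SingleEdit] = field(default_factory=list)


# Split sizes as reported in the article (rounded figures).
SPLIT_SIZES = {
    "single_edit": 258_000,
    "preference_pairs": 56_000,
    "multi_turn": 72_000,
}

total = sum(SPLIT_SIZES.values())
print(total)  # 386000
```

The three reported figures add up to 386,000, so the 400K in the name reads as a rounded total.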
How Apple built and scored it
Apple used Google’s Gemini-2.5-Flash-Image, known in the paper as Nano-Banana, to generate and apply edits at scale. The team then scored results with Google’s Gemini-2.5-Pro for instruction compliance and technical quality. This keeps the loop consistent from edit to evaluation.
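The generate-then-score loop described above might look something like the following sketch. Everything here is an assumption for illustration: the `edit` and `judge` callables are stand-ins rather than real Gemini API calls, and the score threshold and retry count are invented values, not figures from the paper.

```python
from typing import Callable, Optional

# Hypothetical signatures, not real Gemini API calls.
EditFn = Callable[[str, str], str]          # (image, instruction) -> edited image
JudgeFn = Callable[[str, str, str], float]  # (image, instruction, edited) -> score in [0, 1]


def build_examples(
    image: str,
    instruction: str,
    edit: EditFn,
    judge: JudgeFn,
    threshold: float = 0.7,  # assumed cutoff
    max_attempts: int = 3,   # assumed retry budget
) -> tuple[Optional[str], list[str]]:
    """Try an edit up to max_attempts times and sort results by judge score.

    Returns (accepted, rejected): the first edit scoring at or above the
    threshold (or None if none did), plus every attempt that fell below it.
    """
    rejected: list[str] = []
    for _ in range(max_attempts):
        edited = edit(image, instruction)
        if judge(image, instruction, edited) >= threshold:
            return edited, rejected
        rejected.append(edited)
    return None, rejected


def make_stub_editor() -> EditFn:
    """Stub editor that labels each attempt so the demo is deterministic."""
    calls = {"n": 0}

    def edit(image: str, instruction: str) -> str:
        calls["n"] += 1
        return f"{image}#attempt{calls['n']}"

    return edit


def stub_judge(image: str, instruction: str, edited: str) -> float:
    """Stub judge that only approves the second attempt."""
    return 0.9 if edited.endswith("attempt2") else 0.4


accepted, rejected = build_examples(
    "cat.jpg", "make the sky pink", make_stub_editor(), stub_judge
)
print(accepted)  # cat.jpg#attempt2
print(rejected)  # ['cat.jpg#attempt1']
```

In a setup like this, an accepted edit would feed the single-edit split, and an accepted edit plus a rejected one for the same instruction would form a preference pair.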
The process exposes model strengths and weaknesses. Style and global adjustments worked well. Local, surgical edits exposed gaps that current systems have yet to close.
Where it still fails
Apple reports clear limits. Global style changes succeed about 93 percent of the time. Precise tasks such as moving objects or editing embedded text land below 60 percent. That gap shows why real, labeled examples matter when you need pixel-level control.
The data also highlights the friction in multi-step edits. Compounding small errors across turns degrades quality. The multi-turn split exists to help models learn stability from step to step.
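A back-of-the-envelope calculation shows how quickly errors compound. This sketch assumes each edit in a sequence succeeds independently at a fixed rate, and it borrows the single-edit figures above (93 percent, and 60 percent as an upper bound for precise tasks) as per-step rates. Both are simplifying assumptions, not results from the paper.

```python
def chain_success(per_step_rate: float, steps: int) -> float:
    """Chance that every edit in a sequence succeeds, assuming independent steps."""
    return per_step_rate ** steps


for rate in (0.93, 0.60):
    for steps in (1, 3, 5):
        print(f"rate={rate:.2f} steps={steps} -> {chain_success(rate, steps):.3f}")
```

Under these assumptions, a five-step chain of edits that each succeed 93 percent of the time comes out fully correct only about 70 percent of the time, and a five-step chain at 60 percent does so less than 8 percent of the time.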
Text-guided editing is moving from novelty to workflow. Newsrooms, studios, and app developers want edits they can trust. A large, well-labeled set like Pico-Banana-400K gives researchers a common yardstick and a stronger base model. It also invites fairer benchmarking across teams that use the same data.
Apple positions the release as a foundation to train and test the next wave of editing systems. The company ties its goals to practical outcomes, not just leaderboard scores.
Access and use
The full dataset is available for non-commercial research use on GitHub. Apple encourages developers to fine-tune existing models or build new ones on top of these splits. If you work on evaluation, the preference pairs and multi-turn sets give you ready-made test material for realism, compliance, and stability.
Bottom line. Apple shipped a big, clean dataset that targets real editing problems. It sets a higher bar for training data and a clearer path to models that follow instructions precisely.