There’s a new image editor from Apple, if you know where to look. The kings of the iPhone teamed up with researchers at the University of California, Santa Barbara to create a tool that lets you edit photos and images with text-based instructions. It doesn’t have an official release, but the researchers are hosting a demo you can try yourself, first spotted by ExtremeTech.
The project is called Multimodal Large Language Model Guided Image Editing (MGIE). There are many AI image editors on the market today. Photoshop now comes with built-in AI tools, and others, like OpenAI’s DALL-E, let you edit images in addition to generating them from scratch. However, if you’ve ever tried to use them, you’ll know that they can be a little frustrating. In many cases, it is difficult for AI to understand exactly what you are looking for.
MGIE’s innovation is an extra layer of AI interpretation. When you tell the AI what you want to see, MGIE first uses a text-based AI to make your instructions more explicit and descriptive, then edits the image based on that expanded description. “The experimental results demonstrate that expressive instructions are crucial for instruction-based image editing,” the researchers said in a paper published on arXiv. “Our MGIE can lead to notable improvement.”
Apple published an open source version of the software on GitHub. If you’re savvy, you can run a version of MGIE on your own, but the researchers also set the tool up as a demo on Hugging Face. It runs a little slow when a lot of people are using it, but it’s a fun experiment.
Giant technology companies like Apple spend billions of dollars on projects that no one gets to see, so it’s entirely possible that MGIE will never have an official release. Apple did not immediately respond to a request for comment.
We tested it ourselves here at the Gizmodo office. I uploaded a photo of my colleague and closest advisor, Kyle Barr, wearing a strange pair of sunglasses he got from Netflix at this year’s Consumer Electronics Show. I told the AI that “the man is standing in the desert.” Before generating the image, the MGIE tool extrapolated:
“The man is wearing a metal helmet and is in a desert environment. The environment around him is barren and barren, with sand dunes stretching as far as the eye can see.”
After playing with the tool for much longer than we should have, it’s clear that it is subject to many of the same limitations as any other AI image generator. Often, the results are strange and nothing like what you asked for. But in some cases, it did an impressive job, and in the tool’s defense, AI works best with familiar subjects. “Familiar” isn’t something you’d call Kyle’s sunglasses.