Better Siri is Coming: What Apple’s Research Says About Its AI Plans

It would be easy to assume that Apple is late to the game when it comes to AI. Since ChatGPT took the world by storm in late 2022, most of Apple’s competitors have scrambled to catch up. And while Apple has certainly talked about AI and even shipped some products with AI in mind, it seemed to be dipping a toe in rather than diving in headfirst.

But in recent months, rumors and reports have suggested that Apple has actually just been biding its time, waiting to make its move. In recent weeks there have been reports that Apple is talking to both OpenAI and Google about using their models to power some of its AI features, and the company has also been working on its own model, called Ajax.

Looking at Apple’s published AI research, a picture begins to emerge of how the company’s approach to AI could come to life. Obviously, making product assumptions based on research is a deeply inexact science – the road from research lab to store shelf is windy and full of potholes. But you can at least get a sense of what the company is thinking about – and how its AI features might work when Apple starts talking about them at its annual developer conference, WWDC, in June.

Smaller, more efficient models

I suspect you and I are hoping for the same thing here: Better Siri. And it looks like “Better Siri” is coming! Much of Apple’s research (like much of the tech industry’s) assumes that large language models will immediately make virtual assistants better and smarter. For Apple, “Better Siri” means making these models as fast as possible – and making sure they’re available everywhere.

Apple plans to run all of its AI features in iOS 18 on an on-device, completely offline model, Bloomberg recently reported. It’s hard to build a good general-purpose model even when you have a network of data centers and thousands of cutting-edge GPUs – it’s dramatically harder to do it with only the guts of a smartphone. So Apple has to get creative.

In a paper called “LLM in a flash: Efficient Large Language Model Inference with Limited Memory” (all of these papers have really boring titles but are really interesting, I promise!), researchers developed a system for storing a model’s data, which is usually kept in your device’s RAM, on the SSD instead. “We have demonstrated the ability to run LLMs up to twice the size of available DRAM [on the SSD],” the researchers wrote, “achieving an acceleration in inference speed of 4-5x compared to traditional loading methods in CPU, and 20-25x in GPU.” They found that AI models can run faster and more efficiently by taking advantage of the cheapest and most plentiful storage available on your device.
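
To make that a little more concrete, here’s a minimal Python sketch of the general trick the paper builds on: memory-map the weights sitting on flash storage and pull in only the slices a given step actually needs, instead of holding the whole model in RAM. The file name, dimensions, and row-selection scheme below are hypothetical stand-ins, not Apple’s implementation; the real system layers on things like sparsity prediction and smarter batching of reads.

```python
# A minimal sketch of the general idea: keep the big weight matrix on flash,
# memory-map it, and read only the rows a given inference step needs, rather
# than holding everything in RAM. Illustration only, not Apple's system.
import numpy as np

ROWS, COLS = 8192, 4096  # hypothetical layer dimensions

# Pretend this file is one layer of the model shipped on the device's flash storage.
weights_on_disk = np.lib.format.open_memmap(
    "layer_weights.npy", mode="w+", dtype=np.float16, shape=(ROWS, COLS)
)
weights_on_disk[:] = np.float16(0.01)  # stand-in for real trained weights
weights_on_disk.flush()

# At inference time, memory-map the file read-only: nothing is loaded yet.
weights = np.load("layer_weights.npy", mmap_mode="r")

def sparse_matvec(x: np.ndarray, active_rows: np.ndarray) -> np.ndarray:
    """Compute only the output neurons predicted to be active.

    Indexing the memmap copies just those rows from flash into RAM, which is
    the core trick; the paper adds sparsity prediction and bundled reads on top.
    """
    needed = weights[active_rows]           # pulls ~len(active_rows) rows off disk
    return needed.astype(np.float32) @ x    # small dense matvec in RAM

x = np.random.rand(COLS).astype(np.float32)
active = np.random.choice(ROWS, size=256, replace=False)  # predicted-active neurons
y = sparse_matvec(x, active)
print(y.shape)  # (256,) -- we touched 256 rows instead of all 8192
```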

Apple researchers have also developed a system called EELBERT that can essentially compress an LLM to a much smaller size without significantly degrading its performance. Their compressed version of Google’s BERT model was 15 times smaller – just 1.2 megabytes – and suffered only a 4 percent reduction in quality, though it did come with some latency tradeoffs.

In general, Apple is pushing to solve a core tension in the model world: the bigger a model gets, the better and more useful it can be, but also the more unwieldy, power-hungry, and slow it becomes. Like so many others, the company is trying to find the right balance between all of those things while still looking for a way to have it all.

Siri, but good

When we talk about AI products, we often talk about virtual assistants – assistants that know things, that can remind us of things, that can answer questions and do things on our behalf. So it’s not exactly shocking that much of Apple’s AI research boils down to a single question: What if Siri was really, really, really good?

A group of Apple researchers has been working on a way to use Siri without needing a wake word at all; instead of listening for “Hey Siri” or “Siri,” the device might just be able to intuit whether you’re talking to it. “This problem is significantly more challenging than speech trigger detection,” the researchers acknowledged, “since there may not be a leading trigger phrase that marks the beginning of a speech command.” That could be why another group of researchers is working on a system to detect wake words more accurately. Yet another paper trained a model to better understand rare words, which assistants often don’t handle well.

In both cases, the appeal of an LLM is that it can, in theory, process much more information much more quickly. In the wake-word paper, for instance, researchers found that the wake word worked far more reliably when they stopped trying to discard all the unnecessary noise and instead fed everything to the model, letting it sort out what was important and what wasn’t.
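
As a toy illustration of that design choice – and to be clear, this is a hypothetical sketch, not Apple’s model – you can contrast a pipeline that gates the audio with a hand-tuned energy threshold against one that hands the full, unfiltered feature stream to the classifier and lets it learn what to ignore:

```python
# Hypothetical wake-word sketch: a "filter first" pipeline vs. a "feed everything"
# pipeline. Model architecture and feature shapes are illustrative stand-ins.
import torch
import torch.nn as nn

class WakeWordClassifier(nn.Module):
    def __init__(self, n_mels: int = 40, hidden: int = 64):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # probability that the wake word occurred

    def forward(self, mel_frames: torch.Tensor) -> torch.Tensor:
        # mel_frames: (batch, time, n_mels) -- possibly the unfiltered stream,
        # silence, background chatter and all.
        _, last_hidden = self.encoder(mel_frames)
        return torch.sigmoid(self.head(last_hidden[-1]))

def gated_detect(model, mel_frames, energy, threshold=0.1):
    # "Filter first": drop low-energy frames before the classifier ever sees them.
    # Brittle, because the hand-tuned gate decides what counts as noise.
    kept = mel_frames[:, energy > threshold, :]
    return model(kept)

def end_to_end_detect(model, mel_frames):
    # "Feed everything": no gate; the model decides what matters.
    return model(mel_frames)

model = WakeWordClassifier()
frames = torch.randn(1, 200, 40)   # 200 frames of log-mel features
energy = torch.rand(200)           # per-frame energy estimate
print(gated_detect(model, frames, energy).item(), end_to_end_detect(model, frames).item())
```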

Once Siri does hear you, Apple is doing a lot of work to make sure it understands you better and communicates better in return. One paper developed a system called STEER (which stands for Semantic Turn Extension-Expansion Recognition, so we’ll go with STEER) that aims to improve your back-and-forth with an assistant by trying to figure out when you’re asking a follow-up question and when you’re asking a new one. Another uses LLMs to better understand “ambiguous queries” and figure out what you mean no matter how you say it. “In uncertain circumstances,” the researchers wrote, “intelligent conversational agents may need to take the initiative to reduce their uncertainty by proactively asking good questions and thus solving problems more effectively.” And yet another paper aims to help with that, too: researchers used LLMs to make assistants less verbose and easier to understand when they generate answers.

Soon you may be able to edit your images simply by asking for the changes.
Image: Apple

AI in healthcare, image editing, and your Memojis

When Apple talks publicly about AI, it tends to focus less on raw technological might and more on the everyday things AI can actually do for you. So while the focus is heavily on Siri – especially as Apple looks to compete with devices like the Humane AI Pin and the Rabbit R1, and with Google’s continued push of Gemini across Android – Apple seems to see plenty of other places where AI can be useful.

One obvious focus for Apple is health: LLMs could, in theory, help wade through the oceans of biometric data collected by your various devices and help you make sense of it all. So Apple has been researching how to collect and collate all of your motion data, how to use gait detection and your headphones to identify you, and how to track and understand heart rate data. Apple also created and released “the largest available multi-device, multi-location human activity dataset” after collecting data from 50 participants wearing multiple body sensors.

Apple also seems to imagine AI as a creative tool. For one paper, researchers interviewed a group of animators, designers, and engineers and built a system called Keyframer that “enable[s] users to iteratively construct and refine generated designs.” Instead of typing a prompt, getting an image, and then typing another prompt to get a different image, you start with a prompt and then get a toolkit that lets you tweak and refine parts of the image to your liking. You could imagine this kind of back-and-forth artistic process showing up anywhere from the Memoji creator to some of Apple’s more professional artistic tools.

In another paper, Apple describes a tool called MGIE that lets you edit an image just by describing the changes you want to make. (“Make the sky bluer,” “make my face less weird,” “add some rocks,” that sort of thing.) “Instead of brief but ambiguous guidance, MGIE derives an explicit visual intent and leads to reasonable image editing,” the researchers wrote. Its first experiments weren’t perfect, but they were impressive.
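
Conceptually, the flow the paper describes has two stages: a multimodal model first expands your terse request into an explicit, visually grounded description of the edit, and that richer description then drives the actual image editor. Here’s a rough, hypothetical sketch of that shape in Python – the function names and behavior are stand-ins for illustration, not MGIE’s real API:

```python
# A rough sketch of a two-stage instruction-based editing flow: expand a terse
# instruction into an explicit edit description, then hand that description to
# the editor. Both functions are hypothetical stand-ins, not the real MGIE API.
from dataclasses import dataclass

@dataclass
class EditRequest:
    image_path: str
    instruction: str  # terse user request, e.g. "make the sky bluer"

def derive_explicit_intent(request: EditRequest) -> str:
    """Stand-in for the multimodal LLM step: turn a vague instruction into an
    explicit description of what should change in this particular image."""
    # A real system would actually look at the image; we just illustrate the expansion.
    return (f"Increase saturation and shift hue toward blue in the sky region "
            f"of {request.image_path}; leave foreground colors untouched.")

def apply_edit(image_path: str, explicit_intent: str) -> str:
    """Stand-in for the image editor conditioned on the expanded description."""
    out_path = image_path.replace(".jpg", "_edited.jpg")
    print(f"Editing {image_path} -> {out_path}\n  guided by: {explicit_intent}")
    return out_path

request = EditRequest("beach.jpg", "make the sky bluer")
apply_edit(request.image_path, derive_explicit_intent(request))
```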

We might even get some AI in Apple Music: in a paper called “Resource-constrained Stereo Singing Voice Cancellation,” researchers explored ways to separate voices from instruments in songs – which could come in handy if Apple wants to give people tools to remix songs, say, the way you can on TikTok or Instagram.

In the future, Siri may be able to understand your phone and use it for you.
Image: Apple

My bet is that over time Apple will lean into things like this, especially on iOS. Some of it Apple will build into its own apps; some it will offer to third-party developers as APIs. (The current Journaling Suggestions feature is probably a good guide to how that might work.) Apple has always touted its hardware capabilities, particularly compared to the average Android device; pairing all that horsepower with on-device, privacy-focused AI could be a big differentiator.

But if you want to see the biggest, most ambitious AI effort at Apple, you need to know something about Ferret. Ferret is a multimodal large language model that can take instructions, focus on something specific you’ve circled or otherwise selected, and understand the world around it. It’s designed for the now-normal AI use case of asking a device about the world around you, but it can also potentially understand what’s on your screen. In the Ferret paper, researchers show that it can help you navigate apps, answer questions about App Store reviews, describe what you’re looking at, and more. This has really exciting implications for accessibility, but could also one day completely change the way you use your phone – and your Vision Pro and/or smart glasses.

We’re getting way ahead of ourselves here, but you can imagine how this would work with some of the other things Apple is working on. A Siri that can understand what you want, paired with a device that can see and understand everything happening on your display, is a phone that can literally use itself. Apple wouldn’t need deep integrations with everything; it could simply run the apps and tap the right buttons automatically.

Again, this is all just research, and for all of it to work well starting this spring would be a legitimately unprecedented technical achievement. (I mean, you’ve tried chatbots – you know they’re not great.) But I’d bet we get some big AI announcements at WWDC. Apple CEO Tim Cook teased as much in February and essentially promised it on this week’s earnings call. And two things are very clear: Apple is very much in the AI race, and the result could be a complete overhaul of the iPhone. Heck, you might even start using Siri voluntarily! And that would be a remarkable achievement.
