
Apple’s AI research suggests features are coming for Siri, artists, and more.



It would be easy to think that Apple is late to the AI game. Ever since ChatGPT took the world by storm in late 2022, most of Apple’s competitors have been scrambling to keep up. While Apple has certainly talked about AI and even released some products with AI in mind, it seemed to be dipping a toe in rather than diving in headfirst.

But in recent months, rumors and reports have suggested that Apple was actually just biding its time, waiting to act. There have been reports in recent weeks that Apple is talking to OpenAI and Google about powering some of its AI capabilities, and the company is also working on its own model, called Ajax.

If you look at Apple’s published AI research, a picture begins to emerge of how its approach to AI could come to life. Now, obviously, making product assumptions based on research papers is a deeply inexact science – the path from research to store shelves is winding and full of potholes. But you can at least get a sense of what the company is thinking about – and how its AI features might work when Apple starts talking about them at its annual developer conference, WWDC, in June.

Smaller, more efficient models

I suspect you and I are hoping for the same thing here: Better Siri. And it looks like Better Siri is coming! There’s an assumption in a lot of Apple’s research (and in much of the tech industry) that large language models will immediately make virtual assistants better and smarter. For Apple, getting to Better Siri means making those models as fast as possible – and making sure they’re everywhere.

In iOS 18, Apple plans to have all of its AI features running on an entirely on-device, offline model, Bloomberg recently reported. It’s hard to build a good general-purpose model even when you have a network of data centers and thousands of high-end GPUs – it’s drastically harder to do it with only the guts inside your smartphone. So Apple is having to get creative.

In a paper called “LLM in a Flash: Efficient Large Language Model Inference with Limited Memory” (all these papers have really boring titles, but they are really interesting, I promise!), researchers devised a system for storing a model’s data, which is usually kept in your device’s RAM, on the SSD instead. “We demonstrated the ability to run LLMs up to twice the size of available DRAM [on the SSD],” the researchers wrote, “achieving an inference speed acceleration of 4 to 5x compared to traditional loading methods on the CPU and 20 to 25x on the GPU.” By taking advantage of the cheaper, more plentiful storage on your device, they found, models can run faster and more efficiently.
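The paper’s actual techniques are more sophisticated, but the core idea – keep the bulk of the weights on flash and pull in only what a given step needs – can be sketched in a few lines. This is a minimal illustration using memory-mapped files, not Apple’s implementation; the file name, layer count, and sizes are made up.

```python
# A minimal sketch of the general idea behind "LLM in a Flash" (not Apple's
# implementation): keep large weight matrices on flash storage and map them
# into memory on demand instead of loading everything into DRAM up front.
# File name, layer count, and sizes below are made up for illustration.
import numpy as np

HIDDEN = 1024
N_LAYERS = 8

def save_layer_weights(path: str) -> None:
    """Write per-layer weight matrices to one file standing in for flash/SSD."""
    # Small scale factor keeps activations inside float16 range in this demo.
    weights = (np.random.randn(N_LAYERS, HIDDEN, HIDDEN) * 0.03).astype(np.float16)
    np.save(path, weights)

def run_layer(memmapped_weights: np.ndarray, layer: int, x: np.ndarray) -> np.ndarray:
    """Touch only the slice this layer needs; the OS pages it in lazily,
    so resident DRAM stays well below the total model size."""
    w = memmapped_weights[layer]  # read from disk on first access
    return x @ w

if __name__ == "__main__":
    save_layer_weights("weights.npy")
    weights = np.load("weights.npy", mmap_mode="r")  # nothing loaded into RAM yet
    x = np.random.randn(1, HIDDEN).astype(np.float16)
    for layer in range(N_LAYERS):
        x = run_layer(weights, layer, x)
    print(x.shape)  # (1, 1024)
```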

Apple researchers also built a system called EELBERT that can essentially compress an LLM into a much smaller size without making it meaningfully worse. Their compressed version of Google’s BERT model was 15 times smaller – just 1.2 megabytes – with only a 4% reduction in quality. It did come with some latency tradeoffs, though.
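The broad trick EELBERT describes is to stop storing a giant lookup table of token embeddings and instead compute embeddings on the fly. The toy sketch below gestures at that general idea using hashed character n-grams, which may not match the paper’s exact method; every dimension, bucket count, and name here is illustrative.

```python
# A toy gesture at EELBERT-style compression: instead of storing a huge
# per-token embedding table, derive each token's embedding on the fly from
# hashed character n-grams. This may not match the paper's exact method;
# every dimension, bucket count, and name here is illustrative.
import hashlib
import numpy as np

DIM = 64           # embedding width (illustrative)
NUM_BUCKETS = 512  # small shared parameter pool replacing the big table

rng = np.random.default_rng(0)
bucket_vectors = rng.standard_normal((NUM_BUCKETS, DIM)).astype(np.float32)

def char_ngrams(token: str, n: int = 3) -> list:
    padded = f"<{token}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def dynamic_embedding(token: str) -> np.ndarray:
    """Compute an embedding from hashed n-grams instead of looking one up."""
    grams = char_ngrams(token)
    vec = np.zeros(DIM, dtype=np.float32)
    for gram in grams:
        bucket = int(hashlib.md5(gram.encode()).hexdigest(), 16) % NUM_BUCKETS
        vec += bucket_vectors[bucket]
    return vec / max(len(grams), 1)

print(dynamic_embedding("Siri")[:5])
```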

In general, Apple is striving to resolve a central tension in the model world: the bigger a model becomes, the better and more useful it can be, but also the heavier, more power-hungry, and slower it can become. Like so many others, the company is trying to find the right balance between all of these things while also looking for a way to have it all.

Siri, but good

A lot of what we talk about when we talk about AI products are virtual assistants – assistants that know things, that can remind us of things, that can answer questions and perform tasks on our behalf. So it’s not exactly shocking that much of Apple’s AI research boils down to a single question: What if Siri was really, really, really good?

A group of Apple researchers has been working on a way to use Siri without needing a wake word at all; instead of listening for “Hey Siri” or “Siri,” the device might simply intuit whether you’re talking to it. “This problem is significantly more challenging than voice trigger detection,” the researchers acknowledged, “since there may not be a primary trigger phrase that marks the start of a voice command.” Maybe that’s why another group of researchers developed a system to detect trigger words more accurately. Another paper trained a model to better understand rare words, which assistants often struggle with.

In both cases, the appeal of an LLM is that it can, in theory, process much more information much more quickly. In the wake-word paper, for example, the researchers found that rather than trying to discard all unnecessary sound, feeding the model everything and letting it decide what does and doesn’t matter made the wake word work far more reliably.

Once Siri hears you, Apple is doing a lot of work to make sure it understands and communicates better. In one paper, researchers developed a system called STEER (which stands for Semantic Turn Extension-Expansion Recognition, so we’ll go with STEER) that aims to improve your back-and-forth communication with an assistant by trying to figure out when you’re asking a follow-up question and when you’re asking a new one. In another, researchers use LLMs to better understand “ambiguous queries” and figure out what you mean no matter how you say it. “In uncertain circumstances,” they wrote, “intelligent conversational agents may need to take the initiative to reduce their uncertainty by proactively asking good questions, thereby solving problems more effectively.” Another paper aims to help with that, too: researchers used LLMs to make assistants less verbose and more understandable when they generate answers.
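To make the follow-up-versus-new-request problem concrete, here is a rough sketch of the classification task STEER targets. It is not Apple’s model; the prompt format is invented, and `call_llm` is a hypothetical stand-in for whatever on-device model would actually do the scoring.

```python
# Not Apple's STEER model; just a sketch of the classification problem it
# targets: given the previous exchange and a new utterance, decide whether
# the user is extending the earlier request or starting a fresh one.
# `call_llm` is a hypothetical stand-in for whatever model does the scoring.
from dataclasses import dataclass

@dataclass
class Turn:
    user: str
    assistant: str

def build_prompt(history: list, utterance: str) -> str:
    lines = ["Decide whether the new utterance is a FOLLOW_UP to the "
             "conversation or a NEW_REQUEST. Answer with one word."]
    for turn in history:
        lines.append(f"User: {turn.user}")
        lines.append(f"Assistant: {turn.assistant}")
    lines.append(f"New utterance: {utterance}")
    return "\n".join(lines)

def call_llm(prompt: str) -> str:
    # Placeholder logic; a real system would score this with an on-device model.
    last_line = prompt.splitlines()[-1].lower()
    return "FOLLOW_UP" if any(w in last_line for w in (" it", " there", " that")) else "NEW_REQUEST"

history = [Turn("What's the weather in Cupertino?", "Sunny and 72 degrees.")]
print(call_llm(build_prompt(history, "What about tomorrow there?")))  # FOLLOW_UP
```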

Soon, you’ll be able to edit your photos just by requesting changes.
Image: Apple

AI in healthcare, in image editors, in your Memojis

Whenever Apple talks publicly about AI, it tends to focus less on raw technological might and more on the day-to-day things AI can actually do for you. So while there’s a lot of focus on Siri – especially as Apple looks to compete with devices like the Humane AI Pin and the Rabbit R1, and with Google’s continued push of Gemini across Android – there are plenty of other ways Apple seems to see AI being useful.

One obvious place for Apple to focus is on healthcare: LLMs could, in theory, help wade through the oceans of biometric data collected by your various devices and help you make sense of it all. So Apple has been researching how to collect and group all of your movement data, how to use gait recognition and your headphones to identify you, and how to track and understand your heart rate data. Apple also created and released “the largest multi-device, multi-location sensor-based human activity dataset” available after collecting data from 50 participants with multiple body sensors.

Apple also seems to envision AI as a creative tool. For one paper, researchers interviewed a group of animators, designers, and engineers and built a tool called Keyframer that “enable[s] users to iteratively construct and refine generated designs.” Instead of typing one prompt and getting an image, then typing another prompt to get another image, you start with a prompt and then get a toolkit to adjust and refine parts of the image to your liking. You could imagine this kind of back-and-forth artistic process showing up everywhere, from the Memoji creator to some of Apple’s more professional art tools.

In another paper, Apple describes a tool called MGIE that lets you edit an image just by describing the edits you want to make. (“Make the sky bluer,” “make my face less weird,” “add some rocks,” that sort of thing.) “Rather than brief but ambiguous guidance, MGIE derives an explicit intention from visual awareness and leads to reasonable image editing,” the researchers wrote. Its initial experiments weren’t perfect, but they were impressive.
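As a toy illustration of instruction-driven editing (and emphatically not MGIE itself), here is the end-to-end shape of the idea: a natural-language instruction is mapped to a concrete pixel operation. In a real system a multimodal model derives that mapping; a hard-coded rule stands in for it here.

```python
# A toy illustration of instruction-driven editing, emphatically not MGIE:
# a natural-language instruction is mapped to a concrete pixel operation.
# In a real system a multimodal model derives that mapping; a hard-coded
# rule stands in for it here. Requires Pillow.
from PIL import Image, ImageEnhance

def apply_instruction(img: Image.Image, instruction: str) -> Image.Image:
    """Map an instruction like 'make the sky bluer' to an actual edit."""
    text = instruction.lower()
    if "bluer" in text:
        r, g, b = img.convert("RGB").split()
        b = b.point(lambda v: min(255, int(v * 1.3)))  # boost the blue channel
        return Image.merge("RGB", (r, g, b))
    if "brighter" in text:
        return ImageEnhance.Brightness(img).enhance(1.3)
    return img  # unknown instruction: leave the image untouched

if __name__ == "__main__":
    photo = Image.new("RGB", (64, 64), (120, 160, 200))  # stand-in "sky" image
    edited = apply_instruction(photo, "Make the sky bluer")
    print(edited.getpixel((0, 0)))  # (120, 160, 255)
```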

We could even get some AI in Apple Music: for a paper called “Voice cancellation for stereo singing with limited features,” researchers explored ways to separate vocals from instruments in songs – which could be useful if Apple wants to give people tools to, say, remix songs the way you can on TikTok or Instagram.
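The paper presumably uses a learned, resource-constrained model; the sketch below shows only the much cruder classic mid/side trick that motivates the problem. Vocals are usually mixed to the center of a stereo track, so the channel average keeps them while the channel difference suppresses them.

```python
# Not the paper's method, just the decades-old mid/side trick that motivates
# the problem: vocals are usually mixed to the center of a stereo track, so
# the channel average keeps them while the channel difference suppresses them.
# A learned separator does far better, but it is chasing the same goal.
import numpy as np

def split_stereo(stereo: np.ndarray) -> tuple:
    """stereo has shape (num_samples, 2); returns (vocal-ish, backing-ish)."""
    left, right = stereo[:, 0], stereo[:, 1]
    mid = (left + right) / 2.0   # center content: mostly vocals
    side = (left - right) / 2.0  # channel differences: mostly instruments
    return mid, side

if __name__ == "__main__":
    t = np.linspace(0, 1, 44100)
    vocal = 0.5 * np.sin(2 * np.pi * 220 * t)           # panned dead center
    guitar = 0.3 * np.sin(2 * np.pi * 330 * t)          # panned hard left
    stereo = np.stack([vocal + guitar, vocal], axis=1)  # left has guitar, right doesn't
    mid, side = split_stereo(stereo)
    print(round(np.abs(side).max(), 2), round(np.abs(mid).max(), 2))
```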

In the future, Siri will be able to understand and use your phone for you.
Image: Apple

Over time, I bet this is the kind of thing you’ll see Apple lean into, especially on iOS. Some of this Apple will build into its own apps; some it will offer to third-party developers as APIs. (The recent Journaling Suggestions feature is probably a good guide to how that might work.) Apple has always touted its hardware capabilities, particularly compared to the average Android device; combining all that horsepower with on-device, privacy-focused AI could be a big differentiator.

But if you want to see the biggest, most ambitious AI project happening at Apple, you need to know about Ferret. Ferret is a large multimodal language model that can receive instructions, focus on something specific you have circled or selected, and understand the world around you. It’s designed for the now-normal AI use case of asking a device about the world around it, but it might also be able to understand what’s on your screen. In the Ferret paper, researchers show that it can help you navigate apps, answer questions about App Store ratings, describe what you’re seeing, and more. This has really interesting implications for accessibility, but it could also completely change the way you use your phone – and someday your Vision Pro and/or smart glasses.

We’re getting way ahead of ourselves here, but you can imagine how this would work with some of the other things Apple is working on. A Siri that can understand what you want, paired with a device that can see and understand everything that’s happening on its display, is a phone that can literally use itself. Apple wouldn’t need deep integrations with everything; it could simply run the apps and tap the right buttons automatically.
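Expressed as code, the loop being imagined here might look something like the sketch below. Every function in it (the screenshot parser, the Ferret-style matcher, the tap) is a hypothetical stand-in; none of this is a real Apple API.

```python
# Pure speculation expressed as code: a sketch of the "phone that uses itself"
# loop imagined above. Every function here (the screenshot parser, the
# Ferret-style matcher, the tap) is a hypothetical stand-in, not a real API.
from dataclasses import dataclass

@dataclass
class UIElement:
    label: str
    x: int
    y: int

def read_screen() -> list:
    # Stand-in for capturing the current screen and extracting its elements.
    return [UIElement("Open Maps", 100, 400), UIElement("Cancel", 100, 600)]

def choose_element(goal: str, elements: list) -> UIElement:
    # Stand-in for a multimodal model matching the goal to an on-screen element.
    goal_words = set(goal.lower().split())
    return max(elements, key=lambda e: len(goal_words & set(e.label.lower().split())))

def tap(element: UIElement) -> None:
    print(f"tap({element.x}, {element.y})  # {element.label}")

def run(goal: str, steps: int = 1) -> None:
    for _ in range(steps):
        tap(choose_element(goal, read_screen()))

run("open maps and start navigation")  # -> tap(100, 400)  # Open Maps
```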

Again, this is all just research, and for all of it to work well as soon as this spring would be a legitimately unprecedented technical achievement. (I mean, you’ve tried chatbots – you know they’re not great.) But I’ll bet you anything we’ll get big AI announcements at WWDC. Apple CEO Tim Cook even teased as much in February, and basically promised it on this week’s earnings call. And two things are very clear: Apple is very much in the AI race, and it could mean a total overhaul of the iPhone. Heck, you might even start using Siri willingly! And that would be quite an accomplishment.


