Apple Develops Breakthrough Method for Running LLMs on iPhones

MacRumors · Dec 21, 2023

Apple GPT in your pocket? It could be a reality sooner than you think. Apple AI researchers say they have made a key breakthrough in deploying large language models (LLMs) on iPhones and other Apple devices with limited memory by inventing an innovative flash memory utilization technique.

LLMs and Memory Constraints

LLM-based chatbots like ChatGPT and Claude are incredibly data and memory-intensive, typically requiring vast amounts of memory to function, which is a challenge for devices like iPhones that have limited memory capacity. To tackle this issue, Apple researchers have developed a novel technique that uses flash memory – the same memory where your apps and photos live – to store the AI model's data.

Storing AI on Flash Memory

In a new research paper titled "LLM in a flash: Efficient Large Language Model Inference with Limited Memory," the authors note that flash storage is more abundant in mobile devices than the RAM traditionally used for running LLMs. Their method cleverly bypasses the limitation using two key techniques that minimize data transfer and maximize flash memory throughput:

Windowing: Think of this as a recycling method. Instead of loading new data every time, the AI model reuses some of the data it already processed. This reduces the need for constant memory fetching, making the process faster and smoother.
Row-Column Bundling: This technique is like reading a book in larger chunks instead of one word at a time. By grouping data more efficiently, it can be read faster from the flash memory, speeding up the AI's ability to understand and generate language.

The combination of these methods allows AI models to run up to twice the size of the iPhone's available memory, according to the paper. This translates to a 4-5 times increase in speed on standard processors (CPUs) and an impressive 20-25 times faster on graphics processors (GPUs). "This breakthrough is particularly crucial for deploying advanced LLMs in resource-limited environments, thereby expanding their applicability and accessibility," write the authors.

Faster AI on iPhone

The breakthrough in AI efficiency opens new possibilities for future iPhones, such as more advanced Siri capabilities, real-time language translation, and sophisticated AI-driven features in photography and augmented reality. The technology also sets the stage for iPhones to run complex AI assistants and chatbots on-device, something Apple is already said to be working on.

Apple's work on generative AI could eventually be incorporated into its ‌Siri‌ voice assistant. Apple in February 2023 held an AI summit and briefed employees on its large language model work. According to Bloomberg, Apple is aiming for a smarter version of Siri that's deeply integrated with AI. Apple is planning to update the way that ‌Siri‌ interacts with the Messages app, allowing users to field complex questions and auto-complete sentences more effectively. Beyond that, Apple is rumored to be planning to add AI to as many Apple apps as possible.

Apple GPT

Apple is reportedly developing its own generative AI model called "Ajax". Designed to rival the likes of OpenAI's GPT-3 and GPT-4, Ajax operates on 200 billion parameters, suggesting a high level of complexity and capability in language understanding and generation. Internally known as "Apple GPT," Ajax aims to unify machine learning development across Apple, suggesting a broader strategy to integrate AI more deeply into Apple's ecosystem.

As of the latest reports, Ajax is considered more capable than the earlier generation ChatGPT 3.5. However, it's also suggested that OpenAI's newer models may have advanced beyond Ajax's capabilities as of September 2023.

Both The Information and analyst Jeff Pu claim that Apple will have some kind of generative AI feature available on the ‌iPhone‌ and iPad around late 2024, which is when iOS 18 will be coming out. Pu said in October that Apple is building a few hundred AI servers in 2023, with more to come in 2024. Apple will reportedly offer a combination of cloud-based AI and AI with on-device processing.

Article Link: Apple Develops Breakthrough Method for Running LLMs on iPhones

contacos · Dec 21, 2023

I'll be impressed when Siri is finally multi-lingual. I cannot believe Siri was released with iPhone 4S and it still cannot do it. It could be as simple as being "aware" what language you usually type in with contact A and automatically use Siri in language X to communicate with this person while using language Y with contact Y. All they did was "this text is in another language, do you still want me to read [the gibberish nonsense]" with iOS 16

Infodataset · Dec 21, 2023

Apple is so clever to avoid adding more ram 😉

Ralfi · Dec 21, 2023

Just leapfrog/improve Siri with LLM & make the iPhone voice assistant competitive....fcol

subjonas · Dec 21, 2023

Sounds like some significant headway. I’d definitely like as much AI done on device as possible.

Apple will probably always be behind with their LLM as long as they prioritize privacy, which I’m very ok with. But just like with big phones, they will bend if there is enough market pressure, which I suspect may eventually be the case.

grantishere · Dec 21, 2023

“Siri, when’s Apple GPT going to be released?”

“Here are the neatest pizza restaurants. Would you like directions?”

fizzyfizz · Dec 21, 2023

Sounds great! But please call it something other than Siri as that name fills me with dread every time I consider giving it another try.

dampfnudel · Dec 21, 2023

Based on the link, Apple may have to increase the RAM again, possibly from 8 to 10GB.

higgsboson · Dec 21, 2023

NVMe SSD read speed is about 3GB/s. It is at least an order of magnitude less than conventional RAM and 2-3 orders of magnitude less than embedded RAM for apple silicon or GPU RAM. Don't see how they can make it work at a level of modern LLM chats.

iBluetooth · Dec 21, 2023

Infodataset said:
Apple is so clever to avoid adding more ram 😉

The AI models (LLM) need upwards of 64 GB. That drains battery

and you need to have a strong reason for using that much memory.

orderoftheditch · Dec 21, 2023

I hope with this LLM they retire the Siri name and give us a new device assistant. Siri is garbage and the worst part of the iOS experience after being a previous WP and Android user. Cortana was a commercial failure but at least it worked.

iBluetooth · Dec 21, 2023

higgsboson said:
NVMe SSD read speed is about 3GB/s. It is at least an order of magnitude less than conventional RAM and 2-3 orders of magnitude less than embedded RAM for apple silicon or GPU RAM. Don't see how they can make it work at a level of modern LLM chats.

They are competing with sending data, remotely to a server, process it and receiving the response back to the phone, which would take at least 100 ms and usually more time. But being able to process it on the phone, could compete with the servers connection time.

_apple_apple_ · Dec 21, 2023

I love hearing stories of developers squeezing extra performance out of computers with neat tricks, like the original Crash Bandicoot game, and others.

jake.au · Dec 21, 2023

$20 says lawsuits somehow related to this are already brewing 😅

wdfly · Dec 21, 2023

LOL innovative invention of swapping memory to storage…… maybe they can call it something cool like “cache”.

Razorpit · Dec 21, 2023

Does this mean Siri will finally be able to “call my wife on iPhone” again after losing that ability around 2-3 years ago?

Razorpit · Dec 21, 2023

orderoftheditch said:
I hope with this LLM they retire the Siri name and give us a new device assistant. Siri is garbage and the worst part of the iOS experience after being a previous WP and Android user. Cortana was a commercial failure but at least it worked.

Even worse, Siri regressed when it was purchased, and never recovered. Siri was once king/queen(?) of the assistants.

t0rqx · Dec 21, 2023

We heard that before...Siri. Another inventive to force an storage upgrade.

ThatsME.mr · Dec 21, 2023

wdfly said:
LOL innovative invention of swapping memory to storage…… maybe they can call it something cool like “cache”.

No, it's not cache. If it would be so simple, no development would be required and no studies would have to be done.

Caching uses generic mechanisms to decide wich information can be stored where: in cache or in RAM.

Here they analyzed how a very complex algorithm can be implemented on systems with very different storage properties: One small and fast and one very large but slow. And this on an algorithm wich is thought to be always running in fast RAM completely AND has access to some mighty servers. This is some clever software optimization.

Secondly they optimized the way they access the SSD-storage. This seems to be some hardware usage optimization very deep in the system.

Abazigal · Dec 21, 2023

Razorpit said:
Does this mean Siri will finally be able to “call my wife on iPhone” again after losing that ability around 2-3 years ago?

At least it didn't respond with "which one?" 😉

coffeemilktea · Dec 21, 2023

Doesn't flash storage eventually wear out from repeated use? Your iPhone's working life is going to get that much shorter if it's used this way.

T'hain Esh Kelch · Dec 21, 2023

coffeemilktea said:
Doesn't flash storage eventually wear out from repeated use? Your iPhone's working life is going to get that much shorter if it's used this way.

It will still outlive your phone.

gnomeisland · Dec 21, 2023

higgsboson said:
NVMe SSD read speed is about 3GB/s. It is at least an order of magnitude less than conventional RAM and 2-3 orders of magnitude less than embedded RAM for apple silicon or GPU RAM. Don't see how they can make it work at a level of modern LLM chats.

If anyone can pull this off it will be Apple by again controlling the whole stack. This could be a real competitive edge—if the new Ai tools are themselves comparable.

klasma · Dec 21, 2023

The combination of these methods allows AI models to run up to twice the size of the iPhone's available memory, according to the paper.

Maybe they should just double the RAM? Adding another 8 GB should be like $20.

I know, why not both.

stefanovago · Dec 21, 2023

GPT-4 has 1.73 *trillion* parameters. That would be ~x10 times Apple’s LLM.

Apple Develops Breakthrough Method for Running LLMs on iPhones

macrumors bot

macrumors 601

macrumors member

macrumors 601

macrumors 603

macrumors newbie

macrumors member

macrumors 601

macrumors newbie

macrumors 6502a

macrumors member

macrumors 6502a

macrumors member

macrumors member

macrumors regular

macrumors 65816

macrumors 65816

macrumors 68000

macrumors newbie

Contributor

macrumors 6502a

macrumors 603

macrumors 65816

macrumors 603

macrumors member

Our Staff