Apple Develops Breakthrough Method for Running LLMs on iPhones

dabirdwell · Dec 21, 2023

Posted this on Ars earlier:

Further, this means it will run on Home hubs like iPads, Macs, and maybe AppleTVs. I have a dozen HomePods and many dozens of connected devices around my house that are running automations and responding to requests all day every day. A GPT-4 level LLM, or even near, running locally and able to access my local files, would be a ridiculous benefit for my daily life.

Populus · Dec 21, 2023

I wish this solution would work 100% local, on device, but reading that Apple is building data centers for this purpose probably means that it’s not going to be 100% a local LLM.

Macalicious2011 · Dec 21, 2023

dabirdwell said:
Posted this on Ars earlier:

Further, this means it will run on Home hubs like iPads, Macs, and maybe AppleTVs. I have a dozen HomePods and many dozens of connected devices around my house that are running automations and responding to requests all day every day. A GPT-4 level LLM, or even near, running locally and able to access my local files, would be a ridiculous benefit for my daily life.

In 12-24months the biggest selling point of buying a new device will be chips for running LLM and ai tasks on-device. This would ween us off our dependence of Nvidia but also compute costs.

It would be wonderful to run generative ai software locally without rate limits or worrying about exhausting credits.

ksj1 · Dec 21, 2023

You can already run LLMs on your iPhone using llama.cpp. There are models that will fit in memory and work pretty well. Microsoft Phi for example.

krakenrelease · Dec 21, 2023

higgsboson said:
NVMe SSD read speed is about 3GB/s. It is at least an order of magnitude less than conventional RAM and 2-3 orders of magnitude less than embedded RAM for apple silicon or GPU RAM. Don't see how they can make it work at a level of modern LLM chats.

The storage size that inference is based off of is more important than the processing speed. You cant have a Large Language Model without a large dataset.

CausticSoda · Dec 21, 2023

contacos said:
I'll be impressed when Siri is finally multi-lingual. I cannot believe Siri was released with iPhone 4S and it still cannot do it. It could be as simple as being "aware" what language you usually type in with contact A and automatically use Siri in language X to communicate with this person while using language Y with contact Y. All they did was "this text is in another language, do you still want me to read [the gibberish nonsense]" with iOS 16

I'll be impressed when it finally has basic, reliable mono-lingual functionality. At the moment I keep it turned off on all of my devices. It is shockingly bad.

wilhoitm · Dec 21, 2023

So now my 256GB iPhone has 32 times more memory than my 8GB Macbook Pro!

Kingcoherent · Dec 21, 2023

Hugging face CTO posted yesterday that 2024 would be the year of local inference, driven in part by Apple silicon. It was my prediction too, but mine carry less weight.

name99 · Dec 21, 2023

Obviously fake news, because I have been reliably informed by "the Internet" that Apple is behind in AI and always will be...

It's interesting to compare this with earlier AI work. I'm thinking, for example, of text recognition. This was widely available, one way or another, for a few years but not present as part of Apple's OS. But when it WAS added to Apple OS's, so that you can just cut and paste text from images, it's so slick and works so well, it feels like it was always there, you don't even notice it's working until you sit back and think "WTF?"
Similarly for being able to cut and paste the "subject" of a photo stripped of the background.

I fully expect LLMs (and art generation) to play out in much the same way – two years or so of bemoaning that Apple is "behind" while other people wrestle with godawful UI, strange failure modes and limitations, disappointing performance. Then Apple ships something that just works, and the people who were complaining three months ago refuse to acknowledge it and move on to complaining about the next thing.

name99 · Dec 21, 2023

ThatsME.mr said:
No, it's not cache. If it would be so simple, no development would be required and no studies would have to be done.

Caching uses generic mechanisms to decide wich information can be stored where: in cache or in RAM.

Here they analyzed how a very complex algorithm can be implemented on systems with very different storage properties: One small and fast and one very large but slow. And this on an algorithm wich is thought to be always running in fast RAM completely AND has access to some mighty servers. This is some clever software optimization.

Secondly they optimized the way they access the SSD-storage. This seems to be some hardware usage optimization very deep in the system.

Forget it.
People who don't want to understand technology, when it was explained the first time in the article, won't try any harder the second time, when it's explained in a post. It always looks more cool to mock than to understand.

name99 · Dec 21, 2023

stefanovago said:
GPT-4 has 1.73 *trillion* parameters. That would be ~x10 times Apple’s LLM.

Maybe so. But
- we have no real idea how a rough number like "parameters" translates to performance. There are rough scaling "laws" but those all seem to be based on applying more parameters to a constant architecture. Once you change the architecture, or switch to something different like a mixture of models, the scaling changes.

-GPT-4 is very much a brute force design. Which is not to say that this was dumb, just that it's not inevitably the only solution. A brute force design lands up embedding a huge amount of "reference" knowledge in the parameters, which is highly sub-optimal. What you want is that the parameters encode only "base" language and world knowledge, and the ability to look up anything else. This line of research doesn't seem to be very publicly visible right now (perhaps because the results are less sexy) but seems to me an overall more useful direction. Think, for example, of the way ChatGPT can call out to Wolfram to perform mathematical manipulation, but generalized to many other domains.

- the competition isn't against GPT-4, it's against nothing. For now most of the value will be in simply having something that's always available and without all the hassles of using ChatGPT (webUI, having to pay and login, all that nonsense).

- much of what Apple will be doing is stuff that's of less interest to a company like OpenAI, like getting models to work with ever more languages, or convenient cross-modality (integrate me talking and what the camera is seeing to generate text, stuff like that). This sort of productization is not necessarily academically leading edge, but it's what converts "that's cute" into "that's useful".

name99 · Dec 21, 2023

klasma said:
Maybe they should just double the RAM? Adding another 8 GB should be like $20.

I know, why not both.

Uh, because users might be upset if the next generation of Siri only works on iPhone 2024 and later models?

Le0M · Dec 21, 2023

I use Chat GPT for any kind of questions, like cooking, gardening, math and geometry, you name it. I often use the speech-to-speech way of interacting with it.
If Apple can provide such AI - that can answer questions on any subject - I'm gonna be it's number 1 fan.
Any other AI integration with apps, honestly, I couldn't care less.

name99 · Dec 21, 2023

coffeemilktea said:
Doesn't flash storage eventually wear out from repeated use? Your iPhone's working life is going to get that much shorter if it's used this way.

To the extent that flash wears out from "repeated use" (a somewhat contentious point probably of dubious relevance to anything outside 24 hour a day data center operation) the wear-out is induced by WRITING not reading...

wigby · Dec 21, 2023

klasma said:
Maybe they should just double the RAM? Adding another 8 GB should be like $20.

I know, why not both.

If you double the RAM you are also filling it up more memory constantly which eats more battery life. It matters more how you are delegating which data is prioritized.

Le0M · Dec 21, 2023

MilaM said:
This is very neat research, but I think the suggestion from the headline, that this will bring LLMs to iPhones, is a stretch. The test platform was a M1 Max with an unspecified amount of DRAM. My educated guess is that it had at least 16GB of DRAM, likely even more. This is not iPhone territory.

I said it before and will repeat it now. Making RAM a luxury with extremely overpriced upgrades will bite Apple, or rather us customers, in the ass.

We'll probably see via-web applegpt on iPhones and iPads, and only on powerful Macs, the on-device one.

erthquake · Dec 21, 2023

iBluetooth said:
They are competing with sending data, remotely to a server, process it and receiving the response back to the phone, which would take at least 100 ms and usually more time. But being able to process it on the phone, could compete with the servers connection time.

Moreover, AppleGPT can have at least an order of magnitude fewer parameters than GPT-4 and other foundational LLMs, and still be lightyears ahead of where Siri is now. You don't need a compression of *all* of human knowledge to get Siri to understand requests for stores, websites, information, opening apps and other relatively simple shortcuts and task; a much smaller, but well-tuned model should be sufficient.

Macalicious2011 · Dec 21, 2023

ksj1 said:
You can already run LLMs on your iPhone using llama.cpp. There are models that will fit in memory and work pretty well. Microsoft Phi for example.

Is the experience stable with minimal crashing or hangups?

Macalicious2011 · Dec 21, 2023

name99 said:
- we have no real idea how a rough number like "parameters" translates to performance. There are rough scaling "laws" but those all seem to be based on applying more parameters to a constant architecture. Once you change the architecture, or switch to something different like a mixture of models, the scaling changes.

Well said. For some use-cases RAG + Prompt engineering will yield superior performance to multiplying the number of parameters. However, the latter has probably not reached diminishing returns yet as it's early days for LLMs.

I'm developing an LLM based product the improvement in output from gpt3.5 turbo to gpt4-turbo was profound and unimagined for my application.

Exciting times ahead!

leonremi · Dec 21, 2023

Ok. And what is the point of this ?
Beyond AI hype and FOMO of course?

HylianKnight · Dec 21, 2023

higgsboson said:
NVMe SSD read speed is about 3GB/s. It is at least an order of magnitude less than conventional RAM and 2-3 orders of magnitude less than embedded RAM for apple silicon or GPU RAM. Don't see how they can make it work at a level of modern LLM chats.

The latest NVMes offer an order of magnitude increase in read speeds. Coupled with the methods described in the article, it is entirely possible their LLM would function well, and at least be quicker and more accurate than the current implementation of Siri.

Macalicious2011 · Dec 21, 2023

leonremi said:
Ok. And what is the point of this ?
Beyond AI hype and FOMO of course?

The goal is the ability to run LLM based apps on-device instead of:
-Paying a subscription fee for a cloud based LLM app like chatgpt that's powered by super expensive Nvidia chips.
-Accessing a free cloud based LLM app that either train their ai model using your data or compromises the security of your data.

Think of it like paying to edit photos/videos in the cloud versus using the power of the chips in your phone.

picpicmac · Dec 21, 2023

MacRumors said:
AI model called "Ajax"

And Colgate-Palmolive was not consulted...

picpicmac · Dec 21, 2023

iBluetooth said:
The AI models (LLM) need upwards of 64 GB.

Given current designs, LLMs will need to be much larger than that to interact with humans with the large set of human spoken and written languages in the manner that humans use language.

Current products like ChatGPT are overgrown sentence completion machines, and they've been trained on some subset of English (or Japanese or Arabic or ....) as the set of tokens.

It's not an issue of word-for-word types of translations - chatGPT can already do that. Google Translate can already do that.

The issue is whether a model can embed the thought-patterns, the culture, of a language, which keeps these LLMs from being true AI.

ChatGPT can complete a sentence for many (50?) languages. But that does not mean it thinks in those languages.

Any future application, from Apple or from whomever, to be truly interactive will likely depend upon a much larger RAM/resource capability than current mobile devices.

leonremi · Dec 21, 2023

Macalicious2011 said:
The goal is the ability to run LLM based apps on-device instead of:
-Paying a subscription fee for a cloud based LLM app like chatgpt that's powered by super expensive Nvidia chips.
-Accessing a free cloud based LLM app that either train their ai model using your data or compromises the security of your data.

Think of it like paying to edit photos/videos in the cloud versus using the power of the chips in your phone.

My point is not about local vs cloud based LLM.
It’s. About the uselessness of it all.
I can already assign ChatGPT audio to the action button on my 15 pro max…and it’s just a chatty mostly useless bot.
AI is so overhyped (at least the chat bots)

Apple Develops Breakthrough Method for Running LLMs on iPhones

macrumors 6502

macrumors 601

macrumors 68000

macrumors 6502

macrumors regular

macrumors 6502a

macrumors 6502a

macrumors member

macrumors 68020

macrumors 68020

macrumors 68020

macrumors 68020

macrumors 6502a

macrumors 68020

macrumors 68030

macrumors 6502a

macrumors regular

macrumors 68000

macrumors 68000

macrumors member

macrumors 6502

macrumors 68000

macrumors 65816

macrumors 65816

macrumors member

Our Staff