
dabirdwell

macrumors 6502
Sep 26, 2002
458
26
Oklahoma
Posted this on Ars earlier:

Further, this means it will run on Home hubs like iPads, Macs, and maybe AppleTVs. I have a dozen HomePods and many dozens of connected devices around my house that are running automations and responding to requests all day every day. A GPT-4 level LLM, or even near, running locally and able to access my local files, would be a ridiculous benefit for my daily life.
 
  • Like
Reactions: Populus

Populus

macrumors 601
Aug 24, 2012
4,973
7,246
Spain, Europe
I wish this solution worked 100% locally, on device, but the reports that Apple is building data centers for this purpose probably mean it's not going to be a 100% local LLM.
 

Macalicious2011

macrumors 68000
May 15, 2011
1,759
1,789
London
Posted this on Ars earlier:

Further, this means it will run on Home hubs like iPads, Macs, and maybe AppleTVs. I have a dozen HomePods and many dozens of connected devices around my house that are running automations and responding to requests all day every day. A GPT-4 level LLM, or even near, running locally and able to access my local files, would be a ridiculous benefit for my daily life.
In 12-24 months the biggest selling point of a new device will be chips for running LLM and AI tasks on-device. This would wean us off our dependence on Nvidia and also cut compute costs.

It would be wonderful to run generative AI software locally without rate limits or worrying about exhausting credits.
 

krakenrelease

macrumors regular
Dec 3, 2020
121
118
NVMe SSD read speed is about 3 GB/s. That is at least an order of magnitude less than conventional RAM and 2-3 orders of magnitude less than the on-package RAM of Apple silicon or GPU RAM. I don't see how they can make it work at the level of modern LLM chatbots.
The amount of storage that inference draws on matters more than the processing speed. You can't have a Large Language Model without a large dataset.
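A rough back-of-envelope sketch of that bandwidth argument (all numbers here are assumptions chosen only to illustrate the point): if a dense model has to stream every weight from flash for every generated token, the SSD's read speed caps throughput at around one token per second, which is why the paper loads only a fraction of the weights.

```python
# Illustrative, assumed numbers only: how SSD read bandwidth caps token
# throughput when the weights have to be streamed from flash.

params = 7e9                 # assumed model size (7B parameters)
bytes_per_param = 0.5        # assumed 4-bit quantization
ssd_bw = 3e9                 # bytes/s, the ~3 GB/s figure quoted above
dram_bw = 100e9              # bytes/s, ballpark unified-memory bandwidth

weights_bytes = params * bytes_per_param

# A dense model touches roughly every weight once per generated token.
print("tokens/s if all weights stream from SSD:", ssd_bw / weights_bytes)
print("tokens/s if all weights sit in DRAM    :", dram_bw / weights_bytes)

# If, as the paper describes, only a small fraction of weights is actually
# needed per token (activation sparsity), the SSD path improves accordingly.
for fraction in (0.1, 0.02):
    print(f"tokens/s from SSD loading {fraction:.0%} of weights:",
          ssd_bw / (weights_bytes * fraction))
```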
 

CausticSoda

macrumors 6502a
Feb 14, 2014
678
1,797
Abu Dhabi
I'll be impressed when Siri is finally multilingual. I cannot believe Siri was released with the iPhone 4S and it still cannot do it. It could be as simple as being "aware" of what language you usually type in with contact A and automatically using Siri in language X to communicate with that person, while using language Y with contact B. All they did in iOS 16 was "this text is in another language, do you still want me to read [the gibberish nonsense]".
I'll be impressed when it finally has basic, reliable mono-lingual functionality. At the moment I keep it turned off on all of my devices. It is shockingly bad.
 

Kingcoherent

macrumors member
Aug 30, 2022
75
70
Hugging Face's CTO posted yesterday that 2024 would be the year of local inference, driven in part by Apple silicon. It was my prediction too, but mine carries less weight.
 

name99

macrumors 68020
Jun 21, 2004
2,283
2,139
Obviously fake news, because I have been reliably informed by "the Internet" that Apple is behind in AI and always will be...

It's interesting to compare this with earlier AI work. I'm thinking, for example, of text recognition. This was widely available, one way or another, for a few years but not present as part of Apple's OSes. But when it WAS added to Apple's OSes, so that you can just cut and paste text from images, it's so slick and works so well that it feels like it was always there; you don't even notice it's working until you sit back and think "WTF?"
Similarly for being able to cut and paste the "subject" of a photo stripped of the background.

I fully expect LLMs (and art generation) to play out in much the same way – two years or so of bemoaning that Apple is "behind" while other people wrestle with godawful UI, strange failure modes and limitations, disappointing performance. Then Apple ships something that just works, and the people who were complaining three months ago refuse to acknowledge it and move on to complaining about the next thing.
 
  • Like
Reactions: jhfenton

name99

macrumors 68020
Jun 21, 2004
2,283
2,139
No, it's not caching. If it were that simple, no development would be required and no studies would have to be done.

Caching uses generic mechanisms to decide which information gets stored where: in cache or in RAM.

Here they analyzed how a very complex algorithm can be implemented on a system with two very different kinds of storage: one small and fast, the other very large but slow. And this for an algorithm which is normally assumed to run entirely in fast RAM AND to have access to some mighty servers. This is some clever software optimization.

Secondly, they optimized the way they access the SSD storage. This seems to be hardware-usage optimization very deep in the system.
Forget it.
People who didn't want to understand the technology when it was explained the first time, in the article, won't try any harder the second time, when it's explained in a post. It always looks cooler to mock than to understand.
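For anyone who does want the gist in code, here is a toy sketch of the idea described above (not Apple's actual implementation; the chunk sizes and eviction policy are made up): keep a small resident set of weight chunks in RAM and pull the rest from flash only when the current token actually needs them.

```python
# Toy sketch of flash-offloaded weights: a small "resident" set lives in RAM,
# everything else stays on the SSD and is loaded on demand.

import numpy as np

RAM_BUDGET_CHUNKS = 4          # assumed: how many chunks fit in fast memory

class FlashWeights:
    def __init__(self, path, n_chunks, chunk_elems):
        # memory-map the weight file so reads hit the SSD, not RAM, up front
        self.flash = np.memmap(path, dtype=np.float16,
                               shape=(n_chunks, chunk_elems), mode="r")
        self.resident = {}     # chunk_id -> ndarray currently held in RAM

    def get(self, chunk_id):
        if chunk_id not in self.resident:
            if len(self.resident) >= RAM_BUDGET_CHUNKS:
                # evict an arbitrary chunk; the paper uses smarter policies
                self.resident.pop(next(iter(self.resident)))
            # this copy is the actual flash -> RAM transfer
            self.resident[chunk_id] = np.array(self.flash[chunk_id])
        return self.resident[chunk_id]

# Usage (hypothetical): only chunks predicted to be active for this token
# are ever touched.
# weights = FlashWeights("weights.bin", n_chunks=64, chunk_elems=1_000_000)
# active = [3, 17, 42]        # output of a sparsity predictor
# partial = sum(weights.get(c).sum() for c in active)
```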
 

name99

macrumors 68020
Jun 21, 2004
2,283
2,139
GPT-4 has 1.73 *trillion* parameters. That would be ~10× the size of Apple's LLM.
Maybe so. But
- we have no real idea how a rough number like "parameters" translates to performance. There are rough scaling "laws" but those all seem to be based on applying more parameters to a constant architecture. Once you change the architecture, or switch to something different like a mixture of models, the scaling changes.

- GPT-4 is very much a brute force design. Which is not to say that this was dumb, just that it's not inevitably the only solution. A brute force design ends up embedding a huge amount of "reference" knowledge in the parameters, which is highly sub-optimal. What you want is for the parameters to encode only "base" language and world knowledge, plus the ability to look up anything else (a toy sketch of this pattern follows after this list). This line of research doesn't seem to be very publicly visible right now (perhaps because the results are less sexy), but it seems to me an overall more useful direction. Think, for example, of the way ChatGPT can call out to Wolfram to perform mathematical manipulation, but generalized to many other domains.

- the competition isn't against GPT-4, it's against nothing. For now most of the value will be in simply having something that's always available and without all the hassles of using ChatGPT (webUI, having to pay and login, all that nonsense).

- much of what Apple will be doing is stuff that's of less interest to a company like OpenAI, like getting models to work with ever more languages, or convenient cross-modality (integrate me talking and what the camera is seeing to generate text, stuff like that). This sort of productization is not necessarily academically leading edge, but it's what converts "that's cute" into "that's useful".
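Here is the toy sketch promised in the second bullet: the "small model + external lookup" pattern, where the model handles language and defers facts to a tool. Both functions are hypothetical stand-ins, not any real Apple or OpenAI API.

```python
# Sketch of a compact model that looks facts up instead of memorizing them.
# Everything here is a made-up stand-in for illustration.

def small_model(prompt: str) -> str:
    # stand-in for a compact on-device LLM that knows language, not facts;
    # it emits a tool request instead of answering from memorized knowledge
    if "population" in prompt:
        return 'CALL lookup("population of Spain")'
    return "I can answer that directly."

def lookup(query: str) -> str:
    # stand-in for a tool: a search index, a Wolfram-style engine,
    # or the user's own files
    return "about 48 million (illustrative value)"

def answer(prompt: str) -> str:
    draft = small_model(prompt)
    if draft.startswith('CALL lookup("'):
        query = draft[len('CALL lookup("'):-2]   # strip the call wrapper
        return f"{query}: {lookup(query)}"
    return draft

print(answer("What is the population of Spain?"))
```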
 
  • Like
Reactions: System603

name99

macrumors 68020
Jun 21, 2004
2,283
2,139
Maybe they should just double the RAM? Adding another 8 GB should be like $20.

I know, why not both.
Uh, because users might be upset if the next generation of Siri only works on iPhone 2024 and later models?
 

Le0M

macrumors 6502a
Aug 13, 2020
870
1,211
I use ChatGPT for all kinds of questions: cooking, gardening, math and geometry, you name it. I often use the speech-to-speech way of interacting with it.
If Apple can provide such an AI - one that can answer questions on any subject - I'm gonna be its number one fan.
As for any other AI integration with apps, honestly, I couldn't care less.
 

name99

macrumors 68020
Jun 21, 2004
2,283
2,139
Doesn't flash storage eventually wear out from repeated use? Your iPhone's working life is going to get that much shorter if it's used this way. :eek:
To the extent that flash wears out from "repeated use" (a somewhat contentious point, probably of dubious relevance to anything outside 24-hour-a-day data center operation), the wear-out is induced by WRITING, not reading...
 

wigby

macrumors 68030
Jun 7, 2007
2,780
2,763
Maybe they should just double the RAM? Adding another 8 GB should be like $20.

I know, why not both.
If you double the RAM, you are also constantly filling up more memory, which eats more battery life. What matters more is how you delegate which data is prioritized.
 

Le0M

macrumors 6502a
Aug 13, 2020
870
1,211
This is very neat research, but I think the suggestion from the headline, that this will bring LLMs to iPhones, is a stretch. The test platform was an M1 Max with an unspecified amount of DRAM. My educated guess is that it had at least 16 GB of DRAM, likely even more. This is not iPhone territory.

I said it before and will repeat it now: making RAM a luxury with extremely overpriced upgrades will bite Apple, or rather us customers, in the ass.
We'll probably see a web-based AppleGPT on iPhones and iPads, and the on-device one only on powerful Macs.
 

erthquake

macrumors regular
Oct 11, 2011
213
201
They are competing with sending data to a remote server, processing it there, and receiving the response back on the phone, which takes at least 100 ms and usually more. Being able to process it on the phone could beat the server connection time alone.

Moreover, AppleGPT can have at least an order of magnitude fewer parameters than GPT-4 and other foundational LLMs and still be light-years ahead of where Siri is now. You don't need a compression of *all* of human knowledge to get Siri to understand requests for stores, websites, information, opening apps and other relatively simple shortcuts and tasks; a much smaller but well-tuned model should be sufficient.
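A tiny illustrative comparison of the latency argument above (the server-side and on-device numbers are assumptions; only the ~100 ms round trip comes from the post):

```python
# Illustrative, assumed numbers: cloud round trip vs. fully on-device handling.

network_rtt = 0.100      # s, the ~100 ms round trip mentioned above
server_infer = 0.050     # s, assumed server-side inference time
local_infer = 0.120      # s, assumed on-device inference time (slower chip)

cloud_total = network_rtt + server_infer
local_total = local_infer            # no network leg at all

print(f"cloud: {cloud_total * 1000:.0f} ms per request")
print(f"local: {local_total * 1000:.0f} ms per request")
# The local path also works offline and never ships the request off-device.
```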
 
  • Like
Reactions: System603

Macalicious2011

macrumors 68000
May 15, 2011
1,759
1,789
London
- we have no real idea how a rough number like "parameters" translates to performance. There are rough scaling "laws" but those all seem to be based on applying more parameters to a constant architecture. Once you change the architecture, or switch to something different like a mixture of models, the scaling changes.
Well said. For some use cases, RAG + prompt engineering will yield performance superior to multiplying the number of parameters. However, the latter has probably not reached diminishing returns yet, as it's early days for LLMs.

I'm developing an LLM-based product, and the improvement in output from GPT-3.5 Turbo to GPT-4 Turbo was profound and unimagined for my application.

Exciting times ahead!
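A minimal sketch of the RAG + prompt engineering pattern mentioned a couple of paragraphs up (the documents, retriever and prompt are made-up stand-ins; a real system would use an embedding index and an actual LLM call):

```python
# Toy retrieval-augmented generation: fetch relevant text, then ground the
# (smaller) model in it via the prompt instead of relying on more parameters.

DOCUMENTS = {
    "returns": "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def retrieve(question: str) -> str:
    # naive keyword retrieval; real systems use vector similarity search
    hits = [text for key, text in DOCUMENTS.items() if key in question.lower()]
    return "\n".join(hits) or "No relevant documents found."

def build_prompt(question: str) -> str:
    # prompt engineering: constrain the model to the retrieved context
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{retrieve(question)}\n\nQuestion: {question}\nAnswer:"
    )

print(build_prompt("What is your returns policy?"))
```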
 

HylianKnight

macrumors 6502
Jul 18, 2017
465
490
NVMe SSD read speed is about 3 GB/s. That is at least an order of magnitude less than conventional RAM and 2-3 orders of magnitude less than the on-package RAM of Apple silicon or GPU RAM. I don't see how they can make it work at the level of modern LLM chatbots.
The latest NVMes offer an order of magnitude increase in read speeds. Coupled with the methods described in the article, it is entirely possible their LLM would function well, and at least be quicker and more accurate than the current implementation of Siri.
 

Macalicious2011

macrumors 68000
May 15, 2011
1,759
1,789
London
Ok. And what is the point of this?
Beyond AI hype and FOMO, of course?
The goal is the ability to run LLM-based apps on-device instead of:
- Paying a subscription fee for a cloud-based LLM app like ChatGPT that's powered by super-expensive Nvidia chips.
- Using a free cloud-based LLM app that either trains its AI model on your data or compromises the security of your data.

Think of it like paying to edit photos/videos in the cloud versus using the power of the chips in your phone.
 

picpicmac

macrumors 65816
Aug 10, 2023
1,092
1,540
The AI models (LLM) need upwards of 64 GB.
Given current designs, LLMs will need to be much larger than that to interact with humans across the full set of spoken and written languages, in the manner that humans actually use language.

Current products like ChatGPT are overgrown sentence-completion machines, and they've been trained on some subset of English (or Japanese or Arabic or ....) as the set of tokens.

It's not an issue of word-for-word translation - ChatGPT can already do that. Google Translate can already do that.

The issue is whether a model can embed the thought-patterns, the culture, of a language, which keeps these LLMs from being true AI.

ChatGPT can complete a sentence for many (50?) languages. But that does not mean it thinks in those languages.

Any future application, from Apple or from whomever, to be truly interactive will likely depend upon a much larger RAM/resource capability than current mobile devices.
 

leonremi

macrumors member
May 12, 2017
90
161
The goal is the ability to run LLM-based apps on-device instead of:
- Paying a subscription fee for a cloud-based LLM app like ChatGPT that's powered by super-expensive Nvidia chips.
- Using a free cloud-based LLM app that either trains its AI model on your data or compromises the security of your data.

Think of it like paying to edit photos/videos in the cloud versus using the power of the chips in your phone.
My point is not about local vs cloud-based LLMs.
It's about the uselessness of it all.
I can already assign ChatGPT audio to the Action button on my 15 Pro Max… and it's just a chatty, mostly useless bot.
AI is so overhyped (at least the chatbots).
 