Apple Releases Open Source AI Models That Run On-Device

ChrisA · Apr 25, 2024

Populus said:
[Hey ChatGPT, please generate a comment in the style of a typical MacRumors average user, with a touch of acid humor, regarding this piece of news.]

"Oh great, Apple's finally joining the open-source party—just a decade late and probably still with strings attached somewhere in those 'open' terms. They're throwing us a bone with OpenELM, but let's be real, they’re probably just doing it to lure in some AI hotshots tired of their corporate overlords. Now we just have to sit back and wait for iOS 18, where they'll inevitably limit these models to the latest hardware, forcing us all to upgrade. Because, you know, my current iPhone can't possibly handle a couple more AI tricks without combusting."

This is 100% accurate. GPT has mimicked the typical MR comment where the commenter did not even bother to read the code that was posted on Github but still thinks he can comment on it. And in the process makes a total fool of himself.

It is like a radio talk show host who is interviewing a book author but did not bother to read the book, he looks like a fool as he is unable to say anything relevant.

What I see in the released product is some interesting things (1) They use Linux. (2) PyTorch seems to have good performance on Apple hardware and (3) obviously, this is not Siri or even a replacement for it. (4) it seems to be Apple's proof of concept project for doing billion-parameter on-device transformer models. (5) one purpose might be to show AI researchers that it is possible to do this kind of work using Apple hardware rather than Nvidia hardware. In fact it is convincing enough that I'm going to try using my M2-Pro and see how it compares to mid-range Nvidia GPU.

Apple's goal might be to get some complainers to stop winning about "Why doesn't Apple Silicon allow you to use Nvidia GPUs?" To do that Apple needs to publish software that runs on-device that outperforms Nvidia. I doubt this will do that in an absolute way. But it just might be that a $2,000 M3 will perform as well as a $2,000 Gammer PC with Nvidia GPU inside. I don't know yet. But this is the kind of thing that makes people ask the question.

vantelimus · Apr 25, 2024

bluecoast said:
I wonder if Macs with 8GB memory will be able to run whatever Apple bakes into macOS based on it?

I’d be surprised if even the 15 pro will be able to run anything based on this.

New iPhones for everyone. Hopes Tim Cook.

I’ve done experiments on an iPhone 15 pro max using an 8B llama. It is slow, but usable. A model optimized for Apple devices and the Apple Neural Engine should be capable of achieving usable (i.e. human-level) performance on current models. Recent research on 1-bit matrices suggests that models could get much faster sooner rather than later.

Populus · Apr 25, 2024

ChrisA said:
What I see in the released product is some interesting things (1) They use Linux. (2) PyTorch seems to have good performance on Apple hardware and (3) obviously, this is not Siri or even a replacement for it. (4) it seems to be Apple's proof of concept project for doing billion-parameter on-device transformer models. (5) one purpose might be to show AI researchers that it is possible to do this kind of work using Apple hardware rather than Nvidia hardware. In fact it is convincing enough that I'm going to try using my M2-Pro and see how it compares to mid-range Nvidia GPU.

Apple's goal might be to get some complainers to stop winning about "Why doesn't Apple Silicon allow you to use Nvidia GPUs?" To do that Apple needs to publish software that runs on-device that outperforms Nvidia. I doubt this will do that in an absolute way. But it just might be that a $2,000 M3 will perform as well as a $2,000 Gammer PC with Nvidia GPU inside. I don't know yet. But this is the kind of thing that makes people ask the question.

I agree, the key here is a on-device model adapted to Apple’s own hardware. If you finally try it on your M2 Pro, let us know how it runs!

MrTemple · Apr 25, 2024

The thing I'm most excited about WWDC in June is that Siri is almost certainly going to have LLM overhaul so that it will actually understand what you're asking it, even complex, chained requests.

That to me is one of the most impressive aspects of LLMs. That they can actually understand what we are saying almost however we talk to them.

Give a smartphone's LLM a list of commands it can run (shortcuts, settings, info retrieval, 3rd party app 'intents'/actions, etc), and the capability goes through the roof. It was such a convoluted mess of request parsing before, and that's all needless now.

We're like three years from this being just a normal way we interact with the apps and tech on our phones:

Siri, John and I are going for Thai again. Make a reservation for 8, and remind me to get dressed a half-hour before I need to leave. Oh and I have to stop at the post office on the way to picking him up, I'll only be a sec in there. Until then I'm going to have a nap, you know what to do, hold my calls, set the lights, etc. When you're done, text John when I'll be by.

The era of having to know how to interact with a device in order to perform its basic features is going the way of the dodo.

We're going to see the first step in that direction for iPhones in June, with release this Fall.

ChrisA · Apr 25, 2024

I'm reading through the big pile of code on Github and found a real gem...

Some Apple software engineer wrote that "Slurm does not yet work with CoreML". This was his explanation for using Linux.

The key word here is "yet". This might point at Apple's solution to the problem of Apple Silicon's limited performance. Slurm is used on most of the world's largest super computers to distribute a large problem over thousands of regular computers and in effect create one "supercomputer"

We complain that the Mac Studio is the top of the line and we don't see improvement happening very fast. But what if Apple had a system where you buy Mac Studios in bulk and connected dozens of them or even hundreds of them and then were able to distribute the work over all of those computers?

We can see (in the open source code) that Apple is using a small cluster of Linux machines managed with Slurm and I think we can infer that they plan on doing the same with a cluster of Apple computers because one developer commented on why this is not being done now.

Populus · Apr 25, 2024

MrTemple said:
The thing I'm most excited about WWDC in June is that Siri is almost certainly going to have LLM overhaul so that it will actually understand what you're asking it, even complex, chained requests.

That to me is one of the most impressive aspects of LLMs. That they can actually understand what we are saying almost however we talk to them.

Give a smartphone's LLM a list of commands it can run (shortcuts, settings, info retrieval, 3rd party app 'intents'/actions, etc), and the capability goes through the roof. It was such a convoluted mess of request parsing before, and that's all needless now.

We're like three years from this being just a normal way we interact with the apps and tech on our phones:

The era of having to know how to interact with a device in order to perform its basic features is going the way of the dodo.

We're going to see the first step in that direction for iPhones in June, with release this Fall.

Getting closer to Apple’s 1987 Knowledge Navigator

bobdobalina · Apr 25, 2024

Arislan said:
And does that 1.76 trillion parameter model run on a phone without cloud servers?

The 16 Pro Max Ultra upgrades max out at $65,000. We think you're going to love it.

Tubamajuba · Apr 25, 2024

d-klumpp said:
As an Apple enthusiast since the 70s but nonetheless not a coder or electrical engineer, I come here to hear the latest rumors on tech to which I might look forward to and to get informative reactions, maybe even from professionals within the fields of computing or consumer electronics. Unfortunately, too often most commenters offer only offer rants with little merit and no insight. Why are they here? Why do I waste my own time even scanning these comments?

I check back here every few months somehow hoping things have gotten better, but it's all the same "stick in the mud" pessimism and negativity every time. Like, an article talking about a new third party iPhone case inevitably will have several people making completely irrelevant complaints about Apple "losing their way" and Tim Cook being a terrible CEO.

cdsapplefan · Apr 25, 2024

Apple is putting all there eggs 🪺 into this AI basket, hopefully 🤞 it doesn’t bite them, but I trust Apple from there previous history but that was under the Great Pioneer Steve Jobs. This AI push will be on Cooks watch. Good luck

AMacHasNoName · Apr 26, 2024

grantishere said:
It's clear they are behind and need help. OpenAI's GPT-4 has 1.76 trillion parameters. Apple's OpemELM model has 3 billions parameters.

GPT-4 would never run on a device and even if you quantized the model down to make it more plausible, the amount of memory you would need would exceed that which the average phone (and computer in many cases) would have available. On top of that, models with large parameter counts have high latency in response - which is why you generally see different model families with 7B, 40B, 1T, etc. For many use cases, using the 7Billion parameter model is more than sufficient and with many of the different "expert" family systems like Mixtral - the whole idea is to have a set of small models with lower counts that can work together to solve a problem as opposed to one big massive model that requires hundreds of A100s to work properly.

name99 · Apr 26, 2024

AMacHasNoName said:
GPT-4 would never run on a device and even if you quantized the model down to make it more plausible, the amount of memory you would need would exceed that which the average phone (and computer in many cases) would have available. On top of that, models with large parameter counts have high latency in response - which is why you generally see different model families with 7B, 40B, 1T, etc. For many use cases, using the 7Billion parameter model is more than sufficient and with many of the different "expert" family systems like Mixtral - the whole idea is to have a set of small models with lower counts that can work together to solve a problem as opposed to one big massive model that requires hundreds of A100s to work properly.

How many of those 1.76T (or 7B) parameters are non-zero? How many NEED to be non-zero?

People are making extremely strong claims about what *must* or *must not* be possible without even knowing the answers to such basic questions. If these models can operate well at, say, 10% sparsity, that suggests very different encoding of the weights for each layer, along with a very different design of accelerators.
What I see is we're in the same sort of situation as we were with, say, QuickTime 1.0, where RPZA was considered state of the art in video compression. And the same sort of crowd are happy to speak up with VERY strong claims about the technology based on exactly fsckall knowledge of what's essential, what can be aggressively modified, what hasn't even been tried yet... A little learning is a dangerous thing.

Right now most of these models are structured in a way that is optimal for nVidia hardware (so "structured sparsity at about 50%"). Which is perfectly understandable, and it's perfectly understandable why people using nV hardware haven't yet had the time (or any good reason) to push for serious engineering-style optimizations of the models.
But that is VERY DIFFERENT from claiming that something put together as fast as possible to run on specific nV hardware is *intrinsically* of the form it currently takes...

purplerainpurplerain · Apr 26, 2024

You won’t need an LLM for 90% of what you do on a phone. Voice commands to change settings, create an alarm or appointment, or play a song doesn’t need an LLM. Neither does text to speech or a talking weather forecast.

I wouldn’t ever post with an LLM or send an email with one either. I’m going to use my own brain and talk with my heart to people come rain or come shine whether I’m saying something wise or wrong.

The best thing is when I’m 60-70 years old I won’t be a total vegetable because I carried on exercising my brain for life.

clg82 · Apr 29, 2024

NT1440 said:
Honestly, I think the entire world just spits out the term “innovate” that is meaningless at this point.

Nobody can define it, but managerial types sure like spitting it out as part of justification for just about anything, good or bad (“these layoffs will allow us to focus on innovation!”).

I would say advancements instead, I always thought innovation was doing something in a way not done before, but apparently not 🤷‍♂️

This^

VulchR · Apr 29, 2024

NT1440 said:
Honestly, I think the entire world just spits out the term “innovate” that is meaningless at this point.

Nobody can define it, but managerial types sure like spitting it out as part of justification for just about anything, good or bad (“these layoffs will allow us to focus on innovation!”).

I would say advancements instead, I always thought innovation was doing something in a way not done before, but apparently not 🤷‍♂️

The use of 'innovative' in engineering seems to be like the use of 'novel' in scientific article titles (see https://www.nature.com/articles/nature.2015.19024).

Search

Search

Apple Releases Open Source AI Models That Run On-Device

ChrisA

macrumors G5

vantelimus

macrumors regular

Populus

macrumors 601

MrTemple

macrumors 6502

ChrisA

macrumors G5

Populus

macrumors 601

bobdobalina

macrumors 6502a

Tubamajuba

macrumors 68020

cdsapplefan

macrumors regular

AMacHasNoName

macrumors newbie

name99

macrumors 68020

purplerainpurplerain

macrumors 6502a

clg82

macrumors 6502

VulchR

macrumors 68040

Our Staff