I've worked on iOS AI apps for several years now, and I've looked long and hard at running models "on device".
What stops anyone with a decent, unique intelligence model from putting it on device is the risk of having it stolen, repackaged and resold.
iOS is pretty hardened but not unbreakable.
I asked Apple peeps on the dev forums, and the answer I got was "yeah, that's DRM, it's not perfect."
Apple's own ML API (Core ML) got an encryption feature to protect proprietary models a couple of years ago, but it was limited to the model types Core ML supported.
So I don't expect a mad rush to use on-device LLMs unless there's an impressively secure way of locking down and protecting what they contain.
If the new Apple silicon (M4) has an AI focus, then it'll no doubt be wrapped in pretty strong hardware encryption, something Apple has already been doing for things like biometrics, banking and ML.
For on device to succeed, it's not just about how good your AI silicon is; it's as much about how well your AI model is secured on that silicon.
Apple has a specific patent for DRM associated with neural nets.
They also have a "secure mode" associated with the ANE (Apple Neural Engine), which limits the extent to which ANE code can spy on anything else (code or data) associated with the ANE. It's clearly used by FaceID, but is presumably available to other software too.
Yes, sure, no-one is claiming perfection on the part of Apple hardware/software. But this is essentially the same tech that protects iPhone video content and FaceID, and no-one's cracked either of those.
My guess is that this plays out with many companies, out of paranoia, thinking just like Seoras and delivering a lousier experience (slower, less personalized, doesn't work without connectivity), which gives Apple an entry wedge.
Apple Maps was not the best when it shipped, nor was Apple Translate. But by being on every device, and by avoiding the [self-imposed!] limitations of competitors, they were good enough for most people to try and experiment with.
And by the time competitors realize what is happening (that their paranoia is not justified, and that they are paying a dollar in inference compute for every week of customer queries), two years will have passed and most of the user base will be comfortable with Apple, which will be laughing at all the money it saved by running inference on device rather than on server.
Right now I can either put up with the hassle of an existing chatbot (figure out which one, pay a monthly fee, deal with constant login prompts), or I can wait for the Apple chatbot.
Frankly, I just don't care enough to do anything beyond waiting for what Apple ships in AppleOS 2024.