
MysticCow

macrumors 68000
May 27, 2013
1,561
1,740
Since Nvidia doesn't work on modern Mac hardware anymore, there's no need to run Metal on Nvidia GPUs. Nvidia isn't coming back, and AMD isn't here to stay; they're the past for Apple.

Well, you can only piss off the other children so many times before they realize that it isn’t worth it…and they stop playing with you.
 

Irishman

macrumors 68040
Nov 2, 2006
3,393
843
I just finished watching this video, and it seems like Lumen support for Mac is rolling out - at least in Lyra - at higher settings.


Now, do Nanite next!!
 

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
I just finished watching this video, and it seems like Lumen support for Mac is rolling out - at least in Lyra - at higher settings.


Now, do Nanite next!!
Someone said that Nanite requires 64-bit atomics (which Metal doesn't support), so we will have to wait until WWDC to see if that gets added by Apple.
 
  • Like
Reactions: Irishman

Irishman

macrumors 68040
Nov 2, 2006
3,393
843
Someone said that Nanite requires 64-bit atomics (which Metal doesn't support), so we will have to wait until WWDC to see if that gets added by Apple.

Do you recall where you learned that?

Something for Metal 3.0, I guess.
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
Something for Metal 3.0, I guess.
Hm... not sure if that's actually the case (I'd have to dig into the Nanite docs or request the info from Epic. Anyone got a link to the docs saying this?), but... old OpenCL comes to mind, which could not guarantee atomicity when memory was accessed by two devices. This makes me wonder how atomics are handled on an SoC that contains CPU and GPU but uses unified memory.

Also (thinking about past headaches), atomics can introduce other issues. I mean, it's nice that stuff is atomic, but latency hiding comes into play with GPU context switching. It can be a bit of a pain. Atomic types and functions in Metal are a subset of C++14 so far. One can also use threadgroup and SIMD synchronization (it's somewhere in the Metal specs). In other words, it's been a little cumbersome in the past. Time will tell where things go. I'll try to play around a little with UE5 next week or so.
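
For illustration, this is roughly what's available today: 32-bit C++14-style atomics (relaxed ordering only) plus threadgroup barriers. A minimal sketch; the buffer layout and names are made up:

```cpp
// Minimal MSL sketch (illustrative names): Metal's C++14-style 32-bit
// atomics plus a threadgroup barrier. Only memory_order_relaxed exists.
#include <metal_stdlib>
using namespace metal;

kernel void count_visible(device atomic_uint *total        [[buffer(0)]],
                          device const float *depths       [[buffer(1)]],
                          threadgroup atomic_uint *tg_count [[threadgroup(0)]],
                          uint tid [[thread_position_in_grid]],
                          uint lid [[thread_index_in_threadgroup]])
{
    if (lid == 0)
        atomic_store_explicit(tg_count, 0u, memory_order_relaxed);
    threadgroup_barrier(mem_flags::mem_threadgroup);

    if (depths[tid] < 1.0f)   // arbitrary "visible" test, purely illustrative
        atomic_fetch_add_explicit(tg_count, 1u, memory_order_relaxed);
    threadgroup_barrier(mem_flags::mem_threadgroup);

    if (lid == 0)             // one device-memory atomic per threadgroup
        atomic_fetch_add_explicit(total,
            atomic_load_explicit(tg_count, memory_order_relaxed),
            memory_order_relaxed);
}
```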
 
  • Like
Reactions: Irishman

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
Hm... not sure if that's actually the case (I'd have to dig into the Nanite docs or request the info from Epic. Anyone got a link to the docs saying this?), but... old OpenCL comes to mind, which could not guarantee atomicity when memory was accessed by two devices. This makes me wonder how atomics are handled on an SoC that contains CPU and GPU but uses unified memory.

Also (thinking about past headaches), atomics can introduce other issues. I mean, it's nice that stuff is atomic, but latency hiding comes into play with GPU context switching. It can be a bit of a pain. Atomic types and functions in Metal are a subset of C++14 so far. One can also use threadgroup and SIMD synchronization (it's somewhere in the Metal specs). In other words, it's been a little cumbersome in the past. Time will tell where things go. I'll try to play around a little with UE5 next week or so.
Here are the hardware reqs:

Nanite specifically calls for VK_KHR_shader_atomic_int64 or Shader Model 6.6 atomics.

Now, Epic doesn't say (at least publicly) which parts of those features they explicitly need, so YMMV.
 
  • Like
Reactions: GrumpyCoder

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
https://forum.beyond3d.com/posts/2248798/
Looks like Mac hardware (not just API-wise) doesn't support 64-bit atomics. Maybe M2 will? I wonder if that is why Lyra's performance is meh on macOS (or is it purely Lumen GI?).
Hmm... hard to say. M1 already supports ARMv8.4, so it should be there. Could Apple be foolish enough to break this? The GPU instruction set usually goes hand in hand with the microarchitecture of the CPU, and the driver + API provide the option to run code on the GPU.

I think (not 100% sure) this functionality came with ARMv8.1 back in 2014 (hence the Metal atomics being a subset of C++14), so if it's not there in GPU hardware on M1 today, I'd bet there's either a technical limitation preventing it, or it's the old "do it the Apple way" and they're not interested in it. I can't believe they didn't think about it when they started designing the SoCs. As for M2, I don't think that's going to happen; M3 seems more likely (if it happens at all).

That being said, I don't see why Epic couldn't do this in software. And I don't see why Apple couldn't add a workaround in LLVM either. Sure, with some overhead, but nothing that would be a deal killer (when you think about it for a second). The question is the same old one: why bother for a really small fraction of the market? Apple might not care, and Epic can use it out of the box on the "relevant" hardware the target market is using.


Edit: Just stumbled across this, put it on my reading list for next week: https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
 

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
Hmm... hard to say. M1 already supports ARMv8.4, so it should be there. Could Apple be foolish enough to break this? The GPU instruction set usually goes hand in hand with the microarchitecture of the CPU, and the driver + API provide the option to run code on the GPU.

I think (not 100% sure) this functionality came with ARMv8.1 back in 2014 (hence the Metal atomics being a subset of C++14), so if it's not there in GPU hardware on M1 today, I'd bet there's either a technical limitation preventing it, or it's the old "do it the Apple way" and they're not interested in it. I can't believe they didn't think about it when they started designing the SoCs. As for M2, I don't think that's going to happen; M3 seems more likely (if it happens at all).

That being said, I don't see why Epic couldn't do this in software. And I don't see why Apple couldn't add a workaround in LLVM either. Sure, with some overhead, but nothing that would be a deal killer (when you think about it for a second). The question is the same old one: why bother for a really small fraction of the market? Apple might not care, and Epic can use it out of the box on the "relevant" hardware the target market is using.
Well, it is weird, because Lyra seems to work and it does use Nanite. So there must be some sort of fallback.
 
  • Like
Reactions: Irishman

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
So no one has been able to compile the demo on macOS? These results above are kind of brutal for the hardware involved.
Small update: no, the demo won't work on macOS, as the engine version is reported as incompatible.

That being said, I have not tried running it under Windows, copying everything required over manually, and fiddling with the project files to make it think it's running on a proper Windows engine. Chances of that working are usually slim. I'm booking this one under "Windows only".
 
  • Sad
Reactions: diamond.g

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
Small update: no, the demo won't work on macOS, as the engine version is reported as incompatible.

That being said, I have not tried running it under Windows, copying everything required over manually, and fiddling with the project files to make it think it's running on a proper Windows engine. Chances of that working are usually slim. I'm booking this one under "Windows only".
I wonder what makes Lyra different, as it runs on macOS without Nanite (even though we know it uses Nanite per the videos Epic released talking about the demo).
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
I wonder what makes Lyra different, as it runs on macOS without Nanite (even though we know it uses Nanite per the videos Epic released talking about the demo).
No custom shaders, probably. The Lyra shaders are building and preparing as I type this. I don't think Nanite is working on macOS, but I'll try later/tomorrow to confirm.
 

leman

macrumors Core
Oct 14, 2008
19,202
19,062
Since Nvidia doesn't work on modern Mac hardware anymore, there's no need to run Metal on Nvidia GPUs. Nvidia isn't coming back, and AMD isn't here to stay; they're the past for Apple.

The more I dig, the more I'm getting convinced that Apple dropped Nvidia because of Nvidia's hardware limitations. For example, Mantle's resource binding model is very similar to Metal's, but Vulkan and DX12 use a more primitive, less flexible model. Since Intel also supports Metal's model, the only conclusion is that Nvidia is the main reason.

Hmm... hard to say. M1 already supports ARMv8.4, so it should be there. Could Apple be foolish enough to break this? The GPU instruction set usually goes hand in hand with the microarchitecture of the CPU, and the driver + API provide the option to run code on the GPU.

I think (not 100% sure) this functionality came with ARMv8.1 back in 2014 (hence the Metal atomics being a subset of C++14), so if it's not there in GPU hardware on M1 today, I'd bet there's either a technical limitation preventing it, or it's the old "do it the Apple way" and they're not interested in it. I can't believe they didn't think about it when they started designing the SoCs. As for M2, I don't think that's going to happen; M3 seems more likely (if it happens at all).


The GPU has nothing to do with the CPU; they run a very different microarchitecture and a different ISA. And Apple GPUs are extremely streamlined and do not support many things that are a given in the desktop space. They have no 64-bit atomics at the hardware level and no synchronization primitives between threadgroups. It does make some things awkward to implement.
 

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
The more I dig, the more I'm getting convinced that Apple dropped Nvidia because of Nvidia's hardware limitations. For example, Mantle's resource binding model is very similar to Metal's, but Vulkan and DX12 use a more primitive, less flexible model. Since Intel also supports Metal's model, the only conclusion is that Nvidia is the main reason.




The GPU has nothing to do with the CPU; they run a very different microarchitecture and a different ISA. And Apple GPUs are extremely streamlined and do not support many things that are a given in the desktop space. They have no 64-bit atomics at the hardware level and no synchronization primitives between threadgroups. It does make some things awkward to implement.
How would you change Nanite to not rely on 64-bit atomics?
 
  • Like
Reactions: Irishman

leman

macrumors Core
Oct 14, 2008
19,202
19,062
Ah

So is there an alternate way to pack Z to high bits and payload to low bits?

Thanks! I just had a quick look at a Nanite talk (https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf) — they are using software rasterisation (on the GPU) to render a lot of small triangles quicker than the hardware can. They need atomics to correctly update the rasterised values, and they need more than 32 bits to fit the stuff in.
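
To make the 64-bit requirement concrete, here's an MSL-flavoured sketch of the packing trick as I understand it from the talk (my own illustrative code, not Epic's; Metal does have 64-bit integer arithmetic via ulong, it's the atomic side that is missing):

```cpp
// Sketch of the visibility-buffer packing (illustrative, not Epic's code):
// depth goes in the high 32 bits, the payload (cluster/triangle ID) in the
// low 32 bits, so a single 64-bit unsigned atomic max doubles as a depth
// test plus write.
ulong pack_visibility(float depth, uint payload)
{
    // Positive IEEE floats compare in the same order as their bit patterns.
    uint depth_bits = as_type<uint>(depth);
    return (ulong(depth_bits) << 32) | ulong(payload);
}

// On Vulkan (VK_KHR_shader_atomic_int64) or DX12 (SM 6.6) the write is then
// one 64-bit atomic max on the visibility buffer; Metal has no equivalent.
```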

No idea whether this can be circumvented with smaller atomic operations. Maybe one can use smaller-granularity per-pixel locks (no idea how feasible that is, and it's likely to be slow anyway)? What is the performance when rendering pixel-sized triangles on Apple GPUs anyway? Maybe it's less of a performance hit than on other GPUs. No clue.
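
For what it's worth, a per-pixel lock with only 32-bit atomics would look roughly like this (hypothetical sketch; Metal atomics are relaxed-only and there is no forward-progress guarantee between threads, so this isn't properly fenced and a naive spinlock can livelock under SIMD divergence):

```cpp
// Hypothetical 32-bit-only emulation (illustrative names, not production
// code): guard each pixel's 64-bit record with a spinlock. Caveats: Metal
// atomics are relaxed-only, so the plain writes below are not properly
// fenced, and without a forward-progress guarantee the spin can livelock.
struct PixelRecord { uint depth_bits; uint payload; };

void write_pixel(device atomic_uint *lock,
                 device PixelRecord *px,
                 uint depth_bits, uint payload)
{
    // acquire: spin until we own this pixel
    while (atomic_exchange_explicit(lock, 1u, memory_order_relaxed) != 0u) {}

    if (depth_bits > px->depth_bits) {  // reversed-Z style: larger value wins
        px->depth_bits = depth_bits;
        px->payload    = payload;
    }

    // release
    atomic_store_explicit(lock, 0u, memory_order_relaxed);
}
```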

P.S. It does seem like the algorithm has been designed to overcome the limitations of a typical forward rasteriser. Maybe things can be done differently in the TBDR model, I don't know. Their approach does sound very similar to Apple's tile shading to me. Maybe one can use lower-level GPU shared-memory synchronisation primitives to do this style of micropoly rasterisation on Apple architecture and just stream out the tiles once they are done. But my knowledge in this area is rudimentary at best.
 

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
Thanks! I just had a quick look at a Nanite talk (https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf) — they are using software rasterisation (on the GPU) to render a lot of small triangles quicker than the hardware can. They need atomics to correctly update the rasterised values, and they need more than 32 bits to fit the stuff in.

No idea whether this can be circumvented with smaller atomic operations. Maybe one can use smaller-granularity per-pixel locks (no idea how feasible that is, and it's likely to be slow anyway)? What is the performance when rendering pixel-sized triangles on Apple GPUs anyway? Maybe it's less of a performance hit than on other GPUs. No clue.
Yeah, it is weird; there has to be a "fallback", otherwise how can Lyra work on macOS?
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
The more I dig, the more I'm getting convinced that Apple dropped Nvidia because of Nvidia's hardware limitations. For example, Mantle's resource binding model is very similar to Metal's, but Vulkan and DX12 use a more primitive, less flexible model. Since Intel also supports Metal's model, the only conclusion is that Nvidia is the main reason.
Hm, not sure. On the other hand, that's on Apple for designing Metal. They made that choice, and if they deemed Nvidia an acceptable loss, then that's the way it is. They've had a lot of heat behind the scenes though, and given the extreme shortage of Nvidia GPUs in the past two years, they probably made the right choice.
The GPU has nothing to do with the CPU; they run a very different microarchitecture and a different ISA. And Apple GPUs are extremely streamlined and do not support many things that are a given in the desktop space. They have no 64-bit atomics at the hardware level and no synchronization primitives between threadgroups. It does make some things awkward to implement.
Sure, CPU and GPU are different things, but when the CPU as a base supports atomics and you "attach" the GPU to it, surely they must have thought about it. Others offer it as well, so for whatever reason they decided it's not necessary. Again, their choice, but when it's something developers requested and make good use of, then it might not have been a smart decision when building an open and widely adopted system. And so here we are with a "broken" UE5, and Epic doesn't really care about it.
Yeah, it is weird; there has to be a "fallback", otherwise how can Lyra work on macOS?
I still didn't get around to playing with it, but I'm also not really convinced Nanite is working as expected on macOS. Hopefully I can try this week.
 
  • Like
Reactions: Irishman

leman

macrumors Core
Oct 14, 2008
19,202
19,062
Sure, CPU and GPU are different things, but when the CPU as a base supports atomics and you "attach" the GPU to it, surely they must have thought about it. Others offer it as well, so for whatever reason they decided it's not necessary. Again, their choice, but when it's something developers requested and make good use of, then it might not have been a smart decision when building an open and widely adopted system.

Atomic 64-bit operations are a fairly recent addition to GPUs. I had a quick look, and they only seem to be exposed on newer AMD and Nvidia devices. In hindsight, yes, you are right, it appears the lack of these operations was an oversight, but it wouldn't have been easy to predict that this kind of fairly obscure compute shader feature would become the basis of 3D game rendering in the future :) Apple does support the common 32-bit atomics.

I still wonder whether this software rasterization technique can be implemented more efficiently leveraging Apple’s tile shaders.
 
  • Like
Reactions: Irishman

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
I still wonder whether this software rasterization technique can be implemented more efficiently leveraging Apple’s tile shaders.
Don't know. Even if it's not more efficient, maybe there's little overhead. And of course there are ways around it for CPUs/compilers; how easily such a thing could be done for Apple GPUs, I don't know. I guess no one outside of Apple's engineers knows, and they're not sharing.

Time will tell what happens, and in the meantime I'll just use Windows for 3D work/games.
 
  • Like
Reactions: Irishman