
MysticCow

macrumors 68000
May 27, 2013
1,561
1,740
Since Nvidia doesn't work on modern Mac hardware anymore, there's no need to run Metal on Nvidia GPUs. Nvidia isn't coming back, and AMD isn't here to stay; they're the past for Apple.

Well, you can only piss off the other children so many times before they realize that it isn’t worth it…and they stop playing with you.
 

Irishman

macrumors 68040
Nov 2, 2006
3,393
843
I just finished watching this video, and it seems like Lumen support for Mac is rolling out - at least in Lyra - at higher settings.


Now, do Nanite next!!
 

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
I just finished watching this video, and it seems like Lumen support for Mac is rolling out - at least in Lyra - at higher settings.


Now, do Nanite next!!
Someone said that Nanite requires 64-bit atomics (which Metal doesn't support), so we will have to wait until WWDC to see if that gets added by Apple.
 
  • Like
Reactions: Irishman

Irishman

macrumors 68040
Nov 2, 2006
3,393
843
Someone said that Nanite requires 64-bit atomics (which Metal doesn't support), so we will have to wait until WWDC to see if that gets added by Apple.

Do you recall where you learned that?

Something for Metal 3.0, I guess.
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
Something for Metal 3.0, I guess.
Hm... not sure if that's actually the case (I'd have to dig into the Nanite docs or request the info from Epic. Anyone got a link to the docs saying this?), but... old OpenCL comes to mind, which could not guarantee atomicity when memory was accessed by two devices. This makes me wonder how atomics are handled on an SoC that contains CPU and GPU but uses unified memory.

Also (thinking about past headaches), atomics can introduce other issues. I mean, it's nice that stuff is atomic, but latency hiding comes into play with GPU context switching. It can be a bit of a pain. Atomic types and functions in Metal are a subset of C++14 so far. One can also use threadgroup and SIMD synchronization (it's somewhere in the Metal specs). In other words, it's been a little cumbersome in the past. Time will tell where things go. I'll try to play around a little with UE5 next week or so.
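
For illustration, this is roughly what's available today: 32-bit C++14-style atomics (relaxed ordering only) plus threadgroup barriers. A minimal sketch; the buffer layout and names are made up:

```cpp
// Minimal MSL sketch (illustrative names): Metal's C++14-style 32-bit
// atomics plus a threadgroup barrier. Only memory_order_relaxed exists.
#include <metal_stdlib>
using namespace metal;

kernel void count_visible(device atomic_uint *total        [[buffer(0)]],
                          device const float *depths       [[buffer(1)]],
                          threadgroup atomic_uint *tg_count [[threadgroup(0)]],
                          uint tid [[thread_position_in_grid]],
                          uint lid [[thread_index_in_threadgroup]])
{
    if (lid == 0)
        atomic_store_explicit(tg_count, 0u, memory_order_relaxed);
    threadgroup_barrier(mem_flags::mem_threadgroup);

    if (depths[tid] < 1.0f)   // arbitrary "visible" test, purely illustrative
        atomic_fetch_add_explicit(tg_count, 1u, memory_order_relaxed);
    threadgroup_barrier(mem_flags::mem_threadgroup);

    if (lid == 0)             // one device-memory atomic per threadgroup
        atomic_fetch_add_explicit(total,
            atomic_load_explicit(tg_count, memory_order_relaxed),
            memory_order_relaxed);
}
```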
 
  • Like
Reactions: Irishman

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
Hm... not sure if that's actually the case (I'd have to dig into the Nanite docs or request the info from Epic. Anyone got a link to the docs saying this?), but... old OpenCL comes to mind, which could not guarantee atomicity when memory was accessed by two devices. This makes me wonder how atomics are handled on an SoC that contains CPU and GPU but uses unified memory.

Also (thinking about past headaches), atomics can introduce other issues. I mean, it's nice that stuff is atomic, but latency hiding comes into play with GPU context switching. It can be a bit of a pain. Atomic types and functions in Metal are a subset of C++14 so far. One can also use threadgroup and SIMD synchronization (it's somewhere in the Metal specs). In other words, it's been a little cumbersome in the past. Time will tell where things go. I'll try to play around a little with UE5 next week or so.
Here are the hardware reqs:

Nanite specifically calls for VK_KHR_shader_atomic_int64 or Shader Model 6.6 atomics.

Now, Epic doesn't say (at least publicly) which parts of those features they explicitly need, so YMMV.
 
  • Like
Reactions: GrumpyCoder

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
https://forum.beyond3d.com/posts/2248798/
Looks like Mac hardware (not just API-wise) doesn't support 64-bit atomics. Maybe M2 will? I wonder if that is why Lyra's performance is meh on macOS (or is it purely Lumen GI?).
Hmm... hard to say. M1 already supports ARMv8.4, so it should be there. Could Apple be foolish enough to break this? The GPU instruction set usually goes hand in hand with the microarchitecture of the CPU, and the driver + API provide the option to run code on the GPU.

I think (not 100% sure) this functionality came with ARMv8.1 back in 2014 (hence the Metal atomics being a subset of C++14), so if it's not there in GPU hardware on M1 today, I'd bet there's either a technical limitation preventing it, or it's the old "do it the Apple way" and they're not interested in it. I can't believe they didn't think about it when they started designing the SoCs. As for M2, I don't think that's going to happen; M3 seems more likely (if it happens at all).

That being said, I don't see why Epic couldn't do this in software. And I don't see why Apple couldn't add a workaround in LLVM either. Sure, with some overhead, but nothing that would be a deal killer (when you think about it for a second). The question is the same old one: why bother for a really small fraction of the market? Apple might not care, and Epic can use it out of the box on the "relevant" hardware the target market is using.


Edit: Just stumbled across this, put it on my reading list for next week: https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
 

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
Hmm... hard to say. M1 already supports ARMv8.4, so it should be there. Could Apple be foolish enough to break this? The GPU instruction set usually goes hand in hand with the microarchitecture of the CPU, and the driver + API provide the option to run code on the GPU.

I think (not 100% sure) this functionality came with ARMv8.1 back in 2014 (hence the Metal atomics being a subset of C++14), so if it's not there in GPU hardware on M1 today, I'd bet there's either a technical limitation preventing it, or it's the old "do it the Apple way" and they're not interested in it. I can't believe they didn't think about it when they started designing the SoCs. As for M2, I don't think that's going to happen; M3 seems more likely (if it happens at all).

That being said, I don't see why Epic couldn't do this in software. And I don't see why Apple couldn't add a workaround in LLVM either. Sure, with some overhead, but nothing that would be a deal killer (when you think about it for a second). The question is the same old one: why bother for a really small fraction of the market? Apple might not care, and Epic can use it out of the box on the "relevant" hardware the target market is using.
Well, it is weird, because Lyra seems to work and it does use Nanite. So there must be some sort of fallback.
 
  • Like
Reactions: Irishman

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
So no one has been able to compile the demo on macOS? These results above are kind of brutal for the hardware involved.
Small update: no, the demo won't work on macOS, as the engine version is reported as incompatible.

That being said, I have not tried running it under Windows, copying everything required over manually, and fiddling with the project files to make it think it's running on a proper Windows engine. Chances of that working are usually slim. I'm booking this one under "Windows only".
 
  • Sad
Reactions: diamond.g

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
Small update: no, the demo won't work on macOS, as the engine version is reported as incompatible.

That being said, I have not tried running it under Windows, copying everything required over manually, and fiddling with the project files to make it think it's running on a proper Windows engine. Chances of that working are usually slim. I'm booking this one under "Windows only".
I wonder what makes Lyra different, as it runs on macOS without Nanite (even though we know it uses Nanite per the videos Epic released talking about the demo).
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
I wonder what makes Lyra different, as it runs on macOS without Nanite (even though we know it uses Nanite per the videos Epic released talking about the demo).
No custom shaders, probably. The Lyra shaders are building and preparing as I type this. I don't think Nanite is working on macOS, but I'll try later/tomorrow to confirm.
 

leman

macrumors Core
Oct 14, 2008
19,202
19,062
Since Nvidia doesn't work on modern Mac hardware anymore, there's no need to run Metal on Nvidia GPUs. Nvidia isn't coming back, and AMD isn't here to stay; they're the past for Apple.

The more I dig, the more I'm getting convinced that Apple dropped Nvidia because of Nvidia's hardware limitations. For example, Mantle's resource binding model is very similar to Metal's, but Vulkan and DX12 use a more primitive, less flexible model. Since Intel also supports Metal's model, the only conclusion is that Nvidia is the main reason.

Hmm... hard to say. M1 already supports ARMv8.4, so it should be there. Could Apple be foolish enough to break this? The GPU instruction set usually goes hand in hand with the microarchitecture of the CPU, and the driver + API provide the option to run code on the GPU.

I think (not 100% sure) this functionality came with ARMv8.1 back in 2014 (hence the Metal atomics being a subset of C++14), so if it's not there in GPU hardware on M1 today, I'd bet there's either a technical limitation preventing it, or it's the old "do it the Apple way" and they're not interested in it. I can't believe they didn't think about it when they started designing the SoCs. As for M2, I don't think that's going to happen; M3 seems more likely (if it happens at all).


The GPU has nothing to do with the CPU; they run a very different microarchitecture and a different ISA. And Apple GPUs are extremely streamlined and do not support many things that are a given in the desktop space. They have no 64-bit atomics at the hardware level and no synchronization primitives between threadgroups. It does make some things awkward to implement.
 

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
The more I dig, the more I'm getting convinced that Apple dropped Nvidia because of Nvidia's hardware limitations. For example, Mantle's resource binding model is very similar to Metal's, but Vulkan and DX12 use a more primitive, less flexible model. Since Intel also supports Metal's model, the only conclusion is that Nvidia is the main reason.




The GPU has nothing to do with the CPU; they run a very different microarchitecture and a different ISA. And Apple GPUs are extremely streamlined and do not support many things that are a given in the desktop space. They have no 64-bit atomics at the hardware level and no synchronization primitives between threadgroups. It does make some things awkward to implement.
How would you change Nanite to not rely on 64-bit atomics?
 
  • Like
Reactions: Irishman

leman

macrumors Core
Oct 14, 2008
19,202
19,062
Ah

So is there an alternate way to pack Z to high bits and payload to low bits?

Thanks! I just had a quick look at a Nanite talk (https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf) — they are using software rasterisation (on the GPU) to render a lot of small triangles quicker than the hardware can. They need atomics to correctly update the rasterised values, and they need more than 32 bits to fit the stuff in.
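
To make the 64-bit requirement concrete, here's an MSL-flavoured sketch of the packing trick as I understand it from the talk (my own illustrative code, not Epic's; Metal does have 64-bit integer arithmetic via ulong, it's the atomic side that is missing):

```cpp
// Sketch of the visibility-buffer packing (illustrative, not Epic's code):
// depth goes in the high 32 bits, the payload (cluster/triangle ID) in the
// low 32 bits, so a single 64-bit unsigned atomic max doubles as a depth
// test plus write.
ulong pack_visibility(float depth, uint payload)
{
    // Positive IEEE floats compare in the same order as their bit patterns.
    uint depth_bits = as_type<uint>(depth);
    return (ulong(depth_bits) << 32) | ulong(payload);
}

// On Vulkan (VK_KHR_shader_atomic_int64) or DX12 (SM 6.6) the write is then
// one 64-bit atomic max on the visibility buffer; Metal has no equivalent.
```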

No idea whether this can be circumvented with smaller atomic operations. Maybe one can use smaller-granularity per-pixel locks (no idea how feasible that is, and it's likely to be slow anyway)? What is the performance when rendering pixel-sized triangles on Apple GPUs anyway? Maybe it's less of a performance hit than on other GPUs. No clue.
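
For what it's worth, a per-pixel lock with only 32-bit atomics would look roughly like this (hypothetical sketch; Metal atomics are relaxed-only and there is no forward-progress guarantee between threads, so this isn't properly fenced and a naive spinlock can livelock under SIMD divergence):

```cpp
// Hypothetical 32-bit-only emulation (illustrative names, not production
// code): guard each pixel's 64-bit record with a spinlock. Caveats: Metal
// atomics are relaxed-only, so the plain writes below are not properly
// fenced, and without a forward-progress guarantee the spin can livelock.
struct PixelRecord { uint depth_bits; uint payload; };

void write_pixel(device atomic_uint *lock,
                 device PixelRecord *px,
                 uint depth_bits, uint payload)
{
    // acquire: spin until we own this pixel
    while (atomic_exchange_explicit(lock, 1u, memory_order_relaxed) != 0u) {}

    if (depth_bits > px->depth_bits) {  // reversed-Z style: larger value wins
        px->depth_bits = depth_bits;
        px->payload    = payload;
    }

    // release
    atomic_store_explicit(lock, 0u, memory_order_relaxed);
}
```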

P.S. It does seem like the algorithm has been designed to overcome the limitations of a typical forward rasteriser. Maybe things can be done differently in the TBDR model, I don't know. Their approach does sound very similar to Apple's tile shading to me. Maybe one can use lower-level GPU shared-memory synchronisation primitives to do this style of micropoly rasterisation on Apple architecture and just stream out the tiles once they are done. But my knowledge in this area is rudimentary at best.
 

diamond.g

macrumors G4
Mar 20, 2007
11,115
2,445
OBX
Thanks! I just had a quick look at a Nanite talk (https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf) — they are using software rasterisation (on the GPU) to render a lot of small triangles quicker than the hardware can. They need atomics to correctly update the rasterised values, and they need more than 32 bits to fit the stuff in.

No idea whether this can be circumvented with smaller atomic operations. Maybe one can use smaller-granularity per-pixel locks (no idea how feasible that is, and it's likely to be slow anyway)? What is the performance when rendering pixel-sized triangles on Apple GPUs anyway? Maybe it's less of a performance hit than on other GPUs. No clue.
Yeah, it is weird; there has to be a "fallback", otherwise how can Lyra work on macOS?
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
The more I dig, the more I'm getting convinced that Apple dropped Nvidia because of Nvidia's hardware limitations. For example, Mantle's resource binding model is very similar to Metal's, but Vulkan and DX12 use a more primitive, less flexible model. Since Intel also supports Metal's model, the only conclusion is that Nvidia is the main reason.
Hm, not sure. On the other hand, that's on Apple for designing Metal. They made that choice, and if they deemed Nvidia an acceptable loss, then that's the way it is. They've had a lot of heat behind the scenes though, and given the extreme shortage of Nvidia GPUs in the past two years, they probably made the right choice.
The GPU has nothing to do with the CPU; they run a very different microarchitecture and a different ISA. And Apple GPUs are extremely streamlined and do not support many things that are a given in the desktop space. They have no 64-bit atomics at the hardware level and no synchronization primitives between threadgroups. It does make some things awkward to implement.
Sure, CPU and GPU are different things, but when the CPU as a base supports atomics and you "attach" the GPU to it, surely they must have thought about it. Others offer it as well, so for whatever reason they decided it's not necessary. Again, their choice, but when it's something developers requested and make good use of, then it might not have been a smart decision when building an open and widely adopted system. And so here we are with a "broken" UE5, and Epic doesn't really care about it.
Yeah, it is weird; there has to be a "fallback", otherwise how can Lyra work on macOS?
I still didn't get around to playing with it, but I'm also not really convinced Nanite is working as expected on macOS. Hopefully I can try this week.
 
  • Like
Reactions: Irishman

leman

macrumors Core
Oct 14, 2008
19,202
19,062
Sure, CPU and GPU are different things, but when the CPU as a base supports atomics and you "attach" the GPU to it, surely they must have thought about it. Others offer it as well, so for whatever reason they decided it's not necessary. Again, their choice, but when it's something developers requested and make good use of, then it might not have been a smart decision when building an open and widely adopted system.

Atomic 64-bit operations are a fairly recent addition to GPUs. I had a quick look, and they only seem to be exposed on newer AMD and Nvidia devices. In hindsight, yes, you are right, it appears the lack of these operations was an oversight, but it wouldn't have been easy to predict that this kind of fairly obscure compute shader feature would become the basis of 3D game rendering in the future :) Apple does support the common 32-bit atomics.

I still wonder whether this software rasterization technique can be implemented more efficiently leveraging Apple’s tile shaders.
 
  • Like
Reactions: Irishman

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
I still wonder whether this software rasterization technique can be implemented more efficiently leveraging Apple’s tile shaders.
Don't know. Even if it's not more efficient, maybe there's little overhead. And of course there are ways around it for CPUs/compilers; how easily such a thing could be done for Apple GPUs, I don't know. I guess no one outside of Apple's engineers knows, and they're not sharing.

Time will tell what happens, and in the meantime I'll just use Windows for 3D work/games.
 
  • Like
Reactions: Irishman