
Homy

macrumors 68020
Original poster
Jan 14, 2006
2,109
1,961
Sweden
Such surprising and welcome news at WWDC 2022! Apple really surprised me and others with their focus on Mac gaming. While time will tell how well the news will be received by devs, I'm optimistic about the future of gaming on Apple Silicon despite the usual doomsday verdicts from PC pessimists about Apple not caring about gaming or devs not caring about the Mac. Mac users are well aware of the limitations and benefits of their platform of choice, and neither delusional nor foolish enough to think that Steam's Mac user base of 2.55% will increase dramatically in the future, much less overtake Windows' 96.31%, but yesterday was a major step in the right direction.

Last year Apple showed how they had worked for a long time helping Larian Studios and 4A Games optimize Baldur's Gate 3 and Metro Exodus for Mac. This year they went all in, bringing Capcom and Resident Evil Village on stage while also announcing No Man's Sky, and Feral announced Grid Legends. Although I have limited gaming time and a backlog of Mac games for a decade or two to come, I still get very excited about such news, like seeing Capcom go on stage and praise the hell out of Apple Silicon:

“With its incredible performance, the Mac with Apple Silicon is now a great platform for games. And with support for the new Metal 3, our game screams on Apple Silicon, from the MacBook Air to the blazing-fast Mac Studio. Look how fluidly we're able to move through these hauntingly beautiful scenes. We're using high-quality textures, geometry, and complex shaders. And with MetalFX upscaling we're able to render amazing high-resolution visuals across the entire line, with the MacBook Air running effortlessly at 1080p and the Mac Studio delivering a breathtaking 4K experience. Previously this was only possible with high-performance consoles and gaming PCs, but we're now able to bring this to every Mac with Apple Silicon. We're simply astounded by the fidelity these new Macs enabled us to achieve. These new Macs handle whatever we throw at them effortlessly. That's incredible.”

Sure, they may be paid to praise Apple, but technically they were speaking the truth. Metal 3 has removed one of the biggest obstacles to Mac game development. Its functionality is now much closer to DX12: the introduction of mesh shaders, Fast Resource Loading and MetalFX upscaling makes it much easier to bring newer games to Mac. Notice that RE Village and Grid Legends are DX12-only, which used to be a big problem for porting such games. CodeWeavers wrote about this regarding CrossOver:

“Generally, games need access to at least one million shader resource views (SRVs). Access to that many SRVs requires resource binding at the Tier 2 level. Metal only supports about 500,000 resources per argument buffer, so Tier 2 resource binding isn't possible. Metal's limit of half a million is sufficient for Vulkan descriptor indexing, but not for D3D12. This limitation means CrossOver Mac can't support Tier 2 binding and therefore a lot of DirectX 12 games will not run.”

It seems that Tier 2 resource binding is now possible. Another exciting prospect is that, with the latest RE Engine now ported to Mac, future and past Capcom titles could easily be ported as well: Monster Hunter, other RE titles, Devil May Cry, Dead Rising and Street Fighter. RE Village will surely run much smoother than Metro Exodus, which used MoltenVK on top of Metal and still ran well even on an M1 with 8 GPU cores at medium settings in 1080p; RE Village will be far more optimized than that. I looked at the PC benchmarks and it seems to run well even on old low-end GPUs. As Capcom said, it will fly on Apple Silicon.

I found an article by Windows Central saying this is a good opportunity for Apple to turn the Mac into an Alienware rival.

"If Apple plays its cards right, MetalFX could turn the Mac into an attractive PC gaming alternative. The attractiveness of Apple's hardware proposition is there, but Apple will need to prove to gamers that it will continue to drive innovation and improvement where it matters for its platform to truly be appealing in the long run. In the PC gaming space, Microsoft and its partners have already shown that they are committed to gaming."

People say Apple must have paid Capcom, as if that would be a negative thing. At the same time there are lots of posts saying Apple should pay studios and devs to bring games to Mac. Well, in that case maybe Apple did just what you wanted. I think a good approach for Apple would be to pay for porting game engines to Mac, which would make it much easier to bring games over, because there are currently lots of game engines lacking Mac support: Anvil Engine, Blizzard Engine, Creation Engine, CTG Engine, Essence Engine 5, Frostbite, Galactic Engine, Havok, Luminous Engine, Platinum Engine and Snowdrop Engine.

Others say Macs suck for gaming due to the lack of HW ray tracing. Well, RT is currently only a thing on the most expensive GPUs and not for the majority even on the PC side, considering the most popular cards on Steam are the GTX 1060 and 1650. It will take another 2-3 years before it becomes common, and a lot can happen in just two years; does Apple Silicon ring a bell? Feral couldn't port Deus Ex: Mankind Divided to Mac because of Metal 1 and the port was delayed; it became possible when Apple released Metal 2. The same thing is happening now that Metal 3 has been announced along with new AAA games, making it possible to port DX12 games to Mac. Metal also already has SW ray tracing, and Apple may very well add HW RT soon.

What do you Mac/PC devs think about the news at WWDC? I'm not talking about small one-man shops without the resources to support the Mac, or the usual PC pessimists despising the Mac and saying there's no money in it despite all the AAA games and franchises ported to Mac now and before. I'm looking for constructive and creative thoughts on the news.

Here are this year’s sessions about Metal: https://developer.apple.com/videos/graphics-games
Mesh shaders: https://developer.apple.com/videos/play/wwdc2022/10162/
Fast Resource Loading API: https://developer.apple.com/videos/play/wwdc2022/10104/
MetalFX upscaling: https://developer.apple.com/videos/play/wwdc2022/10103/

Even ray tracing performance has been updated: https://developer.apple.com/videos/play/wwdc2022/10105/

UPDATE!

It appears that the Tier 2 binding limit of 500,000 resources is a misconception and Metal never had such limitations according to Apple engineers:

"This limit refers to the number of separate buffers a shader or kernel can access in a single draw or dispatch. 500,000 is basically code for "more than you will ever need".

I expect that if your kernel/shader accesses more than 500,000 separate allocations in a single draw or dispatch, you will likely hit some pretty significant performance limitations. Having millions of buffers in a single argument buffer shouldn't have any real performance effects, accessing so many buffers would though."
 
Last edited:

Imhotep397

macrumors 6502
Jul 22, 2002
350
37
It sounds really good, but it also feels like another bait and switch from Apple. Apple always takes half steps with anything 3D, throwing a little money around to promote new hardware and later abandoning it all (Infinity Blade, Mari (Foundry), Fortnite, Elite Dangerous etc.).

Have to wait and see.
 
Last edited:
  • Like
Reactions: Flint Ironstag

Homy

macrumors 68020
Original poster
Jan 14, 2006
2,109
1,961
Sweden
It sounds really good, but it also feels like another bait and switch from Apple. Apple always takes half steps with anything 3D, throwing a little money around to promote new hardware and later abandoning it all (Infinity Blade, Mari (Foundry), Fortnite, Elite Dangerous etc.).

Have to wait and see.

Yes, we have to wait and see where this leads, but it seems that now, with total control over their own CPU and GPU, they can do things differently. First they transitioned the hardware, and now, two years later, they're modernizing their graphics API to be on par with DX12 and Vulkan and bringing in game devs to showcase it.

Like Epic, they're also making the games exclusive to the MAS. Feral has confirmed that Grid Legends won't come to Steam. No Man's Sky will be on iPad/Mac, so I think it will be MAS-exclusive too. I won't be surprised if RE Village is exclusive as well. Maybe they're trying to promote Apple Arcade by bringing some big titles to it. We'll see.
 

Homy

macrumors 68020
Original poster
Jan 14, 2006
2,109
1,961
Sweden
Capcom mentioned the Mac again in their E3 showcase. They said they're bringing the main story of Village to Mac, but at the same time they'll be releasing RE Village Gold Edition with a new DLC called "Shadows of Rose". I wouldn't be surprised if they released the DLC for Mac too, since they're already porting the main game.

The release date is interesting too: Oct 28. macOS Monterey was released on Monday, Oct 25, 2021. This year, Monday falls on Oct 24 and Friday on the 28th. My guess is Apple will release Ventura on Monday the 24th and Capcom will release Village + "Shadows of Rose" for Mac on the 28th.
 

diamond.g

macrumors G4
Mar 20, 2007
11,120
2,449
OBX
Yes, we have to wait and see where this leads, but it seems that now, with total control over their own CPU and GPU, they can do things differently. First they transitioned the hardware, and now, two years later, they're modernizing their graphics API to be on par with DX12 and Vulkan and bringing in game devs to showcase it.

Like Epic, they're also making the games exclusive to the MAS. Feral has confirmed that Grid Legends won't come to Steam. No Man's Sky will be on iPad/Mac, so I think it will be MAS-exclusive too. I won't be surprised if RE Village is exclusive as well. Maybe they're trying to promote Apple Arcade by bringing some big titles to it. We'll see.
Does Apple Arcade have any Mac-only games on it (or, for that matter, iPad-only games)?
 

Homy

macrumors 68020
Original poster
Jan 14, 2006
2,109
1,961
Sweden
Does Apple Arcade have any Mac-only games on it (or, for that matter, iPad-only games)?
Don't know. I was about to say that Grid Legends being MAS-exclusive would be a first, but then I remembered that Borderlands 1 by Feral was also MAS-exclusive. The downside is that such titles may lack some features or have multiplayer limitations; SHiFT codes didn't work in BL on the MAS.
 
  • Like
Reactions: Irishman

diamond.g

macrumors G4
Mar 20, 2007
11,120
2,449
OBX
Don't know. I was about to say that Grid Legends being MAS-exclusive would be a first, but then I remembered that Borderlands 1 by Feral was also MAS-exclusive. The downside is that such titles may lack some features or have multiplayer limitations; SHiFT codes didn't work in BL on the MAS.
So if NMS is MAS only I wonder how barren the multiplayer aspect will be.
 

Homy

macrumors 68020
Original poster
Jan 14, 2006
2,109
1,961
Sweden
So if NMS is MAS only I wonder how barren the multiplayer aspect will be.
Well, I'm talking about past experiences. Things can change, so we'll have to see. I think the limitation before was that MAS players could only play with other MAS players.
 

Feyl

Cancelled
Aug 24, 2013
964
1,951
I'm not sure if I'm excited about Apple's efforts in gaming, but damn… I sure didn't like those frame rate drops during the presentation.
 

Homy

macrumors 68020
Original poster
Jan 14, 2006
2,109
1,961
Sweden
I'm not sure if I'm excited about Apple's efforts in gaming, but damn… I sure didn't like those frame rate drops during the presentation.
Yeah, it was like the Tomb Raider demo at WWDC 2020, which also turned out to have good performance in the end. I agree, though, that they should have had a better demo at such a big event.
 

Homy

macrumors 68020
Original poster
Jan 14, 2006
2,109
1,961
Sweden
Here are some explanations of the new features, from the MoltenVK GitHub:

What's new in Metal 3


MetalFX Upscaling

Render complex scenes in less time per frame with high-performance upscaling and anti-aliasing. Choose a combination of temporal or spatial algorithms to help boost performance.
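
To give a rough idea of how simple the API is, here's an untested Swift sketch of the spatial upscaler (my own outline based on the session, with placeholder texture names, not Apple's sample code):

```swift
import Metal
import MetalFX

// Untested sketch: render at 1080p and let MetalFX spatially upscale to 4K.
// Create the scaler once, up front, not per frame.
func makeScaler(device: MTLDevice) -> MTLFXSpatialScaler? {
    let desc = MTLFXSpatialScalerDescriptor()
    desc.inputWidth   = 1920
    desc.inputHeight  = 1080
    desc.outputWidth  = 3840
    desc.outputHeight = 2160
    desc.colorTextureFormat  = .rgba16Float
    desc.outputTextureFormat = .rgba16Float
    desc.colorProcessingMode = .perceptual   // input is gamma-encoded color
    return desc.makeSpatialScaler(device: device)
}

// Per frame, after the main render pass (placeholder texture names):
func upscale(scaler: MTLFXSpatialScaler,
             rendered1080p: MTLTexture,
             output4K: MTLTexture,
             commandBuffer: MTLCommandBuffer) {
    scaler.colorTexture  = rendered1080p
    scaler.outputTexture = output4K
    scaler.encode(commandBuffer: commandBuffer)  // GPU performs the upscale
}
```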


Fast resource loading
Optimally stream asset data to Metal textures and buffers directly from storage using asynchronous I/O.
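
Here's a hedged Swift sketch of the I/O command queue flow from the session (placeholder file URL and sizes; error handling and async overlap omitted):

```swift
import Foundation
import Metal

// Untested outline of Metal 3 fast resource loading: stream bytes from a
// file straight into a MTLBuffer via an I/O command queue.
func loadAsset(device: MTLDevice, assetURL: URL, byteCount: Int) throws -> MTLBuffer {
    let queueDesc = MTLIOCommandQueueDescriptor()
    queueDesc.type = .concurrent
    let ioQueue = try device.makeIOCommandQueue(descriptor: queueDesc)

    let fileHandle = try device.makeIOHandle(url: assetURL)
    let buffer = device.makeBuffer(length: byteCount, options: .storageModeShared)!

    let ioCommands = ioQueue.makeCommandBuffer()
    ioCommands.load(buffer, offset: 0, size: byteCount,
                    sourceHandle: fileHandle, sourceHandleOffset: 0)
    ioCommands.commit()
    ioCommands.waitUntilCompleted()   // real code would overlap this with other work
    return buffer
}
```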


Offline shader compilation
The compiler can generate GPU binaries at project build time to eliminate in-app shader compilation, helping games improve performance and reduce load times.
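
The build-time part happens in Xcode and the Metal toolchain, but the related runtime API is binary archives: harvest compiled GPU binaries once, serialize them, and load them later so pipeline creation becomes a lookup. A rough Swift sketch (my own outline; the pipeline descriptor is assumed to be configured elsewhere):

```swift
import Foundation
import Metal

// Sketch: collect GPU binaries for a pipeline into an archive and save it.
func buildArchive(device: MTLDevice,
                  pipelineDesc: MTLRenderPipelineDescriptor,
                  archiveURL: URL) throws {
    let archiveDesc = MTLBinaryArchiveDescriptor()   // url == nil: start empty
    let archive = try device.makeBinaryArchive(descriptor: archiveDesc)
    try archive.addRenderPipelineFunctions(descriptor: pipelineDesc)
    try archive.serialize(to: archiveURL)            // write GPU binaries to disk
}

// On later launches, point the pipeline at the archive so no in-app
// compilation is needed:
//   archiveDesc.url = archiveURL
//   pipelineDesc.binaryArchives = [try device.makeBinaryArchive(descriptor: archiveDesc)]
```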


Mesh shaders
This new geometry pipeline replaces vertex shaders with two new shader stages — object and mesh — that enable more flexible culling and LOD selection, and more efficient geometry shading and generation.
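
Host-side, the new pipeline looks roughly like this in Swift (untested sketch; the shader function names are placeholders for functions in your .metal source):

```swift
import Metal

// Sketch of creating a Metal 3 mesh pipeline: object + mesh + fragment
// stages replace the classic vertex stage.
func makeMeshPipeline(device: MTLDevice, library: MTLLibrary) throws -> MTLRenderPipelineState {
    let desc = MTLMeshRenderPipelineDescriptor()
    desc.objectFunction   = library.makeFunction(name: "objectMain")   // culling / LOD selection
    desc.meshFunction     = library.makeFunction(name: "meshMain")     // geometry generation
    desc.fragmentFunction = library.makeFunction(name: "fragmentMain")
    desc.colorAttachments[0].pixelFormat = .bgra8Unorm
    let (pipeline, _) = try device.makeRenderPipelineState(descriptor: desc, options: [])
    return pipeline
}

// Drawing then dispatches threadgroups instead of vertices:
//   encoder.drawMeshThreadgroups(MTLSize(width: meshletCount, height: 1, depth: 1),
//                                threadsPerObjectThreadgroup: MTLSize(width: 32, height: 1, depth: 1),
//                                threadsPerMeshThreadgroup:  MTLSize(width: 32, height: 1, depth: 1))
```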


Metal backend for PyTorch
The new Metal backend in PyTorch version 1.12 enables high-performance, GPU-accelerated training using MPS Graph and the Metal Performance Shaders primitives.


New Ray Tracing features
The latest advancements in Metal Ray Tracing mean less GPU time is spent building acceleration structures, work like culling can move to the GPU to reduce CPU overhead, and both intersection and shading can be optimized with direct access to primitive data.
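
For example, building a primitive acceleration structure, the step the session says now costs less GPU time, looks roughly like this (untested Swift sketch with assumed inputs; error handling omitted):

```swift
import Metal

// Sketch: build a triangle acceleration structure for ray tracing.
func buildAccelerationStructure(device: MTLDevice,
                                queue: MTLCommandQueue,
                                vertexBuffer: MTLBuffer,
                                triangleCount: Int) -> MTLAccelerationStructure {
    let geometry = MTLAccelerationStructureTriangleGeometryDescriptor()
    geometry.vertexBuffer = vertexBuffer
    geometry.vertexStride = MemoryLayout<SIMD3<Float>>.stride
    geometry.triangleCount = triangleCount

    let desc = MTLPrimitiveAccelerationStructureDescriptor()
    desc.geometryDescriptors = [geometry]

    // Query sizes, allocate the structure plus scratch space, then build on GPU.
    let sizes = device.accelerationStructureSizes(descriptor: desc)
    let accel = device.makeAccelerationStructure(size: sizes.accelerationStructureSize)!
    let scratch = device.makeBuffer(length: sizes.buildScratchBufferSize,
                                    options: .storageModePrivate)!

    let commands = queue.makeCommandBuffer()!
    let encoder = commands.makeAccelerationStructureCommandEncoder()!
    encoder.build(accelerationStructure: accel, descriptor: desc,
                  scratchBuffer: scratch, scratchBufferOffset: 0)
    encoder.endEncoding()
    commands.commit()
    commands.waitUntilCompleted()
    return accel
}
```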

It also appears that the Tier 2 binding limit of 500,000 resources is a misconception and Metal never had such limitations according to Apple engineers:

"This limit refers to the number of separate buffers a shader or kernel can access in a single draw or dispatch. 500,000 is basically code for "more than you will ever need".

I expect that if your kernel/shader accesses more than 500,000 separate allocations in a single draw or dispatch, you will likely hit some pretty significant performance limitations. Having millions of buffers in a single argument buffer shouldn't have any real performance effects, accessing so many buffers would though."


A user on GitHub writes:

"It has been pointed out that Metal Tier 2 limits of 500,000 buffers and textures is lower than some of the advanced versions of Vulkan and DirectX 12, at least on certain hardware. Upon clarification with Apple engineers, it seems that this is a poor wording in the docs. This limit is the number of resources you can realistically use in a single shader while still expecting reasonable performance. This is not the limit of binding points/descriptors you can have. Since Metal has no concept of special descriptor pools — resource descriptors reside in regular GPU buffer memory, it seems that the upper limit on the number of descriptors is simply the amount of available buffer memory. I had no issues creating a test application that declares, binds and actually uses up to five million resource descriptors."
 
Last edited:

Homy

macrumors 68020
Original poster
Jan 14, 2006
2,109
1,961
Sweden
Metal 3 now has almost all of Vulkan's features:

Vulkan Feature → Metal 3 Capability

VK_EXT_shader_atomic_float / VK_EXT_shader_atomic_float2 → MSL atomic<float> (but no support for half or double)
VK_KHR_buffer_device_address → MTLBuffer.gpuAddress
VK_NV_mesh_shader → mesh shaders
Geometry shaders → mesh shaders
Tessellation shaders (potentially more efficient than Vulkan's) → mesh shaders
VK_KHR_acceleration_structure → ray tracing (previously available, now enhanced)
VK_KHR_ray_tracing_maintenance1 → ray tracing (previously available, now enhanced)
VK_KHR_ray_tracing_pipeline → ray tracing (previously available, now enhanced)
 

diamond.g

macrumors G4
Mar 20, 2007
11,120
2,449
OBX
Well, I'm talking about past experiences. Things can change, so we'll have to see. I think the limitation before was that MAS players could only play with other MAS players.
Is that not a thing anymore? Apple allows cross platform play on MAS games?
 

Irishman

macrumors 68040
Nov 2, 2006
3,396
844
So if NMS is MAS only I wonder how barren the multiplayer aspect will be.

It's so barren that I decided long ago not to buy any game with a multiplayer component from the MAS if it's on offer from any other store.
 
  • Like
Reactions: Homy

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
UPDATE!

It appears that the Tier 2 binding limit of 500,000 resources is a misconception and Metal never had such limitations according to Apple engineers:

"This limit refers to the number of separate buffers a shader or kernel can access in a single draw or dispatch. 500,000 is basically code for "more than you will ever need".

I expect that if your kernel/shader accesses more than 500,000 separate allocations in a single draw or dispatch, you will likely hit some pretty significant performance limitations. Having millions of buffers in a single argument buffer shouldn't have any real performance effects, accessing so many buffers would though."
It's not a misconception, it's their way of saying "we can't do it, so no one else must do it". Metal still handles resources in a completely different way, and the limits are only relevant when running DX12 games via CrossOver and the like. Otherwise the whole thing must be rewritten anyway if games are optimized for DX12. Nothing has really changed in that regard; it has always been possible to port games to Metal if one is willing to invest time and resources (no pun intended).

Here's the DX12 resource flow control:
Focusing just on root signatures, root descriptors, root constants, descriptor tables, and descriptor heaps, the flow of rendering logic for an app should be similar to the following:

  • Create one or more root signature objects – one for every different binding configuration an application needs.
  • Create shaders and pipeline state with the root signature objects they will be used with.
  • Create one (or, if necessary, more) descriptor heaps that will contain all the SRV, UAV, and CBV descriptors for each frame of rendering.
  • Initialize the descriptor heap(s) with descriptors where possible for sets of descriptors that will be reused across many frames.
  • For each frame to be rendered:
    • For each command list:
      • Set the current root signature to use (and change if needed during rendering – which is rarely required).
      • Update some root signature’s constants and/or root signature descriptors for the new view (such as world/view projections).
      • For each item to draw:
        • Define any new descriptors in descriptor heaps as needed for per-object rendering. For shader-visible descriptor heaps, the app must make sure to use descriptor heap space that isn’t already being referenced by rendering that could be in flight – for example, linearly allocating space through the descriptor heap during rendering.
        • Update the root signature with pointers to the required regions of the descriptor heaps. For example, one descriptor table might point to some static (unchanging) descriptors initialized earlier, while another descriptor table might point to some dynamic descriptors configured for the current rendering.
        • Update some root signature’s constants and/or root signature descriptors for per-item rendering.
        • Set the pipeline state for the item to draw (only if change needed), compatible with the currently bound root signature.
        • Draw
      • Repeat (next item)
    • Repeat (next command list)
    • Strictly when the GPU has finished with any memory that will no longer be used, it can be released. Descriptors' references to it do not need to be deleted if additional rendering that uses those descriptors is not submitted. So, subsequent rendering can point to other areas in descriptor heaps, or stale descriptors can be overwritten with valid descriptors to reuse the descriptor heap space.
  • Repeat (next frame)
Note that other descriptor types, render target views (RTVs), depth stencil views (DSVs), index buffer views (IBVs), vertex buffer views (VBVs), and stream output views (SOVs), are managed differently. The driver handles the versioning of the set of descriptors bound for each draw during recording of the command list (similar to how the root signature bindings are versioned by the hardware/driver). This is different from the contents of shader-visible descriptor heaps, for which the application must manually allocate through the heap as it references different descriptors between draws. Versioning of heap content that is shader-visible is left to the application because it allows applications to do things like reuse descriptors that don’t change, or use large static sets of descriptors and use shader indexing (such as by material ID) to select descriptors to use from the descriptor heap, or use combinations of techniques for different sets of descriptors. The hardware isn’t equipped to handle this type of flexibility for the other descriptor types (RTV, DSV, IBV, VBV, SOV).
 

leman

macrumors Core
Oct 14, 2008
19,210
19,096
A user on GitHub writes:

"It has been pointed out that Metal Tier 2 limits of 500,000 buffers and textures is lower than some of the advanced versions of Vulkan and DirectX 12, at least on certain hardware. Upon clarification with Apple engineers, it seems that this is a poor wording in the docs. This limit is the number of resources you can realistically use in a single shader while still expecting reasonable performance. This is not the limit of binding points/descriptors you can have. Since Metal has no concept of special descriptor pools — resource descriptors reside in regular GPU buffer memory, it seems that the upper limit on the number of descriptors is simply the amount of available buffer memory. I had no issues creating a test application that declares, binds and actually uses up to five million resource descriptors."

If you have any questions about this I will be happy to try to answer them (it's my post you are quoting on GitHub).

It's not a misconception, it's their way of saying "we can't do it, so no one else must do it". Metal still handles resources in a completely different way, and the limits are only relevant when running DX12 games via CrossOver and the like.

I do think it's a misconception because people have been applying concepts from DX12 to Metal where they don't exist. Metal has no descriptor heap limits because Metal has no descriptor heaps. Metal Argument Buffers Tier 2 target truly bindless hardware, where all bindings are just pointers to some data. So you scramble together some pointers and some other data, put them in GPU-readable memory, and the GPU does whatever it needs to do. DX12, on the other hand, targets hardware where limitations can exist on what can reside where, hence the restriction that you can't mix sampler descriptors, constant data and other descriptors in the same descriptor heap (Metal knows no such limitation). DX12 also does not have indirection (descriptor heaps cannot contain pointers to other descriptor heaps) and does not allow descriptors to be created by GPU pipelines. Not to my knowledge at least; maybe I missed something in the docs.

Interestingly enough, I (entirely subjectively and maybe wrongfully) believe that many of these limitations exist because of Nvidia. We know that Intel and AMD don't have them because both support Apple's fully bindless model, and AMD had Mantle years ago, whose binding model is very similar to Metal 3's.

Otherwise the whole thing must be rewritten anyway if games are optimized for DX12. Nothing has really changed in that regard; it has always been possible to port games to Metal if one is willing to invest time and resources (no pun intended).

Here's the DX12 resource flow control:

As far as I understand it, Metal 3 resource binding is a strict superset of DX12 resource binding. There are fewer limitations in Metal, and anything you can do with DX12 you can do with Metal, just with way less hassle. So no, you don't have to redesign your DX12 resource-handling algorithms to port the game to Metal 3. Descriptor heaps just become data buffers, the root signature too is a data buffer, setting descriptors is writing a pointer to a buffer, and copying descriptors is copying data between buffers. All the relevant DX APIs can simply be implemented as macros around pointer manipulation and memcpy.
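
To illustrate (toy Swift code with made-up names, not how CrossOver or anyone actually implements it):

```swift
import Foundation
import Metal

// Toy model of the mapping described above: a DX12-style "descriptor heap"
// is just a MTLBuffer of 64-bit entries.
struct DescriptorHeap {
    let buffer: MTLBuffer

    // D3D12 CreateShaderResourceView on a buffer ~ writing one GPU address.
    func setBuffer(_ b: MTLBuffer, slot: Int) {
        buffer.contents().storeBytes(of: b.gpuAddress,
                                     toByteOffset: slot * 8, as: UInt64.self)
    }
    // Textures bind through their 64-bit resource ID (Metal 3).
    func setTexture(_ t: MTLTexture, slot: Int) {
        buffer.contents().storeBytes(of: t.gpuResourceID,
                                     toByteOffset: slot * 8, as: MTLResourceID.self)
    }
    // D3D12 CopyDescriptors ~ a memcpy between heap buffers.
    func copy(from src: DescriptorHeap, srcSlot: Int, dstSlot: Int, count: Int) {
        memcpy(buffer.contents() + dstSlot * 8,
               src.buffer.contents() + srcSlot * 8, count * 8)
    }
}
// Setting a root descriptor table is then a single pointer write into the
// "root signature" buffer, and the shader chases pointers from there.
```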

This is a stark contrast to earlier Metal versions, where you had to use a proxy API to set up bindings, which did not map well to the DX12 model. To port a DX12 app to Metal 2 you would indeed need to rethink or at least heavily modify some of the code, as you'd need to construct and use the appropriate argument encoders.
 
Last edited:

Homy

macrumors 68020
Original poster
Jan 14, 2006
2,109
1,961
Sweden
If you have any questions about this I will be happy to try to answer them (it's my post you are quoting on GitHub).



I do think it's a misconception because people have been applying concepts from DX12 to Metal where they don't exist. Metal has no descriptor heap limits because Metal has no descriptor heaps. Metal Argument Buffers Tier 2 target truly bindless hardware, where all bindings are just pointers to some data. So you scramble together some pointers and some other data, put them in GPU-readable memory, and the GPU does whatever it needs to do. DX12, on the other hand, targets hardware where limitations can exist on what can reside where, hence the restriction that you can't mix sampler descriptors, constant data and other descriptors in the same descriptor heap (Metal knows no such limitation). DX12 also does not have indirection (descriptor heaps cannot contain pointers to other descriptor heaps) and does not allow descriptors to be created by GPU pipelines. Not to my knowledge at least; maybe I missed something in the docs.

Interestingly enough, I (entirely subjectively and maybe wrongfully) believe that many of these limitations exist because of Nvidia. We know that Intel and AMD don't have them because both support Apple's fully bindless model, and AMD had Mantle years ago, whose binding model is very similar to Metal 3's.



As far as I understand it, Metal 3 resource binding is a strict superset of DX12 resource binding. There are fewer limitations in Metal, and anything you can do with DX12 you can do with Metal, just with way less hassle. So no, you don't have to redesign your DX12 resource-handling algorithms to port the game to Metal 3. Descriptor heaps just become data buffers, the root signature too is a data buffer, setting descriptors is writing a pointer to a buffer, and copying descriptors is copying data between buffers. All the relevant DX APIs can simply be implemented as macros around pointer manipulation and memcpy.

This is a stark contrast to earlier Metal versions, where you had to use a proxy API to set up bindings, which did not map well to the DX12 model. To port a DX12 app to Metal 2 you would indeed need to rethink or at least heavily modify some of the code, as you'd need to construct and use the appropriate argument encoders.

Glad you joined in; it's good to see a programmer who knows what he's talking about and is positive about the Metal 3 news and its possibilities. I was actually thinking of quoting you from other threads on the subject, but now that you're here I see you've already added valuable information. Thanks! :)

One question that comes to mind: if Metal never had a Tier 2 limit of 500,000 buffers and textures, why did CodeWeavers say it did and that it was a problem for DX12 support in CrossOver? Never mind, I think you answered it in your previous post: "To port a DX12 app to Metal 2 you would indeed need to rethink or at least heavily modify some of the code, as you'd need to construct and use the appropriate argument encoders."

It sounds like, with Metal 3, CodeWeavers no longer has to do as much work to implement DX12 support in CrossOver.
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,210
19,096
One question that comes to mind: if Metal never had a Tier 2 limit of 500,000 buffers and textures, why did CodeWeavers say it did and that it was a problem for DX12 support in CrossOver?

I agree with the previous poster that whoever made the initial claim about a 500,000-resource limit probably didn't try using the API. Metal 3 does not change any limits, just how you encode the data you want the GPU to use. My experiments (binding and using five million resources in a single compute shader thread) were done on Monterey with Metal 2, and it worked just fine.

Although, to be fair, Apple's poor documentation is as much to blame for this confusion as anything else. They should have written that it's the amount of usable resources, not just bindable resources. If you read that part of the documentation it's not clear at all, especially since it directly follows the limitations of Tier 1 argument buffers, which do have a finite number of binding slots (since Tier 1 hardware does not support bindless).
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
I do think it's a misconception because people have been applying concepts from DX12 to Metal where they don't exist.
Well, that's the point, isn't it?
Interestingly enough, I (entirely subjectively and maybe wrongfully) believe that many of these limitations exist because of Nvidia. We know that Intel and AMD don't have them because both support Apple's fully bindless model, and AMD had Mantle years ago, whose binding model is very similar to Metal 3's.
They all have their own way of doing it: Nvidia, AMD, Microsoft, Apple, etc. Look at the reverse-engineered graphics driver for the M1 that Alyssa Rosenzweig did: do it the Apple way or it won't work. Look at what George Hotz did with the Neural Engine, same thing. I wouldn't blame anyone here; they all do what they think is right, and Nvidia is by far the dominant power on the market when it comes to GPUs. I had a conference with some Nvidia guys last week; they don't care about anyone else, and rightfully so. They're in a position where they can easily afford it.
As far as I understand it, Metal 3 resource binding is a strict superset of DX12 resource binding. There are fewer limitations in Metal, and anything you can do with DX12 you can do with Metal, just with way less hassle.
Not sure about the "less hassle", but Metal has always been able to do anything when it comes to end results, just by taking a different way.
So no, you don't have to redesign your DX12 resource-handling algorithms to port the game to Metal 3.
Well, yes you do. See below.

I agree with the previous poster that whoever made the initial claim about a 500,000-resource limit probably didn't try using the API. Metal 3 does not change any limits, just how you encode the data you want the GPU to use. My experiments (binding and using five million resources in a single compute shader thread) were done on Monterey with Metal 2, and it worked just fine.
So CodeWeavers and Parallels didn't try to use the API? All the developers who ported things didn't either? I mean, it's about more than just the 500k resource limit, but when you tried this, did you try it with more than a single compute shader and more than a single frame?
 
  • Like
Reactions: Irishman

leman

macrumors Core
Oct 14, 2008
19,210
19,096
Not sure about the "less hassle", but Metal has always been able to do anything when it comes to end results, just by taking a different way.

What we are talking about here is not whether something can be done, but whether abstractions can be carried over without much extra work. That's the entire point. You don't need to take a different way with Metal 3, as it can seamlessly support the same binding mechanisms as DX12.

And sure, I think Metal is less hassle now. You just allocate a buffer and write some pointers into it. No need to mess with allocating and managing heaps and descriptors. Not to mention that Metal's API actually seems lower-overhead this time.

Well, yes you do. See below.

I think you meant to add something here?

So CodeWeavers and Parallels didn't try to use the API? All the developers who ported things didn't either?

I do not know what the folks at these companies tried or didn't try, and I have little doubt that they are much more competent than me. Let's not forget that the goal here is emulating DX12 on top of Metal, which is a non-trivial endeavour to say the least. There were probably too many practical hurdles before Metal 3 anyway, with no obvious way of mapping the DX12 descriptors to Metal's argument encoder mechanism, so the entire thing was probably not feasible to begin with. I have no idea who started the story of 500,000 resources being a limitation or what its background is, but it's fairly safe to say that this limitation was not what that person thought it was. I mean, we have a semi-official statement by Apple on the matter.

I mean, it's about more than just 500k resource limit, but when you tried this have you tried it for more than a single compute shader and more than a single frame?

I tested the worst possible case, one that no application will actually hit. Why, do you have games accessing a million textures per frame? How is that supposed to work? The point of contention is that games use DX12 descriptor heaps in a sparse manner (probably because the API is so god-awful). It's much easier to just allocate the biggest heap you can, sparsely set some descriptor ranges as you need them, and dynamically index them in the shader. This was never about using a million resources. This was about holding an array of a million pointers to resources, most of which (the pointers) are either NULL or invalid. Well, nobody prevents you from allocating an 8 MB buffer in Metal 3 and writing whatever texture/buffer pointers to it you please.

Again, it is entirely possible that I misunderstand the scale of the problem. I am just a linguist who dabbles in GPU programming, after all, not a professional game developer :) If there is an issue with my understanding of things, I'd welcome your comments as a learning opportunity.
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,072
2,650
Let's not forget that the goal here is emulating DX12 on top of Metal, which is a non-trivial endeavour to say the least. There were probably too many practical hurdles before Metal 3 anyway, with no obvious way of mapping the DX12 descriptors to Metal's argument encoder mechanism, so the entire thing was probably not feasible to begin with.
Sure, emulation is different, but the problem is the same. I've talked to plenty of people involved in AAA game development, all saying the same thing. And my experience isn't really much different, though I focus on other things as well. DX vs. OpenGL vs. Vulkan vs. Metal isn't really something I have much interest in; I use whatever gets the job done.
I mean, we have a semi-official statement by Apple on the matter.
But we really don't.
Why, do you have games accessing a million textures per frame?
And therein lies the problem: look at what the Apple guy said. He's speaking about per frame. Look at what the DX12 documentation says; it isn't strictly per frame or per object.
 

leman

macrumors Core
Oct 14, 2008
19,210
19,096
Sure, emulation is different, but the problem is the same. I've talked to plenty of people involved in AAA game development, all saying the same thing.

Saying the same thing about what?

And my experience isn't really much different, though I focus on other things as well. DX vs. OpenGL vs. Vulkan vs. Metal isn't really something I have much interest in; I use whatever gets the job done.

Sure. But understanding the principles as well as the differences behind the APIs is crucial if you are designing cross-platform software. I would go about things very differently if I knew I only needed to target Metal, for example.

But we really don't.

We have an Apple engineer making a statement on an official channel that the "500,000 limit" does not refer to the "size of the descriptor heap" as people commonly thought. Isn't that clear enough? What else is lacking, in your opinion?

And therein lies the problem: look at what the Apple guy said. He's speaking about per frame. Look at what the DX12 documentation says; it isn't strictly per frame or per object.

I have to admit I am a bit confused about this. What's the problem? Let me repeat what I understand, and then maybe we can see where the point of contention lies. DX12 documentation talks about the size of the descriptor heap, i.e. the number of potential bindings that can exist in your shader pipeline. It gives you the maximal structural limits you can work with. Metal does not have limits of this kind because Metal does not have descriptor heaps; the number of potential bindings is limited only by the size of the data buffer you allocate. The limits in the Metal documentation refer to the number of objects "in use", which is the maximal number of resources you can use and still expect reasonable performance.

Just because DX12 tells you that you can use 1 million descriptors doesn't mean that you can or should use 1 million resources. A compressed 256x256 texture without mipmaps takes 64KB, so for half a million such textures you are looking at over 30GB of VRAM. You are not going to use even tens of thousands of textures per frame, or have that many textures loaded, simply because a) you will run out of memory long before reaching any of these "limits" and b) your performance will be horrible, because no GPU can pull that amount of data through a shading pipeline and stay within reasonable execution times. So... what's the problem exactly? Again, no sane DX12 dev will actually use a million textures in their app. They might use one million texture binding points to implement a dynamic priority queue for their texture streaming logic, but that's an entirely different story, and we have already established that this kind of usage is not a problem for Metal.
 
  • Like
Reactions: Irishman and Homy