GravityMark GPU Benchmark

frustum · Jun 16, 2021

GravityMark is a free cross-platform and cross-API GPU benchmark with native support for Metal and Apple silicon.
Check the stability of your systems and compare the performance of different platforms by drawing an enormous number of asteroids.

https://gravitymark.com/

frustum · Jul 16, 2021

We released a new version with leaderboard support.

GravityMark Leaderboard (gravitymark.tellusim.com)

On exactly the same hardware, Metal delivers less than half the performance of Direct3D12/Vulkan:

Metal 34 FPS: GravityMark Report (gravitymark.tellusim.com)

D3D12 80 FPS: GravityMark Report (gravitymark.tellusim.com)

The same issue may be why the Apple M1 scores lower than an AMD APU:

Apple M1 13 FPS: GravityMark Report (gravitymark.tellusim.com)

AMD APU 16 FPS: GravityMark Report (gravitymark.tellusim.com)

We are trying to resolve this situation with Apple. But we cannot do it alone.

diamond.g · Jul 23, 2021

Are you guys running the default 200000 asteroid test?

frustum · Jul 23, 2021

Yes, the 200,000-asteroid count is a balance that still gives decent results on integrated GPUs.

Nicole1980 · Jul 24, 2021

frustum said:
We released a new version with leaderboard support.

On exactly the same hardware, Metal delivers less than half the performance of Direct3D12/Vulkan:

Metal 34 FPS
D3D12 80 FPS

The same issue may be why the Apple M1 scores lower than an AMD APU:

Apple M1 13 FPS
AMD APU 16 FPS

We are trying to resolve this situation with Apple. But we cannot do it alone.

So ... maybe I'm not fully understanding, but what you're suggesting is that there's a bug on macOS that produces scores that are not truly representative of the real relative power of the GPU? If so, then the Mac version of your program is effectively worthless, correct?

frustum · Jul 24, 2021

Nicole1980 said:
So ... maybe I'm not fully understanding, but what you're suggesting is that there's a bug on macOS that produces scores that are not truly representative of the real relative power of the GPU? If so, then the Mac version of your program is effectively worthless, correct?

The problem is that Metal offers no efficient way to draw thousands of objects.

An AMD GPU can draw thousands of objects with a single command. That command is not available in Metal, which is why the macOS version is 3 times slower than the other APIs on the same hardware.

We emulate this single command with a giant loop of individual draw commands. Direct3D11 (on Nvidia and Intel) and OpenGL ES do not support that single command either, but the emulation does not slow them down 3 times the way it does on Metal.

diamond.g · Jul 24, 2021

I presume you are not using Mesh Shaders or Primitives in D3D or Vulkan. @leman is there not a faster way in Metal 2 to draw thousands of the same object?

leman · Jul 24, 2021

frustum said:
The problem is that Metal offers no efficient way to draw thousands of objects.

An AMD GPU can draw thousands of objects with a single command. That command is not available in Metal, which is why the macOS version is 3 times slower than the other APIs on the same hardware.

We emulate this single command with a giant loop of individual draw commands. Direct3D11 (on Nvidia and Intel) and OpenGL ES do not support that single command either, but the emulation does not slow them down 3 times the way it does on Metal.

Can you elaborate more on this? Metal has full support for instanced rendering (on par with any other API) as far as I know as well as advanced indirect rendering.

frustum · Jul 24, 2021

diamond.g said:
I presume you are not using Mesh Shaders or Primitives in D3D or Vulkan. @leman is there not a faster way in Metal 2 to draw thousands of the same object?

Mesh shaders require the latest GPUs (RTX / RX 6XXX). Metal and all mobile devices do not currently support mesh shaders.

frustum · Jul 24, 2021

leman said:
Can you elaborate more on this? Metal has full support for instanced rendering (on par with any other API) as far as I know as well as advanced indirect rendering.

Yes, instanced rendering can draw a huge number of objects. The problem is that the CPU cannot prepare the data for the GPU at a reasonable frame rate, especially when there are more than 100k objects and the scene is fully dynamic. The GPU can handle scene processing and draw command generation with unbeatable performance.

Metal allows only a single draw command per indirect rendering call. That means there is no way to draw different geometry with a single command, and each geometry variation needs a separate draw command:

drawIndexedPrimitives:indexType:indexBuffer:indexBufferOffset:indirectBuffer:indirectBufferOffset: (Apple Developer Documentation, developer.apple.com)

Encodes a draw command that renders multiple instances of a geometric primitive with indexed vertices and indirect arguments.
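For readers unfamiliar with indirect draws, here is a minimal sketch of the record that one indexed indirect draw reads from the indirect buffer. The five 32-bit fields follow the layout documented for Metal's MTLDrawIndexedPrimitivesIndirectArguments and Vulkan's VkDrawIndexedIndirectCommand. The names DrawIndexedArgs, MeshRange, and build_draw_records are hypothetical, and the records are filled on the CPU purely for illustration; in a GPU-driven renderer a compute pass would write them. This is not GravityMark's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of one indexed indirect draw record: five 32-bit fields.
struct DrawIndexedArgs {
    uint32_t index_count;     // indices per instance
    uint32_t instance_count;  // 0 turns the draw into a no-op
    uint32_t index_start;     // first index inside the shared index buffer
    int32_t  base_vertex;     // value added to every index
    uint32_t base_instance;   // first instance ID
};
static_assert(sizeof(DrawIndexedArgs) == 20, "records are tightly packed");

// Hypothetical description of one geometry variation inside shared buffers.
struct MeshRange {
    uint32_t index_count;
    uint32_t index_start;
    int32_t  base_vertex;
};

// Builds one record per geometry variation. instances[i] is the number of
// visible objects that use meshes[i]; instance data is laid out back to back.
std::vector<DrawIndexedArgs> build_draw_records(const std::vector<MeshRange> &meshes,
                                                const std::vector<uint32_t> &instances) {
    assert(meshes.size() == instances.size());
    std::vector<DrawIndexedArgs> records;
    records.reserve(meshes.size());
    uint32_t base_instance = 0;
    for (size_t i = 0; i < meshes.size(); i++) {
        records.push_back({meshes[i].index_count, instances[i], meshes[i].index_start,
                           meshes[i].base_vertex, base_instance});
        base_instance += instances[i];
    }
    return records;
}
```

Each geometry variation gets its own record, so a scene with N variations needs N records, and on an API without a multi-draw call it also needs N separate draw calls.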

Other APIs, except D3D11 and OpenGL ES, have multi-draw-indirect-count functionality, which combines multiple draw-indirect commands into a single API call with no CPU-GPU synchronization required:

vkCmdDrawIndexedIndirectCount(3)

We emulate multi-draw-indirect-count as a loop of draw-indirect commands on D3D11, OpenGL ES, and Metal. On the previous Nvidia GPU generation this emulation is even faster than the native call because of a driver issue:

D3D11 71 FPS (GTX 1060): GravityMark Report (gravitymark.tellusim.com)

Vulkan 58 FPS (GTX 1060): GravityMark Report (gravitymark.tellusim.com)

Unfortunately, that is not the case with Metal, where the same hardware is three times slower.

Metal supports indirect command buffers (ICBs), but our tests show that they are slower than a loop of simple indirect draw commands, which makes them practically useless. Or maybe only the next Apple hardware will benefit from them.
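The loop emulation described in this post can be sketched roughly as follows. The callback type DrawIndirectFn is a hypothetical stand-in for one real API call (a single indexed indirect draw), and emulate_multi_draw_indirect is an illustrative name, not a function from any graphics API. One assumption that the thread does not spell out: because the CPU cannot read a GPU-written draw count without synchronization, the loop here always issues the maximum number of draws and relies on unused records having an instance count of zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for one API call such as a single indexed indirect draw.
// The argument is the byte offset of the record inside the indirect buffer.
using DrawIndirectFn = std::function<void(size_t indirect_offset)>;

// Emulates a multi-draw-indirect call with a CPU loop of single indirect draws.
// Assumption: unused records carry instance_count == 0, so issuing all
// max_draws calls is harmless. Returns the number of API calls issued.
uint32_t emulate_multi_draw_indirect(const DrawIndirectFn &draw_indirect,
                                     size_t indirect_offset,
                                     uint32_t max_draws,
                                     size_t stride) {
    assert(stride >= 20);  // one record holds five 32-bit fields
    for (uint32_t i = 0; i < max_draws; i++) {
        draw_indirect(indirect_offset);  // one API call per record
        indirect_offset += stride;       // advance to the next record
    }
    return max_draws;
}
```

The cost of the emulation is one API call per record, whereas a native multi-draw call submits the whole range at once; how expensive each of those calls is depends on the driver, which is the point of contention in this thread.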

leman · Jul 24, 2021

frustum said:
Yes, instanced rendering can draw a huge number of objects. The problem is that the CPU cannot prepare the data for the GPU at a reasonable frame rate, especially when there are more than 100k objects and the scene is fully dynamic. The GPU can handle scene processing and draw command generation with unbeatable performance.

Metal allows only a single draw command per indirect rendering call. That means there is no way to draw different geometry with a single command, and each geometry variation needs a separate draw command:

drawIndexedPrimitives:indexType:indexBuffer:indexBufferOffset:indirectBuffer:indirectBufferOffset: (Apple Developer Documentation, developer.apple.com)

Other APIs, except D3D11 and OpenGL ES, have multi-draw-indirect-count functionality, which combines multiple draw-indirect commands into a single API call with no CPU-GPU synchronization required:

vkCmdDrawIndexedIndirectCount(3)

We emulate multi-draw-indirect-count as a loop of draw-indirect commands on D3D11, OpenGL ES, and Metal. On the previous Nvidia GPU generation this emulation is even faster than the native call because of a driver issue:

D3D11 71 FPS (GTX 1060)
Vulkan 58 FPS (GTX 1060)

Unfortunately, that is not the case with Metal, where the same hardware is three times slower.

Metal supports indirect command buffers (ICBs), but our tests show that they are slower than a loop of simple indirect draw commands, which makes them practically useless. Or maybe only the next Apple hardware will benefit from them.

Thanks for the elaboration! I think I’ve seen your posts in the Metal support forums, and it does sound like a complex issue. I have used indirect rendering pipelines with great success so far, so your experience leaves me a bit puzzled. There is no doubt that Apple can improve the APIs as well as their support procedures.

frustum · Jul 24, 2021

leman said:
Thanks for the elaboration! I think I’ve seen your posts in the Metal support forums, and it does sound like a complex issue. I have used indirect rendering pipelines with great success so far, so your experience leaves me a bit puzzled. There is no doubt that Apple can improve the APIs as well as their support procedures.

Yep, that issue has been sitting on the Metal support forum without any progress.

Single indirect calls work great. The trouble appears when the whole scene is rendered indirectly.

We also hope that APIs will evolve toward GPU-driven rendering and greater flexibility. Unfortunately, all modern graphics APIs have focused on parallel CPU job submission rather than GPU-driven architectures. And now nothing is interesting except ray tracing, which is great, except for the totally unconfigurable API.

Nicole1980 · Jul 25, 2021

frustum said:
The problem is that Metal offers no efficient way to draw thousands of objects.

An AMD GPU can draw thousands of objects with a single command. That command is not available in Metal, which is why the macOS version is 3 times slower than the other APIs on the same hardware.

We emulate this single command with a giant loop of individual draw commands. Direct3D11 (on Nvidia and Intel) and OpenGL ES do not support that single command either, but the emulation does not slow them down 3 times the way it does on Metal.

OK, the technical details are over my head, but what I am hearing is that your test is indeed worthless for macOS.

There are many other graphics tests that don't expose the 'bug' (or whatever the problem is between your program and Metal) so the take-away here is that we might as well use any number of the graphics tests for Macs that actually do produce correct and comparable scores.

frustum · Jul 25, 2021

Nicole1980 said:
OK, the technical details are over my head, but what I am hearing is that your test is indeed worthless for macOS.

There are many other graphics tests that don't expose the 'bug' (or whatever the problem is between your program and Metal) so the take-away here is that we might as well use any number of the graphics tests for Macs that actually do produce correct and comparable scores.

The purpose of this benchmark is to showcase the problem macOS has with heavy applications. I don't think it's normal to have access to only 30% of the hardware's capabilities. But it's definitely good for demonstrating M1 performance.

Having its own proprietary graphics API is great for Apple. But it should at least be no worse than its counterparts.

leman · Jul 25, 2021

frustum said:
The purpose of this benchmark is to showcase the problem macOS has with heavy applications. I don't think it's normal to have access to only 30% of the hardware's capabilities. But it's definitely good for demonstrating M1 performance.

Having its own proprietary graphics API is great for Apple. But it should at least be no worse than its counterparts.

To be frank, I don’t think you have demonstrated that this is a problem with the API. All we can conclude is that you were not able to get it working. The fault could lie with Apple, but there could also be a problem with your implementation. If I recall correctly, the Apple engineer in charge of your ticket stated that you did not provide enough information for them to replicate and analyze the issue. I hope you can consider publishing the relevant portions of the code, so that others can have a look and try to replicate the problems you have encountered.

frustum · Jul 25, 2021

leman said:
To be frank, I don’t think you have demonstrated that this is a problem with the API. All we can conclude is that you were not able to get it working. The fault could lie with Apple, but there could also be a problem with your implementation. If I recall correctly, the Apple engineer in charge of your ticket stated that you did not provide enough information for them to replicate and analyze the issue. I hope you can consider publishing the relevant portions of the code, so that others can have a look and try to replicate the problems you have encountered.

We started doing that 7 months ago, and nothing has happened. Right now there is no way to increase DIP performance on Metal.

The problem is obvious. The application spends 99% of its time in the rendering loop:

C++:

// macOS Metal rendering loop code:
for(uint32_t i = 0; i < num_draws; i++) {
    [encoder drawIndexedPrimitives:primitive_type indexType:index_type indexBuffer:index_buffer indexBufferOffset:index_offset indirectBuffer:indirect_buffer indirectBufferOffset:offset];
    offset += stride;
}

// Vulkan rendering code:
vkCmdDrawIndexedIndirectCount(command, indirect_buffer, indirect_offset, buffer, offset, num_draws, stride);

// Direct3D11 AMD rendering code:
agsDriverExtensionsDX11_MultiDrawIndexedInstancedIndirect(context->content, command, num_draws, indirect_buffer, indirect_offset, stride);

// Direct3D11 Nvidia and Intel rendering loop code:
for(uint32_t i = 0; i < num_draws; i++) {
    command->DrawIndexedInstancedIndirect(indirect_buffer, offset);
    offset += stride;
}

// Direct3D12 rendering code:
command->ExecuteIndirect(signature, num_draws, indirect_buffer, indirect_offset, buffer, offset);

// OpenGL rendering code:
glMultiDrawElementsIndirectCount(draw_mode, index_type, indirect_offset, offset, num_draws, stride);

// OpenGLES rendering loop code:
for(uint32_t i = 0; i < num_draws; i++) {
    glDrawElementsIndirect(draw_mode, index_type, offset);
    offset += stride;
}

A Metal ICB is not faster than the loop of draw-indirect commands.
On non-RTX Nvidia GPUs, the Direct3D11 rendering loop runs faster than the Vulkan and Direct3D12 solutions.
Nobody except Metal is losing 60% of performance here.
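As a quick check of the figures quoted in this thread, the relative loss can be computed directly from the reported frame rates. The helper name performance_loss is hypothetical; the FPS values are the ones posted above (Metal 34 vs D3D12 80 on the same hardware, Vulkan 58 vs D3D11 71 on a GTX 1060, Apple M1 13 vs AMD APU 16).

```cpp
#include <cassert>

// Fraction of performance lost when slow_fps is measured against fast_fps.
double performance_loss(double slow_fps, double fast_fps) {
    assert(fast_fps > 0.0);
    return 1.0 - slow_fps / fast_fps;
}

// Figures from the thread:
//   performance_loss(34, 80) = 1 - 0.425  = 0.575   (Metal vs D3D12)
//   performance_loss(58, 71) is about 0.183         (Vulkan vs D3D11, GTX 1060)
//   performance_loss(13, 16) = 1 - 0.8125 = 0.1875  (Apple M1 vs AMD APU)
```

By these numbers the Metal result is about 57.5% below D3D12 on the same hardware, which is the roughly 60% loss referred to above.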

diamond.g · Oct 21, 2021

It looks like you have updated your site with some RT testing. Are there plans to incorporate this into GravityMark? It would make for the only cross-platform RT test available.

frustum · Oct 23, 2021

diamond.g said:
It looks like you have updated your site with some RT testing. Are there plans to incorporate this into GravityMark? It would make for the only cross-platform RT test available.

Yes, the RT version of GravityMark will be released this year.

frustum · Nov 4, 2021

Meanwhile, we have released the GravityMark v1.31 update for macOS Monterey.
It brings a 1.5x performance boost on M1 and almost 2x better speed on AMD.
The benchmark can crash on previous macOS versions.

GravityMark GPU Benchmark (gravitymark.tellusim.com)

leman · Nov 4, 2021

Does it mean your instancing issues were fixed? Was that a bug in Metal drivers or did you change the code?

frustum · Nov 4, 2021

leman said:
Does it mean your instancing issues were fixed? Was that a bug in Metal drivers or did you change the code?

Yes, there is good progress with the driver. The same code started working with the new OS version.

frustum · Nov 14, 2021

The iOS version is available in the App Store:

GravityMark GPU Benchmark (apps.apple.com)

GravityMark GPU Benchmark demonstrates the capabilities of modern GPUs by rendering an enormous quantity of objects in real-time, utilizing GPU acceleration. We avoid typical CPU-based performance bottlenecks by delegating the entirety of scene management and rendering steps exclusively to the...

diamond.g · Nov 15, 2021

frustum said:
The iOS version is available in the App Store:

GravityMark GPU Benchmark (apps.apple.com)

Is there a way to re-run the benchmark without having to quit the app?

frustum · Nov 15, 2021

diamond.g said:
Is there a way to re-run the benchmark without having to quit the app?

Demo mode is an infinite loop that does not calculate a score. Devices get hot after a single benchmark loop, and results drop dramatically due to thermal throttling. We can add multiple benchmark iterations, but they would make no sense in terms of benchmark results.

frustum · Jan 16, 2022

GravityMark 1.44 with macOS ray tracing support is ready.
AMD GPUs are not supported due to missing functionality.
macOS 12 is required.

GravityMark GPU Benchmark (gravitymark.tellusim.com)
