The reason it's so CPU-dependent is that it has to switch shaders multiple times per frame (and that includes toggling it on and off!), which is really inefficient. It's a Direct3D bottleneck. Ideally, each mesh to be rendered would be put into a queue, and at the end of the frame the queue would be sorted by required shader and only then rendered. That would reduce the shader switch count from potentially hundreds per frame to a maximum
For reference, here are my specs and performance:
CPU: AMD FX-6300 @ 4.1GHz (overclocked to its turbo speed)
RAM: 8GB 1600MHz DDR3
GPU: AMD R7 265 2GB (overclocked core at 1050MHz)
I'm playing at 1080p, with 16x anisotropic filtering and 8xEQ adaptive antialiasing (which applies the antialiasing to the alpha channel of transparent textures).
Emerald Coast: http://i.imgur.com/YyRx6bq.jpg
Red Mountain: http://i.imgur.com/GN7G4mS.jpg
Red Mountain, other direction: http://i.imgur.com/Er3SrEv.jpg
Red Mountain, no antialiasing: http://i.imgur.com/YFijQ3F.jpg
I tested with different antialiasing settings because I've run some profiling, and my GPU driver is getting CPU-limited there. As you can see though, on CPUs with single-threaded performance at or below that of a modern Intel Core i3, we're getting nearly the same performance despite mine being more modern, and this is all due to that bottleneck I mentioned. This, by the way, is after I optimized shader parameter committing to hell and back (at least, to the best of my ability, although it could still use some work). It used to be worse, which is scary. You'll also probably notice my CPU wasn't even hitting its maximum frequency, which it typically does in multithreaded applications. If I were able to multithread this thing in any conceivable way, I would, but there's just no way to squeeze that into this existing pipeline as far as I can see.
I really want to implement the draw queue, and in fact SADX has one built in for rudimentary alpha sorting (spoiler: they tried so hard to make that work and it still fails when it's needed). There are a lot of potential complications due to SADX's rendering pipeline though. I know from experience, having already attempted to implement my own alpha sort queue.
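To make the draw queue idea concrete, here's a minimal sketch of queueing meshes, sorting by shader at the end of the frame, and only switching shaders when the required one changes. Everything named here (DrawCommand, DrawQueue, shader_id, mesh_id, and the set_shader/draw callbacks) is hypothetical and stands in for whatever the real pipeline uses; none of it is SADX or Direct3D API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one queued draw; not a SADX or Direct3D type.
struct DrawCommand {
    int shader_id;  // which shader the mesh needs
    int mesh_id;    // placeholder for whatever identifies the mesh
};

class DrawQueue {
public:
    // Called wherever a mesh would normally be drawn immediately.
    void enqueue(int shader_id, int mesh_id) {
        commands_.push_back({shader_id, mesh_id});
    }

    // Called once at the end of the frame. Sorts by shader, then issues the
    // draws through the supplied callbacks. Returns the number of shader
    // switches performed, which is at most the number of distinct shaders.
    template <typename SetShader, typename Draw>
    std::size_t flush(SetShader set_shader, Draw draw) {
        // stable_sort keeps submission order among meshes sharing a shader.
        std::stable_sort(commands_.begin(), commands_.end(),
                         [](const DrawCommand& a, const DrawCommand& b) {
                             return a.shader_id < b.shader_id;
                         });

        std::size_t switches = 0;
        bool have_shader = false;
        int current = 0;
        for (const DrawCommand& cmd : commands_) {
            if (!have_shader || cmd.shader_id != current) {
                set_shader(cmd.shader_id);  // the expensive state change
                current = cmd.shader_id;
                have_shader = true;
                ++switches;
            }
            draw(cmd.mesh_id);
        }
        commands_.clear();
        return switches;
    }

private:
    std::vector<DrawCommand> commands_;
};
```

With five meshes submitted in the order shader 1, 2, 1, 2, 1, drawing immediately would switch shaders five times, while flushing this queue switches twice. Note that this sketch sorts purely by shader, so it would only suit opaque geometry; anything that relies on draw order (like alpha-blended meshes) would need to be handled separately.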
P.S. Thanks for the performance report; I'm still interested in performance on a wider variety of hardware if anyone is willing to post.
Irixion, on 17 January 2017 - 10:57 AM, said:
I run the game on my onboard Skylake 6700K and it's at a locked 60 100% of the time. Oh AMD.
mfw comparing modern cpu to really old cpu
This post has been edited by Morph: 18 January 2017 - 03:57 PM