Branchless maximum/principle axis

laser blaster

Iterator

Posts: 178

« Reply #45 on: June 04, 2015, 10:22:14 PM »

Quote from: Syntopia on May 30, 2015, 04:21:40 PM

I think my point was that Eiffie's versions were actually slower than my naive version using three conditional branches and a normalize function. So don't be too afraid of branches.

And while abs is very likely to be a trivial function, I think it was unexpected that 'sign' turned out to be slower than 'normalize'. On my GPU 'sign' is compiled into the following intermediate code:

Code:

SLT.F R3.xyz, R1, {0, 0, 0, 0}.x;
SGT.F R1.xyz, R1, {0, 0, 0, 0}.x;
TRUNC.U R3.xyz, R3; // <- Truncate to unsigned integer
TRUNC.U R1.xyz, R1; 
I2F.U R3.xyz, R3; // <- Convert back to float. Really?
I2F.U R1.xyz, R1; 
ADD.F R1.xyz, R1, -R3;

while the 'normalize' is compiled into

Code:

MUL.F R1.xyz, R2, R1;
DP3.F R2.x, R1, R1; // <- a dot product
RSQ.F R2.x, R2.x;  // <- an inverse square root

which was quite unexpected for me.

So I'd advise simply measuring the performance instead of reasoning about it.

The normalize may be faster in practice, but the intermediate code isn't a very good indicator of speed, because it doesn't map directly to modern GPU machine code. A 3-component dot product is implemented as 3 scalar multiply-adds on current GPUs, so that's 3 instructions. The 3-component multiply will also compile to 3 instructions. Only very old GPUs still use native vector instructions. There is no way that I know of to view the actual native assembly code for any modern GPU.
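To make the instruction count concrete, here is a rough sketch in plain C (not shader code) of how a 3-component multiply and a 3-component dot product break down into scalar operations. The `vec3` type and the names `mul3` and `dot3` are made up for illustration, and how a real GPU compiler schedules these is an assumption, not something read off actual machine code.

```c
/* Hypothetical 3-component vector type, for illustration only. */
typedef struct { float x, y, z; } vec3;

/* Component-wise multiply: one scalar multiply per component. */
static vec3 mul3(vec3 a, vec3 b) {
    vec3 r;
    r.x = a.x * b.x;  /* scalar op 1 */
    r.y = a.y * b.y;  /* scalar op 2 */
    r.z = a.z * b.z;  /* scalar op 3 */
    return r;
}

/* Dot product: one multiply followed by two multiply-adds. */
static float dot3(vec3 a, vec3 b) {
    float acc = a.x * b.x;  /* scalar op 1: multiply */
    acc = a.y * b.y + acc;  /* scalar op 2: multiply-add */
    acc = a.z * b.z + acc;  /* scalar op 3: multiply-add */
    return acc;
}
```

On a scalar architecture each commented line would be roughly one instruction, so a single DP3 or vector MUL line in the intermediate code stands for about three native operations.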

And abs is definitely very fast. I've heard that on some GPUs it's effectively free, as it can be combined into other instructions. But sign() being so slow is quite a shocker. It should be implemented as (f>=0) ? 1 : -1, which shouldn't be more than 2 instructions: a compare and a conditional move. I don't know why they did it in such a complicated way.



Syntopia

Fractal Molossus

Posts: 681

Re: Branchless maximum/principle axis

« Reply #46 on: June 05, 2015, 04:51:21 PM »

Quote from: laser blaster on June 04, 2015, 10:22:14 PM

Yes, it will only hint at what is happening. Note that even with the machine code you would still need to know how many cycles each instruction takes.

Quote

There is no way that I know of to view the actual native assembly code for any modern GPU.
And abs is definitely very fast. I've heard that on some GPUs it's effectively free, as it can be combined into other instructions. But sign() being so slow is quite a shocker. It should be implemented as (f>=0) ? 1 : -1, which shouldn't be more than 2 instructions: a compare and a conditional move. I don't know why they did it in such a complicated way.

The native assembly for ATI cards can be viewed using their "GPU ShaderAnalyzer" (you can choose between all their architectures and see the machine code). For instance, sign(x) on ATI compiles to:

Code:

      0  y: SETGT       ____,  0.0f,  KC0[0].x      
         z: SETGT       ____,  KC0[0].x,  0.0f      
      1  x: ADD         R0.x,  PV0.z, -PV0.y

while (f>=0) ? 1 : -1 compiles into

Code:

      0  y: SETGT_DX10  ____,  KC0[0].x,  0.0f      
      1  x: CNDE_INT    R0.x,  PV0.y,  -1082130432,  1065353216

which is exactly as expected (the reason for the difference is that sign(x) is required to return 0 for x = 0, i.e. sign(0)=0).
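As a rough illustration of that difference, here is a small C sketch (plain CPU code, not what any driver actually emits). `sign_glsl` mirrors the two-compares-and-a-subtract pattern from the first listing, and `sign_select` is the compare-and-select form. The function names are made up for this example. The helper `float_bits` is only there to show that the two integer constants in the CNDE_INT line are the bit patterns of -1.0f and 1.0f, assuming 32-bit IEEE 754 floats.

```c
#include <stdint.h>
#include <string.h>

/* Three-valued sign: (x > 0) - (0 > x), so it returns 0 for x == 0. */
static float sign_glsl(float x) {
    float gt = (x > 0.0f) ? 1.0f : 0.0f;  /* like SETGT x, 0 */
    float lt = (0.0f > x) ? 1.0f : 0.0f;  /* like SETGT 0, x */
    return gt - lt;                        /* like ADD gt, -lt */
}

/* Two-valued variant: one compare and one select; returns +1 for x == 0. */
static float sign_select(float x) {
    return (x >= 0.0f) ? 1.0f : -1.0f;
}

/* Reinterpret a float's bits as a signed 32-bit integer
   (assumes float is a 32-bit IEEE 754 value). */
static int32_t float_bits(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}
```

The two functions agree everywhere except at zero, which is exactly the case that costs the extra compare. `float_bits(1.0f)` gives 1065353216 and `float_bits(-1.0f)` gives -1082130432, matching the constants in the listing.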

I have also heard that abs (and saturate) should be free instructions, but I think that may depend on the architecture. On ATI archs, abs is translated into a "MAX_DX10 ____, KC0[0].y, -KC0[0].y" instruction.
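In other words, abs(x) becomes max(x, -x). A minimal C sketch of the same identity, with a made-up function name and ignoring NaN and signed-zero corner cases:

```c
/* abs(x) expressed as max(x, -x), mirroring the MAX_DX10 form above.
   NaN and -0.0 handling are deliberately ignored in this sketch. */
static float abs_via_max(float x) {
    float neg = -x;
    return (x > neg) ? x : neg;
}
```

So even when abs is not folded into another instruction as a free source modifier, it costs a single max.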



eiffie

Guest

Re: Branchless maximum/principle axis

« Reply #47 on: June 05, 2015, 05:22:21 PM »

Thanks for the info Syntopia - very helpful as always.




	Author	Topic: Branchless maximum/principle axis (Read 14813 times)
		Description: Compute principle axis without branching

END OF AN ERA, FRACTALFORUMS.COM IS CONTINUED ON FRACTALFORUMS.ORG

it was a great time but no longer maintainable by c.Kleinhuis contact him for any data retrieval,
thanks and see you perhaps in 10 years again

The All New FractalForums is now in Public Beta Testing! Visit FractalForums.org and check it out!
