TruthSerum
Guest
« Reply #30 on: May 30, 2015, 01:21:11 AM »
Interesting results - I always knew there was something about branching on GPU. I think for texture fetches the adjacent pixels' UVs must always be calculated, so that the hardware can select the mipmap LOD. Correct me if I am wrong about that. I think the penalty on CPU is less; I'm not sure about a full pipeline flush, as dom767 said. I mean, consider how many background applications are doing branches even while your system is idle. The CPU is designed to execute bad code very well. I like the shader, dom767 - I left a comment on ShaderToy for it.
Syntopia
« Reply #31 on: May 30, 2015, 01:48:53 PM »
> I think the penalty on CPU is less, I'm not sure about a full pipeline flush, as dom767 said. I mean, consider how many background applications are doing branches even while your system is idle. CPU is designed to execute bad code very well

An idle background process would probably have its branches correctly predicted most of the time, so there would be no penalty. Besides, the pipeline on modern CPUs is not that long: 12-14 cycles for the Core 2 architecture. This is completely negligible compared to the cost of switching between the background processes, something which involves storing the CPU state and switching virtual address spaces. The context-switch cost on a Core 2 is roughly 5-7 microseconds, or some 10000 instructions.

For very simple conditional statements, the compiler may also opt for 'predicated' instructions, like the CMOV x86 instruction, which only takes effect if a condition flag is set. This ensures no pipeline flushing is needed. (This is widely used on the ARM architecture, where most instructions are predicated.)

Btw, this is a great resource for CPU optimizations: http://www.agner.org/optimize/
lycium
« Reply #32 on: May 30, 2015, 02:43:28 PM »
Amazing amount of discussion here for fabs()! I know computer architecture is a complex thing and one shouldn't oversimplify / make assumptions, but I'm going to do exactly that now. Straight up, I don't see how anything can be faster than *unconditionally* setting a single (sign) bit. If you think about the amount of logic that goes into branching on modern parallel and superscalar archs, there's just no way you want to do anything else. As mentioned, it's even worse for GPUs due to serialisation of branch divergence. So just use fabs, right? Always avoid a branch if it's cheap to do so, in general - that's why eiffie's method works well: abs, getting the sign bit and selecting between 2 constant literals are all sort of free on a latency-hiding, branch-fearing GPU.
« Last Edit: May 30, 2015, 02:49:32 PM by lycium »
Syntopia
« Reply #33 on: May 30, 2015, 04:21:40 PM »
I think my point was that eiffie's versions were actually slower than my naive version using three conditional branches and a normalize function. So don't be too afraid of branches. And while abs is very likely a trivial function, I think it was unexpected that 'sign' turned out to be slower than 'normalize'. On my GPU 'sign' is compiled into the following intermediate code:

SLT.F R3.xyz, R1, {0, 0, 0, 0}.x;
SGT.F R1.xyz, R1, {0, 0, 0, 0}.x;
TRUNC.U R3.xyz, R3; // <- Truncate to unsigned integer
TRUNC.U R1.xyz, R1;
I2F.U R3.xyz, R3; // <- Convert back to float. Really?
I2F.U R1.xyz, R1;
ADD.F R1.xyz, R1, -R3;

while 'normalize' is compiled into:

MUL.F R1.xyz, R2, R1;
DP3.F R2.x, R1, R1; // <- a dot product
RSQ.F R2.x, R2.x; // <- an inverse square root

which was quite unexpected for me. So I'd advise to simply measure the performance, instead of reasoning about it.
eiffie
Guest
« Reply #34 on: May 30, 2015, 05:25:41 PM »
Hey Syntopia, did you by chance try knighty's version where he converts float/bool? I have always wondered what that compiles to. Does it leave them untouched?

...or sign(p)*max(vec3(greaterThan(a,max(a.yzx,a.zxy))),0.0);

@lycium I think bit checking is just too low an occupation for modern GPUs to be bothered with.
Syntopia
« Reply #35 on: May 30, 2015, 06:57:21 PM »
vec3 v = abs(vec);
float m = max(v.x, max(v.y, v.z));
v = vec3(float(v.x == m),
         float(v.y == m && v.x != m),
         float(v.z == m && v.x != m && v.y != m));
// restore original sign
return normalize(v*vec);

compiles to:

MAX.F R2.x, |R1.y|, |R1.z|;
MAX.F R2.x, |R1|, R2;
SNE.F R2.z, |R1.x|, R2.x;
SEQ.F R2.y, |R1|, R2.x;
SEQ.F R3.x, |R1.z|, R2;
SNE.F R3.y, |R1|, R2.x;
SEQ.F R2.x, |R1|, R2;
TRUNC.U R2.x, R2;
TRUNC.U R2.z, R2;
TRUNC.U R2.y, R2;
AND.U R2.y, R2, R2.z;
TRUNC.U R3.x, R3;
AND.U R2.z, R2, R3.x;
TRUNC.U R3.x, R3.y;
AND.U R2.z, R2, R3.x;
I2F.U R2.x, R2;
I2F.U R2.y, R2;
I2F.U R2.z, R2;
MUL.F R1.xyz, R2, R1;
DP3.F R2.x, R1, R1;
RSQ.F R2.x, R2.x;
MUL.F R1.xyz, R2.x, R1;

so it seems booleans are represented as floats (SNE.F/SEQ.F), converted to ints (TRUNC.U), AND'ed, and converted back to floats (I2F).
lycium
« Reply #36 on: May 30, 2015, 07:48:36 PM »
> So I'd advise to simply just measure the performance, instead of reasoning about it.

I agree in general, but I don't think we're referring to the same thing - I was specifically referring to fabs, which just cannot be slower than anything else because it's a single OR-operation with 0x800000... the top (sign) bit of an IEEE number. I have to say I'm surprised that a normalize does better than that other code, too - but that again is a more complex story of data dependencies limiting pipelining etc.

> @lycium I think bit checking is just too low an occupation for modern GPUs to be bothered with.

Not at all... I also don't really know what you mean by this... so if I have "x = y ^ z" somewhere in my code, does the GPU go "how dare you molest me with such trivial tasks?!" GPUs are mostly like other CPUs, working on binary IEEE floating-point data. In terms of raw throughput, you can't do better than an unconditional simple bit-operation.
« Last Edit: May 30, 2015, 07:59:55 PM by lycium »
Syntopia
« Reply #37 on: May 30, 2015, 11:19:23 PM »
> I agree in general, but I don't think we're referring to the same thing - I was specifically referring to fabs, which just cannot be slower than anything else because it's a single OR-operation with 0x800000... the top (sign) bit of an IEEE number.

Indeed - I was talking about branching, not abs. Btw, I think your OR operation would make the number negative (IEEE 754 numbers are multiplied by (-1)^sign). You'd want to AND with 0x7fffffff instead. And AND'ing floats is valid in neither C nor GLSL, so you have to do a reinterpretive cast. In GLSL this would be:

// Needs #version 330
float fabs(float f) {
    return intBitsToFloat(floatBitsToInt(f) & 0x7fffffff);
}

which *is* slower than using the builtin abs command (I tested). The similar code in C would be (this snippet was found on the net):

float fastabs(float f) {
    int i = ((*(int*)&f) & 0x7fffffff);
    return (*(float*)&i);
}
lycium
« Reply #38 on: May 30, 2015, 11:35:36 PM »
Ahhh yes, of course the sign bit should not be set for a positive number.
Syntopia
« Reply #39 on: May 30, 2015, 11:55:48 PM »
Interestingly, this can be used to build a 'sign' function which is faster than the builtin:

// Notice: returns +1 for 0 as input.
float fastSign(float f) {
    // Bitwise OR of the number '1' with the sign bit of f
    return intBitsToFloat(0x3F800000 | (0x80000000 & floatBitsToInt(f)));
}

vec3 fastSign3(vec3 f) {
    return vec3(fastSign(f.x), fastSign(f.y), fastSign(f.z));
}

Using this makes eiffie's function faster than my branchy version.
« Last Edit: May 31, 2015, 12:32:28 AM by Syntopia »
lycium
« Reply #40 on: May 31, 2015, 12:00:19 AM »
maybe just floatBitsToInt(f) >> 31?
TruthSerum
Guest
« Reply #41 on: May 31, 2015, 12:02:18 AM »
Also a proper sign function should handle the edge case sign(0)==0, which I don't think can be achieved with a single bit shift.
Syntopia
« Reply #42 on: May 31, 2015, 12:40:24 AM »
> maybe just floatBitsToInt(f) >> 31?

I don't think that will work: for a negative number, this produces all binary 1's (which is -1 as an integer). For a positive number, it produces all binary 0's (which is 0 as an integer). You could salvage it as:

float fastSign(float f) {
    return float((floatBitsToInt(f) >> 31) << 1) + 1; // "-1 << 1" is -2
}

But now you need a conversion from integer to float (instead of a reinterpretive cast, which is free) and some additional operations. This makes this function somewhat slower than the one using AND and OR.
lycium
« Reply #43 on: May 31, 2015, 12:42:39 AM »
Sorry again, I really should have checked the rest of the code for context.

float fastSign(float f) {
    return (floatBitsToInt(f) >> 31) ? -1.0f : 1.0f;
}

float fastSignWithZero(float f) {
    return (f == 0) ? 0 : ((floatBitsToInt(f) >> 31) ? -1.0f : 1.0f);
}

Edit: Actually, this seems silly, it should just be (f >= 0) ? 1 : -1.
« Last Edit: May 31, 2015, 12:53:38 AM by lycium »
TruthSerum
Guest
« Reply #44 on: June 04, 2015, 06:54:26 AM »
By the way, I noticed that 0.5+0.5*sign(x) (1) behaves slightly differently from max(sign(x),0.0) (2).

With (1), if x=0.0, it returns 0.5, which might not be desirable. This is not the case with (2), which returns 0.0 for x=0.0.

The point of this was to create a mask out of the sign of the value, presumably with the intention of using mix later, or similar, to select between two values without branching.