Logo by LAR2 - Contribute your own Logo!

END OF AN ERA, FRACTALFORUMS.COM IS CONTINUED ON FRACTALFORUMS.ORG

it was a great time but no longer maintainable by c.Kleinhuis contact him for any data retrieval,
thanks and see you perhaps in 10 years again

this forum will stay online for reference
News: Did you know ? you can use LaTex inside Postings on fractalforums.com!
 
*
Welcome, Guest. Please login or register. March 28, 2024, 11:59:50 PM


Login with username, password and session length


The All New FractalForums is now in Public Beta Testing! Visit FractalForums.org and check it out!


Pages: [1] 2 3 4   Go Down
  Print  
Share this topic on DiggShare this topic on FacebookShare this topic on GoogleShare this topic on RedditShare this topic on StumbleUponShare this topic on Twitter
Author Topic: Branchless maximum/principle axis  (Read 9511 times)
Description: Compute principle axis without branching
0 Members and 3 Guests are viewing this topic.
TruthSerum
Guest
« on: May 24, 2015, 06:52:40 PM »

How can I compute the principle axis of a vector without branching. By this I mean a function takes as input an arbitrary vector, and returns a unit vector that points along the main axis of the vector.

For example, if I have the vector (2.0, -4.0, 1.0), then the function I desire would return (0.0, -1.0, 0.0), since this input vector is "mostly" pointing in the negative Y direction.

Handling for corner cases has to be consistent, but never having more than a single element set to \pm1.0. So for example, (3.0, 3.0, 1.0) could be (1.0, 0.0, 0.0), or (0.0, 1.0, 0.0), but not (1.0, 1.0, 0.0).

I'm interested in the idea itself more than an implementation in any particular language. I'm sure this has been solved before.
Logged
cKleinhuis
Administrator
Fractal Senior
*******
Posts: 7044


formerly known as 'Trifox'


WWW
« Reply #1 on: May 24, 2015, 07:15:41 PM »

for this particular case i have no concrete solution, BUT any if query can be stated in terms of greater zero or smaller zero,

for example

if(a>1.0)then a=1.0
could be written (without branching)
a=min(a,1.0);

wink
Logged

---

divide and conquer - iterate and rule - chaos is No random!
TruthSerum
Guest
« Reply #2 on: May 24, 2015, 07:42:41 PM »

Thanks, I tried subtracting the smallest component, which should set it to zero.

Code:
av = abs(v);
v -= sign(v) * min(av.x,min(av.y,av.z));

This will not work for corner vectors though: (1,1,1) and others will all become (0,0,0).
Logged
hobold
Fractal Bachius
*
Posts: 573


« Reply #3 on: May 24, 2015, 10:37:40 PM »

Thanks, I tried subtracting the smallest component, which should set it to zero.

Code:
av = abs(v);
v -= sign(v) * min(av.x,min(av.y,av.z));

This will not work for corner vectors though: (1,1,1) and others will all become (0,0,0).
There is a standard trick to tweak algorithms which require distinct (non-equal) values. The basic idea is to append extra least significant digit(s) to the original input values.


A simplified illustrative example, with no regard to implementation:
Let's say the input vector's coordinates are stored with two fractional digits. So they can assume the values -1.00, -0.99. -0.98 ... and so on in increments of 0.01.

In order to make coordinate values unique, we append one extra digit, distinct and specific to each coordinate axis. Let's say "1" for the X axis, "2" for Y, and "3" for Z.

So, the problematic input vector (1, 1, 1) would turn into (1.001, 1.002, 1.003). All values are now unique.


In practice, with floating point values, it is probably easier to replace a few least significant bits uniquely instead of appending extra bits. So you would mask out the two least significant bits, and replace them uniquely for each coordinate with "00", "01", or "10".

This does make the algorithm imprecise. Depending on where the input data is coming from, the imprecision may be lost in the noise.
Logged
dom767
Explorer
****
Posts: 40


dom767
WWW
« Reply #4 on: May 24, 2015, 11:58:13 PM »

Love hobold's suggestion here. Very neat.

I got down to two compares robustly...

Code:
vec /= max(x,y,z)

if (abs(x)==1)
 return vec(x,0,0)
if (abs(y)==1)
 return vec(0,y,0)
return vec(0,0,z)

Losing a bit of precision is probably a good payoff for the conditionals!
Logged
TruthSerum
Guest
« Reply #5 on: May 25, 2015, 01:03:16 AM »

Thanks, this routine might do for now actually. I think a==b can be performed using a mask:

Code:
float m = abs(sign(a-b)); // m=0.0 if a==b, m=1.0 otherwise

But then you'd need to mask out the other values because you can't return, as in your routine. It probably just works out faster to branch anyway.

The offset (adding a small fraction) might not be necessary after all. I think that I tend to avoid epsilon/tolerance values if I can help it.
Logged
TruthSerum
Guest
« Reply #6 on: May 26, 2015, 06:22:44 PM »

Thanks to eiffie for his improved formula. Again, using two branches, but saves on the division for normalization.

Code:
vec3 paxis(vec3 p)
{
    vec3 a=abs(p),r = vec3(1.0,0.0,0.0);
    if(a.z>=max(a.x,a.y))r=r.yzx;
    else if(a.y>=a.x)r=r.zxy;
    return r*sign(p);
}
« Last Edit: May 27, 2015, 12:19:54 PM by TruthSerum » Logged
dom767
Explorer
****
Posts: 40


dom767
WWW
« Reply #7 on: May 26, 2015, 07:28:23 PM »

Eiffie's definitely going to win out on HLSL. Be careful if you're doing this on CPU using C++ because a lot of people do sign() using compares anyway to avoid having to understand their systems FP format, and vector swizzling isn't free on CPU...

The code I wrote is bugged btw, you'd need something like this...

Code:
vec /= max(abs(vec.x), abs(vec.y), abs(vec.z));

if (abs(vec.x)==1) vec = vector(vec.x,0,0);
else if (abs(vec.y==1) vec = vector(0,vec.y,0);
else vec = vector(0,0,vec.z);

return vec;

And anyway, the standard implementation of abs() in C++ uses a compare... (doh!)

Assuming HLSL has a floor function, how about this ugly little beast?

Code:
av = abs(vec)
vec /= max(av.x, max(av.y,av.z));
return floor(abs(vec))*vec;
Logged
dom767
Explorer
****
Posts: 40


dom767
WWW
« Reply #8 on: May 26, 2015, 07:35:06 PM »

Ahh ignore me. Same old problem with the vec(1,1,1) argument. sad So we'd need to use hobold's trick to solve that one...
Logged
TruthSerum
Guest
« Reply #9 on: May 26, 2015, 08:28:42 PM »

The (1,1,1) edge case is annoying indeed. I wonder about the floor of negative -1<x<0 values of x, because I think floor(-0.5) = -1, not 0.0 as you would need here. I don't think there is a function for that, trunc(), or similar, in GLSL.

Also, I had problems with x/x being exactly 1 (so the abs(x)==1.0 comparisons failed sometimes). This seems strange because I thought the FPU guaranteed this.

Thanks for the advice about CPU implementations. I don't think branching is as bad though on PC (x86/x64), or so I have heard.
Logged
dom767
Explorer
****
Posts: 40


dom767
WWW
« Reply #10 on: May 26, 2015, 09:59:24 PM »

Floor (abs (vec))×vec takes care of the rounding down issue with negative numbers.

Branch cost on CPU is a pipeline flush multiplied by the probability of misprediction. Probably 5-10 cycles average although i lose track of how deep the pipelines are these days...

On GPU the control logic is usually shared between multiple pipelines so 2 branches which split 50/50 reduces execution speed by a factor of 4.
Logged
dom767
Explorer
****
Posts: 40


dom767
WWW
« Reply #11 on: May 26, 2015, 11:49:35 PM »

And just in case anyone was after the branchless principle axis algorithm...

Code:
av = abs(vec)
vec /= max(av.x, max(av.y,av.z));
av = floor(abs(vec)) * vector(1,0.9,0.8)
av /= max(av.x, max(av.y, av.z));
vec = floor(av) * vec;

branch-free, untested.

The *(1, 0.9, 0.8 ) simulates the trick hobold mentioned.
If a single value is set, then the /= max() returns the value to 1.0.
If multiple values are set this prioritises x over y over z.

I doubt very much that this has any useful purpose other than making me feel satisfied. smiley
Logged
TruthSerum
Guest
« Reply #12 on: May 27, 2015, 12:40:35 AM »

I tested it and you will be glad to know that, with some small syntax fixes, it works perfectly. I will leave it here for completeness:

Code:
vec3 paxis(vec3 vec) {
    vec3 av = abs(vec);
    vec /= max(av.x, max(av.y,av.z));
    av = floor(abs(vec)) * vec3(1.0,0.9,0.8);
    av /= max(av.x, max(av.y, av.z));
    vec = floor(av) * vec;
    return vec;
}

I would say that the branching version that eiffie supplied seems better for practical purposes though, so I expect I will continue to use that.
Logged
dom767
Explorer
****
Posts: 40


dom767
WWW
« Reply #13 on: May 27, 2015, 12:57:01 AM »

Heh, thanks for testing it out. Using Eiffie's far simpler to comprehend (and probably most often faster) solution sounds reasonable to me.

Thanks for the challenge though, enjoyed that one. smiley
Logged
TruthSerum
Guest
« Reply #14 on: May 27, 2015, 12:35:36 PM »

I should mention that it seems to be slightly different to the other routine, maybe in the use of the small "epsilon" offsets.

(how it should look)

(artifacts visible)

If you revisit it you will have to double check those values. I didn't try playing with them to try and fix it, but I just thought I should mention that I am seeing artifacts under some circumstances.

Annoying because I thought it worked well too.
Logged
Pages: [1] 2 3 4   Go Down
  Print  
 
Jump to:  

Related Topics
Subject Started by Replies Views Last post
Maximum Zoom Factor Mandelbulb 3d The Rev 2 2227 Last post October 17, 2010, 08:36:39 PM
by The Rev
Maximum Security Prison Images Showcase (Rate My Fractal) thom 0 859 Last post June 13, 2012, 04:20:27 AM
by thom
ability to set maximum render time per frame Feature Requests erstwhile 4 2661 Last post December 08, 2013, 10:30:52 PM
by erstwhile
principle of diminishing marginal productivity Images Showcase (Rate My Fractal) thom 0 997 Last post April 28, 2015, 03:44:20 AM
by thom
Maximum render size Kalles Fraktaler Dinkydau 4 1538 Last post February 16, 2017, 11:32:12 PM
by Dinkydau

Powered by MySQL Powered by PHP Powered by SMF 1.1.21 | SMF © 2015, Simple Machines

Valid XHTML 1.0! Valid CSS! Dilber MC Theme by HarzeM
Page created in 0.14 seconds with 24 queries. (Pretty URLs adds 0.007s, 2q)