Title: Intel Sandy Bridge and AVX extension Post by: ker2x on April 14, 2010, 04:23:19 PM
To be released in Q1 2011. Lots of shiny things, including:

The size of the SIMD vector registers is increased from 128-bit XMM registers to 256-bit registers called YMM0 - YMM15. Existing 128-bit instructions use the lower half of the YMM registers. Further extensions to 512 or 1024 bits are expected in the future.

woooooooooooooooooooooohooooooooooooooooo \o/

Title: Re: Intel Sandy Bridge and AVX extension Post by: hobold on April 14, 2010, 04:35:43 PM
Beware, the first hardware implementations are unlikely to have full-width SIMD ALUs. The 256-bit vectors will probably be processed as two 128-bit halves, either occupying two (simple/integer) vector ALUs simultaneously, or one (complex/floating-point) ALU for consecutive clock cycles.
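The "two halves" idea can be sketched in a few lines of Python. This is only a toy model, not how any real chip is wired: a 256-bit register is represented as a list of 8 lanes, and the names `add_128`, `add_256_full_width`, `add_256_split` and the "passes" counter are made up for illustration. The point it tries to show is that the result is identical either way; only the number of passes through the ALU differs.

```python
# Toy model (assumption): a 256-bit register is a list of 8 lanes,
# a 128-bit half is a list of 4 lanes. All names here are hypothetical.
LANES_256 = 8
LANES_128 = 4


def add_128(a, b):
    """Lane-wise add on one 128-bit half (4 lanes)."""
    assert len(a) == len(b) == LANES_128
    return [x + y for x, y in zip(a, b)]


def add_256_full_width(a, b):
    """What a full-width ALU would do: all 8 lanes in a single pass."""
    assert len(a) == len(b) == LANES_256
    return [x + y for x, y in zip(a, b)], 1  # (result, passes)


def add_256_split(a, b):
    """The same add executed as two 128-bit halves, i.e. two passes."""
    assert len(a) == len(b) == LANES_256
    low = add_128(a[:LANES_128], b[:LANES_128])
    high = add_128(a[LANES_128:], b[LANES_128:])
    return low + high, 2  # (result, passes)


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
full, full_passes = add_256_full_width(a, b)
split, split_passes = add_256_split(a, b)
```

Here `full` and `split` hold the same eight sums, while the pass counts differ. The counter stands in for the single-ALU, consecutive-cycles case described above.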
Raw throughput will initially not be doubled. But future chip versions might upgrade the hardware to full width.

Intel's SIMD instruction sets have a few other conceptual limitations (lack of generic permutes and a few other processing primitives that require more operands), but fractals are usually embarrassingly parallel. So for our purposes here, AVX should pave the way for ever faster rendering.

Title: Re: Intel Sandy Bridge and AVX extension Post by: ker2x on April 14, 2010, 05:28:20 PM
Indeed. This is just a first step to a shiny future.
As far as I understood, there is no way, yet, to do some math on the 256-bit registers, e.g. a single instruction to add 8x32 bits from two 256-bit registers and put the result in a third register.

Title: Re: Intel Sandy Bridge and AVX extension Post by: hobold on April 14, 2010, 07:59:01 PM
Well, the processor of the Xbox 360 has a few of these "horizontal" instructions, but they are more a convenience than a true addition to the SIMD paradigm. If an algorithm is massively data parallel, and has relatively weak data dependencies, then a resourceful programmer can usually find a data layout that fits the hardware. And when the data flow patterns are trickier, you typically need something more general, like a permute, to implement them.
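To make the terms concrete, here is a small Python sketch of the three kinds of operation mentioned in this exchange: a lane-wise ("vertical") add, a "horizontal" add across the lanes of one register, and a permute used to fix up a data layout. The helper names and the interleaved re/im layout are assumptions chosen for illustration, not anything from the AVX spec. For what it's worth, AVX as it eventually shipped does have a lane-wise add of eight 32-bit floats on YMM registers (VADDPS); the 256-bit integer forms came later with AVX2.

```python
# Registers are modelled as plain Python lists of 8 lanes (an assumption
# for illustration; all helper names below are hypothetical).

def vertical_add(a, b):
    """Lane i of the result is a[i] + b[i]: the classic SIMD operation."""
    return [x + y for x, y in zip(a, b)]


def horizontal_add(a):
    """Combine the lanes of ONE register into a single value."""
    return sum(a)


def permute(a, indices):
    """Generic permute: result lane i takes the source lane indices[i]."""
    return [a[i] for i in indices]


# Four complex numbers stored interleaved: re0, im0, re1, im1, ...
interleaved = [0.5, 1.5, -1.0, 2.0, 0.25, -0.75, 3.0, 4.0]

# One permute turns that into a layout that suits vertical operations:
# all real parts first, then all imaginary parts.
DEINTERLEAVE = [0, 2, 4, 6, 1, 3, 5, 7]
regrouped = permute(interleaved, DEINTERLEAVE)
re_parts, im_parts = regrouped[:4], regrouped[4:]

ones_plus_twos = vertical_add([1] * 8, [2] * 8)
lane_total = horizontal_add([1, 2, 3, 4, 5, 6, 7, 8])
```

If the data is stored as separate real and imaginary arrays from the start, the permute is not needed at all, which is the "find a data layout that fits the hardware" route.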
At the moment it seems more likely that the GPU vendors will implement permute (and perhaps conditional split, I heard Nvidia calls it "warp reforming"), because they have more of an incentive to push their hardware toward general-purpose use. Intel already has THE general-purpose processors and there is less pressure to make the SIMD extensions more general as well.
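A rough Python sketch of why something like conditional split could matter even for embarrassingly parallel fractals: when 8 Mandelbrot points share one vector, the lanes escape at different times, and the finished lanes sit idle until the slowest one is done. The function name, the `max_iter` value and the idle counter are all assumptions for illustration, and the inner loop over lanes merely stands in for what would be a single vector instruction.

```python
def mandelbrot_lanes(c_re, c_im, max_iter=50):
    """Iterate z = z*z + c for several points in lock-step, one per lane.

    Returns (iteration counts per lane, number of lane-steps spent idle).
    Hypothetical helper: a scalar model of a masked SIMD loop.
    """
    lanes = len(c_re)
    z_re = [0.0] * lanes
    z_im = [0.0] * lanes
    counts = [0] * lanes
    active = [True] * lanes      # the per-lane mask
    idle_lane_steps = 0          # work slots wasted on finished lanes

    for _ in range(max_iter):
        if not any(active):
            break                # every lane has escaped
        for i in range(lanes):   # stands in for one vector operation
            if not active[i]:
                idle_lane_steps += 1
                continue
            re, im = z_re[i], z_im[i]
            z_re[i] = re * re - im * im + c_re[i]
            z_im[i] = 2.0 * re * im + c_im[i]
            counts[i] += 1
            if z_re[i] * z_re[i] + z_im[i] * z_im[i] > 4.0:
                active[i] = False  # escaped: mask this lane off
    return counts, idle_lane_steps


# Eight points on one "vector"; some escape quickly, some never do.
c_re = [0.0, 2.0, 1.0, -1.0, 3.0, 0.0, 0.0, 0.0]
c_im = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
counts, idle = mandelbrot_lanes(c_re, c_im)
```

With this input, three lanes escape within the first few steps and then do nothing while the remaining lanes run to `max_iter`. Regrouping the still-active points into full vectors is, as far as I understand it, the kind of thing a conditional split is meant to help with.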