On the GPU, lookup tables are a big no-no, since the actual cache is quite small (only a few KB). In addition, there is dedicated hardware for computing sin and cos, which runs at 1/4 to 1/8 the throughput of basic arithmetic. In fact, sin and cos are faster than division on current hardware! Division is implemented as a reciprocal, which is performed by the same transcendental unit that handles sin and cos, followed by a multiplication. Also, the transcendental units run fully in parallel with the standard ALUs, which means that so long as you don't compute more than 1 sin or cos for every 4 to 8 other instructions, you essentially get them for free.
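To put rough numbers on the "for free" claim, here is a back-of-the-envelope sketch in C++. It is a simplified model, not a description of any particular GPU: it assumes the ALUs and the transcendental unit issue fully in parallel, that ALU throughput is normalized to one op per cycle, and that latency and memory effects are ignored. The function names `estimated_cycles` and `sin_cos_is_free` are made up for illustration.

```cpp
#include <algorithm>

// Simplified model (an assumption, not a hardware spec): the ALU pipeline and
// the transcendental pipeline run in parallel, so the total time is set by
// whichever pipeline takes longer.
//   alu_ops   : number of ordinary arithmetic instructions
//   sfu_ops   : number of sin/cos instructions
//   sfu_ratio : transcendental throughput relative to the ALUs (0.25 or 0.125)
double estimated_cycles(double alu_ops, double sfu_ops, double sfu_ratio) {
    double alu_time = alu_ops;              // one ALU op per cycle (normalized)
    double sfu_time = sfu_ops / sfu_ratio;  // e.g. 4 cycles per sin at 1/4 rate
    return std::max(alu_time, sfu_time);
}

// True when the sin/cos work hides entirely behind the ALU work.
bool sin_cos_is_free(double alu_ops, double sfu_ops, double sfu_ratio) {
    return sfu_ops / sfu_ratio <= alu_ops;
}
```

Under this model, 1 sin alongside 4 other instructions at a 1/4 ratio costs the same 4 cycles as the 4 instructions alone, while 2 sins alongside the same 4 instructions push the total to 8 cycles, at which point the transcendental unit becomes the bottleneck.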
Thanks for that. I'm surprised that handling large areas of data isn't more optimised on GPUs, given how much texture handling they normally do.