As graphics processor hardware now supports a number of different floating point data formats, it is important to understand how to select an appropriate format for your calculations, and why the choice is important. Even the simplest of operations benefit greatly from a little thought. Here is an instructive example that we recently came across.
If you are creating an animated shader in OpenGL ES, you will need to tell it the time. The obvious way to do this is to create a uniform variable called, let's say, animation_time, and use it to modify some aspect of the object we are drawing, such as its color, position, or texture. Let us look at some complications that can occur.
Here is a very simple animated pixel shader that uses the time to pulse a red light repeatedly:
uniform mediump float animation_time;
void main()
{
    /* gl_FragColor is a vec4: red ramps with time, alpha is fully opaque */
    gl_FragColor = vec4(fract(animation_time), 0.0, 0.0, 1.0);
}
The fract function returns the fractional part of the animation_time value, which gives us a sawtooth-like ramp over time. This is placed in the red component of the fragment color, so that the object appears to be repeatedly pulsing red.
We pass the time to the shader using C code something like this. We will assume that we are running at a reasonable frame rate, say 30 frames per second (0.033 seconds per frame), and advance the time appropriately on each frame:
GLint location = glGetUniformLocation(myProgramObject, "animation_time");
float animation_time = 0.0f;
for (;;) {
    /* glUniform1f takes just the location and the value */
    glUniform1f(location, animation_time);
    /* ... do some rendering here ... */
    /* Now advance the time (assume 30 frames/sec) */
    animation_time += 0.033f; /* 0.033s = 33ms per frame */
}
However, there's a subtle problem here that may cause your animation to go off the rails pretty quickly on most (but annoyingly not all) implementations of OpenGL ES. On a typical implementation, the animation will get jerky, then lumpy, becoming worse and worse until it finally stops altogether after about a minute and a half.
The reason is pretty hard to spot. If we put a printf in the loop, we see nothing amiss. The time is smoothly increasing as we go along, and yet we still see the bad behaviour. Changing the C program to use a double instead of a float has no effect, so it's not a precision issue.
Or is it?
The key is the precision specifier on the uniform declaration in the first line of the shader program. The OpenGL ES shading language specifies that "mediump" precision variables need only have 10 bits of precision, which corresponds to representing numbers with about 3 significant decimal digits.
If your implementation uses this minimum (as many do, since it allows the entire floating-point value to fit neatly into 16 bits), the implicit conversion from C's float value to the internal value rounds it off to those 3 decimal digits of precision. Since these are significant digits, the magnitude of the value determines the absolute precision we have to work with. I will use decimal precision to illustrate what is happening, because it's easier than fiddling with binary, and it shows the problem just as well.
Initially, we see no problem. The values are small enough that 3 digits are enough to represent them exactly.
As we pass the 1-second mark, we can no longer represent the values in three significant digits and we start to lose precision. The progress of the animation is no longer as smooth. On some frames we progress by 0.03 units, on some by 0.04.
And as we reach 10 seconds, the problem gets a lot worse, as we see no change in color between multiple frames.
After 100 seconds, our 3 digits of precision give us no fractional part at all and the animation effectively halts forever.
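The decimal illustration above can be reproduced with a small C sketch. The helper names round_sig3 and visible_fract are hypothetical, and the code models only the three-significant-digit decimal rounding used for illustration here, not any real GPU format.

```c
/* Round a non-negative value to 3 significant decimal digits.
   Illustrative only: real hardware rounds in binary, not decimal. */
double round_sig3(double x)
{
    double mul = 1.0, div = 1.0, digits;
    if (x <= 0.0)
        return 0.0;
    /* Scale x into [100, 1000) so its integer part holds exactly 3 digits. */
    while (x * mul < 100.0)
        mul *= 10.0;
    while (x / div >= 1000.0)
        div *= 10.0;
    digits = (double)(long long)(x * mul / div + 0.5); /* round half up */
    return digits * div / mul;
}

/* Fractional part of the rounded time: what fract() in the shader would
   see if the uniform kept only 3 significant decimal digits. */
double visible_fract(double t)
{
    double r = round_sig3(t);
    return r - (double)(long long)r;
}
```

Under these assumptions, round_sig3(1.023) gives 1.02 and round_sig3(1.056) gives 1.06, a step of 0.04 in one frame. The inputs 10.010 and 10.043 both round to 10.0, so two consecutive frames show the same color, and visible_fract(150.033) is exactly 0.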
Of course, in real life, the floating-point values are represented in binary, so the degradation happens in smaller steps at powers of two, but the principle is the same. With a typical 16-bit implementation of mediump, we start seeing degradation after a few seconds, and complete failure in less than two minutes.
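The binary case can be sketched in the same way. The functions below are hypothetical helpers, not part of any GL API. They assume a format with a given number of fraction bits (10 for the mediump minimum), ignore the format's exponent limits, and round half up rather than following the rounding mode of any particular GPU.

```c
/* Spacing between adjacent representable values near x (x > 0) for a
   binary floating-point format with `fraction_bits` fraction bits. */
double float_spacing(double x, int fraction_bits)
{
    double spacing = 1.0;
    int i;
    if (x <= 0.0)
        return 0.0;
    for (i = 0; i < fraction_bits; ++i)
        spacing /= 2.0;                 /* spacing inside [1, 2) */
    /* The spacing doubles with every power of two that x climbs. */
    while (x >= 2.0) { x /= 2.0; spacing *= 2.0; }
    while (x < 1.0)  { x *= 2.0; spacing /= 2.0; }
    return spacing;
}

/* Round a non-negative x to the nearest representable value.
   Simplified: no overflow or denormal handling. */
double quantize(double x, int fraction_bits)
{
    double s;
    if (x <= 0.0)
        return 0.0;
    s = float_spacing(x, fraction_bits);
    return (double)(long long)(x / s + 0.5) * s;
}
```

With 10 fraction bits, values between 64 and 128 are spaced 0.0625 apart, which is almost two of our 0.033-second frames. In this model quantize(100.033, 10) and quantize(100.066, 10) both return 100.0625, so the color does not change between those two frames.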
Upping the precision of the uniform animation_time to highp may help in some systems, but by no means all. The OpenGL ES spec sets minimum limits on highp which allow hardware to implement it with the same precision as mediump in fragment shaders. There is an expectation that fragment shaders will be dealing primarily with colors, where a restricted range is not a big issue, so this is exactly what many implementations in fact do.
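If you want to know what a particular device offers, OpenGL ES 2.0 lets you query the precision of each qualifier with glGetShaderPrecisionFormat. The sketch below shows one way such a result might be interpreted. The function name smooth_until is hypothetical, and the GL call appears only in a comment because it needs a current context.

```c
/* Estimate the time (in seconds) up to which a steadily increasing time
   value still changes on every frame, given a float format with
   `precision_bits` bits of precision and a frame step of `frame_dt`
   seconds. The steps may already be uneven before this point.
   Returns the upper end of the last power-of-two interval whose value
   spacing is no larger than frame_dt, or 0 if the inputs are unusable.

   precision_bits could come from (requires a current GL ES 2.0 context):
     GLint range[2], precision;
     glGetShaderPrecisionFormat(GL_FRAGMENT_SHADER, GL_HIGH_FLOAT,
                                range, &precision);
   A precision of 0 here means highp is not supported in fragment shaders. */
double smooth_until(int precision_bits, double frame_dt)
{
    double spacing = 1.0;
    double upper = 2.0;   /* upper end of the interval [1, 2) */
    int i;
    if (precision_bits <= 0 || frame_dt <= 0.0)
        return 0.0;
    for (i = 0; i < precision_bits; ++i)
        spacing /= 2.0;   /* spacing inside [1, 2) */
    while (spacing * 2.0 <= frame_dt) { spacing *= 2.0; upper *= 2.0; }
    while (spacing > frame_dt)        { spacing /= 2.0; upper /= 2.0; }
    return upper;
}
```

For a 0.033-second frame step, this estimate gives 64 seconds for 10 bits of precision and 524288 seconds (about six days) for 23 bits, so the benefit of highp depends entirely on what the query reports for the device in question.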
Luckily, there's a simple solution that works universally.
In many animations, we want to repeat our animation over a finite time period. In our example here, we just used fract so the period was one second. Since the shader discards the integral part of the time, we need not pass it in at all, and the result would be exactly the same.
So, we remove the call to fract in the shader, and instead do the equivalent in the C code. That means that we never send a time that increases without bound, and we can guarantee that we stay within the range where we get good precision.
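A minimal sketch of the C side of this change might look like the following. The function name advance_phase is hypothetical, and it assumes the frame step is small and non-negative. The GL call is shown only in a comment because it needs a current context and the uniform location queried earlier.

```c
/* Advance a repeating animation phase by one frame step, keeping the
   result in [0, 1) so the value sent to the shader never grows.
   Assumes frame_dt is small and non-negative. */
float advance_phase(float phase, float frame_dt)
{
    phase += frame_dt;
    while (phase >= 1.0f)
        phase -= 1.0f;    /* wrap around: same effect as fract() */
    return phase;
}

/* Per frame, in the render loop:
     glUniform1f(location, animation_time);
     ... do some rendering here ...
     animation_time = advance_phase(animation_time, 0.033f);
*/
```

The shader then uses animation_time directly in place of fract(animation_time). Because the wrapped value stays below 1, even a 10-bit mediump uniform resolves steps far smaller than one frame.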
Different animations may use different periods. If you are using sin or cos, it makes sense either to wrap the input time into the range 0 to 2π, or to keep it in the range 0 to 1 and scale by 2π inside the shader.
The mention of 2π brings up one last point. A symmetric range -n to +n is twice as precise as the equivalent range 0 to 2n since the sign bit is always present and does not count against our allocation of precision bits. Only the absolute magnitude of n counts. For sin and cos, using the range -π to π is therefore a better choice than 0 to 2π, and is often more convenient as well.
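The symmetric range can be maintained on the CPU in the same way as the 0 to 1 case. The sketch below is one possible approach. The names PI_F and advance_angle are hypothetical, and the step is assumed to be small compared with 2π.

```c
#define PI_F 3.14159265358979f

/* Advance an angle and wrap it into [-pi, pi) before it is passed to
   sin or cos in the shader. With 10 fraction bits, values in [2, 4)
   are spaced 1/512 apart, while values in [4, 8) are spaced 1/256
   apart, so staying below pi in magnitude halves the worst-case step. */
float advance_angle(float angle, float delta)
{
    angle += delta;
    while (angle >= PI_F)
        angle -= 2.0f * PI_F;
    while (angle < -PI_F)
        angle += 2.0f * PI_F;
    return angle;
}
```

Since sin and cos are periodic in 2π, the shader output is unchanged by the wrap. Only the magnitude of the value being sent is reduced.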
In general, please consider carefully both the precision and the range of the numbers you are passing into your shaders, especially fragment shaders. Using too big a range wastes precision, and when precision is limited, the effects aren't pretty.
Have you had interesting experiences with floating point precision? Why not share them in the comments?
Recently we have been receiving a lot of questions about which of our GPUs handle which levels of precision. Thanks to peterharris for the info.
ARM Mali Midgard range of GPUs:
ARM Mali Utgard range of GPUs:
*We have one special "fast path" for varyings used directly as texture coordinates which is actually fp24.