Entries for tag "math", ordered from most recent. Entry count: 67.
# How to Correctly Interpolate Vertex Attributes on a Parallelogram Using Modern GPUs?
Thu
13
Feb 2020
This is probably first guest post ever on my blog. It has been written by my friend Łukasz Izdebski Ph.D.
In a nutshell, today’s graphics cards render meshes using only triangles, as shown on the picture below.

I don’t want to describe how it is done (a lot of information on this topic can be easily found on the Internet) and describe the whole graphics pipeline, but for a short recap: In the first programmable part of the rendering pipeline, vertex shader receives a single vertex (with assigned data about it that I will later refer to) and outputs one vertex which is transformed by the shader. Usually it is a transformation from 3D coordinate system to Normalized Device Coordinates (NDC) (information about NDC can be found in Coordinate Systems). After primitive clipping, perspective divide, and viewport transform, vertices are projected on the 2D screen, which is drawn to the monitor output.
But this is only half the story. I was describing what is happening with vertices, but at the beginning I mentioned triangles, with three vertices forming one triangle. After vertex processing, Primitive Assembly follows, then the next stage is Rasterization. This is very important because in this stage the generation of fragments (pixels) happens, which are lying inside the triangle, as shown on the picture below.

How are colors of the pixels inside the triangle generated? Each vertex can contain not only one set of data - 3D coordinates in the virtual world, but also additional data called attributes. Those attributes can be color of the vertex, normal vector, texture coordinate etc.
How then those rainbow colors are generated as shown on the picture above, while as I said only one color can be set on three vertices of the triangle? The answer is interpolation. As we can read in Wikipedia, interpolation in mathematics is a type of estimation, a method that can be used to generate new data points between a discrete set of known data points. In the described problem it’s about generating interpolated colors inside the rendered triangle.
The way to achieve this is by using barycentric coordinates. (Those coordinates not only can be used for interpolation but additionally define if a fragment lies inside the triangle or not. More on this topic can be read in Rasterization: a Practical Implementation). In short, barycentric coordinate system is a set of three points λ1, λ2, λ3 ≥ 0 (for convex polygons) where λ1 + λ2 + λ3 = 1. When a triangle is rasterized, for every fragment of the triangle proper barycentric coordinates are calculated. Then the color of the fragment C can be calculated by weighted sum of colors at each vertex C1, C2, C3, and weights are of course barycentric coordinates: C = C1 * λ1 + C2 * λ2 + C3 * λ3
This way not only color can be interpolated, but any vertex attribute. (This functionality can be disabled by using proper Interpolation Qualifier on an attribute in vertex shader source code, like flat).
When dealing with triangle meshes, this way of attribute interpolation gives correct results, but when rendering 2D sprites or font glyphs, some disadvantages may occur under specific circumstances. When we want to render a gradient which starts in one of the corners of a sprite (see the picture below), we can see quite ugly results :(

This is happening because interpolation occurs on two triangles independent of each other. Graphics cards can work only on triangles, not quads. In this case we want the interpolation to occur on a quad not triangle, as pictured below:

How to trick the graphics card and force quadrilateral attributes interpolation? One way to render such type of gradients is by using tessellation – subdivide quad geometry. Tessellation shader is available starting from DirectX 11, OpenGL 4.0, and Vulkan 1.0. A simple example of how it will look like depending on different parameters of tessellation (more details about tessellation can be found in Tessellation Stages) can be seen in the animated picture below.

As we can see, when the quad is subdivided to more than 16 pieces, we are getting the desired visual result, but as my high school teacher used to say: “Don’t shoot a fly with a cannon”, and this solution for rendering such a simple thing by using tessellation is an overkill.
This way I developed a new technique that will be helpful to achieve this goal. First, we need to get access to the barycentric coordinates in the fragment shader. DirectX 12 HLSL gives us those coordinates when using SV_Barycentrics. In Vulkan those coordinates are available in AMD VK_AMD_shader_explicit_vertex_parameter extension and Nvidia VK_NV_fragment_shader_barycentric extension. Maybe in the near future it will be available in the core spec of Vulkan and more importantly from all hardware vendors.
If we are not fortunate to have those coordinates as built-in functions, we can generate them by adding some additional data: one new vertex attribute and one new uniform (constant) value. Here are the details of this solution. Consider a quadrilateral built from four vertices and two triangles as shown in the picture below.

Additional attribute Barycentric in vertices is a 2D vector and should contain following values:
A = (1,0) B = (0,0) C = (0,1) D = (0,0)
Next step is to calculate extra constant data for the parameter to be interpolated as shown in the picture above (in this case color attribute), using the equation:
ExtraColorData = - ColorAtVertexA + ColorAtVertexB - ColorAtVertexC + ColorAtVertexD
The fragment shader that renders the interpolation we are looking for looks like this:
/////////////////////////////////////////////////////////////////////
//GLSL Fragment Shader
/////////////////////////////////////////////////////////////////////
#version 450
#extension GL_ARB_separate_shader_objects : enable
layout(binding = 0) uniform CONSTANT_BUFFER
{
vec4 ExtraColorData;
} cbuffer;
in block
{
vec4 Color;
vec2 Barycentric;
} PSInput;
layout(location = 0) out vec4 SV_TARGET;
void main()
{
SV_TARGET = PSInput.Color + PSInput.Barycentric.x * PSInput.Barycentric.y * cbuffer.ExtraColorData;
}
/////////////////////////////////////////////////////////////////////
//HLSL Pixel Shader
/////////////////////////////////////////////////////////////////////
cbuffer CONSTANT_BUFFER : register(b0)
{
float4 ExtraColorData;
};
struct PSInput
{
float4 color : COLOR;
float2 barycentric : TEXCOORD0;
};
float4 PSMain(PSInput input) : SV_TARGET
{
return input.color + input.barycentric.x * input.barycentric.y * ExtraColorData;
}
That’s all. As we can see, it should not have a big performance overhead. When barycentric coordinates will be more available, then memory overhead will also be minimal.
Probably the reader will ask the question if this is looking good in the 3D perspective scenario, not only when a triangle is parallel to the screen?

As shown in the picture above, hardware properly interpolates data using additional computation as describe here (Rasterization: a Practical Implementation), so this new method works with perspective as it should.
Does this method give proper results only on squares?
This is a good question! The solution described above works on all parallelograms. I’m now working on a solution for all convex quadrilaterals.
What else can we use this method for?
One more usage comes to my mind: a post-process fullscreen quad. As I mentioned earlier, graphics cards do not render quads, but triangles. To simulate proper interpolation of attributes, 3D engines render one BIG triangle which covers the whole screen. With this new approach, rendering quad built from two triangles can be available and attributes which are needed to be quadrilateral interpolated can be calculated in the way shown above.
Comments | #math #rendering Share
# Operations on power of two numbers
Sun
09
Sep 2018
Numbers that are powers of two (i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 and so on...) are especially important in programming, due to the way computers work - they operate on binary representation. Sometimes there is a need to ensure that certain number is power of two. For example, it might be important for size and alignment of some memory blocks. This property simplifies operations on such quantities - they can be manipulated using bitwise operations instead of arithmetic ones.
In this post I'd like to present efficient algorithms for 3 common operations on power-of-2 numbers, in C++. I do it just to gather them in one place, because they can be easily found in many other places all around the Internet. These operations can be implemented using other algorithms as well. Most obvious implementation would involve a loop over bits, but that would give O(n) time complexity relative to the number of bits in operand type. Following algorithms use clever bit tricks to be more efficient. They have constant or logarithmic time and they don't use any flow control.
1. Check if a number is a power of two. Examples:
IsPow2(0) == true (!!) IsPow2(1) == true IsPow2(2) == true IsPow2(3) == false IsPow2(4) == true IsPow2(123) == false IsPow2(128) == true IsPow2(129) == false
This one I know off the top of my head. The trick here is based on an observation that a number is power of two when its binary representation has exactly one bit set, e.g. 128 = 0b10000000. If you decrement it, all less significant bits become set: 127 = 0b1111111. Bitwise AND checks if these two numbers have no bits set in common.
template <typename T> bool IsPow2(T x)
{
return (x & (x-1)) == 0;
}
2. Find smallest power of two greater or equal to given number. Examples:
NextPow2(0) == 0 NextPow2(1) == 1 NextPow2(2) == 2 NextPow2(3) == 4 NextPow2(4) == 4 NextPow2(123) == 128 NextPow2(128) == 128 NextPow2(129) == 256
This one I had in my library for a long time.
uint32_t NextPow2(uint32_t v)
{
v--;
v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
v |= v >> 16;
v++;
return v;
}
uint64_t NextPow2(uint64_t v)
{
v--;
v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
v |= v >> 16; v |= v >> 32;
v++;
return v;
}
3. Find largest power of two less or equal to given number. Examples:
PrevPow2(0) == 0 PrevPow2(1) == 1 PrevPow2(2) == 2 PrevPow2(3) == 2 PrevPow2(4) == 4 PrevPow2(123) == 64 PrevPow2(128) == 128 PrevPow2(129) == 128
I needed this one just recently and it took me a while to find it on Google. Finally, I found it in this post on StackOveflow.
uint32_t PrevPow2(uint32_t v)
{
v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
v |= v >> 16;
v = v ^ (v >> 1);
return v;
}
uint64_t PrevPow2(uint64_t v)
{
v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
v |= v >> 16; v |= v >> 32;
v = v ^ (v >> 1);
return v;
}
Update 2018-09-10: As I've been notified on Twitter, C++20 is also getting such functions as standard header <bit>.
Comments | #math #c++ #algorithms Share
# Pitfalls of Floating-Point Numbers - Slides
Sat
24
Sep 2016
Just as I announced in my previous blog post, today I gave a lecture on a "Kariera IT" event - organized by CareerCon, dedicated to IT jobs.
Here you can find slides from my presentation, in Polish. It's called "Pułapki liczb zmiennoprzecinkowych" ("Pitfalls of floating-point numbers").
Here are links to the Floating-Point Formats Cheatsheet (in English) that I mentioned in my presentation:
Comments | #events #teaching #math Share
# Watch out for reduced precision normalize/length in OpenGL ES
Mon
27
Apr 2015
GLSL language for OpenGL ES introduces concept of precision. You can annotate variable declaration (both float and int/uint) with a precision qualifier: highp, mediump or lowp, like:
mediump float a = 3.0;
You can also specify default precision qualifier by using precision statement. Language specification defines minimum required range and precision for each precision qualifier.
highp basically means normal, single-precision, 32-bit float (IEEE 754), as we know it from CPU programming.mediump is said to have have range of at least -2^14 ... 2^14 and relative precision 2^-10, so it can be, for example, implemented using a 16-bit, half-precision float.lowp is said to have range at least -2 ... 2 and absolute precision 2^-8, so basically it can be stored as a 10-bit, fixed-point number.GPU vendors are free to use more precise data types, or even full 32-bit float for all of them. What exact precision is used depends on specific GPU and maybe even operating system or graphics driver version. Using smaller data types can occupy less memory, calculate faster and consume less battery power. But it comes at the price of reduced precision and range of these numbers. Tom Olson wrote interesting articles about this: "Benchmarking floating-point precision in mobile GPUs": Part I, Part II, Part III.
In this post I'd like to warn you against a specific problem related to it - usage of length(), normalize() and distance() functions. Using smaller data types not only limits precision in terms of number of significant digits, but also available range (over which the value will saturate to -INF/+INF). For mediump, this range is defined as +/-2^14, which is only 16384.
This may still look like a lot, but let's remember that calculating vector length involves intermediate value that it sum of squares of this components. This can grow very big before a square root is applied. For example, for 3D vector:
length(a) = sqrt(a.x*a.x + a.y*a.y + a.z*a.z)
If you do this operation on a mediump vector, the term a.x*a.x + a.y*a.y + a.z*a.z can exceed maximum value for vector as small as (74.0, 74.0, 74.0). It can be very dangerous if you do something like this in your fragment shader:
precision mediump float;
uniform vec3 light_pos;
...
void main()
{
...
vec3 dir_to_light = normalize(pos - light_pos);
// Calculate your lighting and so on...
You might ask: Why isn't this intermediate value stored in high precision before taking its square root to avoid this overflow problem? Obviously it could be, as precision in any place of the shader is free to be higher than the minimum allowed in that place, so some GPU vendors can do it this way, but you shouldn't rely on this. GLSL specification clearly says that the shader is free to use same, reduced precision for intermediate values.
The precision used to internally evaluate an operation, and the precision qualification subsequently associated with any resulting intermediate values, must be at least as high as the highest precision qualification of the operands consumed by the operation.
Conclusion is: When you write shaders for OpenGL ES, watch out for operations that involve calculating vector length (or dot product) like length(), normalize(), distance(), use highp precision for vectors involved and remember that what works on one GPU due to using precision higher than minimum required, may not work on another GPU and it’s still an application issue.
Comments | #math #opengl Share
# Floating-Point Formats Cheatsheet
Thu
13
Jun 2013
Floating-point numbers (or floats in short) are not as simple as integer numbers. There is much to be understood when dealing with these numbers on low level - basic things like the sign + exponent + significand representation (and that exponent is biased, while significand has implicit leading 1), why you should never compare calculation results operator ==, that some fractions with finite decimal representation cannot be represented exactly in binary etc., as well as why there are two zeros -0 and +1, what are infinite, NaN (Not a Number) and denorm (denormal numbers) and how they behave. I won't describe it here. It's not an arcane knowledge - you can find many information about this on the Web, starting from Wikipedia article.
But after you understand these concepts, quantitative questions come to mind, like: how many significant decimal digits can we expect from precision of particular float representation (half, single, double)? What is the minimum non-zero value representable in that format? What range of integers can we represent exactly? What is the maximum value? And finally: if our game crashes with "Access violation, reading location 0x3f800000", what chances are that we mistaken pointer for a float number, as this is one of common values, meaning 1.0?
So to organize such knowledge, I created a "Floating-Point Formats" cheatsheet:
Floats_cheatsheet.pdf
Floats_cheatsheet.docx
Comments | #productions #math Share
# SBX - Scale-Bias Transform
Wed
23
Jan 2013
A class of 2D or 3D point and vector transformations called affine transformations (that is - linear transformation plus translation) can be conveniently represented by matrix. But sometimes we don't need possibility of input x to influence output y or vice versa. For example, that's the case when we want to convert coordinates of mouse cursor from pixels, like we take it from WinAPI (where X goes right from 0 to e.g. 1280 and Y goes down from 0 to e.g. 720) to post-projective 3D space (where X goes right from -1.0 to +1.0 and Y goes up from -1.0 on the bottom to +1.0 on the top). Then every component can be transformed independently using linear transform, like this:
Parameters are: scale.xy, bias.xy.
Given input point i.xy, output point o.xy is:
o.x = i.x * scale.x + bias.x
o.y = i.y * scale.y + bias.y
This seems trivial, but what if we wanted to encapsulate such transformation in a new data structure and define operations on it, like we do on matrices? Let's call it SBX - Scale-Bias Transform. I propose following structure:
struct ScaleBiasXform
{
static const ScaleBiasXform IDENTITY;
vec2 Scale;
vec2 Bias;
};
And following functions:
void Construct(ScaleBiasXform& out,
const vec2& input1, const vec2& output1,
const vec2& input2, const vec2& output2);
void Scaling(ScaleBiasXform& out, float scale);
void Scaling(ScaleBiasXform& out, const vec2& scale);
void Translation(ScaleBiasXform& out, const vec2& vec);
void Inverse(ScaleBiasXform& out, const ScaleBiasXform& sbx);
void Combine(ScaleBiasXform& out, const ScaleBiasXform& sbx1, const ScaleBiasXform& sbx2);
void TransformPoint(vec2& out, const vec2& point, const ScaleBiasXform& sbx);
void UntransformPoint(vec2& out, const vec2& point, const ScaleBiasXform& sbx);
void TransformVector(vec2& out, const vec2& vec, const ScaleBiasXform& sbx);
void UntransformVector(vec2& out, const vec2& vec, const ScaleBiasXform& sbx);
Full source code with additional comments and function bodies is here: ScaleBiasTransform.hpp, ScaleBiasTransform.cpp. Some additional notes:
# Particle System - How to Store Particle Age?
Wed
12
Dec 2012
Particle effects are nice because they are simple and look interesting. Besides, coding it is fun. So I code it again :) Particle systems can have state (when parameters of each particle are calculated based on previous values and time step) or stateless (when parameters are always recalculated from scratch using fixed function and current time). My current particle system has the state.
Today a question came to my mind about how to store age of a particle to delete it after some expiration time, determined by the emitter and also unique for each particle. First, let's think for a moment about the operations we need to perform on this data. We need to: 1. increment age by time step 2. check if particle expired and should be deleted.
If that was all, the solution would be simple. It would be enough to store just one number, let's call it TimeLeft. Assigned to the particle life duration at the beginning, it would be:
Step with dt: TimeLeft = TimeLeft - dt
Delete if: TimeLeft <= 0
But what if we additionally want to determine the progress of the particle lifetime, e.g. to interpolate its color of other parameters depending on it? The progress can be expressed in seconds (or whatever time unit we use) or in percent (0..1). My first idea was to simply store two numbers, expressed in seconds: Age and MaxAge. Age would be initialized to 0 and MaxAge to particle lifetime duration. Then:
Step with dt: Age = Age + dt
Delete if: Age > MaxAge
Progress: Age
Percent progress: Age / MaxAge
Looks OK, but it involves costly division. So I came up with an idea of pre-dividing everything here by MaxAge, thus defining new parameters: AgeNorm = Age / MaxAge (which goes from 0 to 1 during particle lifetime) and AgeIncrement = 1 / MaxAge. Then it gives:
Step with dt: AgeNorm = AgeNorm + dt * AgeIncrement
Delete if: AgeNorm > 1
Progress: Age / AgeIncrement
Percent progress: AgeNorm
This needs additional multiplication during time step and division when we want to determine progress in seconds. But as I consider progress in percent more useful than the absolute progress value in seconds, that's my solution of choice for now.
Comments | #math #rendering Share
# DirectXMath - A First Look
Sun
15
Apr 2012
Programmers using DirectX have access to the accompanying library full of routines that support various 3D math calculations. First it was D3DX - auxilliary library tightly coupled with DirectX itself. It's still there, but some time ago Microsoft released a new library called XNA Math. I've written about it here. This library also ships with latest versions of DirectX SDK (which was last updated in Jun 2010), but it is something completely new - a more independent, small, elegant and efficient library, portable between Windows (where it can use SSE2 or old FPU) and Xbox 360.
Now Microsoft comes up with another library called DirectXMath. See the official documentation and introductory blog post for details. As a successor of XNA Math, it looks very similar. Main differences are:
Sounds great? Well, I didn't tell yet how to get it. It's not shipped with DirectX SDK. You need the SDK for Windows 8. They say that:
The library is annotated using SAL2 (rather than the older VS-style or Windows-style annotations), so requires the standalone Windows SDK for Windows 8 Consumer Preview or some additional fixups to use with Visual C++ 2010.
So probably I'm not going to switch to this new library anytime soon :P