
# Tips for Using Perforce

Sun, 27 Jun 2021

Version Control Systems are tools that every programmer should use. Among them, Git is probably the most popular one. Some companies use Perforce instead. Whether it is better or worse is hard to tell, but it has advantages that make it indispensable in some types of projects, like game development. Perforce handles large binary files very well. Even if the files take tens or hundreds of gigabytes, it still works fine. I am talking about the size of a single local copy here, not the entire repository on the server.

From a user’s perspective, Perforce differs greatly from Git or SVN. Not only are commands named differently (e.g. there is “Submit” instead of “Commit”), but the whole concept of “changelists” is something that needs to be well understood to use the tool efficiently. While working with Perforce for many years in different companies and projects, I learned some good practices that I would like to share here. Writing them down was difficult, as they seem obvious to me, but hopefully some of them are not obvious to you, so you will learn something new.

1. Paste paths into the address bar

Let’s start with a simple one. The Perforce window has a text box at the top that resembles the address bar in a web browser. It shows the path of the currently selected file or directory in the Depot or Workspace tab. It can also accept input.

When you work on a file in another tool and want to jump to it quickly in Perforce, e.g. to check it out, just copy the full path of the file to the system clipboard and paste it into this “address bar”. The selection in the Workspace tab will switch to it immediately.


# Intrusive Linked List in C++

Tue, 25 May 2021

A doubly linked list is one of the most fundamental data structures. Each element contains, besides the value we want to store in the container, a pointer to the previous and next element. This may not be the best choice for indexing the i-th element or even for traversing all elements quickly (as they are scattered in memory, performance may suffer because of poor cache utilization), but inserting and removing an element at any place in the list is quick.


(Figure: a doubly linked list. Source: Doubly linked list at Wikipedia.)

Inserting and removing elements is quick, but not necessarily simple in terms of code complexity. We have to change pointers in the current element as well as in the previous and next ones, and also handle the special cases when the current element is the first one (head/front) or the last one (tail/back) – a lot of special cases, which may be error-prone.

Therefore it is worth encapsulating this logic in some generic container class. This is what the authors of the STL did by defining the std::list class in the <list> header. It is a template, where each item of the list contains our type T plus the additional data needed – most likely pointers to the next and previous items.

struct MyStructure {
    int MyNumber;
};
std::list<MyStructure> list;
list.push_back(MyStructure{3});

In other words, our structure is contained inside one that is defined internally by the STL. After the template is resolved, it may look somewhat like this:

struct STL_ListItem {
    STL_ListItem *Prev, *Next;
    MyStructure UserData;
};

What if we want to do the opposite – to put the “utility” pointers needed to implement the list inside our custom structure? Maybe we have a structure already defined and cannot change it, or maybe we want each item to be a member of two different lists, e.g. sorted by different criteria, and so to contain two pairs of previous-next pointers. A definition of such a structure is easy to imagine, but can we still implement a generic list class that hides all the complex logic of inserting and removing elements and works on our own structure?

struct MyStructure {
    int MyNumber = 0;
    MyStructure *Prev = nullptr, *Next = nullptr;
};

If we could do that, such a data structure could be called an “intrusive linked list”, just like an “intrusive smart pointer” is a smart pointer that keeps its reference counter inside the pointed-to object. Actually, all that our IntrusiveLinkedList class needs to work with our custom item structure, besides the type itself, is a way to access the pointers to the previous and next element. I came up with the idea of providing this access using a technique called “type traits” – a separate structure that exposes a specific interface delivering information about some other type. In our case, it is to read (for a const pointer) or access by reference (for a non-const pointer) the previous and next pointers.

The traits structure for MyStructure may look like this:

struct MyStructureTypeTraits {
    typedef MyStructure ItemType;
    static ItemType* GetPrev(const ItemType* item) { return item->Prev; }
    static ItemType* GetNext(const ItemType* item) { return item->Next; }
    static ItemType*& AccessPrev(ItemType* item) { return item->Prev; }
    static ItemType*& AccessNext(ItemType* item) { return item->Next; }
};

By having this, we can implement a class IntrusiveLinkedList<ItemTypeTraits> that holds a pointer to the first and last item of the list and can insert, remove, and do other operations on the list, using a custom item structure with custom previous/next pointers inside.

IntrusiveLinkedList<MyStructureTypeTraits> list;

list.PushBack(new MyStructure{1});
list.PushBack(new MyStructure{2});

for(MyStructure* i = list.Front(); i; i = list.GetNext(i))
    printf("%d\n", i->MyNumber); // prints 1, 2

while(!list.IsEmpty())
    delete list.PopBack();
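To give an idea of how this can work inside, here is a minimal sketch of such a class, written just for this post – the real implementation linked below is more complete:

template<typename ItemTypeTraits>
class IntrusiveLinkedList
{
public:
    using ItemType = typename ItemTypeTraits::ItemType;
    bool IsEmpty() const { return m_Front == nullptr; }
    ItemType* Front() { return m_Front; }
    ItemType* GetNext(ItemType* item) const { return ItemTypeTraits::GetNext(item); }
    void PushBack(ItemType* item)
    {
        // Link the new item after the current back, updating head/tail pointers.
        ItemTypeTraits::AccessPrev(item) = m_Back;
        ItemTypeTraits::AccessNext(item) = nullptr;
        if(m_Back)
            ItemTypeTraits::AccessNext(m_Back) = item;
        else
            m_Front = item;
        m_Back = item;
    }
    ItemType* PopBack()
    {
        // Precondition: the list is not empty.
        ItemType* const result = m_Back;
        m_Back = ItemTypeTraits::GetPrev(result);
        if(m_Back)
            ItemTypeTraits::AccessNext(m_Back) = nullptr;
        else
            m_Front = nullptr;
        ItemTypeTraits::AccessPrev(result) = nullptr;
        return result;
    }
private:
    ItemType* m_Front = nullptr;
    ItemType* m_Back = nullptr;
};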

I know this is nothing special – there are probably many such implementations on the Internet already – but I am happy with the result, as it fulfilled my specific need elegantly.

To see the full implementation of my IntrusiveLinkedList class, go to the D3D12MemAlloc.cpp file in the D3D12 Memory Allocator library. One caveat: the class doesn't allocate or free memory for the list items – this must be done by the user.


# VkExtensionsFeaturesHelp - My New Library

Thu, 01 Apr 2021

I had this idea for quite some time and finally spent last weekend coding it, so here it is: 611 lines of code (and many times more documentation), shared for free under the MIT license:

**VkExtensionsFeaturesHelp**

Vulkan Extensions & Features Help, or VkExtensionsFeaturesHelp, is a small, header-only, C++ library for developers who use the Vulkan API. It helps to avoid boilerplate code while creating the VkInstance and VkDevice objects by providing a convenient way to query and then enable:

  • instance layers
  • instance extensions
  • instance feature structures
  • device features
  • device extensions
  • device feature structures

The library provides a domain-specific language to describe the list of required or supported extensions, features, and layers. The language is fully defined in terms of preprocessor macros, so no custom build step is needed.

Any feedback is welcome :)


# Myths About Floating-Point Numbers

Wed, 17 Mar 2021

Floating-point numbers are a great invention in computer science, but they can also be tricky and troublesome to use correctly. I’ve written about them already by publishing a Floating-Point Formats Cheatsheet and the presentation “Pitfalls of floating-point numbers” (“Pułapki liczb zmiennoprzecinkowych” – the slides are in Polish). Last year I was preparing for a more extensive talk about this topic, but it got cancelled, like pretty much everything in these hard times of the COVID-19 pandemic. So in this post, I would like to approach the topic from a different angle.

A programmer can use floating-point numbers at different levels of understanding. A beginner would use them trusting they are infinitely capable and precise, which can lead to problems. An intermediate programmer knows that they have some limitations, and so by using some good practices the problems can be avoided. An advanced programmer understands what is really going on inside these numbers and can use them with full awareness of what to expect from them. This post may help you jump from step 2 to step 3. The commonly adopted good practices are called “myths” here, but they are actually just generalizations and simplifications. They can be useful for avoiding errors, but only until you understand what is true and what is false about them on a deeper level.

1. They are not exact

It is not true that 2.0 + 2.0 can give 3.99999. It will always be 4.0. Floating-point numbers are exact to the extent of their limited range and precision. If you assign a constant value to a floating-point number, you can safely compare it with the same value later, even using the discouraged operator ==, as long as it is not the result of some calculation. Imprecision doesn't come out of nowhere.
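For example, this assertion can never fail (a small illustration of my own):

#include <cassert>

int main()
{
    float a = 0.1f; // not exactly one tenth, but a well-defined constant
    float b = 0.1f; // the same constant - bit-for-bit identical
    assert(a == b); // always true: no calculation happened, so no imprecision crept in
}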

Instead of using an integer loop iterator and converting it to float every time:

for(size_t i = 0; i < count; ++i)
{
    float f = (float)i;
    // Use f
}

You can do this, which will result in much more efficient code:

for(float f = 0.f; f < (float)count; f += 1.f)
{
    // Use f
}

It is true, however, that your numbers may not look exactly as expected because:

  • Some fractions cannot be represented exactly – even simple ones like decimal 0.1, which in binary is 0.000110011001100… This is because we humans normally use the decimal system, while floating-point numbers, like other numbers inside computers, use the binary system – a different base.
  • There is a limited range of integer numbers that can be represented exactly. For 32-bit floats it is only up to 16,777,216. Above that, numbers start “jumping” every 2, then every 4, etc. So it is not a good idea to use floating-point numbers to represent file sizes if your files can be bigger than 16 MB. If count in the example above were >16M, it would cause an infinite loop.

A 64-bit “double”, however, represents integers exactly up to 9,007,199,254,740,992, so it should be enough for most applications. No wonder some scripting languages do just fine supporting only “double” floating-point numbers and no integers at all.
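Here is a small standalone program of mine that demonstrates both limits:

#include <cstdio>

int main()
{
    float f = 16777216.f;          // 2^24 - up to here a float holds every integer exactly
    printf("%.1f\n", f + 1.f);     // prints 16777216.0 - the +1 is lost
    double d = 9007199254740992.0; // 2^53 - the corresponding limit for a double
    printf("%.1f\n", d + 1.0);     // prints 9007199254740992.0
    return 0;
}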

2. They are non-deterministic

It is not true that cosmic radiation will flip the least significant bit at random. Random number generators are not involved either. If you call the same function with your floating-point calculations on the same input, you will get the same output. It is fully deterministic, like any other computation. (Note: when old FPU instructions are generated rather than new SSE ones, results can really be non-deterministic, and even a task switch may alter your numbers. See this tweet.)

It is true, however, that you may observe different results because:

  • Compiler optimizations can influence the result. If you implement two versions of your formula, similar but not exactly the same, the compiler may, for example, optimize (a * b + c) from doing MUL + ADD to an FMA (fused multiply-add) instruction, which does the 3-argument operation in one step. FMA has higher precision, but can then give a different result than the two separate instructions.
  • You may observe different results on different platforms – e.g. an AMD vs an Intel CPU, or an AMD vs an NVIDIA GPU. This is because the floating-point standard (IEEE 754) defines only the required precision of operations like sin, cos, etc., so the exact result may vary on the least significant bit.

I heard a story of a developer who calculated hashes from the results of his floating-point calculations in a distributed system and discovered that records with what was supposed to be the same data had different hashes on different machines.

I once had to investigate a user complaint about the following piece of shader code (in GLSL). The user said that on AMD graphics cards, for uv.x higher than 306, it always returns black color (zero).

vec4 fragColor = vec4(vec3(fract(sin(uv.x * 2300.0 * 12000.0))), 1.0);

I noticed that the value passed to the sine function is very high. For uv.x = 306 it is 27,600,000. If we recall from math classes that sine cycles between -1 and 1 every 2*PI ≈ 6.283185, and take into consideration that above 16,777,216 a 32-bit float cannot represent all integer numbers exactly but starts jumping every 2, then every 4, etc., we can conclude that we don't have enough precision to know whether our result should be -1, 1, or anything in between. It is simply undefined.

I then asked the user what he was trying to achieve with this code, as the result is totally random. He said it is indeed supposed to be... a random number generator. The problem is that a result of constant 0 is as valid as any other. The reason random numbers are generated on NVIDIA cards and not on AMD is that the sine instruction on AMD GPU architectures actually has a period of 1, not 2*PI. But it is still fully deterministic with regard to the input value. It just returns different results on different platforms.

3. NaN and INF are an indication of an error

It is true that if you don’t expect them, their appearance may indicate an error, either in your formulas or in the input data (e.g. numbers very large, very small and close to zero, or just garbage binary data). It is also true that they can cause trouble as they propagate through calculations, e.g. every operation involving NaN returns NaN.
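For example (again, a small snippet of my own):

#include <cmath>
#include <cstdio>

int main()
{
    float x = std::sqrt(-1.f);     // produces NaN
    float y = x * 2.f + 1.f;       // still NaN - it propagates through calculations
    printf("%d\n", y != y);        // prints 1 - NaN is the only value not equal to itself
    printf("%d\n", std::isnan(y)); // prints 1
    return 0;
}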

However, it is not true that these special values are just a means of signaling errors, or that they are not useful. They are perfectly valid special cases of the floating-point representation and have clearly defined behavior. For example, -INF is smaller and +INF is larger than any finite number. You can use this property to implement the following function with a clearly documented interface:

#include <limits>

// Finds and returns maximum number from given array.
// For empty array returns -INF.
float CalculateMax(const float* a, size_t count)
{
    float max = -std::numeric_limits<float>::infinity();
    for(size_t i = 0; i < count; ++i)
        if(a[i] > max)
            max = a[i];
    return max;
}

Summary

As you can see, the common beliefs about floating-point numbers – that they are not exact, that they are non-deterministic, or that NaN and INF are an indication of an error – are generalizations and simplifications that can help to avoid errors, but they don’t tell the full story. To really understand what's going on at a deeper level:

  • Keep in mind which values in your program are just input data or constants and which are results of some calculations.
  • Know the capabilities and limitations of floating-point types – their maximum range, minimum possible number, precision in terms of binary or decimal places, maximum integer represented exactly, etc.
  • Learn about how floating point numbers are stored, bit by bit.
  • Learn about special values - INF, NaN, positive and negative zero, denormals. Understand how they behave in computations.
  • Take a look at assembly generated by the compiler to see how CPU or GPU really operates on your numbers.

Update 2021-06-09: This article has been published as a guest post on C++ Stories and spawned an interesting discussion on Reddit that is worth reading.


# Vulkan Memory Types on PC and How to Use Them

Sun, 21 Feb 2021

Allocation of memory for buffers and textures is one of the fundamental things we do when using graphics APIs like DirectX or Vulkan. It is of particular interest to me, as I develop the Vulkan Memory Allocator and D3D12 Memory Allocator libraries (as part of my job – these are not personal projects). Although the underlying hardware (RAM dies and the GPU) stays the same, different APIs expose it differently. I’ve described these differences in detail in my article “Differences in memory management between Direct3D 12 and Vulkan”. I also gave a talk “Memory management in Vulkan and DX12” at GDC 2018, and my colleague Ste Tovey presented many more details in his talk “Memory Management in Vulkan” at Vulkanised 2018.

In this article, I would like to present common patterns seen in the list of memory types available in Vulkan on Windows PCs. First, let me recap what the API offers. Unlike DX12, where you have just 3 default “memory heap types” (D3D12_HEAP_TYPE_DEFAULT, D3D12_HEAP_TYPE_UPLOAD, D3D12_HEAP_TYPE_READBACK), Vulkan has a 2-level hierarchy: a list of “memory heaps” and, inside them, “memory types” that you need to query and that can look completely different on various GPUs, operating systems, and driver versions. Some constraints and guarantees apply, as described in the Vulkan specification, e.g. there is always some DEVICE_LOCAL and some HOST_VISIBLE memory type.

A memory heap, as queried from vkGetPhysicalDeviceMemoryProperties and returned in VkMemoryHeap, represents some (more or less) physical memory, e.g. video RAM on the graphics card or system RAM on the motherboard. It has a fixed size in bytes, and a currently available budget that can be queried using the extension VK_EXT_memory_budget. A memory type, as returned in VkMemoryType, belongs to a certain heap and offers a “view” of that heap with certain properties, represented by VkMemoryPropertyFlags. The most notable are:

  • VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, which always matches the flag VK_MEMORY_HEAP_DEVICE_LOCAL_BIT in the heap it belongs to, informs us that the memory is local to the “device” (the GPU in Vulkan terminology). It doesn’t change what you can or cannot do with this memory type. If creating certain buffers or textures were possible only in GPU and not CPU memory, it would be expressed by the appropriate bits not being set in VkMemoryRequirements::memoryTypeBits. The DEVICE_LOCAL flag set on a memory type is just a hint for us that resources created in that memory will probably work faster when accessed on the GPU.
  • VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT. Unlike the previous one, this flag changes a lot. It means that you can call vkMapMemory on VkDeviceMemory objects allocated from this type and get a raw, CPU-side pointer to their data. In short: you can access this memory directly from the CPU, without the need to launch a Vulkan command for an explicit transfer, like vkCmdCopyBuffer.
  • VK_MEMORY_PROPERTY_HOST_CACHED_BIT, which can occur only on memory types that are also HOST_VISIBLE. This one, again, is just a hint for us. It changes nothing about what we can or cannot do with that memory. It just informs us that access to this memory goes through a cache (from the CPU perspective). As a result, a memory type with this flag should be fast to write, read, and access randomly via a mapped pointer. What the lack of this flag means is not clearly defined. Such memory may represent system RAM or even video RAM, but a common meaning (at least on PC) is that accesses are then uncached but write-combined from the CPU perspective, which means we should only write to it sequentially (best to do a memcpy), never read from it or jump over random places, as that may be slow.

Theoretically, a good algorithm as recommended by the spec – searching for the first memory type that meets your requirements – should be robust enough to work on any GPU. But if you want to make sure your application works correctly and efficiently on the variety of graphics hardware available on the market today, you may need to adjust your resource management policy to the specific set of memory heaps/types found on a user’s machine. To simplify this task, below I present common patterns that can be observed in the lists of Vulkan memory heaps and types on various GPUs on Windows PCs. I also describe their meaning and consequences.
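For reference, that basic search can be as simple as this (a sketch using the standard Vulkan API; the helper name is mine):

#include <vulkan/vulkan.h>

uint32_t FindMemoryType(VkPhysicalDevice physicalDevice,
    uint32_t memoryTypeBits,              // from VkMemoryRequirements::memoryTypeBits
    VkMemoryPropertyFlags requiredFlags)  // e.g. HOST_VISIBLE | HOST_COHERENT
{
    VkPhysicalDeviceMemoryProperties memProps;
    vkGetPhysicalDeviceMemoryProperties(physicalDevice, &memProps);
    for(uint32_t i = 0; i < memProps.memoryTypeCount; ++i)
    {
        const bool typeAllowed = (memoryTypeBits & (1u << i)) != 0;
        const bool flagsPresent =
            (memProps.memoryTypes[i].propertyFlags & requiredFlags) == requiredFlags;
        if(typeAllowed && flagsPresent)
            return i; // the first matching type wins
    }
    return UINT32_MAX; // no suitable type found
}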

Before I start, I must show you the website vulkan.gpuinfo.org, if you don’t know it already. It is a great database of all Vulkan capabilities, features, limits, and extensions, including memory heaps/types, cataloged from all kinds of GPUs and operating systems.

1. The Intel way

Intel manufactures integrated graphics (although they have also released a discrete card recently). As a GPU integrated with the CPU, it shares the same memory. It then makes sense to expose the following memory types in Vulkan (example: Intel(R) UHD Graphics 600):

Heap 0: DEVICE_LOCAL
    Size = 1,849,059,532 B
    Type 0: DEVICE_LOCAL, HOST_VISIBLE, HOST_COHERENT
    Type 1: DEVICE_LOCAL, HOST_VISIBLE, HOST_COHERENT, HOST_CACHED

What it means: this is the simplest and most intuitive set of memory types. There is just one memory, which represents system RAM, or the part of it that can be used for graphics resources. All memory types are DEVICE_LOCAL, which means the GPU has fast access to them. They are also all HOST_VISIBLE – accessible to the CPU. Type 0, without the HOST_CACHED flag, is good for writing through a mapped pointer and reading by the GPU, while type 1, with the HOST_CACHED flag, is good for writing by GPU commands and reading via a mapped pointer.

How to use it: you can just load your resources directly from disk. There is no need to create a separate staging copy, a separate GPU copy, and issue a transfer command, like we do with discrete graphics cards. With images you need to use VK_IMAGE_TILING_OPTIMAL for best performance, and so you still need vkCmdCopyBufferToImage, but at least for buffers you can just map them, fill the content via a CPU pointer, and then tell the GPU to use that memory – an approach that can save both time and precious bytes of memory.
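For buffers, that direct path could look like this (a sketch; device, memory, srcData, and dataSize are hypothetical variables, and memory is assumed to be allocated from a HOST_VISIBLE + HOST_COHERENT type):

void* mappedPtr = nullptr;
if(vkMapMemory(device, memory, 0, VK_WHOLE_SIZE, 0, &mappedPtr) == VK_SUCCESS)
{
    memcpy(mappedPtr, srcData, dataSize); // sequential write - fine even without HOST_CACHED
    vkUnmapMemory(device, memory);
    // The buffer bound to this memory can now be used by the GPU directly.
}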


# Book Review: C++ Lambda Story

Wed, 13 Jan 2021

(Image: cover of the “C++ Lambda Story” book)

Courtesy of its author, Bartłomiej Filipek, I was given an opportunity to read the new book “C++ Lambda Story”. Here is my review.

A book of 149 pages would be too short to teach the whole of C++, even its basics, but this one is about a specific topic – just one feature of the language: lambda expressions. For that, 149 pages may even seem like too many, depending on how deeply the author goes into the details. To find out, I read the book over 3 evenings.

The book starts from the very beginning – a description of the problem in C++98/03, where lambdas were not available in the language, so we had to write functors: structs or classes with an overloaded operator(). Then it moves on to describe lambdas as introduced in C++11, their syntax, and all their features. Every feature described is accompanied by a short and clear example. These examples also have links to the same code available in online compilers like Wandbox, Compiler Explorer, or Coliru.

In the next chapters, the author describes what has been added to lambdas in new language revisions – C++14, C++17, C++20, and how other new features introduced to the language interact with lambdas – e.g. consteval and concepts from C++20.

Not only the features of lambda expressions are described, but also some quirks that every programmer should know. What if a lambda outlives the scope where it was created? What may happen when it is called on multiple threads in parallel? The book answers all these questions, illustrating each with a short yet complete example.

Sometimes the author describes tricks that may seem too sophisticated. It turns out you can make a recursive call to your lambda, despite recursion not being directly supported, by defining a helper generic lambda inside your lambda. You can also derive your class from the implicit class defined by a lambda – or from many of them – to have your operator() overloaded for different parameter types.
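To illustrate the recursive trick, here is a minimal example of my own (not code from the book):

auto factorial = [](int n) {
    // The helper generic lambda receives itself as its first argument,
    // so it can call itself without having a name:
    auto impl = [](auto self, int value) -> int {
        return value <= 1 ? 1 : value * self(self, value - 1);
    };
    return impl(impl, n);
};
// factorial(5) == 120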

Your tolerance for such tricks depends on whether you are a proponent of “modern C++” and using all its features, or you prefer simple code, more like “C with classes”. Nonetheless, lambda expressions by themselves are a great language feature, useful in many cases. The book mentions some of these cases, as it ends with a chapter “Top Five Advantages of C++ Lambda Expressions”.

Overall, I like the book a lot. It describes this specific topic of lambda expressions in C++ comprehensively, yet in a concise and clear way. I recommend it to every C++ programmer. Because it is not very long, you shouldn’t hesitate to read it as if it were a whole new project you need to find time for. You should rather treat it as an additional, valuable learning resource, as if you read several blog articles or watched some YouTube videos about a topic of your interest.

You can buy the book on Leanpub. I also recommend visiting the author’s blog: C++ Stories (a new blog converted from bfilipek.com). See also my review of his previous book, “C++17 in Detail”. There is a discount for “C++ Lambda Story” ($5.99 instead of $8.99) here, as well as for both books ($19.99 instead of $23.99) here – valid until the end of February 2021.


# States and Barriers of Aliasing Render Targets

Tue, 22 Dec 2020

In my recent post “Initializing DX12 Textures After Allocation and Aliasing” I said that when you use Direct3D 12 and have a render-target or depth-stencil texture “B” aliasing with some other resource “A” in memory and want to use it, you need to initialize it properly every time: 1. issue a barrier of type D3D12_RESOURCE_BARRIER_TYPE_ALIASING, 2. fully initialize its content using Clear, Discard, or Copy, so that the garbage data left from the previous usage of the memory is overwritten.

That is actually not the full story. There is a third thing you need to take care of: a transition barrier for the resource. This is because the functions DiscardResource, ClearRenderTargetView, and ClearDepthStencilView require the texture to be in a correct state – D3D12_RESOURCE_STATE_RENDER_TARGET or D3D12_RESOURCE_STATE_DEPTH_WRITE. A question arises here: how does resource state tracking interact with memory aliasing?

If you look at the error and warning messages from the D3D Debug Layer or PIX, you notice that states are tracked per resource, no matter whether resources alias in memory. This gives us some clue about how to handle it correctly, to tame the Debug Layer and make things formally correct, but it is still not obvious where to put the barrier. Let's assume the last usage of our texture B is D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE – the texture is used for sampling. Where should we put the barrier that transitions it to D3D12_RESOURCE_STATE_RENDER_TARGET, required for the initializing Discard or Clear after aliasing? I can see 3 solutions here:

1. None at all – skip this barrier and just do the Discard/Clear as if the texture were in the correct RT/DS state. This obviously generates a Debug Layer error, but it seems to work fine. My thinking about why this may be fine is: if the Discard/Clear requires the texture to be in the correct RT/DS state and completely overwrites its content, then it shouldn’t care what state the texture was in initially. Its data is garbage anyway. It will most probably leave the texture properly initialized for the RT/DS state, no matter what.

2. Between the aliasing barrier and the Discard/Clear. This would please the Debug Layer, but I have some doubts whether it is correct or necessary. It may not be correct, because if the barrier does some transformation of the data, it would transform the garbage data left after aliasing, which could be undefined behavior leading to persistent corruption or even a GPU crash. It may not be necessary, because we are going to re-initialize the whole content of the texture in a moment anyway with the Discard/Clear.

3. After finishing work with the texture in the previous frame. The idea is to leave each RT/DS texture that can alias with some other resource in a state that makes it ready for the initializing Discard/Clear the next time we grab it from the common memory. This will work and is formally correct, but it also sounds like the well-known and sub-optimal idea of reverting textures to some “base state” after each use. An additional problem appears when the last usage of the texture occurs on the compute queue, because there you cannot transition it to the RENDER_TARGET or DEPTH_WRITE state.
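For illustration, with approach [3] re-acquiring the texture could look like this (a sketch with hypothetical resA/texB/cmdList variables, assuming texB was left in the RENDER_TARGET state at the end of the previous frame):

D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_ALIASING;
barrier.Aliasing.pResourceBefore = resA; // the resource previously occupying the memory
barrier.Aliasing.pResourceAfter = texB;  // our render-target texture
cmdList->ResourceBarrier(1, &barrier);
// texB is already in D3D12_RESOURCE_STATE_RENDER_TARGET, so no transition
// barrier is needed before the initializing discard:
cmdList->DiscardResource(texB, nullptr);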

Conclusion: while this is a tricky question with no single simple answer, I would recommend using [3] if you want to be 100% formally correct, or [1] if you want maximum efficiency.


# CasCmdLine - Few Technical Details

Sun, 15 Nov 2020

As part of my job, I've written a small console program, CasCmdLine, for testing AMD's FidelityFX Contrast Adaptive Sharpening (CAS) shader on an image from disk, e.g. a screenshot from your game. You can find the binary and source code on github.com/GPUOpen-Effects/FidelityFX-CAS/, in the CasCmdLine subdirectory. See also the blog post and tutorial about it to learn about its features and the syntax of the supported command-line parameters.

Update 2021-07-25: Links above are broken, as the tool became a standalone repository, which now supports CAS as well as the new FidelityFX Super Resolution (FSR) shader: FidelityFX-CLI.

Here I would like to point out three aspects of its implementation that allowed me to make it small and simple. They might interest you if you are a C++/Windows/graphics programmer.

1. To execute a compute shader like CAS, I needed to use a graphics API – Direct3D 11, 12, or Vulkan, as all of them are supported by the effect. I chose D3D11 as the easiest one. What’s interesting is that the API is used without creating a window or swap chain. There are no render frames, no calls to Present, no depth-stencil texture, no message loop. D3D11CreateDevice is used to initialize DirectX rather than D3D11CreateDeviceAndSwapChain. The program just initializes all the necessary machinery, does its job, and exits. It is perfectly possible to write a program this way, which may be a good idea for any application that needs to do some GPU-accelerated computations rather than interactive graphics like games do. I suspect this mode of operation would work even on server systems that have no monitor attached, as long as there is a GPU and a graphics driver installed. See the file “CasCmdLine.cpp” to find out how this is implemented.
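Such headless initialization boils down to very little code (a sketch of my own, with error handling omitted):

#include <d3d11.h>

ID3D11Device* device = nullptr;
ID3D11DeviceContext* context = nullptr;
HRESULT hr = D3D11CreateDevice(
    nullptr,                  // default adapter
    D3D_DRIVER_TYPE_HARDWARE,
    nullptr,                  // no software rasterizer module
    0,                        // no flags
    nullptr, 0,               // default feature levels
    D3D11_SDK_VERSION,
    &device, nullptr, &context);
// On success: create buffers/textures, dispatch the compute shader,
// read the results back, Release() everything, and exit - no Present() involved.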

2. There is always a question in every graphics app about how to load shaders. Surely, compiling them from HLSL/GLSL source code is the worst option, as it requires the user to have a shader compiler installed, or you to attach the compiler to your program. It also takes more time than loading shaders precompiled to the intermediate binary format. But even in this format, they need to be loaded from somewhere, whether from individual files or from some custom compressed archive, like games tend to do. In CasCmdLine I did it differently. I attached the precompiled shaders directly to the program binary. To do that, I used the /Fh command-line parameter of the "fxc.exe" shader compiler, like this:

fxc.exe /T cs_5_0 /E mainCS /O3 /Fh CompiledShader.h ShaderSource.hlsl

Instead of a binary file, the compiler called with this parameter generates a text file in a format compatible with C/C++ that contains the data of the compiled shader in the form of an array, like this:

#if 0
Shader metadata and assembly is put here, as commented out code...
#endif

const BYTE g_mainCS[] =
{
     68,  88,  66,  67,   8, 233, 
     11,  94, 141, 165,  83, 251, 
     50, 166, 219, 219,  84, 109, 
    128,  23,   1,   0,   0,   0, 
    (...)
};

Such a file can be #include-d in C++ code and used to create a D3D shader directly from this data. See the files "Shaders/CompiledShader_*.h" to see what they really look like.
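Creating the shader from such an embedded array is then a one-liner (a sketch, assuming the header generated above and an existing D3D11 device):

#include "CompiledShader.h" // defines g_mainCS

ID3D11ComputeShader* shader = nullptr;
HRESULT hr = device->CreateComputeShader(
    g_mainCS, sizeof(g_mainCS), nullptr, &shader);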

3. The program needs to load and save image files in JPEG, PNG, and preferably other formats. Of course, these formats are very complex, support various pixel formats, involve compression algorithms, etc., so handling them manually would require an enormous amount of work. There are libraries for this, like the official libpng and libjpeg for handling the PNG and JPEG formats, respectively, or the multi-format, multi-platform library DevIL.

If the program is intended only for Windows, it turns out that no third-party libraries are needed. The native Windows API contains a part called Windows Imaging Component (WIC) that can load and save image files in many formats, including BMP, PNG, JPEG, TIFF, GIF, ICO, WMP, and DDS. It can also do some image operations, like rescaling. It is a COM API that involves interfaces like IWICImagingFactory, IWICBitmapDecoder, IWICBitmapFrameDecode, and many more. This is what I used in the program described here. I might write a tutorial about WIC someday... For now, I would just say that once you figure out its API, it is quite powerful. It might be useful for any Windows graphics app that needs to load textures. It is also what Microsoft's DirectXTex library uses under the hood.
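Loading an image with WIC takes just a few calls (a sketch of my own, with error handling omitted):

#include <wincodec.h>

CoInitialize(nullptr);
IWICImagingFactory* factory = nullptr;
CoCreateInstance(CLSID_WICImagingFactory, nullptr, CLSCTX_INPROC_SERVER,
    IID_PPV_ARGS(&factory));
IWICBitmapDecoder* decoder = nullptr;
factory->CreateDecoderFromFilename(L"image.png", nullptr, GENERIC_READ,
    WICDecodeMetadataCacheOnDemand, &decoder);
IWICBitmapFrameDecode* frame = nullptr;
decoder->GetFrame(0, &frame);
UINT width = 0, height = 0;
frame->GetSize(&width, &height);
// frame->CopyPixels(...) then fetches the actual pixel data.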

