Entries for tag "windows", ordered from most recent. Entry count: 52.
# There is a way to query GPU memory usage in Vulkan - use DXGI
Thu, 15 Nov 2018
In my GDC 2018 talk “Memory management in Vulkan and DX12” (slides freely available, video behind the GDC Vault paywall) I said that in Direct3D 12 you can query for the exact amount of GPU memory used and available, while in Vulkan there is no way to do that, so I recommended just querying for memory capacity (VkMemoryHeap::size) and limiting your usage to around 80% of it. It turns out that I wasn’t quite right. If you code for Windows, there is a way to do this. I assumed that the function I mentioned, IDXGIAdapter3::QueryVideoMemoryInfo, is part of the Direct3D 12 interface, while it is actually part of DirectX Graphics Infrastructure (DXGI). This is a more generic, higher-level Windows API that lets you enumerate adapters (graphics cards) installed in the system and query for their parameters and the outputs (monitors) connected to them. Direct3D refers to this API, but it’s not the same thing.
The key question is: can you use DXGI to query for GPU memory usage while doing graphics with Vulkan, not D3D11 or D3D12? Would it return reasonable data and not all zeros? The short answer is: YES! I did an experiment - I wrote a simple app that creates various Vulkan objects and queries DXGI for memory usage. The results look very promising. But before I move on to the details, here is a short primer on how to use this DXGI interface, for all non-DirectX developers:
1. Use C++ in Visual Studio. You may also use some other compiler for Windows or another programming language, but it will probably be harder to set up.
2. Install a relatively new Windows SDK.
3. #include <dxgi1_4.h> and <atlbase.h>.
4. Link with “dxgi.lib”.
5. Create Factory object:

```cpp
IDXGIFactory4* dxgiFactory = nullptr;
CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory));
```

Don’t forget to release it at the end:

```cpp
dxgiFactory->Release();
```
6. Write a loop to enumerate available adapters. Choose and remember a suitable one.

```cpp
IDXGIAdapter3* dxgiAdapter = nullptr;
IDXGIAdapter1* tmpDxgiAdapter = nullptr;
UINT adapterIndex = 0;
while(dxgiFactory->EnumAdapters1(adapterIndex, &tmpDxgiAdapter) != DXGI_ERROR_NOT_FOUND)
{
    DXGI_ADAPTER_DESC1 desc;
    tmpDxgiAdapter->GetDesc1(&desc);
    if(!dxgiAdapter && desc.Flags == 0)
    {
        tmpDxgiAdapter->QueryInterface(IID_PPV_ARGS(&dxgiAdapter));
    }
    tmpDxgiAdapter->Release();
    ++adapterIndex;
}
```

At the end, don’t forget to release it:

```cpp
dxgiAdapter->Release();
```
Please note that using new versions of the DXGI interfaces like IDXGIFactory4 and IDXGIAdapter3 requires a relatively new version (I’m not sure exactly which one) of both the Windows SDK on the developer’s side (otherwise the code won’t compile) and an updated Windows system on the user’s side (otherwise the function calls will fail with an appropriate HRESULT).
7. To query for the current GPU memory usage, use this code:

```cpp
DXGI_QUERY_VIDEO_MEMORY_INFO info = {};
dxgiAdapter->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &info);
```
There are two possible options:

- DXGI_MEMORY_SEGMENT_GROUP_LOCAL is the memory local to the GPU, so basically video RAM.
- DXGI_MEMORY_SEGMENT_GROUP_NON_LOCAL is the system RAM.

Among the members of the returned structure, the most interesting is CurrentUsage. It seems to precisely reflect the use of GPU memory - it increases when I allocate a new VkDeviceMemory object, as well as when I use some implicit memory by creating other Vulkan resources, like a swap chain, descriptor pools and descriptor sets, command pools and command buffers, query pools etc.
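If you want to act on this data at runtime, one simple approach is to compare CurrentUsage plus the size of a planned allocation against a fraction of Budget, in the spirit of the ~80% guideline mentioned above. A rough sketch of such a helper (my own illustration, not code from the original experiment):

```cpp
#include <dxgi1_4.h>

// Returns true if allocating `size` bytes of video memory would still keep us
// below `maxFraction` of the current DXGI budget for local (GPU) memory.
bool CanAllocate(IDXGIAdapter3* adapter, UINT64 size, double maxFraction = 0.8)
{
    DXGI_QUERY_VIDEO_MEMORY_INFO info = {};
    if(FAILED(adapter->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &info)))
        return true; // Query failed - fall back to optimistic behavior.
    return info.CurrentUsage + size <= (UINT64)(info.Budget * maxFraction);
}
```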
Other DXGI features for video memory - the callback for budget change notifications (IDXGIAdapter3::RegisterVideoMemoryBudgetChangeNotificationEvent) and reservations (IDXGIAdapter3::SetVideoMemoryReservation) - may also work with Vulkan, but I didn’t check them.
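For reference, registering for budget change notifications looks roughly like this (a sketch based on the documented DXGI interface; as said above, I haven’t verified it together with Vulkan):

```cpp
// Create an event and ask DXGI to signal it whenever the OS changes our budget.
HANDLE budgetEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
DWORD notificationCookie = 0;
dxgiAdapter->RegisterVideoMemoryBudgetChangeNotificationEvent(budgetEvent, &notificationCookie);

// When budgetEvent becomes signaled (e.g. checked with WaitForSingleObject),
// call QueryVideoMemoryInfo again to read the new Budget value.

// Cleanup:
dxgiAdapter->UnregisterVideoMemoryBudgetChangeNotification(notificationCookie);
CloseHandle(budgetEvent);
```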
As an example, on my system with GPU = AMD Radeon RX 580 8 GB and 16 GB of system RAM, on program startup and before any Vulkan or D3D initialization, DXGI reports following data:

```
Local:
Budget=7252479180 CurrentUsage=0
AvailableForReservation=3839547801 CurrentReservation=0

Nonlocal:
Budget=7699177267 CurrentUsage=0
AvailableForReservation=4063454668 CurrentReservation=0
```
8. You may want to choose the correct DXGI adapter to match the physical device used in Vulkan. Even on a system with just one discrete GPU there are two adapters reported, one of them being a software renderer. I exclude it by checking desc.Flags == 0, which means this is a real, hardware-accelerated GPU, not DXGI_ADAPTER_FLAG_REMOTE or DXGI_ADAPTER_FLAG_SOFTWARE.
The good news is that even when there are more such adapters in the system, there is a way to match them between DXGI and Vulkan. Both APIs return something called a Locally Unique Identifier (LUID). In DXGI it’s in DXGI_ADAPTER_DESC1::AdapterLuid. In Vulkan it’s in VkPhysicalDeviceIDProperties::deviceLUID. They are of different types - two 32-bit numbers versus an array of bytes - but a simple, raw memory compare seems to work correctly. So the way to find the DXGI adapter matching a Vulkan physical device is:
```cpp
// After obtaining VkPhysicalDevice of your choice:
VkPhysicalDeviceIDProperties physDeviceIDProps = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES };
VkPhysicalDeviceProperties2 physDeviceProps = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2 };
physDeviceProps.pNext = &physDeviceIDProps;
vkGetPhysicalDeviceProperties2(physicalDevice, &physDeviceProps);

// While enumerating DXGI adapters, replace condition:
// if(!dxgiAdapter && desc.Flags == 0)
// With this:
if(memcmp(&desc.AdapterLuid, physDeviceIDProps.deviceLUID, VK_LUID_SIZE) == 0)
```
Please note that the function vkGetPhysicalDeviceProperties2 requires Vulkan 1.1, so set VkApplicationInfo::apiVersion = VK_API_VERSION_1_1. Otherwise the call results in an “Access Violation” error.
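For completeness, requesting Vulkan 1.1 at instance creation looks like this (an illustrative fragment with placeholder application info):

```cpp
VkApplicationInfo appInfo = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
appInfo.pApplicationName = "MyApp";      // placeholder name
appInfo.apiVersion = VK_API_VERSION_1_1; // required for vkGetPhysicalDeviceProperties2

VkInstanceCreateInfo instanceInfo = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
instanceInfo.pApplicationInfo = &appInfo;

VkInstance instance = VK_NULL_HANDLE;
vkCreateInstance(&instanceInfo, nullptr, &instance);
```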
In my next blog post, I present detailed results of my experiment with DXGI used in a Vulkan application, tested on 2 different GPUs.
Update 2019-03-15: Khronos released a Vulkan extension equivalent to this DXGI functionality: VK_EXT_memory_budget.
Tags: #directx #vulkan #graphics #windows
# When integrated graphics works better
Sat, 17 Feb 2018
In RPG games, the more powerful your character is, the tougher and scarier the monsters you have to fight. I sometimes get the feeling that the same applies to real life - to the bugs you meet as a programmer. I recently blogged about the issue where a QueryPerformanceCounter call takes a long time. I've just met another weird problem. Here is my story:
I have a Lenovo IdeaPad G50-80 (80E502ENPB) laptop. It has switchable graphics: an integrated Intel i7-5500U and a dedicated AMD Radeon R5 M330. Of course I used to choose the dedicated AMD graphics, because it's more powerful. My application is a music visualization program. It renders graphics using Direct3D 11. It uses one ID3D11Device object and one thread for rendering, but two windows displayed on two outputs: output 1 (laptop screen) contains a window with GUI and preview, while output 2 (projector connected via VGA or HDMI) shows the main view using a borderless, topmost window covering the whole screen (but not real fullscreen as in IDXGISwapChain::SetFullscreenState). I tend to enable V-sync on output 1 (IDXGISwapChain::Present SyncInterval = 1) and disable it on output 2 (SyncInterval = 0). My rendering algorithm looks like this:
```
Loop over frames:
    Render scene to MainRenderTarget
    Render MainRenderTarget to OutputBackBuffer, covering whole screen
    Render MainRenderTarget to PreviewBackBuffer, on a quad
    Render ImGui to PreviewBackBuffer
    OutputSwapChain->Present()
    PreviewSwapChain->Present()
```
So far I had just one problem with it: my framerate decreased over time. It used to drop very quickly after launching the app, from 60 to 30 FPS, and stabilize there, but after a few hours it would steadily decrease to 20 FPS or even less. I couldn't identify a reason for it in my code, like a memory leak. It seemed to be related to rendering. I could somehow live with this issue - the low framerate was not that noticeable.
Then suddenly this Thursday, when I wanted to test a new version of the program, I realized it hangs around a minute after launching. It was a strange situation in which the app seemed to be running normally, but it just wasn't rendering any new frames. I could see it was still working by inspecting CPU usage and the thread list with Process Hacker. I could minimize its windows or cover them with other windows and they preserved their content after being restored. I even captured a trace in GPUView, only to notice that the app was filling the DirectX command queue and the AMD GPU was working. Still, nothing was rendered.
That was a frightening situation for me, because I needed to have it working for this weekend. After I checked that restarting the app or the whole system doesn't help, I tried to identify the cause and fix it in various ways:
1. I thought that maybe there was just some bug in the new version of my program, so I launched the previous version - one that had successfully worked before, reaching more than 10 hours of uptime. Unfortunately, the problem still occurred.
2. I thought that maybe it was a bug in the new AMD graphics driver, so I downloaded and installed the previous version, performing a "Clean install". It didn't help either.
3. In desperation, I formatted the whole hard drive and reinstalled the operating system. I had planned to do it anyway, because it was a 3-year-old system, upgraded from Windows 8, and I had some other problems with it (which I don't describe here because they were unrelated to graphics). I installed the latest, clean Windows 10 with the latest updates and all the drivers. Even that didn't solve my problem. The program still hung soon after every launch.
I finally came up with the idea of switching my app to the integrated Intel graphics. It can be done in Radeon Settings > "Switchable Graphics" tab. In the popup menu for a specific executable, "High Performance" means choosing the dedicated AMD GPU and "Power Saving" means choosing the integrated Intel GPU. See the article Configuring Laptop Switchable Graphics... for details.
It solved my problem! The program not only no longer hangs, but it also maintains a stable 60 FPS now (at least it did during my 2-hour test). The framerate drops only when there is a scene that blends many layers together on a Full HD output - apparently this GPU cannot keep up with drawing so many pixels per second. Anyway, this is a situation where using the integrated Intel graphics turns out to work better than the faster, dedicated GPU.
I still don't know the cause of this strange bug. Is it something in the way my app uses D3D11? Or is it a bug in a graphics driver (one of the two I need to have installed)? I'd like to investigate it further when I find some time. For now, I tend to believe that:
- The only thing that might have changed recently and broken my app was some Windows update pushed by Microsoft.
- The two issues - the one I had before with the framerate decreasing over time, and the new one with the total image freeze - are related. They may have something to do with switchable graphics - having two different GPUs in the system, both enabled at the same time. I suspect that maybe when I want to use the Radeon, the outputs (or one of them) are connected to the Intel GPU anyway, so the image needs to be copied and synchronized with the Intel driver.
Update 2018-02-21: After I published this post, I tried a few other things to fix the problem. For example, I updated the AMD graphics driver to the latest version, 18.2.2. It didn't help. Then the problem disappeared as mysteriously as it appeared. It happened during a single system session, without a restart. My application was hanging, and later it started working properly. The only thing I remember doing in between was downloading and launching UIforETW - a GUI tool for capturing Event Tracing for Windows (ETW) traces, like the ones for GPUView. I know that it automatically installs GPUView and other necessary tools on first launch, so that may have changed something in my system. Either way, my program now works on the AMD graphics without hanging, reaching a few hours of uptime and maintaining 60 FPS, which only sometimes drops to 30 FPS, but it also goes back up.
Tags: #directx #gpu #windows
# How to view CHM files on high DPI monitor?
Tue, 16 Jan 2018
Using monitors with high resolution like 4K, where you need to set DPI scaling other than 100%, is a pain in the *** in Windows - it causes trouble with many applications. That’s why I want to stick with Full HD monitors as long as possible. One of the apps that doesn’t scale with DPI is Microsoft’s own viewer for CHM files (Microsoft Compiled HTML Help). CHM is a file format commonly used for software help/documentation. It was introduced with Windows 98 as a replacement for the old HLP (WinHelp) format. Although we read almost everything online these days, some programs and libraries still use it.
A CHM document is completely unreadable on a 4K monitor with 200% DPI scaling.
I searched Google for a solution. Some sources say there is a Font button in the app’s toolbar that lets you increase the font size, but it doesn’t work in my case. This page says that the availability of this button can be configured when creating the CHM file. This page mentions some alternative readers for the CHM format (a Firefox plugin, as well as a standalone app).
I know that apps which misbehave under high DPI can be configured to work in “compatibility mode”, where Windows just rescales their window. I found out that the executable for this default CHM reader is c:\Windows\hh.exe, but I couldn’t find this setting in the Properties of that file. I thought that maybe it’s because the file is located in a system directory and owned by the system, with insufficient privileges for administrators and normal users, so I came up with the following solution that actually works: copy hh.exe to some other directory (e.g. D:\Soft\), enable the DPI compatibility option in the Properties of that copy, and open CHM documents through it, for example:

```
D:\Soft\hh.exe "d:\AGS_SDK-5.1.1\ags_lib\doc\amd_ags.chm"
```

The document should then open in the viewer with a font size good for reading. Every pixel is just scaled to 2x2 pixels.

# When QueryPerformanceCounter call takes long time
Sun, 03 Dec 2017
The QueryPerformanceCounter function is all about measuring time and profiling performance, so I wasn't able to formulate the right Google query to find a solution to the problem I had - the call to the QueryPerformanceCounter function itself taking too much time. Below I describe what I eventually found out.
It all started with a hardware failure. My motherboard stopped working, so I needed to buy a new one (ASRock X370 Killer SLI). I know that normally changing the motherboard requires reinstalling Windows, but I tried not to do it. The system didn't want to boot, so I booted the PC using a pendrive with the Windows installer and launched the repair function. It helped - after that Windows was able to start and everything seemed to work... until I launched the program that I develop on that machine. It was running painfully slowly.
I tried different things to find out what was happening. Input/output to the hard drive or network was not an issue. GPU performance was also OK. It seemed that the app was just doing its calculations slowly, as if the CPU were very slow. I double-checked the actual CPU and RAM frequency, but it was OK. Finally I launched a sampling profiler (the one embedded in Visual Studio - command: Analyze > Performance Profiler). This way I found that most of the time was spent in the function QueryPerformanceCounter.
This WinAPI function is the recommended way to obtain a timestamp in Windows. It's very precise, monotonic, safe to use on multiple cores and threads, and it has a stable frequency independent of CPU power management or Turbo Boost... It's just great, but in order to meet all these requirements, Windows may use different methods to implement it, as described in the article Acquiring high-resolution time stamps. Some of them are fast (just reading the TSC register), while others are slow (they require a system call - a transition to kernel mode).
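Whether you get the fast or the slow path is easy to check: time a large number of back-to-back calls and divide. Here is a simplified sketch of that idea (not the exact test program linked below):

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    const long long count = 10000000;
    LARGE_INTEGER freq, start, end, tmp;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    for(long long i = 0; i < count; ++i)
        QueryPerformanceCounter(&tmp); // The call we are measuring.
    QueryPerformanceCounter(&end);
    const double seconds = double(end.QuadPart - start.QuadPart) / double(freq.QuadPart);
    printf("%.1f ns per call\n", seconds * 1e9 / double(count));
    return 0;
}
```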
I wrote a simple C++ program that tests how long it takes to execute the QueryPerformanceCounter function. You can see the code here: QueryPerformanceCounterTest.cpp and download a 64-bit binary here: QueryPerformanceCounterTest.zip. Running this test on two different machines gave the following results:
CPU: Intel Core i7-6700K, Motherboard: GIGABYTE Z170-HD3-CF:

```
> QueryPerformanceCounterTest.exe 1000000000
Executing QueryPerformanceCounter x 1000000000...
According to GetTickCount64 it took 0:00:11.312 (11.312 ns per call)
According to QueryPerformanceCounter it took 0:00:11.314 (11.314 ns per call)
```

CPU: AMD Ryzen 7 1700X, Motherboard: ASRock X370 Killer SLI (changed from different model without system reinstall):

```
> QueryPerformanceCounterTest.exe 10000000
Executing QueryPerformanceCounter x 10000000...
According to GetTickCount64 it took 0:00:24.906 (2490.6 ns per call)
According to QueryPerformanceCounter it took 0:00:24.911 (2491.1 ns per call)
```
As you can see, the function takes 11 nanoseconds on the first platform and 2.49 microseconds (220 times more!) on the second one. This was the cause of the slowness of my program, which calls this function many times.
I tried to fix it and somehow convince Windows to use the fast implementation. I uninstalled and reinstalled the motherboard drivers - the latest ones downloaded from the manufacturer's website. I upgraded and downgraded the BIOS to different versions. I booted the system from the Windows installation media and "repaired" it again. I restored the default settings in UEFI/BIOS and tried changing the "ACPI HPET Table" option there to Disabled/Enabled/Auto. Nothing worked. Finally I restored Windows to factory settings (Settings > Update & Security > Recovery > Reset this PC). This solved my problem, but unfortunately it's like reinstalling Windows from scratch - now I need to install and configure all my apps again. After that, the function takes 22 ns on this machine.
My conclusions from this "adventure" are twofold:
- Replacing the motherboard without reinstalling Windows can leave the system in a subtly degraded state - in my case, a slow implementation of QueryPerformanceCounter.
- You should expect QueryPerformanceCounter to execute slowly on some platforms, taking as much as 2.5 microseconds per call. If you call it just once per rendering frame then it doesn't matter, but you shouldn't profile every small portion of your code with it, calling it millions of times.

Update 2017-12-11: A colleague told me that enabling/disabling HPET using the "bcdedit" system command could possibly help with this issue.
Update 2018-12-17: Blog post "Ryzen Threadripper for Game Development – optimising UE4 build times" on GPUOpen.com, section "HPET timer woes", seems to be related to this topic.
Tags: #windows #optimization #hardware #winapi
# Lost clicks and key presses on low FPS
Sun, 22 Oct 2017
There is a problem with handling input from the mouse and keyboard in games and other interactive applications that I have just solved, and I would like to share my code for the solution. When your app uses a loop that constantly calculates and renders frames, as games usually do, it may seem natural to just read the current state of every mouse and keyboard key (whether it's down or up) on each frame. You can then calculate derived information, like whether a button has just been pressed or released, by comparing the new state to the state from the previous frame. This is how the Dear ImGui library works. So a first solution could look like this:
```cpp
void UpdateFrame()
{
    // Fill ImGui::GetIO().DeltaTime, KeyCtrl, KeyShift, KeyAlt etc.
    ImGui::GetIO().MouseDown[0] = (GetKeyState(VK_LBUTTON) & 0x8000) != 0;
    ImGui::GetIO().MouseDown[1] = (GetKeyState(VK_RBUTTON) & 0x8000) != 0;
    ImGui::GetIO().MouseDown[2] = (GetKeyState(VK_MBUTTON) & 0x8000) != 0;
    for(uint32_t i = 0; i < 512; ++i)
        ImGui::GetIO().KeysDown[i] = (GetKeyState(i) & 0x8000) != 0;

    ImGui::NewFrame();

    if(ImGui::IsKeyPressed('A'))
    {
        // Do something...
    }
}
```
There is one problem with this approach. If the user presses and releases a key for a very short time, so that both the press and the release happen between two frames, it will go unnoticed. This is very annoying, and it happens especially when the framerate is low, so the time between two consecutive frames is long.
The first step towards solving this is to react to the "real" events sent by the operating system - window messages like WM_LBUTTONDOWN, WM_LBUTTONUP, WM_KEYDOWN, and WM_KEYUP - instead of polling the key state once per frame.
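A minimal sketch of that first step - handling the messages in the window procedure instead of polling (my own illustration; by itself it still doesn't latch a press and release that both arrive before the next frame, which needs extra per-frame state on top of this):

```cpp
// Assumes <windows.h> and "imgui.h" are included.
LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    ImGuiIO& io = ImGui::GetIO();
    switch(msg)
    {
    case WM_LBUTTONDOWN: io.MouseDown[0] = true;  return 0;
    case WM_LBUTTONUP:   io.MouseDown[0] = false; return 0;
    case WM_RBUTTONDOWN: io.MouseDown[1] = true;  return 0;
    case WM_RBUTTONUP:   io.MouseDown[1] = false; return 0;
    case WM_KEYDOWN: if(wParam < 512) io.KeysDown[wParam] = true;  return 0;
    case WM_KEYUP:   if(wParam < 512) io.KeysDown[wParam] = false; return 0;
    }
    return DefWindowProc(hWnd, msg, wParam, lParam);
}
```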
Tags: #gui #winapi #windows
# How to change display mode using WinAPI?
Sat, 11 Mar 2017
If you write a graphics application or a game, you may want to make it fullscreen and set a specific screen resolution. In DirectX there are functions for that, but if you use OpenGL or Vulkan, you need another way to accomplish it. I've researched the topic recently and found that the Windows API supports enumerating display devices and modes with the functions EnumDisplayDevices and EnumDisplaySettings, as well as changing the mode with the function ChangeDisplaySettingsEx. It's programmatic access to more or less the same set of features that you can reach manually by going to the "Display settings" system window.
I've prepared an example C program demonstrating how to use these functions:
DisplaySettingsTest - github.com/sawickiap
First you may want to enumerate available Adapters. To do this, call the function EnumDisplayDevices multiple times. Pass NULL as the first parameter (LPCWSTR lpDevice). As the second parameter, pass a subsequent DWORD Adapter index, starting from 0. Enumeration should continue as long as the function returns a nonzero BOOL. When it returns zero, it means there are no more Adapters and the Adapter with the given index and any higher index could not be retrieved.
For each successfully retrieved Adapter, a DISPLAY_DEVICE structure is filled by the function. It contains the following members:
- WCHAR DeviceName[32] - string with the name of the Adapter, like "\\.\DISPLAY1".
- WCHAR DeviceString[128] - string with a more user-friendly name of the Adapter, like "AMD Radeon (TM) RX 480".
- DWORD StateFlags - various flags, like DISPLAY_DEVICE_ACTIVE if the device is on, or DISPLAY_DEVICE_PRIMARY_DEVICE if this is the primary device.
if this is the primary device.There is a second level: Adapters contain Display Devices. To enumerate them, use the same function EnumDisplayDevices
, but this time pass Adapter DeviceName
as first parameter. This way you will enumerate Display Devices inside that Adapter, described by the same structure DISPLAY_DEVICE
. For example, my system returns DeviceName
= "\\.\DISPLAY1\Monitor0", DeviceString
= "Generic PnP Monitor".
The meaning of, and the difference between, "Adapter" and "Display Device" is not fully clear to me. You might think an Adapter is a single GPU (graphics card), but that turns out not to be the case. I have a single graphics card and yet my system reports 6 Adapters, each having 0 or 1 Display Devices. That could mean an Adapter is more like a single monitor output (e.g. HDMI, DisplayPort, VGA) on the graphics card. This seems true unless you have two monitors running in "Duplicate" mode - then two Display Devices are reported inside one Adapter.
Then there is a list of supported Display Settings (or Modes). You can enumerate them in a similar fashion using the EnumDisplaySettings function, which fills a DEVMODE structure. It seems that Modes belong to an Adapter, not a Display Device, so as the first parameter to this function you must pass the DISPLAY_DEVICE::DeviceName returned by EnumDisplayDevices(NULL, ...), not the one returned by EnumDisplayDevices(adapter.DeviceName, ...). The structure is quite complex, but the function fills only the following members:
- DWORD dmPelsWidth, dmPelsHeight - resolution, in pixels.
- DWORD dmBitsPerPel - bits per pixel (all Modes have 32 in my case).
- DWORD dmDisplayFrequency - refresh rate, in Hz.
- DWORD dmDisplayFlags - additional flags, like DM_INTERLACED for interlaced mode.

I have a single graphics card (AMD Radeon RX 480) with two Full HD (1920 x 1080) monitors connected. You can see example output of the program from my system here: ExampleOutput.txt.
To change the display mode, use the function ChangeDisplaySettingsEx:
- As the first parameter (LPCTSTR lpszDeviceName), pass the DeviceName of the chosen Adapter (again, not Display Device!).
- As the second parameter (DEVMODE *lpDevMode), pass a structure filled with the desired Display Settings. You can fill it by yourself, but Microsoft recommends passing a copy of the structure as it was retrieved from the function EnumDisplaySettings.
- In the flags parameter (DWORD dwFlags), you can pass various flags, e.g. whether the new settings should be saved in the registry.

The function returns DISP_CHANGE_SUCCESSFUL if the display mode was successfully changed, and one of the other DISP_CHANGE_* constants if it failed.
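Putting it together, a call changing the mode might look like this (a sketch reusing the adapter and mode variables from the snippets above):

```cpp
LONG result = ChangeDisplaySettingsExW(
    adapter.DeviceName, // DeviceName of the Adapter, not the Display Device
    &mode,              // DEVMODE as retrieved from EnumDisplaySettings
    NULL,               // reserved, must be NULL
    CDS_FULLSCREEN,     // temporary change - see the note about this flag below
    NULL);
if(result != DISP_CHANGE_SUCCESSFUL)
{
    // Handle the error: result is one of the other DISP_CHANGE_* constants.
}
```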
To restore the original display mode, call the function like this:

```cpp
ChangeDisplaySettingsEx(targetDeviceName, NULL, NULL, 0, NULL);
```

Unfortunately, a display mode changed in the way described here is not automatically restored when the user switches to some other application (e.g. using Alt+Tab), like in DirectX fullscreen mode, but you can handle that yourself. The good news is that if you pass the CDS_FULLSCREEN flag to ChangeDisplaySettingsEx, the previous mode is automatically restored by the system when your application exits or crashes.
Tags: #windows #graphics
# Handy Global Hotkeys for Music Control
Sat, 28 Jan 2017
I now have a keyboard without "media" keys, so I came up with a set of global hotkeys that I've set up in my music player and consider quite handy. ("Global" means they work across the entire system, even when the player application is not in focus.) I can't remember where they come from, but it's possible that I've seen them somewhere. These are:
My favorite music player is foobar2000. To set up new global hotkeys there:
I'm sure you can do this in other music players as well, like AIMP.
Tags: #music #windows
# 32-bit Applications on 64-bit Windows
Wed, 30 Nov 2016
As you probably know, the processor, operating system, and applications on a PC may be 32-bit or 64-bit. The CPUs in our computers have been 64-bit for a long time already. Windows XP tended to be used in its 32-bit version, but now most people use Windows 7/8/8.1/10 in the 64-bit version. Only applications still exist in both forms. Shell extensions and drivers must match the version of the operating system, but other programs can be used in their 32-bit version even on a 64-bit system. Different combinations are possible:
1. A 32-bit application on 32-bit Windows.
2. A 64-bit application on 64-bit Windows.
3. A 32-bit application on 64-bit Windows (running through the WoW64 compatibility layer).
We may ask where Windows stores the files and settings of such apps. This is especially interesting as the answer is quite counter-intuitive. The location for (2) – 64-bit apps on 64-bit Windows – may contain “32” in its name (because of backward compatibility), while the location for (3) – 32-bit apps on 64-bit Windows – may contain “64” (because of the name WoW64). Here is the list of such locations:
Program Files folder:
- (1) 32-bit apps on 32-bit Windows: C:\Program Files
- (2) 64-bit apps on 64-bit Windows: C:\Program Files
- (3) 32-bit apps on 64-bit Windows: C:\Program Files (x86)

System folder:
- (1) 32-bit apps on 32-bit Windows: C:\Windows\System32
- (2) 64-bit apps on 64-bit Windows: C:\Windows\System32
- (3) 32-bit apps on 64-bit Windows: C:\Windows\SysWOW64

Registry key:
- (1) 32-bit apps on 32-bit Windows: HKEY_LOCAL_MACHINE\SOFTWARE
- (2) 64-bit apps on 64-bit Windows: HKEY_LOCAL_MACHINE\SOFTWARE
- (3) 32-bit apps on 64-bit Windows: HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node

(Same applies to HKEY_CURRENT_USER.)
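If a program needs to determine at runtime which of these cases it is running under, WinAPI provides IsWow64Process. A minimal sketch (my addition for illustration, not part of the original list):

```cpp
#include <windows.h>

// Returns true if the current process is a 32-bit application
// running on 64-bit Windows (case 3 above).
bool IsRunningUnderWow64()
{
    BOOL isWow64 = FALSE;
    if(!IsWow64Process(GetCurrentProcess(), &isWow64))
        return false; // The call failed - assume we are not under WoW64.
    return isWow64 != FALSE;
}
```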
See also: Windows 64-bit: The 'Program Files (x86)' and 'SysWOW64' folders explained