Entries for tag "hardware", ordered from most recent. Entry count: 21.
# When QueryPerformanceCounter call takes long time
The QueryPerformanceCounter function is all about measuring time and profiling performance, so I wasn't able to formulate the right Google query to find a solution to the problem I had: a call to the QueryPerformanceCounter function itself taking too much time. Below I describe what I eventually found out.
It all started with a hardware failure. My motherboard stopped working, so I needed to buy a new one (ASRock X370 Killer SLI). I know that changing the motherboard normally requires reinstalling Windows, but I tried to avoid it. The system didn't want to boot, so I booted the PC from a pendrive with the Windows installer and launched the repair function. It helped - after that Windows was able to start and everything seemed to work... until I launched the program that I develop on that machine. It was running painfully slowly.
I tried different things to find out what was happening. Input/output to hard drive or network was not an issue. GPU performance was also OK. It seemed that the app was just doing its calculations slowly, as if the CPU was very slow. I double-checked the actual CPU and RAM frequency, but it was OK. Finally I launched a sampling profiler (the one embedded in Visual Studio - command: Analyze > Performance Profiler). This way I found that most of the time is spent in the function QueryPerformanceCounter.
This WinAPI function is the recommended way to obtain a timestamp in Windows. It's very precise, monotonic, safe to use on multiple cores and threads, it has stable frequency independent of CPU power management or Turbo Boost... It's just great, but in order to meet all these requirements, Windows may use different methods to implement it, as described in article Acquiring high-resolution time stamps. Some of them are fast (just reading TSC register), others are slow (require system call - transition to kernel mode).
I wrote a simple C++ program that tests how long it takes to execute the QueryPerformanceCounter function. You can see the code here: QueryPerformanceCounterTest.cpp and download the 64-bit binary here: QueryPerformanceCounterTest.zip. Running this test on two different machines gave the following results:
CPU: Intel Core i7-6700K, Motherboard: GIGABYTE Z170-HD3-CF:
> QueryPerformanceCounterTest.exe 1000000000
Executing QueryPerformanceCounter x 1000000000...
According to GetTickCount64 it took 0:00:11.312 (11.312 ns per call)
According to QueryPerformanceCounter it took 0:00:11.314 (11.314 ns per call)
CPU: AMD Ryzen 7 1700X, Motherboard: ASRock X370 Killer SLI (changed from different model without system reinstall):
> QueryPerformanceCounterTest.exe 10000000
Executing QueryPerformanceCounter x 10000000...
According to GetTickCount64 it took 0:00:24.906 (2490.6 ns per call)
According to QueryPerformanceCounter it took 0:00:24.911 (2491.1 ns per call)
As you can see, the function takes 11 nanoseconds on the first platform and 2.49 microseconds (220 times more!) on the second one. This was the cause of the slowness of my program, because it calls this function many times.
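The measurement itself is simple: call the function in a loop and divide the elapsed time by the number of calls. Below is a minimal, portable sketch of that idea. It is not the code from QueryPerformanceCounterTest.cpp: it times std::chrono::steady_clock::now() as a stand-in for QueryPerformanceCounter, and the names MeasureTimestampCostNs and NsPerCall are made up for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Converts a total time in seconds and a number of calls into nanoseconds
// per call, e.g. 24.906 s for 10000000 calls gives about 2490.6 ns.
constexpr double NsPerCall(double totalSeconds, double callCount)
{
    return totalSeconds * 1e9 / callCount;
}

// Average cost in nanoseconds of one timestamp query, measured over
// `iterations` calls. steady_clock is an assumption of this sketch,
// used as a portable stand-in for QueryPerformanceCounter.
double MeasureTimestampCostNs(std::uint64_t iterations)
{
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;

    // Combine all results so the compiler cannot discard the calls.
    std::int64_t sink = 0;

    const Clock::time_point begin = Clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i)
        sink ^= static_cast<std::int64_t>(Clock::now().time_since_epoch().count());
    const Clock::time_point end = Clock::now();

    // Store the combined value to a volatile so it stays observable.
    volatile std::int64_t keep = sink;
    (void)keep;

    const std::chrono::duration<double, std::nano> elapsed = end - begin;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling MeasureTimestampCostNs with a large iteration count (a million or more) should give a stable average. NsPerCall only repeats the arithmetic behind the numbers printed above.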
I tried to fix it and somehow convince Windows to use the fast implementation. I uninstalled and reinstalled motherboard drivers - the latest ones downloaded from manufacturer website. I upgraded and downgraded BIOS to different versions. I booted the system from Windows installation media and "repaired" it again. I restored default settings in UEFI/BIOS and tried to change "ACPI HPET Table" option there to Disabled/Enabled/Auto. Nothing worked. Finally I restored Windows to factory settings (Settings > Update & Security > Recovery > Reset this PC). This solved my problem, but unfortunately it's like reinstalling Windows from scratch - now I need to install and configure all the apps again. After that the function takes 22 ns on this machine.
My conclusions from this "adventure" are twofold:
Expect QueryPerformanceCounter to execute slowly on some platforms, taking as long as 2.5 microseconds per call. If you call it just once per rendering frame then it doesn't matter, but you shouldn't profile every small portion of your code with it, calling it millions of times.
Update 2017-12-11: A colleague told me that enabling/disabling HPET using "bcdedit" system command could possibly help for that issue.
Update 2018-12-17: Blog post "Ryzen Threadripper for Game Development – optimising UE4 build times" on GPUOpen.com, section "HPET timer woes", seems to be related to this topic.
# What is Samsung phone doing to photos?! (sharpening a lot)
I now use a Samsung Galaxy S7 smartphone and I'm quite happy with it, except for the camera. I noticed that all the photos taken with it look bad. There is clearly something wrong with them. When I zoomed in, I noticed that the device applies an insane amount of sharpening. Every photo looks like it was first filtered with a bilateral filter (a kind of edge-preserving blur that is used for noise reduction) and then sharpened with intensity set to maximum, which causes annoying ringing artifacts around the edges.
I decided to make an experiment. I gathered all the devices I had access to that can take photos and I brought them to a place where I could photograph a building that has many sharp edges, plus some tram cables. It was the middle of a sunny day, so lighting brightness and contrast were high and the devices didn't have a reason to apply too much processing to the photos taken. I configured all of them to fully automatic mode, maximum resolution and JPEG as output format (except the Canon camera, where I forgot about it, so I actually made a CR2 RAW that I later converted to JPEG). Devices I used for comparison were (click on each link to access original photo file):
When you zoom in to the building, you can clearly see that both Samsung phones applied very strong sharpening. It turns out this is a known problem. There is a discussion on Reddit, as well as a YouTube video about it. The Sony phone and the DSLR don't have this effect.
Samsung Galaxy S6:
Samsung Galaxy S7:
Sony Xperia Z2:
Canon PowerShot G7X Mark II:
What's interesting is that the Canon camera also applied some sharpening, and did it even in RAW! (How can they call it RAW then?!) Fortunately in this camera it can be disabled: While in photo shooting mode, press MENU button, go to tab 6, select "Picture Style" and set it to "Neutral", so that the first parameter in the sequence of numbers (meaning "Sharpness Strength") is 0.
In Samsung phones this filter cannot be disabled :( The only way to take pictures without it is to use RAW, where it's not applied. To do it, while in photo shooting mode: swipe left, choose "Professional", enter configuration, select "Image size" and there enable "Save RAW and JPEG files". You need to enter "Professional" mode every time you want to take a photo. Then of course you need to process the image on your PC and convert it to JPEG, e.g. in Adobe Lightroom or another similar program, but there you can decide how much sharpening you need (or none).
# How to Boost Your RAM to Declared 3000 MHz?
I recently upgraded some components of my desktop PC. I was surprised to discover that the RAM doesn't work at the declared speed of 3000 MHz. Here is the solution I've found to this problem.
Back in the days of DOS I can remember having to set up everything manually, like selecting IRQ number and DMA channel to make sound work in games. But today, in the era of Plug&Play, assembling a computer is easy and everything works automatically. Almost everything...
Although I found that both my new motherboard (Gigabyte GA-Z170-HD3P) and RAM modules (Corsair Vengeance LPX DDR4, 32GB(2x16GB), 3000MHz, CL15 (CMK32GX4M2B3000C15)) support 3000 MHz frequency, the memory ran at 2133 MHz. Motherboard specification says: "Support for DDR4 3466(O.C.) /3400(O.C.) /3333(O.C.) /3300(O.C.) /3200(O.C.) /3000(O.C.) /2800(O.C.) /2666(O.C.) /2400(O.C.) /2133 MHz memory modules", while the specification of the memory has "3000MHz" even in its title. What happened? The motherboard spec calling all the frequencies higher than 2133 "OC" (as in "overclocking") gave me a clue that they are not standard.
After a few minutes of searching on Google, I learned about a thing called XMP (Extreme Memory Profile). It's an extension to SPD (Serial Presence Detect) - a protocol used by RAM modules to report to the motherboard what parameters they support. I then checked in the specs that my motherboard, as well as my memory, support XMP 2.0.
So what I finally did was:
That's all! Fortunately I didn't need to manually set any frequency, timings or voltage of my Skylake processor, memory or any other components, like overclockers do. With all the other settings left at the default "Auto", the computer is still stable and RAM now runs at 3000 MHz.
By the way: Please don't be worried when you see only half of this frequency in the HWiNFO64 tool as "Memory - Current Memory Clock". After all, we are talking about DDR here, which means "Double Data Rate": the real clock frequency is half of the declared value (1500 MHz in this case), but data is transferred on both the rising and the falling edge of the clock signal.
Warning! It turned out that enabling XMP on my machine makes it very unstable. Firefox, The Witcher 3 and basically all memory-intensive applications crashed randomly. So if you experience similar issues, you'd better disable XMP or, if you know any better solution, please post a comment about it.
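The relation between the declared speed and the clock shown by monitoring tools is simple arithmetic. Here is a small C++ sketch of it. The function names are made up for this example, and the bandwidth helper assumes a standard 64-bit (8-byte) wide memory channel, which gives a theoretical peak only.

```cpp
// Real I/O clock in MHz of a DDR module with the given declared speed
// in megatransfers per second: two transfers happen in every clock cycle.
constexpr double DdrClockMHz(double megaTransfersPerSecond)
{
    return megaTransfersPerSecond / 2.0;
}

// Theoretical peak bandwidth in MB/s of a single memory channel,
// assuming the channel is 64 bits (8 bytes) wide.
constexpr double PeakChannelBandwidthMBps(double megaTransfersPerSecond)
{
    return megaTransfersPerSecond * 8.0;
}

// Declared 3000 "MHz" means a 1500 MHz clock, 2133 means 1066.5 MHz.
static_assert(DdrClockMHz(3000.0) == 1500.0, "DDR4-3000 runs at a 1500 MHz clock");
static_assert(DdrClockMHz(2133.0) == 1066.5, "DDR4-2133 runs at a 1066.5 MHz clock");
```

So a tool that reports around 1500 MHz confirms that the 3000 MHz profile is active.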
# Good Buy: ADATA DashDrive Elite UE700 128GB USB3.0
When I was browsing an online shop, I was shocked to see that the market of USB flash memory sticks ("pendrives") has changed so much recently. I have many pendrives that I was given or won as a prize somewhere, mostly 2-8 GB. My biggest pendrive was 32 GB that I bought several years ago at a bargain price, as for that time period. Now I can see that the most reasonable choice (for the money that I want to spend on a pendrive) is 128 GB!
So I started searching for a model to buy. Sure, a pendrive is not as complex as a laptop or a car - it's just a small accessory, but anyway I wanted to make a good choice, so I decided to look for the following criteria:
Finally I found this one and I bought it for myself, as well as for my family as Christmas present: ADATA DashDrive Elite UE700 128GB USB3.0.
I'm quite happy with it. Transfers that I actually measured by writing and then reading one big file from/to SSD disk are: 110 MB/s write, 181 MB/s read, which is enough to write a 2 GB file in just 18 seconds and read it in 11 seconds.
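The times above come from dividing file size by transfer rate. A tiny C++ sketch of that arithmetic, assuming 1 GB = 1000 MB (the function name is made up for this example):

```cpp
// Time in seconds needed to transfer `sizeMB` megabytes at a sustained
// rate of `rateMBps` megabytes per second.
constexpr double TransferSeconds(double sizeMB, double rateMBps)
{
    return sizeMB / rateMBps;
}

// A 2 GB file: about 18 s to write at 110 MB/s, about 11 s to read at 181 MB/s.
static_assert(TransferSeconds(2000.0, 110.0) > 18.0 && TransferSeconds(2000.0, 110.0) < 19.0,
              "write time of a 2 GB file");
static_assert(TransferSeconds(2000.0, 181.0) > 11.0 && TransferSeconds(2000.0, 181.0) < 12.0,
              "read time of a 2 GB file");
```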
(This article is not sponsored. It's just my personal recommendation.)
Important Update 2015-06-04: I have two of these pendrives and after half a year of using them (not very much - mostly for backup and moving files between computers, once every few days) they both started showing errors and losing files! So in the end I do not recommend this model!!!
# What do we have from benchmarks?
There was this case some time ago about some graphics vendors cheating in a Futuremark benchmark. They basically detected this particular application and raised frequency to increase performance and gain a higher score. So some devices have been delisted from the Best Mobile Devices list for cheating and Futuremark published this document: Benchmark Rules and Guidelines.
My first thought was: Good, they just want everyone to play fair. But then I read the rules again, especially this one: "The platform may not replace or remove any portion of the requested work even if the change would result in the same output." and I said: Wait, what? Isn't it a generic definition of every optimization? If a developer writes 2+2 in GLSL and the platform just uses 4, is it cheating because it removed requested work (addition in this case) even if result is the same?
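To show what I mean, here is the same situation expressed in C++ instead of GLSL. It is only an illustration of constant folding, with function names made up for this example, not a claim about what any particular driver or shader compiler does.

```cpp
#include <cassert>

// The "requested work": an addition that is performed at run time.
int RequestedWork(int a, int b)
{
    return a + b;
}

// What an optimizing platform may use instead: the sum of two literals
// evaluated at compile time, so no addition is left to execute.
constexpr int FoldedWork()
{
    return 2 + 2;
}

// The compiler itself confirms that the folded value is 4.
static_assert(FoldedWork() == 4, "2 + 2 folds to 4 at compile time");

// The output is the same whether the work was removed or not.
void CheckSameOutput()
{
    assert(RequestedWork(2, 2) == FoldedWork());
}
```

Read literally, the quoted rule would forbid the second version, even though nobody could tell the difference by looking at the result.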
And then I started thinking: What do we have from benchmarks after all? Is their importance a good thing for gamers and other customers of graphics technology? In theory, benchmarks should mimic some aspect of real applications to measure and compare how different hardware performs in this type of application (e.g. games). But it may be that decision makers just want to see good scores in benchmarks (bosses generally like numbers and bars and graphs :) so engineers implement optimizations or even some cheats just for these benchmarks. And then media notice that, devices get delisted, benchmark creators write such rules... and gamers just want to play games.
If performance was measured just in real games, and platform vendors optimized or even cheated for a particular title, then at least we would have a better performing game. Just my personal opinion :)
# Aero2 - Free Internet in Poland
Did you know that in Poland there is free access to the Internet available for everyone? It's called Aero2 and it works through 3G. To use it, all you need to do is:
2. You need to pay 20 PLN. It's a deposit and it will be returned if you return your SIM card.
3. You need to fill in the order form and send it, along with printed confirmation of transfer of your deposit and copy of your identity card, to the address shown on their website.
4. After about one month they will send you SIM card. Of course you cannot make phone calls with this card - it is only for data transfer.
5. Enter connection parameters:
Username and password: empty
IP and DNS addresses: automatic
6. You have free Internet :) It's not very fast and will disconnect you every hour, but it's still usable for browsing websites. I think it may be useful when you travel, e.g. by train, or after you move to a new house and don't have a new Internet connection yet. So if it's so cheap to get it, why not give it a try?
# Developing Graphics Driver
Want to know what I do at Intel? Obviously all details are secret, but generally, as a Graphics Software Engineer, I code the graphics driver for our GPU. What does this software do? When you write a game these days, you usually use some game engine, but I'm sure you know that on a lower level, everything ends up as a bunch of textured 3D triangles rendered with hardware acceleration by the GPU. To render them, the engine uses one of the standard graphics APIs. On Windows it can be DirectX or OpenGL, on Linux and Mac it is OpenGL, on mobile platforms it is OpenGL ES. On the other side, there are many hardware manufacturers - like NVIDIA, AMD, Intel or Imagination Technologies - that make discrete or embedded GPUs. These chips have different capabilities and instruction sets. So a graphics driver is needed to translate calls to the API (like IDirect3DDevice9::DrawIndexedPrimitive) and shader code into a form specific to the hardware.
Want to know more? Intel recently published documentation of the GPU from the new Ivy Bridge processor. You can find this documentation on the intellinuxgraphics.org website. It consists of more than 2000 pages in 17 PDF files. For example, in the last volume (Volume 4 Part 3) you can see what the instructions of our programmable execution units look like. They are quite powerful :)
# Vector Register Size - Diagram
It may be hard to imagine and remember the exact number of bits, bytes, words or floats in some piece of data, like a SIMD register. So today I've made the following diagram/cheatsheet:
Here you can find its "source" in OpenOffice Draw format: Vector_register_size.odg.
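The same numbers can also be computed in code. Below is a minimal C++ sketch, assuming 8-bit bytes, a 4-byte float and the 128-, 256- and 512-bit register widths of SSE, AVX and AVX-512. The helper name LaneCount is made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Number of elements of type T that fit in a register `registerBits` wide.
// Assumes that a byte has 8 bits.
template <typename T>
constexpr std::size_t LaneCount(std::size_t registerBits)
{
    return registerBits / (sizeof(T) * 8);
}

// 128-bit register (e.g. SSE): 16 bytes, 8 words, 4 floats, 2 doubles.
static_assert(LaneCount<std::uint8_t>(128) == 16, "bytes in 128 bits");
static_assert(LaneCount<std::uint16_t>(128) == 8, "16-bit words in 128 bits");
static_assert(LaneCount<float>(128) == 4, "floats in 128 bits");
static_assert(LaneCount<double>(128) == 2, "doubles in 128 bits");

// 256-bit register (e.g. AVX): 8 floats. 512-bit (e.g. AVX-512): 16 floats.
static_assert(LaneCount<float>(256) == 8, "floats in 256 bits");
static_assert(LaneCount<float>(512) == 16, "floats in 512 bits");
```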