Wed
04
Jan 2012
In March 2007 I've written an algorithm which I called "Fast, heuristic disk search" and published it in my article Szybkie, heurystyczne przeszukiwanie dysku (in Polish) on codeguru.pl. Today I went back to it, improved it a bit and I think it is a good opportunity to present it once again, this time in English :)
The problem is about searching hard disk, looking for a file or directory with particular name. Naive approach, which is used by file searching functions in programs like Total Commander, uses depth-first search, so it enters C:\, then C:\Program Files, then C:\Program Files\Adobe etc., spending lots of time in places where it probably won't find the file we are looking for.
Game developers know that searching algorithm can be significantly improved by using some heuristics that will drive algorithm towards (probably) right direction. We do it in A* pathfinding. Does a similar principle could be used also when searching for a file?
My particular problem was configuring a program with path to a console tool, let's say fxc.exe shader compiler from DirectX SDK. We could present a textbox where user has to enter path to this tool. We could also make "Browse" button with an open file dialog to simplify locating it. But wouldn't be great if program tried to locate it automatically on first run? Some installed programs store path in a registry key, but if it doesn't, we have no idea where could it be if we don't look for it among files and directories on hard disk.
My algorithm tries to locate given file or directory in given time limit by searching all given directories and their subdirectories. The improvement I propose here is inteligent management of the order in which we process these directories.
The algorithm uses two collections: a queue of directories to process and a set of already visisted directories. Here is the code in C#:
/* Intelligent search for a file or directory by name, on the hard disk. * Can throw exceptions on some errors, while silently ignore other. * * targetName - name of the target like "fxc.exe" or path suffix like @"bin\x86\fxc.exe". * interestingDirNames - case-insensitive strings with parts of * directory names that make these directories particularly * interesting, while not very common. * Example: { "microsoft", "dx", "directx", "sdk" } * startDirs - paths to directories to start search from. For example, * path to Program Files or hard disk root directories. * * Returns path to found target or null if not found in given time. */ public static string IntelligentSearch( string targetName, bool targetIsDirectory, string[] interestingDirNames, string[] startDirs, TimeSpan timeout) { DateTime endTime = DateTime.Now + timeout; // Paths to process LinkedList<string> PathQueue = new LinkedList<string>(); // Path already processed in form Key=Path, Value="1" System.Collections.Specialized.StringDictionary Processed = new System.Collections.Specialized.StringDictionary(); foreach (string startDir in startDirs) PathQueue.AddLast(startDir); // Processing loop - berform breadth first search (BFS) algorithm while (PathQueue.Count > 0) { // Get directory to process string Dir = PathQueue.First.Value; PathQueue.RemoveFirst(); // Already processed if (Processed.ContainsKey(Dir)) continue; // Add to processed Processed.Add(Dir, "1"); try { string targetPath = Path.Combine(Dir, targetName); if (targetIsDirectory) { if (Directory.Exists(targetPath)) return targetPath; } else { if (File.Exists(targetPath)) return targetPath; } foreach (string SubDir in Directory.GetDirectories(Dir)) { bool dirIsInteresting = false; if (interestingDirNames != null && interestingDirNames.Length > 0) { string subDirName = Path.GetFileName(SubDir); foreach (string interestingName in interestingDirNames) { if (subDirName.IndexOf(interestingName, StringComparison.InvariantCultureIgnoreCase) != -1) { dirIsInteresting = true; break; } } } if (dirIsInteresting) PathQueue.AddFirst(SubDir); else PathQueue.AddLast(SubDir); } } catch (Exception) { } if (DateTime.Now > endTime) return null; } return null; }
Example usage, which finds my fxc.exe in a fraction of a second:
string[] interestingDirNames = { "microsoft", "dx", "directx", "sdk", "utilities", "bin", "x86" }; List<string> startDirs = new List<string>(); startDirs.Add(Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles)); foreach (System.IO.DriveInfo Drive in System.IO.DriveInfo.GetDrives()) if (Drive.DriveType == System.IO.DriveType.Fixed) startDirs.Add(Drive.RootDirectory.Name); string result = Globals.IntelligentSearch( "fxc.exe", false, interestingDirNames, startDirs.ToArray(), TimeSpan.FromMilliseconds(5000.0));
Comments | #algorithms #.net Share