Fast, Heuristic Disk Search

Warning! Some information on this page is older than 6 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

Wed
04
Jan 2012

In March 2007 I've written an algorithm which I called "Fast, heuristic disk search" and published it in my article Szybkie, heurystyczne przeszukiwanie dysku (in Polish) on codeguru.pl. Today I went back to it, improved it a bit and I think it is a good opportunity to present it once again, this time in English :)

The problem is about searching hard disk, looking for a file or directory with particular name. Naive approach, which is used by file searching functions in programs like Total Commander, uses depth-first search, so it enters C:\, then C:\Program Files, then C:\Program Files\Adobe etc., spending lots of time in places where it probably won't find the file we are looking for.

Game developers know that searching algorithm can be significantly improved by using some heuristics that will drive algorithm towards (probably) right direction. We do it in A* pathfinding. Does a similar principle could be used also when searching for a file?

My particular problem was configuring a program with path to a console tool, let's say fxc.exe shader compiler from DirectX SDK. We could present a textbox where user has to enter path to this tool. We could also make "Browse" button with an open file dialog to simplify locating it. But wouldn't be great if program tried to locate it automatically on first run? Some installed programs store path in a registry key, but if it doesn't, we have no idea where could it be if we don't look for it among files and directories on hard disk.

My algorithm tries to locate given file or directory in given time limit by searching all given directories and their subdirectories. The improvement I propose here is inteligent management of the order in which we process these directories.

The algorithm uses two collections: a queue of directories to process and a set of already visisted directories. Here is the code in C#:

/* Intelligent search for a file or directory by name, on the hard disk.
 * Can throw exceptions on some errors, while silently ignore other.
 * 
 * targetName - name of the target like "fxc.exe" or path suffix like @"bin\x86\fxc.exe".
 * interestingDirNames - case-insensitive strings with parts of
 *     directory names that make these directories particularly
 *     interesting, while not very common.
 *     Example: { "microsoft", "dx", "directx", "sdk" }
 * startDirs - paths to directories to start search from. For example,
 *     path to Program Files or hard disk root directories.
 *     
 * Returns path to found target or null if not found in given time.
 */
public static string IntelligentSearch(
  string targetName, bool targetIsDirectory,
  string[] interestingDirNames, string[] startDirs, TimeSpan timeout)
{
  DateTime endTime = DateTime.Now + timeout;

  // Paths to process
  LinkedList<string> PathQueue = new LinkedList<string>();
  // Path already processed in form Key=Path, Value="1"
  System.Collections.Specialized.StringDictionary Processed =
    new System.Collections.Specialized.StringDictionary();

  foreach (string startDir in startDirs)
    PathQueue.AddLast(startDir);

  // Processing loop - berform breadth first search (BFS) algorithm
  while (PathQueue.Count > 0)
  {
    // Get directory to process
    string Dir = PathQueue.First.Value;
    PathQueue.RemoveFirst();
    // Already processed
    if (Processed.ContainsKey(Dir))
      continue;
    // Add to processed
    Processed.Add(Dir, "1");

    try
    {
      string targetPath = Path.Combine(Dir, targetName);
      if (targetIsDirectory)
      {
        if (Directory.Exists(targetPath))
          return targetPath;
      }
      else
      {
        if (File.Exists(targetPath))
          return targetPath;
      }

      foreach (string SubDir in Directory.GetDirectories(Dir))
      {
        bool dirIsInteresting = false;
        if (interestingDirNames != null && interestingDirNames.Length > 0)
        {
          string subDirName = Path.GetFileName(SubDir);
          foreach (string interestingName in interestingDirNames)
          {
            if (subDirName.IndexOf(interestingName, StringComparison.InvariantCultureIgnoreCase) != -1)
            {
              dirIsInteresting = true;
              break;
            }
          }
        }

        if (dirIsInteresting)
          PathQueue.AddFirst(SubDir);
        else
          PathQueue.AddLast(SubDir);
      }
    }
    catch (Exception) { }

    if (DateTime.Now > endTime)
      return null;
  }

  return null;
}

Example usage, which finds my fxc.exe in a fraction of a second:

string[] interestingDirNames = {
  "microsoft",
  "dx",
  "directx",
  "sdk",
  "utilities",
  "bin",
  "x86" };

List<string> startDirs = new List<string>();

startDirs.Add(Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles));

foreach (System.IO.DriveInfo Drive in System.IO.DriveInfo.GetDrives())
  if (Drive.DriveType == System.IO.DriveType.Fixed)
    startDirs.Add(Drive.RootDirectory.Name);

string result = Globals.IntelligentSearch(
  "fxc.exe",
  false,
  interestingDirNames,
  startDirs.ToArray(),
  TimeSpan.FromMilliseconds(5000.0));

Comments | #algorithms #.net Share

Comments

[Download] [Dropbox] [pub] [Mirror] [Privacy policy]
Copyright © 2004-2025