Indexing and Searching Overview

dtSearch products provide instant searching across terabytes of text in a wide range of online and offline data types. Search time (including concurrent search time) is typically less than a second.

  • dtSearch Desktop and dtSearch Network run in a classic Windows environment for individual or shared network-based searching.
  • dtSearch Web runs in an Internet or Intranet environment, with no limit on the number of concurrent searches.
  • The dtSearch Engine developer SDK comes in multiple versions for different platforms. Running in an Internet or Intranet server-based environment, the dtSearch Engine supports efficient multithreaded searching, with no limit on the number of concurrent search threads.

Privacy

dtSearch products will not send dtSearch Corp. any information about how you use dtSearch, your documents, your searches, etc. (See Privacy Policy.)

Building an Index

dtSearch products can instantly search terabytes of text because dtSearch builds a search index that stores each unique word and its location in the data.

  • A single index can hold up to a terabyte of data, spanning multiple directories, emails and attachments, online data and other databases. (See supported data types.)
  • dtSearch can build and simultaneously search any number of terabyte indexes.
  • Indexing is easy: just point to the folders or online data you want to index.
  • No need to tell dtSearch what files, emails or other content you have; dtSearch will figure that out for itself.
  • Indexing, searching and display of documents do not alter original files or other data, including hash values.
  • dtSearch also offers automated indexing via the Windows Task Scheduler.
  • See optimizing indexing of large collections of data for important tips on index building.
  • dtSearch products also include a caching option for use with web-based or other remote data.
  • See indexing tips for information on unindexed searching, forensics tips, etc.
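
The idea of an index that stores each unique word and its location can be illustrated with a small inverted index. This is a minimal sketch of the general technique only, not dtSearch's actual index format or API; the function names, the tokenizing rule and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each unique word to {document name: [word positions]}."""
    index = defaultdict(dict)
    for name, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(name, []).append(position)
    return index


def search(index, word):
    """Look the word up directly instead of rescanning every document."""
    return index.get(word.lower(), {})


# Hypothetical sample data, for illustration only.
documents = {
    "memo.txt": "The quarterly report is attached",
    "mail.msg": "Please review the report before Friday",
}
index = build_index(documents)
hits = search(index, "report")
print(hits)  # {'memo.txt': [2], 'mail.msg': [3]}
```

Because a search is a single lookup in the index rather than a pass over all the text, search time stays small however much data was indexed.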

Updating an Index

dtSearch can update your indexes by adding only new or updated items, removing deleted items, and compressing the index, without affecting searching.

  • Like initial index building, index updates can also be automated via the Windows Task Scheduler.
  • Updating an index, including through the Windows Task Scheduler, does not lock out individual or concurrent searching.
  • See also indexing tips.
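
The update steps described above (add new or changed items, remove deleted items, compress) can be sketched as follows. This is an illustration under stated assumptions, not dtSearch's implementation: the index is a plain in-memory dictionary, each document carries a hypothetical version stamp (for example a modification time), and "compressing" is reduced to dropping entries that no longer point to any document.

```python
def update_index(index, indexed_versions, current_files):
    """Bring the index in line with current_files, touching only what changed.

    index:            word -> set of document names
    indexed_versions: document name -> version stamp seen at last indexing
    current_files:    document name -> (version stamp, text)
    """
    def remove(name):
        for word in list(index):
            index[word].discard(name)
            if not index[word]:
                del index[word]  # drop empty entries (stand-in for compressing)

    stats = {"added": 0, "updated": 0, "removed": 0}

    # Remove documents that no longer exist.
    for name in list(indexed_versions):
        if name not in current_files:
            remove(name)
            del indexed_versions[name]
            stats["removed"] += 1

    # Add new documents and re-index changed ones; skip unchanged ones.
    for name, (version, text) in current_files.items():
        if name in indexed_versions:
            if indexed_versions[name] == version:
                continue
            remove(name)
            stats["updated"] += 1
        else:
            stats["added"] += 1
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)
        indexed_versions[name] = version
    return stats


# Hypothetical sample data: first a full build, then an incremental update.
index, versions = {}, {}
update_index(index, versions, {"a.txt": (1, "alpha beta"), "b.txt": (1, "beta gamma")})
stats = update_index(index, versions, {"a.txt": (2, "alpha delta"), "c.txt": (1, "gamma")})
print(stats)  # {'added': 1, 'updated': 1, 'removed': 1}
```

Only the changed, new and deleted documents are processed in the second call, which is why this kind of update tends to be much cheaper than rebuilding from scratch.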

Indexing Tips

Indexing Tip #1

Build an index -- unindexed searching is almost never more efficient. While indexing is much slower than searching, the time it takes to build an index and then search for multiple search terms (as is typical in forensics and e-discovery) is significantly less than the time it takes to run multiple unindexed searches. And once the index is in place, if you think of more search terms, each additional search is nearly instantaneous.

Indexing Tip #2

Watch for encrypted files and "image only" PDFs. After building an index, dtSearch creates a log that includes a list of all encrypted files and "image only" PDFs. Take a look at this log so you know what you need to separately decrypt and/or OCR and run again through dtSearch. (See flagging encrypted files and flagging "image only" PDFs.)

Indexing Tip #3

Access emails directly as PSTs, OSTs, MSGs, etc., instead of going through Outlook/MAPI. If you are not searching your own personal email collection (and sometimes even if you are searching your own emails and have a large collection), it is much more efficient to bypass the Outlook/MAPI “middleman” and directly access the data. (See Outlook and Exchange indexing.) And don’t forget fuzzy searching to sift through potential typographical errors in emails and attachments!

Indexing Tip #4

Consider adding caching. "Cache documents in the index" stores a complete copy of documents in the index. This allows for the immediate display of full documents with highlighted hits, even when the documents are not accessible or where access is slow or unreliable. "Cache document text in the index" makes both generation of search reports and generation of the hits-in-context synopsis in search results much faster. (See adding caching.)

Indexing Tip #5

Check out optimizing indexing of large document collections before you start a large index job. One example: while search options like fuzzy searching are adjustable at search time, if you build a case- and accent-sensitive index, the only way to change that setting is to rebuild the entire index. With case- and accent-sensitive indexing on, your index size will be much larger, as your index will store Frank, frank and FRANK as separate words, instead of the same word. Worse, with case- and accent-sensitive indexing on, a search for Frank Harvey would miss both frank harvey and FRANK HARVEY. (See the full FAQ.)
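
The Frank/frank/FRANK example can be reproduced with a toy word index. This is a minimal sketch assuming a simple whitespace tokenizer; the function names are hypothetical, it does not use any dtSearch API, and it covers only case sensitivity, not accent sensitivity.

```python
def build_word_index(texts, case_sensitive):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(texts):
        for word in text.split():
            key = word if case_sensitive else word.lower()
            index.setdefault(key, set()).add(doc_id)
    return index


def find_all_words(index, query, case_sensitive):
    """Return ids of documents containing every word of the query."""
    words = query.split() if case_sensitive else query.lower().split()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches) if matches else set()


texts = ["Frank Harvey", "frank harvey", "FRANK HARVEY"]
sensitive = build_word_index(texts, case_sensitive=True)
insensitive = build_word_index(texts, case_sensitive=False)

print(len(sensitive), len(insensitive))                    # 6 2
print(find_all_words(sensitive, "Frank Harvey", True))     # {0}
print(find_all_words(insensitive, "Frank Harvey", False))  # {0, 1, 2}
```

The case-sensitive index stores three times as many distinct words for the same text, and the same query finds only the exact-case document. Because the choice is baked into the stored keys, changing it means rebuilding the index.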

Indexing Tip #6

Update your indexes by telling dtSearch to add any new or changed documents, remove deleted documents and compress the updated index. This type of update tends to be much less time consuming than completely re-indexing. Even better, dtSearch can update indexes automatically with no effect on ongoing concurrent searching.

Search Tips