About dtSearch


dtSearch Corp. has over two decades of experience in enterprise and developer text retrieval and document filters. The Smart Choice for Text Retrieval® since 1991, the dtSearch software product line offers “industrial-strength” (PC Magazine) performance in searching, as well as the ability to parse a wide variety of data formats.

Then and Now


The company started research and development in text retrieval in 1988. Incorporated in Virginia in 1991, dtSearch Corp. began marketing the first dtSearch product in the first quarter of that year. The original dtSearch release ran as a desktop-“only” application.

Today, the dtSearch product line can instantly search terabytes of text across a desktop, network, or Internet or Intranet site. dtSearch products also serve as tools for publishing large document collections, with instant text searching, to Web sites or portable media.

The first few releases of the dtSearch product line did not include OEM or developer API access. In 1995, dtSearch Corp. made available its initial developer version for OEM integration. Launched in 1995, the very first Windows product that embedded the dtSearch Engine had an installed base of over one million users.

The dtSearch Engine now runs on multiple platforms and makes available dtSearch document filters and instant searching for a wide range of Internet, Intranet and other commercial applications. SDKs include native 64-bit support. Cross-platform APIs cover C++, Java and current .NET. The dtSearch Engine also works on cloud platforms like Azure and AWS.

The two functional components at the core of dtSearch products are: dtSearch’s proprietary document filters and general data support; and dtSearch’s full-text and metadata searching. These two functional components work together for integrated searching and data display with highlighted hits, or they can work separately. For example, some developers require the dtSearch document filters “only,” without the need for search functionality.

In addition to the dtSearch Engine, other dtSearch products include: dtSearch Web for quickly publishing instantly searchable data to an Internet or Intranet site; dtSearch Network for instantly searching across a network; dtSearch Desktop for desktop search; and dtSearch Publish for publishing searchable data to portable media.

Who Uses dtSearch


Fortune 100 companies and others with some of the most demanding document search needs in the world rely on dtSearch products. Typical enterprise use of the dtSearch product line includes general “office” document retrieval, searching through email repositories plus attachments, and database search.

dtSearch products can also search web data. High-traffic, content-rich public Internet sites deploy dtSearch products to search online technical documentation. dtSearch products can also run on internal access servers, providing secure Intranet searching.

In the legal and investigative areas, a large number of e-discovery providers and forensic investigators rely on dtSearch. (For these users, the dtSearch site includes a summary FAQ on indexing and searching features of common interest to forensics users.)  Additionally, dtSearch products assist with legal research across statutes, regulations, and case law.

The financial and accounting industries similarly use dtSearch products. In fact, 3 out of 4 of the “Big 4” accounting firms are dtSearch customers. The dtSearch product line operates in the recruiting space for resume or CV searching. Increasingly, dtSearch products search medical records too.

US Government customers include defense, space and law enforcement agencies. 4 out of 5 of the Fortune 500 largest Aerospace and Defense industry companies use dtSearch. Other US Government agencies (from tax agencies to court systems), as well as state and local government agencies, are also dtSearch customers.

International governmental organizations likewise employ dtSearch products. (Through its Unicode support, the dtSearch product line supports hundreds of international languages.)  The product line has a strong international presence in the private sector too. dtSearch has distributors worldwide, including coverage on six continents.

As for OEM integration, some of the largest IT companies have embedded the dtSearch Engine in commercial applications. The dtSearch site includes hundreds of publicly-available developer case studies covering a wide variety of market segments. The dtSearch site also has over a hundred press reviews from general information management publications like Computerworld, Network World, eWeek and KMWorld, as well as reviews from more specialized programmer and other vertical market publications.

Search Features


Terabyte Indexer. dtSearch enterprise and developer products can index over a terabyte of text in a single index, spanning multiple folders, emails and attachments, online data and other databases. The products can create and search any number of indexes, and can search indexes during updates. Indexed search time is typically less than a second, even across terabytes of data.

Concurrent, Multithreaded Searching. dtSearch developer products provide efficient multithreaded searching, with no limit on the number of concurrent search threads.  For online search, the products can run in a completely stateless manner, making it very easy to scale.  A “success story” from Intel® describes dtSearch’s “perfect score” for high-volume web-based concurrent searching:

“The dtSearch Engine multi-threaded indexed search demo achieved 100 percent parallel time in the Intel Concurrency Checker test, indicating full optimization for multi-core hardware ... The relationship between Intel and dtSearch stretches back a number of years ... [and] generates synergies that deliver excellent performance and other benefits to end-customers, including internal customers at Intel.” Full write-up (PDF)
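
As a rough illustration of why stateless searching lends itself to concurrency, the following Python sketch runs several queries at once against a small read-only, in-memory inverted index. It is not dtSearch code: the toy corpus, the index layout, and the names `build_index`, `search` and `search_many` are all hypothetical, and the dtSearch Engine's own API differs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus; a real index would live on disk.
DOCS = {
    1: "quarterly earnings report",
    2: "earnings call transcript",
    3: "engineering design report",
}

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

INDEX = build_index(DOCS)

def search(query):
    """Stateless AND search: reads the shared index, keeps no per-user state."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(INDEX.get(words[0], set()))
    for word in words[1:]:
        hits &= INDEX.get(word, set())
    return sorted(hits)

def search_many(queries, workers=4):
    """Run queries concurrently; results come back in query order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(search, queries))
```

Because each call to `search` only reads the shared index and stores nothing between requests, the sketch needs no locking, and any worker can serve any query.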

Federated Searching and the dtSearch Spider. dtSearch products provide federated search across any number of folders, emails (with nested attachments), and databases. The dtSearch Spider adds local and remote online content to a search. The Spider can index sites to any level of depth, with support for public and secure online content. dtSearch products provide integrated relevancy ranking with multicolor highlighted hits across both online and offline data.

25+ Search Options and International Language Support. The dtSearch product line offers over 25 search types, including special forensics search options. For international language coverage, dtSearch products support Unicode, including support for right-to-left languages, and special Chinese/Japanese/Korean character options.

Faceted Search and Other Data Classification Options. The dtSearch Engine developer APIs support categorization based on document full-text contents, internal document metadata, database content, or data attributes associated with documents during document indexing. The dtSearch Engine has APIs for other advanced data classification options as well, such as faceted search and positive and negative variable term weighting for full-text and/or fielded data. See the "Databases" section of selected articles by subject for details on database indexing, faceted searching, document classification and other topics.
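
To make the idea of faceted search concrete, here is a minimal Python sketch that counts facet values across a list of search hits and then narrows the list by one value. It is a generic illustration, not the dtSearch Engine API: the `HITS` data, the field names, and the `facet_counts` and `refine` helpers are hypothetical.

```python
from collections import Counter

# Hypothetical search hits, each carrying metadata fields.
HITS = [
    {"name": "a.pdf", "type": "PDF", "year": 2021},
    {"name": "b.docx", "type": "Word", "year": 2021},
    {"name": "c.pdf", "type": "PDF", "year": 2022},
]

def facet_counts(hits, field):
    """Count how many hits carry each value of the given metadata field."""
    return Counter(hit[field] for hit in hits if field in hit)

def refine(hits, field, value):
    """Narrow the hit list to those matching one facet value."""
    return [hit for hit in hits if hit.get(field) == value]
```

A search page would typically show the counts next to each facet value and call something like `refine` when the user selects one.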

Document Filters and Supported Data Types. dtSearch’s proprietary document filters support a broad range of document types, emails and attachments, databases and other online data.  The document filters further support browser display with multicolor highlighted hits for text converted to HTML for display. Document filters are also available for separate licensing across all platforms (see below).

SDKs and Platforms. The dtSearch Engine supports multiple platforms. Cross-platform APIs cover C++, Java and current .NET. (See also the ASP.NET Core live demo showing faceted search and multicolor hit-highlighting. Please select SEC Filings to see the faceted search portion of the demo.) Developer tutorials cover a wide range of topics, including an ASP.NET Core demo sample code walk-through. The dtSearch Engine also works on cloud platforms like Azure and AWS.

Document Filters and Supported Data Types


dtSearch’s own document filters support parsing, indexing, searching and display with highlighted hits of text and metadata across a broad range of online and offline data types.

  • Web-ready content: supports integrated images and text in HTML, XML/XSL, PDF, ASP.NET, CMS, PHP, WordPress, SharePoint, etc.
  • Other databases and data sources: supports XML, Access, XBASE, CSV, etc.; dtSearch Engine APIs support NoSQL and SQL-type databases, along with the full-text of BLOB data; dtSearch Engine APIs also support disk images, network data streams and other non-file data.
  • MS Office formats: supports integrated browser-ready image and text in Word (RTF/DOC/DOCX), PowerPoint (PPT/PPTX), Excel (XLS/XLSX), Access (MDB/ACCDB) and OneNote (ONE); support includes documents saved from Office 365.
  • Other “Office” formats, PDF and other printer formats, compression formats: supports other “Office” suite formats; EMF Spool (SPL) files; compression formats like RAR, ZIP, GZIP and TAR; PDF, PDF Portfolio, and many encrypted PDFs.
  • Emails and attachments:  supports integrated browser-ready images, text and attachments (including recursively embedded objects) in Outlook/Exchange (PST/OST/MSG) and Thunderbird (MBOX/EML); support includes emails saved from Office 365.
  • Using dtSearch with cloud storage (OneDrive, DropBox, Amazon S3, SharePoint-synced files, etc.)
  • Full list of supported document types

Document Filter APIs. All developer APIs give developers access to dtSearch’s text parsing, extraction, conversion and hit-highlighting capabilities.

  • An “object extraction” API lets developers navigate through the structure of each embedded object as a hierarchy, and optionally extract each object, such as an image in an MS Word file embedded in an MS Access database, compressed and attached to an email.
  • General dtSearch Engine licenses include the document filters along with dtSearch indexing and searching functionality.
  • The document filters are also available for separate license for developers requiring text parsing, extraction and conversion “only,” without search.
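
The nested example in the first bullet above (an image in a Word file, inside an Access database, compressed and attached to an email) can be pictured as a tree. The Python sketch below walks such a tree depth first and lists the path to every embedded object. The `EMAIL` structure, the file names, and the `walk` function are hypothetical stand-ins for illustration and do not reflect the actual dtSearch object extraction API.

```python
# Hypothetical nested container: each node has a name and child objects.
EMAIL = {
    "name": "message.msg",
    "children": [
        {"name": "archive.zip", "children": [
            {"name": "records.accdb", "children": [
                {"name": "memo.docx", "children": [
                    {"name": "photo.png", "children": []},
                ]},
            ]},
        ]},
    ],
}

def walk(node, path=()):
    """Yield the full path to every object in the hierarchy, depth first."""
    current = path + (node["name"],)
    yield "/".join(current)
    for child in node["children"]:
        yield from walk(child, current)
```

Calling `list(walk(EMAIL))` yields one path per object, from the email itself down to the embedded image, which is the kind of navigation the bullet describes before any optional extraction step.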

New Versions


dtSearch Corp. typically releases a new version of the dtSearch product line every 3 to 4 months. Because of the very large developer installed base currently using the dtSearch Engine, the company strives wherever possible to maintain backwards compatibility in the developer API.
 
Typically, a new release will provide support for new formats and operating systems, and of course add new product line features. The dtSearch website includes detailed release notes spanning many years.

Because of the many file format changes that the dtSearch product line must keep up with, as well as coverage for new operating systems and the like, dtSearch encourages customers to stay current on dtSearch releases. Please sign up for automatic notifications of new version releases. And see upgrades for information on current downloads.

With each new release of the dtSearch product line, the company generally makes available a new beta version for download. The posted release notes contain information on the new beta features as well.