dtSearch Release Notes
Disclaimer: This information is provided subject to the license agreement accompanying the products
described and to the terms of use of this web site, and does not constitute an additional warranty. dtSearch
makes no warranty of any kind with regard to this information.
Critical RAR Security Vulnerability affects older dtSearch for Windows versions
CVE-2025-8088, a security vulnerability affecting the Rarlab library that dtSearch uses to process RAR files, affects dtSearch
versions 2022.02 and earlier. Users with affected versions should update immediately. To update, run dtSearch
Desktop and click Help > Check for Updates > Check Now.
Only Windows versions are affected. Another way to eliminate the vulnerability is to delete the files dtv_rar.dll and
dtv_rar64.dll from your dtSearch installation.
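As a rough illustration of the workaround above, the following Python sketch looks for the two RAR parser DLLs under an installation folder and optionally deletes them. The function name and the folder passed to it are hypothetical; the actual installation location varies, and deleting files there may require administrator permissions.

```python
from pathlib import Path

# File names taken from the note above.
RAR_PARSER_DLLS = ("dtv_rar.dll", "dtv_rar64.dll")


def remove_rar_parsers(install_dir, dry_run=True):
    """Return the RAR parser DLLs found under install_dir.

    The files are deleted only when dry_run is False.
    """
    wanted = {name.lower() for name in RAR_PARSER_DLLS}
    found = sorted(
        path
        for path in Path(install_dir).rglob("*")
        if path.is_file() and path.name.lower() in wanted
    )
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Calling the function with dry_run=True first shows which files would be removed before anything is deleted.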
For more information on this vulnerability, see: https://www.cve.org/CVERecord?id=CVE-2025-8088
The vulnerability only affects older dtSearch versions because we removed the related code from dtSearch products
before the release of dtSearch 2023.01.
Another recent RAR vulnerability, CVE-2025-6218, does not affect dtSearch because dtSearch ignores the
archive-supplied extraction path.
dtSearch 2025.02 (Build 8842) Released September 10, 2025
All Products
- JSON file format support
Added support for indexing and searching JSON files (type id
it_JsonRecord=371). As with XML, field searching, including nested field searching, is supported. Because JSON
files cannot be automatically detected, the file type table must contain a rule identifying JSON files using a
filename pattern (e.g., *.json).
- JSON tables support
If the file type table specifies the file type "JSON Table" (it_JsonTable=372),
then the JSON file must consist of an array of objects, each of which will be indexed as a row of the file.
Information outside the array object will be ignored.
- CSV and JSON table configuration files
Table configuration files make indexed table data more useful by letting you specify the name and date for
each row using data in the table. To enable table configuration files, set the flag
dtsFlagUseJsonConfigFiles in Options.OtherFlags. When this flag is enabled and a CSV or JSON table is being
indexed, dtSearch will look for a file named {filename}.config.json (where {filename} is the name of the CSV
or JSON table) containing the rules to apply to that file. The .config.json file can contain these settings:
- "RowName": A template using %% around field names specifying how to build an informative filename
for each row. Example: "Row for %%lastname%% %%firstname%%". The row name will always include the
ordinal and offset for the row data so the generated filename does not have to be unique.
- "RowDateField": The name of the field to use as the date for the row, which will appear and be
sortable in search results like file modification dates. If no RowDateField is specified, the
modification date of the containing file will be used.
- "DateFormat": The layout of the RowDateField (example: "YYYY-MM-DD").
- "SkipBlankFields": If true, fields with blank values will be omitted rather than being included
with an empty value. (This can improve the readability of table data with many optional fields.)
- "SkipLines": For CSV data only, the number of lines of data to skip before the line containing
the field names. This can be useful for handling exported CSV reports that start with one or more lines
of titles or other extra information.
Example:
{
"RowName" : "ROW %%lastname%% %%firstname%%",
"RowDateField": "start_date",
"DateFormat": "YYYY-MM-DD",
"SkipBlankFields" : "True",
"SkipLines": 2
}
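To make the effect of these settings concrete, here is a minimal Python sketch that applies the example configuration to one row of a JSON table. It only imitates the behavior described above and is not dtSearch code: the helper names render_row_name and row_date are hypothetical, only the "YYYY-MM-DD" layout is handled, and the ordinal and offset that dtSearch adds to each row name are left out.

```python
import json
import re
from datetime import datetime

# Example configuration copied from the release note above.
CONFIG_TEXT = """
{
  "RowName": "ROW %%lastname%% %%firstname%%",
  "RowDateField": "start_date",
  "DateFormat": "YYYY-MM-DD",
  "SkipBlankFields": "True",
  "SkipLines": 2
}
"""


def render_row_name(template, row):
    """Replace each %%field%% marker with the row's value (blank if missing)."""
    return re.sub(r"%%(.+?)%%", lambda m: str(row.get(m.group(1), "")), template)


def row_date(row, config):
    """Parse the RowDateField value; this sketch only handles YYYY-MM-DD."""
    field = config.get("RowDateField")
    if not field or not row.get(field):
        return None  # per the notes, the containing file's date would be used
    if config.get("DateFormat") != "YYYY-MM-DD":
        raise ValueError("this sketch only understands YYYY-MM-DD")
    return datetime.strptime(row[field], "%Y-%m-%d").date()


config = json.loads(CONFIG_TEXT)
# A JSON table: an array of objects, one object per row.
rows = json.loads(
    '[{"lastname": "Smith", "firstname": "Ann", "start_date": "2024-03-01"}]'
)
for row in rows:
    print(render_row_name(config["RowName"], row), row_date(row, config))
    # prints: ROW Smith Ann 2024-03-01
```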
dtSearch Engine for Linux
- The dtSearch Engine for Linux includes new support for Linux on ARM64. The libraries are installed in the dtsearch/arm64 folder.
The minimum supported Linux version for the ARM64 version is Red Hat Enterprise Linux 9 with GCC 12.2
- The minimum supported Linux version for the x86 and x64 versions is now Red Hat Enterprise Linux 7 (updated from Red Hat Enterprise Linux 6) with GCC 4.9.
Fixes/Minor Enhancements
- Updated included Unrar version to 7.13
- Fixed: <!--BeginNoIndex--> tags worked in HTML files but not XML
- Added flag dtsoConvertHtmlAllowCustomTags in ConvertFlags to tell FileConverter to allow unrecognized HTML tags
that conform to the specification for custom tags (start with a letter, contain at least one hyphen -- see the
HTML specification for custom tags for details).
- Added new option in dtSearch Desktop, in Options > Preferences > PDF view options to control the location
of temporary copies of PDF files for use with Adobe Acrobat
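The custom tag rule mentioned above (a name that starts with a letter and contains at least one hyphen) can be approximated with a short check. This Python sketch implements only that simplified rule as worded in the note; the HTML specification's full definition is stricter, and the function name is hypothetical rather than part of any dtSearch API.

```python
import re

# Simplified rule from the note above: starts with a letter, contains a hyphen.
# Whitespace and the characters <, > and / are excluded so that only a bare
# tag name is accepted.
_CUSTOM_TAG = re.compile(r"[A-Za-z][^\s<>/]*-[^\s<>/]*")


def looks_like_custom_tag(name):
    """Return True if name satisfies the simplified custom tag rule."""
    return _CUSTOM_TAG.fullmatch(name) is not None
```

For example, looks_like_custom_tag("my-widget") is true, while "widget" (no hyphen) and "1-widget" (does not start with a letter) are rejected.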
dtSearch 2025.01 (Build 8836) Released March 10, 2025
Fixes/Minor Enhancements
- dtSearch Web: Added support for running dtSearch Web Setup under Windows Server 2025.
- Fixed: Some users unable to use Edit > Copy in dtSearch Desktop to copy text from the preview window.
- Added search flag dtsSearchWantHitDetailsSynonymOf (In the hits by word report, include SynonymOf= to indicate the
source term for synonym expansions.)
- Fixed: If MD5Hash or SHA256Hash is designated as a stored field, and if documents are indexed with the flag
dtsoFfGenerateMd5Hash or dtsoFfGenerateSHA256Hash, these field values were not treated as stored fields during
indexing for container document types such as ZIP (so the hash values would not appear as document properties in
search results).
- Fixed: Bug in regular expression matching caused a search for "##z*abc" to miss "abc". The bug only applied to
regular expression searches and affected single character expressions followed by * in cases where the single
character expression should match zero instances of the character. In the above example, zabc, zzabc, etc. were
found, but abc was not. Simple wildcard searches like z*abc were not affected.
- dtSearch Publish: Fixed "missing files" error generating new CD
- File parser bug fixes affecting: TAR, HWP, HWPX, PDF, EML
- Other bug fixes
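The regular expression fix above concerns the standard meaning of * as "zero or more" of the preceding character. The short Python sketch below uses Python's re module as a stand-in to show the expected behavior; dtSearch's own regular expression syntax (including the ## prefix) may differ in other respects.

```python
import re

# z* means zero or more "z" characters, so plain "abc" must also match.
pattern = re.compile(r"z*abc")
words = ["abc", "zabc", "zzabc", "zab"]
matches = [word for word in words if pattern.fullmatch(word)]
print(matches)  # prints: ['abc', 'zabc', 'zzabc']
```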
dtSearch 2024.02 (Build 8821) Released September 30, 2024
All Products
- Multithreaded Indexing
Added support for building indexes using multiple threads, greatly improving indexing speed on 64-bit
Windows and Linux systems with multiple cores. At least 16 GB of memory is required.
In dtSearch Desktop/Network, click Options > Preferences > Indexing Resources to enable this option.
For information on using multithreaded indexing in the API, please see Multithreaded Operations.
Multithreaded indexing had previously been available as a "preview" feature.
Fixes/Minor Enhancements
- Updated included Zlib version to 1.3.1, Unrar to 7.1, and ICU to 74.2 (Windows and Linux).
- Fixed dtSearch Desktop Options > Preferences > Search Results bug causing the Column Sizes setting not to
be remembered if the compact resizable version of the Preferences dialog box is enabled
- In documents with footnotes, added a link from the footnote text back to the footnote reference.
- Fixed .NET Core API bug causing the IndexInfo.Flags property not to be populated.
- File parser bug fixes affecting: PDF, WordPerfect
- Other bug fixes
dtSearch 2024.01 (Build 8815) Released March 30, 2024
dtSearch Desktop/Network
- Accessibility improvements:
- New "Accessibility Options" page in Options > Preferences with options to improve operation of
dtSearch with screen readers.
- New optional compact resizable dialog boxes, which you can enable or disable in the Accessibility
Options page. At high text magnification levels, the standard dtSearch dialog boxes can be too large
to work with. By default, dtSearch will use compact resizable dialog boxes in this situation, which
can be resized and are organized to take up less screen space.
- "Make text larger" setting in Windows is now recognized and applied to menus, search results, dialog
boxes, and default document window zoom. The application must be closed and restarted for the change
to take effect. This is an alternative to the existing option to directly control the dialog box
font size in Options > Dialog Box Font Size.
- To provide better keyboard navigation among the dtSearch Search components (the menu, the Search
dialog box, the list of search results, the document and the Document Information sidebar window),
new keyboard shortcuts and menu options have been added to provide direct navigation using the
keyboard. Menu options corresponding to these shortcuts are in View > Go to window. The keyboard
shortcuts are: Ctrl+Alt+M - Main menu, Ctrl+Alt+D - Document, Ctrl+Alt+R - Search results, Ctrl+Y -
Document info, Ctrl+S - Search dialog box.
- Easier tab navigation of the Options > Preferences dialog box
- New minimum system requirements: Windows versions 8 through 11 and Windows Server versions 2012 and later.
dtSearch Engine
- Added arm64 and arm64ec versions of the dtSearch Engine DLL for Windows. Unlike the x64 version, the arm64
and arm64ec versions include the crypto-enabled PDF file parser. The arm64 and arm64ec versions use the same
C++ API as the x64 version.
Fixes/Minor Enhancements
- Added support for GPS metadata (latitude, longitude, altitude) in JPG files
- File parser bug fixes affecting PPTX ("modern comments" support added), PDF
- Other bug fixes
dtSearch 2023.02 (Build 8806) Released October 24, 2023
All Products
- Image/Sound/Video metadata improvements: Added new file format support for OGG (Vorbis metadata), OPUS
(Vorbis metadata), FLAC (Vorbis metadata), WAV (RIFF and XMP metadata), AIFF (RIFF and XMP metadata), WEBP
(RIFF and XMP metadata), AVI (RIFF and XMP metadata), HEIF (EXIF and XMP metadata), APE (APEv2 and XMP
metadata), PNG (XMP, IPTC, and EXIF metadata), GIF (XMP metadata). Added XMP to existing metadata support
for MP4, TIF, MP3, MOV, WMA/WMV. Image files continue to have an "Image Format" field that specifies the
file format and categories of metadata found (EXIF, IPTC, XMP, etc.). Sound and video files have a new
"Media Format" field with this information.
dtSearch Engine
- Added sample code demonstrating use of the dtSearch Engine in an ASP.NET Core application running in a
Docker container under Windows (NanoServer) and Linux. See the [dtsearch]\examples\Docker folder and the
readme-dtSearch-docker.md file in the application folders.
- Added sample code demonstrating how to build NuGet packages to deploy the dtSearch Engine with associated
dependencies including ICU, CMAP files, stemming rules, and the external file parsers. See the
[dtsearch]\examples\NuGet folder.
- Added to C++ API: dtssGetIndexInfo2, which is the same as dtssGetIndexInfo but also provides an error code
and diagnostic message when an index cannot be accessed.
Fixes/Minor Enhancements
- Updated RAR file parser to the current version of the Rarlab source (6.2.10 released August 1, 2023).
dtSearch uses source code from Rarlab to implement content extraction from RAR archives.
Note: Rarlab reports that the updated source fixes two security vulnerabilities. Based on the information
available about these vulnerabilities, we do not believe they affect any dtSearch product.
(1) CVE-2023-40477 (out of bounds write) affects RAR recovery volumes. RAR recovery volumes have a .rev
extension and a different binary header from RAR archives, so dtSearch will not invoke the RAR file processing
code if it encounters a RAR4 recovery volume. Additionally, the code to process recovery volumes is disabled
in the RAR extraction code that dtSearch uses. Rarlab states that unrar.dll is not affected by the
vulnerability, and the unrar.dll source code is what dtSearch uses in its RAR file parser.
(2) CVE-2023-38831 (launching of an incorrect file) is associated with the WinRAR user interface and does
not affect dtSearch.
- dtSearch Desktop Index Manager index properties window displays open error message instead of blank index
properties when an index cannot be opened
- Fixed search performance bug that could cause reduced search speed during phrase searches involving
extremely large documents
- Fixed incorrect handling of the Unicode soft hyphen character (U+00AD), which should be ignored during
searches because it is not indexed.
- Fixed bug that could cause duplicate field name errors when documents include field names with certain CJK
diacritical characters. Verifying an index will indicate if an index is affected. Affected indexes should be
rebuilt.
- Text log files history.ix, dtSearchIndexingHistory.log, and indexlog.dat now format dates YYYY/MM/DD instead
of MM/DD/YYYY
- File parser bug fixes affecting: DOCX, PDF
- In dtSearch Desktop the Options > Create group policy dialog creates an MSI file that places the registry
keys under HKMU instead of HKCU or HKLM, so you can control at installation time whether the installation is
per-user or per-machine. To install per-machine, specify ALLUSERS=2 and run the .msi with administrator
permissions.
- Fixed time zone bug in the Linux indexer causing documents to be reindexed unnecessarily during an
incremental index update.
- Added option in dtSearch Desktop in Options > Preferences > Indexing Resources to "Ask Windows to keep
computer awake during indexing" (to prevent automatic sleep from blocking scheduled updates).
- Other bug fixes
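Scripts that read the text log files mentioned above may encounter both date layouts, depending on which version wrote the entry. The following Python sketch parses a date in either layout. It assumes the date has already been isolated from the log line (the log line format itself is not described here), and the function name is hypothetical.

```python
from datetime import date


def parse_log_date(text):
    """Parse 'YYYY/MM/DD' (newer logs) or 'MM/DD/YYYY' (older logs)."""
    parts = text.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"unrecognized date: {text!r}")
    if len(parts[0]) == 4:
        # Newer layout: the four-digit year comes first.
        year, month, day = (int(part) for part in parts)
    else:
        # Older layout: month first, year last.
        month, day, year = (int(part) for part in parts)
    return date(year, month, day)
```

Both parse_log_date("2023/10/24") and parse_log_date("10/24/2023") return the same date object.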
dtSearch 2023.01 (Build 8790) Released June 5, 2023
dtSearch Desktop/Network
- New search results control. Improvements include:
- Long text can wrap within columns
- "First hits in context" display is faster and more efficient, and now appears in a wrapped column
instead of a separate row.
- Improved sorting performance
- Updated appearance
- Right-click any column header to quickly access search results format functions
- Click a toolbar button or press Ctrl+Num-/ to toggle between "Fit-Content" and "Fit-Window" mode.
- The old control is still available -- check the box to "Use older dtSearch 7.x search results
control" in Options > Preferences > Search Results.
- Added an option in Options > Preferences > Document Display to control whether HTML files are displayed with
a <base> tag referencing the original document location. Enable this option to make relative links to
other files or images stored with the original document work in search results. However, this will also
cause internal bookmark links in the document to also reference the original file rather than the
hit-highlighted version.
dtSearch Engine
- WebDemo sample ASP.NET Core application updated to support configuring multiple search forms using the
appSettings.json file
Fixes/Minor Enhancements
- Fixed bug in beta/preview multithreaded indexer that could cause deleted documents not to be removed from an
index during an update.
- Fixed bug in dtSearch Desktop in the old 7.x search results control causing misalignment of data in search
results columns after repositioning a column.
- Fixed bug causing out-of-memory error verifying an index
- File parser bug fixes affecting: PDF, MSG
- Other bug fixes
dtSearch 2022.02 (Build 8775) Released December 1, 2022
dtSearch Desktop/Network
- Added built-in stemming rules and noise word lists for over 25 European languages.
- Added option in Search dialog box to select language to use for stemming.
- Added option in Options > Preferences > Letters and Words to select language to use for noise words.
- Added option in Options > Preferences > Indexing Resources to control number of threads used for
multi-threaded indexing
dtSearch Engine
- WebDemo sample ASP.NET Core application updated to require .NET Core 6 and current versions of Bootstrap and
JQuery
- Stemming rules and noise word lists for over 25 European languages are included in the dtSearch Data folder.
Fixes/Minor Enhancements
- Updated zlib source code to 1.2.13. Note: Zlib reports that this version fixed zlib CVE-2022-37434 ("zlib
through 1.2.12 has a heap-based buffer over-read or buffer overflow in inflate in inflate.c via a large gzip
header extra field. NOTE: only applications that call inflateGetHeader are affected. Some common
applications bundle the affected zlib source code but may be unable to call inflateGetHeader"). dtSearch
does not use the inflateGetHeader function so based on this information we do not believe this vulnerability
affected any dtSearch products.
- Fixed bug in beta/preview multithreaded indexer that could cause a corrupt index to be generated if the
underscore character is changed to a word break character in the alphabet settings.
- Fixed dtSearch Desktop search dialog box bug causing search dialog box to close when a search finds no
documents.
- Fixed dtSearch Desktop search dialog box bug causing "More search options" dialog box to clear all criteria
from previous search between searches.
- Fixed dtSearch Desktop search results display bug causing modification times to incorrectly reflect daylight
saving time (earlier or later by one hour)
- Improved View > Document Information display, shown as a dockable sidebar instead of a pop-up and with
better formatting of information.
- File parser bug fixes affecting: DOCX, WordStar, PDF
- Exceptions thrown from .NET 4.x DataSource API DocStream objects previously would fail the associated I/O
operation after a retry; now any detected exceptions will immediately fail the document and report it
through IIndexStatusHandler and WasDocError.
- Other bug fixes
dtSearch 2022.01 (Build 8749) Released May 17, 2022
- This version updates the zlib compression/decompression library code in dtSearch to the new zlib version
1.2.12 to remedy a reported memory corruption vulnerability in the zlib library. For more
information see nvd.nist.gov.
- The zlib update affects the Windows, Linux and macOS versions of dtSearch products.
- Zlib is a library of code that is used very widely in software products to implement ZIP-compatible
compression and decompression. For more information about zlib, please see zlib.net.
- dtSearch uses zlib for decompression of zip-compressed data in ZIP files and other formats such as PDF and
Office that use zlib-compatible compression. dtSearch also uses zlib to compress documents and
internally-generated data when caching text and documents in an index and when indexing PDF files.
- Because the zlib vulnerability could affect dtSearch products if used to index maliciously-created content,
this update is recommended for all users.
dtSearch 2022.01 (Build 8748) Released May 2, 2022
Fixes and minor enhancements
- Added new command-line options for dtSearch Desktop to simplify network deployment and suppress startup
prompts: /agreetolicense (agree to license agreement), /dirstd (use standard data folder), /sn (specify
serial number), and /noupd (suppress update checking).
- Fixed index creation failure when targeting a Samba network share due to delayed lock release.
- Reduced memory use when processing very large Excel pivot tables.
- File parser bug fixes affecting: PST, MSG, PDF, MP4
- Other bug fixes
dtSearch 2021.02 (Build 8733) Released December 31, 2021
- Fixed: Clicking "Add Folder" in the 64-bit dtSearch Indexer closed the indexer on some machines. Based on
analysis in the linked articles below, this appears to have been due to an interaction between a Windows 11
bug in shell32.dll and the Intel TBB memory manager, which the 64-bit dtSearch Indexer used. The error
occurred consistently on Windows 11 machines with OneDrive configured to back up the Desktop folder,
although other cases may exist as well. This issue may affect any Windows application that uses the Intel
TBB memory manager (tbbmalloc.dll) or similar replacement allocator such as Microsoft's mimalloc.dll.
Technical details on the issue are available on github.com in the Intel TBB project and in the Microsoft
mimalloc project.
- Fixed: Indexing error affecting the multithreaded indexing preview feature causing index updates to fail
with a "203" error code.
dtSearch 2021.02 (Build 8730) Released December 6, 2021
All Products
- Multithreaded Indexing Preview
Added support for building indexes using multiple threads, greatly improving indexing speed on 64-bit
Windows systems with multiple cores. At least 16 GB of memory is required.
In dtSearch Desktop/Network, click Options > Preferences > Indexing Resources to enable this option.
For information on using multithreaded indexing in the API, please see Multithreaded Operations.
This is a preview feature and should not be used for production work until released.
- Added support for the Hancom Office HWPX file format
- Windows 11, Windows Server 2022, and .NET 6 added as supported platform/environments.
dtSearch Desktop/Network
- Index Groups
Index groups provide a way to organize your indexes in the Search dialog box to make
very large numbers of indexes easier to manage. To enable this option, click Options > Preferences >
Search Options, and check the box to "Show indexes by group". When groups are enabled, if an index name
contains a colon, then the part before the colon is considered to be the group. For example, if an index is
named "Business: Records", then the group is "Business". In the Search dialog box, "Records" would appear
under a collapsible "Business" group heading.
- "Find indexes"
The new "find indexes" control in the Search dialog box lets you filter the list of indexes by name and
quickly select indexes to search using the keyboard.
To enable it, click Options > Preferences > Search Options, and check the box to "Show index finder".
To use "find indexes", in the Search dialog box press Ctrl+F and type any part of an index name. As you type,
the index list will update to show only matching index names. You can also use the * and ? wildcard
characters to find indexes using wildcard matches.
Press Ctrl+Q to quickly check only the first listed index and return to the search request box, or press
ENTER to just return to the search request box.
- Better handling of font scaling on systems with multiple monitors.
- Added sort direction arrows to the search results column headers.
- Added Ctrl+Q hotkey for checking/unchecking checkboxes to select items in search results.
- Added support for using 64-bit Adobe Acrobat/Adobe Reader to display PDF files. The 64-bit dtSearch PDF Search Highlighter plug-in is also needed
for hit-highlighting to work.
- The dtSearch Desktop/Network search shortcut now launches the dtSearch Desktop/Network version that
corresponds to the Adobe Reader/Acrobat version installed, so if you have the 64-bit version of Adobe
Acrobat or Adobe Reader, the shortcut will automatically launch the 64-bit version of dtSearch
Desktop/Network.
- Added option in Options > Preferences > Indexing resources to "Store a copy of indexed Outlook items
in the index". Checking this option eliminates the need to run the 32-bit version of dtSearch
Desktop/Network if you are using the 32-bit version of Outlook and the 64-bit version of dtSearch
Desktop/Network if you are using the 64-bit version of Outlook, because dtSearch can display retrieved
Outlook items by extracting them from the index.
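The index group naming rule and the "find indexes" filter described above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not dtSearch code. The function names are hypothetical, and splitting at the first colon, trimming spaces, and ignoring case are assumptions.

```python
import re


def split_group(index_name):
    """Return (group, display_name); group is None when there is no colon."""
    group, sep, rest = index_name.partition(":")
    if not sep:
        return None, index_name
    return group.strip(), rest.strip()


def find_indexes(names, typed):
    """Keep names containing the typed text; * and ? act as wildcards."""
    # Escape everything, then turn the escaped wildcards back into regex forms.
    regex = re.escape(typed).replace(r"\*", ".*").replace(r"\?", ".")
    pattern = re.compile(regex, re.IGNORECASE)
    return [name for name in names if pattern.search(name)]


names = ["Business: Records", "Legal: Contracts", "Archive"]
print(split_group(names[0]))       # prints: ('Business', 'Records')
print(find_indexes(names, "rec"))  # prints: ['Business: Records']
```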
dtSearch Engine
- Added M1 support to the macOS version of the dtSearch Engine.
- Added streaming interface to Java DataSource API to allow data source documents to be returned as objects.
Fixes and minor enhancements
- Fixed bug causing "Search within these results" in dtSearch Desktop/Network to return an incorrect error
message when searching just-created indexes.
- Fixed bug causing indexing crash due to low-level I/O error.
- File parser bug fixes affecting: XLSX, DOCX, PDF
- Other bug fixes
dtSearch 2021.01 (Build 8712) Released June 23, 2021
All Products
- Beginning with this version, the minimum supported Windows version for dtSearch Desktop is Windows 7, and
the minimum supported Windows version for the dtSearch Engine is Windows Vista.
- dtSearch version numbers will now be based on the year released and an ordinal, so this version is 2021.01.
This numbering change does not reflect any change in compatibility or index format.
For developers using the API, the internal major version number returned by the API will remain "7". The
minor version number will use two digits for the year and two digits for the ordinal, so the third version
released in 2021 would have minor version number 2103.
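The mapping from a release name to the API minor version number can be expressed as a small calculation. The Python function below is a hypothetical helper for illustration, not part of the dtSearch API.

```python
def api_minor_version(release):
    """Map a release name such as '2021.03' to the minor version 2103."""
    year, ordinal = release.split(".")
    # Two digits for the year followed by two digits for the ordinal.
    return (int(year) % 100) * 100 + int(ordinal)


print(api_minor_version("2021.03"))  # prints: 2103
```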
Fixes and minor enhancements
- Added support for extracting macro text from vbaProject.bin files
- File parser bug fixes affecting: PDF, DOCX
- Updated WebDemo sample .NET Core application for .NET Core 3.1/5.0
- Fixed bug affecting mailto: links extracted using API with the dtsoFfHtmlShowLinks flag
- Other bug fixes
dtSearch 7.97 (Build 8684) October 14, 2020
dtSearch Desktop
- Added option in dtSearch Desktop to automatically use the 64-bit indexer if possible for indexing. This
option is in Options > Preferences > Indexing Resources and is enabled by default.
When this box is checked, the 64-bit indexer will be used for any eligible index update, regardless of how it
is started
(scheduled, invoked using "Update Index", or invoked using "Update Multiple Indexes"). Under 64-bit versions
of Windows, all indexes can be built with the 64-bit indexer except indexes of Outlook messages when the
Microsoft Office version installed is 32-bit. When multiple indexes are being updated in a scheduled task
that updates several indexes, the 64-bit indexer will be used for all indexes that are 64-bit eligible, and
the 32-bit indexer will be launched separately to update any indexes that must be run in 32-bit mode.
- Added an option in dtSearch Desktop to control whether dtSearch should check for sufficient disk space
before starting an index update. If this box is checked, dtSearch will estimate the amount of disk space
needed to build an index and will halt the update with an error message if insufficient space is
available.
This option is in Options > Preferences > Indexing Resources.
dtSearch Engine
- The 64-bit Windows version of the engine supports options for using a replacement memory manager such as
Intel's TBB library, which can improve performance in heavily-multithreaded applications. For more
information, please see the article "External allocator integration".
Fixes and minor enhancements
- Fixed bug affecting term weighting applied by field name
- File parser bug fixes affecting: XLS, XLSX, ZIP (malicious compressed content detection improved), DOCX
- Fixed date parsing bug affecting search requests containing two or more date ranges spanning different years
expressed as text
- Fixed HTML hit highlighting error affecting unrecognized tag with no spaces around it
- Fixed memory leak in .NET Core DataSource indexing API
- Other bug fixes
dtSearch 7.96 (Build 8668) March 30, 2020
- The 64-bit version of the dtSearch Engine now uses the Microsoft Universal CRT. The CRT is included with
Windows 10 and is automatically installed by Windows Update on most other versions of Windows. It can be
installed as described in this Microsoft article: The latest supported Visual C++ downloads.
- The 64-bit version of the dtSearch Engine supports Windows version Vista and later, so Windows XP is no
longer supported.
- The dtSearch PDF Search Highlighter plug-in has been updated for compatibility with the 2020 versions of
Adobe Reader and Adobe Acrobat. The new highlighter is available for download.
- Fixed bug in .NET Standard DataSource API affecting modification and creation dates of documents and the
DocDisplayName property.
- Fixed hit highlighting error affecting EML with embedded IMG tag.
- Fixed bug affecting the "not w/n" operator, causing it to retrieve a document that should not have been
retrieved.
- File parser bug fixes affecting: XLS, DOC, DOCX, PDF
- Other bug fixes
dtSearch 7.95 (Build 8633) October 24, 2019
- Fixed: in the .NET and .NET Core API, IndexInfo.UserFields did not delimit fields and values clearly. A
quote-delimited string is used now, with double quote characters in field values converted to single quote
characters to prevent ambiguity.
- Fixed: Crash when the dtsConvertAutoUpdateSearch flag was used with dtsSearchWantHitsByWord in a field
search.
- Fixed: bug affecting fuzzy search in combination with wildcard near front of word.
- Fixed: multicolor hit highlighting did not work with a search request that included a regular expression
with the ( character as part of the regular expression pattern.
- Fixed: Encoding error processing long UTF-8 text file (in version 7.94 only)
- File parser bug fixes affecting: PDF, XLS, Unicode Filtering, MSG, CSV
- Other bug fixes
dtSearch 7.94 (Build 8620) July 19, 2019
- Developer note: dtv_pdfcrypto.dll (Windows) and dtv_pdfcrypto.so (Linux) are now separate components again,
as they were in versions 7.92 and earlier. (In version 7.93, the cryptographic capabilities of
dtv_pdfcrypto.dll and dtv_pdfcrypto.so were merged into the dtSearch Engine, to provide a simplified, flat
deployment option for developers. We are reversing this decision in version 7.94 because the new
ViewerLibraries element in the homedir.xml file provides a more flexible way to get the same result while
covering other external file parsers and also preserving a separate engine version limited to 40-bit RC4
encryption for developers.)
- Fixed in 7.94.8620: A bug in versions 7.93 and 7.94.8619 caused errors in the target index for the merge
when a newly-created index was merged with an index created by an earlier version. To determine if an index
is affected by the bug, run Verify Index using version 7.94. Affected indexes will be reported as having a
merge consistency error. Compressing the index with version 7.94 will repair the errors, and version 7.94
will also correct for the index errors when searching the index.
- In indexes created with the "Hyphens all three" option (dtsoHyphenAll), hyphens in search requests are now
treated as spaces instead of significant by default, so a search for "one-two" will find "one two". To
restore the previous treatment, there is a new option in dtSearch Desktop in Options > Preferences >
Letters and Words. In the API, use the flag dtsSearchHyphenSignificant in SearchJob. This option does not
affect indexing.
- Added new Options.CheckConfiguration API to validate that a dtSearch Engine application has been correctly
deployed. CheckConfiguration checks for missing components such as the ICU library, CMAP files, alphabet
file, stemming rules, etc. Use CheckConfigurationFlags to specify the items to check. In the C++ API, use
dtssCheckConfiguration.
- Added an option to skip symbolic links during indexing. In dtSearch Desktop, this is in Options >
Preferences > Filtering Options, and in the API use the flag dtsIndexSkipSymbolicLinks.
- Added flag dtsoTfMimeFollowContentDispositionForText in Options.TextFlags to tell dtSearch to treat plain
text or HTML in MIME messages as attachments rather than part of the message body if the message part has a
Content-Disposition header specifying that the part is an attachment.
- Fixed: dtSearch Desktop bug causing the list of indexes to be empty after an out-of-memory error
- Fixed: SearchResults.SerializeAsXml did not preserve 32-bit Unicode characters.
- Fixed: Hit highlighting error that caused an extra </div> tag to be generated in conversion
output from some Office documents, resulting in formatting errors.
- Fixed: Hit highlighting bugs affecting PDF, conversion to HTML, and SearchReportJob when using cached text
- Fixed: Bug in encoding detection could cause a crash indexing or converting files with unknown encoding
(plain text or HTML)
- File parser bug fixes affecting .html, WMF and EMF images, Excel 2003 XML, XBase, Excel 95, PPT.
- Other bug fixes
dtSearch 7.93 (Build 8596) March 19, 2019
- Fixed: Intermittent crash when generating search reports (build 8596)
- Fixed bugs in the new ICU-based Unicode processing including word breaking error affecting Katakana
Half-width characters U+FF60-FF9F, U+2103 degrees Celsius and U+2109 degrees Fahrenheit.
- Fixed: Crash indexing corrupt PowerPoint file due to read past end of buffer
- Fixed: Indexing bug affecting the new accent-optional index type with the "hyphen all three" option set.
- Fixed: 16-byte memory leak each time ICU-based automatic encoding detection is done for a file with
ambiguous encoding information during indexing or file conversion, potentially causing out-of-memory errors
indexing very large document collections with the 32-bit indexer
- Fixed: Error compiling dtSearchNetApi4 sample code
- Other bug fixes
dtSearch 7.93 (Build 8592) February 25, 2019
All Products
- This version adds preliminary support for the new PDF 2.0 file format. PDF 2.0 is the first major
change in the PDF file format since PDF 1.0 in 1993. Support is "preliminary" because while there are
some tools that can open PDF 2.0 files now, end-user commercial software products have not yet started
to support generation of new PDF 2.0 output so there is almost no data available for testing.
Because this new PDF version changes the header information, dtSearch versions before 7.93 will not
recognize the PDF 2.0 file format and will miss all content in these files. Therefore, it is
essential to use dtSearch 7.93 or later before attempting to index and search PDF 2.0 files.
- Added new "Accent-optional" index type.
To create an accent-optional index in dtSearch Desktop/Network, click Index > Create Index (Advanced)
and check the box to "Support optional accent sensitivity".
When you select an accent-optional index for searching, the Search dialog box will show a new option,
"Accents in search terms are significant".
In an accent-optional index, accented letters can be made significant for matching purposes, but
unaccented letters will still always match both accented and unaccented forms.
For example, a search for "abc" in an accent-optional index will find both "abc" and "äbc". The results will
be the same whether or not you check the "Accents in search terms are significant" box.
A search for "äbc" in an accent-optional index will find different results depending on whether the
"Accents in search terms are significant" box is checked. If the box is checked, then "äbc" will match
"äbc" and not "abc". If the box is not checked, then "äbc" will match both "äbc" and "abc".
- Added support for searching emoji characters in indexes created with version 7.93 and later.
- Microsoft has deprecated support for Windows XP compatibility in Visual Studio. Because the Windows version
of dtSearch is built with Visual Studio, this means that dtSearch support for Windows XP is now deprecated
as well. (dtSearch is currently built using Visual Studio 2017 which still supports Windows XP.)
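The accent-optional matching rules described above for "abc" and "äbc" can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the dtSearch implementation. The function names are hypothetical, and the sketch ignores case sensitivity, wildcards, and stemming.

```python
import unicodedata

def strip_accents(text):
    """Return text with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def term_matches(search_term, indexed_word, accents_significant):
    """Apply the accent-optional rules to one search term and one indexed word."""
    query = unicodedata.normalize("NFC", search_term)
    word = unicodedata.normalize("NFC", indexed_word)
    if len(query) != len(word):
        return False
    for q, w in zip(query, word):
        if accents_significant and strip_accents(q) != q:
            # Accented letter in the search term, box checked: require an exact match.
            if q != w:
                return False
        elif strip_accents(q) != strip_accents(w):
            # Unaccented letter (or box unchecked): compare with accents removed.
            return False
    return True
```

For example, term_matches("abc", "äbc", True) is True because unaccented search letters match both forms, while term_matches("äbc", "abc", True) is False.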
Index Compatibility
- To enable new features in dtSearch 7.93, this version makes some changes in the index format. Older dtSearch
versions will be able to search indexes created by dtSearch 7.93, but this is not recommended due to lack of
support in older dtSearch versions for the new features added in version 7.93 such as Accent-optional
indexes.
- If you use dtSearch to merge new indexes into older existing indexes: indexes created with dtSearch 7.93
will not be compatible with older indexes for purposes of merging indexes, unless you specify when you
create the newer index that it must be compatible with the older index that will be included in or the
target of the merge. To create an index that is compatible with an existing older index, use Create
Index (Advanced), check the box to make the index compatible with an existing index, and select the
index to use for compatibility.
Using the API, set IndexJob.CreateCompatibleIndexPath to specify the index to use as the source for
compatibility information. To preserve the compatibility of an existing index that is being updated with
ActionCreate=true, set the flag dtsIndexCreatePreserveExistingSettings in IndexJob. (This is equivalent
to setting IndexJob.CreateCompatibleIndexPath to IndexJob.IndexPath.)
dtSearch Engine
- Added integration with the International Components for Unicode (ICU) providing improvements in text
processing, such as the new accent-optional index type and support for emoji searching, for indexes
created with version 7.93. Please see "ICU Integration" in the dtSearch Engine API Reference for more
information. ICU integration is optional for applications using the developer API, and enabled by
default in other products.
Optional integration with ICU can be enabled using a new icuconfig.xml configuration file.
- To enable the new accent-optional index type (see above) set the flag dtsIndexCreateOptionalAccentSensitive
in IndexingFlags. To require accents to match at search time, set the flag dtsSearchRequireAccents in
SearchFlags. Note: ICU integration must be enabled to use the new accent-optional index type.
- A new IndexJob.CreateCompatibleIndexPath option lets you specify an existing index to use as a template when
creating an index. The noise word list, alphabet, ICU character tables and other text processing rules such
as dtsoTfAutoBreakCJK will be copied from the old index into the new index.
- The cryptographic capabilities of dtv_pdfcrypto.dll (Windows) and dtv_pdfcrypto.so (Linux) are now built
into the dtSearch Engine itself. Previously the dtSearch Engine .dlls and .so files for Windows and Linux
only supported decryption up to 40-bit RC4, and the dtv_pdfcrypto.* files were needed to enable decryption
up to 256-bit AES. Please see "Export Notice" section of the "Installing
the dtSearch Engine" article in the dtSearch Engine API Reference for information on exporting the
crypto-enabled dtSearch components.
- dtSearch Engine supports a new homedir.xml file that can be used to specify the location of data files
and the viewers folder and to consolidate data files separately from executables. Previously the CMAP
files were installed under the bin and bin64 folder and the WordNet files were installed in a top-level
folder under the dtSearch Developer folder. Now these files can be installed along with other data files
(alphabet, noise word lists, stemming rules) in a "data" folder parallel to the bin and bin64 folder.
Use of homedir.xml is optional. Developers can still use the old deployment locations for data files.
Fixes and other enhancements
- Fixed: dtSearch Desktop/Network, when saving search results as CSV, incorrectly quoted user-defined (stored
field) field values containing quotation marks.
- Fixed: Extra whitespace appears in front of tables when viewing some retrieved HTML files in Internet
Explorer and dtSearch Desktop.
- Fixed: In the .NET Core API for Linux, the array SearchResultsItem.Hits reported incorrect word offsets of
hits. This did not affect hit highlighting if SearchResults.SetInputItem() was used because the correct hit
offsets stored internally in SearchResults were used.
- Fixed: Intermittent crash indexing PDF file due to read past end of buffer
- Fixed: Incorrect indexing of hyphens in indexes created with dtsoHyphenAllThree ("all three ways") option
and then updated with a different hyphenation option.
- Fixed: Sorting search results by title resulted in incorrect ordering for PDF files.
- Fixed: Indexing error affecting Outlook and Spider indexes in dtSearch Desktop (7.93 Build 8591 only)
- File parser bug fixes affecting *.rtf, *.msg, *.pdf
- Other bug fixes
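For reference on the CSV quoting fix listed above: the common CSV convention (RFC 4180) is to enclose a field containing quotation marks in double quotes and to double each embedded quote. These notes do not state the exact output dtSearch Desktop now produces, so the following Python sketch only shows the convention, using the standard csv module. The function name csv_line and the sample values are hypothetical.

```python
import csv
import io

def csv_line(values):
    """Format one row using standard CSV quoting and return it without the line ending."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(values)
    return buffer.getvalue().rstrip("\r\n")

# A stored field value that contains quotation marks.
row = ["report.pdf", 'He said "yes"']
print(csv_line(row))
```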
dtSearch 7.92 (Build 8572) November 15, 2018
All Products
- Added support for Hancom *.hwp files. Support covers both the current *.hwp format and the Hanword 97
format.
Fixes and other enhancements
- Updated RAR file parser to the current version of the Rarlab source. Rarlab reports that the updated source fixes bugs that "may be
associated with security vulnerabilities."
- The dtSearch PDF Search Highlighter plug-in has been updated for compatibility with the 2019 versions of
Adobe Reader and Adobe Acrobat. The new highlighter can be
downloaded here.
- dtSearch Desktop/Network updated for compatibility with highlighting hits using the new version of the
PDF highlighter (see above).
- Fixed: In dtSearch Desktop, search for date after a specified date in More Search Options results in
"Invalid date M99/D99/Y9999" error.
- Fixed: Malformed HTML generated when highlighting hits in HTML files with dtsoFfHtmlShowLinks flag set.
- Fixed: Build configuration error that could cause a crash on startup in dtv_pdfcrypto.dll under Windows XP
when calling application directly loaded dtv_pdfcrypto.dll with LoadLibrary.
- Fixed: File parser bug in Linux version 7.92 build 8571 affecting Outlook MSG files.
- File parser bug fixes affecting: *.html, *.pptx (hit highlighting error).
- Other bug fixes.
dtSearch 7.91 (Build 8553) June 17, 2018
dtSearch Engine
- Added faceted search and multicolor hit highlighting to the ASP.NET Core sample application.
- Added flag dtsoTfUseEmailDateAsFileDate in Options.TextFlags to use the internal message date (sent/received
date) as the file date for standalone .msg and .eml files, for purposes of file date filtering and the date
that appears in search results.
- .NET Standard API: fields in Options and SearchResultsItem changed to properties; additional optional
parameters in SearchResults.UrlEncodeItemWithIndexId (applications will need to be recompiled but no source
code changes needed).
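As background for dtsoTfUseEmailDateAsFileDate (described above), the following Python sketch shows how an internal message date can be read from a standalone .eml file using only the standard library. It does not use the dtSearch Engine. The sample message and the function name message_date are assumptions for illustration, and the sketch reads only the Date header.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message text standing in for the contents of an .eml file.
SAMPLE_EML = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly report
Date: Tue, 12 Jun 2018 09:30:00 +0000

See attached.
"""

def message_date(raw_message):
    """Return the Date header as a datetime, or None if the header is missing."""
    header = message_from_string(raw_message)["Date"]
    return parsedate_to_datetime(header) if header else None
```

A date obtained this way reflects when the message was sent, which can differ from the modification date of the file on disk.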
Fixes and other enhancements
- This version fixes a denial of service vulnerability affecting searches in dtSearch Web, and potentially
other web-facing applications using the dtSearch Engine that accept very long search requests.
- Fixed: In dtSearch Desktop, Outlook error 8004010f = MAPI_E_NOT_FOUND ("The information store could not be
opened.") when trying to view Outlook messages after a search.
- File parser bug fixes affecting: .html
- Other bug fixes.
dtSearch 7.90 (Build 8538) April 12, 2018
dtSearch Engine
- The dtSearch Engine has a new .NET
Standard API similar to the current .NET API in dtSearchNetApi4.dll, but compatible with .NET
Standard 2.0, .NET Core, and Xamarin. Supported platforms: Windows, UWP, Linux, macOS, Android (beta), and
iOS (beta).
As with the current .NET API, the .NET Standard API relies on a dtSearch Engine native
library (dtSearchEngine.dll under Windows, libdtSearchEngine.so under Linux, etc.). Therefore, applications
will need both the .NET Standard API wrapper, dtSearchNetStdApi.dll, and the native library for each
platform.
For more information on this new API, please see: .NET Standard API Information and the API documentation.
The current
dtSearch Engine for Windows version includes the Windows and UWP versions of the .NET Standard API. The
current dtSearch Engine for Linux version includes the Linux version of the .NET Standard API. For other
platforms, please contact dtSearch.
- Added new ASP.NET Core web search sample application, installed in the examples\NetStd\WebDemo folder.
- Added new flag in Options.FieldFlags, dtsoFfNormalizeEmailAddresses, to remove extra space and quotation
marks from email address fields and move embedded comments to the end of the name or email address.
- Added new flag in Options.TextFlags, dtsoTfHtmlSkipNav, to skip indexing content inside
<nav>...</nav> tags.
- Added new flag in Options.TextFlags, dtsoTfIgnoreNoIndex, to ignore <!--BeginNoIndex--> and
<!--EndNoIndex--> tags in HTML.
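The effect of dtsoTfHtmlSkipNav (described above) can be approximated outside dtSearch with a short Python sketch based on the standard html.parser module. The class and function names are hypothetical, and the sketch only collects text outside <nav>...</nav> elements. It does not reproduce how the dtSearch HTML parser works.

```python
from html.parser import HTMLParser

class NavSkippingExtractor(HTMLParser):
    """Collect text content, leaving out anything inside <nav> elements."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0   # greater than 0 while inside one or more <nav> elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1

    def handle_data(self, data):
        if self.nav_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def indexable_text(html):
    """Return the text that remains after skipping navigation content."""
    parser = NavSkippingExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```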
dtSearch Desktop
- To make it easier to review the results of large numbers of scheduled index updates, scheduled tasks now
generate a summary report listing all of the indexes updated and the results for each. After a scheduled
update, click the "Last Update" button in the "Schedule Updates" dialog box to see the summary report. The
report will also contain links to the detailed index update reports for each index.
These reports will
be stored in the LOGS subfolder of your dtSearch data folder so you can also access them directly in this
folder.
Fixes and other enhancements
- Fixed: dtSearch Desktop Indexer reports index is in use when started from dtSearch Desktop Search
- Fixed: C++ API crash in executing dtsSearchReportJob if pReportCallBack is zero
- File parser bug fixes affecting: .pdf, .wpd
- Fixed dialog box error in "Schedule Updates" in dtSearch Desktop indexer (build 8538)
- Other bug fixes.
dtSearch 7.89 (Build 8517) November 20, 2017
dtSearch Engine
- New flag dtsIndexTokenizeEnumerableFields in IndexJob.IndexingFlags to automatically tokenize multi-value
enumerable fields using Options.StoredFieldDelimiterChar. For example, if a field value is
"First!Second!Third" and StoredFieldDelimiterChar is !, then instead of one enumerable field containing
"First!Second!Third", the document will be indexed with three separate enumerable field values, "First",
"Second", and "Third".
Fixes and other enhancements
- Fixed: dtSearch Desktop opens very slowly after the Windows 10 "Fall Creators Update" is installed,
particularly after Windows has been running for a while without rebooting. The source of the problem appears
to be how the Windows 10 "Fall Creators Update" implements the Win32 GetPixel API. The new version bypasses
GetPixel to the extent possible, relying on other methods of obtaining pixel data.
- Fixed XML syntax errors converting database files (*.dbf, *.csv) to it_ContentAsXml
- Fixed: Intermittent crash when closing dtSearch after doing a search that retrieves Outlook messages.
- File parser bug fixes affecting: .pdf, .msg
- Other bug fixes.
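The "First!Second!Third" example given for dtsIndexTokenizeEnumerableFields under dtSearch Engine above amounts to splitting a stored field value on the delimiter character. A minimal Python sketch follows. The function name is hypothetical, and dropping empty values is an assumption rather than documented dtSearch behavior.

```python
def tokenize_enumerable_field(value, delimiter):
    """Split one stored field value into separate values on the delimiter."""
    # Assumption: empty pieces (for example from "A!!B") are dropped.
    return [part for part in value.split(delimiter) if part]

print(tokenize_enumerable_field("First!Second!Third", "!"))
```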
dtSearch 7.88 (Build 8499) September 25, 2017
dtSearch Engine
- New flag dtsoTfHideRevisions in Options.TextFlags to remove strikeout text and redlining from Microsoft Word
documents edited using "Track Changes".
- it_ContentAsXml output format now marks headers and footers in Office documents
Fixes and other enhancements
- .NET API: Fixed bug in Server.SetEnginePath that could cause an ApiInitializer exception if the path passed
to SetEnginePath contained Unicode characters.
- The rarely-used HTML header fields (HtmlTitle, HtmlH1, etc.) are now disabled by default. Set the flag
dtsoFfHtmlIndexHeadersAsFields in Options.FieldFlags to enable them.
- File parser bug fixes affecting: .xlsb, .pdf
- Fixed spider bug due to invalid temp filename generated from page URL
- Other bug fixes.
dtSearch 7.87 (Build 8481) July 1, 2017
dtSearch Desktop/Network
- New indexing option: "Generate and index SHA-256 hashes for documents" in Options > Preferences >
Indexing Options. Hashes are unique numerical codes that are sometimes used in forensics to identify files.
dtSearch now has an option to generate an SHA-256 hash for each document as it is indexed and to append the
hash to the indexed document text as a field named "Sha256Hash". The original document is not changed in any
way; this just affects the index and provides another way to search. Generating hashes will make indexing
slower.
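To check a "Sha256Hash" value independently of dtSearch, a SHA-256 hash can be computed with Python's standard hashlib module, as in the sketch below. It is assumed here, but not stated in these notes, that the indexed value is the hash of the complete file contents written as hexadecimal digits. The function name is hypothetical.

```python
import hashlib
import io

def sha256_of_stream(stream, chunk_size=1 << 16):
    """Return the SHA-256 hash of a binary stream as a hexadecimal string."""
    digest = hashlib.sha256()
    # Read in chunks so that large documents do not have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Usage with in-memory data; for a file, pass open(path, "rb") instead.
print(sha256_of_stream(io.BytesIO(b"abc")))
```

Reading every byte of every document is the kind of extra work that makes indexing slower when hash generation is enabled.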
dtSearch Engine
- dtSearch Engine for macOS released, with APIs for C++ and Java. The API and indexes are compatible with the
Windows, UWP, and Linux versions of the dtSearch Engine.
- New search option to use File Conditions to search by detected type id. Example: xfilter(type "excel") will
limit a search to files created by Excel, regardless of filename or extension.
Currently, developers
have the option to add the optional File Type and File Type Id fields to documents during indexing by
setting the dtsoFfIncludeFileTypeField and dtsoFfIncludeFileTypeIdField flags in Options.FieldFlags. When
documents are indexed with these flags, field searches can be done on the file type like this: "File Type
contains Excel", and after a search retrieved documents will appear with a "File Type" field at the end.
The new "type" filter search works with any index created by dtSearch 7.87 or later and does not depend on
any option settings. It does not add any visible text to documents after a search, and can be efficiently
combined with other file selection criteria (size, date, name, extension) using the File Conditions syntax.
- New flag in Options.FieldFlags: dtsoFfGenerateSha256Hash (see explanation under dtSearch Desktop above).
- Added C# sample code demonstrating how to index SharePoint data using the DataSource API.
Fixes and other enhancements
- dtSearch Desktop: the option to display long files in report view will not apply to PDF files.
- dtSearch Desktop: after an index update, in addition to the HTML file listing documents that could not be
indexed (Index_LastUpdateErrors.htm), dtSearch Desktop will now also create plain-text files with simple
lists of filenames to facilitate automated processing. The filename lists will be: IndexLog_ImageOnlyPdf.txt,
IndexLog_Encrypted.txt, and IndexLog_OtherErrors.txt.
- Added button in dtSearch Diagnostic Tools to delete all dtSearch-related temporary files.
- Mapitool name templates can use # instead of % to identify symbols (example: "##Subject## [##Ordinal##]")
for easier use in DOS batch scripts. The old % syntax still works.
- Mapitool defaults to using a simple message ordinal (1, 2, 3, ...) instead of the message record key to
generate a unique filename for each message, because record keys can be very long in newer versions of
Outlook. The -t command-line switch can be used to override the default.
- File parser bug fixes affecting: .xls, .pdf, .one, .mp4, .pptx, .mdb
- Other bug fixes.
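The Mapitool name template example above ("##Subject## [##Ordinal##]") can be illustrated with a small Python sketch of symbol substitution. This is not Mapitool's implementation. The function name is hypothetical, and it is assumed (by analogy with the # example) that the older syntax doubles the % character in the same way.

```python
import re

# A symbol is a name wrapped in "##" or, by assumption, in "%%".
_SYMBOL = re.compile(r"(##|%%)(\w+)\1")

def expand_template(template, values):
    """Replace each symbol with its value; unknown symbols are left unchanged."""
    return _SYMBOL.sub(lambda m: str(values.get(m.group(2), m.group(0))), template)

print(expand_template("##Subject## [##Ordinal##]", {"Subject": "Status", "Ordinal": 7}))
```

The # form is convenient in batch scripts because the command interpreter treats % as the marker for its own variables.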
dtSearch 7.86 (Build 8458) February 28, 2017
dtSearch Desktop/Network
- New indexing option: "Index links in PDF files" in Options > Preferences > Indexing Options. If
checked, dtSearch will index any links embedded in PDF files.
- New indexing option: "Generate and index MD5 hashes for documents" in Options > Preferences > Indexing
Options. MD5 hashes are unique numerical codes that are sometimes used in forensics to identify files.
dtSearch now has an option to generate an MD5 hash for each document as it is indexed and to append the hash
to the indexed document text as a field named "MD5Hash". The original document is not changed in any way;
this just affects the index and provides another way to search. Generating MD5 hashes will make indexing
slower.
- Added option in Options > Preferences > Fonts and Colors to suppress use of color when highlighting
hits.
dtSearch Engine
- New flags in Options.FieldFlags: dtsoFfPdfShowLinks and dtsoFfGenerateMd5Hash (see explanation under
dtSearch Desktop above).
Fixes and other enhancements
- Fixed incorrect detection of associated PDF application in Windows running inside Parallels Desktop for Mac.
- File parser bug fixes affecting: .xlsx, .pdf, *.wpd, *.one, *.pptx, *.xlsx
- Other bug fixes.
dtSearch 7.85 (Build 8438) December 7, 2016
dtSearch Desktop/Network
- dtSearch Desktop/Network now offers the option to highlight each search term or phrase in a search request
in a different color. To enable this option, and to change the colors used for highlighting, click Options
> Preferences > Fonts and Colors.
- Added option in Options > Preferences > Indexing Options to index databases as text. With this option
selected, database files such as Microsoft Access (*.mdb, *.accdb) and CSV files are indexed without
treating each row as a separate document, and without including field attributes. All of the text, including
field names, remains searchable, but database content is combined into a single plain text document, which
makes indexing and searching faster.
- Added option in Options > Preferences > Indexing Options to index XML files as text. With this option
selected, XML files are indexed without including field attributes. All of the text, including field names,
remains searchable, but XML content is treated as plain text, which makes indexing and searching faster.
- Added checkbox in dtSearch Desktop to enable compatibility with Parallels Desktop for Mac. Parallels Desktop
is a program that enables Windows to run on Mac computers.
The option enables dtSearch to work around
an issue in Parallels Desktop that causes Parallels to report the modification date of files located in Mac
folders as January 1, 1601. When this box is checked, dtSearch alters the indexing process to ensure that
the correct dates are detected.
The option is in Options > Preferences > Indexing Options.
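A note on the January 1, 1601 date mentioned above: that date is the zero point of the Windows FILETIME format, which counts 100-nanosecond intervals from that moment. These notes do not say why Parallels Desktop reports it, so reading it as a zero or missing timestamp is an assumption. The Python sketch below shows the conversion. The function names are hypothetical, and this is not how dtSearch detects the correct dates.

```python
from datetime import datetime, timedelta, timezone

# FILETIME values count 100-nanosecond ticks from this moment (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime):
    """Convert a FILETIME tick count to a UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

def looks_unset(filetime):
    """Return True if the value falls on January 1, 1601, the date described above."""
    return filetime_to_datetime(filetime).date() == FILETIME_EPOCH.date()

print(filetime_to_datetime(0))                    # the zero point of FILETIME
print(filetime_to_datetime(116444736000000000))   # the Unix epoch expressed as a FILETIME
```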
dtSearch Web/Publish
- dtSearch Web and dtSearch Publish now offer the option to highlight each search term or phrase in a search
request in a different color. The colors used for highlighting are stored in the dtsearch_options.html file
in the HighlightColors section. To enable this option when generating a search form, click the Document
Display Options tab in Form Builder and check the box to "Highlight hits using multiple colors."
To
modify an existing search form to use multiple colors, add these lines to the end of your
dtsearch_options.html file:
<BR><HR><I>HighlightColors: </I>
<!-- $Begin HighlightColors -->
ffff00,a6f500,00ffe3,cbe3ff,c5c3fa,ffbcff,ff9999,fcbf29,d7d7d7,eaddc2
<!-- $End -->
<BR><HR><I>Highlight hits using multiple colors: </I>
<!-- $Begin MultiColorHighlighting -->
1
<!-- $End -->
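To verify that the lines above were added correctly, the $Begin and $End markers can be read back with a short script. The following Python sketch is based only on the marker layout shown above. It is not a description of how dtSearch Web parses dtsearch_options.html, and the function name read_sections is hypothetical.

```python
import re

# Matches "<!-- $Begin Name -->", the section body, then "<!-- $End -->".
_SECTION = re.compile(
    r"<!--\s*\$Begin\s+(\w+)\s*-->(.*?)<!--\s*\$End\s*-->", re.DOTALL
)

SAMPLE = """\
<BR><HR><I>HighlightColors: </I>
<!-- $Begin HighlightColors -->
ffff00,a6f500,00ffe3,cbe3ff,c5c3fa,ffbcff,ff9999,fcbf29,d7d7d7,eaddc2
<!-- $End -->
<BR><HR><I>Highlight hits using multiple colors: </I>
<!-- $Begin MultiColorHighlighting -->
1
<!-- $End -->
"""

def read_sections(text):
    """Return a dict mapping each section name to its trimmed contents."""
    return {name: body.strip() for name, body in _SECTION.findall(text)}

sections = read_sections(SAMPLE)
colors = sections["HighlightColors"].split(",")
multicolor_enabled = sections["MultiColorHighlighting"] == "1"
```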
dtSearch Engine
- New UWP versions of the dtSearch Engine are included with this version. The UWP version of the C# API is
implemented in dtSearchUwpApi.dll, and the C++ API is implemented in dtSearchEngine_uwp_Win32.dll and
dtSearchEngine_uwp_x64.dll.
For sample code, please see the sample application in C:\Program Files\dtSearch
Developer\examples\cs4\UwpDemo.
The UWP API is identical to the .NET API with a few exceptions, which
are described in the file "dtSearch Engine for UWP.docx" in the UwpDemo folder.
- Added flags dtsoTfXmlAsText and dtsoTfDatabasesAsText to Options.TextFlags, corresponding to the two new
options in dtSearch Desktop described above to index XML files as plain text and to index databases as plain
text.
- Two enhancements to the implementation of multicolor hit highlighting in the developer API:
(1) First, a
new flag dtsSearchWantHitsByWordOrdinals has been added to the SearchFlags enumeration, and this flag should
be used along with dtsSearchWantHitsByWord and dtsSearchWantHitsArray. With this flag set, each word or
phrase in the HitsByWord report for a document will be preceded by an ordinal that consistently identifies
that word or phrase from among the words or phrases in the search request. This makes it possible for
dtSearch to ensure that the mapping between search terms and highlighting attributes is consistent for all
documents retrieved in a search, regardless of which terms were retrieved in each individual document.
(2) Second, use of multiple highlighting attributes, enabled by the flag dtsConvertMultiHighlight, can now
be combined with the dtsConvertUpdateSearch flag, which updates hit offsets for a retrieved document to
reflect any changes either in the dtSearch version or the document since the document was indexed.
Fixes and other enhancements
- Added dtsExoUseSimpleFilenames flag to ExtractionOptions to limit extracted attachment filenames to 64
characters, ASCII-only.
- Added support for EMF Spool (*.SPL) files
- Added support for *.PPT and *.XLS files encrypted with 40-bit RC4 encryption to prevent modification, as
long as a password is not needed to open the file.
- Improved handling of malformed dates in EML and MBOX messages.
- File parser bug fixes affecting: .docx, .pdf, .xls, *.ppt, *.EMF, OpenOffice
- Fixed automatic date recognition bug affecting date ranges.
- Other bug fixes.
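The kind of restriction that dtsExoUseSimpleFilenames applies (filenames limited to 64 ASCII characters, as listed above) can be sketched in Python as follows. The exact rules dtSearch uses to shorten names or replace characters are not given in these notes, so the choices below (dropping non-ASCII characters after decomposition, keeping the extension, and the function name) are assumptions.

```python
import os
import unicodedata

MAX_LEN = 64  # limit stated in the note above

def simple_filename(name):
    """Return an ASCII-only version of name, at most MAX_LEN characters long."""
    # Decompose accented letters so the base letter survives the ASCII conversion.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    stem, ext = os.path.splitext(ascii_name)
    stem = stem or "file"   # assumption: fall back to a fixed name if nothing is left
    ext = ext[:16]          # assumption: cap unusually long extensions
    return stem[:MAX_LEN - len(ext)] + ext

print(simple_filename("Résumé " + "x" * 100 + ".docx"))
```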
dtSearch 7.84 (Build 8405) August 31, 2016
dtSearch Engine
- Compatibility note: dtSearchNetApi4.dll is now built with Visual Studio 2015 Update 3, so the CRT and MFC
redistributables installers (VC_redist*.exe) for Visual Studio 2015 Update 3 should be used when deploying
dtSearchNetApi4.dll. These installers are available here:
https://www.microsoft.com/en-us/download/details.aspx?id=53840.
Fixes and minor enhancements
- dtSearch Desktop indexer now reports image-only PDF files in the index update log. (Metadata in these files
is still indexed. This change only affects reporting in the index update log.)
- Improved 32-bit indexing performance in low-memory conditions.
- Fixed indexing crash in mso20win32client.dll in the latest update to Office 2016 when indexing Outlook
messages.
- Added API to change the score of a document in the dtsnSearchFound notification (C++) or
ISearchStatusHandler.OnFound callback (.NET). In the C++ API, return dtsChangedItemScore from the
dtsnSearchFound notification and set dtsSearchResultsItem.score to the new value. In the .NET 4.0 API, in
ISearchStatusHandler.OnFound, set SearchResultsItem.Score to the new value and set
SearchResultsItem.ChangedItemScore to true.
- File parser bug fixes affecting: .pdf, .rtf, .emf, .doc, .xls, .ppt, .pst
- Fixed bug in build 7.84.8404 causing the ' (apostrophe) character to appear incorrectly in conversion output
in Internet Explorer 8.
- Other bug fixes.
dtSearch 7.83 (Build 8353) April 14, 2016
dtSearch Engine
- Compatibility note: dtSearchNetApi4.dll is now built with Visual Studio 2015 Update 1, so the CRT and MFC
redistributables installers (VC_redist*.exe) for Visual Studio 2015 should be used when deploying
dtSearchNetApi4.dll. These installers are available here:
https://www.microsoft.com/en-us/download/details.aspx?id=48145.
Fixes and minor enhancements
- Improved formatting of WordPerfect documents.
- Fixed file parsing bug causing extra line breaks to appear between paragraphs in .msg files.
- File parser bug fixes affecting: .pdf, .msg, .xlsx, .jtd, .one
- Other bug fixes.
dtSearch 7.82 (Build 8339) Released January 25, 2016
Fixes and minor enhancements
- All executables are code signed using SHA-2 in addition to SHA-1 (dual signed). All MSI files are signed
using SHA-2 only, because MSI files do not support dual signing. Please see this Microsoft article
(http://social.technet.microsoft.com/wiki/contents/articles/32288.windows-enforcement-of-authenticode-code-signing-and-timestamping.aspx)
for more information on SHA-1 deprecation.
- Added option in dtSearch Desktop's Edit > Copy File function to automatically shorten excessively long
filenames.
- Added new C# sample, AjaxWordListBuilder, demonstrating how to use the dtSearch Engine's WordListBuilder
object on a web page.
- Added ixStepCommittingUpdate and ixStepRemovingDeletedFiles to the IndexingStep enumeration to separately
identify these steps during an index update.
- Added file parser support for a OneNote file format variant created by certain Microsoft online services.
- Added experimental option in dtSearch Desktop to use the standard dtSearch Desktop "Next Hit" toolbar button
to navigate hits in PDF files displayed in Adobe Reader (otherwise only the Ctrl+Shift+Space hotkey can be
used). This option is in Options > Preferences > PDF View Options.
- File parser bug fixes affecting: .pdf, .xlsx, .xlsb
- Other bug fixes.
dtSearch 7.81 (Build 8281) October 23, 2015
Fixes and minor enhancements
- File parser bug fixes affecting: .doc, .pdf, .rar, .docx, .msg, .one, .pages, .qpw, .ppt
- Fixed bug preventing "view as report" in dtSearch Desktop from working with PDF files opened in Adobe Reader
- Fixed error reporting bug causing "Unable to access index %2" error message (without the index path) in
dtSearch Desktop when an index could not be accessed to search.
- Tested and compatible with Windows 10.
- Added support for highlighting hits in Adobe Reader DC. An updated version of the dtSearch PDF Search
Highlighter plug-in is also needed for Adobe Reader DC.
- Other bug fixes.
dtSearch 7.80 (Build 8253) May 27, 2015
All products
- Added support for indexing PDF files with 128-bit RC4, 128-bit AES, and 256-bit AES encryption, as long as
the file does not require a password to open and does not have the "copy text" permission disabled.
Developer note: This is implemented in a new component, dtv_pdfCrypto.dll, that is subject to export
restrictions. Please see the license.htm file accompanying this version for more information. For details on
the effect of encryption and permission settings on indexing, searching, and text extraction, please see http://support.dtsearch.com/faq/dts0161.htm.
Fixes and minor enhancements
- Fixed dtSearch Desktop bug causing some PDF files to be opened in a separate Adobe Reader window when file
is located on a network share.
- Fixed extra "PBrush" and "Adobe Photoshop Image" captions in some Word documents with embedded images.
- In the Linux version, the dtSearch Engine library (.so) files are installed in the dtsearch/bin and
dtsearch/bin64 folders instead of the lib and lib64 folders.
- Added RAR file parser (dtv_rar.so) to the Linux version of the dtSearch Engine
- Fixed incorrect parsing of some .docx, .xlsx, and .pptx documents when document has missing or incorrect
filename extension.
- Other file parser bug fixes affecting: .mdb, .pdf, WordPerfect 4.2, WordStar, KeyNote, .tar
- Other bug fixes.
dtSearch 7.79 (Build 8233) March 9, 2015
All products
- Added support for indexing Apple iWork 2009 Pages, Numbers, and Keynote files
Fixes and minor enhancements
- Fixed bug affecting cancellation of file conversion after either expiration of FileConverter.TimeoutSeconds
or when OutputStringMaxSize exceeded when processing large binary input files with the
dtsConvertInlineContainer flag.
- File parser bug fixes affecting: *.xlsx, .pdf, .doc, .msg, .rtf, .wps
- Added FileConverter.SetIndexCache() API to specify an IndexCache to be used with file conversion.
- Other bug fixes.
dtSearch 7.78 (Build 8215) Released October 29, 2014
Fixes and minor enhancements
- Fixed incorrectly rounded display of numeric value in Excel with conditional formatting
- Fixed bug affecting cancellation of file conversion after either expiration of FileConverter.TimeoutSeconds
or when OutputStringMaxSize exceeded when processing large binary input files with the
dtsConvertInlineContainer flag.
- Fixed dtSearch Web search form bug causing "undefined" to appear in Filename field in Internet Explorer 8
- Added IndexInfo.TotalDataSize to COM and .NET APIs
- File parser bug fixes affecting: .docx (incorrect display of paragraph style; error handling non-breaking
hyphens), .zip (hang indexing file deleted by antivirus software during indexing), .html
- Other bug fixes.
dtSearch 7.77 (Build 8205) August 23, 2014
Fixes and minor enhancements
- Fixed security issue reported in a third-party component, imgman32.dll, used in the dtimage.exe utility. See
http://support.dtsearch.com/faq/dts0235.htm for
more information.
- Added support for indexing Outlook 2013 and 2010 OST files. Note: Microsoft has not officially documented
the OST file format specification, so this support is based on unofficial non-Microsoft information about
the OST file format.
- Added support for indexing metadata in Adobe Photoshop images
- Fixed: "~dtpdf.tmp" filenames appeared in tabs in dtSearch Publish
- Fixed PDF hit highlighting error in dtSearch Publish on systems with Adobe Reader versions 7-9 and Internet
Explorer 10
- Added "Images" field at the end of MIME messages listing names of inline image files.
- Fixed incorrect handling of filename-only indexing option causing "Unsupported file format" errors
- In the API, the flag dtsIndexCreateVersion6 is now ignored, so indexes will always be created in the current
index format.
- Fixed high-DPI scaling error in dtSearch Desktop causing checkbox lists to be drawn incorrectly
- Fixed bug causing filename-only indexing option to instead report all files as inaccessible.
- File parser bug fixes affecting: .xls, .doc, .msg
- Other bug fixes.
dtSearch 7.76 (Build 8193) May 13, 2014
dtSearch Engine
- Added Options.TempFileDir to control location of temporary files created during file parsing.
- Added dtsOptions2, dtssSetOptions2, and dtssGetOptions2 in the C++ API to replace dtsOptions,
dtssSetOptions, and dtssGetOptions. dtsOptions remains supported for backward compatibility. dtsOptions2
replaces the fixed-length buffers used in dtsOptions with string pointers to eliminate length restrictions
on string values in option settings.
- Other bug fixes.
Fixes and minor enhancements
- Fixed bug causing incorrect XML conversion output from conversion of Word document with Ole10Native stream
to it_ContentAsXml format
- File parser bug fixes affecting: .xls, .doc, .rar, .pst
- Fixed dtSearch Desktop indexer bug in "Update Multiple Indexes" dialog box causing case/accent sensitivity
to be transferred between indexes when the "Clear index before adding documents" box was checked.
- Other bug fixes.
dtSearch 7.75 (Build 8178) March 12, 2014
Fixes and minor enhancements
- Updated RAR file parser to support RAR 5
- Reduced stack use when indexing very deeply-nested containers
- dtSearch Web: Fixed bug in Build Search Form when generating a search form containing a custom field name
- dtSearch Web: Fixed bug affecting highlighting of the selected hit when clicking Next Hit in Internet
Explorer 8
- Fixed bug causing some Microsoft Photo Editor 3.0 objects embedded in Office documents not to be recognized as image data
- File parser bug fixes affecting: .mdb (Access 2003), .msg embedded in .rtf, .xls, .xlsx, .xlsb, .pdf
- Other bug fixes.
dtSearch 7.74 Build 8166 Released December 30, 2013
All products
- Added support for indexing iCalendar (*.ics) files
dtSearch Desktop/Network
dtSearch Web/Publish
- Updated search form templates for dtSearch Web and dtSearch Publish. A new drop-down list in the "Build
Search Form" dialog box lets you pick the template to use. The updated templates include frameless options
and new HTML5 elements such as the calendar control for date searching. Some examples of the new search forms are posted
here.
Fixes and minor enhancements
- dtSearch Desktop: fixed keyboard navigation problem affecting PgUp, PgDn, and the cursor keys.
- dtSearch Web: fixed bug causing "Document information could not be retrieved from the index" error message
when trying to open some documents after a search.
- API enhancement: Options.UserThesaurusFile can be set to an XML string containing the user thesaurus,
instead of the name of a file containing the thesaurus data.
- dtSearch Web/Publish: Fixed bug affecting multiple selection (using Ctrl+Click) of indexes to search on
search forms
- dtSearch Web Setup 64-bit: Fixed error launching help
- File parser bug fixes affecting .msg, .doc, .docx, .xlsx, .emf, .pdf
- Other bug fixes.
dtSearch 7.73 Build 8140 Released November 3, 2013
- Fixed dtSearch Desktop bug affecting keyboard navigation in retrieved documents.
- Fixed dtSearch Web/Publish bugs affecting hit navigation in some browser versions.
- Fixed dtSearch Web/Publish PDF highlighting browser compatibility issue affecting Internet Explorer 11.
dtSearch 7.73 Build 8139 Released October 7, 2013
- Fixed browser compatibility bug in dtSearch Desktop 7.73.8138 only causing problems with the "Next Hit"
button on some systems.
dtSearch 7.73 Build 8138 Released September 30, 2013
All products
- Support for some older Windows versions is discontinued in dtSearch 7.73. Supported: Windows Server 2012,
Windows Server 2008, Windows Server 2003 SP 2, Windows 8, Windows 7, Windows Vista, Windows XP SP 3. Not
supported: Windows 2000, Windows ME, Windows 98, Windows 95, and Windows XP versions without SP 3.
This change is a result of our transition to Visual Studio 2012, using the v110_xp platform toolset, to build all
Windows products. The older .NET API wrappers (dtSearchNetApi2.dll and dtSearchNetApi3.dll) are still built
using older compilers for compatibility.
dtSearch Web/Publish
- Note: Updated search form templates for dtSearch Web and dtSearch Publish, previously included in dtSearch
7.73 beta builds, have been deferred to version 7.74 to allow time for additional browser compatibility
testing.
dtSearch Engine
- Added dtsConvertUseStyles flag in the ConvertFlags enumeration, to provide a way to use CSS styles to format content.
- Added FileConverter.DocTypeTag, to provide a way to specify a DocType in HTML output.
- dtSearchNetApi4.dll is now built using Visual Studio 2012 Update 3.
dtSearch Desktop
- Added DocStyles.css in the dtSearch templates folder to control the formatting of property tables and
headings in retrieved files.
Fixes and minor enhancements
- FileConverter API - fixed missing line breaks when converting from HTML to .txt
- FileConverter API - fixed missing <Fields>...</Fields> tags around HTML metadata when converting
from HTML to it_ContentAsXml
- Java API - faster garbage collection of strings passed through IIndexStatusHandler API reduces memory use
during indexing
- Java API - added Options.storedFieldDelimiterChar
- Reduced memory use when indexing .msg files with very large numbers of recipients or attachments
- SearchReportJob API - fixed slow detection of timeout
- dtSearch Desktop Indexer - fixed "not responding" message when indexing some very large documents
- File parser bug fixes affecting .ppt, .pdf, .xls, .xlsx, .one, .xps, .msg, DocFile property sets
- Other bug fixes.
dtSearch 7.72 Build 8106 Released May 5, 2013
All products
- Support for indexing OneNote 2010 and OneNote 2007 *.one files. Display and extraction of images and
embedded documents are supported for both formats. Ink is displayed as OCR output (text) only.
dtSearch Engine
- Added FileConverter.InputStream in the .NET API to allow the input document to be passed as a .NET Stream
object
Fixes and minor enhancements
- Added dtsoFfSkipEmailProperties flag in Options.FieldFlags to suppress display of email properties such as
sender, subject, etc.
- Fixed XML structure errors in XML generated for it_ContentAsXml output by FileConverter
- File parser bug fixes affecting .msg, .docx, .xfa, .pdf, .rtf, .xlsb, .xlsx, .ppt
- Added detection of Windows PE and NE executables and Linux ELF executables (these formats are still indexed
according to the binary files setting, with content either filtered or skipped)
- Fixed bug causing use of dtsExoDoNotConvertAttachments in FileConverter.ExtractionOptions to generate an
incorrect "File Encrypted" error for some documents during file conversion (not indexing or searching).
- Fixed bug causing email headers to be indexed even if the dtsoSkipEmailHeaders flag is set, when filetype.xml is set
up to index message bodies separately from attachments.
- Added support for metadata extraction from HDPhoto images.
- Fixed bug causing extra whitespace to be generated in conversion to plain text (UTF-8) format.
- Mapitool.exe utility uses message delivered date property instead of modified date property to set the file
date for extracted message items.
- dtSearch Developer installer updated May 18, 2013 to add missing MemAllocator.h header file.
- Fixed bug affecting nested proximity searches where a matching set of the terms exists exactly once and is
preceded (by the exact proximity range) by two or more instances of the proximity terms that do not satisfy
the proximity criteria.
- Other bug fixes.
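As an illustration of the executable detection mentioned in the fixes above, the following is a minimal sketch of how Windows PE, Windows NE, and Linux ELF files can be told apart by their header bytes. It is not dtSearch's detection code, which is not described in these notes; the function name detect_executable is hypothetical, and the sketch relies only on the publicly documented header layouts of these formats.

```python
import struct
from typing import Optional


def detect_executable(data: bytes) -> Optional[str]:
    """Return "ELF", "PE", "NE", or None based on header magic bytes.

    Hypothetical helper for illustration only; not part of any dtSearch API.
    """
    # ELF files start with 0x7F followed by the ASCII letters "ELF".
    if data[:4] == b"\x7fELF":
        return "ELF"
    # PE and NE files both start with an MS-DOS stub ("MZ"). The 32-bit
    # little-endian value at offset 0x3C gives the offset of the real header.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (offset,) = struct.unpack_from("<I", data, 0x3C)
        signature = data[offset:offset + 4]
        if signature == b"PE\x00\x00":
            return "PE"
        if signature[:2] == b"NE":
            return "NE"
    return None
```

A file recognized this way would still be handled according to the binary files setting, as the note above states; the sketch covers only the type check.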
dtSearch 7.71 Build 8080 Released November 30, 2012
dtSearch Engine
- New support for highlighting hits using different colors for each search term when highlighting hits using
FileConverter. For API documentation on this feature, please see the article "Highlighting each term using
different attributes" in dtSearchApiRef.chm. Multicolor hit highlighting is not supported for PDF files
displayed in Adobe Reader.
Fixes and minor enhancements
- Added "Find Indexes" button in dtSearch Desktop's Index Library Manager to locate all indexes in a folder
tree.
- Fixed CSV file parser bug that caused "duplicate field id" error during index merge.
- In dtSearch Web Setup, improved detection and reporting of IIS configuration problems such as missing IIS
components
- File parser bug fixes affecting MSG, WordPerfect 4.2, PPTX, XBase, PPT, XLSB
- In Language Analyzer API, dtsLaJob.indexRetrievedFrom and dtsLaJob.alphabetLocation were not set during
searches
- PDF hit highlighting support added for Adobe Reader XI
- Other bug fixes.
dtSearch 7.70 Build 8063 Released September 28, 2012
dtSearch Desktop
- New support for displaying images embedded in Office documents (DOC, DOCX, PPT, PPTX, XLS, XLSX, RTF, EML).
To enable display of images in dtSearch Desktop, click Options > Preferences > Document display, and
check the box to "Display images in documents".
- Added new options in dtSearch Desktop to (1) hide MIME headers in emails, (2) show properties of images
embedded in documents, and (3) control whether paths are indexed along with filenames when the "Index
filenames as text" option is enabled. These options are in the Options > Preferences > Indexing
Options dialog box.
dtSearch Engine
- Embedded attachments, objects and images in documents can be extracted using dtsExtractionOptions (C++) or
ExtractionOptions (Java and .NET), which specify output locations and rules for filename generation.
Currently the following are supported:
Attachments in EML, MSG, DBX, TNEF (winmail.dat), PDF, MDB and ACCDB (Access);
objects in DOC, DOCX, XLS, XLSX, PPT, PPTX, RTF;
images in DOC, DOCX, PPT, PPTX, XLS, XLSX, RTF, EML, MDB and ACCDB (Access).
- New single-document option for indexing Access (*.mdb, *.accdb), XBase (*.dbf), and Comma-separated values
(*.csv) files.
By default, dtSearch indexes each record of database files (*.mdb, *.accdb, *.csv, *.dbf)
as a separate document. This new option provides a way to index all records in a database file as a single
document. For more information, see dtSearchApiRef.chm (Overviews > Databases and Fields > Database
files (*.mdb, *.dbf, *.csv))
- Added dtsoFfShowImageProperties flag in Options.FieldFlags to display image properties (such as EXIF data)
for images embedded in documents. Image properties are always indexed for images in separate files. This
flag only affects images embedded in documents, such as a .jpg embedded in a Word file. A related change,
made for consistency, affects the handling of image files embedded in .eml email files. Previously these
properties were always extracted. Now they will only be extracted when the dtsoFfShowImageProperties flag is
set, so .eml files will be handled consistently with other file formats.
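To make the difference between per-record indexing and the single-document option for database files concrete, here is a small conceptual sketch using a CSV file. It does not use the dtSearch API and does not reproduce how dtSearch formats record text; both function names are hypothetical and the "field: value" layout is an assumption made for illustration.

```python
import csv
import io
from typing import List


def records_as_documents(csv_text: str) -> List[str]:
    """Model the default behavior: each CSV row becomes its own document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [" ".join(f"{key}: {value}" for key, value in row.items())
            for row in reader]


def file_as_single_document(csv_text: str) -> str:
    """Model the single-document option: all rows form one document."""
    return "\n".join(records_as_documents(csv_text))
```

With two rows such as "Ann, Oslo" and "Bob, Rome", a search requiring both "Ann" and "Rome" would match the single combined document but neither of the per-record documents, which is the practical consequence of choosing one mode over the other.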
Fixes and minor enhancements
- Eliminated use of FILE_FLAG_RANDOM_ACCESS, which could cause excessive memory consumption under Windows
Server 2008 because of what appears to be a bug in Windows caching behavior (see
http://support.microsoft.com/kb/2549369 for more information).
- Zlib version updated to 1.2.7
- dtSearch.Spider2.dll and dtSearch.Spider4.dll have new dependencies on zlib DLLs
zlib_wapidll_{VC8/VC10}_{32/64}.dll to handle gzipped sitemap.xml files.
- Added file parsers for Ichitaro word processor versions 5 and later.
- File parser bug fixes affecting MSG, PDF, DOCX, PPTX, Excel 2, RTF
- Message attachments to MIME emails are now indexed as attachments (so they can be handled consistently with
other attachments in the new attachment-related features described above) rather than being merged with the
text of the message.
- Added reporting of PDF files that do not contain any page text. In dtSearch Desktop, these will appear in
the index log with "Image Only" after the type name (click View Log in the Update Index dialog box to
see the log of indexed files). In the API, the flag fiImageOnly will be set in IndexFileInfo (.NET, Java) or
dtsIndexProgressInfo.fileInfoFlags (C++) during indexing.
- Removed extra path information from headers in containers converted to text using FileConverter.exe or
FileConvertJob with the dtsConvertInlineContainer flag
- Removed "Document Properties" caption from Word, PowerPoint, and Excel 2003 file properties. For
applications that require this flag for backward compatibility, use the new flag
dtsoFfIncludeDocumentPropertiesCaption in Options.FieldFlags
- Added new values to SearchReportJob.Header and SearchReportTemplate.rtf: %%Ordinal%%, %%DocId%%, %%Type%%
- Added new dtsConvertIncludeBOM flag to FileConverter.Flags to add UTF-8 BOM to UTF-8 output
- FileConverter with dtsConvertJustDetectType produces more specific type ids for image, music, and video
files instead of it_Media
- Fixed search/highlighting error affecting the pre/N and w/N operators
- Added new dtsnIndexFolderInaccessible callback notification in IndexJob, logging in indexlog.dat, and
logging in HTML index log of inaccessible folders during indexing
- Fixed incorrect time zone adjustment of PDF built-in creation and modification date fields
- Fixed too-long filenames generated for items extracted from PST files (names could be too long for some file
systems when copied using Edit > Copy File in dtSearch Desktop)
- Other bug fixes.
dtSearch 7.68 (Build 8025) December 5, 2011
All products
- Outlook and MIME email files have a new, simplified header that will be consistent between the two formats
with these fields: From, SentVia (for emails sent using a mailbox other than the "From" address), To, CC,
BCC, Subject, Date.
The footer for these formats will include these additional fields for
backward-compatibility: Sender (combines From and SentVia); Recipient (combines To, CC, and BCC); SentDate
(msg files only); DeliveredDate (msg files only).
The Sender and Recipients fields provide a way to
search across all senders or all recipients of a message. These fields will also allow searches using the
field names used in older versions, such as "Sender contains example", to continue to work.
Documents indexed with dtSearch versions 7.67 and earlier will be displayed with the old header so the change will not
cause hit highlighting problems with previously-indexed emails.
dtSearch Engine
- Includes new (August 9, 2011) security updates from Microsoft Bulletin MS11-025. For more information on
this Microsoft security update, including updated dependencies for dtSearch Engine developer components, please see this article.
- Added dtsoFfIndexArchiveFileLists flag. This option adds a searchable file named ArchiveFileList.html to ZIP
and RAR archives during indexing. The original file is not modified but the ArchiveFileList.html file is
searchable as if it were part of the ZIP or RAR file. The file consists of a list of the names of the files
inside the archive.
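The idea behind the archive file list described above can be sketched as follows: read the names of the entries in an archive and render them as a small HTML document, without modifying the archive. This is only an approximation for ZIP files using Python's standard library; the HTML layout that dtSearch actually generates for ArchiveFileList.html is not documented in these notes, and the function name archive_file_list_html is hypothetical.

```python
import html
import io
import zipfile


def archive_file_list_html(zip_bytes: bytes) -> str:
    """Build an HTML list of the entry names in a ZIP archive.

    The archive is opened read-only from memory, so the original is untouched.
    The markup is an assumption, not dtSearch's actual output format.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
    # Escape names so characters such as "&" or "<" yield valid HTML.
    items = "\n".join(f"<li>{html.escape(name)}</li>" for name in names)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"
```

Indexing a listing like this alongside the archive contents is what makes a search for a filename find the archive that contains it.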
Fixes and minor enhancements
- File parser bug fixes affecting XLSX (string values computed by formulas), PST (crash indexing damaged PST
archive), MSG, DOC (list style formatting, table of contents formatting), XyWrite
- Updated Options > Preferences dialog box in dtSearch Desktop and fixed accessibility issues affecting
dtSearch Desktop Search and Indexer.
- Other bug fixes.
dtSearch 7.67 (Build 7973) June 1, 2011
All products
- Includes security updates from Microsoft Bulletin MS11-025. For more information on this Microsoft security
update, including updated dependencies for dtSearch Engine developer components, please see this article.
- Added support for indexing PST files directly. Because Outlook locks the PST file that is currently in use,
this will not work with the PST file that you are actively using in Outlook, and is primarily for use in
situations where archived or forensically-obtained PST files are being searched.
- Added plug-in for Adobe Reader X to enable hit highlighting in PDF files retrieved after a search. For more
information please see:
dtSearch PDF Highlighter Plug-in Documentation (PDF)
dtSearch PDF Highlighter Plug-in (installer for the plug-in)
Highlighting hits in PDF files (API documentation)
dtSearch Desktop
- Added support for highlighting hits in PDF files in Adobe Reader X and Adobe Acrobat X.
- As a result of the Microsoft MFC MS11-025 update, dtSearch Desktop 7.67 requires Windows XP or newer and will not
run under Windows 2000.
dtSearch Engine
- .NET API: Added DataSource.DocStream to allow a document to be passed through the DataSource API as a stream
- .NET API: Added sample code demonstrating indexing of Azure blob data (examples\cs4\AzureBlobDemo)
Fixes and minor enhancements
- Fixed hit navigation error in the 64-bit version of dtSearch Desktop
- Fixed file parser bug causing extra spaces to appear in XPS documents
- Fixed bug causing incorrect hit highlighting in documents indexed using the binary filtering algorithm
- Added detection of MIDI files
- File parser bug fixes affecting PPTX, XyWrite, tag font size in XML output, AVI file detection, WordPerfect
(text using WP character set 12), WordPerfect 5 (soft hyphens), WordPerfect 4.2 (error in automatic file
type detection), ZIP (filename filters were not applied to exclude ZIP contents in an unindexed search),
Word for DOS (incorrect parsing of a character code)
- Other bug fixes.
dtSearch 7.66 (Build 7936) January 25, 2011
dtSearch Engine
- Added .NET 4.0 versions of the .NET API (dtSearchNetApi4.dll, dtSearch.Spider4.dll) and sample code for C#
.NET 4.0 and VB.NET 4.0
- Added dtsSearchFastSearchFilterOnly search flag to enable much faster, optimized generation of a
SearchFilter from a search when no other output is required from the search.
- Added WordListBuilder.GetLastError to the C++, Java, and .NET APIs to provide better reporting of errors
resulting from WordListBuilder calls.
- Added new flag to enable caching of field values in WordListBuilder to make ListFieldValues calls faster.
The flag is dtsWordListEnableFieldValuesCache (in the WordListBuilderFlags enumeration) and is passed to
WordListBuilder using the new SetFlags method.
- Added new .NET method Server.SetEnginePath to allow ASP.NET application deployment without administrative
access
- Added new .NET sample application, AzureDemo, demonstrating use of the dtSearch Engine in an Azure instance.
For documentation explaining how to deploy in Azure, see Overviews > Installing the dtSearch Engine >
Deployment steps: Azure 64-bit (in dtSearchApiRef.chm).
- Added a way to disable file parsers using the file type table (filetype.xml) by setting the TypeId to the id
of the parser to disable and the Flags value to 2.
- Added a mechanism for a dtsInputStream to simulate an I/O error by returning a negative value from read() of
less than 10,000. When this occurs, dtSearch will interpret it as an I/O error and halt processing of the
current input file immediately, reporting an I/O error through the API.
All Products
Fixes and minor enhancements
- In dtSearch Desktop, added SizeK, IndexRetrievedFrom, SearchDate, ReportDate variables to
SearchReportTemplate.rtf and SearchListTemplate.rtf
- Java and .NET API: Fixed IIndexStatusHandler bug causing PercentDone to remain zero during compression of an
index
- Added docId of document being removed from an index to IndexFileInfo reporting through IIndexStatusHandler
- Fixed FileConverter bug that caused invalid XML to be generated from some conversions due to output of
character code 128.
- Added SearchJob.UnindexedSearchFlags in the .NET API and SearchJob.setUnindexedSearchFlags in the Java API
to enable case and accent-sensitive unindexed searches in these APIs
- Added .NET SearchFilter.GetItems() to provide access to an array of the doc ids selected in a SearchFilter
- File parser bug fixes affecting Office XML drawings embedded in Word, PowerPoint, and Excel files;
interpretation of OEM character codes (_x00NN_) in Excel 2007 files; dates prior to 1970 in MDB files;
performance and memory use parsing MIME files; Word auto-numbering; PDF
- Other bug fixes.
dtSearch 7.65 (Build 7907) September 14, 2010
Fixes and minor enhancements
- Fixed bug in dtSearch Desktop indexer causing it to forget index caching setting if the "Clear index before
adding documents" box is checked in the Update Index dialog box
- Fixed file parser bug affecting indexing of QuickBooks backup (*.qbb) files
dtSearch 7.65 (Build 7906) August 20, 2010
dtSearch Engine
- Added dtsoFfSkipEmailHeaders flag for Options.FieldFlags to suppress searching and display of headers in
MIME and Outlook messages
Fixes and minor enhancements
- Reduced memory requirements for parsing very large XLS files
- Fixed bug that allowed XML output from saved search results and XML generated by conversion to
it_ContentAsXml to contain the colon (":") character in tag names, which caused the generated XML to fail
validation.
- Fixed PDF hit highlighting error affecting documents using ActualText parameter.
- Automatic date recognition has been changed to limit the scope of automatically recognized entities so they
will not cross a field boundary.
- Fixed error in HTML conversion causing some output to fail to word-wrap when displayed in a browser.
- Fixed memory leak in searches that use the dtsSearchLanguageAnalyzerSynonyms flag.
- ZIP file parser applies default encoding in filetype.xml when interpreting ambiguous ZIP filenames, and
applies automatic encoding detection if no default encoding is specified.
- File parser bug fixes affecting HTML, XLS, DOC (paragraph numbering error), DOCX, PPT.
- Reduced memory use when merging very large indexes
- Fixed PDF hit highlighting errors in certain types of corrupt PDF files
- Other bug fixes.
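The ZIP filename handling described in the fixes above (use the configured default encoding for ambiguous names, otherwise detect the encoding automatically) can be sketched roughly as follows. This is not dtSearch's implementation. The function name decode_zip_name is hypothetical, and the fallback heuristic (try UTF-8, then code page 437) is an assumption standing in for whatever detection dtSearch performs. The only standard fact used is that bit 11 (0x800) of the ZIP general-purpose flags marks a name as UTF-8, so names without that bit are the ambiguous ones.

```python
from typing import Optional

UTF8_FLAG = 0x800  # ZIP general-purpose bit 11: the name is stored as UTF-8


def decode_zip_name(raw: bytes, flag_bits: int,
                    default_encoding: Optional[str] = None) -> str:
    """Decode a raw ZIP entry name (hypothetical helper, not a dtSearch API)."""
    # Unambiguous case: the archive itself declares the name to be UTF-8.
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")
    # Ambiguous case with a configured default (in dtSearch this would come
    # from filetype.xml): apply it.
    if default_encoding:
        return raw.decode(default_encoding, errors="replace")
    # Ambiguous case with no default: a simple stand-in for auto-detection.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp437")
```

For example, the byte 0x82 is not valid UTF-8 on its own, so without a default encoding the sketch falls back to code page 437, where it decodes as an accented "e".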
dtSearch 7.64 (Build 7876) March 15, 2010
dtSearch Engine
- Added dtsSearchLanguageAnalyzerSynonyms flag to enable using a language analyzer to generate morphological
variations on a search term at search time. When this flag is set, the language analyzer is called for each
word or phrase in the search request. The flag dtsLaInputIsSearchTerm is passed to the language analyzer in
dtsLaJob.flags, so the language analyzer knows why it is being called.
- Added dtssGetWordBreaker API function to provide direct access to the dtSearch Engine's internal word
breaker using the language analyzer API. For sample code demonstrating how to use this API, see the
WordBreak example in examples\vc8\WordBreak.
- Added more structural information to the output generated by conversion to the it_ContentAsXml file format.
- Added to COM interface: WordListBuilder.ListFieldValues, WordListBuilder.SetFilter, and
IndexJob.EnumerableFields.
- Added dtsListIndexSkipNoiseWords flag for ListIndexJob to list words in an index without including any noise
words.
- Added dtsoFfSkipDataSourceFields flag for Options.FieldFlags to prevent DocFields values from appearing in
FileConverter output
Fixes and minor enhancements
- Fixed incorrect display of CreationDate and ModDate properties in PDF files
- Fixed incorrect hit highlighting when Unicode Filtering options at search time differ from the options used
to index a file. To ensure consistent options, Unicode Filtering options are stored in the index when the
index is created, in the index_a.ix file.
- Fixed error updating index when directory specified for temporary files is inaccessible.
- Fixed index merge bug causing "Inconsistent doc ids from target index" error during merge.
- Fixed two search report bugs causing incorrect hit highlighting.
- Improved formatting of documents converted from Ami Pro and Quattro Pro to HTML
- Added automatic detection of gb2312 and JIS encoding.
- Added automatic detection of XyWrite, XBase, WordStar 3.x, WordPerfect 4.2, and TAR files.
- Improved reporting of file types by FileConverter.DetectedTypeId, providing much more specific information
about Microsoft Word versions and adding type detection for additional file formats
- Added support for text extraction from Adobe Framemaker MIF, XFA form templates in PDF files, and Visio XML
files
- Fixed "Excessive nesting" error indexing OpenOffice document due to bug parsing table structure
- Fixed RTF file parser bug affecting handling of the \upr tag
- Other file parser bug fixes affecting Multimate, Lotus 1-2-3, PDF, Word, PowerPoint
- Other bug fixes.
dtSearch 7.63 (Build 7836) October 29, 2009
Fixes
- Fixed problem running dtSearch.exe on some systems after installing the kb973923 patch from Windows Update.
- Fixed missing checkboxes in dtWebSetup64.exe under Windows Server 2008
Compatibility notes for developers working with the .NET 2.0 API only
- In dtSearch 7.63, the DLL dependencies for dtSearchNetApi2.dll have changed due to the release of the Visual
Studio .NET 2005 Service Pack 1 Security Update for ATL. Because dtSearchNetApi2.dll is built
with the updated version of Visual Studio .NET 2005, it requires the updated MFC and CRT DLLs that are
included with that version.
This issue does not affect any other dtSearch Engine API.
- This Microsoft redistributable program will install the required components:
Microsoft Visual C++ 2005
Service Pack 1 Redistributable Package ATL Security Update (July 28, 2009)
http://www.microsoft.com/downloads/details.aspx?familyid=766A6AF7-EC73-40FF-B072-9112BAB119C2&displaylang=en
dtSearch 7.63 (Build 7835) October 12, 2009
dtSearch Engine
- Added IndexFileInfo.UserFields in .NET API to provide access to stored fields through the
IIndexStatusHandler callback interface during indexing.
- Added dtsnIndexDeletedFileRemoved, dtsnIndexListedFileRemoved, and dtsnIndexListedFileNotRemoved
notifications to the indexing status callbacks to notify the calling application when files are removed from
the index during indexing or when an attempt to remove a listed file fails.
- Compatibility note for developers working with the .NET 2.0 API only: The DLL dependencies for
dtSearchNetApi2.dll have changed due to the release of the Visual Studio .NET 2005 Service Pack 1 Security
Update for ATL. Because dtSearchNetApi2.dll is built with the updated version of Visual Studio
.NET 2005, it requires the updated MFC and CRT DLLs that are included with that version. This
Microsoft redistributable program will install the required components:
Microsoft Visual C++ 2005
Service Pack 1 Redistributable Package ATL Security Update (July 28, 2009)
http://www.microsoft.com/downloads/details.aspx?familyid=766A6AF7-EC73-40FF-B072-9112BAB119C2&displaylang=en
This issue does not affect any other dtSearch Engine API.
Fixes and minor enhancements
- Fixed bug in search report generation causing text in adjacent table cells to be run together in output
- dtSearchw.exe: Fixed "Invalid character" error in dtSearch Desktop when opening a document
- dtSearchw.exe: Fixed bug affecting use of drag-and-drop to re-order columns in search results, causing the
wrong column order to result
- Fixed memory leak (in version 7.62 only) when using regular expressions in File Segmentation Rules to split
documents or in Text Fields definitions.
- File parser bug fixes affecting: MS Word 2007, PDF, ZIP, OpenOffice.
- dtSearchw.exe: Added option to suppress automatic correction of hit highlighting when a document was indexed
with a different version or the document was modified since it was last indexed (in Options > Preferences
> Document Display).
- Fixed missing filename associations (*.ilb, *.dtSearch) in 7.62 setup program.
- Fixed truncation of very long search report when generated using dtsReportWholeFile from cached text.
- Other bug fixes.
dtSearch 7.62 (Build 7804) July 20, 2009
All products
- Regular expression searching extended to support TR1 regular expressions
dtSearch Engine
- Java API: Added IIndexStatusHandler to Java API for monitoring of IndexJobs
- Java API: Added IndexInfo object for more efficient retrieval of index properties from an index
- Java API: Added SearchFilter.SelectItems() with array of doc ids
- .NET API: Added SearchFilter.SelectItems() with array of doc ids
- Java API: Added SearchJob.WantResultsAsFilter
- FieldFlags: Added dtsoFfHtmlSkipImgAlt and dtsoFfHtmlSkipInputValues
- Language Analyzer API: Added dtsLaBlockWasSkipped to LanguageAnalyzerWordFlags, providing a way for a
language analyzer to request that the internal dtSearch word breaker handle a block of text from the input.
- C++ API: Added userFields to dtsIndexProgressInfo, providing a way to access stored fields from a document
as it is indexed
- Added dtsConvertAutoUpdateSearch flag to ensure consistent hit highlighting when a document was modified
since it was indexed or was indexed by an older version of dtSearch than is used to search it.
Fixes and minor enhancements
- dten600.exe: Faster generation of search results with dtsSearchWantHitDetails enabled
- dtSearchw.exe: Fixed error causing "Enter Serial Number" dialog box to appear under Vista due to UAC problem
- dten600.dll: Fixed error causing corrupt index with message referencing !zd and int vector error when
automatic recognition of dates is enabled, hyphen processing is set to dtsoHyphenAll, and the last word of a
document ends with a hyphen.
- Added new cmap files for PDF text extraction.
- Improved speed for "not (something)" and pure xfilter searches.
- Reduced memory use for searches that retrieve large numbers of documents with a relatively small
MaxFilesToRetrieve value.
- dtSearchw.exe: Fixed RTF output generated from Search Report that would not open in WordPad
- dtSearchw.exe: Fixed hit navigation error caused when text with highlight markings was pasted into a Word
2007 document from dtSearch search results, and then the resulting document indexed and searched again.
- dten600.exe: Several improvements to the automatic detection of MIME-encoded files
- Other file parser bug fixes affecting: MS Word, PDF, MSG, ZIP, SWF, RTF, WordPerfect
- Other bug fixes.
dtSearch 7.61 Build 7769 Released April 2, 2009
All products
- New file parser added for RAR (*.rar) archives.
dtSearch Desktop/Network
- Added "Search within these results"
- Added improved zoom-in/zoom-out for document windows. To use, hold down the Ctrl key and roll the mouse
wheel forward or backward, or press Ctrl+PLUS to zoom in and Ctrl+MINUS to zoom out.
dtSearch Engine
- Added it_ContentAsXml output format for FileConverter. This format organizes document content, metadata, and
attachments into a standard XML format for easier automated processing. It does not currently support hit
highlighting and is designed for automated content extraction only.
Fixes and minor enhancements
- dten600.dll: Added workaround for invalid records created in PowerPoint files when the same file is edited
by PowerPoint 2003 and PowerPoint 2007
- dten600.dll: Fixed missed hard page breaks in Word 2007 files
- dten600.dll: Other file parser bug fixes: PDF, MIME, QPW, SWF
- dtSearch.exe: Fixed error restoring window position on multi-monitor systems
- dten600.dll: Added diagnostic information to history.ix (records error messages generated during index
updates, and logs index directory contents)
- dten600.dll: Fixed error handling encoding of file when filetype.xml specifies a format inconsistent with
automatically-detected UTF-8 encoding
- dten600.dll: Fixed bug causing "Inconsistent doc ids from target index" error merging indexes with the
dtsIndexKeepExistingDocIds flag set.
- dten600.dll: Fixed bug affecting text fields extraction in very long text files with no line breaks and text
fields defined to end at end-of-line using the $$$ mark
- dtsearchw.exe: Fixed bug in Edit > Copy File causing last access time not to be transferred when source
file was read-only
- dtSearchNetApi2.dll: Fixed bug affecting processing of the " character in FileConverter.InputFields when
highlighting hits
- Other bug fixes.
dtSearch 7.60 (Build 7739) February 1, 2009
dtSearch Engine
- Added Visual C++ 2008 sample applications
- Added IndexJob.EnumerableFields and WordListBuilder.ListFieldValues to provide a quick way to list all
values of a field
- Added WordListBuilder.SetSearchFilter to limit output to documents specified by a SearchFilter
dtSearch Publish
- Added CopyFileExtensions option providing a way to designate filename extensions for files to automatically
copy from the CD documents folder when clicked
dtSearch Desktop/Network
- Added new user interface appearance options and updated toolbar icons
Fixes and minor enhancements
- lbvprot.dll: Fixed slow PDF file opening with Adobe Reader 9
- dten600.dll: Added metadata extraction for M4A files
- dten600.dll: Fixed FileConverter bug causing extra copies of comments to be generated in HTML files
containing an <?xml tag
- dten600.dll: Fixed extra </p> tags in table cells when .DOCX files converted to HTML
- dten600.dll: Fixed incorrect XML FileConverter output when input contains tabs.
- dten600.dll: Fixed error causing FileConverter.HtmlHead tags to be omitted when hit-highlighting text files
indexed using the Unicode Filtering algorithm
- dten600.dll: Fixed FileConverter bug causing DetectedTypeId to always return it_DocFile for Microsoft Office
files, instead of the correct type of Office file (it_MicrosoftWord, it_PowerPoint, etc.)
- dten600.dll: Other file parser bug fixes (PDF, EML, DOC, DOCX, QPW)
- dtv_ifilter.dll: Added workarounds to handle incorrect return values from some IFilters
dtSearch 7.55 (Build 7700) October 11, 2008
Fixes and minor enhancements
- dten600.dll: Fixed Microsoft Access parser bug causing some MDB files to be indexed very slowly
- dten600.dll: Fixed PDF file parser word breaking error affecting PDF files created using MacroMedia
FlashPaper
- dten600.dll: Fixed error in MS Word 6.0 file parser affecting footnote extraction
- dten600.dll: Other file parser bug fixes (XML, MIME, Outlook MSG, Access)
- dtSearch.exe: Fixed formatting error and added diagnostic information to error message when file could not
be opened
- dtSearch.exe: Added support in Edit > Copy File for copying container files other than archives
dtSearch 7.54 (Build 7680) September 5, 2008
All products
- New Microsoft Access file parser added with no dependencies on the Microsoft JET Engine or ODBC drivers, so
it works identically under Windows and Linux. This file parser supports Access databases created by
Microsoft Access 95 through Access 2007.
- New file parser added for Flash (*.swf) files.
dtSearch Engine
dtSearch Publish
- Added option to enable links to Office documents (doc, xls, etc.) to launch in Office, outside of the
lbview.exe program. The setting is "ExternalLaunchExtensions" in the lbview.ini file.
- See below for important information about Adobe Reader and Adobe Acrobat 9 compatibility
Adobe Reader and Adobe Acrobat 9 Compatibility
- Adobe Reader and Adobe Acrobat 9 have a new setting that disables hit highlighting in PDF files by default.
To change this setting in Adobe Reader or Adobe Acrobat 9, click Edit > Preferences > Search, and
check the box to "Enable search highlights from external highlight server".
- dtSearch Desktop 7.54 will automatically make this change when it detects that hit highlighting is disabled
(after asking permission). dtSearch Desktop 7.54 also fixes a problem that causes PDF files to open very
slowly in dtSearch with Adobe Reader or Adobe Acrobat 9 installed.
- dtSearch Publish 7.54 has new settings in the lbview.ini file to automatically enable PDF hit highlighting
when it is disabled (after asking permission).
- dtSearch Web cannot change this setting in client applications, so web sites that rely on Adobe Reader to
highlight hits in PDF files should notify users of the need to change this setting to preserve hit
highlighting.
Fixes and minor enhancements
- dten600.dll: Outlook .msg file parser added separate table of bcc and cc recipients at end of message
- dten600.dll: Fixed error generating search reports with the "include whole file" option selected
- dtupdate.exe: Fixed error that caused the option to install automatically after the download is complete to
fail (clicking the Install button after restarting dtupdate.exe will work).
- dten600.dll: Reduced memory use indexing PDF files with very large numbers of XObject references
- dten600.dll: Fixed formatting error causing tables to appear very narrow in RTF files when converted to HTML
- dten600.dll: Reduced memory use indexing very long text files
- dten600.dll: Reduced memory use indexing very long HTML and XML files
- dten600.dll: Fixed slow PDF opening in dtSearch Desktop when Adobe Acrobat or Adobe Reader 9 is installed.
- Java API: Added SearchResults.addDoc() method to add an item to search results
- dten600.dll: Fixed incorrect formatting of some custom percent and date cell number formats in Excel files
- dten600.dll: In search reports, %%Filename%% macro in header was incorrectly replaced with PDF or HTML Title
instead of the filename.
- dten600.dll: Fixed bug causing incorrect hit highlighting in search report generated from PDF file after
unindexed search
- dtupdate.exe: Fixed unnecessary escalation prompt under Vista when checking for new version -- it should
only prompt if there is a new version to install
- dtsfc.cpp: Fixed thread handle leak in DJobBase::startCB
- dtsearchw.exe: Fixed error in serial number validation
- dten600.dll: Fixed error processing hyphenated words with the . character defined as a hyphen and the "all
three" hyphenation option selected
dtSearch 7.53 (Build 7629) May 19, 2008
Fixes and minor enhancements
- dten600.dll: Fixed error formatting combined date and time cells in Lotus 1-2-3 spreadsheet
- dten600.dll: Reduced memory use indexing XML file with very large CDATA field
- dten600.dll: Reduced memory use indexing very large HTML file
- Added IndexCache object to the Java API
- dten600.dll: Fixed incorrect page number shown in search report generated from PDF file
- Spider: Fixed timeout error crawling very slow web site
- dtWebSetup: Fixed error causing blank list under "Select search form type" in Search Controls tab of Build
Search Form dialog box
- dten600.dll: Fixed error indexing PDF file with more than 32,000 pages
- dten600.dll: Fixed out of memory error caching text from very large text file
- dten600.dll: Added support for indexing attachments in PDF files
- dten600.dll: Fixed incorrect hit highlighting on PDF page containing Unicode hyphen characters when hyphens
indexed as letters
dtSearch 7.52 (Build 7600) April 8, 2008
dtSearch Engine
- 64-bit Java API
- 64-bit version of the dtSearch Engine for Linux
- Option to index .eml files as containers (so the message body and attachments are each indexed as a
separate file).
- Sample ASP.NET application implementing an OpenSearch interface to the dtSearch Engine
Fixes and minor enhancements
- dten600.dll: To improve consistency in the handling of punctuation in field names, unsearchable characters
are now removed from field names in input data, with a few exceptions (:&_+=.) to minimize the effect on
backward compatibility. In previous versions this was done in individual file parsers, so the effect of
punctuation in field names depended on the format of the input data.
This change will not generally affect searching because only searchable letters are used when matching
field names.
This change may affect the field names associated with stored fields, in cases where the field name
contains punctuation characters.
- dten600.dll: Fixed two bugs affecting field parsing in Office 2007 documents.
- dten600.dll: Fixed incorrect document count displayed in history.ix when startingDocId set to value other
than 1
- dten600.dll: Fixed XML hit highlighting bug resulting in illegal entity appearing in XML output
- dten600.dll: Fixed bug causing extra "F" character to appear in stored fields
- dten600.dll: Improved handling of low-memory conditions for searches that generate very large search results
sets
- dten600.dll: Fixed error caching text in index when an external language analyzer returns overlapping blocks
of text
- Added FileConvert.exe and ListIndex.exe command-line utilities
- dtSearch.Spider2.dll: Added LinkTraceFilename option to create a log of the links followed and not followed
during a crawl
- Added dtsLaJob.searchRequestPunct string advising external language analyzers of the search request
punctuation characters to preserve when analyzing a search request
- dtSearchNetApi2.dll: If data source throws an exception, IndexJob will catch it and report the exception
through the Errors object
- dtSearchNetApi2.dll: Added FileInfoFlags.fiOpenFailed to indicate when a document returned from a DataSource
with the DocIsFile flag set to true cannot be opened because it is either not present or locked
- dten600.dll: Fixed merge bug affecting merges of indexes containing the same container file with the newer
version of the container file in the target index
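The field-name normalization described in the 7.52 notes above can be pictured with a minimal sketch. This is not dtSearch code: the function name is hypothetical, and it assumes that letters and digits are the "searchable" characters, whereas the real set depends on the engine's alphabet settings. Only the exception list (:&_+=.) comes from the note.

```python
# Illustrative only: not dtSearch's actual implementation.
KEPT_PUNCTUATION = set(":&_+=.")  # exceptions listed in the release note


def normalize_field_name(name: str) -> str:
    """Drop characters that are neither alphanumeric nor in the exception list."""
    # Assumption: letters and digits are "searchable"; the real alphabet is configurable.
    return "".join(ch for ch in name if ch.isalnum() or ch in KEPT_PUNCTUATION)


print(normalize_field_name("Author's-Name"))  # AuthorsName
print(normalize_field_name("dc:title"))       # dc:title
```

Under these assumptions a stored field named Author's-Name would be reported as AuthorsName, which is the kind of stored-field name change the note warns about.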
dtSearch 7.51 (Build 7556) January 18, 2008
dtSearch Desktop
- Search results saved as XML include the selection state of the items in the search results list (i.e.,
checked or unchecked). Search results saved in other formats such as CSV can either include all items or
just selected items.
dtSearch Web
- Added 64-bit version of dtSearch Web and dtSearch Web Setup
Fixes and minor enhancements
- dtSearch.Spider2.dll: Added 64-bit version of the .NET Spider API
- dtSearchw.exe: Added status bar indicator for total hits and total number of files retrieved
- lbvprot.dll: Fixed error in standard CD type causing it to get stuck at an "Opening page..." message
- dten600.dll: Fixed file parser error causing formatting errors and extraneous text in MS Works documents
- dten600.dll: File parsers added for Lotus 123 and Quattro Pro
- dten600.dll: Fixed incorrect highlighting in search report generated from cached text when automatic CJK
word breaking enabled
- dten600.dll: Fixed bug in HTML file parser that could cause duplication of field values in stored fields
extracted from HTML meta tags
- dten600.dll: Fixed HTML hit highlighting bug that caused comment tags to appear inside the <TITLE> element
dtSearch 7.50b (Build 7518) November 25, 2007
Fixes and minor enhancements
- dten600.dll: Fixed SearchReportJob error causing more blocks of context than specified by MaxContextBlocks
to be included in the generated report.
- lbvprot.dll: Fixed error starting CGI applications
dtSearch 7.50 (Build 7517) November 9, 2007
Enhancements (All products)
- Improved integration with external language analyzers: (1) Language analyzers will be given much larger
chunks of text to analyze, which enables some language analyzers to operate more effectively. (2) Language
analyzers will be given consistently-sized chunks of text whether indexing or highlighting hits, which
ensures that hit highlighting will not be affected by changes in the behavior of a language analyzer
depending on the size of the data it receives.
dtSearch Desktop
- In the Edit > Copy File dialog box, added option to preserve original modification, creation, and
last access times of the original files
- In the Edit > Copy File dialog box, added option to copy the entire container file when a matching
document is inside a container (such as a ZIP file or email archive)
- Option in Options > Preferences > Indexing Options > Letters and Words to automatically
insert a word break around Chinese, Japanese, and Korean characters in text. This makes it possible for
documents that do not contain word breaks to be searched.
dtSearch Engine
- 64-bit version of the dtSearch Engine with C++ and .NET APIs.
- New dtsoTfAutoBreakCJK flag in Options.TextFlags to automatically insert a word break around Chinese,
Japanese, and Korean characters in text. This makes it possible for documents that do not contain word
breaks to be searched.
- ListIndexJob added to the Java API
- New dtsListIndexIncludeDocCount flag added to ListIndexFlags, to provide the document count for each word
listed
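The effect of the automatic CJK word break option (dtsoTfAutoBreakCJK) described above can be approximated with a short sketch. The character ranges and the function name are assumptions made for illustration; these notes do not say which ranges the engine treats as Chinese, Japanese, or Korean.

```python
import re

# Assumption: a small set of Unicode blocks stands in for "CJK characters";
# the ranges dtSearch actually uses are not stated in these notes.
CJK_CHAR = re.compile(
    "["
    "\u3040-\u30ff"  # Hiragana and Katakana
    "\u4e00-\u9fff"  # CJK Unified Ideographs
    "\uac00-\ud7a3"  # Hangul syllables
    "]"
)


def auto_break_cjk(text: str) -> str:
    """Surround each CJK character with spaces, then collapse repeated spaces."""
    spaced = CJK_CHAR.sub(lambda m: f" {m.group(0)} ", text)
    return " ".join(spaced.split())


print(auto_break_cjk("abc\u691c\u7d22def"))  # the two ideographs become separate words
```

Each CJK character becomes its own word, so a run of text with no spaces can still be matched one character at a time.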
Fixes and minor enhancements
- dten600.dll: Added dtsLaJob.pFileInfo to provide language analyzer with a dtsFileInfo describing the
document being processed
- dten600.dll: Added dtsLaJobInputIsFirstBlockInDocument value for dtsLaJob.flags to tell language analyzer
when a new document is starting
- dten600.dll: Fixed error formatting generated table of contents in .docx file
- dtIndexerw.exe: Fixed MAPI_E_CALL_FAILED error indexing Outlook data with Outlook 2007
- dten600.dll: Fixed crash indexing Word document with corrupt styles
- dten600.dll: Fixed PDF parsing error causing incorrect word break
- dten600.dll: Added support for non-Microsoft variant of Microsoft Searchable TIFF format
- dten600.dll: Fixed error generating search report when SearchReportJob specified neither an OutputFile nor
OutputToString (fixed in build 7517)
dtSearch 7.43 (Build 7476) September 16, 2007
Fixes and minor enhancements
- dten600.dll: Fixed error processing list formatting in some MS Word files that caused bullets to be
rendered as numbers.
- dten600.dll: Fixed errors formatting text in .docx files.
- dten600.dll: Fixed bug in PDF file parser affecting decoding of CID fonts in PDF files
- dten600.dll: Fixed error extracting item from TAR file to hit-highlight after search
- dten600.dll: Added detection of the following file types with missing or incorrect filename extensions:
Microsoft Word 2003 XML files, Microsoft Excel 2003 XML files.
- dtsjava.dll: Fixed error indexing using data source API under WebSphere
- dten600.dll: Fixed extra spacing in output when HTML converted to UTF-8 text
dtSearch 7.42 (Build 7467) July 31, 2007
Enhancements (All products)
- Added support for Microsoft Searchable TIFF (created by Microsoft Office Imaging), Microsoft Document
Imaging (*.mdi), Windows Metafile (*.wmf), and Enhanced Metafile (*.emf) formats
Enhancements (dtSearch Engine)
- Added flags to control recognition of ambiguous dates (new TextFlags values dtsoTfRecognizeDatesPresumeDMY,
dtsoTfRecognizeDatesPresumeYMD)
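To see why presumption flags such as these are useful, consider a date like 01/02/03, which is valid under several field orders. The sketch below uses only the Python standard library and is not tied to the dtSearch API; the MDY entry is included purely for comparison and says nothing about the engine's default.

```python
from datetime import date, datetime

# Field orders to compare; MDY is shown only for contrast (assumption, not a documented default).
ORDERS = {"MDY": "%m/%d/%y", "DMY": "%d/%m/%y", "YMD": "%y/%m/%d"}


def interpret(text: str, order: str) -> date:
    """Parse an ambiguous numeric date under the presumed field order."""
    return datetime.strptime(text, ORDERS[order]).date()


for order in ORDERS:
    print(order, interpret("01/02/03", order))
# MDY 2003-01-02
# DMY 2003-02-01
# YMD 2001-02-03
```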
Fixes and minor enhancements
- dtSearchNetApi2.dll: JobErrorInfo no longer requires or supports the IDisposable interface.
- dten600.dll: Filenames longer than 1024 characters could cause "duplicate filename" errors when verifying an
index.
- dtSearchw.exe: Fixed problem with mouse wheel scrolling in search results window with high-resolution mouse
wheels
- dten600.dll: Improved recognition of CJK encodings in HTML and PDF
- dten600.dll: Fixed error in title attribute for documents indexed using the COM implementation of the data
source API
- dten600.dll: Changed the behavior when a container document such as a ZIP file is removed from an index
using its doc id (using IndexJob.ActionRemoveListed). Instead of just removing the container, the container
and all contained items will be removed.
- dten600.dll: MS Word file parser displayed some internal field data (TC, TA, SEQ)
- dten600.dll: Added detection of .docx, .xlsx, .xps, and OpenOffice documents with missing or incorrect
filename extensions
dtSearch 7.41 (Build 7420) April 21, 2007
Enhancements (dtSearch Engine)
- Added support for automatically varying hit weights according to the field they occur in, through the new
SearchJob.FieldWeights setting. For more information, see the "Relevance" topic in the dtSearch Engine API
Reference.
- Added improved progress reporting during unindexed searches to the C++ API (see dtsSearchProgressInfo in the
dtSearch Engine API Reference) and the .NET 2.0 API (see ISearchStatusHandler2 in the .NET 2.0 API
Reference).
- Compatibility note for developers working with the .NET 2.0 API only: The DLL dependencies for
dtSearchNetApi2.dll have changed due to Visual Studio .NET 2005 Service Pack 1. Because
dtSearchNetApi2.dll is built with Service Pack 1, it requires the updated MFC and CRT DLLs that are included
with that version. Executing the vcredist_x86.exe included with Visual Studio .NET 2005 Service Pack 1
(dated December 2, 2006 or later) will install these components.
This issue does not affect any other dtSearch Engine API.
Fixes and minor enhancements
- dten600.dll: New fields added as properties of .eml files - CC, BCC, and Attachments (a list of the
filenames of all attachments).
- dtSearch .exe and .msi files digitally signed for better operation in Windows Vista
- dtSearchNetApi2.dll: Error in ConvertPath caused unnecessary refresh of virtual path mappings from the
metabase.
- dten600.dll: Minor improvements in the binary file detection and Unicode filtering algorithm for binary
files
- dten600.dll: Fixed bug in MHT file parser that caused hit highlighter to generate blank HTML page for some
MHT files
- dten600.dll: IndexCache object added to the COM interface.
- dtSearchw.exe: Fixed error starting indexer and dtSearch Web Setup under Windows Vista
dtSearch 7.40 (Build 7360) February 22, 2007
Enhancements (All products)
- Added automatic recognition of dates, email addresses, and credit card numbers in text. For more
information, see http://www.dtsearch.com/dateRecog.html
- Added support for Vista (XMP) metadata in .jpg and .tif images.
- Added support for PowerPoint 2007 (*.pptx).
- Added support for Vista XML Paper Specification (*.xps) documents.
Enhancements (dtSearch Engine)
- Added IndexCache object in the .NET 2.0 API, and dtsIndexCache object in the C++ API, to enable much faster
searching when a series of searches must be done against a small number of indexes. The IndexCache maintains
a thread-safe pool of open indexes that are available for searching during the lifetime of the cache. Using
the cache eliminates the need to open and close the index for each search
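The idea behind an index cache, keeping opened indexes in a thread-safe pool so repeated searches reuse them, can be sketched generically. The class, the opener callable, and the path below are all hypothetical and do not reflect the IndexCache or dtsIndexCache interfaces; see the dtSearch Engine API Reference for those.

```python
import threading
from collections import defaultdict


class IndexPool:
    """Hypothetical pool that reuses open index handles across searches."""

    def __init__(self, opener):
        self._opener = opener            # callable: path -> handle (hypothetical)
        self._idle = defaultdict(list)   # path -> handles not currently in use
        self._lock = threading.Lock()
        self.open_count = 0              # how many times an index was really opened

    def acquire(self, path):
        with self._lock:
            if self._idle[path]:
                return self._idle[path].pop()   # reuse an already-open handle
            self.open_count += 1
        return self._opener(path)               # open outside the lock

    def release(self, path, handle):
        with self._lock:
            self._idle[path].append(handle)


pool = IndexPool(opener=lambda path: {"path": path})
for _ in range(3):
    handle = pool.acquire("C:/indexes/docs")    # made-up path
    pool.release("C:/indexes/docs", handle)
print(pool.open_count)  # 1
```

Three consecutive searches against the same index cost one open rather than three, which is the saving the note describes.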
Enhancements (dtSearch Desktop)
- Added option in Options > Preferences > Spider Options to log the links found in each page the
Spider follows.
- Added option in Options > Preferences > Search Options to change the maximum number of words a
search request can match.
Fixes and minor enhancements
- dtSearch.exe: Fixed "Invalid Character" error displaying documents in report view after installing Internet
Explorer 7.
- dten600.dll: When serializing stored fields to XML, add a _ in front of any stored field names that begin
with a digit so the resulting XML remains syntactically correct.
- dten600.dll: In the C++ API, the pOnIndexWordFn callback was called with encoded field information in
addition to the text of the word, and if the called function did not preserve this field information intact,
field attributes could become invalid. To prevent this, in version 7.40 the field information is removed
before the callback so pOnIndexWordFn will not see or be able to affect field attributes.
- dten600.dll: Increased the maximum value for MaxWordsToRetrieve to 512k (from 256k)
- dten600.dll: When checking for available disk space in a folder, indexer did not check whether the folder
was mounted from a different physical drive.
- dtSearch Web: Added option to log BooleanConditions and FileConditions to search log.
- dtindexerw.exe: Fixed MAPI_E_UNKNOWN_FLAGS error indexing Outlook messages in Outlook 2002 (fixed in build
7360)
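The stored-field change in the 7.40 fixes above (adding a _ in front of field names that begin with a digit) follows from the XML rule that an element name cannot start with a digit. A minimal sketch, with a hypothetical helper name and made-up field names:

```python
import xml.etree.ElementTree as ET


def xml_safe_field_name(name: str) -> str:
    """Prefix names that start with a digit, since XML element names cannot."""
    return "_" + name if name[:1].isdigit() else name


def parses(xml_text: str) -> bool:
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = "2007Revenue"                    # made-up stored field name
safe = xml_safe_field_name(raw)
print(parses(f"<{raw}>1</{raw}>"))     # False
print(parses(f"<{safe}>1</{safe}>"))   # True
```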
dtSearch 7.30 (Build 7320) September 30, 2006
Enhancements (All products)
- Added preliminary support for Word 2007 (*.docx) and Excel 2007 (*.xlsx) based on the current Office 2007
beta and available documentation.
- Added support for JPG and TIFF metadata, including EXIF and IPTC fields.
- Unicode filtering file parser can handle individual documents larger than 2 GB, and support for files larger
than 2 GB added to the extext.exe utility
- Improved handling of partially inaccessible email files. In previous versions, if an email had encrypted or
corrupt data (for example, an encrypted attachment), the whole email was reported as encrypted or corrupt.
In this version, the readable portion of the message is indexed and the unreadable portion is separately
reported as a partially encrypted or partially unreadable file. This change applies to Outlook messages,
TNEF files, .eml files, MBOX archives, and .msg files.
Enhancements (dtSearch Engine)
- Beta x64 (64-bit) versions of the dtSearch Indexer and dtSearch Engine (dtIndexer64.exe, dtengine64.dll, and
dtSearchNetApi2.dll). The index format and APIs (C++, COM, and .NET) are identical to the 32-bit version. The
64-bit components are in a separate download file (dtSearch64_730.exe) with the same installation password
as the dtSearch Engine SDK.
- Added alternative PDF highlighting mechanism for client-based applications (see "Highlighting Hits in PDF
files" in the API Overviews section for details)
- Added ListIndexJob object to the .NET 2.0 API to list files, words, or fields in an index (see
dtSearchNetApi2.chm for API reference)
- Added dtsListIndexIncludeDocId flag for dtsListIndexJob and ListIndexJob to provide a quick way to list all
documents in an index and the doc id for each document
- C++ API Changes to support 64-bit file sizes in dtsInputStream (added size64 and seek64),
dtsInputStreamReader, dtsFileInfo (added size64), dtsSearchResultsItem (added size64). These changes
preserve binary compatibility for the dtSearch Engine DLL, but some C++ code may trigger new warnings when
compiled because of 64-bit values returned.
- Added dtsIndexKeepExistingDocIds flag to specify that, when compressing an index, the indexer should not
remap document ids, so document ids will be unmodified in the index once compression is done.
Fixes and minor enhancements
- dtWebSetup.exe: Fixed bug causing "Build Search Form" tool to create extra button bar when overwriting an
existing form
- dten600.dll: Fixed 'out of memory' error verifying very large index
- lbviewer.exe: Fixed bug causing XML to appear incorrectly when displayed using a stylesheet
- dten600.dll: PowerPoint file parser - added support for embedded OLE objects
- dten600.dll: PDF file parser detects and handles case where text in right-to-left languages (Hebrew or
Arabic) is stored backwards (left-to-right) in a PDF file, and automatically inverts the characters in the
word so it will be correctly searchable
- dten600.dll: PDF file parser handles invalid PDF files created by OCR product that leaves out required
/Pages and /Page tags in PDF structure
- dtSearchNetApi2.dll: JobErrorInfo object did not implement IDisposable interface, preventing deterministic
release of allocated resources.
dtSearch 7.25 (Build 7285) June 25, 2006
Fixes and minor enhancements
- dtWebSetup.exe: Fixed bug causing dtSearch Web Setup to fail to run on some Windows 2003 Server systems
- dten600.dll: Added Fragmentation, ObsoleteCount, and IndexFlags to the COM IndexJob.GetIndexInfo() and Java
IndexJob.getIndexInfo() methods. Also, indexing dates are now reported as a date and time.
dtSearch 7.24 (Build 7245) June 7, 2006
Enhancements (dtSearch Engine)
- Added support for indexing and searching TNEF (Transport Neutral Encapsulation Format) files
- dtSearch.Spider .NET API has new Authentication, FormAuthentication, and ProxyInfo properties
Fixes and minor enhancements
- dten600.dll: EML parser - fixed bug indexing messages with no message body
- dten600.dll: Excel file parsing bug caused indexer to hang on corrupt Excel file
- mapitool.exe: Added workaround for MAPI error MAPI_E_NOT_ENOUGH_MEMORY when saving a message that has a very
large number of recipients (see http://support.microsoft.com/kb/171907).
- dten600.dll: RTF parser - fixed bug parsing headers in RTF files
- dten600.dll: Improved parsing of Ole10Native streams in OLE Storage files
- C++ Source code samples reorganized into vc6, vc7, and vc8 folders
- dten600.dll: Fixed PDF hit-highlighting bug that caused highlighting to fail to appear in some documents
dtSearch 7.23 (Build 7241) May 8, 2006
Enhancements (dtSearch Desktop)
- Added new Indexing Resources preferences page
- Added pause button to Update Index dialog box
Enhancements (dtSearch Engine)
- Added ASP.NET 2.0 sample applications in VB.NET and C# in C:\Program Files\dtSearch
Developer\examples\asp.net2
- Java API: empty() method added to SearchResults, and clear() method added to SearchReportJob and SearchJob
to force deterministic release of memory allocated for SearchResults
Fixes
- dten600.dll: PowerPoint file parsing bug caused incorrect character formatting
- dten600.dll: Word file parsing bug caused right-to-left text to be left-aligned instead of right-aligned by
default.
- dten600.dll: EML parser - improved detection and parsing of malformed .eml files.
- mapitool.exe: Fixed error converting UTC date when saving .msg files to disk
- dten600.dll: Fixed crash indexing corrupt .mp3 file
- libdten600.so (Linux): Fixed error parsing zipped files with accented characters in the filename
- dtsearch.exe: Fixed error detecting Ctrl+C keyboard shortcut
- dten600.dll: Excel file parser bug caused some valid XLS files to be incorrectly reported as corrupt
dtSearch 7.22 (Build 7217) March 14, 2006
Enhancements (dtSearch Engine)
- .NET 2.0 API for Visual Studio .NET 2005. The .NET 2.0 API wrapper is dtSearchNetApi2.dll, and the .NET 2.0
version of the Spider API is dtSearch.Spider2.dll. The API is identical to the .NET 1.1 API. For sample
code, see the examples\cs2 and examples\vb.net2 folders.
- Added dtsSortByFullName sort flag
- Added docByteRead64, bytesRead64, bytesToIndex64 to dtsIndexProgressInfo
- Added Options.StoredFieldDelimiterChar, which provides a way to specify a delimiter between multiple
instances of a stored field in a single document
Enhancements (dtSearch Desktop)
- Added option to dtSearch to view PDF files as plain text instead of using Adobe Reader (Options >
Preferences > External Viewers).
Fixes
- dten600.dll: Fixed indexer bug that caused index to be reported as having "Illegal ref ptr" error when
verified
- dten600.dll: Fixed Excel file parser bugs affecting formatting of date and time values
- dten600.dll: Fixed indexing crash when indexing very large .gz files
- dten600.dll: Minor improvements to word break detection in PDF files
- dten600.dll: Fixed RTF file parser bug that could cause indexing crash on corrupt RTF file
- dten600.dll: Excel file parser defaults to 10 digits of precision for numbers without a specified format
(consistent with Excel).
- dten600.dll: Minor improvements to Unicode filtering algorithm.
dtSearch 7.21 (Build 7164) January 23, 2006
Enhancements (All products)
- IFilter support to enable dtSearch to parse document types such as Microsoft OneNote and AutoCAD that
include IFilters.
IFilters are components that enable various Microsoft search
products, such as Microsoft Index Server, to extract text from documents. For example, when you install
Microsoft OneNote, an IFilter is installed to enable searching of *.one files. To tell dtSearch to use
installed IFilters to process some of your files, set up a rule in Options > Preferences > File Types
and under File type, select "IFilter". In dtSearch Engine applications, use the FileTypeTableFile to specify
the filename patterns to use with IFilters. The IFilter adapter only works on systems with the Microsoft
component query.dll installed.
For information on products that include query.dll, see
http://support.microsoft.com/dllhelp
Fixes
- dten600.dll: Fixed bug that prevented some items in ZIP files from being displayed after a search (an
"unable to access input file" message would appear instead).
- dten600.dll: Fixed bug in file parsers for Microsoft Office documents (PowerPoint, Excel, and Word) that
could cause dtSearch to crash attempting to index corrupt documents
- dten600.dll: Fixed bug in PDF file parser that caused "Bad xref" error on some PDF files created with PDF
1.5-only compatibility
- dtsearch.h: unnamed unions removed from the dtsMessage structure. This will not affect binary compatibility
but may require source code changes in C++ code that accessed undocumented union members. Because the
removed union members were undocumented, this change should affect very few programs.
dtSearch 7.20 (Build 7136) December 6, 2005
Enhancements (All products)
- New file parsers for OpenOffice documents, spreadsheets, and presentations (*.sxw, *.sxc, *.odt, *.ods,
etc.), covering OpenOffice version 1 and OpenOffice version 2 (the "Open Document Format for Office
Applications")
- New file parsers for the Microsoft Office XML formats (Microsoft Word 2003 XML and Microsoft Excel 2003 XML)
Enhancements (dtSearch Desktop)
- Added "Opening containing folder" in right-click menu for retrieved items
- Improved reporting of errors that occur when copying files in Edit > Copy File(s)
- dtindexer.exe: added /caf and /cat command-line options to cache text (/cat) or cache original files (/cad)
when creating indexes using the command line, and /recog to recognize an index.
- Added Help > Check For Updates feature to automatically download new versions
Enhancements (dtSearch Engine)
- dtSearch.Spider.dll component provides a .NET API for the dtSearch Spider. For API documentation, see
dtSearchNetApi.chm. For sample code, see
C:\Program Files\dtSearch Developer\examples\cs\SpiderDemo.
- New xfilter search type, "ext", to search only on the filename extension (dot required). Examples:
xfilter(ext ".doc") matches files with a .doc extension;
xfilter(ext "~.doc") matches files without a .doc extension;
xfilter(ext ".") matches files with no extension.
This search feature will only work with documents that were indexed with dtSearch 7.2 or later.
- SearchReport supports %%FirstHit%% macro in ContextHeader to indicate the word offset of the first hit in
the context block
- dtsIndexCacheTextWithoutFields flag added to IndexingFlags. This flag makes it possible to cache text (for
generation of a synopsis to include in search results) without including any of the fields added using the
data source API.
- dtsErAccCachedDoc flag added to ErrorCodes. This error code indicates that a document could not be extracted
from the document cache in an index (this usually means that the index was created without caching enabled)
- dtsConvertJustDetectType flag added to ConvertFlags, to have FileConverter or DFileConvertJob just detect
the file format of a document. The format is returned in FileConverter.DetectedTypeId.
- dtSearch Engine for Linux updated to the dtSearch 7.2 code base; multithreading support added
- dtsReportIncludeFileStart flag added to ReportFlags. This flag causes a block of text from the beginning of
the document to be included in the generated search report.
- A new search feature makes it possible to restrict a search to the text of documents (excluding any
metadata). To search for text that is not in any field, search for //text contains (search request).
Example:
(//text contains apple) and (author contains smith)
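The three xfilter(ext ...) forms listed earlier in this section can be emulated outside the engine to check which filenames each form should select. This is a sketch of the documented behavior only, not dtSearch code; the function name is hypothetical and case-insensitive comparison is an assumption.

```python
import os


def matches_ext(filename: str, pattern: str) -> bool:
    """Emulate the three documented forms: ".doc", "~.doc", and "."."""
    # Assumption: extensions are compared without regard to case.
    ext = os.path.splitext(filename)[1].lower()  # "" when there is no extension
    if pattern.startswith("~"):
        return ext != pattern[1:].lower()        # "~.doc": anything except .doc
    if pattern == ".":
        return ext == ""                         # ".": no extension at all
    return ext == pattern.lower()                # ".doc": exactly this extension


for name in ("report.doc", "notes.txt", "README"):
    print(name, matches_ext(name, ".doc"), matches_ext(name, "~.doc"), matches_ext(name, "."))
# report.doc True False False
# notes.txt False True False
# README False True True
```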
Fixes
- DynaZip unzip component (dunzip32.dll) updated to new version that eliminates buffer overrun vulnerability
in earlier versions.
- dtSearch.exe: Installing Adobe Acrobat 7.05 update caused hit highlighting to stop working in PDF files
- dten600.dll: Reduced amount of memory needed to parse very large Word, Excel, and Outlook items
- dten600.dll: Fixed file parsing error in Word documents that caused bullets to be rendered as auto-numbered
lists
- dten600.dll: Changed handling of CSV files that do not have a header that lists field names; these files are
now handled as plain text, since no field information is available for them.
- Spider: Fixed bug in which links with // in the name (http://www.example.com//default.html) caused index to
be reported as corrupt by Verify Index
- dtSearch.exe: MAPI profile id, entry id, and store id were displayed in search results list
- dtSearch Web: WebSearchForm.js incorrectly handled blank filename filter
- Spider: Did not match port number against filename filters for port numbers other than 80
- dtSearch.exe: Displayed MAPI entry id and store id of Outlook messages in search results.
dtSearch 7.10 (Build 7045) August 8, 2005
Enhancements (dtSearch Engine)
- Added two new ASP.NET samples, one in VB.NET and one in C#, that demonstrate a search interface using a grid
control for search results. The new samples are installed to C:\Program Files\dtSearch
Developer\examples\asp.net. Please see the readme file in the project folders before trying to open them in
Visual Studio -- a virtual directory mapping for C:\Program Files\dtSearch Developer\examples\asp.net has to
be created first or Visual Studio will not be able to open the project.
- GetNthWordDocCount added to WordListBuilder to get the number of documents a word occurs in
- SearchReportJob enhancements: Added ContextSeparator; itUnformattedHTML output format, for easier generation
of a synopsis; faster generation of search report when search results cover multiple indexes;
dtsReportLimitContiguousContext flag to prevent very large synopsis when there are many hits close together.
- In the OnFound callback notification in the C++ and .NET interfaces, an application can veto individual
items to prevent them from being included in search results. See SearchResultsItem.VetoThisItem (.NET) and
DSearchJob::VetoThisItem (C++ Support Classes).
- dtSearchNetApi.dll uses registry type library information and delay loading to eliminate the need for
dten600.dll to reside on the system PATH in ASP.NET applications.
- New TextFlags option to suppress automatic generation of xfirstword and xlastword (dtsoTfSkipXFirstAndLast)
- Options.MaxFieldNesting setting to limit the permissible depth of field nesting
- .NET API objects implement Dispose() for more deterministic release of allocated resources.
- .NET IndexJob.ExecuteInThread and SearchJob.ExecuteInThread use .NET thread pool instead of creating a
thread.
- dtSearch Engine for Linux updated to the dtSearch 7.1 code base
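
The veto mechanism added to the OnFound callback can be pictured with the generic sketch below. It is not dtSearch code and does not use the dtSearch API: FoundItem, run_search, and skip_temp_files are hypothetical names invented for this illustration. The sketch only shows the general pattern in which a callback marks an item and the search loop then leaves it out of the results.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class FoundItem:
    """Hypothetical stand-in for an item reported to an on-found callback."""
    filename: str
    hit_count: int
    vetoed: bool = False

    def veto(self) -> None:
        # Mark this item so the search loop leaves it out of the results.
        self.vetoed = True


def run_search(candidates: Iterable[FoundItem],
               on_found: Callable[[FoundItem], None]) -> List[FoundItem]:
    """Call on_found for each candidate and keep the ones not vetoed."""
    results = []
    for item in candidates:
        on_found(item)
        if not item.vetoed:
            results.append(item)
    return results


def skip_temp_files(item: FoundItem) -> None:
    # Example policy (an assumption): drop temporary files from the results.
    if item.filename.endswith(".tmp"):
        item.veto()


candidates = [
    FoundItem("report.doc", 3),
    FoundItem("scratch.tmp", 7),
    FoundItem("notes.txt", 1),
]
kept = run_search(candidates, skip_temp_files)
print([item.filename for item in kept])
```

For the actual calls, see SearchResultsItem.VetoThisItem (.NET) and DSearchJob::VetoThisItem (C++ Support Classes) in the API documentation.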
Enhancements (dtSearch Publish)
- "Standard" CDs (which use lbview.exe) can launch non-CGI programs from the CGI-BIN folder using a new URL
syntax. See the "Standard CDs" help topic in dtSearch_Web.chm for details.
- "Standard" CDs can highlight hits in PDF files with Adobe Reader 6 or later (formerly Adobe Reader 7 was
required).
Fixes
- dten600.dll: Reduced amount of stack required to process very long xfilter expressions
- dten600.dll: Fixed index merge bug that could cause a corrupt index when merging into a large index without
the "clear target" flag set
- dten600.dll: Bug caused MakePdfWebHighlightFile to return a blank string after unindexed search
- dten600.dll: The default value of Options.MatchDigitChar has changed from blank (disabled) to '=', to be
consistent with the behavior of dtSearch Desktop.
- dten600.dll: Bug caused unindexed search of HTML field defined using comment tags to find the same search
term outside of the field
dtSearch 7.01 (Build 7025) June 14, 2005
Enhancements (dtSearch Web)
- Generated search form has more flexible stylesheet references
- File parsers generate HTML output using "em" units for font sizes instead of points, which allows font sizes
to scale up or down in Internet Explorer
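
As a rough illustration of why "em" units scale where points do not: an em value is relative to the surrounding font size, so it can be derived from a point size once a base size is chosen. The sketch below assumes a 12-point base, which is an assumption for this example and not a documented dtSearch setting. pt_to_em is a hypothetical helper, not part of any dtSearch product.

```python
def pt_to_em(points: float, base_points: float = 12.0) -> str:
    """Express a point size as an em value relative to base_points.

    The 12-point default is an assumption made for this example only.
    """
    if base_points <= 0:
        raise ValueError("base_points must be positive")
    return f"{points / base_points:g}em"


for size in (9, 12, 18):
    print(f"{size}pt -> {pt_to_em(size)}")
```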
Enhancements (dtSearch Publish)
- Added "Recognize CD" function to use the CD Wizard to modify a CD that was created on a different computer
Fixes
- dten600.dll: MS Word file parser caused a word break when MS Word inserted redundant font changes within a
word
- lbview.exe: Error opening PDF file with URL-encoded apostrophe in filename or path
- dten600.dll: PowerPoint file parser error parsing slide without outline entry text
- dtSearchNetApi.dll: SearchResultsItem did not include modified date or type id
- dten600.dll: SearchResults did not read HitsByWord when serializing from XML
- dtSearch Publish: PDF files did not highlight hits in Adobe Reader 7 in some systems with unpatched versions
of IE components.
- dtIndexer.exe: Default setting for IndexAutoCommitIntervalMB forced large index updates to commit too
frequently, making indexing slower.
dtSearch 7.00 (Build 7008) May 18, 2005
Enhancements (All products)
- High-capacity index format released, with support for over 1 terabyte of data per index.
dtSearch 7 can update and search indexes created with dtSearch 6.
To upgrade an index to the version 7 format using dtSearch Desktop: (1) click Index > Update
Index..., (2) check the box to "Upgrade index to version 7 format", (3) click "Start Indexing".
- New variable field weighting search option. Example: "(Description:5 contains (apple and pear)) or (author:2
contains smith)"
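
A weighted request like the one in the example above can be assembled programmatically. The following is a minimal sketch that only builds the query string in the syntax shown in these notes. The helper names weighted_field_clause and any_of are hypothetical and are not part of the dtSearch API, and the resulting string would still have to be passed to whatever search interface the application uses.

```python
def weighted_field_clause(field: str, weight: int, expression: str) -> str:
    """Build one '(field:weight contains expression)' clause.

    Multi-word expressions are wrapped in parentheses, mirroring the
    example in the release note; single words are left bare.
    """
    expr = f"({expression})" if " " in expression.strip() else expression.strip()
    return f"({field}:{weight} contains {expr})"


def any_of(*clauses: str) -> str:
    """Join clauses with 'or' so that a match in any field counts."""
    return " or ".join(clauses)


query = any_of(
    weighted_field_clause("Description", 5, "apple and pear"),
    weighted_field_clause("author", 2, "smith"),
)
print(query)
```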
Enhancements (dtSearch Desktop)
- Added Spider option to pause between page downloads, to reduce the impact of a crawl on the server
- PDF files open faster in Adobe Reader if Adobe Reader 7 is installed
Enhancements (dtSearch Engine)
- New API documentation. See dtSearchApiRef.chm (overviews, C++, and COM interface), dtSearchJavaApi.chm (Java
interface), and dtSearchNetApi.chm (.NET interface)
Enhancements (dtSearch Publish)
- A new file-based CD interface has been added that does not rely on HTTP.
- The CD Wizard has been simplified
Fixes
- dtSearchNetApi.dll: repeated instances of first element returned in HitsByWord array
- dten600.dll: Hidden stored fields (fields with names prefixed by **) were stored with the ** mark in front
of the field name, causing serialized XML search results to have incorrect XML syntax
- dten600.dll: Check for available disk space did not handle volume sizes larger than 2 Terabytes
- mapitool.exe: several minor bug fixes
- dten600.dll: MS Word file parser did not handle FORMTEXT fields
- dtSearch.exe: fixed 800a025e error from Internet Explorer when dtSearch tries to select hidden text to
highlight from a Word document
- dten600.dll: Fixed PDF file parser error counting words in pages with annotations, causing incorrect
highlighting
- dten600.dll: Fixed hit highlighting error in MIME files
- dtv_odbc.dll: Fixed bug handling Unicode data in Access fields
- dten600.dll: Added current option settings in effect to history.ix entry for each update
- dtisapi6.dll: URLs were not truncated using MaxUrlSize