Text Analytics

Introduction

Despite the billions being invested in information technology, many organizations are still failing to fully realize the latent potential of their information.

Given that some 80% of all data generated inside enterprises today is unstructured (e.g. documents, emails, instant messages, social media, SharePoint), effective decision-making requires organizations to collect, manage, extract and analyze large volumes or streams of data from multiple sources - quickly, accurately and efficiently.

Business leaders themselves have highlighted the value of doing this well: improved business effectiveness, reduced costs and a better, faster understanding of customers, products, risks and marketplace trends. They also recognize the risks of doing it badly: higher operational costs, reduced business confidence, customer leakage and financial losses.

The issue with many existing (relational) database management systems is that they cannot fully process information that is not already in a text-based format (e.g. the content held inside PDFs).

Customer Requirements

  • An integrated and accessible view of all enterprise content - across multiple applications, repositories, interfaces, protocols and terminologies.
  • The ability to extract all of the relevant content from any kind of document or file.
  • The ability to rapidly and efficiently sort, index, cull and analyze content, and to make use of the intelligence generated.
  • Identification of relevant trends and patterns in the information, as well as any gaps.
  • Complete confidence in the relevancy and accuracy of the results generated.

The Solution

ISYS Document Filters enables database systems to operate as a fully-fledged enterprise search and extraction tool that can handle any kind of unstructured content.

  • Supports 400+ file, email, archive and container formats (e.g. Word, Excel, PowerPoint, PDF, WordPerfect, ZIPs, MSGs).
  • Unique ‘Deep Inspection’ technology reveals and extracts even previously hidden content, including Tracked Changes, Comments, Notes, Annotations and Embedded Web Links.
  • Comprehensive platform support - Windows, Linux, Solaris, Mac OS, HP-UX and AIX.
  • Includes C-style and object-oriented interfaces, with examples in C++, Java, C# and VB.NET.
  • Identifies and extracts built-in and custom metadata, along with extraction of body text and sub-documents.
  • Users can view the content in its original form with high-definition rendering and printing.
  • Handles all character sets and encodings, including Unicode.
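
To make the idea of extracting body text, metadata and sub-documents from a container format more concrete, the following is a minimal sketch in Python. It does not use the ISYS Document Filters API; it relies only on the Python standard library, handles ZIP archives alone, and treats every leaf file as UTF-8 text as a stand-in for real format-specific extraction. The names `make_zip` and `walk_container` are hypothetical and exist only for this illustration.

```python
import io
import zipfile


def make_zip(files):
    """Build an in-memory ZIP archive from a {name: bytes} mapping (sample data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def walk_container(data, path=""):
    """Yield (path, metadata, text) for every leaf document inside a ZIP.

    Nested ZIPs are treated as sub-containers and walked recursively.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            payload = archive.read(info)
            full_path = f"{path}/{info.filename}" if path else info.filename
            if zipfile.is_zipfile(io.BytesIO(payload)):
                # A container inside a container: descend into it.
                yield from walk_container(payload, full_path)
            else:
                metadata = {"size": info.file_size, "modified": info.date_time}
                # Assumption: leaf files are plain UTF-8 text. A real filter
                # would detect the format and extract text accordingly.
                text = payload.decode("utf-8", errors="replace")
                yield full_path, metadata, text


# Sample data: an archive that contains a text file and a nested archive.
inner = make_zip({"notes.txt": b"Quarterly risk review"})
outer = make_zip({"readme.txt": b"Customer feedback summary", "attachments.zip": inner})

extracted = list(walk_container(outer))
for doc_path, meta, body in extracted:
    print(doc_path, meta["size"], body)
```

Each extracted item carries its path within the container, so text found in a nested attachment can still be traced back to the file it came from.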

 

Partner Example

Sybase IQ is a relational database software system that is used for business intelligence and data warehousing across standard hardware and operating systems.

Through the integration of ISYS Document Filters, Sybase IQ UDA (Unstructured Data Analytics) provides the means to store and retrieve unstructured data objects (e.g. maps, images, documents, XML files) as part of the same repository as transactional or analytical data. Sybase IQ UDA also enables organizations to tap into unstructured data in new ways through full text search and analysis.

The result is a solution that brings the full power of an enterprise-class database management system to the management and analysis of unstructured data, with users able to access both types of data using the same application and one interface.
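
As a rough illustration of keeping unstructured documents in the same repository as structured data and querying both through one interface, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in and is not Sybase IQ or its UDA option; the schema, the simple term index, the sample rows and the helpers `store_document` and `customers_mentioning` are all assumptions made for this example.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        content BLOB  -- original document bytes, stored as-is
    );
    CREATE TABLE doc_terms (term TEXT, doc_id INTEGER REFERENCES documents(id));
    CREATE INDEX idx_terms ON doc_terms(term);
    """
)


def store_document(customer_id, content):
    """Store the raw bytes and index the terms of the extracted text."""
    cursor = conn.execute(
        "INSERT INTO documents (customer_id, content) VALUES (?, ?)",
        (customer_id, content),
    )
    doc_id = cursor.lastrowid
    # Stand-in for a real extraction step: treat the bytes as UTF-8 text.
    text = content.decode("utf-8", errors="replace").lower()
    terms = set(re.findall(r"[a-z0-9]+", text))
    conn.executemany(
        "INSERT INTO doc_terms (term, doc_id) VALUES (?, ?)",
        [(term, doc_id) for term in terms],
    )
    return doc_id


# Illustrative sample data only.
conn.executemany(
    "INSERT INTO customers (id, name, region) VALUES (?, ?, ?)",
    [(1, "Acme", "EMEA"), (2, "Globex", "APAC")],
)
store_document(1, b"Customer requested a refund after delayed delivery.")
store_document(2, b"Renewal confirmed for another year.")


def customers_mentioning(term, region):
    """Combine a structured filter (region) with a text search (term)."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.name
        FROM customers AS c
        JOIN documents AS d ON d.customer_id = c.id
        JOIN doc_terms AS t ON t.doc_id = d.id
        WHERE t.term = ? AND c.region = ?
        ORDER BY c.name
        """,
        (term.lower(), region),
    )
    return [name for (name,) in rows]


print(customers_mentioning("refund", "EMEA"))
```

The point of the sketch is that a single query can filter on a structured column and on document text at the same time. A production system would replace the exact-match term table with a proper full-text index.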

The diagram below gives an overview of the Sybase IQ UDA product, incorporating ISYS Document Filters.

[Diagram: ISYS Text Analytics]

For more information on how ISYS can drive Text Analytics solutions, download the SAP/Sybase IQ case example.