Description
The Tika Input Transformer is the default input transformer responsible for translating Microsoft Word, Microsoft Excel, Microsoft PowerPoint, OpenOffice Writer, and PDF documents into a Catalog Metacard. It uses Apache Tika to provide basic support for these mime types. As a result, it extracts only the metadata that is common across all of these document types, e.g., creation date, author, and last modified date. Its main purpose is to allow these types of content to be ingested into the DDF Content Repository and the Metadata Catalog even though no DDMS metadata is generated.
The Tika Input Transformer is given a service ranking (priority) of -1 so that it is guaranteed to be the last Input Transformer that is invoked. This allows any registered Input Transformer that is more specific to one of these document types to be invoked instead of this rudimentary default input transformer.
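The effect of the ranking described above can be illustrated with a minimal sketch. It assumes that the framework prefers the registered transformer with the highest service ranking (the usual OSGi convention) and that other transformers keep the OSGi default ranking of 0. The classes `RankedTransformer` and `RankingSketch` are hypothetical stand-ins written for this example; they are not DDF or OSGi types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a registered Input Transformer; not a DDF type.
final class RankedTransformer {
    final String id;
    final String mimeType;
    final int serviceRanking;

    RankedTransformer(String id, String mimeType, int serviceRanking) {
        this.id = id;
        this.mimeType = mimeType;
        this.serviceRanking = serviceRanking;
    }
}

final class RankingSketch {
    // Returns the id of the highest-ranked transformer registered for the
    // given mime type, or an empty Optional if none is registered.
    static Optional<String> select(List<RankedTransformer> registered, String mimeType) {
        return registered.stream()
                .filter(t -> t.mimeType.equals(mimeType))
                .max(Comparator.comparingInt(t -> t.serviceRanking))
                .map(t -> t.id);
    }

    public static void main(String[] args) {
        List<RankedTransformer> registered = new ArrayList<>();
        // The default transformer, registered with ranking -1.
        registered.add(new RankedTransformer("tika", "application/pdf", -1));
        // A more specific transformer, assumed to use the default ranking of 0.
        registered.add(new RankedTransformer("custom-pdf", "application/pdf", 0));

        // prints "custom-pdf"
        System.out.println(select(registered, "application/pdf").orElse("none"));
    }
}
```

If the more specific transformer is not registered, the same lookup falls back to the entry ranked -1, which is the behavior the ranking is meant to provide.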
Usage
Use the Tika Input Transformer when it is important to ingest Microsoft, OpenOffice, or PDF documents into the DDF Content Repository and/or the Metadata Catalog, even if the generated metacard contains no DDMS metadata.
Installation and Uninstallation
Install the catalog-transformer-tika feature using the Web Console or System Console. This feature is uninstalled by default.
Configuration
None
Service Properties
Key | Value |
---|---|
mime-type | application/pdf, application/vnd.openxmlformats-officedocument.presentationml.presentation |
shortname | |
id | |
title | Tika Input Transformer |
description | Default Input Transformer for all mime types. |
service.ranking | -1 |
Implementation Details
This Input Transformer maps the metadata common across all mime types to applicable Metacard Attributes in the default MetacardType.
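The mapping described above can be pictured with a small sketch. The source keys (`dc:title`, `dcterms:created`, `dcterms:modified`) and the target attribute names (`title`, `created`, `modified`) are assumptions chosen for illustration; this page does not list the actual keys or attributes the transformer uses. The class `CommonMetadataMappingSketch` is hypothetical and uses plain maps rather than Tika or DDF types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CommonMetadataMappingSketch {
    // Assumed extracted-metadata keys mapped to assumed Metacard attribute
    // names. Both sides are illustrative, not the transformer's real table.
    private static final Map<String, String> KEY_TO_ATTRIBUTE = new LinkedHashMap<>();

    static {
        KEY_TO_ATTRIBUTE.put("dc:title", "title");
        KEY_TO_ATTRIBUTE.put("dcterms:created", "created");
        KEY_TO_ATTRIBUTE.put("dcterms:modified", "modified");
    }

    // Copies only the common fields into the attribute map. Keys without a
    // mapping (format-specific metadata) and missing values are skipped.
    static Map<String, String> toAttributes(Map<String, String> extracted) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : KEY_TO_ATTRIBUTE.entrySet()) {
            String value = extracted.get(mapping.getKey());
            if (value != null) {
                attributes.put(mapping.getValue(), value);
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("dc:title", "Quarterly Report");
        extracted.put("dcterms:created", "2014-01-15T10:00:00Z");
        extracted.put("xmpTPg:NPages", "12"); // format-specific, not mapped

        // prints {title=Quarterly Report, created=2014-01-15T10:00:00Z}
        System.out.println(toAttributes(extracted));
    }
}
```

Format-specific fields such as the page count are dropped in this sketch, which mirrors the point that only metadata common to all supported document types reaches the metacard.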
DDMS to Metacard Mapping
N/A
Known Issues
- The Tika Input Transformer does not create DDMS metadata. It only populates the "metadata about metadata" fields, such as Creation Date and Modified Date.