Tika Input Transformer

Description

The Tika Input Transformer is the default input transformer for translating Microsoft Word, Microsoft Excel, Microsoft PowerPoint, OpenOffice Writer, and PDF documents into a Catalog Metacard. It uses Apache Tika to provide basic support for these MIME types. The metadata it extracts is therefore limited to what is common across all of these document types, e.g., creation date, author, and last modified date. Its main purpose is to allow these types of content to be ingested into the DDF Content Repository and the Metadata Catalog even though no DDMS metadata is generated.

The Tika Input Transformer is given a service ranking (priority) of -1, which places it below any Input Transformer registered with the default ranking of 0, so it is the last Input Transformer to be invoked. Any registered Input Transformer that is more specific to one of these document types is therefore invoked instead of this rudimentary default input transformer.
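The effect of the -1 ranking can be illustrated with a minimal sketch. This is not DDF's or the OSGi framework's actual lookup code; the `RankingSketch` class, the `Registration` type, and the `custom-pdf` transformer are hypothetical names used only to show how the highest-ranked transformer for a MIME type would be chosen.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of ranking-based selection; not DDF's actual lookup code.
class RankingSketch {

    // One registered transformer: a name, the MIME type it handles, and its service ranking.
    static final class Registration {
        final String name;
        final String mimeType;
        final int ranking;

        Registration(String name, String mimeType, int ranking) {
            this.name = name;
            this.mimeType = mimeType;
            this.ranking = ranking;
        }
    }

    // Returns the registration with the highest ranking for the given MIME type, if any.
    static Optional<Registration> select(List<Registration> registry, String mimeType) {
        return registry.stream()
                .filter(r -> r.mimeType.equals(mimeType))
                .max(Comparator.comparingInt(r -> r.ranking));
    }

    public static void main(String[] args) {
        List<Registration> registry = List.of(
                new Registration("tika", "application/pdf", -1),
                // Hypothetical, more specific transformer with the default ranking of 0.
                new Registration("custom-pdf", "application/pdf", 0),
                new Registration("tika", "application/vnd.oasis.opendocument.text", -1));

        // The more specific transformer wins for PDF; Tika is used where nothing else is registered.
        System.out.println(select(registry, "application/pdf").map(r -> r.name).orElse("none"));
        System.out.println(select(registry, "application/vnd.oasis.opendocument.text")
                .map(r -> r.name).orElse("none"));
    }
}
```

The sketch does not model how ties between equal rankings are broken, so it should be read only as an illustration of why a ranking of -1 makes the Tika Input Transformer a fallback.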

Usage

Use the Tika Input Transformer when it is important to ingest Microsoft, OpenOffice, or PDF documents into the DDF Content Repository and/or the Metadata Catalog, even though the generated metacard contains no DDMS metadata.

Installation and Uninstallation

Install the catalog-transformer-tika feature using the Web Console or System Console. This feature is not installed by default.

Configuration

None

Service Properties

Key              Value
mime-type        application/pdf
                 application/vnd.openxmlformats-officedocument.wordprocessingml.document
                 application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
                 application/vnd.openxmlformats-officedocument.presentationml.presentation
                 application/vnd.ms-powerpoint.presentation.macroenabled.12
                 application/vnd.ms-powerpoint.slideshow.macroenabled.12
                 application/vnd.openxmlformats-officedocument.presentationml.slideshow
                 application/vnd.ms-powerpoint.template.macroenabled.12
                 application/vnd.oasis.opendocument.text
shortname
id
title            Tika Input Transformer
description      Default Input Transformer for all mime types.
service.ranking  -1

Implementation Details

This Input Transformer maps the metadata common across all MIME types to the applicable Metacard Attributes in the default MetacardType.
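A minimal sketch of this kind of mapping follows. It is not the transformer's actual implementation: the `MetadataMappingSketch` class is hypothetical, and both the extracted-metadata keys (`dc:title`, `dcterms:created`, `dcterms:modified`) and the attribute names (`title`, `created`, `modified`) are assumptions chosen for illustration. Keys without a mapping are simply dropped, which mirrors the idea that only common metadata reaches the metacard.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of mapping extracted document metadata to metacard attributes.
class MetadataMappingSketch {

    // Assumed mapping from extracted-metadata keys to metacard attribute names.
    static final Map<String, String> KEY_TO_ATTRIBUTE = Map.of(
            "dc:title", "title",
            "dcterms:created", "created",
            "dcterms:modified", "modified");

    // Copies only the entries that have a known attribute mapping; all other keys are ignored.
    static Map<String, String> toAttributes(Map<String, String> extracted) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : extracted.entrySet()) {
            String attribute = KEY_TO_ATTRIBUTE.get(entry.getKey());
            if (attribute != null) {
                attributes.put(attribute, entry.getValue());
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("dc:title", "Quarterly Report");
        extracted.put("dcterms:created", "2014-01-15T10:00:00Z");
        extracted.put("xmpTPg:NPages", "12"); // format-specific key with no mapping, so it is dropped

        System.out.println(toAttributes(extracted));
    }
}
```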

DDMS to Metacard Mapping

N/A

Known Issues

  • The Tika Input Transformer does not create DDMS metadata; it only populates the basic descriptive fields of the metacard, such as Creation Date and Modified Date.