Configuration-based Metacard Attributes, Types, and Validation

Update 11 Feb 2019 - this was a proposed design; the actual implementation ended up differing from what is documented here. Please refer to the DDF docs for details on configurations for Metacard Types, Attributes and Validation Rulesets.


DDF provides the ability to use multiple different types of metacards depending on the data types being ingested into the catalog. Out of the box, DDF provides basic metacards that can handle some common data types. However, as new data types are ingested into the catalog, it makes sense to define new metacard types that reflect the attributes specific to the new data type.

To provide the most flexibility, it should be possible to add these new metacard types to the system dynamically, i.e. without recompiling code to generate new class files corresponding to the new metacard types.

This document describes at a high level the overall use of metacards, metacard types, metacard attributes, and their interaction with the rest of the system.

Metacard Usages

The primary use of metacards is to store metadata that describes an item or product ingested into the catalog. When processing and storing metacards in the catalog, the catalog framework processes metacards in multiple ways, including:

  • Validating that metacard data is complete and compliant on ingest
  • Indexing metacard attributes for optimized searching
  • Validating results from other systems before caching/aggregating
  • Displaying metacards for editing
    • Discovering which fields of the metacard should be displayed
    • Validating individual attributes of a metacard upon user entry
  • Transforming metacards into requested response formats


Metacard Validation

DDF validates metacards at various points throughout the system. A metacard can be validated as a whole, meaning every attribute in the metacard must validate successfully, or individual attributes can be validated on their own, for example from the UI as the user enters data into a metacard editor.

Based on the architecture of DDF, there are natural locations where this validation occurs. On ingest, a pre-ingest plugin can validate a metacard and either mark it as invalid or reject it before it is ingested into the catalog. When querying other systems, the corresponding results are converted into the appropriate metacards before being correlated with results from other systems. The newly converted results can be validated before they are included in the aggregated search results.

Metacard validation involves validating each attribute of the metacard. To do so, the validator must know the type of data being validated and the constraints that data must meet in order to be considered valid. Some examples of data types and potential constraints that could be applied include:


Data Type          Potential constraints
Integer or Long    Allowable range (0..100, -180..180)
Float              Allowable range (0.0-1.0)
XML                Schema validation (XML schema definition)
                   Schematron ruleset validation (ruleset file)
Enumerations       List of allowable values (“Red,Green,Blue”, “SiteA,SiteB,SiteC”)
String             Length (<= 80 characters)
                   Pattern matching (email address, telephone numbers, IDs)
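
As a rough illustration of the constraint kinds listed above, the following Java sketch models each constraint as a predicate over an attribute value. The class name AttributeConstraints and its methods are hypothetical and are not part of the DDF API. It only shows one way range, enumeration, length, and pattern checks could be expressed and combined.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical helper (not a DDF class): each constraint is a Predicate
// that returns true when the attribute value is considered valid.
final class AttributeConstraints {

    private AttributeConstraints() {}

    // Numeric value must fall within [min, max], inclusive.
    static Predicate<Object> range(double min, double max) {
        return v -> v instanceof Number
                && ((Number) v).doubleValue() >= min
                && ((Number) v).doubleValue() <= max;
    }

    // Value must be one of a fixed list of allowable strings.
    static Predicate<Object> oneOf(String... allowed) {
        Set<String> values = new HashSet<>(Arrays.asList(allowed));
        return v -> v instanceof String && values.contains(v);
    }

    // String value must not exceed maxLength characters.
    static Predicate<Object> maxLength(int maxLength) {
        return v -> v instanceof String && ((String) v).length() <= maxLength;
    }

    // String value must match the regular expression in full.
    static Predicate<Object> matches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return v -> v instanceof String && pattern.matcher((String) v).matches();
    }
}
```

Because these are ordinary predicates, several constraints on one attribute could be chained with Predicate.and, for example a maximum length combined with a pattern.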


Normalized Metacards

In order to be truly useful in the enterprise, where users search for information across multiple systems, metacards from each system need to represent common information in a common manner. Data representing a common data type, e.g. file size, should all be represented in the same manner (bytes vs. kilobytes vs. megabytes, etc.). In other words, the data in each metacard should be represented in a normalized fashion.

This has implications for both input transformers, where metacard information is being extracted from the product that it represents, as well as for the processing of metacards (search results) from other systems, where all that is returned is metacard information.

Developers writing input transformers need to understand the common representation identified in the metacard definition and, where necessary, convert units or scale the information extracted from the item or product so that it is stored in the expected format.

When receiving results from other systems, the search aggregator must understand both the format of the received results and the expected format of the applicable metacard in order to provide a mapping between the two formats. This mapping will be handled by appropriate plugins or source providers and may be driven by mapping tables, XSLT files, or whatever other mechanism is appropriate.
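
To make the normalization idea concrete, here is a minimal Java sketch that converts a file size reported in different units into a single representation in bytes. The class ResourceSizeNormalizer is hypothetical. The choice of bytes as the normalized unit and of binary multiples (1 KB = 1024 bytes) are assumptions made for illustration, not something the design specifies.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical normalizer (not a DDF class): converts sizes such as
// "1.5 MB" into a byte count so every metacard stores the same unit.
final class ResourceSizeNormalizer {

    // A number, optionally followed by a unit suffix made of letters.
    private static final Pattern SIZE =
            Pattern.compile("\\s*([0-9]+(?:\\.[0-9]+)?)\\s*([A-Za-z]*)\\s*");

    private ResourceSizeNormalizer() {}

    static long toBytes(String raw) {
        Matcher m = SIZE.matcher(raw);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized size: " + raw);
        }
        double value = Double.parseDouble(m.group(1));
        String unit = m.group(2).toUpperCase(Locale.ROOT);
        long multiplier;
        switch (unit) {
            case "":   // no suffix: assume the value is already in bytes
            case "B":
                multiplier = 1L;
                break;
            case "KB": // assumption: binary multiples
                multiplier = 1024L;
                break;
            case "MB":
                multiplier = 1024L * 1024L;
                break;
            case "GB":
                multiplier = 1024L * 1024L * 1024L;
                break;
            default:
                throw new IllegalArgumentException("Unknown unit: " + unit);
        }
        return Math.round(value * multiplier);
    }
}
```

An input transformer or source plugin could apply a conversion like this before setting the attribute, so that queries on the size field compare like with like.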

Dynamic Metacard Types

The nature of DDF and the systems that it interacts with necessitates the dynamic creation of new (not seen before) metacard types. For example, when interacting with a CSW source, the first step is to query the source and request a listing of its capabilities, including the data types that it will provide. The CSW response may include metadata information that we haven’t seen before and therefore, a new metacard type needs to be created on the fly to handle responses from that source.

Rather than hard-coding metacard types in code, allowing them to be created from an XML descriptor, whether from a file on disk or from a description returned by a CSW source, provides the flexibility needed to handle dynamic data catalogs.

One key enabler of flexibility at the metacard level is that a metacard type can be assigned to an existing Java object. The Java metacard object includes a MetacardType attribute, which defines the type of metacard that the object represents. The definition of that metacard type includes a list of attributes that are unique to that type. Because a metacard type can be created dynamically by passing in a string defining the type name and a set of attributes that describe that type, metacard types can be created either by reading XML descriptions of those metacard types (as happens on system startup with the core metacard types) or dynamically from a description returned by a remote source.
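
The following Java sketch illustrates the idea of a metacard type as runtime data rather than a compiled class. SketchMetacardType and SketchMetacard are hypothetical stand-ins written for this example and are not the DDF classes. A type is built from a name and a set of attribute names, and it can be assigned to an existing metacard object.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a metacard type: a name plus attribute names.
final class SketchMetacardType {
    final String name;
    final Set<String> attributeNames;

    SketchMetacardType(String name, Set<String> attributeNames) {
        this.name = name;
        this.attributeNames =
                Collections.unmodifiableSet(new LinkedHashSet<>(attributeNames));
    }
}

// Hypothetical stand-in for a metacard whose type is assigned at runtime.
final class SketchMetacard {
    private SketchMetacardType type;
    private final Map<String, Object> values = new HashMap<>();

    SketchMetacard(SketchMetacardType type) {
        this.type = type;
    }

    SketchMetacardType getType() {
        return type;
    }

    // The type is data, not a Java class, so it can be swapped
    // on an existing object without recompiling anything.
    void setType(SketchMetacardType type) {
        this.type = type;
    }

    // Only attributes declared by the current type are accepted.
    void setAttribute(String name, Object value) {
        if (!type.attributeNames.contains(name)) {
            throw new IllegalArgumentException(
                    name + " is not defined for type " + type.name);
        }
        values.put(name, value);
    }

    Object getAttribute(String name) {
        return values.get(name);
    }
}
```

In this sketch, a type derived from a remote source's capabilities response would be constructed the same way as one read from a descriptor file: by supplying its name and attribute set.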

Metacard Types and Attributes Overview

The following diagram provides an overview of how metacard types and attributes are used in the system:

The system reads XML descriptor files at startup. These files provide both the definitions of registered metacard types and the corresponding validation rules for the attributes of each metacard type. The definitions are added to the registry and are available for use by other definitions.

The Validation Framework follows the standard Java Validation specifications for validation rules (JSR-303 and JSR-349). These specifications provide a standard framework for validating Java Beans. The goal is to provide the same validation capabilities for metacards that exist for Java Beans. DDF components that need to validate metacards or individual attributes (metacard validators, metadata editor) can invoke validators against a metacard as a whole, or against specific attributes of a given metacard type.

The MetacardType Registry holds the definitions of each metacard type that has been created. The definitions for each metacard type are created by reading XML descriptors or by dynamic API invocations. Input transformers can ask for a metacard of a given type and it will be generated with all of the attributes populated (no values assigned yet) based on that type’s definition.
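
A minimal Java sketch of the registry behavior described above follows. SketchTypeRegistry is a hypothetical class, and representing a metacard as a map of attribute name to value is a simplification chosen for this example. It shows a type being registered and a blank metacard being handed out with every attribute present but unassigned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry (not the DDF implementation): maps a metacard
// type name to the names of the attributes that type defines.
final class SketchTypeRegistry {
    private final Map<String, List<String>> types = new ConcurrentHashMap<>();

    // Called when an XML descriptor is read or a type is created via the API.
    void register(String typeName, List<String> attributeNames) {
        types.put(typeName, new ArrayList<>(attributeNames));
    }

    boolean isRegistered(String typeName) {
        return types.containsKey(typeName);
    }

    // Returns a blank metacard: every attribute of the type is present
    // as a key, but no values have been assigned yet.
    Map<String, Object> newMetacard(String typeName) {
        List<String> names = types.get(typeName);
        if (names == null) {
            throw new IllegalArgumentException("Unknown metacard type: " + typeName);
        }
        Map<String, Object> metacard = new LinkedHashMap<>();
        for (String name : names) {
            metacard.put(name, null);
        }
        return metacard;
    }
}
```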

Defining New Metacard Types

New metacard types can be defined either programmatically or declaratively, using an XML metacard description file. A metacard type is defined by providing the name of the metacard type, references to any existing metacard type(s) that are to be included, and a set of attribute declarations identifying the unique attributes that make up this new metacard type. The constraints for the provided attributes are handled separately from the metacard type definition.

Out of the box, DDF provides a common set of metacard types and their constraints. This set of common metacards should be consulted first before defining new metacard types with potentially overlapping attributes. Where possible, reuse existing metacard types and definitions before defining custom attributes; doing so provides the greatest interoperability when searching on specific fields. Metacard type definitions can include references to other metacard types in order to build on or extend existing metacard definitions. For example, a default "ddf" metacard type defines the attributes that all metacard types should include, no matter what data product they describe. These common attributes include things such as created and modified dates, title, resource URI, etc.
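
The way one definition builds on another can be sketched in Java as follows. IncludeResolver and its nested Definition class are hypothetical names for this example. The sketch assumes that included types contribute their attributes first and that circular includes are an error. The design above does not specify either behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical resolver (not a DDF class): computes the full attribute
// set of a metacard type by following its include references.
final class IncludeResolver {

    // A definition lists the types it includes plus its own attribute names.
    static final class Definition {
        final List<String> includes;
        final List<String> attributes;

        Definition(List<String> includes, List<String> attributes) {
            this.includes = new ArrayList<>(includes);
            this.attributes = new ArrayList<>(attributes);
        }
    }

    private IncludeResolver() {}

    static Set<String> resolve(String typeName, Map<String, Definition> definitions) {
        Set<String> result = new LinkedHashSet<>();
        collect(typeName, definitions, new LinkedHashSet<>(), result);
        return result;
    }

    private static void collect(String typeName, Map<String, Definition> definitions,
                                Set<String> inProgress, Set<String> result) {
        Definition def = definitions.get(typeName);
        if (def == null) {
            throw new IllegalArgumentException("Unknown metacard type: " + typeName);
        }
        // A type that is already on the current include path means a cycle.
        if (!inProgress.add(typeName)) {
            throw new IllegalStateException("Circular include: " + typeName);
        }
        // Attributes of included types come first, then the type's own.
        for (String included : def.includes) {
            collect(included, definitions, inProgress, result);
        }
        result.addAll(def.attributes);
        inProgress.remove(typeName);
    }
}
```

Because the result is a set, a type that is reachable through two different includes contributes its attributes only once.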

At their core, each attribute defined for a metacard type must be defined according to one of the following base types:

  • String (names, IDs, patterns, enumerations, XML, JSON, etc.)
  • Date
  • Short
  • Integer
  • Long
  • Float
  • Double
  • Boolean
  • Binary (byte arrays, thumbnails, etc.)
  • Object

From that base type, constraints can be added to create more specialized types such as enumerations, thumbnails, WKT, etc.

In addition to the name of an attribute and its base type, there are several characteristics of an attribute that the catalog uses when processing it. These additional characteristics are specified as part of the attribute definition and are described below:

Name          Description
multiValued   The attribute can contain more than one value.
indexed       Indicates whether or not this attribute should be indexed by the catalog and participate in query evaluations. Some attributes may only want to be stored and not indexed.
stored        Indicates whether or not the catalog must store the value of this attribute. Some attributes may only need to be indexed and not stored.
tokenized     Indicates whether or not this attribute should be tokenized, i.e. remove stopwords, before storing or indexing the resulting values.

Example Metacard Definition

The following XML snippet shows the format of a metacard definition:

Metacard Definition Format
<metacard>
    <!-- Simple metacard type name, e.g. nitf -->
    <name></name>

    <!-- included metacard types that this definition builds upon, can have multiple defined, e.g. ddf, image -->
    <include></include>
    <include></include>

    <!-- collection of attributes included above and beyond those in the types listed above, can have multiple defined -->
    <attributes>
        <!-- attribute definition -->
        <attribute>
            <!-- name of this attribute - must be distinct across all types in the system, e.g. nitf.version -->
            <name></name>
            <!-- base type for this attribute - e.g. String, Integer, Binary -->
            <type>String|Date|Short|Integer|Long|Float|Double|Boolean|Binary|Object</type>
            <!-- indicates that this attribute may contain multiple values - boolean value -->
            <multi-valued>true|false</multi-valued>
            <!-- indicates if this attribute should be indexed by the catalog - boolean value -->
            <indexed>true|false</indexed>
            <!-- indicates if this attribute should be stored by the catalog - boolean value -->
            <stored>true|false</stored>
            <!-- indicates if this value is a tokenized value and should be tokenized before storing - boolean value -->
            <tokenized>true|false</tokenized>
        </attribute>
        <attribute> ... </attribute>
    </attributes>
</metacard>
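
As an illustration of how a descriptor in the format above might be read, the following Java sketch uses the JDK's DOM parser to extract the attribute declarations. MetacardDefinitionReader and its nested Attribute class are hypothetical names. This is a sketch against the proposed format only and is not the parser that DDF ships.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical reader (not a DDF class) for the proposed definition format.
final class MetacardDefinitionReader {

    // One <attribute> element from the definition.
    static final class Attribute {
        final String name;
        final String type;
        final boolean multiValued;
        final boolean indexed;
        final boolean stored;
        final boolean tokenized;

        Attribute(String name, String type, boolean multiValued,
                  boolean indexed, boolean stored, boolean tokenized) {
            this.name = name;
            this.type = type;
            this.multiValued = multiValued;
            this.indexed = indexed;
            this.stored = stored;
            this.tokenized = tokenized;
        }
    }

    private MetacardDefinitionReader() {}

    static List<Attribute> readAttributes(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Descriptors never need a DOCTYPE, so refuse them outright.
            factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Attribute> result = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagName("attribute");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                result.add(new Attribute(text(e, "name"), text(e, "type"),
                        flag(e, "multi-valued"), flag(e, "indexed"),
                        flag(e, "stored"), flag(e, "tokenized")));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Invalid metacard definition", ex);
        }
    }

    // Trimmed text of the first matching child element, or "" if absent.
    private static String text(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0
                ? "" : children.item(0).getTextContent().trim();
    }

    // A missing or non-"true" flag is treated as false.
    private static boolean flag(Element parent, String tag) {
        return Boolean.parseBoolean(text(parent, tag));
    }
}
```

Treating a missing flag as false is a choice made for this sketch. The design does not state default values for the four characteristics.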

The following XML metacard definition describes the base "ddf" metacard type that should be a part of all metacard definitions. It defines the base attributes used by the catalog for any type of metacard.

DDF Metacard Type Definition File
<?xml version="1.0" encoding="UTF-8"?>
<metacard>
    <name>ddf</name>
    <attributes>
        <attribute>
            <name>created</name>
            <type>Date</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>modified</name>
            <type>Date</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>expiration</name>
            <type>Date</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>effective</name>
            <type>Date</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>id</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>location</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>sourceId</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>thumbnail</name>
            <type>Binary</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>title</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>metadata</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>metadata-target-namespace</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>metadata-content-type</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>point-of-contact</name>
            <type>String</type>
            <multi-valued>true</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>description</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>metadata-content-type-version</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>resource-uri</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>resource-size</name>
            <type>String</type>
            <multi-valued>false</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
        <attribute>
            <name>security</name>
            <type>String</type>
            <multi-valued>true</multi-valued>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
        </attribute>
    </attributes>
</metacard>

As an example of building on top of the "ddf" metacard definition, consider the following definition fragment for a "nitf" metacard type. It includes the definition of "ddf" so that the resulting metacard has both the standard "ddf" metacard attributes as well as the explicitly defined attributes that describe NITF products.

Sample NITF Metacard Definition
<metacard>
    <name>nitf</name>
    <include>ddf</include>
    <attributes>
        <!-- nitf-specific fields-->
        <attribute>
            <name>nitf.version</name>
            <type>String</type>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
            <multi-valued>false</multi-valued>
        </attribute>
        <attribute>
            <name>complexityLevel</name>
            <type>String</type>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
            <multi-valued>true</multi-valued>
        </attribute>
        <attribute>
            <name>originatingStationId</name>
            <type>String</type>
            <indexed>true</indexed>
            <stored>true</stored>
            <tokenized>false</tokenized>
            <multi-valued>true</multi-valued>
        </attribute>

        <!-- additional nitf attributes here -->

    </attributes>
</metacard>

Adding New Metacard Types to the System

New metacard type definitions can be added to the running system by dropping them into the metadata folder (default is <DDF_HOME>/etc/metadata). Once a file is detected in that directory, the system will read it, parse it, and register the corresponding metacard type with the Metacard Type Registry. Once registered, the new metacard type can be used to generate new metacards of that type, or it can be used as the basis for new metacard type definitions.

Decision

A way to create metacard type definitions that can be injected was added, but the design changed based on initial prototyping and use cases.

https://github.com/codice/ddf/tree/2.27.x/catalog/core/catalog-core-definitionparser

https://github.com/codice/ddf/tree/2.27.x/catalog/core/catalog-core-injectattribute