Hierarchical Data to DDF Taxonomy

Design discussion on how to map hierarchical data structures into the DDF taxonomy. The DDF taxonomy is currently, by design, a flat data structure. External systems may represent their data hierarchically, but that data needs to be mapped into the DDF taxonomy for discovery. For example, a hierarchical source might expose paths such as:

node/1/id
node/1/startTime
node/1/endTime
node/2/id
node/2/startTime
node/2/endTime
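
One way to read the paths above is as the path/value pairs produced by walking a nested record. The following is a minimal sketch under that assumption; the `flatten` helper and the sample record are hypothetical and are not part of DDF.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into path -> leaf-value pairs."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}/{key}" if prefix else str(key)
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        # List positions are 1-based to match the node/1/..., node/2/... paths.
        for index, child in enumerate(value, start=1):
            path = f"{prefix}/{index}" if prefix else str(index)
            flat.update(flatten(child, path))
    else:
        flat[prefix] = value
    return flat


# Hypothetical hierarchical record from an external system.
source = {
    "node": [
        {"id": "a", "startTime": "2017-01-01T00:00:00Z", "endTime": "2017-02-01T00:00:00Z"},
        {"id": "b", "startTime": "2017-03-01T00:00:00Z", "endTime": "2017-04-01T00:00:00Z"},
    ]
}

flat = flatten(source)
for path in flat:
    print(path)
```

The flattened keys keep the grouping only inside the path string, which is why the options below look at other ways of preserving it.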

Challenges

  1. Data should be grouped and displayed in a user-friendly format in the Catalog UI.

  2. Queries need to use the data relationships. For example (not CQL): anyText=houses and test/alternate-id contains '123' and test/startTime after 'Jan 1 2017' and test/endTime before 'May 1 2017'. All of the test/... clauses are meant to apply to the same group of related values.
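
The second challenge can be illustrated with a small sketch. It assumes a hypothetical record with two nodes and compares a query against flat multivalued attributes with a query that requires all clauses to hold for the same node. The attribute names and dates are made up for illustration.

```python
from datetime import date

# Hypothetical record with two nested nodes.
nodes = [
    {"alternate_id": "123", "start": date(2016, 6, 1), "end": date(2016, 7, 1)},
    {"alternate_id": "999", "start": date(2017, 2, 1), "end": date(2017, 3, 1)},
]

# Flat, multivalued view: one list per attribute, relationships discarded.
flat = {
    "alternate_id": [n["alternate_id"] for n in nodes],
    "start": [n["start"] for n in nodes],
    "end": [n["end"] for n in nodes],
}

after, before = date(2017, 1, 1), date(2017, 5, 1)


def matches_flat(record):
    # Each clause may be satisfied by any value of its attribute.
    return (
        any("123" in v for v in record["alternate_id"])
        and any(v > after for v in record["start"])
        and any(v < before for v in record["end"])
    )


def matches_grouped(node_list):
    # All clauses must be satisfied by the same node.
    return any(
        "123" in n["alternate_id"] and n["start"] > after and n["end"] < before
        for n in node_list
    )


print(matches_flat(flat))      # True: clauses are satisfied by different nodes
print(matches_grouped(nodes))  # False: no single node satisfies every clause
```

The flat query reports a match even though no single node satisfies every clause, which is the kind of false positive that grouping is meant to prevent.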

Solr Capabilities

Potential Options

Option

Pros

Cons

Delimiter option (alternateId=NRO;MyDoc;20170404T04:04:04Z;1234-1234Z)

Pros

  • Easy to encode incoming data from the transformer.

  • If a specific format is followed, outgoing data could be transformed as well.

Cons

  • Would not display well in the UI.

  • Users would have difficulty determining which field each value corresponds to.

  • No field typing information.

  • Fields cannot be queried individually; only a text-based query on the full string is really possible.
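
A minimal sketch of the delimiter option follows. The field names in `FIELD_NAMES` are hypothetical, since the encoded string itself carries no names or types; the decoder only works if both sides agree on the order out of band.

```python
DELIMITER = ";"

# Hypothetical field order; nothing in the encoded value records it.
FIELD_NAMES = ("organization", "name", "timestamp", "identifier")


def encode(parts):
    """Join the parts into a single delimited attribute value."""
    parts = list(parts)
    for part in parts:
        if DELIMITER in part:
            raise ValueError(f"value contains the delimiter: {part!r}")
    return DELIMITER.join(parts)


def decode(encoded, field_names):
    """Split a delimited value and pair each part with its agreed field name."""
    parts = encoded.split(DELIMITER)
    if len(parts) != len(field_names):
        raise ValueError("encoded value does not match the agreed format")
    return dict(zip(field_names, parts))


value = "NRO;MyDoc;20170404T04:04:04Z;1234-1234Z"
decoded = decode(value, FIELD_NAMES)
print(decoded["timestamp"])  # still a plain string, not a typed date
```

Every decoded part is a string, so a date comparison or an exact match on a single part would have to happen after decoding rather than in the index.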

Positional option (temporalCoverage.dateStart[4] corresponds to temporalCoverage.dateEnd[4])

Cons

  • Slightly more difficult to encode incoming data, as the parallel field arrays would need to be coordinated and some indicator of a missing value would need to be used.

  • Outgoing data would be more difficult to encode, as several multivalued attributes would need to be queried to construct a node.
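
The coordination cost of the positional option can be sketched as follows. The `MISSING` placeholder and both helper functions are assumptions for illustration; the source does not say which missing-value indicator would be used.

```python
MISSING = ""  # hypothetical placeholder that keeps positions aligned

FIELDS = ("temporalCoverage.dateStart", "temporalCoverage.dateEnd")


def to_parallel(nodes, fields):
    """Encode nodes as one list per field; index i always refers to node i."""
    return {
        field: [MISSING if node.get(field) is None else node[field] for node in nodes]
        for field in fields
    }


def from_parallel(columns):
    """Rebuild nodes by reading the same position from every field list."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("parallel attributes are out of sync")
    names = list(columns)
    return [
        {name: (None if value == MISSING else value) for name, value in zip(names, row)}
        for row in zip(*columns.values())
    ]


nodes = [
    {"temporalCoverage.dateStart": "2017-01-01", "temporalCoverage.dateEnd": "2017-02-01"},
    {"temporalCoverage.dateStart": "2017-03-01", "temporalCoverage.dateEnd": None},
]

columns = to_parallel(nodes, FIELDS)
print(columns["temporalCoverage.dateEnd"])
print(from_parallel(columns) == nodes)
```

Dropping the placeholder for the missing end date would shift later values by one position, which is why every writer has to emit it consistently.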

Metacard encoding (Contact[0] = metacard4321-4321-4321-4321-4321-4321-4321-4321)

Pros

  • Supports related data attributes having independent fields with appropriate type information.

Cons

  • Would require a JOIN or another method in order to query data by groups of related fields.

  • Multiple metacards would be generated for a single piece of metadata.
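
The JOIN requirement can be sketched with an in-memory stand-in for the catalog. Everything here is hypothetical: the IDs, the attribute names, and the `find_parents` helper only illustrate the two-step lookup, not how DDF or Solr would perform it.

```python
# Hypothetical catalog: parent records reference contact records by ID.
catalog = {
    "parent-1": {"title": "Report A", "contact": ["contact-1", "contact-2"]},
    "contact-1": {"name": "Alice", "role": "author"},
    "contact-2": {"name": "Bob", "role": "publisher"},
    "parent-2": {"title": "Report B", "contact": ["contact-3"]},
    "contact-3": {"name": "Alice", "role": "publisher"},
}


def find_parents(records, **criteria):
    """Return IDs of parents that reference a child matching every criterion."""
    # Step 1: find child records where all criteria hold for the same record.
    child_ids = {
        record_id
        for record_id, attrs in records.items()
        if all(attrs.get(key) == expected for key, expected in criteria.items())
    }
    # Step 2: join back to the parents that reference one of those children.
    return sorted(
        record_id
        for record_id, attrs in records.items()
        if child_ids.intersection(attrs.get("contact", []))
    )


print(find_parents(catalog, name="Alice", role="publisher"))
```

Because each contact is its own record, the name and role are matched together, but every such query pays for the second lookup.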

Via associations (an associated contact metacard contains the contact information)

Pros

  • Capability is already supported within DDF.

Cons

  • As with metacard encoding, a JOIN or another method would be needed in order to query by groups of related fields.

Store these types of data as XML.

Pros

  • The XML data type is already supported as an attribute type.

  • XML allows for the grouping of related nodes.

  • Existing export capabilities could be used for viewing XML attributes.

  • Encoding and decoding in the transformer could use the existing schema and copy data directly.

Cons

  • Solr XPath support is slow or weak.
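
A minimal sketch of grouping related values in an XML attribute value, using Python's standard `xml.etree.ElementTree`. The element names (`nodes`, `node`, `id`, `startTime`, `endTime`) are assumptions; a real transformer would follow the source schema.

```python
import xml.etree.ElementTree as ET


def nodes_to_xml(nodes):
    """Serialize each node as a <node> element so its values stay grouped."""
    root = ET.Element("nodes")
    for node in nodes:
        element = ET.SubElement(root, "node")
        for name, value in node.items():
            ET.SubElement(element, name).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_nodes(xml_text):
    """Parse the XML attribute value back into a list of node dicts."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in element} for element in root.findall("node")]


nodes = [
    {"id": "a", "startTime": "2017-01-01T00:00:00Z", "endTime": "2017-02-01T00:00:00Z"},
    {"id": "b", "startTime": "2017-03-01T00:00:00Z", "endTime": "2017-04-01T00:00:00Z"},
]

xml_text = nodes_to_xml(nodes)
print(xml_text)
print(xml_to_nodes(xml_text) == nodes)
```

The grouping survives the round trip, but the values inside the XML are opaque to the index unless they are also extracted into queryable fields.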

Create a Collection Attribute type that contains a collection of other element types. Possibly serialize this type as XML.

 

 

Solr Nested Document

Pros

  • Can index and retain a deeply nested structure.

Cons

  • Requires schema changes to support all features.

  • Queries need to understand the structure and the paths to nested documents and fields.

  • Do not add a root document that has the same ID as a child document. This would violate integrity assumptions that Solr relies on.

  • If you use Solr’s delete-by-query APIs, you have to ensure that no children remain of any documents that are being deleted. Otherwise the integrity assumptions that Solr relies on are violated.
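
The ID caveat can be checked before indexing. The sketch below assumes children are attached under the `_childDocuments_` key of Solr's JSON update format and that a hypothetical `doc_type` field distinguishes parents from children. The field names, IDs, and the block join query string are illustrative and should be verified against the Solr version in use.

```python
def collect_ids(doc, seen=None):
    """Walk a nested document and fail if any ID is used more than once."""
    seen = set() if seen is None else seen
    if doc["id"] in seen:
        raise ValueError(f"duplicate document id: {doc['id']}")
    seen.add(doc["id"])
    for child in doc.get("_childDocuments_", []):
        collect_ids(child, seen)
    return seen


# Hypothetical nested document: one metacard with two child nodes.
document = {
    "id": "metacard-1",
    "doc_type": "metacard",
    "_childDocuments_": [
        {"id": "metacard-1/node/1", "doc_type": "node",
         "startTime": "2017-02-01T00:00:00Z", "endTime": "2017-03-01T00:00:00Z"},
        {"id": "metacard-1/node/2", "doc_type": "node",
         "startTime": "2016-06-01T00:00:00Z", "endTime": "2016-07-01T00:00:00Z"},
    ],
}

ids = collect_ids(document)

# Illustrative block join query: parents having one child that matches both ranges.
query = (
    '{!parent which="doc_type:metacard"}'
    "+startTime:[2017-01-01T00:00:00Z TO *] +endTime:[* TO 2017-05-01T00:00:00Z]"
)
print(sorted(ids))
```

Deriving child IDs from the parent ID, as in this sketch, is one simple way to avoid collisions, though any scheme that guarantees uniqueness would do.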