Hierarchical Data to DDF Taxonomy

Design discussion on how to map hierarchical data structures into the DDF taxonomy. The DDF taxonomy is currently an intentionally flat data structure. External systems may represent their data hierarchically, but that data needs to be mapped into the DDF taxonomy for discovery. For example, a source might expose repeating nodes under paths such as:

node/1/id
node/1/startTime
node/1/endTime
node/2/id
node/2/startTime
node/2/endTime
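
As a rough illustration of how such paths arise, the sketch below flattens a nested structure into slash-delimited keys. The function name `flatten` and the sample values are hypothetical and not part of DDF; list positions are numbered from 1 only to match the paths shown above.

```python
def flatten(value, prefix=""):
    """Yield (path, leaf_value) pairs for a nested dict/list structure."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}/{key}" if prefix else str(key))
    elif isinstance(value, list):
        # Positions are 1-based to mirror the node/1/..., node/2/... paths.
        for index, child in enumerate(value, start=1):
            yield from flatten(child, f"{prefix}/{index}" if prefix else str(index))
    else:
        yield prefix, value


# Hypothetical source record with two repeating nodes.
source = {
    "node": [
        {"id": "a", "startTime": "2017-01-01", "endTime": "2017-02-01"},
        {"id": "b", "startTime": "2017-03-01", "endTime": "2017-04-01"},
    ]
}

paths = dict(flatten(source))
for path in paths:
    print(path)
```

The flat keys preserve which values belong to which node only through the index embedded in the path, which is the relationship the options below try to keep.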

Challenges

Data should be grouped and displayed in a user-friendly format in the Catalog UI.
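Queries need to honor the relationships between grouped fields. For example (pseudo-query, not CQL): anyText=houses and test/alternate-id contains '123' and test/startTime after 'Jan 1 2017' and test/endTime before 'May 1 2017'. All three test/ conditions should be satisfied by the same node, not by values taken from different nodes.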
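
The query challenge can be illustrated with a small in-memory sketch. It compares matching against independent multivalued fields with matching against grouped nodes. The record layout and the names `flat_match` and `grouped_match` are assumptions made for illustration; this is not how DDF or Solr evaluates queries.

```python
from datetime import date

# Hypothetical record with two related groups of fields.
record = {
    "nodes": [
        {"alternate_id": "abc-123", "start": date(2016, 6, 1), "end": date(2016, 9, 1)},
        {"alternate_id": "xyz-999", "start": date(2017, 2, 1), "end": date(2017, 4, 1)},
    ]
}

LOWER = date(2017, 1, 1)  # startTime must be after this
UPPER = date(2017, 5, 1)  # endTime must be before this


def flat_match(rec):
    """Each condition may be satisfied by a different node (flat multivalued fields)."""
    nodes = rec["nodes"]
    return (
        any("123" in n["alternate_id"] for n in nodes)
        and any(n["start"] > LOWER for n in nodes)
        and any(n["end"] < UPPER for n in nodes)
    )


def grouped_match(rec):
    """All conditions must be satisfied by the same node."""
    return any(
        "123" in n["alternate_id"] and n["start"] > LOWER and n["end"] < UPPER
        for n in rec["nodes"]
    )


print(flat_match(record), grouped_match(record))
```

Here the flat evaluation reports a match even though no single node satisfies all three conditions, which is the false positive a purely flat mapping risks.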

Solr Capabilities

Graph (https://cwiki.apache.org/confluence/display/solr/Graph+Traversal) - Only supported with SolrCloud
Join (https://wiki.apache.org/solr/Join) - Only supported in single instance Solr nodes and not supported with SolrCloud
Nested Documents (https://lucene.apache.org/solr/guide/8_1/searching-nested-documents.html & https://lucene.apache.org/solr/guide/8_1/indexing-nested-documents.html#indexing-nested-documents) - Solr 8.1 adds support for partial/atomic updates to nested documents
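
For the nested documents capability, a minimal sketch of what an indexing payload and a block join query might look like is shown below. It assumes the `_childDocuments_` JSON key and the `{!parent which=...}` query parser from Solr; the field names (`doc_type`, `node_id`, `node_startTime`, `node_endTime`) and the helper `to_nested_doc` are hypothetical and would depend on the actual schema.

```python
import json


def to_nested_doc(metacard_id, title, nodes):
    """Build a parent document with one child document per node."""
    return {
        "id": metacard_id,
        "doc_type": "metacard",  # hypothetical marker used to identify parents
        "title": title,
        "_childDocuments_": [
            # Child IDs are derived from the parent ID but never equal to it.
            {"id": f"{metacard_id}-node-{i}", "doc_type": "node", **node}
            for i, node in enumerate(nodes, start=1)
        ],
    }


doc = to_nested_doc(
    "mc-1",
    "houses",
    [
        {
            "node_id": "123",
            "node_startTime": "2017-02-01T00:00:00Z",
            "node_endTime": "2017-04-01T00:00:00Z",
        }
    ],
)

# JSON body that could be posted to a Solr update handler.
payload = json.dumps([doc], indent=2)

# Block join: return parents that have a child matching every clause.
query = (
    '{!parent which="doc_type:metacard"}'
    "+node_id:123 +node_startTime:[2017-01-01T00:00:00Z TO *]"
)
print(payload)
print(query)
```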

Potential Options

Option	Pros	Cons
Delimiter option (alternateId=NRO;MyDoc;20170404T04:04:04Z;1234-1234Z)	Easy to encode incoming data from the transformer. If a specific format is followed, outgoing data could be transformed as well.	Would not display well in the UI. Users could not easily determine which field each value corresponds to. No field typing information. Unable to query individual fields (only a text-based query on the full string is really possible).
Positional option (temporalCoverage.dateStart[4] corresponds to temporalCoverage.dateEnd[4])		Slightly more difficult to encode incoming data, as the various parallel field arrays would need to be coordinated and some indicator of a missing value would need to be used. Outgoing data would be more difficult to encode, as various multivalue attributes would need to be queried to construct a node.
Metacard encoding (Contact[0] = metacard4321-4321-4321-4321-4321-4321-4321-4321)	Supports related data attributes having independent fields with appropriate type information.	Would require using a JOIN or other method in order to query data by groups of related fields. Multiple metacards would be generated for a single piece of metadata.
Via Associations (an associated metacard contains the contact information in a contact metacard)	Capability is already supported within DDF.	As with metacard encoding, a JOIN or other method would be needed in order to query by groups of related fields.
Store these types of data as XML.	The XML data type is already supported as an attribute type. XML allows for the grouping of related nodes. Existing export capabilities could be used for viewing XML attributes. Encoding and decoding in the transformer could use the existing schema and directly copy data.	Solr XPath support is slow or weak.
Create a Collection Attribute type that contains a collection of other element types. Possibly serialize this type as XML.
Solr Nested Document	Can index and retain a deeply nested structure.	Requires schema changes to support all features. Queries need to understand the structure and the paths to nested documents and fields. A root document must not be added with the same ID as a child document; this violates integrity assumptions that Solr expects. When using Solr’s delete-by-query APIs, care is needed to ensure that no children remain of any documents that are being deleted; otherwise the same integrity assumptions are violated.
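
To make the trade-offs of the positional option in the table above more concrete, the following sketch encodes nodes into parallel arrays and rebuilds them by index. The sentinel `MISSING` and the function names are hypothetical; the point is only that every array must stay the same length so that position N in one field lines up with position N in the others.

```python
MISSING = "__missing__"  # hypothetical placeholder that keeps arrays aligned


def encode_positional(nodes, fields):
    """Turn a list of nodes into parallel, equal-length arrays keyed by field."""
    return {field: [node.get(field, MISSING) for node in nodes] for field in fields}


def decode_positional(columns):
    """Rebuild nodes by reading the same index from every array."""
    fields = list(columns)
    rows = zip(*(columns[field] for field in fields))
    return [
        {field: value for field, value in zip(fields, row) if value != MISSING}
        for row in rows
    ]


nodes = [
    {"dateStart": "2017-01-01", "dateEnd": "2017-02-01"},
    {"dateStart": "2017-03-01"},  # dateEnd is absent for this node
]

columns = encode_positional(nodes, ["dateStart", "dateEnd"])
restored = decode_positional(columns)
print(columns)
print(restored)
```

Without the placeholder, the second node's missing dateEnd would shift later values up one position and silently pair them with the wrong dateStart.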
