Hierarchical Data to DDF Taxonomy

Design discussion on how to map hierarchical data structures to the DDF taxonomy. The DDF taxonomy is currently an intentionally flat data structure. External systems may use a hierarchical data structure to represent their data, but that data needs to be mapped into the DDF taxonomy for discovery. For example, a hierarchical source might expose fields such as:

node/1/id
node/1/startTime
node/1/endTime
node/2/id
node/2/startTime
node/2/endTime
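
As a minimal sketch of how a hierarchical record could be flattened into path-style names like the ones listed above: Python is used purely for illustration, and the `flatten` helper and the sample values are assumptions, not part of DDF.

```python
def flatten(value, prefix=""):
    """Yield (path, leaf_value) pairs for a nested dict/list structure."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}/{key}" if prefix else str(key))
    elif isinstance(value, list):
        # List positions become 1-based path segments (node/1, node/2, ...).
        for index, child in enumerate(value, start=1):
            yield from flatten(child, f"{prefix}/{index}" if prefix else str(index))
    else:
        yield prefix, value


# Hypothetical source record with two repeated "node" groups.
record = {
    "node": [
        {"id": "a", "startTime": "2017-01-01T00:00:00Z", "endTime": "2017-02-01T00:00:00Z"},
        {"id": "b", "startTime": "2017-03-01T00:00:00Z", "endTime": "2017-04-01T00:00:00Z"},
    ]
}

flat = dict(flatten(record))
for path in flat:
    print(path)
```

The flattened names keep the grouping only implicitly, in the path segment; nothing in a flat taxonomy enforces that node/1/startTime and node/1/endTime belong together.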

Challenges

  1. Data should be grouped and displayed in a user-friendly format in the Catalog UI.
  2. Queries need to honor the data relationships, i.e. all conditions on a group must be satisfied by the same node. Example (not CQL): anyText=houses and test/alternate-id contains '123' and test/startTime after 'Jan 1 2017' and test/endTime before 'May 1 2017'
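
The second challenge can be illustrated with a small sketch. It assumes two hypothetical nodes and compares a query that evaluates every condition against the same node with one that evaluates each condition against independent multivalued fields. The field names follow the example above; the data and helper names are invented for illustration.

```python
from datetime import datetime

# Two hypothetical nodes belonging to one record.
nodes = [
    {"alternate-id": "123-A", "startTime": datetime(2016, 6, 1), "endTime": datetime(2016, 9, 1)},
    {"alternate-id": "999-B", "startTime": datetime(2017, 2, 1), "endTime": datetime(2017, 4, 1)},
]

AFTER = datetime(2017, 1, 1)
BEFORE = datetime(2017, 5, 1)


def node_matches(node):
    """All three conditions must hold for the same node."""
    return (
        "123" in node["alternate-id"]
        and node["startTime"] > AFTER
        and node["endTime"] < BEFORE
    )


def flat_matches(all_nodes):
    """Each condition may be satisfied by a value from a different node."""
    return (
        any("123" in n["alternate-id"] for n in all_nodes)
        and any(n["startTime"] > AFTER for n in all_nodes)
        and any(n["endTime"] < BEFORE for n in all_nodes)
    )


grouped_result = any(node_matches(n) for n in nodes)
flat_result = flat_matches(nodes)
print(grouped_result, flat_result)
```

With this data the grouped evaluation finds no match, while the flat evaluation reports a match assembled from values of different nodes. That false positive is what the options below try to avoid.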

Solr Capabilities

Potential Options

Option / Pros / Cons

Delimiter option (e.g. alternateId=NRO;MyDoc;20170404T04:04:04Z;1234-1234Z)

Pros:
  • Easy to encode incoming data from the transformer.
  • If a specific format is followed, outgoing data could be transformed as well.

Cons:
  • Would not display well in the UI.
  • Users could not easily determine which field each value corresponds to.
  • No field typing information.
  • Unable to query fields with specificity (only a text-based query on the full string is really possible).
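
A minimal sketch of the delimiter option follows. The source does not say what each position in the example value means, so the names in `FIELD_ORDER` are purely hypothetical, as are the `encode` and `decode` helpers.

```python
# Hypothetical meaning of each position; the real layout is not specified.
FIELD_ORDER = ("organization", "title", "created", "identifier")


def encode(values, delimiter=";"):
    """Join the values in a fixed order into one delimited string."""
    parts = [values[name] for name in FIELD_ORDER]
    for part in parts:
        if delimiter in part:
            raise ValueError(f"value {part!r} contains the delimiter {delimiter!r}")
    return delimiter.join(parts)


def decode(encoded, delimiter=";"):
    """Split a delimited string back into named (string-only) values."""
    parts = encoded.split(delimiter)
    if len(parts) != len(FIELD_ORDER):
        raise ValueError(f"expected {len(FIELD_ORDER)} parts, got {len(parts)}")
    return dict(zip(FIELD_ORDER, parts))


decoded = decode("NRO;MyDoc;20170404T04:04:04Z;1234-1234Z")
print(decoded)
print(encode(decoded))
```

Every decoded value is a plain string, including the timestamp. This is the "no field typing information" drawback: a date-range query would have to parse the string first.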

Positional option (temporalCoverage.dateStart[4] corresponds to temporalCoverage.dateEnd[4])

Cons:
  • Slightly more difficult to encode incoming data, as the parallel field arrays would need to be coordinated and some indicator of a missing value would be needed.
  • Outgoing data would be more difficult to encode, as several multivalued attributes would need to be queried to reconstruct a node.
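
The coordination cost of the positional option can be sketched as follows. The `MISSING` sentinel and both helpers are assumptions made for illustration; the document does not define how a missing value would be marked.

```python
MISSING = ""  # hypothetical placeholder that keeps the array positions aligned


def to_parallel(nodes, fields):
    """Encode a list of nodes as one value array per field."""
    return {field: [node.get(field, MISSING) for node in nodes] for field in fields}


def from_parallel(columns):
    """Rebuild the nodes by reading the same index from every array."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("parallel arrays are out of sync")
    names = list(columns)
    nodes = []
    for row in zip(*columns.values()):
        nodes.append({name: value for name, value in zip(names, row) if value != MISSING})
    return nodes


source_nodes = [
    {"dateStart": "2017-01-01", "dateEnd": "2017-05-01"},
    {"dateStart": "2017-06-01"},  # no end date: needs a placeholder
]
columns = to_parallel(source_nodes, ["dateStart", "dateEnd"])
print(columns)
print(from_parallel(columns))
```

Without the placeholder, the second node's missing end date would shift every later dateEnd value by one position.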

Metacard encoding (Contact[0] = metacard4321-4321-4321-4321-4321-4321-4321-4321)

Pros:
  • Supports related data attributes having independent fields with appropriate type information.

Cons:
  • Would require a JOIN or other method in order to query data by groups of related fields.
  • Multiple metacards would be generated for a single piece of metadata.
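
To make the JOIN requirement concrete, here is a rough in-memory sketch of the metacard encoding. The store layout, attribute names, IDs, and helper functions are all hypothetical; a real catalog would perform the join in the query layer rather than in application code.

```python
# Hypothetical store: one parent metacard referencing two contact metacards by ID.
store = {
    "parent-1": {"title": "houses", "contact": ["contact-1", "contact-2"]},
    "contact-1": {"name": "Alice", "email": "alice@example.com"},
    "contact-2": {"name": "Bob", "email": "bob@example.com"},
}


def resolve(metacard_id, attribute):
    """Follow the ID references stored in one attribute of a metacard."""
    return [store[ref] for ref in store[metacard_id].get(attribute, [])]


def find_parents(attribute, predicate):
    """Application-side join: parents with at least one referenced child matching."""
    return [
        metacard_id
        for metacard_id, card in store.items()
        if any(predicate(store[ref]) for ref in card.get(attribute, []) if ref in store)
    ]


contacts = resolve("parent-1", "contact")
matches = find_parents("contact", lambda c: c["name"] == "Bob" and c["email"].endswith("@example.com"))
print(contacts)
print(matches)
```

Each contact keeps its own typed fields, but one source record now produces three metacards, and every grouped query has to traverse the references.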

Via associations (an associated metacard contains the contact information in a contact metacard)

Pros:
  • Capability is already supported within DDF.

Cons:
  • Same as above: a JOIN or other method would be needed in order to query by groups of related fields.

Store these types of data as XML.

Pros:
  • The XML data type is already supported as an attribute type.
  • XML allows for the grouping of related nodes.
  • Existing export capabilities could be used for viewing XML attributes.
  • Encoding and decoding in the transformer could use the existing schema and copy data directly.

Cons:
  • Solr XPath support is slow or weak.
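
As a sketch of how an XML-valued attribute keeps related fields grouped, the snippet below parses a hypothetical attribute value with Python's standard `xml.etree.ElementTree`. The element names and values are invented. The grouping is done in application code here, which says nothing about how well Solr itself could query such a value.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML attribute value holding two grouped nodes.
XML_ATTRIBUTE = """
<nodes>
  <node>
    <id>1</id>
    <startTime>2017-01-15T00:00:00Z</startTime>
    <endTime>2017-03-01T00:00:00Z</endTime>
  </node>
  <node>
    <id>2</id>
    <startTime>2017-06-01T00:00:00Z</startTime>
    <endTime>2017-07-01T00:00:00Z</endTime>
  </node>
</nodes>
"""

root = ET.fromstring(XML_ATTRIBUTE.strip())

# Each <node> element keeps its own id/startTime/endTime together.
groups = [{child.tag: child.text for child in node} for node in root.findall("node")]

# ElementTree's limited XPath subset can select a node by a child's text.
second = root.findall("./node[id='2']")

print(groups)
print(second[0].find("endTime").text)
```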

Create a Collection Attribute type that contains a collection of other element types. Possibly serialize this type as XML.



Solr Nested Document

Pros:
  • Can index and retain a deeply nested structure.

Cons:
  • Requires schema changes to support all features.
  • Queries need to understand the structure and the paths to nested documents and fields.
  • Do not add a root document that has the same ID as a child document. This will violate integrity assumptions that Solr expects.
  • If you use Solr’s delete-by-query APIs, you have to be careful to ensure that no children remain of any documents that are being deleted. Doing otherwise will violate integrity assumptions that Solr expects.
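
A rough sketch of what a nested-document payload could look like is shown below. It assumes Solr's JSON update format with the `_childDocuments_` key and a block join parent query. The `doc_type` field, the IDs, and the field names are hypothetical, and the exact syntax should be checked against the Solr version in use. The helper only guards against the duplicate-ID case called out above; it does not talk to Solr.

```python
import json


def collect_ids(doc):
    """Yield the ID of a document and of all of its nested children."""
    yield doc["id"]
    for child in doc.get("_childDocuments_", []):
        yield from collect_ids(child)


def assert_unique_ids(doc):
    """Reject a block in which a root and a child (or two children) share an ID."""
    seen = set()
    for doc_id in collect_ids(doc):
        if doc_id in seen:
            raise ValueError(f"duplicate document id in block: {doc_id}")
        seen.add(doc_id)


# Hypothetical parent metacard with two child node documents.
metacard = {
    "id": "metacard-1",
    "doc_type": "metacard",
    "title": "houses",
    "_childDocuments_": [
        {"id": "metacard-1/node/1", "doc_type": "node",
         "startTime": "2017-01-15T00:00:00Z", "endTime": "2017-03-01T00:00:00Z"},
        {"id": "metacard-1/node/2", "doc_type": "node",
         "startTime": "2017-06-01T00:00:00Z", "endTime": "2017-07-01T00:00:00Z"},
    ],
}

assert_unique_ids(metacard)
payload = json.dumps([metacard])

# Block join: return parents that have a child whose fields match together.
block_join_query = (
    '{!parent which="doc_type:metacard"}'
    "+startTime:[2017-01-01T00:00:00Z TO *] +endTime:[* TO 2017-05-01T00:00:00Z]"
)
print(payload)
print(block_join_query)
```

Because both range clauses are evaluated against the same child document, this form of query keeps the start and end times of one node together, at the cost of the query needing to know which documents are parents.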