Hierarchical Data to DDF Taxonomy


Design discussion on how to map hierarchical data structures into the DDF taxonomy. The DDF taxonomy is currently an intentionally flat data structure. External systems may represent their data hierarchically, but that data needs to be mapped into the DDF taxonomy for discovery. For example, a hierarchical source might expose paths such as:

node/1/id
node/1/startTime
node/1/endTime
node/2/id
node/2/startTime
node/2/endTime
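
The paths above can be read as the result of flattening a nested structure. The following is a minimal sketch of that flattening, not DDF code: the `flatten` helper and the sample values are hypothetical, and the 1-based list positions are assumed from the `node/1/...` paths shown above.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts/lists into 'path/to/leaf' -> value pairs."""
    flat = {}
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        # List positions become 1-based path segments, as in node/1/id.
        items = ((str(i), child) for i, child in enumerate(node, start=1))
    else:
        # Leaf value: record it under the path built so far.
        flat[prefix] = node
        return flat
    for key, child in items:
        path = f"{prefix}/{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


# Hypothetical hierarchical record with two nodes.
source = {
    "node": [
        {"id": "a1", "startTime": "2017-01-01T00:00:00Z", "endTime": "2017-02-01T00:00:00Z"},
        {"id": "b2", "startTime": "2017-03-01T00:00:00Z", "endTime": "2017-04-01T00:00:00Z"},
    ]
}

flat = flatten(source)
for path, value in flat.items():
    print(path, "=", value)
```

Flattening keeps every value but encodes the grouping only in the path strings, which is the root of the challenges listed below.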

Challenges

  1. Data should be grouped and displayed in a user-friendly format in the Catalog UI.
  2. Queries need to use the data relationships. For example (not CQL): anyText=houses and test/alternate-id contains '123' and test/startTime after 'Jan 1 2017' and test/endTime before 'May 1 2017'
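
To illustrate the second challenge, here is a small sketch of why relationships matter. It assumes a record with two nodes stored as independent multi-valued attributes; the attribute names follow the example query, while the data and both helper functions are hypothetical.

```python
from datetime import date

# Hypothetical record: two nodes, each with its own id and time range.
nodes = [
    {"alternate-id": "123", "startTime": date(2016, 6, 1), "endTime": date(2016, 7, 1)},
    {"alternate-id": "456", "startTime": date(2017, 2, 1), "endTime": date(2017, 3, 1)},
]

# Flat, multi-valued view: the link between values of the same node is lost.
flat = {
    "test/alternate-id": [n["alternate-id"] for n in nodes],
    "test/startTime": [n["startTime"] for n in nodes],
    "test/endTime": [n["endTime"] for n in nodes],
}

AFTER = date(2017, 1, 1)
BEFORE = date(2017, 5, 1)


def matches_flat(attrs):
    # Each clause is satisfied independently, possibly by different nodes.
    return (
        any("123" in value for value in attrs["test/alternate-id"])
        and any(value > AFTER for value in attrs["test/startTime"])
        and any(value < BEFORE for value in attrs["test/endTime"])
    )


def matches_grouped(grouped):
    # All clauses must be satisfied by the same node.
    return any(
        "123" in n["alternate-id"] and n["startTime"] > AFTER and n["endTime"] < BEFORE
        for n in grouped
    )


print(matches_flat(flat))      # True: a false positive across nodes
print(matches_grouped(nodes))  # False: no single node satisfies every clause
```

The flat query matches because the id comes from the first node and the dates from the second, which is the cross-matching that the options below try to avoid.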

Solr Capabilities

Potential Options

Options, with the pros and cons of each:

Delimiter option (e.g. alternateId=NRO;MyDoc;20170404T04:04:04Z;1234-1234Z)

Pros
  • Easy to encode incoming data from the transformer.
  • If a specific format is followed, outgoing data could be transformed as well.

Cons
  • Would not display very well in the UI.
  • Users would have a hard time determining which field each value corresponds to.
  • No field typing information.
  • Unable to query fields with specificity (only a text-based query on the full string is really possible).
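
A minimal sketch of the delimiter option follows. The field names in `FIELDS` are assumptions made only for illustration (the source gives the values but not what each position means), and `encode`/`decode` are hypothetical helpers, not transformer code.

```python
DELIMITER = ";"
# Assumed, fixed field order; the meaning of each position is hypothetical.
FIELDS = ("agency", "title", "created", "identifier")


def encode(record):
    """Join the values in FIELDS order into one delimited string."""
    values = [record[name] for name in FIELDS]
    if any(DELIMITER in value for value in values):
        raise ValueError("a value contains the delimiter and cannot be encoded safely")
    return DELIMITER.join(values)


def decode(encoded):
    """Split a delimited string back into named fields (all values stay strings)."""
    values = encoded.split(DELIMITER)
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of fields")
    return dict(zip(FIELDS, values))


record = decode("NRO;MyDoc;20170404T04:04:04Z;1234-1234Z")
print(record["created"])          # still a plain string, not a typed date
print(encode(record))
```

Decoding only works when both sides agree on the positional format, and every value comes back as an untyped string, which reflects the lack of typing and field-level querying noted above.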

Positional option (temporalCoverage.dateStart[4] corresponds to temporalCoverage.dateEnd[4])

Cons
  • Slightly more difficult to encode incoming data, as the various parallel field arrays would need to be coordinated and some indicator of a missing value would need to be used.
  • Outgoing data would be more difficult to encode, as various multivalued attributes would need to be queried to construct a node.
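
The coordination described above can be sketched as follows. The empty-string `MISSING` sentinel and both helper functions are assumptions for illustration; the source only says that some indicator of a missing value would be needed.

```python
MISSING = ""  # assumed sentinel for "no value at this position"


def to_parallel(nodes, fields):
    """Encode nodes as parallel arrays; index i of every array belongs to node i."""
    return {field: [node.get(field, MISSING) for node in nodes] for field in fields}


def to_nodes(columns):
    """Rebuild nodes by reading the same index from every parallel array."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("parallel arrays are out of sync")
    names = list(columns)
    nodes = []
    for row in zip(*columns.values()):
        # Drop sentinel entries so missing values do not reappear as data.
        nodes.append({name: value for name, value in zip(names, row) if value != MISSING})
    return nodes


nodes = [
    {"dateStart": "2017-01-01", "dateEnd": "2017-02-01"},
    {"dateStart": "2017-03-01"},  # no end date for this node
]
columns = to_parallel(nodes, ["dateStart", "dateEnd"])
print(columns)
print(to_nodes(columns) == nodes)
```

Without the sentinel, the second array would be one entry short and every later position would silently pair with the wrong node.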

Metacard encoding (Contact[0] = metacard4321-4321-4321-4321-4321-4321-4321-4321)

Pros
  • Supports related data attributes having independent fields with appropriate type information.

Cons
  • Would require a JOIN or other method in order to query data by groups of related fields.
  • Multiple metacards would be generated for a single piece of metadata.
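
The JOIN requirement can be sketched with a toy in-memory catalog. Everything here is hypothetical (the ids, the `contact.name` and `contact.role` attributes, and the `find_parents` helper); it only shows the two-step lookup that a real join would have to perform.

```python
# Hypothetical in-memory catalog: metacard id -> attributes.
catalog = {
    "record-1": {"title": "houses", "contact": ["contact-1", "contact-2"]},
    "record-2": {"title": "bridges", "contact": ["contact-3"]},
    "contact-1": {"contact.name": "Alice", "contact.role": "author"},
    "contact-2": {"contact.name": "Bob", "contact.role": "editor"},
    "contact-3": {"contact.name": "Alice", "contact.role": "editor"},
}


def find_parents(store, child_predicate, link_attribute="contact"):
    """Two-step join: match child metacards, then find the records that reference them."""
    child_ids = {mid for mid, attrs in store.items() if child_predicate(attrs)}
    return sorted(
        mid
        for mid, attrs in store.items()
        if child_ids.intersection(attrs.get(link_attribute, []))
    )


def is_alice_as_editor(attrs):
    # Both conditions must hold on the same contact metacard.
    return attrs.get("contact.name") == "Alice" and attrs.get("contact.role") == "editor"


parents = find_parents(catalog, is_alice_as_editor)
print(parents)
```

Because each contact is its own metacard with typed fields, grouped conditions are evaluated correctly, at the cost of the extra lookup and the additional metacards per record.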

Via Associations (an associated metacard contains the contact information in a contact metacard)

Pros
  • Capability is already supported within DDF.

Cons
  • As with metacard encoding, a JOIN or other method would be needed in order to query by groups of related fields.

Store these types of data as XML.

Pros
  • The XML data type is already supported as an attribute type.
  • XML allows for the grouping of related nodes.
  • Existing export capabilities could be used for viewing XML attributes.
  • Encoding and decoding in the transformer could use the existing schema and copy data directly.

Cons
  • Solr XPath support is slow or weak.
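
As a rough sketch of the XML option, the following groups related values under one element and selects a node by a child value. It uses Python's standard `xml.etree.ElementTree`, whose path support is a limited XPath subset; the element names and data are hypothetical, and this says nothing about how Solr itself would evaluate such a query.

```python
import xml.etree.ElementTree as ET


def to_xml(nodes):
    """Serialize a list of flat dicts as <nodes><node>...</node></nodes>."""
    root = ET.Element("nodes")
    for node in nodes:
        element = ET.SubElement(root, "node")
        for name, value in node.items():
            ET.SubElement(element, name).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical grouped data stored as a single XML attribute value.
xml_value = to_xml([
    {"id": "123", "startTime": "2017-02-01T00:00:00Z", "endTime": "2017-03-01T00:00:00Z"},
    {"id": "456", "startTime": "2016-06-01T00:00:00Z", "endTime": "2016-07-01T00:00:00Z"},
])

root = ET.fromstring(xml_value)
# Select the node whose <id> child has the text '123'.
matches = root.findall("./node[id='123']")
start_times = [match.findtext("startTime") for match in matches]
print(start_times)
```

The grouping survives the round trip, but range conditions such as "startTime after a date" would still need to be evaluated outside this simple path expression.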

Create a Collection Attribute type that contains a collection of other element types. Possibly serialize this type as XML.



Solr Nested Document

Pros
  • Can index and retain a deeply nested structure.

Cons
  • Requires schema changes to support all features.
  • Queries need to understand the structure and the paths to nested documents and fields.
  • Do not add a root document that has the same ID as a child document. This will violate integrity assumptions that Solr expects.
  • If you use Solr's delete-by-query APIs, you have to be careful to ensure that no children remain of any documents that are being deleted. Doing otherwise will violate integrity assumptions that Solr expects.
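
A small sketch of two of the points above: checking that no ID is reused within a nested document before indexing, and building a block join parent query. The `_childDocuments_` key and the `{!parent which=...}` query parser are Solr features; the `doc_type` and `alternate_id` field names, the sample document, and the helper functions are assumptions for illustration only.

```python
CHILD_KEY = "_childDocuments_"  # key Solr's JSON update format accepts for anonymous children


def collect_ids(doc):
    """Return the ids of a document and all of its nested children."""
    ids = [doc["id"]]
    for child in doc.get(CHILD_KEY, []):
        ids.extend(collect_ids(child))
    return ids


def duplicate_ids(doc):
    """Ids used more than once in the block; these should be fixed before indexing."""
    ids = collect_ids(doc)
    return sorted({value for value in ids if ids.count(value) > 1})


def parent_query(parent_filter, child_query):
    """Build a block join query that returns parents of matching children."""
    return '{!parent which="%s"}%s' % (parent_filter, child_query)


# Hypothetical nested document; doc_type and alternate_id are assumed field names.
document = {
    "id": "metacard-1",
    "doc_type": "metacard",
    CHILD_KEY: [
        {"id": "metacard-1-node-1", "doc_type": "node", "alternate_id": "123"},
        {"id": "metacard-1-node-2", "doc_type": "node", "alternate_id": "456"},
    ],
}

print(duplicate_ids(document))
print(parent_query("doc_type:metacard", "alternate_id:123"))
```

The `which` filter has to identify all parent documents, which is one reason the query side needs to know how the nested structure was indexed.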
