7. Library schemas

7.1. Why library schemas?

The variety and complexity of events in electrophysiological experiments make full documentation challenging. As more experiments move out of controlled laboratory environments and into less controlled virtual and real-world settings, the terminology required to adequately describe events has the potential to grow exponentially.

In addition, experiments in any given subfield can create pressures to add overly specific terms and jargon to the schema hierarchy: for example, adding musical terms to tag events in music-based experiments, video markup terms for experiments involving movie viewing, traffic terms for experiments involving virtual driving, and so forth.

Clinical fields using neuroimaging also have their own specific vocabularies for describing data features of clinical interest (e.g., seizure, sleep stage IV). Including these discipline-specific terms quickly makes the standard HED schema unwieldy and less usable by the broader user community.

Third-generation HED addressed the problem of vocabulary bloat by introducing HED library schemas to organize discipline-specific terminology. To use a programming analogy, when programmers write a Python module, the resulting code does not become part of the Python language or core libraries. Instead, the module becomes part of a library used in conjunction with core modules of the programming language.

A HED library schema contains the specialized vocabulary terms needed for event annotation in a specialized area. An example of such a library is the HED SCORE schema for annotation of EEG by clinicians.

7.2. Standalone schemas

7.3. Partnered schemas

HED library schemas were originally assumed to be standalone vocabularies, complete with all the needed schema attributes and properties. These standalone library schemas were usually used in conjunction with the HED standard schema, and the tags from the two different vocabularies were distinguished by prefixing the tags from one of the vocabularies with xx:. Here xx: is called the namespace for that schema within the annotation and is chosen by the annotator.

Partnered library schemas were introduced in HED specification version 3.2.0 and are supported by HED standard schema versions ≥ 8.2.0.

A partnered library schema version is tied to a specific version of the HED standard schema as specified in its header. A given library schema version is either partnered or standalone.

7.3.1. Partnered files

The XML file corresponding to a partnered library schema is a single, unified schema containing the information from both the library and its standard schema partner and validated as an integrated whole.

This XML merged schema file is downloaded and used by tools. Downstream tools see a single schema and can process it with no special handling. The following example shows the XML header for merged TESTLIB library version 2.0.0.

XML header for TESTLIB library 2.0.0 partnered with 8.2.0 (merged).

<?xml version="1.0" ?>
<HED library="testlib" version="2.0.0" withStandard="8.2.0">

The canonical filename for this .xml file is HED_testlib_2.0.0.xml. This file is always stored in the library's hedxml directory in the hed-schemas GitHub repository. For the above example, the directory is library_schemas/testlib/hedxml.

As with any HED schema, schema builders develop and maintain their schema in MediaWiki mark-down format and use tools to convert to XML. The schema developer’s version is unmerged, containing only the information specific to the library schema. The following example shows the header for the .mediawiki developer’s version of a partnered library schema.

MediaWiki header for TESTLIB library 2.0.0 partnered with 8.2.0 (unmerged).

HED library="testlib" version="2.0.0" withStandard="8.2.0" unmerged="true"

The canonical filename for this .mediawiki file is HED_testlib_2.0.0_unmerged.mediawiki.

Tools also support an alternative form of the .mediawiki library schema that contains all the information in the merged schema (a mirror of the XML). This form may be useful for debugging but is usually not explicitly created.

The following table summarizes the different partnered library schema formats and their uses. File names and link examples are specifically for the TESTLIB library. For other libraries, substitute the library name for the word testlib.

Format     | Merged status | Canonical filename                   | Handling
-----------|---------------|--------------------------------------|------------------------------------------------------------------------------------
XML        | merged        | HED_testlib_2.0.0.xml                | Stored in the library hedxml directory. Used by tools.
XML        | unmerged      | HED_testlib_2.0.0_unmerged.xml       | Can be generated but is never stored on hed-schemas. Not used, but available for completeness.
MediaWiki  | merged        | HED_testlib_2.0.0.mediawiki          | Usually not stored in hedwiki. Possibly used during schema development.
MediaWiki  | unmerged      | HED_testlib_2.0.0_unmerged.mediawiki | Working format for developers. Should be stored in hedwiki.
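Because the four filenames in the table follow a single pattern, they can be generated mechanically. The following Python sketch assumes only the naming pattern and the hedxml location shown above; canonical_filename and hedxml_path are hypothetical helpers, not functions from any HED tool.

```python
def canonical_filename(library: str, version: str, fmt: str, merged: bool) -> str:
    """Return the canonical filename for a partnered library schema.

    fmt is "xml" or "mediawiki"; unmerged files get an "_unmerged" suffix.
    """
    if fmt not in ("xml", "mediawiki"):
        raise ValueError(f"unsupported format: {fmt}")
    suffix = "" if merged else "_unmerged"
    return f"HED_{library}_{version}{suffix}.{fmt}"


def hedxml_path(library: str, version: str) -> str:
    """Repository-relative location of the merged XML file that tools download."""
    return f"library_schemas/{library}/hedxml/{canonical_filename(library, version, 'xml', True)}"
```

For example, canonical_filename("testlib", "2.0.0", "mediawiki", False) yields the developer's working file HED_testlib_2.0.0_unmerged.mediawiki.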

7.3.2. Partnered formats

There are four significant differences between merged and unmerged MediaWiki formats:

  1. The unmerged version has the unmerged="true" attribute in its header line.
     

  2. The unmerged version should only include the auxiliary sections (e.g., unit classes, unit modifiers, value classes, schema attributes, and schema properties) that it explicitly extends.
     

  3. In the unmerged schema, nodes with the rooted=XXX schema attribute must be top-level tags, and XXX must correspond to a node in the standard schema. In the merged schema, nodes with the rooted=XXX schema attribute are placed directly under the standard schema node XXX.
     

  4. Nodes in the unmerged version cannot have the inLibrary attribute. In contrast, nodes from the library schema are given the inLibrary=YYY attribute during the merging process. Here YYY is the library schema name.

The following excerpt from an unmerged TESTLIB library schema in MediaWiki format shows a library schema node (Data-mode) rooted to Statistical-value in the standard schema.

Example of a rooted node in an unmerged schema in MediaWiki format.

                      . . .
'''Data-mode''' <nowiki>{rooted=Statistical-value}[A value that occurs most often in data.]</nowiki>   
* <nowiki># {takesValue, valueClass=numericClass}</nowiki> 
                      . . .

Notice that the indentation asterisks (*) indicate that the node’s children are at the first level. In the merged schema, these are adjusted accordingly as shown in the following:

When merged with the standard schema, the indentation levels are adjusted.

                      . . .
*** Statistical-value <nowiki>{extensionAllowed}[A value based on or employing the principles of statistics.]</nowiki>              
                       . . .
**** Data-minimum <nowiki>[The smallest possible quantity.]</nowiki>
***** <nowiki># {takesValue, valueClass=numericClass}</nowiki>
**** Data-mode <nowiki>{inLibrary=testlib, rooted}[A value that occurs most often in data.]</nowiki>
***** <nowiki># {takesValue, valueClass=numericClass, inLibrary=testlib}</nowiki>
**** Probability <nowiki>[A measure of the expectation of the occurrence of a particular event.]</nowiki>
***** <nowiki># {takesValue, valueClass=numericClass}</nowiki>
                      . . .

Similar differences occur between the merged and unmerged XML formats, but only the merged XML format is used by tools.
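The re-indentation shown in the two MediaWiki excerpts can be sketched in Python. merge_rooted_node is a hypothetical helper: the caller supplies the depth of the standard schema node named by rooted (Statistical-value carries three asterisks in the excerpt), whereas a real merge locates that node itself. The sketch keeps each node's existing attributes and appends inLibrary, so its attribute list differs slightly from the merged excerpt above; the exact attribute output is determined by the HED tools, not by this sketch.

```python
import re
from typing import List


def _add_in_library(body: str, library: str) -> str:
    """Add inLibrary=<library> to a node's {...} attribute block, creating the block if needed."""
    tag = "inLibrary=" + library
    if "{" in body:
        # The first "}" closes the attribute block; descriptions use [...] instead.
        return body.replace("}", ", " + tag + "}", 1)
    # No attribute block: insert one before the [description], or before </nowiki>.
    anchor = "[" if "[" in body else "</nowiki>"
    i = body.index(anchor)  # raises ValueError if the line has neither
    return body[:i] + "{" + tag + "}" + body[i:]


def merge_rooted_node(lines: List[str], parent_depth: int, library: str) -> List[str]:
    """Re-indent a rooted top-level node and its children to sit under a standard node.

    parent_depth is the number of asterisks on the standard schema node named by rooted.
    """
    merged = []
    for line in lines:
        line = line.rstrip()
        top = re.match(r"'''(.+?)'''\s*(.*)", line)
        if top:
            # '''Name''' marks a top-level node; it becomes a direct child of the parent.
            depth = parent_depth + 1
            body = (top.group(1) + " " + top.group(2)).strip()
        else:
            # A child keeps its relative level, shifted below the node's new position.
            stars = len(line) - len(line.lstrip("*"))
            depth = parent_depth + 1 + stars
            body = line[stars:].strip()
        merged.append("*" * depth + " " + _add_in_library(body, library))
    return merged
```

Applied to the two unmerged Data-mode lines with parent_depth=3 and library "testlib", the sketch produces a four-asterisk Data-mode line and a five-asterisk value line, matching the indentation of the merged excerpt.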

7.3.3. Auxiliary sections

The unmerged version of a partnered library schema must have prologue and epilogue sections that appropriately explain the purpose of the library schema. The contents of these prologue and epilogue sections become the prologue and epilogue, respectively, in the merged schema.

All the other auxiliary sections of the corresponding partner standard schema are inherited by the merged schema. Most unmerged partnered library schemas will not contain any additional auxiliary sections.

Auxiliary section items that do not appear in a standard schema are unlikely to be supported by the HED infrastructure if they require special handling. Thus, adding items to the auxiliary library schema sections is discouraged.

Library schema developers who need to add an item, such as a unit class, to an auxiliary section should first contact the HED Working Group to determine whether this item could be appropriately added to the standard schema. If a new item must be added, only that item and its corresponding auxiliary section should appear in the unmerged schema.

Library schema additions of units, unit classes, unit modifiers, value classes, and schema attributes are permitted, though not encouraged. Library schemas cannot add information to the property definitions section of the schema.
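The rules in this section can be expressed as a simple check over the sections of an unmerged partnered library schema. The following Python sketch assumes the schema has already been parsed into a dictionary mapping section names to their items; check_unmerged_sections and the section-name strings are hypothetical and recognize only the sections named in this chapter.

```python
from typing import Dict, List, Tuple

# Auxiliary sections a partnered library schema may extend (permitted but discouraged).
DISCOURAGED = {"schema attributes", "unit classes", "unit modifiers", "value classes"}
# Sections a library schema may not add to.
FORBIDDEN = {"schema properties"}


def check_unmerged_sections(sections: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the sections of an unmerged partnered library schema.

    sections maps a section name to its items (or text lines, for the prologue and epilogue).
    """
    errors: List[str] = []
    warnings: List[str] = []
    for required in ("prologue", "epilogue"):
        if not sections.get(required):
            errors.append(f"missing or empty {required}")
    for name, items in sections.items():
        if name in ("prologue", "epilogue"):
            continue
        if name in FORBIDDEN:
            errors.append(f"library schemas cannot add to {name}")
        elif name not in DISCOURAGED:
            errors.append(f"unrecognized section: {name}")
        elif not items:
            # Only sections that are explicitly extended should appear.
            errors.append(f"empty {name} section should be omitted")
        else:
            warnings.append(f"new {name} {items}: consult the HED Working Group")
    return errors, warnings
```

A schema with only a prologue and an epilogue passes with no errors or warnings, which is the expected case for most partnered library schemas.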

7.3.4. Partnered attributes

To support partnered library schemas, the following items were introduced in HED standard schema 8.2.0:

Name         | Type              | Role
-------------|-------------------|----------------------------------------------------------------------------------------
withStandard | Header attribute  | Indicates that this is a partnered library schema. Its value is the version of its standard schema partner.
unmerged     | Header attribute  | Indicates that this schema contains only library information. Its value is either "true" or "false"; if "false", the attribute should be omitted.
inLibrary    | Element attribute | Indicates that this element is in the library schema. Its value is the library name in lowercase. The attribute appears only in merged schemas.
rooted=XXX   | Node attribute    | Indicates that this node is to appear directly under standard schema node XXX in the merged schema. A node with the rooted attribute must be a top-level node in the unmerged schema.
reserved     | Node attribute    | Indicates that this node has special meaning or function. Can only appear in standard schemas.
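The two header attributes are enough to tell a partnered schema from a standalone one and a merged file from an unmerged one. The following Python sketch parses either the XML <HED ...> element or the MediaWiki HED header line shown earlier; parse_header and SchemaHeader are hypothetical names, and the sketch assumes attribute values are double-quoted as in the examples.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Matches name="value" pairs such as withStandard="8.2.0".
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class SchemaHeader:
    attributes: Dict[str, str]

    @property
    def partnered(self) -> bool:
        # withStandard is present only in partnered library schemas.
        return "withStandard" in self.attributes

    @property
    def unmerged(self) -> bool:
        # An absent unmerged attribute is treated as "false".
        return self.attributes.get("unmerged", "false").lower() == "true"


def parse_header(line: str) -> SchemaHeader:
    """Parse the <HED ...> XML element or the MediaWiki HED header line (not the <?xml ?> line)."""
    return SchemaHeader(dict(ATTRIBUTE.findall(line)))
```

For the merged TESTLIB XML header, the sketch reports a partnered, merged schema whose partner is 8.2.0; for the MediaWiki developer header, it reports a partnered, unmerged schema.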

7.3.5. Motivation for partners

Starting with HED specification version 3.2.0 and HED standard schema version 8.2.0, partnered library schemas have become the recommended form for library schemas. This section describes the motivation for this preference.

7.3.5.1. Auxiliary consistency

A standalone library schema must duplicate the auxiliary schema sections appearing in standard schemas, introducing the possibility of inconsistency in usage or definition between the library schema and standard schemas.

Partnered library schemas automatically inherit the auxiliary sections of their partner standard schema, thus assuring consistent handling by tools and preventing the introduction of inconsistently handled attributes.

Although standalone library schemas may add additional items to the auxiliary sections, HED tools only guarantee support of standard schema auxiliary items requiring special handling. Thus, addition of items in the auxiliary sections of a library schema is discouraged.

7.3.5.2. Reserved tag handling

Several tags in the standard schema such as Definition, Onset, and Offset define the structure of events and the data. By partnering with a standard schema, a library schema is assured of having HED support for key features such as events of temporal extent and definitions.

Developers of partnered library schemas should release new versions whenever HED updates its standard schema. This ensures that the partnered library schema benefits from the latest updates to HED features and tools.

If the update can be done without conflict, this update may be initiated as part of the release mechanism by the maintainers of the HED repositories.

7.3.5.3. Annotation conciseness

The most common use case for library schemas in annotation requires tags from both a standard schema and a library schema. When standalone library schemas are used, this requires that an xx: namespace prefix be assigned to the tags from one of the schemas.

Because a partnered library schema is merged with a standard schema to form a single, unified schema, users can annotate data without the xx: namespace designator. An xx: prefix may still be needed when more than one library schema is used (see Lazy partnering).

7.3.5.4. Library searches

The subtrees appearing in library schemas are often elaborations of a particular term in the standard schema. However, if the library schema terms are not placed in the appropriate position in the standard schema hierarchy, HED search cannot find these elaborations when searching for a more general standard schema term. Partnered library schemas can use the rooted attribute to place such subtrees directly under the corresponding standard schema node.

7.3.5.5. Suggested tags

Standalone library schemas cannot use the suggestedTag or relatedTag attributes to suggest particular tags from the standard schema, since the values of these attributes must be tags in the schema itself. However, with partnered library schemas, validation is only performed on the merged versions of the schema, so tags from the standard schema can be used as suggestedTag or relatedTag values.

7.3.6. Lazy partnering

HED allows multiple partnered schemas to be loaded and used without prefixes provided that there are no conflicts. We refer to this process as lazy merging. Conflicting schemas can always be used together if all but one have an associated prefix. A merge is attempted for all non-prefixed schemas and for each group of schemas with the same prefix.

In the following example, all the library schemas are partnered with ‘8.2.0’. Library schemas liba_1.0.0 and libc_4.3.2 are merged with no prefix, and library schemas ac:libb_2.8.1 and ac:exam_2.3.2 are merged with prefix ac:. The schema sc:test_1.3.2 stays the same and schema 8.2.0 has no effect, since it is already included as a partner of liba_1.0.0 and libc_4.3.2. If there are any conflicts during the merging process, an error is raised.

Example: Merging of multiple schemas.

     ['liba_1.0.0', 'ac:libb_2.8.1', 'libc_4.3.2', '8.2.0', 'sc:test_1.3.2', 'ac:exam_2.3.2']

Rules for lazy merging of multiple partnered schemas.

  1. Partnered library schemas MUST have the same standard schema partner to merge.

  2. Partnered library schemas with no prefix form a merge group.

  3. Schemas with the same namespace prefix form a merge group.

  4. Schemas in the same merge group are merged in the order given.

  5. Standard schemas in a merge group are ignored if already the group partner.

  6. Standard schemas in a merge group raise an error if different from the group partner.

  7. The prefixes of the resulting merge groups must be unique.

  8. If any tags match in two library schemas in a merge group, even if identical, the load fails.

  9. The prologues and epilogues of the schemas are ignored since merge groups are never saved.

  10. Partnered library schemas can specify schema attributes or properties.

  11. New library schema unit classes and their accompanying units are merged directly.

  12. New library schema units under an existing unit class are merged if no conflicts.

  13. New library schema value classes are merged if no conflicts.

If an incompatible list of schemas is given, a SCHEMA_LOAD_FAILED error is generated.
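Rules 1 through 7 can be illustrated by a grouping step. The following Python sketch is not the implementation used by HED tools: form_merge_groups and SchemaLoadFailed are hypothetical names, the caller supplies each library's withStandard partner (None for a standalone schema) instead of reading it from the downloaded header, and treating a standalone library as unmergeable is this sketch's reading of rule 1. Tag conflicts (rule 8) and auxiliary-section merging (rules 10 through 13) are not checked.

```python
import re
from typing import Dict, List, Optional


class SchemaLoadFailed(Exception):
    """Stands in for the SCHEMA_LOAD_FAILED error described above."""


# Optional alphabetic prefix, optional alphabetic library name, then MAJOR.MINOR.PATCH.
ENTRY = re.compile(r"^(?:([A-Za-z]+):)?(?:([A-Za-z]+)_)?(\d+\.\d+\.\d+)$")


def form_merge_groups(entries: List[str], partners: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Return prefix ("" for none) -> entries to merge, in the order given."""
    groups: Dict[str, list] = {}
    for entry in entries:
        m = ENTRY.match(entry)
        if not m:
            raise SchemaLoadFailed(f"invalid version specification {entry!r}")
        prefix, library, version = m.groups()
        groups.setdefault(prefix or "", []).append((library, version, entry))  # rules 2, 3

    result: Dict[str, List[str]] = {}
    for prefix, members in groups.items():
        libs = [(f"{lib}_{ver}", entry) for lib, ver, entry in members if lib]
        standards = {ver for lib, ver, _ in members if not lib}
        if not libs:  # only standard schemas in this group
            if len(standards) > 1:
                raise SchemaLoadFailed(f"group {prefix!r}: conflicting standard schemas {sorted(standards)}")
            result[prefix] = [members[0][2]]
            continue
        partner_set = {partners[name] for name, _ in libs}
        if None in partner_set and (len(libs) > 1 or standards):
            raise SchemaLoadFailed(f"group {prefix!r}: a standalone library cannot be merged")
        if len(partner_set) > 1:  # rule 1
            raise SchemaLoadFailed(f"group {prefix!r}: libraries have different standard partners")
        partner = partner_set.pop()
        for std in standards:  # rule 5 (ignore if same) and rule 6 (error if different)
            if std != partner:
                raise SchemaLoadFailed(f"group {prefix!r}: standard {std} differs from partner {partner}")
        result[prefix] = [entry for _, entry in libs]  # rule 4: keep the given order
    return result
```

For the example list above, with every library partnered with 8.2.0, the sketch returns three groups: liba_1.0.0 and libc_4.3.2 with no prefix, the two ac: libraries, and sc:test_1.3.2 on its own. The unprefixed 8.2.0 entry is absorbed because it matches the group partner.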

Avoid new auxiliary section entries in library schemas.

Note: With the possible (and rare) exception of new unitClasses and units, partnered library schemas should not have auxiliary sections except for the prologue and epilogue.

Auxiliary sections have information for HED tools, and new entries may require modification to schema validation tools.

If a new entry is needed, contact the HED Working Group (hed.maintainers@gmail.com) to see if the entry might be added to the standard schema instead.

7.4. Library schema design

Library schemas should be developed and maintained in MediaWiki format for readability. Developers should always validate the schema before converting it to XML. Only validated versions of the schema should be uploaded to the GitHub hed-schemas repository. More information about the development process is contained in the HED schema developers guide.

7.4.1. General design rules

This section summarizes the general design rules for all library schemas.

General design rules for HED library schemas.

  1. Follow naming conventions:
    A library schema must be given a name containing only alphabetic characters. This name must appear in the schema header line in the required format.
     

  2. Use semantic versioning:
    A library schema must use semantic versioning and follow the versioning update rules used by the HED standard schema as specified in Semantic versioning.
     

  3. Tag uniqueness:
    Every term must be unique within the library schema and must conform to the rules for HED schema terms.
     

  4. Have a meaningful prologue:
    The schema should include a prologue section giving an overview, purpose and scope of the library schema.
     

  5. Have a meaningful epilogue:
    The schema should include an epilogue section containing reference, citation, and license information.
     

  6. Be understandable:
    Schema terms should be readily understood by most users. The terms should not be ambiguous and should be meaningful in themselves without reference to their position in the schema hierarchy.
     

  7. Be well-organized:
    If possible, no schema sub-tree should have more than 7 direct subordinate sub-trees.
     

  8. Maintain subtree orthogonality:
    Terms that are used independently of one another should be in different sub-trees (orthogonality).
     

  9. Enforce is-a relationship between child nodes and their parents:
    Every node in a HED hierarchy must be a subclass of its parent node. This is required for HED search generalizability.

Rules 1 through 5 are enforced by validators, while rules 6 through 9 are the responsibility of the schema designers and review committees.
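The validator-enforced rules 1 through 5 reduce to a few mechanical checks. The following Python sketch assumes the schema has already been parsed into a name, a version string, a list of terms, and prologue/epilogue text; check_general_rules is hypothetical, the version check is a simplified MAJOR.MINOR.PATCH pattern, case-insensitive term comparison is an assumption, and the remaining HED term-syntax rules in rule 3 are not checked.

```python
import re
from typing import List

LIBRARY_NAME = re.compile(r"^[A-Za-z]+$")  # rule 1: alphabetic characters only
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")    # rule 2: simplified MAJOR.MINOR.PATCH check


def check_general_rules(name: str, version: str, terms: List[str],
                        prologue: str, epilogue: str) -> List[str]:
    """Return messages for violations of rules 1-5 in an already-parsed library schema."""
    problems = []
    if not LIBRARY_NAME.match(name):
        problems.append(f"rule 1: library name {name!r} must be alphabetic")
    if not SEMVER.match(version):
        problems.append(f"rule 2: version {version!r} is not MAJOR.MINOR.PATCH")
    seen = set()
    for term in terms:
        key = term.lower()  # assumption: terms are compared case-insensitively
        if key in seen:
            problems.append(f"rule 3: duplicate term {term!r}")
        seen.add(key)
    if not prologue.strip():
        problems.append("rule 4: empty prologue")
    if not epilogue.strip():
        problems.append("rule 5: empty epilogue")
    return problems
```

Rules 6 through 9 concern meaning and organization, so they remain with the schema designers and reviewers rather than with a check like this one.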

In general, library schema developers should avoid adding schema terms that duplicate those found in the latest HED standard schema at the time of release. Library schema developers should also try to avoid overlap of terms found in other schema libraries.

All HED schemas, including library schemas, must use semantic versions and adhere to the rules specified in 3.3 Semantic versioning.

Standalone library schema developers must include the auxiliary schema classes from the standard HED schema including the schema attributes, unit classes, unit modifiers, value classes, and schema properties. No changes should be made to these sections since HED tools support the special auxiliary classes from the standard schema, but in general do not support special handling of added classes beyond basic verification.

If your application requires schema classes that are not available in the standard HED schema and you would like these classes to be supported, please make a request using the issues forum of the hed-schemas GitHub repository.

7.4.2. Standalone design rules

The following design rules are specifically meant for standalone library schemas.

Design rules specific to standalone HED library schemas.

  1. Avoid tag duplication:
    The terms in the library schema should not overlap terms present in the latest version of the HED schema at the time of its release.
     

  2. Do not modify the special auxiliary sections:
    The standalone library schema should exactly duplicate the special auxiliary sections of the HED standard schema that was the latest version when this schema version was released. The special sections include schema attributes, unit classes, unit modifiers, value classes, and schema properties.
     

  3. Avoid adding special auxiliary items:
    A library schema may not modify any of the items in the special sections of the HED standard schema.
     

  4. Obtain the appropriate reviews early:
    Any additions to the special sections must be reviewed by the HED Working Group to determine what requirements the additions would impose on downstream tools. This should be done as early in the process as possible.

Standalone library schemas are no longer recommended because of the difficulty in enforcing conflict rules with HED standard schemas.

7.4.3. Partnered design rules

Partnered library schemas are now the recommended format for the reasons listed in Motivation for partners. The following design rules are specifically meant for partnered library schemas.

Design rules specific to partnered HED library schemas.

  1. Check for overlap:
    The terms in the partnered library schema must not overlap with terms present in its partnered standard schema.
     

  2. Use the latest released version of the standard schema:
    A partnered library schema should always use the latest version of the HED schema available at the time of its release.
     

  3. Do not include auxiliary sections:
    A partnered library schema should not contain the special auxiliary sections (e.g., schema attributes, unit classes, unit modifiers, value classes, and schema properties) unless a new item is added to a section, in which case only that item should appear.
     

  4. Seek reviews early in the process:
    Any additions to the special sections must be reviewed by the HED Working Group to determine what requirements the additions would impose on downstream tools.
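Design rule 1 amounts to a set intersection between the library's terms and its partner's terms. The following Python sketch assumes both term lists have already been extracted from the schemas; overlapping_terms is a hypothetical helper, and the case-insensitive comparison is an assumption of the sketch.

```python
from typing import Iterable, List


def overlapping_terms(library_terms: Iterable[str], standard_terms: Iterable[str]) -> List[str]:
    """Return library terms that also appear in the partner standard schema (design rule 1)."""
    standard = {t.lower() for t in standard_terms}  # assumption: compare without regard to case
    return sorted(t for t in library_terms if t.lower() in standard)
```

An empty result means the library passes this rule; any returned term must be renamed, removed, or taken from the standard schema before release.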

It is recognized that HED standard and library schemas will both evolve and that additions or tag reorganizations may cause conflicts. These conflicts must be resolved as they occur. In general, the standard schema takes precedence over any library schema in resolving these conflicts.

7.4.4. Schema namespaces

As part of the HED annotation process, users must associate one or more HED schemas with their datasets. Since it would be impossible to avoid naming conflicts across schema libraries built in parallel by different user communities, HED supports schema library namespaces to facilitate the use of multiple schemas in annotating a dataset.

If multiple schemas are used, users must define a local namespace for each additional schema and prefix the tags from each of these additional schemas by their respective namespace in annotations. The local names should be strictly alphabetic with no blanks or punctuation. If a tag namespace prefix is invalid in the version specification, a schema loading error occurs.

Example: Driving library schema example tags.

dp:Drive-action/Change-lanes
dp:Drive/Change-lanes
dp:Change-lanes

A colon (:) is used to separate the qualifying local name from the remainder of the tag.
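Separating the local name from the rest of a tag is a single split at the first colon. The following Python sketch is a hypothetical helper (split_namespace is not from any HED tool) that assumes the rule stated above: a valid prefix is strictly alphabetic and is followed by a colon.

```python
import re
from typing import Optional, Tuple

# A strictly alphabetic local name, a colon, then the remainder of the tag.
PREFIX = re.compile(r"^([A-Za-z]+):(.+)$")


def split_namespace(tag: str) -> Tuple[Optional[str], str]:
    """Split 'dp:Drive/Change-lanes' into ('dp', 'Drive/Change-lanes'); unprefixed tags get None."""
    tag = tag.strip()
    m = PREFIX.match(tag)
    if m:
        return m.group(1), m.group(2)
    if ":" in tag.split("/")[0]:
        # A colon in the first tag component that did not match the pattern is a bad prefix.
        raise ValueError(f"invalid namespace prefix in {tag!r}")
    return None, tag
```

Applied to each comma-separated element of an annotation such as "Data-feature, sc:Photomyogenic-response", the helper separates tags from the unprefixed schema from those drawn from the sc: schema.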

The introduction of partnered library schemas has greatly reduced the need for namespaces, since the most common use case is a library schema used with a standard schema.

7.5. Library schemas in BIDS

The most common use case (for the vast majority of HED users) is to tag events using a standard HED schema (preferably the latest one) available in the standard_schema/hedxml directory of the hed-schemas repository of the hed-standard organization on GitHub. The standard schemas are available at: https://github.com/hed-standard/hed-schemas/tree/main/standard_schema.

The official library schemas are available at https://github.com/hed-standard/hed-schemas/tree/main/library_schemas.

Standard schemas are referenced by their version number (e.g., 8.1.0), while library schemas are referenced by a combination of library name and version number (e.g., score_1.0.0).

For BIDS datasets, the versions of the HED schema are specified by the HEDVersion field of the BIDS dataset_description.json file. The following example specifies that version 8.1.0 of the standard HED schema is to be used in addition to score library schema version 1.0.0.

Illustration of using the namespace prefix for tagging.

The dataset_description.json file contains:

{
  "Name": "A great experiment",
  "BIDSVersion": "1.8.0",
  "HEDVersion": ["8.1.0", "sc:score_1.0.0"]
}

A typical annotation is:

"Data-feature, sc:Photomyogenic-response, sc:Wicket-spikes"

Based on the above description, tools will download:

  1. The standard HED schema:
    https://raw.githubusercontent.com/hed-standard/hed-schemas/main/standard_schema/hedxml/HED8.1.0.xml.

  2. The HED score library schema version 1.0.0:
    https://raw.githubusercontent.com/hed-standard/hed-schemas/main/library_schemas/score/hedxml/HED_score_1.0.0.xml.
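The two URLs above follow directly from the HEDVersion entries. The following Python sketch reproduces that mapping; schema_urls is a hypothetical helper, and it assumes the raw-file base URL and directory layout shown in the download list. HEDVersion may be a single string or a list, as in the dataset_description.json examples in this section. The namespace prefix only affects annotation, so the sketch discards it.

```python
import json
import re
from typing import List

# Raw-file base URL taken from the download list above.
BASE = "https://raw.githubusercontent.com/hed-standard/hed-schemas/main"
# Optional alphabetic namespace prefix, optional library name, then MAJOR.MINOR.PATCH.
ENTRY = re.compile(r"^(?:([A-Za-z]+):)?(?:([A-Za-z]+)_)?(\d+\.\d+\.\d+)$")


def schema_urls(dataset_description: str) -> List[str]:
    """Return the XML URLs a tool would fetch for the HEDVersion field of dataset_description.json."""
    versions = json.loads(dataset_description)["HEDVersion"]
    if isinstance(versions, str):  # a single version may be given as a plain string
        versions = [versions]
    urls = []
    for entry in versions:
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"invalid HEDVersion entry: {entry!r}")
        _prefix, library, version = match.groups()  # the prefix does not affect the URL
        if library:
            urls.append(f"{BASE}/library_schemas/{library}/hedxml/HED_{library}_{version}.xml")
        else:
            urls.append(f"{BASE}/standard_schema/hedxml/HED{version}.xml")
    return urls
```

For the dataset_description.json shown above, the sketch returns the standard schema 8.1.0 URL followed by the SCORE 1.0.0 URL, in the order listed in HEDVersion.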

In the dataset annotations for the above example, tags drawn from the score schema would be prefixed with sc:, where sc is a local name used to distinguish tags from the additional schema.

The array specification of the schema versions in BIDS can have at most one version appearing without a colon prefix.

SCORE version 1.0.0 is not partnered, so the HED version specification had to include both the library and standard schema versions. In contrast, SCORE version 1.1.0 is partnered with HED standard schema 8.2.0, so no namespace prefixes are needed as shown in the following example:

Example: An example specification of HED version for a partnered schema.

The dataset_description.json file contains:

{
  "Name": "A great experiment",
  "BIDSVersion": "1.8.0",
  "HEDVersion": "score_1.1.0"
}

A typical annotation is:

"Data-feature, Photomyogenic-response, Wicket-spikes"