3. HED formats¶
This chapter describes the requirements and formats for HED schema and HED annotations.
3.1. Schema formats¶
A HED schema is a formal specification of a HED vocabulary and annotation format rules. A HED schema vocabulary is organized hierarchically so that similar concepts and terms appear close to one another in the organizational hierarchy.
HED schema nodes must satisfy an “is-a” relationship with their parent nodes in the schema. That is, if node A is an ancestor of node B in the schema, then B is a type of A. This relationship is fundamental to HED and permits search generality. Searches for A are able to also return instances of B.
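The search behavior that the is-a relationship enables can be sketched as follows. This is only an illustration: the `PARENT` map is a hypothetical in-memory stand-in for a schema (using node names from the example schema later in this chapter), and `is_a` and `search` are illustrative helper names, not part of any HED tool.

```python
# Hypothetical toy schema: each node name maps to its parent (None for top nodes).
PARENT = {
    "Event": None,
    "Sensory-event": "Event",
    "Property": None,
    "Informational-property": "Property",
    "Label": "Informational-property",
}


def is_a(node, ancestor):
    """Return True if node is the ancestor itself or one of its descendants."""
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT.get(node)
    return False


def search(annotation_tags, query):
    """Return the tags in an annotation that are instances of the query term."""
    return [tag for tag in annotation_tags if is_a(tag, query)]
```

With this sketch, a search for `Event` over the tags `Sensory-event` and `Label` returns only `Sensory-event`, because `Sensory-event` is a type of `Event` and `Label` is not.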
A key requirement for third generation HED (versions >=8.0.0) is that all node names (tag terms) in the HED schema (except for # placeholders) must be unique.
Additional details about HED schema format can be found in appendix A. Schema format details. 7. Library schemas discusses the additional requirements and restrictions on library schemas.
B.2. Schema validation errors lists the errors that schema validation can generate. Library-specific schema issues usually generate SCHEMA_LIBRARY_INVALID errors.
3.1.1. Official schema releases¶
The HED ecosystem supports a standard base schema and additional discipline-specific library schemas. (See the expandable schema viewer to explore existing schemas.)
Releases of the HED standard base schema are stored in the standard_schema/hedxml directory of the hed-schemas repository.
Releases of a HED library schema are stored in a subdirectory of library_schemas whose name is the library name.
3.1.2. Schema layout overview¶
Schemas can be specified in either .mediawiki or .xml format. Online tools provide an easy way for users to validate schema and convert between formats.
HED schema developers usually use .mediawiki format for more convenient editing, display, and viewing on GitHub. However, the stable links provided for tools to access and download the HED schema are to the XML versions. Both formats must be available and synchronized in the hed/standard/hed-schemas GitHub repository.
Regardless of the format, a valid HED schema must have the following sections in this order:
Required sections of a HED schema (in the required order):
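| Section | Mediawiki format | XML format |
|---|---|---|
| Header line | HED version="8.0.0" | <HED version="8.0.0"> |
| Prologue | '''Prologue''' | <prologue> ... </prologue> |
| Schema start | !# start schema | <schema> |
| Schema end | !# end schema | </schema> |
| Unit classes | '''Unit classes''' | <unitClassDefinitions> ... </unitClassDefinitions> |
| Unit modifiers | '''Unit modifiers''' | <unitModifierDefinitions> ... </unitModifierDefinitions> |
| Value classes | '''Value classes''' | <valueClassDefinitions> ... </valueClassDefinitions> |
| Schema attributes | '''Schema attributes''' | <schemaAttributeDefinitions> ... </schemaAttributeDefinitions> |
| Properties | '''Properties''' | <propertyDefinitions> ... </propertyDefinitions> |
| Epilogue | '''Epilogue''' | <epilogue> ... </epilogue> |
| Ending line | !# end hed | </HED> |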
The sections in the .xml version must always be terminated by closing </ > tokens, whereas the sections of the .mediawiki version, which is line-oriented, are terminated when the next section marker (!#) or a top tag (''') is encountered.
The actual HED tag specifications (referred to in the discussion as nodes or tag terms) appear in the schema section, while the remaining sections specify additional information and behavior.
These additional sections are required, but are allowed to be empty.
If any of the required sections of the schema are missing or out of order, a SCHEMA_SECTION_MISSING error occurs.
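A minimal sketch of the section-order check, assuming a hypothetical earlier parsing step has already reduced the schema file to the list of section names it found (named as in the Section column of the table above). The function name and return convention are illustrative only.

```python
# Required sections, in the required order (names from the table's Section column).
REQUIRED_ORDER = [
    "Header line", "Prologue", "Schema start", "Schema end",
    "Unit classes", "Unit modifiers", "Value classes",
    "Schema attributes", "Properties", "Epilogue", "Ending line",
]


def check_section_order(found_sections):
    """Return None if every required section is present in order, else an error code."""
    if list(found_sections) != REQUIRED_ORDER:
        return "SCHEMA_SECTION_MISSING"
    return None
```

Because empty sections still count as present, the check compares only the sequence of section names and never looks at section contents.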
Each of the schema sections has “schema attributes”, which are the attributes that may be assigned to elements in a given section. If a schema attribute is applied improperly to an element in a given section, the SCHEMA_ATTRIBUTE_INVALID error occurs.
See Appendix A. Schema format details for additional details.
3.1.2.1. The header¶
The schema header line specifies the version, which must satisfy semantic versioning. See SCHEMA_VERSION_INVALID.
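As a rough illustration, a version check might look like the following. The sketch assumes that only the plain MAJOR.MINOR.PATCH form is used; semantic versioning also permits pre-release and build suffixes, which this pattern deliberately rejects. The function name is hypothetical.

```python
import re

# MAJOR.MINOR.PATCH, each a non-negative integer without leading zeros.
VERSION_PATTERN = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def check_version(version):
    """Return None for a well-formed version string, else the error code."""
    if VERSION_PATTERN.fullmatch(version):
        return None
    return "SCHEMA_VERSION_INVALID"
```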
A schema’s library name, or lack thereof, is used to locate the schema in the hed-schemas GitHub repository.
The header line may optionally include an XSD namespace specification. If the header contains any additional unrecognized attributes, a SCHEMA_HEADER_INVALID error occurs.
3.1.2.2. The prologue¶
The prologue should contain a concise introduction to the schema and its purpose. Together with the epilogue section, the contents are used by tools to provide information about the schema to the users.
The prologue may only contain the following: letters, digits, blank, comma, newline, +, -, :, ;, ., /, (, ), ?, *, %, $, @ or a SCHEMA_CHARACTER_INVALID error occurs.
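The character rule above can be checked with a simple set lookup. The sketch below assumes that "letters" means ASCII letters (consistent with the later statement that UTF-8 characters are not supported); `invalid_characters` is an illustrative name. The same check applies to the epilogue.

```python
import string

# Letters, digits, blank, comma, newline, and the listed punctuation.
ALLOWED = set(string.ascii_letters + string.digits + " ,\n+-:;./()?*%$@")


def invalid_characters(text):
    """Return the sorted list of distinct characters that are not allowed."""
    return sorted({ch for ch in text if ch not in ALLOWED})
```

A non-empty result corresponds to a SCHEMA_CHARACTER_INVALID error.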
3.1.2.3. The schema section¶
The schema section contains the actual vocabulary contents of the schema. Each element in this section is a node element, which we will also call a tag term. The location of the node element within the section specifies its relationship to other tag terms in the schema.
A node element specifies a name, node attributes, and an informative description of the tag term’s meaning.
A node name may only contain alphanumeric characters, hyphen, and underscore. An exception to this is the # character, which is used to represent a placeholder for a value to be provided during annotation. See SCHEMA_CHARACTER_INVALID.
Each schema node element must be unique or a SCHEMA_DUPLICATE_NODE error is generated.
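The two node-name rules can be sketched together. This example assumes names are compared exactly as written (whether duplicates should also be detected across different capitalizations is not stated here), and `check_node_names` is a hypothetical helper.

```python
import re

# Alphanumeric characters, hyphen, and underscore only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_node_names(names):
    """Return (error_code, name) pairs for invalid or duplicated node names."""
    issues = []
    seen = set()
    for name in names:
        if name == "#":  # placeholders are exempt from both rules
            continue
        if not NAME_PATTERN.fullmatch(name):
            issues.append(("SCHEMA_CHARACTER_INVALID", name))
        if name in seen:
            issues.append(("SCHEMA_DUPLICATE_NODE", name))
        seen.add(name)
    return issues
```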
3.1.2.4. Unit classes and units¶
The unit classes are attributes that modify the # schema placeholder nodes. The unit class definition section specifies the allowed unit classes for the schema as well as the associated units that can be used with tags that take values.
Only the singular version of each unit is explicitly specified, but the corresponding plurals of the explicitly mentioned singular version are also allowed (e.g., feet is allowed in addition to foot). HED uses a pluralize function available in both Python and JavaScript to check validity.
Units may be in one of four forms as designated by their unit type attributes:
| Unit type | Unit type attributes |
|---|---|
| SI unit | SIUnit only |
| SI unit symbol | both SIUnit and unitSymbol |
| unit that is not an SI unit | no unit type attribute |
| unit symbol that is not an SI unit | unitSymbol only |
Most units appear after the value in annotations. However, certain units such as $ appear before their corresponding values. These units have the unitPrefix attribute.
If a unit class, SIUnit, or unitPrefix attribute appears in a section other than the unit class definition section of the schema, a SCHEMA_ATTRIBUTE_INVALID error occurs.
See appendix A.1.1. Unit classes and units for additional details and a listing.
Units are not case-sensitive, but unit symbols maintain their case.
3.1.2.5. Unit modifiers¶
The unit modifier definition section lists the SI unit multiples and submultiples that are allowed to be prepended to units that have the SIUnit schema attribute.
Unit modifiers can only be used with SI units and SI unit symbols. SI unit modifiers used with ordinary SI units have the SIUnitModifier attribute, while unit modifiers used with SI unit symbols have the SIUnitSymbolModifier attribute.
If an SIUnitModifier or SIUnitSymbolModifier attribute appears in a section other than the unit modifier section of the schema, a SCHEMA_ATTRIBUTE_INVALID error occurs.
Unit modifiers are case-sensitive.
See appendix A.1.2. Unit modifiers for additional details and a listing of values for the standard schema.
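The interaction between units, unit symbols, and unit modifiers, including their different case rules, can be sketched as follows. The four sets are a small illustrative sample rather than the schema's actual listing, plural forms are not handled, and `is_valid_unit` is a hypothetical helper.

```python
# Illustrative sample only; the real values come from the schema sections.
SI_UNITS = {"second"}                       # names: not case-sensitive
SI_UNIT_SYMBOLS = {"s"}                     # symbols: case is significant
SI_UNIT_MODIFIERS = {"milli", "kilo"}       # modifiers: case is significant
SI_UNIT_SYMBOL_MODIFIERS = {"m", "k", "M"}  # e.g. milli, kilo, mega


def is_valid_unit(unit):
    """Check a unit string, allowing one modifier in front of an SI unit or symbol."""
    if unit in SI_UNIT_SYMBOLS or unit.lower() in SI_UNITS:
        return True
    # Symbol modifiers may only be combined with unit symbols.
    for modifier in SI_UNIT_SYMBOL_MODIFIERS:
        if unit.startswith(modifier) and unit[len(modifier):] in SI_UNIT_SYMBOLS:
            return True
    # Word modifiers may only be combined with unit names.
    for modifier in SI_UNIT_MODIFIERS:
        if unit.startswith(modifier) and unit[len(modifier):].lower() in SI_UNITS:
            return True
    return False
```

In this sketch `ms` and `Ms` are both accepted but denote different modifiers, which is why case must be preserved, while `Millisecond` is rejected because the modifier `milli` is case-sensitive.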
3.1.2.6. Value classes¶
The value class definition section specifies rules for the values that are substituted for placeholders (#). Examples are special characters that are allowed for numeric values or dates. Placeholders that have no valueClass attributes are assumed to take textClass values.
See appendix A.1.3. Value classes for additional details and a listing of values for the standard schema.
3.1.2.7. Schema attributes¶
The schema attribute definition section lists the schema attributes that may be applied to schema elements in other sections of the schema (except for the properties section).
The types of schema elements to which a particular schema attribute may apply are specified by its schema properties. If a schema attribute appears in a section contradicted by its properties, a SCHEMA_ATTRIBUTE_INVALID error occurs.
See appendices A.1.4. Schema attributes and A.1.5. Schema properties for additional details and a listing for the standard schema.
3.1.2.8. Schema properties¶
The schema properties section lists the allowed properties of the schema attributes. These properties help tools validate certain requirements directly based on the HED schema rather than on a hard-coded implementation.
There are two types of properties: form type and section type properties.
The boolProperty is a form type property indicating that a schema attribute does not take a value. Rather, its presence indicates true and its absence indicates false.
The section type properties indicate the sections in which a schema attribute may appear. The section properties include unitClassProperty, unitModifierProperty, unitProperty, and valueClassProperty.
Schema attributes without any section properties are assumed to apply to node elements.
A schema attribute may have multiple section properties, indicating that the attribute may appear as an attribute in multiple sections of the schema.
See A.1.4 Schema attributes and A.1.5. Schema properties for information and a listing of schema attributes and their respective properties.
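The way section properties determine where an attribute may appear can be sketched as a lookup. The section labels on the right-hand side of the mapping are informal names chosen for this illustration, and `allowed_sections` is a hypothetical helper.

```python
# Section type properties and the kind of element each one permits.
SECTION_PROPERTIES = {
    "unitClassProperty": "unit classes",
    "unitModifierProperty": "unit modifiers",
    "unitProperty": "units",
    "valueClassProperty": "value classes",
}


def allowed_sections(attribute_properties):
    """Return the set of element kinds an attribute with these properties may modify."""
    sections = {
        SECTION_PROPERTIES[prop]
        for prop in attribute_properties
        if prop in SECTION_PROPERTIES
    }
    # Attributes without any section property are assumed to apply to nodes.
    return sections or {"nodes"}
```

A tool can then report SCHEMA_ATTRIBUTE_INVALID when an attribute is attached to an element kind outside the returned set.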
3.1.2.9. The epilogue¶
The epilogue should give license information, acknowledgments, and references.
The epilogue may only contain the following: letters, digits, blank, comma, newline, +, -, :, ;, ., /, (, ), ?, *, %, $, @ or a SCHEMA_CHARACTER_INVALID error occurs.
3.1.3. Naming conventions¶
The different parts of the HED schema have different rules for the characters and the names that are allowed.
UTF-8 characters are not supported.
3.1.3.1. Node elements¶
Schema designers and users that extend HED schema or develop library schema will be mainly concerned with nodes (tag terms) found in the schema section. The names of these elements must conform to the rules for nameClass.
Other conventions and requirements for the contents of schema node elements are as follows:
Naming conventions for nodes (tag terms) in HED schema.
By convention, the first letter of a schema node (tag term) should be capitalized with the remainder lower case.
Schema node names consisting of multiple words may not contain blanks and should be hyphenated.
Schema descriptions should be concise sentences, possibly with clarifying examples.
Schema descriptions may include characters allowed by textClass as well as commas. They may not contain square brackets, curly braces, quotes, or other characters.
3.1.3.2. Epilogue and prologue¶
The epilogue and prologue section text must conform to the rules for textClass.
The section text may have new lines, which are preserved.
3.1.3.3. Naming in other blocks¶
The names of elements corresponding to schema attributes, schema properties, unit classes, and value classes should start with a lower case letter, with the remainder in camel case.
Units and unit modifiers follow the naming conventions of the units they represent.
Case is preserved for unit modifiers, as uppercase and lowercase versions often have distinct meanings. The case for unit symbols is also maintained.
3.1.4. Mediawiki schema format¶
Mediawiki is a markdown-like format that was selected as the HED schema editing format because of its flexibility and ability to represent nested or hierarchical relationships.
The format is line-oriented, so each schema entry should be on a single line.
The schema must follow the layout described in the previous section. All sections are required, although they may be empty.
Top nodes in the schema are enclosed by pairs of three single quotes (''').
The levels of other nodes are designated by the number of asterisks (*) at the beginning of the respective defining lines. Each term is separated from its level-indicating asterisks by a single space.
Descriptions, which are enclosed in square brackets ([ ]), indicate the meaning of the item they modify. The descriptions are displayed to users by schema browsers and other tools, so every effort should be made to make them informative and clear.
Attributes are enclosed with curly braces ({ }). These attributes provide additional rules about how the item and modifying values should be used and handled by tools.
If an attribute or property is referenced in the schema, it must be defined in the appropriate definition section of the schema, or schema processing tools will generate a SCHEMA_ATTRIBUTE_INVALID error.
Allowed HED node attributes include unit class and value class values as well as HED schema attributes that do not have one of the following properties: unitClassProperty, unitModifierProperty, unitProperty, or valueClassProperty.
Note: Schema attributes having the elementProperty may apply anywhere in the schema, including the schema header, whereas schema attributes having the nodeProperty may only apply to node elements.
HED schema attributes that have the boolProperty appear with just their name in the schema element they are modifying. The presence of such an attribute indicates that it is true or present.
HED schema attributes that do not have the boolProperty are specified in the form of a name=value pair. If multiple values of a particular attribute are applicable, they should be specified as name-value pairs separated by commas within the curly braces.
The following example shows a simple HED schema in .mediawiki format.
Example: HED schema in .mediawiki format.
HED version="8.0.0"
'''Prologue'''
This prologue introduces the schema.
!# start schema
'''Event''' <nowiki>[Something that happens at a given place and time.]</nowiki>
* Sensory-event <nowiki>{suggestedTag=Task-event-role,suggestedTag=Sensory-presentation}[Something perceivable by an agent.]</nowiki>
. . .
'''Property'''<nowiki>{extensionAllowed}[A characteristic.] </nowiki>
* Informational-property <nowiki>[A quality pertaining to information.]</nowiki>
** Label <nowiki>[A string of 20 or fewer characters.]</nowiki>
*** <nowiki># {takesValue}</nowiki>
!# end schema
'''Unit classes''' <nowiki>[Unit classes and units for the nodes.]</nowiki>
. . .
'''Unit modifiers''' <nowiki>[Unit multiples and submultiples.]</nowiki>
. . .
'''Value classes''' <nowiki>[Rules for the values provided by users.]</nowiki>
. . .
'''Schema attributes''' <nowiki>[Allowed node attributes.]</nowiki>
* extensionAllowed <nowiki>{boolProperty}[Attribute indicating that users can add child nodes.]</nowiki>
* suggestedTag <nowiki>[Attribute indicating another tag that is often associated with this tag.]</nowiki>
* takesValue <nowiki>{boolProperty}[Attribute indicating a placeholder to be replaced by a user-defined value.] </nowiki>
. . .
'''Properties''' <nowiki>[Properties of the schema attributes.]</nowiki>
* boolProperty <nowiki>[Indicates a schema attribute represents a boolean.]</nowiki>
. . .
'''Epilogue'''
This epilogue is a place for notes and is ignored in HED processing.
!# end hed
In the above example, Property in the schema section is a top node because it appears enclosed by three single quotes, while Informational-property is a first-level node because its defining line begins with a single asterisk (*).
Sensory-event in the schema section has a suggestedTag attribute (shown in curly braces). Similarly, Property has an extensionAllowed attribute, and the # placeholder has a takesValue attribute. The schema attributes section must include definitions of suggestedTag, extensionAllowed, and takesValue or the schema will not validate.
The definition of the takesValue attribute has boolProperty, so a definition of boolProperty must be included in the Properties section or the schema will not validate.
Everything after each HED node (tag term) must be enclosed by <nowiki></nowiki> markup elements. The contents within these markup elements include the description and attributes.
Within the HED schema a # node indicates that the user must supply a value consistent with the unit and value class attributes of the # node during annotation. Lines with hashtag (#) placeholders should have everything after the asterisks, including the # placeholder, enclosed by <nowiki></nowiki> markup elements.
Additional details and rules can be found in appendix A.2 Mediawiki file format.
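The line conventions described in this section can be illustrated with a small parser for a single schema line. This is a sketch under simplifying assumptions (one node per line, at most one attribute block and one description, no error reporting); `parse_schema_line` is a hypothetical helper and not how any HED tool is known to be implemented.

```python
import re


def parse_schema_line(line):
    """Split one .mediawiki schema line into level, name, attributes, and description."""
    line = line.strip()
    body = ""
    nowiki = re.search(r"<nowiki>(.*?)</nowiki>", line)
    if nowiki:
        body = nowiki.group(1).strip()
        line = line[:nowiki.start()].strip()
    if line.startswith("'''"):
        # Top node: name enclosed in three single quotes.
        level, name = 0, line.strip("'").strip()
    else:
        # Other nodes: level is the number of leading asterisks.
        level = len(line) - len(line.lstrip("*"))
        name = line[level:].strip()
    if not name and body.startswith("#"):
        # Placeholder lines keep the # inside the nowiki element.
        name, body = "#", body[1:].strip()
    attributes = {}
    attribute_block = re.search(r"\{(.*?)\}", body)
    if attribute_block:
        for item in attribute_block.group(1).split(","):
            key, _, value = item.strip().partition("=")
            attributes.setdefault(key, [])
            if value:
                attributes[key].append(value)
    description_block = re.search(r"\[(.*?)\]", body)
    description = description_block.group(1) if description_block else ""
    return {"level": level, "name": name,
            "attributes": attributes, "description": description}
```

In the returned dictionary, attributes with boolProperty map to an empty list, while name=value attributes collect all of their values, so a repeated attribute such as suggestedTag yields a list with one entry per value.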
3.1.5. XML schema format¶
The .xml format directly mirrors the order and information in the .mediawiki version of the schema. The <node> elements of the schema represent the HED tags (tag terms), with remaining schema elements specifying additional information and properties.
Each <node> element must have a <name> child element corresponding to the HED tag term that it specifies. A <node> element should also have a <description> child element whose content corresponds to the text that appears in square brackets ([ ]) in the .mediawiki version.
The schema attributes, which appear as name values or name-value pairs enclosed in curly braces ({ }) in the .mediawiki file, are translated into <attribute> child elements of <node> in the .xml file. These <attribute> elements always have a <name> element child and also have a <value> element if the corresponding schema attribute does not have boolProperty.
The following is a translation of the .mediawiki example from the previous section in the HEDXML format.
Example: XML version of the example schema in the previous section.
<?xml version="1.0" ?>
<HED version="8.0.0">
  <prologue>This prologue introduces the schema.</prologue>
  <schema>
    <node>
      <name>Event</name>
      <description>Something that happens at a given place and time.</description>
      <node>
        <name>Sensory-event</name>
        <description>Something perceivable by an agent.</description>
        <attribute>
          <name>suggestedTag</name>
          <value>Task-event-role</value>
          <value>Sensory-presentation</value>
        </attribute>
      </node>
    </node>
    . . .
    <node>
      <name>Property</name>
      <description>A characteristic.</description>
      <attribute>
        <name>extensionAllowed</name>
      </attribute>
      <node>
        <name>Informational-property</name>
        <description>A quality pertaining to information.</description>
        <node>
          <name>Label</name>
          <description>A string of 20 or fewer characters.</description>
          <node>
            <name>#</name>
            <attribute>
              <name>takesValue</name>
            </attribute>
          </node>
        </node>
      </node>
    </node>
  </schema>
  <unitClassDefinitions></unitClassDefinitions>
  <unitModifierDefinitions></unitModifierDefinitions>
  <valueClassDefinitions></valueClassDefinitions>
  <schemaAttributeDefinitions>
    <schemaAttributeDefinition>
      <name>extensionAllowed</name>
      <description>Attribute indicating that users can add child nodes.</description>
      <property>
        <name>boolProperty</name>
      </property>
    </schemaAttributeDefinition>
    <schemaAttributeDefinition>
      <name>suggestedTag</name>
      <description>Attribute indicating another tag that is often associated with this tag.</description>
    </schemaAttributeDefinition>
    <schemaAttributeDefinition>
      <name>takesValue</name>
      <description>Attribute indicating a placeholder to be replaced by a user-defined value.</description>
      <property>
        <name>boolProperty</name>
      </property>
    </schemaAttributeDefinition>
  </schemaAttributeDefinitions>
  <propertyDefinitions>
    <propertyDefinition>
      <name>boolProperty</name>
      <description>Indicates a schema attribute represents a boolean.</description>
    </propertyDefinition>
  </propertyDefinitions>
  <epilogue>This epilogue is a place for notes and is ignored in HED processing.</epilogue>
</HED>
Additional details and rules can be found in appendix A.3 XML file format.
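Reading the attributes of a <node> element is straightforward with Python's standard xml.etree.ElementTree module. The sketch below works on a single node fragment taken from the example above; `node_attributes` is an illustrative helper, and representing boolean attributes as True is a choice made for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<node>
  <name>Sensory-event</name>
  <description>Something perceivable by an agent.</description>
  <attribute>
    <name>suggestedTag</name>
    <value>Task-event-role</value>
  </attribute>
</node>"""


def node_attributes(node):
    """Map each attribute name to its list of values, or to True if it has none."""
    result = {}
    for attribute in node.findall("attribute"):
        name = attribute.findtext("name")
        values = [value.text for value in attribute.findall("value")]
        result[name] = values if values else True
    return result


node = ET.fromstring(SAMPLE)
print(node.findtext("name"), node_attributes(node))
```

Because `findall("attribute")` only looks at direct children, attributes of nested child nodes are not mixed into the parent's result.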
3.2. Annotation formats¶
HED annotations are comma-separated strings of HED tags drawn from a HED schema vocabulary. HED validators and other tools use the information encoded in the relevant schema when performing validation and other processing of HED annotations.
Users must provide the version of the HED schema they are using when creating an annotation.
3.2.1. Vocabulary organization¶
HED (Hierarchical Event Descriptors) tags are nodes (tag terms) organized hierarchically under their respective root or top nodes. In HED versions >= 8.0.0 these top nodes are: Event, Agent, Action, Item, Property, and Relation.
Each top node and its subtree represent distinct is-a relationships for the vocabulary schema.
The Event subtree tags indicate the general event category, such as whether it is a sensory event, an agent action, a data feature, or an event indicating experiment control or structure.
The HED annotations describing each event may be assembled from a number of sources during processing and the annotations associated with a single event marker may represent multiple events.
Many analysis tools use the Event tags as a primary means of segregating, epoching, and processing the data. Ideally, tags from the Event subtree should appear at the top level of the HED annotation describing an event to facilitate analysis.
The Agent subtree tags indicate the types of agents (e.g., persons, animals, avatars) that take an active role or produce a specified effect. An Agent tag should be grouped with property tags that provide information about the agent, such as whether the agent is an experiment participant.
The Action subtree tags indicate actions performed by agents. Generally these are grouped in a triple (A, (Action, B)) which is interpreted as A does Action on B. If the action does not have a target, it should be annotated (A, (Action)), meaning A does Action.
The Item subtree tags represent things with (actual or virtual) physical existence such as objects, sounds, or language.
Descriptive tags are organized in the Property subtree. These descriptive tags should always be grouped with the tags they describe using parentheses.
Binary relations are in the Relation subtree. Like items from the Action subtree, these should be annotated using (A, (Relation, B)).
3.2.2. Tag forms¶
A HED tag is a term in the HED vocabulary identified by a path consisting of the individual node names from some branch of the HED schema hierarchy separated by forward slashes (/). Valid HED tags do not have leading or trailing forward slashes (/). A HED tag path may also not have consecutive forward slashes.
An important requirement of third generation HED (versions >= 8.0.0) is that the node names in the HED schema must be unique. As a consequence, the user may specify as much of the path to the root as desired when using the tag in annotation.
The full path version is referred to as long form, and the version with only the final tag element (excluding placeholder) is called short form.
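Any intermediate form of the tag path is also allowed. For example, with the example schema of 3.1.4, Label, Informational-property/Label, and Property/Informational-property/Label all refer to the same tag.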
HED tools are available to map between shortened and long forms as needed. The tag must be associated with a schema and must correspond to a path in the schema (excluding any extension or value).
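Because node names are unique, converting between forms only requires looking up ancestors. The sketch below uses a hypothetical `PARENT` map built from the example schema in 3.1.4; it compares names exactly as written and does not verify that the remaining path components are consistent with the schema, both of which a real tool would need to handle.

```python
# Hypothetical toy schema: each node name maps to its parent (None for top nodes).
PARENT = {
    "Event": None,
    "Sensory-event": "Event",
    "Property": None,
    "Informational-property": "Property",
    "Label": "Informational-property",
}


def to_long_form(tag):
    """Expand a short or intermediate tag to the full path from its top node."""
    parts = tag.split("/")
    if parts[0] not in PARENT:
        raise ValueError(f"TAG_INVALID: {tag}")
    ancestors = []
    node = PARENT[parts[0]]
    while node is not None:
        ancestors.append(node)
        node = PARENT[node]
    return "/".join(list(reversed(ancestors)) + parts)


def to_short_form(tag):
    """Keep only the last schema node in the path plus any value or extension."""
    parts = tag.split("/")
    for index in range(len(parts) - 1, -1, -1):
        if parts[index] in PARENT:
            return "/".join(parts[index:])
    raise ValueError(f"TAG_INVALID: {tag}")
```

For instance, `to_long_form("Label/myLabel")` gives `Property/Informational-property/Label/myLabel`, and `to_short_form` reverses that mapping.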
See NODE_NAME_EMPTY for errors involving forward slashes (/) and TAG_INVALID for other types of tag syntax errors.
3.2.3. Tag case-sensitivity¶
Although by convention tag terms start with a capital letter with the remainder being lower case, tag processing is case-insensitive. This convention makes annotation strings more readable and is recommended for tag extensions. Validators and other tools must treat tags that contain the same characters but differ in capitalization as equivalent.
The only exception to the case-insensitive processing rule is that the correct case of units should be preserved, both during schema processing and during annotation processing. This rule is required because SI distinguishes symbols and unit modifiers that differ in case.
3.2.5. Tag extensions¶
A tag extension, in contrast to a value, is a tag that users add as a child of an existing schema node as a more specific term for an item already in the schema.
For example, a user might want to use Helicopter instead of the more general term Aircraft. Since Aircraft inherits the extensionAllowed attribute, users may use extended tags such as Aircraft/Helicopter in their annotation.
The requirements for such an extension are:
Warning
Requirements for tag extensions by users:
Unlike values, an extension term must not already be a node in the schema.
The extension term must only have alphanumeric, hyphen, or underbar characters so that it conforms to the rules for a nameClass value.
The parent of the tag extension must always be included with the extended tag in annotation.
The extension term must satisfy the “is-a” relationship with its parent node.
The # placeholder cannot be used as an extension; in particular, it cannot be used as a placeholder in definitions or as value annotations in sidecars.
Note: The is-a relationship is not checked by validators. It is needed so that term search works correctly.
Tag extensions should follow the same naming conventions as those for schema nodes. See 3.1.3. Naming conventions for more information about HED naming conventions. A STYLE_WARNING warning is issued for extension tags that do not follow the HED naming convention.
Users should not use tag extension unless necessary for their application, as this breaks the commonality among annotations across datasets. Please open an issue proposing that the new term be added to the schema in question, if you think the term would be useful to other users.
See TAG_EXTENSION_INVALID for information on the specific validation errors associated with invalid tag extensions.
Note: User tag extensions are sometimes accidental and due to misspelling, particularly when a long or intermediate form of the tag is used. For this reason the TAG_EXTENDED warning is issued for extended tags during validation.
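The mechanical parts of these requirements can be sketched as follows. The two sets are a hypothetical toy vocabulary, names are compared exactly as written, and mapping every failure to TAG_EXTENSION_INVALID is an assumption made for this illustration; the is-a requirement cannot be checked automatically and is not covered.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
SCHEMA_NODES = {"Aircraft", "Car"}        # hypothetical toy vocabulary
EXTENSION_ALLOWED = {"Aircraft", "Car"}   # nodes that (inherit) extensionAllowed


def check_extension(tag):
    """Return the issue codes for a tag written as Parent/Extension."""
    parent, _, extension = tag.rpartition("/")
    parent_name = parent.split("/")[-1]
    # The parent must be present and must permit extension.
    if not parent or parent_name not in EXTENSION_ALLOWED:
        return ["TAG_EXTENSION_INVALID"]
    # The extension must be a nameClass value, which also excludes "#".
    if not NAME_PATTERN.fullmatch(extension):
        return ["TAG_EXTENSION_INVALID"]
    # The extension must not already be a schema node.
    if extension in SCHEMA_NODES:
        return ["TAG_EXTENSION_INVALID"]
    issues = ["TAG_EXTENDED"]
    if not (extension[0].isupper() and extension[1:] == extension[1:].lower()):
        issues.append("STYLE_WARNING")
    return issues
```

Note that even a valid extension produces the TAG_EXTENDED warning, which is what lets users catch accidental extensions caused by misspellings.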
3.2.6. Tag namespace prefixes¶
Users may select tags from multiple schemas, but additional schemas must be included in the HED version specification.
Users are free to use any alphabetic prefix and associate it with a specific schema in the HED version specification. Tags from the associated schema must be prefixed with this namespace designator (including the colon) when used in annotation.
Terms from only one schema may appear in an annotation without a namespace prefix; tags from every other schema must carry their prefix and colon.
See TAG_NAMESPACE_PREFIX_INVALID for information on the specific validation errors associated with missing schemas.
See 7.4. Library schema in BIDS for an example of how the namespace prefix notation is used in BIDS.
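A rough sketch of prefix handling is shown below. It assumes the prefix can only occur in the first path component (so colons inside values are left alone), and the prefix `sc` and the helper names are hypothetical.

```python
def split_namespace(tag):
    """Return (prefix, tag_without_prefix); prefix is '' if the tag has none."""
    first, slash, rest = tag.partition("/")
    prefix, colon, name = first.partition(":")
    if not colon:
        return "", tag
    return prefix, name + slash + rest


def check_namespaces(tags, known_prefixes):
    """Report tags whose prefix is not alphabetic or not tied to a loaded schema."""
    issues = []
    for tag in tags:
        prefix, _ = split_namespace(tag)
        if prefix and (not prefix.isalpha() or prefix not in known_prefixes):
            issues.append(("TAG_NAMESPACE_PREFIX_INVALID", tag))
    return issues
```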
3.2.7. Strings and groups¶
A HED string is an unordered, comma-separated list of HED tags and/or HED tag groups.
A HED tag group is an unordered, comma-separated list of HED tags and/or tag groups enclosed in parentheses. Tag groups may include other tag groups.
The validation errors for HED tags and HED strings are summarized in Appendix B: HED errors.
3.2.7.1. Parenthesis and order¶
Any ordering of HED tags and HED tag groups at the same level within a HED string is equivalent. Valid HED strings may have parentheses nested to arbitrary levels (nested groups). The parentheses must be properly nested and matched.
Parentheses are meaningful and convey association.
If A and B represent HED expressions, (A, B) is not equivalent to the HED string A, B. The distinction should be preserved if possible.
(A, B) means that HED tag A and HED tag B are associated with each other, whereas A, B means that A and B are each annotating some larger construct.
Specific rules of association will be encoded in a future version of the HED specification.
See PARENTHESES_MISMATCH for validation errors that result from improper use of parentheses.
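The grouping rules can be made concrete with a small parser that turns a HED string into nested Python lists, where each inner list is a tag group. This is a simplified sketch: it assumes tags and values contain no parentheses or commas, it silently drops empty elements, and `parse_hed_string` is a hypothetical helper.

```python
def parse_hed_string(hed_string):
    """Parse a HED string into nested lists; raise ValueError on bad nesting."""
    root = []
    stack = [root]  # stack[-1] is the group currently being filled
    token = ""
    for ch in hed_string:
        if ch in "(),":
            if token.strip():
                stack[-1].append(token.strip())
            token = ""
            if ch == "(":
                group = []
                stack[-1].append(group)
                stack.append(group)
            elif ch == ")":
                if len(stack) == 1:
                    raise ValueError("PARENTHESES_MISMATCH")
                stack.pop()
        else:
            token += ch
    if token.strip():
        stack[-1].append(token.strip())
    if len(stack) != 1:
        raise ValueError("PARENTHESES_MISMATCH")
    return root
```

Parsing `A, B` yields `["A", "B"]` while `(A, B)` yields `[["A", "B"]]`, which keeps the distinction between the two forms visible in the data structure.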
3.2.7.2. Tag group attributes¶
A HED tag corresponding to a schema node with the tagGroup attribute must appear inside parentheses (i.e., must be in a HED tag group).
A HED tag corresponding to a schema node with the topLevelTagGroup attribute must appear in an unnested HED group in an assembled HED annotation. Only one tag with the topLevelTagGroup attribute may appear in the same top-level group. The topLevelTagGroup attribute is usually associated with tags that have special meanings in HED such as Definition and Onset.
See TAG_GROUP_ERROR for information on the group errors detected based on schema attributes.
3.2.7.4. Repeated expressions¶
Duplicated tag expressions at the same level in a HED tag group or HED string are not allowed. For example, the expressions (Red, Blue, Red) and (Red, Blue), (Red, Blue) have duplicated tag expressions at the same level and are hence invalid.
See TAG_EXPRESSION_REPEATED for more details on validation errors due to repeated tag expressions.
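One way to detect repeats is to reduce each expression to an order-independent canonical form and compare. The sketch below assumes a HED string has already been parsed into nested Python lists (a hypothetical representation in which each inner list is a tag group). It treats reordered groups as equal, since order within a level is not significant, and it folds case for all tags, which ignores the case-sensitivity exception for units.

```python
def canonical(item):
    """Return an order-independent, hashable form of a tag or tag group."""
    if isinstance(item, str):
        return item.casefold()
    return tuple(sorted((canonical(child) for child in item), key=repr))


def has_repeats(items):
    """Return True if any level contains the same tag expression more than once."""
    forms = [canonical(item) for item in items]
    if len(forms) != len(set(forms)):
        return True
    return any(has_repeats(item) for item in items if isinstance(item, list))
```

A True result corresponds to a TAG_EXPRESSION_REPEATED error.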
3.2.9. Sidecars¶
JSON sidecars are an integral part of the BIDS (Brain Imaging Data Structure) neuroimaging standard and are used to associate metadata with data files.
The JSON sidecars that are relevant to HED are associated with tabular data files. For example, the rows of tabular event files represent time markers on the experimental timeline, and the assembled HED annotations for each row describe what happened at that time marker. A sidecar containing annotations associated with the columns of such an event file allows HED tools to assemble HED annotations for each row of the file.
In addition to sidecars, HED annotations can also be given in the HED column of tabular files. At validation or analysis time the HED information from both the HED column of a tabular file and its associated sidecar are assembled to provide the annotation.
HED validators assume that the annotation dictionary is saved in JSON format and that it complies with the BIDS sidecar format.
3.2.9.1. Sidecar entries¶
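A BIDS sidecar is a JSON dictionary with several types of entries, three of which are relevant to HED: categorical entries, value entries, and dummy entries (which hold HED definitions), each having a "HED" key.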
The other types of sidecar entries include categorical and value entries with no "HED" key, as well as arbitrary entries whose keys do not correspond to column names in an associated tabular file. When annotations are assembled, sidecar entries with no "HED" key are ignored as are entries in the corresponding tabular data file that have n/a or blank values.
See 3.2.9.4. A sidecar example for an elaborated example of these different types of entries and 3.2.10.2 Event-level processing for an example of how the resulting HED annotations are assembled.
3.2.9.2. Sidecar validation¶
All HED-related entries in a JSON sidecar must have "HED" as a key in a second-level dictionary. "HED" cannot appear as a sidecar key that is not at the second level. Further, a sidecar is not permitted to provide a HED annotation for n/a. Both of these generate a SIDECAR_INVALID error.
HED definitions are required to be separated into dummy sidecar column entries and cannot appear in sidecar entries containing tags other than definitions. A HED definition appearing in a categorical or value sidecar entry generates a DEFINITION_INVALID error.
The sidecar does not have to provide a HED-relevant entry for every event file column. Columns with no corresponding sidecar entry are skipped during assembly of the HED annotation for an event file row. However, if a value is encountered in a tabular file column that is annotated as a categorical column but does not have a HED annotation, a SIDECAR_KEY_MISSING warning is generated.
HED value sidecar entries must contain exactly one # placeholder in the HED string annotation associated with the entry. The # placeholder should correspond to a # in the HED schema, indicating that the parent tag (also included in the annotation) expects a value. Violations of these rules generate a PLACEHOLDER_INVALID error.
If the placeholder is followed by a unit designator, the validator checks that these units are consistent with the unit class of the corresponding # in the schema. The units are not mandatory.
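Several of the structural checks in this section can be sketched on a sidecar that has already been loaded into a Python dictionary (for example with json.load). The sketch distinguishes value entries from categorical entries by whether the "HED" value is a string or a dictionary, which is an assumption of this example; it does not consult the schema or check definitions, and `validate_sidecar` is a hypothetical helper.

```python
def _contains_hed_key(obj):
    """Return True if a 'HED' key appears anywhere inside a nested dictionary."""
    if not isinstance(obj, dict):
        return False
    return "HED" in obj or any(_contains_hed_key(value) for value in obj.values())


def validate_sidecar(sidecar):
    """Return (error_code, column) pairs for structural problems in a sidecar."""
    issues = []
    for column, entry in sidecar.items():
        if column == "HED":  # "HED" used as a top-level key
            issues.append(("SIDECAR_INVALID", column))
            continue
        if not isinstance(entry, dict):
            continue
        if any(_contains_hed_key(value) for value in entry.values()):
            issues.append(("SIDECAR_INVALID", column))  # "HED" below the second level
        if "HED" not in entry:
            continue  # entries without a "HED" key are ignored
        hed = entry["HED"]
        if isinstance(hed, dict):
            if "n/a" in hed:  # annotation provided for n/a
                issues.append(("SIDECAR_INVALID", column))
        elif hed.count("#") != 1:  # value entries need exactly one placeholder
            issues.append(("PLACEHOLDER_INVALID", column))
    return issues
```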
3.2.9.3. Sidecar curly braces¶
The curly brace notation is new with HED specification version 3.2.0 and is supported by all versions of the HED schema ≥ 8.0.0. The notation was introduced to facilitate proper nesting of HED tags associated with different event file columns when the complete HED annotation for an event marker is assembled.
When a column name appears in curly braces within a HED annotation in a JSON sidecar, the corresponding HED annotation for that row is substituted for the curly braces and their contents when the HED annotation is assembled.
Rules for curly braces notation in sidecars.
The item within the curly braces must either be the word HED or the name of another HED-annotated column within the sidecar.
The HED annotation for the column in curly braces directly replaces the curly braces and their contents in the target annotation.
During assembly of a HED annotation for an event, if the n/a value appears in a curly brace column, the curly brace expression, including the curly braces and any extra parentheses or commas, is removed.
A sidecar column name cannot both appear in curly braces and have an annotation that uses curly braces (to prevent circular references).
The curly braces cannot be used within a Definition.
If curly braces appear in the HED column of a tabular file, a CHARACTER_INVALID error is generated.
If curly braces appear in a Definition, a DEFINITION_INVALID error is generated.
If the curly brace notation is used improperly in a sidecar or elsewhere, a SIDECAR_BRACES_INVALID error is generated.
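As a rough illustration of two of the rules above (the braces must name HED or a HED-annotated column, and a column named in braces must not itself use braces), the following Python sketch scans a sidecar that has already been loaded as a dictionary. The function names and the wording of the reported problems are hypothetical, and the rule about definitions is not covered.

```python
import re

BRACES = re.compile(r"\{([^{}]*)\}")


def hed_strings(entry):
    """Return the HED annotation strings of one sidecar entry as a list."""
    hed = entry.get("HED", {})
    return [hed] if isinstance(hed, str) else list(hed.values())


def check_braces(sidecar):
    """Return (column, braced name, problem) tuples for improper curly braces."""
    # Names that appear in curly braces within each column's annotations.
    refs = {col: {name for text in hed_strings(entry) for name in BRACES.findall(text)}
            for col, entry in sidecar.items()}
    annotated = {col for col, entry in sidecar.items() if "HED" in entry}
    problems = []
    for col in sidecar:
        for name in sorted(refs[col]):
            if name != "HED" and name not in annotated:
                problems.append((col, name, "not a HED-annotated column"))
            elif refs.get(name):
                problems.append((col, name, "referenced column also uses curly braces"))
    return problems


sidecar = {
    "event_type": {"HED": {"show": "Sensory-event, {stim_file}"}},
    "stim_file": {"HED": "(Image, Face, Pathname/#)"},
}
print(check_braces(sidecar))
```

A real validator would report each problem as a SIDECAR_BRACES_INVALID error rather than returning tuples.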
3.2.9.4. A sidecar example¶
The following example illustrates the different types of JSON sidecar entries.
Different types of sidecar annotation entries that might appear in
{
    "event_type": {
        "LongName": "Event category",
        "Description": "Indicator of type of event.",
        "Levels": {
            "show": "Show a face to a participant.",
            "press": "Participant presses key to indicate symmetry."
        },
        "HED": {
            "show": "Sensory-event, Visual-presentation, {stim_file}",
            "press": "Agent-action, (Experiment-participant, (Press, {key}))"
        }
    },
    "stim_file": {
        "LongName": "Stimulus image file",
        "Description": "Filename of the stimulus image presented.",
        "HED": "(Image, Face, Pathname/#)"
    },
    "key": {
        "LongName": "Indicates which key is pressed.",
        "Description": "Indicator of participant evaluation.",
        "HED": {
            "left-arrow": "((Leftward, Arrow), Keypad-key)",
            "right-arrow": "((Rightward, Arrow), Keypad-key)"
        }
    },
    "symmetry": {
        "LongName": "Indicates symmetrical or asymmetrical.",
        "Description": "Indicates the participant's judgement of symmetry.",
        "HED": {
            "symmetric": "(Judge, Symmetrical)",
            "asymmetric": "(Judge, Asymmetrical)"
        }
    },
    "dummy_defs": {
        "HED": {
            "MyDef1": "(Definition/Cue1, (Buzz))",
            "MyDef2": "(Definition/Image/#, (Image, Face, Label/#))"
        }
    }
}
In the example, "event_type" is the name of a column that is annotated using the categorical strategy. Its top-level dictionary has "LongName", "Description", "Levels", and "HED" keys. The value of "Levels" is a dictionary with the unique values in the "event_type" column keyed to full text descriptions of these unique values. The value of "HED" is a dictionary with the unique values in "event_type" keyed to the corresponding HED annotations of these unique values.
In the above example, the unique values are "show" and "press". The HED annotation for show is "Sensory-event, Visual-presentation, {stim_file}". Notice the use of curly braces in the annotation. Here "stim_file" must correspond to another HED-annotated column in the sidecar.
The "stim_file" column is an example of a value column. Its top-level dictionary keys are "LongName", "Description", and "HED". Its annotation entry is "(Image, Face, Pathname/#)". This annotation has a single #. The filename in the stim_file column replaces this # when the HED annotation for a line in an associated events.tsv file is assembled.
Since "stim_file" and "key" appear within curly braces in annotations for "event_type", their HED annotations cannot use curly braces.
The "dummy_defs" entry is an example of a dummy annotation. The value of this entry is a dictionary with a "HED" key pointing to a dictionary. A dummy annotation is similar in form to a categorical annotation, but its keys do not correspond to any event file column values. Rather, it is used as a container to organize HED definitions.
In the example, Definition/Cue1 is a definition that does not use a placeholder (#) modifier in its name, while Definition/Image/# is a definition whose name Image is modified by a placeholder value. Notice that Image is both a definition name and an actual tag in the schema in this example. This is permitted.
3.2.10. Tabular files¶
A tabular file is a text file in which each line represents a row in a table. The column entries in a given row are separated by tabs. Further, the first line of the file must contain a tab-separated list of column names, which should be unique. This description of tabular file conforms to that used by BIDS.
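A minimal Python sketch of reading such a file is shown below. It assumes the file contents are already in a string and that cells contain no embedded tabs or newlines; the function name `parse_tabular` is hypothetical. It splits the header and rows on tabs and reports any repeated column names.

```python
def parse_tabular(text):
    """Split tab-separated text into a header, row dictionaries, and duplicate names."""
    lines = text.splitlines()
    header = lines[0].split("\t")
    # Column names should be unique, so collect any that repeat.
    duplicates = sorted({name for name in header if header.count(name) > 1})
    rows = [dict(zip(header, line.split("\t"))) for line in lines[1:] if line]
    return header, rows, duplicates


header, rows, duplicates = parse_tabular(
    "onset\tduration\tevent_type\n3.42\tn/a\tshow\n3.86\tn/a\tpress\n"
)
print(header, len(rows), duplicates)
```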
Generally, each row in a tabular file represents an item and the column values provide properties of that item. The most common HED-annotated tabular file represents event markers in an experiment. In this case each row in the file represents a time at which something happened.
Another common HED-annotated tabular file represents experiment participants. In this case each row in the file represents a participant, and the columns provide characteristics or other information about the participant identified in that row.
In any case, the general strategy for validation or other processing is:
Process the individual components of the HED annotation (tag and string level processing).
Assemble the component annotations for a row (event or row level processing).
Check consistency and relationships among the row annotations (file-level processing).
3.2.10.1. Tabular annotations¶
HED annotations in tabular files can occur both in a HED column within the file and in an associated JSON sidecar.
The HED strings that appear in a HED column must be valid HED strings.
Definitions may not appear in the HED column of a tabular file.
Definitions may not appear in any entry of a JSON sidecar corresponding to a column of the tabular file.
3.2.10.2. Event-level processing¶
After individual HED tags and HED strings in the HED column of tabular files and in the associated sidecars are validated or otherwise processed, the HED strings associated with each row of the tabular file must be assembled to provide an overall annotation for the row. We refer to this as event-level or row processing.
If the HED schema used for processing contains a schema node that has the required attribute, then the assembled HED annotations for each row must include that tag. Currently, HED schema versions ≥ 8.0.0 do not contain any nodes with the required attribute, and this attribute may be deprecated in future versions of the schema.
If the HED schema used for processing contains a schema node that has the unique attribute, then the assembled HED annotations for each row must contain no more than one occurrence of that tag. Currently, only Event-context has the unique attribute for HED schema versions ≥ 8.0.0.
See REQUIRED_TAG_MISSING and TAG_NOT_UNIQUE for information on the validation errors that may occur with tags that have the required or unique schema attributes, respectively.
General procedure for event-level (row) assembly.
Create an empty result list.
Create an assembly list of columns that contain HED annotations and whose names do not appear in the curly braces of other HED annotations.
For each column in the assembly list, look up the annotation in the sidecar, replacing all curly braces and placeholder values appropriately. Append the result to the result list.
If a HED column annotation exists for that row and HED did not appear in curly braces in the sidecar, append the annotation to the result list.
Finally, join all the entries of the result list using a comma (,) separator.
In all cases n/a column values are skipped.
To illustrate the assembly process, consider the following excerpt from an event file:
onset | duration | event_type | stim_file | key | symmetry | HED |
---|---|---|---|---|---|---|
3.42 | n/a | show | h234.bmp | n/a | n/a | "(Recording, Label/Setup)" |
3.86 | n/a | press | n/a | left-arrow | asymmetric | n/a |
7.42 | n/a | show | h734.bmp | n/a | n/a | n/a |
Using the example sidecar results in the following assembled HED annotation for the first row of the event file:
A result for event-level (row) assembly of the sample file.
"Sensory-event, Visual-presentation, (Image, Face, Pathname/h234.bmp), (Recording, Label/Setup)"
The specific annotation (Image, Face, Pathname/h234.bmp) has been substituted for {stim_file}, and the annotation in the HED column of the events.tsv file has been included. The entries with n/a have been ignored.
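The assembly procedure can be sketched in Python as follows. This is a simplified illustration under several assumptions: the sidecar is already validated (so there are no circular curly brace references), only a reduced version of the example sidecar is used, and the helper names (`tidy`, `cell_annotation`, `assemble_row`) are hypothetical. The `tidy` function is one possible reading of the rule that extra parentheses and commas are removed when a curly brace column is n/a.

```python
import re

BRACES = re.compile(r"\{(\w+)\}")

SIDECAR = {  # reduced version of the example sidecar
    "event_type": {"HED": {
        "show": "Sensory-event, Visual-presentation, {stim_file}",
        "press": "Agent-action, (Experiment-participant, (Press, {key}))"}},
    "stim_file": {"HED": "(Image, Face, Pathname/#)"},
    "key": {"HED": {"left-arrow": "((Leftward, Arrow), Keypad-key)"}},
}


def hed_strings(entry):
    hed = entry.get("HED", {})
    return [hed] if isinstance(hed, str) else list(hed.values())


def tidy(text):
    """Remove empty groups and dangling commas left by n/a substitutions."""
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\(\s*\)", "", text)       # empty parentheses
        text = re.sub(r",\s*(?=[,)])", "", text)  # comma before ',' or ')'
        text = re.sub(r"\(\s*,\s*", "(", text)    # comma right after '('
    return text.strip(" ,")


def cell_annotation(sidecar, row, column):
    """Return the annotation for one cell, or '' if it is n/a or unannotated."""
    value = row.get(column, "n/a")
    if value == "n/a":
        return ""
    if column == "HED":
        return value
    hed = sidecar.get(column, {}).get("HED")
    if hed is None:
        return ""
    text = hed.replace("#", value) if isinstance(hed, str) else hed.get(value, "")
    # Substitute the annotation of each column named in curly braces.
    return tidy(BRACES.sub(lambda m: cell_annotation(sidecar, row, m.group(1)), text))


def assemble_row(sidecar, row):
    referenced = {name for entry in sidecar.values()
                  for text in hed_strings(entry) for name in BRACES.findall(text)}
    parts = [cell_annotation(sidecar, row, col) for col in sidecar if col not in referenced]
    if "HED" not in referenced:
        parts.append(cell_annotation(sidecar, row, "HED"))
    return ", ".join(part for part in parts if part)


row = {"onset": "3.42", "duration": "n/a", "event_type": "show",
       "stim_file": "h234.bmp", "key": "n/a", "HED": "(Recording, Label/Setup)"}
print(assemble_row(SIDECAR, row))
```

For the first row of the sample file, this sketch prints the same assembled annotation as shown above.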
For more examples of event assembly, see the How HED works in BIDS tutorial.
3.2.10.3. File-level processing¶
HED versions >= 8.0.0 allow annotation of relationships among rows in a tabular file. Hence, processing generally requires that annotations for all the rows be assembled so that consistency can be checked.
To validate temporal scope, the validator must ensure that each Onset and Offset tag is associated with an appropriately defined identifier corresponding to a definition name. The validator must also check that Onset and Offset tags are properly matched within the data recording. In particular, every Offset tag group must correspond to a preceding Onset tag group.
See ONSET_OFFSET_INSET_ERROR for details on the types of errors that are generated due to Onset and Offset errors.
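The matching requirement can be illustrated with a short Python sketch. It assumes the file has already been reduced to a time-ordered list of (tag, definition name) pairs, which is a simplification of real tag groups, and the function name `unmatched_offsets` is hypothetical. It checks only that each Offset follows an Onset with the same definition name.

```python
def unmatched_offsets(markers):
    """Return the positions of Offset markers that have no preceding Onset.

    markers is a time-ordered list of (tag, definition_name) pairs.
    """
    active = set()
    problems = []
    for index, (tag, name) in enumerate(markers):
        if tag == "Onset":
            active.add(name)
        elif tag == "Offset":
            if name in active:
                active.remove(name)
            else:
                problems.append(index)
    return problems


markers = [("Onset", "Cue1"), ("Offset", "Cue1"), ("Offset", "Cue1")]
print(unmatched_offsets(markers))  # the second Offset has no matching Onset
```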
3.3. Semantic versioning¶
HED schemas use the following rules for changing the major.minor.patch semantic version. These rules are based on the assumption that data annotated using the short form of HED tags will not have to be retagged for patch-level or minor-version changes of the schema. That is, a dataset tagged using schema version X.Y.Z will also validate for X.Y+.Z+. However, the reverse is not necessarily true. In addition, validation errors might occur after patch-level or minor-version changes because of changes or corrections in tag values or units.
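The compatibility statement can be expressed as a small Python check. The sketch below assumes one reading of X.Y+.Z+: the major version is unchanged and the (minor, patch) pair of the validating schema is not older than that of the schema used for tagging. The function names are hypothetical. As noted above, a result of True is an expectation rather than a guarantee, because corrections to tag values or units can still produce validation errors.

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def expected_to_validate(tagged_with, validating_with):
    """Return True if data tagged with one schema version should validate with another."""
    old = parse_version(tagged_with)
    new = parse_version(validating_with)
    # Same major version, and (minor, patch) is the same or later.
    return new[0] == old[0] and new[1:] >= old[1:]


print(expected_to_validate("8.0.0", "8.2.0"))
print(expected_to_validate("8.2.0", "8.0.0"))
```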
Here is a summary of the types of changes that correspond to different levels of changes in the semantic version:
Change | Semantic-level |
---|---|
Major addition to HED functionality | Major |
Tag deleted from schema. | Major |
Unit or unit class removed from node. | Major |
New tag added to the schema. | Minor |
New attribute added to schema. | Minor |
New unit class or unit added to schema. | Minor |
New unit class added to node. | Minor |
Node moved in schema without change in meaning. | Minor |
Revision of description field in schema. | Patch |
Correction of suggestedTag or relatedTag. | Patch |
Correction of wiki syntax such as closing tags. | Patch |
Note: It is an official policy that once in a schema, a node will not be removed.
If a node becomes out-of-date, a deprecated attribute will be added to the tag in the schema.
Suggested replacement tags should be included in the node description.
A suggested replacement should be added to the tag patch table.