Quickstart

Reading Data

To get the fun part out of the way first, you can read your data:

from pathlib import Path
from rich.pretty import pprint, pretty_repr
from rich.console import Console
from nwb_linkml.io import HDF5IO

# set up for pprinting in notebooks
console = Console(width=100)
print = console.print

# find sample data file and read
nwb_file = Path('../../nwb_linkml/tests/data/aibs.nwb')
data = HDF5IO(nwb_file).read()
print(data) 
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[1], line 12
     10 # find sample data file and read
     11 nwb_file = Path('../../nwb_linkml/tests/data/aibs.nwb')
---> 12 data = HDF5IO(nwb_file).read()
     13 print(data) 

File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/io/hdf5.py:87, in HDF5IO.read(self, path)
     70 def read(self, path: Optional[str] = None) -> Union["NWBFile", BaseModel, Dict[str, BaseModel]]:
     71     """
     72     Read data into models from an NWB File.
     73 
   (...)
     84         otherwise whatever Model or dictionary of models applies to the requested ``path``
     85     """
---> 87     provider = self.make_provider()
     89     h5f = h5py.File(str(self.path))
     90     src = h5f.get(path) if path else h5f

File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/io/hdf5.py:160, in HDF5IO.make_provider(self)
    157 provider = SchemaProvider(versions=versions)
    159 # build schema so we have them cached
--> 160 provider.build_from_dicts(schema)
    161 h5f.close()
    162 return provider

File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/providers/linkml.py:130, in LinkMLProvider.build_from_dicts(self, schemas, **kwargs)
    128         for needed in schema_needs:
    129             adapter.imported.append(ns_adapters[needed])
--> 130     adapter.complete_namespaces()
    132 # then do the build
    133 res = {}

File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/adapters/namespaces.py:169, in NamespacesAdapter.complete_namespaces(self)
    158 """
    159 After loading the namespace, and after any imports have been added afterwards,
    160 this must be called to complete the definitions of the contained schema objects.
   (...)
    166 It **is** automatically called if it hasn't been already by the :meth:`.build` method.
    167 """
    168 self._populate_imports()
--> 169 self._roll_down_inheritance()
    171 for i in self.imported:
    172     i.complete_namespaces()

File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/adapters/namespaces.py:207, in NamespacesAdapter._roll_down_inheritance(self)
    204 new_cls.parent = cls.parent
    206 # reinsert
--> 207 self._overwrite_class(new_cls, cls)

File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/adapters/namespaces.py:241, in NamespacesAdapter._overwrite_class(self, new_cls, old_cls)
    238         new_cls.parent.groups[new_cls.parent.groups.index(old_cls)] = new_cls
    239 else:
    240     # top level class, need to go and find it
--> 241     schema = self.find_type_source(old_cls)
    242     if isinstance(new_cls, Dataset):
    243         schema.datasets[schema.datasets.index(old_cls)] = new_cls

File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/adapters/namespaces.py:291, in NamespacesAdapter.find_type_source(self, cls, fast)
    289     return matches[0]
    290 else:
--> 291     raise KeyError(f"No schema found that define {cls}")

KeyError: "No schema found that define Group({\n  'neurodata_type_def': 'DynamicTable',\n  'neurodata_type_inc': 'Container',\n  'doc': ('A group containing multiple datasets that are aligned on the first dimension '\n     '(Currently, this requirement if left up to APIs to check and enforce). Apart '\n     'from a column that contains unique identifiers for each row there are no '\n     'other required datasets. Users are free to add any number of VectorData '\n     'objects here. Table functionality is already supported through compound '\n     'types, which is analogous to storing an array-of-structs. DynamicTable can '\n     'be thought of as a struct-of-arrays. This provides an alternative structure '\n     'to choose from when optimizing storage for anticipated access patterns. '\n     'Additionally, this type provides a way of creating a table without having to '\n     'define a compound type up front. Although this convenience may be '\n     'attractive, users should think carefully about how data will be accessed. '\n     'DynamicTable is more appropriate for column-centric access, whereas a '\n     'dataset with a compound type would be more appropriate for row-centric '\n     'access. Finally, data size should also be taken into account. For small '\n     'tables, performance loss may be an acceptable trade-off for the flexibility '\n     'of a DynamicTable. For example, DynamicTable was originally developed for '\n     'storing trial data and spike unit metadata. Both of these use cases are '\n     'expected to produce relatively small tables, so the spatial locality of '\n     'multiple datasets present in a DynamicTable is not expected to have a '\n     'significant performance impact. Additionally, requirements of trial and unit '\n     'metadata tables are sufficiently diverse that performance implications can '\n     'be overlooked in favor of usability.'),\n  'quantity': 1,\n  'attributes': [{'dtype': 'text',\n    'name': 'colnames',\n    'dims': ['num_columns'],\n    'shape': [None],\n    'value': None,\n    'default_value': None,\n    'doc': 'The names of the columns in this table. This should be used to specify '\n           'an order to the columns.',\n    'required': True},\n    {'dtype': 'text',\n    'name': 'description',\n    'dims': None,\n    'shape': None,\n    'value': None,\n    'default_value': None,\n    'doc': 'Description of what is in this dynamic table.',\n    'required': True}],\n  'datasets': [{'neurodata_type_def': None,\n    'neurodata_type_inc': 'ElementIdentifiers',\n    'name': 'id',\n    'default_name': None,\n    'dims': ['num_rows'],\n    'shape': [None],\n    'value': None,\n    'default_value': None,\n    'doc': 'Array of unique identifiers for the rows of this dynamic table.',\n    'quantity': 1,\n    'linkable': None,\n    'attributes': None,\n    'dtype': 'int'},\n    {'neurodata_type_def': None,\n    'neurodata_type_inc': 'VectorData',\n    'name': None,\n    'default_name': None,\n    'dims': None,\n    'shape': None,\n    'value': None,\n    'default_value': None,\n    'doc': 'Vector columns of this dynamic table.',\n    'quantity': '*',\n    'linkable': None,\n    'attributes': None,\n    'dtype': None},\n    {'neurodata_type_def': None,\n    'neurodata_type_inc': 'VectorIndex',\n    'name': None,\n    'default_name': None,\n    'dims': None,\n    'shape': None,\n    'value': None,\n    'default_value': None,\n    'doc': 'Indices for the vector columns of this dynamic table.',\n    'quantity': '*',\n    'linkable': None,\n    'attributes': None,\n    'dtype': None}]\n})"

Load and manipulate NWB schemas

The git provider module manages a repository so that it can provide a given NWB namespace at a given version. The schema files it provides can then be loaded and cast into the Pydantic models from nwb_schema_language.

For the nwb-core schema, first load just the namespaces file (without the adjoining schema files):

from nwb_linkml.providers.git import NWB_CORE_REPO
from nwb_linkml.io.schema import load_namespaces

namespace_file: 'Path' = NWB_CORE_REPO.provide_from_git('2.6.0')
core_namespaces = load_namespaces(namespace_file)
print(core_namespaces)

Or for a schema file…

from nwb_linkml.io.schema import load_schema_file

base_schema_file = namespace_file.parent / 'nwb.base.yaml'
nwb_core_base = load_schema_file(base_schema_file)
print(nwb_core_base)
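
As a rough illustration of what a loaded schema file contains, the sketch below walks a plain dictionary and collects every type it defines together with the type it inherits from. The dictionary is a hand-written, heavily abbreviated stand-in for a parsed schema file, not the real contents of nwb.base.yaml, and iter_type_defs is a hypothetical helper that is not part of nwb_linkml. Only the key names (groups, datasets, neurodata_type_def, neurodata_type_inc) follow the NWB schema language.

```python
from typing import Iterator, Optional, Tuple

# Hand-written stand-in for a parsed schema file (assumption: abbreviated,
# NOT the real contents of nwb.base.yaml).
schema = {
    "datasets": [
        {"neurodata_type_def": "NWBData", "neurodata_type_inc": "Data"},
    ],
    "groups": [
        {"neurodata_type_def": "NWBContainer", "neurodata_type_inc": "Container"},
        {
            "neurodata_type_def": "TimeSeries",
            "neurodata_type_inc": "NWBDataInterface",
            # A named dataset without neurodata_type_def defines no new type.
            "datasets": [{"name": "data", "doc": "Data values."}],
        },
    ],
}


def iter_type_defs(node: dict) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (defined type, inherited type) pairs, descending into nested items."""
    name = node.get("neurodata_type_def")
    if name is not None:
        yield name, node.get("neurodata_type_inc")
    # Groups are visited before datasets; `or []` tolerates keys set to None.
    for key in ("groups", "datasets"):
        for child in node.get(key) or []:
            yield from iter_type_defs(child)


type_defs = dict(iter_type_defs(schema))
for defined, parent in type_defs.items():
    print(f"{defined} extends {parent}")
```

The models returned by load_schema_file carry the same information as typed attributes rather than dictionary keys, so a similar walk should be possible over them.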

Additional adapters handle some of the implicit behavior in NWB schema files, such as importing other namespaces at a specific version and inter-schema class imports. For example, the NamespacesAdapter finds the implicitly imported hdmf-common namespace (again provided by the git schema provider).

from nwb_linkml.adapters import NamespacesAdapter

core_ns = NamespacesAdapter.from_yaml(namespace_file)
print(core_ns.imported) 
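
To make the idea of an implicit import more concrete, here is a minimal sketch of how the entries in a namespace's schema list can be separated into imports of other namespaces and schema files that belong to the namespace itself. It assumes the usual NWB namespace layout, in which each entry carries either a namespace key or a source key. The dictionary is a hand-written, abbreviated stand-in for a parsed namespace file, and split_schema_entries is a hypothetical helper, not how NamespacesAdapter is implemented.

```python
from typing import Dict, List

# Hand-written stand-in for a parsed namespace file (assumption: abbreviated,
# not the real core namespace file).
namespace_doc = {
    "namespaces": [
        {
            "name": "core",
            "version": "2.6.0",
            "schema": [
                {"namespace": "hdmf-common"},  # import of another namespace
                {"source": "nwb.base"},        # schema file in this namespace
                {"source": "nwb.file"},
            ],
        }
    ]
}


def split_schema_entries(doc: dict) -> Dict[str, Dict[str, List[str]]]:
    """Separate each namespace's schema entries into imports and own sources."""
    result: Dict[str, Dict[str, List[str]]] = {}
    for ns in doc["namespaces"]:
        imports: List[str] = []
        sources: List[str] = []
        for entry in ns.get("schema") or []:
            if "namespace" in entry:
                imports.append(entry["namespace"])
            elif "source" in entry:
                sources.append(entry["source"])
        result[ns["name"]] = {"imports": imports, "sources": sources}
    return result


summary = split_schema_entries(namespace_doc)
print(summary["core"]["imports"])
print(summary["core"]["sources"])
```

Note that the import entry names only the namespace. Locating the matching schema files at the right version is the part the adapter and the git provider take care of.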

The classes in nwb_schema_language are ordinary Pydantic models, so they can be used like any other Pydantic model to create new, validated schemas.

Translating to LinkML

Adapters handle the conversion from the NWB schema language to LinkML.

core_linkml = core_ns.build()
print(core_linkml)

The BuildResult class holds the LinkML representation of each of the schemas and their classes, which are now instances of the SchemaDefinition and ClassDefinition classes from linkml_runtime.linkml_model:

print(core_linkml.schemas[0])
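
To give a rough feel for the kind of translation involved, the sketch below maps an NWB-style group dictionary onto a dictionary shaped like a LinkML class definition (name, is_a, description, attributes). This is a deliberately simplified illustration and not the mapping nwb_linkml implements. The group_to_class helper, the DTYPE_TO_RANGE table, and the fallback to a string range are all assumptions made for this example, and the input is an abbreviated copy of the DynamicTable definition.

```python
from typing import Any, Dict

# Assumed dtype -> LinkML range mapping, for this sketch only.
DTYPE_TO_RANGE = {"text": "string", "int": "integer", "float": "float"}


def group_to_class(group: Dict[str, Any]) -> Dict[str, Any]:
    """Map an NWB-style group dict onto a LinkML-style class definition dict."""
    attributes: Dict[str, Dict[str, Any]] = {}
    for attr in group.get("attributes") or []:
        attributes[attr["name"]] = {
            "description": attr.get("doc"),
            # Unknown dtypes fall back to "string" (an assumption of this sketch).
            "range": DTYPE_TO_RANGE.get(attr.get("dtype"), "string"),
            "required": bool(attr.get("required", False)),
        }
    return {
        "name": group["neurodata_type_def"],
        "is_a": group.get("neurodata_type_inc"),
        "description": group.get("doc"),
        "attributes": attributes,
    }


# Abbreviated DynamicTable definition, trimmed to a single attribute.
dynamic_table = {
    "neurodata_type_def": "DynamicTable",
    "neurodata_type_inc": "Container",
    "doc": "A group containing multiple datasets aligned on the first dimension.",
    "attributes": [
        {
            "dtype": "text",
            "name": "description",
            "doc": "Description of what is in this dynamic table.",
            "required": True,
        }
    ],
}

linkml_class = group_to_class(dynamic_table)
print(linkml_class["name"], "is_a", linkml_class["is_a"])
print(linkml_class["attributes"]["description"])
```

The real adapters also have to deal with nested groups and datasets, quantities, shapes, and cross-namespace inheritance, none of which this sketch attempts.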

Generating Pydantic Models

Todo

Document Pydantic model generation

Caching Output with Providers

Todo

Document provider usage