Quickstart
Reading Data
To get the fun part out of the way, you can read your data:
from pathlib import Path
from rich.pretty import pprint, pretty_repr
from rich.console import Console
from nwb_linkml.io import HDF5IO
# set up for pprinting in notebooks
console = Console(width=100)
print = console.print
# find sample data file and read
nwb_file = Path('../../nwb_linkml/tests/data/aibs.nwb')
data = HDF5IO(nwb_file).read()
print(data)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[1], line 12
---> 12 data = HDF5IO(nwb_file).read()
File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/io/hdf5.py:87, in HDF5IO.read(self, path)
---> 87 provider = self.make_provider()
File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/io/hdf5.py:160, in HDF5IO.make_provider(self)
--> 160 provider.build_from_dicts(schema)
File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/providers/linkml.py:130, in LinkMLProvider.build_from_dicts(self, schemas, **kwargs)
--> 130 adapter.complete_namespaces()
File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/adapters/namespaces.py:169, in NamespacesAdapter.complete_namespaces(self)
--> 169 self._roll_down_inheritance()
File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/adapters/namespaces.py:207, in NamespacesAdapter._roll_down_inheritance(self)
--> 207 self._overwrite_class(new_cls, cls)
File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/adapters/namespaces.py:241, in NamespacesAdapter._overwrite_class(self, new_cls, old_cls)
--> 241 schema = self.find_type_source(old_cls)
File ~/checkouts/readthedocs.org/user_builds/nwb-linkml/checkouts/tmp-dump-examples/nwb_linkml/src/nwb_linkml/adapters/namespaces.py:291, in NamespacesAdapter.find_type_source(self, cls, fast)
--> 291 raise KeyError(f"No schema found that define {cls}")
KeyError: "No schema found that define Group({'neurodata_type_def': 'DynamicTable', 'neurodata_type_inc': 'Container', (...)})"
Load and manipulate NWB schemas
The git provider module manages a schema repository so that it can
provide a given NWB namespace at a given version. The loaded
schema is then cast into the Pydantic models from nwb_schema_language.
For the nwb-core schema, first load just the namespaces file (without the adjoining schema files):
from nwb_linkml.providers.git import NWB_CORE_REPO
from nwb_linkml.io.schema import load_namespaces
namespace_file: 'Path' = NWB_CORE_REPO.provide_from_git('2.6.0')
core_namespaces = load_namespaces(namespace_file)
print(core_namespaces)
Or for a schema file…
from nwb_linkml.io.schema import load_schema_file
base_schema_file = namespace_file.parent / 'nwb.base.yaml'
nwb_core_base = load_schema_file(base_schema_file)
print(nwb_core_base)
Additional adapters handle some of the
implicit behavior in NWB schema files, such as importing other namespaces
at a specific version and importing classes across schemas. For example, the
NamespacesAdapter finds the implicitly
imported hdmf-common namespace (again provided by the git schema provider):
from nwb_linkml.adapters import NamespacesAdapter
core_ns = NamespacesAdapter.from_yaml(namespace_file)
print(core_ns.imported)
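To make the idea of an implicitly imported namespace more concrete, the following is a minimal sketch that uses only the standard library. It assumes a parsed namespace file is a dictionary whose `schema` list mixes `namespace` entries (imports of other namespaces) with `source` entries (local schema files). The `namespace_doc` dictionary and the `split_schema_entries` helper are hypothetical stand-ins written for illustration; they are not the contents of the real namespace file, and they are not how NamespacesAdapter is implemented.

```python
# Hand-written, abbreviated stand-in for a parsed namespace file.
# ASSUMPTION: the keys mirror the NWB namespace layout, but the values are
# illustrative and are not copied from the real nwb-core namespace file.
namespace_doc = {
    "namespaces": [
        {
            "name": "core",
            "version": "2.6.0",
            "schema": [
                {"namespace": "hdmf-common"},  # import of another namespace
                {"source": "nwb.base.yaml"},   # schema file in this namespace
                {"source": "nwb.file.yaml"},
            ],
        }
    ]
}


def split_schema_entries(doc):
    """Separate imported namespace names from local schema source files.

    Hypothetical helper: it only reads the dictionary above and does not
    touch the filesystem or any nwb_linkml API.
    """
    imported, sources = [], []
    for ns in doc["namespaces"]:
        for entry in ns.get("schema", []):
            if "namespace" in entry:
                imported.append(entry["namespace"])
            elif "source" in entry:
                sources.append(entry["source"])
    return imported, sources


imported, sources = split_schema_entries(namespace_doc)
print("imported namespaces:", imported)
print("local schema files:", sources)
```

An adapter has to do more than this sketch: once it knows which namespaces are imported, it still has to obtain them at the right version and resolve the classes that schemas borrow from one another.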
The classes in nwb_schema_language are ordinary Pydantic models, so they
can be used like any other Pydantic model to create new, validated schemas.
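As a rough illustration of what "validated" means here, the sketch below defines two small Pydantic models and builds a group definition from them. `GroupSketch` and `AttributeSketch` are hypothetical stand-ins, not the real nwb_schema_language classes. Their field names are borrowed from the Group representation shown in the error output above, and everything else (defaults, which fields are required) is an assumption.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class AttributeSketch(BaseModel):
    """Hypothetical stand-in for an attribute definition."""

    name: str
    doc: str
    dtype: str = "text"    # assumed default, for illustration only
    required: bool = True  # assumed default, for illustration only


class GroupSketch(BaseModel):
    """Hypothetical stand-in for a group definition."""

    neurodata_type_def: str
    neurodata_type_inc: Optional[str] = None
    doc: str
    attributes: List[AttributeSketch] = []


# Valid input: the nested dictionary is validated into an AttributeSketch.
group = GroupSketch(
    neurodata_type_def="MyTable",
    neurodata_type_inc="DynamicTable",
    doc="A table defined for this example.",
    attributes=[{"name": "description", "doc": "What this table contains."}],
)
print(group)

# Invalid input: the required "doc" field is missing, so Pydantic raises.
try:
    GroupSketch(neurodata_type_def="Broken")
except ValidationError as err:
    error_count = len(err.errors())
    print(f"validation failed with {error_count} error(s)")
```

The real models are imported from nwb_schema_language rather than defined by hand, but the same pattern applies: construct the model, and let Pydantic reject input that does not fit the schema language.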
Translating to LinkML
The adapters handle the conversion from NWB schema language to
LinkML.
core_linkml = core_ns.build()
print(core_linkml)
The BuildResult class holds the LinkML representation
of each of the schemas and their classes, which are now instances of the
linkml_runtime.linkml_model.SchemaDefinition and ClassDefinition classes:
print(core_linkml.schemas[0])
Generating Pydantic Models
Todo
Document Pydantic model generation
Caching Output with Providers
Todo
Document provider usage