command line script: pydabu

pydabu has a few subcommands:


analyse_data_structure
Analyse the data structure of a directory tree.

check_nasa_ames_format
This command checks a file in the NASA Ames format.

check_netcdf_file
This command checks a file in the netCDF format. It uses the CF Checker.

check_file_format
This command checks the file formats in a directory tree.

common_json_format
This command reads the given json file and writes it in a common format to stdout.

create_data_bubble
This command creates a data bubble in the given directory.

check_data_bubble
This command checks a data bubble in the given directory.

listschemas
This command lists the provided and used json schemas.

data_bubble2jsonld
This command reads the data bubble (.dabu.json and .dabu.schema) and creates a json-ld data bubble (.dabu.json-ld and .dabu.json-ld.schema).

These commands are explained in more detail in the following (help output):

pydabu is a script to check a data bubble.

usage: pydabu [-h]

Positional Arguments


Possible choices: analyse_data_structure, check_nasa_ames_format, check_netcdf_file, check_file_format, common_json_format, create_data_bubble, check_data_bubble, listschemas, data_bubble2jsonld

There are different sub-commands with their own flags.



analyse_data_structure

see also: analyse_data_structure_output.schema

For more help: pydabu analyse_data_structure -h

pydabu analyse_data_structure [-h] [-output_format f] [-directory d [d ...]]

Named Arguments


-output_format
Possible choices: human_readable, json, json1

Set the output format to use. human_readable gives a nice json output with skipped data. json is the normal json output. json1 is the full data with nice output like human_readable. default: json1

Default: [‘json1’]


-directory Set the directory to use. You can also give a list of directories separated by spaces. default: .

Default: [‘.’]
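
To call this sub-command from a Python program, one option is to assemble the argument list from the usage line above and pass it to subprocess. The following is a minimal sketch and not part of pydabu: the helper name build_analyse_command is hypothetical, and it assumes that the json output arrives on stdout (as the bash examples at the end of this page suggest).

```python
import json
import shutil
import subprocess


def build_analyse_command(directories=(".",), output_format="json1"):
    """Assemble the argument list for 'pydabu analyse_data_structure'.

    Hypothetical helper: it only uses the flags shown in the usage line.
    """
    if output_format not in ("human_readable", "json", "json1"):
        raise ValueError(f"unknown output format: {output_format}")
    return ["pydabu", "analyse_data_structure",
            "-output_format", output_format,
            "-directory", *directories]


if __name__ == "__main__":
    command = build_analyse_command(["."], output_format="json")
    if shutil.which("pydabu") is None:
        # pydabu is not installed here, so only show the command line.
        print("pydabu not found; would run:", " ".join(command))
    else:
        result = subprocess.run(command, capture_output=True,
                                text=True, check=True)
        data = json.loads(result.stdout)  # assumption: json on stdout
        print(type(data).__name__)
```

Passing the arguments as a list avoids shell quoting problems with directory names that contain spaces.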


check_nasa_ames_format

This command checks a file in the NASA Ames format.

pydabu check_nasa_ames_format [-h] [-output_format f] -file f [f ...]

Named Arguments


-output_format
Possible choices: human_readable, json, json1

Set the output format to use. human_readable gives a nice json output with skipped data. json is the normal json output. json1 is the full data with nice output like human_readable. default: json1

Default: [‘json1’]

-file Set the file(s) to use.


check_netcdf_file

This command checks a file in the netCDF format. It uses the CF Checker.

pydabu check_netcdf_file [-h] [-output_format f] -file f [f ...]

Named Arguments


-output_format
Possible choices: human_readable, json, json1

Set the output format to use. human_readable gives a nice json output with skipped data. json is the normal json output. json1 is the full data with nice output like human_readable. default: json1

Default: [‘json1’]

-file Set the file(s) to use.


check_file_format

see also: dabu.schema

This command checks the file formats. In a first step, the data structure is analysed as the command “analyse_data_structure” does. Each file is then checked by a tool chosen by its file extension. For the file extension “.nc”, the command check_netcdf_file is used.

pydabu check_file_format [-h] [-output_format f] [-directory d [d ...]]
                         [-skip_creating_checksums] [-checksum_from_file f]

Named Arguments


-output_format
Possible choices: human_readable, json, json1

Set the output format to use. human_readable gives a nice json output with skipped data. json is the normal json output. json1 is the full data with nice output like human_readable. default: json1

Default: [‘json1’]


-directory Set the directory to use. You can also give a list of directories separated by spaces. default: .

Default: [‘.’]


-skip_creating_checksums Skip creating checksums, which could take a while.

Default: False

-checksum_from_file Try to get checksums from the given file.
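
Creating checksums means reading every byte of every file, which is why this step can take a while on large data sets. The sketch below illustrates the idea with hashlib. It is not pydabu's implementation: the function names are hypothetical, and SHA-256 is an assumption, since this page does not say which algorithms pydabu uses.

```python
import hashlib
import os


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_for_tree(directory):
    """Map each file path (relative to directory) to its checksum."""
    result = {}
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            result[os.path.relpath(path, directory)] = file_checksum(path)
    return result
```

Reading in fixed-size chunks keeps the memory use constant even for very large data files.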


common_json_format

This command reads the given json file and writes it in a common format to stdout.

pydabu common_json_format [-h] -file f [f ...] [-indent i]

Named Arguments

-file Set the file(s) to use.

-indent In the output the elements will be indented by this number of spaces.

Default: [4]
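
The effect of rewriting json in one fixed layout can be sketched with the standard json module. This is only an illustration: the function name to_common_format is hypothetical, and sorting the keys is an assumption about what "common format" means here, not something this page states.

```python
import json


def to_common_format(text, indent=4):
    """Parse json text and serialise it again in one fixed layout.

    Assumption: sorted keys and a fixed indentation, so that two files
    with the same content produce identical output.
    """
    return json.dumps(json.loads(text), indent=indent, sort_keys=True)


if __name__ == "__main__":
    print(to_common_format('{"b": 1, "a": [1, 2]}'))
```

A fixed layout like this makes json files easier to compare with line-based tools such as diff.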


create_data_bubble

see also: dabu.schema and dabu_requires.schema

This command creates a data bubble in the given directory. The data is generated with the command “check_file_format” from the data in the directory. Although the resulting files are not a data management plan, you can enhance them to become one.

pydabu create_data_bubble [-h] -directory d [d ...] [-indent i]
                          [-skip_creating_checksums] [-checksum_from_file f]
                          [-dabu_instance_file f] [-dabu_schema_file f]

Named Arguments

-directory Set the directory to use. You can also give a list of directories separated by space.

-indent In the output the elements will be indented by this number of spaces.

Default: [4]

-skip_creating_checksums Skip creating checksums, which could take a while.

Default: False

-checksum_from_file Try to get checksums from the given file.

-dabu_instance_file Gives the name of the file describing the content of a data bubble. If this file already exists, an error is raised. The name is relative to the given directory.

Default: [‘.dabu.json’]

-dabu_schema_file Gives the name of the file describing the necessary content of a data bubble. If this file already exists, an error is raised. The name is relative to the given directory.

Default: [‘.dabu.schema’]
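
Refusing to overwrite an existing file protects manual enhancements made to an earlier data bubble. One way to get this behaviour in Python is the exclusive-creation mode "x" of open(). The sketch below shows the idea; the helper name write_new_json is hypothetical, and how pydabu itself implements the check is not stated on this page.

```python
import json
import os


def write_new_json(directory, filename, content, indent=4):
    """Write content as json to directory/filename.

    Mode "x" raises FileExistsError if the file is already there,
    so an existing file is never overwritten.
    """
    path = os.path.join(directory, filename)
    with open(path, "x", encoding="utf-8") as handle:
        json.dump(content, handle, indent=indent)
        handle.write("\n")
    return path
```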


check_data_bubble

This command checks a data bubble in the given directory. The data bubble should be created with “pydabu create_data_bubble” and manually enhanced. Instead of this script, you can also use your preferred tool to check a json instance (e. g. .dabu.json) against a json schema (e. g. .dabu.schema) – see examples.

pydabu check_data_bubble [-h] -directory d [d ...] [-dabu_instance_file f]
                         [-dabu_schema_file f]

Named Arguments

-directory Set the directory to use. You can also give a list of directories separated by space.

-dabu_instance_file Gives the name of the file describing the content of a data bubble. The name is relative to the given directory.

Default: [‘.dabu.json’]


-dabu_schema_file Gives the name of the file describing the necessary content of a data bubble. The name is relative to the given directory.

Default: [‘.dabu.schema’]
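
To give an impression of what checking an instance against a schema involves, the sketch below tests only the top-level "required" keyword of a json schema, using the standard library. It is deliberately incomplete and is not what pydabu does internally: the function names are hypothetical, and a real check should use a full json schema validator.

```python
import json
import os


def missing_required(instance, schema):
    """Return the keys listed under "required" that the instance lacks.

    Only the top-level "required" keyword is looked at; types, nested
    objects and all other schema keywords are ignored.
    """
    return [key for key in schema.get("required", []) if key not in instance]


def check_bubble(directory, instance_file=".dabu.json",
                 schema_file=".dabu.schema"):
    """Load both files from directory and report the missing required keys."""
    with open(os.path.join(directory, instance_file), encoding="utf-8") as f:
        instance = json.load(f)
    with open(os.path.join(directory, schema_file), encoding="utf-8") as f:
        schema = json.load(f)
    return missing_required(instance, schema)
```

An empty list means that every required top-level key is present; it does not mean that the instance is valid in the full json schema sense.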


listschemas

see also: Provided and used json schemas

This command lists the provided and used json schemas.

pydabu listschemas [-h] [-output_format f]

Named Arguments


-output_format
Possible choices: simple, json

Set the output format to use. simple lists the json schemas in lines. json leads to a json output. default: simple

Default: [‘simple’]


data_bubble2jsonld

This command reads the data bubble (.dabu.json and .dabu.schema) and creates a json-ld data bubble (.dabu.json-ld and .dabu.json-ld.schema). If you are fine with these new files, you should delete the old ones yourself.

pydabu data_bubble2jsonld [-h] -directory d [d ...] [-indent i]
                          [-dabu_instance_file f] [-dabu_schema_file f]
                          [-dabu_jsonld_instance_file f]
                          [-dabu_jsonld_schema_file f] [-vocabulary v]
                          [-cachefilename f] [-cachefilepath p] [-author p]

Named Arguments

-directory Set the directory to use. You can also give a list of directories separated by space.

-indent In the output the elements will be indented by this number of spaces.

Default: [4]


-dabu_instance_file Gives the name of the file describing the content of a data bubble. The name is relative to the given directory.

Default: [‘.dabu.json’]


-dabu_schema_file Gives the name of the file describing the necessary content of a data bubble. The name is relative to the given directory.

Default: [‘.dabu.schema’]


-dabu_jsonld_instance_file Gives the name of the file describing the content of a data bubble as json-ld. If this file already exists, an error is raised. The name is relative to the given directory. default: .dabu.json-ld

Default: [‘.dabu.json-ld’]


-dabu_jsonld_schema_file Gives the name of the file describing the necessary content of a data bubble with json-ld. If this file already exists, an error is raised. The name is relative to the given directory. default: .dabu.json-ld.schema

Default: [‘.dabu.json-ld.schema’]


-vocabulary
Possible choices:

Sets the vocabulary to use. At the moment only is implemented. default:

Default: [‘’]


-cachefilename We need data from If you set cachefilename to an empty string, nothing is cached. If the file ends with a common extension for compression, this compression is used (e. g.: .gz, .lzma, .xz, .bz2). The file is created in the cachefilepath (see this option). default: “schemaorg-current-https.jsonld.bz2”

Default: [‘schemaorg-current-https.jsonld.bz2’]
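
Choosing the compression from the file extension can be sketched with the standard library modules gzip, bz2 and lzma. This is an illustration under assumptions, not pydabu's code: the function name open_cache_file is hypothetical, and only text modes such as "rt" and "wt" are supported.

```python
import bz2
import gzip
import lzma
from pathlib import Path


def open_cache_file(path, mode="rt"):
    """Open a cache file in text mode, compressed according to its suffix."""
    suffix = Path(path).suffix
    if suffix == ".gz":
        return gzip.open(path, mode, encoding="utf-8")
    if suffix == ".bz2":
        return bz2.open(path, mode, encoding="utf-8")
    if suffix == ".xz":
        return lzma.open(path, mode, encoding="utf-8")
    if suffix == ".lzma":
        # The legacy .lzma container has to be requested explicitly.
        return lzma.open(path, mode, format=lzma.FORMAT_ALONE,
                         encoding="utf-8")
    # No known compression suffix: plain text file.
    return open(path, mode, encoding="utf-8")
```

With the default cachefilename, Path(...).suffix yields ".bz2", so the bz2 branch would be used.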


-cachefilepath This path is used for the cachefilename. If necessary, this directory will be created (not the directory tree!). default: “/tmp/json_schema_from_schema_org_runner”

Default: [‘/tmp/json_schema_from_schema_org_runner’]

-author Sets the author of the data bubble. If not given, it is not added to the dabu_jsonld_instance_file. Anyway the dabu_jsonld_schema_file will require it. You can just give a string or any json object.

You can view the json output for example in firefox, e. g. in bash:

output=$(tempfile --suffix='.json'); pydabu analyse_data_structure -output_format json > "$output" && firefox "$output"; sleep 3; rm "$output"

output=$(tempfile --suffix='.json'); pydabu check_netcdf_file -f $(find . -iname '*.nc') -output_format json > "$output" && firefox "$output"; sleep 3; rm "$output"

Author: Daniel Mohr Date: 2021-07-01 License: GNU GENERAL PUBLIC LICENSE, Version 3, 29 June 2007.