command line script: pydabu

pydabu has a few subcommands:

analyse_data_structure

Analyse the data structure of a directory tree.

check_nasa_ames_format

This command checks a file in the NASA Ames format.

check_netcdf_file

This command checks a file in the format netCDF. It uses the CF Checker: https://github.com/cedadev/cf-checker

check_file_format

This command checks the file formats in a directory tree.

common_json_format

This command reads the given json file and writes it to stdout in a common format.

create_data_bubble

This command creates a data bubble in the given directory.

check_data_bubble

This command checks a data bubble in the given directory.

listschemas

This command lists the provided and used json schemas.

data_bubble2jsonld

This command reads the data bubble (.dabu.json and .dabu.schema) and creates a json-ld data bubble (.dabu.json-ld and .dabu.json-ld.schema).

These commands are explained in more detail in the following (help output):

pydabu is a script to check a data bubble.

usage: pydabu [-h]
              {analyse_data_structure,check_nasa_ames_format,check_netcdf_file,check_file_format,common_json_format,create_data_bubble,check_data_bubble,listschemas,data_bubble2jsonld}
              ...

Positional Arguments

subparser_name

Possible choices: analyse_data_structure, check_nasa_ames_format, check_netcdf_file, check_file_format, common_json_format, create_data_bubble, check_data_bubble, listschemas, data_bubble2jsonld

There are different sub-commands, each with their own flags.

Sub-commands:

analyse_data_structure

see also: analyse_data_structure_output.schema

For more help: pydabu analyse_data_structure -h

pydabu analyse_data_structure [-h] [-output_format f] [-directory d [d ...]]

Named Arguments

-output_format

Possible choices: human_readable, json, json1

Set the output format to use. human_readable gives nicely formatted json output with some data skipped. json is the plain json output. json1 gives the full data with the nice formatting of human_readable. default: json1

Default: ['json1']

-directory

Set the directory to use. You can also give a list of directories separated by spaces. default: .

Default: ['.']
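
For example, to analyse a directory and print plain json output (the directory name mydata is only a placeholder):

pydabu analyse_data_structure -directory mydata -output_format json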

check_nasa_ames_format

This command checks a file in the NASA Ames format.

pydabu check_nasa_ames_format [-h] [-output_format f] -file f [f ...]

Named Arguments

-output_format

Possible choices: human_readable, json, json1

Set the output format to use. human_readable gives nicely formatted json output with some data skipped. json is the plain json output. json1 gives the full data with the nice formatting of human_readable. default: json1

Default: ['json1']

-file

Set the file(s) to use.
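
A minimal call could look like this (the file name data.na is only a placeholder for your own NASA Ames file):

pydabu check_nasa_ames_format -file data.na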

check_netcdf_file

This command checks a file in the format netCDF. It uses the CF Checker: https://github.com/cedadev/cf-checker

pydabu check_netcdf_file [-h] [-output_format f] -file f [f ...]

Named Arguments

-output_format

Possible choices: human_readable, json, json1

Set the output format to use. human_readable gives nicely formatted json output with some data skipped. json is the plain json output. json1 gives the full data with the nice formatting of human_readable. default: json1

Default: ['json1']

-file

Set the file(s) to use.
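
For example, to check a netCDF file (the file name measurement.nc is only a placeholder) with human readable output:

pydabu check_netcdf_file -file measurement.nc -output_format human_readable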

check_file_format

see also: dabu.schema

This command checks the file formats. First, the data structure is analysed as the command "analyse_data_structure" does. Then each file is checked by a tool chosen by its file extension. For the file extension ".nc" the command check_netcdf_file is used.

pydabu check_file_format [-h] [-output_format f] [-directory d [d ...]]
                         [-skip_creating_checksums] [-checksum_from_file f]

Named Arguments

-output_format

Possible choices: human_readable, json, json1

Set the output format to use. human_readable gives nicely formatted json output with some data skipped. json is the plain json output. json1 gives the full data with the nice formatting of human_readable. default: json1

Default: ['json1']

-directory

Set the directory to use. You can also give a list of directories separated by spaces. default: .

Default: ['.']

-skip_creating_checksums

Skip creating checksums, which could take a while.

Default: False

-checksum_from_file

Try to get checksums from the given file.
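
For example, to check the file formats in the current directory without spending time on checksums (using only the flags documented above):

pydabu check_file_format -directory . -skip_creating_checksums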

common_json_format

This command reads the given json file and writes it to stdout in a common format.

pydabu common_json_format [-h] -file f [f ...] [-indent i]

Named Arguments

-file

Set the file(s) to use.

-indent

In the output the elements will be indented by this number of spaces.

Default: [4]
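
For example, to rewrite a json file with an indentation of 2 spaces (the file name input.json is only a placeholder):

pydabu common_json_format -file input.json -indent 2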

create_data_bubble

see also: dabu.schema and dabu_requires.schema

This command creates a data bubble in the given directory. The data is generated with the command "check_file_format" from the data in the directory. Note that the resulting files are not yet a data management plan, but you can enhance them to become one.

pydabu create_data_bubble [-h] -directory d [d ...] [-indent i]
                          [-skip_creating_checksums] [-checksum_from_file f]
                          [-dabu_instance_file f] [-dabu_schema_file f]

Named Arguments

-directory

Set the directory to use. You can also give a list of directories separated by spaces.

-indent

In the output the elements will be indented by this number of spaces.

Default: [4]

-skip_creating_checksums

Skip creating checksums, which could take a while.

Default: False

-checksum_from_file

Try to get checksums from the given file.

-dabu_instance_file

Gives the name of the file describing the content of a data bubble. If this file already exists, an error is raised. The name is relative to the given directory.

Default: ['.dabu.json']

-dabu_schema_file

Gives the name of the file describing the necessary content of a data bubble. If this file already exists, an error is raised. The name is relative to the given directory.

Default: ['.dabu.schema']
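
For example, to create a data bubble for the current directory with the default file names .dabu.json and .dabu.schema:

pydabu create_data_bubble -directory .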

check_data_bubble

This command checks a data bubble in the given directory. The data bubble should be created with "pydabu create_data_bubble" and manually enhanced. Instead of this script you can also use your preferred tool to check a json instance (e.g. .dabu.json) against a json schema (e.g. .dabu.schema); see the examples.

pydabu check_data_bubble [-h] -directory d [d ...] [-dabu_instance_file f]
                         [-dabu_schema_file f]

Named Arguments

-directory

Set the directory to use. You can also give a list of directories separated by spaces.

-dabu_instance_file

Gives the name of the file describing the content of a data bubble. The name is relative to the given directory.

Default: ['.dabu.json']

-dabu_schema_file

Gives the name of the file describing the necessary content of a data bubble. The name is relative to the given directory.

Default: ['.dabu.schema']
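
For example, to check the data bubble in the current directory:

pydabu check_data_bubble -directory .

As mentioned above, any json schema validator can do the same job. A sketch, assuming the third-party python package jsonschema is installed:

python3 -c "import json, jsonschema; jsonschema.validate(json.load(open('.dabu.json')), json.load(open('.dabu.schema')))"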

listschemas

see also: Provided and used json schemas

This command lists the provided and used json schemas.

pydabu listschemas [-h] [-output_format f]

Named Arguments

-output_format

Possible choices: simple, json

Set the output format to use. simple lists the json schemas line by line. json leads to json output. default: simple

Default: ['simple']
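
For example, to get the list of schemas as json:

pydabu listschemas -output_format json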

data_bubble2jsonld

This command reads the data bubble (.dabu.json and .dabu.schema) and creates a json-ld data bubble (.dabu.json-ld and .dabu.json-ld.schema). If you are fine with these new files, you should delete the old ones yourself.

pydabu data_bubble2jsonld [-h] -directory d [d ...] [-indent i]
                          [-dabu_instance_file f] [-dabu_schema_file f]
                          [-dabu_jsonld_instance_file f]
                          [-dabu_jsonld_schema_file f] [-vocabulary v]
                          [-cachefilename f] [-cachefilepath p] [-author p]

Named Arguments

-directory

Set the directory to use. You can also give a list of directories separated by spaces.

-indent

In the output the elements will be indented by this number of spaces.

Default: [4]

-dabu_instance_file

Gives the name of the file describing the content of a data bubble. The name is relative to the given directory.

Default: ['.dabu.json']

-dabu_schema_file

Gives the name of the file describing the necessary content of a data bubble. The name is relative to the given directory.

Default: ['.dabu.schema']

-dabu_jsonld_instance_file

Gives the name of the file describing the content of a data bubble as json-ld. If this file already exists, an error is raised. The name is relative to the given directory. default: .dabu.json-ld

Default: ['.dabu.json-ld']

-dabu_jsonld_schema_file

Gives the name of the file describing the necessary content of a data bubble with json-ld. If this file already exists, an error is raised. The name is relative to the given directory. default: .dabu.json-ld.schema

Default: ['.dabu.json-ld.schema']

-vocabulary

Possible choices: schema.org

Sets the vocabulary to use. At the moment only schema.org is implemented. default: schema.org

Default: ['schema.org']

-cachefilename

We need data from schema.org. If you set cachefilename to an empty string, nothing is cached. If the file name ends with a common extension for compression, this compression is used (e.g. .gz, .lzma, .xz, .bz2). The file is created in the cachefilepath (see that option). default: "schemaorg-current-https.jsonld.bz2"

Default: ['schemaorg-current-https.jsonld.bz2']

-cachefilepath

This path is used for the cachefilename. If necessary, this directory will be created (but not a whole directory tree!). default: "/tmp/json_schema_from_schema_org_runner"

Default: ['/tmp/json_schema_from_schema_org_runner']

-author

Sets the author of the data bubble. If not given, it is not added to the dabu_jsonld_instance_file. However, the dabu_jsonld_schema_file will still require it. You can give a plain string or any json object.
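
For example, to convert the data bubble in the current directory and record an author (the name 'Jane Doe' is only a placeholder), keeping the default cache settings:

pydabu data_bubble2jsonld -directory . -author 'Jane Doe'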

You can view the json output, for example in firefox, e.g. in bash:

output=$(tempfile --suffix='.json'); pydabu analyse_data_structure -output_format json > $output && firefox $output; sleep 3; rm $output

output=$(tempfile --suffix='.json'); pydabu check_netcdf_file -f $(find . -iname '*.nc') -output_format json > $output && firefox $output; sleep 3; rm $output
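
If the tempfile command is not available on your system (an assumption about your environment), mktemp from GNU coreutils can serve the same purpose:

output=$(mktemp --suffix='.json'); pydabu analyse_data_structure -output_format json > $output && firefox $output; sleep 3; rm $output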

Author: Daniel Mohr Date: 2021-07-01 License: GNU GENERAL PUBLIC LICENSE, Version 3, 29 June 2007.