walk-through pydabu
===================

your data
---------

Your data is nothing more than a data bubble, until it is:

  * described
  * shared
  * published

pydabu can help you to describe your data. Think of a simple kind of a
basic data management plan (cf. [wikipedia:DMP]_; [RDA_DMP]_),
which is good research practice (cf. [DFG]_; [Helmholtz]_).

Like your data itself, a description of your data can be shared.
For example if in-house a search platform (e. g. [Solr]_) is running, you
could share your description of your data and enable your colleagues to find
your data.

During publication you will most probably need a description of your data
in terms of metadata.

.. only:: html

  References:
  ___________

.. [wikipedia:DMP] https://en.wikipedia.org/wiki/Data_management_plan
.. [RDA_DMP] Miksa, Tomasz and Walk, Paul and Neish, Peter;
	     RDA DMP Common Standard for Machine-actionable
	     Data Management Plans
	     https://doi.org/10.15497/rda00039
.. [DFG] Deutsche Forschungsgemeinschaft;
	 Guidelines for Safeguarding Good Research Practice. Code of Conduct
	 https://doi.org/10.5281/zenodo.3923602
.. [Helmholtz] Good scientific practice
	       https://www.helmholtz.de/en/about-us/the-association/good-scientific-practice/
.. [Solr] https://lucene.apache.org/solr/

creating a data bubble
----------------------

First of all you have to collect all data belonging to you data bubble in a
directory. Use you preferred way to copy/move your data. The directory could
look like::

  $ cd pydabu && ls -1a
  doc/
  .git
  gpl.txt
  install2home
  INSTALL.txt
  LICENSE.txt
  manual_pydabu.pdf
  PKG-INFO
  pydabu_unittests/
  README.md
  setup.py
  src/

Or storing big data it could look like::

  $ cd foo && ls -1
  glow_XIMAS1848000_001.zip
  glow_XIMAS1848000_2020-08-05_00003_140552145659648.img
  glow_XIMAS1848000.log
  graphics/
  info.txt
  overview_XOMAS1848000_001.zip
  overview_XOMAS1848000_2020-08-05_00004_140603729520384.img
  overview_XOMAS1848000.log
  pytwanrc_doc.pdf
  result.pdf
  result.rst
  signals.pdf
  twanrc_rf_trigger_AK06FZRP.log

Now let us create some description with :option:`pydabu create_data_bubble`::

  pydabu create_data_bubble -dir .

Two files ".dabu.json" and ".dabu.schema" are created as a draft for you.
In ".dabu.schema" the json schema describes the structured data stored
in the json instance ".dabu.json".

The schema describes not only the type of some data, but also required
metadata. You can yourself adapt it to your needs. Or you supervisor can
describe his requirement there.

The instance describes your data and holds some simple format check results.
You have to fill this draft with additional information and you should
check it.

With every text editor you can look at the generated files.
We will use a viewer::

  firefox .dabu.json

checking and fixing a data bubble
---------------------------------

You can check if your json instance is valid regarding the schema
(e. g. for "pydabu" (from above) you will not get any output)::

  jsonschema -i .dabu.json .dabu.schema
  pydabu check_data_bubble -dir .

At the moment the command :option:`pydabu check_data_bubble` gives
an overview of errors/warnings. Mainly you will see missing properties,
which are required.

For example for the data in the directory "foo" (from above), you will get::

  $ jsonschema -i .dabu.json .dabu.schema
  u'data integrity control' is a required property

Since, at this point we did not edit ".dabu.json" manually it is easy to fix.
Use [pfu]_ to create some checksums (if you have a few GB or more, this could
take a while) and recreate the data bubble::

  $ pfu.py create_checksum -dir . -store single
  $ rm .dabu.json .dabu.schema
  $ pydabu create_data_bubble -dir .
  $ jsonschema -i .dabu.json .dabu.schema
  ...
  u'license' is a required property

Instead of pfu you can also use your preferred checksumming tool.

Now you have to add a license, e. g. write a file "LICENSE.txt"::

  $ rm .checksum.sha512 .dabu.json .dabu.schema
  $ vim LICENSE.txt
  $ pfu.py create_checksum -directory . -store single
  $ pydabu create_data_bubble -dir .
  $ jsonschema -i .dabu.json .dabu.schema

And all necessary (depends on ".dabu.schema") metadata is collected in
".dabu.json".

.. only:: html

  References:
  ___________

.. [pfu] pfu -- Python File Utilities, https://gitlab.dlr.de/pfu/pfu