Re: #correspondencetables - what needs to be done? #correspondencetables


 

Michele is supposed to get this organized, but I can provide some input from my side:

This working group should have multiple outputs which build upon each other.

a. Define a standard for correspondence tables, and convert everything we can find to that format.

I am 100% convinced that this format should be https://frictionlessdata.io/specs/data-package/. Each correspondence table would consist of a CSV with the raw data, and a JSON file with the metadata. Our task would be to define the metadata format, building on what the OKFN has already done; we just need to fill in some details. The idea is that the metadata is consistent across tables and therefore machine-readable.
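To make the CSV-plus-JSON idea concrete, here is a minimal sketch of what one descriptor could look like. The structural keys (`name`, `resources`, `path`, `schema`, `fields`) follow the Data Package and Table Schema specs; the package name, file name, and column names are invented for illustration, and the extra metadata we still need to agree on is deliberately left out.

```python
import json

# Hypothetical descriptor for a single correspondence table.
# Only the key structure comes from the Data Package / Table Schema specs;
# every name and value below is a placeholder.
descriptor = {
    "name": "foo-to-bar-correspondence",
    "title": "Correspondence between classification foo and classification bar",
    "resources": [
        {
            "name": "foo-to-bar",
            "path": "foo-to-bar.csv",  # the raw two-column CSV
            "format": "csv",
            "schema": {
                "fields": [
                    {"name": "source", "type": "string"},
                    {"name": "target", "type": "string"},
                ]
            },
        }
    ],
}

# This string is what would be saved as datapackage.json next to the CSV.
datapackage_json = json.dumps(descriptor, indent=2)
print(datapackage_json)
```

Anything we decide to standardize (classification versions, provenance, and so on) would become additional keys in this file.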

The CSV layout is something we should discuss. 1-1 correspondence is easy. I think that 1-N and N-1 are also easy; one could use a two-column format:

    foo, bar1
    foo, bar2

and

    foo1, bar
    foo2, bar

We could also have a third column that would give weights when more than one mapping is possible.
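As a rough sketch of the three-column variant, the snippet below reads such a CSV and checks the weights. The column names (`source`, `target`, `weight`), the sample numbers, and the rule that the weights for each source should sum to 1 are all my assumptions, not something we have agreed on.

```python
import csv
import io
from collections import defaultdict

# Made-up example: foo splits 70/30 between bar1 and bar2; baz maps 1-1.
RAW = """source,target,weight
foo,bar1,0.7
foo,bar2,0.3
baz,bar3,1.0
"""


def load_weighted(text):
    """Return {source: [(target, weight), ...]} from three-column CSV text."""
    table = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        table[row["source"]].append((row["target"], float(row["weight"])))
    return dict(table)


def bad_sources(table, tol=1e-9):
    """List the sources whose weights do not sum to 1 (assumed convention)."""
    return [
        source
        for source, pairs in table.items()
        if abs(sum(weight for _, weight in pairs) - 1.0) > tol
    ]


table = load_weighted(RAW)
print(table["foo"])
print(bad_sources(table))
```

A check like this could run automatically on every table in the repo, so that inconsistent weights are caught when the data is converted rather than when it is used.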

But we want to do this right, so we should look at the various proposals for "crosswalk" tables, at how these mappings are stored in open source LCA software, etc.

This should be a new repo, with one directory for the final product, one directory for the Jupyter notebooks (or whatever else is used) to convert the raw data, and a third directory for the input data in its "native" form (if applicable). See https://open-power-system-data.org/ for inspiration.

I see that Brandon has just responded to this question with a totally different answer, so I look forward to a good discussion! I believe that data packages are language and community agnostic and are therefore much more of a community resource than something RDF-specific would be. As always, the more value our data provides, the higher the chance that it is used by others, and then maintained by others :)

On the other hand, Brandon's approach allows us to express relationships much more concretely, and we would need this level of detail at some point in any case.

b. Set up a simple web app at correspondence.bonsai.uno that would return these correspondences in multiple formats.

Technically quite easy, and would be a good exercise to set up a BONSAI Python web app skeleton.
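The "multiple formats" part could be as small as the sketch below, which is independent of whichever web framework we pick. The function name `render`, the column names, and the choice of JSON and CSV as the first two formats are assumptions for illustration only.

```python
import csv
import io
import json

# Placeholder rows; in the real app these would come from a data package.
ROWS = [
    {"source": "foo", "target": "bar1"},
    {"source": "foo", "target": "bar2"},
]


def render(rows, fmt):
    """Return (content_type, body) for one correspondence table."""
    if fmt == "json":
        return "application/json", json.dumps(rows)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=["source", "target"], lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return "text/csv", buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


content_type, body = render(ROWS, "csv")
print(content_type)
print(body)
```

A route handler would then only need to pick the table, read the requested format from the URL or the Accept header, and pass both to a function like this.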

c. Write a Python library that would allow the easy application of these correspondence tables.
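To give an idea of what "easy application" might mean, here is a minimal sketch of the core operation: taking amounts keyed by source codes and re-expressing them in target codes. The function name `apply_correspondence`, the in-memory table layout, and the numbers are hypothetical; a real library would load the table from a data package.

```python
from collections import defaultdict

# Hypothetical table layout: {source: [(target, weight), ...]}
TABLE = {
    "foo": [("bar1", 0.75), ("bar2", 0.25)],  # 1-N split with weights
    "baz": [("bar2", 1.0)],                   # maps wholly onto bar2
}


def apply_correspondence(values, table):
    """Re-express {source: amount} as {target: amount} using weighted mappings."""
    result = defaultdict(float)
    for source, amount in values.items():
        if source not in table:
            # Fail loudly rather than silently dropping unmapped data.
            raise KeyError(f"no mapping for {source!r}")
        for target, weight in table[source]:
            result[target] += amount * weight
    return dict(result)


converted = apply_correspondence({"foo": 100.0, "baz": 10.0}, TABLE)
print(converted)  # {'bar1': 75.0, 'bar2': 35.0}
```

Raising on unmapped codes is a design choice worth discussing; the alternative is to pass them through unchanged or collect them in a report.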

During and maybe after the hackathon (or not - surprise me :)
