Re: #correspondencetables - Getting to 1.0 #correspondencetables

Miguel Fernández Astudillo
 

Dear all

 

An update on this: yesterday I created a repo, following the Python skeleton, for the functions that will automate part of the workflow. It's called grafter. I have not yet had time to work on the functions themselves.

 

https://github.com/BONSAMURAIS/grafter

 

See you tomorrow,

 

Miguel

 

 

 

From: hackathon2019@bonsai.groups.io <hackathon2019@bonsai.groups.io> On Behalf Of Chris Mutel
Sent: 01 April 2019 12:54
To: hackathon2019@bonsai.groups.io
Subject: [hackathon2019] #correspondencetables - Getting to 1.0

 

Dear all-

I am happy that there are a number of people participating here, and I think we have everything ready for assembly into a 1.0 version of this package. However, from reading these emails and looking at the repo itself, it seems like a little organization and goal-setting could help move this project forward. Here are some suggestions:

1. The goal and capabilities (user stories) for 1.0 should be clearly defined. Some possibilities:
- Python package that provides for trivial application of correspondence tables. As a BONSAI user, I want to be able to call `correspondence(data, field_identifier, table_name, aggregation_func, disaggregation_func)` and get my `data` updated automatically.
- All output correspondence data should be provided in a 3-column format, with the third column being the SKOS verb. Maybe a fourth column is needed for dis/aggregation weights.
- All output correspondence data should have metadata in DataPackage form.
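
To make the first user story concrete, here is a minimal sketch of what such a call could look like. Only the function name and its five arguments come from the suggestion above; everything else is an assumption for illustration: the in-memory `TABLES` registry, the table name `example_a_to_b`, the codes, the weights, the `split_by_weight` helper, the extra `value_field` argument, and the choice to represent `data` as a list of dicts.

```python
from collections import defaultdict

# Hypothetical registry; a real package would load tables from CSV files.
# Row layout (assumed): source code, target code, SKOS verb, weight.
TABLES = {
    "example_a_to_b": [
        ("A1", "B1", "exactMatch", 1.0),
        ("A2", "B2", "broadMatch", 1.0),    # A2 and A3 aggregate into B2
        ("A3", "B2", "broadMatch", 1.0),
        ("A4", "B3", "narrowMatch", 0.25),  # A4 is split across B3 and B4
        ("A4", "B4", "narrowMatch", 0.75),
    ]
}


def split_by_weight(value, weight):
    """Hypothetical default disaggregation: scale the value by the weight."""
    return value * weight


def correspondence(data, field_identifier, table_name,
                   aggregation_func=sum,
                   disaggregation_func=split_by_weight,
                   value_field="value"):
    """Re-express `data` in the target classification of `table_name`.

    `data` is assumed to be a list of dicts; rows whose code is not in
    the table are dropped in this sketch.
    """
    # Index the table: source code -> list of (target code, weight)
    targets = defaultdict(list)
    for source, target, _verb, weight in TABLES[table_name]:
        targets[source].append((target, weight))

    # Disaggregate each input row, collecting the pieces per target code
    grouped = defaultdict(list)
    for row in data:
        for target, weight in targets.get(row[field_identifier], []):
            grouped[target].append(
                disaggregation_func(row[value_field], weight)
            )

    # Aggregate everything that landed on the same target code
    return [
        {field_identifier: target, value_field: aggregation_func(values)}
        for target, values in sorted(grouped.items())
    ]
```

Calling `correspondence(data, "code", "example_a_to_b")` on rows such as `{"code": "A4", "value": 8.0}` would return new rows keyed by the target codes. Keeping the aggregation and disaggregation functions as arguments lets the caller decide how values are combined or split.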

1.5. If a system uses multiple identifiers (e.g. EXIOBASE), each identifier should be in its own column, as at some point each one will be needed.

2. This should be a Python package based on the Python skeleton. A package layout would give the project structure, so that people know what goes where. However, not every directory would need to be included in the Python library itself. Instead, you could have this structure:

correspondence (python library code here)
    python code to do matching
    output
        csv and json files
        autogenerated index.html which lists all files and their descriptions
raw (input data in original downloaded form)

Of course, other models are possible...

3. The RDF vocabulary terms needed should be identified and documented in the README.

4. RDF terms should be computed automatically from the correspondence tables, perhaps with a bit of manual intervention. The default should probably not be an exact match, but this would be configurable. In general it should be possible to map N-1 relations with one term, 1-1 with another, etc. without having to have a person go through long lists.
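
As a rough illustration of point 4, the sketch below derives a SKOS mapping term for each pair from the cardinality of the relation alone. The function name `infer_skos_verbs` and the default term chosen for each case are assumptions, not an agreed convention; the defaults are keyword arguments so they stay configurable, and the 1-1 default is deliberately `skos:closeMatch` rather than an exact match.

```python
from collections import Counter


def infer_skos_verbs(pairs,
                     one_to_one="skos:closeMatch",
                     many_to_one="skos:broadMatch",
                     one_to_many="skos:narrowMatch",
                     many_to_many="skos:relatedMatch"):
    """Return (source, target, verb) triples for (source, target) pairs.

    Hypothetical helper: the verb is chosen only from how many partners
    each code has, so the result still needs a manual review.
    """
    # Drop duplicate pairs while keeping the original order
    pairs = list(dict.fromkeys(pairs))

    targets_per_source = Counter(source for source, _ in pairs)
    sources_per_target = Counter(target for _, target in pairs)

    triples = []
    for source, target in pairs:
        fans_out = targets_per_source[source] > 1  # source is split
        fans_in = sources_per_target[target] > 1   # target is an aggregate
        if fans_out and fans_in:
            verb = many_to_many
        elif fans_in:
            # N-1: the target is broader than each source
            verb = many_to_one
        elif fans_out:
            # 1-N: each target is narrower than the source
            verb = one_to_many
        else:
            verb = one_to_one
        triples.append((source, target, verb))
    return triples
```

The output is already in the 3-column form suggested in point 1, so it could be written straight to CSV and then corrected by hand where the cardinality alone gives the wrong term.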

I would be happy to help with specific technical implementations of any of these tasks.

Who is now coordinating this working group? Could you please update issue #3 to show the current status and short-term plans?
