#correspondencetables - Getting to 1.0

Dear all-

I am happy that there are a number of people participating here, and I think we have everything ready for assembly into a 1.0 version of this package. However, from reading these emails and looking at the repo itself, it seems like a little organization and goal-setting could help move this project forward. Here are some suggestions:

1. The goal and capabilities (user stories) for 1.0 should be clearly defined. Some possibilities:
- Python package that provides for trivial application of correspondence tables. As a BONSAI user, I want to be able to call `correspondence(data, field_identifier, table_name, aggregation_func, disaggregation_func)` and get my `data` updated automatically.
- All output correspondence data should be provided in a 3-column format, with the third column being the SKOS verb. Maybe a fourth column is needed for dis/aggregation weights.
- All output correspondence data should have metadata in DataPackage form
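
To make the first user story and the 3-column format concrete, here is a minimal sketch, assuming pandas DataFrames and a table with `source`, `target`, and `skos_verb` columns. The simplified signature (a table object instead of a table name, no disaggregation handling, and a hard-coded `value` column) and all names are illustrative assumptions, not an existing BONSAI API:

```python
import pandas as pd

# Toy sketch of the proposed `correspondence` call. Assumes a 3-column
# correspondence table (source, target, skos_verb); the disaggregation
# step from the user story is omitted for brevity.
def correspondence(data, field_identifier, table, aggregation_func="sum"):
    # Attach target codes to each data row via the correspondence table.
    merged = data.merge(
        table, left_on=field_identifier, right_on="source", how="left"
    )
    # Re-label rows with the target code, then aggregate N-1 matches.
    merged[field_identifier] = merged["target"]
    grouped = merged.groupby(field_identifier, as_index=False)
    return grouped.agg({"value": aggregation_func})

# Hypothetical example: two source codes that both map to one target.
table = pd.DataFrame({
    "source": ["A01", "A02"],
    "target": ["0111", "0111"],          # N-1: both map to one code
    "skos_verb": ["broadMatch", "broadMatch"],
})
data = pd.DataFrame({"activity_code": ["A01", "A02"], "value": [10.0, 5.0]})

print(correspondence(data, "activity_code", table))
#   activity_code  value
# 0          0111   15.0
```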

1.5 If a system uses multiple identifiers (e.g. exiobase), all identifiers should be in their own columns, as at some point each one will be needed.

2. This should be a python package based on the python skeleton. Being a python package would provide structure so that people would know what goes where. However, not every directory would need to be included in the python library itself. Instead, you could have this structure:

correspondence (python library code here)
    python code to do matching
    output
        csv and json files
        autogenerated index.html which lists all files and their descriptions
raw (input data in original downloaded form)

Of course, other models are possible...
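
For the autogenerated index.html, a small script along these lines could work. The paths and the use of `description` fields from a datapackage.json are assumptions based on the layout above, not fixed conventions:

```python
from pathlib import Path
import json

# Sketch: build a simple index.html listing every CSV/JSON file under
# the output directory, pulling descriptions from a datapackage.json
# if one is present. The directory name is an assumption.
output_dir = Path("correspondence/output")
output_dir.mkdir(parents=True, exist_ok=True)  # ensure it exists for the sketch

descriptions = {}
datapackage = output_dir / "datapackage.json"
if datapackage.exists():
    meta = json.loads(datapackage.read_text())
    for resource in meta.get("resources", []):
        descriptions[resource.get("path", "")] = resource.get("description", "")

rows = []
for path in sorted(output_dir.glob("*.*")):
    if path.suffix in {".csv", ".json"} and path.name != "datapackage.json":
        desc = descriptions.get(path.name, "")
        rows.append(f"<tr><td><a href='{path.name}'>{path.name}</a></td>"
                    f"<td>{desc}</td></tr>")

html = "<table>\n" + "\n".join(rows) + "\n</table>"
(output_dir / "index.html").write_text(f"<html><body>{html}</body></html>")
```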

3. The RDF vocabulary terms needed should be identified and documented in the README

4. RDF terms should be computed automatically from the correspondence tables, perhaps with a bit of manual intervention. The default should probably not be an exact match, but this would be configurable. In general it should be possible to map N-1 relations with one term, 1-1 with another, etc. without having to have a person go through long lists.
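
A rough sketch of how this cardinality-based assignment could look, assuming simple (source, target) pairs; the choice of `closeMatch` as the configurable 1-1 default and `relatedMatch` for N-M cases are assumptions, since the exact predicates are left open here:

```python
from collections import Counter

# Sketch: derive a SKOS-style verb for each (source, target) pair from
# the cardinality of the whole table. The 1-1 default is configurable,
# per the suggestion that exact match should not be assumed.
def assign_verbs(pairs, one_to_one_default="closeMatch"):
    source_counts = Counter(s for s, _ in pairs)
    target_counts = Counter(t for _, t in pairs)
    labelled = []
    for source, target in pairs:
        if source_counts[source] == 1 and target_counts[target] == 1:
            verb = one_to_one_default            # 1-1: configurable default
        elif source_counts[source] > 1 and target_counts[target] > 1:
            verb = "relatedMatch"                # N-M: flag for manual review
        elif source_counts[source] > 1:
            verb = "narrowMatch"                 # 1-N: each target is narrower
        else:
            verb = "broadMatch"                  # N-1: the target is broader
        labelled.append((source, target, verb))
    return labelled

pairs = [("A01", "0111"), ("A01", "0112"), ("A02", "0113"), ("A03", "0113")]
for row in assign_verbs(pairs):
    print(*row, sep=",")
# A01,0111,narrowMatch
# A01,0112,narrowMatch
# A02,0113,broadMatch
# A03,0113,broadMatch
```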

I would be happy to help with specific technical implementations of any of these tasks.

Who is now coordinating this working group? Could you please update issue #3 to show the current status and short-term plans?


Miguel Fernández Astudillo
 

Dear all, my replies inline

 

Best, Miguel

 

> From: hackathon2019@bonsai.groups.io <hackathon2019@bonsai.groups.io> On Behalf Of Chris Mutel
> Sent: 01 April 2019 12:54
> To: hackathon2019@bonsai.groups.io
> Subject: [hackathon2019] #correspondencetables - Getting to 1.0
>
> Dear all-
>
> I am happy that there are a number of people participating here, and I think we have everything ready for assembly into a 1.0 version of this package. However, from reading these emails and looking at the repo itself, it seems like a little organization and goal-setting could help move this project forward. Here are some suggestions:
>
> 1. The goal and capabilities (user stories) for 1.0 should be clearly defined. Some possibilities:

I agree that it does need cleaning and clearer goal-setting. I actually opened an issue (#21) about this and suggest moving some of the content to other repos.

> - Python package that provides for trivial application of correspondence tables. As a BONSAI user, I want to be able to call `correspondence(data, field_identifier, table_name, aggregation_func, disaggregation_func)` and get my `data` updated automatically.

Is the idea to use these functions later in the arborist repo to generate the RDF files?


> - All output correspondence data should be provided in a 3-column format, with the third column being the SKOS verb. Maybe a fourth column is needed for dis/aggregation weights.
> - All output correspondence data should have metadata in DataPackage form

That makes a lot of sense to me, although we may need some help to choose the predicates.


> 1.5 If a system uses multiple identifiers (e.g. exiobase), all identifiers should be in their own columns, as at some point each one will be needed.
>
> 2. This should be a python package based on the python skeleton. Being a python package would provide structure so that people would know what goes where. However, not every directory would need to be included in the python library itself. Instead, you could have this structure:
>
> correspondence (python library code here)
>     python code to do matching

I’m not sure what is expected here. We can discuss the details.


>     output
>         csv and json files
>         autogenerated index.html which lists all files and their descriptions
> raw (input data in original downloaded form)
>
> Of course, other models are possible...
>
> 3. The RDF vocabulary terms needed should be identified and documented in the README

Do we limit it to the predicates?

> 4. RDF terms should be computed automatically from the correspondence tables, perhaps with a bit of manual intervention. The default should probably not be an exact match, but this would be configurable. In general it should be possible to map N-1 relations with one term, 1-1 with another, etc. without having to have a person go through long lists.
>
> I would be happy to help with specific technical implementations of any of these tasks.
>
> Who is now coordinating this working group? Could you please update issue #3 to show the current status and short-term plans?


Miguel Fernández Astudillo
 

Dear all

 

An update on this: yesterday I created a repo, following the python skeleton, for the functions that will automate part of the workflow. It's called grafter. I have not yet had time to work on the functions.

 

https://github.com/BONSAMURAIS/grafter

 

See you tomorrow,

 

Miguel