Adding provenance #ontology #intro #provenance


Agneta
 

Dear all

I would like to introduce Emil Riis Hansen to the BONSAI community. He was recently employed as a research assistant in the Department of Computer Science at Aalborg University.

Emil is interested in working on adding provenance to our current BONSAI ontology.

Provenance helps us add information on the origin of data, i.e. where the data comes from, who generated it, its licence, etc. We discussed this issue during the hackathon but haven't developed it since.
Currently, Emil has proposed a high-level provenance that is limited to determining the origin of a dataset, not of the individual values in it. For example, anyone who queries data from BONSAI will learn that the data is sourced from Exiobase; if other datasets are integrated into the semantic web using the BONSAI ontology, users will likewise find information on the origin of those datasets. Provenance of individual values in a dataset is harder to determine, as they may be calculated, estimated, or raw data from the data provider.

Emil is also preparing a conference paper on how he plans to add provenance to the current ontology. For the purpose of this paper, it would be useful to upload the provenance information to the current RDF data we have in the Jena database. This will help the reviewers query the information as presented in the paper.

If anyone here has been working with provenance or is interested, please feel free to write to me.
Kind regards

Agneta


Bo Weidema
 

Dear Emil and Agneta,

A warm welcome to Emil. Regarding provenance of the individual numbers, there is a good description on the wiki, relating to the recommendations from RDA. This is an elegant and efficient way of handling this issue, I think.

Best regards

Bo


Bo Weidema
 

Dear Emil and Agneta,

A warm welcome to Emil.

Regarding provenance of the individual numbers and calculations, there is a good description in the section "Versioning and citation" in this document, relating to the recommendations from RDA. This is an elegant and efficient way of handling this issue, I think. I thought I had added that to the wiki, but right now I cannot find it (?).

Best regards

Bo


Agneta
 

Thanks for the document, Bo.

The document recommends timestamping of the datapoints and query outputs. However, I am unsure to what degree we will be able to add provenance to each value in Exiobase. Although Exiobase uses data from multiple sources, it applies algorithms to produce a balanced dataset. In other words, it is a secondary dataset (primary datasets are those that contain raw data).

If some values are changed in the future, this leads to the publication of a new version of the dataset. So the provenance for all values in Exiobase is given as Exiobase + (specific version).
My question is: do we need provenance for individual values in a secondary dataset? It is different when we have minute-by-minute information on temperature change in a region (raw data). There, timestamping of individual values might be more relevant.
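
To make the idea of timestamping query outputs a bit more concrete, here is a minimal Python sketch. It assumes that a citable query record consists of the query text, the dataset version it ran against, an execution timestamp, and a checksum of the result rows. The names `QueryCitation` and `cite_query`, the version label, and the sample rows are all hypothetical; nothing here is taken from the document Bo shared or from the BONSAI code.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class QueryCitation:
    """Hypothetical record describing one executed query."""
    query: str
    dataset_version: str   # dataset name plus specific version
    executed_at: str       # ISO 8601 timestamp in UTC
    result_checksum: str   # SHA-256 over the serialised result rows


def cite_query(query, dataset_version, rows, now=None):
    """Build a citation record for a query and its result rows."""
    now = now or datetime.now(timezone.utc)
    # Sort the rows so the checksum does not depend on result order.
    payload = json.dumps(sorted(rows), separators=(",", ":")).encode("utf-8")
    return QueryCitation(
        query=query.strip(),
        dataset_version=dataset_version,
        executed_at=now.isoformat(),
        result_checksum=hashlib.sha256(payload).hexdigest(),
    )


# Placeholder rows and version label, for illustration only.
rows = [["steel", 1.5], ["coal", 0.2]]
citation = cite_query(
    "SELECT ?flow ?value WHERE { ?flow ?p ?value }", "exiobase-vX", rows
)
print(citation.dataset_version, citation.executed_at)
```

With a record like this, two runs of the same query against the same dataset version can be compared by checksum, without attaching provenance to every single value.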

What do you think?
Agneta


Bo Weidema
 

Dear Agneta,

First, it is important to distinguish between:

1) What is "raw data" in a BONSAI context, namely the data as they are received from elsewhere. These data may be either direct measurements (very rarely) or previously more or less processed (in the case of Exiobase, definitely more so), with or without explicit previous provenance. For these data, it is obviously sufficient to report the direct source as received (example: Exiobase version NNh, downloaded from URL at Time), which then applies to all datapoints within that dataset.

2) Data that are corrected or otherwise manipulated after receipt, in which case it is relevant to add the nature of the correction or calculation, and a timestamp for the changed dataset (but not for the unchanged parts). In this way, one can always trace the origin of any datum to the form in which it was originally provided to BONSAI.
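
The two cases above can be sketched as a small chain of provenance records. This is only an illustration under assumptions: the class `Provenance`, the function `trace_to_origin`, the field names, and the timestamps are hypothetical placeholders, not part of any BONSAI code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """One step in a dataset's history (field names are illustrative)."""
    description: str                       # direct source, or nature of the change
    timestamp: str                         # download time, or time of the change
    parent: Optional["Provenance"] = None  # None for data as received


def trace_to_origin(record):
    """Follow parent links back to the form in which the data was received."""
    chain = [record]
    while chain[-1].parent is not None:
        chain.append(chain[-1].parent)
    return chain


# Case 1: data as received; one record covers every datapoint in the dataset.
# The timestamps below are made-up placeholders.
received = Provenance("Exiobase, as downloaded", "2019-11-01T10:00:00Z")

# Case 2: a later correction records what was changed and when.
corrected = Provenance(
    "correction applied to part of the dataset",
    "2019-11-20T09:30:00Z",
    parent=received,
)

for step in trace_to_origin(corrected):
    print(step.timestamp, step.description)
```

Only the changed dataset gets a new record; unchanged data keeps pointing at the record created on receipt.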

As ambitions and resources increase, someone may later want to add further upstream provenance to the data in BONSAI, which is of course always possible and desirable.

Best regards

Bo


Søren
 

Yes, this is also in line with the talk we had with Emil yesterday.


Matteo Lissandrini (AAU)
 

Hi all,


So what Bo lists as case 1) is what Emil's proposal is about.


The idea is to annotate the named graphs we have with provenance.

This is a necessary first step, because without it the data in each named graph is "orphaned", lacking the basic information that provenance requires.
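
As a rough sketch of what annotating a named graph could look like, the Python snippet below emits a few N-Triples statements about a graph URI. It uses terms from the W3C PROV-O vocabulary only because they are a familiar choice; the thread does not say which vocabulary the proposal uses. The function name `annotate_named_graph` and all `example.org` URIs are hypothetical.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def annotate_named_graph(graph_uri, source_uri, activity_uri, generated_at):
    """Return N-Triples lines stating where a named graph came from."""
    return [
        # The named graph itself is treated as a provenance entity.
        f"<{graph_uri}> <{RDF_TYPE}> <{PROV}Entity> .",
        # The dataset the graph content was extracted from.
        f"<{graph_uri}> <{PROV}wasDerivedFrom> <{source_uri}> .",
        # The extraction run (for example, a script execution).
        f"<{graph_uri}> <{PROV}wasGeneratedBy> <{activity_uri}> .",
        # When the graph was produced.
        f'<{graph_uri}> <{PROV}generatedAtTime> "{generated_at}"^^<{XSD_DATETIME}> .',
    ]


# All URIs and the timestamp are placeholders for illustration.
triples = annotate_named_graph(
    "http://example.org/graph/flows",
    "http://example.org/dataset/exiobase",
    "http://example.org/activity/extraction-run",
    "2019-11-14T12:00:00Z",
)
print("\n".join(triples))
```

A query over these statements can then answer "where does this graph come from?" for every graph, which is the dataset-level provenance discussed above.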


Emil will come up with a proposal on what we need to extend in our scripts in order to have this information; I think it will be a few additions to the arborist code.


Thanks,

Matteo



---
Matteo Lissandrini

Department of Computer Science
Aalborg University

http://people.cs.aau.dk/~matteo







Emil Riis Hansen
 

Hello everyone,
I have prepared a first proposed implementation of provenance for the BONSAI project. It includes lineage information for flows, activityTypes, and locations, as well as versioning of the activity (the Arborist script) used to extract the data. I believe the implementation satisfies our initial requirements, or at least those needed to write a resource paper on the BONSAI database. The proposal is very flexible and can easily be extended or changed.

I would like to share the proposal via a pull request. How do I get permission to do this?

Best regards
Emil


Michele De Rosa
 

Hi Emil, 

send me your GitHub username.

Mic


Emil Riis Hansen
 

Hi Michele


Username: IKnowLogic

I will share some more details soon, before creating the PR.

Thank you, Michele


Best Regards

Emil Riis Hansen



Emil Riis Hansen
 

Hi everyone,

I have prepared a pull request for the initial implementation of provenance.

The request extends the work of the Arborist working group with an initial, minimal provenance implementation. It adds lineage information between entity instances and the EXIOBASE dataset, as well as provenance information about the Arborist script itself.
Further improvements will be needed and will follow.

The request also fixes the issue of "missing dataset declarations" from the RDF repository.

Matteo helped review the request before submission, but I look forward to your feedback.

Pull request: https://github.com/BONSAMURAIS/arborist/pull/14
Issue: https://github.com/BONSAMURAIS/rdf/issues/3

Best Regards,
Emil


Matteo Lissandrini (AAU)
 

Hi Chris, Michele, everyone.


We would really like to get your input on Emil's pull request.

It is not a lot of code, and it adds at least a minimum of metadata about the datasets and their provenance/lineage.

Please let us know if we can merge it in.


Thanks,

Matteo




Matteo Lissandrini (AAU)
 

Hi all,


The pull request is still pending.

Brandon made a helpful comment that we are working on addressing.


Emil is eager to get more work done on this, and I think we should not let his excitement fade ;)


If you all agree, I would proceed with merging the current pull request, and we will follow up with more of them to enhance the current solution.


Please let us know.


Thanks,

Matteo




Massimo Pizzol
 

Fine for me



Søren
 

Go for it!

 


Matteo Lissandrini (AAU)
 

Thanks,


then we will proceed :)

