First, it is important to distinguish between:

1) "Raw data" in a BONSAI context, namely data as they are received from elsewhere. These data may be direct measurements (very rarely) or data that have already been processed to a greater or lesser degree (in the case of Exiobase, definitely to a greater degree), with or without explicit provenance of their own. For these data, it is sufficient to report the direct source as received (example: Exiobase version NNh, downloaded from URL at Time), which then applies to all datapoints within that dataset.

2) Data that are corrected or otherwise manipulated after receipt. In this case, it is relevant to record the nature of the correction or calculation, together with a timestamp for the changed parts of the dataset (but not for the unchanged parts). In this way, any datum can always be traced back to the form in which it was originally provided to BONSAI.
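A minimal sketch of the two cases above, assuming Python and using hypothetical class, field and key names (nothing here is an existing BONSAI structure): data as received carry one dataset-level source record, while each correction stores its description, the previous value and a timestamp, and only for the values that actually changed. The URL is a placeholder.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SourceProvenance:
    """Dataset-level provenance for data as received (case 1)."""
    name: str              # e.g. "Exiobase"
    version: str           # e.g. "NNh" (placeholder, as in the text)
    url: str               # download location (hypothetical here)
    retrieved_at: datetime


@dataclass(frozen=True)
class Correction:
    """One manipulation applied after receipt (case 2)."""
    description: str       # nature of the correction or calculation
    previous_value: float  # value before this change, so the received form stays recoverable
    applied_at: datetime   # timestamp of the change


@dataclass
class Dataset:
    source: SourceProvenance
    values: dict[str, float]
    # Only changed keys get entries; unchanged values rely on `source` alone.
    corrections: dict[str, list[Correction]] = field(default_factory=dict)

    def correct(self, key: str, new_value: float, description: str,
                applied_at: Optional[datetime] = None) -> None:
        """Change one value and record what was done and when."""
        when = applied_at or datetime.now(timezone.utc)
        record = Correction(description, self.values[key], when)
        self.corrections.setdefault(key, []).append(record)
        self.values[key] = new_value

    def provenance(self, key: str) -> tuple[SourceProvenance, list[Correction]]:
        """Return the dataset source plus any corrections applied to this datum."""
        return self.source, list(self.corrections.get(key, []))

    def original_value(self, key: str) -> float:
        """Trace a datum back to the form in which it was received."""
        history = self.corrections.get(key)
        return history[0].previous_value if history else self.values[key]


# Hypothetical usage: one source record, one corrected value.
src = SourceProvenance("Exiobase", "NNh", "https://example.org/exiobase",
                       datetime(2019, 11, 13, tzinfo=timezone.utc))
ds = Dataset(src, {"a": 1.0, "b": 2.0})
ds.correct("b", 2.5, "unit conversion fix",
           datetime(2019, 11, 14, tzinfo=timezone.utc))
```

Here "a" is described by the source record alone, while "b" also carries its correction history and can still be traced back to the value 2.0 as received.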

As ambitions and resources increase, someone may later want to add further upstream provenance to the data in BONSAI, which is of course always possible and desirable.

On 2019-11-13 at 14.42, Agneta wrote:

Thanks for the document, Bo.

The document recommends timestamping the datapoints and query outputs. However, I am unsure to what degree we will be able to add provenance to each value in Exiobase. Although Exiobase does use data from multiple sources, it applies algorithms to produce a balanced dataset. In other words, it is a secondary dataset (primary datasets are those that contain raw data).

If some values are changed in the future, this leads to the publication of a new version of the dataset. So the provenance for all values in Exiobase is expressed as Exiobase + (specific version).
My question is: do we need provenance for individual values in a secondary dataset? It is different when we have minute-by-minute temperature measurements for a region (raw data); there, timestamping individual values might be more relevant.
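The contrast in the question can be sketched as follows, assuming Python; the names, keys and temperature readings are hypothetical and made up for illustration. For a secondary dataset, a single version label serves as provenance for every value, whereas minute-by-minute raw readings each carry their own timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """Raw measurement: every value carries its own timestamp."""
    measured_at: datetime
    temperature_c: float


def version_label(dataset: str, version: str) -> str:
    """Version-level provenance shared by all values of a secondary dataset."""
    return f"{dataset} {version}"


# Secondary dataset: one label covers every value; any change means a new version.
exiobase_values = {"x": 1.0, "y": 2.0}  # hypothetical keys and numbers
exiobase_provenance = {k: version_label("Exiobase", "NNh") for k in exiobase_values}

# Raw data: each reading has its own timestamp (values are made up).
readings = [
    Reading(datetime(2019, 11, 13, 14, 0, tzinfo=timezone.utc), 4.2),
    Reading(datetime(2019, 11, 13, 14, 1, tzinfo=timezone.utc), 4.3),
]
```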

