Re: #infrastructure New working group and practice guidelines
#infrastructure
On Thu, 4 Apr 2019 at 13:20, Matteo Lissandrini (AAU) <matteo@cs.aau.dk> wrote:
No problem, this is to be expected as we are still evolving the schema and making sure our RDF is valid and implemented properly. However, at some point soon we should get to a point where the Aalborg server is considered stable, while db.b.u is still for playing.
It actually isn't that easy to restore everything, as we currently need a relatively large amount of data (on the order of 3 GB for EXIOBASE, and 300 MB for the electricity stuff). The metadata is easy: arborist can rewrite the data in https://github.com/BONSAMURAIS/rdf, which can in turn be the foundation of the triple store. It would be nice to have a function that would take all these small Turtle files and merge them into one file (which could then be uploaded to the triple store).
In the medium term, I don't think that it makes sense to store metadata for specific databases like EXIOBASE in arborist; this can just as easily be part of the file that includes the actual data. We only evolved this code pathway because we were learning as we were going. Indeed, it is probably smarter in the long term to have https://github.com/BONSAMURAIS/rdf generated from the database itself.
I think the small importer you wrote will work fine for smaller datasets, but we will need to do file uploads for larger ones, as they won't fit into memory (to be loaded by RDFLib). This should be easy to do, though there may still be some Jena configuration bugs to work out.
So everything is in a bit of flux, and it would be great if you could take charge of this little bit of it! Please document the hell out of stuff, so we don't have to bug you too much. Probably in the triplestore repo?
--
############################
Chris Mutel
Technology Assessment Group, LEA
Paul Scherrer Institut
OHSA D22
5232 Villigen PSI
Switzerland
http://chris.mutel.org
Phone: +41 56 310 5787
############################
|
|
Re: #infrastructure New working group and practice guidelines
#infrastructure
Matteo Lissandrini (AAU)
Hi Chris,
my importer is actually doing the file upload; this is the command I ran last night:
```bash
for f in `find ../rdf -name '*.ttl'`; do bseeder -i $f; done
```
So you do not need to merge the files in the rdf repo. Actually, if you do that you end up with a big problem: you lose track of which triples go in which named graph.
In my view the RDF repo is for the instances of the taxonomies, small datasets that change slowly (e.g., flow objects/items or activity types), while the actual data would remain out of it.
For very big files, what we can do is:
1) upload them via scp/rsync to a dedicated directory on the server,
2) use the file importer utility provided by Jena itself.
I understand that restoring is not easy, but we need to have it for reproducibility and for reliability (if bad things happen we may need to restore the database from scratch).
Cheers,
Matteo
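To illustrate the named-graph point, here is a minimal sketch that pairs every Turtle file with its own graph IRI instead of merging the files. The base IRI, the path-to-IRI rule, and the function name `named_graphs` are assumptions made for this example; they are not how bseeder derives graph names.

```python
from pathlib import Path

# Hypothetical base IRI; the real graph naming scheme may differ.
BASE_IRI = "http://example.org/graph/"


def named_graphs(root):
    """Map every .ttl file under `root` to a graph IRI derived from its path."""
    root = Path(root)
    mapping = {}
    for path in sorted(root.rglob("*.ttl")):
        # Use the path relative to the repo root, without the extension,
        # so each file keeps its own graph instead of being merged.
        relative = path.relative_to(root).with_suffix("")
        mapping[path] = BASE_IRI + relative.as_posix()
    return mapping
```

Each file can then be uploaded on its own, with the matching IRI as the target graph, so the file-to-graph association survives a restore from scratch.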
|
|
Serializing large LD datasets
Maybe our approach to serializing large graphs is not that great. You can see the current code here - basically, we convert Python to JSON line by line, with some text mangling. It sounds (and looks) a bit crazy; the idea behind this decision was that RDFLib can't really handle large datasets, such as BONSAI.
The latest straw was realizing that we need to declare a `dataset` for the actual data (not just metadata). In turtle, this is (for example):
In JSON-LD, it is... more involved:
Moreover, it is difficult for me to reason about why the JSON-LD is formatted the way that it is. On the other hand, the Turtle file is much nicer to read and predict.
We had said earlier (though without a formal decision) that we want to use JSON-LD for data interchange, but it would make life a lot easier to use Turtle, if people were OK with that. Let me know what you think!
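As a rough sketch of what line-by-line output could look like with Turtle, the snippet below writes one complete statement per line. Each line is an N-Triples statement, which is also valid Turtle, so nothing has to be held in memory. The function names and the restriction to plain string literals are assumptions for illustration; this is not the project's current serializer.

```python
import io


def escape_literal(value):
    """Escape a string for use inside a double-quoted Turtle/N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def write_triples(rows, out):
    """Write (subject IRI, predicate IRI, literal) rows one line at a time.

    Every line is a self-contained statement, so rows can come from a
    generator and the full dataset never needs to fit into memory.
    """
    count = 0
    for subject, predicate, literal in rows:
        out.write(f'<{subject}> <{predicate}> "{escape_literal(literal)}" .\n')
        count += 1
    return count


# Example with a hypothetical IRI; `out` could equally be an open file.
buffer = io.StringIO()
write_triples(
    [("http://example.org/flow/1",
      "http://www.w3.org/2000/01/rdf-schema#label",
      "steel")],
    buffer,
)
```

Because statements are independent lines, the output can also be split or concatenated per named graph without re-parsing.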
|
|
Re: Serializing large LD datasets
Miguel Fernández Astudillo
Hi!
In the correspondence table group we struggled a bit when we had to move from Turtle to JSON-LD. We spent some time trying to figure out how to do it in JSON and ended up writing Turtle. We found it easier to write and read, and we were told there was code to translate one into the other automatically. I prefer Turtle, but I am not aware of the advantages of JSON-LD.
Best,
Miguel
From: main@bonsai.groups.io <main@bonsai.groups.io> On Behalf Of Chris Mutel
Maybe our approach to serializing large graphs is not that great. You can see the current code here - basically, we convert Python to JSON line by line, with some text mangling. It sounds (and looks) a bit crazy; the idea behind this decision was that RDFLib can't really handle large datasets, such as BONSAI.
In JSON-LD, it is... more involved:
Moreover, it is difficult for me to reason about why the JSON-LD is formatted the way that it is. On the other hand, the Turtle file is much nicer to read and predict.
|
|
Re: Serializing large LD datasets
+1 for Turtle format. Much easier to read and write.
|
|
Re: Serializing large LD datasets
Massimo Pizzol
No opinion here, I trust those who have already worked hands-on on this, and their choice. BR
From: <main@bonsai.groups.io> on behalf of "Agneta via Groups.Io" <agneta.20@...>
+1 for Turtle format. Much easier to read and write.
|
|
Re: #ontology Can we come up with a better term than "Flow Object"?
#ontology
I added a table with what I could make of the existing systems, and the possible alternatives we have discussed, here: https://github.com/BONSAMURAIS/BONSAI-ontology-RDF-framework/blob/master/Terminology-discussion.md. Feel free to edit this if you think I have made a mistake.
> To re-iterate: Flow is a verb
Flow can be a verb or a noun, and there is something to be said for having all the core terms be nouns (I think everything else is).
|
|
Re: BEP-0004 BONSAI knowledge management and communication strategy | open for discussion / seeking editor
I have created a bonsai.uno repo, which we need to fill out, to eventually replace the existing content of the website (this is included in BEP 4). The current website structure looks like:
Homepage
Challenge and vision
Organization
Static downloads
Strategy
Many working group pages
Archive
Static downloads
Become a member
Contributions
Here is the beginning of a new layout which emphasizes our concepts and work methods. I really think that the web page will be better for documentation than the wiki, as we can control the presentation more, and add a little white space so we don't have the "wall of text" effect. See the proposed BEP4 for a discussion of how best to use the different communication media.
Homepage
Vision (short)
-> Common ontology for LCA, MFA, and IE
-> Open data pipeline
By the community, for the community
-> Getting started guide
-> GH projects repo
Common ontology
Data pipeline
Getting started guide
Basic technologies
-> Contribute data
-> Build web apps
-> Using the API
Community management
Data reconciliation
NPO (BONSAI non-profit organization)
Become a member
Archive of official documents
One possible way to separate the content from the presentation is to store the text with some simple markup (e.g. Markdown) in a separate directory.
@agneta and @romain, let's discuss how we can each participate. Perhaps we could start by better planning an outline, and writing down what we want to accomplish. Feel free to provide your thoughts and concerns.
|
|
Two votes - please participate!
Dear all-
1. If you haven't voted for or against BEP 1, please do it now! If not enough people participate, the proposal will automatically fail.
2. We have had a lively discussion on the terminology used in the ontology, and have several different options before us. It would be nice to get a sense of the broader group's preferences through an indicative, though not necessarily binding, vote. When multiple options are present, ranked choice voting (in this case in the form of instant runoff) is a decent polling choice. So please visit the list of candidates: https://github.com/BONSAMURAIS/BONSAI-ontology-RDF-framework/blob/master/Terminology-discussion.md, and reply to this email with your preferences in order by letter, from first to last. For example, here are my personal preferences: BDACFE
Please rank all six possibilities, so we can get complete statistics.
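For anyone curious how instant runoff counts ballots written as letter strings, here is a minimal sketch. The function name and the tie-breaking rule (among the lowest-scoring candidates, the alphabetically first one is eliminated) are assumptions for illustration, not an agreed BONSAI procedure.

```python
from collections import Counter


def instant_runoff(ballots):
    """Return the winner of an instant-runoff count.

    Each ballot is a string of candidate letters, most preferred first,
    e.g. "BDACFE".
    """
    remaining = sorted(set("".join(ballots)))
    if not remaining:
        raise ValueError("no ballots to count")
    while len(remaining) > 1:
        # Count every ballot for its highest-ranked candidate still in the race.
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            top = next((c for c in ballot if c in remaining), None)
            if top is not None:
                tally[top] += 1
        total = sum(tally.values())
        leader = max(remaining, key=lambda c: tally[c])
        if tally[leader] * 2 > total:
            return leader  # strict majority of the counted ballots
        # Assumed tie-break: lowest tally first, then alphabetical order.
        loser = min(remaining, key=lambda c: (tally[c], c))
        remaining.remove(loser)
    return remaining[0]
```

With few voters, ties in the elimination step are likely, so the tie-breaking rule should be agreed on before the count.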
|
|
#bonsamurai.github.io
Hey, I am starting a discussion here on the new bonsai.uno webpage.
Here is the structure suggested by Chris.
Did I get the hierarchy right?
|
|
Re: Two votes - please participate!
Massimo Pizzol
DCAFBE
|
|
Re: Two votes - please participate!
Matteo Lissandrini (AAU)
AFDCEB
From: main@bonsai.groups.io [main@bonsai.groups.io] on behalf of Massimo Pizzol via Groups.Io [massimo@...]
Sent: Sunday, April 07, 2019 5:19 PM To: main@bonsai.groups.io Subject: Re: [bonsai] Two votes - please participate! DCAFBE
|
|
Re: #ontology Can we come up with a better term than "Flow Object"?
#ontology
Elias Sebastian Azzi
Hello,
After reading through that long email thread, I wrote a summary of the different views expressed. I also summarise an article that describes another ontology for IE, with a rather different vocabulary, hoping it will help us see the ontology from a different perspective.
Alpha / Summary
Issue - Human vocabulary for BONSAI's core ontology While there seems to be an agreement among the participants around the three core classes of the ontology (i.e. on their conceptual meaning), there is not yet a consensus on how these classes should be named in human readable language. There is however an agreement on the fact that the vocabulary used during the hackathon 2019 is not ideal. Most of the controversy lies in the term "flow object". This issue seems of high importance because it affects how people perceive the ontology, understand it and decide whether to take it up or not.
Below, we summarise the different views/suggestions on that issue, pros, cons and remarks.
V1. [Chris] "Flow object" is not consistent with the other terms of the ontology and is hard to relate to. The alternative "item" is suggested.
Pro: the definition of item is "an individual article or unit, especially one that is part of a list, collection, or set", which fits the concept.
Pro: it echoes the fields of computer science and mathematics.
Remark: activities are also part of a list/collection/set; according to that definition, activities are also items of a collection of activities.
V2. [Chris] "Flow" is good but has no natural counterpart. An alternative for "flow" could be "exchange".
V3. [Agneta] Return to the published LCA ontology (Kuczenski et al. 2016), with the three terms Activity (a thing that happens), Flow (a thing in the world that exists because of some instance of an Activity), and Exchange (an established relationship between an activity instance and a flow instance).
Pro: (to verify) coherence with the vocabulary used by most industrial ecologists / (disagreement) in (1) the authors argue that terminology is not consistent between industrial ecologists, even for basic definitions.
(1) Pauliuk, S.; Majeau-Bettez, G.; Müller, D. B.; Hertwich, E. G. Toward a Practical Ontology for Socioeconomic Metabolism. J. Ind. Ecol. 2016, 20 (6), 1260–1272; DOI 10.1111/jiec.12386.
V4. [Rutger] In ecospold1, only exchanges are defined. In ILCD data formats, both exchange and flows (i.e. flow objects) are defined. Environmental compartments are specified. In SimaPro platform, Flows do not include compartments, as in the Bonsai hackathon version. Exchange is not yet used, but is considered. At PRé, flow-objects are of two types: substances and products, but not perfect. Con: flow and exchange are both dynamic terms
V5. [Matteo] Flow and Flow-object in the post-hackathon ontology are clear and well defined: they relate the Flow and the Object of the Flow (aka the Flow Object). In other words, by keeping the word "flow" in both definitions their link and subtle difference is kept explicit and forces the new-comer to think twice about these definitions. Pro: all terms can be confusing, the advantage of Flow and Flow-object is that the difficulty is not hidden behind different terms, does not allow for misunderstanding to happen.
V6. [Bo] The vocabulary we use needs to distinguish between "the observation of a specific flow (22 kg input of steel) and the abstract flow-object (steel)".
V7. [Agneta] "hackathon vocabulary" -> "new vocabulary"
Flow-object => Flow
Flow => Exchange
Long List of Terms:
Flow object, entity, object, flux, item, thing, element, substance, component, Noumenon, Flow-item, commodity
Flow, Exchange, Phenomenon
Activity
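To make the distinction in V6 concrete, here is a small sketch of the three core classes using the hackathon vocabulary. The field names, the `direction` values, and the example activity are assumptions for illustration; the sketch does not mirror the actual RDF schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowObject:
    """The abstract thing being exchanged, e.g. steel."""
    name: str


@dataclass(frozen=True)
class Activity:
    """Something that happens, e.g. a production process."""
    name: str


@dataclass(frozen=True)
class Flow:
    """One observed input or output of an activity."""
    activity: Activity
    flow_object: FlowObject
    amount: float
    unit: str
    direction: str  # "input" or "output" (assumed encoding)


steel = FlowObject("steel")
car_making = Activity("car manufacturing")  # hypothetical activity
# The observation "22 kg input of steel" refers to the abstract object "steel".
observation = Flow(car_making, steel, 22.0, "kg", "input")
```

Whatever names are chosen, many `Flow` instances can point at the same `FlowObject`, which is the relationship the terminology has to convey.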
-------
Bravo / Looking at it from a different angle
This being said, I would like to add to the discussion the following points:
- Matteo has a point: by using the word “flow” twice (in flow and flow-object) we keep the complexity explicit.
- We seem to agree on the structure, but finding the right words for human communication is tricky: do we have to choose? In the end, examples speak for themselves. We will choose a term now, but we can keep the list of alternatives: the list helps clarify things!
- Do we actually agree on the structure? Your discussions forced me to re-open that article by Pauliuk and co: they have the same goal as Bonsai, performed a review of all IE fields, and (wait for it) came up with a totally different wording. I would say that it is one level of abstraction higher than the current Bonsai ontology, and rather stimulating to read. Here are some highlights:
o Many inconsistencies of vocabulary and definitions exist within IE and even within certain fields, e.g. LCA
o Industrial ecologists describe socioeconomic metabolism by a bipartite directed graph (i.e. SUTs) or directed graph
o Five key definitions:
Definition 1, Sets: A set is a collection of distinct objects
Definition 2, Hierarchical, mutually exclusive and collectively exhaustive (H-MECE) object classification: An H-MECE object classification is a grouping of a given set of objects into an H-MECE collection of sets.
Definition 3, Stock: A stock is a set of objects of interest.
Definition 4, Process: A process is a set-based description of one or several events of interest, expressed in terms of the objects of interest that are involved in these events during their course.
Definition 5, Flow: A flow is a description of a particular type of event, where objects are preserved and move from one set a to another set b.
o It sounds very different, but when you read the article in detail, all the issues we face are somehow discussed, including how to handle the properties of objects of interest (see Figure 2)
o Definition 2 is of interest for the correspondence table group
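Since Definition 2 matters for the correspondence tables, here is a minimal sketch of a check for one level of such a classification: the groups must not overlap (mutually exclusive) and must together cover every object (collectively exhaustive). The function name and the example objects are assumptions; the hierarchical part would repeat the same check at each level.

```python
def is_mece(universe, groups):
    """Check that `groups` partition `universe`.

    Mutually exclusive: no object appears in two groups.
    Collectively exhaustive: every object appears in some group.
    """
    universe = set(universe)
    seen = set()
    for group in groups:
        group = set(group)
        if seen & group:
            return False  # overlap: not mutually exclusive
        seen |= group
    # Also False if a group contains objects outside the universe.
    return seen == universe


objects = {"coal", "gas", "steel"}  # hypothetical objects of interest
```

For example, `is_mece(objects, [{"coal", "gas"}, {"steel"}])` holds, while dropping "gas" from the first group breaks exhaustiveness.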
Best regards, Elias
From: main@bonsai.groups.io <main@bonsai.groups.io>
On Behalf Of Chris Mutel
Sent: 5 April 2019 12:46 To: main@bonsai.groups.io Subject: Re: [bonsai] #ontology Can we come up with a better term than "Flow Object"?
I added a table with what I could make of the existing systems, and the possible alternatives we have discussed, here: https://github.com/BONSAMURAIS/BONSAI-ontology-RDF-framework/blob/master/Terminology-discussion.md.
Feel free to edit this if you think I have made a mistake.
|
|
Re: Two votes - please participate!
Elias Sebastian Azzi
ADCFBE is my current preference.
Best regards, Elias
From: main@bonsai.groups.io <main@bonsai.groups.io>
On Behalf Of Matteo Lissandrini (AAU)
Sent: 7 April 2019 17:44 To: main@bonsai.groups.io Subject: Re: [bonsai] Two votes - please participate!
AFDCEB
From:
main@bonsai.groups.io [main@bonsai.groups.io] on behalf of Massimo Pizzol via Groups.Io [massimo@...] DCAFBE
|
|
Re: #ontology Can we come up with a better term than "Flow Object"?
#ontology
Andreas Ciroth
Dear all,
interesting. As part of the discussion you may also want to consider the JSON-LD format names: http://greendelta.github.io/olca-schema/
In my view, process, flow, exchange is most commonly used (it is used in “our” JSON-LD format and in ILCD) and it is not too bad (meaning: short, not misleading; it is good to distinguish flows from exchanges). Yes, process, flow, and also exchange can each be a noun and a verb, but this is common in the English language.
So, given that there are really lots of things to do in LCA, data availability, and LCA ontologies, it may be good to stick with this. Or invent something really different. Point, line, square, e.g., would be different, for flow, exchange, process.
All the best!
Andreas
|
|
Re: #ontology Can we come up with a better term than "Flow Object"?
#ontology
mmremolona@...
Hi all,
Flow -> Right now this term is used to refer to the transfer of material or objects from an activity (as an output) to another activity (as an input), thereby connecting these two activities. In my opinion, changing this term to exchange does not affect the overall understanding of the ontology. Either would work. I can have a material flow from one activity to another activity.
Flow-object -> This is defined as an object that is referenced in a flow. Many flows can reference a single instance of a flow-object. I think this is where confusion may set in, as a flow-object can be imagined as an instance of flow. And I agree with Chris that this doesn’t sound right when talking about it. It just doesn’t seem natural to mention a flow object. I don’t think flow itself works here, as the word flow doesn’t equate to any object or material.
My initial idea to fix this is to make flow an adjective, as in Flowing-Object. However, this doesn’t sound right in language either. My previous argument for flow-item would then be reversed; coal, steel and those solid objects do not necessarily flow.
My secondary idea involves using the term Exchanged-Object. This is not necessarily related to the first term flow, but both can be adapted so that it sounds more congruent overall. This also sounds better because the question that arises from it sounds better in English (e.g. What’s the exchanged-object between the two activities you mentioned? In this flow, what’s the exchanged-object?)
TLDR:
Best,
Miguel Remolona
|
|
5.4.19 Catch-up meeting minutes and next meeting planning
Next catch-up meeting
We will have another catch-up meeting on 12.4.19 at 15:00 CEST, and then skip the next week (19.4.19) due to Easter holidays.
5.4.19 Catch-up meeting minutes
Correspondence tables
Started https://github.com/BONSAMURAIS/grafter tool to change 1-1 CSVs to RDF with actual predicates
Added some new tables, and metadata to existing tables
Priority is EXIOBASE - ENTSO-E, as we need this for the first proof-of-concept deliverable
Data cleaning/conversion now is laborious and manual, need a better way. See an example here: https://github.com/BONSAMURAIS/Correspondence-tables/blob/master/scripts/from_raw_to_clean_tables.ipynb
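As a rough illustration of turning a 1-1 correspondence CSV into RDF with an actual predicate, here is a minimal sketch. The column names `source` and `target`, the base IRIs, and the choice of `skos:exactMatch` are assumptions for this example; they are not taken from grafter, and the codes are assumed to be safe to use inside an IRI.

```python
import csv
import io

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def correspondence_to_triples(csv_text, source_base, target_base):
    """Turn a two-column, one-to-one CSV into N-Triples lines (valid Turtle)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Column names "source" and "target" are assumed for this sketch.
        subject = source_base + row["source"].strip()
        obj = target_base + row["target"].strip()
        lines.append(f"<{subject}> <{SKOS_EXACT_MATCH}> <{obj}> .")
    return lines


# Hypothetical table with a single correspondence.
example = correspondence_to_triples(
    "source,target\nA1,B7\n",
    "http://example.org/a/",
    "http://example.org/b/",
)
```

Keeping the predicate in one constant makes it easy to swap in a weaker relation for tables that are not strictly one-to-one.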
Communication
Need a clean and prominent place to summarize existing repos, their functions, and their interdependencies (one possible overview from Tom Millross is attached)
Could be on wiki or bonsai README
bonsai.uno website rework is starting, repo here: https://github.com/BONSAMURAIS/bonsai.uno
Ontology
Discussion on nomenclature is ongoing, with several creative solutions proposed
Adaptation of existing probability ontology is difficult due to all examples being XML; volunteers to help adapt this ontology please contact Agneta
Move away from JSON-LD and towards Turtle as default exchange format for RDF data, due to readability and ease of programming
System model / calculation interface
Work and documentation are proceeding after the hackathon, such as procedures for dis/aggregation (e.g. aggregating different types of gas with different calorific values)
REST endpoints to be defined and documented
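To sketch the gas aggregation case: when flows with different calorific values are merged, masses and energy contents add up, and the calorific value of the aggregate is the mass-weighted average. The function name and the numbers in the usage note are hypothetical and only illustrate the arithmetic; they do not describe the agreed procedure.

```python
def aggregate_by_energy(flows):
    """Aggregate (mass_kg, calorific_value_MJ_per_kg) pairs.

    Returns total mass, total energy, and the mass-weighted average
    calorific value of the aggregate.
    """
    total_mass = sum(mass for mass, _ in flows)
    total_energy = sum(mass * cv for mass, cv in flows)
    if total_mass == 0:
        raise ValueError("cannot average calorific value over zero mass")
    return total_mass, total_energy, total_energy / total_mass
```

For instance, 10 kg at 50 MJ/kg plus 30 kg at 40 MJ/kg gives 40 kg and 1700 MJ, i.e. 42.5 MJ/kg, not the unweighted mean of 45 MJ/kg.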
Outreach
Miguel A. will attend https://forum.openmod-initiative.org/t/aarhus-2019-workshop/1126
Those attending LCM 2019 will hold an outreach event, with organizing and content support from others in the BONSAI team
|
|
Re: 5.4.19 Catch-up meeting minutes and next meeting planning
Repo overview attachment
|
|
Re: #bonsamurai.github.io
Maybe easier to split it up into actual URLs:
Note that the following is just one possibility, and will change now and in the future. Our aim is to make such changes easy.
bonsai.uno
bonsai.uno/ontology
bonsai.uno/data-pipeline
bonsai.uno/getting-started
bonsai.uno/getting-started/contribute-data
bonsai.uno/getting-started/our-api
bonsai.uno/getting-started/others as we develop
bonsai.uno/community
bonsai.uno/FAQs
bonsai.uno/NPO
To do:
|
|
Re: Two votes - please participate!
FYI:
1. The vote on BEP 1 is trending towards acceptance; the voting will stop if two more people participate and approve.
2. We currently have 5 votes in our nomenclature discussion. Here are the average ranks:
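For reference, average ranks can be computed from the letter ballots along these lines. This is a minimal sketch that assumes every ballot ranks all candidates; the function name is an assumption for illustration.

```python
def average_ranks(ballots):
    """Return {candidate: mean rank}, where rank 1 is the first choice.

    Assumes complete ballots, i.e. every ballot ranks every candidate.
    """
    if not ballots:
        raise ValueError("no ballots to average")
    totals = {}
    for ballot in ballots:
        for position, candidate in enumerate(ballot, start=1):
            totals[candidate] = totals.get(candidate, 0) + position
    return {c: totals[c] / len(ballots) for c in sorted(totals)}
```

A lower average rank means a more preferred option; unlike instant runoff, this uses every position on every ballot.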
|
|