
Start of the #ontology sub-group #ontology


Matteo Lissandrini (AAU)
 

Thanks Massimo,
now I would like to get a consensus from you and the other domain experts about the following statement you wrote:

- the determining flow is always a product _output_ flow

If this is true, then Flow has two subclasses, Input and Output, and Output has the subclass Determining Flow, and there will be exactly one instance of Determining Flow associated with each activity in the database.

Is there, in any dataset you have at hand, a case where this is not true?

*) names are provisional
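
The hierarchy described above could be sketched in Python roughly as follows. This is only an illustration of the statement as proposed at this point in the thread. The class names (`Flow`, `InputFlow`, `OutputFlow`, `DeterminingFlow`, `Activity`) and the helper `has_exactly_one_determining_flow` are assumptions that mirror the provisional names, not an agreed schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Flow:
    """A flow attached to an activity (provisional name)."""
    name: str


class InputFlow(Flow):
    """A flow entering the activity."""


class OutputFlow(Flow):
    """A flow leaving the activity."""


class DeterminingFlow(OutputFlow):
    """Proposed subclass: the output flow that drives the activity."""


@dataclass
class Activity:
    name: str
    flows: List[Flow] = field(default_factory=list)


def has_exactly_one_determining_flow(activity: Activity) -> bool:
    """Check the proposed constraint: one determining flow per activity."""
    count = sum(isinstance(f, DeterminingFlow) for f in activity.flows)
    return count == 1


# Hypothetical example activity, used only to exercise the check.
coal_plant = Activity(
    "electricity production from coal",
    [InputFlow("coal"), DeterminingFlow("electricity"), OutputFlow("heat")],
)
print(has_exactly_one_determining_flow(coal_plant))
```

Because `DeterminingFlow` inherits from `OutputFlow` here, this sketch cannot express a determining flow that is an input, which is exactly the case the question asks the domain experts to confirm or rule out.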



From: hackathon2019@bonsai.groups.io [hackathon2019@bonsai.groups.io] on behalf of Massimo Pizzol via Groups.Io [massimo@...]
Sent: Monday, March 11, 2019 8:33 PM
To: hackathon2019@bonsai.groups.io
Subject: Re: [hackathon2019] Start of the #ontology sub-group #RDFFramework #ontology

(Disclaimer: I am simplifying things a bit here; I hope the LCA people will forgive me.)

Dear Matteo

I believe you have understood how it works, but there are some other details that perhaps you should know.

Examples: If an activity produces electricity from burning coal, the determining flow is the product output flow ‘electricity’. If an activity simultaneously produces electricity AND heat from coal, then either electricity OR heat will be the determining flow.

Waste example: You described an activity that converts waste into something else. For example, this could be incinerating municipal solid waste to generate electricity. In this case the ‘treatment of municipal solid waste’ is the determining product output flow of the ‘waste incineration’ activity, which also has another product output flow of ‘electricity’. Electricity is not the determining flow here because, as you rightly concluded, we burn waste because we want to get rid of it (or in other words: we don’t produce more waste just because we want more electricity...).

Summing up:
- the determining flow is always a product output flow
- activities can have multiple product output flows, but
- there is only one determining flow per each activity
- ‘product’ is a generic term that includes both ‘goods’ (e.g. coal) and ‘services’ (e.g. treatment of waste)

Now the confusing thing here is that ‘waste treatment’ is a product flow (a service in fact) but *sounds* like an activity. Same with ‘transport’. So the next question for the LCA people in this thread is: how are we going to represent waste flows in the schema? My only reference for names is ecoinvent but I don’t think that is really super understandable (my students generally have a hard time understanding them, for example)

Massimo




On 11 Mar 2019, at 16.51, Matteo Lissandrini (AAU) via Groups.Io <matteo@...> wrote:

What is the utility and the actual definition of a "reference flow"?

The more semantically precise term is actually "determining flow".

The definition is: "Flow of an activity for which a change will affect
the production volume of the activity"

The utility is to be able to distinguish the flow that drives (causes)
the activity from flows that are caused by the activity.

Now I see,
so, as you suggested earlier, a waste can be the determining flow as input of a waste disposal activity that produces something else, because we need to dispose of this waste.

Is this correct?


Thanks a lot for the clarification.

Matteo




---
Matteo Lissandrini

Department of Computer Science
Aalborg University

http://people.cs.aau.dk/~matteo









 

I think it would be a mistake to bake too many restrictions into the
general framework. There is a certain mental model that prevails in
LCA, but we don't want BONSAI to accept these restrictions at the
beginning unless they are absolutely necessary, and BONSAI is not just
for LCA (e.g. should also be useful for MFA). For now it might be
worth skipping the determining flow completely, as it doesn't seem
necessary for the hackathon.

Determining flows are not always outputs; treatment of waste by
landfill has waste as a determining flow input.


Massimo Pizzol
 

Chris is right that one can use a negative (= input) reference flow. I just never use this approach and I forgot, my mistake.

I don’t see how we can skip the reference flow concept though if we are going to work with LCA data (deliverable 2 and 3).

Massimo


Bo Weidema
 

On 2019-03-11 at 22:31, Chris Mutel wrote:

> For now it might be worth skipping the determining flow completely, as it doesn't seem necessary for the hackathon.

Not having this concept will mean a loss of information when importing from e.g. EXIObase or ecoinvent.

Bo


Michele De Rosa
 

Good point, Massimo. In fact, the output flow of the activity "Waste Incineration" should be "TREATED municipal solid waste" and not "Treatment of municipal solid waste".
Michele


 

Dear all-

I very much appreciate the work and active participation of people in
this working group! Unfortunately, I must make your lives a little bit
harder :)

1. The final proposal should include not just what you have found
consensus on, but also the alternatives you have considered, and why
they were not chosen. This has two purposes: to stop people from
bringing up the same issues over and over again, and to communicate
that you made informed decisions.

So, for example, when we debate over the modelling of waste treatment,
we should be drawing simple models of each possibility, and then
discussing the practical effects of these models. I don't think it is
sufficient (certainly not in the long term, maybe for the hackathon)
to just assert that it works like this, because I know/am smart and
have thought about it.

2. While I completely agree that in the scope of LCA flow objects are
universal, while activities are located in time and space, we still
need to be able to enter other types of data, such as:
- GDP/population of a country over a time interval
- Recycling rate of different materials in a country (independent of a
particular recycling activity, as this is not specified in the input
data - could be linked later by the system model)
- Total amount of CO2/other emissions observed at a specific spatial
scale over a particular time

3. Simple is better than complex, even if it loses a little bit of
"realism". The lesson that I have learned when re-implementing some of
the modelling choices in version 3 of ecoinvent is that even good
ideas can have weird and unpredictable side effects when combined with
other seemingly good ideas. People appreciate models that they can
understand completely in a few minutes!

--
############################
Chris Mutel
Technology Assessment Group, LEA
Paul Scherrer Institut
OHSA D22
5232 Villigen PSI
Switzerland
http://chris.mutel.org
Telefon: +41 56 310 5787
############################


Stefano Merciai
 

Hi,

I think that using negative inputs to indicate outputs is already a complication that may not be clear to many people. I think that BONSAI can also be used by the IO community, not just by LCA. IO practitioners do not like negatives.

Then, for example, in Exiobase (or in the WIOM of Nakamura and Kondo) the determining product (or principal production, or reference product) of waste activities is a waste service, for example the service of recycling waste. This is to say that perhaps we should agree on the framework that we are going to use. I think there is no unanimous consensus, so it is better to spend some time deciding which approach to adopt.

Stefano





-- 
Best,
S.


Bo Weidema
 

On 2019-03-12 at 10:28, Chris Mutel wrote:

> 2. While I completely agree that in the scope of LCA flow objects are universal, while activities are located in time and space, we still need to be able to enter other types of data, such as:
> - GDP/population of a country over a time interval
> - Recycling rate of different materials in a country (independent of a particular recycling activity, as this is not specified in the input data - could be linked later by the system model)
> - Total amount of CO2/other emissions observed at a specific spatial scale over a particular time

The three examples of data you mention are not (raw) data inputs, but rather outputs from querying the database, i.e. they can all be calculated from the raw data. As such, these three examples are well suited as "competency questions" as requested by Matteo.

Best regards

Bo


Bo Weidema
 

Just to chip in on the discussion on waste and waste treatment activities:

- The problem that a service activity often has the same name as the service that it provides is a well-known problem. It is not solved by inventing new speculative unique names, but rather by linking the instances of the names to their classes (activity or flow-object).

- Principle: We try to avoid making fixed choices, like sign nomenclatures, that are only useful in specific contexts.

- Principle: It is good practice for a model to stay as close to reality as possible.

- Principle: Do not introduce unnecessary (obligatory) classifications.

Therefore:

- Wastes, by-products and emissions do not need to be distinguished. Following the physical reality, they are all just non-determining output flows of the activity that produces them and determining input flows to the activity that is activated by their presence (waste treatment for wastes, recycling activities for by-products for treatment, markets for by-products that do not need treatment, ecological fate activities for emissions). The fact that some calculations require that non-determining outputs are calculated as negative inputs does not mean that the database needs to use such artificial conventions.
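
As a rough illustration of the point above, the following Python sketch stores direction and "determining" as plain attributes of an exchange and applies a sign convention only at calculation time. The names `Exchange`, `determining_flows` and `as_negative_input`, the "household consumption" activity and all amounts are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Exchange:
    activity: str
    flow_object: str
    direction: str            # "input" or "output", stored as observed
    amount: float             # stored as a positive physical quantity
    determining: bool = False


# Illustrative data only (hypothetical activities and amounts).
exchanges: List[Exchange] = [
    # The producing activity has the waste as a non-determining output...
    Exchange("household consumption", "municipal solid waste", "output", 1.0),
    # ...and the same flow object is the determining input of the treatment.
    Exchange("waste incineration", "municipal solid waste", "input", 1.0, determining=True),
    Exchange("waste incineration", "electricity", "output", 0.5),
]


def determining_flows(items: List[Exchange]) -> Dict[str, Exchange]:
    """Map each activity to its determining exchange, if it has one."""
    return {e.activity: e for e in items if e.determining}


def as_negative_input(e: Exchange) -> float:
    """Calculation-time view: a non-determining output becomes a negative input."""
    if e.direction == "output" and not e.determining:
        return -e.amount
    return e.amount


for e in exchanges:
    print(e.activity, e.flow_object, as_negative_input(e))
```

In this sketch the stored data never carries a sign convention; a calculation that needs negative inputs derives them on the fly.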

Best regards

Bo


 

On Tue, 12 Mar 2019 at 11:34, Bo Weidema <bo.weidema@bonsai.uno> wrote:

> The three examples of data you mention are not (raw) data inputs, but
> rather outputs from querying the database, i.e. they can all be
> calculated from the raw data. As such these three examples are well
> suited as "competency questions" as requested by Matteo.

I guess I am missing something here - I was imagining a system where
GDP, etc. would exactly be raw data inputs, and used to
validate/estimate error on how much of the economy/whatever our system
is able to model. I can't see a CSV from the World Bank being anything
other than a raw data input... ? To me, one of the substantial
advancements of BONSAI is that we are using these new sources of data,
either directly, or as factors in allocating/disaggregating, or as
validation/sanity checks. We want to be explicit about how we
reconcile different sources which are representations of the same data
point.





--
############################
Chris Mutel
Technology Assessment Group, LEA
Paul Scherrer Institut
OHSA D22
5232 Villigen PSI
Switzerland
http://chris.mutel.org
Telefon: +41 56 310 5787
############################


Matteo Lissandrini (AAU)
 

> 1. The final proposal should include not just what you have found
> consensus on, but also the alternatives you have considered, and why
> they were not chosen. This has two purposes: to stop people from
> bringing up the same issues over and over again, and to communicate
> that you made informed decisions.
>
> So, for example, when we debate over the modelling of waste treatment,
> we should be drawing simple models of each possibility, and then
> discussing the practical effects of these models. I don't think it is
> sufficient (certainly not in the long term, maybe for the hackathon)
> to just assert that it works like this, because I know/am smart and
> have thought about it.

I definitely agree on this.
We should probably have a shared notebook for this, I think we will lose track of emails.

A document on github and using issues?
Some other form of collaborative writing?


Thanks,
Matteo




Agneta
 

YES PLEASE! There have been lots of interesting suggestions from everyone, but I am not sure if this is the best medium to maintain such a discussion.
I second the suggestion for a document on GitHub.


Thanks
Agneta











--
Agneta Ghose, PhD 
Post doc, The Danish Centre for Environmental Assessment  
Aalborg University
Rendsburggade 14
Aalborg 9000
Denmark 
Phone: +45 93 56 2051



Massimo Pizzol
 

Sorry if I sounded smart or like a know-it-all; I just wanted to explain some of the issues to Matteo in layman's terms. I apologized for my mistake on the negative input straight away, so I assume we can make peace now.

 

>>> The final proposal should include not just what you have found consensus on, but also the alternatives you have considered, and why they were not chosen

I guess by final proposal you are suggesting the use of BEP for this working group, do I understand right? I’ll try to make a summary of the discussion.

In this working group we are discussing the schema proposed by Matteo. Points of discussion:

 

  • The use and necessity of the “input” and “output” subclasses has been discussed. Contra: seems redundant when there is already a property. Pro: useful for filtering activities later on. Decision needed.
  • The use and necessity of a reference flow.
    • Not clear if it should be a class, subclass, or property. Pro/contra missing.
    • Not clear if it should always be an output or could be an input. → Clarified: mathematically it can be either, but the choice of convention has implications (e.g. IO people like it as a positive output). Problem: according to Matteo, allowing the reference flow to be either input or output is problematic in the schema. The reason is still not clear, though.
    • Leave it out. Contra: information loss when importing from / exporting to LCA/IO formats; without it we can’t determine causality. Pro: makes the model less complex.
  • Environmental exchanges and waste flows are missing in the schema; how to include them:
    • I suggested that we could either 1) create a class “Substance” (or other meaningful name) similar to class “Product”, or 2) remove class “Product” and just keep class “Flow”, which would be valid for both environmental and product exchanges.
    • Bo argued that “Wastes, by-products and emissions do not need to be distinguished.” How this translates in practice in the schema is not clear yet. Perhaps as in point 2) above?
    • Other solutions?
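
To make the first point of discussion more concrete, here is a minimal Python sketch that writes the same two flows once with direction as a subclass and once with direction as a property, using plain tuples as stand-ins for triples. The identifiers (`flow1`, `flow2`, `type`, `direction`, `ofActivity`, `OutputFlow`, `InputFlow`) and the helper `subjects` are made up for this example and are not taken from the proposed schema.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Variant A: direction expressed through a subclass (a type statement).
variant_a: List[Triple] = [
    ("flow1", "type", "OutputFlow"),
    ("flow1", "ofActivity", "waste incineration"),
    ("flow2", "type", "InputFlow"),
    ("flow2", "ofActivity", "waste incineration"),
]

# Variant B: a single Flow class, direction expressed through a property.
variant_b: List[Triple] = [
    ("flow1", "type", "Flow"),
    ("flow1", "direction", "output"),
    ("flow1", "ofActivity", "waste incineration"),
    ("flow2", "type", "Flow"),
    ("flow2", "direction", "input"),
    ("flow2", "ofActivity", "waste incineration"),
]


def subjects(triples: List[Triple], predicate: str, obj: str) -> Set[str]:
    """Return all subjects that carry the given predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Filtering for output flows is a single pattern match in both variants.
outputs_a = subjects(variant_a, "type", "OutputFlow")
outputs_b = subjects(variant_b, "direction", "output")
print(outputs_a == outputs_b)
```

Under these assumptions, filtering is equally simple in both variants, so the choice seems to depend more on how the reference flow and the environmental exchanges are to be modelled than on query convenience.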

 

BR
Massimo


Bo Weidema
 

Dear Chris,

Yes, of course in principle you can store the GDP/person of a country over a time interval (e.g. from World Bank) in the database:

- Activity: All economic activities (defined as those that have monetary labour costs, net taxes and/or net operating surplus)

- Flow-object: Value added (= labour costs, net taxes and net operating surplus)

- Flow-property: Monetary value

- Property-relation: Person

but normally we would calculate that by summing the value added over all activities in the database for that country in that time period and dividing by the population, which is why I said it was a query output. But you are right that you could use this calculated value to compare with that of the World Bank.

And likewise for the CO2 emission / country:

- Activity: All

- Flow-object: carbon dioxide

And likewise the recycling rate for a material of a country could be stored as the output of the national market for that material for recycling with a property-relation to the output of the market for the material (virgin + recycled).

Also in these cases, the external "raw" value can be compared to the value calculated from the more specific data in the database. But behind these "raw" data from e.g. World Bank, there are of course other databases that have summed over other specific data...
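
The calculation described above (summing value added over all activities of a country in a period, dividing by the population, then comparing with an external figure) could look roughly like this in Python. The record type `ValueAdded`, the functions `gdp_per_person` and `relative_difference`, the country code "XX" and all numbers are hypothetical and only serve to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValueAdded:
    activity: str
    country: str
    year: int
    amount: float   # monetary value of the "value added" flow-object


def gdp_per_person(records: List[ValueAdded], country: str, year: int,
                   population: float) -> float:
    """Sum value added over all activities of a country/year, per person."""
    total = sum(r.amount for r in records
                if r.country == country and r.year == year)
    return total / population


def relative_difference(calculated: float, external: float) -> float:
    """Relative deviation of the calculated value from an external 'raw' value."""
    return (calculated - external) / external


# Made-up numbers for a hypothetical country "XX".
records = [
    ValueAdded("agriculture", "XX", 2018, 200.0),
    ValueAdded("manufacturing", "XX", 2018, 500.0),
    ValueAdded("services", "XX", 2018, 300.0),
    ValueAdded("services", "XX", 2017, 280.0),
]
calculated = gdp_per_person(records, "XX", 2018, population=10.0)
print(calculated)
print(relative_difference(calculated, external=125.0))
```

Here the external value plays the role of the World Bank figure: it is kept alongside the calculated one so that the two can be reconciled explicitly.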

Best regards

Bo


 

Thanks Bo, this was really helpful for me (and hopefully for others) -
it shows the power of what you have developed over the last years, and
really helps me understand it on a more fundamental level.

It seems to me like this should be one of the examples included in the
initial proposal, as it shows the comprehensiveness of the system, as
well as how it can handle different scopes (not just space and time).




--
############################
Chris Mutel
Technology Assessment Group, LEA
Paul Scherrer Institut
OHSA D22
5232 Villigen PSI
Switzerland
http://chris.mutel.org
Telefon: +41 56 310 5787
############################


Massimo Pizzol
 

>>> I guess by final proposal you are suggesting the use of BEP for this working group, do I understand right? I’ll try to make a summary of the discussion. In this working group we are discussing the schema proposed by Matteo. Points of discussion: […]

 

I have started drafting a “BEP 0003 ontology” document to get an idea of what it should look like, and it’s available here.

I mailed with Chris quickly and I understood that what we are supposed to do is:

1. first clarify the points of discussion in my previous mail (+ others of course), and

2. only after we have reached a consensus (or non-consensus), update the document and include it in the bonsai repository (via pull request).

 

BR
Massimo

 

 



Brandon Kuczenski
 

Massimo: it seems your BEP is not publicly viewable. (I get a 404: https://github.com/massimopizzol/enhancements/blob/master/beps/0003-bep-ontology.md )
I will save my not-so-humble opinions for the BEP.
-Brandon


Massimo Pizzol
 

>>>I have started drafting a “PEP 0003 ontology” document t

I moved it here which seems a more appropriate location, sorry for late notice.

Massimo

 

 

From: <hackathon2019@bonsai.groups.io> on behalf of "Massimo Pizzol via Groups.Io" <massimo@...>
Reply-To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Date: Tuesday, 12 March 2019 at 13.48
To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Subject: Re: [hackathon2019] Start of the #ontology sub-group #RDFFramework #ontology

 

>>> I guess by final proposal you are suggesting the use of BEP for this working group, do I understand right? I’ll try to make a summary of the discussion. In this working group we are discussing the schema proposed by Matteo. Points of discussion: […]

 

I have started drafting a “PEP 0003 ontology” document to have an idea of how it should look like and it’s available here.

I mailed with Chris quickly and I understood that what we are supposed to do is:

1. first to clarify the points of discussion in my previous mail (+ others of course) and

2. only after we have reached a consensus (or non-consensus) update the document and include it to the bonsai repository (via pull request).

 

BR
Massimo

 

 

From: <hackathon2019@bonsai.groups.io> on behalf of "Massimo Pizzol via Groups.Io" <massimo@...>
Reply-To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Date: Tuesday, 12 March 2019 at 12.24
To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Subject: Re: [hackathon2019] Start of the #ontology sub-group #RDFFramework #ontology

 

Sorry if sounded smart or know-all I just wanted to explain to Matteo some of the issues in layman terms. I apologized for my mistake on the negative input straight away so I assume we can make peace now.

 

>>> The final proposal should include not just what you have found consensus on, but also the alternatives you have considered, and why they were not chosen

I guess by final proposal you are suggesting the use of BEP for this working group, do I understand right? I’ll try to make a summary of the discussion.

In this working group we are discussing the schema proposed by Matteo. Points of discussion:

 

  • The use and necessity of the “input” and “output” subclasses has been discussed. Contra: they seem redundant when there is already a property. Pro: useful for filtering activities later on. Decision needed.
  • The use and necessity of a reference flow.
    • Not clear if it should be a class, subclass, or property. Pro/contra missing.
    • Not clear if it should always be an output or could also be an input. → Clarified: mathematically it can be either, but the choice of convention has implications (e.g. IO people like it to be an output and positive). Problem: according to Matteo, allowing the reference flow to be either input or output is problematic in the schema. The reason is still not clear, though.
    • Leave it out. Contra: information loss when importing from / exporting to LCA/IO formats; without it we can’t determine causality. Pro: makes the model less complex.
  • Environmental exchanges and waste flows are missing in the schema; how to include them:
    • I suggested that we could either 1) create a class “Substance” (or another meaningful name) similar to the class “Product”, or 2) remove the class “Product” and keep only the class “Flow”, which would be valid for both environmental and product exchanges.
    • Bo argued that “Wastes, by-products and emissions do not need to be distinguished.” How this translates in practice in the schema is not clear yet. Perhaps as in point 2) above?
    • Other solutions?

 

BR
Massimo


Massimo Pizzol
 

Dear Ontology/RDF group

 

We have a meeting Friday and I would like to share some points for discussion.

 

I am thinking a lot about our ontology and there are two pressing issues that I hope we can clarify.

 

  1. The use of “input” and “output” subclasses

 

Bo has suggested the principles below as arguments to NOT introduce subclasses like “product”, “emission”, and “waste”.

 

>>> Principle: We try to avoid making fixed choices, like sign nomenclatures, that are only useful in specific contexts.

>>> Principle: It is a good practice for a model to stay as close to reality as possible

>>> Principle: Do not introduce unnecessary (obligatory) classifications

 

I agree with these principles and I think it makes sense not to have the subclasses product/waste/etc. My problem is that I don’t see how the choice of using the “input” and “output” subclasses fits with these principles. It is a sign convention, useful in a specific context, and it is an obligatory classification. I don’t know whether classifying things as “input” and “output” is any closer to reality than classifying them as “products”, “emission”, “waste”. Thus, my preference is to remove the “input” and “output” subclasses and keep only the “isInputof” and “isOutputof” predicates.

 

So far the arguments for using the input and output subclasses have been:

 

>>> at the ontology level to restrict the domain of the input and output relationships.

I am not totally clear on what this means. My concern is whether the use of subclasses unnecessarily increases the complexity of the model because – assuming I have understood things correctly - there would be two instances of e.g. a “coal” flow. One is the “coal input flow” and the other is the “coal output flow”, each of them with a different URI. So if you are looking at the instance “electricity production” you will find it is related to a specific URI for coal input, and if you are looking at the instance of “coal production” you will find it is related to another, different URI for coal output. So the same thing (coal) in physical reality is now described by two different codes.
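
A minimal sketch of this concern, using plain Python tuples as stand-ins for RDF triples. Everything here is an assumption made for illustration: the "ex:" prefix, the class names ex:InputFlow / ex:OutputFlow, the predicate ex:belongsTo, and the capitalisation of isInputOf / isOutputOf are not taken from the schema under discussion.

```python
# Hypothetical URIs and predicate names, for illustration only.

# Option 1: direction is a class of the flow, so coal gets one URI per direction.
subclass_model = {
    ("ex:coal_output_flow", "rdf:type", "ex:OutputFlow"),
    ("ex:coal_output_flow", "ex:belongsTo", "ex:coal_production"),
    ("ex:coal_input_flow", "rdf:type", "ex:InputFlow"),
    ("ex:coal_input_flow", "ex:belongsTo", "ex:electricity_production"),
}

# Option 2: direction is a predicate, so a single URI describes the coal.
predicate_model = {
    ("ex:coal", "ex:isOutputOf", "ex:coal_production"),
    ("ex:coal", "ex:isInputOf", "ex:electricity_production"),
}


def flow_uris(triples):
    """Return the distinct URIs that appear as subjects (the flows)."""
    return {subject for subject, _, _ in triples}


print(len(flow_uris(subclass_model)))   # 2
print(len(flow_uris(predicate_model)))  # 1
```

Under these assumptions the subclass variant needs two identifiers for the same physical coal, while the predicate variant needs one.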

>>>Assume you are looking at a specific instance of 10 tonnes of coal in your database, then you ask yourself “is this an input for something or an output of something?”

My view is that in physical reality 10 kg of coal is not the output or the input of something in absolute terms. It is just coal, i.e. an object. Whether it is an input or an output is determined only in relative terms, i.e. in relation to another object (an activity). Coal is an output of coal production. Coal is an input to electricity production. I would instead ask this type of question: “What is this 10 kg of coal associated with?” And what I would expect to find out is that it is the output of a coal production activity and the input of an electricity production activity.

 

>>>for sure you could find the answer by checking  "is this the source of inputOf relationships?",

This sounds really nice IMO! I was thinking this was actually how we should find out about things. I also guess that this is a competency question? I would like to better understand why this is not sufficiently “operational”.
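
As a rough sketch of what answering these questions through predicates alone could look like, again with plain tuples standing in for triples. The "ex:" names and both helper functions are hypothetical, not part of any agreed schema or library.

```python
# Hypothetical triples: one coal URI, direction expressed only via predicates.
triples = [
    ("ex:coal", "ex:isOutputOf", "ex:coal_production"),
    ("ex:coal", "ex:isInputOf", "ex:electricity_production"),
]


def associations(flow, triples):
    """Answer 'what is this flow associated with?' as {predicate: [activities]}."""
    result = {}
    for subject, predicate, obj in triples:
        if subject == flow:
            result.setdefault(predicate, []).append(obj)
    return result


def is_input_of_something(flow, triples):
    """Answer 'is this the source of any isInputOf relationship?'."""
    return any(s == flow and p == "ex:isInputOf" for s, p, _ in triples)


print(associations("ex:coal", triples))
print(is_input_of_something("ex:coal", triples))  # True
```

Here the same lookup answers both questions without asking for the type of the flow; whether that is "operational" enough in a real triple store is exactly the open point.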

 

>>>but operationally you can ask "what is the type of this? And in my view this would be the correct way to do this because something is either input or output"

I argued above that in physical reality something is not either input or output in absolute terms. Anyway, we could certainly ask the question “what is the type of this?” with reference to whether something is classified as input or output. But if we start asking this type of question for the input vs output classification, then, to be consistent, why not ask the same type of question for every other possible classification? For example: is something a product exchange or an environmental exchange? I could ask “Is CO2 an emission or a product?” But Bo has argued, based on the principles above, that this is not a relevant question. So why is the question relevant for input and output?

 

  2. In general I am unclear on how much we should adhere to existing LCA frameworks.

 

>>> LCA people all have their own mental model.

On the one hand, I agree we should keep an open mind and not be constrained by specific mental models. But on the other hand, I also understand that we are doing this for the use of “LCA people” too. I thought one of our purposes was to create an infrastructure to support LCAs (e.g. because by making specific queries one can get LCA datasets). If our purpose is to make an ontology that is valid for all models in all disciplines from economics to environmental sciences, then perhaps the terms “input” and “output” are the most generic ones (they can apply to anything from a tree to a whole country’s economy) and this might be sufficient (preferably as predicates, as I argued above). However, in order to use the linked data to create some LCIs, we would need some way of separating what belongs in the A matrix (products) from what belongs in the B matrix (substances, costs, or many other things), and of identifying the reference flow, because this is what LCA people are used to working with. So perhaps we have to allow for the possibility of identifying this LCA-specific information. With the current ontology, the “only” information we can obtain from e.g. the graph of steel production is a list of inputs and outputs. So how do I tell that steel, rather than CO2, is the reference flow of steel production?
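
The steel question can be illustrated with a small sketch. It assumes, purely for illustration, one extra predicate ex:isReferenceFlowOf; this name, the "ex:" prefix, the iron ore input, and the helper functions are hypothetical and only one of several possible ways to carry the information (a class or subclass being the alternatives discussed earlier).

```python
# Hypothetical graph with only input/output predicates.
base_graph = [
    ("ex:iron_ore", "ex:isInputOf", "ex:steel_production"),
    ("ex:steel", "ex:isOutputOf", "ex:steel_production"),
    ("ex:co2", "ex:isOutputOf", "ex:steel_production"),
]

# Same graph plus one hypothetical statement marking the reference flow.
extended_graph = base_graph + [
    ("ex:steel", "ex:isReferenceFlowOf", "ex:steel_production"),
]


def outputs(activity, triples):
    """List every flow recorded as an output of the activity."""
    return sorted(s for s, p, o in triples if p == "ex:isOutputOf" and o == activity)


def reference_flows(activity, triples):
    """List flows explicitly marked as the reference flow of the activity."""
    return [s for s, p, o in triples if p == "ex:isReferenceFlowOf" and o == activity]


print(outputs("ex:steel_production", base_graph))             # ['ex:co2', 'ex:steel']
print(reference_flows("ex:steel_production", base_graph))     # []
print(reference_flows("ex:steel_production", extended_graph)) # ['ex:steel']
```

With only inputs and outputs, steel and CO2 are indistinguishable as outputs; some additional statement of this kind would be needed to recover the reference flow.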

 

 

Hope this was useful and I am looking forward to a good discussion on Friday
Massimo

 

 


 

Two small things.

1. This discussion of Apache Jena might be interesting for some of you: https://news.ycombinator.com/item?id=19419025

2. Thanks Massimo, I think this is a great format and makes it much easier to follow the train of ideas, especially over multiple days. Let's do this more!