Start of the #ontology sub-group #ontology


Massimo Pizzol
 

Updated version of “PEP 0003 ontology” document here, revised after our meeting on Friday and discussions during the week.

I didn’t have much time to do this so I hope to have captured the main issues, but please check it and change/comment if this is not the case.

 

BR
Massimo


Agneta
 

In my understanding the next steps would be (correct me if I am wrong):

1. To explore interoperability: how can unlinked data (from a potential data provider, e.g. an LCA practitioner, national statistics, etc.) be connected with our existing database?
2. Develop query templates to extract from BONSAI the data that could be useful for our analyses (whether LCA or MFA).

Agneta


Massimo Pizzol
 

>>> as we might be going a bit in circles

Thanks. I think we agree on all points. You confirm that we are going for a general solution and that if I want to upgrade to a specific solution (one that looks like the textbook) then it’s a separate step.

Concretely: some info on biosphere, ref flows, etc. does not strictly need to be in the ontology, because it can either be obtained from external sources or because it belongs to a specific mental model only.

 

Massimo

 


 

>>> But how to convert the list into a specific format and for a specific purpose (e.g. organize the activities and flows into a B matrix plus a square A matrix to be used in the g = BA⁻¹f calculation for LCA) remains outside the scope of this ontology. This conversion should be regarded as a separate step. One would have to develop another ontology on top of the base one, or use some additional information external to the ontology (like a list specifying which flows are biosphere flows), or make other additional assumptions and choices (like deciding which of two output flows is the determining one and applying one or another method to solve multifunctionality).
We will be able to talk about this soon in person, which should be helpful as we might be going a bit in circles. In any case, here are my contributions to the dance:

1. Identifying potential biosphere flows is relatively easy - they are either only consumed and never produced, or vice versa (a minimal sketch of this heuristic follows this list). We can debate if this is enough, or even fits into our logical sense of how we want to build the model, but if we want to make a solvable linear algebra problem it is enough. Think about the *purpose* of separating the A and B matrices.
2. A square technosphere matrix is an output of one possible system model, but is not required. Some system models will not produce square matrices, and other approaches will use graph traversal and not use matrices at all.
3. Our database is not only used for matrix-based LCA, but also e.g. MFA, so it should not be too specific to what you read in an LCA textbook. This has already been said before by multiple people, but is worth repeating :)
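To make point 1 concrete, here is a minimal sketch of that heuristic in Python, applied to a raw, unlinked exchange list (the activity and flow names are invented for illustration; this is not agreed BONSAI code):

from collections import defaultdict

# Toy unlinked exchange list: (activity, flow, direction, amount).
exchanges = [
    ("coal production", "coal", "output", 1.0),
    ("coal production", "CO2", "output", 0.1),
    ("coal production", "land occupation", "input", 0.05),
    ("electricity production", "coal", "input", 0.3),
    ("electricity production", "electricity", "output", 1.0),
    ("electricity production", "CO2", "output", 0.9),
    ("aluminium production", "electricity", "input", 0.5),
]

# Record in which directions each flow ever appears.
roles = defaultdict(set)
for activity, flow, direction, amount in exchanges:
    roles[flow].add(direction)

# Candidate biosphere flows: only consumed or only produced, never both.
biosphere = {flow for flow, dirs in roles.items() if dirs != {"input", "output"}}
technosphere = set(roles) - biosphere

print("biosphere candidates:", sorted(biosphere))        # ['CO2', 'land occupation']
print("technosphere candidates:", sorted(technosphere))  # ['coal', 'electricity']

Whether this is a good enough definition is exactly the debate above, but it shows that the split can be derived rather than stored.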


Massimo Pizzol
 

Within yesterday’s correspondence on biosphere flows and reference flows Agneta wrote that:

 

>> [for doing LCA…] we can develop a secondary ontology which continues to use BONSAI as the primary ontology and build on top of it.

 

My understanding right now is that an RDF-type database, built from raw data linked using our ontology in its current format, can be queried to automatically generate a list of the input and output flows of a number of activities, the main advantage being that although the raw data might come from different sources, the activities and flows in the list will be uniquely and consistently identified.

 

But how to convert the list into a specific format and for a specific purpose (e.g. organize the activities and flows into a B matrix plus a square A matrix to be used in the g = BA⁻¹f calculation for LCA) remains outside the scope of this ontology. This conversion should be regarded as a separate step. One would have to develop another ontology on top of the base one, or use some additional information external to the ontology (like a list specifying which flows are biosphere flows), or make other additional assumptions and choices (like deciding which of two output flows is the determining one and applying one or another method to solve multifunctionality).
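For concreteness only, here is a minimal numpy sketch (with invented numbers) of what that separate conversion step produces once someone has decided how to assemble a square A and a B from the list:

import numpy as np

# Invented example: 2 activities (coal production, electricity production),
# 2 products (coal, electricity) in a square A, 1 elementary flow (CO2) in B.
# The column/row ordering is itself an external, LCA-specific choice.
A = np.array([[1.0, -0.3],   # coal: produced by activity 1, consumed by activity 2
              [0.0,  1.0]])  # electricity: produced by activity 2
B = np.array([[0.1, 0.9]])   # kg CO2 per unit of each activity
f = np.array([0.0, 1.0])     # functional unit: 1 unit of electricity

g = B @ np.linalg.solve(A, f)  # g = B A^-1 f, without forming the inverse explicitly
print(g)                       # [0.93]

Everything in this snippet (the squareness of A, the split between A and B, the meaning of f) comes from choices made outside the base ontology.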

 

I have been wrong before so…is my understanding correct? I hope this helps in aligning expectations.

 

Massimo


Bo Weidema
 

On 2019-03-20 at 16.14, Stefano Merciai wrote:

Last thing, there are values that are important when building a database, such as combustion coefficients (emissions produced in the activity act123 when burning fuel123). Are these properties of products?

In the wiki, we have about the observation (datapoint):

Number: Floating point numbers xsd:float. (...) Note also that ecoSpold2 has the option of providing a @mathematicalRelation that defines a mathematical formula, which can also contain variables and which will fill the value of the @amount if @isCalculatedAmount is TRUE. This is a very explicit and recommendable option for providing provenance information. In ecoSpold2 the formulas are defined by a subset of the OpenFormula standard. Other RDF-related formula standards are described on the Wikipedia page for MathML.

The @mathematicalRelation allows one, for example, to express an emission as an input amount multiplied by a fixed combustion coefficient.
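As a toy illustration only (this is not ecoSpold2's actual formula syntax, which uses a subset of OpenFormula, and the variable names are invented):

# Resolving a calculated amount from a stored formula and a table of variables.
variables = {
    "fuel123_amount": 2.5,          # kg of fuel123 burned in act123 (invented)
    "combustion_coefficient": 3.2,  # kg CO2 per kg of fuel123 (invented)
}
mathematical_relation = "fuel123_amount * combustion_coefficient"

# A real implementation would use a proper formula parser; eval() is just for the sketch.
amount = eval(mathematical_relation, {"__builtins__": {}}, variables)
print(amount)  # 8.0 kg CO2

The point is that the emission datapoint can carry both the resolved number and the relation it came from.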

Re. the distinction between CO2 emissions and CO2 as a product, one option could be to use different names (such as "carbon dioxide, purified" or "carbon dioxide, under pressure") for the product.


Stefano Merciai
 

I think that a square matrix can just be obtained by aggregation.

Best,

Stefano


On 20/03/2019 16:33, Massimo Pizzol wrote:

>>>2. We don't need the concept of a reference flow to make a technosphere matrix,

 

 

But we need that to make a square matrix. The question is whether it needs to be specified in the ontology or can be done externally like in the case of the biosphere.

Massimo

 


-- 
Best,
S.


Massimo Pizzol
 

>>>2. We don't need the concept of a reference flow to make a technosphere matrix,

 

 

But we need that to make a square matrix. The question is whether it needs to be specified in the ontology or can be done externally like in the case of the biosphere.

Massimo

 


Agneta
 

Hi Elias (and others)

Just a few comments on some of the points you have mentioned:

>>Use of subclasses for input and output VS Use of predicates isInputOf and isOutputOf
I am aware that this question has caused us much confusion and I have some clarifications on this. A class or a sub-class contains a unique set of Uniform Resource Identifiers (URIs). All components of an RDF triple (Subject - Predicate - Object) have URIs. Having subclasses helps users of RDF data to develop better queries (to search the data for what you are looking for).
In other words, as Massimo said, “get the answer we want by making the right question”. Of course all the flows (irrespective of input or output) could be bundled under the class Flow, but it makes our queries a bit more tedious.
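To illustrate the trade-off with a minimal rdflib sketch (the namespace and term names here are hypothetical, not the agreed BONSAI vocabulary): with an Input subclass the query can be a plain type check, while with predicates alone it becomes a pattern over the relationship.

from rdflib import Graph, Namespace, RDF

B = Namespace("http://example.org/bonsai/")  # hypothetical namespace

g = Graph()
g.add((B.coal_flow_1, RDF.type, B.Flow))
g.add((B.coal_flow_1, B.isInputOf, B.electricity_production))
g.add((B.co2_flow_1, RDF.type, B.Flow))
g.add((B.co2_flow_1, B.isOutputOf, B.electricity_production))

# Predicate-only style: "which flows are inputs (of anything)?"
q = "SELECT ?flow WHERE { ?flow b:isInputOf ?activity . }"
for row in g.query(q, initNs={"b": B}):
    print(row.flow)   # http://example.org/bonsai/coal_flow_1

# With an Input subclass the same question would be the simpler pattern:
#   SELECT ?flow WHERE { ?flow rdf:type b:Input . }

Both work; the question is which one we want people to have to write.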

>>How do we make the ontology/database usable for LCA-people if it does not have LCA-specific information in it? 
Good question, and this is one of my main concerns. It is clear to us and everyone on board here that BONSAI aims to develop a data-merging platform for all areas of industrial ecology (LCA, MFA, IO, etc.). This is the reason for keeping our primary BONSAI ontology minimal.

Now let's say that, as an LCA research group, we are interested in structuring our data in a traditional way (e.g. product, by-product, emission; impact methods and characterization factors); we can develop a secondary ontology which continues to use BONSAI as the primary ontology and builds on top of it. E.g. all my segregations (product, by-product, emission) can be sub-classes of the Bonsai Flow object.

But we don't want to do this now, as adding complexity to the ontology would be a barrier to its uptake among different IE groups.

 

Thanks again for your comments; we will bring some discussion on these issues to our meeting this Friday.
Regards

Agneta


Stefano Merciai
 

Hi,

Thank you for the nice exchange of ideas. I just want to add some little inputs.

How do we distinguish between CO2 emitted from the chimney and the CO2 used for soft drinks?

Then I think that the value of "waste/residues" is determined by several factors, for example the homogeneity of materials. The same material, if mixed with other waste flows, may have a lower value (even a negative one) because a waste-separation service is needed. So the final price of the waste flow may be the value of the material, which can be somehow fixed, less the cost of the separation. This is to say that there could be properties, such as "sorted" and "unsorted", that could indicate whether it is a waste flow or not.

As for the reference flow, I think that the classification of the activity gives an idea of the reference flow. A coal-mining activity will have coal as its reference flow (or perhaps it is the other way around: if coal is the output of an activity, then that activity is coal mining). If we intend to insert economic values, such as prices, the determining product could be the one generating the most revenue for the activity. However, by doing that, the classification of the activity may change. I think Elias mentioned CHP plants, where both heat and electricity may be determining flows depending on the period of the year/day.

Last thing, there are values that are important when building a database, such as combustion coefficients (emissions produced in the activity act123 when burning fuel123). Are these properties of products?

Best,

Stefano




On 20/03/2019 15:43, Massimo Pizzol wrote:
Ok thanks, so this is the solution I was asking for. We can separate the technosphere from the rest by using an external list of names. And yes, you are right about the matrix operation: it will work even if the order of columns and rows is not the same. So we need neither the ref flow predicate nor any product subclass in the ontology.
Massimo
On 20 Mar 2019, at 15.29, Chris Mutel via Groups.Io <cmutel@...> wrote:

On Wed, 20 Mar 2019 at 12:30, Massimo Pizzol <massimo@...> wrote:
“product”, “emission”, etc. are subjective.

Agree, and formalizing them limits our flexibility. But indeed some of those might be useful to work in LCA context. I think that the only two pieces of information we actually need for doing LCA are: if a flow belongs to the technosphere (all the rest is B matrix) and if a flow is a reference flow (diagonal of tech matrix). Right now I can’t think of any automatic way of determining this information from a raw list of inputs and outputs. So we have to include this info in the ontology because we can’t use an algorithm or write code to figure this out. But perhaps I am wrong and somebody in the group has a solution for this and then we can skip these classifications altogether, that would be perfect. I also recognize that this means introducing some subjective elements in the model, because who decides what is technosphere? But as I wrote before, if we want to use the linked data for LCA we have to accept that there is an LCA framework.
This is a great comment, and is to me a perfect example of how
people's experience leads them to accept restraints without even
realizing it.

1. Mathematically, we don't need to distinguish between technosphere
and biosphere, this can be one big matrix. In practical terms, our
biosphere will be a different set of names; or, they will be flows for
which there is no associated producing activity.

2. We don't need the concept of a reference flow to make a
technosphere matrix, and there isn't anything special about positive
numbers of the diagonal. Production amounts can be randomly ordered,
and in any case everything produced is positive, everything consumed
is negative, regardless of whether it is a reference product,
co-product, or whatever. The notion of reference product is helpful
for humans trying to understand the reason a particular dataset was
modelled, but irrelevant for the computer doing the math.






-- 
Best,
S.


Massimo Pizzol
 

Ok thanks, so this is the solution I was asking for. We can separate the technosphere from the rest by using an external list of names. And yes, you are right about the matrix operation: it will work even if the order of columns and rows is not the same. So we need neither the ref flow predicate nor any product subclass in the ontology.
Massimo

On 20 Mar 2019, at 15.29, Chris Mutel via Groups.Io <cmutel=gmail.com@groups.io> wrote:

On Wed, 20 Mar 2019 at 12:30, Massimo Pizzol <massimo@plan.aau.dk> wrote:
“product”, “emission”, etc. are subjective.

Agree, and formalizing them limits our flexibility. But indeed some of those might be useful to work in LCA context. I think that the only two pieces of information we actually need for doing LCA are: if a flow belongs to the technosphere (all the rest is B matrix) and if a flow is a reference flow (diagonal of tech matrix). Right now I can’t think of any automatic way of determining this information from a raw list of inputs and outputs. So we have to include this info in the ontology because we can’t use an algorithm or write code to figure this out. But perhaps I am wrong and somebody in the group has a solution for this and then we can skip these classifications altogether, that would be perfect. I also recognize that this means introducing some subjective elements in the model, because who decides what is technosphere? But as I wrote before, if we want to use the linked data for LCA we have to accept that there is an LCA framework.
This is a great comment, and is to me a perfect example of how
people's experience leads them to accept restraints without even
realizing it.

1. Mathematically, we don't need to distinguish between technosphere
and biosphere, this can be one big matrix. In practical terms, our
biosphere will be a different set of names; or, they will be flows for
which there is no associated producing activity.

2. We don't need the concept of a reference flow to make a
technosphere matrix, and there isn't anything special about positive
numbers of the diagonal. Production amounts can be randomly ordered,
and in any case everything produced is positive, everything consumed
is negative, regardless of whether it is a reference product,
co-product, or whatever. The notion of reference product is helpful
for humans trying to understand the reason a particular dataset was
modelled, but irrelevant for the computer doing the math.



 



Massimo Pizzol
 

Thanks Elias for the interesting reflections. I believe all your points are related. My impression is that we are converging towards an ontology that is operational with a minimal number of elements and can potentially be expanded with additional layers for specific uses (e.g. LCA).

 

  • the input and output subclasses allow us to work with raw (unlinked) data.

After my short chat with Matteo, I understand that, even if redundant, these subclasses are a more elegant way (semantically speaking) to structure our ontology because they allow us to “get the answer we want by making the right question”. We can get the same answer indirectly, but this approach is less elegant (and, since I am Italian, for me elegance is everything…). So it might actually be advantageous to keep them.

 

  • “product”, “emission”, etc. are subjective.

Agree, and formalizing them limits our flexibility. But indeed some of those might be useful to work in LCA context. I think that the only two pieces of information we actually need for doing LCA are: if a flow belongs to the technosphere (all the rest is B matrix) and if a flow is a reference flow (diagonal of tech matrix). Right now I can’t think of any automatic way of determining this information from a raw list of inputs and outputs. So we have to include this info in the ontology because we can’t use an algorithm or write code to figure this out. But perhaps I am wrong and somebody in the group has a solution for this and then we can skip these classifications altogether, that would be perfect. I also recognize that this means introducing some subjective elements in the model, because who decides what is technosphere? But as I wrote before, if we want to use the linked data for LCA we have to accept that there is an LCA framework.

 

Looking forward to the meeting on Friday.

BR
Massimo

p.s. +1 for “ALPHA”, ”BRAVO”, “CHARLY”. I am having some good laughs thinking about “Hot shots!” right now

 



Elias Sebastian Azzi
 
Edited

Hello,
I finally have time for some input on your extensive discussions and nice summaries.

ALPHA/ Arguments to NOT introduce subclasses like “product”, “emission”, and “waste”.
I add to the list of arguments:
Products (goods or services), emissions, or waste are terms that embody value judgements.
Examples:
(i) CO2 is mostly seen as an emission to the atmosphere, but some describe it as a waste of our industrial activities (for which we could also provide treatment, direct air capture, or carbon capture and storage).
(ii) In circular economy, waste becomes a resource; zero-waste people would say there is no waste only resources.
(iii) The boundary between waste (paying for a treatment service) and by-product (getting paid for the material) can vary with markets/supply/demand changes.

So, the physical raw fact is that things go in and out of an activity (i.e. metabolism); and we (or I) think that this is what must be stored in the database (pure accounting).
The value judgment can come as a second layer, when using the database for life cycle impact assessments. Basically, an impact assessment method is a set of value judgments that gives us characterization factors. The Bonsai implementation of GWP100 will take all flow instances of (fossil) CO2 from any activity in the life cycle to the atmosphere and sum them up.
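A rough sketch of what that second layer does (the flow names and characterization factors here are invented placeholders, not the actual Bonsai GWP100 implementation):

# Accounting layer: elementary flows gathered from the life cycle (invented values).
life_cycle_flows = [
    {"flow": "CO2, fossil", "compartment": "air", "amount": 0.93},
    {"flow": "CO2, fossil", "compartment": "air", "amount": 0.12},
    {"flow": "CH4, fossil", "compartment": "air", "amount": 0.004},
]

# Value-judgment layer: a characterization method as a set of factors (illustrative).
gwp100 = {
    ("CO2, fossil", "air"): 1.0,    # kg CO2-eq per kg
    ("CH4, fossil", "air"): 29.8,   # kg CO2-eq per kg
}

impact = sum(gwp100.get((f["flow"], f["compartment"]), 0.0) * f["amount"]
             for f in life_cycle_flows)
print(round(impact, 3))  # 1.169 kg CO2-eq

The raw in/out accounting stays method-neutral; only this layer carries the value judgments.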

References for the BEP:
Weidema, B. P.; Schmidt, J.; Fantke, P.; Pauliuk, S. On the Boundary between Economy and Environment in Life Cycle Assessment. Int. J. Life Cycle Assess. 2018, 23 (9), 1839–1846; DOI 10.1007/s11367-017-1398-4.

BRAVO/ Use of subclasses for input and output VS Use of predicates isInputOf and isOutputOf
From the previous meeting, I understood that the discussion raised by Massimo depends on which database we are talking about: unlinked or linked database?
I wrote this after our meeting, to clarify my understanding:

Note: the use of "linked" in this section does not refer to the "LinkedData" concept.
The database exists in multiple versions: an unlinked version (raw data) and linked versions (different linking assumptions lead to different versions, e.g. attributional, consequential).
The unlinked version represents the way data are collected in practice, the way data are available. Data are collected for each activity: what are the inputs of activity X and what are the outputs of activity X. The unlinked database does not allow us to say where the output of activity X goes. In practice, unless the supply chain is known in detail (note: in LCA, knowing the supply chain fully is nearly impossible), the destinations of outputs and the provenance of inputs are not known.
This information gap is solved in linked versions of the database by using linking assumptions. These linking assumptions can vary in algorithmic complexity and carry different interpretations and value judgments.
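A toy example of the simplest possible linking assumption (all names invented; real linking algorithms are far more involved):

# Unlinked inputs: we know what activity X consumes, not where it comes from.
unlinked_inputs = [("electricity production", "coal", 0.3)]

# One naive linking assumption: a single global supplier per flow.
supplier_map = {"coal": "coal production"}

linked = [
    {"consumer": activity, "flow": flow, "amount": amount,
     "producer": supplier_map.get(flow)}   # None would mean still unlinked
    for activity, flow, amount in unlinked_inputs
]
print(linked)
# [{'consumer': 'electricity production', 'flow': 'coal', 'amount': 0.3,
#   'producer': 'coal production'}]

Different supplier maps (market mixes, marginal suppliers, etc.) would give different linked versions of the same raw data.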

In a linked database, a flow-instance is output of a single activity and is input to a single activity.
In an unlinked database, a flow-instance is either input or output of an activity.
[I feel I could be heavily wrong on that statement; and also feel that predicates are interesting]

CHARLY/ Validation, agency and social LCA in the ontology
Bo and Chris mentioned how external data can be used for validation of the BONSAI database, e.g. with GDP data from the World Bank. This echoes https://chris.mutel.org/next-steps.html#id1 but also extends to other options for validation (e.g. remote sensing data for land use change; anthropogenic emissions).
Conceptual difference [do you agree with it?]
>> GDP value by World Bank & GDP value by Bonsai are somehow based on the same raw data (though not easily accessible)
>> Remote sensing data for NOx emissions vs Bonsai data for NOx emissions are not of the same type of raw data; different measurement techniques/reporting frameworks
This said, for the purpose of the hackathon, GDP validation is enough to implement.

The GDP example raised the question of how to include agents in the ontology. This also seems important for social LCA (is the social LCA database on our list of data sources?).
Agents are complex. The first terms I think of are: Companies; Governments; Individuals; Employees; Households; Multinationals; Teams.
I have failed (tonight) to find an existing ontology of agents, but I am sure one exists.

With our focus on "activities", the first predicate I can think of is "isPerformedBy" an agent or set of agents. But then, it gets blurry / not easy to generalise.
Example:
Activity = "Electricity production" #isPerformedBy Agent = "Coal power plant nb 1234"
Agent = "Coal power plant nb 1234" #isLocatedIn Country = "Germany", #isOwnedBy Entity = "InternationalPowerCompany" (at 60%), #isOwnedBy Entity= "City of Dusseldorf" (at 40%)
Agent = "Coal power plant nb 1234"  #hasWorkers literal = "70"
Company = "InternationalPowerCompany" #hasHighSkilledEmployees literal = "2000"
Company = "InternationalPowerCompany" #hasMediumSkilledEmployees literal = "5000"
...
Issues:
- ownership of a plant by several entities
- entities, companies, being multinational, much larger than the plant
- workers not the same as employee; working somewhere; employed by someone? Can be employed by a company but work in several places/plants. 

Simplification: only have population/agent data for "super classes" that aggregate at the sector or country level, as in Exiobase? Issues: how to deal with multinational entities, companies, workers?

DELTA/ How do we make the ontology/database usable for LCA-people if it does not have LCA-specific information in it?
Massimo asked this question.
If not directly included, I am guessing that the ontology/database becomes usable for LCA people (or other types of people) via some additional layers.
For impact assessments, see reply to ALPHA.
For knowing the reference flow of an activity, I think this is solved in the linked databases (linked as in BRAVO, not LinkedData); but if you work with the raw unlinked data, you have to make that assumption anyway.

Besides, my (very) long-term vision with Bonsai is to advance further in merging industrial ecology methods: LCA, MFA, and even IOA, IAM, and all forms of socioeconomic metabolism analysis, including bridges with dynamic system modelling and complex system modelling.

 


Massimo Pizzol
 

Thanks Brandon

 

>>> a flow is not an input or an output- it has to be both.

I completely agree, and this is what I was trying to write as well. In my understanding a “flow” object is not an input or output in absolute terms but only in relative terms, i.e. in relation to another “activity” object. Therefore, using the predicates “isInputOf” and “isOutputOf” seems to me an appropriate and sufficient way to express this relationship, while I don’t think we should use the “Input” and “Output” subclasses, for the reasons previously outlined (not fully correct, redundant, inconsistent).

 

BR
Massimo

 


Brandon Kuczenski
 

Massimo,
Let me weigh in on the input / output question. In my view, a flow is not an input or an output - it has to be both. It has to be an output from the process that created it and an input to the process that consumed it. The flow is the same in both cases; therefore it is an error to call it one or the other.

I haven't seen the term 'exchange' used very much, but in my view a flow is simply a product/substance/material/service and a quantity of measurement (say, 'mass'). (This has to be fixed in order for the use of many different databases to be stable.) I think of an exchange as a 4-tuple: an activity that defines the exchange (which I call the parent), a flow that is being exchanged, a direction with respect to the parent, and a termination, which is the other activity (or compartment or stock or market) that is the partner to the exchange. If the termination is null, then it's a cutoff flow; auditing these flows is part of reviewing a model.
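A minimal sketch of that 4-tuple as a data structure (the field and type names are mine, not an agreed schema):

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Direction(Enum):
    INPUT = "input"
    OUTPUT = "output"

@dataclass(frozen=True)
class Exchange:
    """Parent activity, flow, direction w.r.t. the parent, and termination."""
    parent: str                        # activity that defines the exchange
    flow: str                          # product/substance/material/service (+ its quantity of measurement)
    direction: Direction
    termination: Optional[str] = None  # other activity/compartment/stock/market; None = cutoff

def cutoff_flows(exchanges):
    # Auditing cutoffs (null terminations) is part of reviewing a model.
    return [e for e in exchanges if e.termination is None]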

This view is pretty consistent with your discussion about "Who is this 10 kg of coal associated with?"

A characteristic of this definition is that it is non-numeric, i.e. there is no quantitative information - only adjacency. This helps to define the model without getting hung up on what the exchange value or uncertainty is. Obviously there could be uncertainty in the termination (from where / what supplier / what time of day / etc.), but that is not quantitative uncertainty.

When the parent activity is invoked as part of a query, it would be "responsible" for "figuring out" the exchange value given the query it is answering, and the termination could / would have to be figured out by the software that is doing the query. But it's the exchange that is directional, not the flow.

I will try to make the call on Friday but I'm not sure what time it is.

-Brandon


--
Brandon Kuczenski, Ph.D.
Associate Researcher

University of California at Santa Barbara
Institute for Social, Behavioral, and Economic Research
Santa Barbara, CA 93106-5131

email: bkuczenski@...


 

Two small things.

1. This discussion of Apache Jena might be interesting for some of you: https://news.ycombinator.com/item?id=19419025

2. Thanks Massimo, I think this is a great format and makes it much easier to follow the train of ideas, especially over multiple days. Let's do this more!


Massimo Pizzol
 

Dear Ontology/RDF group

 

We have a meeting Friday and I would like to share some points for discussion.

 

I am thinking a lot about our ontology and there are two pressing issues that I hope we can clarify.

 

  1. The use of “input” and “output” subclasses

 

Bo has suggested the principles below as arguments NOT to introduce subclasses like “product”, “emission”, and “waste”.

 

>>> Principle: We try to avoid making fixed choices, like sign nomenclatures, that are only useful in specific contexts.

>>> Principle: It is a good practice for a model to stay as close to reality as possible

>>> Principle: Do not introduce unnecessary (obligatory) classifications

 

I agree with these principles and I think it makes sense not to have the subclasses product/waste/etc. My problem is that I don’t see how the choice of using the “input” and “output” subclasses fits with these principles. It is a sign convention, useful in specific contexts, and it is an obligatory classification. I don’t know whether classifying things as “input” and “output” is closer to reality than classifying them as “product”, “emission”, “waste”. Thus, my preference is to remove the “input” and “output” subclasses and keep only the “isInputOf” and “isOutputOf” predicates.

 

So far the arguments for using the input and output subclasses have been:

 

>>> at the ontology level to restrict the domain of the input and output relationships.

I am not totally clear on what this means. My concern is whether the use of subclasses unnecessarily increases the complexity of the model because (assuming I have understood things correctly) there would be two instances of e.g. a “coal” flow: one is the “coal input flow” and the other is the “coal output flow”, each of them with a different URI. So if you are looking at the instance “electricity production” you will find it is related to a specific URI for the coal input, and if you are looking at the instance “coal production” you will find it is related to another, different URI for the coal output. So the same thing (coal) in physical reality is now described by two different codes.

>>>Assume you are looking at a specific instance of 10 tonnes of coal in your database, then you ask yourself “is this an input for something or an output of something?”

My view is that in physical reality 10 kg of coal is not the output or the input of something in absolute terms. It is just coal, i.e. an object. Whether it is input or output is determined only in relative terms, i.e. in relation to another object (activity). Coal is the output of coal production. Coal is an input to electricity production. I would instead ask this type of question: “Who is this 10 kg of coal associated with?” And what I would expect to find out is that it is the output of a coal production activity and the input of an electricity production activity.
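To make the contrast explicit (URIs and term names invented, only to illustrate the two options being discussed):

# Option A - Input/Output subclasses: two coal instances, two URIs.
subclass_style = [
    (":coal_input_001",  "rdf:type",    ":Input"),
    (":coal_input_001",  ":isInputOf",  ":electricity_production"),
    (":coal_output_002", "rdf:type",    ":Output"),
    (":coal_output_002", ":isOutputOf", ":coal_production"),
]

# Option B - predicates only: one coal flow instance, one URI, two relations.
predicate_style = [
    (":coal_001", "rdf:type",    ":Flow"),
    (":coal_001", ":isOutputOf", ":coal_production"),
    (":coal_001", ":isInputOf",  ":electricity_production"),
]

In option B the question “who is this coal associated with?” is answered by following the two predicates from the single coal URI.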

 

>>>for sure you could find the answer by checking  "is this the source of inputOf relationships?",

This sounds really nice IMO! I was thinking this was actually how we should find out about things. I also guess that this is a competency question? I would like to better understand why this is not sufficiently “operational”.

 

>>>but operationally you can ask "what is the type of this? And in my view this would be the correct way to do this because something is either input or output"

I argued above that in physical reality something is not either input or output in absolute terms. Anyway, we could certainly ask the question “what is the type of this?” with reference to whether something is classified as input or output. But if we start asking this type of question for the input vs output classification, then, if we are consistent, why not ask the same type of question for every other possible classification? For example: is something a product exchange or an environmental exchange? I could ask “Is CO2 an emission or a product?” But Bo has argued, based on the principles above, that this is not a relevant question. So why is the question relevant for input and output?

 

  2. In general I am unclear on how much we should adhere to existing LCA frameworks.

 

>>> LCA people all have their own mental model.

On one hand I agree we should keep an open mind and not be constrained by specific mental models. But on the other hand, I also understand that we are doing this for the use of “LCA people” too. I thought one of our purposes was to create an infrastructure to support LCAs (e.g. because by making specific queries one can get LCA datasets). If our purpose is to make an ontology that is valid for all models in all disciplines from economics to environmental sciences, then perhaps the terms “input” and “output” are the most generic ones (they can apply to anything from a tree to a whole country's economy) and this might be sufficient (preferably as predicates, as I argued above). However, in order to use the linked data to create some LCIs, we would need some way of separating what is in the A matrix (products) and what is in the B matrix (substances, costs, or many other things), and what is a reference flow, because this is what LCA people are used to working with. So perhaps we have to allow for the possibility of identifying this LCA-specific information. With the current ontology the “only” information we can obtain from e.g. the graph of steel production is a list of inputs and outputs. So how do I determine whether steel, rather than CO2, is the reference flow of steel production?

 

 

Hope this was useful and I am looking forward to a good discussion on Friday
Massimo

 

 


Massimo Pizzol
 

>>> I have started drafting a “PEP 0003 ontology” document

I moved it here which seems a more appropriate location, sorry for late notice.

Massimo

 

 

From: <hackathon2019@bonsai.groups.io> on behalf of "Massimo Pizzol via Groups.Io" <massimo@...>
Reply-To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Date: Tuesday, 12 March 2019 at 13.48
To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Subject: Re: [hackathon2019] Start of the #ontology sub-group #RDFFramework #ontology

 

>>> I guess by final proposal you are suggesting the use of BEP for this working group, do I understand right? I’ll try to make a summary of the discussion. In this working group we are discussing the schema proposed by Matteo. Points of discussion: […]

 

I have started drafting a “PEP 0003 ontology” document to have an idea of how it should look, and it’s available here.

I emailed with Chris quickly and I understood that what we are supposed to do is:

1. first to clarify the points of discussion in my previous mail (+ others of course) and

2. only after we have reached a consensus (or non-consensus), update the document and include it in the bonsai repository (via pull request).

 

BR
Massimo

 

 

From: <hackathon2019@bonsai.groups.io> on behalf of "Massimo Pizzol via Groups.Io" <massimo@...>
Reply-To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Date: Tuesday, 12 March 2019 at 12.24
To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Subject: Re: [hackathon2019] Start of the #ontology sub-group #RDFFramework #ontology

 

Sorry if I sounded smart or like a know-it-all; I just wanted to explain to Matteo some of the issues in layman's terms. I apologized for my mistake on the negative input straight away, so I assume we can make peace now.

 

>>> The final proposal should include not just what you have found consensus on, but also the alternatives you have considered, and why they were not chosen

I guess by final proposal you are suggesting the use of a BEP for this working group, do I understand correctly? I’ll try to make a summary of the discussion.

In this working group we are discussing the schema proposed by Matteo. Points of discussion:

 

  • The use and necessity of the “input” and “output” subclasses has been discussed. Contra: seems redundant when there is already a property. Pro: useful for filtering activities later on. Decision needed.
  • The use and necessity of a reference flow.
    • Not clear if it should be a class, subclass, or property. Pro/contra missing.
    • Not clear if it should always be an output or could be an input. → Clarified: mathematically it can be both, but the choice of convention has implications (e.g. IO people like it output and positive). Problem: according to Matteo, having the ref flow as both input and output is problematic in the schema. The reason is still not clear, though.
    • Leave it out. Contra: information loss when importing from / exporting to LCA/IO formats; without it we can’t determine causality. Pro: makes the model less complex.
  • Environmental exchanges and waste flows are missing in the schema; how to include them:
    • I suggested that we could either 1) create a class “Substance” (or other meaningful name) similar to class “Product”, or 2) remove class “Product” and just keep class “Flow”, which would be valid for both environmental and product exchanges.
    • Bo argued that “Wastes, by-products and emissions do not need to be distinguished.” How this translates into the schema in practice is not yet clear. Perhaps as in point 2 above?
    • Other solutions?

 

BR
Massimo


Brandon Kuczenski
 

Massimo: it seems your BEP is not publicly viewable (I get a 404: https://github.com/massimopizzol/enhancements/blob/master/beps/0003-bep-ontology.md )
I will save my not-so-humble-opinions for the BEP.
-Brandon