Start of the #ontology sub-group


Brandon Kuczenski
 

Massimo,
Let me weigh in on the input / output question. In my view, a flow is not an input or an output; it has to be both. It has to be an output from the process that created it and an input to the process that consumed it. The flow is the same in both cases; therefore it is an error to call it one or the other.

I haven't seen the term 'exchange' used very much, but in my view a flow is simply a product/substance/material/service and a quantity of measurement (say, 'mass'). (This has to be fixed in order for the use of many different databases to be stable.) I think of an exchange as a 4-tuple: an activity that defines the exchange (which I call the parent), a flow that is being exchanged, a direction with respect to the parent, and a termination, which is the other activity (or compartment or stock or market) that is the partner to the exchange. If the termination is null, then it's a cutoff flow; auditing these flows is part of reviewing a model.
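
To make the 4-tuple concrete, here is a minimal Python sketch. The class names, field names, and example data are illustrative assumptions, not part of any existing BONSAI code; the only point is that the same flow object appears in both exchanges and that a null termination marks a cutoff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    name: str      # product / substance / material / service
    quantity: str  # quantity of measurement, e.g. "mass"


@dataclass(frozen=True)
class Exchange:
    parent: str                 # activity that defines the exchange
    flow: Flow                  # what is being exchanged
    direction: str              # "input" or "output", relative to the parent
    termination: Optional[str]  # partner activity/compartment/stock/market; None = cutoff


def cutoff_exchanges(exchanges):
    """Return the exchanges with no termination, i.e. the cutoff flows to audit."""
    return [x for x in exchanges if x.termination is None]


# Hypothetical example data.
coal = Flow("hard coal", "mass")
exchanges = [
    Exchange("coal mining", coal, "output", "electricity production"),
    Exchange("electricity production", coal, "input", "coal mining"),
    Exchange("electricity production", Flow("fly ash", "mass"), "output", None),
]
cutoffs = cutoff_exchanges(exchanges)
```

Note that no amounts appear anywhere: the structure carries adjacency only, and the direction lives on the exchange rather than on the flow.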

This view is pretty consistent with your discussion about "Who is this 10 kg of coal associated with?"

A characteristic of this definition is that it is non-numeric, i.e. there is no quantitative information, only adjacency. This helps to define the model without getting hung up on what the exchange value or uncertainty is. Obviously there could be uncertainty in the termination (from where? what supplier? what time of day? etc.), but that is not quantitative uncertainty.

When the parent activity is invoked as part of a query, it would be "responsible" for "figuring out" the exchange value given the query it is answering, and the termination could / would have to be figured out by the software that is doing the query. But it's the exchange that is directional, not the flow.

I will try to make the call on Friday but I'm not sure what time it is.

-Brandon


--
Brandon Kuczenski, Ph.D.
Associate Researcher

University of California at Santa Barbara
Institute for Social, Behavioral, and Economic Research
Santa Barbara, CA 93106-5131

email: bkuczenski@...


Massimo Pizzol
 

Thanks Brandon

 

>>> a flow is not an input or an output- it has to be both.

I completely agree, and this is what I was trying to write as well. In my understanding a “flow” object is not an input or output in absolute terms but only in relative terms, i.e. in relation to another “activity” object. Therefore, using the predicates “IsInputof” and “IsOutputof” seems to me an appropriate and sufficient way to express this relationship, while I don’t think we should use the “Input” and “Output” subclasses for the reasons previously outlined (not fully correct, redundant, inconsistent).

 

BR
Massimo

 


Elias Sebastian Azzi
 

Hello,
Finally have time for some inputs to your extensive discussions and nice summaries.

ALPHA/ Arguments to NOT introduce subclasses like “product”, “emission”, and “waste”.
I add to the list of arguments:
Products (goods or services), emissions, or waste are terms that embody value judgements.
Examples:
(i) CO2 is mostly seen as an emission to the atmosphere, but some describe it as a waste of our industrial activities (for which we could also provide treatment, direct air capture, or carbon capture and storage).
(ii) In a circular economy, waste becomes a resource; zero-waste people would say there is no waste, only resources.
(iii) The boundary between waste (paying for a treatment service) and by-product (getting paid for the material) can vary with markets/supply/demand changes.

So, the physical raw fact is that things go in and out of an activity (i.e. metabolism); and we (or I) think that this is what must be stored in the database (pure accounting).
The value judgment can come as a second layer, when using the database for life cycle impact assessments. Basically, an impact assessment method is a set of value judgments that gives us characterization factors. The Bonsai implementation of GWP100 will take all flow instances of (fossil) CO2 from any activity in the life cycle to the atmosphere and sum them up.
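
A minimal sketch of this two-layer idea, assuming a hypothetical record layout (activity, flow, destination, amount in kg) and made-up amounts. The only non-hypothetical number is the GWP100 characterization factor of CO2, which is 1 kg CO2-eq per kg by definition.

```python
# Layer 1: pure accounting. Hypothetical flow instances:
# (activity, flow, destination, amount in kg)
flow_instances = [
    ("electricity production", "carbon dioxide, fossil", "atmosphere", 950.0),
    ("cement production", "carbon dioxide, fossil", "atmosphere", 600.0),
    ("cement production", "carbon dioxide, fossil", "carbon capture and storage", 200.0),
    ("electricity production", "fly ash", "landfill", 30.0),
]

# Layer 2: the value judgments of one impact assessment method, expressed as
# characterization factors keyed by (flow, destination), in kg CO2-eq per kg.
gwp100_factors = {("carbon dioxide, fossil", "atmosphere"): 1.0}


def characterize(instances, factors):
    """Sum amount * factor over all flow instances; unlisted flows count as 0."""
    return sum(
        amount * factors.get((flow, destination), 0.0)
        for _activity, flow, destination, amount in instances
    )


score = characterize(flow_instances, gwp100_factors)  # kg CO2-eq
```

The accounting layer never says whether CO2 is an "emission" or a "waste"; that judgment sits entirely in which (flow, destination) pairs the method chooses to characterize.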

References for the BEP:
Weidema, B. P.; Schmidt, J.; Fantke, P.; Pauliuk, S. On the Boundary between Economy and Environment in Life Cycle Assessment. Int. J. Life Cycle Assess. 2018, 23 (9), 1839–1846; DOI 10.1007/s11367-017-1398-4.

BRAVO/ Use of subclasses for input and output VS Use of predicates isInputOf and isOutputOf
From the previous meeting, I understood that the discussion raised by Massimo depends on which database we are talking about: unlinked or linked database?
I wrote this after our meeting, to clarify my understanding:

Note: the use of "linked" in this section does not refer to the "LinkedData" concept.
The database exists in multiple versions: an unlinked version (raw data) and linked versions (different linking assumptions lead to different versions, e.g. attributional, consequential).
The unlinked version represents the way data are collected in practice, the way data are available. Data are collected for each activity: what are the inputs of activity X and what are the outputs of activity X. The unlinked database does not say where the output of activity X goes. In practice, unless the supply chain is known in detail (note: in LCA, fully knowing the supply chain is nearly impossible), the destinations of outputs and the provenance of inputs are not known.
This information gap is solved in linked versions of the database by using linking assumptions. These linking assumptions can vary in algorithm complexity, have different interpretations and value judgments.

In a linked database, a flow-instance is output of a single activity and is input to a single activity.
In an unlinked database, a flow-instance is either input or output of an activity.
[I feel I could be heavily wrong on that statement; and also feel that predicates are interesting]
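
A minimal sketch of the difference, under one deliberately simple linking assumption chosen only for illustration (each input is split across all producers of that flow in proportion to their output). The record layout, the function name, and the data are hypothetical; real linking algorithms are more involved.

```python
# Unlinked records: what each activity reports about itself (hypothetical data).
unlinked = [
    {"activity": "coal mining A", "flow": "hard coal", "direction": "output", "amount": 60.0},
    {"activity": "coal mining B", "flow": "hard coal", "direction": "output", "amount": 40.0},
    {"activity": "electricity production", "flow": "hard coal", "direction": "input", "amount": 10.0},
]


def link_by_production_share(records):
    """Split each input across the producers of that flow, weighted by their output."""
    producers = {}
    for r in records:
        if r["direction"] == "output":
            producers.setdefault(r["flow"], []).append(r)

    links = []  # (supplying activity, consuming activity, flow, amount)
    for r in records:
        if r["direction"] != "input":
            continue
        suppliers = producers.get(r["flow"], [])
        total = sum(s["amount"] for s in suppliers)
        for s in suppliers:
            share = s["amount"] / total
            links.append((s["activity"], r["activity"], r["flow"], r["amount"] * share))
    return links


links = link_by_production_share(unlinked)
```

In the unlinked records each row knows only its own activity; after linking, every row names both a supplying and a consuming activity. A different linking assumption would produce a different linked version from the same raw data.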

CHARLY/ Validation, agency and social LCA in the ontology
Bo and Chris mentioned how external data can be used for validation of the BONSAI database, e.g. with GDP data from the World Bank. This echoes https://chris.mutel.org/next-steps.html#id1 but also extends to other options for validation (e.g. remote sensing data for land use change; anthropogenic emissions).
Conceptual difference [do you agree with it?]
>> GDP value by World Bank & GDP value by Bonsai are somehow based on the same raw data (though not easily accessible)
>> Remote sensing data for NOx emissions vs Bonsai data for NOx emissions are not of the same type of raw data; different measurement techniques/reporting frameworks
This said, for the purpose of the hackathon, GDP validation is enough to implement.

The GDP example raised the question: how to include agents in the ontology. This sounds also important for social LCA (Is the social LCA database on our list of data sources?).
Agents are complex. The first terms I think of are: Companies; Governments; Individuals; Employees; Households; Multinationals; Teams.
I have failed (tonight) to find an existing ontology of agents, but I am sure one exists.

With our focus on "activities", the first predicate I can think of is "isPerformedBy" an agent or set of agents. But then, it gets blurry / not easy to generalise.
Example:
Activity = "Electricity production" #isPerformedBy Agent = "Coal power plant nb 1234"
Agent = "Coal power plant nb 1234" #isLocatedIn Country = "Germany", #isOwnedBy Entity = "InternationalPowerCompany" (at 60%), #isOwnedBy Entity= "City of Dusseldorf" (at 40%)
Agent = "Coal power plant nb 1234"  #hasWorkers literal = "70"
Company = "InternationalPowerCompany" #hasHighSkilledEmployees literal = "2000"
Company = "InternationalPowerCompany" #hasMediumSkilledEmployees literal = "5000"
...
Issues:
- ownership of a plant by several entities
- entities, companies, being multinational, much larger than the plant
- workers are not the same as employees: working somewhere vs. being employed by someone? One can be employed by a company but work in several places/plants.

Simplification: only have population/agent data for "super classes" that aggregate at a sector level or country level? as in Exiobase; issues: how to deal with multinational entities; companies; workers?

DELTA/ How do we make the ontology/database usable for LCA-people if it does not have LCA-specific information in it?
Massimo asked this question.
If not directly included, I am guessing that the ontology/database becomes usable for LCA-people (or other type of people) via some additional layers.
For impact assessments, see reply to ALPHA.
For knowing the reference flow of an activity, I think that this is solved in the linked databases (linked as in BRAVO, not LinkedData); but if you work with the raw unlinked data, you have to make the assumption anyway.

Besides, my (very) long-term vision with Bonsai is to advance further in merging Industrial ecology methods: LCA, MFA, and even IOA, IAM, all forms of socioeconomic metabolism analysis. Including bridges with dynamic system modelling and complex system modelling.

 


Massimo Pizzol
 

Thanks Elias for the interesting reflections. I believe all your points are related. My impression is that we are converging towards an ontology that is operational with a minimal number of elements and can potentially be expanded with additional layers for specific uses (e.g. LCA).

 

  • the input and output subclasses allow us to work with raw (unlinked) data.

After my short chat with Matteo, I understand that even if redundant these subclasses are a more elegant way (semantically speaking) to structure our ontology because they allow us to “get the answer we want by making the right question”. We can get the same answer indirectly but this approach is less elegant (and since I am Italian, for me elegance is everything…). So it might be actually advantageous to keep them.

 

  • “product”, “emission”, etc. are subjective.

Agree, and formalizing them limits our flexibility. But indeed some of those might be useful to work in LCA context. I think that the only two pieces of information we actually need for doing LCA are: if a flow belongs to the technosphere (all the rest is B matrix) and if a flow is a reference flow (diagonal of tech matrix). Right now I can’t think of any automatic way of determining this information from a raw list of inputs and outputs. So we have to include this info in the ontology because we can’t use an algorithm or write code to figure this out. But perhaps I am wrong and somebody in the group has a solution for this and then we can skip these classifications altogether; that would be perfect. I also recognize that this means introducing some subjective elements in the model, because who decides what is technosphere? But as I wrote before, if we want to use the linked data for LCA we have to accept that there is an LCA framework.

 

Looking forward to the meeting on Friday.

BR
Massimo

p.s. +1 for “ALPHA”, ”BRAVO”, “CHARLY”. I am having some good laughs thinking about “Hot shots!” right now

 

Chris Mutel
 

On Wed, 20 Mar 2019 at 12:30, Massimo Pizzol <massimo@plan.aau.dk> wrote:
>>> “product”, “emission”, etc. are subjective. [...] I think that the only two pieces of information we actually need for doing LCA are: if a flow belongs to the technosphere (all the rest is B matrix) and if a flow is a reference flow (diagonal of tech matrix).

This is a great comment, and is to me a perfect example of how
people's experience leads them to accept restraints without even
realizing it.

1. Mathematically, we don't need to distinguish between technosphere
and biosphere, this can be one big matrix. In practical terms, our
biosphere will be a different set of names; or, they will be flows for
which there is no associated producing activity.

2. We don't need the concept of a reference flow to make a
technosphere matrix, and there isn't anything special about positive
numbers of the diagonal. Production amounts can be randomly ordered,
and in any case everything produced is positive, everything consumed
is negative, regardless of whether it is a reference product,
co-product, or whatever. The notion of reference product is helpful
for humans trying to understand the reason a particular dataset was
modelled, but irrelevant for the computer doing the math.
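
A minimal sketch of the "one big matrix" with this sign convention. The exchanges and amounts are hypothetical, and sorting the names is only one arbitrary way to fix an order; nothing in the construction depends on a reference product or on what ends up on a diagonal.

```python
# Hypothetical exchanges: (activity, flow, amount).
# Everything produced is positive, everything consumed is negative.
exchanges = [
    ("electricity production", "electricity", 1.0),
    ("electricity production", "hard coal", -0.5),
    ("electricity production", "carbon dioxide", 1.0),
    ("coal mining", "hard coal", 1.0),
    ("coal mining", "electricity", -0.1),
    ("coal mining", "carbon dioxide", 0.1),
]

# Any consistent ordering works; alphabetical order is used here for convenience.
activities = sorted({activity for activity, _, _ in exchanges})
flows = sorted({flow for _, flow, _ in exchanges})

# One big matrix: rows are flows, columns are activities.
matrix = [[0.0] * len(activities) for _ in flows]
for activity, flow, amount in exchanges:
    matrix[flows.index(flow)][activities.index(activity)] += amount
```

Here the matrix has three rows and two columns, so "carbon dioxide" is simply a row like any other; whether it is later treated as biosphere is a separate decision.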


Massimo Pizzol
 

Ok thanks so this is the solution I was asking for. We can separate between technosphere and rest by using an external list of names. And yes you are right about the matrix operation it will work even if the order of columns and rows is not the same. So we neither need the ref flow predicate nor any product subclass in the ontology.
Massimo

(In reply to Chris Mutel's message of 20 Mar 2019, 15.29.)



Stefano Merciai
 

Hi,

Thank you for the nice exchange of ideas. I just want to add some little inputs.

How do we distinguish between CO2 emitted from the chimney and the CO2 used for soft drinks?

Then I think that the value of "waste/residues" is determined by more factors, homogeneity of materials for example. The same material, if mixed with other waste flows, may have a lower value (even negative) because a waste separation service is needed. So the final price of the waste flow may be the value of the material, which can be somehow fixed, less the cost of the separation. This is to say that there could be properties, such as “sorted” and “unsorted”, that could indicate whether it is a waste flow or not.

As for the reference flow, I think that the classification of the activity gives an idea of the reference flow. A coal mining activity will have coal as reference flow (or perhaps it is the other way around: if coal is the output of an activity, then that activity is coal mining). If we intend to insert economic values, such as prices, the determining products could be those resulting in the most revenue for the activity. However, by doing that, the classification of the activity may change. I think Elias mentioned CHP plants, where both heat and electricity may be determining flows depending on the period of the year/day.

Last thing, there are values that are important when building a database, such as combustion coefficients (emissions produced in the activity act123 when burning fuel123). Are these properties of products?

Best,

Stefano




(In reply to Massimo Pizzol's message of 20/03/2019 15:43.)


Agneta
 

Hi Elias (and others)

Just a few comments on some of the points you have mentioned:

>>Use of subclasses for input and output VS Use of predicates isInputOf and isOutputOf
I am aware that this question has caused us much confusion and I have some clarifications on this. A class or a sub-class contains a unique set of Uniform Resource Identifiers (URIs). All components of an RDF triple (subject, predicate, object) have URIs. Having subclasses helps users of RDF data to develop better queries (to search the data for what you are looking for).
In other words, as Massimo said: “get the answer we want by making the right question”. Of course all the flows (irrespective of input or output) could be bundled under the class Flow, but it makes our queries a bit more tedious.
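
A minimal sketch of the two query styles, using plain Python tuples in place of a real triple store. The `ex:` names are hypothetical and are not the BONSAI vocabulary; `rdf:type` is the standard RDF predicate for class membership.

```python
# Hypothetical triples (subject, predicate, object).
triples = {
    ("ex:coal_10kg", "rdf:type", "ex:Flow"),
    ("ex:coal_10kg", "rdf:type", "ex:Input"),                       # subclass style
    ("ex:coal_10kg", "ex:isInputOf", "ex:electricity_production"),  # predicate style
    ("ex:power_1kWh", "rdf:type", "ex:Flow"),
    ("ex:power_1kWh", "rdf:type", "ex:Output"),
    ("ex:power_1kWh", "ex:isOutputOf", "ex:electricity_production"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# "Which flows are inputs?" asked directly through the subclass...
inputs_by_class = {s for s, _, _ in match(p="rdf:type", o="ex:Input")}
# ...and indirectly through the predicate.
inputs_by_predicate = {s for s, _, _ in match(p="ex:isInputOf")}
```

Both routes return the same subjects here, which is the sense in which the subclasses are redundant; the subclass route just lets the question be asked in one step.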

>>How do we make the ontology/database usable for LCA-people if it does not have LCA-specific information in it? 
Good question, and this is one of my main concerns. It is clear to us and everyone on board here that BONSAI aims to develop a data-merging platform for all areas of industrial ecology (LCA, MFA, IO, etc.). This is the reason for keeping our primary BONSAI ontology minimal.

Now let's say that, as an LCA research group, we are interested in structuring our data in a traditional way (e.g. product, by-product, emission; impact methods and characterization factors). We can develop a secondary ontology which continues to use BONSAI as the primary ontology and builds on top of it. E.g. all my segregations (product, by-product, emission) can be sub-classes of the Bonsai: Flow object.

But we don't want to do this now, as adding complexity to the ontology will be a barrier to its uptake among different IE groups.

 

Thanks again for your comments; we will discuss these issues at our meeting this Friday.
Regards

Agneta


Massimo Pizzol
 

>>>2. We don't need the concept of a reference flow to make a technosphere matrix,

 

 

But we need that to make a square matrix. The question is whether it needs to be specified in the ontology or can be done externally like in the case of the biosphere.

Massimo

 


Stefano Merciai
 

I think that a square matrix can just be done by aggregation.

Best,

Stefano


(In reply to Massimo Pizzol's message of 20/03/2019 16:33.)


Bo Weidema
 

On 2019-03-20 at 16.14, Stefano Merciai wrote:

>>> Last thing, there are values that are important when building a database, such as combustion coefficients (emissions produced in the activity act123 when burning fuel123). Are these properties of products?

In the wiki, we have about the observation (datapoint):

Number: Floating point numbers xsd:float. (...) Note also that ecoSpold2 has the option of providing a @mathematicalRelation that defines a mathematical formula which can also contain variables and which will fill the value of the @amount if @isCalculatedAmount is TRUE. This is a very explicit and recommendable option for providing provenance information. In ecoSpold2 the formulas are defined by a sub-set of the OpenFormula standard. Other RDF-related formula standards are described on the Wikipedia page for MathML.

The @mathematicalRelation allows us to express, for example, an emission as an input number multiplied by a fixed combustion coefficient.
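
A minimal sketch of that idea. It uses a plain Python dictionary rather than ecoSpold2 XML, and a tiny arithmetic evaluator rather than OpenFormula; the variable names and the numbers are hypothetical, and only the three field names are taken from the wiki text above.

```python
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(formula, variables):
    """Evaluate a formula made of numbers, variable names and + - * / only."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(formula, mode="eval").body)


# Hypothetical exchange record and variable values.
exchange = {
    "flow": "carbon dioxide",
    "isCalculatedAmount": True,
    "mathematicalRelation": "fuel_input * combustion_coefficient",
}
variables = {"fuel_input": 10.0, "combustion_coefficient": 2.5}

if exchange["isCalculatedAmount"]:
    exchange["amount"] = evaluate(exchange["mathematicalRelation"], variables)
```

With this layout the coefficient is stored with the activity's exchange (as provenance for the amount) rather than as a property of the product.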

Re. the distinction between CO2 emissions and CO2 as a product, one option can be to use different names (such as "carbon dioxide, purified" or "carbon dioxide, under pressure" for the product).


Massimo Pizzol
 

Within yesterday’s correspondence on biosphere flows and reference flows Agneta wrote that:

 

>> [for doing LCA…] we can develop a secondary ontology which continues to use BONSAI as the primary ontology and build on top of it.

 

My understanding right now is that an RDF-type database built from raw data linked using our ontology in its current format can be queried to automatically generate a list of the input and output flows of a number of activities, with the main advantage being that although the raw data might be from different sources, the activities and flows in the list will be uniquely and consistently identified.

 

But how to convert the list into a specific format and for a specific purpose (e.g. organize the activities and flows in B matrix + square A matrix to be used in the $g = B A^{-1} f$ calculation for LCA) remains outside the scope of this ontology. This conversion should be regarded as a separate step. One would have to develop an additional ontology on top of the base one, or use some additional information external to the ontology, like a list specifying what the biosphere flows are, or make other additional assumptions and choices, like deciding which of two output flows is the determining one and applying one or another method to solve multifunctionality.
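
As a small worked instance of the $g = B A^{-1} f$ calculation mentioned above, consider a two-activity system with hypothetical numbers: electricity production (column 1) makes 1 kWh and consumes 0.5 kg of coal, coal mining (column 2) makes 1 kg of coal, and they emit 1.0 and 0.1 kg CO2 per unit of activity. None of these values come from a real dataset.

```latex
A = \begin{pmatrix} 1 & 0 \\ -0.5 & 1 \end{pmatrix},
\qquad
B = \begin{pmatrix} 1.0 & 0.1 \end{pmatrix},
\qquad
f = \begin{pmatrix} 1 \\ 0 \end{pmatrix}

% Scaling vector: solve A s = f
s = A^{-1} f = \begin{pmatrix} 1 \\ 0.5 \end{pmatrix}
\quad\text{since}\quad
A s = \begin{pmatrix} 1 \cdot 1 + 0 \cdot 0.5 \\ -0.5 \cdot 1 + 1 \cdot 0.5 \end{pmatrix}
    = \begin{pmatrix} 1 \\ 0 \end{pmatrix}

% Inventory result
g = B A^{-1} f = B s = 1.0 \cdot 1 + 0.1 \cdot 0.5 = 1.05
```

The choices that make $A$ square here (one determining output per activity, CO2 placed in $B$) are exactly the extra assumptions that sit outside the base ontology.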

 

I have been wrong before so…is my understanding correct? I hope this helps in aligning expectations.

 

Massimo


 

>>> But how to convert the list into a specific format and for a specific purpose (e.g. organize the activities and flows in B matrix + square A matrix to be used in the $g = B A^{-1} f$ calculation for LCA) remains outside the scope of this ontology. [...]

We will be able to talk about this soon in person, which should be helpful as we might be going a bit in circles. In any case, here are my contributions to the dance:

1. Identifying potential biosphere flows is relatively easy - they are either only consumed and never produced, or vice versa. We can debate if this is enough, or even fits into our logical sense of how we want to build the model, but if we want to make a solvable linear algebra problem it is enough. Think about the *purpose* of separating the A and B matrices.
2. A square technosphere matrix is an output of one possible system model, but is not required. Some system models will not produce square matrices, and other approaches will use graph traversal and not use matrices at all.
3. Our database is not only used for matrix-based LCA, but also e.g. MFA, so should not be too specific to what you read in an LCA textbook. This has already been said before by multiple people, but is worth repeating :)
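
The rule in point 1 can be sketched in a few lines of Python. The exchanges are hypothetical, with produced amounts positive and consumed amounts negative; the result is only a list of candidates.

```python
# Hypothetical signed exchanges: (activity, flow, amount).
exchanges = [
    ("electricity production", "electricity", 1.0),
    ("electricity production", "hard coal", -0.5),
    ("electricity production", "carbon dioxide", 1.0),
    ("coal mining", "hard coal", 1.0),
    ("coal mining", "electricity", -0.1),
    ("coal mining", "coal, in ground", -1.0),
]

produced = {flow for _, flow, amount in exchanges if amount > 0}
consumed = {flow for _, flow, amount in exchanges if amount < 0}

# Flows that are only produced or only consumed, never both.
biosphere_candidates = produced ^ consumed
# Flows that are both produced and consumed somewhere in the system.
technosphere_candidates = produced & consumed
```

One caveat of applying the rule mechanically: a product delivered to a consumer outside the modelled system is also "only produced, never consumed", so the candidate list may still need a check against an external list of names.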


Massimo Pizzol
 

>>> as we might be going a bit in circles

Thanks. I think we agree on all points. You confirm that we are going for a general solution and that if I want to upgrade to a specific solution (one that looks like the textbook) then it’s a separate step.

Concretely: some info on biosphere, ref flows, etc. does not strictly need to be in the ontology, because it can either be obtained from external sources or belongs to a specific mental model only.

 

Massimo

 


Agneta
 

In my understanding the next steps would be (correct me if I am wrong):

1. Explore interoperability: how unlinked data (from a potential data provider, e.g. an LCA practitioner, national statistics, etc.) can be connected with our existing database.
2. Develop query templates to extract data from BONSAI that could be useful for our analysis (whether it is LCA or MFA).

Agneta


Massimo Pizzol
 

Updated version of “PEP 0003 ontology” document here, revised after our meeting on Friday and discussions during the week.

I didn’t have much time to do this so I hope to have captured the main issues, but please check it and change/comment if this is not the case.

 

BR
Massimo