Re: Start of the #ontology sub-group #ontology

Elias Sebastian Azzi

I finally have time to offer some input on your extensive discussions and nice summaries.

ALPHA/ Arguments to NOT introduce subclasses like “product”, “emission”, and “waste”.
I add to the list of arguments:
Products (goods or services), emissions, or waste are terms that embody value judgements.
(i) CO2 is mostly seen as an emission to the atmosphere, but some describe it as a waste of our industrial activities (for which we could also provide treatment, direct air capture, or carbon capture and storage).
(ii) In a circular economy, waste becomes a resource; zero-waste advocates would say there is no waste, only resources.
(iii) The boundary between waste (paying for a treatment service) and by-product (getting paid for the material) can vary with markets/supply/demand changes.

So, the physical raw fact is that things go in and out of an activity (i.e. metabolism); and we (or I) think that this is what must be stored in the database (pure accounting).
The value judgment can come as a second layer, when using the database for life cycle impact assessments. Basically, an impact assessment method is a set of value judgments that gives us characterization factors. The Bonsai implementation of GWP100 will take all flow instances of (fossil) CO2 from any activity in the life cycle to the atmosphere and sum them up.
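To make the two layers concrete, here is a minimal sketch in Python. The record fields, the `characterise` helper and all amounts are hypothetical placeholders, not the Bonsai schema or real data. The only characterization factor used is fossil CO2 under GWP100, which is 1 kg CO2-eq per kg by definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    activity: str      # activity the flow is recorded for
    flow_object: str   # what is flowing, e.g. "CO2, fossil"
    direction: str     # "input" or "output", relative to the activity
    counterpart: str   # where it comes from / goes to (hypothetical field)
    amount_kg: float   # illustrative amounts only

# Layer 1: pure accounting. No "product", "emission" or "waste" labels.
flows = [
    Flow("electricity production", "hard coal", "input", "coal mining", 400.0),
    Flow("electricity production", "CO2, fossil", "output", "atmosphere", 950.0),
    Flow("coal mining", "CO2, fossil", "output", "atmosphere", 50.0),
    Flow("electricity production", "fly ash", "output", "unknown", 30.0),
]

# Layer 2: an impact assessment method is a set of value judgments,
# expressed as characterization factors (kg CO2-eq per kg of flow).
GWP100 = {("CO2, fossil", "atmosphere"): 1.0}

def characterise(flows, method):
    """Sum the characterized outputs over all activities in the life cycle."""
    total = 0.0
    for f in flows:
        if f.direction != "output":
            continue
        # Flows the method does not care about get a factor of zero.
        total += f.amount_kg * method.get((f.flow_object, f.counterpart), 0.0)
    return total

score = characterise(flows, GWP100)
print(score)
```

Whether fly ash is a waste or a by-product never has to be decided in the accounting layer; a different method would simply bring a different dictionary of factors.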

References for the BEP:
Weidema, B. P.; Schmidt, J.; Fantke, P.; Pauliuk, S. On the Boundary between Economy and Environment in Life Cycle Assessment. Int. J. Life Cycle Assess. 2018, 23 (9), 1839–1846; DOI 10.1007/s11367-017-1398-4.

BRAVO/ Use of subclasses for input and output VS Use of predicates isInputOf and isOutputOf
From the previous meeting, I understood that the discussion raised by Massimo depends on which database we are talking about: unlinked or linked database?
I wrote this after our meeting, to clarify my understanding:

Note: the use of "linked" in this section does not refer to the "LinkedData" concept.
The database exists in multiple versions: an unlinked version (raw data) and linked versions (different linking assumptions lead to different versions, e.g. attributional, consequential).
The unlinked version represents the way data are collected and made available in practice. Data are collected for each activity: what are the inputs of activity X and what are the outputs of activity X. The unlinked database does not say where the output of activity X goes. In practice, unless the supply chain is known in detail (note: in LCA, fully knowing the supply chain is nearly impossible), the destinations of outputs and the provenance of inputs are not known.
This information gap is filled in linked versions of the database by using linking assumptions. These linking assumptions can vary in algorithmic complexity and can embody different interpretations and value judgments.

In a linked database, a flow-instance is output of a single activity and is input to a single activity.
In an unlinked database, a flow-instance is either input or output of an activity.
[I feel I could be heavily wrong on that statement; and also feel that predicates are interesting]
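If the statement above holds, the difference can be sketched as a data structure in Python. Everything here is hypothetical: the `FlowInstance` class, its field names, and `link_naively`, which stands in for one possible linking assumption and is not an actual attributional or consequential algorithm.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FlowInstance:
    flow_object: str
    amount: float
    output_of: Optional[str] = None  # activity that produces the flow
    input_of: Optional[str] = None   # activity that consumes the flow

    def is_valid_unlinked(self) -> bool:
        # Unlinked: exactly one side is known (input XOR output).
        return (self.output_of is None) != (self.input_of is None)

    def is_linked(self) -> bool:
        # Linked: output of one activity AND input to one activity.
        return self.output_of is not None and self.input_of is not None

# Raw data as collected per activity: no destination, no provenance.
unlinked = [
    FlowInstance("electricity", 10.0, output_of="coal power plant"),
    FlowInstance("electricity", 10.0, input_of="steel production"),
]

def link_naively(instances):
    """Pair each output with the first unmatched input that has the same
    flow object and amount. A placeholder for a real linking assumption."""
    inputs = [f for f in instances if f.input_of and not f.output_of]
    linked = []
    for out in (f for f in instances if f.output_of and not f.input_of):
        for inp in inputs:
            if inp.flow_object == out.flow_object and inp.amount == out.amount:
                linked.append(FlowInstance(out.flow_object, out.amount,
                                           out.output_of, inp.input_of))
                inputs.remove(inp)
                break
    return linked

linked = link_naively(unlinked)
print(linked)
```

With predicates (isInputOf, isOutputOf) instead of subclasses, the same flow instance can carry one predicate in the unlinked version and gain the second one after linking, which may be an argument in favour of predicates.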

CHARLY/ Validation, agency and social LCA in the ontology
Bo and Chris mentioned how external data can be used for validation of the BONSAI database, e.g. with GDP data from the World Bank. This also extends to other options for validation (e.g. remote sensing data for land use change or anthropogenic emissions).
Conceptual difference [do you agree with it?]
>> GDP value by World Bank & GDP value by Bonsai are somehow based on the same raw data (though not easily accessible)
>> Remote sensing data for NOx emissions vs Bonsai data for NOx emissions are not of the same type of raw data; different measurement techniques/reporting frameworks
This said, for the purpose of the hackathon, GDP validation is enough to implement.

The GDP example raised the question: how do we include agents in the ontology? This also sounds important for social LCA (is the social LCA database on our list of data sources?).
Agents are complex. The first terms I think of are: Companies; Governments; Individuals; Employees; Households; Multinationals; Teams.
I failed (tonight) to find an existing ontology of agents, but I am sure one exists.

With our focus on "activities", the first predicate I can think of is "isPerformedBy" an agent or set of agents. But then, it gets blurry / not easy to generalise.
Activity = "Electricity production" #isPerformedBy Agent = "Coal power plant nb 1234"
Agent = "Coal power plant nb 1234" #isLocatedIn Country = "Germany", #isOwnedBy Entity = "InternationalPowerCompany" (at 60%), #isOwnedBy Entity= "City of Dusseldorf" (at 40%)
Agent = "Coal power plant nb 1234"  #hasWorkers literal = "70"
Company = "InternationalPowerCompany" #hasHighSkilledEmployees literal = "2000"
Company = "InternationalPowerCompany" #hasMediumSkilledEmployees literal = "5000"
- ownership of a plant by several entities
- entities, companies, being multinational, much larger than the plant
- workers are not the same as employees (working somewhere vs. being employed by someone): a person can be employed by one company but work in several places/plants.
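To see where the example gets blurry, the statements above can be written as plain triples in Python. This is only a sketch: the fourth tuple element (a dictionary of qualifiers such as the ownership share) is my assumption about how to attach "at 60%" to a predicate, and the `owners` helper is hypothetical.

```python
# (subject, predicate, object, qualifiers)
triples = [
    ("Electricity production", "isPerformedBy", "Coal power plant nb 1234", {}),
    ("Coal power plant nb 1234", "isLocatedIn", "Germany", {}),
    ("Coal power plant nb 1234", "isOwnedBy", "InternationalPowerCompany", {"share": 0.6}),
    ("Coal power plant nb 1234", "isOwnedBy", "City of Dusseldorf", {"share": 0.4}),
    ("Coal power plant nb 1234", "hasWorkers", 70, {}),
    ("InternationalPowerCompany", "hasHighSkilledEmployees", 2000, {}),
    ("InternationalPowerCompany", "hasMediumSkilledEmployees", 5000, {}),
]

def owners(agent):
    """Return {owner: share} for every isOwnedBy statement about the agent."""
    return {o: q["share"] for s, p, o, q in triples
            if s == agent and p == "isOwnedBy"}

plant_owners = owners("Coal power plant nb 1234")
print(plant_owners)

# A simple consistency check: ownership shares should add up to 100%.
shares_ok = abs(sum(plant_owners.values()) - 1.0) < 1e-9
print(shares_ok)
```

Note that a plain subject-predicate-object triple has no slot for the share, so shared ownership already forces either qualifiers like the ones above or an intermediate "ownership" node.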

Simplification: only have population/agent data for "super classes" that aggregate at a sector or country level, as in Exiobase? Open issues: how to deal with multinational entities, companies, and workers?

DELTA/ How do we make the ontology/database usable for LCA-people if it does not have LCA-specific information in it?
Massimo asked this question.
If this information is not directly included, I am guessing that the ontology/database becomes usable for LCA-people (or other types of users) via some additional layers.
For impact assessments, see reply to ALPHA.
For knowing the reference flow of an activity, I think that this is solved in the linked databases (linked as in BRAVO, not LinkedData); but if you work with the raw unlinked data, you have to make the assumption anyway.

