
Re: Competency Questions #ontology #rdf

 

On Tue, 19 Mar 2019 at 09:39, Bo Weidema <bo.weidema@...> wrote:

Den 2019-03-18 kl. 23.35 skrev Matteo Lissandrini (AAU):

the concept of macroeconomic scenario which is not present in the data
I've seen.
Please see the BONSAI glossary:
https://github.com/BONSAMURAIS/bonsai/wiki/Glossary

In our last conversation a very important detail emerged that I had
been missing (and so had the modeling):
The EXIOBASE data is a specific static snapshot; in this data the
activities are at the center, and each flow is either an input to or
an output of exactly one activity.
When we reason about product footprints, some other data/analysis
process intervenes that links the flows in some way, so only at this
point can we record:
That the coal C4 from CM2 is actually used as the input coal C1 for SP1.
Yes, there are in fact (at least) two different instances of the database:

- one before linking, in which flows are only recorded as being either
inputs to or outputs from an activity

- one after linking (which implies the application of the algorithms of
a specific system model, as described by Chris), where each flow is
recorded as a flow between two specific activities

The ontology can (or should be able to) handle both these instances.
This is, of course, correct, though it does help clarify things for me.

However, perhaps this needs a different conceptual approach? Perhaps
something like a resolved exchange, which has isInputOf *and*
isOutputOf? Probably the language would have to be adjusted. But I
guess we need this for trading activities in any case.

Bo





--
############################
Chris Mutel
Technology Assessment Group, LEA
Paul Scherrer Institut
OHSA D22
5232 Villigen PSI
Switzerland
http://chris.mutel.org
Telefon: +41 56 310 5787
############################


Re: #reproduciblemodels working group - getting organized #reproduciblemodels

Miguel Fernández Astudillo
 

Oops… Sorry, me too; I was convinced it was today. Yesterday at that time I was in front of the computer :/.

 

I will try to catch up with the minutes.

 

Miguel (F.A.)

 

 

From: hackathon2019@bonsai.groups.io <hackathon2019@bonsai.groups.io> On Behalf Of Massimo Pizzol
Sent: 19 March 2019 08:08
To: hackathon2019@bonsai.groups.io
Subject: Re: [hackathon2019] #reproduciblemodels working group - getting organized

 

Sorry guys, I missed the call yesterday. I totally messed up the time zone; I thought it was today. Stupid mistake.

I read the minutes and it’s all right; I will work on the specific task I have been assigned.

BR
Massimo

 


Re: Competency Questions #ontology #rdf

Bo Weidema
 

Den 2019-03-18 kl. 23.35 skrev Matteo Lissandrini (AAU):

the concept of macroeconomic scenario which is not present in the data I've seen.
Please see the BONSAI glossary: https://github.com/BONSAMURAIS/bonsai/wiki/Glossary

In our last conversation a very important detail emerged that I had been missing (and so had the modeling):
The EXIOBASE data is a specific static snapshot; in this data the activities are at the center, and each flow is either an input to or an output of exactly one activity.
When we reason about product footprints, some other data/analysis process intervenes that links the flows in some way, so only at this point can we record:
That the coal C4 from CM2 is actually used as the input coal C1 for SP1.
Yes, there are in fact (at least) two different instances of the database:

- one before linking, in which flows are only recorded as being either inputs to or outputs from an activity

- one after linking (which implies the application of the algorithms of a specific system model, as described by Chris), where each flow is recorded as a flow between two specific activities

The ontology can (or should be able to) handle both these instances.
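
As a minimal sketch of how a single record type could cover both instances, consider a flow with two optional ends. The `Flow` class, its field names, and the amounts are illustrative assumptions, not part of the BONSAI ontology:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    """One flow record; class and field names are illustrative assumptions."""
    product: str
    amount: float                      # in some fixed unit, e.g. tonnes
    output_of: Optional[str] = None    # activity that produced the flow
    input_of: Optional[str] = None     # activity that consumed the flow

    def is_linked(self) -> bool:
        # After linking, both ends of the flow are known.
        return self.output_of is not None and self.input_of is not None


# Before linking: each flow is attached to exactly one activity.
unlinked = [
    Flow("coal", 10.0, output_of="CM2"),
    Flow("coal", 10.0, input_of="SP1"),
]

# After linking: some system model has decided that CM2 supplies SP1.
linked = [Flow("coal", 10.0, output_of="CM2", input_of="SP1")]
```

In this sketch the unlinked database is simply the case where one end is left empty, so the same structure can hold either instance.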

Bo


Re: Competency Questions #ontology #rdf

 

This is a very important question! The following is my opinion, and others might have a different perspective.

Questions over what can really substitute for what (e.g. for coal this is sulfur content, energy density, but also in general lignite takes totally different handling than bituminous) are long known to be difficult questions; similarly, the correct way of modelling markets with multiple providers, trade, and re-export is also tricky. They are tricky because in most cases we have to make value judgments in what we think is the best model, without really being able to get the "right" answer.

As such, these decisions should be made by the system modelling software (see my recent blog post), and should not be addressed in the data format. We want to be able to try multiple approaches, and be able to quantify the effects of different choices. Instead, the data format should be able to represent different kinds of coal, their origin locations, trade patterns (trade is an activity, the same as other activities), and the properties of these coals. The data format can also give the volume of specific kinds of coal consumed by various activities in a region. The system model is responsible for taking this large set of data points, and creating a balanced view of a possible world.
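
To make the separation concrete, here is a rough sketch in which the same data is passed to two different system-model rules. Both rules, the function names, and the supply figures are invented for illustration; they are not taken from any existing BONSAI code:

```python
from typing import Callable, Dict

# Unlinked data: coal supplied to a region's market, by (hypothetical) origin.
# The figures are invented for illustration only.
supply_tonnes: Dict[str, float] = {"CM1": 70.0, "CM2": 30.0}

SystemModel = Callable[[Dict[str, float]], Dict[str, float]]


def proportional_mix(supply: Dict[str, float]) -> Dict[str, float]:
    """Every consumer draws on all suppliers in proportion to their volume."""
    total = sum(supply.values())
    return {origin: amount / total for origin, amount in supply.items()}


def largest_supplier_only(supply: Dict[str, float]) -> Dict[str, float]:
    """Every consumer is assumed to buy from the single largest supplier."""
    largest = max(supply, key=supply.get)
    return {origin: 1.0 if origin == largest else 0.0 for origin in supply}


def supplier_shares(model: SystemModel) -> Dict[str, float]:
    # The data stays the same; only the modelling choice changes.
    return model(supply_tonnes)
```

Calling `supplier_shares(proportional_mix)` and `supplier_shares(largest_supplier_only)` gives two different pictures of the same market, which is the kind of comparison the data format should not rule out.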


Re: #reproduciblemodels working group - getting organized #reproduciblemodels

Massimo Pizzol
 

Sorry guys, I missed the call yesterday. I totally messed up the time zone; I thought it was today. Stupid mistake.

I read the minutes and it’s all right; I will work on the specific task I have been assigned.

BR
Massimo

 


Re: #reproduciblemodels working group - getting organized #reproduciblemodels

Brandon Kuczenski
 

Hey all,
I created a number of issues in the reproducibility repo, and assigned one to each of you. Let's all try and do a turn on our assigned issue by the end of Wednesday.
Work can be done directly in the issue or through contributions to the repo. I will also try to fill out the written docs.

-Brandon



--
Brandon Kuczenski, Ph.D.
Associate Researcher

University of California at Santa Barbara
Institute for Social, Behavioral, and Economic Research
Santa Barbara, CA 93106-5131

email: bkuczenski@...


Re: Start of the #ontology sub-group #ontology

Massimo Pizzol
 

Thanks Brandon

 

>>> a flow is not an input or an output- it has to be both.

I completely agree, and this is what I was trying to write as well. In my understanding a “flow” object is not an input or output in absolute terms but only in relative terms, i.e. in relation to an “activity” object. Therefore, using the predicates “IsInputof” and “IsOutputof” seems to me an appropriate and sufficient way to express this relationship. I don’t think we should use the “Input” and “output” subclasses, for the reasons previously outlined (not fully correct, redundant, inconsistent).

 

BR
Massimo

 


Re: Competency Questions #ontology #rdf

Matteo Lissandrini (AAU)
 

Dear all,

I've collected the discussion and something more re: competency questions in the wiki for the RDF framework repository[1],
this will have to be restructured.
Please feel free to fix any typo or other issue you'll see.
Also, let me know if you add any new competency question.


There are a number of things still open on this. For instance, the questions bring up the concept of a macroeconomic scenario, which is not present in the data I've seen.
The Input/Output issue probably connects to this (or maybe not; please let me know).

In our last conversation a very important detail emerged that I had been missing (and so had the modeling):
The EXIOBASE data is a specific static snapshot; in this data the activities are at the center, and each flow is either an input to or an output of exactly one activity.
E.g. there is a steel production SP1 that consumes some coal C1 and outputs some steel S1.
Then there is another steel production SP2 that consumes some different coal C2 and outputs some other steel S2.
Then there is a coal mine CM1 that outputs some coal C3.
Then there is another coal mine CM2 that outputs some coal C4.

When we reason about product footprints, some other data/analysis process intervenes that links the flows in some way, so only at this point can we record:
That the coal C4 from CM2 is actually used as the input coal C1 for SP1.
The coal C3 from CM1 is actually the input coal for SP2.

So C4 = C1 -> it is at the same time an output (of CM2) and an input (for SP1). Is this the issue?

Yet, there may be cases like the following:
The output C3 of CM1 is actually split: 70% goes into C1 for SP1 and 30% into C2 for SP2.
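
The split case can be sketched as follows. This is only an illustration of the bookkeeping: the 100 units of output, the dictionary layout, and the `link` function are assumptions, not something present in the EXIOBASE data:

```python
from typing import Dict, List, Tuple

# Output flows recorded before linking: (producing activity, product) -> amount.
# The amount of 100 is an invented figure.
outputs: Dict[Tuple[str, str], float] = {("CM1", "coal"): 100.0}

# Shares decided by some linking step: producer -> {consumer: fraction}.
shares: Dict[str, Dict[str, float]] = {"CM1": {"SP1": 0.7, "SP2": 0.3}}


def link(outputs, shares) -> List[Tuple[str, str, str, float]]:
    """Return linked flows as (producer, consumer, product, amount)."""
    linked = []
    for (producer, product), amount in outputs.items():
        split = shares[producer]
        # The shares must account for the whole output, or mass is lost.
        if abs(sum(split.values()) - 1.0) > 1e-9:
            raise ValueError(f"shares for {producer} do not sum to 1")
        for consumer, fraction in split.items():
            linked.append((producer, consumer, product, amount * fraction))
    return linked


linked_flows = link(outputs, shares)
```

Here one recorded output becomes two linked flows, so "C3" is no longer identical to a single input but is the sum of the parts delivered to SP1 and SP2.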

Please, let me know if I'm understanding this correctly.



Thanks a lot,
Matteo















[1]https://github.com/BONSAMURAIS/BONSAI-ontology-RDF-framework/wiki/The-BONSAI-Ontology-and-RDF-Framework


Re: Start of the #ontology sub-group #ontology

Brandon Kuczenski
 

Massimo,
Let me weigh in on the input / output question. In my view, a flow is not an input or an output; it has to be both. It has to be an output from the process that created it and an input to the process that consumed it. The flow is the same in both cases; therefore it is an error to call it one or the other.

I haven't seen the term 'exchange' used very much but in my view, a flow is simply a product/substance/material/service and a quantity of measurement (say, 'mass'). (this has to be fixed in order for the use of many different databases to be stable). I think of an exchange as a 4-tuple: an activity that defines the exchange (which I call the parent), a flow that is being exchanged, a direction with respect to the parent, and a termination, which is the other activity (or compartment or stock or market) that is the partner to the exchange. If the termination is null, then it's a cutoff flow- auditing these flows is part of reviewing a model.

This view is pretty consistent with your discussion about "Who is this 10 kg of coal associated with?"

A characteristic of this definition is that it is non-numeric, i.e. there is no quantitative information- only adjacency. This helps to define the model without getting hung up on what the exchange value or uncertainty is. Obviously there could be uncertainty in the termination - from where / what supplier / what time of day / etc? but that is not quantitative uncertainty.

When the parent activity is invoked as part of a query, it would be "responsible" for "figuring out" the exchange value given the query it is answering, and the termination could / would have to be figured out by the software that is doing the query. But it's the exchange that is directional, not the flow.
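
A minimal sketch of the 4-tuple described above, with hypothetical names and example data (the "cooling water" exchange is invented to show a cutoff):

```python
from typing import List, NamedTuple, Optional


class Exchange(NamedTuple):
    """Non-numeric 4-tuple: adjacency only, no exchange value."""
    parent: str                  # activity that defines the exchange
    flow: str                    # what is exchanged (with its fixed quantity)
    direction: str               # "input" or "output", relative to the parent
    termination: Optional[str]   # partner activity; None marks a cutoff


def cutoff_flows(exchanges: List[Exchange]) -> List[Exchange]:
    """Exchanges with no termination, i.e. the ones to audit in review."""
    return [ex for ex in exchanges if ex.termination is None]


exchanges = [
    Exchange("electricity production", "coal", "input", "coal production"),
    Exchange("coal production", "coal", "output", "electricity production"),
    Exchange("electricity production", "cooling water", "input", None),
]
```

Note that the same flow ("coal") appears in two exchanges with opposite directions; the direction belongs to the exchange, and `cutoff_flows(exchanges)` picks out the one exchange with no partner.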

I will try to make the call on Friday but I'm not sure what time it is.

-Brandon




Re: Start of the #ontology sub-group #ontology

 

Two small things.

1. This discussion of Apache Jena might be interesting for some of you: https://news.ycombinator.com/item?id=19419025

2. Thanks Massimo, I think this is a great format and makes it much easier to follow the train of ideas, especially over multiple days. Let's do this more!


Re: Start of the #ontology sub-group #ontology

Massimo Pizzol
 

Dear Ontology/RDF group

 

We have a meeting Friday and I would like to share some points for discussion.

 

I am thinking a lot about our ontology and there are two pressing issues that I hope we can clarify.

 

  1. The use of “input” and “output” subclasses

 

Bo has suggested the principles below as arguments NOT to introduce subclasses like “product”, “emission”, and “waste”.

 

>>> Principle: We try to avoid making fixed choices, like sign nomenclatures, that are only useful in specific contexts.

>>> Principle: It is a good practice for a model to stay as close to reality as possible

>>> Principle: Do not introduce unnecessary (obligatory) classifications

 

I agree with these principles and I think it makes sense not to have the subclasses product/waste/etc. My problem is that I don’t see how the choice of using the “input” and “output” subclasses fits with these principles. It is a sign convention, useful in specific contexts, and it is an obligatory classification. I don’t know whether classifying things as “input” and “output” is any closer to reality than classifying them as “products”, “emission”, “waste”. Thus, my preference is to remove the “input” and “output” subclasses and keep only the “isInputof” and “isOutputof” predicates.

 

So far the arguments for using the input and output subclasses have been:

 

>>> at the ontology level to restrict the domain of the input and output relationships.

I am not totally clear on what this means. My concern is whether the use of subclasses unnecessarily increases the complexity of the model because, assuming I have understood things correctly, there would be two instances of e.g. a “coal” flow. One is the “coal input flow” and the other is the “coal output flow”, each with a different URI. So if you are looking at the instance “electricity production” you will find it is related to a specific URI for coal input, and if you are looking at the instance “coal production” you will find it is related to a different URI for coal output. So the same thing (coal) in physical reality is now described by two different codes.

>>>Assume you are looking at a specific instance of 10 tonnes of coal in your database, then you ask yourself “is this an input for something or an output of something?”

My view is that in the physical reality 10 kg of coal is not the output or the input of something in absolute terms. It is just coal, i.e. an object. The fact that it is an input or an output is determined only in relative terms, i.e. in relation to another object (an activity). Coal is an output of coal production. Coal is an input to electricity production. I would instead ask this type of question: “Who is this 10 kg of coal associated with?” And what I would expect to find out is that it is the output of a coal production activity and the input of an electricity production activity.
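
As a rough illustration of the predicate-only approach, the question “Who is this 10 kg of coal associated with?” can be answered from plain subject-predicate-object triples. The identifiers below are made up, and the sketch uses ordinary Python tuples instead of a real RDF store:

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# One identifier for the coal, related to two activities by predicates.
# All identifiers are made up for this example.
triples: Set[Triple] = {
    ("coal_001", "isOutputOf", "coal_production"),
    ("coal_001", "isInputOf", "electricity_production"),
}


def associations(subject: str, graph: Set[Triple]) -> List[Tuple[str, str]]:
    """Answer "who is this flow associated with?" as (predicate, activity)."""
    return sorted((p, o) for s, p, o in graph if s == subject)
```

With this layout `associations("coal_001", triples)` returns both relations for the single coal identifier, so no separate “coal input” and “coal output” identifiers are needed.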

 

>>>for sure you could find the answer by checking  "is this the source of inputOf relationships?",

This sounds really nice IMO! I was thinking this was actually how we should find out about things. I also guess that this is a competency question? I would like to better understand why this is not  sufficiently “operational”.

 

>>>but operationally you can ask "what is the type of this? And in my view this would be the correct way to do this because something is either input or output"

I argued above that in the physical reality something is not either input or output in absolute terms. Anyway, we could certainly ask the question “what is the type of this?” with reference to whether something is classified as input or output. But if we start asking this type of question for the input vs. output classification, then, to be consistent, why not ask the same type of question for every other possible classification? For example: is something a product exchange or an environmental exchange? I could ask “Is CO2 an emission or a product?” But Bo has argued, based on the principles above, that this is not a relevant question. So why is the question relevant for input and output?

 

  2. In general I am unclear on how much we should adhere to existing LCA frameworks.

 

>>> LCA people all have their own mental model.

On one hand I agree we should keep an open mind and not be constrained by specific mental models. But on the other hand, I also understand that we are doing this for the use of “LCA people” too. I thought one of our purposes was to create an infrastructure to support LCAs (e.g. because by making specific queries one can get LCA datasets). If our purpose is to make an ontology that is valid for all models in all disciplines from economics to environmental sciences, then perhaps the terms “input” and “output” are the most generic ones (they can apply to anything from a tree to a whole country's economy) and this might be sufficient (preferably as predicates, as I argued above). However, in order to use the linked data to create some LCIs, we would need some way of separating what is A matrix (products) and what is B matrix (substances, costs, or many other things), and what is the reference flow, because this is what LCA people are used to working with. So perhaps we have to allow for the possibility of identifying this LCA-specific information. With the current ontology the “only” information we can obtain from e.g. the graph of steel production is a list of inputs and outputs. So how do I distinguish if steel is the reference flow of steel production instead of CO2?
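
To show what kind of extra information this would take, here is a sketch with two hypothetical annotations, `kind` and `is_reference`. Neither exists in the current ontology; they are assumptions made only to illustrate the separation into A matrix, B matrix, and reference flow:

```python
from typing import Dict, List, NamedTuple


class AnnotatedFlow(NamedTuple):
    name: str
    direction: str        # "input" or "output" relative to the activity
    kind: str             # hypothetical tag: "product" or "environmental"
    is_reference: bool    # hypothetical flag marking the reference flow


def split_for_lci(flows: List[AnnotatedFlow]) -> Dict[str, List[str]]:
    """Group flow names by where an LCA practitioner would expect them."""
    return {
        "reference": [f.name for f in flows if f.is_reference],
        "A": [f.name for f in flows if f.kind == "product"],
        "B": [f.name for f in flows if f.kind == "environmental"],
    }


steel_production = [
    AnnotatedFlow("steel", "output", "product", True),
    AnnotatedFlow("coal", "input", "product", False),
    AnnotatedFlow("CO2", "output", "environmental", False),
]
```

Without something like these two annotations, the list of inputs and outputs alone gives no way to tell steel from CO2 as the reference flow.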

 

 

Hope this was useful and I am looking forward to a good discussion on Friday
Massimo

 

 


Re: Competency Questions #ontology #rdf

Bo Weidema
 

Dear Miguel,

These are relevant issues. However, for the time being, we have restricted ourselves to data that reflect averages over a period of at least 1 year. This is because these are the typical data used, and a finer granularity would risk overcomplication relative to the typical data in use in the domain. Nevertheless, I believe that in the future, it will be relevant to allow more flexibility here, and I think that will also be possible without actually changing the ontology. The current restriction is not ontological, just practical.

Best regards

Bo

Den 2019-03-18 kl. 05.24 skrev mmremolona via Groups.Io:

Hi all,

Sorry for not participating as much the past few weeks. I'm trying to catch up with what everyone has said so far.

Regarding these competency questions, I guess the question that Massimo is asking concerns time scales and time windows. I'm not entirely familiar with the datasets available in the domain, but these time scales for measurements can cause some incongruity in the representation that is finally done in the ontologies. I'm not sure if the questions I ask are of the type to be included in these competency questions, but my questions are as follows:

(MR_Q1) What is the time granularity of the data that we acquire? This includes flow rates and production statistics. I also assume this varies with the different sources of data. Some data may already be averaged (Do we handle these differently?).
(MR_Q2) Are we going to aggregate data as part of the ontology specification or is this left for other parts of the pipeline? And if we are to aggregate data, to what degree and time scales? (Per hour, per day, per week? I think this depends on how often we aggregate data and what data is available. I don't think per-minute data is significant enough in the overall scheme of LCA, but I might be wrong.)

As of now, these are the questions that came to my head as I'm reading along the threads in this group. I'll post more ideas as I come across them.

Best,

Miguel Remolona
--


Re: #reproduciblemodels working group - getting organized #reproduciblemodels

Brandon Kuczenski
 

Hey folks,
I can probably make 5pm CET tomorrow (9am Pacific). 5:30 is also fine but I am sensitive to the hungry kids. I can also do pretty much any time later than 5pm, so let me know if something else is better.

Here's a zoom link-

Brandon Kuczenski is inviting you to a scheduled Zoom meeting.


Topic: reproducibility group

Time: Mar 18, 2019 9:00 AM Pacific Time (US and Canada)


Join Zoom Meeting

https://ucsb.zoom.us/j/222361622


One tap mobile

+16699006833,,222361622# US (San Jose)

+16468769923,,222361622# US (New York)


Dial by your location

+1 669 900 6833 US (San Jose)

+1 646 876 9923 US (New York)

Meeting ID: 222 361 622

Find your local number: https://zoom.us/u/awvzew4J


Join by SIP

222361622@...



On Mon, Mar 18, 2019 at 12:25 AM <miguel.astudillo@...> wrote:

For me 5.30 is fine, let's see if Brandon can make it.

 

Best, Miguel

 

From: hackathon2019@bonsai.groups.io <hackathon2019@bonsai.groups.io> On Behalf Of Massimo Pizzol
Sent: 15 March 2019 15:37
To: hackathon2019@bonsai.groups.io
Subject: Re: [hackathon2019] #reproduciblemodels working group - getting organized

 

5:30 PM is really a bad time for me (“kids are hungry”-time) but go ahead I’ll do my best and if I can’t join then amen !

 

Massimo

 

 

From: <hackathon2019@bonsai.groups.io> on behalf of "Carlos David Gaete via Groups.Io" <cdgaete@...>
Reply-To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Date: Friday, 15 March 2019 at 15.23
To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Subject: Re: [hackathon2019] #reproduciblemodels working group - getting organized

 

Hi all,

 

It would be best for me next week. Monday, Tuesday...

I think we can meet after  5pm CET so that Brandon can join us. So I propose Monday 5:30pm CET 

Regards

Carlos





Re: #reproduciblemodels working group - getting organized #reproduciblemodels

Miguel Fernández Astudillo
 

For me 5.30 is fine, let's see if Brandon can make it.

 

Best, Miguel

 



Re: Competency Questions #ontology #rdf

mmremolona@...
 

Hi all,

Sorry for not participating as much the past few weeks. I'm trying to catch up with what everyone has said so far.

Regarding these competency questions, I guess the question that Massimo is asking concerns time scales and time windows. I'm not entirely familiar with the datasets available in the domain, but these time scales for measurements can cause some incongruity in the representation that is finally done in the ontologies. I'm not sure if the questions I ask are of the type to be included in these competency questions, but my questions are as follows:

(MR_Q1) What is the time granularity of the data that we acquire? This includes flow rates and production statistics. I also assume this varies with the different sources of data. Some data may already be averaged (Do we handle these differently?).
(MR_Q2) Are we going to aggregate data as part of the ontology specification or is this left for other parts of the pipeline? And if we are to aggregate data, to what degree and time scales? (Per hour, per day, per week? I think this depends on how often we aggregate data and what data is available. I don't think per-minute data is significant enough in the overall scheme of LCA, but I might be wrong.)
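
If aggregation were left to the pipeline, as MR_Q2 suggests it might be, one possible step could look like the sketch below. The record layout, the function name, and all figures are assumptions for illustration only:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (activity, year, month, amount) records; all values are invented.
Record = Tuple[str, int, int, float]


def annual_totals(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    """Sum sub-annual amounts into one figure per activity and year."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for activity, year, _month, amount in records:
        totals[(activity, year)] += amount
    return dict(totals)


monthly = [
    ("coal production", 2018, 1, 8.0),
    ("coal production", 2018, 2, 12.0),
    ("coal production", 2019, 1, 10.0),
]
```

In this sketch the finer-grained records stay available upstream, and only the aggregated annual figures would need to be represented downstream.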

As of now, these are the questions that came to my head as I'm reading along the threads in this group. I'll post more ideas as I come across them.

Best,

Miguel Remolona


Re: #softwaremethods Python library skeleton #softwaremethods

Stefano Merciai
 

Dear Brandon and Tomas,

I am sorry but I have withdrawn from the group to better focus on other issues.

Best,

SM


On 14/03/2019 23:28, Chris Mutel wrote:
Dear Brandon, Stefano, and Tomas:

As I did not see much movement from your working group, I have done the following:

Please follow up and complete these deliverables, as people will be reliant on them from the start of the hackathon.

-- 
Best,
S.


Re: #reproduciblemodels working group - getting organized #reproduciblemodels

Massimo Pizzol
 

5:30 PM is really a bad time for me (“kids are hungry”-time) but go ahead I’ll do my best and if I can’t join then amen !

 

Massimo

 

 



Re: #reproduciblemodels working group - getting organized #reproduciblemodels

Carlos David Gaete <cdgaete@...>
 

Hi all,

It would be best for me next week. Monday, Tuesday...
I think we can meet after  5pm CET so that Brandon can join us. So I propose Monday 5:30pm CET 
Regards
Carlos


Re: #reproduciblemodels working group - getting organized #reproduciblemodels

Massimo Pizzol
 

Next week better for me, Monday Tuesday Wednesday. Doodle perhaps?

BR
Massimo

 

From: <hackathon2019@bonsai.groups.io> on behalf of "miguel.astudillo via Groups.Io" <miguel.astudillo@...>
Reply-To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Date: Friday, 15 March 2019 at 14.57
To: "hackathon2019@bonsai.groups.io" <hackathon2019@bonsai.groups.io>
Subject: Re: [hackathon2019] #reproduciblemodels working group - getting organized

 

Dear all

 

I am also available today from 1pm “GMT-4” (I think 10 am Pacific time), although talking today may be a bit of a stretch.

 

Otherwise at any time next week at “CET times” (not before 8am CET please !)

 

Best

 

Miguel F. Ast.

 

 

 

 

 

From: hackathon2019@bonsai.groups.io <hackathon2019@bonsai.groups.io> On Behalf Of Brandon Kuczenski
Sent: 14 March 2019 06:09
To: hackathon2019@bonsai.groups.io
Subject: Re: [hackathon2019] #reproduciblemodels working group - getting organized

 

OK, I added a basic overview to the main README and added you all as contributors to the reproducibility repo. Please commit with wild abandon.

Regarding a call- I am free after 9am US pacific (I think 5pm CET) Thursday or Friday- not unfortunately Thursday night / Friday morning. Another possibility is to use github issues and comments for discussion. First person to open a new issue gets a cigar.

-Brandon


Re: #reproduciblemodels working group - getting organized #reproduciblemodels

Miguel Fernández Astudillo
 

Dear all

 

I am also available today from 1pm “GMT-4” (I think 10 am Pacific time), although talking today may be a bit of a stretch.

 

Otherwise at any time next week at “CET times” (not before 8am CET please !)

 

Best

 

Miguel F. Ast.

 

 

 

 

 
