
Opening a second work track for BONSAI #dataliberation


 

I would like to propose we open a second work track for BONSAI: data liberation. The goal is to allow for more work to be done on interesting research questions and practical challenges in a distributed and democratic fashion.
 
The problem
 
We currently have ambitions for an all-encompassing database using an ontology and tool chain, both still under development. This plan has a lot of ideas that are, in my opinion, fundamentally correct, including the use of different kinds of data to develop inventories, storing information in its original form and transparently documenting any transformations, and including actors as fundamental bits of data along with flows and activities. However, the relatively slow progress, specific tool chain, and lack of widespread expertise at the intersection of tools and concepts make it difficult and frustrating for most BONSAI participants to make real contributions - and essentially impossible for outsiders! You can see this clearly in the commit stream to the BONSAI repositories.
 
At the same time, there are stupid barriers to doing industrial ecology and LCA work where BONSAI could make a substantial contribution in the very short term, such as common standards for nomenclature, improved data formats, conversion tools, small parametric models for activities, science-based algorithms for merging and reconciling data, and tools for extracting data from published LCA studies.
 
Finally, BONSAI seems to exist as a separate pillar or project, working to some degree with others (e.g. the presentations seen or to come from US EPA, HESTIA, TRASE), but still as its own "team." This is probably inevitable to some degree, but such separation means some duplication of effort, as well as effort on many sides to translate across data formats or ontologies. There is enough tribalism in the world already, and, again in my opinion, we are too willing to let the perfect be the enemy of the good enough.
 
A different vision
 
  • BONSAI is the entire LCA/industrial ecology community (or maybe vice-versa). BONSAI reflects a number of different subject domains and methodological approaches, and doesn't try to fit everything into its own shoebox.
  • Use open stuff already being worked on by others! No more wheels needed. BONSAI is not an individual tree, but an aspen community (but without the cloning part :).
  • Prioritize efforts during and outside of hackathons based on broad input from the community. Incrementalism instead of revolution.
  • Widely used, understood, and damn simple tech choices. See work of Stefan Pauliuk [0] for inspiration.
  • No religious debates on tiny details. Need proof of importance before starting widespread discussion, and big choices can only be made through the BEP process [1], not by a few people in a room somewhere.
 
Next steps
 
Let's talk about it - this is only my perspective, and this proposal (and indeed, BONSAI as a whole) doesn't work without many people participating. Discussion as usual on the mailing list (to keep an open record) under the hashtag #dataliberation.
 
Personally, given my limited time and energy, I will no longer be developing the BONSAI semantic web database work track.
 
[0] http://www.database.industrialecology.uni-freiburg.de/
[1] https://github.com/BONSAMURAIS/enhancements/blob/master/beps/0002-bonsai-project-community-governance-structure.md


 

Hi all,

Disclaimer: I'm new to BONSAI. I know nothing about ontologies. And I have probably spent less than 30 minutes reading the available documentation. I guess I qualify as an interested outsider so please take my statements as such.

When I first heard about BONSAI, I was quite excited. I understood it as an open-source, collaborative LCA database, which I really believe is needed right now in our field (there are millions of studies and sources for LCA data but they are scattered and sometimes inaccessible). However, my excitement was dampened when I visited your project site for the first time. Everything seemed to be about RDFs, ontologies, triples... I thought to myself that all this stuff is probably necessary to build the all-encompassing database that you guys want to build. But I know nothing about it. And because I don't understand it, there is nothing I can do to help. I am an LCA practitioner with an engineering background and solid programming skills. I thought I was your audience. But after visiting your website I was not so sure. I think Chris' proposal could be a way to include guys like me in your process. If that's what you want.

Cheers,
Ben


Massimo Pizzol
 

Dear Chris

 

>> I would like to propose we open a second work track for BONSAI: data liberation. The goal is to allow for more work to be done on interesting research questions and practical challenges in a distributed and democratic fashion.

 

Concrete examples? I have a hard time understanding what you have in mind.

 

Dear Ben

 

>> I thought I was your audience. But after visiting your website I was not so sure.

What was your expectation? What exactly did you expect to find on our website? Perhaps you are not our audience right now. Right now there is a lot of focus on the ontology because it is a fundamental piece of the BONSAI puzzle and will allow us to do new, ground-breaking things later on. This is my understanding and vision, at least. I agree that the work is progressing slowly for various reasons (including a lack of know-how). You don’t necessarily have to contribute to the ontology if you don’t understand it or it doesn’t sound interesting. Based on what Chris suggests above, there might be other types of BONSAI-related work that you can contribute to in the near future.

 

Massimo

 


 

Dear Massimo,

>> What was your expectation? What exactly did you expect to find on our website?
Something remotely related to LCA ;) like process models, LCIA methods, or data to build such. I expected that BONSAI was a platform for such things.

>> You don’t necessarily have to contribute to the ontology if you don’t understand it or it doesn’t sound interesting.
It sounds quite interesting, as a matter of fact. But right now I have no capacity to dig into complex new topics like that. Maybe I can take time off work for your next hackathon to learn more about it and contribute in that direction.

>> Based on what Chris suggests above, there might be other types of BONSAI-related work that you can contribute to in the near future.
Exactly! That's why I like Chris' proposal :)

Cheers,
Ben


Miguel Fernández Astudillo
 

Hi

As I see it, there is room for everything. The ontology track is quite revolutionary, but slow to implement. It is rare to find somebody with an understanding of ontologies, a vision for industrial ecology, substantial coding skills and free time. The slowness demotivates and can make the whole thing stall. 

However, there are incremental tasks that can have a more immediate result and are still useful for the development of an open-source, well-structured database. For example (and at the risk of being monothematic), correspondence tables in CSV are an intermediate result of the ttl versions and can be used by many researchers to ease their work and avoid duplication of effort. I'd love to improve and document the current implementation of EXIOBASE in Brightway (so I can use it in my day-to-day work!). Or contribute to a data format that can be used to share foreground LCA models (referenced to background data and their URIs).
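
To make the correspondence-table idea concrete, here is a minimal sketch of how a CSV correspondence table could be used to aggregate values from one classification to another. The column names (`source_code`, `target_code`), the codes, and the numbers are all invented for illustration; they do not come from any BONSAI table or real classification, and the sketch only handles many-to-one mappings.

```python
import csv
import io

# Hypothetical correspondence table: codes and labels are invented
# for illustration and do not come from any real classification.
CORRESPONDENCE_CSV = """source_code,target_code
A01,crops
A02,crops
B05,mining
"""


def load_correspondence(text):
    """Return a dict mapping each source code to its target code."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["source_code"]: row["target_code"] for row in reader}


def aggregate(values, mapping):
    """Sum values keyed by source code into totals keyed by target code."""
    totals = {}
    for code, amount in values.items():
        target = mapping[code]  # raises KeyError for unmapped codes
        totals[target] = totals.get(target, 0.0) + amount
    return totals


mapping = load_correspondence(CORRESPONDENCE_CSV)
totals = aggregate({"A01": 2.0, "A02": 3.0, "B05": 1.5}, mapping)
print(totals)
```

One-to-many correspondences would need explicit splitting weights, which this sketch leaves out.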

In summary, I am all in for incrementalism but with a vision of where we want to go.

best,

Miguel








On Tue, 25 Feb 2020 at 11:29, Benjamin W. Portner <benjamin.portner@...> wrote:
>> Dear Massimo,
>> [...]


 

I have started a repo with the BONSAI ontology (to the best of my understanding) as a relational database schema here: https://github.com/BONSAMURAIS/schema (just a beginning, there are still issues!). I have also tried to label the GitHub repos with the topics "relational" and "semantic" to distinguish these two work tracks.

I think there are a number of people that want to start doing things with data, and this would be a "quick and somewhat dirty" way to get some data into people's hands. In particular, we still have quite some work to do on preparing data for linking, and on finding a consensus system model.
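
As a rough illustration of what "ontology as a relational schema" can mean, the sketch below builds three tables for the concepts mentioned earlier in this thread (actors, activities, flows) in an in-memory SQLite database. The table names, column names, and sample rows are assumptions made for this example only; they are not taken from the BONSAMURAIS/schema repository.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only and
# are not taken from the BONSAMURAIS/schema repository.
SCHEMA = """
CREATE TABLE actor (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE activity (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    actor_id INTEGER REFERENCES actor(id)
);
CREATE TABLE flow (
    id INTEGER PRIMARY KEY,
    activity_id INTEGER NOT NULL REFERENCES activity(id),
    flow_object TEXT NOT NULL,
    amount REAL NOT NULL,
    unit TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Invented sample data: one actor, one activity, one flow.
conn.execute("INSERT INTO actor (id, name) VALUES (1, 'Example Steel Co.')")
conn.execute(
    "INSERT INTO activity (id, name, actor_id) VALUES (1, 'steel production', 1)"
)
conn.execute(
    "INSERT INTO flow (activity_id, flow_object, amount, unit) "
    "VALUES (1, 'steel', 1.0, 'kg')"
)

# Join the three tables to read a flow together with its activity and actor.
rows = conn.execute(
    """
    SELECT actor.name, activity.name, flow.flow_object, flow.amount, flow.unit
    FROM flow
    JOIN activity ON flow.activity_id = activity.id
    JOIN actor ON activity.actor_id = actor.id
    """
).fetchall()
print(rows)
```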

Data that could be added now

Storing data

We need a policy here. I don't really think it makes sense to import EXIOBASE for each commit, or at least not yet, but we still need to know that our data won't disappear. Storing a copy in e.g. Zenodo would perhaps be sensible - discuss.

Preparing data for linking

To link, we need to choose the correct temporal, spatial, and activity scale. We can't just pick randomly (this methodology already exists :), so we should be creative. Finding where differences matter is always nice. My expectation is that this processed data would be entered into a new database, though this could change. Plus, of course, we need data reconciliation! This is non-trivial; multiple people are writing PhD theses on it.

System modelling

We need software to implement system constructs. We can choose existing IO ones for now; we just need something. Someone should check whether it is possible to adapt mojo or whether it would make more sense to start over. Mojo is very table-focused; perhaps the ocelot approach, where data is stored as lists of dictionaries instead of in tables/arrays, is more sensible.
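
To show the difference between the two storage styles, here is a small sketch in plain Python. It keeps activity datasets as a list of dictionaries and derives a dense flow-by-activity table from them only when one is needed. The field names (`name`, `exchanges`, `flow`, `amount`) and the numbers are invented for this example, and nothing here is meant to reproduce the actual internals of mojo or ocelot.

```python
# Hypothetical activity datasets stored as a list of dictionaries.
# Field names and numbers are invented for illustration.
datasets = [
    {
        "name": "steel production",
        "exchanges": [
            {"flow": "steel", "amount": 1.0},
            {"flow": "electricity", "amount": -0.5},
        ],
    },
    {
        "name": "electricity production",
        "exchanges": [
            {"flow": "electricity", "amount": 1.0},
        ],
    },
]


def to_table(datasets):
    """Build a dense flow-by-activity table from the list of dictionaries."""
    # Rows are the sorted set of flow names; columns follow dataset order.
    flows = sorted({exc["flow"] for ds in datasets for exc in ds["exchanges"]})
    row_index = {flow: i for i, flow in enumerate(flows)}
    table = [[0.0] * len(datasets) for _ in flows]
    for col, ds in enumerate(datasets):
        for exc in ds["exchanges"]:
            table[row_index[exc["flow"]]][col] += exc["amount"]
    return flows, table


flows, table = to_table(datasets)
print(flows)
print(table)
```

The list-of-dictionaries form makes it easy to attach extra fields to a single dataset or exchange, while the table form is what matrix-based system models typically consume.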