
Re: Two votes - please participate!

Massimo Pizzol
 

DCAFBEGH

"The 'flow' of an 'item' from/to an 'activity'" 
Is quite close to the way we colloquially talk about these things IMO, e.g. "10 kg of carbon dioxide from electricity production". I know 'item' might appear too generic, but we have decided that we don't want to accept the mental limitations of predefined categories (e.g. 'product', 'emission', etc.), so the chosen term actually has to be quite generic to allow us to identify almost everything...


Re: Two votes - please participate!

Carlos David Gaete <cdgaete@...>
 

AFCED


On Mon, 8 Apr 2019 at 15:54, Chris Mutel <cmutel@...> wrote:
FYI:

1. The vote on BEP 1 is trending towards acceptance; the voting will stop if two more people participate and approve.

2. We currently have 5 votes in our nomenclature discussion. Here are the average ranks:
  • A 2.0
  • B 4.2
  • C 2.8
  • D 2.6
  • E 5.2
  • F 4.2

I have updated the table with the alternatives suggested by Miguel R. You are, of course, allowed to alter your votes if you want.


Re: Two votes - please participate!

romain
 

DCABFGHE

/Romain



Re: #bonsamurai.github.io

 

Maybe easier to split it up into actual URLs:

Note that the following is just one possibility, and will be changed now and in the future. Our aim is to make such changes easy.

bonsai.uno
  • Homepage
  • Vision (short)
    • Common ontology for LCA, MFA, and IE
    • Open data pipeline
  • By the community, for the community
    • Getting started guide
    • GH projects repo
  • Should be short, more of an appetizer than a meal, with links to more documentation

bonsai.uno/ontology
  • Introduction to core concepts of the ontology, starting with a gentle introduction to linked data
  • Ends with links to other docs/visualization for complete ontology
  • Target audience is people who have never heard "RDF" before

bonsai.uno/data-pipeline
  • Subway-style map with the different data processing steps, and the accompanying repositories / web resources
  • Target audience is people who are used to using the "Excel hammer"

bonsai.uno/getting-started
  • Brief page with links to more specific getting-started guides. Help people decide what getting started guide is right for them.
  • Could also contain a toolkit, like http://toolbox.schoolofdata.ch/

bonsai.uno/getting-started/contribute-data

bonsai.uno/getting-started/our-api

bonsai.uno/getting-started/others as we develop

bonsai.uno/community
  • Community management philosophy
  • Links to BEPs

bonsai.uno/FAQs
  • FAQs to be populated. 
    • How is BONSAI different than other LCA databases?
    • How can I contribute?
    • Who is behind BONSAI?
    • What is the relationship between the project and the NPO?
    • Is anyone paid to work on BONSAI?

bonsai.uno/NPO
  • Archive of official documents
  • Become a member

To do:
  • Look into CSS classes used (everything necessary is in the repo already), decide if we want to keep using SASS as CSS preprocessor, create some more classes (and maybe more meaningful labels for common layouts). Write up brief notes on using the CSS to get what you want.
  • Write some sample content for 1-2 pages, esp. data flow, homepage, and ontology
    • Then do some layout with bright, colorful, and simple graphs (e.g. for links between ontology concepts).


Re: 5.4.19 Catch-up meeting minutes and next meeting planning

 

Repo overview attachment


5.4.19 Catch-up meeting minutes and next meeting planning

 

Next catch-up meeting

We will have another catch-up meeting on 12.4.19 at 15:00 CEST, and then skip the next week (19.4.19) due to Easter holidays.

5.4.19 Catch-up meeting minutes

Correspondence tables
 
Started https://github.com/BONSAMURAIS/grafter tool to change 1-1 CSVs to RDF with actual predicates
Added some new tables, and metadata to existing tables
Priority is EXIOBASE - ENTSO-E, as we need this for the first proof-of-concept deliverable
Data cleaning/conversion now is laborious and manual, need a better way. See an example here: https://github.com/BONSAMURAIS/Correspondence-tables/blob/master/scripts/from_raw_to_clean_tables.ipynb
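
As a rough illustration of what turning a 1-1 correspondence CSV into RDF with an explicit predicate could look like, here is a minimal sketch. It is not the grafter implementation: the column names, the example.org base URIs, the codes in the sample table, and the choice of skos:exactMatch as the predicate are all assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces; the real BONSAI URIs may differ.
SOURCE_BASE = "http://example.org/exiobase/"
TARGET_BASE = "http://example.org/entsoe/"
# Assumed predicate for a 1-1 match between two classifications.
PREDICATE = "http://www.w3.org/2004/02/skos/core#exactMatch"


def csv_to_ntriples(csv_text, source_col="source", target_col="target"):
    """Yield one N-Triples line per row of a 1-1 correspondence table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Percent-encode the codes so spaces etc. cannot break the IRI.
        subject = SOURCE_BASE + quote(row[source_col].strip(), safe="")
        obj = TARGET_BASE + quote(row[target_col].strip(), safe="")
        yield f"<{subject}> <{PREDICATE}> <{obj}> ."


# Made-up sample table, only to show the shape of the input.
example = "source,target\nA_COAL,B01\nA_GAS,B04\n"
triples = list(csv_to_ntriples(example))
for line in triples:
    print(line)
```

Because the function is a generator, a large table can be written out row by row instead of being built in memory first.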
 
Communication
 
Need a clean and prominent place to summarize existing repos, their functions, and their interdependencies (one possible overview from Tom Millross is attached)
Could be on wiki or bonsai README
bonsai.uno website rework is starting, repo here: https://github.com/BONSAMURAIS/bonsai.uno
 
Ontology
 
Discussion on nomenclature is ongoing, with several creative solutions proposed
Adaptation of existing probability ontology is difficult due to all examples being XML; volunteers to help adapt this ontology please contact Agneta
Move away from JSON-LD and towards Turtle as default exchange format for RDF data, due to readability and ease of programming
 
System model / calculation interface
 
Work and documentation are proceeding after the hackathon, such as procedures for dis/aggregation (e.g. aggregating different types of gas with different calorific values)
REST endpoints to be defined and documented
 
Outreach
 
Miguel A. will attend https://forum.openmod-initiative.org/t/aarhus-2019-workshop/1126
Those attending LCM 2019 will hold an outreach event, with organizing and content support from others in the BONSAI team
 


Re: #ontology Can we come up with a better term than "Flow Object"? #ontology

mmremolona@...
 

Hi all,

My philosophy on naming in ontologies revolves not around the simplicity of the terms used but around how they sound when you talk about them in normal conversation. Does it sound awkward or normal? On the terms of the ontology:

Flow -> Right now this term is used to refer to the transfer of material or objects from an activity (as an output) to another activity (as an input), thereby connecting these two activities. In my opinion, changing this term to exchange does not affect the overall understanding of the ontology. Either would work. I can have a material flow from one activity to another activity.

Flow-object -> This is defined as an object that is referenced in a flow. Many flows can reference a single instance of a flow-object. I think this is where confusion may set in, as a flow-object can be imagined as an instance of flow. And I agree with Chris that this doesn’t sound right when talking about it. It just doesn’t seem natural to mention a flow object.

I don’t think flow itself works here as the word flow doesn’t equate to any object or material.

For the idea regarding using the term “thing”, everything in any ontology is a subclass of owl:Thing, at least according to the specifications of w3c, so this is redundant and may lead to confusion.
Regarding flow-item, while this seems like a good idea, I generally associate the term item to something that I can itemize or count. Steel, copper, coal, and all the other things used don’t have a problem. However, for CO2, water, steam, etc., this doesn’t seem like a good term to use.

My initial idea to fix this is to make flow an adjective, as in Flowing-Object. However, this doesn’t sound right in everyday language either. My previous argument about flow-item would then be reversed: coal, steel, and other solid objects do not necessarily flow.

My secondary idea involves using the term Exchanged-Object. This is not necessarily related to the first term flow, but both can be adapted so that it sounds more congruent overall. This also sounds better as the question that arises from it sounds better in English (e.g. What’s the exchanged-object between the two activities you mentioned? In this flow, what’s the exchanged-object?)

TLDR:
Flow -> “Exchange” or retain “Flow”
Flow-object -> “Exchanged-Object”

 

Best,

 

Miguel Remolona


Re: #ontology Can we come up with a better term than "Flow Object"? #ontology

Andreas Ciroth
 

Dear all,

interesting. As part of the discussion you may want to consider also the JSON-LD format names:

http://greendelta.github.io/olca-schema/

In my view, process, flow, exchange is most commonly used (used in “our” JSON-LD format and in ILCD) and it is not too bad (meaning: short, not misleading; it is good to distinguish flows from exchanges). Yes, process, flow, and also exchange can each be a noun and a verb, but this is common in the English language. So, given that there are really lots of things to do in LCA, data availability, and LCA ontologies, it is maybe good to stick with this. Or invent something really different. Point, line, square, e.g., would be different, for flow, exchange, process.

All the best!

Andreas

 

From: main@bonsai.groups.io <main@bonsai.groups.io> On Behalf Of Elias Sebastian Azzi
Sent: Sunday, 7 April 2019 23:18
To: main@bonsai.groups.io
Subject: Re: [bonsai] #ontology Can we come up with a better term than "Flow Object"?

 

Hello,

After reading that long email thread, I wrote a summary of the different views expressed. I also summarise an article that describes another ontology for IE, with rather different vocabulary, hoping it will help us see the ontology from a different perspective.

 

 

Alpha / Summary

 

Issue - Human vocabulary for BONSAI's core ontology

While there seems to be an agreement among the participants around the three core classes of the ontology (i.e. on their conceptual meaning), there is not yet a consensus on how these classes should be named in human readable language. There is however an agreement on the fact that the vocabulary used during the hackathon 2019 is not ideal. Most of the controversy lies in the term "flow object". This issue seems of high importance because it affects how people perceive the ontology, understand it and decide whether to take it up or not.

 

Below, we summarise the different views/suggestions on that issue, pros, cons and remarks.

 

V1. [Chris] "Flow object" is not consistent with the other terms of the ontology and is hard to relate to. The alternative "item" is suggested.

Pro: definition of item is "an individual article or unit, especially one that is part of a list, collection, or set" which fits in the concept.

Pro: it echoes usage in computer science and mathematics

Remark: activities are also part of a list/collection/set, according to that definition activities are also items of a collection of activities.

 

V2. [Chris] "Flow" is good but has no natural counterpart. An alternative for "flow" could be "exchange".

 

V3. [Agneta] Return to the published LCA ontology (Kuczenski et al. 2016), with the three terms Activity (a thing that happens), Flow (a thing in the world that exists because of some instance of an Activity), and Exchange (an established relationship between an activity instance and a flow instance).

Pro: (to verify) coherence with the vocabulary used by most industrial ecologists / (disagreement) in (1) the authors argue that terminology is not consistent among industrial ecologists, even for basic definitions.

(1) Pauliuk, S.; Majeau-Bettez, G.; Müller, D. B.; Hertwich, E. G. Toward a Practical Ontology for Socioeconomic Metabolism. J. Ind. Ecol. 2016, 20 (6), 1260–1272; DOI 10.1111/jiec.12386.

 

V4. [Rutger] In ecospold1, only exchanges are defined. In ILCD data formats, both exchanges and flows (i.e. flow objects) are defined. Environmental compartments are specified. In the SimaPro platform, Flows do not include compartments, as in the Bonsai hackathon version. Exchange is not yet used, but is considered. At PRé, flow-objects are of two types, substances and products, but this is not perfect.

Con: flow and exchange are both dynamic terms

 

V5. [Matteo] Flow and Flow-object in the post-hackathon ontology are clear and well defined: they relate the Flow and the Object of the Flow (aka the Flow Object). In other words, by keeping the word "flow" in both definitions their link and subtle difference is kept explicit and forces the new-comer to think twice about these definitions.

Pro: all terms can be confusing; the advantage of Flow and Flow-object is that the difficulty is not hidden behind different terms, so misunderstandings cannot slip in unnoticed.

 

V6. [Bo] The vocabulary we use needs to distinguish between "the observation of a specific flow (22 kg input of steel) and the abstract flow-object (steel)".

 

V7. [Agneta] "hackathon vocabulary" -> "new vocabulary"

Flow-object => Flow

Flow => Exchange

 

Long List of Terms:

Flow object, entity, object, flux, item, thing, element, substance, component, Noumenon, Flow-item, commodity

Flow, Exchange, Phenomenon

Activity
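
To make the structure behind these terms concrete, here is a small sketch of the three core classes as plain Python data classes, built around the distinction from V6 between one observed flow ("22 kg input of steel") and the abstract flow object ("steel"). This is only an illustration, not the BONSAI ontology itself: the class names follow the hackathon vocabulary, while the field names, the two activity names, and the second amount are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowObject:
    """The abstract thing being exchanged, e.g. 'steel'."""
    name: str


@dataclass(frozen=True)
class Activity:
    """Something that happens, e.g. a production process."""
    name: str


@dataclass(frozen=True)
class Flow:
    """One observation: an amount of a flow object entering or leaving an activity."""
    flow_object: FlowObject
    activity: Activity
    amount: float
    unit: str
    is_input: bool


steel = FlowObject("steel")
car_making = Activity("car manufacturing")       # hypothetical activity
bike_making = Activity("bicycle manufacturing")  # hypothetical activity

# Many flows can reference the same flow object.
flows = [
    Flow(steel, car_making, 22.0, "kg", is_input=True),
    Flow(steel, bike_making, 3.0, "kg", is_input=True),  # made-up amount
]

for flow in flows:
    direction = "input to" if flow.is_input else "output of"
    print(f"{flow.amount} {flow.unit} of {flow.flow_object.name}, {direction} {flow.activity.name}")
```

Renaming the classes (for example to Item, Exchange, or Exchanged-Object) would not change this structure, which is why the disagreement is about vocabulary rather than about the model.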

 

-------

 

 

Bravo / Looking at it from a different angle

 

This being said, I would like to add to the discussion the following points:

  • Matteo has a point: by using the word “flow” twice (in flow and flow-object) we keep the complexity explicit.

 

  • We seem to agree on the structure, but finding the right words for human communication is tricky: do we have to choose? In the end, examples speak for themselves. We will choose a term now, but we can keep the list of alternatives: the list helps clarify things!

 

  • Do we actually agree on the structure? Your discussions forced me to re-open that article by Pauliuk and co: they have the same goal as Bonsai, performed a review of all IE fields, and (wait for it) came up with a totally different wording. I would say that it is one level of abstraction higher than the current Bonsai ontology, and rather stimulating to read. Here are some highlights:
    • Many inconsistencies of vocabulary and definitions exist within IE and even within certain fields, e.g. LCA
    • Industrial ecologists describe socioeconomic metabolism by a bipartite directed graph (i.e. SUTs) or directed graph
    • Five key definitions:

Definition 1, Sets: A set is a collection of distinct objects

Definition 2, Hierarchical, mutually exclusive and collectively exhaustive (H-MECE) object classification: An H-MECE object classification is a grouping of a given set of objects into an H-MECE collection of sets.

Definition 3, Stock: A stock is a set of objects of interest.

Definition 4, Process: A process is a set-based description of one or several events of interest, expressed in terms of the objects of interest that are involved in these events during their course.

Definition 5, Flow: A flow is a description of a particular type of event, where objects are preserved and move from one set a to another set b.

    • It sounds very different, but when you read the article in detail, all the issues we face are somehow discussed, including how to handle the properties of objects of interest (see Figure 2)
    • Definition 2 is of interest for the correspondence table group
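
For the correspondence table group, Definition 2 can be read as a property that is easy to test. The sketch below checks whether a classification is mutually exclusive and collectively exhaustive at every level of a hierarchy. It is an illustration under assumptions, not something taken from the article: the function names, the nested-dict representation, and the example class names are invented here.

```python
def is_mece(objects, groups):
    """True if `groups` are pairwise disjoint and together cover exactly `objects`."""
    seen = set()
    for group in groups:
        group = set(group)
        if group & seen:          # overlap: not mutually exclusive
            return False
        seen |= group
    return seen == set(objects)   # collectively exhaustive, nothing extra


def members(node):
    """All objects below a node; a node is either a set (leaf) or a dict of sub-classes."""
    if isinstance(node, dict):
        return set().union(*(members(child) for child in node.values()))
    return set(node)


def is_hmece(objects, tree):
    """True if every level of the nested classification `tree` is MECE for its parent."""
    if not isinstance(tree, dict):
        return True
    children = list(tree.values())
    if not is_mece(objects, [members(child) for child in children]):
        return False
    return all(is_hmece(members(child), child) for child in children)


objects = {"coal", "natural gas", "steel", "copper"}
good = {
    "fuels": {"solid": {"coal"}, "gaseous": {"natural gas"}},
    "metals": {"steel", "copper"},
}
overlapping = {"fuels": {"coal", "natural gas", "steel"}, "metals": {"steel", "copper"}}
incomplete = {"fuels": {"coal", "natural gas"}, "metals": {"steel"}}

print(is_hmece(objects, good))
print(is_hmece(objects, overlapping))
print(is_hmece(objects, incomplete))
```

A correspondence between two classifications is much easier to reason about when both sides pass a check of this kind.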

 

 

Best regards

Elias

 

From: main@bonsai.groups.io <main@bonsai.groups.io> On Behalf Of Chris Mutel
Sent: 5 April 2019 12:46
To: main@bonsai.groups.io
Subject: Re: [bonsai] #ontology Can we come up with a better term than "Flow Object"?

 

I added a table with what I could make of the existing systems, and the possible alternatives we have discussed, here: https://github.com/BONSAMURAIS/BONSAI-ontology-RDF-framework/blob/master/Terminology-discussion.md. Feel free to edit this if you think I have made a mistake.

> To re-iterate: Flow is a verb

Flow can be a verb or a noun, and there is something to be said for having all the core terms be nouns (I think everything else is).


Re: Two votes - please participate!

Elias Sebastian Azzi
 

ADCFBE  is my current preference.

 

Best regards

Elias

 

From: main@bonsai.groups.io <main@bonsai.groups.io> On Behalf Of Matteo Lissandrini (AAU)
Sent: 7 April 2019 17:44
To: main@bonsai.groups.io
Subject: Re: [bonsai] Two votes - please participate!

 

AFDCEB

 


From: main@bonsai.groups.io [main@bonsai.groups.io] on behalf of Massimo Pizzol via Groups.Io [massimo@...]
Sent: Sunday, April 07, 2019 5:19 PM
To: main@bonsai.groups.io
Subject: Re: [bonsai] Two votes - please participate!

DCAFBE



Re: Two votes - please participate!

Matteo Lissandrini (AAU)
 

AFDCEB


From: main@bonsai.groups.io [main@bonsai.groups.io] on behalf of Massimo Pizzol via Groups.Io [massimo@...]
Sent: Sunday, April 07, 2019 5:19 PM
To: main@bonsai.groups.io
Subject: Re: [bonsai] Two votes - please participate!

DCAFBE


Re: Two votes - please participate!

Massimo Pizzol
 

DCAFBE


#bonsamurai.github.io

romain
 

Hey, I am starting a discussion here on the new bonsai.uno webpage.

Here is the structure suggested by Chris.

  1. Vision (short)
    1. Common ontology for LCA, MFA, and IE
    2. Open data pipeline
  2. By the community, for the community
    1. Getting started guide
      1. Basic technologies
        1. Contribute with data
        2. Build web apps
        3. Using the API
    2. GitHub projects repo
  3. Community management
  4. Data reconciliation
  5. NPO (BONSAI non-profit organization)
    1. Become a member
    2. Archive of official documents

Did I get the hierarchy right?


Two votes - please participate!

 

Dear all-

1. If you haven't voted for or against BEP 1, please do it now! If not enough people participate, the proposal will automatically fail.

2. We have had a lively discussion on the terminology used in the ontology, and have several different options before us. It would be nice to get a sense of the broader group's preferences through an indicative, though not necessarily binding, vote. When multiple options are present, ranked choice voting (in this case in the form of instant runoff) is a decent polling choice. So please visit the list of candidates: https://github.com/BONSAMURAIS/BONSAI-ontology-RDF-framework/blob/master/Terminology-discussion.md, and reply to this email with your preferences in order by letter, from first to last. For example, here are my personal preferences:

BDACFE

Please rank all six possibilities, so we can get complete statistics.
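
Ballots in this letter format are easy to tally. The sketch below computes an average rank per option, the statistic reported in the follow-up message (note that this is a simpler summary than an instant runoff count). It is only a sketch: the function name is made up, and the sample data are the four six-letter ballots posted in this thread, not the five votes behind the reported averages, so the numbers differ.

```python
from collections import defaultdict


def average_ranks(ballots):
    """Map each option letter to its mean position (1 = first choice) over the ballots ranking it."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for ballot in ballots:
        for position, option in enumerate(ballot.strip().upper(), start=1):
            totals[option] += position
            counts[option] += 1
    return {option: totals[option] / counts[option] for option in totals}


# The four six-letter ballots posted in this thread.
ballots = ["BDACFE", "DCAFBE", "AFDCEB", "ADCFBE"]
ranks = average_ranks(ballots)

# Best (lowest) average rank first; ties are broken alphabetically.
for option, rank in sorted(ranks.items(), key=lambda item: (item[1], item[0])):
    print(option, rank)
```

Averaging only over the ballots that actually rank an option is a design choice; incomplete ballots would otherwise need an explicit rule for the unranked options, which is one reason to ask everyone to rank all candidates.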


Re: BEP-0004 BONSAI knowledge management and communication strategy | open for discussion / seeking editor

 

I have created a bonsai.uno repo, which we need to fill out, to eventually replace the existing content of the website (this is included in BEP 4). The current website structure looks like:

Homepage
    Challenge and vision
    Organization
        Static downloads
    Strategy
        Many working group pages
    Archive
        Static downloads
    Become a member
        Contributions

Here is the beginning of a new layout which emphasizes our concepts and work methods. I really think that the web page will be better for documentation than the wiki, as we can control the presentation more, and add a little white space so we don't have the "wall of text" effect. See the proposed BEP4 for a discussion of how best to use the different communication media.

Homepage
    Vision (short)
        -> Common ontology for LCA, MFA, and IE
        -> Open data pipeline
    By the community, for the community
        -> Getting started guide
        -> GH projects repo

    Common ontology

    Data pipeline

    Getting started guide
        Basic technologies

        -> Contribute data
        -> Build web apps
        -> Using the API

    Community management

    Data reconciliation

    NPO (BONSAI non-profit organization)
        Become a member
        Archive of official documents

One possible way to separate the content from the presentation is to store the text with some simple markup (e.g. Markdown) in a separate directory.

@agneta and @romain, let's discuss how we can each participate. Perhaps we could start by better planning an outline, and writing down what we want to accomplish. Feel free to provide your thoughts and concerns.



Re: Serializing large LD datasets

Massimo Pizzol
 

No opinion here, I trust those who have already worked hands-on on this, and their choice.

BR
Massimo

 

From: <main@bonsai.groups.io> on behalf of "Agneta via Groups.Io" <agneta.20@...>
Reply-To: "main@bonsai.groups.io" <main@bonsai.groups.io>
Date: Friday, 5 April 2019 at 10.56
To: "main@bonsai.groups.io" <main@bonsai.groups.io>
Subject: Re: [bonsai] Serializing large LD datasets

 

+1 for turtle format

Much easier to read and write. 


Re: Serializing large LD datasets

Agneta
 

+1 for turtle format

Much easier to read and write. 


Re: Serializing large LD datasets

Miguel Fernández Astudillo
 

Hi!

 

In the correspondence table group we struggled a bit when we had to move from Turtle to JSON-LD. We spent some time trying to figure out how to do it in JSON and ended up writing Turtle. We found it easier to write and read, and we were told there was code to translate one to the other automatically. I prefer Turtle, but I am not aware of the advantages of JSON-LD.

 

Best,

 

Miguel

 

 

 

From: main@bonsai.groups.io <main@bonsai.groups.io> On Behalf Of Chris Mutel
Sent: 05 April 2019 10:06
To: main@bonsai.groups.io
Subject: [bonsai] Serializing large LD datasets

 

Our approach to serializing large graphs is maybe not that great. You can see the current code here - basically, we convert Python to JSON line by line, with some text mangling. It sounds (and looks) a bit crazy; the idea behind this decision was that RDFLib can't really handle large datasets, such as BONSAI.

The last straw was realizing that we need to declare a `dataset` for the actual data (not just metadata). In Turtle, this is (for example):


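@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <http://creativecommons.org/ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ns2: <http://purl.org/vocab/vann/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
# dtype and brdfat as in the JSON-LD context; xsd is the XML Schema namespace
@prefix dtype: <http://purl.org/dc/dcmitype/> .
@prefix brdfat: <http://rdf.bonsai.uno/activitytype/exiobase3_3_17/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .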

brdfat: a dtype:Dataset ;
    ns1:license <http://creativecommons.org/licenses/by/3.0/> ;
    dc:contributor "BONSAI team" ;
    dc:creator <http://bonsai.uno/foaf/bonsai.rdf#bonsai> ;
    dc:description "ActivityType instances needed for BONSAI modelling of EXIOBASE version 3.3.17" ;
    dc:modified "2019-04-02"^^xsd:date ;
    dc:publisher "bonsai.uno" ;
    dc:title "EXIOBASE 3.3.17 activity types" ;
    ns2:preferredNamespaceUri <http://rdf.bonsai.uno/activitytype/exiobase3_3_17/#> ;
    owl:versionInfo "0.3" ;
    foaf:homepage brdfat:documentation.html .

In JSON-LD, it is... more involved:

 


{
  "@graph" : [ {
    "@id" : "http://rdf.bonsai.uno/activitytype/exiobase3_3_17/",
    "@type" : "dtype:Dataset",
    "license" : "http://creativecommons.org/licenses/by/3.0/",
    "contributor" : "BONSAI team",
    "creator" : "http://bonsai.uno/foaf/bonsai.rdf#bonsai",
    "description" : "ActivityType instances needed for BONSAI modelling of EXIOBASE version 3.3.17",
    "modified" : "2019-04-02",
    "publisher" : "bonsai.uno",
    "title" : "EXIOBASE 3.3.17 activity types",
    "preferredNamespaceUri" : "brdfat:#",
    "versionInfo" : "0.3",
    "homepage" : "brdfat:documentation.html"
  } ],
  "@context" : {
    "label" : {
      "@id" : "http://www.w3.org/2000/01/rdf-schema#label"
    },
    "versionInfo" : {
      "@id" : "http://www.w3.org/2002/07/owl#versionInfo"
    },
    "homepage" : {
      "@id" : "http://xmlns.com/foaf/0.1/homepage",
      "@type" : "@id"
    },
    "title" : {
      "@id" : "http://purl.org/dc/elements/1.1/title"
    },
    "publisher" : {
      "@id" : "http://purl.org/dc/elements/1.1/publisher"
    },
    "description" : {
      "@id" : "http://purl.org/dc/elements/1.1/description"
    },
    "preferredNamespaceUri" : {
      "@id" : "http://purl.org/vocab/vann/preferredNamespaceUri",
      "@type" : "@id"
    },
    "creator" : {
      "@id" : "http://purl.org/dc/elements/1.1/creator",
      "@type" : "@id"
    },
    "license" : {
      "@id" : "http://creativecommons.org/ns#license",
      "@type" : "@id"
    },
    "contributor" : {
      "@id" : "http://purl.org/dc/elements/1.1/contributor"
    },
    "modified" : {
      "@id" : "http://purl.org/dc/elements/1.1/modified",
      "@type" : "http://www.w3.org/2001/XMLSchema#date"
    },
    "dtype" : "http://purl.org/dc/dcmitype/",
    "brdfat" : "http://rdf.bonsai.uno/activitytype/exiobase3_3_17/"
  }
}

Moreover, it is difficult for me to reason about why the JSON-LD is formatted the way that it is. On the other hand, the Turtle file is much nicer to read and predict.

We had said earlier (though without a formal decision) that we want to use JSON-LD for data interchange, but it would make life a lot easier to use Turtle, if people were OK with that. Let me know what you think!
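
To give a feel for how little code line-by-line Turtle output needs without loading the whole graph into RDFLib, here is a minimal sketch using only the standard library. It is not the existing BONSAI serializer: the function names are made up for illustration, it handles only plain string literals, and it assumes subjects, predicates, and non-literal objects are passed in as already valid Turtle terms.

```python
import io


def turtle_string(value):
    """Quote a Python string as a Turtle literal, escaping the characters that need it."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def write_turtle(out, prefixes, subjects):
    """Stream Turtle to `out`; `subjects` may be a generator, so only one block is built at a time."""
    for name, uri in prefixes.items():
        out.write(f"@prefix {name}: <{uri}> .\n")
    out.write("\n")
    for subject, pairs in subjects:
        body = " ;\n    ".join(f"{predicate} {obj}" for predicate, obj in pairs)
        out.write(f"{subject} {body} .\n\n")


prefixes = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dtype": "http://purl.org/dc/dcmitype/",
    "brdfat": "http://rdf.bonsai.uno/activitytype/exiobase3_3_17/",
}
# A shortened version of the dataset declaration from the Turtle example.
subjects = [
    ("brdfat:", [
        ("a", "dtype:Dataset"),
        ("dc:title", turtle_string("EXIOBASE 3.3.17 activity types")),
        ("dc:publisher", turtle_string("bonsai.uno")),
    ]),
]

buffer = io.StringIO()
write_turtle(buffer, prefixes, subjects)
print(buffer.getvalue())
```

In real use `out` would be an open file and `subjects` a generator over the data, so memory use stays flat regardless of dataset size. The dataset declaration is just one more subject block, with no separate context to maintain.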

 
