
Serializing large LD datasets

 

Our approach to serializing large graphs is maybe not that great. You can see the current code here - basically, we convert Python objects to JSON line by line, with some text mangling. It sounds (and looks) a bit crazy; the idea behind this decision was that RDFLib can't really handle large datasets, such as BONSAI.
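For readers who have not seen the code, here is a minimal sketch of what writing JSON-LD "line by line" can look like. It is not the actual BONSAI code: the function name `write_jsonld_stream`, the `example.org` IRIs, and the sample nodes are all made up for illustration. The idea is only that each node of `@graph` is written as it is produced, so the full graph never has to sit in memory.

```python
import io
import json


def write_jsonld_stream(out, context, nodes):
    """Write a JSON-LD document to `out`, one node of "@graph" at a time.

    `nodes` can be any iterable (for example a generator), so the full
    graph never has to be held in memory.
    """
    out.write('{\n"@context": ')
    out.write(json.dumps(context))
    out.write(',\n"@graph": [\n')
    first = True
    for node in nodes:
        if not first:
            out.write(",\n")  # separator goes before every node except the first
        out.write(json.dumps(node))
        first = False
    out.write("\n]\n}\n")


# Hypothetical example data, loosely modelled on the dataset in this thread.
context = {"title": {"@id": "http://purl.org/dc/elements/1.1/title"}}
nodes = (
    {"@id": f"http://example.org/activity/{i}", "title": f"Activity {i}"}
    for i in range(3)
)

buffer = io.StringIO()
write_jsonld_stream(buffer, context, nodes)
document = json.loads(buffer.getvalue())  # the streamed text parses as valid JSON
```

The "text mangling" part is the hand-written brackets and commas around each `json.dumps` call, which is exactly the bit that is easy to get wrong.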

The last straw was realizing that we need to declare a `dataset` for the actual data (not just metadata). In Turtle, this is (for example):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <http://creativecommons.org/ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ns2: <http://purl.org/vocab/vann/> .
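@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dtype: <http://purl.org/dc/dcmitype/> .
@prefix brdfat: <http://rdf.bonsai.uno/activitytype/exiobase3_3_17/> .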

brdfat: a dtype:Dataset ;
    ns1:license <http://creativecommons.org/licenses/by/3.0/> ;
    dc:contributor "BONSAI team" ;
    dc:creator <http://bonsai.uno/foaf/bonsai.rdf#bonsai> ;
    dc:description "ActivityType instances needed for BONSAI modelling of EXIOBASE version 3.3.17" ;
    dc:modified "2019-04-02"^^xsd:date ;
    dc:publisher "bonsai.uno" ;
    dc:title "EXIOBASE 3.3.17 activity types" ;
    ns2:preferredNamespaceUri <http://rdf.bonsai.uno/activitytype/exiobase3_3_17/#> ;
    owl:versionInfo "0.3" ;
    foaf:homepage brdfat:documentation.html .

In JSON-LD, it is... more involved:


{
  "@graph" : [ {
    "@id" : "http://rdf.bonsai.uno/activitytype/exiobase3_3_17/",
    "@type" : "dtype:Dataset",
    "license" : "http://creativecommons.org/licenses/by/3.0/",
    "contributor" : "BONSAI team",
    "creator" : "http://bonsai.uno/foaf/bonsai.rdf#bonsai",
    "description" : "ActivityType instances needed for BONSAI modelling of EXIOBASE version 3.3.17",
    "modified" : "2019-04-02",
    "publisher" : "bonsai.uno",
    "title" : "EXIOBASE 3.3.17 activity types",
    "preferredNamespaceUri" : "brdfat:#",
    "versionInfo" : "0.3",
    "homepage" : "brdfat:documentation.html"
  } ],
  "@context" : {
    "label" : {
      "@id" : "http://www.w3.org/2000/01/rdf-schema#label"
    },
    "versionInfo" : {
      "@id" : "http://www.w3.org/2002/07/owl#versionInfo"
    },
    "homepage" : {
      "@id" : "http://xmlns.com/foaf/0.1/homepage",
      "@type" : "@id"
    },
    "title" : {
      "@id" : "http://purl.org/dc/elements/1.1/title"
    },
    "publisher" : {
      "@id" : "http://purl.org/dc/elements/1.1/publisher"
    },
    "description" : {
      "@id" : "http://purl.org/dc/elements/1.1/description"
    },
    "preferredNamespaceUri" : {
      "@id" : "http://purl.org/vocab/vann/preferredNamespaceUri",
      "@type" : "@id"
    },
    "creator" : {
      "@id" : "http://purl.org/dc/elements/1.1/creator",
      "@type" : "@id"
    },
    "license" : {
      "@id" : "http://creativecommons.org/ns#license",
      "@type" : "@id"
    },
    "contributor" : {
      "@id" : "http://purl.org/dc/elements/1.1/contributor"
    },
    "modified" : {
      "@id" : "http://purl.org/dc/elements/1.1/modified",
      "@type" : "http://www.w3.org/2001/XMLSchema#date"
    },
    "dtype" : "http://purl.org/dc/dcmitype/",
    "brdfat" : "http://rdf.bonsai.uno/activitytype/exiobase3_3_17/"
  }
}

Moreover, it is difficult for me to reason about why the JSON-LD is formatted the way that it is. On the other hand, the Turtle file is much nicer to read and predict.
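One way to reason about the layout is that the `@context` carries everything the Turtle prefixes and datatypes carry, and the node in `@graph` only makes sense once it is expanded against that context. The sketch below is a very simplified, hand-rolled approximation of that expansion, covering only the features used in the example above. It is not the JSON-LD expansion algorithm from the specification, and `expand_iri` and `expand_node` are hypothetical helper names.

```python
def expand_iri(value, context):
    """Expand a compact IRI such as "brdfat:documentation.html" via a context prefix."""
    prefix, sep, suffix = value.partition(":")
    base = context.get(prefix)
    if sep and isinstance(base, str) and not suffix.startswith("//"):
        return base + suffix
    return value  # already absolute, or no matching prefix


def expand_node(node, context):
    """Simplified expansion of one node; assumes every property has a dict term definition."""
    expanded = {}
    for key, value in node.items():
        if key == "@id":
            expanded["@id"] = expand_iri(value, context)
        elif key == "@type":
            expanded["@type"] = [expand_iri(value, context)]
        else:
            term = context[key]
            if term.get("@type") == "@id":
                obj = {"@id": expand_iri(value, context)}  # value is an IRI, not a string
            elif "@type" in term:
                obj = {"@value": value, "@type": term["@type"]}  # typed literal
            else:
                obj = {"@value": value}  # plain literal
            expanded[term["@id"]] = [obj]
    return expanded


# A subset of the context and node from the example above.
context = {
    "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id"},
    "modified": {
        "@id": "http://purl.org/dc/elements/1.1/modified",
        "@type": "http://www.w3.org/2001/XMLSchema#date",
    },
    "title": {"@id": "http://purl.org/dc/elements/1.1/title"},
    "dtype": "http://purl.org/dc/dcmitype/",
    "brdfat": "http://rdf.bonsai.uno/activitytype/exiobase3_3_17/",
}
node = {
    "@id": "http://rdf.bonsai.uno/activitytype/exiobase3_3_17/",
    "@type": "dtype:Dataset",
    "modified": "2019-04-02",
    "title": "EXIOBASE 3.3.17 activity types",
    "homepage": "brdfat:documentation.html",
}
expanded = expand_node(node, context)
```

Read this way, `"@type" : "@id"` in a term definition plays the role of the angle brackets in Turtle, and a datatype IRI plays the role of `^^xsd:date`.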

We had said earlier (though without a formal decision) that we want to use JSON-LD for data interchange, but it would make life a lot easier to use Turtle, if people were OK with that. Let me know what you think!
 

miguel.astudillo@...
 

Hi!

 

In the correspondence table group we struggled a bit when we had to move from Turtle to JSON-LD. We spent some time trying to figure out how to do it in JSON-LD and ended up writing Turtle. We found it easier to write and read, and we were told there was code to translate one into the other automatically. I prefer Turtle, but I am not aware of the advantages of JSON-LD.

 

Best,

 

Miguel

 

 

 

From: main@bonsai.groups.io <main@bonsai.groups.io> On Behalf Of Chris Mutel
Sent: 05 April 2019 10:06
To: main@bonsai.groups.io
Subject: [bonsai] Serializing large LD datasets

 


 

Agneta
 

+1 for turtle format

Much easier to read and write. 

Massimo Pizzol
 

No opinion here, I trust those who have already worked hands-on on this, and their choice.

BR
Massimo

 

From: <main@bonsai.groups.io> on behalf of "Agneta via Groups.Io" <agneta.20@...>
Reply-To: "main@bonsai.groups.io" <main@bonsai.groups.io>
Date: Friday, 5 April 2019 at 10.56
To: "main@bonsai.groups.io" <main@bonsai.groups.io>
Subject: Re: [bonsai] Serializing large LD datasets

 
