#infrastructure New working group and practice guidelines #infrastructure
Dear all-
As many of you have already realized, we need to organize and document our infrastructure a bit better. Specifically, I see a need for:
Yes, I'll coordinate this.
Matteo Lissandrini (AAU)
Hi Chris,
I can imagine that some of my tests may have deleted some data from the database. Sorry for that. To be honest, I was under the impression that the data in jken
I would expect the database to be wiped out regularly until we reach a stable status.
But this should not be an issue.
I would like to help establish the (automatic) workflow that collects the data (in non-RDF formats), parses it, and merges it with the ontology and the contents of the /rdf repo, so that we can easily wipe and redeploy the Jena instance at will.
I believe this will require coordination among the arborist repo, the rdf repo, the importer, and probably some others?
Probably in the triplestore repo?
Thanks,
Matteo
From: main@bonsai.groups.io [main@bonsai.groups.io] on behalf of Chris Mutel via Groups.Io [cmutel@...]
Sent: Thursday, April 04, 2019 10:34 AM
To: main@bonsai.groups.io
Subject: [bonsai] #infrastructure New working group and practice guidelines
Dear all-
As many of you have already realized, we need to organize and document our infrastructure a bit better. Specifically, I see a need for:
On Thu, 4 Apr 2019 at 13:20, Matteo Lissandrini (AAU) <matteo@cs.aau.dk> wrote:
No problem, this is to be expected, as we are still evolving the schema and making sure our RDF is valid and implemented properly. However, at some point soon we should get to a point where the Aalborg server is considered stable, while db.b.u is still for playing.

It actually isn't that easy to restore everything, as we currently need a relatively large amount of data (on the order of 3 GB for EXIOBASE, and 300 MB for the electricity stuff). The metadata is easy: arborist can rewrite the data in https://github.com/BONSAMURAIS/rdf, which can in turn be the foundation of the triple store. It would be nice to have a function that takes all these small Turtle files and merges them into one file (which could then be uploaded to the triple store).

In the medium term, I don't think it makes sense to store metadata for specific databases like EXIOBASE in arborist; it can just as easily be part of the file that includes the actual data. We only evolved this code pathway because we were learning as we went. Indeed, it is probably smarter in the long term to have https://github.com/BONSAMURAIS/rdf generated from the database itself.

I think the small importer you wrote will work fine for smaller datasets, but we will need to do file uploads for larger ones, as they won't fit into memory (to be loaded by RDFLib). This should be easy to do, though there may still be some Jena configuration bugs to work out.

So everything is in a bit of flux, and it would be great if you could take charge of this little bit of it! Please document the hell out of stuff, so we don't have to bug you too much.

Probably in the triplestore repo?

--
Chris Mutel
Technology Assessment Group, LEA
Paul Scherrer Institut
OHSA D22
5232 Villigen PSI
Switzerland
http://chris.mutel.org
Telefon: +41 56 310 5787
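The merge function mentioned above could look roughly like the following. This is only a sketch, assuming Apache Jena's `riot` command-line tool is installed and on the PATH; the function names `list_ttl` and `merge_ttl` and the paths in the example call are hypothetical and do not come from arborist or any BONSAI repo. Note that plain Turtle cannot express named graphs, so everything in the merged file ends up in a single graph.

```shell
# Sketch only: merge all Turtle files under a directory into one file.
# Assumption: Apache Jena's `riot` is on PATH. All names here are hypothetical.

# Print the .ttl files under directory $1, one per line, in a stable order.
list_ttl() {
  find "$1" -type f -name '*.ttl' | LC_ALL=C sort
}

# Parse every .ttl file under $1 and write a single Turtle document to $2.
merge_ttl() {
  src_dir=$1
  out_file=$2
  # Refuse to write an empty output file when there is nothing to merge.
  if [ -z "$(list_ttl "$src_dir")" ]; then
    echo "no .ttl files found in $src_dir" >&2
    return 1
  fi
  # riot accepts several input files and re-serialises them to stdout.
  find "$src_dir" -type f -name '*.ttl' -exec riot --output=turtle {} + > "$out_file"
}

# Example call (not run here): merge_ttl ../rdf merged.ttl
```

Letting `riot` parse and re-serialise the files, rather than concatenating them with `cat`, means a syntax error in any input is reported at merge time instead of at upload time.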
Matteo Lissandrini (AAU)
Hi Chris,
my importer is actually doing the file upload; this is the command I ran last night:

```bash
for f in `find ../rdf -name '*.ttl'`; do bseeder -i $f; done
```

So you do not need to merge the files in the /rdf repo. Actually, if you do that you end up with a big problem: you lose track of which triples go in which named graph.

In my view the RDF repo is for the instances of the taxonomies: small datasets that change slowly (e.g., flow objects/items or activity types). The actual data would remain outside of it.

For very big files, what we can do is:
1) upload them via scp/rsync to a dedicated directory on the server,
2) use the file importer utility provided by Jena itself.

I understand that restoring is not easy, but we need it for reproducibility and for reliability (if bad things happen, we may need to restore the database from scratch).

Cheers,
Matteo

________________________________________
From: main@bonsai.groups.io [main@bonsai.groups.io] on behalf of Chris Mutel via Groups.Io [cmutel=gmail.com@groups.io]
Sent: Thursday, April 04, 2019 1:43 PM
To: main@bonsai.groups.io
Subject: Re: [bonsai] #infrastructure New working group and practice guidelines
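The two steps proposed for very big files (copy to the server, then load with Jena's own importer) might be scripted along these lines. This is a hedged sketch, not the project's actual procedure: it assumes the store is a TDB2 database loaded with Jena's `tdb2.tdbloader`, and the host name, directories, graph IRI, and the helper names `run` and `upload_and_load` are all hypothetical. It prints the commands by default (`DRY_RUN=1`) so the plan can be reviewed before anything is touched.

```shell
# Sketch only: upload a large RDF file and bulk-load it on the server.
# Assumptions: rsync and ssh access to the server, and a TDB2-backed store
# loaded with Jena's tdb2.tdbloader. Host, paths and graph IRI are hypothetical.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# upload_and_load FILE HOST REMOTE_DIR DB_DIR GRAPH_IRI
upload_and_load() {
  file=$1
  host=$2
  remote_dir=$3
  db_dir=$4
  graph=$5
  name=$(basename "$file")
  # Step 1: copy the file to a dedicated directory on the server.
  run rsync -av --partial "$file" "$host:$remote_dir/"
  # Step 2: load it into one named graph so its triples stay identifiable.
  # Arguments pass through the remote shell, so keep them free of spaces
  # and shell metacharacters.
  run ssh "$host" tdb2.tdbloader --loc="$db_dir" --graph="$graph" "$remote_dir/$name"
}

# Example call (dry run): upload_and_load exiobase.ttl server /srv/import /srv/tdb2 http://example.org/graph/exiobase
```

A TDB database can only be opened by one process at a time, so a real run would have to happen while the Fuseki server is stopped or not using that database.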