Milestone: Data and workflow reached first stable and tested version #toolbox #ontology #rdf #exiobase #dataliberation
Matteo Lissandrini (AAU)
"annuntio vobis gaudium magnum", a.k.a., great news!
The BONSAI RDF codebase has reached a stable and well-tested stage (more details at the end of the email, don't miss them!), and we have published all the data in the endpoint, alongside an updated ontology and a nice interface for querying the data.
This result is thanks to great teamwork by many within BONSAI, but I would say it is mainly thanks to the tireless and competent supervision of Agneta and the copious sweat of Emil's fingertips.
Please join me in thanking them for this notable effort.
Now comes a great opportunity for BONSAI to capitalize on this: test the queries, write new queries, try to improve the tools, introduce a new dataset; but first of all, perhaps start by adding more examples and tutorials for newcomers.
So, why am I so happy? What do we have now?
- The data is *fully* reachable at odas.aau.dk
* the endpoint contains all the data (more on that below)
* we moved from Jena to Virtuoso, still open source, but much more performant
* there is a nice GUI that allows you to run a number of example queries (also called competency queries)
- Thanks to the competency queries and the GUI, Agneta found out that the USE/SUPPLY tables in Exiobase were not square (Agneta can explain this in proper detail). What is important for me to highlight is that our improved accessibility to the data allowed us to rectify a fundamentally wrong assumption we had made about the Excel files. Huzzah for Open Data and accessibility to the data!
- About the data: we have now exported and host _2_ datasets, Exiobase (+emissions) and YSTAFDB, both mapped to the same ontology
- The ontology has moved to v2, since we now support balanceable properties (and we also adopted more intuitive terminology)
- We introduced provenance annotations in the extraction code (and consequently in the published data). Data providers integrating new datasets can now ensure traceability of each piece of information.
- This has resulted in various improvements and extensions of the Arborist code.
- Also, with the inclusion of the YSTAFDB we have tested the workflow required to incorporate new datasets.
- Tomas is working on the final piece of the puzzle: transforming this Open Data into truly Linked Open Data.
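For those who prefer scripting to the GUI: a Virtuoso endpoint also speaks the standard SPARQL protocol over HTTP, so the data can be queried programmatically. Below is a minimal sketch in Python using only the standard library. Note that the `/sparql` path and the exploratory query are my assumptions, not documented specifics of our deployment; check the GUI at odas.aau.dk for the actual query URL and the ontology vocabulary.

```python
import json
import urllib.parse
import urllib.request

# NOTE: the exact endpoint path and the query below are assumptions;
# check the GUI at odas.aau.dk for the real query URL and ontology terms.
ENDPOINT = "https://odas.aau.dk/sparql"

# A generic exploratory query: list the most-used classes in the data.
QUERY = """
SELECT DISTINCT ?class (COUNT(?s) AS ?instances)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 10
"""

def run_query(endpoint: str, query: str) -> dict:
    """POST a SPARQL query and parse the standard JSON results format."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Usage (requires network access):
# for row in run_query(ENDPOINT, QUERY)["results"]["bindings"]:
#     print(row["class"]["value"], row["instances"]["value"])
```

The JSON results follow the W3C SPARQL results format, with one dict per row under `results.bindings`, so the same helper works for any SELECT query you paste in from the GUI.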
As a result of this process, we submitted an experience paper to the International Semantic Web Conference to share the challenges and lessons learned during this effort (fingers crossed the reviews are out soon!).
My understanding is that Agneta is also leading the writing of a paper about this resource for an LCI journal; I will let her update us on the status of that. Nonetheless, I think we should definitely publicize this resource as much as possible.
With this milestone reached, Emil will have to move to a different project, so at least in the short term he will not be able to spend any more time on this (in fact, he already delayed other pressing matters in order to complete all the tasks here, plus some extras we did not foresee; thanks Emil!). I will always be available for questions, and I know where he sits just in case, so please contact me instead for SW-related issues ;-)
Hence, once more: please play with the GUI, run some queries, write your queries, read the documentation.
Check open issues or open new ones and then try to fix them.
Especially if you see ways to improve the documentation, give it a try yourself and ask me or Agneta to review your work; this is the best way to get to know what we have! This has to be a community effort now.
This is the opportunity for anyone interested to start playing with this rich resource and keep improving it.
Again, great work, let's keep progressing and opening more data to the world.
Department of Computer Science