Re: #correspondencetables : from raw to triplets

Matteo Lissandrini (AAU)
 

In this case your example seems fine. You can probably still say that fbcl is a subclass of POWC.
You can also state sameAs between POWN and Nuclear, assuming that nuclear fission is the only way of producing electricity from nuclear power (in contrast to fusion?).

rdf:type doesn't apply when matching different activity types.

________________________________________
From: hackathon2019@bonsai.groups.io [hackathon2019@bonsai.groups.io] on behalf of Chris Mutel via Groups.Io [cmutel=gmail.com@groups.io]
Sent: Monday, April 08, 2019 4:08 PM
To: hackathon2019@bonsai.groups.io
Subject: Re: [hackathon2019] #correspondencetables : from raw to triplets

Thanks Matteo-

It is a bit tricky keeping the class definitions and instances in line
with the idea of `rdf:type` referring to multiple classes - could you
provide an alternative implementation of the example?

BTW, "Nuclear" is the label ENTSO-E uses in its API, short for
"production of electricity using nuclear fission".

On Mon, 8 Apr 2019 at 15:57, Matteo Lissandrini (AAU) <matteo@...> wrote:

Hi Chris,
have you checked the very useful examples here:
https://www.w3.org/2006/07/SWD/SKOS/skos-and-owl/master.html

In general, let's use subclass of and rdf:type when we know it is a subset or an instance of, and let's use SKOS for "fuzzy" concepts.

ActivityType are classes, so you can say that something is a subclass of a specific activity type.

I'm not sure what just "Nuclear" is supposed to be in your model.

About automatic tools: they usually introduce uncertainty, but above all they require an initial ground truth; otherwise we cannot tell whether they are doing what we want them to do.

We do not yet have a first full version of the BONSAI data and system, so trying to address automatic data cleaning and the like now is more likely to introduce noise and slow down the project.
So I would say, let's finish an MVP (minimum viable product) with some manual work that ensures the highest quality and control (we can limit it to just a portion of the tables).
Later on I will be happy to help you investigate more automatic tools, but I would do this once we are able to compare against something we know to be right.


Cheers,
Matteo

---
Matteo Lissandrini

Department of Computer Science
Aalborg University

http://people.cs.aau.dk/~matteo











________________________________
From: hackathon2019@bonsai.groups.io [hackathon2019@bonsai.groups.io] on behalf of Chris Mutel via Groups.Io [cmutel=gmail.com@groups.io]
Sent: Monday, April 08, 2019 2:03 PM
To: hackathon2019@bonsai.groups.io
Subject: Re: [hackathon2019] #correspondencetables : from raw to triplets

@Matteo, Bo, Miguel; please comment and correct!

Defining correspondence tables in RDF

Based on my reading of https://www.w3.org/TR/skos-reference/, I created the following:

@prefix bont: <http://ontology.bonsai.uno/core#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://rdf.bonsai.uno/activitytype/exiobase3_3_17/A_POWC> a bont:ActivityType ;
    skos:prefLabel "Production of electricity by coal" ;
    skos:altLabel "A_POWC" ;
    skos:narrowMatch <http://rdf.bonsai.uno/activitytype/entsoe/fbcl> .

<http://rdf.bonsai.uno/activitytype/entsoe/fbcl> a bont:ActivityType ;
    skos:prefLabel "Fossil Brown coal/Lignite" ;
    skos:broadMatch <http://rdf.bonsai.uno/activitytype/exiobase3_3_17/A_POWC> .

<http://rdf.bonsai.uno/activitytype/exiobase3_3_17/A_POWN> a bont:ActivityType ;
    skos:prefLabel "Production of electricity by nuclear" ;
    skos:altLabel "A_POWN" ;
    skos:exactMatch <http://rdf.bonsai.uno/activitytype/entsoe/nuke> .

<http://rdf.bonsai.uno/activitytype/entsoe/nuke> a bont:ActivityType ;
    skos:prefLabel "Nuclear" .

This has been very helpful for me, as it has helped build a mental model of how to express hierarchical relations, codes, etc. For sure, I have made mistakes though!
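
Mapping statements in the format above could be generated from raw correspondence rows rather than typed by hand. The following is only a minimal sketch in plain Python: the row layout (source, relation, target), the names ROWS, row_to_turtle and rows_to_turtle are assumptions made for illustration, and nothing here reflects how arborist actually stores or emits data.

```python
# Sketch: turn raw correspondence rows into SKOS mapping statements (Turtle).
# The row layout and all names below are hypothetical.

BASE = "http://rdf.bonsai.uno/activitytype/"

# (source, SKOS mapping property, target), relative to BASE.
ROWS = [
    ("exiobase3_3_17/A_POWC", "narrowMatch", "entsoe/fbcl"),
    ("entsoe/fbcl", "broadMatch", "exiobase3_3_17/A_POWC"),
    ("exiobase3_3_17/A_POWN", "exactMatch", "entsoe/nuke"),
]

# The SKOS mapping properties a row is allowed to use.
ALLOWED = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def row_to_turtle(source, relation, target, base=BASE):
    """Format one row as a single Turtle statement."""
    if relation not in ALLOWED:
        raise ValueError(f"not a SKOS mapping property: {relation}")
    return f"<{base}{source}> skos:{relation} <{base}{target}> ."


def rows_to_turtle(rows):
    """Format all rows, preceded by the skos prefix declaration."""
    header = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
    return "\n".join([header, ""] + [row_to_turtle(*row) for row in rows])


print(rows_to_turtle(ROWS))
```

Rejecting unknown relation names early keeps typos in the raw tables from silently turning into invalid predicates.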

Outstanding questions:

1. It is unclear to me whether or not `narrowMatch` and `broadMatch` are transitive.
2. Do we need to declare `narrowMatch` and `broadMatch`?
3. Can we drop `rdfs:label` completely in favor of `skos:prefLabel`?
4. Do we agree on using `skos:altLabel` for codes?
5. Partial overlaps, as mentioned by Bo. There are possibilities to describe this in SKOS, but I don't know what approach is best.

Next steps for correspondence tables repo

I still think that the first step should be getting all the basic data (labels, codes, and URIs) into arborist, followed by the official correspondence lists using the above format. The example that Miguel posted should never be needed (A -> C, when we knew A -> B and B -> C), as we should be able to get this transitive relationship "automatically" through SPARQL queries (and we need to learn how to write these queries in any case).
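
To make the A -> C case concrete, here is a small sketch in plain Python of what a SPARQL 1.1 property path such as skos:exactMatch+ would compute: everything reachable over one or more mapping links. The pairs in EXACT and the function name reachable are hypothetical. Note that SKOS declares skos:exactMatch transitive but does not declare skos:broadMatch or skos:narrowMatch transitive, so the sketch assumes exactMatch links only.

```python
from collections import defaultdict


def reachable(edges, start):
    """Return every node reachable from `start` via one or more edges."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)

    seen = set()
    stack = list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


# Hypothetical exactMatch statements: A -> B and B -> C.
EXACT = [("A", "B"), ("B", "C")]

print(sorted(reachable(EXACT, "A")))  # ['B', 'C']
```

Because C is found from A without an explicit A -> C row, only the direct correspondences would need to be stored.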

We can then proceed with our own self-generated correspondences; there are a number of libraries to help with this besides fuzzywuzzy (though it does have the best name :)

https://recordlinkage.readthedocs.io/en/latest/about.html
https://github.com/dedupeio/dedupe
https://github.com/kvh/match
https://pypi.org/project/py_entitymatching/


Some research and trial phases would be necessary before picking any particular approach.
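
As a starting point for such a trial, here is a minimal sketch of label matching using only Python's standard library (difflib) as a stand-in for the libraries listed above; it does not use any of their APIs. The function names and the 0.6 threshold are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two labels (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_candidates(label, candidates, threshold=0.6):
    """Return (score, candidate) pairs at or above threshold, best first."""
    scored = [(similarity(label, c), c) for c in candidates]
    return sorted([pair for pair in scored if pair[0] >= threshold], reverse=True)


exiobase_labels = [
    "Production of electricity by nuclear",
    "Production of electricity by coal",
]

# "Nuclear" is a substring of the first label, yet the ratio is only
# about 0.33 because the labels differ so much in length.
print(similarity("Nuclear", exiobase_labels[0]))
print(best_candidates("Nuclear", exiobase_labels))  # []
```

With this threshold the ENTSO-E label "Nuclear" finds no EXIOBASE candidate at all, which illustrates why any automatic approach would need to be checked against manually verified correspondences.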




--
############################
Chris Mutel
Technology Assessment Group, LEA
Paul Scherrer Institut
OHSA D22
5232 Villigen PSI
Switzerland
http://chris.mutel.org
Telefon: +41 56 310 5787
############################
