CEDAL

The acronym CEDAL stands for Consistency Error Detection Algorithm, an approach to finding erroneous links across knowledge bases that relies on the uniqueness of URIs within a given knowledge base. In this novel, time-efficient algorithm, error detection consists of finding distinct resources (i.e., resources with distinct URIs) that belong to the same dataset but are connected by equivalence links, given an RDF graph representing the union of all knowledge bases in a link repository.

A typical scenario for CEDAL is as follows: suppose that we have two datasets (D1 and D2) about geographic regions, and a linkset (L1) relating their resources. As expected, L1 contains a link between equal resources such as D1:Leipzig and D2:Leipzig. However, L1 also contains a link stating that D1:Dresden is the same as D2:Leipzig, although Leipzig and Dresden are two different cities in Germany. By virtue of transitivity, this error can propagate across several linksets. Such erroneous links can cause applications to fail completely, and CEDAL is able to detect this kind of error.
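To make the transitivity argument concrete, the following minimal Python sketch follows equivalence links in both directions and lists everything that becomes "the same as" D1:Dresden. It is an illustration only, not code from CEDAL: the dataset D3, the second linkset and the helper name `reachable` are assumptions introduced for this example.

```python
from collections import defaultdict, deque

# Linkset L1 from the scenario above, including the erroneous Dresden link.
linkset_1 = [("D1:Leipzig", "D2:Leipzig"), ("D1:Dresden", "D2:Leipzig")]
# Hypothetical second linkset (D3 is an assumption, not part of the scenario).
linkset_2 = [("D2:Leipzig", "D3:Leipzig")]


def reachable(start, links):
    """Return every resource connected to `start` when links are
    treated as symmetric and transitive (breadth-first search)."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen - {start}


# One wrong link in L1 makes Dresden equivalent to resources in D2, D3 and D1 itself.
inferred = reachable("D1:Dresden", linkset_1 + linkset_2)
```

In this toy data, the single wrong link in L1 also ties D1:Dresden to D3:Leipzig, a resource that only appears in the second linkset.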

CEDAL can detect two types of error: Semantic Accuracy and Consistency & Conciseness.

The algorithm works in six steps:

1. As input, the algorithm receives a set of linksets.
2. The linksets are merged into a single RDF graph.
3. From the merged linksets, clusters are created containing the resources, datasets and knowledge bases.
4. Cases are identified in which two or more resources in a cluster belong to the same dataset.
5. These resources, together with the related paths, dataset names and knowledge bases, are put into a list and returned to the user as output.
6. The output includes all paths that were considered wrong and the original mappings.
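The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation from the CEDAL repository: clusters are built with a union-find structure (a technique chosen here for brevity), the dataset of a resource is assumed to be the prefix before the first colon, and the report lists the original mappings of each offending cluster rather than full paths. The names `detect_candidates`, `dataset_of` and `find` are hypothetical.

```python
from collections import defaultdict


def dataset_of(resource):
    # Assumption: "D1:Leipzig" belongs to dataset "D1".
    return resource.split(":", 1)[0]


def find(parent, x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def detect_candidates(linksets):
    """linksets maps a linkset name to a list of (resource, resource) links."""
    parent = {}
    # Steps 1-2: read all linksets and merge them into one graph.
    for links in linksets.values():
        for a, b in links:
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            root_a, root_b = find(parent, a), find(parent, b)
            if root_a != root_b:
                parent[root_b] = root_a
    # Step 3: group resources into clusters of linked resources.
    clusters = defaultdict(set)
    for resource in parent:
        clusters[find(parent, resource)].add(resource)
    # Steps 4-6: report clusters holding two or more resources of one dataset.
    report = []
    for root, members in clusters.items():
        by_dataset = defaultdict(list)
        for resource in members:
            by_dataset[dataset_of(resource)].append(resource)
        mappings = sorted(
            (name, a, b)
            for name, links in linksets.items()
            for a, b in links
            if find(parent, a) == root
        )
        for dataset, resources in by_dataset.items():
            if len(resources) > 1:
                report.append({
                    "dataset": dataset,
                    "resources": sorted(resources),
                    "mappings": mappings,
                })
    return report


# The Leipzig/Dresden scenario: one linkset with one erroneous link.
linksets = {"L1": [("D1:Leipzig", "D2:Leipzig"), ("D1:Dresden", "D2:Leipzig")]}
report = detect_candidates(linksets)
```

For the scenario data, the report contains a single entry for dataset D1 with the resources D1:Dresden and D1:Leipzig, plus the two L1 mappings that connect them. The sketch flags candidates only; deciding which of the mappings is actually wrong is left to the user.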

The contributions of this work can be summarized in five points: (1) A time-efficient algorithm for the detection of erroneous links in large-scale link repositories without computing all closures required by the property axiom. (2) An approach that makes it possible to track consistency problems inside link repositories. (3) A scalable algorithm that works well in both parallel and non-parallel modes. (4) A case study applied to a link repository called LinkLion. (5) A new linkset quality measure based on the number of erroneous candidates.

Contact: Andre Valdestilhas, Tommaso Soru, Axel-Cyrille Ngonga Ngomo
valdestilhas@informatik.uni-leipzig.de
tsoru@informatik.uni-leipzig.de
axel.ngonga@upb.de
GitHub repository: https://github.com/dice-group/CEDAL
Paper: https://svn.aksw.org/papers/2017/WI_CEDAL/public.pdf
Presentation slides: https://www.slideshare.net/AndrValdestilhas/cedal-slides-web-inteligence-2017