RDF enrichment with DEER
Linked Data enrichment is the process of adding, altering or deleting a set of triples of a source dataset in order to obtain an enriched version of the source dataset. This enriched dataset usually provides significant benefits for a specific use case scenario. These benefits include (but are not limited to) more data (quantity), better data quality, better data organization (refined ontology) and interoperability with other datasets (interlinking). Over the past few years, several frameworks for RDF data enrichment have been developed. Such frameworks provide enrichment methods such as entity recognition, link discovery and schema enrichment.
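To make this definition concrete, the following minimal sketch models a dataset as a set of (subject, predicate, object) triples and an enrichment as a pair of additions and deletions. The names `Delta` and `apply_delta` and the example triples are assumptions made for illustration and are not part of any framework's API. Altering a triple is expressed as deleting the old triple and adding the new one.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


class Delta(NamedTuple):
    """Hypothetical container for the changes an enrichment step proposes."""
    additions: FrozenSet[Triple]
    deletions: FrozenSet[Triple]


def apply_delta(dataset: Set[Triple], delta: Delta) -> Set[Triple]:
    """Return the enriched dataset: remove deletions first, then add additions."""
    return (dataset - delta.deletions) | delta.additions


# Illustrative data only: correct a label (alter) and add a latitude (add).
source = {
    ("ex:poi1", "rdfs:label", "Cafe"),
    ("ex:poi1", "ex:phone", "000"),
}
delta = Delta(
    additions=frozenset({
        ("ex:poi1", "rdfs:label", "Cafe Central"),
        ("ex:poi1", "geo:lat", "51.7"),
    }),
    deletions=frozenset({("ex:poi1", "rdfs:label", "Cafe")}),
)
enriched = apply_delta(source, delta)
```

Keeping the changes separate from the dataset makes it possible to inspect what an enrichment step would do before it is applied.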
DEER (http://aksw.org/projects/deer) is a linked data enrichment framework that provides two main types of artifact: atomic enrichment functions and atomic enrichment operators. A user who knows which type of enrichment is to be carried out can therefore define the sequence of enrichment functions and operators that should process their dataset. The task of an atomic enrichment function is to determine the set of triples to be added to, altered in or deleted from the source dataset in order to generate the enriched dataset. Currently, DEER implements a set of atomic enrichment functions including dereferencing, linking, conformation, filtering and NLP. For instance, the dereferencing enrichment function finds enrichment data in interlinked datasets: for a source dataset that contains owl:sameAs or similar links, DEER dereferences all links from this dataset to other datasets using HTTP content negotiation, and then adds the relevant information from the returned triples to the source dataset. Enrichment operators, in turn, enable users to define a workflow for processing their input dataset, for example by splitting and merging datasets. However, the enrichment functions implemented by DEER (a) require manual configuration, and (b) do not exploit geospatial and temporal features of the input datasets, such as those found in POI data.
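The dereferencing idea described above can be sketched as follows. This is not DEER's implementation: the function names, the `fetch` callback and the example URIs are assumptions chosen for illustration. The sketch follows owl:sameAs links, asks a caller-supplied `fetch` function for the triples behind each linked URI, and copies selected properties back onto the source resource. `build_request` only shows how an HTTP request with an `Accept` header for content negotiation could be constructed with the standard library; sending it and parsing the RDF response are left to the caller.

```python
from typing import Callable, Iterable, Set, Tuple
from urllib.request import Request

Triple = Tuple[str, str, str]

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_request(uri: str) -> Request:
    """Build (but do not send) a request that asks for an RDF serialization."""
    accept = "text/turtle, application/rdf+xml;q=0.8"
    return Request(uri, headers={"Accept": accept})


def dereference(
    dataset: Set[Triple],
    fetch: Callable[[str], Iterable[Triple]],
    wanted_predicates: Set[str],
    link_predicates: Tuple[str, ...] = (OWL_SAME_AS,),
) -> Set[Triple]:
    """Return the dataset plus selected triples found behind its links.

    `fetch` is a hypothetical callback: given a URI, it returns the triples
    obtained by dereferencing that URI (for example via build_request plus
    an RDF parser of the caller's choice).
    """
    additions: Set[Triple] = set()
    for subject, predicate, linked in dataset:
        if predicate not in link_predicates:
            continue
        for rs, rp, ro in fetch(linked):
            # Keep only statements about the linked resource itself and
            # attach them to the corresponding source resource.
            if rs == linked and rp in wanted_predicates:
                additions.add((subject, rp, ro))
    return dataset | additions


# Usage with an in-memory stand-in for the remote dataset (illustrative data).
remote = {
    "http://example.org/remote/Paderborn": [
        ("http://example.org/remote/Paderborn", RDFS_LABEL, "Paderborn"),
        ("http://example.org/remote/Paderborn", "http://example.org/other", "x"),
    ]
}
source = {
    ("http://example.org/city/1", OWL_SAME_AS, "http://example.org/remote/Paderborn"),
}
enriched = dereference(source, lambda uri: remote.get(uri, []), {RDFS_LABEL})
```

Passing `fetch` as a parameter keeps the enrichment logic independent of the network and of any particular RDF library, which also makes the function easy to test.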
POI Enrichment
During the SLIPO project (http://slipo.eu), we will adapt the DEER framework to effectively handle the enrichment of Point of Interest (POI) data. DEER was designed to be a modular framework that can be easily extended and re-purposed. Therefore, in addition to using the enrichment functions already available in DEER, we intend to extend them by implementing POI-related enrichment functions. These include retrieving the location of a POI from a third-party geolocation service, determining whether a POI is valid at a particular time based on a given timestamp, and grouping POIs into areas of interest. Currently, DEER implements a supervised machine learning approach for generating the aforementioned sequence of enrichment functions and operators that must be used to process the input dataset. Two limitations of this supervised approach are that it uses only one input dataset and generates only one enriched dataset. In SLIPO, we will extend the approach so that it accepts a set of input datasets and generates a set of output datasets. We also plan to apply unsupervised or weakly supervised approaches to automatically detect enrichment configurations for POI data, with a focus on its geospatial and temporal dimensions. Our approaches will aim not only to enrich POI data, but also to provide the enriched data in a format suitable for industrial consumption.
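Two of the planned POI-related functions can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `POI` record, its validity fields and the fixed-size grid used for grouping are not taken from DEER or SLIPO, and the grid is only a simple stand-in for whatever area-of-interest method is eventually chosen. The geolocation lookup is omitted because it depends on the external service used.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class POI:
    """Hypothetical POI record with an optional validity interval."""
    name: str
    lat: float
    lon: float
    valid_from: Optional[datetime] = None  # None means "no known start"
    valid_to: Optional[datetime] = None    # None means "still valid"


def is_valid_at(poi: POI, timestamp: datetime) -> bool:
    """Check whether the timestamp falls in the half-open validity interval."""
    starts_ok = poi.valid_from is None or poi.valid_from <= timestamp
    ends_ok = poi.valid_to is None or timestamp < poi.valid_to
    return starts_ok and ends_ok


def group_into_areas(
    pois: Iterable[POI], cell_deg: float = 0.01
) -> Dict[Tuple[int, int], List[POI]]:
    """Group POIs by the grid cell (of cell_deg degrees) that contains them."""
    areas: Dict[Tuple[int, int], List[POI]] = defaultdict(list)
    for poi in pois:
        key = (math.floor(poi.lat / cell_deg), math.floor(poi.lon / cell_deg))
        areas[key].append(poi)
    return dict(areas)


# Illustrative data: two nearby POIs and one far away.
cafe = POI("Cafe", 51.7189, 8.7575, datetime(2015, 1, 1), datetime(2018, 1, 1))
museum = POI("Museum", 51.7195, 8.7580)
station = POI("Station", 52.525, 13.405)

areas = group_into_areas([cafe, museum, station])
open_in_2016 = is_valid_at(cafe, datetime(2016, 6, 1))
open_in_2019 = is_valid_at(cafe, datetime(2019, 6, 1))
```

A grid is easy to compute but splits clusters that straddle a cell boundary; a density-based clustering method would avoid this at the cost of more configuration.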
For more information, see http://aksw.org/projects/deer.html