Welcome to the Data Science Junior Research Group

The Data Science Junior Research Group (DS-JRG) investigates research questions in a data-driven manner. Among the main research topics of the group are explainable artificial intelligence and causal artificial intelligence for knowledge graphs. Past research focused on vandalism detection in knowledge graphs.

Explainable Artificial Intelligence (XAI)

Explainable Artificial Intelligence deals with the explanation of machine learning models. XAI research to which members of the Data Science Junior Research Group have particularly contributed includes the following.

Our research focuses on learning concepts in description logics from positive and negative nodes in a knowledge graph. The concepts serve as explainable, white-box models that can make new predictions in a transparent way. Concepts can be learned with neural networks (ESWC 2022) or with evolutionary algorithms and random walks (CIKM 2022). We also study how to transform tabular data into a knowledge graph either semi-automatically (KCAP 2021) or fully automatically (ESWC 2022).

We study ways to explain the predictions of graph neural networks.

Causal Artificial Intelligence (CAI)

Causal knowledge is seen as one of the key ingredients to advance artificial intelligence. Yet, few knowledge bases comprise causal knowledge to date. To close this gap, we compiled CauseNet (CIKM 2020), a large-scale knowledge base of claimed causal relations between causal concepts. It contains more than 11 million causal relations extracted from the web.

At least 5% of questions submitted to search engines ask about cause-effect relationships in some way. To support the development of tailored approaches that can answer such questions, we constructed CausalQA, a benchmark corpus of 1.1 million causal questions with answers (COLING 2022).

Vandalism Detection

Wikidata is a large-scale knowledge base of the Wikimedia Foundation. Its knowledge is increasingly used within Wikipedia itself and various other kinds of information systems, imposing high demands on its integrity. Wikidata can be edited by anyone and, unfortunately, it frequently gets vandalized, exposing all information systems using it to the risk of spreading vandalized and falsified information.

We constructed the large-scale Wikidata Vandalism Corpus WDVC-2015, the first corpus for vandalism detection in knowledge bases (SIGIR 2015). Our corpus is based on the entire revision history of Wikidata, the knowledge base underlying Wikipedia.

We developed a new machine learning-based approach to detect vandalism in Wikidata (CIKM 2016). Moreover, we organized the WSDM Cup 2017 (WSDM 2017), a data science challenge with the task of vandalism detection in Wikidata.

Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism and employ machine learning-based approaches to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from older users, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them (WWW 2019).

News

The following full paper co-authored by a member of the Data Science Junior Research Group has been accepted for publication at The Web Conference (WWW):

Causal Question Answering with Reinforcement Learning
Lukas Blübaum, Stefan Heindorf
Preprint: https://arxiv.org/abs/2311.02760

The following short paper co-authored by a member of the Data Science Junior Research Group has been accepted for publication at the 32nd ACM International Conference on Information and Knowledge Management (CIKM):

Accelerating Concept Learning via Sampling
Alkid Baci, Stefan Heindorf
Paper: https://ris.uni-paderborn.de/download/46575/46577/baci2023_CIKM.pdf
Code: https://github.com/alkidbaci/OntoSample

The following full papers co-authored by a member of the Data Science Junior Research Group have been accepted for publication at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD):

LitCQD: Multi-Hop Reasoning in Incomplete Knowledge Graphs with Numeric Literals
Caglar Demir, Michel Wiebesiek, Renzhong Lu, Axel-Cyrille Ngonga Ngomo, Stefan Heindorf
Preprint: https://arxiv.org/a…

Contact

Dr. Stefan Heindorf

Data Science Junior Research Group

Junior Research Group Leader Data Science
