Welcome to the Data Science Junior Research Group
The Data Science Junior Research Group (DS-JRG) investigates research questions in a data-driven manner. Among the main research topics of the group are explainable artificial intelligence and causal artificial intelligence for knowledge graphs. Past research focused on vandalism detection in knowledge graphs.
Click here for more details on courses and publications.
Explainable Artificial Intelligence (XAI)
Explainable Artificial Intelligence deals with the explanation of machine learning models. Members of the Data Science Junior Research Group have contributed in particular to the following areas of XAI research.
- Explainable White-Box Models
Our research focuses on learning concepts in description logics from positive and negative nodes in a knowledge graph. The concepts serve as explainable, white-box models able to make new predictions in a transparent way. Concepts can be learned with neural networks (ESWC 2022) or with evolutionary algorithms and random walks (CIKM 2022). We also study how to transform tabular data into a knowledge graph either semi-automatically (KCAP 2021) or fully automatically (ESWC 2022).
- Explainable Black-Box Models
We study ways to explain the predictions of graph neural networks.
Causal Artificial Intelligence (CAI)
- Causality Graphs
Causal knowledge is seen as one of the key ingredients to advance artificial intelligence. Yet, few knowledge bases comprise causal knowledge to date. To close this gap, we compiled CauseNet (CIKM 2020), a large-scale knowledge base of claimed causal relations between causal concepts. It contains more than 11 million causal relations extracted from the web.
- Causal Question Answering
At least 5% of questions submitted to search engines ask about cause-effect relationships in some way. To support the development of tailored approaches that can answer such questions, we construct CausalQA, a benchmark corpus of 1.1 million causal questions with answers (COLING 2022).
Wikidata is a large-scale knowledge base of the Wikimedia Foundation. Its knowledge is increasingly used within Wikipedia itself and various other kinds of information systems, imposing high demands on its integrity. Wikidata can be edited by anyone and, unfortunately, it frequently gets vandalized, exposing all information systems using it to the risk of spreading vandalized and falsified information.
- Vandalism Corpus
We constructed the large-scale Wikidata Vandalism Corpus WDVC-2015, the first corpus for vandalism detection in knowledge bases (SIGIR 2015). Our corpus is based on the entire revision history of Wikidata, the knowledge base underlying Wikipedia.
- Vandalism Detection
We developed a new machine learning-based approach to detect vandalism in Wikidata (CIKM 2016). Moreover, we organized the WSDM Cup 2017 (WSDM 2017), a data science challenge with the task of vandalism detection in Wikidata.
- Debiasing Vandalism Detection Models
Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism, and they employ machine learning-based approaches to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from older users, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them (WWW 2019).
Dr. Stefan Heindorf
Data Science (JRG)
Junior Research Group Leader Data Science