Research

The CSS Group does research on natural language processing, with a focus on topics related to computational argumentation and computational sociolinguistics. Recently, we also increasingly study educational and explainable artificial intelligence. Past research tackled text mining processes in general. Details are given in the following.

Computational Argumentation (CA)

Computational argumentation deals with the computational analysis and synthesis of natural language arguments and argumentation, usually in an empirical data-driven manner. Computational argumentation research that members of the CSS Group have particularly contributed to includes the following. Many of the outcomes of the group are or will be demonstrated in the argument search engine args.me.

Argument Generation

In our argument generation research, we focus on understanding the core of an argument by inferring its conclusion (ACL 2020), as well as extracting snippets representing the most important claim and the reason behind (SIGIR 2020). We also study how to encode beliefs into arguments (EACL 2021), how to summarize a long argument down to its conclusion (ACL Findings 2021a), and how to undermine arguments with counter-arguments (ACL Findings 2021b).

Argument Quality

Starting from a literature survey, we defined a taxonomy of argumentation quality (EACL 2017), followed by an empirical comparison of theory and practice (ACL 2017). We developed computational approaches to assess specific quality dimensions, such as a machine learning regressor that uses argument mining for argumentation-related essay scoring (COLING 2016) and a PageRank adaptation for argument relevance (EACL 2017). Further, we developed a new notion of argument quality (CoNLL 2018) and assessed it for the different audience (ACL 2020). Recently, we have started to study what can be learnt about quality from argument revisions (EACL 2021).

Argument Search

With args.me, we introduced the first search engine for arguments on the web (ArgMining 2017). The mentioned assessment of argument relevance (EACL 2017) may be used for ranking found arguments. Moreover, we developed a computational approach to retrieve counterarguments to arguments based on their simultaneous similarity and dissimilarity (ACL 2018).

Computational Sociolinguistics (CSL)

Computational Sociolinguistics, as a subfield of computational social science, investigates research questions from the social sciences through empirical analyses of natural language data. Input data includes, but is not limited to, social media text, online activities, and socio-cultural key indicators. The focus is often on insights into social phenomena and dynamics rather than the technologies behind. The CSS group is doing computational sociolinguistics research on the following topics, partly in the context of the research program Digital Future.

Social Bias Analysis and Mitigation

Social bias can emerge from pre-existing beliefs about the characteristics of group members of any social group, often leading to prejudices and discrimination. Language can be a major factor in carrying and reinforcing those biases, causing NLP models that learn from it through texts to inherit those biases. Triggered by our project on bias in AI models, our research focuses on analyzing the bias in texts to understand the possible influence on downstream systems. More information can be found in our paper (ArgMining 2020). We are currently adding to this research by evaluating possible relations between social and political bias, as well as taking a closer at the methods that are widely used to detect biases in NLP systems (IJCAI 2021).

Media Bias Analysis and Mitigation

Media plays an important role in shaping public opinion. Biased media can influence people in undesirable directions and hence should be unmasked as such. To solve this problem with NLP techniques, we developed machine learning models to analysis the media bias. Especially, we focused on (1) learning about the relation between sentence-level and article-level bias (EMNLP Findings 2020), and (2) studying at what granularity level and how sequential patterns media bias is manifested (NLP+CSS 2020).

Regarding bias mitigation, we studied the task of “flipping” the bias of news articles: Given an article with a political bias (left or right), generate an article with the same topic but opposite bias. We created a corpus with bias-labelled articles from allsides.com. As a first step, we analyze the corpus and discuss intrinsic characteristics of bias. They point to the main challenges of bias flipping, which in turn lead to a specific setting in the generation process. We applied an autoencoder incorporating information from an article’s content to learn how to automatically flip the bias. More about this research can be found in our paper (INLG 2018).

Communication in Crowdworking

Crowdworking is one phenomenon of the digitization of the society. Online platforms provide connections between requesters of tasks and workers who solve the tasks in an anonymous and distant manner. However, the quality of crowdworking is affected by communication problems in the task design, operation, and evaluation. To learn about the workers' side, we compared existing research on problems in crowdworking with complaints mined from a workers' online discussion forum (COLING 2020). The findings from this study form the basis for our research on how to improve crowdsourcing.

Educational and Explainable Artificial Intelligence

Educational applications of natural language processing and other AI techniques include the computational support of writing studying learner-specific characteristics. Explainability expresses the desire to make a system’s behavior intelligible and thus controllable by humans. The CSS group is investigating respective topics more and more deeply, with a focus on the analysis and generation of natural language explanations. The following topics are in the focus in this regard.

Generation of User-Adapted Explanations

We study the generation of used-adapted explanations, among others, in the context of our project in the CRC 901 "On-the-Fly Computing". Groundwork for explanations that adapt to the language of the user includes the modification of stylistic bias (INLG 2018), the generation of specific information (ACL 2020), and the encoding of beliefs into generated text (EACL 2021).

Computational Writing Support

In the context of our research project ArgSchool, we study how to support learning to write texts through computational methods that provide developmental feedback. These methods assess and explain which aspects of a text are good, which need to be improved, and how to improve them, adapted to the student’s learning stage. More information will follow in the next months.

Assessment of Explanation Quality

In the first of our projects within TRR 318 "Constructing Explainability", we study the pragmatic goal of all explaining processes: to be successful — that is, for the explanation to achieve the intended form of understanding (enabling, comprehension) of the given explanandum on the explainee's side. In particular, we investigate the question as to what characteristics successful explaining processes share in general, and what is specific to a given context or setting. Details on this topic will follow soon.

Modeling of Metaphorical Explanations

In the second of our projects within TRR 318 "Constructing Explainability", we study the metaphorical space established by different metaphors for one and the same concept. We seek to understand how metaphors foster understanding through highlighting and hiding, anw we aim to establish knowledge about when and how metaphors are used and adapted in explanatory dialogues. Details on this topic will follow soon.

Text Mining

Text mining deals with the automatic or semi-automatic discovery of new, previously unknown information of high quality from large numbers of unstructured texts. The types of information to be inferred from the texts are usually specified beforehand, i.e., text mining tackles given tasks. Text mining research that members of the CSS group have particularly contributed to includes the following.

Ad-hoc Design of Text Analysis Pipelines

We have developed a method that creates a text analysis pipeline ad-hoc in near-zero time for a specified information need along with a quality prioritization using partial order planning and greedy-best first search (CICLing 2013). In addition, we can automatically equip any such pipeline with an assumption-based truth maintenance system that systematically ensures that each algorithm in a pipeline analyzes only potentially relevant portions of text (CIKM 2013).

Efficiency Optimization of Text Analysis Pipelines

We developed a process to optimize the run-time efficiency of any text analysis pipeline (CIKM 2011). Based on the assumption-based truth maintenance system mentioned above, we found the theoretically optimal pipeline schedule using dynamic programming (COLING 2012). It depends on the run-time and found information of each employed text analysis algorithm. These values are not known beforehand, which is why an informed best-first search scheduling approach is more preferable in practice (LNCS 9383). In case the input texts to be process are heterogenous, an adaptive scheduling is needed, which we have realized with self-supervised online learning (IJCNLP 2013).

Research

Computational Argumentation (CA)

Computational Sociolinguistics (CSL)

Educational and Explainable Artificial Intelligence

Text Mining

Research pages

Research
Publications
Data
Software
Projects

Re­search

Com­pu­ta­tion­al Ar­gu­ment­a­tion (CA)

Com­pu­ta­tion­al So­ci­o­lin­guist­ics (CSL)

Edu­ca­tion­al and Ex­plain­able Ar­ti­fi­cial In­tel­li­gence

Text Min­ing

Research pages

Research Publications Data Software Projects

Research

Computational Argumentation (CA)

Computational Sociolinguistics (CSL)

Educational and Explainable Artificial Intelligence

Text Mining

Research
Publications
Data
Software
Projects