Introduction to Text Mining

Course, bachelor, winter 2018, L.079.05534

General

Lecture

  • Instructor. Henning Wachsmuth
  • Location. O2
  • Time. Thursday, 11am – 2pm, s.t.
  • First date. October 11, 2018, c.t.
  • Last date. January 31, 2019

Tutorial

  • Instructor. Milad Alshomary
  • Location. H3
  • Time. Monday, 11am – 1pm, s.t. 
  • First date. October 15, 2018
  • Last date. January 28, 2019

Announcements

  • Lecture part X will be dropped. As a result, students who get only 4 ECTS points will not be examined on part IX (previously, part X was planned to be left out).
  • The application phase for the first exam phase (Feb–Mar 2019) is closed. For exam dates as of April 2019, please contact Henning Wachsmuth directly.

Description

This course teaches students all major skills needed to approach typical tasks in the analysis of natural language text. Starting from fundamentals of linguistics and statistics, the lecture gives an overview of several text analyses and covers a selection of them in detail. Both rule-based and statistical techniques are discussed, the latter including standard approaches from machine learning.

The students learn both theoretically and practically to design, implement, and evaluate text analysis algorithms for given tasks. Besides the topical content, the lecture aims to educate students in how to conduct scientific experiments and how to employ large datasets in experiments.

Lectures

The lectures will cover the following topics. The slides from each lecture will be put here, usually soon after the respective lecture has taken place:

  • I. Overview (slides)
  • II. Basics of Linguistics (slides)
  • III. Text Mining using Rules (slides)
  • IV. Basics of Empirical Research (slides)
  • V. Text Mining using Grammars (slides)
  • VI. Basics of Machine Learning (slides)
  • VII. Text Mining using Clustering (slides)
  • VIII. Text Mining using Classification and Regression (slides)
  • IX. Practical Issues (slides)
  • X. Text Mining using Sequence Labeling (dropped)

Slides with meta-information:

  • Organizational info from first lecture (slides)
  • Comments on the teaching evaluation (slides)
  • Organizational info on exam dates (slides)
  • Organizational info on exam application (slides)

Tutorials

The tutorials will cover the following topics. The slides from each tutorial will be put here, usually soon after the respective tutorial has taken place:

  • Oct 15: Introduction to Python I (slides)
  • Oct 22: Introduction to Python II (slides) and to assignment sheet 1 
  • Oct 29: Discussion of solutions of assignment sheet 1
  • ...

Assignments

The course includes six assignment sheets in total that are published bi-weekly. Each sheet consists of written tasks as well as Python programming tasks.

  • Assignment sheet 1 (published on Oct 18, submission until Oct 28)
  • Assignment sheet 2 (published on Nov 1, submission until Nov 11)
  • Assignment sheet 3 (published on Nov 15, submission until Nov 25)
  • Assignment sheet 4 (published on Nov 29, submission until Dec 9)
  • Assignment sheet 5 (published on Dec 13, submission until Jan 6, 2019)
  • Assignment sheet 6 (published on Jan 10, 2019, submission until Jan 20, 2019)

For all programming tasks, we provide a Python notebook containing template code (to help you start the task) that you fill in with your solutions. More information on how to use Python notebooks will be presented in the first tutorial.

Submission

Group submissions of up to three people are allowed. The submission deadline is always 23:59 (UTC+1) on the respective day.

Please submit your assignments via email to Milad Alshomary as a ZIP archive containing a .pdf file for the written part and an .ipynb file (Python notebook) for the programming part.

Please name the archive after the last names and student numbers of all group members as follows: <last name>-<student number>-tm-assignment<assignment number>.zip, repeating the <last name>-<student number> pair for each group member, for example, "meier-1234567-schulz-2345678-tm-assignment1.zip".
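For those who prefer to build the archive with a script, the naming scheme above could be sketched in Python roughly as follows. This is only an illustration, not an official course tool: the function names archive_name and build_archive are made up here, and lowercasing the last names is an assumption based on the example file name.

```python
import zipfile
from pathlib import Path


def archive_name(members, assignment_number):
    """Build the ZIP file name from (last name, student number) pairs.

    Assumption: last names are lowercased, as in the example file name.
    """
    if not 1 <= len(members) <= 3:
        raise ValueError("a group consists of one to three people")
    pairs = [f"{last_name.lower()}-{number}" for last_name, number in members]
    return "-".join(pairs) + f"-tm-assignment{assignment_number}.zip"


def build_archive(members, assignment_number, pdf_path, notebook_path, out_dir="."):
    """Pack the written part (.pdf) and the notebook (.ipynb) into one ZIP."""
    target = Path(out_dir) / archive_name(members, assignment_number)
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in (Path(pdf_path), Path(notebook_path)):
            # Store files flat, without their directory structure.
            archive.write(path, arcname=path.name)
    return target
```

For instance, archive_name([("Meier", "1234567"), ("Schulz", "2345678")], 1) yields the file name from the example above.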

Assignment grades

Student grades are updated in the following file (link removed). Also, detailed assessments of student solutions are given in the tutorials.

Exam

An oral exam has to be taken in order to pass the course. Tentatively, the first round of oral exams will take place in February 2019. You need to register for the exam in PAUL as usual.

Students who get only 4 ECTS points for their course will not be examined on lecture part IX. In addition, each of these students can freely choose one of the other eight parts (I to VIII) to be excluded from the exam.

Important: Each student needs to obtain at least 50% of all assignment points in order to be allowed to take the exam.