Finally, we had our first in-person event to meet all the ESRs and advisors of the DoSSIER project! On the first day, Suzan Verberne gave an interesting tutorial on Natural Language Processing, focusing on data preprocessing, the data annotation process and Named Entity Recognition. To focus on the skills most of us are interested in, Suzan sent out a questionnaire beforehand and selected the topics accordingly. The tutorial started with an introduction to common data preprocessing steps and to which preprocessing steps make sense for which tasks. We concluded this part with an exercise on computing the Levenshtein distance. For me personally, the most interesting part was about how to run annotation campaigns, the options for obtaining labels, and the final exercise on inter-rater agreement. I think this lecture will give good guidance to all ESRs in the project who want to run their own annotation campaign. Suzan ended the tutorial with Named Entity Recognition and a first introduction to neural networks for this task.
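For readers who missed the exercises, here is a small sketch of the two computations mentioned above. It is not the tutorial's own material. The function names are hypothetical. The post also does not say which agreement measure was used, so Cohen's kappa for two annotators is an assumption here.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cohens_kappa(labels_a, labels_b) -> float:
    """Cohen's kappa for two annotators labelling the same items
    (assumed measure; the tutorial may have used another one)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label; treat as perfect
    return (observed - expected) / (1 - expected)


print(levenshtein("kitten", "sitting"))  # 3
print(cohens_kappa(["PER", "ORG", "PER", "LOC"],
                   ["PER", "ORG", "LOC", "LOC"]))  # about 0.64
```

Kappa corrects raw agreement (here 3 of 4 items) for the agreement two annotators would reach by chance given their label frequencies. That correction is why it is a common choice when checking an annotation campaign.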
You can find the slides of her presentation on the website of her Text Mining course, which she teaches at Leiden University.