Resources
Below are some resources available for download. If you find them useful, please cite the relevant paper listed below each resource.
Behaviour-based Annotation Tool (BB-AT)
An annotation tool that captures the time taken per word in a part-of-speech (POS) tagging task.
Relevant citation: Kanishka Jain and Ashwini Vaidya. 2022. Using fortuitous data to improve annotation reliability. Annual Conference in Cognitive Science, Delhi, India.
Hindi Word Similarity dataset (RG-65)
Word similarity ratings for a Hindi translation of the English RG-65 dataset (Rubenstein and Goodenough, 1965).
Relevant citation: Kushagra Bhatia, Divyanshu Aggarwal and Ashwini Vaidya. 2021. Fine-tuning distributional semantic models for closely related languages. In Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) at EACL 2021. (pdf)
Reaction Time data for Malayalam words in a lexical decision task (zip file)
The data consist of reaction times for 100 items (50 words and 50 pseudowords), collected from 37 native speakers of Malayalam. For details about the items and the speakers, refer to the paper cited below.
Relevant citation: Richard Shallam and Ashwini Vaidya. 2019. A metric for lexical complexity in Malayalam. In Proceedings of ICON 2019, Hyderabad. (pdf)