Authorship Attribution of Scientific Abstracts
Journal
Proceedings - International Conference on Pattern Recognition
ISSN
10514651
Date Issued
2022-01-01
Author(s)
Suman, Chanchal
Saha, Sriparna
Bhattacharyya, Pushpak
Abstract
Plagiarism detection and identification of the author(s) of scholarly articles are important for determining the actual credit due to the author(s). Authorship attribution of long texts mostly depends on manual feature engineering to extract relevant features. In this paper, we develop a capsule-based convolutional neural network (CNN) architecture that takes term frequency-inverse document frequency (TF-IDF) vectors as input to solve the task of authorship attribution. The TF-IDF vectors represent the articles. Each generated TF-IDF vector is fed to a CNN framework, which uses a capsule to learn the spatial features. Our proposed approach reduces the extra effort of hand-crafted feature extraction. The developed model is tested on a dataset that we created from scholarly abstracts. The results show that the TF-IDF-based capsule-CNN method outperforms previous approaches that used hand-engineered features with neural architectures. Different multi-task models are also developed to analyze the effect of category information on the authorship prediction task. The results reveal that the multi-task models yield results comparable to those of the single-task system. To illustrate the relevance of the developed system, different qualitative analyses have also been carried out, which clearly reveal the effectiveness of utilizing category information and capsule networks. The source code for the proposed approach will be shared after acceptance.
Volume
2022-August
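
The abstract describes representing each article as a TF-IDF vector before passing it to the capsule-based CNN. As a rough illustration of that input step only, the sketch below computes TF-IDF vectors in plain Python. It is not the authors' code: the function name `tfidf_vectors`, the whitespace tokenizer, the weighting formula (length-normalized term frequency times `ln(N / df) + 1`), and the toy abstracts are all assumptions made for this example, since the abstract does not state the exact variant used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of plain-text documents."""
    # Assumed tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Assumed weighting: idf = ln(N / df) + 1, so a term that occurs
    # in every document still keeps a weight of 1.
    idf = {tok: math.log(n_docs / df[tok]) + 1.0 for tok in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical toy abstracts, not taken from the paper's dataset.
abstracts = [
    "capsule networks learn spatial features",
    "tf-idf vectors represent scientific abstracts",
    "capsule networks classify scientific abstracts",
]
vocab, vectors = tfidf_vectors(abstracts)
```

Each row of `vectors` has one weight per vocabulary term, and terms that appear in fewer documents receive larger weights. In a pipeline like the one the abstract outlines, rows of this kind would serve as the network input; the network itself is not sketched here.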