Authorship Style Analysis of Textual Data using Character N-Grams

Vlado Kešelj
Professor of Computer Science
Dalhousie University, Halifax NS, Canada
SERC 306
Thursday, May 25, 2017 - 11:00
The central problem of authorship attribution is to identify, with high confidence, the author of a document whose authorship is unknown, based on writing samples from a set of candidate authors. We will present work based mainly on a character n-gram similarity measure called CNG (Common N-Grams), which has performed very well on several datasets. Our most recent work includes RNG-Sig, a visual analytics tool for investigating differences and similarities between text documents, inspired by spectral analysis visualization techniques. We also investigate the influence of document topic on the authorship attribution task, present recommendations for neutralizing this influence, and propose a method to alleviate this form of classification bias.
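To make the idea of a character n-gram similarity measure concrete, here is a minimal Python sketch of a CNG-style comparison. It assumes the profile-based formulation commonly described for CNG. Each document's profile is its L most frequent character n-grams with normalized frequencies. Two profiles are compared by summing ((f1 - f2) / ((f1 + f2) / 2))^2 over the union of their n-grams, where an n-gram absent from a profile counts as frequency 0. The unknown document is then attributed to the least dissimilar candidate. The function names (`profile`, `cng_dissimilarity`, `attribute`), the defaults n = 3 and L = 500, and the toy texts are hypothetical choices for illustration. This is not the speaker's implementation.

```python
from collections import Counter


def profile(text: str, n: int = 3, length: int = 500) -> dict[str, float]:
    """Profile = the `length` most frequent character n-grams of `text`,
    each mapped to its relative frequency (count / total n-grams).
    The defaults n=3 and length=500 are illustrative assumptions."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    if total == 0:
        return {}  # text shorter than n: empty profile
    return {g: c / total for g, c in grams.most_common(length)}


def cng_dissimilarity(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Sum of ((f1 - f2) / ((f1 + f2) / 2))**2 over the union of both profiles.
    A missing n-gram has frequency 0, so each n-gram found in only one
    profile contributes the maximum value of 4."""
    total = 0.0
    for g in set(p1) | set(p2):
        f1 = p1.get(g, 0.0)
        f2 = p2.get(g, 0.0)
        # f1 + f2 > 0 because g occurs in at least one profile
        total += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return total


def attribute(unknown: str, candidates: dict[str, str],
              n: int = 3, length: int = 500) -> str:
    """Return the candidate author whose profile is least dissimilar to
    the unknown document's profile (nearest-profile classification)."""
    target = profile(unknown, n, length)
    return min(
        candidates,
        key=lambda author: cng_dissimilarity(
            target, profile(candidates[author], n, length)
        ),
    )


if __name__ == "__main__":
    # Hypothetical toy "writing samples" for two candidate authors.
    authors = {
        "A": "the cat sat on the mat and the cat was happy",
        "B": "quantum fields fluctuate beneath observable spacetime",
    }
    print(attribute("the cat was on the mat", authors))  # prints "A"
```

Because each shared n-gram contributes a term between 0 and 4, and each unshared one contributes exactly 4, the measure rewards profiles that agree on which n-grams are frequent. Note that the raw sum is not normalized by profile size, so in practice the profile length L is usually kept fixed across documents.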