Authors: Olivier de Vel, Malcolm Corney, Alison Anderson, and George Mohay (Queensland University of Technology)

DFRWS USA 2002

Abstract

We describe an investigation into mining e-mail text documents to attribute their authors to gender and language-background cohorts. We used an extended set of e-mail document features that are predominantly free of topic content, such as style markers, structural characteristics and gender-preferential language features, together with a Support Vector Machine learning algorithm. Experiments on a corpus of e-mail documents written by a large number of authors of both genders gave promising results for both author gender and language-background cohort categorisation.
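
As a rough illustration of the kind of topic-independent features described above, the following Python sketch turns an e-mail body into a small numeric feature dictionary. The abstract does not list the actual features, so the function name `extract_features`, the function-word list and the suffix list are all hypothetical choices made for this example. In a full pipeline, vectors like these would be passed to a Support Vector Machine classifier, which is not shown here.

```python
import re

# Hypothetical feature choices for illustration only; the abstract does not
# enumerate the features actually used in the study.
FUNCTION_WORDS = ("the", "of", "and", "to", "in")   # assumed style markers
SUFFIXES = ("able", "ful", "ly")                    # assumed gender-preferential cues


def extract_features(email_text):
    """Map an e-mail body to a dict of simple, topic-independent features."""
    lines = email_text.splitlines()
    words = re.findall(r"[A-Za-z']+", email_text.lower())
    # Guard against division by zero on empty input.
    n_words = max(len(words), 1)
    n_lines = max(len(lines), 1)

    features = {
        # Style marker: mean word length in characters.
        "avg_word_length": sum(len(w) for w in words) / n_words,
        # Structural characteristic: share of lines that are blank.
        "blank_line_ratio": sum(1 for ln in lines if not ln.strip()) / n_lines,
    }
    # Style markers: relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = words.count(fw) / n_words
    # Assumed gender-preferential cues: relative frequency of word endings.
    for suf in SUFFIXES:
        features["suffix_" + suf] = sum(1 for w in words if w.endswith(suf)) / n_words
    return features
```

Each feature is normalised by document length, so that short and long messages remain comparable. This is a design choice of the sketch, not a detail taken from the paper.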
