Given a text, it may be useful to determine the age, gender, native language, nationality, personality and other demographic attributes of its author. This task, called author profiling, has been studied in several fields, especially linguistics and natural language processing, typically by extracting content- and style-based features from training documents and then applying machine learning approaches.
In this paper, we address the author profiling task using several compression-inspired strategies. More specifically, we build models that identify the age and gender of the author of a given document without analysing or extracting specific features from the textual content, which makes them style-oblivious approaches.
We compare and analyse their behaviour on datasets of different natures. Our results show that simple compression-inspired techniques achieve very competitive accuracy while being orders of magnitude faster in the evaluation phase than complex, resource-demanding state-of-the-art techniques.
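To make the general idea of compression-inspired classification concrete, the following is a minimal sketch of one generic scheme, not necessarily any of the strategies evaluated in this paper. It assumes that each profile class (for example an age bracket or a gender) is represented by the concatenation of its training documents, and it assigns a new document to the class whose corpus lets a general-purpose compressor encode the document in the fewest additional bytes. The use of `zlib`, the function names `compressed_size`, `extra_bytes` and `predict`, the labels `group_a` and `group_b`, and the toy corpora are all assumptions made for illustration.

```python
import zlib
from typing import Mapping


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs to encode the text at maximum compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, document: str) -> int:
    """Additional compressed bytes needed when the document follows the corpus."""
    return compressed_size(corpus + " " + document) - compressed_size(corpus)


def predict(corpora: Mapping[str, str], document: str) -> str:
    """Return the label whose training corpus compresses the document best."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], document))


# Toy training corpora (hypothetical); in practice each entry would hold the
# concatenated training documents written by authors of that profile class.
corpora = {
    "group_a": "the quick brown fox jumps over the lazy dog " * 5,
    "group_b": "lorem ipsum dolor sit amet consectetur adipiscing elit " * 5,
}

if __name__ == "__main__":
    print(predict(corpora, "the quick brown fox jumps over the lazy dog"))
    print(predict(corpora, "lorem ipsum dolor sit amet"))
```

No feature extraction takes place in this sketch: the compressor only sees raw bytes, which is what makes approaches of this kind style-oblivious. Note that zlib can only reuse matches from the most recent 32 KB of input, so a realistic corpus would need a compressor with a larger context or a different modelling strategy.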