"A detection method for mass-generated unnatural texts based on the topical structure analysis"
Pavlov A.S., Dobrov B.V.

Web spam is considered one of the greatest threats to modern search engines. Spammers use a wide range of algorithms to generate large numbers of unnatural texts. A new general model is proposed for texts generated from samples of natural texts, along with a new algorithm that detects unnatural texts by analyzing their topical structure. The proposed algorithm is evaluated on both synthetic and real-world data.

Keywords: web spam, topical structure, modeling.

Pavlov A.S.   e-mail: pavvloff@gmail.com;
Dobrov B.V.   e-mail: dobroff@mail.cir.ru