Multi-label pernicious comment classification using Machine Learning Algorithms


Jotaniya Hitesh Ghanshyambhai, Dr. Kinjal Adhvaryu


Millions of comments are posted on the internet every day, and many of them contain hidden pernicious content. Manually identifying pernicious content in these comments is a nearly impossible task, so Machine Learning techniques are needed to detect it. Beyond detection, the comments also need to be categorized under labels such as toxic, severe toxic, obscene, threat, insult, and identity-hate.

Thus, we address the task of automatically detecting pernicious comments in user-generated text: predicting whether a comment contains any pernicious content, and assigning it to each label that applies.

The challenge is to automatically detect pernicious comments in a huge set of comment data and to classify them into six pernicious labels (toxic, severe toxic, obscene, threat, insult, and identity-hate) using various Machine Learning models. Meeting this challenge would clearly benefit our social community.
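
Because a single comment may carry several of these labels at once, the target for each comment is usually represented as a binary indicator vector rather than a single class. The following is a minimal sketch of that representation; the six label names come from the text above, while the identifier spellings and the helper functions `encode_labels`, `decode_labels`, and `is_pernicious` are hypothetical names introduced here for illustration.

```python
# Label names follow the six categories listed in the text; the underscore
# spellings and helper names are illustrative assumptions.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def encode_labels(active):
    """Turn a collection of label names into a 6-element 0/1 indicator vector."""
    active = set(active)
    unknown = active - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABELS]


def decode_labels(vector):
    """Recover the label names whose indicator is set."""
    return [name for name, flag in zip(LABELS, vector) if flag]


def is_pernicious(vector):
    """A comment counts as pernicious if at least one label is set."""
    return any(vector)
```

Under this encoding, the binary question (is the comment pernicious at all) and the multi-label question (which labels apply) are both answered from the same vector.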

Several Machine Learning techniques are applied, including BERT (Bidirectional Encoder Representations from Transformers), CNN, bi-grams, and k-fold cross-validation; the results are then compared in terms of efficiency and accuracy, and reports are generated.
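
Two of the techniques named above can be illustrated without any model: bi-gram extraction and the splitting step of k-fold cross-validation. The sketch below is an assumption-laden illustration, not the authors' pipeline: it tokenizes on whitespace, does not shuffle the data, and the function names `bigrams` and `k_fold_indices` are hypothetical.

```python
def bigrams(text):
    """Return adjacent word pairs from a comment (naive whitespace tokenization)."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds.

    Every sample appears in exactly one validation fold. No shuffling is done
    here; in practice the data would normally be shuffled first.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop
```

A model would be trained on each `train` split and scored on the matching `validation` split, and the k scores averaged to give the figure used for comparison.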


BERT (Bidirectional Encoder Representations from Transformers), CNN (Convolutional Neural Network), KNN (k-Nearest Neighbour), k-fold cross-validation, NLP (Natural Language Processing).


