Indian students help detect hate speech in Twitter

24th April 2017
Shashank Gupta, Pinkesh Badjatiya & Vasudeva Varma, Professor and Dean (R & D) at IIIT-Hyderabad

Students from IIIT Hyderabad bag top poster prize at WWW conference with their solution

By Anand Parthasarathy
It’s just 140 characters, but the reach and impact of a single tweet go a long way. With 313 million monthly active users, 1 billion unique monthly visits to sites with embedded tweets, and support for over 40 languages, Twitter's influence is huge, even when the tweets are filled with hate.
Twitter's founders, and many professionals, agonize over its potential to be misused or misconstrued, or to stir up communal animosity. Now, students at the International Institute of Information Technology, Hyderabad (IIIT-H), have come up with a prize-winning solution.
Vasudeva Varma, Professor and Dean (R & D) at IIIT-H, his students Pinkesh Badjatiya and Shashank Gupta, and adjunct faculty member Manish Gupta worked on this topic for close to a year and presented it at WWW2017, the world’s longest-running and most prestigious web conference, in Perth, Australia, earlier this month. Their poster, entitled "Deep Learning for Hate Speech Detection in Twitter", was voted the best poster presentation among 166 submissions from around the world.
IIIT-Hyderabad’s Information Retrieval and Extraction Lab (IREL) has been applying natural language processing and semantics to develop an automated system, using Artificial Intelligence chatterbots, that can detect hate speech in tweets. As of now, the team is able to detect abusive language and sexist and racist speech, and to flag offensive content. This is useful not only for automatically filtering such content, but also for analysing public sentiment in user-generated content to get to the root of the problem.
To detect hate speech, they use a popular approach in machine learning called supervised learning. In essence, a computer algorithm is fed many example tweets, each already labelled with the form of hate it contains, such as ‘racist’ or ‘sexist’. The algorithm is designed in such a way that it ‘learns’ as it sees the data. Once trained, the programme can recognize racism or sexism in text it has not seen before. The algorithm uses neural networks, an approach popularly called deep learning. These algorithms are inspired by the human brain and try to simulate how humans learn from examples.
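For readers who want to see the idea of supervised learning in miniature, here is a rough sketch in Python. It is not the IIIT-H team's system: their work uses deep neural networks, whereas this sketch swaps in a much simpler word-counting method (Naive Bayes) so that it runs without any extra software. The nine training sentences, the label names and the function names `train` and `predict` are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy labelled examples, invented for this sketch (not the researchers' data).
TRAIN = [
    ("women are bad at driving", "sexist"),
    ("girls cannot do maths", "sexist"),
    ("women belong in the kitchen", "sexist"),
    ("that race is lazy", "racist"),
    ("people of that race are criminals", "racist"),
    ("send that race back home", "racist"),
    ("lovely weather in hyderabad today", "none"),
    ("great match last night", "none"),
    ("the conference poster looks great", "none"),
]


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """'Learn' by counting how often each word appears under each label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label whose training examples best match the words in text."""
    label_counts, word_counts, vocab = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Start from how common the label is, then add evidence word by word.
        score = math.log(label_counts[label] / total_examples)
        words_in_label = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing so an unseen word/label pair is not impossible.
            score += math.log(
                (word_counts[label][word] + 1) / (words_in_label + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "women cannot do driving"))
print(predict(model, "great weather today"))
```

The two calls at the end classify sentences that do not appear in the training list, which is the point of the exercise: the programme generalises from labelled examples rather than matching a fixed list of phrases. A real system would need many thousands of labelled tweets and, as in the poster, a far more capable model.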
Some complexities remain to be addressed. For example, how does natural language processing decipher the various forms of hatred, identify the targets of such hatred, and deconstruct the double entendres, or double meanings, of a language? At IIIT-H the work continues, even as the students and their guide bask in the glory of peer recognition in Perth.