Sentiment analytics - are we ready to move from text to multimodal?
Today it is easy to voice an opinion, and there is no stopping us. Nearly one in four people worldwide uses social media, and in some countries, both developing and developed, around 70% of the adult population spends more than two hours a day on it. That's a lot of time to say a lot of things, and those things matter to marketers. Companies are therefore investing heavily in mining this huge volume of information from networking sites, blogs, and discussion forums to gain insight into underlying customer sentiment about their brand, customer service, and new products, and into customer behavior and unmet needs.
Very often, sentiment analytics is treated either as a topic within text analytics or text analytics as the approach to sentiment analytics. This may be because a large share of social data online today is text. But that definition is quite restrictive. More than 65,000 videos are uploaded to YouTube alone per day, and 6 billion hours of video are watched on it each month. More than 25,000 photos are uploaded to Flickr every minute. With this many videos and photos being shared and watched online, companies and analysts can no longer afford to ignore this data and stick to text alone.
Statistics aside, simple intuition also builds the case for looking beyond text. To a layman, sentiment is the polarity of an opinion. Words alone are not enough to judge polarity; the emotion behind the words matters, and emotion is better reflected in tone and gesture. Remember what we learnt at college? "We communicate more effectively through body language than words," and "a picture is worth a thousand words." Now, which is a better measure of sentiment: "how long a person smiled" in a video, or "the number of smileys" in a transcript?
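To make the contrast concrete, here is a minimal sketch of the text-only proxy: counting smileys in a transcript to produce a crude polarity score. The emoticon lists are illustrative, not exhaustive, and a video-based measure would instead track something like smile duration, which no transcript preserves.

```python
# Crude text-only sentiment proxy: count emoticons in a transcript.
# The emoticon lists below are illustrative, not exhaustive.
POSITIVE = [":)", ":-)", ":D", ":P"]
NEGATIVE = [":(", ":-(", ":'("]

def smiley_polarity(transcript: str) -> int:
    """Return (positive smileys) minus (negative smileys)."""
    pos = sum(transcript.count(s) for s in POSITIVE)
    neg = sum(transcript.count(s) for s in NEGATIVE)
    return pos - neg

print(smiley_polarity("Great product :) :D but shipping was slow :("))  # → 1
```

The limitation is obvious: the score only reflects what the writer chose to type, not how they actually felt while saying it.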
The time has come to move from the traditional single-mode (text) approach to a multimodal one. But are we ready for it? While plenty of lab and institutional research demonstrates the feasibility and superiority of multimodal approaches, there are hardly any real-life examples. One of the main challenges with non-textual data is that the same meaning can be expressed in far more varied ways than with words, making it very hard to model. Another hurdle is that even when a model exists (face detection, for example), its applicability depends on image quality; in real life, people upload videos and photos shot on home webcams, with an inherent lack of clarity. The third challenge is the sheer size of the data, many times larger than in a text-only scenario.
It seems we are far from ready, and the challenges are overwhelming. In order to progress on this, we need to take baby steps:
1. Focus on specific areas where a multimodal approach can be a differentiator over text and bring in far more value, e.g. lie detection, sarcasm identification, and real-time voice analytics, and demonstrate its use in real-life contexts.
2. Continue research on managing the huge volumes of data involved.
3. While the ideal state is "reliable automated sentiment analytics on multimodal data", we should look into innovative manual ways to achieve the same, so we can enjoy the benefits now.
For example, we could use crowdsourcing to distribute a huge set of photos from Flickr, videos from YouTube, or audio files from a call center over the internet and have people manually categorize their sentiment. After all, the quality of sentiment scoring a human being can do is likely to be superior to any algorithm-based logic.
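Once crowdsourced labels come back, they still need to be aggregated. A minimal sketch of that step, assuming each item has been rated by several workers (the item names and labels here are hypothetical), is a simple majority vote that flags low-agreement items for review:

```python
from collections import Counter

def aggregate(labels, min_agreement=0.5):
    """Majority-vote a list of worker labels for one item.

    Items whose top label does not win more than `min_agreement`
    of the votes are flagged for human review instead.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    if votes / len(labels) > min_agreement:
        return label
    return "needs_review"

# Hypothetical worker ratings for two Flickr photos.
ratings = {
    "photo_01": ["positive", "positive", "negative"],
    "photo_02": ["negative", "neutral", "positive"],
}
for item, labels in ratings.items():
    print(item, aggregate(labels))
# photo_01 → "positive", photo_02 → "needs_review"
```

The review bucket matters in practice: disagreement among human raters is itself a useful signal, often marking exactly the sarcastic or ambiguous items that automated models struggle with.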
What do you think should be our way ahead?