Sentiment Analysis with AWS Comprehend and Python

Machine learning has been a hot topic recently. Our clients ask about its possibilities more and more often, even when they are not yet sure what they can achieve with it. One of the more interesting areas is Natural Language Processing.

Amazon Comprehend

It is not easy to prepare and train your own model for natural language tasks. Thankfully, there are pre-trained engines you can use. I will focus on Amazon Comprehend because our company mostly uses AWS solutions. You can find similar services at Google (Cloud Natural Language API) or IBM (Watson Natural Language Understanding).

Amazon Comprehend offers keyphrase extraction, sentiment analysis, syntax analysis, entity recognition, language detection, and topic modeling, and at the time of writing it works with English and Spanish texts. The pricing is rather low if you don’t deal with big data projects. If you are in the big data world, you will most likely end up building your own engine sooner or later rather than paying for a hosted one. For our small clients, this solution is very good and cost-effective.

If you are working with languages other than English or Spanish, you have two options: switch to an engine that supports your language natively, or run the text through a translation engine before the actual analysis. I would say that an engine which supports your language natively is the much better choice – as you probably know, translation can be misleading from time to time…
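The translate-first option could be sketched roughly as follows. This is only an illustration under assumptions: it uses Amazon Translate as the translation engine, the helper name `sentiment_via_translation` is made up for this example, and both clients are expected to be created by the caller with `boto3.client('translate')` and `boto3.client('comprehend')`.

```python
def sentiment_via_translation(translate, comprehend, text, source_lang):
    """Translate text to English, then run sentiment analysis on the result.

    Hypothetical helper: `translate` and `comprehend` are assumed to be
    boto3 clients for Amazon Translate and Amazon Comprehend.
    """
    # Step 1: translate the original text to English.
    translated = translate.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,  # e.g. 'pl' for Polish
        TargetLanguageCode="en",
    )["TranslatedText"]

    # Step 2: analyze the English translation instead of the original.
    result = comprehend.detect_sentiment(Text=translated, LanguageCode="en")
    return translated, result["Sentiment"]
```

Returning the translated text together with the sentiment makes it easier to spot cases where the translation, not the analysis, is what went wrong.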

Using Comprehend with Python

I use Python as my language of choice for small projects and proofs of concept. I wanted to check whether I could classify a set of comments left on a website using AWS Comprehend Sentiment Analysis. This tool reports the overall sentiment of a text. The results are provided as a confidence score between 0 and 1 for each of four categories: positive, negative, neutral, and mixed. In addition, the main sentiment is returned as a separate field.
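To make the shape of the result more concrete, here is a small sketch that reads the main sentiment and its confidence from a response. The keys match what detect_sentiment returns, but the numbers in `sample_response` are made up for illustration, and `summarize_sentiment` is a hypothetical helper name.

```python
# Illustrative response only: the keys follow the detect_sentiment response,
# but the scores are invented for this example.
sample_response = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Positive": 0.95,
        "Negative": 0.01,
        "Neutral": 0.03,
        "Mixed": 0.01,
    },
}


def summarize_sentiment(response):
    """Return (label, confidence) for the main sentiment in a response dict."""
    label = response["Sentiment"]            # upper case, e.g. "POSITIVE"
    scores = response["SentimentScore"]      # keys are capitalized, e.g. "Positive"
    confidence = scores[label.capitalize()]  # "POSITIVE" -> "Positive"
    return label, confidence


label, confidence = summarize_sentiment(sample_response)
print(f"{label}: {confidence:.0%}")  # prints "POSITIVE: 95%"
```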

Please note that you should either run your scripts from an EC2 machine that is allowed to use the Comprehend service (through a proper role attached to the EC2 instance) or adjust my script to provide proper IAM credentials when connecting to AWS. To work with the AWS API you also have to install and import boto3 – the AWS SDK for Python.
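If you are not running on an EC2 instance with a role attached, one way to supply credentials is a named profile from your local AWS configuration. The sketch below assumes such a profile exists; the function name `make_comprehend_client` and the profile name `dev` in the usage note are placeholders, not part of my script.

```python
def make_comprehend_client(profile_name=None, region_name="us-east-1"):
    """Return a Comprehend client, optionally using a named credentials profile."""
    # Imported inside the function so this sketch can be loaded even on a
    # machine where boto3 is not installed yet (pip install boto3).
    import boto3

    # With profile_name=None boto3 falls back to its default credential chain
    # (environment variables, shared credentials file, or the EC2 instance role).
    session = boto3.Session(profile_name=profile_name, region_name=region_name)
    return session.client("comprehend")
```

Calling `make_comprehend_client()` on a properly configured EC2 instance should behave like the plain `boto3.client(...)` call in my script, while `make_comprehend_client(profile_name="dev")` would pick up the keys stored under that profile.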

In my script below, I’m connecting to a MySQL database, but you can use any source of text for the analysis.

import boto3
import mysql.connector
import dbconfig

#initialize comprehend module
comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')

#database connection
cnx = mysql.connector.connect(user=dbconfig.DATABASE['user'], 
                              password=dbconfig.DATABASE['password'],
                              host=dbconfig.DATABASE['host'], 
                              database=dbconfig.DATABASE['dbname'])

cursor = cnx.cursor()
insCursor = cnx.cursor()

#retrieve the data
query = ("SELECT id, comments FROM commentsTable "
         "WHERE comments != '' AND comments IS NOT NULL ")

cursor.execute(query)
receivedData = list(cursor)
cursor.close()

#prepare the query to insert the data 
insertQuery = ("INSERT INTO sentiment(id, sentiment, mixedScore, negativeScore, neutralScore, positiveScore) "
        "VALUES(%(id)s, %(Sentiment)s, %(MixedScore)s, %(NegativeScore)s, %(NeutralScore)s, %(PositiveScore)s )")

#actual sentiment analysis loop
for (comment_id, comments) in receivedData:
  # defaults are stored when the call fails or a field is missing
  qdata = {
    'id': comment_id,
    'Sentiment': "ERROR",
    'MixedScore': 0,
    'NegativeScore': 0,
    'NeutralScore': 0,
    'PositiveScore': 0,
  }
  try:
    # here is the main part - comprehend.detect_sentiment is called
    sentimentData = comprehend.detect_sentiment(Text=comments, LanguageCode='en')
  except comprehend.exceptions.ClientError:
    # e.g. the text is too long for the service; keep the "ERROR" row
    sentimentData = {}

  # copy the main sentiment and the four confidence scores into the query data
  qdata['Sentiment'] = sentimentData.get('Sentiment', qdata['Sentiment'])
  scores = sentimentData.get('SentimentScore', {})
  for label in ('Mixed', 'Negative', 'Neutral', 'Positive'):
    qdata[label + 'Score'] = scores.get(label, 0)
  #inserting data to the database
  insCursor.execute(insertQuery,qdata)

#cleanup
cnx.commit()
insCursor.close()
               
cnx.close()

As you can see, the code is rather straightforward: the comments are retrieved from the database and stored as a list in the receivedData variable. Because I want to save the analysis results to the database, I also prepare the insert query. Once the data is ready, the loop iterates over all the comments and runs sentiment analysis on each of them. Finally, the results are inserted into the database, the transaction is committed, and the connection is closed.

This method is good for small data sets. In general, iterating through the data and processing items one by one is not the most efficient way to handle big data sets. You can take a look at batch sentiment analysis, which is also available but was not needed in my case.
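For reference, a batch variant might look roughly like the sketch below. It was not part of my project, so treat it as an untested outline: the helper names `chunked` and `batch_sentiment` are made up, `comprehend` is assumed to be the same boto3 client as in the script above, and `rows` is assumed to be a list of (id, text) pairs like receivedData. The batch call accepts at most 25 documents per request, so the rows are split into chunks first.

```python
BATCH_LIMIT = 25  # maximum number of documents per batch_detect_sentiment call


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_sentiment(comprehend, rows, language_code="en"):
    """Yield (row_id, sentiment, scores) for a list of (row_id, text) rows."""
    for chunk in chunked(rows, BATCH_LIMIT):
        response = comprehend.batch_detect_sentiment(
            TextList=[text for _, text in chunk],
            LanguageCode=language_code,
        )
        # Index is the position of the text inside this chunk.
        for item in response["ResultList"]:
            row_id = chunk[item["Index"]][0]
            yield row_id, item["Sentiment"], item["SentimentScore"]
        # Documents that could not be processed are reported separately.
        for error in response["ErrorList"]:
            row_id = chunk[error["Index"]][0]
            yield row_id, "ERROR", {}
```

With this in place, the per-comment loop could iterate over `batch_sentiment(comprehend, receivedData)` and build the insert parameters from each yielded tuple, which cuts the number of API calls by a factor of up to 25.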
