Sentiment Analysis with AWS Comprehend and Python
Machine learning has been a hot topic recently. Our clients are asking about its possibilities more and more often, even if they are not yet sure what they can achieve with it. One of the interesting areas is Natural Language Processing.
Amazon Comprehend
It is not easy to prepare and train your own model to handle natural-language jobs. Thankfully, there are pre-trained engines you can use, and I will focus on Amazon Comprehend because our company mostly uses AWS solutions. You can find similar offerings from Google (Cloud Natural Language API) and IBM (Watson Natural Language Understanding).
Amazon Comprehend provides keyphrase extraction, sentiment analysis, syntax analysis, entity recognition, language detection and topic modeling, and works with English and Spanish texts. The pricing is rather low unless you deal with big data projects – if you are in the big data world, you will most likely sooner or later run your own engine instead of renting one. For our small clients, this solution is very good and cost-effective.
If you are working with languages other than English or Spanish, you should consider one of two possibilities – switch to an engine that supports your language natively, or run a translation engine before the actual text analysis. I would say that an engine supporting the language you need is much better – as you probably know, translation can be misleading from time to time…
Using Comprehend with Python
I’m using Python as my language of choice for small projects and proof-of-concept purposes. I wanted to check whether I could classify a set of comments left on a website using AWS Comprehend’s sentiment analysis. This tool checks the overall sentiment of a text: the results are provided as confidence scores for each of the four sentiment classes (positive, negative, neutral and mixed), and the dominant sentiment is returned as a separate field.
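To illustrate, here is roughly what a detect_sentiment response looks like (a hand-written sample with made-up numbers – the real response also carries HTTP metadata) and how its fields can be read out defensively:

```python
# A hand-written sample of a Comprehend DetectSentiment response.
# The scores are made up; the real response also includes ResponseMetadata.
sample_response = {
    'Sentiment': 'POSITIVE',
    'SentimentScore': {
        'Positive': 0.93,
        'Negative': 0.01,
        'Neutral': 0.05,
        'Mixed': 0.01,
    },
}

# Read the dominant sentiment and per-class confidences with .get(),
# falling back to safe defaults if a key is missing.
sentiment = sample_response.get('Sentiment', 'ERROR')
scores = sample_response.get('SentimentScore', {})
positive = scores.get('Positive', 0)
```

The four confidence scores sum to roughly 1.0, and the Sentiment field simply names the class with the highest confidence.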
Please note that you should execute your scripts from an EC2 machine that is allowed to use the Comprehend service (configured via a proper IAM role attached to the instance), or adjust my script to provide IAM credentials when connecting to AWS. To work with the AWS API you also have to install and import boto3 – the AWS SDK for Python.
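If you are not running on an EC2 instance with a role attached, the credentials can be passed explicitly when creating the client – a minimal sketch, where the key values are placeholders you would normally load from a profile, environment variables or a secrets store rather than hard-code:

```python
import boto3

# Placeholder credentials - do not hard-code real keys in production code.
comprehend = boto3.client(
    service_name='comprehend',
    region_name='us-east-1',
    aws_access_key_id='AKIA...',           # placeholder
    aws_secret_access_key='your-secret',   # placeholder
)
```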
In the script below I’m connecting to a MySQL database, but you can use any source of text for the analysis.
import boto3
import json
import mysql.connector
import dbconfig

# initialize comprehend module
comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')

# database connection
cnx = mysql.connector.connect(user=dbconfig.DATABASE['user'],
                              password=dbconfig.DATABASE['password'],
                              host=dbconfig.DATABASE['host'],
                              database=dbconfig.DATABASE['dbname'])
cursor = cnx.cursor()
insCursor = cnx.cursor()

# retrieve the data
query = ("SELECT id, comments FROM commentsTable "
         "WHERE comments != '' AND comments IS NOT NULL")
cursor.execute(query)
receivedData = list(cursor)
cursor.close()

# prepare the query to insert the data
insertQuery = ("INSERT INTO sentiment(id, sentiment, mixedScore, negativeScore, neutralScore, positiveScore) "
               "VALUES(%(id)s, %(Sentiment)s, %(MixedScore)s, %(NegativeScore)s, %(NeutralScore)s, %(PositiveScore)s)")

# actual sentiment analysis loop
for (id, comments) in receivedData:
    # here is the main part - comprehend.detect_sentiment is called
    sentimentData = comprehend.detect_sentiment(Text=comments, LanguageCode='en')

    # preparation of the data for the insert query
    qdata = {
        'id': id,
        'Sentiment': "ERROR",
        'MixedScore': 0,
        'NegativeScore': 0,
        'NeutralScore': 0,
        'PositiveScore': 0,
    }
    if 'Sentiment' in sentimentData:
        qdata['Sentiment'] = sentimentData['Sentiment']
    if 'SentimentScore' in sentimentData:
        if 'Mixed' in sentimentData['SentimentScore']:
            qdata['MixedScore'] = sentimentData['SentimentScore']['Mixed']
        if 'Negative' in sentimentData['SentimentScore']:
            qdata['NegativeScore'] = sentimentData['SentimentScore']['Negative']
        if 'Neutral' in sentimentData['SentimentScore']:
            qdata['NeutralScore'] = sentimentData['SentimentScore']['Neutral']
        if 'Positive' in sentimentData['SentimentScore']:
            qdata['PositiveScore'] = sentimentData['SentimentScore']['Positive']

    # inserting data to the database
    insCursor.execute(insertQuery, qdata)

# cleanup
cnx.commit()
insCursor.close()
cnx.close()
As you can see, the code is rather straightforward – the comments are retrieved from the database and assigned as a list to the receivedData variable. Because I want to save the analysis results to the database, I also prepared an insert query. Once the data is ready, the loop iterates over all the comments and sentiment analysis is performed for each of them. Finally, the results are inserted into the database, the transaction is committed and the connection is closed.
This method is fine for small data sets; in general, iterating through the data and processing records one by one is not the most efficient way to handle big data sets. You can take a look at batch sentiment analysis, which is also possible but was not needed in my case.
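A minimal sketch of that batch variant, assuming the comments are already loaded into a plain list – BatchDetectSentiment accepts at most 25 documents per call, so the input has to be chunked first (the helper names here are my own, not part of the AWS SDK):

```python
def chunk(items, size=25):
    # BatchDetectSentiment accepts at most 25 documents per request,
    # so split the input into batches of that size.
    return [items[i:i + size] for i in range(0, len(items), size)]

def batch_sentiment(comments, region='us-east-1'):
    # boto3 is imported here so the chunk() helper above stays dependency-free
    import boto3
    comprehend = boto3.client(service_name='comprehend', region_name=region)
    results = {}
    for batch in chunk(comments):
        response = comprehend.batch_detect_sentiment(TextList=batch,
                                                     LanguageCode='en')
        # each item in ResultList carries an Index pointing back into
        # the TextList that was submitted for this batch
        for item in response['ResultList']:
            results[batch[item['Index']]] = item['Sentiment']
    return results
```

Note that documents which fail analysis are not in ResultList but in the response’s ErrorList, so a production version should inspect that field as well.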