Sentiment Analysis of Customer Reviews in Zomato Bangalore Restaurants Using Random Forest Classifier

Bern Jonathan; Jay Idoan Sihotang; Stanley Martin

doi:10.35974/isc.v7i1.1003

Authors

Bern Jonathan
Jay Idoan Sihotang Fakultas Teknologi Informasi, Universitas Advent Indonesia
Stanley Martin Fakultas Teknologi Informasi, Universitas Advent Indonesia

Keywords:

Sentiment Analysis, Random Forest, Precision Recall, Feature Selection

Abstract

Natural Language Processing (NLP) is a branch of Artificial Intelligence and Machine Learning
concerned with the interaction between computers and human (natural) languages.
Sentiment analysis is a part of NLP that is often used to analyze
text, based on patterns in how people write, and to identify positive, negative, or neutral
sentiment. It is useful for finding out whether users like something or not.
Zomato is an application for rating restaurants. Each rating is accompanied by a review of the
restaurant, which can be used for sentiment analysis. On this basis, the authors aim to predict
the sentiment of each review. The reviews are preprocessed by converting all
words to lowercase, tokenizing, removing numbers, punctuation, and stop words, and
lemmatizing. The reviews are then converted to vectors using term frequency-inverse
document frequency (TF-IDF). The data set consists of 150,000 reviews. Reviews
with a rating above 3 are labeled positive, reviews with
a rating below 3 negative, and reviews with a rating of exactly 3 neutral. The data are split
into 80% training data and 20% test data. The metrics used to evaluate the random forest classifier
are precision, recall, and accuracy. The accuracy achieved in this research is 92%. The precision of
positive, negative, and neutral sentiment is 92%, 93%, and 96%, respectively. The recall of positive, negative,
and neutral sentiment is 99%, 89%, and 73%, respectively. Average precision and recall are 93% and 87%.
The 10 words that most affect the results are: “bad”, “good”, “average”, “best”, “place”, “love”,
“order”, “food”, “try”, and “nice”.
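
The steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it covers only labeling, preprocessing, TF-IDF weighting, and the averaging of per-class scores. Several details are assumptions: the stop-word list is a tiny hypothetical one, lemmatization is left out because it needs an external lexicon, the TF-IDF variant is the textbook tf = count / length, idf = ln(N / df), and the three sample reviews are invented for illustration. The random forest itself is omitted; in practice it would normally come from a library such as scikit-learn's RandomForestClassifier.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the abstract does not
# say which list the authors used.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "but", "to", "of", "it", "i"}


def label_from_rating(rating):
    """Map a rating to a sentiment label (assumed reading: exactly 3 is neutral)."""
    if rating > 3:
        return "positive"
    if rating < 3:
        return "negative"
    return "neutral"


def preprocess(review):
    """Lowercase, drop digits and punctuation, tokenize, remove stop words.

    Lemmatization is omitted in this sketch.
    """
    text = re.sub(r"[^a-z\s]", " ", review.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def macro_average(per_class_scores):
    """Unweighted mean of per-class scores."""
    return sum(per_class_scores) / len(per_class_scores)


# Invented sample data: (review text, rating).
reviews = [
    ("Good food, nice place!", 4.5),
    ("Bad order. Waited 40 minutes.", 1.0),
    ("Average food.", 3.0),
]
tokens = [preprocess(text) for text, _ in reviews]
labels = [label_from_rating(rating) for _, rating in reviews]
vectors = tfidf(tokens)
```

In this toy corpus "food" occurs in two of the three reviews, so it receives a lower weight in the first review than "good", which occurs only once. Calling macro_average([99, 89, 73]) reproduces the reported average recall of 87%. The same unweighted mean of the rounded precisions (92, 93, 96) is about 93.7, so the reported 93% presumably comes from unrounded per-class values or a different averaging scheme.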