Yelp Review Rating Prediction: Machine Learning and Deep Learning Models

Zefang Liu

arXiv preprint arXiv:2012.06690, 2020

Abstract

We predict restaurant ratings from Yelp reviews using the Yelp Open Dataset. We present the data distribution and build a balanced training dataset. Two vectorizers are compared for feature engineering. Four machine learning models (Naive Bayes, Logistic Regression, Random Forest, and Linear Support Vector Machine) are implemented, and four transformer-based models (BERT, DistilBERT, RoBERTa, and XLNet) are also applied. Accuracy, weighted F1 score, and confusion matrices are used for model evaluation. For 5-star classification, XLNet achieves 70% accuracy, compared with 64% for Logistic Regression.
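The balancing step and the evaluation metrics named in the abstract can be illustrated with a small, dependency-free Python sketch. This is not the paper's released code. It assumes the balanced training set is built by random undersampling to the size of the smallest star class (the abstract does not say how balancing was done), and `balance_by_undersampling`, `accuracy`, `weighted_f1`, and `confusion_matrix` are hypothetical helpers. Weighted F1 here averages per-class F1 scores weighted by each class's support in the true labels, which matches how scikit-learn's `f1_score(..., average="weighted")` defines it.

```python
import random
from collections import Counter, defaultdict

LABELS = [1, 2, 3, 4, 5]  # Yelp star ratings treated as five classes


def balance_by_undersampling(reviews, seed=0):
    """Hypothetical helper (assumed method): downsample every star class to the
    size of the smallest class. `reviews` is a list of (text, stars) pairs."""
    by_label = defaultdict(list)
    for text, stars in reviews:
        by_label[stars].append((text, stars))
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for stars in sorted(by_label):
        balanced.extend(rng.sample(by_label[stars], n))
    rng.shuffle(balanced)
    return balanced


def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted rating equals the true rating."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN)
        f1 = 2 * tp / denom if denom else 0.0  # undefined F1 counted as 0
        total += support[lab] * f1
    return total / len(y_true)


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true ratings, columns are predicted ratings."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


if __name__ == "__main__":
    y_true = [5, 4, 5, 1, 3, 2]
    y_pred = [5, 5, 5, 1, 3, 1]
    print(round(accuracy(y_true, y_pred), 4))     # 0.6667
    print(round(weighted_f1(y_true, y_pred), 4))  # 0.5444
    for row in confusion_matrix(y_true, y_pred):
        print(row)
```

Off-diagonal counts in the confusion matrix show which neighbouring star ratings are confused, which is useful for 5-class rating prediction because most errors tend to fall between adjacent ratings.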

Recommended citation: Liu, Zefang. "Yelp Review Rating Prediction: Machine Learning and Deep Learning Models." arXiv preprint arXiv:2012.06690 (2020).