1 Introduction
2 Related Work
2.1 Statistics-based approaches
2.2 Rule-based approaches
3 Datasets and Methods
3.1 Datasets
3.2 Methods of feature selection
3.3 Improved TF-IDF
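Since the implementation details of the improved weighting are not reproduced in this outline, the snippet below is only a minimal sketch of one plausible form of the CHI*TF-IDF (Laplace) variant referenced in Table 4: TF-IDF weights with add-one (Laplace) IDF smoothing, scaled per term by a chi-square feature score. The function name `chi_tfidf_features` and the multiplicative combination are assumptions, not necessarily the authors' exact formulation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2

def chi_tfidf_features(train_texts, train_labels):
    """Sketch: scale each TF-IDF column by the chi-square score of its term.

    smooth_idf=True applies add-one (Laplace-style) smoothing inside the IDF
    term; the multiplicative CHI scaling is an assumed reading of the
    'improved TF-IDF', not the paper's exact formula.
    """
    vectorizer = TfidfVectorizer(smooth_idf=True)   # TF-IDF with smoothed IDF
    X = vectorizer.fit_transform(train_texts)       # standard TF-IDF matrix
    chi_scores, _ = chi2(X, train_labels)           # chi-square score per term
    chi_scores = np.nan_to_num(chi_scores)          # guard against undefined scores
    X_weighted = X.multiply(chi_scores)             # tfidf(t, d) * chi2(t), element-wise
    return X_weighted.tocsr(), vectorizer, chi_scores
```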
3.4 The sentiment dictionary and rule-based sentiment classification
3.4.1 Sentiment classification
3.4.2 Construction of the sentiment dictionary
3.4.3 Determination of classification rules
Figure 1. The flow diagram of the emotional polarity determination process.
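The flow diagram itself is not reproduced here, so the following is only an illustrative sketch of how a dictionary-plus-rules polarity decision of this kind typically proceeds: look up sentiment words, flip polarity when a negation word precedes them, and map the aggregate score to positive, neutral, or negative. The tiny word lists, the one-token negation window, and the zero threshold are placeholders, not the dictionary or rules constructed in Sections 3.4.2 and 3.4.3.

```python
# Illustrative sketch of a dictionary + rule polarity decision.
# The word lists and rules below are hypothetical placeholders.
POSITIVE_WORDS = {"good", "great", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "sad"}
NEGATION_WORDS = {"not", "never", "no"}

def classify_polarity(tokens):
    """Return 'positive', 'neutral', or 'negative' for a tokenized sentence."""
    score = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE_WORDS:
            polarity = 1
        elif token in NEGATIVE_WORDS:
            polarity = -1
        else:
            continue
        # Rule: a negation word directly before a sentiment word flips its polarity.
        if i > 0 and tokens[i - 1] in NEGATION_WORDS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_polarity("the service was not good".split()))  # -> negative
```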
3.5 Evaluation metrics
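The tables in Section 4 report per-class precision, recall and F1 together with a single macro-average and micro-average figure per method (assumed here to be F1). As a reference for how those numbers are obtained, the sketch below computes them with scikit-learn on hypothetical labels; it reflects the standard definitions rather than any implementation detail of the paper.

```python
from sklearn.metrics import f1_score, precision_recall_fscore_support

# Hypothetical gold labels and predictions over the three polarity classes.
y_true = ["positive", "neutral", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral", "neutral",  "neutral", "negative", "negative"]

# Per-class precision, recall and F1 (the quantities reported in Table 2).
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=["positive", "neutral", "negative"], zero_division=0
)

# Macro-average: unweighted mean of the per-class F1 scores.
macro_f1 = f1_score(y_true, y_pred, average="macro")
# Micro-average: F1 computed from the counts pooled over all classes.
micro_f1 = f1_score(y_true, y_pred, average="micro")
print(macro_f1, micro_f1)
```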
4 Results and Discussion
4.1 Statistics-based methods
Figure 2. The results of different feature selection functions with different features and kernel functions.
Figure 3. The results of different feature selection functions with different features and classifiers.
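Figures 2 and 3 compare feature selection functions across different feature settings, kernel functions and classifiers; the exact grid is not reproduced in this outline. The sketch below shows one conventional way to run such a comparison with scikit-learn pipelines. The particular scoring functions (`chi2`, `mutual_info_classif`), feature counts and kernels are assumptions used only to illustrate the loop structure.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.svm import SVC

# Hypothetical comparison grid; the paper's actual settings may differ.
for score_func in (chi2, mutual_info_classif):
    for k in (1000, 2000):
        for kernel in ("linear", "rbf"):
            model = Pipeline([
                ("counts", CountVectorizer()),             # raw term counts
                ("select", SelectKBest(score_func, k=k)),  # keep the k best terms
                ("tfidf", TfidfTransformer()),             # re-weight the kept terms
                ("svm", SVC(kernel=kernel)),               # classifier under comparison
            ])
            # model.fit(train_texts, train_labels); evaluate on a held-out split
```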
4.2 Rule-based methods
Table 1. The results of rule-based methods.

| | macro-average | micro-average |
|---|---|---|
| Group 1 | 0.4606 | 0.5350 |
| Group 2 | 0.6419 | 0.6891 |
| Group 3 | 0.6436 | 0.6924 |
| Group 4 | 0.6436 | 0.6924 |
| Group 5 | 0.6422 | 0.6914 |
| Group 6 | 0.6527 | 0.6940 |
| Group 7 | 0.6847 | 0.7185 |
Table 2. The classification results of all emotional categories.

| | Positive Precision | Positive Recall | Positive F1 | Neutral Precision | Neutral Recall | Neutral F1 | Negative Precision | Negative Recall | Negative F1 |
|---|---|---|---|---|---|---|---|---|---|
| Group 1 | 0.8366 | 0.3282 | 0.4714 | 0.4596 | 0.8967 | 0.6077 | 0.4780 | 0.2216 | 0.3028 |
| Group 2 | 0.7691 | 0.7547 | 0.7618 | 0.6429 | 0.6494 | 0.6462 | 0.5056 | 0.5306 | 0.5178 |
| Group 3 | 0.7713 | 0.7618 | 0.7665 | 0.6478 | 0.6494 | 0.6486 | 0.5042 | 0.5277 | 0.5157 |
| Group 4 | 0.7713 | 0.7618 | 0.7665 | 0.6478 | 0.6494 | 0.6486 | 0.5042 | 0.5277 | 0.5157 |
| Group 5 | 0.7713 | 0.7618 | 0.7665 | 0.6475 | 0.6469 | 0.6472 | 0.4986 | 0.5277 | 0.5127 |
| Group 6 | 0.7790 | 0.7508 | 0.7647 | 0.6426 | 0.6562 | 0.6493 | 0.5214 | 0.5685 | 0.5439 |
| Group 7 | 0.8057 | 0.7625 | 0.7835 | 0.6539 | 0.6926 | 0.6727 | 0.5871 | 0.6093 | 0.5980 |
4.3 Comparison between the statistics-based and rule-based methods
Table 3. The results of discarding words.

| | | DisI | DisII | DisIII |
|---|---|---|---|---|
| Positive | Precision | 0.8057 | 0.8057 | 0.8057 |
| | Recall | 0.7625 | 0.7625 | 0.7625 |
| | F1 | 0.7835 | 0.7835 | 0.7835 |
| Neutral | Precision | 0.6539 | 0.6539 | 0.6539 |
| | Recall | 0.6926 | 0.6943 | 0.6943 |
| | F1 | 0.6727 | 0.6735 | 0.6735 |
| Negative | Precision | 0.5871 | 0.5921 | 0.5921 |
| | Recall | 0.6093 | 0.6093 | 0.6093 |
| | F1 | 0.5980 | 0.6006 | 0.6006 |
| macro-average | | 0.6847 | 0.6857 | 0.6859 |
| micro-average | | 0.7185 | 0.7191 | 0.7191 |
4.4 Comparison between the proposed methods and methods based on word embeddings
Table 4. The classification results of different methods.

| | Macro-average | Micro-average |
|---|---|---|
| WE+SVM | 0.2241 | 0.5065 |
| WE+MLP | 0.2924 | 0.5277 |
| WE+LR | 0.3573 | 0.5619 |
| WE+NB | 0.3595 | 0.4121 |
| WE+DT | 0.3996 | 0.4723 |
| WE+RF | 0.4094 | 0.5326 |
| WE+LSTM | 0.4746 | 0.5668 |
| WE+BiLSTM | 0.4831 | 0.5602 |
| CHI*TF-IDF (Laplace) | 0.6379 | 0.7020 |
| Rule-based (Group 7) | 0.6847 | 0.7185 |
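For context on the "WE+" baselines in Table 4, the sketch below shows how such a baseline is commonly assembled: each text is represented by the average of its word vectors and passed to an off-the-shelf classifier (here an SVM, matching the WE+SVM row). The inline toy vectors, the averaging scheme and the SVM settings are assumptions; the paper's actual embedding model and hyperparameters may differ.

```python
import numpy as np
from sklearn.svm import SVC

def average_embedding(tokens, embeddings, dim):
    """Represent a text as the mean of its word vectors (zeros if none are known)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

# Hypothetical word vectors; in practice these would come from a trained
# word-embedding model rather than being defined inline.
DIM = 4
embeddings = {
    "good": np.array([0.9, 0.1, 0.0, 0.2]),
    "bad": np.array([-0.8, 0.2, 0.1, 0.0]),
    "service": np.array([0.1, 0.5, 0.3, 0.1]),
}

train_texts = [["good", "service"], ["bad", "service"]]
train_labels = ["positive", "negative"]

X_train = np.vstack([average_embedding(t, embeddings, DIM) for t in train_texts])
clf = SVC(kernel="rbf")  # corresponds to the WE+SVM row of Table 4
clf.fit(X_train, train_labels)
print(clf.predict(average_embedding(["good"], embeddings, DIM).reshape(1, -1)))
```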