1 Introduction
2 Preliminaries
2.1 Information entropy
2.2 Typical rank aggregation methods
2.2.1 Borda’s method (BM)
2.2.2 Dowdall method (DM)
2.2.3 A variant of Borda’s method (VB)
2.2.4 Competition graph method (CG)
2.2.5 Minimum violations ranking method (MVR)
2.3 Measures of ranking correlation
2.3.1 Kendall’s tau
2.3.2 Spearman’s rho
3 Measuring the quality of input ranking information
3.1 Problem statement
Figure 1. Comparison of rank aggregation under different distributions of input ranking information. In (a), the input information is concentrated among a few objects, whereas in (b), the input information is evenly distributed.
3.2 Network representation of input ranking information
Figure 2. Network representation of input information with varying distributions. (a) illustrates the network corresponding to a concentrated input distribution, whereas (b) illustrates the network for an evenly distributed input.
3.3 Entropy-based quality measurements of input ranking information
3.3.1 Ranking information quality measurement based on degree entropy
Figure 3. A measurement of input ranking information quality based on degree entropy (Hd): the more evenly the information is distributed, the higher Hd; the more concentrated the information, the lower Hd.
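The behavior described in the caption of Figure 3 can be illustrated with a minimal sketch. The definition of Hd is not reproduced in this outline, so the sketch makes three assumptions: every input ranking links each pair of objects it contains, a node's degree is its number of distinct neighbors, and Hd is the Shannon entropy (natural logarithm) of the normalized degrees. The function name `degree_entropy` and the list-of-lists input format are illustrative choices, not the authors' implementation.

```python
import math
from itertools import combinations


def degree_entropy(rankings):
    """Shannon entropy (natural log) of the normalized node degrees.

    Assumption: objects that appear together in a ranking are linked,
    and a node's degree is its number of distinct neighbors.
    """
    neighbors = {}
    for ranking in rankings:
        for a, b in combinations(ranking, 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    degrees = [len(linked) for linked in neighbors.values()]
    total = sum(degrees)
    # Each node contributes -p * ln(p), where p is its share of all degrees.
    return -sum((d / total) * math.log(d / total) for d in degrees)


# Hypothetical toy inputs: four objects, evenly vs. unevenly covered.
even = [["a", "b"], ["c", "d"]]
concentrated = [["a", "b"], ["a", "c"], ["a", "d"]]
print(degree_entropy(even) > degree_entropy(concentrated))  # True
```

Under these assumptions, n evenly covered objects reach the maximum value ln n. For n = 50 this is about 3.912, which is consistent with the values Table 4 reports at α = 0.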
3.3.2 Ranking information quality measurement based on edge-weighted entropy
Figure 4. A measurement of input ranking information quality based on edge-weighted entropy (Hw): the more evenly the information is distributed, the higher Hw; the more concentrated the information, the lower Hw.
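The behavior described in the caption of Figure 4 can be sketched in the same spirit. The definition of Hw is not reproduced in this outline, so the sketch assumes that an edge's weight is the number of input rankings containing both of its endpoints, and that Hw is the Shannon entropy (natural logarithm) of the normalized edge weights. The function name `edge_weighted_entropy` and the input format are illustrative, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def edge_weighted_entropy(rankings):
    """Shannon entropy (natural log) of the normalized edge weights.

    Assumption: the weight of an edge counts how many rankings
    contain both of its endpoints.
    """
    weights = Counter()
    for ranking in rankings:
        for a, b in combinations(ranking, 2):
            weights[frozenset((a, b))] += 1  # undirected edge
    total = sum(weights.values())
    # Each edge contributes -p * ln(p), where p is its share of total weight.
    return -sum((w / total) * math.log(w / total) for w in weights.values())


# Hypothetical toy inputs: three comparisons, spread out vs. repeated.
even = [["a", "b"], ["c", "d"], ["a", "c"]]
concentrated = [["a", "b"], ["a", "b"], ["c", "d"]]
print(edge_weighted_entropy(even) > edge_weighted_entropy(concentrated))  # True
```

Under these assumptions the upper bound for n objects is ln(n(n-1)/2), reached when every pair is compared equally often. For n = 50 this is about 7.111, slightly above the α = 0 values in Table 5, if that table uses n = 50 as Figure 7 does.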
4 Datasets
4.1 Real world datasets
4.1.1 Election dataset
4.1.2 Course evaluation dataset
4.2 Synthetic datasets
4.2.1 The Mallows model (MM)
4.2.2 The Plackett-Luce model (P-L)
4.2.3 The object inherent ability-based model (IA)
5 Experimental analysis
5.1 Validation of the effectiveness of the extended data generation models
Figure 5. Degree entropy (Hd) and edge-weighted entropy (Hw) under different distribution parameters (α) with L0 = 2, 3 and 4, respectively, where n = 20 and m = 1,000. The results are averaged over 100 independent trials.
5.2 Effect of distribution on ranking data quality
Table 1. Kendall’s tau-b between the aggregated rankings obtained by the five methods from three sets of baseline data and the ground truth rankings, where mb = 1,000, n = 50 and L0 = 10. The results are averaged over 100 independent trials.
| Model | BM | DM | VB | MVR | CG |
|---|---|---|---|---|---|
| MM | 0.89 | 0.89 | 0.94 | 0.93 | 0.96 |
| P-L | 0.92 | 0.91 | 0.96 | 0.94 | 0.97 |
| IA | 0.93 | 0.92 | 0.95 | 0.96 | 0.98 |
Figure 6. The relative change in Kendall’s tau-b (Δτb) under different distribution parameters (α), where n = 50, mb = 1,000, mc = 2,000 and L0 = 10. Different colors represent different data generation models, e.g., green denotes the MM. The results are averaged over 100 independent trials.
5.3 Validation of the effectiveness of entropy-based methods in ranking data quality measurement
5.3.1 Comparison of entropy-based methods and classical consistency measures
Table 2. Kendall’s tau of the data generated by the three ranking data generation models under different values of α. The results are averaged over 100 independent trials.
| Model | α = 0.00 | α = 1.00 | α = 2.00 | α = 3.00 | α = 4.00 | α = 5.00 | α = 6.00 | α = 7.00 | α = 8.00 |
|---|---|---|---|---|---|---|---|---|---|
| MM | 0.393 | 0.394 | 0.399 | 0.439 | 0.417 | 0.383 | 0.396 | 0.347 | 0.373 |
| P-L | 0.517 | 0.540 | 0.497 | 0.564 | 0.505 | 0.519 | 0.570 | 0.551 | 0.592 |
| IA | 0.782 | 0.773 | 0.781 | 0.785 | 0.782 | 0.775 | 0.784 | 0.803 | 0.795 |
Table 3. Spearman’s rho of the data generated by the three ranking data generation models under different values of α. The results are averaged over 100 independent trials.
| Model | α = 0.00 | α = 1.00 | α = 2.00 | α = 3.00 | α = 4.00 | α = 5.00 | α = 6.00 | α = 7.00 | α = 8.00 |
|---|---|---|---|---|---|---|---|---|---|
| MM | 0.519 | 0.510 | 0.522 | 0.532 | 0.534 | 0.462 | 0.507 | 0.535 | 0.496 |
| P-L | 0.659 | 0.661 | 0.624 | 0.685 | 0.669 | 0.638 | 0.677 | 0.660 | 0.692 |
| IA | 0.807 | 0.817 | 0.852 | 0.901 | 0.895 | 0.919 | 0.890 | 0.908 | 0.872 |
Table 4. Degree entropy of the data generated by the three ranking data generation models under different values of α. The results are averaged over 100 independent trials.
| Model | α = 0.00 | α = 1.00 | α = 2.00 | α = 3.00 | α = 4.00 | α = 5.00 | α = 6.00 | α = 7.00 | α = 8.00 |
|---|---|---|---|---|---|---|---|---|---|
| MM | 3.912 | 3.912 | 3.895 | 3.812 | 3.688 | 3.511 | 3.351 | 3.211 | 3.102 |
| P-L | 3.912 | 3.912 | 3.895 | 3.814 | 3.688 | 3.515 | 3.354 | 3.213 | 3.113 |
| IA | 3.912 | 3.912 | 3.895 | 3.812 | 3.685 | 3.510 | 3.354 | 3.211 | 3.106 |
Table 5. Edge-weighted entropy of the data generated by the three ranking data generation models under different values of α. The results are averaged over 100 independent trials.
| Model | α = 0.00 | α = 1.00 | α = 2.00 | α = 3.00 | α = 4.00 | α = 5.00 | α = 6.00 | α = 7.00 | α = 8.00 |
|---|---|---|---|---|---|---|---|---|---|
| MM | 7.098 | 6.400 | 5.513 | 4.964 | 4.643 | 4.444 | 4.319 | 4.232 | 4.168 |
| P-L | 7.097 | 6.399 | 5.511 | 4.965 | 4.642 | 4.446 | 4.318 | 4.232 | 4.168 |
| IA | 7.098 | 6.399 | 5.512 | 4.965 | 4.640 | 4.445 | 4.318 | 4.230 | 4.168 |
Figure 7. Kendall’s tau-b (τb) under different distribution parameters (α), where n = 50, m = 1,000, and L0 = 10. The results are averaged over 100 independent trials.
5.3.2 Performance of entropy-based ranking data quality measures across different data scales
Figure 8. Degree entropy (Hd) under different distribution parameters (α) and numbers of rankings (m), where n = 50 and L0 = 10. The results are averaged over 100 independent trials.
Figure 9. Edge-weighted entropy (Hw) under different distribution parameters (α) and numbers of rankings (m), where n = 50 and L0 = 10. The results are averaged over 100 independent trials.
5.4 Analyzing the computational efficiency of entropy-based methods
Figure 10. Computational efficiency of degree entropy (Hd) under different numbers of objects (n) and rankings (m). The results are averaged over 100 independent trials.
Figure 11. Computational efficiency of edge-weighted entropy (Hw) under different numbers of objects (n) and rankings (m). The results are averaged over 100 independent trials.
5.5 Collaborative analysis of the length and distribution of input rankings
Figure 12. The collaborative impact of the length of input rankings (L0) and distribution parameters (α). Each cell represents Kendall’s tau-b for one combination of L0 and α. The color gradient from blue to red indicates increasing Kendall’s tau-b. The results are averaged over 100 independent trials.
5.6 Empirical analysis of entropy-based ranking data quality measurement methods
Table 6. Degree entropy (Hd) and edge-weighted entropy (Hw) of the election dataset under different distribution characteristics, with n = 9 and m = 28,245. The results are averaged over 100 independent trials.
| Method | mn = 1,000 | mn = 2,000 | mn = 3,000 | mn = 4,000 | mn = 5,000 | mn = 6,000 | mn = 7,000 | mn = 8,000 |
|---|---|---|---|---|---|---|---|---|
| Hd | 2.191 | 2.188 | 2.183 | 2.178 | 2.171 | 2.156 | 2.118 | 2.082 |
| Hw | 3.568 | 3.561 | 3.548 | 3.536 | 3.516 | 3.478 | 3.383 | 3.292 |
Table 7. Degree entropy (Hd) and edge-weighted entropy (Hw) of the course evaluation dataset under different distribution characteristics, with n = 9 and m = 146. The results are averaged over 100 independent trials.
| Method | mn = 15 | mn = 30 | mn = 45 | mn = 60 | mn = 75 | mn = 90 | mn = 105 | mn = 120 |
|---|---|---|---|---|---|---|---|---|
| Hd | 2.195 | 2.191 | 2.189 | 2.181 | 2.177 | 2.173 | 2.166 | 2.158 |
| Hw | 3.577 | 3.566 | 3.558 | 3.539 | 3.527 | 3.517 | 3.496 | 3.473 |


