1 Introduction
2 Preliminaries and related work
Table 1 Table of symbols. |
Symbols | Definitions |
---|---|
X | Tensor |
X(n) | Mode-n unfolding matrix of tensor X |
||X||F | Frobenius norm of tensor X |
nnz(X) | Number of non-zero elements in tensor X |
⊗ | Kronecker product |
⊙ | Khatri-Rao product |
* | Hadamard product (Element-wise product) |
† | Pseudo inverse |
2.1 Notations and tensor matricization
Figure 1. Schematic representation of tensor matricization. |
Figure 2. Schematic representation of incremental tensor matricization. |
2.2 PARAFAC decomposition
2.3 Incremental tensor decomposition
Table 2 Summary of different incremental PARAFAC tensor decomposition algorithms. |
Static tensor decomposition | Incremental (Single-aspect) | Incremental (Multi-aspect) | Distributed | Scalability | |
---|---|---|---|---|---|
Zhou et al. (2018) | √ | ||||
Ma et al. (2018) | √ | ||||
Gujral et al. (2018) | √ | √ | |||
Yang and Yong (2019b) | √ | √ | √ | ||
Song et al. (2017) | √ | √ | |||
Najafi et al. (2019) | √ | √ | |||
InParTen2 | √ | √ | √ | √ | √ |
3 Methodology
Figure 3. Schematic representation of the multi-dimensional incremental PARAFAC tensor decomposition method. |
Figure 4. Factor matrices obtained by the decomposition of the incoming tensors. |
3.1 Incremental PARAFAC tensor decomposition method
3.2 Implementation of incremental tensor decomposition method in Spark
4 Experiment and results
4.1 Tensor datasets
Table 3 Real-world tensor datasets. |
Data Name | Data type | Total Dimension | Initial Dimension | Non-zero | File size |
---|---|---|---|---|---|
YELP | User×Location×Time | 70,817×15,579×108 | 70,815×15,572×20 | 334,166 | 5.6MB |
MovieLens | User×Movie×Time | 71,567×10,681×157 | 71,559×9,717×5 | 10,000,054 | 187MB |
Netflix | User×Movie×Time | 2,649,429×17,770 ×2,182 | 2,648,623×17,764×50 | 98,754,343 | 1.8GB |
4.2 Evaluation measures
4.3 Existing tensor decomposition methods
4.4 Evaluation results
Table 4 Comparison of execution time (min) according to the different densities of incoming tensor. |
Tensor Synthetic Datasets | 100 × 100 ×100 | 500× 500×500 | 800×800×800 | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Density(%) | 5% | 10% | 20% | 5% | 10% | 20% | 5% | 10% | 20% | ||
Initial Density 60% | Static | IM-PARAFAC | 1.19 | 1.37 | 1.62 | 78.5 | 95.5 | 131.5 | 318.9 | 402.3 | 543.6 |
SamBaTen (CP-ALS) | 1.55 | 1.74 | 1.98 | 134.9 | 183.8 | 211.9 | 657.6 | 836.2 | 1323.9 | ||
InParTen2-Static | 0.56 | 0.64 | 0.77 | 26.9 | 34.2 | 48.3 | 106.2 | 130.2 | 177.6 | ||
Dynamic | SamBaTen | 1.07 | 1.07 | 1.13 | 35.5 | 45.7 | 66.5 | 301.5 | 354.2 | 477.2 | |
InParTen2 | 0.5 | 0.46 | 0.57 | 10.9 | 17.6 | 29.8 | 37.18 | 60.95 | 111.2 | ||
Initial Density 20% | Static | IM-PARAFAC | 0.87 | 0.99 | 1.27 | 40.9 | 58.5 | 98.03 | 167.9 | 246.3 | 397.8 |
SamBaTen (CP-ALS) | 1.42 | 1.48 | 1.98 | 41.9 | 116.2 | 194.66 | 345.6 | 554.3 | 973.1 | ||
InParTen2-Static | 0.64 | 0.72 | 0.86 | 14.9 | 21.4 | 35.6 | 59.8 | 84.32 | 108.1 | ||
Dynamic | SamBaTen | 0.92 | 0.96 | 1.04 | 18.3 | 24.9 | 35.2 | 87.7 | 125.9 | 250.7 | |
InParTen2 | 0.36 | 0.44 | 0.54 | 9.3 | 16.7 | 24.1 | 30.2 | 51.5 | 95.4 |
Figure 5. Comparison of relative errors according to the different densities of incoming tensor datasets. |
Table 5 Comparison of relative errors according to the different densities of incoming tensor. |
Tensor Synthetic Datasets | 100 × 100 ×100 | 500× 500×500 | 800×800×800 | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Density | 5% | 10% | 20% | 5% | 10% | 20% | 5% | 10% | 20% | ||
Initial Density 60% | Static | IM-PARAFAC | 0.63 | 0.67 | 0.69 | 0.64 | 0.72 | 0.73 | 0.62 | 0.70 | 0.70 |
SamBaTen (CP-ALS) | 0.65 | 0.70 | 0.69 | 0.66 | 0.71 | 0.72 | 0.66 | 0.71 | 0.72 | ||
InParTen2-Static | 0.63 | 0.67 | 0.71 | 0.60 | 0.73 | 0.72 | 0.61 | 0.69 | 0.72 | ||
Dynamic | SamBaTen (Incremental) | 0.67 | 0.69 | 0.78 | 0.67 | 0.75 | 0.80 | 0.67 | 0.74 | 0.79 | |
InParTen2 | 0.61 | 0.67 | 0.69 | 0.65 | 0.73 | 0.71 | 0.63 | 0.72 | 0.72 | ||
Initial Density 20% | Static | IM-PARAFAC | 0.87 | 0.87 | 0.82 | 0.90 | 0.89 | 0.83 | 0.89 | 0.88 | 0.83 |
SamBaTen (CP-ALS) | 0.88 | 0.87 | 0.82 | 0.89 | 0.89 | 0.83 | 0.89 | 0.89 | 0.83 | ||
InParTen2-Static | 0.87 | 0.87 | 0.82 | 0.87 | 0.89 | 0.83 | 0.90 | 0.88 | 0.83 | ||
Dynamic | SamBaTen (Incremental) | 0.88 | 0.89 | 0.85 | 0.91 | 0.93 | 0.98 | 0.89 | 0.93 | 0.88 | |
InParTen2 | 0.86 | 0.89 | 0.82 | 0.91 | 0.90 | 0.84 | 0.87 | 0.90 | 0.86 |
Table 6 Comparison of execution time (min) according to the different initial size. |
Tensor Synthetic Datasets | 100×100 ×100 | 500×500×500 | 800×800×800 | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Initial dimension size (I=J=K) | 10 | 50 | 80 | 100 | 250 | 400 | 100 | 400 | 640 | ||
Initial Density 60% | Static | IM-PARAFAC | 1.76 | 1.62 | 2.66 | 126.84 | 131.49 | 270.04 | 540.79 | 543.60 | 1050.21 |
SamBaTen (CP-ALS) | 1.97 | 1.98 | 2.77 | 235.55 | 211.94 | 553.88 | 1398.33 | 1323.99 | 2416.36 | ||
InParTen2-Static | 0.74 | 0.77 | 1.08 | 36.94 | 48.28 | 72.14 | 180.63 | 177.64 | 357.48 | ||
Dynamic | SamBaTen (Incremental) | 1.16 | 1.13 | 1.21 | 42.97 | 66.47 | 89.53 | 351.21 | 477.21 | 652.03 | |
InParTen2 | 0.68 | 0.57 | 0.46 | 37.08 | 29.79 | 17.11 | 185.99 | 111.21 | 83.01 | ||
Initial Density 20% | Static | IM-PARAFAC | 1.76 | 1.27 | 1.39 | 125.80 | 98.03 | 110.03 | 570.94 | 397.80 | 459.99 |
SamBaTen (CP-ALS) | 1.97 | 1.98 | 2.72 | 264.09 | 194.66 | 265.96 | 1403.27 | 973.12 | 990.81 | ||
InParTen2-Static | 0.97 | 0.86 | 0.88 | 46.03 | 35.58 | 39.55 | 146.53 | 108.07 | 119.39 | ||
Dynamic | SamBaTen (Incremental) | 1.19 | 1.04 | 0.99 | 39.69 | 35.18 | 30.91 | 293.2 | 250.71 | 182.81 | |
InParTen2 | 0.67 | 0.54 | 0.44 | 37.17 | 24.11 | 12.40 | 158.21 | 95.45 | 46.8 |
Figure 6. Comparison of execution time (min) according to the different initial size. |
Table 7 Comparison of relative errors according to the different initial size. |
Tensor Synthetic Datasets | 100 × 100 ×100 | 500× 500×500 | Tensor 800×800×800 | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Initial size (I=J=K) | 10 | 50 | 80 | 100 | 250 | 400 | 100 | 400 | 640 | ||
Initial Density 60% | Static | IM-PARAFAC | 0.82 | 0.69 | 0.57 | 0.82 | 0.73 | 0.57 | 0.83 | 0.70 | 0.57 |
SamBaTen (CP-ALS) | 0.82 | 0.71 | 0.56 | 0.82 | 0.72 | 0.57 | 0.83 | 0.72 | 0.57 | ||
InParTen2-Static | 0.82 | 0.69 | 0.56 | 0.82 | 0.72 | 0.57 | 0.82 | 0.72 | 0.56 | ||
Dynamic | SamBaTen (Incremental) | 0.85 | 0.78 | 0.57 | 0.83 | 0.80 | 0.57 | 0.83 | 0.79 | 0.58 | |
InParTen2 | 0.82 | 0.69 | 0.56 | 0.82 | 0.71 | 0.57 | 0.83 | 0.72 | 0.56 | ||
Initial Density 20% | Static | IM-PARAFAC | 0.82 | 0.82 | 0.82 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 |
SamBaTen (CP-ALS) | 0.82 | 0.82 | 0.82 | 0.82 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 | ||
InParTen2-Static | 0.82 | 0.82 | 0.82 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 | ||
Dynamic | SamBaTen (Incremental) | 0.84 | 0.85 | 0.83 | 0.85 | 0.98 | 0.87 | 1.02 | 0.88 | 0.86 | |
InParTen2 | 0.82 | 0.82 | 0.83 | 0.82 | 0.84 | 0.84 | 0.84 | 0.84 | 0.84 |
Table 8 Comparison of execution time and relative error using real world datasets. |
Execution Time(min) | Relative Error | ||||||
---|---|---|---|---|---|---|---|
YELP | MovieLens | Netflix | YELP | MovieLens | Netflix | ||
Static | IM-PARAFAC | 7.44 | 76.03 | 924.91 | 0.98 | 0.75 | 0.79 |
SamBaTen (CP-ALS) | 44.71 | 518.73 | N.A. | 0.98 | 0.99 | N.A. | |
InParTen2-Static | 2.047 | 29.03 | 893.33 | 0.96 | 0.76 | 0.79 | |
Dynamic | SamBaTen (incremental) | 36.42 | 36.65 | N.A. | 0.98 | 0.91 | N.A. |
InParTen2 | 1.94 | 23.50 | 246.63 | 0.97 | 0.80 | 0.80 |