| Dataset name | Dataset size | GraphLab Algorithm | Download instructions | Credit |
|---|---|---|---|---|
| Yahoo! KDD CUP 2011 - music rating | 1M users, 600K songs, 260M ratings | Matrix factorization | Instructions | Yahoo! KDD CUP |
| Twitter social graph | 8K x 8K Twitter users, 62M links | Matrix factorization with sparse factors | Instructions | Timmy Wilson, smarttypes.org |
| PlanetLab network flows | 16M x 16M computers, 200M flows | Bayesian probabilistic tensor factorization | 1. Download files netflow2, netflow2e from here. 2. Run: ./pmf netflow2e 2 --ncpus=8 --float=true --scheduler="round_robin(max_iterations=10,block_size=1)" --bptf_burn_in=5 | Danny Bickson |
| audikw_1 - structural problem | ~1M x ~1M, 70M nnz | Matrix factorization | 1. Download mat file from here. 2. In matlab: "load audikw_1.mat; [i1,i2,i3] = find(Problem.A); save_c_gl_mat('audikw_1', [i1 i2 ones(length(i1),1) i3]);" 3. Run using pmf: ./pmf audikw_1 0 --ncpus=8 --scheduler="round_robin(max_iterations=10)" | Univ. Florida sparse matrix collection |
| Bone10 - model reduction problem | 1M x 10M, 50M nnz | Linear solver: Gaussian BP | 1. Download mat file from here. 2. In matlab: "load bone10.mat; save_c_gl('bone10', Problem.A, ones(length(Problem.A),1),zeros(length(Problem.A),1));" 3. Run using gabp: ./gabp 0 bone10 --ncpus=8 --scheduler="round_robin(max_iterations=10,blocksize=1)" --syncinterval=1000000 --regularization=10000 | Univ. Florida sparse matrix collection |
| Netflix - collaborative filtering (subset) | 1M x 17K, 3M nnz | Alternating least squares | Due to copyright, the Netflix data is not available for download. Use the KDD CUP data instead. | Netflix |
| NPIC 500 dataset (natural language processing) | 88K noun phrases, 99K contexts, 20M occurrences | SVD | 1. Download dataset from here. 2. Extract the tgz file using: "tar xvzf all-pairs-t500-matrix-data-code.tar.gz" 3. Find the file matrix.txt and add the following two lines at the top: "%%MatrixMarket matrix coordinate real general" and "88322 99400 20597287" 4. Run SVD using: ./pmf matrix.txt 13 --ncpus=8 --matrixmarket=true --max_iter=10 | Tom Mitchell, CMU |

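
Step 3 of the NPIC 500 instructions (adding the two header lines to matrix.txt) can be scripted rather than done in an editor. The following is a minimal Python sketch, not part of GraphLab: the function name `prepend_matrix_market_header` is hypothetical, and the banner and size line are copied from the instructions above. It assumes matrix.txt does not already start with a MatrixMarket banner and that there is enough free disk space for a temporary copy.

```python
import os
import tempfile

# Header values copied from step 3 of the NPIC 500 instructions.
BANNER = "%%MatrixMarket matrix coordinate real general"
SIZE_LINE = "88322 99400 20597287"  # rows, columns, non-zero entries


def prepend_matrix_market_header(path, banner=BANNER, size_line=SIZE_LINE):
    """Rewrite `path` so that it starts with the banner and the size line.

    The file is streamed through a temporary file in the same directory,
    so the matrix entries are never held in memory all at once.
    Returns True if the file was modified, False if a MatrixMarket
    banner was already present.
    """
    with open(path, "r") as src:
        first = src.readline()
        if first.startswith("%%MatrixMarket"):
            return False  # header already there, leave the file alone
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as dst:
            dst.write(banner + "\n")
            dst.write(size_line + "\n")
            dst.write(first)
            for line in src:
                dst.write(line)
    # Swap the rewritten copy into place once both files are closed.
    os.replace(tmp_path, path)
    return True
```

Calling `prepend_matrix_market_header("matrix.txt")` in the extracted directory should leave the file ready for step 4. Running it a second time is a no-op, because the banner check returns early.
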
| Dataset name | Dataset size | GraphLab Algorithm | Download instructions | Credit |
|---|---|---|---|---|
| Wikipedia term occurrences dataset | 4.3M terms, 3.3M documents, 513M occurrences | SVD | Download the file medwiki.gz | Contributed by Andrew Onley, The University of Memphis. |
| Wikipedia term occurrences dataset | 40K terms, 10M documents, 689M occurrences | SVD | Download the file bigwiki.gz | Contributed by Jamie Callan, Brian Murphy, and Partha Talukdar, CMU. |
| Wikipedia term occurrences dataset | 40K terms, 50M documents, 3.3G occurrences | SVD | Download the file hugewiki.gz | Contributed by Jamie Callan, Brian Murphy, and Partha Talukdar, CMU. |
| Mouse Visual Cortex | 26K x 21K image (572M non-zeros) | Spectral Clustering | Original data here. Download matrix market format file: mouse_brain from here. | Contributed by Joshua Vogelstein, OpenConnectToMe Project, Johns Hopkins University. |
| Twitter graph | 41M nodes, 1.4 billion edges | K-cores | Download instructions | Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon |

| algorithm | dataset | command string | num cpus | total running time | Platform | Train Accuracy | Test Accuracy |
|---|---|---|---|---|---|---|---|
| BPTF | KDDCUP | omplace -nt 32 ./pmf kddcup 2 --scheduler=round_robin(max_iterations=50,block_size=1) --float=true --zero=true --maxval=1 --minval=0 --scalerating=100 --burn_in=5 --ncpus=32 --outputvalidation=true --D=20 | 32 | 10573.402626 | BlackLight | 0.2146 | 0.2221 |
| SGD | KDDCUP | omplace -nt 16 ./pmf kddcup 6 --sgd_lambda=0.0025 --sgd_gamma=2e-2 --float=true --zero=true --ncpus=16 --scheduler=round_robin(max_iterations=100) --sgd_step=0.999 --aggregatevalidation=true --scalerating=100 --minval=0 --maxval=1 --D=15 | 16 | 11042.167419 | BlackLight | 0.2065 | 0.1721 |
| WALS | KDDCUP | omplace -nt 16 ./pmf kddcup 9 --ncpus=16 --scheduler=round_robin(max_iterations=25,block_size=1) --float=true --zero=true --scalerating=100 --scaling=5000 --truncating=2276 --maxval=1 --minval=0 --scope=null --D=18 --lambda=0.0001 --aggregatevalidation=true | 16 | 4858 | BlackLight | 0.1426 | 0.1125 |
| BPTF | NETFLIX | omplace -nt 16 ./mkl_seq netflix-r 2 --float=false --ncpus=16 --scheduler=round_robin(max_iterations=10,block_size=1) --burn_in=10 --minval=1 --maxval=5 --D=30 | 16 | 534 | BlackLight + Intel MKL | 0.8424 | 0.9659 |
| BPTF | NETFLIX | omplace -nt 16 ./pmf netflix-r 2 --float=false --ncpus=16 --scheduler=round_robin(max_iterations=10,block_size=1) --burn_in=10 --minval=1 --maxval=5 --D=30 | 16 | 2517 | BlackLight | 0.8434 | 0.9369 |
| ALS | NETFLIX | ./pmf netflix-r 0 --scheduler="round_robin(max_iterations=10,block_size=1)" --float=false --lambda=0.065 --ncpus=8 | 8 | 283 | Intel(R) Xeon(R) 8 x CPU X5550 @ 2.67GHz (using Eigen) | 0.7982 | 0.9326 |
| BPTF | NETFLIX | ./pmf netflix-r 2 --scheduler="round_robin(max_iterations=10,block_size=1)" --float=false --lambda=0.065 --ncpus=8 | 8 | 447 | Intel(R) Xeon(R) 8 x CPU X5550 @ 2.67GHz (using Eigen) | 0.8202 | 0.9633 |
| SVD | NPIC500 | ./pmf matrix.txt 13 --matrixmarket=true --ncpus=16 --max_iter=24 --scope=null | 16 | 28 | Intel(R) Xeon(R) 8 x CPU X5550 @ 2.67GHz (using Eigen) | N/A | N/A |
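
The two BPTF runs on NETFLIX that used BlackLight with 16 CPUs differ only in the binary (`./mkl_seq` with Intel MKL versus `./pmf`), so their running times can be compared directly. The short Python sketch below does that arithmetic. The `speedup` helper is hypothetical and the times are copied from the table; the table does not state the time unit, but the ratio does not depend on it.

```python
# Total running times copied from the benchmark table
# (BPTF on NETFLIX, 16 CPUs on BlackLight). The unit is whatever
# the table reports; only the ratio is computed here.
TIME_PMF = 2517      # ./pmf
TIME_MKL_SEQ = 534   # ./mkl_seq, BlackLight + Intel MKL


def speedup(baseline_time, improved_time):
    """Return how many times faster the improved run is than the baseline."""
    if baseline_time <= 0 or improved_time <= 0:
        raise ValueError("running times must be positive")
    return baseline_time / improved_time


ratio = speedup(TIME_PMF, TIME_MKL_SEQ)
print(f"mkl_seq vs pmf speedup: {ratio:.2f}x")
```

The ratio comes out at roughly 4.7. Note that the reported test accuracies of the two runs also differ (0.9659 versus 0.9369), so the comparison covers running time only.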