We have been collaborating with Intel to develop the new GraphBuilder library, which provides tools to construct large-scale graphs on top of Apache Hadoop. The library provides functions for:
- Pre-processing – Feature selection, tokenization, and tabulation
- Graph construction – Edge and Vertex lists
- Graph Normalization – Compression techniques for sparse graph labels
- Graph Transformation – Optional filters for self-edge and duplicate-edge detection, as well as directed-to-undirected transformation
- Graph Partitioning – Multiple schemes to partition graphs for GraphLab v2.1 ingress
- Serialization – Supports JSON serialization
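
To make the transformation and serialization steps above concrete, here is a minimal single-machine sketch in Python. It is not GraphBuilder code and does not reflect its API: the function names `transform_edges` and `to_json` and the JSON layout are hypothetical, chosen only to illustrate self-edge filtering, duplicate-edge filtering, and directed-to-undirected conversion on a small edge list.

```python
import json


def transform_edges(edges, undirected=False):
    """Drop self-edges and duplicate edges; optionally treat edges as undirected.

    Hypothetical helper for illustration only, not part of GraphBuilder.
    """
    seen = set()
    result = []
    for src, dst in edges:
        if src == dst:
            continue  # self-edge filter
        # In an undirected graph (a, b) and (b, a) are the same edge,
        # so store each pair in a canonical (sorted) order.
        key = tuple(sorted((src, dst))) if undirected else (src, dst)
        if key in seen:
            continue  # duplicate-edge filter
        seen.add(key)
        result.append(key)
    return result


def to_json(edges):
    """Serialize vertex and edge lists to JSON (layout is an assumption)."""
    vertices = sorted({v for edge in edges for v in edge})
    return json.dumps({"vertices": vertices, "edges": [list(e) for e in edges]})


raw_edges = [("a", "b"), ("b", "a"), ("a", "a"), ("a", "b"), ("b", "c")]
directed = transform_edges(raw_edges)
undirected = transform_edges(raw_edges, undirected=True)
print(to_json(undirected))
```

In the directed case `("a", "b")` and `("b", "a")` are kept as distinct edges, while the undirected case collapses them into one. At scale, the same filters would presumably run as distributed jobs on Hadoop rather than in memory as shown here.
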
Open source library coming soon!