Disk-based large-scale graph computation
UPDATE: see the GraphChi v0.2 announcement: http://code.google.com/p/graphchi/wiki/GraphChiVersion0p2Release
Introduction
GraphChi[huahua] is a spin-off of the GraphLab[rador's retriever] project.
GraphChi can run very large graph computations on just a single machine, by using a novel algorithm for processing the graph from disk (SSD or hard drive). Programs for GraphChi are written in a vertex-centric model similar to GraphLab's. GraphChi runs vertex-centric programs asynchronously (i.e., changes written to edges are immediately visible to subsequent computation) and in parallel. GraphChi also supports streaming graph updates and changing the graph structure while computing.
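To make the vertex-centric, asynchronous model concrete, here is a minimal sketch in Python. It is a toy in-memory illustration, not GraphChi's API and not its disk-based algorithm: the function name `run_pagerank`, its parameters, and the PageRank-style update are all assumptions chosen for the example, and it assumes the edge list contains no duplicate edges. The point it shows is that values live on the edges, and a value written by one vertex is read by vertices updated later in the same pass.

```python
# Toy illustration of an asynchronous vertex-centric update.
# This is NOT the GraphChi API; all names here are hypothetical.
from collections import defaultdict


def run_pagerank(edges, num_vertices, iterations=20, damping=0.85):
    """Vertex-centric PageRank with values stored on edges (no duplicate edges)."""
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append((src, dst))
        in_edges[dst].append((src, dst))

    # Each edge carries the share of rank its source vertex sends along it.
    edge_value = {(src, dst): 1.0 / len(out_edges[src]) for src, dst in edges}
    rank = [1.0] * num_vertices

    def update(v):
        # Read the current values on the in-edges of v.
        incoming = sum(edge_value[e] for e in in_edges[v])
        rank[v] = (1.0 - damping) + damping * incoming
        # Write to the out-edges in place: vertices updated later in this
        # same pass already see the new values (asynchronous semantics).
        for e in out_edges[v]:
            edge_value[e] = rank[v] / len(out_edges[v])

    for _ in range(iterations):
        for v in range(num_vertices):
            update(v)
    return rank
```

For example, `run_pagerank([(0, 1), (0, 2), (1, 2), (2, 0)], 3)` ranks a small four-edge graph. A synchronous variant would instead buffer the edge writes and apply them only at the end of each pass.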
GraphChi makes web-scale graph computation, such as analysis of social networks, available to anyone with a modern laptop or PC. It saves you from the hassle and cost of working with a distributed cluster or cloud services. We find it much easier to debug applications on a single computer than to work out how a distributed algorithm is executed. If you do require the processing power of high-performance clusters, GraphChi can be an excellent tool for developing and debugging your algorithms before deploying them to the cluster. GraphChi supports most of the new GraphLab v2.1 API (with some restrictions), making the transition easy.
Remarkably, in some cases GraphChi can, in reasonable time, solve bigger problems than many of the available distributed frameworks. GraphChi also runs efficiently on servers with plenty of memory, and can use multiple disks in parallel by striping the data.
Getting Started
The source code and documentation of GraphChi for C++ are available at the Google Code project pages:
http://code.google.com/p/graphchi/
After downloading the source code, the best way to get started is to study the example applications: http://code.google.com/p/graphchi/wiki/ExampleApps
An early version for Java is available at http://code.google.com/p/graphchi-java
Publications
The paper for GraphChi, published at OSDI 2012, is available here: Download PDF
Slides for Aapo’s talk at OSDI: OSDI talk slides
Links
MIT Technology Review’s article about GraphChi:
Danny Bickson’s blog post about the collaborative filtering toolkit for GraphChi (thanks Danny!):
http://bickson.blogspot.com/2012/12/collaborative-filtering-with-graphchi.html

