NoSQL Graph Database - Neo4j
Graphs are everywhere - social networks, routing algorithms, payment flow, and many more. Facebook friends, Twitter followers, LinkedIn graph are a few scenarios for Graph. Quite recently Facebook released Social Graph search. Neo4j is an open source NOSQL Graph database, immensely popular and being used by companies like Adobe, Intuit. Twitter uses FlockDB for the Graphs.
Download and extract Neo4j Community edition from http://www.neo4j.org/download. Start the server by running
/bin/Neo4j.bat. The Neo4j monitoring tool can be accessed from http://localhost:7474/webadmin/. The monitoring tool provides several utilities as shown below.
Gremlin - A Groovy tool for querying Graph data
Neo4j Monitoring console
Neo4j Data Browser
Apart from the above utilities, Neo4j provides support for Graph Query language called Cypher. Since Ne04j is built on Java, it runs on JVM and can be used as a Embedded database. Or it could be used as a standalone database, through REST api. There is another interesting tool called Neoclipse which is very rich in visualizing Graphs. The following Graphs for facebook friends and railway routes are generated with Neoclipse.
Applications of Neo4j
1. Facebook friend connect
- Login to Facebook and follow the instructions from here to download the Facebook Graph data.
- Next, run the following program to download the Profile photos from Facebook to your local machine. The photos are used only for visualization purpose for neoclipse. This step uses the data downloaded from the previous step. change the "file" variable.
- The next step is to import the graph data to Neo4j. Run the below program. I have used the data for two Facebook profiles and loaded all the friends and interconnection data.
- Now the data is ready, I implemented a search utility which - given 2 Facebook IDs, it will provide all posible ways for the 2 people to get connected.
2. Railway Trip Planner
- For this application, I used data from Indian Railways site. A few trains were randomly chosen and their routes and information were stored in local file system as a tab-delimited text. The route information contains station code, names, distances, time taken etc. I created directories with name as train number. Each directory contains info.csv and schedule.csv. Download it and save the directories and its contents locally. It can be downloaded from here
- Next, run the below program to import the graph data to Neo4j.
- Now the data is available, I created a trip planner. It provides 2 options - Search for a path between 2 stations with least distance, Search for a path between 2 stations with least number of stops. In each case print the total distance.
- The output of the above search program is
3. Facebook Graph Search
- Facebook Graph Search is released. It provides search based on your friends connections, likes and recommendations. So I could ask it for "Suggest me Indian Vegetarian restaurants in and around South London". There is a wonderful article on how to build this with Neo4j at http://jimwebber.org/2013/02/neo4j-facebook-graphsearch-for-the-rest-of-us/
- I implemented the code based on the above design. First build the Graph
- Next run search on restaurants
Neo4j enterprise edition provides high availability in the form of replication and shard caching. I tried this on a 63 Million Node relationships with a cluster of 3 nodes. Planning to write up in next series....