Introducing neo4j-embedded, a Neo4j driver for Node.js

Introduction

Using Neo4j in standalone mode requires (except with clusters) that you communicate with the database over its native REST API. The big disadvantage of this method is that the REST API responds very slowly when you need to fetch large amounts of data.
Why is that?
Well, the problem seems to be serializing the results to JSON and sending the resulting data over HTTP.

Solution

In a Java application you can use Neo4j as an embedded database. No serialization is needed, so embedded access outperforms any REST client.

So to get close to this performance with Node.js, we need to run Neo4j in embedded mode.
Since Neo4j is written in Java, we need the Java Native Interface (JNI) for communication between the two languages.
A well-written Node.js module that implements JNI in Node.js is available: node-java.

With this module you can use Java classes directly inside your Node.js code. Now, how cool is that?
The problem with JNI is that it doesn’t use Java’s code optimization. In Neo4j most calls return iterators, so you have to call a lot of Java methods from Node.js to read the results.

In Java you would do something like:

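import java.util.Map;
import java.util.Map.Entry;
import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;

ExecutionEngine engine = new ExecutionEngine( db );
// A Cypher query needs a return clause, otherwise there is nothing to iterate over
ExecutionResult result = engine.execute( "start n=node(*) where n.name! = 'my node' return n, n.name" );
String rows = "";
for ( Map<String, Object> row : result ) {
  // one entrySet(), getKey() and getValue() call per column of every row
  for ( Entry<String, Object> column : row.entrySet() ) {
    rows += column.getKey() + ": " + column.getValue() + "; ";
  }
  rows += "\n";
}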

As you can see, there are a lot of method calls inside the loop (entrySet(), getKey(), …). In Node.js there would be even more calls, because you cannot use Java’s for loop to iterate over the results. That would reduce the performance gain, or could even end up slower.

To overcome this, I’ve written a wrapper for Neo4j that returns a single result object for each query. It contains only simple arrays, so Node.js can iterate over the resulting data without calling any Java method. This comes at the cost of higher memory and CPU usage, of course.
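
To make the idea more concrete, here is a minimal sketch of how such a flattening step could look on the Java side. It is not the module's actual code: the class name FlatResult and the method flatten are made up for illustration, and the sketch only assumes that the query result can be iterated as rows of Map<String, Object>, as in the Java loop above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not the module's real API): copies an iterable
// query result into plain arrays in a single pass on the Java side.
final class FlatResult {
    final String[] columns; // column names, taken from the first row
    final Object[][] rows;  // one Object[] per row, in column order

    private FlatResult(String[] columns, Object[][] rows) {
        this.columns = columns;
        this.rows = rows;
    }

    static FlatResult flatten(Iterable<Map<String, Object>> result) {
        List<String> columns = new ArrayList<>();
        List<Object[]> rows = new ArrayList<>();
        for (Map<String, Object> row : result) {
            if (columns.isEmpty()) {
                columns.addAll(row.keySet());
            }
            Object[] values = new Object[columns.size()];
            for (int i = 0; i < values.length; i++) {
                values[i] = row.get(columns.get(i));
            }
            rows.add(values);
        }
        return new FlatResult(columns.toArray(new String[0]),
                              rows.toArray(new Object[0][]));
    }

    public static void main(String[] args) {
        // Stand-in for a real query result, so the sketch runs without Neo4j
        List<Map<String, Object>> fake = new ArrayList<>();
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("n.name", "my node");
        row.put("n.age", 3);
        fake.add(row);

        FlatResult flat = flatten(fake);
        System.out.println(String.join(", ", flat.columns));
        System.out.println(flat.rows.length + " row(s)");
    }
}
```

All the iterator and map calls stay inside the JVM, and the caller gets everything back from a single call. The trade-off is the one mentioned above: the whole result is copied into memory before anything is returned.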

There’s also a query builder for building Cypher queries.

Another great thing to mention here is the use of multi-core systems. Node.js itself is designed to run on a single core. If you have a multi-core machine, you will surely want to use all of its cores, and you especially don’t want Neo4j limited to a single one. Neo4j itself scales well on multi-core systems, and it still does so under the JNI hood.

Further development …

My goal was to write a Neo4j driver that outperforms the existing Node.js modules on big data exchanges.

Most of the methods are written synchronously, which is not really Node-like.
Queries, for example, can take some time to execute and are therefore written asynchronously, so you can use the usual Node.js semantics with them.

Maybe there will be an asynchronous version of each method in the future (if needed somehow), but I think calls such as setProperty, which aren’t that expensive, can stay synchronous. What are your thoughts about this?
