
JanusGraph query slower than Neo4j: same query structure, different Kubernetes clusters and backends (Cassandra/ES)

I'm running two graph database setups on separate Kubernetes clusters (with the same configuration), and JanusGraph is performing much worse than Neo4j. Here's the detailed setup:

Cluster 1 (AKS – Neo4j Setup):

  • Neo4j (Community Edition) running as a StatefulSet

Cluster 2 (AKS – JanusGraph Setup):

  • JanusGraph (latest version) running as a Deployment
  • Cassandra (2 pods) – storage backend
  • Elasticsearch (2 pods) – indexing backend

Problem:

I'm running two structurally similar queries, one on Neo4j and one on JanusGraph, both intended to retrieve a connected subgraph with filters and a traversal up to N levels deep.
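For reference, here is a minimal in-memory sketch of what "connected subgraph up to N levels" computes, assuming a plain adjacency dict (the names `subgraph_within` and `keep` are hypothetical). It visits each vertex at most once and records its hop distance from the seeds. It is only a rough analogue of a Gremlin `repeat(both()).emit().times(N)` followed by `dedup()`, or a Cypher variable-length pattern like `[*1..N]`, not what either engine does internally.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List

# Hypothetical in-memory graph: vertex id -> neighbour ids (treated as undirected).
Graph = Dict[Hashable, List[Hashable]]


def subgraph_within(
    graph: Graph,
    seeds: Iterable[Hashable],
    max_depth: int,
    keep: Callable[[Hashable], bool] = lambda v: True,
) -> Dict[Hashable, int]:
    """Return {vertex: hop distance} for every vertex that passes `keep` and is
    reachable from `seeds` in at most `max_depth` hops, visiting each vertex once."""
    depth: Dict[Hashable, int] = {}
    queue = deque()
    for s in seeds:
        if s not in depth and keep(s):
            depth[s] = 0
            queue.append(s)
    while queue:
        v = queue.popleft()
        if depth[v] == max_depth:
            continue  # reached the depth limit: do not expand further
        for n in graph.get(v, []):
            # Skip already-visited vertices and those rejected by the filter.
            if n not in depth and keep(n):
                depth[n] = depth[v] + 1
                queue.append(n)
    return depth
```

Because visited vertices are never re-expanded, the work grows with the size of the subgraph rather than with the number of distinct paths, which is what a dedup step buys in a real traversal.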


Example Queries:

  1. Query A (1 hop): filter on a label and a property, then fetch the directly connected neighbours (used to fetch data for rendering in the UI)
  • Neo4j time: ~4.84 sec
  • JanusGraph time: ~2.30 min
  • ~125 nodes returned
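To make the shape of Query A concrete, the sketch below does the same thing in memory: select vertices by label and one property, then collect their direct neighbours as a node set plus an edge list a UI could render. The `Vertex` class, the `one_hop` function and the adjacency dict are hypothetical stand-ins; in Gremlin the equivalent selection would be along the lines of `g.V().hasLabel(label).has(key, value).both()`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass
class Vertex:
    id: str
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


def one_hop(
    vertices: Dict[str, Vertex],
    adj: Dict[str, List[str]],
    label: str,
    key: str,
    value: Any,
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Filter on label + property, then collect directly connected neighbours.
    Returns (node ids, (seed, neighbour) edges) for a UI renderer."""
    # Linear scan over all vertices; a database would use an index on the
    # filtered label/property instead of scanning.
    seeds = [v.id for v in vertices.values()
             if v.label == label and v.props.get(key) == value]
    nodes: Set[str] = set(seeds)
    edges: Set[Tuple[str, str]] = set()
    for s in seeds:
        for n in adj.get(s, []):
            nodes.add(n)
            edges.add((s, n))
    return nodes, edges
```

For example, with a `person` vertex `a` (`name="x"`) linked to `b` and `c`, `one_hop(vertices, adj, "person", "name", "x")` returns the nodes `{"a", "b", "c"}` and the edges `{("a", "b"), ("a", "c")}`.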

There are more examples, but the pattern is the same: as depth and node count increase, Neo4j scales better, while JanusGraph performance degrades sharply.


Code: JanusGraph-Django Integration (Gremlin Connection Pooling)

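```python
import logging

from django.conf import settings
from django.views import View
from gremlin_python.driver import serializer
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

logger = logging.getLogger(__name__)


class BaseGremlinClass(View):
    # Process-wide caches shared by all instances: one remote connection
    # and one traversal source per keyspace.
    _connection_pool = {}
    _traversal_pool = {}

    def __init__(self, **kwargs):
        # Django's View.__init__ receives the as_view() kwargs, so pass them on.
        super().__init__(**kwargs)
        # Per-instance references to the pooled objects.
        self.connections = {}
        self.traversals = {}

    def get_traversal(self, keyspace_name):
        if keyspace_name not in settings.JANUSGRAPH_KEYSPACES:
            raise ValueError(f"Keyspace {keyspace_name} not found in settings")

        # Already resolved on this instance.
        if keyspace_name in self.traversals:
            return self.traversals[keyspace_name]

        # Reuse the pooled connection if another instance already created it.
        if keyspace_name in self.__class__._traversal_pool:
            self.connections[keyspace_name] = self.__class__._connection_pool[keyspace_name]
            self.traversals[keyspace_name] = self.__class__._traversal_pool[keyspace_name]
            return self.traversals[keyspace_name]

        # First use of this keyspace in the process: open a connection lazily.
        try:
            config = settings.JANUSGRAPH_KEYSPACES[keyspace_name]
            connection = DriverRemoteConnection(
                config['url'],
                config['graph'],
                message_serializer=serializer.GraphSONSerializersV3d0(),
                timeout=30,
            )
            g = traversal().withRemote(connection)

            self.connections[keyspace_name] = connection
            self.traversals[keyspace_name] = g
            self.__class__._connection_pool[keyspace_name] = connection
            self.__class__._traversal_pool[keyspace_name] = g

            logger.info(f"Created new connection for keyspace {keyspace_name}")
            return g

        except Exception as e:
            logger.error(f"Error creating connection to {keyspace_name}: {e}")
            raise

    def close_connections(self, keyspace_name=None):
        # Drops only this instance's references; pooled connections stay open.
        if keyspace_name is None:
            self.connections.clear()
            self.traversals.clear()
        else:
            # pop() with a default ignores keyspaces this instance never used.
            self.connections.pop(keyspace_name, None)
            self.traversals.pop(keyspace_name, None)

    @classmethod
    def close_all_connections(cls):
        # Actually closes the pooled connections, e.g. at process shutdown.
        for keyspace, connection in cls._connection_pool.items():
            try:
                connection.close()
                logger.info(f"Closed pooled connection for keyspace {keyspace}")
            except Exception as e:
                logger.error(f"Error closing connection for keyspace {keyspace}: {e}")
        cls._connection_pool.clear()
        cls._traversal_pool.clear()
```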

This code handles lazy connection initialization and keeps one pooled connection per keyspace between Django and JanusGraph via Gremlin, so I don't think the connection setup itself is causing the delays. Still, let me know if this implementation can be improved too.
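One possible improvement, assuming the Django app runs under a multi-threaded worker: the class-level dicts above are checked and filled without a lock, so two concurrent first requests for the same keyspace could each open a connection, and the one that gets overwritten is never closed. The sketch below shows the usual double-checked locking pattern with only the standard library. `LazyPool` and its `factory` argument are hypothetical; in the real class the factory would wrap the `DriverRemoteConnection(...)` setup. (gremlinpython's `DriverRemoteConnection` also accepts a `pool_size` argument for its own internal connection pool, which is separate from this per-keyspace cache.)

```python
import threading
from typing import Any, Callable, Dict, Hashable, List


class LazyPool:
    """Process-wide cache holding one object per key, created on first use.
    `factory` is a hypothetical stand-in for the remote-connection setup."""

    def __init__(self, factory: Callable[[Hashable], Any]):
        self._factory = factory
        self._items: Dict[Hashable, Any] = {}
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> Any:
        # Fast path: once created, reads do not take the lock.
        item = self._items.get(key)
        if item is not None:
            return item
        with self._lock:
            # Re-check: another thread may have created it while we waited.
            item = self._items.get(key)
            if item is None:
                item = self._factory(key)
                self._items[key] = item
            return item

    def close_all(self) -> List[Hashable]:
        """Close and forget every pooled item; return keys whose close() raised."""
        with self._lock:
            items = list(self._items.items())
            self._items.clear()
        failed: List[Hashable] = []
        for key, item in items:
            try:
                item.close()
            except Exception:
                failed.append(key)
        return failed
```

The factory runs while the lock is held, so a slow connection attempt briefly blocks other first-time lookups. That trade-off is acceptable for a handful of keyspaces and guarantees exactly one connection per key.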

If anyone has faced a similar performance gap or has experience optimizing JanusGraph in a containerized setup with Cassandra and Elasticsearch, your insights would be greatly appreciated! Even small tips or configuration flags that helped in your case would be valuable.

Thanks in advance for your help!

© 2009-2023 The Apache Software Foundation under the terms of the Apache License 2.0. Apache, the Apache feather logo, Apache Cassandra, Cassandra, and the Cassandra logo, are either registered trademarks or trademarks of The Apache Software Foundation. Sponsored by Anant Corporation and Datastax, and Developed by Anant Corporation.