Neo4j Scaling

Introduction:

Insights introduces a feature called Neo4j Scaling, which creates read-only replica databases of your master Neo4j. It captures and persists all Change Data Capture (CDC) events from the master Neo4j in real time with minimal delay. The replica database is used for dashboarding, which considerably reduces the load on the main write server. This feature lets you scale out the master Neo4j database in a few clicks.

Prerequisites:

  • Insights >= v10.5 

  • Insights supports Neo4j (4.4.4) database for this functionality.

  • Install the neo4j streams plugin on all database servers (i.e. master & replicas).

  • Configure & set up Kafka server.

  • Only Admin can access this functionality.

Steps to set up Kafka:

Step 1: Download Kafka, extract it, and edit server.properties.

tar -xzf kafka_2.13-3.6.0.tgz

vi kafka_2.13-3.6.0/config/server.properties

#Properties

listeners=PLAINTEXT://0.0.0.0:9092

advertised.listeners=PLAINTEXT://[Kafka_Server_IP]:9092

log.dirs=/tmp/kafka-logs

log.retention.hours=5

zookeeper.connect=[Kafka_Server_IP]:2181
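
The bracketed [Kafka_Server_IP] values above must be replaced with the broker host's address. A minimal sketch of that step, which writes out the properties shown above (the IP 10.0.0.5 is a stand-in for illustration):

```shell
#!/bin/sh
# Write a server.properties with the broker IP filled in.
# KAFKA_SERVER_IP below is a placeholder value; use your broker's address.
KAFKA_SERVER_IP=10.0.0.5
cat > server.properties <<EOF
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://${KAFKA_SERVER_IP}:9092
log.dirs=/tmp/kafka-logs
log.retention.hours=5
zookeeper.connect=${KAFKA_SERVER_IP}:2181
EOF
# Show the lines that now carry the broker address.
grep "${KAFKA_SERVER_IP}" server.properties
```

Note that listeners stays bound to 0.0.0.0 while advertised.listeners must carry the externally reachable address, otherwise remote clients (the neo4j streams plugin on master and replicas) cannot connect back to the broker.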

Step 2: Start the zookeeper.

bin/zookeeper-server-start.sh config/zookeeper.properties

Step 3: Start the Kafka.

bin/kafka-server-start.sh config/server.properties

Step 4: To monitor Kafka, pull the Kafka-UI image from Docker Hub and run it.

docker run -it -p 8080:8080 -e DYNAMIC_CONFIG_ENABLED=true --net=host provectuslabs/kafka-ui

 

Steps to install neo4j streams plugin:

Step 1: Download the neo4j streams plugin.

Step 2: Create a streams.conf file inside $NEO4J_HOME/conf and add the below properties.

kafka.bootstrap.servers=[Kafka_Server_IP]:9092
kafka.acks=1
kafka.retries=2
kafka.batch.size=0
kafka.buffer.memory=33554432
kafka.reindex.batch.size=1000
kafka.session.timeout.ms=25000
kafka.connection.timeout.ms=20000
kafka.replication=1
kafka.linger.ms=1
kafka.transactional.id=
kafka.topic.discovery.polling.interval=300000
kafka.streams.log.compaction.strategy=delete

 

For Master Neo4j Server-

streams.source.topic.nodes.<TOPIC_NAME>=DATA{*}
streams.source.topic.relationships.<TOPIC_NAME>=BRANCH_HAS_COMMITS{*};BRANCH_HAS_PULL_REQUESTS{*};FILE_HAS_COMMITS{*}
streams.source.topic.relationships.insights-topic.key_strategy=all
streams.source.enabled=true
streams.sink.enabled=false
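
As a concrete example, the master's streams.conf entries can be generated with the <TOPIC_NAME> placeholder filled in. This sketch assumes the topic name insights-topic used elsewhere in this document; whatever name you pick must match streams.sink.topic.cdc.schema on the replicas:

```shell
#!/bin/sh
# Generate the master's streams.conf with the topic name substituted.
# TOPIC is an assumption for illustration; it must match the replicas' sink topic.
TOPIC=insights-topic
cat > streams.conf <<EOF
streams.source.topic.nodes.${TOPIC}=DATA{*}
streams.source.topic.relationships.${TOPIC}=BRANCH_HAS_COMMITS{*};BRANCH_HAS_PULL_REQUESTS{*};FILE_HAS_COMMITS{*}
streams.source.topic.relationships.${TOPIC}.key_strategy=all
streams.source.enabled=true
streams.sink.enabled=false
EOF
# Count the lines that reference the topic, as a quick sanity check.
grep -c "${TOPIC}" streams.conf
```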

 

For Replica Neo4j Server-

kafka.group.id=replica1
streams.source.enabled=false
streams.sink.enabled=true
streams.sink.topic.cdc.schema=insights-topic
streams.sink.poll.interval=300

 

Steps to set up SSH:

Steps to configure Replica Daemon Agent:

  • Download replicadaemon.zip from the replicadaemon folder.

  • Similar to the Agent Daemon, create a replicadaemon directory inside /opt/Insights/insightsagents/ and unzip replicadaemon.zip there, e.g. /opt/Insights/insightsagents/replicadaemon

  • Provide executable and write permissions to that folder using chmod 755 replicadaemon

  • Copy the InSightsReplicaDaemon.sh file into /etc/init.d/ using cp InSightsReplicaDaemon.sh /etc/init.d/InSightsReplicaDaemonAgent

  • Provide executable permission to InSightsReplicaDaemonAgent (e.g. chmod +x /etc/init.d/InSightsReplicaDaemonAgent)

  • To start replica daemon, execute the below command - sudo service InSightsReplicaDaemonAgent start
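
The steps above can be sketched as one script. PREFIX is an assumption added here so the layout can be dry-run without root (use an empty PREFIX and real sudo for an actual install), and the printf line stands in for unzipping the downloaded replicadaemon.zip:

```shell
#!/bin/sh
set -e
# Sketch of the replica daemon install steps above.
# PREFIX lets this dry-run without root; drop it for a real install.
PREFIX="./stage"
AGENT_DIR="$PREFIX/opt/Insights/insightsagents/replicadaemon"
mkdir -p "$AGENT_DIR" "$PREFIX/etc/init.d"
chmod 755 "$AGENT_DIR"
# Stand-in for the unzip step (real install: unzip replicadaemon.zip -d "$AGENT_DIR"):
printf '#!/bin/sh\n' > "$AGENT_DIR/InSightsReplicaDaemon.sh"
cp "$AGENT_DIR/InSightsReplicaDaemon.sh" "$PREFIX/etc/init.d/InSightsReplicaDaemonAgent"
chmod +x "$PREFIX/etc/init.d/InSightsReplicaDaemonAgent"
# A real install would now run: sudo service InSightsReplicaDaemonAgent start
ls "$PREFIX/etc/init.d"
```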

Steps to configure Neo4j Scaling:

Step 1: Navigate to the Neo4j Scaling screen and click on the configure button to configure source & replica details as shown below.

Step 2: After configuring source & replica details, click on the save button to save the configuration. Wait a few minutes and refresh the front screen; the replica details will be displayed.

Step 3: To see the status logs, click on the details button. It will display resync status logs as shown below.

Step 4: Wait until you get the “Replica resynced successfully” message in the resync status logs; the replica is then in sync with the master.

Step 5: To remove a replica, select that replica & click on the delete button. You can view the delete status log inside the additional details screen as shown below.

Step 6: To resync all replicas, click on the resync button.

Sample Testcase to test replica:

Testcase 1: Configure the replica database in Grafana as a data source.

URL - http://<ReplicaServerIP>:7474/db/data/transaction/commit?includeStats=true

Expected result - The data source test succeeds (“Data source is working”).
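
The same endpoint can also be exercised by hand. This sketch only builds the Cypher-over-HTTP payload; the REPLICA_IP value and the count query are illustrative assumptions, and the curl call is shown commented out since it needs a live replica and Neo4j credentials:

```shell
#!/bin/sh
# Build a payload for the replica's transactional HTTP endpoint.
# REPLICA_IP and the MATCH query are illustrative stand-ins.
REPLICA_IP=10.0.0.6
cat > payload.json <<'EOF'
{"statements":[{"statement":"MATCH (n) RETURN count(n)"}]}
EOF
# Against a live replica, send it with:
# curl -u neo4j:<password> -H 'Content-Type: application/json' \
#   -d @payload.json \
#   "http://${REPLICA_IP}:7474/db/data/transaction/commit?includeStats=true"
cat payload.json
```

Running the same count query against master and replica and comparing the results is a quick way to confirm the CDC sink is keeping the replica current.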

 

©2021 Cognizant, all rights reserved. US Patent 10,410,152