xrayGraphDB Documentation
Complete reference for xrayGraphDB v4.x stable and v5.0.0-alpha preview. Covers installation, query language, functions, protocols, server configuration, and administration.
v5.0.0-alpha — What's New (Preview Channel)
v5.0.0-alpha is the early-adopter channel. Stable production deployments stay on v4.9.6 BETA until v5 GA. The alpha image ships several engine and ops changes that affect how customers connect, ingest, and deploy:
- Storage-mode converter (live): IN_MEMORY → MMAP_TRANSACTIONAL conversion via xraygraphdb --convert. Per-database storage mode is now first-class persistent state.
- Unified Mutation Gateway: compiler-enforced no-drift mutation path (Patent Disclosure 6). All auth-DDL, catalog, and crypto mutations route through one gate.
- W6A.9 per-record AES-256-GCM: every record is encrypted at the segment boundary. WAL recovery is symmetric (encrypt on write, decrypt on replay).
- BULK_INSERT_EDGES_KEYED (0x30): new wire opcode for binary bulk edge insert keyed by any property pair. ~150× faster than Cypher MATCH+CREATE batches; benchmarked at 66,000 edges/sec on a single node. See Wire Protocol below.
- xgconsole bundled: the interactive Cypher shell is now included in the v5 Docker image. docker exec <container> xgconsole -u admin -p <pw> -d <db> works out of the box.
- ubuntu:24.04 base image: image size dropped from ~2.6 GB (CUDA base) to ~790 MB. GPU hosts still work via --runtime=nvidia --gpus all plus the host's nvidia-container-toolkit.
- Mandatory database in HELLO: v5 rejects HELLO frames that omit the database field. Older SDKs receive ERROR(HELLO_REJECTED). Update clients to xgdb_connect ≥ 1.3 or the matching version of the JS / Java / Go drivers.
- Drop-and-reload benchmark: 5.3M graph elements (2M nodes + 3.3M edges) loaded from an empty container in ~6 minutes via the binary bulk path. This is the full reset story for ops and demos.
Install via downloads (.deb / .tar.gz / Docker). Deploy runbook + 5 known gotchas (apt-keyring, UID 999 bind-mount, mdadm wipefs, xgconsole HELLO v2, Community-vs-Enterprise gates) live in the repo at docs/operations/docker-deployment-runbook.md.
System Requirements
xrayGraphDB runs on 64-bit Linux, macOS, and Docker. Minimum and recommended specifications are listed below.
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores, x86_64 or ARM64 | 8+ cores, modern x86_64 (post-2014) or ARM64 |
| RAM | 2 GB | 16 GB+ (in-memory engine) |
| Disk | 1 GB free (snapshots) | SSD, 10 GB+ for snapshots |
| OS | Ubuntu 22.04+, Debian 12+, macOS 13+ | Ubuntu 24.04 LTS |
| Docker | Docker Engine 24+ | Docker Engine 27+ |
| Kernel | Linux 5.15+ | Linux 6.x |
| GPU (optional) | None — CPU-only path is fully supported | NVIDIA, compute capability sm_70+ (Volta and newer), CUDA driver 12.x or later |
If your host has an NVIDIA GPU, verify the driver with xg-gpu-tester before installing the database (see GPU Setup below). If you install the database first on a host with a half-working GPU, it silently falls back to CPU and you do not get the analytics speedups.
Host Kernel Tunables (required for production)
Stock Linux distributions ship kernel defaults that are too low for graph workloads. xrayGraphDB will start with the defaults but will crash within minutes on any non-trivial graph build (CSR pass 1 fragments jemalloc arenas; the kernel rejects munmap when the per-process VMA count exceeds the limit; jemalloc treats that as fatal). Set the following before loading datasets larger than a few million edges.
Example: bulk_import_file on Friendster (3.6 B edges) died in PASS1 after ~60-90 s with <jemalloc>: Error in munmap(): Cannot allocate memory. The host had vm.max_map_count=65530 (the kernel default). Raising it to 1048576 let the build complete on the first try.
vm.max_map_count — required
Maximum number of memory-mapped regions per process. xrayGraphDB needs at least 262 144 for any graph beyond a few hundred million edges; 1 048 576 recommended for production. The kernel default is 65 530, which is too low. The same setting is documented as required by Elasticsearch, MongoDB, and ClickHouse.
```bash
# Apply now (no reboot required)
sudo sysctl -w vm.max_map_count=1048576

# Persist across reboots
echo 'vm.max_map_count=1048576' | sudo tee /etc/sysctl.d/99-xraygraphdb.conf

# Verify
sysctl vm.max_map_count
# vm.max_map_count = 1048576
```
vm.max_map_count is not a namespaced sysctl — it must be set on the host, not via --sysctl on docker run. If you are running xrayGraphDB in a container, run the sudo sysctl command above on the underlying host (or pass it through your cloud-init / DaemonSet).
Transparent Huge Pages — recommended
Leave THP at its kernel default (madvise on modern distros). xrayGraphDB explicitly opts in via madvise(MADV_HUGEPAGE) on its mmap'd files. Setting THP to always enables it system-wide for all processes (a small RSS overhead, but no correctness issue); setting it to never disables it everywhere, including for xrayGraphDB's madvise request. Do not change it unless you know you need to.
```bash
cat /sys/kernel/mm/transparent_hugepage/enabled
# expected: always [madvise] never   (madvise selected)
```
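If you script host checks (for example in cloud-init or a provisioning pipeline), the active mode is the bracketed word in that file. A minimal Python sketch; the helper name is illustrative and not part of xrayGraphDB:

```python
from pathlib import Path

# Same sysfs file the cat command above reads.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def selected_thp_mode(text: str) -> str:
    """Return the active THP mode, i.e. the entry wrapped in [brackets].

    The kernel lists every supported mode and brackets the active one,
    for example "always [madvise] never".
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no bracketed mode in {text!r}")


if __name__ == "__main__" and THP_PATH.exists():
    mode = selected_thp_mode(THP_PATH.read_text())
    # "madvise" and "always" both honour MADV_HUGEPAGE requests; "never" ignores them.
    verdict = "ok" if mode in ("madvise", "always") else "hugepage requests ignored"
    print(f"THP mode: {mode} ({verdict})")
```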
Optional: file-max + nofile (heavy WAL / replication workloads)
The default per-process open-file limit on most distros is 1024 soft / 4096 hard, which is fine for low-concurrency installs but cramped on heavy replication or many-tenant deployments. Bump to 65 536 in the systemd drop-in (LimitNOFILE=65536) if you see EMFILE in the logs.
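A limit change in the unit only applies after a restart, so it is worth confirming the value the running daemon actually received by reading /proc/<pid>/limits. A sketch; the script and its command-line argument are illustrative:

```python
import sys
from pathlib import Path
from typing import Optional, Tuple

RECOMMENDED_NOFILE = 65536  # LimitNOFILE value suggested above


def _limit(value: str) -> Optional[int]:
    # /proc/<pid>/limits prints "unlimited" instead of a number.
    return None if value == "unlimited" else int(value)


def max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line[len("Max open files"):].split()[:2]
            return _limit(soft), _limit(hard)
    raise ValueError("'Max open files' row not found")


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = sys.argv[1]  # e.g. the output of: pidof xraygraphdb
    soft, hard = max_open_files(Path(f"/proc/{pid}/limits").read_text())
    if soft is not None and soft < RECOMMENDED_NOFILE:
        print(f"soft nofile {soft} < {RECOMMENDED_NOFILE}: consider LimitNOFILE={RECOMMENDED_NOFILE}")
    else:
        print(f"nofile limits OK (soft={soft}, hard={hard}; None = unlimited)")
```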
What the daemon checks at startup
The daemon reads /proc/sys/vm/max_map_count on every start and emits a [critical] log line if the value is below 65 536 (will-crash territory) and a [warn] if below 262 144. Look for these lines in journalctl -u xraygraphdb or docker logs immediately after startup — they include the exact remediation command. Silence is success.
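To catch a low value before the daemon starts (for example in a fleet-wide pre-flight script), the same thresholds can be checked directly. A sketch that mirrors the documented thresholds; it is not the daemon's own code:

```python
from pathlib import Path

CRITICAL_BELOW = 65_536   # daemon logs [critical] below this value
WARN_BELOW = 262_144      # daemon logs [warn] below this value
RECOMMENDED = 1_048_576   # production recommendation from this guide


def classify_max_map_count(value: int) -> str:
    """Map a vm.max_map_count value to the daemon's documented log levels."""
    if value < CRITICAL_BELOW:
        return "critical"
    if value < WARN_BELOW:
        return "warn"
    return "ok"


if __name__ == "__main__":
    path = Path("/proc/sys/vm/max_map_count")
    if path.exists():
        current = int(path.read_text())
        level = classify_max_map_count(current)
        print(f"vm.max_map_count={current}: {level}")
        if current < RECOMMENDED:
            print(f"fix: sudo sysctl -w vm.max_map_count={RECOMMENDED}")
```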
GPU Setup (optional, before install)
Skip this section if you do not have an NVIDIA GPU. CPU-only installs are fully supported and use the same packages.
If your host has an NVIDIA GPU, install the driver before the database. The xrayGraphDB daemon probes for CUDA on first start and silently falls back to CPU if the driver is missing or broken — so a half-installed driver looks like a working install but never engages the GPU. Verify the driver with the standalone xg-gpu-tester tool before continuing.
1. Install the NVIDIA driver
Ubuntu:

```bash
# Latest production driver branch (550 as of 2026-Q1; check `apt search nvidia-driver`)
sudo apt update
sudo apt install -y nvidia-driver-550 nvidia-utils-550
sudo reboot

# After reboot:
nvidia-smi   # should print driver version + GPU model + utilization
```

RHEL 9:

```bash
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf -y module install nvidia-driver:latest-dkms
sudo reboot
nvidia-smi
```

Debian:

```bash
# Add non-free firmware and contrib repos first if you have not already.
sudo apt update
sudo apt install -y nvidia-driver firmware-misc-nonfree
sudo reboot
```
2. Verify with xg-gpu-tester
Download the standalone tester (less than 100 KB, no dependencies beyond glibc) and run it as root. Do this BEFORE installing xrayGraphDB so a failure is loud and obvious instead of hidden behind a silent CPU fallback.
```bash
# Download (replace amd64 with arm64 if applicable)
curl -fsSLO https://emtai-xray.emtailabs.com  # request licensed access
chmod +x xg-gpu-tester-amd64
sudo ./xg-gpu-tester-amd64
```
A healthy host prints something like:
```
Driver API version: 12.4
CUDA devices: 1
Device 0: NVIDIA RTX 4090
Compute capability : sm_89
SM count : 128
Total memory : 23.99 GiB
Allocating 1 MiB on device 0... OK
[PASS] 1 CUDA device(s) visible. Primary: NVIDIA RTX 4090 (sm_89)
Verdict: GPU READY
```
Common failure modes and what to do:
| Exit code | Meaning | Fix |
|---|---|---|
| 1 | libcuda.so.1 not found | The driver isn't installed; install it (step 1) and reboot. |
| 2 | cuInit() failed | The driver is installed but the kernel module isn't loaded for the running kernel (reboot), or the user lacks permission on /dev/nvidia*. |
| 3 | Zero CUDA devices visible | Check lspci \| grep -i nvidia: the GPU may not be on the PCIe bus, or CUDA_VISIBLE_DEVICES is set to an empty value. |
| 5 | 1 MiB allocation refused | Another process is using all GPU memory; nvidia-smi will show it. Free it before installing the database. |
Only proceed to install xrayGraphDB after xg-gpu-tester reports GPU READY. The same binary ships with the database (/usr/bin/xg-gpu-tester) so you can re-run it any time after install.
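When provisioning many GPU hosts, it helps to turn the tester's exit code into the remediation text automatically. A minimal sketch around the installed binary; treating exit code 0 as "GPU READY" is an assumption, since the table above only documents the failure codes:

```python
import subprocess
from typing import Tuple

# Exit codes from the table above; 0 = "GPU READY" is an assumption.
EXIT_MEANINGS = {
    0: "GPU READY",
    1: "libcuda.so.1 not found: install the NVIDIA driver and reboot",
    2: "cuInit() failed: reboot, or check permissions on /dev/nvidia*",
    3: "zero CUDA devices visible: check lspci and CUDA_VISIBLE_DEVICES",
    5: "1 MiB allocation refused: free GPU memory (see nvidia-smi)",
}


def explain_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_gpu_tester(binary: str = "/usr/bin/xg-gpu-tester") -> Tuple[int, str]:
    """Run the tester (it must run as root) and return (exit code, explanation)."""
    proc = subprocess.run([binary], capture_output=True, text=True)
    return proc.returncode, explain_exit(proc.returncode)
```

Call run_gpu_tester() from a provisioning step that runs as root, and halt the rollout on any non-zero code.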
GPU acceleration in clusters — today vs. v5
If you run a multi-node cluster where some nodes have a GPU and others do not, it matters which node receives the analytics query:
- CALL xg.pagerank(...) sent to a node with a GPU → CUDA path, fast.
- The same query sent to a node without a GPU → CPU path, slow, even if a sibling replica has a GPU sitting idle.
Caveat: replicas are read-only. Pure-read analytics work fine on a GPU-equipped replica. Analytics that materialize results back into the graph (e.g. SET p.rank = ... after PageRank) must run on MAIN, where a GPU may or may not be present.
Planned for v5: each node's GPU capability is published in SHOW REPLICAS metadata. With --gpu-affinity=auto, MAIN's planner will detect GPU-eligible operators (PageRank, Betweenness, Triangle Count, BFS-over-CSR, ...) and forward them over RPC to a GPU-equipped peer, returning rows through the original session. Read-only safety is enforced automatically: mutating queries are refused with a clear error if the local MAIN lacks a GPU. This removes the need for client-side host pinning; you point the client at MAIN, and the cluster decides where to actually run the kernel.
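Until that routing lands, the decision lives in the client. A sketch of the v4.x pinning logic; the Node inventory and the pick_node helper are hypothetical and would be fed from your own host inventory (for example, from xg-gpu-tester results), not from a documented API:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    address: str
    role: str      # "MAIN" or "REPLICA"
    has_gpu: bool  # maintained by you, e.g. from xg-gpu-tester results


def pick_node(nodes: List[Node], mutates: bool) -> Node:
    """Choose where to send an analytics query under client-side pinning.

    Mutating analytics (e.g. PageRank followed by SET p.rank = ...) must run
    on MAIN. Pure-read analytics go to the first GPU-equipped node, falling
    back to MAIN (CPU path) if no node has a GPU.
    """
    main = next(n for n in nodes if n.role == "MAIN")
    if not mutates:
        for node in nodes:
            if node.has_gpu:
                return node
    return main
```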
Installation: Docker
Docker is the fastest way to start xrayGraphDB. The image ships with sensible defaults — storage path, ports, mmap engine, AES-256-GCM tenant crypto. Pull it, bootstrap an admin on the first run, and you're up.
Before you run anything: raise vm.max_map_count on the host. Docker can't override host kernel sysctls — the daemon hard-fails below 65,536 and warns below 262,144. See Host Kernel Tunables above for the one-liner.
First run — bootstrap an admin
The daemon refuses connections without an admin user. Pass --init-admin-user and --init-admin-password on the first start to create one. These flags are bootstrap-only — they only create a user when no users exist, and become silent no-ops on every subsequent boot once the admin record is on disk.
```bash
# Pull (or `docker load < xraygraphdb-4.9.4-docker.tar.gz` for the offline tarball)
docker pull xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4

# First run: creates admin/<your-password> on initial boot.
# Single-quote the password so an interactive shell doesn't history-expand "!".
docker run -d \
  --name xraygraphdb \
  --restart unless-stopped \
  -p 7689:7689 \
  -v xraygraphdb-data:/var/lib/xraygraphdb \
  -v xraygraphdb-logs:/var/log/xraygraphdb \
  xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4 \
  --license-acknowledge-saved=true \
  --init-admin-user=admin \
  --init-admin-password='YourStrongPassword!23'

# Wait ~5s, then confirm the daemon is listening
docker logs xraygraphdb | grep "xrayProtocol listening"
# Expected: "xrayProtocol listening on 0.0.0.0:7689 with 4 workers"
```
Security: rotate the bootstrap flags off after the first boot. Once the daemon confirms it's listening, the admin record is persisted in /var/lib/xraygraphdb/auth/ and the --init-admin-* flags do nothing on restart. But they remain visible in three places — docker inspect xraygraphdb (full Args[]), /var/lib/docker/containers/<id>/config.v2.json on the host, and ps aux inside the container. Recreate the container without the bootstrap flags so the password isn't sitting in process args:
```bash
docker stop xraygraphdb && docker rm xraygraphdb

docker run -d \
  --name xraygraphdb \
  --restart unless-stopped \
  -p 7689:7689 \
  -v xraygraphdb-data:/var/lib/xraygraphdb \
  -v xraygraphdb-logs:/var/log/xraygraphdb \
  xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4 \
  --license-acknowledge-saved=true
```
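After recreating the container, you can confirm the bootstrap flags are gone from the Args[] list that docker inspect exposes. A sketch with illustrative helper names; it reports only flag names, never their values:

```python
import json
import subprocess
from typing import List

BOOTSTRAP_PREFIX = "--init-admin-"


def leaked_bootstrap_flags(args: List[str]) -> List[str]:
    """Return the names (values masked) of bootstrap flags still in a container's args."""
    return [a.split("=", 1)[0] for a in args if a.startswith(BOOTSTRAP_PREFIX)]


def container_args(name: str = "xraygraphdb") -> List[str]:
    # `docker inspect` prints a JSON array; each container object has an "Args" list.
    out = subprocess.run(["docker", "inspect", name],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)[0]["Args"]
```

On a correctly recreated container, leaked_bootstrap_flags(container_args()) returns an empty list.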
The named volumes xraygraphdb-data and xraygraphdb-logs persist the database state and logs across container recreations. The data path inside the image is /var/lib/xraygraphdb (matches the daemon's default --data-directory).
GPU acceleration (optional)
For PageRank, BFS, triangle count, K-core, Louvain, and label propagation on a CUDA-capable GPU, add --gpus all and ensure nvidia-container-toolkit is installed on the host. The daemon dlopens libcuda.so.1 at runtime from the host driver — no CUDA toolkit ships in the image. Without --gpus, GPU procs fall back to CPU paths cleanly.
```bash
docker run -d \
  --gpus all \
  --shm-size 10g \
  --name xraygraphdb \
  -p 7689:7689 \
  -v xraygraphdb-data:/var/lib/xraygraphdb \
  xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4 \
  --license-acknowledge-saved=true

docker logs xraygraphdb | grep "GpuKernelManager"
# Expected: "GpuKernelManager: CUDA context ready on device 0 (NVIDIA ...)"
```
Enterprise license (optional)
If you have a license file, mount it read-only at /etc/xraygraphdb/license.xglicense. The daemon auto-detects it on startup, validates the signature, and re-encrypts it at rest with the local storage epoch on first load:
```bash
docker run -d \
  --name xraygraphdb \
  -p 7689:7689 \
  -v xraygraphdb-data:/var/lib/xraygraphdb \
  -v /path/to/license.xglicense:/etc/xraygraphdb/license.xglicense:ro \
  xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4 \
  --license-acknowledge-saved=true

docker logs xraygraphdb | grep "License loaded"
# Expected: "License loaded: xg-ent-... tier=enterprise org=..."
```
Without a license, all xg.* community-tier procedures (PageRank, BFS, triangle count, betweenness, Louvain, etc.) run unrestricted. The license unlocks xray.* (xray-vision-specific procs) and the commercial xgated.* namespace.
Docker Compose
```yaml
services:
  xraygraphdb:
    image: xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4
    restart: unless-stopped
    ports:
      - "7689:7689"
    volumes:
      - xraygraphdb-data:/var/lib/xraygraphdb
      - xraygraphdb-logs:/var/log/xraygraphdb
      # Optional: Enterprise license
      # - ./license.xglicense:/etc/xraygraphdb/license.xglicense:ro
    command:
      - --license-acknowledge-saved=true
      # --- FIRST RUN ONLY: uncomment for the first `docker compose up`,
      # --- then comment back out and `docker compose up -d --force-recreate`.
      # - --init-admin-user=admin
      # - --init-admin-password=YourStrongPassword!23

volumes:
  xraygraphdb-data:
  xraygraphdb-logs:
```

```bash
docker compose up -d
docker compose logs -f xraygraphdb | grep "xrayProtocol listening"
```
Verify the Server
```bash
# Container status
docker ps --filter name=xraygraphdb

# Daemon ready check: xrayProtocol on 7689 (Bolt is OFF by default; opt-in via --bolt-server-name)
docker logs xraygraphdb | grep "xrayProtocol listening"
# Expected: "xrayProtocol listening on 0.0.0.0:7689 with 4 workers"

# Test connection (Python xray_protocol_client; HELLO must carry a database name)
python3 -c '
import xray_protocol_client as xg
conn = xg.connect(host="localhost", port=7689,
                  auth_token="admin:YourStrongPassword!23",
                  database="xraygraphdb")
print(conn.execute_query("MATCH (n) RETURN count(n) AS n_count"))
'
```
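In CI or provisioning scripts, a fixed sleep before the connection test is fragile. A sketch that polls the xrayProtocol port instead (the helper name is illustrative). An open TCP port only shows that the listener is up, so follow it with the authenticated query above:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Poll until a TCP connect to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: the daemon is not listening yet.
            time.sleep(interval_s)
    return False
```

Usage: wait_for_port("localhost", 7689) before running the query.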
Installation: Linux
Two install paths on Ubuntu 24.04 LTS: the .deb package (recommended — auto-resolves runtime deps via apt) or the portable .tar.gz (everything bundled, runs against the system glibc).
.deb — Ubuntu / Debian
```bash
# 1. Download xraygraphdb_4.9.4_amd64.deb
wget https://emtai-xray.emtailabs.com  # request licensed access

# 2. Install: use `apt install -f` so apt auto-resolves the libgdal34t64 + python3 deps.
#    `dpkg -i` directly will FAIL the first time with "depends on libgdal34t64; however ...".
sudo apt install -f ./xraygraphdb_4.9.4_amd64.deb
```
.tar.gz — portable build (any glibc ≥ 2.39)
The tarball ships every C/C++ runtime we need (libstdc++, libLLVM, libsolclient, libssl, libcrypto, libxraygraphdb_module_support) under lib/. The install.sh wrapper drops them at /usr/lib/xraygraphdb/lib, registers the path with ldconfig, installs the systemd unit, and apt-installs the one external dep (libgdal34t64).
```bash
# Download xraygraphdb-4.9.4-linux-x86_64.tar.gz
wget https://emtai-xray.emtailabs.com  # request licensed access
tar xzf xraygraphdb-4.9.4-linux-x86_64.tar.gz
cd xraygraphdb-4.9.4-linux-x86_64
sudo ./install.sh
```
Configure — required systemd drop-in
The default /lib/systemd/system/xraygraphdb.service ships an empty ExecStart sentinel so you can layer site-local flags via a drop-in without forking the unit file. Create the drop-in before the first systemctl start:
```bash
sudo mkdir -p /etc/systemd/system/xraygraphdb.service.d
sudo tee /etc/systemd/system/xraygraphdb.service.d/local.conf >/dev/null <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/lib/xraygraphdb/xraygraphdb \
  --data-directory=/var/lib/xraygraphdb \
  --bolt-port=7687 \
  --xray-port=7689 \
  --storage-engine=mmap \
  --storage-properties-on-edges=true \
  --log-level=WARNING \
  --license-acknowledge-saved=true \
  --bolt_listen_mode=off
EOF

sudo systemctl daemon-reload
sudo systemctl start xraygraphdb
sudo systemctl status xraygraphdb
ss -tlnp | grep :7689   # xrayProtocol
```
Why the default service file sets XG_ALLOW_PLAINTEXT_TENANT_METADATA=1
xrayGraphDB encrypts per-tenant metadata at rest. On a fresh install there is no tenant encryption key configured yet, so the default /lib/systemd/system/xraygraphdb.service ships with Environment=XG_ALLOW_PLAINTEXT_TENANT_METADATA=1. This tells the daemon “there’s only one tenant on this host, store the metadata in plaintext — that’s fine.” Without it the daemon refuses to bootstrap and crashes with UnknownDatabaseException: "xraygraphdb" on first start.
You can leave it as-is for: single-tenant installs, evaluation, development, and the small-team / single-application deployments most users have. The env var is the supported default and is not a security regression on a single-tenant host.
You should change it for: multi-tenant production where tenants must not see each other's metadata. The procedure is (a) register a tenant encryption provider through the licensed admin API (HashiCorp Vault, AWS KMS, or your enterprise KMS — see the Licensed admin guide), then (b) remove XG_ALLOW_PLAINTEXT_TENANT_METADATA from the unit file and reload. The daemon will now write encrypted tenant metadata and refuse plaintext writes.
Do not just remove the env var without first registering an encryption provider — the daemon will crash-loop until one of the two is true.
Stop the daemon with SIGTERM for a graceful shutdown (systemctl stop xraygraphdb sends SIGTERM by default). Never use kill -9: a forced kill skips the final snapshot and may result in data loss on the next recovery.
Storage Engine Selection
xrayGraphDB supports two storage engines, both built on eMTAi's patent-pending storage architecture. Set with --storage-engine on the command line or in the service file.
| Engine | Flag | Behavior | Best For |
|---|---|---|---|
| mmap | --storage-engine=mmap | Data stored on NVMe, paged into RAM on demand via the kernel page cache. Handles datasets larger than physical RAM. | Production, large datasets, bulk loading |
| default (in-memory) | --storage-engine=default | All data in RAM. No disk I/O during queries. Lowest possible latency (0.1 ms). | Low-latency workloads, datasets that fit in RAM |
Memory sizing: the default (in-memory) engine stores all vertices, edges, properties, and indexes in RAM. A dataset that uses 2 GB on disk (mmap) may require 10-20 GB in RAM due to in-memory index structures, property heap allocations, and per-record metadata. Plan for approximately 5-10x the on-disk size when using the in-memory engine.
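As a quick capacity check, the 5-10x rule combines with the 5% OS reserve described in Memory Configuration. A worked sketch with illustrative function names: 2 GB on disk gives 10-20 GB of RAM, so the host needs roughly 21 GB or more once the reserve is added:

```python
from typing import Tuple


def in_memory_ram_range_gb(on_disk_gb: float) -> Tuple[float, float]:
    """Rule of thumb from this guide: the in-memory engine needs ~5-10x the mmap on-disk size."""
    return on_disk_gb * 5, on_disk_gb * 10


def min_host_ram_gb(on_disk_gb: float, os_reserve: float = 0.05) -> float:
    """Host RAM needed so the upper estimate still fits under a 95% memory cap."""
    _, high = in_memory_ram_range_gb(on_disk_gb)
    return high / (1 - os_reserve)
```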
Memory Configuration
xrayGraphDB should be allowed to use most of the available RAM, but always reserve a safety margin for the operating system. If the database exhausts all memory, the OS OOM killer will terminate the process mid-operation, risking data corruption.
```ini
# /etc/systemd/system/xraygraphdb.service.d/memory.conf

# RECOMMENDED: cap the service at 95% of physical RAM so the OS keeps 5%
# and stays responsive. MemoryMax is a hard cap: if the service reaches it,
# the OOM killer is invoked inside this unit, so plan capacity to stay below it.
[Service]
MemoryMax=95%

# On shared servers, set an absolute limit with at least 20% headroom
# above your expected peak usage for query working memory and OS caches.
# Example for a 256 GB server with ~150 GB expected dataset:
# MemoryMax=200G

# WARNING: Do not use MemoryMax=infinity. If the database exhausts all
# host memory, the OOM killer will terminate the process, potentially
# corrupting in-flight writes and WAL entries.
```
After changing service configuration, reload systemd:
```bash
sudo systemctl daemon-reload
sudo systemctl restart xraygraphdb
```
Docker: When running in Docker, set the container memory limit with --memory or deploy.resources.limits.memory in Compose. Always leave 5% of host RAM for the OS. For example, on a 64GB host: docker run --memory=60g.
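The arithmetic behind these numbers, as a small sketch with illustrative helper names: floor 95% of host RAM to whole gigabytes for Docker's --memory, and add at least 20% headroom to expected peak usage for an absolute MemoryMax on a shared server:

```python
import math


def docker_memory_limit_gb(host_ram_gb: float, reserve_fraction: float = 0.05) -> int:
    """Largest whole-GB --memory value that leaves at least reserve_fraction of host RAM free."""
    return math.floor(host_ram_gb * (1 - reserve_fraction))


def shared_server_memory_max_gb(expected_peak_gb: float, headroom: float = 0.20) -> int:
    """Smallest whole-GB MemoryMax giving at least `headroom` above the expected peak."""
    return math.ceil(expected_peak_gb * (1 + headroom))
```

For the examples above, a 64 GB host gives docker run --memory=60g, and a ~150 GB peak needs at least 180G, which the MemoryMax=200G example satisfies.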
Installation: macOS
macOS is supported for development only. For production workloads use Linux or Docker.
```bash
# Download the macOS binary (Apple Silicon archive shown; use the Intel archive on x86_64 Macs)
curl -LO https://releases.emtailabs.com/xraygraphdb/xraygraphdb-v4.9.4-macos-arm64.tar.gz

# Extract and run
tar xzf xraygraphdb-v4.9.4-macos-arm64.tar.gz
cd xraygraphdb-v4.9.4
./bin/xraygraphdb-wrapper
```
On macOS you may also use Docker Desktop, which is the recommended approach for local development.
First Boot: Bootstrap Admin
Fresh installs ship with no users in the auth store. The daemon will not let you create the first admin user over the wire (chicken-and-egg), so you must run the one-time bootstrap helper before the database is usable. This step does not apply to replica nodes — replicas inherit the admin user from the cluster's MAIN; see Cluster Setup.
```bash
sudo xraygraphdb-bootstrap-admin
```
The helper:
- Generates a 24-character random password.
- Displays it once on your terminal in red.
- Asks you to retype it to confirm you copied it.
- Writes the username/password to /run/xraygraphdb/bootstrap.env (a tmpfs file, never on disk) and starts the daemon.
- Schedules an ExecStartPost hook that wipes the env file ~15 seconds after the daemon comes up.
If you lose the password, the only way back is to wipe /var/lib/xraygraphdb and re-run the bootstrap; the existing user data is unrecoverable.
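For unattended provisioning (for example, generating the value passed to --init-admin-password in the Docker flow), Python's secrets module produces a comparable 24-character secret. A sketch; the alphabet is an assumption, not the helper's documented character set, and it avoids shell-special characters such as "!":

```python
import secrets
import string

# Assumed alphabet: letters and digits only, so the value is shell-safe.
ALPHABET = string.ascii_letters + string.digits


def generate_bootstrap_password(length: int = 24) -> str:
    """Cryptographically random password of the same length as the helper's."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def confirm_retyped(expected: str, retyped: str) -> bool:
    # Constant-time comparison, as for any secret.
    return secrets.compare_digest(expected, retyped)
```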
Loading datasets — where to put files
The default systemd unit ships with PrivateTmp=true for sandbox hardening. That gives the daemon its own private /tmp/ namespace, separate from the host's /tmp/. Anything you place in the host's /tmp/ is invisible to the daemon, and bulk-import calls against those paths fail-fast with 0 vertices / 0 edges in under a second — no error in the journal.
Don't stage dataset files in the host's /tmp/. Even though the dataset file is world-readable, the daemon cannot see it. Use /var/lib/xraygraphdb/import/ instead: this path is in the unit's ReadWritePaths=, owned by the xraygraphdb user, and visible inside the daemon's namespace.
```bash
# Right way: the daemon can read it
sudo mkdir -p /var/lib/xraygraphdb/import
sudo mv ~/com-friendster.ungraph.txt /var/lib/xraygraphdb/import/
sudo chown -R xraygraphdb:xraygraphdb /var/lib/xraygraphdb/import
```

```python
# Then in your bench/import script:
client.bulk_import_file("/var/lib/xraygraphdb/import/com-friendster.ungraph.txt")

# Wrong way: the daemon's PrivateTmp namespace makes this invisible
# client.bulk_import_file("/tmp/com-friendster.ungraph.txt")  # returns 0/0 in 1s
```
If you need a different dataset path (e.g. a large NVMe mount at /data/), add it to the systemd unit's ReadWritePaths= via a drop-in:
```ini
[Service]
ReadWritePaths=/data
```
Then systemctl daemon-reload && systemctl restart xraygraphdb, and the daemon can read/write under /data/.
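Because a bad path fails silently (0/0 in about a second), a client-side pre-flight check is cheap insurance. A sketch, assuming you keep ALLOWED_ROOTS in sync with the unit's ReadWritePaths= (the helper name is hypothetical). PrivateTmp=true gives the daemon private copies of both /tmp and /var/tmp; the check needs Python 3.9+ for is_relative_to:

```python
import posixpath
from pathlib import PurePosixPath
from typing import Optional, Tuple

PRIVATE_TMP_DIRS = (PurePosixPath("/tmp"), PurePosixPath("/var/tmp"))
# Keep in sync with the unit's ReadWritePaths= (add PurePosixPath("/data") etc.).
ALLOWED_ROOTS = (PurePosixPath("/var/lib/xraygraphdb/import"),)


def import_path_problem(path: str,
                        allowed_roots: Tuple[PurePosixPath, ...] = ALLOWED_ROOTS) -> Optional[str]:
    """Return a reason the daemon can't read `path`, or None if it looks fine."""
    p = PurePosixPath(posixpath.normpath(path))  # collapse ".." segments first
    if not p.is_absolute():
        return "use an absolute path"
    for tmp in PRIVATE_TMP_DIRS:
        if p.is_relative_to(tmp):
            return f"{tmp} is private to the daemon (PrivateTmp=true); move the file"
    if not any(p.is_relative_to(root) for root in allowed_roots):
        return "not under an allowed root; add a ReadWritePaths= drop-in"
    return None
```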
If the host is going to join an existing cluster as a replica, skip the First Boot admin bootstrap and use xraygraphdb-cluster-join instead (see Cluster Setup: 3-Node).
Cluster Setup: 3-Node
xrayGraphDB clusters are MAIN + N replicas. The MAIN owns the auth store, all databases, and accepts writes. Replicas are read-only mirrors that receive WAL pushes from the MAIN; they do not have an independent identity.
Step 1 — Set up MAIN
Pick the server that will be MAIN, install xrayGraphDB normally (Docker or .deb), and bootstrap its admin user. Whatever admin password you set on MAIN becomes the cluster-wide admin password — replicas will receive it on first sync.
Add a drop-in to enable primary mode and pre-list the replicas (you may also use REGISTER REPLICA at runtime):
```ini
[Service]
ExecStart=
ExecStart=/usr/lib/xraygraphdb/xraygraphdb \
  --data-directory=/var/lib/xraygraphdb \
  --bolt-port=7687 \
  --xray-port=7689 \
  --storage-engine=mmap \
  --replication-mode=primary \
  --replication-replicas=replica1.example.com:7690,replica2.example.com:7690 \
  --license-acknowledge-saved=true \
  --bolt_listen_mode=off
```
```bash
sudo systemctl daemon-reload
sudo systemctl restart xraygraphdb

# Verify primary mode
echo "SHOW REPLICATION ROLE;" | xgconsole --auth=admin:YOUR_PW
```
Step 2 — Join each replica
On each replica host (after a fresh xrayGraphDB install — do not run xraygraphdb-bootstrap-admin on replicas):
```bash
sudo xraygraphdb-cluster-join \
  --main=main.example.com:7690 \
  --replica-port=7690 \
  --accept-wipe
```
The helper stops the local daemon, wipes /var/lib/xraygraphdb/{databases,auth,plugin_licenses,settings,replication,snapshots,wal}, writes a replica drop-in, and restarts. Without --accept-wipe it refuses; the daemon itself also refuses to start in replica mode against non-empty data unless that flag is passed (defense-in-depth in case someone hand-edits systemd).
Add --keep-snapshot-backup to mv the data directory to a timestamped .bak. path instead of deleting — useful if you want a one-restart undo.
Step 3 — Register on MAIN (if not pre-listed)
If you did not bake replica addresses into MAIN's --replication-replicas, register them at runtime:
```cypher
REGISTER REPLICA replica1 SYNC TO "replica1.example.com:7690";
REGISTER REPLICA replica2 ASYNC TO "replica2.example.com:7690";
SHOW REPLICAS;
// Both data_info and system_info should populate within a few seconds.
```
Pick SYNC for replicas where you need durable acknowledgement of every write before MAIN commits, and ASYNC for replicas where you tolerate lag but don't want them to slow MAIN down.
Step 4 — Verify replication
```cypher
// On MAIN: write a probe row
CREATE (:ReplProbe {ts: timestamp(), host: "main"});

// On each REPLICA: read it back (read-only)
MATCH (n:ReplProbe) RETURN n.ts, n.host;
// Should return the row written on MAIN; latency should be under a second on healthy networks.
```
If a replica is empty or stuck, check SHOW REPLICAS on MAIN: a healthy replica reports a populated data_info object and an increasing timestamp. A failed sync reports status: "invalid" — check both hosts' journals (journalctl -u xraygraphdb -n 200) and confirm the replica's port 7690 is reachable from MAIN.
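To automate that check, compare two SHOW REPLICAS snapshots taken a few seconds apart while MAIN is taking writes. A sketch that assumes each row has already been converted to a dict with name, status, and data_info (holding a timestamp); the exact column layout and the non-"invalid" status values are assumptions:

```python
from typing import Dict, List, Mapping, Optional


def _timestamp(row: Mapping) -> Optional[int]:
    return (row.get("data_info") or {}).get("timestamp")


def replica_problems(before: List[Mapping], after: List[Mapping]) -> Dict[str, str]:
    """Return {replica name: problem} for replicas that look unhealthy."""
    previous = {row["name"]: _timestamp(row) for row in before}
    problems: Dict[str, str] = {}
    for row in after:
        name, ts = row["name"], _timestamp(row)
        if row.get("status") == "invalid":
            problems[name] = "status invalid: check both journals and port 7690 reachability"
        elif ts is None:
            problems[name] = "data_info empty: replica has not synced yet"
        elif previous.get(name) is not None and ts <= previous[name]:
            # Only meaningful if MAIN committed writes between the two snapshots.
            problems[name] = "timestamp not advancing"
    return problems
```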
Outage scenarios — what happens, what you do
The cluster has explicit, predictable behavior for every node-down case. Memorize this table.
| Scenario | Behavior today (v4.x) | Operator action |
|---|---|---|
| MAIN dies (no coordinators) | Replicas keep serving READS. Writes are refused with a clear error pointing at MAIN's address. No automatic promotion. | Pick a replica, run SET REPLICATION ROLE TO MAIN on it. Re-point the other replicas at the new MAIN with SET REPLICATION ROLE TO REPLICA WITH PORT 7690 ... --replication-primary=<new-main>:7690 (drop-in edit + restart). Cluster identity is preserved — users, passwords, and data continue from the survivor's last state. |
| MAIN dies (coordinator quorum, enterprise) | Coordinators detect missed heartbeats; majority quorum auto-promotes a replica. Promotion is committed via Raft so split-brain is impossible. Detection + promotion typically takes a few seconds. | None. Inspect SHOW INSTANCE on the new MAIN to confirm. |
| Replica down for minutes | For SYNC replicas: MAIN's writes block until the SYNC timeout (configurable, default 1s), then MAIN demotes that replica to ASYNC and continues. ASYNC replicas just lag. | None — the replica catches up via WAL when it returns. |
| Replica down for hours / days | Same as above. When the replica returns, it requests catch-up from MAIN. If MAIN's WAL still covers the gap, sync resumes seamlessly. | None. Confirm with SHOW REPLICAS that data_info.timestamp climbs. |
| Replica down for weeks / months | MAIN's WAL has rotated past the replica's last applied position. Catch-up is impossible without a re-seed. | Run DROP REPLICA <name> on MAIN, then on the returning host run xraygraphdb-cluster-join --main=<main>:7690 --accept-wipe. The replica re-bootstraps from MAIN's current snapshot. |
| Operator wants to dissolve the cluster | n/a (manual today; xraygraphdb-cluster-leave helper coming in v4.9.5). | On MAIN: DROP REPLICA <name> for each replica. On each replica: systemctl stop xraygraphdb && rm /etc/systemd/system/xraygraphdb.service.d/20-replica.conf && systemctl daemon-reload && systemctl start xraygraphdb. Each ex-replica becomes standalone with its own copy of the data (last sync point) plus the cluster's admin user. |
| Network partition | SYNC replicas time out and demote to ASYNC. ASYNC replicas continue lagging until the partition heals. MAIN never auto-fails-over without coordinators (no split-brain risk). | Verify with SHOW REPLICAS when the partition heals; replicas catch up automatically. |
SYNC vs ASYNC — pick consciously
- SYNC: MAIN waits for replica to acknowledge each commit before responding to the client. Stronger durability, lower throughput, MAIN can stall on a slow / down replica (until SYNC timeout demotes it).
- ASYNC: MAIN commits locally, ships WAL to replica in the background. Faster, but a MAIN crash before the WAL ships means the last few transactions are lost on failover.
- Mixed cluster: common pattern is one SYNC replica in the same DC for durability + N ASYNC replicas geographically distributed for read fan-out.
Replicas are read-only — bench / write workload implications
Every replica refuses any query that touches the write path: CREATE, MERGE, SET, REMOVE, DELETE, bulk inserts, index/constraint creation, and analytics that materialize results back into properties (e.g. CALL xg.pagerank(...) YIELD ... SET p.rank = ...). The error is unambiguous: "Write transaction refused: this node is in REPLICA mode".
If you're running a bench suite that mixes write-heavy and read-heavy workloads, route writes to MAIN and pure-read benchmarks to a replica — or, if you want each host to be independently writable for like-for-like comparison, do not cluster them at all. Replicas only make sense when you want a read-scaled mirror of MAIN, not when you want N independent writable boxes.
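A bench harness can apply that split with a simple router. A sketch: the keyword scan is a heuristic over the write clauses listed above, not a Cypher parser, and route is an illustrative helper. A false positive (say, a string literal containing "set") only sends a read to MAIN, which is safe; procedures that write internally are not detected, so route those explicitly:

```python
import re

# Write-path clauses from the list above; DROP covers index/constraint removal.
WRITE_KEYWORDS = re.compile(r"\b(CREATE|MERGE|SET|REMOVE|DELETE|DROP)\b", re.IGNORECASE)


def route(query: str, main: str, replica: str) -> str:
    """Send anything that may write to MAIN; pure reads may go to a replica."""
    return main if WRITE_KEYWORDS.search(query) else replica
```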
Recovery & Diagnostics
xrayGraphDB v4.9.5+ ships a recovery toolchain so operators can self-serve diagnosis and rollback when storage corruption, interrupted bulk imports, or crash loops happen, without having to SSH in and grep journalctl. Five tools, three layers (prevention · detection · action); every destructive command is gated behind explicit acknowledgement and supports --dry-run.
The five tools at a glance
| Command | What it does | Modifies data? |
|---|---|---|
| xraygraphdb-doctor | Read-only diagnostic. Walks /var/lib/xraygraphdb/, systemd state, the crash-recovery hint, mmap sizes vs meta.json, WAL, snapshots, ports, disk. Prints a structured verdict + remediation. Exit 0..4 by severity. JSON mode (--json) for monitoring. | No |
| xraygraphdb-recover auto | Reads the .crash-recovery-hint.json the daemon wrote before std::terminate, dispatches the recommended action. | Yes |
| xraygraphdb-recover quarantine | Renames graph/, databases/, the buildlock, and the hint to .corrupt-*-<ts>; restarts daemon empty. Auth + licenses + plugins + replication preserved. | Yes (rename, recoverable) |
| xraygraphdb-recover from-wal | Backs up current graph/, drops a one-time --data-recovery-on-startup flag, restarts the daemon to replay WAL past the last clean checkpoint, removes the drop-in. | Yes |
| xraygraphdb-recover from-snapshot | Restores from a snapshot dir (auto-picks newest, or pass --path=). Quarantines current state first; rolls back automatically on failure. | Yes |
| xraygraphdb-recover list-quarantines | Read-only inventory of .corrupt-* and .recovery-backup-* dirs with sizes + ages. | No |
| xraygraphdb-recover cleanup-quarantines | Deletes quarantine + backup dirs older than N days (default 30; --older-than-days=0 for "all"). Always confirms first. | Yes |
| xraygraphdb-watchdog | Opt-in auto-recovery via systemd ExecStartPre. After 3+ consecutive crashes and a hint file present, it runs recover auto -y and resets the failure counter. Always exits 0 so a watchdog hiccup never blocks startup. | Yes (when triggered) |
Layer 1 — Prevention: snapshot-on-bulk-import
Every bulk_import_file call now wraps the underlying CSR build in a rollback envelope. Before the build runs, if a previous CSR build exists for the tenant, the daemon atomically renames it to .csr-pre-import-<name>-<timestamp>/. If anything goes wrong, whether an exception, an interrupted process, or a build that produces zero vertices against a non-empty input file (the classic PrivateTmp footgun where the daemon can't see the host's /tmp/), the daemon removes the partial new build and renames the snapshot back into place. Rename is atomic on POSIX when source and destination are on the same filesystem, so there is no window where both directories are partially populated.
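A minimal Python sketch of the rollback envelope, useful for reasoning about its failure paths. It is not the daemon's implementation. `import_with_rollback` and the `build_fn` callable are hypothetical; `build_fn` writes a fresh build and returns its vertex count. The sketch also assumes the snapshot sits next to the CSR directory. A SIGKILL cannot be caught in-process, so a real implementation also needs a restart-time check.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Callable, Optional


def import_with_rollback(csr_dir: Path, name: str,
                         build_fn: Callable[[Path], int],
                         input_nonempty: bool) -> Optional[Path]:
    """Run build_fn(csr_dir) inside a snapshot/restore envelope.

    build_fn (hypothetical) writes a fresh build into csr_dir and returns the
    number of vertices it produced. Returns the retained snapshot path, or
    None if there was no previous build to snapshot.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    snapshot = csr_dir.parent / f".csr-pre-import-{name}-{stamp}"
    had_previous = csr_dir.exists()
    if had_previous:
        os.rename(csr_dir, snapshot)  # atomic only within one filesystem
    try:
        vertices = build_fn(csr_dir)
        if vertices == 0 and input_nonempty:
            # The PrivateTmp symptom: non-empty input, empty build.
            raise RuntimeError("build produced zero vertices from a non-empty input")
    except BaseException:
        # Covers ordinary exceptions and KeyboardInterrupt alike:
        # drop the partial build, then move the snapshot back into place.
        shutil.rmtree(csr_dir, ignore_errors=True)
        if had_previous:
            os.rename(snapshot, csr_dir)
        raise
    return snapshot if had_previous else None
```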
On success, the snapshot lingers on disk so an operator can review and either restore from it (if the new build proved unsatisfactory) or delete it (xraygraphdb-recover cleanup-quarantines --older-than-days=7).
Layer 1.5 — Cooperative cancellation of async imports
An async import on Friendster-scale data (3.6 B edges, ~15 GB CSR payload) runs 5–60 minutes. If you realize mid-flight that you sent the wrong file, the wrong directedness flag, or the wrong tenant, you do not need to systemctl restart xraygraphdb, which would kill every other tenant's open connections as collateral. Instead, the daemon exposes four cooperative surfaces (three ways to cancel plus observability counters); pick whichever matches your tooling. Full reference at docs/bulk-import-cancel.md.
import_id is visible only to the tenant that started it. Cross-tenant lookups return found=false, indistinguishable from “unknown id” (audit #6890 — no enumeration oracle).
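The scoping rule can be modelled in a few lines. This sketch is illustrative only: `ImportJob` and `lookup_import` are hypothetical names, not part of any xrayGraphDB SDK. The point is that both failure cases return the identical reply.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ImportJob:
    tenant_id: str
    phase: str


def lookup_import(jobs: Dict[int, ImportJob], import_id: int, caller_tenant: str) -> dict:
    """Return progress for import_id, scoped to the caller's tenant."""
    job = jobs.get(import_id)
    # An unknown id and another tenant's id produce the identical reply, so a
    # caller cannot probe which ids exist (no enumeration oracle).
    if job is None or job.tenant_id != caller_tenant:
        return {"found": False}
    return {"found": True, "phase": job.phase}
```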
Surface 1 — xrayProtocol native (Python bench client, C++ stress client)
Wire opcode BULK_IMPORT_FILE_CANCEL (0x31) — one frame round-trip. Reply carries {import_id, found, cancellable, phase}. Phase values: PENDING (0), RUNNING (1), DONE (2), ERROR (3), CANCELLED (4). Always-in-sync byte-level spec at CALL xg.protocol_messages() YIELD opcode, body WHERE opcode = '0x31'.
import time

# Bench team: 30-second idle TTL so an abandoned tmux session releases the worker slot.
import_id = client.bulk_import_file_async(
    "/var/lib/xraygraphdb/import/friendster.ungraph.txt",
    cancel_idle_timeout_ms=30000)

# Wrong file. Abort.
result = client.bulk_import_file_cancel(import_id)
assert result["found"] and result["cancellable"]

# Poll until a terminal phase (CANCELLED typically arrives in <5s, bounded by the CSR phase boundary).
while True:
    p = client.bulk_import_file_progress(import_id)
    if p["phase_name"] in ("DONE", "ERROR", "CANCELLED"):
        break
    time.sleep(0.5)
Surface 2 — Cypher procedures (Bolt, dbeaver, neo4j-driver, raw EXECUTE)
For tools that don't speak xrayProtocol natively, the bridge exposes three procs in the xg.* community-tier namespace. Same tenant scoping. Same phase semantics.
-- "Which id was that again?"
CALL xg.imports_list()
YIELD import_id, phase, bytes_total, started_unix_ms
WHERE phase = 'RUNNING'
RETURN import_id, phase, bytes_total
ORDER BY started_unix_ms DESC;

-- Abort the wrong-file import.
CALL xg.import_cancel(123)
YIELD found, cancellable, phase
RETURN found, cancellable, phase;

-- Confirm it landed.
CALL xg.import_progress(123)
YIELD phase, error
RETURN phase, error;
Discover the full proc surface via the always-in-sync catalog: CALL xg.builtin_functions() YIELD name, signature, description, category WHERE category = 'Async Import' RETURN *.
Surface 3 — Cancel-on-idle TTL (opt-in)
Pass an optional trailing u32 cancel_idle_timeout_ms in the BULK_IMPORT_FILE_ASYNC body (0x2E) to have the server auto-cancel the import if no PROGRESS poll arrives within the budget. The field is absent by default, so older clients keep the documented “survives client disconnect” behaviour exactly. A per-job idle-watcher thread checks at a min(timeout/4, 5000) ms cadence (clamped to ≥ 50 ms) and flips the cancel flag once the budget is exhausted. Worst-case time-to-cancel on a 30 s budget against Friendster-scale data: ~35 s.
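The cadence rule and the ~35 s figure can be reproduced with a little arithmetic. This sketch assumes the worst case is the budget expiring just after a watcher check, so the expiry is noticed one full cadence later. It excludes the extra time the import needs to reach its next cooperative cancellation point. The function names are hypothetical.

```python
def watcher_cadence_ms(timeout_ms: int) -> int:
    """Idle-watcher check interval: min(timeout/4, 5000) ms, clamped to >= 50 ms."""
    return max(50, min(timeout_ms // 4, 5000))


def worst_case_flag_ms(timeout_ms: int) -> int:
    """Latest point at which the cancel flag flips after the last PROGRESS poll.

    Worst case: the budget expires just after a watcher check, so the next
    check (one cadence later) is the first to notice it.
    """
    return timeout_ms + watcher_cadence_ms(timeout_ms)


if __name__ == "__main__":
    for budget in (200, 8_000, 30_000, 120_000):
        print(budget, watcher_cadence_ms(budget), worst_case_flag_ms(budget))
```

Because of the 5 s ceiling, the flag flips at most about 5 s after the budget runs out, however long the budget is.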
Surface 4 — Observability (Prometheus / Grafana)
Four counters under /metrics, one for started imports and one per terminal phase, give the in-flight invariant:
BulkImportFileStarted - (BulkImportFileCompleted + BulkImportFileFailed + BulkImportFileCancelled) = in-flight count
Suggested Grafana panel for “dataset/flag mismatch upstream of the daemon”: rate(BulkImportFileCancelled[5m]) / rate(BulkImportFileStarted[5m]) — sustained > 10% means the bench pipeline is feeding wrong files, not that the database is broken.
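If you want the in-flight count outside Grafana, for example in a runbook script, you can compute it from a raw scrape. This sketch assumes the counters appear in Prometheus text format under exactly the names above, without labels or a `_total` suffix. Check your /metrics output before relying on it. Unlike PromQL `rate()`, the two-scrape ratio here does not handle counter resets across a daemon restart.

```python
def parse_counters(text: str) -> dict:
    """Parse un-labelled samples from Prometheus text exposition format."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        # "<name> <value> [timestamp]"; labelled series are skipped here.
        if len(parts) >= 2 and "{" not in parts[0]:
            values[parts[0]] = float(parts[1])
    return values


def in_flight(c: dict) -> float:
    return c["BulkImportFileStarted"] - (
        c["BulkImportFileCompleted"] + c["BulkImportFileFailed"] + c["BulkImportFileCancelled"])


def cancel_ratio(before: dict, after: dict) -> float:
    """Cancelled / started over the interval between two scrapes."""
    started = after["BulkImportFileStarted"] - before["BulkImportFileStarted"]
    cancelled = after["BulkImportFileCancelled"] - before["BulkImportFileCancelled"]
    return cancelled / started if started > 0 else 0.0
```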
Layer 2 — Detection: .crash-recovery-hint.json
Audit-#64 mmap-size traps (the most common storage-corruption crash class) now write a structured hint file before the daemon's std::terminate fires. Operators reading journalctl still see the C++ backtrace, but they don't need to parse it — /var/lib/xraygraphdb/.crash-recovery-hint.json contains everything xraygraphdb-doctor and xraygraphdb-recover auto need to dispatch the right action.
{
"trap": "audit_64_mmap_size",
"file": "/var/lib/xraygraphdb/graph/vertices.mmap",
"expected_size_meta_json": 203630336,
"actual_size_disk": 268435456,
"requested_size": 6341138644081852544,
"max_virtual_size": 1099511627776,
"verdict": "header_corrupt",
"recommended_action": "xraygraphdb-recover --rebuild-vertices-from-wal",
"timestamp": "2026-05-06T14:21:31Z"
}
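A monitoring script can pull a few quick facts out of the hint before an operator runs anything. This is a sketch, not what xraygraphdb-doctor does internally. The field names come from the sample above. The two derived checks are illustrative: on-disk size against meta.json, and requested size against the virtual-size cap. In the sample, a requested size of about 6.3 × 10^18 bytes against a 1 TiB cap is itself a strong sign of a corrupt header.

```python
import json


def summarize_hint(raw: str) -> dict:
    """Derive a few facts from a .crash-recovery-hint.json payload."""
    hint = json.loads(raw)
    summary = {
        "trap": hint.get("trap"),
        "file": hint.get("file"),
        "verdict": hint.get("verdict"),
        "recommended_action": hint.get("recommended_action"),
    }
    if hint.get("trap") == "audit_64_mmap_size":
        summary["size_mismatch"] = hint["actual_size_disk"] != hint["expected_size_meta_json"]
        # A request beyond the virtual-address cap suggests the size field
        # itself was read from a corrupt header.
        summary["request_exceeds_max"] = hint["requested_size"] > hint["max_virtual_size"]
    return summary
```

On a live host, pass it `Path("/var/lib/xraygraphdb/.crash-recovery-hint.json").read_text()`.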
Layer 3 — Action: the recover commands
Anything weird? Always start here:
sudo xraygraphdb-doctor
It will tell you what's wrong — daemon state, mmap consistency, auth store, WAL, snapshots, disk space — and end with one of:
- Verdict: Healthy (exit 0)
- Verdict: Daemon crashed and emitted a structured recovery hint (exit 2) + numbered next steps
- Verdict: Daemon up but auth store is empty (exit 1); run xraygraphdb-bootstrap-admin
- Verdict: Graph mmap state is inconsistent with meta.json (exit 2)
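For cron or monitoring integration, the exit code alone is enough to alert on. The sketch below maps only the exit codes documented above. The doctor exits 0..4, but codes 3 and 4 are not described here. The sketch makes no assumption about the `--json` report's schema. `describe_exit_code` and `run_doctor` are hypothetical helper names.

```python
import json
import subprocess

# Verdicts and exit codes as documented above; other codes are reported generically.
DOCUMENTED_VERDICTS = {
    0: "Healthy",
    1: "Daemon up but auth store is empty",
    2: "Crash hint present or graph mmap state inconsistent with meta.json",
}


def describe_exit_code(code: int) -> str:
    return DOCUMENTED_VERDICTS.get(code, f"severity {code}: see doctor output")


def run_doctor(argv=("sudo", "xraygraphdb-doctor", "--json")):
    """Run the doctor and return (exit_code, parsed JSON report or None)."""
    proc = subprocess.run(list(argv), capture_output=True, text=True, check=False)
    try:
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        report = None
    return proc.returncode, report
```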
Doctor said "structured recovery hint"? Just dispatch:
sudo xraygraphdb-recover auto --I-understand-this-modifies-data
This reads the hint and runs exactly the recommended action — quarantine, from-wal, or from-snapshot. If you want to pick one explicitly:
sudo xraygraphdb-recover quarantine --I-understand-this-modifies-data
sudo xraygraphdb-recover from-wal --I-understand-this-modifies-data
sudo xraygraphdb-recover from-snapshot --path=.csr-pre-import-csr___system__-20260506-141500 --I-understand-this-modifies-data
Every command takes --dry-run (prints every step without executing) and -y/--yes (skips the interactive confirm, for CI). Destructive commands refuse to run in non-interactive shells without --yes.
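A CI job can make the dry-run habit mandatory by always printing the plan before acting. This is a hypothetical wrapper, not a shipped tool. On the real run it passes `--I-understand-this-modifies-data` together with `--yes`, matching the examples above.

```python
import subprocess


def recover_argv(subcommand: str, *extra: str, dry_run: bool) -> list:
    """Build an xraygraphdb-recover command line (hypothetical helper)."""
    argv = ["sudo", "xraygraphdb-recover", subcommand, *extra]
    if dry_run:
        argv.append("--dry-run")
    else:
        argv += ["--I-understand-this-modifies-data", "--yes"]
    return argv


def recover_with_preview(subcommand: str, *extra: str, apply: bool = False) -> None:
    """Always print the plan first; execute for real only when apply=True."""
    subprocess.run(recover_argv(subcommand, *extra, dry_run=True), check=True)
    if apply:
        subprocess.run(recover_argv(subcommand, *extra, dry_run=False), check=True)
```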
What got quarantined — and how to free that disk space later:
sudo xraygraphdb-recover list-quarantines
sudo xraygraphdb-recover cleanup-quarantines --older-than-days=30 --I-understand-this-modifies-data
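To preview what a given --older-than-days value would delete without running anything destructive, a read-only scan is enough. This sketch assumes the quarantine and backup directories live directly under the data directory. It measures age from the directory's mtime; the real tool may parse the timestamp embedded in the name instead. `stale_quarantines` is a hypothetical helper.

```python
import os
import time
from pathlib import Path

PREFIXES = (".corrupt-", ".recovery-backup-")


def dir_size(path: Path) -> int:
    """Total size of regular files under path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def stale_quarantines(data_dir: Path, older_than_days: float, now=None):
    """Yield (path, size_bytes, age_days, would_delete); never modifies anything."""
    now = time.time() if now is None else now
    for entry in sorted(data_dir.iterdir()):
        if entry.is_dir() and entry.name.startswith(PREFIXES):
            age_days = (now - entry.stat().st_mtime) / 86400
            # older_than_days=0 selects everything, matching the "all" semantics.
            yield entry, dir_size(entry), age_days, age_days >= older_than_days
```

For example, `for path, size, age, hit in stale_quarantines(Path("/var/lib/xraygraphdb"), 30): print(path, size, round(age, 1), hit)`.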
Opt-in auto-recovery (production)
For unattended deployments where 2am-on-Tuesday means nobody is watching, install the watchdog drop-in. Off by default; enabled by file presence:
sudo mkdir -p /etc/systemd/system/xraygraphdb.service.d
sudo cp /usr/share/xraygraphdb/40-auto-recover.conf.example \
  /etc/systemd/system/xraygraphdb.service.d/40-auto-recover.conf
sudo systemctl daemon-reload
That drop-in adds ExecStartPre=+-/usr/bin/xraygraphdb-watchdog --threshold=3 to the daemon unit. The watchdog reads systemctl show -p NRestarts; when it's ≥ 3 and a hint file is present, it runs xraygraphdb-recover auto --yes and resets the failure counter on success. Logs every event to syslog as xraygraphdb-watchdog for monitoring.
The watchdog always exits 0 so a watchdog hiccup never blocks the daemon's normal startup. It refuses to act when no hint file is present (a config error or port conflict isn't recoverable by the recover tool).
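The watchdog's decision rule, as described above, fits in a few lines. This is an illustrative Python sketch, not the shipped xraygraphdb-watchdog. It assumes the unit is named xraygraphdb and uses the hint path from Layer 2. It prints instead of logging to syslog, and it leaves out the failure-counter reset.

```python
import subprocess
from pathlib import Path

HINT = Path("/var/lib/xraygraphdb/.crash-recovery-hint.json")


def parse_nrestarts(show_output: str) -> int:
    """Parse `systemctl show -p NRestarts <unit>` output such as 'NRestarts=3'."""
    key, _, value = show_output.strip().partition("=")
    if key != "NRestarts":
        raise ValueError(f"unexpected systemctl output: {show_output!r}")
    return int(value)


def should_recover(nrestarts: int, hint_present: bool, threshold: int = 3) -> bool:
    # Without a hint file the crash is not one the recover tool can fix
    # (config error, port conflict), so the watchdog stays out of the way.
    return hint_present and nrestarts >= threshold


def main(threshold: int = 3) -> int:
    try:
        out = subprocess.run(
            ["systemctl", "show", "-p", "NRestarts", "xraygraphdb"],
            capture_output=True, text=True, check=True).stdout
        if should_recover(parse_nrestarts(out), HINT.exists(), threshold):
            subprocess.run(["xraygraphdb-recover", "auto", "--yes"], check=True)
    except Exception as exc:  # any failure is reported, never propagated
        print(f"xraygraphdb-watchdog: {exc}")
    return 0  # always 0 so the daemon's startup is never blocked
```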
Operator quick-reference card
Print this once and tape it to the rack:
| Symptom | First command |
|---|---|
| Anything unexpected at all | sudo xraygraphdb-doctor |
| Doctor verdict mentions a hint file | sudo xraygraphdb-recover auto --I-understand-this-modifies-data |
| Daemon won't start, no hint file | journalctl -u xraygraphdb -n 200 --no-pager |
| Bulk import returns 0/0 in <1s | Move the dataset out of /tmp/; the daemon's PrivateTmp=true hides it |
| Daemon up but admin login fails | sudo xraygraphdb-bootstrap-admin |
Disk filling with .corrupt-* dirs | sudo xraygraphdb-recover cleanup-quarantines --older-than-days=7 -y |
| Hours-into-the-incident, want a sitrep for monitoring | sudo xraygraphdb-doctor --json |
Where the docs live on a server
The same content (this section + a quickstart cheatsheet) ships with the .deb at /usr/share/xraygraphdb/RECOVERY.md. Read it offline with cat /usr/share/xraygraphdb/RECOVERY.md when the daemon is down and you can't reach this site.
Key Rotation (v5 always-encrypted)
xrayGraphDB is always-encrypted as of v5 — there is no plaintext mode, no EncryptionMode::DISABLED, no flag to opt out. Encrypted storage uses a two-level key hierarchy: a cluster-wide KEK (root-of-trust, stored at $data_dir/.storage_epoch, wrapped by the configured KMS provider) and per-tenant DEKs (one current + N historical, stored in $data_dir/.tenant_keys/<tenant_id>.bin). Each rotation appends a key_rotated event to the SecurityLog (audit #7682), and the xg-security-verify CLI verifies the SHA-256 chain end-to-end. Full operator runbook at docs/key-rotation.md.
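If you are unfamiliar with hash-chained logs, here is the general technique that makes a log tamper-evident. The on-disk format of security.log is not documented here. The row layout, the all-zero genesis hash, and the helper names below are therefore hypothetical; this is not xg-security-verify's implementation.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first row


def row_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so an unchanged payload always hashes identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": row_hash(prev, payload)})


def first_broken_row(chain: list) -> int:
    """Index of the first row that fails verification, or -1 if intact."""
    prev = GENESIS
    for i, row in enumerate(chain):
        if row["prev"] != prev or row["hash"] != row_hash(prev, row["payload"]):
            return i
        prev = row["hash"]
    return -1
```

A chain on its own detects edits, but not a full rewrite from the tampered row onward. Catching that requires anchoring the latest hash somewhere outside the log.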
Two Cypher procs, audit-trail-bound
Both procs are tenant-scoped server-side, and both append a tamper-evident row to $data_dir/security/security.log:
CALL xg.tenant_key_rotate_dek('acme-corp')
YIELD tenant_id, ok, old_version, new_version, error_code
RETURN *;
——
→ tenant_id | ok | old_version | new_version | error_code
→ acme-corp | true | 7 | 8 | 0
Increments current_write_version. New writes for this tenant use the new DEK; existing records stay decryptable via prior versions in the keyring. Typically <50 ms on .187-class hardware regardless of tenant size (it's a keyring-file update, not a record re-encryption pass).
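DEK rotation is cheap because it only adds a key and bumps a version number, as the toy keyring below illustrates. `TenantKeyring` is hypothetical. It holds raw keys in memory, whereas the real DEKs are wrapped by the KEK on disk, and it performs no encryption itself.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TenantKeyring:
    """Toy per-tenant keyring: version -> 256-bit DEK (hypothetical layout)."""
    current_write_version: int = 1
    deks: Dict[int, bytes] = field(default_factory=lambda: {1: secrets.token_bytes(32)})

    def rotate_dek(self) -> Tuple[int, int]:
        old = self.current_write_version
        new = old + 1
        self.deks[new] = secrets.token_bytes(32)  # only the keyring changes
        self.current_write_version = new
        return old, new

    def key_for_write(self) -> bytes:
        return self.deks[self.current_write_version]

    def key_for_read(self, record_version: int) -> bytes:
        # Each record remembers the DEK version it was written with.
        return self.deks[record_version]
```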
CALL xg.tenant_key_rotate_kek('acme-corp', 'arn:aws:kms:us-east-1:…')
YIELD tenant_id, new_kek_ref, ok, error_code
RETURN *;
Unwraps each historical DEK with the previous KEK reference and re-wraps it under new_kek_ref. Record bytes on disk are unchanged. This is a per-tenant operation; for cluster-wide rotation, iterate over your tenant list (the runbook ships a reference shell script).
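KEK rotation follows the standard envelope-encryption pattern: unwrap each DEK with the old KEK, wrap it with the new one, and leave record ciphertext alone. In this sketch, `rewrap_keyring` is hypothetical. `unwrap_old` and `wrap_new` are placeholders for KMS calls. A test can drive the control flow with simple byte-tagging lambdas, but those are stand-ins, not encryption.

```python
from typing import Callable, Dict


def rewrap_keyring(wrapped_deks: Dict[int, bytes],
                   unwrap_old: Callable[[bytes], bytes],
                   wrap_new: Callable[[bytes], bytes]) -> Dict[int, bytes]:
    """Re-wrap every historical DEK under a new KEK.

    Only the small wrapped-DEK blobs change; record ciphertext is never read
    or rewritten. The new map is built in full before it is returned, so a
    KMS failure midway leaves the caller's original keyring untouched.
    """
    rewrapped = {}
    for version, blob in wrapped_deks.items():
        dek = unwrap_old(blob)
        rewrapped[version] = wrap_new(dek)
    return rewrapped
```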
When to rotate
| Trigger | Cadence | Mandatory? |
|---|---|---|
| Routine hygiene | DEK 90 days, KEK 365 days | No |
| Suspected key compromise | Immediate | Yes |
| Admin offboarding with prior key access | <24 h | Yes |
| SOC2 / FedRAMP / HIPAA compliance window | Per audit cycle | Yes if your program requires |
| Post-restore from backup taken before a security event | Before resuming traffic | Yes |
| Hardware key store migration | At cutover | Yes |
The daemon does not rotate automatically — cadence is an operator policy decision. The runbook ships a reference systemd.timer for quarterly tenant-DEK rotation.
Federation v1 Design (in progress)
Full design: docs/design/federation-v1-design.md. The operator-facing surface (Cypher procs, write/read mode flags) will land in a future release once Phases A–E ship.
TL;DR
Raft leader-follower. Writes go to the elected leader. Default commit policy is semi-sync (leader + one follower ack). Quorum mode for strict durability. Async mode for bulk ingest / benchmarks / dev. Failover uses Raft heartbeat/election with operator override. Minority partitions become read-only. No active-active writes in v1. No last-writer-wins for graph mutations.
Two replication planes
A core architectural decision — bandwidth and latency profiles differ enough that they get separate channels:
- WAL plane. Nodes / edges / properties / indexes / schema / metadata replicate through the Raft log. Every entry is consensus-committed; followers apply in commit_lsn order.
- Artifact plane. CSR builds, columnar segments, vector indexes, geo indexes, materialized views, analytics results replicate as versioned snapshots keyed by epoch/LSN on a separate channel (HTTP/2 streaming). The Raft log carries a manifest entry referencing the artifact; followers fetch out-of-band. Putting multi-GB CSR builds in the Raft log would crater throughput on every follower.
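The manifest-plus-out-of-band-fetch pattern can be sketched as follows. Everything here is hypothetical: the field names, the `(epoch, lsn)` ordering check, and the `fetch` callable standing in for the HTTP/2 transfer. It only shows why the Raft log stays small: each entry carries a checksum and a version, not the artifact.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ArtifactManifest:
    """Small Raft log entry pointing at a large out-of-band artifact."""
    kind: str          # e.g. "csr", "vector_index"
    epoch: int
    lsn: int
    sha256: str
    size_bytes: int


def apply_manifest(entry: ArtifactManifest,
                   local: Dict[str, Tuple[int, int]],
                   fetch: Callable[[ArtifactManifest], bytes]) -> bool:
    """Fetch and install the artifact unless this version is already present.

    `local` maps artifact kind -> installed (epoch, lsn). Returns True if fetched.
    """
    if local.get(entry.kind, (-1, -1)) >= (entry.epoch, entry.lsn):
        return False  # already at or past this version
    blob = fetch(entry)
    if len(blob) != entry.size_bytes or hashlib.sha256(blob).hexdigest() != entry.sha256:
        raise ValueError(f"artifact {entry.kind}@{entry.epoch}/{entry.lsn} failed verification")
    local[entry.kind] = (entry.epoch, entry.lsn)
    return True
```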
Write modes
| Mode | Semantics | Use case |
|---|---|---|
async | Leader commits locally; followers eventual | Dev, benchmarks, bulk ingest, analytics caches |
semi_sync (default) | Leader + 1 follower ack | Normal enterprise deployments |
quorum | Majority ack before commit | Financial, compliance (DoD / SOC2 / FedRAMP / HIPAA), critical workloads |
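The ack requirements of the three write modes can be written as simple arithmetic. This sketch makes three assumptions. `cluster_size` counts the leader. A quorum is a strict majority of all voters (n // 2 + 1), and the leader's own write counts toward it. The behaviour of semi_sync on a single-node cluster is also assumed.

```python
def follower_acks_required(mode: str, cluster_size: int) -> int:
    """Follower acks the leader waits for before a write counts as committed.

    cluster_size includes the leader.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be >= 1")
    if mode == "async":
        return 0
    if mode == "semi_sync":
        return min(1, cluster_size - 1)  # assumption for single-node clusters
    if mode == "quorum":
        return cluster_size // 2  # (n // 2 + 1) voters minus the leader
    raise ValueError(f"unknown write mode: {mode!r}")
```

On a 3-node cluster, semi_sync and quorum both wait for a single follower. They diverge from 5 nodes up, where quorum needs 2 follower acks.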
Read modes
| Mode | Semantics |
|---|---|
read_local (default) | Stale-allowed; replication_lag_ms visible in BATCH reply header (negotiated CAP_REPL_LAG) |
read_leader | Routed to current Raft leader; strongest consistency |
read_majority_committed | Block until commit_lsn covered by N/2+1 followers |
read_snapshot_at_lsn(LSN) | Time-travel for long-running analytical queries |
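One way to decide whether a read_majority_committed read can be served is the standard Raft commit-index rule: sort every voter's replicated LSN and take the middle one. This sketch counts the leader as a voter; if xrayGraphDB's N/2+1 counts followers only, shift the index accordingly. It also omits Raft's current-term check.

```python
from typing import Sequence


def majority_committed_lsn(match_lsns: Sequence[int]) -> int:
    """Highest LSN replicated on a strict majority of voters (leader included)."""
    if not match_lsns:
        raise ValueError("need at least one voter")
    ordered = sorted(match_lsns, reverse=True)
    # ordered[k] is held by at least k + 1 voters; k = n // 2 gives n // 2 + 1.
    return ordered[len(ordered) // 2]


def can_serve_majority_read(read_lsn: int, match_lsns: Sequence[int]) -> bool:
    return read_lsn <= majority_committed_lsn(match_lsns)
```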
Explicitly out of scope for v1
- Active-active multi-leader. Graph mutation merging on conflicting writes (same-node update on two leaders, same-edge deleted on one side, property-index divergence) is a multi-quarter problem — revisit only after v1 has been in production for a year.
- Cluster-wide ACID transactions. Each write is independently Raft-committed; multi-statement transactions are leader-local. No cross-shard 2PC.
- Read-your-writes session consistency across followers. A client that writes and then reads from a different follower may see stale data. Clients needing read-your-writes use read_leader or pin to one node.
Configuration Flags Reference
xrayGraphDB exposes 183 command-line flags. They are typically set in the ExecStart= line of the systemd unit (/etc/systemd/system/xraygraphdb.service) or loaded en masse via --flag-file=/etc/xraygraphdb/xraygraphdb.conf. Every flag accepts both hyphen and underscore forms (--bolt-port = --bolt_port), and all flags can be set as environment variables using the XRAY_FLAG_NAME convention (uppercase, underscores).
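The naming rules above can be expressed as a small normalizer, which is useful when diffing a systemd unit against a flag file. Two rules here are assumptions. The environment-variable mapping reads XRAY_FLAG_NAME as the prefix XRAY_ plus the upper-cased flag name. The bare-flag-means-true rule mirrors gflags. Confirm both against --help on your build.

```python
from typing import Dict, List


def canonical_flag(name: str) -> str:
    """'--bolt-port' and '--bolt_port' both map to 'bolt_port'."""
    return name.lstrip("-").replace("-", "_")


def env_var_for(name: str) -> str:
    # ASSUMPTION: prefix "XRAY_" + upper-cased, underscore-separated flag name.
    return "XRAY_" + canonical_flag(name).upper()


def parse_flag_args(args: List[str]) -> Dict[str, str]:
    """Parse '--flag=value' tokens (as in ExecStart=) into a canonical dict."""
    flags = {}
    for arg in args:
        if arg.startswith("--"):
            key, sep, value = arg.partition("=")
            flags[canonical_flag(key)] = value if sep else "true"
    return flags
```

Running `shlex.split` on the text after `ExecStart=`, with line continuations removed, yields tokens suitable for `parse_flag_args`.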
Flags marked [prod] below are the ones set in the canonical .187 production unit. A minimal-but-complete ExecStart= looks like this:
ExecStart=/usr/lib/xraygraphdb/xraygraphdb \
  --storage-engine=default \
  --data-directory=/neo4j/xraygraphdb \
  --bolt-port=7687 \
  --storage-properties-on-edges=true \
  --storage-wal-enabled=true \
  --storage-snapshot-on-exit=true \
  --storage-snapshot-interval-sec=300 \
  --data-recovery-on-startup=true \
  --xray-workers=128 \
  --log-level=WARNING \
  --also-log-to-stderr=true \
  --query-execution-timeout-sec=0 \
  --license-acknowledge-saved=true \
  --swim-enabled=true \
  --swim-user=<scds-user> \
  --swim-pass=@/etc/xraygraphdb/swim.pass \
  --swim-queues=<queue-spec>
Without --license-acknowledge-saved=true, the daemon prints the full license JSON to journalctl on every restart so the operator can save it externally. Once you have the license stored in a password manager / offline backup, set this flag to true in the systemd unit to suppress the dump.
Network — Bolt, xrayProtocol, WebSocket
| Flag | Type | Default | Description |
|---|---|---|---|
--bolt-address | string | "0.0.0.0" | IP address the Bolt server binds to. |
--bolt-advertised-address | string | "" | Address advertised in the Bolt ROUTE routing table (host:port). Required behind NAT / proxy / container. |
--bolt-cert-file | string | "" | Path to the TLS certificate for the Bolt server. |
--bolt-debug-log | bool | false | Log Bolt negotiated version, state transitions, and message types. |
--bolt-emergency-allowlist | string | "" | Comma-separated IPv4/IPv6/CIDR entries permitted to connect when --bolt-listen-mode=emergency. |
--bolt-emergency-until | string | "" | ISO-8601 timestamp when the Bolt emergency window auto-closes. |
--bolt-honor-routing-address | bool | false | Use the routing_context address from ROUTE requests instead of the server config. |
--bolt-key-file | string | "" | Path to the TLS private key for the Bolt server. |
--bolt-listen-mode | string | "off" | Bolt listener mode: off (do not bind 7687), emergency (allowlist + window), on (legacy: accept any peer). xrayProtocol on 7689 is the production transport. |
--bolt-max-auth-retries | int32 | 3 | Maximum failed LOGON attempts before closing a Bolt v5.1+ connection. |
--bolt-num-workers | int32 | 128 | Number of Bolt worker threads. Defaults to the host CPU count, so the value shown varies by host. |
--bolt-port [prod] | int32 | 7687 | Port on which the Bolt server listens. |
--bolt-routing-ttl | int32 | 300 | TTL in seconds for the synthetic routing table returned in ROUTE SUCCESS. |
--bolt-server-agent | string | "xrayGraphDB/4.9.1" | Server agent string sent in Bolt HELLO SUCCESS. Set to a Neo4j-shaped string for maximum driver compatibility. |
--bolt-server-name-for-init | string | (version banner) | Server name returned to the client in the Bolt INIT message. |
--bolt-session-timeout | int32 | 0 | Idle Bolt session timeout in seconds; emitted as a connection.recv_timeout_seconds hint. 0 disables. |
--bolt-telemetry-log | bool | false | Log Bolt TELEMETRY API values at INFO level (default is trace-only). |
--bolt-ws-port | int32 | 7688 | Port for the WebSocket-to-Bolt bridge (browser clients, HA cluster traffic). |
--bolt-ws-workers | int32 | 3 | Number of WebSocket bridge IO threads. |
--xray-port | int32 | 7689 | xrayProtocol server port. Set to 0 to disable the xrayProtocol listener. |
--xray-workers [prod] | int32 | 4 | Number of xrayProtocol worker threads. Production deployments size this to the host CPU count. |
--xray-idle-timeout-sec | int32 | 300 | Seconds of inactivity before closing an xrayProtocol connection. |
--xray-max-connections | int32 | 4096 | Maximum concurrent xrayProtocol connections before new connections are rejected. |
--xray-preauth-buffer-kb | int32 | 8 | Per-connection receive buffer size (KiB) before authentication completes. Caps slowloris / OOM exposure. |
--xray-recv-buffer-max-mb | int32 | 128 | Maximum per-connection receive buffer size in MiB. |
--xray-proc-slow-log-threshold-ms | int64 | 1000 | xray.* procedure calls running longer than this emit a structured warning log on completion. 0 disables. |
--websocket-address | string | "127.0.0.1" | Bind address for the WebSocket monitoring server. Set to 0.0.0.0 only behind a firewall. |
--websocket-port | int32 | 0 | Port for the WebSocket monitoring server. 0 means use --monitoring-port. |
--monitoring-address | string | "127.0.0.1" | Bind address for the monitoring WebSocket server. |
--monitoring-port | int32 | 7444 | Port for the monitoring WebSocket server. |
--metrics-address | string | "127.0.0.1" | Bind address for the Prometheus-style metrics endpoint. Set to 0.0.0.0 only behind a firewall / VPN. |
--metrics-port | int32 | 9091 | Port for the metrics endpoint. |
--metrics-auth-token | string | "" | Bearer token required on every /metrics request. When empty, the daemon refuses any non-loopback bind. |
--rpc-peer-allowlist | string | "" | Comma-separated IPv4/IPv6/CIDR list permitted to connect to internal cluster RPC ports. |
--rpc-shared-secret-path | string | "" | Path to the 32-byte cluster shared secret used to HMAC-SHA256 every RPC payload (mode 0600). |
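Two of the network flags above, the allowlists and the RPC shared secret, describe standard mechanisms that Python's standard library can illustrate. This sketch assumes the secret file holds 32 raw bytes and that the HMAC covers the payload bytes exactly; the real framing of the RPC payload is not documented here. How the daemon treats an empty allowlist is also not stated, so the sketch simply denies.

```python
import hashlib
import hmac
import ipaddress


def parse_allowlist(spec: str) -> list:
    """Parse '192.0.2.7, 10.0.0.0/8, 2001:db8::/32' into network objects."""
    nets = []
    for item in spec.split(","):
        item = item.strip()
        if item:
            # A bare address becomes a /32 (IPv4) or /128 (IPv6) network.
            nets.append(ipaddress.ip_network(item, strict=False))
    return nets


def peer_allowed(peer: str, nets: list) -> bool:
    # Mixed-family membership tests simply return False; empty list -> deny.
    addr = ipaddress.ip_address(peer)
    return any(addr in net for net in nets)


def load_secret(path: str) -> bytes:
    with open(path, "rb") as fh:
        secret = fh.read()
    if len(secret) != 32:
        raise ValueError(f"expected 32 bytes in {path}, got {len(secret)}")
    return secret


def sign(secret: bytes, payload: bytes) -> bytes:
    return hmac.new(secret, payload, hashlib.sha256).digest()


def verify(secret: bytes, payload: bytes, tag: bytes) -> bool:
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return hmac.compare_digest(sign(secret, payload), tag)
```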
Storage
| Flag | Type | Default | Description |
|---|---|---|---|
--data-directory [prod] | string | "/var/lib/xraygraphdb" | Directory where all permanent data (snapshots, WAL, auth store) lives. |
--data-recovery-on-startup [prod] | bool | true | Recover persisted data from snapshot + WAL on startup. |
--csr-directory | string | "" | Base directory for CSR edge stores. Defaults to the parent of --data-directory when empty. |
--storage-engine [prod] | string | "memory" | Storage backend: mmap (file-backed), memory (in-RAM SkipList), default/auto (RAM-aware: mmap below 128 GiB, memory at 128 GiB+). |
--storage-mode | string | "IN_MEMORY_TRANSACTIONAL" | Default storage mode at startup: IN_MEMORY_ANALYTICAL, IN_MEMORY_TRANSACTIONAL, ON_DISK_TRANSACTIONAL, MMAP_TRANSACTIONAL. |
--storage-ephemeral | bool | false | Disable WAL, snapshots, and recovery. Data is lost on restart. Use only for dev / test / disposable workloads. |
--storage-properties-on-edges [prod] | bool | true | Allow edges to carry properties. Set false only for pure topology graphs to save memory. |
--storage-wal-enabled [prod] | bool | true | Enable the write-ahead log for crash recovery between snapshots. |
--storage-wal-file-size-kib | uint64 | 20480 | Target file size before a WAL segment rotates (KiB). |
--storage-wal-file-flush-every-n-tx | uint64 | 100000 | fsync the WAL after this many transactions. Set to 1 for fully synchronous durability. |
--storage-snapshot-on-exit [prod] | bool | true | Take a snapshot during clean shutdown. |
--storage-snapshot-interval-sec [prod] | uint64 | 300 | Periodic snapshot interval in seconds. 0 disables periodic snapshots. |
--storage-snapshot-interval | string | "" | Snapshot schedule via cron expression or period in seconds. |
--storage-snapshot-retention-count | uint64 | 3 | How many snapshot files to retain on disk. |
--storage-snapshot-thread-count | uint64 | 128 | Worker threads used to write snapshots when parallel snapshot is on. |
--storage-parallel-snapshot-creation | bool | false | Create snapshots using --storage-snapshot-thread-count threads. |
--storage-recovery-thread-count | uint64 | 128 | Threads used to recover persisted data from disk. |
--storage-parallel-schema-recovery | bool | false | Rebuild indexes and constraints in parallel during recovery. |
--storage-backup-dir-enabled | bool | true | Use the .old directory to retain the previous snapshot and WAL set. |
--storage-gc-cycle-sec | uint64 | 30 | Garbage collector interval in seconds. |
--storage-gc-aggressive | bool | false | Enable aggressive garbage collection. |
--storage-python-gc-cycle-sec | uint64 | 180 | Full Python GC interval in seconds (for Python query modules). |
--storage-access-timeout-sec | uint64 | 1 | Storage-level access timeout for a query in seconds. |
--storage-items-per-batch | uint64 | 0 | Edges and vertices stored per batch in a snapshot file. |
--storage-floating-point-resolution-bits | uint64 | 52 | Floating-point resolution bits for property encoding. |
--storage-delta-on-identical-property-update | bool | true | Create a delta object even when a property is rewritten with the same value. |
--storage-enable-edges-metadata | bool | false | Store extra edge metadata to accelerate certain traversals. |
--storage-enable-schema-metadata | bool | false | Track resident labels and edge types as schema metadata. |
--storage-automatic-edge-type-index-creation-enabled | bool | false | Auto-create edge-type indexes on relationships. |
--storage-automatic-label-index-creation-enabled | bool | false | Auto-create label indexes on vertices. |
--storage-property-store-compression-enabled | bool | false | Enable property-store compression. |
--storage-property-store-compression-level | string | "mid" | Compression level for property storage: low, mid, high. |
--storage-rocksdb-enable-thread-tracking | bool | false | Enable RocksDB thread-status tracking. Off by default for lower syscall overhead. |
--storage-rocksdb-info-log-level | string | "INFO_LEVEL" | RocksDB info log level: DEBUG_LEVEL … FATAL_LEVEL. |
--schema-info-enabled | bool | false | Track run-time schema info (per-tenant label / edge-type roster). |
--orphan-ttl-days | int32 | 30 | Days to keep orphaned database directories in .orphans/ before automatic purge. -1 disables auto-purge. |
--skip-recovery | bool | false | Start with empty storage even if snapshot recovery fails. The next commit overwrites the corrupt snapshot. |
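The default/auto rule for --storage-engine amounts to a single threshold. This sketch assumes the daemon compares total physical RAM (not free RAM) against 128 GiB. `resolve_storage_engine` is a hypothetical name.

```python
import os

GIB = 1024 ** 3


def resolve_storage_engine(flag_value: str, total_ram_bytes: int) -> str:
    """Apply the documented --storage-engine rule (default/auto are RAM-aware)."""
    if flag_value in ("mmap", "memory"):
        return flag_value
    if flag_value in ("default", "auto"):
        return "memory" if total_ram_bytes >= 128 * GIB else "mmap"
    raise ValueError(f"unknown --storage-engine value: {flag_value!r}")


def host_total_ram_bytes() -> int:
    # POSIX sysconf names available on Linux; not on Windows.
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
```

Under the total-RAM assumption, `resolve_storage_engine("default", host_total_ram_bytes())` predicts what the production unit's --storage-engine=default selects on the current host.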
Authentication & License
| Flag | Type | Default | Description |
|---|---|---|---|
--admin-reset | string | "" | Reset the admin user password. Prefix with @ to read from a file. Requires --auth-token. |
--admin-set-tenant | string | "" | Targeted recovery: assign a tenant_id to a single existing user (<username>=<tenant_id>). Requires --auth-token. Never pass =default. |
--auth-token | string | "" | Ownership proof for admin recovery. Accepts the storage epoch key or a license signature hash. Prefix with @ to read from a file. |
--auth-argon2-mem-cost | int32 | 65536 | Argon2id memory cost in KiB for new password hashes (default 64 MiB). |
--auth-argon2-parallelism | int32 | 4 | Argon2id parallelism (threads / runtime concurrency) for new hashes. |
--auth-argon2-lanes | int32 | 0 | Argon2id lanes (RFC 9106 hash-shape parameter). When 0 (default), uses --auth-argon2-parallelism. Set explicitly to decouple lanes from threads for reproducibility across thread-count changes (audit #9607). |
--auth-argon2-time-cost | int32 | 3 | Argon2id iterations for new hashes. |
--auth-bcrypt-max-verify-cost | int32 | 16 | Maximum bcrypt cost Verify() will honour. Higher costs are refused as a DoS amplifier guard. Range [4, 31]. Raise temporarily to authenticate imported high-cost hashes (e.g. Memgraph migration with cost=17/18 admin), then force-reset the affected users and lower it back (audit #9603). |
--plugin-allow-cross-tier-compliance | string | "" | Comma-separated tier-pair (only valid value today is federal,dod) permitting cross-tier plugin loads when the operator carries BOTH compliance regimes. Default empty: federal/dod licenses load matching-tier plugins only, preserving STIG/IL5 vs FedRAMP-moderate distinction. Rank-based downward inclusion (federal ⊇ enterprise ⊇ community) still applies (audit #9614). |
--auth-lockout-threshold | int32 | 5 | Consecutive failed login attempts before exponential-backoff lockout kicks in. |
--auth-lockout-max-delay-sec | int32 | 60 | Cap on the exponential-backoff delay between rate-limited authentication attempts. |
--auth-lockout-table-max-size | int32 | 1000 | Maximum usernames tracked for failed-login rate limiting before LRU eviction. |
--auth-module-mappings | string | "" | Map auth schemes to external modules: "<scheme>:<path>;…". |
--auth-module-timeout-ms | int32 | 10000 | Timeout in milliseconds when waiting for an external auth module response. |
--auth-password-permit-null | bool | false | Allow null / empty passwords. Not recommended. |
--auth-password-strength-regex | string | ".+" | Regex the entire password must match. |
--auth-reject-unsalted-sha256 | bool | false | Refuse to authenticate accounts whose stored hash is unsalted SHA-256, forcing a password reset. Narrower than --auth-reject-sha256-no-stretch; retained as defence-in-depth gate for operators who explicitly opt into the broader migration window. |
--auth-reject-sha256-no-stretch | bool | true | Default true in v5+. SHA-256 has no key stretching and is GPU-brute-forceable at ~1B/s, so xrayGraphDB refuses ANY SHA-256 hash (salted or not). To migrate a v4.x deploy with existing SHA-256 users: run CALL xg.security_list_weak_hash_accounts() to scope the affected users, then set this flag to false for one rotation window. Each successful SHA-256 verify rotates the stored hash inline to bcrypt; once enumeration returns zero rows, set the flag back to true (audit #9604). |
--auth-allow-legacy-bcrypt-truncation | bool | false | When true, bcrypt verify retries with the supplied password truncated to 72 bytes if the strict path fails. This is the audit-#7620 backward-compat path for pre-#7009 accounts. Default false closes the audit-#9602 timing oracle (a doubled verify latency that fingerprints pre-#7009 accounts). Set true ONLY during an upgrade rotation window — xrayGraphDB rotates any matched legacy hash inline so the oracle is paid once per account, then set the flag back to false. |
--auth-tenant-migration-target | string | "" | Target tenant for the legacy empty-tenant_id user migration. Set to the org's default tenant when migrating. |
--auth-user-or-role-name-regex | string | "[a-zA-Z0-9_.+-@]+" | Regex every username and role name must match. |
--init-admin-user | string | "" | Bootstrap admin username. Creates the first user at startup if no users exist. |
--init-admin-password | string | "" | Bootstrap admin password. Required when --init-admin-user is set. |
--init-admin-tenant | string | "__system__" | Tenant id stamped on the bootstrap admin user. Empty or default is rejected (creates a chicken-and-egg lockout). |
--bulk-trust-tenant-id-from-users | string | "" | Comma-separated list of usernames whose BULK_INSERT_NODES / BULK_UPSERT_NODES requests are trusted to set the tenantId property explicitly via prop_names. Intended for internal-pipeline service accounts that connect as one shared DB user but write rows for many tenants (e.g. xray-vision Rust pipeline). Other users still hit the standard reject, and the string type-check still applies. When empty (the default), the guard applies to every user. |
--init-data-file | string | "" | Path to a Cypher script run after the server starts (data seed). |
--init-file | string | "" | Path to a Cypher script run before the server starts (users / schema bootstrap). |
--encryption-mode | string | "disabled" | Per-tenant encryption mode: disabled, optional, required. v5 is always-encrypted and has no disabled mode (see Key Rotation). |
--license-acknowledge-saved [prod] | bool | false | Operator confirmation that the license JSON was saved externally. Required to suppress the journal dump on systemd / container deploys. |
--license-file | string | "" | Path to the xrayGraphDB A+ license file. |
--validate-license | string | "" | Validate a license file or inline JSON / JWS without starting the server. |
--lbac-legacy-allow-empty | bool | false | Pre-LBAC backward-compat: when true, a user with no LBAC entries gets implicit ALLOW on every label and edge type. Default false (secure-by-default DENY). |
--repair | string | "" | Repair issues found by --verify-integrity. Value is all or comma-separated subsystems. Requires --auth-token. |
--verify-integrity | string | "" | Run integrity checks. Value is all or comma-separated auth,snapshot,orphans. Requires --auth-token. |
Logging & Diagnostics
xrayGraphDB also inherits the standard glog flags. The most important is --also-log-to-stderr=true (used by the prod systemd unit), which mirrors the daemon log to stderr so journalctl -u xraygraphdb shows the full output. --alsologtostderr is the glog-native alias without separators.
| Flag | Type | Default | Description |
|---|---|---|---|
--log-level [prod] | string | "WARNING" | Minimum log level: TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL. |
--log-file | string | "" | Path to the daemon log file. Empty defers to journald / stderr. |
--log-format | string | "text" | Log format: text (human-readable) or json (one JSON object per line for log aggregators). |
--log-retention-days | uint64 | 35 | Days a daily log file is preserved before rotation deletes it. |
--logger-type | string | "sync" | Synchronous or asynchronous logger: sync, async. |
--core-dump-directory | string | "/var/lib/xraygraphdb/cores" | Directory for core-dump files (mode 0700). Empty disables core-dump management. |
--debug-query-plans | bool | false | DEBUG-log every candidate query plan considered by the planner. |
--query-log-directory | string | "" | Directory where query logs are stored. Empty disables. |
--nuraft-log-file | string | "" | File where NuRaft (cluster Raft) logs are written. |
--shutdown-watchdog-seconds | int32 | 600 | Deadline for graceful shutdown. If exceeded the daemon SIGABRTs and writes a per-thread diagnostic dump under <data-directory>/shutdown-diagnostics/. Must be less than systemd TimeoutStopSec. |
--memory-warning-threshold | uint64 | 1024 | Free-RAM threshold (MiB) below which the daemon emits a warning. 0 disables. |
--telemetry-enabled | bool | false | Enable telemetry reporting (CPU / memory / vertex-edge counts). |
--support-email | string | "sme@emtailabs.com" | Support contact email for licensing, enterprise features, and production support. |
--timezone | string | "UTC" | Instance timezone in IANA format. |
--help | bool | false | Print help on every flag and exit. |
--help-xml | bool | false | Print help in XML form and exit. |
--version | bool | false | Print version and build info, then exit. |
Audit Logging
| Flag | Type | Default | Description |
|---|---|---|---|
--audit-enabled | bool | false | Enable audit logging (requires enterprise license). |
--audit-buffer-size | int32 | 100000 | Maximum entries held in the audit-log ring buffer. |
--audit-buffer-flush-interval-ms | int32 | 200 | Audit buffer flush interval in milliseconds. |
--audit-overflow-policy | string | "drop" | Behavior when the audit buffer is full: drop (warn), block (compliance mode), fail (fail the query). |
--audit-retention-days | int32 | 90 | Days to retain rotated audit log files before automatic deletion. 0 disables auto-cleanup. |
--audit-rotation-hour-utc | int32 | 0 | Hour of day (UTC, 0-23) at which the audit log rotates. |
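The three --audit-overflow-policy values differ only in what happens when a new entry arrives at a full buffer. The Python sketch below models that decision with a toy bounded buffer. The class, the exception, and the flush method are hypothetical. They say nothing about how the server actually stores or writes audit entries.

```python
from __future__ import annotations

import collections
import threading


class AuditOverflowError(RuntimeError):
    """Raised under the 'fail' policy when the buffer is full (hypothetical name)."""


class AuditRingBuffer:
    """Toy bounded audit buffer modelling the three --audit-overflow-policy values."""

    def __init__(self, capacity: int, policy: str = "drop") -> None:
        if policy not in ("drop", "block", "fail"):
            raise ValueError(f"unknown policy: {policy}")
        self.capacity = capacity
        self.policy = policy
        self.entries: collections.deque[str] = collections.deque()
        self.dropped = 0
        self._cond = threading.Condition()

    def append(self, entry: str, timeout: float | None = None) -> bool:
        with self._cond:
            if len(self.entries) >= self.capacity:
                if self.policy == "drop":
                    self.dropped += 1  # the server would also emit a warning here
                    return False
                if self.policy == "fail":
                    raise AuditOverflowError("audit buffer full")
                # "block": wait until a flush frees space, or give up after `timeout`
                if not self._cond.wait_for(lambda: len(self.entries) < self.capacity, timeout):
                    return False
            self.entries.append(entry)
            return True

    def flush(self) -> list[str]:
        """Drain the buffer, as the periodic flush (every flush-interval-ms) would."""
        with self._cond:
            drained = list(self.entries)
            self.entries.clear()
            self._cond.notify_all()  # wake writers blocked under the "block" policy
            return drained
```

Under drop the query proceeds and the entry is lost. Under fail the query errors. Under block the writer waits for the next flush, which trades latency for a gap-free audit trail.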
Replication & Coordination
| Flag | Type | Default | Description |
|---|---|---|---|
--replication-mode | string | "standalone" | Replication role: standalone, primary, replica. |
--replication-bind-address | string | "0.0.0.0" | Bind address for the replication server started when a primary (MAIN) is demoted to a replica (REPLICA). |
--replication-port | int32 | 7690 | Port for incoming replication connections (replica mode). |
--replication-primary | string | "" | Primary server address (host:port) when running as a replica. |
--replication-replicas | string | "" | Comma-separated replica addresses when running as primary. |
--replication-replica-check-frequency-sec | uint64 | 1 | Interval in seconds between replica health-check pings. 0 disables. |
--replication-restore-state-on-startup | bool | true | Re-attach previously registered replicas at startup. |
--accept-wipe | bool | false | Acknowledge that joining as a replica wipes local data. Required when starting in replica mode against a non-empty data directory. |
--coordinator-id | int32 | 0 | Unique ID of the Raft server. |
--coordinator-port | int32 | 0 | Port the Raft coordinator listens on. |
--coordinator-hostname | string | "" | Hostname returned in SHOW INSTANCES. |
--management-port | int32 | 0 | Port the coordinator management server listens on. |
--instance-down-timeout-sec | uint32 | 15 | Seconds after which an instance is considered down by the coordinator. |
--instance-health-check-frequency-sec | uint32 | 5 | Seconds between coordinator health-check pings to instances. |
Plugin / Module System
| Flag | Type | Default | Description |
|---|---|---|---|
--query-modules-directory | string | "" | Directory (or comma-separated directories) where custom query modules are loaded from. |
--query-callable-mappings-path | string | "" | Path to a JSON file of alias → procedure mappings, used to alias missing procedures to existing ones. |
--kafka-bootstrap-servers | string | "" | Default Kafka broker list (comma-separated host:port) for stream sources. |
--kms-provider | string | "local" | KMS provider for per-tenant encryption: local or aws. Requires --encryption-mode. |
--stream-transaction-conflict-retries | uint32 | 30 | Times a stream transformation retries on transaction conflict before giving up. |
--stream-transaction-retry-interval | uint32 | 500 | Stream transformation retry interval in milliseconds on transaction conflict. |
Query & Planner
| Flag | Type | Default | Description |
|---|---|---|---|
--query-execution-timeout-sec [prod] | double | 600 | Maximum query execution time in seconds. 0 disables the limit. |
--query-cost-planner | bool | true | Use the cost-estimating query planner. |
--query-max-plans | uint64 | 1000 | Maximum number of candidate plans the planner considers per query. |
--query-plan-cache-max-size | int32 | 1000 | Maximum number of compiled query plans cached. |
--cartesian-product-enabled | bool | true | Allow cartesian product expansion in the planner. |
--isolation-level | string | "SNAPSHOT_ISOLATION" | Default transaction isolation: READ_COMMITTED, READ_UNCOMMITTED, SNAPSHOT_ISOLATION. |
--hops-limit-partial-results | bool | true | Return partial results when the hops limit is reached. |
--enable-index-only-scans | bool | true | Read property values directly from the index without touching the vertex store when possible. |
--enable-simd-filter | bool | true | Enable AVX2-accelerated property filtering with scalar fallback. |
--simd-batch-size | int64 | 32 | Rows batched per SIMD filter operation. Must be at least 4 (AVX2 width). |
--parallel-scan-threshold | int64 | 0 | Vertex-count threshold for auto-selecting parallel scan. 0 disables; -1 forces parallel everywhere. |
--disable-gpu | bool | false | Disable GPU acceleration even if a CUDA device is detected. Analytics fall back to CPU. |
--strict-unbound-identifier | bool | false | Throw QueryRuntimeException on unmapped Identifier::symbol_pos_ instead of returning NULL. |
--memory-limit | uint64 | 0 | Total memory limit in MiB. 0 = auto (100% of physical RAM if swap is enabled, 90% otherwise). Upper bound is roughly 1 PiB. |
--experimental-config | string | "" | Experimental features configuration (JSON object). |
--experimental-enabled | string | "" | Comma-separated experimental features to enable (e.g. planner-v2). |
--v4-compiled-engine | bool | false | Enable the v4 JIT-compiled query engine for supported read-only queries. |
--v4-engine-mode | string | "auto" | v4 routing mode: off, auto (route supported queries), forced (route everything). |
--v4-shadow-compare | bool | false | Run the compiled engine in shadow mode alongside Volcano and log mismatches. |
--cypher-variable-expand-allow-bfs-dedup | bool | false | Route Cypher MATCH (a)-[*lo..hi]->(b) variable-length expansions through the BFS-with-bitset-dedup operator instead of the Volcano DFS path. Deliberate semantic change: BFS dedup emits one row per reachable destination vertex (regardless of how many distinct walks lead to it) and the bound edge-list symbol is empty / lossy. Use only for reachability-style queries (e.g. RETURN DISTINCT b.id). On hub-heavy graphs (LDBC SF1 KNOWS, 1,532-degree Person vertices) the Volcano path explodes combinatorially at depth 4+; this flag is the operator-controlled escape hatch. Default false preserves Cypher's path-enumeration contract. |
--xray-analytics-edges-per-thread | int64 | 50000000 | CSR analytics thread heuristic: one thread per N edges. Lower for higher-bandwidth hardware. |
--xray-analytics-max-threads | int32 | 32 | Absolute thread ceiling for CSR analytics. 0 = no cap. |
--xray-analytics-min-threads | int32 | 4 | Floor on analytics threads for small graphs. |
--xray-clustering-coefficient-sample-cap | int64 | 200 | Maximum neighbours sampled per vertex during clustering-coefficient computation before falling back to Fisher-Yates with bias correction. 0 disables sampling. |
--xray-graph-stats-cache-ttl-sec | int32 | 300 | TTL in seconds for the graph-statistics summary result cache. Successive calls within this window return the cached result instantly. Cache key is (tenant id, label filter, vertex count, edge count); writes that change either total invalidate naturally. 0 disables the cache (recompute every call). Each call emits three additional metrics — cache_hit, cache_age_ms, compute_time_ms — alongside the existing time_ms so monitoring can distinguish hit from miss without query-side timing. |
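The semantic change behind --cypher-variable-expand-allow-bfs-dedup can be seen by counting the rows of MATCH (a)-[*1..3]->(b) on a five-node graph with two routes from a to d. The Python sketch below models the two result contracts described in the flag entry. It does not model either operator's implementation, and the graph and function names are made up for illustration.

```python
from collections import deque

# Small directed graph with two distinct walks from "a" to "d" (and hence to "e").
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}


def enumerate_walks(graph, start, lo, hi):
    """Path-enumeration contract: one row per walk of length lo..hi, no edge reused."""
    rows = []

    def dfs(node, path_edges):
        depth = len(path_edges)
        if depth >= lo:
            rows.append((node, tuple(path_edges)))
        if depth == hi:
            return
        for nxt in graph[node]:
            edge = (node, nxt)
            if edge not in path_edges:  # a relationship appears at most once per path
                dfs(nxt, path_edges + [edge])

    dfs(start, [])
    return rows


def bfs_dedup(graph, start, hi):
    """Dedup contract: one row per destination within 1..hi hops, no edge list."""
    seen = {start}
    frontier = deque([(start, 0)])
    rows = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == hi:
            continue
        for nxt in graph[node]:
            if nxt not in seen:  # the "bitset": each vertex is emitted once
                seen.add(nxt)
                rows.append(nxt)
                frontier.append((nxt, depth + 1))
    return rows
```

Path enumeration yields six rows because d and e are each reached by two distinct walks. The dedup variant yields four, one per destination. In this example, a reachability query such as RETURN DISTINCT b gives the same answer under both modes.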
AWS Integration
| Flag | Type | Default | Description |
|---|---|---|---|
--aws-access-key | string | "" | AWS access key for the AWS integration. |
--aws-secret-key | string | "" | AWS secret key for the AWS integration. |
--aws-region | string | "" | AWS region for the AWS integration. |
--aws-endpoint-url | string | "" | Override AWS endpoint URL (custom regions, S3-compatible stores). |
SWIM (FAA Flight Data Ingestion)
| Flag | Type | Default | Description |
|---|---|---|---|
--swim-enabled [prod] | bool | false | Enable the native FAA SWIM data consumer (Solace PubSub+). Requires --swim-user and --swim-pass. |
--swim-user [prod] | string | "" | SWIM SCDS username (e.g. sme.emtailabs.com). |
--swim-pass [prod] | string | "" | SWIM SCDS password. Prefix with @ to read from a file. Inline values are visible in /proc/cmdline. |
--swim-queues [prod] | string | "" | Comma-separated queue specs NAME:BROKER_URL:VPN:QUEUE_NAME,…. |
--swim-database | string | "" | Database name SWIM ingestion writes Aircraft / flight data into. Empty targets a database called swim. |
--swim-ttl-seconds | uint64 | 7200 | TTL for SWIM aircraft nodes. Aircraft nodes not updated within this window are deleted. |
--swim-cleanup-interval-seconds | uint64 | 300 | How often the SWIM TTL cleanup thread runs. |
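A broker URL normally contains colons of its own (the scheme separator and the port), so a --swim-queues entry cannot be split naively on ':'. The Python sketch below shows one unambiguous reading. It assumes that NAME, VPN, and QUEUE_NAME never contain ':' or ','. The server's actual parsing rules are not documented here, and the broker and queue names in the example are placeholders.

```python
from typing import NamedTuple


class SwimQueueSpec(NamedTuple):
    name: str
    broker_url: str
    vpn: str
    queue_name: str


def parse_swim_queues(value: str) -> list[SwimQueueSpec]:
    """Parse NAME:BROKER_URL:VPN:QUEUE_NAME,... (hypothetical helper).

    NAME is split off the front and VPN/QUEUE_NAME off the back, so any
    colons left in the middle stay inside BROKER_URL.
    """
    specs = []
    for item in filter(None, (part.strip() for part in value.split(","))):
        name, sep, rest = item.partition(":")
        pieces = rest.rsplit(":", 2)
        if not sep or len(pieces) != 3 or not all([name, *pieces]):
            raise ValueError(f"malformed SWIM queue spec: {item!r}")
        broker_url, vpn, queue_name = pieces
        specs.append(SwimQueueSpec(name, broker_url, vpn, queue_name))
    return specs


# Example with placeholder names:
# parse_swim_queues("tfms:tcps://broker.example.com:55443:FDPS:queue.tfms")
# -> [SwimQueueSpec(name='tfms', broker_url='tcps://broker.example.com:55443', vpn='FDPS', queue_name='queue.tfms')]
```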
Operational
| Flag | Type | Default | Description |
|---|---|---|---|
--daemonize | bool | false | Run as a Unix daemon (double-fork, detach from the terminal). |
--flag-file | string | "/etc/xraygraphdb/xraygraphdb.conf" | Load flags from a file (one flag per line, gflags syntax). |
--allow-load-csv | bool | false | Allow the LOAD CSV Cypher clause. Off by default for security. |
--load-csv-allowed-paths | string | "" | Comma-separated directory prefixes LOAD CSV may read from. Empty rejects all local paths. |
--file-download-conn-timeout-sec | uint64 | 10 | Timeout for establishing a connection to a remote server when downloading a file. |
--force-recovery-past-corruption | bool | false | Deprecated. WAL recovery always skips past corrupt entries; transactions between a corrupt entry and the next valid entry are silently lost. |
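The empty default of --load-csv-allowed-paths rejects every local path. Once prefixes are set, a correct check must respect directory boundaries and '..' segments. The Python sketch below shows one way to do that with os.path.realpath and os.path.commonpath. The function name is hypothetical, and it is an assumption that xrayGraphDB normalizes paths or resolves symlinks the same way.

```python
import os


def load_csv_path_allowed(path: str, allowed_prefixes: str) -> bool:
    """Return True if `path` lies inside one of the comma-separated prefixes.

    An empty `allowed_prefixes` rejects every local path, mirroring the flag's
    documented default. Paths are normalized first so '..' cannot escape a
    prefix, and the comparison respects directory boundaries
    ('/srv/import' does not admit '/srv/importer').
    """
    prefixes = [p.strip() for p in allowed_prefixes.split(",") if p.strip()]
    if not prefixes:
        return False
    target = os.path.realpath(path)
    for prefix in prefixes:
        root = os.path.realpath(prefix)
        if os.path.commonpath([target, root]) == root:
            return True
    return False
```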
First Connection: Python
xrayGraphDB is compatible with the official Neo4j Python driver. Install it with pip and connect over the Bolt protocol.
pip install neo4j
from neo4j import GraphDatabase
driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("admin", "<your-password>")
)
with driver.session() as session:
    # Create a node
    session.run(
        "CREATE (n:Person {name: $name, age: $age})",
        name="Alice", age=30
    )
    # Read it back
    result = session.run(
        "MATCH (n:Person {name: $name}) RETURN n.name, n.age",
        name="Alice"
    )
    record = result.single()
    print(record["n.name"], record["n.age"])  # Output: Alice 30
driver.close()
First Connection: JavaScript
npm install neo4j-driver
const neo4j = require('neo4j-driver');
async function main() {
  const driver = neo4j.driver(
    'bolt://localhost:7687',
    neo4j.auth.basic('admin', '<your-password>')
  );
  const session = driver.session();
  try {
    // Create a node (plain JS numbers are sent as floats; neo4j.int stores an integer)
    await session.run(
      'CREATE (n:Person {name: $name, age: $age})',
      { name: 'Bob', age: neo4j.int(25) }
    );
    // Read it back
    const result = await session.run(
      'MATCH (n:Person {name: $name}) RETURN n',
      { name: 'Bob' }
    );
    console.log(result.records[0].get('n').properties);
  } finally {
    await session.close();
    await driver.close();
  }
}
// CommonJS modules have no top-level await, so run the example inside an async function
main().catch(console.error);
First Connection: Java
Add the Neo4j Java driver to your Maven or Gradle project.
<!-- Maven dependency -->
<dependency>
  <groupId>org.neo4j.driver</groupId>
  <artifactId>neo4j-java-driver</artifactId>
  <version>5.x</version> <!-- replace 5.x with a concrete 5.x release -->
</dependency>
import org.neo4j.driver.*;
public class XRayExample {
    public static void main(String[] args) {
        // try-with-resources closes the session first, then the driver
        try (var driver = GraphDatabase.driver(
                 "bolt://localhost:7687",
                 AuthTokens.basic("admin", "<your-password>"));
             var session = driver.session()) {
            session.run(
                "CREATE (n:Person {name: $name})",
                Values.parameters("name", "Carol")
            );
            var result = session.run("MATCH (n:Person) RETURN n.name");
            while (result.hasNext()) {
                System.out.println(result.next().get("n.name").asString());
            }
        }
    }
}
First Connection: Go
go get github.com/neo4j/neo4j-go-driver/v5
package main
import (
	"context"
	"fmt"
	"github.com/neo4j/neo4j-go-driver/v5/neo4j"
)
func main() {
	ctx := context.Background()
	driver, err := neo4j.NewDriverWithContext(
		"bolt://localhost:7687",
		neo4j.BasicAuth("admin", "<your-password>", ""),
	)
	if err != nil {
		panic(err)
	}
	defer driver.Close(ctx)
	session := driver.NewSession(ctx, neo4j.SessionConfig{})
	defer session.Close(ctx)
	_, err = session.Run(ctx,
		"CREATE (n:Person {name: $name})",
		map[string]any{"name": "Dave"},
	)
	if err != nil {
		panic(err)
	}
	result, err := session.Run(ctx,
		"MATCH (n:Person) RETURN n.name",
		nil,
	)
	if err != nil {
		panic(err)
	}
	for result.Next(ctx) {
		fmt.Println(result.Record().Values[0])
	}
	// Report any error that ended the iteration early
	if err := result.Err(); err != nil {
		panic(err)
	}
}
First Connection: .NET
dotnet add package Neo4j.Driver
using Neo4j.Driver;
// `await using` disposes in reverse order: the session first, then the driver
await using var driver = GraphDatabase.Driver(
    "bolt://localhost:7687",
    AuthTokens.Basic("admin", "<your-password>")
);
await using var session = driver.AsyncSession();
await session.RunAsync(
    "CREATE (n:Person {name: $name})",
    new { name = "Eve" }
);
var result = await session.RunAsync("MATCH (n:Person) RETURN n.name");
var records = await result.ToListAsync();
foreach (var record in records)
{
    Console.WriteLine(record["n.name"].As<string>());
}
Quick Start Tutorial
This tutorial walks through creating a small graph, querying it, and cleaning up. It assumes you have a running xrayGraphDB instance and a way to run Cypher, for example one of the drivers above.
// Step 1: Create some nodes
CREATE (alice:Person {name: "Alice", age: 30})
CREATE (bob:Person {name: "Bob", age: 25})
CREATE (carol:Person {name: "Carol", age: 35})
CREATE (proj:Project {name: "xrayGraphDB"})
RETURN alice, bob, carol, proj;
// Step 2: Create relationships
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS]->(b)
RETURN a, b;
MATCH (a:Person {name: "Alice"}), (p:Project {name: "xrayGraphDB"})
CREATE (a)-[:WORKS_ON {role: "lead"}]->(p)
RETURN a, p;
MATCH (b:Person {name: "Bob"}), (p:Project {name: "xrayGraphDB"})
CREATE (b)-[:WORKS_ON {role: "contributor"}]->(p)
RETURN b, p;
// Step 3: Query the graph
MATCH (p:Person)-[:WORKS_ON]->(proj:Project)
RETURN p.name, proj.name;
// Step 4: Update properties
MATCH (a:Person {name: "Alice"})
SET a.email = "alice@example.com"
RETURN a;
// Step 5: Clean up (deletes every node and relationship in the database)
MATCH (n) DETACH DELETE n;
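Always pass values as query parameters ($name, $age, etc.) rather than concatenating them into the query string. Parameters prevent injection and improve plan cache hit rates.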
MATCH
The MATCH clause is the primary read operation. It describes a pattern to find in the graph and binds matching subgraphs to variables.
// Match all nodes with a specific label
MATCH (n:Person) RETURN n;
// Match a relationship pattern
MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name;
// Match with relationship variable
MATCH (a:Person)-[r:WORKS_ON]->(p:Project) RETURN a.name, r.role, p.name;
// Match any direction
MATCH (a:Person)-[:KNOWS]-(b:Person) RETURN a.name, b.name;
// Match with multiple labels
MATCH (n:Person:Employee) RETURN n;
Patterns can include any combination of nodes, relationships, and directions. Nodes are enclosed in parentheses (), relationships in square brackets [], and direction is indicated by arrows -> or <-.
WHERE
The WHERE clause filters results from MATCH patterns. It supports comparison operators, boolean logic, string matching, list predicates, and null checks.
// Comparison operators
MATCH (n:Person) WHERE n.age > 25 AND n.age <= 40 RETURN n.name, n.age;
// String matching
MATCH (n:Person) WHERE n.name STARTS WITH "A" RETURN n;
// Regular expression
MATCH (n:Person) WHERE n.email =~ ".*@example\\.com" RETURN n;
// Null checks
MATCH (n:Person) WHERE n.email IS NOT NULL RETURN n;
// IN list
MATCH (n:Person) WHERE n.name IN ["Alice", "Bob", "Carol"] RETURN n;
// Pattern predicates (exists)
MATCH (n:Person) WHERE (n)-[:WORKS_ON]->() RETURN n.name;
| Operator | Description | Example |
|---|---|---|
| = | Equal | n.age = 30 |
| <> | Not equal | n.name <> "Alice" |
| <, >, <=, >= | Comparison | n.age >= 18 |
| AND, OR, NOT | Boolean logic | n.age > 20 AND n.active = true |
| IN | List membership | n.status IN ["active", "pending"] |
| STARTS WITH | String prefix | n.name STARTS WITH "Al" |
| ENDS WITH | String suffix | n.name ENDS WITH "ice" |
| CONTAINS | String contains | n.name CONTAINS "li" |
| =~ | Regex match | n.email =~ ".*@example\\.com" |
| IS NULL | Null check | n.deleted IS NULL |
| IS NOT NULL | Not null | n.email IS NOT NULL |
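By Cypher convention, the =~ operator requires the pattern to match the entire string, which is why the example pattern starts with .*. Assuming xrayGraphDB follows that convention, re.fullmatch is the closest Python analogue. The function name below is illustrative. Cypher regex dialects also differ from Python's re in some details, and the Cypher literal ".*@example\\.com" corresponds to the raw Python string r".*@example\.com".

```python
from __future__ import annotations

import re


def cypher_regex_match(value: str | None, pattern: str) -> bool | None:
    """Model of `value =~ pattern`: the pattern must match the whole string.

    A null operand yields null (None), following Cypher's three-valued logic.
    """
    if value is None:
        return None
    return re.fullmatch(pattern, value) is not None
```

With this rule, "alice@example.com" =~ "@example\\.com" is false, because the pattern does not cover the leading "alice".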
RETURN
RETURN specifies which values to include in the result set. You can return nodes, relationships, properties, expressions, or aggregations.
// Return specific properties
MATCH (n:Person) RETURN n.name, n.age;
// Alias with AS
MATCH (n:Person) RETURN n.name AS person_name, n.age AS years;
// Return all properties as a map
MATCH (n:Person) RETURN properties(n);
// Return distinct values
MATCH (n:Person)-[:WORKS_ON]->(p:Project) RETURN DISTINCT p.name;
// Expressions in RETURN
MATCH (n:Person) RETURN n.name, n.age * 12 AS age_in_months;
ORDER BY / LIMIT / SKIP
Control the ordering and pagination of results.
// Order by a property
MATCH (n:Person) RETURN n.name, n.age ORDER BY n.age DESC;
// Limit results
MATCH (n:Person) RETURN n.name ORDER BY n.name LIMIT 10;
// Pagination with SKIP and LIMIT
MATCH (n:Person) RETURN n.name ORDER BY n.name SKIP 20 LIMIT 10;
// Multiple sort keys
MATCH (n:Person) RETURN n.name, n.age ORDER BY n.age DESC, n.name ASC;
WITH
WITH acts as a pipeline separator, allowing you to chain query stages together. Variables not listed in WITH are not available in subsequent clauses.
// Filter intermediate results
MATCH (p:Person)-[:WORKS_ON]->(proj:Project)
WITH proj, count(p) AS team_size
WHERE team_size > 3
RETURN proj.name, team_size
ORDER BY team_size DESC;
// Chain queries
MATCH (n:Person)
WITH n ORDER BY n.age DESC LIMIT 5
MATCH (n)-[:KNOWS]->(friend)
RETURN n.name, collect(friend.name) AS friends;
UNWIND
UNWIND expands a list into individual rows. Useful for bulk operations and working with list parameters.
// Expand a list
UNWIND [1, 2, 3] AS x RETURN x;
// Bulk create from parameters
UNWIND $people AS person
CREATE (n:Person {name: person.name, age: person.age});
// Combine with MATCH
UNWIND ["Alice", "Bob"] AS name
MATCH (n:Person {name: name})
RETURN n;
OPTIONAL MATCH
OPTIONAL MATCH works like MATCH but returns null for missing parts of the pattern instead of excluding the row entirely. Equivalent to a left outer join.
// Return all people, even those without projects
MATCH (p:Person)
OPTIONAL MATCH (p)-[:WORKS_ON]->(proj:Project)
RETURN p.name, proj.name;
CREATE
CREATE adds new nodes and relationships to the graph. It always creates new elements (use MERGE to avoid duplicates).
// Create a single node
CREATE (n:Person {name: "Frank", age: 28}) RETURN n;
// Create multiple nodes
CREATE (a:Person {name: "Grace"}), (b:Person {name: "Hank"});
// Create a node with multiple labels
CREATE (n:Person:Developer {name: "Ivy"});
// Create a relationship between existing nodes
MATCH (a:Person {name: "Grace"}), (b:Person {name: "Hank"})
CREATE (a)-[:KNOWS {since: 2024}]->(b)
RETURN a, b;
// Create a full path in one statement
CREATE (a:Module {name: "auth"})-[:IMPORTS]->(b:Module {name: "crypto"})
RETURN a, b;
MERGE
MERGE ensures a pattern exists in the graph. If the pattern is found, it is bound. If not found, it is created. Use ON CREATE SET and ON MATCH SET to conditionally set properties.
// Merge a node (create if not exists)
MERGE (n:Person {name: "Alice"})
ON CREATE SET n.created = timestamp()
ON MATCH SET n.lastSeen = timestamp()
RETURN n;
// Merge a relationship
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
MERGE (a)-[r:KNOWS]->(b)
ON CREATE SET r.since = 2024
RETURN r;
MERGE matches the entire pattern as a unit. In the relationship example above, a and b are already bound by the preceding MATCH. If the KNOWS relationship does not exist, MERGE therefore creates only the relationship. If a pattern's nodes are not bound beforehand, a failed match creates every element of the pattern, including new nodes, which can produce duplicate nodes.
SET
SET updates properties on nodes and relationships, or adds labels to nodes.
// Set a property
MATCH (n:Person {name: "Alice"}) SET n.age = 31 RETURN n;
// Set multiple properties
MATCH (n:Person {name: "Alice"}) SET n.age = 31, n.email = "alice@example.com" RETURN n;
// Replace all properties with a map
MATCH (n:Person {name: "Alice"}) SET n = {name: "Alice", age: 31, active: true} RETURN n;
// Merge properties (add without removing existing)
MATCH (n:Person {name: "Alice"}) SET n += {department: "engineering"} RETURN n;
// Add a label
MATCH (n:Person {name: "Alice"}) SET n:Employee RETURN n;
REMOVE
REMOVE deletes properties from nodes/relationships and removes labels from nodes.
// Remove a property
MATCH (n:Person {name: "Alice"}) REMOVE n.email RETURN n;
// Remove a label
MATCH (n:Person:Employee {name: "Alice"}) REMOVE n:Employee RETURN labels(n);
DELETE / DETACH DELETE
DELETE removes nodes and relationships. A node cannot be deleted if it still has relationships. Use DETACH DELETE to delete a node and all its relationships in one operation.
// Delete a relationship
MATCH (a:Person)-[r:KNOWS]->(b:Person)
WHERE a.name = "Alice" AND b.name = "Bob"
DELETE r;
// Delete a node (must have no relationships)
MATCH (n:Person {name: "Frank"}) DELETE n;
// Detach delete (node + all relationships)
MATCH (n:Person {name: "Alice"}) DETACH DELETE n;
// Delete all nodes and relationships in the database
MATCH (n) DETACH DELETE n;
MATCH (n) DETACH DELETE n removes the entire graph. There is no undo. Make a snapshot before running destructive queries on production data.
Variable-length Paths
Variable-length path patterns match paths of varying depth using the * syntax inside relationship brackets.
// Paths of exactly 2 hops
MATCH (a:Person)-[:KNOWS*2]->(c:Person) RETURN a.name, c.name;
// Paths of 1 to 5 hops
MATCH (a:Person)-[:KNOWS*1..5]->(c:Person) RETURN a.name, c.name;
// Paths of any length (use with caution)
MATCH (a:Person {name: "Alice"})-[:KNOWS*]->(c:Person) RETURN DISTINCT c.name;
// Capture the path
MATCH path = (a:Person {name: "Alice"})-[:KNOWS*1..3]->(c:Person)
RETURN path, length(path) AS hops;
Unbounded variable-length paths (* without limits) can be expensive on large graphs. Always set an upper bound when possible.
shortestPath / allShortestPaths
Find the shortest path(s) between two nodes.
// Find one shortest path
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Eve"})
MATCH p = shortestPath((a)-[*..10]-(b))
RETURN p, length(p) AS hops;
// Find all shortest paths (same length)
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Eve"})
MATCH p = allShortestPaths((a)-[*..10]-(b))
RETURN p;
// With relationship type filter
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Eve"})
MATCH p = shortestPath((a)-[:KNOWS|WORKS_WITH*..10]-(b))
RETURN p;
BFS Traversal
Breadth-first search traversal is available for exploring graphs level by level. BFS guarantees that nodes are visited in order of increasing distance from the start node.
// BFS with an upper bound of 3 hops
MATCH (start:Person {name: "Alice"})
MATCH path = (start)-[:KNOWS *BFS ..3]->(target)
RETURN target.name, length(path) AS distance
ORDER BY distance;
BFS traversal is the default algorithm used by shortestPath. Use the explicit BFS syntax when you need to enumerate all reachable nodes by distance layer.
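A breadth-first search with a visited set naturally produces reachable nodes grouped by distance layer. The Python sketch below runs over an adjacency-list dictionary. The graph, the function name, and the max_depth parameter are illustrative and are not part of any xrayGraphDB API. Each node is reported once, at its shortest hop distance from the start, and max_depth plays the role of the upper bound.

```python
from collections import deque


def bfs_layers(graph, start, max_depth=None):
    """Group nodes reachable from `start` by hop distance; each node appears once."""
    dist = {start: 0}
    queue = deque([start])
    layers = {}
    while queue:
        node = queue.popleft()
        if max_depth is not None and dist[node] == max_depth:
            continue  # do not expand beyond the upper bound
        for nxt in graph.get(node, ()):
            if nxt not in dist:  # first visit is always at the shortest distance
                dist[nxt] = dist[node] + 1
                layers.setdefault(dist[nxt], []).append(nxt)
                queue.append(nxt)
    return layers
```

For Alice -> {Bob, Carol}, Bob -> Dave, and Carol -> {Dave, Eve}, the layers are {1: [Bob, Carol], 2: [Dave, Eve]}. Dave is listed only once, even though two routes reach him.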
Aggregation
Aggregation functions operate on groups of rows. Non-aggregated columns in RETURN act as implicit group keys (similar to SQL GROUP BY).
// Count
MATCH (n:Person) RETURN count(n) AS total_people;
// Group and count
MATCH (p:Person)-[:WORKS_ON]->(proj:Project)
RETURN proj.name, count(p) AS team_size
ORDER BY team_size DESC;
// Sum, average, min, max
MATCH (n:Person)
RETURN sum(n.age) AS total_age, avg(n.age) AS avg_age,
       min(n.age) AS youngest, max(n.age) AS oldest;
// Collect into a list
MATCH (p:Person)-[:WORKS_ON]->(proj:Project)
RETURN proj.name, collect(p.name) AS members;
// Standard deviation and percentile
MATCH (n:Person)
RETURN stDev(n.age) AS std_dev, percentileCont(n.age, 0.5) AS median;
| Function | Description | Example |
|---|---|---|
| count(expr) | Number of non-null values | count(n) |
| sum(expr) | Sum of numeric values | sum(n.salary) |
| avg(expr) | Average of numeric values | avg(n.age) |
| min(expr) | Minimum value | min(n.created) |
| max(expr) | Maximum value | max(n.score) |
| collect(expr) | Collect values into a list | collect(n.name) |
| percentileCont(expr, p) | Continuous percentile (interpolated) | percentileCont(n.age, 0.5) |
| percentileDisc(expr, p) | Discrete percentile (nearest value) | percentileDisc(n.age, 0.9) |
| stDev(expr) | Standard deviation (sample) | stDev(n.score) |
| stDevP(expr) | Standard deviation (population) | stDevP(n.score) |
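Two points from this section are easy to get wrong. Non-aggregated RETURN columns become the grouping key, and stDev divides by n - 1 while stDevP divides by n. The Python sketch below mirrors both with made-up rows. It illustrates the semantics in the table, not the engine's code.

```python
import statistics
from collections import defaultdict

# Rows as (person, project) pairs, as MATCH (p:Person)-[:WORKS_ON]->(proj:Project) might bind them.
rows = [("Alice", "xrayGraphDB"), ("Bob", "xrayGraphDB"), ("Carol", "Docs")]

# RETURN proj.name, count(p): the non-aggregated column proj.name is the implicit group key.
team_size = defaultdict(int)
for person, project in rows:
    team_size[project] += 1

ages = [25, 30, 35, 40]
sample_sd = statistics.stdev(ages)       # stDev: sample standard deviation, divides by n - 1
population_sd = statistics.pstdev(ages)  # stDevP: population standard deviation, divides by n
```

For these four ages the squared deviations sum to 125. The sample standard deviation is therefore sqrt(125 / 3), about 6.45, and the population standard deviation is sqrt(125 / 4), about 5.59.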
Indexing & Constraints
Indexes accelerate lookups by property value. Constraints enforce data integrity rules.
// Create a label-property index
CREATE INDEX ON :Person(name);
// Neo4j-compatible named index syntax
CREATE INDEX person_name_idx FOR (n:Person) ON (n.name);
// Composite index
CREATE INDEX ON :Person(name, age);
// Drop an index
DROP INDEX ON :Person(name);
// Unique constraint
CREATE CONSTRAINT ON (n:Person) ASSERT n.email IS UNIQUE;
// Existence constraint
CREATE CONSTRAINT ON (n:Person) ASSERT EXISTS (n.name);
// Show index info
SHOW INDEX INFO;
// List all constraints
SHOW CONSTRAINT INFO;
Create indexes on properties that appear frequently in WHERE clauses and MATCH patterns. Without an index, the engine must scan all nodes of a given label.
Transactions
xrayGraphDB supports both auto-commit transactions (single query) and explicit transactions (multi-query).
Auto-commit Transactions
Every query sent via session.run() runs in its own auto-commit transaction. If the query succeeds, it is committed. If it fails, it is rolled back.
Explicit Transactions
Use explicit transactions when you need to execute multiple queries atomically.
with driver.session() as session:
    tx = session.begin_transaction()
    try:
        tx.run("CREATE (a:Account {id: $id, balance: $bal})", id="A001", bal=1000)
        tx.run("CREATE (a:Account {id: $id, balance: $bal})", id="A002", bal=500)
        tx.commit()
    except Exception:
        tx.rollback()
        raise
// Explicit transaction commands (Bolt protocol)
BEGIN;
CREATE (n:Temp {data: "test"});
COMMIT;
// Or roll back
BEGIN;
CREATE (n:Temp {data: "test"});
ROLLBACK;
The built-in functions are queryable via CALL xg.builtin_functions(). The XRay-Vision plugin adds 68 more (queryable via CALL xg.xray_vision_builtin_functions()). Every entry below is available in the Community edition at no cost.
Aggregation Functions (7)
Aggregation functions compute single values from collections. Most work with pre-collected lists via WITH/collect(). The topK aggregation also supports streaming evaluation.
group_concat
group_concat(list, delimiter?) -> string
Concatenates list elements into a string with an optional delimiter. If no delimiter is provided, elements are concatenated without separation. Null values are skipped.
MATCH (n:Person) WITH collect(n.name) AS names RETURN group_concat(names, ", ") AS full_list
Time Complexity: O(N) where N is the list length.
See Also: collect(), split()
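The Python model below reproduces the documented group_concat behavior: an optional delimiter, and null values skipped. The function is illustrative only. It assumes each element is converted with str(), and the engine's formatting of numbers, booleans, or nested values may differ.

```python
def group_concat(values, delimiter=""):
    """Join list elements into one string, skipping nulls (None)."""
    # str() conversion is an assumption about how non-string elements are rendered
    return delimiter.join(str(v) for v in values if v is not None)
```

For example, group_concat(["Alice", None, "Bob"], ", ") returns "Alice, Bob".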
histogram
histogram(values, bins) -> list
Produces a histogram of values divided into the given number of bins and returns a list of bin counts. The value range is calculated automatically from the minimum and maximum of the input.
MATCH (flight:Flight) WITH collect(flight.altitude_ft) AS altitudes RETURN histogram(altitudes, 10) AS altitude_distribution
Parameters: values must be numeric (integer or float).
Output: List contains one count per bin. Bins are equal-width.
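The histogram description leaves two edge cases open: which bin the maximum value lands in, and what happens when every value is equal. The Python sketch below uses equal-width bins over [min, max] and places the maximum in the last bin. It also sends a zero-width range to the first bin. Both edge-case choices are assumptions, not documented engine behavior.

```python
def histogram(values, bins):
    """Equal-width histogram over [min(values), max(values)]; one count per bin."""
    if bins < 1:
        raise ValueError("bins must be >= 1")
    if not values:
        return [0] * bins
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        counts[0] = len(values)  # degenerate range: everything in the first bin (assumption)
        return counts
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(idx, bins - 1)] += 1  # the maximum value belongs to the last bin
    return counts
```

The integers 0 through 10 split into 5 bins of width 2 give [2, 2, 2, 2, 3]. The last bin holds 8, 9, and the maximum 10.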
percentilecont
percentilecont(list, percentile) -> float
Returns the linearly interpolated (continuous) percentile value. For p=0.5, returns the median with interpolation. The input list does not need to be sorted; it is sorted internally.
MATCH (a:Aircraft) WITH collect(a.speed_kt) AS speeds RETURN percentilecont(speeds, 0.95) AS speed_95th_percentile
Parameters: percentile must be between 0.0 and 1.0.
Interpolation: Uses linear interpolation between values.
See Also: percentiledisc(), quantile()
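A continuous percentile is usually defined by linear interpolation at position p * (n - 1) in the sorted list, which is the convention Neo4j's percentileCont uses. The Python sketch below assumes percentilecont follows the same convention. The function name is illustrative.

```python
import math


def percentile_cont(values, p):
    """Linearly interpolated percentile at position p * (n - 1) of the sorted list."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentile must be between 0.0 and 1.0")
    data = sorted(values)
    if not data:
        return None
    pos = p * (len(data) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    # Interpolate between the two neighboring ranks
    return data[lower] + (data[upper] - data[lower]) * frac
```

For [40, 10, 30, 20] at p = 0.5, the position is 1.5 in the sorted list [10, 20, 30, 40]. That lies halfway between 20 and 30, giving 25.0.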
percentiledisc
percentiledisc(list, percentile) -> number
Returns the nearest-value (discrete) percentile. Unlike percentilecont, this returns an actual value from the list without interpolation. For p=0.5, returns the median (middle value).
MATCH (s:Sensor)
WITH collect(s.temperature) AS temps
RETURN percentiledisc(temps, 0.5) AS median_temp,
       percentiledisc(temps, 0.75) AS q3_temp
Parameters: percentile must be between 0.0 and 1.0.
Return Type: Always an actual value from the input list.
quantile
quantile(list, q) -> number
Returns the q-th quantile value from a sorted list. Quantile q is expressed as a value between 0 and 1. For q=0.5, returns the median. Uses the nearest-rank method.
MATCH (metric:Metric)
WITH collect(metric.latency_ms) AS latencies
RETURN quantile(latencies, 0.99) AS p99_latency,
       quantile(latencies, 0.90) AS p90_latency
Parameters: q must be between 0.0 and 1.0.
Method: Uses nearest-rank (returns actual value, not interpolated).
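The nearest-rank method returns the element at 1-based rank ceil(q * n) of the sorted input, so the result is always an actual value from the list. The Python sketch below implements that rule. Mapping q = 0 to the smallest value is a common convention and an assumption here. This reference also does not state whether percentiledisc uses exactly the same rank rule.

```python
import math


def quantile_nearest_rank(values, q):
    """Nearest-rank quantile: the value at 1-based rank ceil(q * n) of the sorted list."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0.0 and 1.0")
    data = sorted(values)
    if not data:
        return None
    rank = max(1, math.ceil(q * len(data)))  # q = 0 maps to the smallest value
    return data[rank - 1]
```

For the eight values [3, 1, 4, 1, 5, 9, 2, 6], q = 0.5 selects rank 4 of the sorted list [1, 1, 2, 3, 4, 5, 6, 9]. The result is 3.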
topK
topK(list, k) -> list | topK(expr, k) as aggregation
Two modes: (1) As a function, it applies Space-Saving approximate top-K to a pre-collected list in O(N) time and O(K) memory. (2) As an aggregation, it streams values through Space-Saving during query execution, equivalent to ClickHouse topK(K)(column).
MATCH (n:Transaction) WITH collect(n.category) AS categories RETURN topK(categories, 5) AS top_5_categories
MATCH (n:Transaction) RETURN topK(n.category, 5) AS most_common_categories
Algorithm: Space-Saving probabilistic algorithm. Results are approximate but ranked by frequency.
Time Complexity: O(N) function mode, O(1) per value in aggregation mode.
Space Complexity: O(K) regardless of input size.
Return Type: List of up to K values, ordered by estimated frequency (highest first).
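Space-Saving keeps at most K counters. When an unseen value arrives and all counters are in use, it replaces the value with the smallest count, and the newcomer inherits that count plus one. The Python sketch below shows the textbook algorithm, not xrayGraphDB's implementation. Its linear min() scan keeps the code short. The O(1)-per-value bound quoted above relies on a specialized bucket structure, the stream-summary, instead.

```python
def space_saving_top_k(stream, k):
    """Approximate top-k via Space-Saving: at most k counters, counts may be overestimated."""
    if k < 1:
        raise ValueError("k must be >= 1")
    counts = {}  # monitored value -> estimated count
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < k:
            counts[item] = 1
        else:
            # Evict the value with the smallest counter; the newcomer inherits count + 1.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return sorted(counts, key=counts.get, reverse=True)
```

The guarantee that matters in practice is this: any value occurring more than N / K times is always among the returned values, although its count may be overestimated.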
topKExact
topKExact(list, k) -> list
Builds an exact frequency histogram, then partially sorts it to return the K most frequent values in O(N) + O(U log K) time, where U is the number of unique values. Use it when approximate top-K is not acceptable and results must be exact.
MATCH (event:Event) WITH collect(event.event_type) AS event_types RETURN topKExact(event_types, 10) AS exact_top_10_events
When to Use: When exact top-K rankings are required (e.g., compliance audits, financial reporting).
Time Complexity: O(N) to scan + O(U log K) to sort unique values.
Space Complexity: O(U) for the unique value histogram where U is unique count.
Return Type: List of K values sorted by descending frequency (exact).
See Also: topK() for approximate faster version
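An exact top-K needs the full histogram (O(U) memory) before it can select. In the Python sketch below, collections.Counter builds the histogram and heapq.nlargest does the O(U log K) selection that partial_sort performs in C++. The function name is illustrative. Tie-breaking between equally frequent values (first-seen order here) is an assumption and may differ in the engine.

```python
import heapq
from collections import Counter


def top_k_exact(values, k):
    """Exact k most frequent values, highest count first."""
    histogram = Counter(values)  # O(N) scan, O(U) memory for U unique values
    # Select the k largest counts in O(U log k); ties keep first-seen order.
    best = heapq.nlargest(k, histogram.items(), key=lambda item: item[1])
    return [value for value, _count in best]
```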
Aviation Functions (12)
Aviation functions provide computational support for aircraft tracking, performance analysis, and trajectory prediction. These are core to xrayGraphDB's SWIM (System Wide Information Management) integration and aviation-specific analytics.
angular_diff
angular_diff(angle_a, angle_b) -> float
Returns the shortest angular distance in degrees (0-180) between two bearings, handling wraparound at 360 degrees. The result is always the smaller of the two angles between the bearings.
MATCH (a:Aircraft) WITH a.heading_deg AS actual_hdg, a.planned_hdg AS planned_hdg RETURN angular_diff(actual_hdg, planned_hdg) AS heading_deviation
Range: Returns 0-180 degrees (always the shortest distance).
Use Case: Detect course deviations, verify navigation compliance.
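The wraparound handling reduces to taking the absolute difference modulo 360 and folding anything above 180 back. The Python sketch below assumes bearings in degrees. The function is illustrative, and it normalizes negative inputs and inputs above 360 through the modulo.

```python
def angular_diff(angle_a, angle_b):
    """Smallest angle in degrees (0-180) between two bearings, handling 360° wraparound."""
    diff = abs(angle_a - angle_b) % 360.0
    return 360.0 - diff if diff > 180.0 else diff
```

Both angular_diff(350, 10) and angular_diff(10, 350) return 20.0.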
bank_angle
bank_angle(speed_kt, turn_rate_dps) -> float
Returns the estimated bank angle in degrees from speed and turn rate, using the coordinated-turn relation bank ≈ atan(turn_rate * speed / g), with turn rate in radians per second and speed in meters per second.
MATCH (a:Aircraft)
RETURN bank_angle(a.ground_speed_kt, a.turn_rate_dps) AS estimated_bank_deg,
       a.altitude_ft,
       a.callsign
Parameters: speed_kt = ground speed in knots, turn_rate_dps = turn rate in degrees per second.
Physics: Derived from standard rate-of-turn formula.
Typical Values: 15-30 deg for normal turns, 40+ deg for aggressive maneuvers.
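The formula only holds in consistent units: speed in meters per second and turn rate in radians per second, divided by standard gravity. The Python sketch below applies those conversions. Like the function signature, it uses the ground speed passed in, although the physical relation strictly involves true airspeed (the two coincide in still air). Returning a negative bank for a negative (left) turn rate is an assumption.

```python
import math

KNOT_TO_MPS = 1852.0 / 3600.0  # 1 knot = 1 nautical mile (1852 m) per hour
G = 9.80665                    # standard gravity, m/s^2


def bank_angle(speed_kt, turn_rate_dps):
    """Coordinated-turn bank angle in degrees: atan(V * omega / g) in SI units."""
    v = speed_kt * KNOT_TO_MPS
    omega = math.radians(turn_rate_dps)
    return math.degrees(math.atan(v * omega / G))
```

At 250 kt and a standard-rate turn of 3 degrees per second, this gives roughly 34 degrees of bank.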
deviation_score
deviation_score(predicted, actual, tolerance, sigmoid_centers?, sigmoid_steepness?) -> float
Returns an RMS- or sigmoid-based deviation score between predicted and actual values. Scores range from 0 to 1, where 0 means perfect agreement. Optional sigmoid parameters control non-linear sensitivity.
MATCH (flight:Flight)-[:HAS_PREDICTION]->(pred:Prediction)
RETURN deviation_score(pred.altitude_ft, flight.altitude_ft, 500) AS altitude_score,
       deviation_score(pred.heading_deg, flight.heading_deg, 15) AS heading_score
Parameters: tolerance = acceptable deviation in the same units as predicted/actual.
Optional: sigmoid_centers and sigmoid_steepness for non-linear scoring.
Return Range: 0.0 (perfect) to 1.0 (completely wrong).
distance_3d
distance_3d(lat1, lon1, alt1_ft, lat2, lon2, alt2_ft) -> float
Returns the true 3D slant range in meters between two positions, combining great-circle distance with altitude difference. Critical for separation assurance.
MATCH (a1:Aircraft)-[:NEAR]-(a2:Aircraft)
WHERE a1.id < a2.id
RETURN distance_3d(a1.lat, a1.lon, a1.altitude_ft,
                   a2.lat, a2.lon, a2.altitude_ft) AS separation_m
Output Unit: Meters (always).
Horizontal: Uses haversine for lat/lon distance.
Vertical: Altitude difference is converted from feet to meters for 3D calc.
Safety Critical: 1000 ft = ~305 m vertical separation threshold in many airspaces.
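The Python sketch below follows the documented composition: haversine for the ground distance, the altitude difference converted from feet to meters, and the two combined with Pythagoras. The 6,371 km mean Earth radius is an assumption, since the engine may use a WGS84 ellipsoid model. Treating the arc length and height difference as perpendicular legs is accurate only for nearby aircraft, which is the separation-assurance case.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumption; the engine may differ)
FT_TO_M = 0.3048


def distance_3d(lat1, lon1, alt1_ft, lat2, lon2, alt2_ft):
    """Slant range in meters: haversine ground distance combined with altitude difference."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    horizontal = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    vertical = (alt2_ft - alt1_ft) * FT_TO_M
    return math.hypot(horizontal, vertical)
```

Two aircraft at the same latitude and longitude, 1,000 ft apart vertically, return 304.8 m, which matches the separation figure quoted above.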
envelope_score
envelope_score(value, mean, stddev) -> floatReturns a 0-1 score of how many standard deviations a value is from the mean. Normalized to 0=mean, 1=3*stddev away (outlier). Used for anomaly detection.
MATCH (flight:Flight)
WITH AVG(flight.ground_speed_kt) AS mean_speed,
STDEV(flight.ground_speed_kt) AS stddev_speed
MATCH (f:Flight)
RETURN f.callsign,
envelope_score(f.ground_speed_kt, mean_speed, stddev_speed) AS anomaly_score
Interpretation: 0.0 = at mean, 0.33 = 1 stddev away, 1.0 = 3+ stddev (extreme outlier).
Use Case: Flag unusual aircraft behavior (excessive speed, altitude changes).
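The interpretation above (0.33 per standard deviation, saturating at 3) suggests a clamped linear mapping of |z| / 3. The Python sketch below assumes exactly that mapping. The server's handling of stddev = 0 is not documented, so that branch is an assumption.

```python
def envelope_score(value: float, mean: float, stddev: float) -> float:
    """0 at the mean, rising linearly to 1 at three standard deviations, clamped at 1."""
    if stddev <= 0:
        # Assumption: with no spread, any deviation is treated as a full outlier.
        return 0.0 if value == mean else 1.0
    z = abs(value - mean) / stddev   # distance from the mean in standard deviations
    return min(z / 3.0, 1.0)

speeds = [440, 450, 455, 460, 520]
print([round(envelope_score(s, 450, 10), 2) for s in speeds])  # [0.33, 0.0, 0.17, 0.33, 1.0]
```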
heading_rate_of_change
heading_rate_of_change(headings_list, timestamps_list) -> float
Returns the average rate of heading change in degrees per second. Uses circular arithmetic when differentiating heading, so angular wraparound is handled correctly.
MATCH (a:Aircraft)
WITH a ORDER BY a.timestamp
WITH collect(a.heading_deg) AS headings,
collect(a.timestamp) AS times
RETURN heading_rate_of_change(headings, times) AS turn_rate_dps
Angular Wraparound: Correctly handles transitions like 350° -> 10° (20° turn, not 340°).
Return Unit: Degrees per second.
Typical Values: 0-5 dps for normal flight, 10+ dps for aggressive turns.
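One way to compute an average heading rate with wraparound is to sum the shortest signed difference between consecutive samples and divide by the elapsed time. The Python sketch below does that. It assumes time-ordered samples that are less than 180° apart, and it treats "average" as net turn divided by total elapsed time. The server may use a different averaging scheme.

```python
def signed_diff(a: float, b: float) -> float:
    """Shortest signed rotation from heading a to heading b, in [-180, 180)."""
    return (b - a + 180.0) % 360.0 - 180.0

def heading_rate_of_change(headings, timestamps):
    """Average turn rate in deg/s over time-ordered samples, unwrapping 359 -> 0 jumps."""
    if len(headings) != len(timestamps) or len(headings) < 2:
        raise ValueError("need two or more paired samples")
    total_turn = sum(signed_diff(h0, h1) for h0, h1 in zip(headings, headings[1:]))
    elapsed = timestamps[-1] - timestamps[0]
    return total_turn / elapsed

# 350 -> 10 degrees over 10 s is a 20 degree right turn, not 340 degrees left
print(heading_rate_of_change([350, 0, 10], [0, 5, 10]))  # 2.0
```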
project_3d
project_3d(lat, lon, alt_ft, heading, speed_kt, climb_fpm, turn_dps, accel_ktps, duration_sec) -> [lat, lon, alt, hdg, spd]
Projects a moving object forward along a 3D arc for the given duration. Returns the predicted position, heading, and speed. Core for trajectory prediction and conflict detection.
MATCH (a:Aircraft)
WITH a,
project_3d(a.lat, a.lon, a.altitude_ft,
a.heading_deg, a.ground_speed_kt, a.climb_rate_fpm,
a.turn_rate_dps, a.acceleration_ktps, 300) AS projected
RETURN a.callsign,
projected[0] AS predicted_lat,
projected[1] AS predicted_lon,
projected[2] AS predicted_alt_ft,
projected[3] AS predicted_heading,
projected[4] AS predicted_speed_kt
Parameters (in order):
- lat, lon: Current position (WGS84)
- alt_ft: Current altitude in feet
- heading: Current heading in degrees (0-359)
- speed_kt: Current ground speed in knots
- climb_fpm: Climb rate in feet per minute
- turn_dps: Turn rate in degrees per second
- accel_ktps: Acceleration in knots per second
- duration_sec: Projection time in seconds
Output: [lat, lon, alt_ft, heading_deg, speed_kt]
Physics: Integrates position + velocity + turn arc over duration.
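The integration described above can be approximated with small fixed time steps. This Python sketch uses explicit Euler steps on a spherical Earth, updating heading, speed, and altitude at a constant rate. It moves the position along the current heading on a local tangent plane. The server's actual integration scheme is not documented, so the step size and the Earth model are assumptions. The sketch breaks down near the poles.

```python
import math

KT_TO_MPS = 0.514444
EARTH_RADIUS_M = 6_371_000.0   # spherical Earth (assumption)

def project_3d(lat, lon, alt_ft, heading, speed_kt, climb_fpm,
               turn_dps, accel_ktps, duration_sec, step_sec=1.0):
    """Dead-reckon along a constant-turn, constant-acceleration path.

    Returns [lat, lon, alt_ft, heading_deg, speed_kt] after duration_sec,
    using fixed explicit-Euler steps rather than a closed-form arc.
    """
    t = 0.0
    while t < duration_sec:
        dt = min(step_sec, duration_sec - t)
        dist_m = speed_kt * KT_TO_MPS * dt
        hdg = math.radians(heading)
        # Move along the current heading on a local flat-earth tangent plane.
        lat += math.degrees(dist_m * math.cos(hdg) / EARTH_RADIUS_M)
        lon += math.degrees(dist_m * math.sin(hdg) /
                            (EARTH_RADIUS_M * math.cos(math.radians(lat))))
        alt_ft += climb_fpm * dt / 60.0          # feet per minute -> feet per step
        heading = (heading + turn_dps * dt) % 360.0
        speed_kt += accel_ktps * dt
        t += dt
    return [lat, lon, alt_ft, heading, speed_kt]

# One hour due north at 100 kt from the equator
p = project_3d(0.0, 0.0, 0.0, 0.0, 100, 0, 0.0, 0.0, 3600)
print(round(p[0], 3))  # ≈ 1.666 (about 100 NM north)
```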
rate_of_change
rate_of_change(values_list, timestamps_list) -> float
Returns the average rate of change (derivative) from paired value/time lists. Uses linear regression over the time series for robust slope estimation.
MATCH (a:Aircraft)
WITH collect(a.altitude_ft) AS altitudes,
collect(a.timestamp) AS times
RETURN rate_of_change(altitudes, times) * 60 AS climb_rate_fpm // assumes a.timestamp is in seconds
Calculation: Least-squares linear regression slope.
Unit: (units of values) per (unit of timestamps).
Example: If altitude is in feet and time in seconds, the result is in feet per second; multiply by 60 for feet per minute.
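The least-squares slope is cov(t, v) / var(t). The Python sketch below computes it directly, which is handy for checking a server result against raw samples. Raising on fewer than two points is an assumption, since the server's behavior for degenerate input is not documented.

```python
def rate_of_change(values, timestamps):
    """Least-squares slope of values against timestamps (units of values per unit of time)."""
    n = len(values)
    if n != len(timestamps) or n < 2:
        raise ValueError("need two or more paired samples")
    mean_t = sum(timestamps) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    var = sum((t - mean_t) ** 2 for t in timestamps)
    return cov / var   # ZeroDivisionError if all timestamps are equal

# Altitude samples (ft) every 10 s during a climb, with some noise
alts = [10000, 10330, 10680, 11000, 11340]
times = [0, 10, 20, 30, 40]
fps = rate_of_change(alts, times)
print(round(fps, 1), "ft/s =", round(fps * 60), "ft/min")  # 33.5 ft/s = 2010 ft/min
```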
signed_angular_diff
signed_angular_diff(angle_a, angle_b) -> float
Returns the signed shortest angular distance (-180 to +180 degrees) from angle_a to angle_b. Positive = clockwise, negative = counter-clockwise.
MATCH (a:Aircraft)
RETURN a.callsign,
signed_angular_diff(a.heading_deg, a.planned_heading) AS turn_required_deg
Return Range: -180 to +180 degrees.
Sign Convention: Positive = turn right (clockwise), negative = turn left (counter-clockwise).
Example: signed_angular_diff(10, 350) = -20 (turn left 20°)
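The sign convention above follows from wrapping b - a into a half-open 360° interval. This Python one-liner reproduces the documented example. It returns values in [-180, 180), so an exact reversal comes back as -180. Whether the server reports that case as +180 or -180 is not documented.

```python
def signed_angular_diff(angle_a: float, angle_b: float) -> float:
    """Signed shortest rotation from angle_a to angle_b; positive = clockwise (right)."""
    return (angle_b - angle_a + 180.0) % 360.0 - 180.0

print(signed_angular_diff(10, 350))  # -20.0 (turn left 20 degrees)
print(signed_angular_diff(350, 10))  # 20.0 (turn right 20 degrees)
```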
speed_from_components
speed_from_components(vN_mps, vE_mps, vUp_mps) -> [gs_kt, vs_fpm, total_kt]
Converts north/east/up velocity components to ground speed, vertical speed, and total speed. Inverse of velocity_3d(), except that heading is not returned.
MATCH (a:Aircraft)
WITH a,
speed_from_components(a.velocity_north_mps, a.velocity_east_mps, a.velocity_up_mps) AS speeds
RETURN a.callsign,
speeds[0] AS ground_speed_kt,
speeds[1] AS vertical_speed_fpm,
speeds[2] AS total_speed_kt
Input Units: All velocity components in meters per second (m/s).
Output: [ground_speed_kt, vertical_speed_fpm, total_speed_kt]
Conversion Factors: 1 m/s = 1.944 knots, 1 m/s = 196.85 feet/minute
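Using the conversion factors above, the three outputs can be reproduced in Python. Ground speed is the horizontal magnitude, vertical speed comes from the up component, and total speed is the full 3D magnitude. This is a client-side sketch, not the server's code.

```python
import math

MPS_TO_KT = 1.943844     # 1 m/s in knots
MPS_TO_FPM = 196.850394  # 1 m/s in feet per minute

def speed_from_components(v_n, v_e, v_up):
    """N/E/Up velocity (m/s) -> [ground speed kt, vertical speed fpm, total speed kt]."""
    ground_mps = math.hypot(v_n, v_e)
    total_mps = math.sqrt(v_n ** 2 + v_e ** 2 + v_up ** 2)
    return [ground_mps * MPS_TO_KT, v_up * MPS_TO_FPM, total_mps * MPS_TO_KT]

print([round(x, 1) for x in speed_from_components(120.0, 160.0, 5.08)])  # [388.8, 1000.0, 388.9]
```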
turn_radius
turn_radius(speed_kt, turn_rate_dps) -> float
Returns the turn radius in meters from speed and turn rate. Uses the standard aviation formula radius = speed / turn_rate, with unit conversions.
MATCH (a:Aircraft)
RETURN a.callsign,
a.ground_speed_kt,
a.turn_rate_dps,
turn_radius(a.ground_speed_kt, a.turn_rate_dps) AS radius_m
Formula: r = (speed_kt * 0.5144) / (turn_rate_dps * π/180)
Output Unit: Meters.
Typical Values: At a standard-rate turn (3°/s), about 590 m at 60 kt and about 1,180 m at 120 kt.
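The formula above translates directly into Python and reproduces the standard-rate figures: roughly 590 m at 60 kt and 1,180 m at 120 kt. A zero turn rate means straight flight with an infinite radius. In this sketch that case raises ZeroDivisionError, and how the server handles it is not documented.

```python
import math

KT_TO_MPS = 0.514444

def turn_radius(speed_kt: float, turn_rate_dps: float) -> float:
    """Turn radius in metres: r = V / omega, with V in m/s and omega in rad/s."""
    return (speed_kt * KT_TO_MPS) / math.radians(turn_rate_dps)

for kt in (60, 120, 250):
    print(kt, "kt at 3 deg/s ->", round(turn_radius(kt, 3)), "m")
```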
velocity_3d
velocity_3d(ground_speed_kt, heading_deg, climb_rate_fpm) -> [vN, vE, vUp]
Decomposes speed and heading into North/East/Up velocity components in m/s. Inverse of speed_from_components().
MATCH (a:Aircraft)
WITH a,
velocity_3d(a.ground_speed_kt, a.heading_deg, a.climb_rate_fpm) AS velocity_components
RETURN a.callsign,
velocity_components[0] AS velocity_north_mps,
velocity_components[1] AS velocity_east_mps,
velocity_components[2] AS velocity_up_mps
Input Units: ground_speed_kt = knots, heading_deg = degrees (0-359), climb_rate_fpm = feet per minute.
Output Units: All components in meters per second (m/s).
Calculation: Decomposes horizontal (heading) into N/E, vertical into Up.
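The decomposition is a rotation of ground speed by heading, measured clockwise from north, plus a feet-per-minute to m/s conversion for the vertical part. A Python sketch of the same arithmetic follows. It assumes true heading and the conversion constants shown.

```python
import math

KT_TO_MPS = 0.514444
FPM_TO_MPS = 0.00508   # 1 ft/min = 0.3048 m / 60 s

def velocity_3d(ground_speed_kt, heading_deg, climb_rate_fpm):
    """Ground speed + heading + climb rate -> [vN, vE, vUp] in m/s."""
    v = ground_speed_kt * KT_TO_MPS
    hdg = math.radians(heading_deg)   # 0 deg = north, 90 deg = east
    return [v * math.cos(hdg), v * math.sin(hdg), climb_rate_fpm * FPM_TO_MPS]

v_n, v_e, v_up = velocity_3d(250, 90, 1500)
print(round(v_n, 3), round(v_e, 3), round(v_up, 3))  # 0.0 128.611 7.62
```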
Bitwise Functions (7)
Bitwise functions operate on integer representations at the bit level. Useful for flags, masks, permission systems, and low-level data encoding.
bit_and
bit_and(a, b) -> integer
Returns the bitwise AND of two integers. Each bit position in the result is 1 only if both input bits at that position are 1.
MATCH (u:User)
WITH u, u.permissions AS perms,
7 AS ADMIN_MASK // Binary: 0111
RETURN u.username,
bit_and(perms, ADMIN_MASK) AS admin_flags
Truth Table: 0&0=0, 0&1=0, 1&0=0, 1&1=1
Use Case: Extract subset of flags (e.g., check if specific bit is set).
Example: bit_and(15, 12) = bit_and(0b1111, 0b1100) = 0b1100 = 12
bit_not
bit_not(x) -> integer
Returns the bitwise NOT (complement) of an integer. Flips all bits: 0 becomes 1, 1 becomes 0. For signed integers, this is equivalent to -(x+1).
MATCH (f:Flag)
RETURN f.id,
f.status_bits,
bit_not(f.status_bits) AS inverted_bits
Note: In two's complement (standard on most systems), bit_not(x) = -(x+1).
Example: bit_not(5) = bit_not(0b0101) = ...11111010 = -6 (in 64-bit two's complement)
bit_or
bit_or(a, b) -> integer
Returns the bitwise OR of two integers. Each bit position in the result is 1 if at least one of the input bits at that position is 1.
MATCH (app:Application)
WITH app.read_perms AS read_bits,
app.write_perms AS write_bits
RETURN app.name,
bit_or(read_bits, write_bits) AS combined_perms
Truth Table: 0|0=0, 0|1=1, 1|0=1, 1|1=1
Use Case: Combine permission flags (set union).
Example: bit_or(12, 10) = bit_or(0b1100, 0b1010) = 0b1110 = 14
bit_shift_left
bit_shift_left(x, n) -> integer
Shifts the bits of x to the left by n positions. Equivalent to multiplying by 2^n. New bits on the right are filled with zeros.
MATCH (data:Bitfield)
RETURN data.id,
data.mask,
bit_shift_left(data.mask, 8) AS shifted_left_8
Arithmetic: bit_shift_left(x, n) = x * 2^n
Example: bit_shift_left(5, 2) = bit_shift_left(0b0101, 2) = 0b10100 = 20
Overflow: Bits shifted past the integer width are discarded.
bit_shift_right
bit_shift_right(x, n) -> integer
Shifts the bits of x to the right by n positions. Equivalent to floor division by 2^n. Behavior depends on signedness: a logical (unsigned) shift fills vacated bits with zeros, while an arithmetic (signed) shift fills them with the sign bit.
MATCH (sensor:Sensor)
WITH sensor, sensor.raw_value AS raw
RETURN sensor.id,
raw,
bit_shift_right(raw, 4) AS high_nibble
Arithmetic: bit_shift_right(x, n) = floor(x / 2^n); for non-negative x this is ordinary integer division.
Example: bit_shift_right(20, 2) = 20/4 = 5
Sign Extension: For negative numbers, the sign bit is extended (arithmetic right shift).
bit_xor
bit_xor(a, b) -> integer
Returns the bitwise XOR (exclusive or) of two integers. Each bit position in the result is 1 if the input bits at that position differ (one is 1, the other is 0).
MATCH (packet:Packet)
RETURN packet.id,
bit_xor(packet.header, packet.checksum) AS verification
Truth Table: 0^0=0, 0^1=1, 1^0=1, 1^1=0
Use Case: Detect differences, toggle bits, checksums, encryption.
Property: a ^ a = 0, a ^ 0 = a (XOR is self-inverse).
Example: bit_xor(12, 10) = bit_xor(0b1100, 0b1010) = 0b0110 = 6
popcount
popcount(x) -> integer
Returns the number of set bits (1s) in an integer, also known as the population count or Hamming weight. Used for counting set flags and measuring bit density.
MATCH (perms:Permissions)
RETURN perms.username,
perms.flags,
popcount(perms.flags) AS number_of_permissions
Algorithm: Hardware-accelerated POPCNT instruction where available.
Time Complexity: O(1) on modern CPUs.
Example: popcount(15) = popcount(0b1111) = 4
Use Case: Count enabled features, measure permission breadth, find sparsity.
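The semantics above (two's complement, discarded overflow, sign-extending right shift) can be mirrored client-side, for example to precompute masks before sending a query. Python integers are unbounded, so this sketch wraps every result to signed 64 bits. The 64-bit width is an assumption taken from the bit_not example.

```python
MASK64 = (1 << 64) - 1

def to_i64(x: int) -> int:
    """Wrap a Python int to signed 64-bit two's complement."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def bit_and(a: int, b: int) -> int:
    return to_i64(a & b)

def bit_or(a: int, b: int) -> int:
    return to_i64(a | b)

def bit_xor(a: int, b: int) -> int:
    return to_i64(a ^ b)

def bit_not(x: int) -> int:
    return to_i64(~x)              # equals -(x + 1)

def bit_shift_left(x: int, n: int) -> int:
    return to_i64(x << n)          # bits pushed past bit 63 are discarded

def bit_shift_right(x: int, n: int) -> int:
    return to_i64(x) >> n          # Python's >> on ints is arithmetic (sign-extending)

def popcount(x: int) -> int:
    return bin(x & MASK64).count("1")   # counts set bits of the 64-bit pattern

print(bit_not(5), bit_shift_right(-20, 2), popcount(-1))  # -6 -5 64
```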
Compatibility Functions (4)
Compatibility functions help developers migrate from SQL and other systems. They document how to express common patterns in xrayGraphDB's Cypher dialect.
GROUP_BY_TIME_BUCKET
SQL: GROUP BY DATE_TRUNC('hour', ts) -> Cypher: RETURN toStartOfHour(n.ts) AS hour, count(n)
Time-bucketed aggregation: use toStartOfHour/toStartOfDay/toStartOfMinute. GROUP BY is implicit, via RETURN plus an aggregation. toStartOfHour is the equivalent of MySQL DATE_FORMAT(ts, '%Y-%m-%d %H:00:00'), and ClickHouse's toStartOfHour uses identical syntax.
SELECT DATE_TRUNC('hour', created_at) AS hour, COUNT(*) AS cnt
FROM events
GROUP BY DATE_TRUNC('hour', created_at)
ORDER BY hour DESC
MATCH (e:Event) RETURN toStartOfHour(e.created_at) AS hour, count(e) AS cnt ORDER BY hour DESC
Time Functions Available:
- toStartOfMinute() - Truncate to minute boundary
- toStartOfHour() - Truncate to hour boundary (HH:00:00)
- toStartOfDay() - Truncate to midnight
Aggregation: GROUP BY is implicit when using RETURN with aggregation functions (count, sum, avg, etc.).
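Bucketed aggregation amounts to truncating each timestamp and counting per truncated value. The Python sketch below reproduces the Cypher query above on a few in-memory datetimes, which can be useful when verifying a migration. The sample events are illustrative, and truncation is done in each datetime's own timezone.

```python
from collections import Counter
from datetime import datetime, timezone

def to_start_of_hour(dt: datetime) -> datetime:
    """Truncate to HH:00:00, keeping the datetime's own timezone."""
    return dt.replace(minute=0, second=0, microsecond=0)

events = [
    datetime(2026, 4, 15, 14, 5, tzinfo=timezone.utc),
    datetime(2026, 4, 15, 14, 59, 59, tzinfo=timezone.utc),
    datetime(2026, 4, 15, 15, 0, 1, tzinfo=timezone.utc),
]
# Same shape as: RETURN toStartOfHour(e.created_at) AS hour, count(e) AS cnt ORDER BY hour DESC
counts = Counter(to_start_of_hour(ts) for ts in events)
for hour, cnt in sorted(counts.items(), reverse=True):
    print(hour.isoformat(), cnt)
```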
MOVING_AVERAGE_WORKAROUND
SQL: AVG(value) OVER (ORDER BY ts ROWS 4 PRECEDING) -> Cypher: WITH collect(n.value) AS vals RETURN moving_avg(vals, 5)
Convert a SQL moving average to Cypher in three steps: (1) MATCH and ORDER BY to get rows in order, (2) collect() them into a list, (3) apply moving_avg(list, window_size). The function returns a list of averages, one per position. For MySQL, this replaces window functions entirely. For ClickHouse, it replaces groupArrayMovingAvg().
SELECT ts, value,
AVG(value) OVER (ORDER BY ts ROWS 4 PRECEDING) AS moving_avg_5
FROM metrics
ORDER BY ts
MATCH (m:Metric)
WITH m.timestamp AS ts, m.value AS val
ORDER BY ts
WITH collect(val) AS values
WITH moving_avg(values, 5) AS moving_averages
UNWIND moving_averages AS avg_val
RETURN avg_val
Pattern: (1) MATCH + ORDER BY to get sorted data, (2) collect() into list, (3) apply window function, (4) UNWIND to flatten results.
Window Size: moving_avg(list, 5) = 5-element window.
Limitation: No true SQL window functions (OVER/PARTITION BY) in Cypher—this workaround is the idiomatic approach.
TOPK_AGGREGATION
SQL: SELECT value, COUNT(*) FROM t GROUP BY value ORDER BY COUNT(*) DESC LIMIT K
xrayGraphDB supports topK natively as an aggregation: RETURN topK(n.category, 10) AS top_categories. This replaces the SQL GROUP BY + ORDER BY + LIMIT pattern. It is also available as a function: WITH collect(n.category) AS cats RETURN topK(cats, 10). ClickHouse equivalent: topK(K)(column).
SELECT category, COUNT(*) as freq FROM transactions GROUP BY category ORDER BY freq DESC LIMIT 10
MATCH (t:Transaction) RETURN topK(t.category, 10) AS top_10_categories
MATCH (t:Transaction) WITH collect(t.category) AS categories RETURN topK(categories, 10) AS top_10_categories
Two Modes: Use aggregation mode for streaming efficiency, function mode for pre-collected lists.
Algorithm: Space-Saving probabilistic top-K (approximate results).
For Exact Results: Use topKExact() instead.
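To see why topK() can differ from topKExact(), here is a textbook Space-Saving sketch in Python. It keeps a fixed number of counters. When a new item arrives and every counter is in use, the smallest counter is evicted and the newcomer inherits that count plus one. Tracked counts can therefore overestimate but never underestimate, and any item with true frequency above N / capacity is guaranteed to be tracked. The server's counter capacity and tie-breaking are not documented here, so `capacity` is a hypothetical parameter.

```python
from typing import Dict, Hashable, Iterable, List

def space_saving(stream: Iterable[Hashable], capacity: int) -> Dict[Hashable, int]:
    """Approximate frequency counts using at most `capacity` counters."""
    counts: Dict[Hashable, int] = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count + 1.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts

def top_k(stream: Iterable[Hashable], k: int, capacity: int) -> List[Hashable]:
    counts = space_saving(stream, capacity)
    return sorted(counts, key=counts.get, reverse=True)[:k]

stream = ["A"] * 50 + ["B"] * 30 + ["C"] * 10 + list("DEFGHIJ")
print(top_k(stream, 2, capacity=4))  # ['A', 'B']
```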
WINDOW_FUNCTION_WORKAROUND
SQL: AVG(val) OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
xrayGraphDB does not have SQL-style window functions (OVER/PARTITION BY). Workaround: collect values first, then apply the scalar function. Example: WITH collect(n.value) AS vals RETURN moving_avg(vals, 5) AS averages. For partitioned windows: MATCH (n:Metric) WITH n ORDER BY n.timestamp WITH n.category AS cat, collect(n.value) AS vals RETURN cat, moving_avg(vals, 5). For sliding_window: WITH collect(n.value) AS vals RETURN sliding_window(vals, 5) AS windows.
SELECT id, category, value,
AVG(value) OVER (PARTITION BY category ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_avg_5,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY ts) AS row_num
FROM metrics
MATCH (m:Metric)
WITH m ORDER BY m.timestamp
WITH m.category AS category, collect(m.value) AS values
WITH category, moving_avg(values, 5) AS moving_averages
RETURN category, moving_averages
Core Pattern: No direct OVER/PARTITION BY syntax. Instead: (1) MATCH + collect() to gather rows, (2) Apply aggregate function to the list, (3) RETURN results.
For Ranking: Use explicit list positions or UNWIND + index.
For Partitioning: Add WITH clause with the partition key before collect().
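When checking migrated results, it helps to have a reference for what the SQL window actually computes. This Python sketch implements AVG(value) OVER (PARTITION BY key ORDER BY ts ROWS BETWEEN window-1 PRECEDING AND CURRENT ROW). In that definition, the first rows of each partition average over fewer values. Whether moving_avg() emits such partial windows at the start of the list is not documented here, so compare the edge positions carefully.

```python
from collections import defaultdict

def partitioned_trailing_avg(rows, window):
    """rows: iterable of (partition_key, ts, value) -> {key: [trailing averages]}."""
    groups = defaultdict(list)
    for key, ts, value in rows:
        groups[key].append((ts, value))
    result = {}
    for key, items in groups.items():
        items.sort(key=lambda item: item[0])   # ORDER BY ts before collect()
        vals = [v for _, v in items]
        averages = []
        for i in range(len(vals)):
            chunk = vals[max(0, i - window + 1): i + 1]   # up to `window` rows ending here
            averages.append(sum(chunk) / len(chunk))
        result[key] = averages
    return result

rows = [("a", 2, 20.0), ("a", 1, 10.0), ("a", 3, 30.0), ("b", 1, 5.0)]
print(partitioned_trailing_avg(rows, 2))  # {'a': [10.0, 15.0, 25.0], 'b': [5.0]}
```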
Conditional Functions (7)
Conditional functions evaluate logic and return different values based on conditions. These are essential for expression-based filtering, transformation, and null handling.
case_when
case_when(condition, then_value, else_value) -> value
Returns then_value if condition is true, else_value otherwise. A simple ternary operator, equivalent to SQL CASE WHEN ... THEN ... ELSE ... END.
MATCH (a:Aircraft)
RETURN a.callsign,
case_when(a.altitude_ft > 35000, 'HIGH', 'LOW') AS altitude_category,
case_when(a.ground_speed_kt > 450, 'FAST', 'NORMAL') AS speed_category
Syntax: Simple ternary; no chaining for multiple conditions.
Null Handling: If condition is NULL, returns else_value.
Type Coercion: then_value and else_value should be same type.
For Multiple Conditions: Nest case_when() calls: case_when(a, v1, case_when(b, v2, v3))
choose
choose(index, values...) -> value
Returns the value at the given one-based index from the arguments. Index must be between 1 and the number of values; returns null if the index is out of range.
MATCH (a:Aircraft)
WITH a,
case_when(a.status = 'LANDED', 1, case_when(a.status = 'AIRBORNE', 2, 3)) AS status_code
RETURN a.callsign,
choose(status_code, 'Landed', 'Airborne', 'Unknown') AS status_name
Indexing: One-based (index 1 = first value, index 2 = second value, etc.).
Out of Range: Returns null if index < 1 or index > number of values.
Type Consistency: All values should be the same type.
Use Case: Switch/case-like behavior for mapping numeric codes to labels.
coalesce_chain
coalesce_chain(values...) -> value
Returns the first non-null value from the arguments. Scans left to right and returns as soon as it finds a non-null value. Known as COALESCE in SQL.
MATCH (a:Aircraft)
RETURN a.callsign,
coalesce_chain(a.icao_code, a.iata_code, a.flight_id) AS identifier,
coalesce_chain(a.operator_name, a.airline, 'Unknown') AS operator
Short-Circuit: Stops evaluating as soon as a non-null value is found.
All Nulls: Returns null if all arguments are null.
Equivalent SQL: COALESCE(val1, val2, val3)
Use Case: Provide fallbacks for missing data.
greatest
greatest(values...) -> value
Returns the largest of all supplied values. Ignores null values (if all values are null, returns null). Works with numbers, strings, dates, and other comparable types.
MATCH (a:Aircraft)
RETURN a.callsign,
greatest(a.max_speed_kt, a.cruise_speed_kt, a.approach_speed_kt) AS max_capability,
greatest(a.altitude_ft, a.service_ceiling_ft) AS altitude_record
Null Handling: Skips null values. If all are null, returns null.
Type Consistency: All values should be comparable (same type or numeric).
String Comparison: Uses lexicographic (alphabetical) ordering.
Date Comparison: Uses chronological ordering.
ifnull
ifnull(value, default) -> value
Returns value if it is not null, otherwise returns default. Two-argument version of coalesce_chain(). Equivalent to SQL IFNULL() or COALESCE(value, default).
MATCH (a:Aircraft)
RETURN a.callsign,
ifnull(a.registration, 'N/A') AS registration,
ifnull(a.manufacturer, 'Unknown') AS manufacturer,
ifnull(a.weight_lbs, 0) AS weight
Syntax: Two arguments only. For more, use coalesce_chain().
Type Coercion: value and default should be the same type (or compatible).
Equivalent SQL: IFNULL(value, default) in MySQL, COALESCE(value, default) in standard SQL.
least
least(values...) -> value
Returns the smallest of all supplied values. Ignores null values (if all values are null, returns null). Works with numbers, strings, dates, and other comparable types.
MATCH (a:Aircraft)
RETURN a.callsign,
least(a.min_speed_kt, a.landing_speed_kt, a.stall_speed_kt) AS min_safe_speed,
least(a.created_at, a.first_flight_date) AS earliest_date
Null Handling: Skips null values. If all are null, returns null.
Type Consistency: All values should be comparable.
String Comparison: Lexicographic ordering; returns the alphabetically first string.
Use Case: Find minimum threshold, earliest date, smallest ID.
nullif
nullif(a, b) -> value|null
Returns null if a equals b, otherwise returns a. Opposite of ifnull(). Useful for turning specific values (e.g., unknown placeholders) into nulls.
MATCH (a:Aircraft)
RETURN a.callsign,
nullif(a.manufacturer, 'Unknown') AS known_manufacturer,
nullif(a.registration, 'N/A') AS valid_registration,
nullif(a.altitude_ft, 0) AS non_zero_altitude
Comparison: Uses strict equality (a = b).
Use Case: Convert placeholder/sentinel values (Unknown, N/A, 0) to null.
Side Effect: Allows filtering on null: ... WHERE nullif(col, 'Unknown') IS NULL
Round Trip: ifnull(nullif(a, sentinel), sentinel) returns a unchanged whenever a is not null; a null input comes back as sentinel.
Data Integrity (2)
Functions for detecting and resolving data quality issues in the graph.
xray.dedup
CALL xray.dedup(label, key_property) YIELD duplicates_found, vertices_deleted, vertices_kept
Finds and removes duplicate vertices that share the same label and key property value. Keeps the lowest GID (oldest) in each group.
CALL xray.dedup('Aircraft', 'icao24')
YIELD duplicates_found, vertices_deleted, vertices_kept
RETURN duplicates_found, vertices_deleted, vertices_kept
Use Case: Remove duplicate aircraft identities in ADS-B streams where the same ICAO24 code may have been loaded multiple times.
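The keep-lowest-GID rule can be previewed offline, for example against an export, before running the procedure. This Python sketch groups (gid, key) pairs and reports which GIDs a run would keep and delete. Null keys and the exact meaning of duplicates_found are not covered by the description above, so the sketch does not model them.

```python
from collections import defaultdict

def dedup_plan(vertices):
    """vertices: iterable of (gid, key_value). Returns (kept_gids, deleted_gids).

    Groups by key and keeps the lowest GID (oldest) in each group.
    """
    by_key = defaultdict(list)
    for gid, key in vertices:
        by_key[key].append(gid)
    kept, deleted = [], []
    for gids in by_key.values():
        gids.sort()
        kept.append(gids[0])
        deleted.extend(gids[1:])
    return sorted(kept), sorted(deleted)

aircraft = [(7, "a1b2c3"), (3, "a1b2c3"), (5, "4ca7b1"), (9, "a1b2c3")]
print(dedup_plan(aircraft))  # ([3, 5], [7, 9])
```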
xray.upsert_behavior
BULK_UPSERT_NODES (0x27) via xrayProtocol. Upsert uses a 3-tier atomic lookup: (1) GID cache, O(1); (2) label-property index, O(1); (3) full scan, O(N). A property index is auto-created on the first upsert. BULK_INSERT (0x21) allows duplicates.
BULK_UPSERT_NODES (0x27):
[u32 rows=100] [u16 cols=3]
[u8 type=0x05][u32 len=6]["icao24"] // key_column (upsert key)
[u8 type=0x05][u32 len=8]["callsign"]
[u8 type=0x04][u32 len=0] // altitude
// then row data
["ABC123"] ["N123US"] [35000.0]
["ABC124"] ["N456UA"] [28500.0]
Per-property type tags (row-oriented value encoding): 0=String, 1=Int64, 2=Double, 3=Bool, 4=Null. As of v5.0, two recursive nested tags are added when the session negotiates CAP_TYPED_NESTED in HELLO: 5=List (u32 count + recursive typed values) and 6=Map<String,*> (u32 count, then per entry u16 key_len + key + recursive typed value). The v5 extension closes a gap that prevented populating the spec §7.1 envelope columns (source_refs and disclosure_notes, both List<String>) through this opcode. Pre-v5 clients had to omit those columns, which triggered rejection under --xg-envelope-enforcement=on. Pre-CAP_TYPED_NESTED behavior is preserved verbatim: byte values >4 still fall through to the legacy all-string length-prefix path.
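The tag table above maps naturally onto a small recursive encoder. This Python sketch shows how a client might serialize one property value, including a List<String> envelope column such as source_refs. Several details are assumptions not confirmed by the text above: little-endian byte order, UTF-8 strings and map keys, and a one-byte Bool payload. Confirm them against the xrayProtocol spec before putting this on the wire. The sketch also raises for nested values when CAP_TYPED_NESTED was not negotiated, instead of modelling the legacy all-string fallback.

```python
import struct

# Tag values from the table above. Byte order is an ASSUMPTION (little-endian).
STRING, INT64, DOUBLE, BOOL, NULL, LIST, MAP = range(7)

def encode_value(value, typed_nested: bool = True) -> bytes:
    """Encode one property value as [u8 tag][payload]."""
    if value is None:
        return bytes([NULL])
    if isinstance(value, bool):      # check before int: bool is an int subclass
        return bytes([BOOL, 1 if value else 0])
    if isinstance(value, int):
        return bytes([INT64]) + struct.pack("<q", value)
    if isinstance(value, float):
        return bytes([DOUBLE]) + struct.pack("<d", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return bytes([STRING]) + struct.pack("<I", len(raw)) + raw
    if not typed_nested:
        raise ValueError("List/Map values need CAP_TYPED_NESTED negotiated in HELLO")
    if isinstance(value, list):
        out = bytes([LIST]) + struct.pack("<I", len(value))
        return out + b"".join(encode_value(v, typed_nested) for v in value)
    if isinstance(value, dict):
        out = bytes([MAP]) + struct.pack("<I", len(value))
        for key, v in value.items():
            kraw = key.encode("utf-8")
            out += struct.pack("<H", len(kraw)) + kraw + encode_value(v, typed_nested)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")

# source_refs is a List<String> envelope column
print(encode_value(["ads-b"]).hex())  # 050100000000050000006164732d62
```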
Note: First upsert column is the unique key. Index is auto-created. Use BULK_INSERT for initial load when duplicates are acceptable.
Data Retention (1)
Functions for managing data lifecycle and automatic deletion policies.
ttl.delete_expired
CALL ttl.delete_expired(label, timestamp_prop, max_age_days, exempt_prop?) YIELD deleted_count, scanned_count, exempt_count
Time-based data retention with optional exemption. Scans vertices with the given label and detach-deletes those whose timestamp_prop is older than max_age_days. If exempt_prop is provided, vertices with that property set to true are preserved. Equivalent to a MySQL EVENT running DELETE ... WHERE created_at < DATE_SUB(NOW(), INTERVAL N DAY).
CALL ttl.delete_expired('AdsReport', 'timestamp', 30, 'is_archived')
YIELD deleted_count, scanned_count, exempt_count
RETURN 'Deleted ' + deleted_count + ' reports older than 30 days. Scanned: ' + scanned_count + ', Exempt: ' + exempt_count
Use Case: Automatically purge ADS-B reports older than 30 days while keeping archived reports. Run periodically to maintain storage efficiency.
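For dry runs or capacity planning, the retention rule can be simulated on exported vertices. This Python sketch makes several assumptions not stated above. It treats timestamps as epoch seconds and counts every vertex as scanned. It also counts a vertex as exempt only when it has expired and carries is_archived = true. The `id` field and the plan_ttl name are hypothetical.

```python
import time

SECONDS_PER_DAY = 86_400

def plan_ttl(vertices, max_age_days, now_s=None):
    """vertices: dicts with 'id', 'timestamp' (epoch seconds), optional 'is_archived'.

    Returns (ids_to_delete, scanned_count, exempt_count).
    """
    now_s = time.time() if now_s is None else now_s
    cutoff = now_s - max_age_days * SECONDS_PER_DAY
    delete_ids, scanned, exempt = [], 0, 0
    for v in vertices:
        scanned += 1
        if v["timestamp"] >= cutoff:
            continue                        # still inside the retention window
        if v.get("is_archived") is True:
            exempt += 1                     # expired, but protected by the exemption flag
            continue
        delete_ids.append(v["id"])
    return delete_ids, scanned, exempt

now = 100 * SECONDS_PER_DAY
reports = [
    {"id": 1, "timestamp": now - 40 * SECONDS_PER_DAY},
    {"id": 2, "timestamp": now - 40 * SECONDS_PER_DAY, "is_archived": True},
    {"id": 3, "timestamp": now - 5 * SECONDS_PER_DAY},
]
print(plan_ttl(reports, 30, now_s=now))  # ([1], 3, 1)
```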
Database Management (7)
Multi-database administration, isolation, and connection routing.
CREATE DATABASE
CREATE DATABASE <name>
Creates a new named database with its own isolated storage, WAL, indexes, and data directory. Each database has completely separate vertices, edges, and properties. Requires an Enterprise license.
CREATE DATABASE swim
Effect: Creates a new database "swim" with isolated storage. Clients can then select it with USE DATABASE swim.
DROP DATABASE
DROP DATABASE <name>
Permanently deletes a named database and all its data. The default database cannot be dropped. Requires an Enterprise license.
DROP DATABASE swim
Warning: This is irreversible. All data in the database is permanently deleted. The default 'xraygraphdb' database cannot be dropped.
MULTI_DATABASE_BOLT
Bolt Multi-Database Usage
Python: session = driver.session(database='swim'). Node.js: session = driver.session({ database: 'swim' }). Java: session = driver.session(SessionConfig.forDatabase("swim")). If no database is specified, the default 'xraygraphdb' database is used.
from neo4j import GraphDatabase
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('user', 'password'))
# Query the swim database; the session is closed when the block exits
with driver.session(database='swim') as swim_session:
    result = swim_session.run('MATCH (a:Aircraft) RETURN count(a) AS count')
    print(result.single()['count'])
# Query the default database (xraygraphdb)
with driver.session() as default_session:
    result = default_session.run('MATCH (n) RETURN count(n) AS total')
    print(result.single()['total'])
driver.close()
Multi-DB Architecture: Each database is fully isolated. The same driver can manage multiple databases by opening different sessions.
MULTI_DATABASE_OVERVIEW
Multi-Database Architecture
Each database has completely isolated storage. Queries on one database CANNOT see or modify data in another database. Use cases: isolating SWIM aviation data from code intelligence, per-tenant data separation, staging vs production. SWIM uses the 'swim' database, XRay-Vision uses the 'xray' database, and the default database is 'xraygraphdb'.
-- List all available databases
SHOW DATABASES
-- Use specific database (Bolt: pass in session config)
-- Via xrayProtocol: send database name in HELLO frame
-- Example: Switch to swim database for aviation data
USE DATABASE swim
MATCH (a:Aircraft {icao24: 'ABC123'}) RETURN a
-- Different session: switch to xray database for code intelligence
USE DATABASE xray
MATCH (m:Method {name: 'parseAltitude'}) RETURN m
Isolation Guarantee: No cross-database queries. Each database is independently indexed, cached, and transactional.
MULTI_DATABASE_XRAYPROTOCOL
xrayProtocol Multi-Database Usage
Pass the database name in the HELLO message: [u16 version][u16 caps][u32 token_len][token][u32 db_len][db_name]. The server routes all operations on that connection to the specified database. Empty db_name = default database. In v5.0.0-alpha the database field is mandatory: HELLO frames that omit it are rejected with ERROR(HELLO_REJECTED).
-- HELLO frame selecting 'swim' database
[u32 payload_len=50]
[u8 msg_type=0x01] // HELLO
[u8 flags=0x00]
[u16 query_id=0x0000]
[u16 version=1]
[u16 capabilities=0x0003]
[u32 token_len=9]["user:pass"]
[u32 db_len=4]["swim"] // Route to swim database
-- All subsequent EXECUTE frames on this connection target 'swim'
[u32 payload_len=...]
[u8 msg_type=0x03] // EXECUTE
[u8 flags=0x00]
[u16 query_id=0x0001]
[u8 language=0] // Cypher
[u32 query_len=40] // MATCH (a:Aircraft) RETURN COUNT(a) ...
Connection Affinity: Once a connection selects a database in HELLO, all queries on that connection route to that database. No per-query override.
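The HELLO body layout listed above can be packed with Python's struct module. This sketch covers only the body fields: version, capabilities, token, and database. It omits the frame header (payload_len, msg_type, flags, query_id) because its byte accounting is not spelled out here. Little-endian byte order and UTF-8 text are assumptions, and hello_body is a hypothetical helper name.

```python
import struct

def hello_body(version: int, capabilities: int, token: str, database: str) -> bytes:
    """[u16 version][u16 caps][u32 token_len][token][u32 db_len][db_name].

    Byte order (little-endian) and UTF-8 text are assumptions.
    """
    tok = token.encode("utf-8")
    db = database.encode("utf-8")
    return (struct.pack("<HHI", version, capabilities, len(tok)) + tok
            + struct.pack("<I", len(db)) + db)

body = hello_body(1, 0x0003, "user:pass", "swim")
print(len(body))  # 25
```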
SHOW DATABASES
SHOW DATABASES
Lists all databases, returning the name of each. The default database is named 'xraygraphdb'. Additional databases are created with CREATE DATABASE.
SHOW DATABASES
Output: A result table with column 'name' listing all available databases, for example 'xraygraphdb', 'swim', and 'xray' on a deployment where swim and xray have been created.
USE DATABASE
USE DATABASE <name>
Switches the current session to a different database. All subsequent queries execute against the selected database. Via Bolt: pass the database name in the session options. Via xrayProtocol: pass the database name in the HELLO message.
-- Switch to swim database
USE DATABASE swim
-- All subsequent queries operate on swim database
MATCH (a:Aircraft) RETURN COUNT(a) AS aircraft_count
-- Switch to xray database
USE DATABASE xray
-- Queries now operate on xray database
MATCH (c:Class {name: 'Parser'}) RETURN COUNT(c) AS class_count
Scope: Database selection is per-session. Each client connection (Bolt or xrayProtocol) maintains its own database context.
DateTime (Capital D & T)
Temporal data creation and manipulation functions with timezone support.
date
date(map?) -> date
Creates a Date from a map of components, or returns the current date.
-- Current date
RETURN date() AS today
-- Date from components
RETURN date({year: 2026, month: 4, day: 15}) AS flight_departure
-- Use in graph query
MATCH (r:AdsReport)
WHERE r.received_date > date({year: 2026, month: 3, day: 1})
RETURN COUNT(r) AS reports_since_march
Type: Returns a Date object without a time component. Useful for day-level comparisons.
date_to_millis
date_to_millis(date_string) -> integer
Converts a date string to epoch milliseconds.
-- Convert date string to milliseconds
RETURN date_to_millis('2026-04-15') AS millis
-- Use for timestamp comparison
WITH date_to_millis('2026-03-01') AS march_start
MATCH (r:AdsReport)
WHERE r.timestamp_ms > march_start
RETURN COUNT(r) AS reports
Format: Input is an ISO 8601 date. Returns UNIX epoch milliseconds for storage and comparison.
datetime
datetime(map?) -> datetime
Creates a DateTime (with timezone) from a map, or returns the current instant.
-- Current datetime with timezone
RETURN datetime() AS now
-- DateTime with UTC timezone
RETURN datetime({year: 2026, month: 4, day: 15, hour: 14, minute: 30, second: 0, timezone: 'UTC'}) AS scheduled_event
-- Use in event tracking
CREATE (e:Event {timestamp: datetime(), name: 'Takeoff', flight_id: 'UA123'})
RETURN e
Type: Returns a ZonedDateTime, which includes date, time, and timezone information.
duration
duration(map) -> duration
Creates a Duration from a map of time components.
-- Flight duration of 5 hours 30 minutes
RETURN duration({hours: 5, minutes: 30}) AS flight_duration
-- Use in calculations
MATCH (f:Flight)
WHERE f.scheduled_duration > duration({hours: 6})
RETURN f.callsign, f.scheduled_duration
Components: Supports years, months, weeks, days, hours, minutes, seconds, milliseconds.
duration_between
duration_between(dt1, dt2) -> duration
Returns the duration between two temporal values.
-- Calculate flight time
MATCH (f:Flight)
WITH f, duration_between(f.departure_time, f.arrival_time) AS elapsed
RETURN f.callsign, elapsed.hours + ' hours ' + (elapsed.minutes % 60) + ' minutes' AS duration
-- Find long-running queries
MATCH (q:QueryLog)
WITH q, duration_between(q.start_time, q.end_time) AS exec_time
WHERE exec_time.seconds > 60
RETURN q.query_text, exec_time.seconds
Use Case: Measure elapsed time between two datetime values. Automatically handles timezone conversions.
epoch_millis
epoch_millis() -> integer
Returns the current epoch time in milliseconds.
-- Get current time as milliseconds
RETURN epoch_millis() AS current_timestamp_ms
-- Use for performance tracking
WITH epoch_millis() AS start_ms
MATCH (a:Aircraft)
WITH start_ms, count(a) AS aircraft_count
RETURN aircraft_count, (epoch_millis() - start_ms) + ' ms' AS execution_time
Type: Returns integer. Zero reference is 1970-01-01T00:00:00Z (UNIX epoch).
epoch_seconds
epoch_seconds() -> integer
Returns the current epoch time in seconds.
-- Get current time as seconds
RETURN epoch_seconds() AS current_timestamp_s
-- Record timestamp in node
CREATE (a:AirspaceAlert {alert_id: 'AL001', detection_time: epoch_seconds()})
RETURN a
Precision: One-second granularity. Use epoch_millis() for sub-second precision.
localdatetime
localdatetime(map?) -> localdatetime
Creates a LocalDateTime from a map, or returns the current date and time.
-- Current local datetime (no timezone)
RETURN localdatetime() AS local_now
-- LocalDateTime at specific local time
RETURN localdatetime({year: 2026, month: 4, day: 15, hour: 14, minute: 30}) AS local_departure
-- Use for scheduling without timezone conversion
MATCH (e:Event)
WHERE e.local_start_time > localdatetime({year: 2026, month: 4, day: 15, hour: 12, minute: 0})
RETURN e.name, e.local_start_time
Type: LocalDateTime without timezone. Useful for recording times independent of geographic location.
localtime
localtime(map?) -> localtime
Creates a LocalTime from a map of components, or returns the current time.
-- Current local time
RETURN localtime() AS current_time
-- Create time at 14:30
RETURN localtime({hour: 14, minute: 30, second: 0}) AS afternoon_time
-- Find flights departing after 10:00 AM
MATCH (f:Flight)
WHERE f.departure_time > localtime({hour: 10, minute: 0})
RETURN f.callsign, f.departure_time
Type: LocalTime without date or timezone. Useful for daily schedules and time-of-day filters.
millis_to_date
millis_to_date(millis) -> string
Converts epoch milliseconds to a date string.
-- Convert milliseconds to readable date
RETURN millis_to_date(1745606400000) AS readable_date
-- Use with stored timestamps
MATCH (r:AdsReport)
RETURN r.callsign, millis_to_date(r.timestamp_ms) AS report_date
LIMIT 10
Format: Returns ISO 8601 date string (YYYY-MM-DD). Inverse of date_to_millis().
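Clients often need to produce the same epoch-millisecond values they compare against in queries. This Python sketch mirrors date_to_millis and millis_to_date. It assumes dates are interpreted as midnight UTC, which the description above does not state explicitly.

```python
from datetime import datetime, timezone

def date_to_millis(date_string: str) -> int:
    """'YYYY-MM-DD' -> epoch milliseconds at 00:00:00 UTC (UTC is an assumption)."""
    d = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(d.timestamp()) * 1000

def millis_to_date(millis: int) -> str:
    """Epoch milliseconds -> 'YYYY-MM-DD' in UTC."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).strftime("%Y-%m-%d")

print(millis_to_date(1745606400000))  # 2025-04-25
```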
Datetime (Lowercase)
Additional datetime utility functions with aliases for JavaScript/ClickHouse compatibility.
epochMillis
epochMillis(datetime) -> integer
Returns epoch milliseconds from a datetime for JavaScript/Unix interop. Equivalent to MySQL UNIX_TIMESTAMP(dt)*1000.
-- Convert datetime to milliseconds for JavaScript
MATCH (e:Event)
RETURN e.name, epochMillis(e.event_time) AS timestamp_js
-- Store in database with millisecond precision
CREATE (r:Report {recorded_at: epochMillis(datetime())})
RETURN r
JavaScript Interop: Use for APIs expecting Unix milliseconds (Date.getTime()).
now
now() -> datetime
Returns the current datetime. Alias for datetime() with the current time. Equivalent to MySQL NOW() or ClickHouse now().
-- Get current time
RETURN now() AS server_time
-- Record creation timestamp
CREATE (a:AirspaceAlert {alert_type: 'tfr', created_at: now(), alert_id: 'ALR20260415001'})
RETURN a
-- Find recent activity
MATCH (a:Activity)
WHERE a.timestamp > now() - duration({days: 1})
RETURN a.description, a.timestamp
Alias: Equivalent to datetime(). Use for clarity in familiar SQL-style patterns.
toDate
toDate(datetime) -> date
Extracts the Date portion from a ZonedDateTime or LocalDateTime. Equivalent to MySQL DATE() or ClickHouse toDate().
-- Extract date from datetime
RETURN toDate(now()) AS today
-- Group reports by date
MATCH (r:AdsReport)
WITH toDate(r.timestamp) AS report_date, COUNT(r) AS count
RETURN report_date, count
ORDER BY report_date DESC
-- Find flights on a specific date
MATCH (f:Flight)
WHERE toDate(f.departure_time) = date({year: 2026, month: 4, day: 15})
RETURN f.callsign, f.departure_time
Use Case: Extract the date component from a timestamp for day-level grouping or filtering. Discards time and timezone.
toStartOfDay
toStartOfDay(datetime) -> datetime
Truncates a datetime to midnight. Equivalent to MySQL TIMESTAMP(DATE(dt)) or ClickHouse toStartOfDay().
-- Truncate to start of day
RETURN toStartOfDay(now()) AS midnight_today
-- Find records from start of day
MATCH (r:AdsReport)
WHERE r.timestamp >= toStartOfDay(now())
RETURN COUNT(r) AS today_reports
-- Group by day
MATCH (r:AdsReport)
WITH toStartOfDay(r.timestamp) AS day, COUNT(r) AS daily_count
RETURN day, daily_count
ORDER BY day DESC
Result: Returns datetime at 00:00:00 of the same day. Preserves timezone.
toStartOfHour
toStartOfHour(datetime) -> datetime
Truncates a datetime to the start of the hour. Equivalent to MySQL DATE_FORMAT(dt, '%Y-%m-%d %H:00:00') or ClickHouse toStartOfHour().
-- Truncate to start of hour
RETURN toStartOfHour(now()) AS hourly_boundary
-- Hourly aggregation
MATCH (r:AdsReport)
WITH toStartOfHour(r.timestamp) AS hour, COUNT(r) AS hourly_count, AVG(r.altitude) AS avg_alt
RETURN hour, hourly_count, avg_alt
ORDER BY hour DESC
-- Find reports from the last hour
MATCH (r:AdsReport)
WHERE r.timestamp >= toStartOfHour(now())
RETURN COUNT(r) AS recent_reports
Use Case: Create hourly buckets for time-series analysis. Returns datetime at HH:00:00 of the same hour.
toStartOfMinute
toStartOfMinute(datetime) -> datetime
Truncates a datetime to the start of the minute. Equivalent to MySQL DATE_FORMAT(dt, '%Y-%m-%d %H:%i:00') or ClickHouse toStartOfMinute().
-- Truncate to start of minute
RETURN toStartOfMinute(now()) AS minute_boundary
-- Minute-level aggregation
MATCH (r:AdsReport)
WITH toStartOfMinute(r.timestamp) AS minute, COUNT(r) AS reports_per_minute
WHERE reports_per_minute > 100 -- High activity threshold
RETURN minute, reports_per_minute
ORDER BY minute DESC
-- Find events since the start of the current minute
MATCH (e:Event)
WHERE e.timestamp >= toStartOfMinute(now())
RETURN e.description, e.timestamp
Precision: Returns a datetime at HH:MM:00 of the same minute. Useful for sub-hourly bucketing of high-frequency data.
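The truncation rules above can be modeled in a few lines of Python with the standard datetime module. This is a minimal sketch of the documented behavior, not xrayGraphDB's implementation. It assumes truncation happens in the value's own timezone, as the Result and Precision notes state. The function names are hypothetical stand-ins for toStartOfDay, toStartOfHour, toStartOfMinute, and toDate.

```python
from datetime import date, datetime, timedelta, timezone


def to_start_of_day(dt: datetime) -> datetime:
    """Midnight of the same day; tzinfo is kept unchanged."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


def to_start_of_hour(dt: datetime) -> datetime:
    """HH:00:00 of the same hour."""
    return dt.replace(minute=0, second=0, microsecond=0)


def to_start_of_minute(dt: datetime) -> datetime:
    """HH:MM:00 of the same minute."""
    return dt.replace(second=0, microsecond=0)


def to_date(dt: datetime) -> date:
    """Date portion only; the time of day and the timezone are discarded."""
    return dt.date()


if __name__ == "__main__":
    edt = timezone(timedelta(hours=-4))
    ts = datetime(2026, 4, 15, 13, 47, 29, 123456, tzinfo=edt)
    print(to_start_of_day(ts))     # 2026-04-15 00:00:00-04:00
    print(to_start_of_hour(ts))    # 2026-04-15 13:00:00-04:00
    print(to_start_of_minute(ts))  # 2026-04-15 13:47:00-04:00
    print(to_date(ts))             # 2026-04-15
```

`replace()` operates on the local wall-clock fields, so bucket boundaries follow each value's own offset. If buckets must line up across timezones, normalize timestamps to UTC before truncating.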
GIS / Spatial Functions (33)
bbox_from_radius
bbox_from_radius(lat, lon, radius_m) -> [min_lat, min_lon, max_lat, max_lon]
Returns the bounding box that encloses a circle of radius_m meters around a point. Useful for spatial filtering and geographic searches within a circular area.
RETURN bbox_from_radius(40.7128, -74.0060, 5000) AS nyc_5km_bbox
// Returns approximately: [40.66783, -74.06532, 40.75777, -73.94668]
// Bounding box for a 5 km radius around New York City
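A common way to build such a box uses a spherical approximation. Each meter maps to a fixed number of degrees of latitude, and the longitude span is widened by 1/cos(latitude). The Python sketch below follows that approach. The Earth radius constant (6,371 km) and the helper name are assumptions, not xrayGraphDB internals. The sketch does not handle boxes that cross a pole or the antimeridian.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth with mean radius


def bbox_from_radius(lat: float, lon: float, radius_m: float) -> list[float]:
    """[min_lat, min_lon, max_lat, max_lon] enclosing a circle of radius_m meters."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    # Meridians converge toward the poles, so one degree of longitude spans
    # cos(lat) times the ground distance of one degree of latitude.
    dlon = dlat / math.cos(math.radians(lat))
    return [lat - dlat, lon - dlon, lat + dlat, lon + dlon]


if __name__ == "__main__":
    box = bbox_from_radius(40.7128, -74.0060, 5000)
    print([round(v, 5) for v in box])
```

The box is a superset of the circle. When you need a true radius filter, follow the box test with an exact distance check such as haversine_distance.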
bearing
bearing(lat1, lon1, lat2, lon2) -> float
Returns the initial bearing in degrees (0-360) from one geographic point to another. North is 0°, East is 90°, South is 180°, West is 270°.
RETURN bearing(40.7128, -74.0060, 51.5074, -0.1278) AS bearing_nyc_to_london
// Returns: 51.27 (approximately northeast)
// Initial bearing from NYC to London
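The initial bearing (forward azimuth) can be computed with the standard spherical formula, sketched below in Python. This is a reference model, not xrayGraphDB's code. An implementation on the WGS84 ellipsoid can differ from this spherical result by a few hundredths of a degree. The function name is hypothetical and covers both bearing and geo.bearing.

```python
import math


def initial_bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial great-circle bearing in degrees: 0 = north, clockwise, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    # atan2 returns (-180, 180]; the modulo maps it onto a compass bearing.
    return math.degrees(math.atan2(y, x)) % 360.0


if __name__ == "__main__":
    print(round(initial_bearing(40.7128, -74.0060, 51.5074, -0.1278), 1))  # NYC -> London
    print(round(initial_bearing(51.5074, -0.1278, 48.8566, 2.3522), 1))   # London -> Paris
```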
cartesian_to_polar
cartesian_to_polar(x, y) -> [r, theta]
Converts 2D Cartesian coordinates (x, y) to polar coordinates (radius, angle in radians). The angle is measured counter-clockwise from the positive x-axis.
RETURN cartesian_to_polar(3, 4) AS polar_coords
// Returns: [5.0, 0.9273]
// Converts point (3, 4) to radius 5 and an angle of about 53 degrees
deg_to_dms
deg_to_dms(decimal_degrees) -> string
Converts decimal degrees to degrees-minutes-seconds (DMS) format. Returns a formatted string like "40°42'46.08\"N".
RETURN deg_to_dms(40.7128) AS latitude_dms
// Returns: "40°42'46.08\"N"
// Converts NYC latitude to DMS format
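The conversion in both directions, deg_to_dms and dms_to_deg, is plain base-60 arithmetic. The Python sketch below models it. The hemisphere letters (N/S by default, with hypothetical `positive`/`negative` parameters for E/W) and the parsing rules are assumptions about the format, not documented engine behavior. Working in hundredths of an arc-second avoids the classic bug of printing 60.00 seconds after rounding.

```python
import re


def deg_to_dms(decimal_degrees: float, positive: str = "N", negative: str = "S") -> str:
    """Format decimal degrees as D°M'S.ss" followed by a hemisphere letter."""
    hemisphere = positive if decimal_degrees >= 0 else negative
    # Round once, in hundredths of an arc-second, then split with integer math.
    total = round(abs(decimal_degrees) * 360_000)
    degrees, rem = divmod(total, 360_000)
    minutes, centiseconds = divmod(rem, 6_000)
    seconds, hundredths = divmod(centiseconds, 100)
    return f"{degrees}°{minutes}'{seconds}.{hundredths:02d}\"{hemisphere}"


def dms_to_deg(dms: str) -> float:
    """Parse strings like 40°42'46.08"N or "40 42 46.08" into decimal degrees."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", dms)]
    degrees, minutes, seconds = (numbers + [0.0, 0.0, 0.0])[:3]
    value = degrees + minutes / 60 + seconds / 3600
    text = dms.strip().upper()
    # Southern and western hemispheres (or a leading minus) are negative.
    is_negative = text.startswith("-") or text.endswith(("S", "W"))
    return -value if is_negative else value


if __name__ == "__main__":
    print(deg_to_dms(40.7128))                     # 40°42'46.08"N
    print(deg_to_dms(-74.0060, "E", "W"))          # 74°0'21.60"W
    print(round(dms_to_deg("40°42'46.08\"N"), 6))  # 40.7128
```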
distance
distance(lat1, lon1, lat2, lon2) -> float
Neo4j-compatible alias for haversine_distance. Returns the great-circle distance in meters between two geographic points.
RETURN distance(40.7128, -74.0060, 34.0522, -118.2437) AS distance_nyc_to_la
// Returns: 3944000 (approximately 3944 km)
// Distance from New York City to downtown Los Angeles in meters
dms_to_deg
dms_to_deg(dms_string) -> float
Converts a degrees-minutes-seconds string to decimal degrees. Supports formats like "40°42'46.08\"N" or "40 42 46.08".
RETURN dms_to_deg("40°42'46.08\"N") AS decimal_latitude
// Returns: 40.7128
// Converts DMS format back to decimal degrees
ecef_to_wgs84
ecef_to_wgs84(x, y, z) -> [lat, lon, alt_m]
Converts Earth-Centered, Earth-Fixed (ECEF) Cartesian coordinates in meters to WGS84 geodetic coordinates (latitude, longitude, altitude).
RETURN ecef_to_wgs84(1334000, -4654051, 4138307) AS wgs84_coords
// Returns approximately: [40.7128, -74.0060, 10]
// Converts satellite/GPS ECEF coordinates to lat/lon/altitude
geo.bearing
geo.bearing(lat1, lon1, lat2, lon2) -> float
Returns the initial bearing in degrees from one point to another. Namespace variant of the bearing function.
RETURN geo.bearing(51.5074, -0.1278, 48.8566, 2.3522) AS bearing_london_to_paris
// Returns: ~148.1 (south-southeast)
// Initial bearing from London to Paris
geo.destination
geo.destination(lat, lon, bearing, distance_m) -> [lat, lon]
Computes the destination point given a start location, initial bearing (degrees), and distance (meters).
RETURN geo.destination(40.7128, -74.0060, 45, 10000) AS dest_northeast_10km
// Returns approximately: [40.776, -73.922]
// Point 10 km northeast of NYC
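The destination-point calculation behind geo.destination and geo_destination can be modeled with the spherical direct formula, sketched below in Python. The mean-radius constant and the function name are assumptions. An ellipsoidal implementation (for example Vincenty's direct method) can differ slightly.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth with mean radius


def destination(lat: float, lon: float, bearing_deg: float, distance_m: float) -> list[float]:
    """Point reached by travelling distance_m along a great circle from (lat, lon)."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    # Wrap the longitude back into [-180, 180).
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0
    return [math.degrees(phi2), lon2]


if __name__ == "__main__":
    print([round(v, 4) for v in destination(40.7128, -74.0060, 45, 10_000)])
    print([round(v, 4) for v in destination(51.5074, -0.1278, 135, 50_000)])
```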
geo.distance
geo.distance(lat1, lon1, lat2, lon2) -> float
Returns the great-circle distance in meters between two lat/lon pairs. Namespace variant of haversine_distance.
RETURN geo.distance(40.6413, -73.7781, 40.7769, -73.8740) AS distance_jfk_to_lga
// Returns: ~17100 (approximately 17 km)
// Distance between JFK and LaGuardia airports in meters
geo.is_ahead
geo.is_ahead(lat1, lon1, heading, lat2, lon2) -> bool
Returns true if the target point is ahead of the current heading direction. Useful for navigation and route following.
RETURN geo.is_ahead(40.7128, -74.0060, 45, 40.80, -73.93) AS target_ahead
// Returns: true
// Checks whether a point to the northeast is ahead when heading 45° (northeast)
geo.reachable_score
geo.reachable_score(lat, lon, speed, time, target_lat, target_lon) -> float
Scores whether a target is reachable given speed (m/s) and time (seconds) constraints. Returns 0.0 to 1.0, where 1.0 means easily reachable.
RETURN geo.reachable_score(40.7128, -74.0060, 100, 3600, 40.9, -73.8) AS reachability
// Returns: 0.95
// Score for reaching a point about 27 km away at 100 m/s within 1 hour
geo.route_score
geo.route_score(path, waypoints) -> float
Scores how well a path follows a set of waypoints. Returns 0.0 to 1.0, where 1.0 means perfect alignment.
WITH [[40.7128, -74.0060], [40.80, -73.93], [40.90, -73.85]] AS path,
[[40.7128, -74.0060], [40.8, -73.93], [40.9, -73.85]] AS waypoints
RETURN geo.route_score(path, waypoints) AS alignment_score
// Returns: 0.98
// Near-perfect alignment between actual and expected waypoints
geo.to_mgrs
geo.to_mgrs(lat, lon) -> string
Converts WGS84 coordinates to Military Grid Reference System (MGRS) format. Used by military, emergency services, and surveying applications.
RETURN geo.to_mgrs(40.7128, -74.0060) AS nyc_mgrs
// Returns: "18TWL8396007523"
// MGRS grid reference for New York City (zone 18T, 100 km square WL)
geo.to_utm
geo.to_utm(lat, lon) -> map
Converts WGS84 coordinates to UTM (Universal Transverse Mercator) easting, northing, and zone. Returns a map with easting, northing, and zone.
RETURN geo.to_utm(40.7128, -74.0060) AS nyc_utm
// Returns: {easting: 583960, northing: 4507523, zone: 18}
// UTM coordinates for NYC (Zone 18N)
geo_destination
geo_destination(lat, lon, bearing_deg, distance_m) -> [lat, lon]
Computes the destination point from a start location, bearing (degrees), and distance (meters). Underscore variant of geo.destination.
RETURN geo_destination(51.5074, -0.1278, 135, 50000) AS dest_southeast_50km
// Returns approximately: [51.188, 0.380]
// Point 50 km southeast of London
geo_midpoint
geo_midpoint(lat1, lon1, lat2, lon2) -> [lat, lon]
Returns the geographic midpoint between two coordinates. Useful for calculating meeting points or center locations.
RETURN geo_midpoint(40.7128, -74.0060, 34.0522, -118.2437) AS us_midpoint
// Returns: [37.8825, -96.1253]
// Geographic center point between New York City and Los Angeles
geohash_decode
geohash_decode(geohash) -> [lat, lon]
Decodes a geohash string back to a lat/lon pair. Geohashes provide spatial indexing with hierarchical precision.
RETURN geohash_decode("dr5regw") AS decoded
// Returns approximately: [40.71327, -74.00597]
// Decodes the geohash back to coordinates near NYC (the center of its cell)
geohash_encode
geohash_encode(lat, lon, precision?) -> string
Encodes a lat/lon pair as a geohash string. Precision defaults to 11 characters (cells of roughly 15 cm). Each additional character shrinks the cell, so higher precision means a longer string.
RETURN geohash_encode(40.7128, -74.0060, 7) AS nyc_geohash
// Returns: "dr5regw"
// 7-character geohash for NYC (~150 m precision)
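Geohash encoding works by interleaved bisection. Even bits narrow the longitude range, odd bits narrow the latitude range, and every 5 bits become one base-32 character. The Python sketch below follows the public geohash scheme rather than xrayGraphDB's source. Decoding here returns the center of the cell, which is why decoding "dr5regw" lands within the same ~150 m cell as the original NYC point rather than on it exactly.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 11) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, bits, use_lon = [], 0, 0, True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def geohash_decode(geohash: str) -> list[float]:
    """Return the [lat, lon] center of the geohash cell."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return [(lat_range[0] + lat_range[1]) / 2, (lon_range[0] + lon_range[1]) / 2]


if __name__ == "__main__":
    print(geohash_encode(40.7128, -74.0060, 7))  # dr5regw
    print(geohash_decode("dr5regw"))
```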
haversine
haversine(lat1, lon1, lat2, lon2) -> float
Alias for haversine_distance. Returns the great-circle distance in meters between two geographic points.
RETURN haversine(40.7128, -74.0060, 48.8566, 2.3522) AS distance_nyc_to_paris
// Returns: 5837000 (approximately 5837 km)
// Great-circle distance in meters
haversine_distance
haversine_distance(lat1, lon1, lat2, lon2) -> float
Returns the great-circle distance in meters between two geographic points. Uses the Haversine formula, which accounts for Earth's curvature.
RETURN haversine_distance(37.7749, -122.4194, 34.0522, -118.2437) AS distance_sf_to_la
// Returns: 559000 (approximately 559 km)
// Distance between San Francisco and Los Angeles
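The Haversine formula behind haversine_distance and its aliases (haversine, distance, geo.distance) fits in a few lines of Python. This is a reference sketch, not the engine's code. The mean Earth radius of 6,371 km is an assumption because the constant xrayGraphDB uses is not documented here, so results can differ by a fraction of a percent.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean Earth radius


def haversine_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against a value slightly above 1 caused by floating-point rounding.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


if __name__ == "__main__":
    print(round(haversine_distance(37.7749, -122.4194, 34.0522, -118.2437) / 1000))  # SF -> LA, km
    print(round(haversine_distance(40.7128, -74.0060, 48.8566, 2.3522) / 1000))      # NYC -> Paris, km
```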
mercator_to_wgs84
mercator_to_wgs84(x, y) -> [lat, lon]
Converts Web Mercator projection coordinates back to WGS84 lat/lon. Web Mercator is used by Google Maps and most web mapping libraries.
RETURN mercator_to_wgs84(-8238310, 4970072) AS wgs84_coords
// Returns approximately: [40.7128, -74.0060]
// Converts Web Mercator coordinates back to NYC
point
point(map) -> point
Creates a 2D or 3D geographic or Cartesian point from a map. The map can contain x/y for Cartesian points or latitude/longitude for geographic points.
RETURN point({latitude: 40.7128, longitude: -74.0060}) AS nyc_point
// Returns: point{latitude: 40.7128, longitude: -74.0060}
// Creates a geographic point for NYC
point.distance
point.distance(p1, p2) -> float
Returns the distance between two point objects. For geographic points this is the geodesic distance in meters; for Cartesian points it is the straight-line distance in the coordinates' own units.
WITH point({latitude: 40.7128, longitude: -74.0060}) AS p1,
point({latitude: 40.9, longitude: -73.8}) AS p2
RETURN point.distance(p1, p2) AS distance_meters
// Returns: ~27100 (approximately 27.1 km)
// Distance between two geographic points
point.withinbbox
point.withinbbox(point, lowerLeft, upperRight) -> bool
Returns true if a point is within the bounding box defined by the lowerLeft and upperRight corners.
WITH point({latitude: 40.7128, longitude: -74.0060}) AS nyc,
point({latitude: 40.5, longitude: -74.3}) AS lower_left,
point({latitude: 41.0, longitude: -73.7}) AS upper_right
RETURN point.withinbbox(nyc, lower_left, upper_right) AS in_bbox
// Returns: true
// NYC is within the defined bounding box
polar_to_cartesian
polar_to_cartesian(r, theta) -> [x, y]
Converts polar coordinates (radius, angle in radians) to 2D Cartesian coordinates (x, y).
RETURN polar_to_cartesian(5, 0.9273) AS cartesian_coords
// Returns: [3.0, 4.0]
// Converts polar (r = 5, θ ≈ 53°) to Cartesian (3, 4)
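cartesian_to_polar and polar_to_cartesian are inverses, and both map directly onto standard math-library calls. The Python sketch below illustrates this with hypothetical function names. One detail worth knowing: `atan2` returns angles in [-π, π], so points below the x-axis get negative angles. Whether xrayGraphDB normalizes these to [0, 2π) is not documented, so treat the range as an assumption.

```python
import math


def cartesian_to_polar(x: float, y: float) -> list[float]:
    # Radius, plus the angle counter-clockwise from +x in radians, in [-pi, pi].
    return [math.hypot(x, y), math.atan2(y, x)]


def polar_to_cartesian(r: float, theta: float) -> list[float]:
    return [r * math.cos(theta), r * math.sin(theta)]


if __name__ == "__main__":
    r, theta = cartesian_to_polar(3, 4)
    print(r, round(theta, 4))                                    # 5.0 0.9273
    print([round(v, 3) for v in polar_to_cartesian(5, 0.9273)])  # [3.0, 4.0]
```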
to_geojson
to_geojson(lat, lon) -> string
Returns a GeoJSON Point string for the given coordinates. GeoJSON is a standard format for geographic data interchange; note that its coordinates array lists longitude before latitude.
RETURN to_geojson(40.7128, -74.0060) AS geojson
// Returns: '{"type":"Point","coordinates":[-74.0060,40.7128]}'
// Standard GeoJSON format for NYC
to_wkt
to_wkt(lat, lon) -> string
Returns a WKT (Well-Known Text) POINT string for the given coordinates. WKT is a standard format for spatial data.
RETURN to_wkt(40.7128, -74.0060) AS wkt
// Returns: 'POINT(-74.0060 40.7128)'
// Standard WKT format for NYC
utm_to_wgs84
utm_to_wgs84(easting, northing, zone, hemisphere) -> [lat, lon]
Converts UTM coordinates back to WGS84 lat/lon. Hemisphere should be "N" for the northern or "S" for the southern hemisphere.
RETURN utm_to_wgs84(583960, 4507523, 18, "N") AS wgs84_coords
// Returns: [40.7128, -74.0060]
// Converts UTM Zone 18N back to NYC coordinates
utm_zone
utm_zone(lat, lon) -> integer
Returns the UTM zone number for a given coordinate. UTM divides Earth into 60 zones, each 6 degrees of longitude wide.
RETURN utm_zone(40.7128, -74.0060) AS zone_number
// Returns: 18
// NYC is in UTM Zone 18
wgs84_to_ecef
wgs84_to_ecef(lat, lon, alt_m) -> [x, y, z]
Converts WGS84 geodetic coordinates (lat/lon/altitude) to Earth-Centered, Earth-Fixed (ECEF) Cartesian coordinates in meters. Used for satellite and GPS calculations.
RETURN wgs84_to_ecef(40.7128, -74.0060, 10) AS ecef_coords
// Returns approximately: [1334000, -4654051, 4138307]
// ECEF representation of NYC at 10 m altitude
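The forward geodetic-to-ECEF conversion has a closed form on the WGS84 ellipsoid (semi-major axis a = 6,378,137 m, flattening f = 1/298.257223563). The Python sketch below shows it. It is a reference model with a hypothetical function name, not xrayGraphDB's code. The inverse, ecef_to_wgs84, has no simple closed form and is usually solved iteratively or with Bowring's method, which is not shown here.

```python
import math

WGS84_A = 6_378_137.0               # semi-major axis in meters
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def wgs84_to_ecef(lat: float, lon: float, alt_m: float = 0.0) -> list[float]:
    phi, lam = math.radians(lat), math.radians(lon)
    sin_phi = math.sin(phi)
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * sin_phi ** 2)
    x = (n + alt_m) * math.cos(phi) * math.cos(lam)
    y = (n + alt_m) * math.cos(phi) * math.sin(lam)
    z = (n * (1 - WGS84_E2) + alt_m) * sin_phi
    return [x, y, z]


if __name__ == "__main__":
    print([round(v) for v in wgs84_to_ecef(40.7128, -74.0060, 10)])
```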
wgs84_to_mercator
wgs84_to_mercator(lat, lon) -> [x, y]
Converts WGS84 coordinates to Web Mercator projection. Web Mercator is standard for online mapping (Google Maps, OpenStreetMap, etc.).
RETURN wgs84_to_mercator(40.7128, -74.0060) AS mercator_coords
// Returns approximately: [-8238310, 4970072]
// Web Mercator projection for NYC
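Web Mercator (EPSG:3857) projects latitude and longitude onto a sphere of radius 6,378,137 m. Longitude maps linearly to x, and latitude maps logarithmically to y. The Python sketch below shows both directions with hypothetical function names. Clamping to ±85.05112878°, the latitude at which the projected map becomes square, is common practice, but whether xrayGraphDB clamps is an assumption.

```python
import math

WEB_MERCATOR_R = 6_378_137.0  # sphere radius used by EPSG:3857
MAX_LAT = 85.05112878         # beyond this, y grows toward infinity at the poles


def wgs84_to_mercator(lat: float, lon: float) -> list[float]:
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # assumption: clamp like most web maps
    x = WEB_MERCATOR_R * math.radians(lon)
    y = WEB_MERCATOR_R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return [x, y]


def mercator_to_wgs84(x: float, y: float) -> list[float]:
    lon = math.degrees(x / WEB_MERCATOR_R)
    lat = math.degrees(2 * math.atan(math.exp(y / WEB_MERCATOR_R)) - math.pi / 2)
    return [lat, lon]


if __name__ == "__main__":
    x, y = wgs84_to_mercator(40.7128, -74.0060)
    print(round(x), round(y))
    print(mercator_to_wgs84(x, y))
```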
wgs84_to_utm
wgs84_to_utm(lat, lon) -> [easting, northing, zone]
Converts WGS84 coordinates to UTM easting/northing and zone. Returns a list [easting, northing, zone] for the appropriate UTM zone.
RETURN wgs84_to_utm(40.7128, -74.0060) AS utm_coords
// Returns: [583960, 4507523, 18]
// NYC in UTM Zone 18N coordinates
Graph Analytics Functions (27)
adamic_adar
adamic_adar(node1, node2) -> float
Returns the Adamic-Adar link prediction score between two nodes. This measure is commonly used in recommendation systems to predict likely connections based on shared neighbors and their degrees.
MATCH (alice:User {name: "Alice"}), (bob:User {name: "Bob"})
RETURN xg.adamic_adar(alice, bob) AS prediction_score
bridge_score
bridge_score(edge) -> float
Returns how likely an edge is to be a bridge between communities. A bridge edge connects different communities and is highly important to network connectivity.
MATCH (a:User)-[edge:FOLLOWS]->(b:User) RETURN edge, xg.bridge_score(edge) AS bridge_likelihood ORDER BY bridge_likelihood DESC LIMIT 10
chain_avg_by
chain_avg_by(list, property) -> float
Averages a property across all elements in a list. Useful for aggregating metrics from a collection of nodes or relationships.
MATCH (user:User)-[:POSTED]->(posts:Post) WITH user, collect(posts) AS user_posts RETURN user.name, xg.chain_avg_by(user_posts, "engagement_score") AS avg_engagement
chain_count_by
chain_count_by(list, property) -> map
Counts occurrences of each distinct property value. Returns a map where keys are property values and values are occurrence counts.
MATCH (post:Post)<-[:POSTED]-(:User) WITH collect(post) AS all_posts RETURN xg.chain_count_by(all_posts, "category") AS category_distribution
chain_filter_eq
chain_filter_eq(list, property, value) -> list
Filters a list, keeping elements where a property equals a value. Returns a new list containing only matching elements.
MATCH (user:User)-[:RATED]->(movie:Movie) WITH user, collect(movie) AS movies RETURN user.name, xg.chain_filter_eq(movies, "genre", "Action") AS action_movies
chain_filter_gt
chain_filter_gt(list, property, threshold) -> list
Filters a list, keeping elements where a property is greater than a threshold. Returns a new list with qualifying elements.
MATCH (user:User)-[:RATED]->(movie:Movie) WITH user, collect(movie) AS rated_movies RETURN user.name, xg.chain_filter_gt(rated_movies, "rating", 7.5) AS highly_rated
chain_filter_lt
chain_filter_lt(list, property, threshold) -> list
Filters a list, keeping elements where a property is less than a threshold. Returns a new list with matching elements.
MATCH (person:Person)-[:HAS_TRANSACTION]->(txn:Transaction) WITH person, collect(txn) AS transactions RETURN person.name, xg.chain_filter_lt(transactions, "amount", 100.0) AS small_transactions
chain_group_by
chain_group_by(list, property) -> map
Groups list elements by a property into a map of lists. Keys are property values; values are lists of elements with that property value.
MATCH (user:User)-[:COMPLETED]->(course:Course) WITH user, collect(course) AS user_courses RETURN user.name, xg.chain_group_by(user_courses, "difficulty_level") AS courses_by_level
chain_map
chain_map(list, property) -> list
Extracts a property from each element in a list (projection). Returns a new list containing only the specified property values.
MATCH (team:Team)-[:HAS_MEMBER]->(player:Player) WITH team, collect(player) AS members RETURN team.name, xg.chain_map(members, "salary") AS all_salaries
chain_max_by
chain_max_by(list, property) -> value
Returns the element with the maximum value of a property. Useful for finding the highest-scoring or highest-valued item in a collection.
MATCH (user:User)-[:AUTHORED]->(article:Article) WITH user, collect(article) AS articles RETURN user.name, xg.chain_max_by(articles, "view_count") AS most_viewed_article
chain_min_by
chain_min_by(list, property) -> value
Returns the element with the minimum value of a property. Useful for finding the lowest-scoring or lowest-valued item.
MATCH (warehouse:Warehouse)-[:STORES]->(item:Item) WITH warehouse, collect(item) AS inventory RETURN warehouse.location, xg.chain_min_by(inventory, "stock_level") AS lowest_stock_item
chain_sort_by
chain_sort_by(list, property) -> list
Sorts a list of elements by a given property. Returns a new list ordered by the property value.
MATCH (conference:Conference)-[:INCLUDES]->(session:Session) WITH conference, collect(session) AS sessions RETURN conference.name, xg.chain_sort_by(sessions, "start_time") AS ordered_schedule
chain_sum_by
chain_sum_by(list, property) -> number
Sums a property across all elements in a list. Returns the total of all property values.
MATCH (customer:Customer)-[:MADE_PURCHASE]->(order:Order) WITH customer, collect(order) AS purchases RETURN customer.name, xg.chain_sum_by(purchases, "total_amount") AS lifetime_value
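Semantically, the chain_* helpers behave like ordinary comprehensions over a collected list of nodes. The Python model below treats each node as a dict of properties to make that concrete. Skipping elements that lack the property (and sorting them last) is an assumption about null handling, not documented behavior. The function names mirror the xg.* ones only for readability.

```python
from collections import defaultdict


def chain_filter_gt(items, prop, threshold):
    return [it for it in items if it.get(prop) is not None and it[prop] > threshold]


def chain_map(items, prop):
    return [it.get(prop) for it in items]


def chain_group_by(items, prop):
    groups = defaultdict(list)
    for it in items:
        groups[it.get(prop)].append(it)
    return dict(groups)


def chain_count_by(items, prop):
    return {key: len(group) for key, group in chain_group_by(items, prop).items()}


def chain_sum_by(items, prop):
    return sum(it[prop] for it in items if it.get(prop) is not None)


def chain_avg_by(items, prop):
    values = [it[prop] for it in items if it.get(prop) is not None]
    return sum(values) / len(values) if values else None


def chain_max_by(items, prop):
    candidates = [it for it in items if it.get(prop) is not None]
    return max(candidates, key=lambda it: it[prop]) if candidates else None


def chain_sort_by(items, prop):
    # Elements missing the property sort after all others.
    return sorted(items, key=lambda it: (it.get(prop) is None, it.get(prop)))


if __name__ == "__main__":
    movies = [
        {"title": "A", "genre": "Action", "rating": 8.1},
        {"title": "B", "genre": "Drama", "rating": 6.9},
        {"title": "C", "genre": "Action", "rating": 7.0},
    ]
    print(chain_count_by(movies, "genre"))  # {'Action': 2, 'Drama': 1}
    print(chain_max_by(movies, "rating")["title"])  # A
```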
common_neighbor_count
common_neighbor_count(node1, node2) -> integer
Returns the number of common neighbors between two nodes. Useful for measuring similarity or predicting new connections.
MATCH (person1:Person {name: "John"}), (person2:Person {name: "Jane"})
RETURN xg.common_neighbor_count(person1, person2) AS mutual_friends
community_modularity
community_modularity(partition) -> float
Returns the modularity score of a community partition. Higher scores indicate stronger community structure.
MATCH (node:Node) WHERE node.community_id IS NOT NULL WITH collect(node) AS partition RETURN xg.community_modularity(partition) AS modularity_score
edge_weight_normalize
edge_weight_normalize(edges, property) -> list
Normalizes edge weights to the 0-1 range by property. Returns a list of edges with normalized weight values.
MATCH (a:Page)-[edge:LINKS_TO]->(b:Page) WITH collect(edge) AS all_edges RETURN xg.edge_weight_normalize(all_edges, "frequency") AS normalized_links
graph_density
graph_density(node_count, edge_count) -> float
Returns the density of a graph or subgraph. Density is calculated as edges / (nodes * (nodes - 1) / 2).
MATCH (n:Node) WITH count(DISTINCT n) AS total_nodes MATCH ()-[e:CONNECTED]->() WITH total_nodes, count(DISTINCT e) AS total_edges RETURN xg.graph_density(total_nodes, total_edges) AS density
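The density formula above is the undirected one. Counting directed relationships, as the example does with `()-[e:CONNECTED]->()`, can push the result above 1 when edges exist in both directions. The Python sketch below makes the difference explicit. Its `directed` flag is hypothetical (it is not a parameter of xg.graph_density), and returning 0.0 for graphs with fewer than two nodes is an assumption.

```python
def graph_density(node_count: int, edge_count: int, directed: bool = False) -> float:
    """Edges divided by the maximum possible number of edges."""
    if node_count < 2:
        return 0.0  # assumption: empty and single-node graphs have density 0
    possible = node_count * (node_count - 1)
    if not directed:
        possible //= 2  # the undirected formula quoted above: n * (n - 1) / 2
    return edge_count / possible


if __name__ == "__main__":
    print(graph_density(4, 6))                 # 1.0: a complete undirected graph
    print(graph_density(4, 6, directed=True))  # 0.5: half of the 12 possible arcs
```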
graph_diameter_approx
graph_diameter_approx(node_count?) -> float
Returns an approximate graph diameter. The optional node_count parameter can improve estimation accuracy.
MATCH (n:Node) WITH count(n) AS total_nodes RETURN xg.graph_diameter_approx(total_nodes) AS estimated_diameter
graph_radius_approx
graph_radius_approx(node_count?) -> float
Returns an approximate graph radius. The optional node_count parameter can improve estimation accuracy.
MATCH (n:Node) WITH count(n) AS total_nodes RETURN xg.graph_radius_approx(total_nodes) AS estimated_radius
hits_authority
hits_authority(node) -> float
Returns the HITS authority score of a node. Authority nodes are pointed to by many hub nodes (useful for finding quality sources).
MATCH (page:WebPage) RETURN page.url, xg.hits_authority(page) AS authority_score ORDER BY authority_score DESC LIMIT 20
hits_hub
hits_hub(node) -> float
Returns the HITS hub score of a node. Hub nodes point to many authority nodes (useful for finding directory pages).
MATCH (page:WebPage) RETURN page.url, xg.hits_hub(page) AS hub_score ORDER BY hub_score DESC LIMIT 20
influence_spread
influence_spread(node, steps?) -> float
Estimates the influence spread of a node in the graph. Returns a score representing how far influence propagates, optionally limited to a number of steps.
MATCH (influencer:User {name: "Alice"})
RETURN xg.influence_spread(influencer, 3) AS reach_3_hops
katz_centrality
katz_centrality(node, alpha?, beta?) -> float
Returns the Katz centrality score of a node. Measures importance by considering both direct and indirect connections. The alpha (damping) and beta (bias) parameters are optional.
MATCH (person:Person) RETURN person.name, xg.katz_centrality(person, 0.1, 1.0) AS centrality ORDER BY centrality DESC
preferential_attachment
preferential_attachment(node1, node2) -> integer
Returns the preferential attachment score (degree product). Used to predict link formation based on node degrees in growing networks.
MATCH (user1:User {name: "Alice"}), (user2:User {name: "Bob"})
RETURN xg.preferential_attachment(user1, user2) AS attachment_score
resource_allocation
resource_allocation(node1, node2) -> float
Returns the resource allocation index for link prediction. Higher values indicate a greater likelihood that a connection will form between the two nodes.
MATCH (user1:User)-->(:User)<--(bob:User {name: "Bob"})
WHERE user1 <> bob
WITH DISTINCT user1, bob
RETURN user1.name, xg.resource_allocation(user1, bob) AS link_score
ORDER BY link_score DESC
shortest_path_weight
shortest_path_weight(path) -> float
Returns the total weight of a path by summing the edge weights along it. shortestPath() minimizes the number of hops rather than the total weight, so the example below returns the weight of the fewest-hop route, which is not necessarily the lightest one.
MATCH (start:City {name: "New York"}), (end:City {name: "Los Angeles"})
MATCH path = shortestPath((start)-[:ROUTE*]->(end))
RETURN xg.shortest_path_weight(path) AS total_distance
temporal_decay_rank
temporal_decay_rank(node, half_life_days) -> float
Returns a rank score that decays over time from the last activity. Useful for time-sensitive rankings where recent activity is more valuable.
MATCH (item:Item) WHERE item.last_accessed IS NOT NULL RETURN item.id, xg.temporal_decay_rank(item, 30) AS recency_score ORDER BY recency_score DESC
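The link-prediction functions in this section (common_neighbor_count, adamic_adar, resource_allocation, preferential_attachment) are usually defined over neighbor sets, as in the Python sketch below. It uses the standard textbook formulas on an undirected adjacency dict. Whether xrayGraphDB ignores relationship direction and type in the same way is an assumption, and these signatures are illustrative rather than the xg.* API.

```python
import math

Graph = dict[str, set[str]]  # hypothetical: node id -> set of neighbor ids


def common_neighbor_count(g: Graph, u: str, v: str) -> int:
    return len(g[u] & g[v])


def adamic_adar(g: Graph, u: str, v: str) -> float:
    # Shared neighbors with few links count more: sum of 1 / ln(degree).
    # A shared neighbor has degree >= 2; the guard only avoids dividing by ln(1) = 0.
    return sum(1.0 / math.log(len(g[w])) for w in g[u] & g[v] if len(g[w]) > 1)


def resource_allocation(g: Graph, u: str, v: str) -> float:
    # Like Adamic-Adar but penalizes hub neighbors harder: sum of 1 / degree.
    return sum(1.0 / len(g[w]) for w in g[u] & g[v])


def preferential_attachment(g: Graph, u: str, v: str) -> int:
    # Product of degrees; needs no shared neighbors at all.
    return len(g[u]) * len(g[v])


if __name__ == "__main__":
    g: Graph = {
        "alice": {"carol", "dave"},
        "bob": {"carol", "dave", "erin"},
        "carol": {"alice", "bob"},
        "dave": {"alice", "bob", "erin"},
        "erin": {"bob", "dave"},
    }
    print(common_neighbor_count(g, "alice", "bob"))          # 2
    print(round(resource_allocation(g, "alice", "bob"), 4))  # 0.8333
    print(preferential_attachment(g, "alice", "bob"))        # 6
```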
Hash Functions (3)
crc32
crc32(value) -> integer
Returns the CRC-32 checksum of a value. Useful for data integrity verification and content-based routing in distributed graphs.
MATCH (p:Person {name: 'Alice'})
RETURN p.name, crc32(p.email) AS email_hash
LIMIT 1
fnv1a
fnv1a(value) -> integer
Returns the FNV-1a hash of a value. A fast non-cryptographic hash with good distribution, ideal for bucketing and sharding.
MATCH (d:Document) RETURN d.id, fnv1a(d.content) % 8 AS shard_bucket ORDER BY shard_bucket
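FNV-1a is simple enough to show in full. It XORs each byte into the hash, then multiplies by the FNV prime. The Python sketch below implements the 32-bit variant over UTF-8 bytes and reuses it for the modulo-8 sharding pattern from the example. The bit width and the byte encoding xrayGraphDB applies to non-string values are not documented, so this sketch's numbers will not necessarily match the engine's output. The offset basis and prime are the published 32-bit FNV constants.

```python
FNV32_OFFSET = 0x811C9DC5  # 32-bit FNV offset basis
FNV32_PRIME = 0x01000193   # 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    h = FNV32_OFFSET
    for byte in data:
        h ^= byte                               # FNV-1a: XOR first...
        h = (h * FNV32_PRIME) & 0xFFFFFFFF      # ...then multiply, keeping 32 bits
    return h


def shard_bucket(text: str, shards: int = 8) -> int:
    # Hypothetical helper mirroring `fnv1a(d.content) % 8`.
    return fnv1a_32(text.encode("utf-8")) % shards


if __name__ == "__main__":
    print(hex(fnv1a_32(b"a")))  # 0xe40c292c
    print(shard_bucket("hello graph"))
```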
murmur3
murmur3(value) -> integer
Returns the MurmurHash3 hash of a value. Its excellent distribution makes it a common choice for deduplication and set operations. Equal hashes indicate likely, not guaranteed, duplicate content.
MATCH (u:User)-[:POSTED]->(t:Tweet)
WITH u, t, murmur3(t.content) AS content_hash
MATCH (:User)-[:POSTED]->(t2:Tweet)
WHERE t2 <> t
  AND murmur3(t2.content) = content_hash
  AND NOT (u)-[:DUPLICATE_DETECTED]->(t2)
CREATE (u)-[:DUPLICATE_DETECTED {confidence: 0.95}]->(t2)
List Functions (32)
keys
keys(map_or_node) -> list
Returns a list of all property keys from a map or node. Essential for schema introspection and dynamic property handling.
MATCH (u:User) WHERE size(keys(u)) > 10 RETURN u.id, keys(u) AS all_properties ORDER BY u.created_at DESC LIMIT 5
labels
labels(node) -> list
Returns a list of labels assigned to a node. Used for dynamic filtering and type introspection in polymorphic graphs.
MATCH (n) WHERE 'Account' IN labels(n) RETURN n.id, labels(n) AS node_types LIMIT 10
list.avg
list.avg(list) -> float
Returns the average of all numeric elements in a list. Perfect for computing average metrics from collected values.
MATCH (s:Sensor)-[r:RECORDED]->(v:Value)
WHERE r.timestamp >= datetime({year: 2026, month: 4, day: 1})
  AND r.timestamp < datetime({year: 2026, month: 5, day: 1})
WITH s, collect(r.temperature) AS temps
RETURN s.id, list.avg(temps) AS avg_temperature
ORDER BY avg_temperature DESC
list.contains
list.contains(list, value) -> bool
Returns true if the list contains the given value. Efficient membership testing for filtering results.
WITH ['active', 'pending', 'verified'] AS allowed_statuses
MATCH (a:Account)
WHERE list.contains(allowed_statuses, a.status)
RETURN a.id, a.status
LIMIT 20
list.difference
list.difference(list1, list2) -> list
Returns elements in list1 that are not in list2. Useful for set operations like finding removed items or exclusions.
MATCH (u:User {id: 1})-[r:FOLLOWING]->(f:User)
WITH collect(f.id) AS current_following
WITH current_following, [2, 3, 4, 5] AS previous_following
RETURN list.difference(previous_following, current_following) AS unfollowed_ids
list.distinct
list.distinct(list) -> list
Returns a list with duplicate elements removed. Critical for deduplication in aggregation pipelines.
MATCH (p:Post)-[:LIKED_BY]->(u:User) WITH p, collect(u.category) AS categories RETURN p.id, list.distinct(categories) AS unique_user_categories
list.drop
list.drop(list, n) -> list
Returns a list with the first n elements removed. Useful for pagination and skipping initial results.
MATCH (a:Article) WITH collect(a.id) AS all_articles RETURN list.drop(all_articles, 10) AS articles_after_skip_10
list.flatten
list.flatten(nested_list) -> list
Flattens a nested list into a single-depth list. Essential for processing multi-level aggregations.
MATCH (g:Group)-[:CONTAINS]->(t:Team)-[:HAS_MEMBER]->(p:Player)
WITH g, t, collect(p.id) AS team_player_ids
WITH g, collect(team_player_ids) AS nested_player_ids
RETURN g.id, list.flatten(nested_player_ids) AS all_player_ids
list.indexof
list.indexof(list, value) -> integer
Returns the zero-based index of the first occurrence of a value, or -1 if not found. Perfect for position-based logic.
WITH ['red', 'green', 'blue', 'yellow'] AS colors RETURN list.indexof(colors, 'blue') AS blue_position
list.intersect
list.intersect(list1, list2) -> list
Returns elements common to both lists. Critical for finding shared properties across entities.
MATCH (:User {id: 1})-[:KNOWS]->(c1:Contact)
WITH collect(c1.id) AS contacts_u1
MATCH (:User {id: 2})-[:KNOWS]->(c2:Contact)
WITH contacts_u1, collect(c2.id) AS contacts_u2
RETURN list.intersect(contacts_u1, contacts_u2) AS mutual_contacts
list.length
list.length(list) -> integer
Returns the number of elements in a list. Fundamental for counting aggregated results and filtering by size.
MATCH (a:Author)-[:WROTE]->(b:Book) WITH a, collect(b.id) AS books WHERE list.length(books) > 5 RETURN a.name, list.length(books) AS book_count ORDER BY book_count DESC
list.max
list.max(list) -> number
Returns the maximum value in a list. Essential for finding peak metrics and threshold analysis.
MATCH (d:Device)-[r:RECORDED]->(m:Metric) WITH d, collect(r.value) AS metric_values WHERE list.max(metric_values) > 100 RETURN d.id, list.max(metric_values) AS peak_value
list.median
list.median(list) -> number
Returns the median value of numeric elements in a list. More robust than the average for skewed distributions.
MATCH (s:Store)-[r:SOLD]->(p:Product)
WHERE r.date >= date({year: 2026, month: 4, day: 1})
  AND r.date < date({year: 2026, month: 5, day: 1})
WITH s, collect(r.price) AS prices
RETURN s.id, list.median(prices) AS median_price
ORDER BY median_price
list.min
list.min(list) -> number
Returns the minimum value in a list. Essential for anomaly detection and finding lower bounds.
MATCH (c:Cache)-[r:ACCESSED]->(i:Item) WITH c, collect(r.latency_ms) AS latencies WHERE list.min(latencies) > 10 RETURN c.id, list.min(latencies) AS min_latency
list.percentile
list.percentile(list, percentile) -> number
Returns the value at the given percentile of the sorted list, with the percentile expressed as a fraction (0.95 for p95). Critical for SLA and performance analysis.
MATCH (api:APIEndpoint)-[r:INVOKED]->(t:Transaction) WITH api, collect(r.response_time_ms) AS times RETURN api.name, list.percentile(times, 0.50) AS p50, list.percentile(times, 0.95) AS p95, list.percentile(times, 0.99) AS p99
list.product
list.product(list) -> number
Returns the product of all numeric elements in a list. Useful for growth rates and compound calculations.
MATCH (inv:Investment)-[r:QUARTERLY_RETURN]->(q:Quarter) WITH inv, collect(r.growth_factor) AS growth_factors RETURN inv.name, list.product(growth_factors) AS total_growth ORDER BY total_growth DESC
list.reverse
list.reverse(list) -> list
Returns a copy of a list in reverse order. Useful for timeline reversal and LIFO processing.
MATCH (t:Timeline)-[:EVENT_AT]->(e:Event) WITH e ORDER BY e.timestamp WITH collect(e.timestamp) AS chronological RETURN list.reverse(chronological) AS reverse_chronological
list.slice
list.slice(list, start, end?) -> list
Returns a sub-list from the start index to the optional end index. Essential for window operations and pagination.
MATCH (q:Queue)-[r:TASK_IN]->(t:Task) WITH collect(t.id) AS all_tasks RETURN list.slice(all_tasks, 10, 20) AS page_2_tasks
list.sort
list.sort(list) -> list
Returns a sorted copy of a list in ascending order. Foundation for ranking and ordered aggregation.
MATCH (c:Country)-[r:EXPORTS]->(g:Good) WITH c, collect(r.value_usd) AS trade_values RETURN c.name, list.sort(trade_values) AS sorted_exports ORDER BY c.name
list.stddev
list.stddev(list) -> float
Returns the population standard deviation of numeric elements. Critical for variance analysis and quality control.
MATCH (m:Machine)-[r:PRODUCED]->(p:Part) WITH m, collect(r.weight_grams) AS weights WHERE list.stddev(weights) > 2.5 RETURN m.id, list.stddev(weights) AS weight_stddev
list.sum
list.sum(list) -> number
Returns the sum of all numeric elements in a list. Fundamental aggregation for totaling metrics.
MATCH (o:Order)-[r:CONTAINS]->(i:Item) WITH o, collect(r.quantity * r.unit_price) AS amounts RETURN o.id, list.sum(amounts) AS order_total ORDER BY order_total DESC
list.take
list.take(list, n) -> list
Returns the first n elements of a list. Essential for LIMIT-like behavior on collected results.
MATCH (u:User)-[r:VIEWED]->(v:Video) WITH u, v, r ORDER BY r.timestamp DESC WITH u, collect(v.id) AS watched RETURN u.id, list.take(watched, 5) AS last_5_videos
list.union
list.union(list1, list2) -> list
Returns the set union of two lists. Perfect for combining data sources while avoiding duplicates.
MATCH (p:Person)-[:MEMBER_OF]->(t:Team {id: 1})
WITH t, collect(p.id) AS core_members
OPTIONAL MATCH (a:Affiliate)-[:AFFILIATE_OF]->(t)
WITH core_members, collect(a.person_id) AS affiliates
RETURN list.union(core_members, affiliates) AS all_participants
list.variance
list.variance(list) -> float
Returns the population variance of numeric elements in a list. Essential for statistical analysis and risk assessment.
MATCH (p:Portfolio)-[r:HOLDING]->(s:Stock) WITH p, collect(r.return_percent) AS returns RETURN p.id, list.variance(returns) AS portfolio_variance ORDER BY portfolio_variance DESC
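The statistical list functions (list.median, list.percentile, list.variance, list.stddev) follow textbook definitions, sketched below in Python. The population formulas (dividing by n rather than n - 1) match the descriptions above. The percentile interpolation method is not documented, so this sketch assumes linear interpolation between the closest ranks, the same default NumPy uses. Other methods, such as nearest-rank, give slightly different p95 and p99 values.

```python
import math


def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def percentile(values, q):
    # q is a fraction in [0, 1], e.g. 0.95 for p95 (assumed linear interpolation).
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def population_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def population_stddev(values):
    return math.sqrt(population_variance(values))


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(median(data), population_variance(data), population_stddev(data))  # 4.5 4.0 2.0
```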
list.zip
list.zip(list1, list2) -> list
Zips two lists into a list of pairs. Critical for aligned pairwise operations and correlation analysis.
WITH ['A', 'B', 'C'] AS keys, [1, 2, 3] AS values RETURN list.zip(keys, values) AS key_value_pairs
nodes
nodes(path) -> list
Returns all nodes in a path as a list. Essential for extracting node sequences from traversals.
MATCH p = (a:Author {name: 'Alice'})-[:COLLABORATED*1..3]->(z:Author {name: 'Zoe'})
WITH p, nodes(p) AS collaboration_chain
RETURN [node IN collaboration_chain | node.name] AS author_names
LIMIT 1
range
range(start, end, step?) -> list
Generates a list of integers from start to end with an optional step. Foundation for generating synthetic data and sequences.
UNWIND range(1, 10, 2) AS odd_number
MATCH (n {id: odd_number})
RETURN collect(n.value) AS odd_values
range_generate
range_generate(start, end, step?) -> list
Generates a list of integers from start to end with an optional step. Alias for the range function.
MATCH (m:Month) WHERE m.number IN range_generate(1, 12, 3) RETURN m.name
relationships
relationships(path) -> list
Returns all relationships in a path as a list. Critical for analyzing edge sequences and relationship types in traversals.
MATCH p = (s:Source)-[*1..5]->(t:Target) WITH p, relationships(p) AS edge_path RETURN [rel IN edge_path | type(rel)] AS relationship_types
tail
tail(list) -> list
Returns all elements of a list except the first. Useful for skip-one processing and tail recursion patterns.
MATCH (h:HeadNode)-[r:LINKED]->(n:NextNode) WITH collect(n.id) AS all_nodes RETURN tail(all_nodes) AS nodes_after_first
toset
toset(list) -> list
Removes duplicate elements from a list. Alias for deduplication, same as list.distinct().
MATCH (p:Publication)-[r:WRITTEN_BY]->(a:Author) WITH collect(a.field) AS author_fields RETURN toset(author_fields) AS unique_fields
uniformsample
uniformsample(list, count) -> list
Returns a uniformly random sample of elements from a list. Essential for statistical sampling and Monte Carlo analysis.
MATCH (p:Patient)-[r:TEST_RESULT]->(t:Test) WITH collect(r.value) AS all_results RETURN uniformsample(all_results, 50) AS sample_results
Map Functions (6)
map_get
map_get(map, key, default?) -> value
Returns the value for a key from a map, or a default if the key is missing. Essential for safe property access without null errors.
MATCH (c:Config) RETURN c.id, map_get(c.settings, 'timeout', 30) AS timeout, map_get(c.settings, 'retries', 3) AS retries
map_has_key
map_has_key(map, key) -> bool
Returns true if the map contains the given key. Perfect for conditional logic based on key presence.
MATCH (p:Product) WHERE map_has_key(p.attributes, 'weight_kg') RETURN p.id, p.attributes AS all_attributes LIMIT 10
map_keys
map_keys(map) -> list
Returns all keys of a map as a list. Useful for schema inspection and dynamic property enumeration.
MATCH (u:User {id: 123})
RETURN u.id, map_keys(u.profile) AS profile_fields
map_merge
map_merge(map1, map2) -> map
Merges two maps into one; for keys present in both, the value from map2 wins. Useful for configuration composition and patching.
MATCH (u:User {id: 1})
WITH u.base_settings AS base, u.custom_overrides AS custom
RETURN map_merge(base, custom) AS final_settings
map_values
map_values(map) -> list
Returns all values from a map as a list. Useful for aggregating values across a property map.
MATCH (s:Store {id: 1})
RETURN s.id, map_values(s.inventory) AS stock_counts
values
values(map) -> list
Returns all values from a map as a list. Alias for map_values().
MATCH (r:Report) WITH r.metrics AS metrics_map RETURN values(metrics_map) AS all_metrics
Math Functions (35)
abs
abs(x) -> number
Returns the absolute value of a number. Useful for magnitude calculations and distance metrics.
MATCH (t:Transaction) WHERE abs(t.balance_change) > 1000 RETURN t.id, abs(t.balance_change) AS magnitude ORDER BY magnitude DESC
acos
acos(x) -> float
Returns the arc cosine of x in radians. Used in geometric and navigation calculations.
WITH 0.5 AS cosine_value RETURN acos(cosine_value) AS angle_radians
asin
asin(x) -> float
Returns the arc sine of x in radians. Used for angle recovery from sine values in spatial calculations.
WITH 0.866 AS sine_value RETURN asin(sine_value) AS angle_radians
atan
atan(x) -> float
Returns the arc tangent of x in radians. Used in slope and gradient calculations.
MATCH (r:Road) RETURN r.id, atan(r.slope) AS gradient_angle
atan2
atan2(y, x) -> float
Returns the arc tangent of y/x, using the signs of both arguments to determine the quadrant. Useful for bearing calculations.
MATCH (a:Aircraft)-[r:POSITION]->(p:Point) RETURN a.id, atan2(p.lon_change, p.lat_change) AS bearing_radians
cbrt
cbrt(x) -> float
Returns the cube root of x. Used in volume calculations and scaling operations.
MATCH (c:Container) RETURN c.id, cbrt(c.volume_cubic_units) AS side_length
ceil
ceil(x) -> integer
Rounds a number up to the nearest integer. Useful for allocation and quota calculations.
MATCH (i:Invoice) RETURN i.id, ceil(i.amount / 100.0) AS hundreds_required
clamp
clamp(value, min, max) -> number
Clamps a value to the range [min, max]. Useful for enforcing range constraints.
MATCH (s:Setting) RETURN s.id, clamp(s.refresh_interval, 100, 5000) AS bounded_interval
cos
cos(x) -> float
Returns the cosine of x (in radians). Foundation for circular and oscillatory calculations.
MATCH (w:Wave) RETURN w.id, cos(w.phase_radians) AS amplitude
cosh
cosh(x) -> float
Returns the hyperbolic cosine of x. Used in exponential growth models and catenary curves.
WITH 1.5 AS exponent RETURN cosh(exponent) AS hyperbolic_value
degrees
degrees(radians) -> float
Converts radians to degrees. Essential for making angle calculations human-readable.
MATCH (d:Direction) RETURN d.id, degrees(d.heading_radians) AS heading_degrees
e
e() -> float
Returns the mathematical constant e (Euler's number, ~2.71828). Foundation for exponential functions.
RETURN e() * e() AS e_squared
exp
exp(x) -> float
Returns e raised to the power of x. Critical for exponential growth and decay models.
MATCH (g:Growth) RETURN g.id, exp(g.growth_rate * g.time_years) AS multiplier
floor
floor(x) -> integer
Rounds a number down to the nearest integer. Essential for binning and conservative rounding.
MATCH (p:Price) RETURN p.id, floor(p.amount) AS whole_dollars
haversin
haversin(theta) -> float
Returns the haversine of an angle in radians, hav(theta) = sin^2(theta / 2). Foundation for great-circle distance calculations.
MATCH (g:GeoPoint) RETURN g.id, haversin(g.latitude_radians) AS hav_value
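The haversine is the core term of the haversine great-circle formula. The Python sketch below shows how hav() combines into a distance between two latitude/longitude points; the helper names and the 6,371 km mean Earth radius are assumptions for illustration, not part of the xrayGraphDB API.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters


def haversin(theta: float) -> float:
    """hav(theta) = sin^2(theta / 2), with theta in radians."""
    return math.sin(theta / 2.0) ** 2


def great_circle_m(lat1_deg: float, lon1_deg: float, lat2_deg: float, lon2_deg: float) -> float:
    """Great-circle distance in meters between two points given in degrees (hypothetical helper)."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    h = haversin(phi2 - phi1) + math.cos(phi1) * math.cos(phi2) * haversin(math.radians(lon2_deg - lon1_deg))
    # Central angle = 2 * asin(sqrt(h)); clamp h so rounding can never push it above 1
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


print(round(great_circle_m(0.0, 0.0, 0.0, 1.0)))  # one degree of longitude on the equator: 111195
```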
hypot
hypot(x, y) -> float
Returns the Euclidean distance sqrt(x*x + y*y). Perfect for 2D distance calculations.
MATCH (p1:Point {id: 1}), (p2:Point {id: 2})
RETURN hypot(p1.x - p2.x, p1.y - p2.y) AS distance
lerp
lerp(a, b, t) -> float
Linearly interpolates between a and b by factor t, computing a + (b - a) * t with t clamped to [0, 1]. Useful for animation and gradual transitions.
WITH 100 AS start_value, 200 AS end_value, 0.3 AS progress RETURN lerp(start_value, end_value, progress) AS current_value
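As described above, lerp() clamps t before interpolating. A minimal Python sketch of that behavior, assuming the clamp applies to t only; the clamp helper mirrors the clamp() entry and both names are illustrative:

```python
def clamp(value: float, lo: float, hi: float) -> float:
    """Restrict value to the closed range [lo, hi]."""
    return max(lo, min(value, hi))


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation a + (b - a) * t, with t clamped to [0, 1] first."""
    t = clamp(t, 0.0, 1.0)
    return a + (b - a) * t


print(lerp(100, 200, 0.3))  # 130.0
print(lerp(100, 200, 1.7))  # 200.0, because t is clamped to 1
```

Because t is clamped, the result always stays between a and b, so out-of-range progress values cannot overshoot.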
log
log(x) -> float
Returns the natural logarithm (base e) of x. Useful for growth rate analysis and for inverting exponentials.
MATCH (m:Metric) WHERE m.value > 0 RETURN m.id, log(m.value) AS log_value
log10
log10(x) -> float
Returns the base-10 logarithm of x. Used for magnitude scales like pH, Richter, and dB.
MATCH (s:Sound) RETURN s.id, 20 * log10(s.amplitude) AS decibels
log2
log2(x) -> float
Returns the base-2 logarithm of x. Used for bit/information calculations and complexity analysis.
MATCH (d:Dataset) RETURN d.id, log2(d.cardinality) AS information_bits
max_val
max_val(a, b) -> number
Returns the larger of two numeric values. Pairwise maximum for conditional logic.
MATCH (c:Comparison) RETURN c.id, max_val(c.value_a, c.value_b) AS larger
min_val
min_val(a, b) -> number
Returns the smaller of two numeric values. Pairwise minimum for constraint enforcement.
MATCH (s:Schedule) RETURN s.task_id, min_val(s.deadline, s.estimated_end) AS critical_date
mod
mod(dividend, divisor) -> number
Returns the remainder of integer division. Essential for cycling, sharding, and periodic calculations.
MATCH (i:Item) WHERE mod(i.id, 10) = 0 RETURN i.id AS every_tenth_item
pi
pi() -> float
Returns the mathematical constant pi (~3.14159). Foundation for circular and polar calculations.
MATCH (c:Circle) RETURN c.id, pi() * c.radius * c.radius AS area
power
power(base, exponent) -> float
Returns base raised to the given exponent. Essential for polynomial growth, scaling, and physics calculations.
MATCH (c:Compound) RETURN c.id, power(2, c.doubling_count) AS final_amount
radians
radians(degrees) -> float
Converts degrees to radians. Essential for trigonometric function inputs.
MATCH (d:Direction) RETURN d.id, sin(radians(d.bearing_degrees)) AS sine_component
rand
rand() -> float
Returns a random float between 0.0 (inclusive) and 1.0 (exclusive). Foundation for stochastic sampling and randomization.
MATCH (u:User) WHERE rand() < 0.05 RETURN u.id AS sampled_user
round
round(x) -> integer
Rounds a number to the nearest integer. Useful for displaying metrics.
MATCH (r:Rating) RETURN r.id, round(r.score) AS rounded_score
sign
sign(x) -> integer
Returns the signum of a number: -1 (negative), 0 (zero), or 1 (positive). Perfect for trend detection.
MATCH (t:Transaction)
RETURN t.id,
CASE sign(t.amount)
WHEN -1 THEN 'Withdrawal'
WHEN 1 THEN 'Deposit'
ELSE 'No Change'
END AS transaction_type
sin
sin(x) -> float
Returns the sine of x (in radians). Foundation for circular motion and oscillation models.
MATCH (o:OscillatingSignal) RETURN o.id, sin(o.phase) AS amplitude
sinh
sinh(x) -> float
Returns the hyperbolic sine of x. Used in hyperbolic geometry and exponential models.
WITH 2.0 AS exponent RETURN sinh(exponent) AS hyperbolic_sine
sqrt
sqrt(x) -> float
Returns the square root of x. Essential for distance calculations and variance operations.
MATCH (n:Node) WHERE n.variance >= 0 RETURN n.id, sqrt(n.variance) AS standard_deviation
tan
tan(x) -> float
Returns the tangent of x (in radians). Used for slope and angle relationship calculations.
MATCH (a:Angle) RETURN a.id, tan(a.radians) AS slope
tanh
tanh(x) -> float
Returns the hyperbolic tangent of x. Used in neural networks and sigmoid-like compression functions.
MATCH (v:Vector) RETURN v.id, tanh(v.value) AS normalized_value
truncate
truncate(x, places) -> float
Truncates a number to the specified number of decimal places, discarding the remaining digits without rounding. Useful for financial and display formatting.
MATCH (i:Interest) RETURN i.id, truncate(i.rate, 4) AS rate_4_decimals
Materialized Views
Create, manage, and refresh materialized views for cached query results. Similar to ClickHouse MATERIALIZED VIEW.
mv.create
CALL mv.create(name, query) YIELD name, status
Creates a materialized view by storing a Cypher query definition as a :MaterializedView node. Analogous to ClickHouse CREATE MATERIALIZED VIEW.
CALL mv.create("top_users", "MATCH (u:User)-[c:CREATED]->(p:Post) RETURN u.id, COUNT(p) AS post_count ORDER BY post_count DESC LIMIT 100") YIELD name, status RETURN name, status;
mv.drop
CALL mv.drop(name) YIELD name, status
Drops a materialized view definition.
CALL mv.drop("top_users") YIELD name, status RETURN name, status;
mv.due
CALL mv.due() YIELD name, query, overdue_sec
Returns materialized views that are overdue for refresh based on their interval.
CALL mv.due() YIELD name, query, overdue_sec WHERE overdue_sec > 300 RETURN name, overdue_sec ORDER BY overdue_sec DESC;
mv.list
CALL mv.list() YIELD name, query
Lists all materialized view definitions.
CALL mv.list() YIELD name, query RETURN name, LENGTH(query) AS query_length ORDER BY name;
mv.refresh
CALL mv.refresh(name) YIELD name, query, status
Marks a materialized view as refreshed and returns its query; the caller is responsible for executing the returned query. Equivalent to ClickHouse SYSTEM REFRESH MATERIALIZED VIEW. Cron-based refresh:
* * * * * echo "CALL mv.due() YIELD name WITH name CALL mv.refresh(name) YIELD status RETURN status;" | xgconsole
This cron job only marks due views as refreshed; the query returned for each view still has to be executed, as in the example below.
CALL mv.refresh("top_users") YIELD name, query, status WITH name, query, status CALL apoc.cypher.run(query, {}) YIELD value RETURN name, status, COUNT(value) AS row_count;
mv.set_interval
CALL mv.set_interval(name, interval_sec) YIELD name, interval_sec
Sets the refresh interval, in seconds, for a materialized view.
CALL mv.set_interval("top_users", 3600) YIELD name, interval_sec RETURN name, interval_sec;
Particle Physics
High-energy physics functions for particle detector analysis, event reconstruction, and kinematic calculations.
particle.aplanarity
particle.aplanarity(px_list, py_list, pz_list) -> float
Computes the aplanarity event-shape variable.
MATCH (e:Event)-[:CONTAINS]->(p:Particle) WHERE e.id = "evt_001" WITH COLLECT(p.px) AS px, COLLECT(p.py) AS py, COLLECT(p.pz) AS pz RETURN particle.aplanarity(px, py, pz) AS aplanarity;
particle.corrected_mass
particle.corrected_mass(visible_mass, pt_miss) -> float
Returns the corrected mass, accounting for missing transverse momentum.
MATCH (e:Event {id: "evt_042"}) RETURN particle.corrected_mass(125.5, 47.3) AS corrected_mass;
particle.delta_r
particle.delta_r(eta1, phi1, eta2, phi2) -> float
Returns the angular separation deltaR = sqrt(deltaEta^2 + deltaPhi^2) in eta-phi space.
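In collider analyses the azimuthal difference is usually wrapped into [-pi, pi) before deltaR is computed, so that particles on either side of phi = ±pi are treated as neighbours. The Python sketch below shows that convention; whether particle.delta_r wraps phi the same way is an assumption, not something this reference states.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi)."""
    d = phi1 - phi2
    return (d + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """deltaR = sqrt(deltaEta^2 + deltaPhi^2) with the wrapped deltaPhi."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))
```

For example, two jets at phi = 3.0 and phi = -3.0 are only about 0.28 apart in phi (2*pi - 6), not 6.0.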
MATCH (j1:Jet {id: "jet_1"}), (j2:Jet {id: "jet_2"}) RETURN particle.delta_r(j1.eta, j1.phi, j2.eta, j2.phi) AS separation_deltaR;
particle.dira
particle.dira(momentum, flight_direction) -> float
Returns the cosine of the angle between the momentum vector and the flight direction.
MATCH (d:DecayChain {id: "B_decay_001"}) RETURN particle.dira([1.254, -0.387, 2.105], [0.891, -0.143, 0.883]) AS dira_cosine;
particle.flight_distance
particle.flight_distance(prod_vertex, decay_vertex) -> float
Returns the flight distance between the production and decay vertices.
MATCH (decay:Decay) RETURN particle.flight_distance([0.012, 0.008, 0.003], [3.450, 2.134, 1.678]) AS flight_distance_mm;
particle.fox_wolfram
particle.fox_wolfram(px_list, py_list, pz_list, order) -> float
Computes a Fox-Wolfram moment of the given order.
MATCH (e:Event)-[:CONTAINS]->(p:Particle) WHERE e.id = "evt_089" WITH COLLECT(p.px) AS px, COLLECT(p.py) AS py, COLLECT(p.pz) AS pz RETURN particle.fox_wolfram(px, py, pz, 2) AS fw_moment_2;
particle.impact_parameter
particle.impact_parameter(track, vertex) -> float
Returns the impact parameter, the distance of closest approach of a track to a vertex.
MATCH (t:Track {id: "trk_234"}), (v:Vertex {id: "pv_001"}) RETURN particle.impact_parameter(t, v) AS impact_parameter_um;
particle.invariant_mass
particle.invariant_mass(E1, px1, py1, pz1, E2, px2, py2, pz2) -> float
Computes the invariant mass of a two-particle system from the two 4-momenta.
RETURN particle.invariant_mass(125.5, 0.8, 2.1, 45.3, 127.2, -1.2, 1.9, 44.7) AS invariant_mass_GeV;
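For reference, the invariant mass of two particles follows from the summed 4-momentum, m^2 = (E1 + E2)^2 - |p1 + p2|^2, in natural units (c = 1). This Python sketch mirrors the argument order of particle.invariant_mass; clamping small negative m^2 values to zero is a choice made in the sketch, not documented behavior.

```python
import math


def invariant_mass(e1, px1, py1, pz1, e2, px2, py2, pz2) -> float:
    """m = sqrt((E1 + E2)^2 - |p1 + p2|^2) for a two-particle system."""
    e = e1 + e2
    px, py, pz = px1 + px2, py1 + py2, pz1 + pz2
    m2 = e * e - (px * px + py * py + pz * pz)
    # Rounding can push m^2 slightly below zero for near-massless systems
    return math.sqrt(max(m2, 0.0))


print(invariant_mass(125.5, 0.8, 2.1, 45.3, 127.2, -1.2, 1.9, 44.7))
```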
particle.invariant_mass_pt
particle.invariant_mass_pt(pt1, eta1, phi1, m1, pt2, eta2, phi2, m2) -> float
Computes the invariant mass of a two-particle system from each particle's pT, eta, phi, and mass.
MATCH (l1:Lepton {type: "muon"}), (l2:Lepton {type: "muon"}) WHERE l1.charge <> l2.charge RETURN particle.invariant_mass_pt(28.3, 1.2, 2.45, 0.1057, 35.7, -0.8, 5.63, 0.1057) AS dilepton_mass_GeV;
particle.missing_et
particle.missing_et(px_list, py_list) -> float
Returns the missing transverse energy computed from lists of px and py components.
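Missing transverse energy is conventionally the magnitude of the negative vector sum of the visible transverse momenta. A Python sketch, assuming particle.missing_et receives the px and py components of the visible particles (as in the example below, which collects them per event); the transverse_momentum helper mirrors particle.transverse_momentum, and both function bodies are illustrations rather than the engine's code:

```python
import math


def transverse_momentum(px: float, py: float) -> float:
    """pT = sqrt(px^2 + py^2)."""
    return math.hypot(px, py)


def missing_et(px_list, py_list) -> float:
    """Magnitude of the negative vector sum of the visible transverse momenta."""
    if len(px_list) != len(py_list):
        raise ValueError("px_list and py_list must have the same length")
    met_x = -sum(px_list)
    met_y = -sum(py_list)
    return math.hypot(met_x, met_y)


print(missing_et([10.0, -4.0], [0.0, 8.0]))  # visible sum (6, 8) -> 10.0
```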
MATCH (e:Event {id: "evt_156"})-[:CONTAINS]->(p:Particle) WITH COLLECT(p.px) AS px, COLLECT(p.py) AS py RETURN particle.missing_et(px, py) AS missing_energy_GeV;
particle.pseudorapidity
particle.pseudorapidity(px, py, pz) -> float
Returns the pseudorapidity eta from a 3-momentum vector.
MATCH (j:Jet {id: "jet_45"}) RETURN particle.pseudorapidity(j.px, j.py, j.pz) AS eta;
particle.rapidity
particle.rapidity(E, pz) -> float
Returns the rapidity y from energy and longitudinal momentum.
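Pseudorapidity depends only on the direction of the momentum, eta = -ln tan(theta/2) = atanh(pz/|p|), while rapidity also uses the energy, y = 0.5 ln((E + pz)/(E - pz)); the two coincide for massless particles. A Python sketch of both definitions; the handling of the divergent and unphysical cases is a choice made here, not documented xrayGraphDB behavior.

```python
import math


def pseudorapidity(px: float, py: float, pz: float) -> float:
    """eta = -ln(tan(theta / 2)) = 0.5 * ln((|p| + pz) / (|p| - pz))."""
    p = math.sqrt(px * px + py * py + pz * pz)
    if p == 0.0:
        raise ValueError("eta is undefined for zero momentum")
    if p == abs(pz):
        # Momentum along the beam axis: eta diverges to +/- infinity
        return math.copysign(math.inf, pz)
    return 0.5 * math.log((p + pz) / (p - pz))


def rapidity(energy: float, pz: float) -> float:
    """y = 0.5 * ln((E + pz) / (E - pz)); requires E > |pz|."""
    if energy <= abs(pz):
        raise ValueError("rapidity requires E > |pz|")
    return 0.5 * math.log((energy + pz) / (energy - pz))
```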
MATCH (p:Particle {id: "pho_78"}) RETURN particle.rapidity(p.energy, p.pz) AS rapidity_y;
particle.sphericity
particle.sphericity(px_list, py_list, pz_list) -> float
Computes the sphericity event-shape variable.
MATCH (e:Event {id: "evt_203"})-[:CONTAINS]->(p:Particle) WITH COLLECT(p.px) AS px, COLLECT(p.py) AS py, COLLECT(p.pz) AS pz RETURN particle.sphericity(px, py, pz) AS sphericity_value;
particle.thrust
particle.thrust(px_list, py_list, pz_list) -> float
Computes the thrust event-shape variable.
MATCH (e:Event)-[:CONTAINS]->(p:Particle) WHERE e.run_number = 2024 WITH e, COLLECT(p.px) AS px, COLLECT(p.py) AS py, COLLECT(p.pz) AS pz RETURN e.id, particle.thrust(px, py, pz) AS thrust ORDER BY thrust DESC LIMIT 1;
particle.transverse_momentum
particle.transverse_momentum(px, py) -> float
Returns the transverse momentum pT from the px and py components.
MATCH (l:Lepton {type: "muon"}) WHERE l.px IS NOT NULL RETURN l.id, particle.transverse_momentum(l.px, l.py) AS pT_GeV ORDER BY pT_GeV DESC LIMIT 10;
particle.vertex_chi2
particle.vertex_chi2(tracks_list) -> float
Returns the chi-squared of a vertex fit.
MATCH (v:Vertex {id: "sv_001"})-[:CONTAINS]->(t:Track) WITH v, COLLECT(t) AS tracks WITH v, particle.vertex_chi2(tracks) AS chi2_fit WHERE chi2_fit < 10.0 RETURN v.id, chi2_fit;
Patent (Automated Test Generation)
Patent-protected automated test generation (ATG) functions for code coverage and mutation testing analysis.
atg_boundary_values
atg_boundary_values(node) -> list
Returns suggested boundary values for automated test inputs.
MATCH (fn:Function {name: "process_transaction"}) RETURN fn.name, atg_boundary_values(fn) AS test_boundaries;
atg_coverage
atg_coverage(node) -> float
Returns the automated test generation coverage score for a code node.
MATCH (fn:Function) RETURN fn.name, atg_coverage(fn) AS coverage_score ORDER BY coverage_score ASC LIMIT 20;
atg_mutation_score
atg_mutation_score(node) -> float
Returns the mutation testing score from ATG analysis.
MATCH (fn:Function {module: "core_engine"}) WITH fn, atg_mutation_score(fn) AS mutation_score WHERE mutation_score > 0.85 RETURN fn.name, mutation_score ORDER BY mutation_score DESC;
atg_test_priority
atg_test_priority(node) -> float
Returns a priority score indicating which tests should be generated first.
MATCH (fn:Function {in_hot_path: true}) RETURN fn.name, atg_test_priority(fn) AS priority ORDER BY priority DESC LIMIT 15;
Physics
Physics functions for ballistics, orbital mechanics, oceanography, and relativistic calculations.
mil.ballistic_range
mil.ballistic_range(velocity_mps, angle_deg) -> float
Computes the ballistic range in meters for a projectile launched at velocity_mps (m/s) and angle_deg (degrees).
RETURN mil.ballistic_range(850.0, 35.0) AS range_meters;
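This reference does not state which physical model mil.ballistic_range uses. As a point of comparison only, the drag-free flat-ground range is R = v^2 sin(2 theta) / g; for the example's 850 m/s at 35 degrees this gives roughly 69 km, an idealized upper bound that ignores air resistance. A Python sketch of that formula (the function name is hypothetical):

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def vacuum_range_m(velocity_mps: float, angle_deg: float) -> float:
    """Drag-free range over flat ground: R = v^2 * sin(2 * theta) / g."""
    theta = math.radians(angle_deg)
    return velocity_mps ** 2 * math.sin(2.0 * theta) / STANDARD_GRAVITY


print(round(vacuum_range_m(850.0, 35.0)))  # idealized upper bound, about 69 km
```

In this model complementary angles (35 and 55 degrees) give the same range, and 45 degrees maximizes it.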
mil.radar_horizon
mil.radar_horizon(antenna_height_m, target_height_m?) -> float
Returns the radar horizon distance in meters for an antenna at antenna_height_m, optionally extended to a target at target_height_m.
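A common approximation for the radar horizon models atmospheric refraction with an effective Earth radius of 4/3 times the true radius, giving d = sqrt(2 k R h_antenna) + sqrt(2 k R h_target). Whether mil.radar_horizon uses this k = 4/3 model is an assumption; the Python sketch below only illustrates the formula, returning meters to match the documented unit.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
K_FACTOR = 4.0 / 3.0  # standard-atmosphere refraction assumption


def radar_horizon_m(antenna_height_m: float, target_height_m: float = 0.0) -> float:
    """Radio horizon using an effective Earth radius of k * R."""
    r_eff = K_FACTOR * EARTH_RADIUS_M
    return math.sqrt(2.0 * r_eff * antenna_height_m) + math.sqrt(2.0 * r_eff * target_height_m)
```

For a 45 m antenna and a sea-level target this gives about 27.6 km; a target at 8,000 m extends the horizon considerably.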
RETURN mil.radar_horizon(45.0, 8000.0) AS horizon_to_target_m, mil.radar_horizon(45.0) AS horizon_sea_level_m;
ocean.drift
ocean.drift(lat, lon, current_speed, current_dir, time_sec) -> [lat, lon]
Predicts the position after drifting in an ocean current for time_sec seconds.
MATCH (s:Ship {id: "SS_001"}) RETURN ocean.drift(s.lat, s.lon, 1.2, 45.0, 3600.0) AS drifted_position;
ocean.sound_speed
ocean.sound_speed(temp_c, salinity_psu, depth_m) -> float
Returns the speed of sound in seawater, in m/s, using the UNESCO formula.
RETURN ocean.sound_speed(15.5, 35.0, 1000.0) AS sound_speed_ms;
orbital.escape_velocity
orbital.escape_velocity(radius_m, mass_kg) -> float
Returns the escape velocity in m/s at a given radius from a body.
WITH 6.371e6 AS earth_radius_m, 5.972e24 AS earth_mass_kg RETURN orbital.escape_velocity(earth_radius_m, earth_mass_kg) AS escape_velocity_ms;
orbital.hohmann
orbital.hohmann(r1_m, r2_m, mass_kg) -> map
Computes the delta-v values for a Hohmann transfer between circular orbits of radius r1_m and r2_m around a body of mass mass_kg.
WITH 6.678e6 AS leo_m, 4.224e7 AS geo_m, 5.972e24 AS earth_mass_kg RETURN orbital.hohmann(leo_m, geo_m, earth_mass_kg) AS transfer_deltav;
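A Hohmann transfer uses two burns: one to enter an ellipse whose semi-major axis is (r1 + r2) / 2, and one to circularize at r2. The Python sketch below computes both burns and the transfer time from mu = G M, assuming the CODATA 2018 value of G; the returned key names are hypothetical, because the keys of the map returned by orbital.hohmann are not documented here.

```python
import math

G = 6.67430e-11  # gravitational constant, m^3 kg^-1 s^-2 (CODATA 2018)


def hohmann(r1_m: float, r2_m: float, mass_kg: float) -> dict:
    """Two-burn transfer between coplanar circular orbits of radius r1_m and r2_m."""
    mu = G * mass_kg
    a_transfer = (r1_m + r2_m) / 2.0  # semi-major axis of the transfer ellipse
    v1 = math.sqrt(mu / r1_m)  # circular speed at r1
    v2 = math.sqrt(mu / r2_m)  # circular speed at r2
    dv1 = v1 * (math.sqrt(r2_m / a_transfer) - 1.0)  # burn at r1 onto the ellipse
    dv2 = v2 * (1.0 - math.sqrt(r1_m / a_transfer))  # circularizing burn at r2
    return {
        "dv1_mps": dv1,
        "dv2_mps": dv2,
        "total_mps": abs(dv1) + abs(dv2),
        "transfer_time_s": math.pi * math.sqrt(a_transfer ** 3 / mu),  # half the ellipse period
    }


print(hohmann(6.678e6, 4.224e7, 5.972e24))
```

For the LEO-to-GEO radii in the example above, the total comes to roughly 3.9 km/s.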
orbital.period
orbital.period(semi_major_axis_m, central_mass_kg) -> float
Returns the orbital period in seconds from Kepler's third law.
WITH 4.224e7 AS geostationary_sma_m, 5.972e24 AS earth_mass_kg RETURN orbital.period(geostationary_sma_m, earth_mass_kg) / 86400.0 AS period_days;
orbital.velocity
orbital.velocity(radius_m, central_mass_kg) -> float
Returns the circular orbital velocity in m/s at a given radius.
WITH 6.678e6 AS leo_orbit_m, 5.972e24 AS earth_mass_kg RETURN orbital.velocity(leo_orbit_m, earth_mass_kg) AS leo_velocity_ms;
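orbital.velocity, orbital.escape_velocity, and orbital.period correspond to the standard two-body relations v = sqrt(GM/r), v_esc = sqrt(2GM/r), and T = 2 pi sqrt(a^3 / GM). A Python sketch of the three formulas, assuming the CODATA 2018 value of G (the constant xrayGraphDB uses is not stated); the function names are illustrative.

```python
import math

G = 6.67430e-11  # m^3 kg^-1 s^-2 (CODATA 2018)


def circular_velocity(radius_m: float, central_mass_kg: float) -> float:
    """v = sqrt(GM / r)."""
    return math.sqrt(G * central_mass_kg / radius_m)


def escape_velocity(radius_m: float, mass_kg: float) -> float:
    """v_esc = sqrt(2GM / r), i.e. sqrt(2) times the circular velocity."""
    return math.sqrt(2.0 * G * mass_kg / radius_m)


def orbital_period(semi_major_axis_m: float, central_mass_kg: float) -> float:
    """Kepler's third law: T = 2 * pi * sqrt(a^3 / GM)."""
    return 2.0 * math.pi * math.sqrt(semi_major_axis_m ** 3 / (G * central_mass_kg))


print(escape_velocity(6.371e6, 5.972e24))           # about 11.2 km/s
print(orbital_period(4.224e7, 5.972e24) / 86400.0)  # about 1 day
```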
physics.gps_correction
physics.gps_correction(orbit_radius_m, velocity_mps) -> float
Returns the relativistic clock correction for GPS, in seconds per second.
WITH 2.662e7 AS gps_orbit_radius_m, 3874.0 AS gps_velocity_ms RETURN physics.gps_correction(gps_orbit_radius_m, gps_velocity_ms) AS relativistic_correction;
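The GPS clock correction combines two effects: the satellite clock runs fast because it sits higher in Earth's gravitational potential, and runs slow because of its orbital speed. A weak-field Python sketch, assuming a ground clock at the mean Earth radius and ignoring Earth's rotation; the Earth constants and the function name are assumptions, not documented inputs of physics.gps_correction.

```python
G = 6.67430e-11           # m^3 kg^-1 s^-2 (CODATA 2018)
C = 299_792_458.0         # speed of light, m/s
EARTH_MASS_KG = 5.972e24  # assumed
EARTH_RADIUS_M = 6.371e6  # assumed radius of the ground clock


def gps_rate_offset(orbit_radius_m: float, velocity_mps: float) -> float:
    """Fractional rate of a satellite clock relative to a ground clock (weak-field approximation)."""
    mu = G * EARTH_MASS_KG
    # General relativity: higher in the potential, the satellite clock runs fast
    gravitational = mu / C ** 2 * (1.0 / EARTH_RADIUS_M - 1.0 / orbit_radius_m)
    # Special relativity: orbital motion makes the satellite clock run slow
    kinematic = velocity_mps ** 2 / (2.0 * C ** 2)
    return gravitational - kinematic


rate = gps_rate_offset(2.662e7, 3874.0)
print(rate, rate * 86400.0 * 1e6)  # seconds per second, microseconds per day
```

For the example inputs this gives about 4.46e-10 s/s, roughly 38.5 microseconds per day, in line with the commonly quoted net GPS clock offset.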
physics.lorentz
physics.lorentz(velocity_mps) -> float
Returns the Lorentz factor gamma for a given velocity.
WITH 0.9 * 299792458 AS relativistic_velocity_ms RETURN physics.lorentz(relativistic_velocity_ms) AS gamma_factor;
physics.time_dilation
physics.time_dilation(velocity_mps) -> float
Returns the Lorentz time dilation factor at a given velocity.
WITH 10000000.0 AS high_velocity_ms RETURN physics.time_dilation(high_velocity_ms) AS time_dilation_factor;
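Both physics.lorentz and physics.time_dilation are described in terms of the Lorentz factor gamma = 1 / sqrt(1 - v^2/c^2); a clock moving at velocity v accumulates proper time dt / gamma. A Python sketch of both quantities; whether physics.time_dilation returns gamma itself or its reciprocal is not stated in this reference, and the function names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def lorentz_gamma(velocity_mps: float) -> float:
    """gamma = 1 / sqrt(1 - v^2 / c^2); defined only for |v| < c."""
    beta = velocity_mps / C
    if abs(beta) >= 1.0:
        raise ValueError("velocity must be below the speed of light")
    return 1.0 / math.sqrt(1.0 - beta * beta)


def proper_time(coordinate_time_s: float, velocity_mps: float) -> float:
    """Time elapsed on a clock moving at velocity_mps while coordinate_time_s passes."""
    return coordinate_time_s / lorentz_gamma(velocity_mps)


print(lorentz_gamma(0.9 * C))       # about 2.294
print(lorentz_gamma(10_000_000.0))  # about 1.000557
```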
Predicate Functions
Type and existence check functions for validation and filtering.
isempty
isempty(value) -> bool
Returns true if a list, map, or string is empty.
MATCH (u:User {id: 1})
WHERE isempty(u.tags)
RETURN u.name, u.tags AS empty_tags
isinf
isinf(x) -> bool
Returns true if the value is positive or negative infinity.
MATCH (r:Reading {sensor_id: 'A1'})
WHERE NOT isinf(r.temperature)
RETURN r.timestamp, r.temperature
LIMIT 10
isnan
isnan(x) -> bool
Returns true if the value is NaN.
MATCH (m:Measurement) WHERE isnan(m.calibration_factor) RETURN m.id, m.device, m.calibration_factor AS invalid_value ORDER BY m.timestamp DESC
ProcessFlow Functions
Traverse and validate flow graphs, state machines, and decision trees.
xray.flow_trace
CALL xray.flow_trace(start_label, start_prop, start_val, max_depth) YIELD step, depth, node_name, node_type, condition, action
Traces execution paths through flow graphs (decision trees, state machines), following FLOW_EDGE, NEXT, DECISION, and TRANSITION edges.
CALL xray.flow_trace('LoanApproval', 'loan_id', '12345', 5)
YIELD step, depth, node_name, node_type, condition, action
RETURN step, depth, node_name, node_type, condition, action
ORDER BY step
xray.flow_validate
CALL xray.flow_validate(flow_name) YIELD flow_name, node_count, start_nodes, end_nodes, orphan_nodes, valid, issues
Validates flow graph structure: checks for start and end nodes, orphan nodes, and connectivity.
CALL xray.flow_validate('OrderProcessing')
YIELD flow_name, node_count, valid, issues
WHERE NOT valid
RETURN flow_name, node_count, issues
Protocol Functions (xrayProtocol)
Wire format reference for xrayProtocol v1, the native transport on port 7689. Every message consists of an 8-byte header followed by a payload.
XRAYPROTOCOL_FRAME
Frame: [u32 payload_length][u8 msg_type][u8 flags][u16 query_id][payload...]
payload_length counts payload bytes only and excludes the 8-byte header. flags bit 0 = LZ4 compression. query_id is used for multiplexing. Default port: 7689.
HELLO frame:
[u32 0x0000002A]     // payload length: 42 bytes
[u8 0x01]            // msg_type: HELLO
[u8 0x00]            // flags: no compression
[u16 0x0001 LE]      // query_id: 1
[payload: 42 bytes]  // auth + version
XRAYPROTOCOL_HELLO
HELLO (0x01): [u16 version=1][u16 capabilities][u32 token_len][token_bytes][u32 db_len?][db_bytes?]
Client→Server. token_bytes holds the auth token 'user:password' in UTF-8; db_bytes holds the database name. In v4.x the database is optional (db_len 0 = default database); v5.0.0-alpha rejects HELLO frames that omit it with ERROR(HELLO_REJECTED). The server responds with HELLO_OK (0x02).
HELLO payload (frame msg_type=0x01):
[u16 0x0001 LE]          // version: 1
[u16 0x0007 LE]          // capabilities: PROFILE|EXPLAIN|COMPRESSED
[u32 0x0000000A LE]      // token_len: 10
[10 bytes: "admin:pass"]
[u32 0x00000000 LE]      // db_len: 0 (use default; v4.x only)
// Server responds: HELLO_OK (0x02) frame
XRAYPROTOCOL_EXECUTE
EXECUTE (0x03): [u8 language][u32 query_len][query_bytes][u32 param_count][params...][u32 options]
Client→Server. language: 0=Cypher, 1=GFQL. params: each is a String name followed by a typed value. options: bitmask (1=PROFILE, 2=EXPLAIN, 4=READ_ONLY). The server responds with SCHEMA (0x04), zero or more BATCH (0x05) messages, and COMPLETE (0x06), or with ERROR (0x07).
EXECUTE payload (frame msg_type=0x03):
[u8 0x00]                 // language: Cypher
[u32 0x0000001F LE]       // query_len: 31
[31 bytes: "MATCH (n:User) RETURN n LIMIT 1"]
[u32 0x00000000 LE]       // param_count: 0
[u32 0x00000001 LE]       // options: PROFILE
// Server responds: SCHEMA, BATCH(es), COMPLETE
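The frame and payload layouts above can be produced with Python's struct module. This is a minimal sketch, not an official client: it assumes every multi-byte integer is little-endian (the reference marks most fields LE), sends no query parameters, and all helper names are hypothetical.

```python
import struct

MSG_HELLO = 0x01
MSG_EXECUTE = 0x03
OPT_PROFILE, OPT_EXPLAIN, OPT_READ_ONLY = 1, 2, 4


def encode_frame(msg_type: int, payload: bytes, query_id: int = 1, flags: int = 0) -> bytes:
    """8-byte header (u32 payload_length, u8 msg_type, u8 flags, u16 query_id) + payload."""
    return struct.pack("<IBBH", len(payload), msg_type, flags, query_id) + payload


def hello_payload(user: str, password: str, database: str = "", capabilities: int = 0x0007) -> bytes:
    """HELLO: u16 version, u16 capabilities, u32 token_len, token, u32 db_len, db."""
    token = f"{user}:{password}".encode("utf-8")
    # An empty database selects the default in v4.x; v5.0.0-alpha requires a name
    db = database.encode("utf-8")
    return (struct.pack("<HHI", 1, capabilities, len(token)) + token
            + struct.pack("<I", len(db)) + db)


def execute_payload(query: str, options: int = 0, language: int = 0) -> bytes:
    """EXECUTE: u8 language, u32 query_len, query, u32 param_count (0 here), u32 options."""
    q = query.encode("utf-8")
    return struct.pack("<BI", language, len(q)) + q + struct.pack("<II", 0, options)


hello = encode_frame(MSG_HELLO, hello_payload("admin", "pass"))
execute = encode_frame(MSG_EXECUTE, execute_payload("MATCH (n:User) RETURN n LIMIT 1", OPT_PROFILE))
```

With 'admin:pass' (10 bytes) and no database, the HELLO payload is 2 + 2 + 4 + 10 + 4 = 22 bytes.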
XRAYPROTOCOL_RESULTS
SCHEMA (0x04): [u16 col_count][col_defs...]
BATCH (0x05): [u32 row_count][rows...]
COMPLETE (0x06): [u32 total_rows][u32 exec_us][u32 compile_us]
Server→Client result flow: one SCHEMA, zero or more BATCHes, one COMPLETE. Each value in a BATCH is a u8 type_tag followed by the value bytes. Type tags match the ColumnType enum (0x01=null, 0x02=bool, 0x03=int64, 0x04=double, 0x05=string).
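A matching decoder sketch for the result payloads, in Python. Several details are assumptions rather than documented facts: a column definition is a u8 type tag, a u32 name length, and the UTF-8 name (as in the example below); null values carry no bytes after their tag; bool is a single byte; strings are a u32 byte length followed by UTF-8, as in the bulk-insert column definitions; and all integers are little-endian.

```python
import struct

TYPE_NULL, TYPE_BOOL, TYPE_INT64, TYPE_DOUBLE, TYPE_STRING = 0x01, 0x02, 0x03, 0x04, 0x05


def decode_schema(payload: bytes) -> list:
    """SCHEMA: u16 col_count, then per column a u8 type tag, u32 name length, UTF-8 name."""
    (col_count,) = struct.unpack_from("<H", payload, 0)
    offset, columns = 2, []
    for _ in range(col_count):
        col_type = payload[offset]
        (name_len,) = struct.unpack_from("<I", payload, offset + 1)
        start = offset + 5
        columns.append((payload[start:start + name_len].decode("utf-8"), col_type))
        offset = start + name_len
    return columns


def _read_value(buf: bytes, offset: int):
    """Read one tagged value; returns (value, new_offset)."""
    tag = buf[offset]
    offset += 1
    if tag == TYPE_NULL:
        return None, offset
    if tag == TYPE_BOOL:
        return buf[offset] != 0, offset + 1
    if tag == TYPE_INT64:
        return struct.unpack_from("<q", buf, offset)[0], offset + 8
    if tag == TYPE_DOUBLE:
        return struct.unpack_from("<d", buf, offset)[0], offset + 8
    if tag == TYPE_STRING:
        (n,) = struct.unpack_from("<I", buf, offset)
        return buf[offset + 4:offset + 4 + n].decode("utf-8"), offset + 4 + n
    raise ValueError(f"unsupported type tag 0x{tag:02x}")


def decode_batch(payload: bytes, col_count: int) -> list:
    """BATCH: u32 row_count, then col_count tagged values per row."""
    (row_count,) = struct.unpack_from("<I", payload, 0)
    offset, rows = 4, []
    for _ in range(row_count):
        row = []
        for _ in range(col_count):
            value, offset = _read_value(payload, offset)
            row.append(value)
        rows.append(row)
    return rows


def decode_complete(payload: bytes) -> dict:
    """COMPLETE: u32 total_rows, u32 exec_us, u32 compile_us."""
    total_rows, exec_us, compile_us = struct.unpack_from("<III", payload, 0)
    return {"total_rows": total_rows, "exec_us": exec_us, "compile_us": compile_us}
```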
SCHEMA + BATCH + COMPLETE payload chain:
SCHEMA (0x04): [u16 0x0001 LE] [u8 0x03] [u32 0x00000001] ['n']
// 1 column, type INT64 (0x03), name_len=1, name='n'
BATCH (0x05): [u32 0x00000001 LE] [u8 0x03] [i64 0x000000000000012C LE]
// 1 row, column type INT64, value=300
COMPLETE (0x06): [u32 0x00000001 LE] [u32 0x000003E8 LE] [u32 0x000000FA LE]
// total_rows=1, exec_time=1000 µs, compile_time=250 µs
XRAYPROTOCOL_BULK
BULK_INSERT_NODES (0x21) / BULK_UPSERT_NODES (0x27): [u32 rows][u16 cols][col_defs...][col_data...]
Columnar batch write. Start the batch with BULK_INSERT_BEGIN (0x20) and end it with BULK_INSERT_COMMIT (0x24). ACK: 0x25, Error: 0x26. Upsert uses the first property as the key, with a 3-tier lookup: GID cache O(1), label-property index O(1), full scan O(N).
BULK_INSERT_NODES (0x21) payload:
[u32 0x00000002 LE]   // rows: 2
[u16 0x0003 LE]       // cols: 3 (id, name, age)
// Column definitions:
[u8 0x05] [u32 0x00000002] ['id']     // STRING, name_len=2
[u8 0x05] [u32 0x00000004] ['name']   // STRING, name_len=4
[u8 0x03] [u32 0x00000003] ['age']    // INT64, name_len=3
// Column data (row 1):
[u8 0x05] [u32 0x00000002] ['42'] [u8 0x05] [u32 0x00000005] ['Alice'] [u8 0x03] [i64 28]
// Column data (row 2):
[u8 0x05] [u32 0x00000002] ['43'] [u8 0x05] [u32 0x00000003] ['Bob'] [u8 0x03] [i64 35]
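A Python sketch that builds a BULK_INSERT_NODES payload. It follows the byte order of the example above, emitting values row by row with a type tag on each value; if the server expects strictly column-major data, the nesting of the two loops would change. Only STRING and INT64 columns are handled, little-endian encoding is assumed throughout, and the function names are hypothetical. String lengths count UTF-8 bytes, so 'Alice' needs a length of 5.

```python
import struct

T_INT64, T_STRING = 0x03, 0x05


def _lp_string(text: str) -> bytes:
    """u32 byte length followed by UTF-8 bytes (length counts bytes, not characters)."""
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def bulk_insert_nodes_payload(columns, rows) -> bytes:
    """columns: [(name, type_tag)]; rows: value tuples in column order."""
    parts = [struct.pack("<IH", len(rows), len(columns))]
    for name, tag in columns:  # column definitions
        parts.append(bytes([tag]) + _lp_string(name))
    for row in rows:  # values, row by row, as in the example above
        if len(row) != len(columns):
            raise ValueError("row width does not match column count")
        for (_, tag), value in zip(columns, row):
            if tag == T_STRING:
                parts.append(bytes([tag]) + _lp_string(value))
            elif tag == T_INT64:
                parts.append(struct.pack("<Bq", tag, value))
            else:
                raise ValueError(f"type tag 0x{tag:02x} is not handled in this sketch")
    return b"".join(parts)


payload = bulk_insert_nodes_payload(
    [("id", T_STRING), ("name", T_STRING), ("age", T_INT64)],
    [("42", "Alice", 28), ("43", "Bob", 35)],
)
```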
RAG/LLM Functions
Text processing, embeddings, semantic search, and context management for retrieval-augmented generation.
embed
embed(text) -> list
Generates a 384-dimensional vector embedding from text, using an ONNX model or a hash-based fallback.
MATCH (doc:Document {id: 'doc_001'})
SET doc.embedding = embed(doc.content)
RETURN doc.id, size(doc.embedding) AS dimensions
vector_dot_product
vector_dot_product(vec1, vec2) -> float
Returns the dot product of two numeric vectors.
WITH embed('machine learning') AS query_emb
MATCH (doc:Document)
WHERE doc.embedding IS NOT NULL
WITH doc, vector_dot_product(query_emb, doc.embedding) AS similarity
RETURN doc.title, similarity
ORDER BY similarity DESC
LIMIT 5
vector_normalize
vector_normalize(vec) -> list
Returns the L2-normalized unit vector.
MATCH (doc:Document) WHERE doc.embedding IS NOT NULL SET doc.normalized_embedding = vector_normalize(doc.embedding) RETURN count(doc) AS documents_normalized
vector_norm
vector_norm(vec) -> float
Returns the L2 (Euclidean) norm of a vector.
WITH [3.0, 4.0] AS vec RETURN vector_norm(vec) AS magnitude
vector_scale
vector_scale(vec, scalar) -> list
Returns a vector scaled by a scalar multiplier.
WITH [1.0, 2.0, 3.0] AS vec RETURN vector_scale(vec, 2.5) AS scaled
vector_add
vector_add(vec1, vec2) -> list
Returns the element-wise sum of two vectors.
WITH [1.0, 2.0, 3.0] AS v1, [4.0, 5.0, 6.0] AS v2 RETURN vector_add(v1, v2) AS sum_vec
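The vector functions above reduce to a few lines of arithmetic. One consequence worth keeping in mind: the dot product equals cosine similarity only when both vectors are unit length, so ranking by vector_dot_product (as in the earlier example) matches cosine ranking only if embeddings are normalized first, for instance with vector_normalize. This reference does not state whether embed() returns unit vectors. A plain-Python sketch with illustrative names:

```python
import math


def dot(v1, v2) -> float:
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(v1, v2))


def norm(v) -> float:
    """L2 (Euclidean) norm."""
    return math.sqrt(dot(v, v))


def normalize(v) -> list:
    """Scale v to unit length; undefined for the zero vector."""
    n = norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def scale(v, scalar: float) -> list:
    return [x * scalar for x in v]


def add(v1, v2) -> list:
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    return [a + b for a, b in zip(v1, v2)]


def cosine_similarity(v1, v2) -> float:
    """Dot product of the two normalized vectors."""
    return dot(normalize(v1), normalize(v2))
```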
vector_dimension
vector_dimension(vec) -> integer
Returns the number of dimensions in a vector.
MATCH (doc:Document) WHERE doc.embedding IS NOT NULL RETURN doc.id, vector_dimension(doc.embedding) AS dims LIMIT 1
bm25_score
bm25_score(query, document, doc_count, avg_doc_len) -> float
Computes the BM25 relevance score for a query against a document.
MATCH (doc:Document)
WITH doc,
'information retrieval' AS query,
1000 AS total_docs,
450 AS avg_length
RETURN doc.title,
bm25_score(query, doc.content, total_docs, avg_length) AS score
ORDER BY score DESC
LIMIT 10
tf_idf
tf_idf(term, document, corpus_size, doc_freq) -> float
Computes the TF-IDF score for a term in a document.
WITH 'neural' AS term,
'deep learning requires neural networks' AS doc,
500 AS corpus_size,
42 AS docs_with_term
RETURN tf_idf(term, doc, corpus_size, docs_with_term) AS score
text_similarity
text_similarity(s1, s2) -> float
Returns a similarity score between 0 and 1 for two text strings.
WITH 'The quick brown fox' AS text1,
'A fast brown fox' AS text2
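RETURN text_similarity(text1, text2) AS similarity
levenshtein_distance
levenshtein_distance(s1, s2) -> integer
Returns the edit distance between two strings: the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other.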
WITH 'kitten' AS s1, 'sitting' AS s2 RETURN levenshtein_distance(s1, s2) AS edit_distance
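Levenshtein distance is usually computed with dynamic programming over prefixes, keeping only the previous row of the table. A Python sketch of the standard algorithm, not necessarily how levenshtein_distance is implemented internally:

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Minimum single-character insertions, deletions, and substitutions turning s1 into s2."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the shorter string in the inner loop
    prev = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (c1 != c2),  # substitution, free when characters match
            ))
        prev = curr
    return prev[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3
```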
ngram_similarity
ngram_similarity(s1, s2, n?) -> float
Returns the n-gram similarity score between two strings.
WITH 'analysis' AS word1, 'analytics' AS word2 RETURN ngram_similarity(word1, word2, 3) AS trigram_sim
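The exact n-gram measure behind ngram_similarity is not specified here: padding, Jaccard versus Dice, and the default n all vary between implementations. As one common choice, this Python sketch computes the Jaccard similarity of the character n-gram sets (function names are illustrative):

```python
def ngrams(s: str, n: int = 3) -> set:
    """Set of contiguous character n-grams (empty if s is shorter than n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_jaccard(s1: str, s2: str, n: int = 3) -> float:
    a, b = ngrams(s1, n), ngrams(s2, n)
    if not a and not b:
        return 1.0 if s1 == s2 else 0.0
    return len(a & b) / len(a | b)


print(ngram_jaccard("analysis", "analytics", 3))  # 0.3
```

With n = 3, 'analysis' and 'analytics' share 3 of 10 distinct trigrams (ana, nal, aly), giving 0.3 under this measure.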
relevance_score
relevance_score(query_vec, doc_vec) -> float
Returns a composite relevance score between a query vector and a document vector.
WITH embed('natural language processing') AS query_vec
MATCH (doc:Document)
WHERE doc.embedding IS NOT NULL
WITH doc, relevance_score(query_vec, doc.embedding) AS score
RETURN doc.title, score
ORDER BY score DESC
LIMIT 5
context_rank
context_rank(query, candidates) -> list
Ranks candidate texts by relevance to a query.
WITH 'What is machine learning?' AS query,
['ML is a type of AI',
'Cooking is fun',
'Algorithms enable machine learning'] AS candidates
RETURN context_rank(query, candidates) AS ranked_candidates
extract_keywords
extract_keywords(text, max_keywords?) -> list
Extracts the most significant keywords from a text.
MATCH (article:Article {id: 'art_123'})
RETURN extract_keywords(article.body, 10) AS top_keywords
text_chunk_by_size
text_chunk_by_size(text, max_chars) -> list
Splits text into chunks of at most max_chars characters.
MATCH (doc:Document {id: 'doc_001'})
WITH text_chunk_by_size(doc.content, 256) AS chunks
UNWIND chunks AS chunk
RETURN chunk AS text_chunk
LIMIT 5
text_chunk_by_words
text_chunk_by_words(text, max_words) -> list
Splits text into chunks of at most max_words words.
MATCH (doc:Document) WITH doc, text_chunk_by_words(doc.content, 100) AS chunks SET doc.chunks = chunks RETURN count(doc) AS documents_chunked
text_chunk_overlap
text_chunk_overlap(text, chunk_size, overlap) -> list
Splits text into overlapping chunks with the given chunk size and overlap.
MATCH (manual:Manual {product: 'XR-2000'})
WITH text_chunk_overlap(manual.text, 512, 64) AS chunks
UNWIND chunks AS chunk
CREATE (chunk_node:Chunk {text: chunk})
RETURN count(chunk_node) AS chunks_created
estimate_tokens
estimate_tokens(text) -> integer
Estimates the token count for a text string.
MATCH (prompt:Prompt {name: 'rag_context'})
RETURN estimate_tokens(prompt.template) AS token_count
fits_in_context
fits_in_context(text, max_tokens) -> bool
Returns true if the text fits within the given token limit.
MATCH (doc:Document) WHERE fits_in_context(doc.content, 2048) RETURN doc.title, estimate_tokens(doc.content) AS tokens LIMIT 20
truncate_to_tokens
truncate_to_tokens(text, max_tokens) -> string
Truncates text to fit within a token limit.
MATCH (doc:Document {id: 'doc_001'})
WITH truncate_to_tokens(doc.content, 1024) AS truncated
RETURN truncated AS limited_context
context_utilization
context_utilization(text, max_tokens) -> float
Returns the fraction of the context window used by the text.
MATCH (query:Query {id: 'q_789'})
WITH query, context_utilization(query.prompt, 4096) AS usage
WHERE usage > 0.8
RETURN query.id, usage AS context_fraction
format_prompt
format_prompt(template, variables_map) -> string
Substitutes variables into a prompt template string.
WITH {user: 'Alice', topic: 'quantum computing'} AS vars
RETURN format_prompt(
'Hello {user}, tell me about {topic}',
vars
) AS formatted_prompt
graph_context
graph_context(node, depth?) -> string
Returns a textual context summary of a node and its neighborhood.
MATCH (person:Person {id: 'p_456'})
RETURN graph_context(person, 2) AS neighborhood_summary
word_count
word_count(text) -> integer
Returns the number of words in a text string.
MATCH (article:Article) WHERE word_count(article.body) > 500 RETURN article.title, word_count(article.body) AS word_len ORDER BY word_len DESC LIMIT 10
sentence_count
sentence_count(text) -> integer
Returns the number of sentences in a text string.
MATCH (summary:Summary) WHERE sentence_count(summary.content) BETWEEN 5 AND 20 RETURN summary.id, sentence_count(summary.content) AS sentences
readability_score
readability_score(text) -> float
Returns the Flesch-Kincaid readability score of a text.
MATCH (content:Content) WITH content, readability_score(content.body) AS readability WHERE readability < 12.0 RETURN content.id, readability AS flesch_kincaid_grade
text_fingerprint
text_fingerprint(text) -> integer
Returns a fast hash fingerprint of a text string.
MATCH (doc:Document) RETURN doc.id, text_fingerprint(doc.content) AS content_hash LIMIT 100
reciprocal_rank_fusion
reciprocal_rank_fusion(rank_lists, k?) -> list
Merges multiple ranked lists using reciprocal rank fusion.
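Reciprocal rank fusion scores each item as the sum of 1 / (k + rank) over the lists it appears in, with ranks starting at 1; k = 60 is the common default and the value used in the example below. A Python sketch that breaks ties by item id (xrayGraphDB's tie-breaking is not documented):

```python
from collections import defaultdict


def reciprocal_rank_fusion(rank_lists, k: int = 60) -> list:
    """Return (item, fused_score) pairs, highest score first."""
    scores = defaultdict(float)
    for ranking in rank_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for a deterministic order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


fused = reciprocal_rank_fusion([["doc_1", "doc_2", "doc_3"], ["doc_2", "doc_1", "doc_4"]], k=60)
```

For the example's two rankings, doc_1 and doc_2 tie exactly (1/61 + 1/62 each), as do doc_3 and doc_4 (1/63 each), so the order within each pair depends entirely on tie-breaking.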
WITH [['doc_1', 'doc_2', 'doc_3'],
['doc_2', 'doc_1', 'doc_4']] AS rankings
RETURN reciprocal_rank_fusion(rankings, 60) AS fused_ranking
Reactive Engine Functions (5)
engine.create_model
CALL engine.create_model(label, property, type, min_samples?, sigma_threshold?) YIELD label, property, type, status
Creates a statistical model bound to a (label, property) pair. The engine learns baselines from writes and detects deviations inline during SetProperty, at O(K) cost per write with K bounded by the schema. Types: envelope, exact, frequency, trend, seasonal, hybrid, auto. The model is stored as a :Model graph node for persistence and replication.
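This reference does not describe how the envelope model computes its baseline. As a hypothetical illustration of how min_samples and sigma_threshold could interact, the Python sketch below keeps a running mean and standard deviation with Welford's algorithm (O(1) work per write) and flags a value once enough samples exist and it lies more than sigma_threshold standard deviations from the mean. The class name and its behavior are assumptions, not the engine's implementation.

```python
import math


class EnvelopeModelSketch:
    """Hypothetical envelope-style baseline: running mean/std plus a sigma threshold."""

    def __init__(self, min_samples: int = 10, sigma_threshold: float = 2.5):
        self.min_samples = max(min_samples, 2)  # a standard deviation needs at least 2 samples
        self.sigma_threshold = sigma_threshold
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def observe(self, x: float):
        """Return the deviation in sigmas if x is anomalous, else None; then learn from x."""
        deviation = None
        if self.n >= self.min_samples:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0.0 and abs(x - self.mean) / std > self.sigma_threshold:
                deviation = abs(x - self.mean) / std
        # Welford's update: O(1) time and memory per write
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return deviation
```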
CALL engine.create_model("User", "age", "envelope", 10, 2.5) YIELD status
engine.drain_events
CALL engine.drain_events(limit?) YIELD vertex_gid, label_id, property_id, sigma, expected, actual, model_type, timestamp
Drains deviation events from the lock-free ring buffer. Drained events are consumed and cannot be read again. Use this for alerting, dashboards, or external system integration. The buffer holds 100K events; the oldest are evicted on overflow.
CALL engine.drain_events(50) YIELD vertex_gid, sigma, expected, actual
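The drain semantics described above (bounded capacity, oldest events evicted on overflow, each event returned once) can be modeled with a bounded deque. This single-threaded Python sketch illustrates only those semantics; the engine's actual buffer is lock-free, and the class name is hypothetical.

```python
from collections import deque


class DeviationBufferSketch:
    """Bounded event buffer: oldest events are evicted when full; drain() consumes events."""

    def __init__(self, capacity: int = 100_000):
        self._events = deque(maxlen=capacity)

    def push(self, event) -> None:
        self._events.append(event)  # a full deque(maxlen=...) drops its oldest item

    def drain(self, limit=None) -> list:
        """Remove and return up to limit events, oldest first."""
        n = len(self._events) if limit is None else min(limit, len(self._events))
        return [self._events.popleft() for _ in range(n)]
```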
engine.drop_model
CALL engine.drop_model(label, property) YIELD label, property, status
Drops a statistical model. Deviation checks stop immediately, and the :Model graph node is removed.
CALL engine.drop_model("User", "age") YIELD status
engine.show_model_types
CALL engine.show_model_types() YIELD type, fits, parameters, auto_detect_when
Lists all available model types with their parameters and auto-detection heuristics. Use it to decide which model type fits your data.
CALL engine.show_model_types() YIELD type, parameters
engine.show_models
CALL engine.show_models() YIELD label, property, type, status, sample_count
Lists all active models with their current status (cold/learning/warm) and sample counts.
CALL engine.show_models() YIELD label, property, type, status
Scalar Functions (31)
degree
degree(node) -> integer
Returns the total degree (in + out) of a node.
MATCH (n:User) RETURN n.name, degree(n) ORDER BY degree(n) DESC
endnode
endnode(edge) -> node
Returns the end (target) node of a relationship.
MATCH (s:User)-[r:FOLLOWS]->(t) RETURN s.name, endnode(r) as target
frombytestring
frombytestring(string) -> value
Decodes a byte-encoded string back to a value.
RETURN frombytestring("SGVsbG8gV29ybGQ=") AS decoded_value
head
head(list) -> value
Returns the first element of a list.
WITH [1, 2, 3, 4, 5] AS nums RETURN head(nums) AS first_element
id
id(node_or_edge) -> integer
Returns the internal ID of a node or relationship.
MATCH (n:User) WHERE n.name = "Alice" RETURN id(n) AS user_id
indegree
indegree(node) -> integer
Returns the number of incoming edges of a node.
MATCH (n:User) RETURN n.name, indegree(n) AS incoming_edges
json_extract
json_extract(json_string, path) -> value
Extracts a value from a JSON string using a path expression.
WITH '{"user": "Alice", "age": 30}' AS json_str RETURN json_extract(json_str, "$.user")
last
last(list) -> value
Returns the last element of a list.
WITH [10, 20, 30, 40] AS values RETURN last(values) AS final_element
length
length(value) -> integer
Alias for size(); returns the length of a list, path, or string.
WITH [1, 2, 3, 4, 5] AS list RETURN length(list) AS list_size
outdegree
outdegree(node) -> integer
Returns the number of outgoing edges of a node.
MATCH (n:User) RETURN n.name, outdegree(n) AS outgoing_edges
parse_float
parse_float(string) -> float
Parses a string into a floating-point number.
WITH "3.14159" AS num_str RETURN parse_float(num_str) AS pi_value
parse_int
parse_int(string, radix?) -> integer
Parses a string into an integer, with an optional radix.
WITH "FF" AS hex_str RETURN parse_int(hex_str, 16) AS decimal_value
properties
properties(node_or_edge) -> map
Returns a map of all properties on a node or relationship.
MATCH (n:User {name: "Bob"}) RETURN properties(n) AS user_properties
propertysize
propertysize(entity, key) -> integer
Returns the byte size of a specific property value.
MATCH (n:User) RETURN n.name, propertysize(n, "description") AS size_bytes
randomuuid
randomuuid() -> string
Generates a random UUID v4 string.
RETURN randomuuid() AS new_uuid
size
size(value) -> integer
Returns the size of a list, map, path, or string.
MATCH (n:User)-[r]->() WITH n, collect(r) AS rels RETURN n.name, size(rels) AS rel_count
startnode
startnode(edge) -> node
Returns the start (source) node of a relationship.
MATCH (s:User)-[r:FOLLOWS]->(t) RETURN startnode(r) as follower, s.name
timestamp
timestamp() -> integer
Returns the current epoch time in milliseconds.
RETURN timestamp() AS current_time_ms
to_json
to_json(value) -> string
Serializes a value to a JSON string.
MATCH (n:User {name: "Carol"}) RETURN to_json(properties(n)) AS json_output
toboolean
toboolean(value) -> bool
Converts a value to a boolean.
WITH ["true", "false", "1", "0"] AS bool_strs RETURN [b in bool_strs | toboolean(b)]
tobooleanlist
tobooleanlist(list) -> list
Converts all elements in a list to booleans.
WITH [1, 0, 1] AS int_list RETURN tobooleanlist(int_list)
tobytestring
tobytestring(value) -> string
Converts a value to a byte-encoded string.
RETURN tobytestring("Hello") AS encoded
toenum
toenum(type, value) -> enum
Converts a type name and value to an enum.
RETURN toenum("Status", "ACTIVE") AS status_enum
tofloat
tofloat(value) -> float
Converts a value to a floating-point number.
WITH ["1.5", "2.7", "3.9"] AS nums RETURN [n in nums | tofloat(n)]
tofloatlist
tofloatlist(list) -> list
Converts all elements in a list to floats.
WITH ["1.2", "3.4"] AS num_strs RETURN tofloatlist(num_strs)
tointeger
tointeger(value) -> integer
Converts a value to an integer.
WITH "42" AS num_str RETURN tointeger(num_str) AS int_value
tointegerlist
tointegerlist(list) -> list
Converts all elements in a list to integers.
WITH ["10", "20", "30"] AS int_strs RETURN tointegerlist(int_strs)
type
type(edge) -> string
Returns the type name of a relationship.
MATCH (u:User)-[r:FOLLOWS]->(v) RETURN type(r) AS relationship_type
typeof
typeof(value) -> string
Returns the runtime type name of a value.
RETURN [typeof(1), typeof("text"), typeof([1,2,3]), typeof({a: 1})]
uuid_generate
uuid_generate() -> string
Generates a new UUID v4 string.
RETURN uuid_generate() AS generated_uuid
valuetype
valuetype(value) -> string
Returns the type name of any value as a string.
RETURN valuetype(42) AS int_type, valuetype("text") AS string_type
String Functions (35)
base64_decode
base64_decode(base64_string) -> value
Decodes a Base64 string back to bytes.
RETURN base64_decode("SGVsbG8gV29ybGQ=") AS decoded_text
base64_encode
base64_encode(value) -> string
Encodes a value as a Base64 string.
RETURN base64_encode("Hello World") AS encoded_text
char_length
char_length(string) -> integer
Returns the number of characters in a string.
WITH "xrayGraphDB" AS text RETURN char_length(text) AS char_count
concat
concat(values...) -> string
Concatenates all arguments into a single string.
RETURN concat("Hello", " ", "World", "!") AS greeting
contains
contains(string, substring) -> bool
Returns true if the string contains the given substring.
WITH "xrayGraphDB" AS text RETURN contains(text, "Graph") AS has_graph
ends_with_func
ends_with_func(string, suffix) -> bool
Returns true if the string ends with the given suffix.
WITH "database.cpp" AS filename RETURN ends_with_func(filename, ".cpp") AS is_cpp_file
endswith
endswith(string, suffix) -> bool
Returns true if the string ends with the given suffix.
WITH "query.cypher" AS file RETURN endswith(file, ".cypher") AS is_cypher
format_number
format_number(number, format) -> string
Formats a number according to a format pattern string.
RETURN format_number(1234.5678, "#,##0.00") AS formatted
hex_decode
hex_decode(hex_string) -> value
Decodes a hexadecimal string back to bytes.
RETURN hex_decode("48656c6c6f") AS decoded_hex
hex_encode
hex_encode(value) -> string
Encodes a value as a hexadecimal string.
RETURN hex_encode("Hello") AS hex_encoded
indexof
indexof(string, substring) -> integer
Returns the zero-based index of the first occurrence of a substring, or -1 if it is not found.
WITH "xrayGraphDB" AS text RETURN indexof(text, "Graph") AS graph_pos
join
join(list, delimiter) -> string
Joins list elements into a single string, separated by the delimiter.
WITH ["Alice", "Bob", "Carol"] AS names RETURN join(names, ", ") AS name_list
left
left(string, n) -> string
Returns the leftmost n characters of a string.
WITH "xrayGraphDB" AS text RETURN left(text, 4) AS first_four
lpad
lpad(string, length, pad_char?) -> string
Left-pads a string to the specified length.
WITH "42" AS num RETURN lpad(num, 5, "0") AS padded_num
ltrim
ltrim(string) -> string
Removes leading whitespace from a string.
WITH " leading spaces" AS text RETURN ltrim(text) AS trimmed
pad_number
pad_number(number, width, pad_char?) -> string
Formats a number as a zero-padded string of the given width.
RETURN pad_number(7, 4, "0") AS zero_padded
regex_match
regex_match(string, pattern) -> list|null
Returns the regex capture groups if the pattern matches, or null otherwise.
WITH "user@example.com" AS email RETURN regex_match(email, "([^@]+)@(.+)") AS parts
repeat
repeat(string, count) -> string
Returns the string repeated the given number of times.
WITH "x" AS char RETURN repeat(char, 10) AS repeated_x
replace
replace(string, search, replacement) -> string
Replaces all occurrences of search with replacement in a string.
WITH "Hello World" AS text RETURN replace(text, "World", "xrayGraphDB") AS replaced
replace_string
replace_string(string, search, replacement) -> string
Replaces occurrences of a search string within a string.
WITH "old value" AS text RETURN replace_string(text, "old", "new") AS new_text
reverse
reverse(string) -> string
Returns the string reversed.
WITH "stressed" AS word RETURN reverse(word) AS reversed_word
reverse_string
reverse_string(string) -> string
Returns the string with its characters in reverse order.
WITH "GraphDB" AS text RETURN reverse_string(text) AS backwards
right
right(string, n) -> string
Returns the rightmost n characters of a string.
WITH "xrayGraphDB" AS text RETURN right(text, 2) AS last_two
rpad
rpad(string, length, pad_char?) -> string
Right-pads a string to the specified length.
WITH "42" AS num RETURN rpad(num, 5, ".") AS right_padded
rtrim
rtrim(string) -> string
Removes trailing whitespace from a string.
WITH "trailing spaces " AS text RETURN rtrim(text) AS trimmed
split
split(string, delimiter) -> list
Splits a string by a delimiter into a list of strings.
WITH "Alice,Bob,Carol" AS names RETURN split(names, ",") AS name_list
starts_with_func
starts_with_func(string, prefix) -> bool
Returns true if the string starts with the given prefix.
WITH "xrayGraphDB" AS text RETURN starts_with_func(text, "xray") AS starts_x
startswith
startswith(string, prefix) -> bool
Returns true if the string starts with the given prefix.
WITH "GraphQL" AS text RETURN startswith(text, "Graph") AS starts_g
substring
substring(string, start, length?) -> string
Returns the substring that begins at the zero-based index start, with an optional length.
WITH "xrayGraphDB" AS text RETURN substring(text, 4, 5) AS substr
to_string
to_string(value) -> string
Converts any value to its string representation.
RETURN to_string(42) AS num_as_string, to_string([1,2,3]) AS list_as_string
tolower
tolower(string) -> string
Converts a string to lowercase.
WITH "XRAYGRAPHDB" AS text RETURN tolower(text) AS lowercase
tostring
tostring(value) -> string
Converts any value to its string representation.
RETURN tostring(true) AS bool_string, tostring(3.14) AS float_string
tostringornull
tostringornull(value) -> string|null
Converts a value to a string, returning null if the conversion fails.
RETURN tostringornull(42) AS num_str, tostringornull(null) AS null_result
toupper
toupper(string) -> string
Converts a string to uppercase.
WITH "xrayGraphDB" AS text RETURN toupper(text) AS uppercase
trim
trim(string) -> string
Removes leading and trailing whitespace from a string.
WITH " whitespace " AS text RETURN trim(text) AS trimmed_text
System Functions
System functions provide database diagnostics, user identification, and snapshot management.
assert
assert(condition, message?) -> bool
Throws an exception if the condition is false. Useful for validating query results and enforcing invariants during execution.
MATCH (n:User) WHERE assert(n.age > 0, "Age must be positive") RETURN n.name, n.age
counter
counter(name, initial?) -> integer
Returns an auto-incrementing counter for the given name; each call increments it. Useful for generating sequential IDs or tracking operation counts.
CREATE (n:Task {id: counter("task_id", 1000), name: "Process payment", created: timestamp()}) RETURN n.id, n.name
explain_analyze
explain_analyze(query_string) -> map
Returns an execution plan analysis with cost estimates and row-count predictions, useful for understanding query optimization and plan selection.
RETURN explain_analyze("MATCH (n:Product) WHERE n.price > 100 RETURN n LIMIT 10") AS plan
gethopscounter
gethopscounter(path) -> integer
Returns the hop count of a variable-length path traversal, that is, the number of relationships in the path.
MATCH p = (src:Airport {code: "LAX"})-[:ROUTE*1..5]->(dst:Airport {code: "JFK"}) RETURN gethopscounter(p) AS num_hops, length(p) AS path_length LIMIT 1
graph_diff
graph_diff(snapshot1, snapshot2) -> map
Returns the differences between two graph snapshots: added, removed, and modified nodes and relationships.
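Here is a hypothetical sketch of what graph_diff computes. It assumes each snapshot can be viewed as a map from element ID to its property map. The real snapshot object's layout is not documented here, so treat the structure as illustrative.

```python
def graph_diff(snap1, snap2):
    """Compare two snapshots given as {element_id: properties} (hypothetical layout)."""
    ids1, ids2 = set(snap1), set(snap2)
    return {
        "added": sorted(ids2 - ids1),      # present only in the later snapshot
        "removed": sorted(ids1 - ids2),    # present only in the earlier snapshot
        "modified": sorted(i for i in ids1 & ids2 if snap1[i] != snap2[i]),
    }


before = {1: {"balance": 100}, 2: {"balance": 50}}
after = {2: {"balance": 75}, 3: {"balance": 10}}
print(graph_diff(before, after))  # {'added': [3], 'removed': [1], 'modified': [2]}
```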
WITH graph_snapshot("Account") AS snap1, graph_snapshot("Account") AS snap2 RETURN graph_diff(snap1, snap2) AS changes
graph_snapshot
graph_snapshot(label?) -> map
Captures a point-in-time snapshot of the whole graph or of a single label. The returned snapshot object can be compared with other snapshots.
RETURN graph_snapshot("Transaction") AS txn_snapshot, graph_snapshot() AS full_snapshot
roles
roles() -> list
Returns the list of roles assigned to the current authenticated user. Used for role-based access control (RBAC) queries.
RETURN username() AS current_user, roles() AS assigned_roles
username
username() -> string
Returns the current authenticated username, or NULL when running without authentication.
MATCH (a:AuditLog) WHERE a.performed_by = username() AND a.timestamp > timestamp() - 86400000 RETURN COUNT(a) AS actions_last_24h
Unit Conversion Functions
Unit conversion functions handle standard measurements for distance, temperature, weight, and velocity—essential for aviation, logistics, and scientific applications.
c_to_f
c_to_f(c) -> float
Alias for celsius_to_fahrenheit. Converts a temperature from Celsius to Fahrenheit.
MATCH (s:Sensor {location: "LAX_Terminal1"}) WHERE s.temperature_c IS NOT NULL RETURN s.location, s.temperature_c, c_to_f(s.temperature_c) AS temperature_f
celsius_to_fahrenheit
celsius_to_fahrenheit(c) -> float
Converts a temperature from Celsius to Fahrenheit. Formula: (C × 9/5) + 32
RETURN celsius_to_fahrenheit(0) AS freezing, celsius_to_fahrenheit(25) AS room_temp, celsius_to_fahrenheit(100) AS boiling
convert
convert(value, from_unit, to_unit) -> float
Converts a value between compatible measurement units. Supports distance, weight, temperature, and velocity conversions.
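One way to implement a generic convert() is a table of linear factors into a base unit per dimension, with temperature handled separately because it is affine rather than purely multiplicative. The unit name strings other than "feet" and "meters" (which appear in the example below) are assumptions, not the engine's accepted spellings. The factors are the exact definitional values: 1 ft = 0.3048 m, 1 mi = 1609.344 m, 1 lb = 0.45359237 kg, and 1 knot = 1852 m per hour.

```python
# Linear factors into one base unit per dimension: meters, kilograms, meters/second.
# Unit names are illustrative assumptions, not the engine's accepted spellings.
_TO_BASE = {
    "meters": ("length", 1.0),
    "feet": ("length", 0.3048),
    "kilometers": ("length", 1000.0),
    "miles": ("length", 1609.344),
    "kilograms": ("mass", 1.0),
    "pounds": ("mass", 0.45359237),
    "mps": ("speed", 1.0),
    "knots": ("speed", 1852.0 / 3600.0),
    "fpm": ("speed", 0.3048 / 60.0),
}


def convert(value, from_unit, to_unit):
    """Hypothetical convert(): temperature is affine, everything else is a ratio."""
    temps = {"celsius", "fahrenheit"}
    if from_unit in temps or to_unit in temps:
        if from_unit not in temps or to_unit not in temps:
            raise ValueError(f"incompatible units: {from_unit} -> {to_unit}")
        if from_unit == to_unit:
            return float(value)
        if from_unit == "celsius":
            return value * 9 / 5 + 32
        return (value - 32) * 5 / 9
    dim_from, factor_from = _TO_BASE[from_unit]
    dim_to, factor_to = _TO_BASE[to_unit]
    if dim_from != dim_to:
        raise ValueError(f"incompatible units: {from_unit} -> {to_unit}")
    # Go through the base unit: value -> base -> target.
    return value * factor_from / factor_to


print(convert(100, "celsius", "fahrenheit"))  # 212.0
```

The dimension check is what makes "compatible units" enforceable. A request such as feet to pounds fails loudly instead of returning a meaningless number.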
MATCH (f:Flight {id: "UA123"}) RETURN f.id, f.cruise_altitude_ft, convert(f.cruise_altitude_ft, "feet", "meters") AS cruise_altitude_m
f_to_c
f_to_c(f) -> float
Alias for fahrenheit_to_celsius. Converts a temperature from Fahrenheit to Celsius.
MATCH (w:WeatherReport) WHERE w.temp_f > 95 RETURN w.station_code, w.temp_f, f_to_c(w.temp_f) AS temp_c
fahrenheit_to_celsius
fahrenheit_to_celsius(f) -> float
Converts a temperature from Fahrenheit to Celsius. Formula: (F - 32) × 5/9
RETURN fahrenheit_to_celsius(32) AS freezing, fahrenheit_to_celsius(98.6) AS body_temp, fahrenheit_to_celsius(212) AS boiling
feet_to_meters
feet_to_meters(ft) -> float
Converts feet to meters. Factor: 1 foot = 0.3048 meters.
MATCH (a:Aircraft {model: "B787"}) RETURN a.model, a.wingspan_ft, feet_to_meters(a.wingspan_ft) AS wingspan_m
fpm_to_mps
fpm_to_mps(fpm) -> float
Converts feet per minute to meters per second. Used for climb and descent rates in aviation. Factor: 1 ft/min = 0.00508 m/s.
MATCH (t:TrajectoryPoint) WHERE t.vertical_rate_fpm > 1000 RETURN t.timestamp, t.vertical_rate_fpm, fpm_to_mps(t.vertical_rate_fpm) AS vertical_rate_mps
ft_to_m
ft_to_m(ft) -> float
Alias for feet_to_meters. Converts feet to meters.
MATCH (r:Runway {airport: "JFK"}) RETURN r.name, r.length_ft, ft_to_m(r.length_ft) AS length_m
kg_to_lb
kg_to_lb(kg) -> float
Alias for kilograms_to_pounds. Converts kilograms to pounds.
MATCH (c:Cargo {id: "CARGO_001"}) RETURN c.id, c.weight_kg, kg_to_lb(c.weight_kg) AS weight_lb
kilograms_to_pounds
kilograms_to_pounds(kg) -> float
Converts kilograms to pounds. Factor: 1 kg = 2.20462 pounds.
MATCH (p:Person) WHERE p.weight_kg IS NOT NULL RETURN p.name, p.weight_kg, kilograms_to_pounds(p.weight_kg) AS weight_lbs ORDER BY p.weight_kg DESC LIMIT 5
kilometers_to_miles
kilometers_to_miles(km) -> float
Converts kilometers to miles. Factor: 1 km = 0.621371 miles.
MATCH (r:Route) WHERE r.distance_km > 5000 RETURN r.origin, r.destination, r.distance_km, kilometers_to_miles(r.distance_km) AS distance_mi
km_to_mi
km_to_mi(km) -> float
Alias for kilometers_to_miles. Converts kilometers to miles.
MATCH (s:Segment {type: "LON_to_BOS"}) RETURN s.type, s.distance_km, km_to_mi(s.distance_km) AS distance_miles
knots_to_mps
knots_to_mps(knots) -> float
Converts knots to meters per second. Used for airspeed conversions in aviation. Factor: 1 knot = 0.514444 m/s.
MATCH (v:Vector {source: "ADS-B"}) WHERE v.speed_knots > 450 RETURN v.timestamp, v.speed_knots, knots_to_mps(v.speed_knots) AS speed_mps
lb_to_kg
lb_to_kg(lb) -> float
Alias for pounds_to_kilograms. Converts pounds to kilograms.
MATCH (i:Item {warehouse: "PHX"}) RETURN i.name, i.weight_lb, lb_to_kg(i.weight_lb) AS weight_kg
m_to_ft
m_to_ft(m) -> float
Alias for meters_to_feet. Converts meters to feet.
MATCH (b:Building {city: "New York"}) RETURN b.name, b.height_m, m_to_ft(b.height_m) AS height_ft
meters_to_feet
meters_to_feet(m) -> float
Converts meters to feet. Factor: 1 meter = 3.28084 feet.
MATCH (altimeter:Sensor) RETURN altimeter.id, altimeter.altitude_m, meters_to_feet(altimeter.altitude_m) AS altitude_ft
mi_to_km
mi_to_km(mi) -> float
Alias for miles_to_kilometers. Converts miles to kilometers.
MATCH (c:City) WHERE c.distance_from_airport_mi > 10 RETURN c.name, c.distance_from_airport_mi, mi_to_km(c.distance_from_airport_mi) AS distance_km
miles_to_kilometers
miles_to_kilometers(mi) -> float
Converts miles to kilometers. Factor: 1 mile = 1.60934 kilometers.
MATCH (route:Route {airline: "AA"}) WHERE route.distance_miles IS NOT NULL RETURN route.origin, route.destination, route.distance_miles, miles_to_kilometers(route.distance_miles) AS distance_km
mps_to_knots
mps_to_knots(mps) -> float
Converts meters per second to knots. Used for airspeed conversions in aviation. Factor: 1 m/s = 1.94384 knots.
MATCH (wind:WeatherData {station: "KJFK"}) RETURN wind.station, wind.wind_speed_mps, mps_to_knots(wind.wind_speed_mps) AS wind_speed_knots
pounds_to_kilograms
pounds_to_kilograms(lb) -> float
Converts pounds to kilograms. Factor: 1 pound = 0.453592 kilograms.
MATCH (aircraft:Aircraft) WHERE aircraft.max_weight_lb IS NOT NULL RETURN aircraft.registration, aircraft.max_weight_lb, pounds_to_kilograms(aircraft.max_weight_lb) AS max_weight_kg
Procedure Reference
Complete reference for all 107 xrayGraphDB v4.9.4 procedures, organized by module prefix. Analytics procedures (PageRank, Betweenness, Triangle Count, Community Detection, etc.) are powered by xrayGraphDB's patent-pending high-performance graph engine.
db.* — Database Introspection (3 procedures)
CALL db.indexes()
db.indexes() :: (labelsOrTypes :: STRING, name :: STRING, properties :: STRING, type :: STRING)
Lists all database indexes, including label, property, and relationship-type indexes. For each index it returns the properties covered and the index type (e.g., HASH, RANGE, TEXT).
CALL db.indexes() YIELD name, type, properties RETURN name, type, properties ORDER BY type;
CALL db.labels()
db.labels() :: (label :: STRING)
Returns all vertex labels currently defined in the database. Each row is a distinct label name used by one or more vertices in the graph.
CALL db.labels() YIELD label RETURN label ORDER BY label;
CALL db.relationshipTypes()
db.relationshipTypes() :: (relationshipType :: STRING)
Returns all relationship types currently defined in the database. Each row is a distinct relationship type used by one or more edges in the graph.
CALL db.relationshipTypes() YIELD relationshipType RETURN relationshipType ORDER BY relationshipType;
engine.* — Reactive Engine Management (5 procedures)
CALL engine.create_model()
engine.create_model(label :: STRING, property :: STRING, type :: STRING, min_samples :: INTEGER?, sigma_threshold :: FLOAT?) :: (label :: STRING, property :: STRING, status :: STRING, type :: STRING)
Creates an anomaly detection model for a specific property on vertices with a given label. The model uses statistical analysis to identify outliers based on mean and standard deviation. The optional parameters set the minimum number of training samples and the detection sensitivity.
CALL engine.create_model('Aircraft', 'altitude', 'envelope', 100, 2.5)
YIELD label, property, status, type
RETURN label, property, status, type;
CALL engine.drain_events()
engine.drain_events(limit :: INTEGER?) :: (actual :: FLOAT, expected :: FLOAT, label_id :: INTEGER, model_type :: STRING, property_id :: INTEGER, sigma :: FLOAT, timestamp :: INTEGER, vertex_gid :: INTEGER)
Retrieves pending anomaly detection events from the reactive engine. Each event reports the actual and expected values, the deviation in standard deviations (sigma), and the affected vertex. The optional limit caps the number of events returned per call.
CALL engine.drain_events(1000) YIELD vertex_gid, actual, expected, sigma, timestamp WHERE sigma > 3.0 RETURN vertex_gid, actual, expected, sigma, timestamp ORDER BY timestamp DESC;
CALL engine.drop_model()
engine.drop_model(label :: STRING, property :: STRING) :: (label :: STRING, property :: STRING, status :: STRING)
Removes the anomaly detection model for a specific property on vertices with a given label and stops all event detection for that model.
CALL engine.drop_model('Aircraft', 'altitude')
YIELD label, property, status
RETURN label, property, status;
CALL engine.show_model_types()
engine.show_model_types() :: (auto_detect_when :: STRING, fits :: STRING, parameters :: STRING, type :: STRING)
Lists all anomaly detection model types supported by the reactive engine, showing when each type is selected automatically, which data distributions it fits best, and its parameters.
CALL engine.show_model_types() YIELD type, fits, parameters RETURN type, fits, parameters ORDER BY type;
CALL engine.show_models()
engine.show_models() :: (label :: STRING, property :: STRING, sample_count :: INTEGER, status :: STRING, type :: STRING)
Lists all active anomaly detection models in the reactive engine, with each model's type, training status, and sample count.
CALL engine.show_models() YIELD label, property, status, sample_count WHERE status = 'warm' RETURN label, property, sample_count ORDER BY sample_count DESC;
mv.* — Materialized Views (6 procedures)
CALL mv.create()
mv.create(name :: STRING, query :: STRING) :: (name :: STRING, status :: STRING)
Creates a new materialized view that caches the results of a query. The view can be refreshed on demand or on a scheduled interval.
CALL mv.create('flight_stats',
'MATCH (f:Flight) RETURN COUNT(*) as total, AVG(f.altitude) as avg_alt')
YIELD name, status
RETURN name, status;
CALL mv.drop()
mv.drop(name :: STRING) :: (name :: STRING, status :: STRING)
Removes a materialized view and its cached results. The view name is no longer available for queries.
CALL mv.drop('flight_stats')
YIELD name, status
RETURN name, status;
CALL mv.due()
mv.due() :: (name :: STRING, overdue_sec :: INTEGER, query :: STRING)
Lists materialized views that are due for refresh based on their scheduled interval, with the number of seconds each view is overdue.
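Here is a hypothetical sketch of the overdue calculation. It assumes each view records its last refresh time and interval in epoch seconds, and that overdue_sec = now - (last_refresh + interval_sec). Because the example below filters on overdue_sec > 0, the sketch reports every scheduled view and leaves the filtering to the caller. The field names are illustrative, and views without an interval are treated as on-demand only.

```python
def due_views(views, now_sec):
    """Return [(name, overdue_sec)] for scheduled views, most overdue first.

    views maps name -> {"last_refresh": epoch_sec, "interval_sec": int or None}
    (a hypothetical layout, not the engine's catalog format).
    """
    rows = []
    for name, view in views.items():
        interval = view.get("interval_sec")
        if not interval:
            continue  # no schedule: refreshed only on demand
        overdue = now_sec - (view["last_refresh"] + interval)
        rows.append((name, overdue))  # <= 0 means not yet due
    return sorted(rows, key=lambda row: -row[1])
```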
CALL mv.due() YIELD name, overdue_sec WHERE overdue_sec > 0 RETURN name, overdue_sec ORDER BY overdue_sec DESC;
CALL mv.list()
mv.list() :: (name :: STRING, query :: STRING)
Lists all materialized views in the database with their underlying queries.
CALL mv.list() YIELD name, query RETURN name, query;
CALL mv.refresh()
mv.refresh(name :: STRING) :: (name :: STRING, query :: STRING, status :: STRING)
Refreshes a materialized view by re-executing its query and updating the cached results.
CALL mv.refresh('flight_stats')
YIELD name, status
RETURN name, status;
CALL mv.set_interval()
mv.set_interval(name :: STRING, interval_sec :: INTEGER) :: (interval_sec :: INTEGER, name :: STRING)
Sets the auto-refresh interval, in seconds, for a materialized view. The view is then refreshed automatically at that interval.
CALL mv.set_interval('flight_stats', 300)
YIELD name, interval_sec
RETURN name, interval_sec;
repl.* — Replication (2 procedures)
CALL repl.set_sync_policy()
repl.set_sync_policy(address :: STRING, mode :: STRING, targets :: STRING) :: (status :: STRING)
Configures the replication synchronization policy for a replica: the replica address, the sync mode (ASYNC, SYNC, SEMISYNC), and the target databases to replicate.
CALL repl.set_sync_policy('192.168.1.100:7687', 'SEMISYNC', 'main,analytics')
YIELD status
RETURN status;
CALL repl.show_replicas()
repl.show_replicas() :: (acked_lsn :: INTEGER, address :: STRING, status :: STRING, sync_mode :: STRING, sync_targets :: STRING)
Lists all configured replicas and their replication status: each replica's address, sync mode, acknowledged log sequence number (LSN), and target databases.
CALL repl.show_replicas() YIELD address, status, sync_mode, acked_lsn RETURN address, status, sync_mode, acked_lsn;
ttl.* — TTL/Data Retention (1 procedure)
CALL ttl.delete_expired()
ttl.delete_expired(label :: STRING, timestamp_property :: STRING, max_age_days :: INTEGER, exemption_property :: STRING) :: (deleted_count :: INTEGER, exempt_count :: INTEGER, scanned_count :: INTEGER)
Deletes vertices with a given label whose timestamp property is older than max_age_days. Vertices marked with the optional exemption property are kept. Returns the number of vertices deleted, exempted, and scanned.
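The sweep's counting logic can be sketched over in-memory vertices as follows. This is a hypothetical illustration. It assumes the timestamp property holds epoch milliseconds (the unit returned by timestamp()), that vertices missing the timestamp are kept, and that exempt_count counts expired vertices spared by the exemption property. The engine may define these cases differently.

```python
MS_PER_DAY = 86_400_000


def delete_expired(vertices, label, timestamp_property, max_age_days,
                   exemption_property, now_ms):
    """Hypothetical TTL sweep over dicts like {"label": ..., "props": {...}}.

    Returns (survivors, counts)."""
    cutoff = now_ms - max_age_days * MS_PER_DAY
    counts = {"deleted_count": 0, "exempt_count": 0, "scanned_count": 0}
    survivors = []
    for v in vertices:
        if v["label"] != label:
            survivors.append(v)  # other labels are not scanned
            continue
        counts["scanned_count"] += 1
        ts = v["props"].get(timestamp_property)
        if ts is None or ts >= cutoff:
            survivors.append(v)  # not expired (or no timestamp to judge by)
        elif v["props"].get(exemption_property):
            counts["exempt_count"] += 1
            survivors.append(v)  # expired but explicitly kept
        else:
            counts["deleted_count"] += 1  # expired and not exempt: dropped
    return survivors, counts
```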
CALL ttl.delete_expired('LogEntry', 'timestamp', 90, 'keep_permanently')
YIELD deleted_count, exempt_count, scanned_count
RETURN deleted_count, exempt_count, scanned_count;
xray.* — Graph Analytics
Compute-heavy procedures over the CSR-mmapped graph. Every procedure ships in Community-tier; results stream back via the same xrayProtocol BATCH frames as ordinary Cypher rows. Analytic procs back onto a per-tenant persistent scratch pool — the first call after a daemon restart pays a one-time sizing cost (typically a few seconds) and every subsequent call hits the warm path.
Betweenness Centrality — three procedures, customer picks the precision
Three BC variants are available; choose by what you actually need. All three share the same output schema (node_id, centrality, name, time_ms) so they drop into existing dashboards without re-mapping.
| Procedure | Best for | Friendster (3.6 B undirected edges) |
|---|---|---|
| xray.betweenness_centrality_sampled | Ranked BC distribution across the whole graph. Source-sampled; produces a graded centrality distribution. | Tens of seconds to minutes at ε=0.05 (k≈5,700 source-sampled BFSes). |
| xray.betweenness_pair_sampled | Triage ("is this vertex significant at all?") with customer-set output resolution. Pair-sampled; produces an integer-multiple ladder of values. | Sub-second to ~17 s depending on knobs. |
| xray.betweenness_pair_sampled_adaptive | Top-K leaderboard. Stops as soon as the top-K vertex set has stabilised across consecutive batches. | Faster than non-adaptive when top_k stabilises quickly; same fidelity caveat. |
If you want a real ranked BC distribution on a large graph: use betweenness_centrality_sampled. If you want sub-second triage with a precision knob: use betweenness_pair_sampled with target_buckets. If you only care about the top-K bridges: use betweenness_pair_sampled_adaptive.
CALL xray.betweenness_centrality_sampled()
xray.betweenness_centrality_sampled(epsilon :: FLOAT, delta :: FLOAT, label :: STRING) :: (node_id :: INT, centrality :: FLOAT, name :: STRING, time_ms :: INT)
Source-sampled betweenness centrality with a formal (ε, δ) confidence bound. It samples k = ceil((log(N) + log(2/δ)) / (2ε²)) vertices uniformly at random as BFS sources. Each BFS contributes counts to every vertex it reaches, so the resulting centrality column is graded: different vertices get different values across the full BC distribution. Cost is O(k·E). Use this when ranking matters more than wall time.
Defaults: ε = 0.05, δ = 0.05. Empty label means "every vertex".
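You can evaluate the sample-size formula directly to see how k scales with ε, δ, and N. This Python sketch assumes natural logarithms. The engine's log base is not stated, so absolute k values may differ from what the engine computes.

```python
import math


def source_sample_size(num_vertices, epsilon=0.05, delta=0.05):
    """k = ceil((ln N + ln(2/delta)) / (2 * epsilon^2)); natural log assumed."""
    return math.ceil(
        (math.log(num_vertices) + math.log(2 / delta)) / (2 * epsilon ** 2)
    )


for n in (1_000, 1_000_000):
    print(n, source_sample_size(n), source_sample_size(n, epsilon=0.025))
```

N enters only logarithmically, while ε enters quadratically. Halving ε roughly quadruples k and therefore the O(k·E) cost.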
CALL xray.betweenness_centrality_sampled(0.05, 0.05, '') YIELD node_id, centrality, name, time_ms RETURN node_id, centrality ORDER BY centrality DESC LIMIT 50;
Output fidelity: graded — every emitted vertex's centrality reflects its expected fraction of all-pairs-shortest-paths weight. Use this proc when downstream consumers rank or threshold on the centrality value itself.
CALL xray.betweenness_pair_sampled()
xray.betweenness_pair_sampled(epsilon :: FLOAT, delta :: FLOAT, label :: STRING, target_buckets :: INT, max_k_multiplier :: INT) :: (node_id :: INT, centrality :: FLOAT, name :: STRING, time_ms :: INT)
ABRA-style pair-sampled BC. It samples k = ceil(0.5·(VC + log(2/δ))/ε²) uniform-random vertex pairs and runs a bidirectional BFS per pair; internal vertices on each shortest path receive +1. It is sub-second on small-world graphs, but the per-vertex count is integer-valued, so at low k the centrality column takes only a few distinct values.
Three customer knobs:
- epsilon, delta: accuracy bound (defaults 0.05, 0.05). Every estimate is within ε of the true BC with probability ≥ 1−δ.
- target_buckets: minimum number of distinct centrality values the proc tries to emit (default 50). The proc auto-grows k beyond the (ε, δ) target until at least target_buckets distinct values are produced. Set to 0 to disable auto-grow (legacy behaviour).
- max_k_multiplier: hard cap on auto-grow, as a multiple of the (ε, δ) target k (default 8, max 64). Bounds runtime even when the graph is too uniform to reach target_buckets.
If target_buckets can't be reached within max_k_multiplier × k_target samples, the daemon emits a single WARN log line with concrete advice ("graph appears too uniform; lower ε, or use betweenness_centrality_sampled for a graded distribution"). The (ε, δ) accuracy bound is the floor — auto-grow only ever samples MORE pairs than the bound requires; the accuracy guarantee is preserved verbatim.
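Here is a sketch of the pair-sampled sample size and the bounded auto-grow loop. The caller supplies VC because its computation is not described here. distinct_values_at is a hypothetical stand-in for "run k pair samples and count the distinct centrality values", and the doubling schedule is an assumption. The loop never goes below the (ε, δ) target k, matching the guarantee described above.

```python
import math


def pair_sample_size(vc, epsilon=0.05, delta=0.05):
    """k = ceil(0.5 * (VC + ln(2/delta)) / epsilon^2); natural log assumed, VC from caller."""
    return math.ceil(0.5 * (vc + math.log(2 / delta)) / epsilon ** 2)


def auto_grow(k_target, target_buckets, max_k_multiplier, distinct_values_at):
    """Return (k_used, warned). Never samples fewer than k_target pairs."""
    if target_buckets == 0:
        return k_target, False  # auto-grow disabled (legacy behaviour)
    k_cap = k_target * min(max_k_multiplier, 64)
    k = k_target
    while distinct_values_at(k) < target_buckets:
        if k >= k_cap:
            return k, True  # cap reached: this is where the WARN line would be logged
        k = min(2 * k, k_cap)  # assumed doubling schedule
    return k, False


print(pair_sample_size(vc=5))  # 1738
# Hypothetical graph where distinct values grow as k // 10:
print(auto_grow(100, 50, 8, lambda k: k // 10))  # (800, False)
```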
CALL xray.betweenness_pair_sampled(0.05, 0.05, '', 0, 8) YIELD node_id, centrality, name, time_ms RETURN node_id, centrality ORDER BY centrality DESC LIMIT 50;
CALL xray.betweenness_pair_sampled(0.05, 0.05, '', 200, 16) YIELD node_id, centrality, name, time_ms RETURN node_id, centrality ORDER BY centrality DESC LIMIT 200;
Output fidelity caveat: at coarse precision (ε ≥ 0.05) on Friendster-scale graphs, the count column produces ~3-10 distinct values even with auto-grow at the maximum multiplier. This is mathematically correct (ABRA's per-vertex count is integer-valued and most vertices get 1-2 hits at small k) but is NOT a graded ranking. For graded ranking on large graphs, use xray.betweenness_centrality_sampled.
CALL xray.betweenness_pair_sampled_adaptive()
xray.betweenness_pair_sampled_adaptive(epsilon :: FLOAT, delta :: FLOAT, label :: STRING, top_k :: INT, stability_threshold :: FLOAT) :: (node_id :: INT, centrality :: FLOAT, name :: STRING, time_ms :: INT)
kADABRA-flavored adaptive BC. It runs pair-sampled batches until the top_k vertex set stabilises across consecutive batches (Jaccard similarity ≥ stability_threshold) or the (ε, δ) target k is reached. The early stop is on top-K membership, not on individual centrality values, so the same fidelity caveat as the non-adaptive variant applies.
Defaults: ε = 0.05, δ = 0.05, top_k = 50, stability_threshold = 0.95.
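The early-stop rule can be sketched as follows. The sketch accumulates per-vertex hit counts batch by batch and takes the top_k set after each batch. It stops once the Jaccard similarity between consecutive top-K sets reaches stability_threshold, or once a batch budget standing in for the (ε, δ) target k is spent. The input format (one dict of hits per batch) and the max_batches parameter are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def adaptive_top_k(batches, top_k=50, stability_threshold=0.95, max_batches=100):
    """Return (top_vertices, batches_used, stabilised)."""
    totals = {}
    prev_top = None
    top = []
    used = 0
    for batch in batches:
        used += 1
        for vertex, hits in batch.items():
            totals[vertex] = totals.get(vertex, 0) + hits
        # Rank by accumulated hits; break ties by vertex id for determinism.
        top = sorted(totals, key=lambda v: (-totals[v], v))[:top_k]
        if prev_top is not None and jaccard(prev_top, top) >= stability_threshold:
            return top, used, True  # top-K membership has stabilised
        if used >= max_batches:
            break  # sampling budget spent without stabilising
        prev_top = top
    return top, used, False
```

Stopping on set membership rather than on value convergence is why this variant can finish early while the centrality values themselves stay coarse.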
CALL xray.betweenness_pair_sampled_adaptive(0.05, 0.05, '', 50, 0.95) YIELD node_id, centrality, name, time_ms RETURN node_id, centrality ORDER BY centrality DESC LIMIT 50;
CALL xray.find_path_bidirectional()
xray.find_path_bidirectional(start_id :: INT, end_id :: INT, max_hops :: INT) :: (path_nodes :: STRING, hops :: INT, explored_nodes_fwd :: INT, explored_nodes_bwd :: INT)
Bidirectional BFS shortest path on an undirected / symmetrised CSR. Forward and backward BFS expand toward each other; when they meet (detected via direction bitmaps), the path is reconstructed. On a small-world graph with branching factor b and path depth d, this visits roughly O(b^(d/2)) vertices instead of O(b^d), which at d ≈ 6 can mean five orders of magnitude fewer touches than unidirectional BFS.
Defaults: max_hops = 64.
Returns one row with the shortest path as a comma-separated list of GIDs from start to end, the hop count, and the number of vertices explored from each direction (useful for tuning).
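Here is a minimal Python sketch of level-synchronised bidirectional BFS on an undirected adjacency dict. It uses dict membership rather than the engine's direction bitmaps and always expands the smaller frontier. It returns the path as a list along with the number of vertices explored from each side, loosely mirroring the procedure's output columns. Because each side expands one full level at a time, the first meeting vertex found already yields a shortest path.

```python
from collections import deque


def _build_path(parent_fwd, parent_bwd, meet):
    """Join the forward chain (start .. meet) with the backward chain (meet .. end)."""
    path = []
    v = meet
    while v is not None:
        path.append(v)
        v = parent_fwd[v]
    path.reverse()
    v = parent_bwd[meet]
    while v is not None:
        path.append(v)
        v = parent_bwd[v]
    return path


def find_path_bidirectional(adj, start, end, max_hops=64):
    """Return (path, hops, explored_fwd, explored_bwd); path is None if none within max_hops."""
    if start == end:
        return [start], 0, 1, 1
    parent_fwd, parent_bwd = {start: None}, {end: None}
    frontier_fwd, frontier_bwd = deque([start]), deque([end])
    hops = 0
    while frontier_fwd and frontier_bwd and hops < max_hops:
        # Expand the smaller frontier by one full BFS level.
        if len(frontier_fwd) <= len(frontier_bwd):
            frontier, parents, other = frontier_fwd, parent_fwd, parent_bwd
        else:
            frontier, parents, other = frontier_bwd, parent_bwd, parent_fwd
        hops += 1
        for _ in range(len(frontier)):
            u = frontier.popleft()
            for w in adj.get(u, ()):
                if w in parents:
                    continue
                parents[w] = u
                frontier.append(w)
                if w in other:  # the two searches meet at w
                    path = _build_path(parent_fwd, parent_bwd, w)
                    return path, len(path) - 1, len(parent_fwd), len(parent_bwd)
    return None, -1, len(parent_fwd), len(parent_bwd)


graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(find_path_bidirectional(graph, 1, 5))  # ([1, 2, 4, 5], 3, 3, 3)
```

To get the procedure's comma-separated path_nodes form, join the list with ",".join(map(str, path)).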
CALL xray.find_path_bidirectional(81306110, 20676652, 8) YIELD path_nodes, hops, explored_nodes_fwd, explored_nodes_bwd RETURN path_nodes, hops, explored_nodes_fwd, explored_nodes_bwd;
Bench-team recipe — running BC the way we ran it
The 2026-05-05 sub-second BC validation on Friendster used these knobs and queries. Reproduce by running on the same dataset; numbers will vary by hardware.
// Each pair-sampled call seeds its RNG from (worker_id, batch_id, num_verts).
// Three runs at the same knob settings → bit-identical top-K. Use this to
// verify a fresh build doesn't silently change behaviour.
CALL xray.betweenness_pair_sampled(0.05, 0.05, '', 50, 8) YIELD node_id, centrality RETURN node_id, centrality ORDER BY centrality DESC, node_id ASC LIMIT 200;
// Sub-second indicator (3-10 distinct values on Friendster):
CALL xray.betweenness_pair_sampled(0.05, 0.05, '', 0, 8) YIELD node_id, centrality RETURN node_id ORDER BY centrality DESC LIMIT 50;
// Ranked distribution (slower, graded centrality):
CALL xray.betweenness_centrality_sampled(0.05, 0.05, '') YIELD node_id, centrality RETURN node_id, centrality ORDER BY centrality DESC LIMIT 50;
Production port: always 7689 on .187. Soak ports 17689 (TSAN) and 27689 (ASAN) run separate binaries — never validate correctness against soak ports.
xg.* — Module Management (18 procedures)
CALL xg.builtin_functions()
xg.builtin_functions() :: (category :: STRING, description :: STRING, name :: STRING, signature :: STRING)
Lists all built-in Cypher functions available in xrayGraphDB, including aggregation, scalar, string, list, and mathematical functions, with each function's signature and category.
CALL xg.builtin_functions() YIELD name, category, signature WHERE category = 'String' RETURN name, signature ORDER BY name;
CALL xg.create_module_file()
xg.create_module_file(filename :: STRING, content :: STRING) :: (path :: STRING)
Creates a new module file (Cypher or GFQL custom function) with the given content and returns the full path to the created file.
CALL xg.create_module_file('my_functions.cypher',
'FUNCTION my_sqrt(x) RETURN sqrt(abs(x))')
YIELD path
RETURN path;
CALL xg.delete_module_file()
xg.delete_module_file(path :: STRING) :: ()
Deletes a module file from the module directory. The file is removed and is no longer available.
CALL xg.delete_module_file('modules/my_functions.cypher');
CALL xg.functions()
xg.functions() :: (description :: STRING, is_editable :: BOOLEAN, mode :: STRING, name :: STRING, path :: STRING, signature :: STRING)
Lists all custom functions loaded from module files, showing each function's name, signature, path, and whether it is editable.
CALL xg.functions() YIELD name, path, signature WHERE is_editable = true RETURN name, path, signature;
CALL xg.get_module_file()
xg.get_module_file(path :: STRING) :: (content :: STRING)
Retrieves the full content of a module file by its path.
CALL xg.get_module_file('modules/my_functions.cypher')
YIELD content
RETURN content;
CALL xg.get_module_files()
xg.get_module_files() :: (is_editable :: BOOLEAN, path :: STRING)
Lists all module files currently available, showing each path and whether it is editable.
CALL xg.get_module_files() YIELD path, is_editable RETURN path, is_editable;
CALL xg.load()
xg.load(module_name :: STRING) :: ()
Loads a specific module by name, making its functions and procedures available for execution.
CALL xg.load('my_functions');
CALL xg.load_all()
xg.load_all() :: ()
Loads all available modules, making all custom functions and procedures available for execution.
CALL xg.load_all();
CALL xg.plugins_disable()
xg.plugins_disable(name :: STRING) :: (message :: STRING, success :: STRING)
Disables a plugin: the plugin is stopped and its license is kept, but it will not auto-start.
CALL xg.plugins_disable('xray-vision')
YIELD success, message
RETURN success, message;
CALL xg.plugins_enable()
xg.plugins_enable(name :: STRING) :: (message :: STRING, success :: STRING)
Enables a licensed plugin and starts it. The enabled setting persists across restarts.
CALL xg.plugins_enable('xray-vision')
YIELD success, message
RETURN success, message;
CALL xg.plugins_license()
xg.plugins_license(name :: STRING, key :: STRING) :: (message :: STRING, success :: STRING)
Applies a license key to a plugin, enabling licensed features or extending the license expiry date.
CALL xg.plugins_license('xray-vision', 'YOUR_LICENSE_KEY_HERE')
YIELD success, message
RETURN success, message;
CALL xg.plugins_list()
xg.plugins_list() :: (author :: STRING, crash_count :: INTEGER, description :: STRING, display_name :: STRING, last_error :: STRING, license_expires_at :: INTEGER, license_issued_to :: STRING, license_tier :: STRING, licensed :: STRING, name :: STRING, pid :: INTEGER, state :: STRING, type :: STRING, version :: STRING)
Lists all installed plugins with their metadata, including author, version, license status, crash count, and current state (for example running, stopped, crashed, or error).
CALL xg.plugins_list() YIELD name, version, state, licensed RETURN name, version, state, licensed ORDER BY name;
CALL xg.plugins_revoke()
xg.plugins_revoke(name :: STRING) :: (message :: STRING, success :: STRING)
Revokes a plugin's license: the plugin is stopped and its stored license is deleted, returning it to the unlicensed state.
CALL xg.plugins_revoke('xray-vision')
YIELD success, message
RETURN success, message;
CALL xg.plugins_scan()
xg.plugins_scan() :: (message :: STRING, success :: STRING)
Scans the plugins directory for new or updated plugins and reloads them.
CALL xg.plugins_scan() YIELD success, message RETURN success, message;
CALL xg.procedures()
xg.procedures() :: (description :: STRING, is_editable :: BOOLEAN, is_write :: BOOLEAN, mode :: STRING, name :: STRING, path :: STRING, signature :: STRING)
Lists all custom procedures loaded from module files, showing each procedure's name, signature, path, editability, and whether it performs write operations.
CALL xg.procedures() YIELD name, path, is_write WHERE is_write = true RETURN name, path;
CALL xg.transformations()
xg.transformations() :: (is_editable :: BOOLEAN, name :: STRING, path :: STRING)
Lists all transformation definitions available in the database. Transformations are special operations for bulk graph modifications.
CALL xg.transformations() YIELD name, path RETURN name, path;
CALL xg.update_module_file()
xg.update_module_file(path :: STRING, content :: STRING) :: ()
Updates the content of an existing module file in place. Changes are available immediately.
CALL xg.update_module_file('modules/my_functions.cypher',
'FUNCTION my_sqrt(x) RETURN sqrt(abs(x) + 1)');
CALL xg.xray_vision_builtin_functions()
xg.xray_vision_builtin_functions() :: (category :: STRING, description :: STRING, name :: STRING, signature :: STRING)
Lists all XRay-Vision plugin functions (XRay-Vision plugin required), showing the available code-analysis functions with their categories and signatures.
CALL xg.xray_vision_builtin_functions() YIELD name, category WHERE category = 'CodeMetrics' RETURN name ORDER BY name;
Note: XRay-Vision plugin required.
GFQL Overview
GFQL (Graph Frame Query Language) is a dataframe-native query language for graph traversal and analysis. It is designed for data scientists and developers who prefer chainable, functional-style operations over declarative pattern matching.
GFQL queries run natively inside the xrayGraphDB engine (patent pending) alongside Cypher. They operate on the same in-memory graph as Cypher queries, with the same transaction isolation guarantees.
SET GFQL_CONTEXT
Before executing GFQL operations, set a query context that defines the working scope (labels, edge types, or property filters).
// Set context to all Function nodes
SET GFQL_CONTEXT label='Function';
// Set context with edge filter
SET GFQL_CONTEXT label='Module', edge_type='IMPORTS';
// Set context with property filter
SET GFQL_CONTEXT label='Person', WHERE age > 25;
chain(), n(), e_forward(), e_reverse()
GFQL operations are chained together using a fluent API. The core primitives are:
| Function | Description | Example |
|---|---|---|
| chain() | Start a GFQL operation chain | chain() |
| n() | Select nodes (optionally filtered) | n(label='Person') |
| e_forward() | Traverse outgoing edges | e_forward(type='CALLS') |
| e_reverse() | Traverse incoming edges | e_reverse(type='CALLS') |
| .filter() | Filter current frame | .filter(complexity > 5) |
| .hop() | Multi-hop traversal | .hop(edge_type='CALLS', depth=3) |
| .select() | Project specific columns | .select('name', 'module') |
| .aggregate() | Group and aggregate | .aggregate(by='module', count='name') |
// Find high-complexity functions and their callees
chain()
  .n(label='Function')
  .filter(complexity > 10)
  .e_forward(type='CALLS')
  .select('source.name', 'target.name');
// Multi-hop traversal
chain()
  .n(label='Module', name='auth')
  .hop(edge_type='IMPORTS', depth=3)
  .select('name', '_hop_depth');
// Aggregate by module
chain()
  .n(label='Function')
  .aggregate(by='module', count='name', avg_complexity='complexity');
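To make the multi-hop example concrete, here is a toy Python model of what `.hop(edge_type='IMPORTS', depth=3)` plus `_hop_depth` could produce over a plain edge list. It assumes, without confirmation from the GFQL engine, that `_hop_depth` is the BFS level at which a node is first reached and that the start node itself is not returned. The `hop` helper is hypothetical and is not the GFQL API:

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, edge_type, target)


def hop(edges: List[Edge], start: str, edge_type: str, depth: int) -> List[Dict[str, object]]:
    """Follow outgoing edges of one type up to `depth` hops from `start` and
    return one row per reached node with the level at which it was first seen."""
    adjacency: Dict[str, List[str]] = {}
    for src, etype, dst in edges:
        if etype == edge_type:          # keep only the requested edge type
            adjacency.setdefault(src, []).append(dst)
    seen = {start: 0}
    queue = deque([start])
    rows: List[Dict[str, object]] = []
    while queue:
        node = queue.popleft()
        level = seen[node]
        if level == depth:              # do not expand past the depth limit
            continue
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen[nxt] = level + 1
                rows.append({"name": nxt, "_hop_depth": level + 1})
                queue.append(nxt)
    return rows
```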
Filter Predicates
GFQL supports the following predicates inside .filter() expressions:
| Operator | Description | Example |
|---|---|---|
| =, != | Equality / inequality | .filter(status = 'active') |
| >, <, >=, <= | Comparison | .filter(age >= 18) |
| AND, OR | Logical operators | .filter(age > 18 AND active = true) |
| NOT | Logical negation | .filter(NOT deleted) |
| IN | List membership | .filter(status IN ['active', 'pending']) |
| LIKE | Pattern matching (% wildcard) | .filter(name LIKE 'auth%') |
| IS NULL | Null check | .filter(email IS NULL) |
| IS NOT NULL | Not null | .filter(email IS NOT NULL) |
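The LIKE row can be read as a whole-string match where % stands for any run of characters. The following is a small Python sketch of that reading. It assumes matching is case-sensitive and that % is the only wildcard, since the table lists no other:

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a LIKE pattern: % matches any run of characters (including none);
    every other character, '.' and '*' included, matches literally."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("%")), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """True when the whole value matches the pattern (case-sensitive)."""
    return like_to_regex(pattern).fullmatch(value) is not None
```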
Bolt Protocol (port 7687)
Bolt is the primary protocol for xrayGraphDB. It supports the complete Cypher feature set including stored procedures (CALL...YIELD), DDL operations (CREATE INDEX, CREATE CONSTRAINT), EXPLAIN/PROFILE, and all query types. Use Bolt for application development and any operation that requires the full query engine.
xrayGraphDB is compatible with the entire Neo4j driver ecosystem. Any application using a Neo4j driver can connect without code changes.
Supported Bolt Versions
| Version | Status | Notes |
|---|---|---|
| Bolt v3 | Supported | Minimum version, used by older drivers |
| Bolt v4.x | Supported | Multi-database messages accepted (single-db only) |
| Bolt v5.0-5.6 | Supported | Full feature support including notifications |
Bolt TLS is supported. Pass the --bolt-cert-file and --bolt-key-file flags to enable encrypted connections.
Driver Compatibility
| Driver | Language | Minimum Version | Status |
|---|---|---|---|
| neo4j-python-driver | Python | 5.0+ | Tested |
| neo4j-driver (npm) | JavaScript | 5.0+ | Tested |
| neo4j-java-driver | Java | 5.0+ | Tested |
| neo4j-go-driver | Go | 5.0+ | Tested |
| Neo4j.Driver (NuGet) | C# / .NET | 5.0+ | Tested |
| py2neo | Python | 2021.1+ | Compatible |
| neomodel | Python | 5.0+ | Compatible |
Bolt Connection Examples
All Neo4j drivers connect the same way. The only requirement is to point the driver at the xrayGraphDB Bolt port.
# Unencrypted
bolt://localhost:7687
# With TLS (if server has cert configured)
bolt+s://xraygraphdb.example.com:7687
# Neo4j scheme (resolves to bolt://)
neo4j://localhost:7687
Authentication uses the same credentials as xrayGraphDB user management. The default admin account is admin with the password you set during setup.
xrayProtocol (port 7689)
xrayProtocol is xrayGraphDB's native binary protocol (patent pending), optimized for high-throughput, columnar data streaming. It delivers up to 24x higher throughput than Bolt for large result sets.
Protocol Feature Comparison
| Feature | Bolt (7687) | xrayProtocol (7689) |
|---|---|---|
| MATCH / CREATE / MERGE / DELETE | Yes | Yes |
| RETURN with result streaming | Yes | Yes (columnar, 24x faster) |
| Parameters ($param) | Yes | Yes |
| Transactions (BEGIN/COMMIT) | Yes | Yes |
| CALL...YIELD (stored procedures) | Yes | Not yet |
| EXPLAIN / PROFILE | Yes | Not yet |
| CREATE INDEX / CONSTRAINT | Yes | Not yet |
| GFQL queries | Yes | Yes |
| LZ4 compression | No | Yes |
| Query pipelining | No | Yes |
| Neo4j driver compatible | Yes | No (native client) |
| Best for | Full feature set, stored procedures, DDL | Bulk analytics, large result sets, data pipelines |
When to Use xrayProtocol
- Bulk data export (millions of rows) — 24x throughput advantage
- Analytics queries returning large result sets
- Server-to-server data pipelines
- GFQL queries for graph analytics
When to Use Bolt
- Stored procedures — all CALL...YIELD operations require Bolt
- DDL operations — CREATE INDEX, CREATE CONSTRAINT, SHOW INDEX INFO
- EXPLAIN / PROFILE — query plan analysis
- Application development with existing Neo4j drivers
- Interactive queries from browser tools
# xrayProtocol URI
xray://localhost:7689
xrayProtocol Message Types
xrayProtocol uses a binary message format with the following core message types:
| Message | Direction | Description |
|---|---|---|
| HELLO | Client to Server | Initiate connection with auth credentials |
| WELCOME | Server to Client | Confirm authentication and protocol version |
| RUN | Client to Server | Execute a query with optional parameters |
| COLUMNS | Server to Client | Column metadata for the result set |
| CHUNK | Server to Client | Columnar data batch (Arrow-compatible) |
| DONE | Server to Client | Query complete with summary statistics |
| ERROR | Server to Client | Error response with code and message |
| GOODBYE | Either | Close the connection gracefully |
Connection Lifecycle
An xrayProtocol session follows this sequence:
Client                         Server
|                                 |
|------ HELLO (auth) ------------>|
|<----- WELCOME (version) --------|
|                                 |
|------ RUN (query, params) ----->|
|<----- COLUMNS (metadata) -------|
|<----- CHUNK (data batch 1) -----|
|<----- CHUNK (data batch 2) -----|
|<----- ...                       |
|<----- DONE (summary) -----------|
|                                 |
|------ RUN (next query) -------->|
|<----- ...                       |
|                                 |
|------ GOODBYE ----------------->|
|<----- GOODBYE ------------------|
|                                 |
Data is streamed in columnar CHUNK messages. Each CHUNK contains a batch of rows (default batch size: 8192). The client does not need to wait for all chunks before processing.
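The row-to-column batching can be sketched in Python as a generator that yields one dict of column lists per batch, so a consumer can start on the first chunk before the last one is produced. This models only the batching. The real CHUNK payload is Arrow-compatible binary, which the sketch does not attempt to reproduce, and `columnar_chunks` is a hypothetical name:

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence


def _to_columns(columns: Sequence[str], batch: List[Sequence[Any]]) -> Dict[str, List[Any]]:
    """Transpose a list of rows into one list per column."""
    return {name: [row[i] for row in batch] for i, name in enumerate(columns)}


def columnar_chunks(columns: Sequence[str], rows: Iterable[Sequence[Any]],
                    batch_size: int = 8192) -> Iterator[Dict[str, List[Any]]]:
    """Lazily group rows into columnar batches of at most batch_size rows."""
    batch: List[Sequence[Any]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield _to_columns(columns, batch)
            batch = []
    if batch:  # final, possibly short, batch
        yield _to_columns(columns, batch)
```

For 20,000 rows at the default batch size, this yields three chunks of 8192, 8192, and 3616 rows.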
Bulk Insert (xrayProtocol)
Bulk insert bypasses Cypher parsing entirely for maximum write throughput. Instead of individual CREATE statements, clients stream columnar batches of nodes and edges directly into the storage engine.
| Message | Code | Direction | Description |
|---|---|---|---|
| BULK_INSERT_BEGIN | 0x20 | Client → Server | Open a bulk insert session |
| BULK_INSERT_NODES | 0x21 | Client → Server | Columnar node batch (labels + properties) |
| BULK_INSERT_EDGES | 0x22 | Client → Server | Columnar edge batch (from, to, type, properties) |
| BULK_INSERT_DELETE | 0x23 | Client → Server | Batch delete by ID list |
| BULK_INSERT_COMMIT | 0x24 | Client → Server | Commit bulk session |
| BULK_INSERT_ACK | 0x25 | Server → Client | Batch acknowledged (count + timing) |
| BULK_INSERT_ERROR | 0x26 | Server → Client | Batch error |
Node Batch Wire Format (0x21)
// All integers are little-endian. Strings are length-prefixed UTF-8.
uint32 node_count            // number of nodes in this batch
uint32 prop_count            // number of properties per node
// Property name declarations
[prop_count]:
  uint32 name_len
  char[] name                // e.g., "fnid", "name", "complexity"
// Per-node data
[node_count]:
  [prop_count]:
    uint32 value_len
    char[] value             // string-encoded property value
  uint32 label_count
  [label_count]:
    uint32 label_len
    char[] label             // e.g., "Function", "File"
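A minimal Python sketch that builds this node-batch payload with the standard struct module follows. It covers only the payload shown above. How the 0x21 opcode and any message header frame the payload is not specified here, so framing is left out. The node dict shape and the function names are assumptions for illustration, and lengths are taken to be UTF-8 byte counts:

```python
import struct
from typing import Any, Dict, List, Sequence


def _put_str(buf: bytearray, text: str) -> None:
    """Append a little-endian uint32 byte length followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    buf.extend(struct.pack("<I", len(data)))
    buf.extend(data)


def encode_node_batch(prop_names: Sequence[str], nodes: List[Dict[str, Any]]) -> bytes:
    """Encode a BULK_INSERT_NODES payload.

    Each node is {"props": {name: value, ...}, "labels": [str, ...]} and must
    define every name in prop_names; values are sent as strings.
    """
    buf = bytearray(struct.pack("<II", len(nodes), len(prop_names)))
    for name in prop_names:                  # property name declarations
        _put_str(buf, name)
    for node in nodes:                       # per-node data
        for name in prop_names:
            _put_str(buf, str(node["props"][name]))
        labels = node["labels"]
        buf.extend(struct.pack("<I", len(labels)))
        for label in labels:
            _put_str(buf, label)
    return bytes(buf)
```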
Edge Batch Wire Format (0x22)
uint32 edge_count
uint32 prop_count
[prop_count]:
  uint32 name_len
  char[] name
[edge_count]:
  uint32 from_id_len
  char[] from_id             // source node identifier (e.g., fnid)
  uint32 to_id_len
  char[] to_id               // target node identifier
  uint32 type_len
  char[] type                // edge type (e.g., "CALLS", "IMPORTS")
  [prop_count]:
    uint32 value_len
    char[] value
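The edge payload follows the same length-prefixed pattern. A Python sketch under the same assumptions as for nodes follows: framing is omitted, values are string-encoded, and the edge dict shape and function names are hypothetical:

```python
import struct
from typing import Any, Dict, List, Sequence


def _put_str(buf: bytearray, text: str) -> None:
    """Append a little-endian uint32 byte length followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    buf.extend(struct.pack("<I", len(data)))
    buf.extend(data)


def encode_edge_batch(prop_names: Sequence[str], edges: List[Dict[str, Any]]) -> bytes:
    """Encode a BULK_INSERT_EDGES payload.

    Each edge is {"from": id, "to": id, "type": str, "props": {name: value}},
    where from/to are the source and target node identifiers (e.g. fnid).
    """
    buf = bytearray(struct.pack("<II", len(edges), len(prop_names)))
    for name in prop_names:
        _put_str(buf, name)
    for edge in edges:
        _put_str(buf, str(edge["from"]))
        _put_str(buf, str(edge["to"]))
        _put_str(buf, edge["type"])
        for name in prop_names:
            _put_str(buf, str(edge["props"][name]))
    return bytes(buf)
```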
ACK Response Payload
uint32 count                 // number of items processed
double milliseconds          // server-side processing time
Session Flow
Client                                Server
|                                        |
|------ HELLO (auth) ------------------->|
|<----- HELLO_OK ------------------------|
|                                        |
|------ BULK_INSERT_BEGIN -------------->|
|<----- BULK_INSERT_ACK -----------------|
|                                        |
|------ BULK_INSERT_NODES (batch 1) ---->|  1000 nodes
|<----- BULK_INSERT_ACK {1000, 12ms} ----|
|------ BULK_INSERT_NODES (batch 2) ---->|  1000 nodes
|<----- BULK_INSERT_ACK {1000, 11ms} ----|
|                                        |
|------ BULK_INSERT_EDGES (batch 1) ---->|  5000 edges
|<----- BULK_INSERT_ACK {5000, 45ms} ----|
|                                        |
|------ BULK_INSERT_COMMIT ------------->|
|<----- BULK_INSERT_ACK -----------------|
|                                        |
The tenantId property is set automatically from the authenticated session.
HA Replication
xrayGraphDB supports Primary-Replica high availability via delta streaming over xrayProtocol. Every write on the Primary is captured and streamed to Replicas in real time.
Setup
# Primary server
xraygraphdb \
  --replication-mode=primary \
  --replication-replicas=replica-host:7690
# Replica server
xraygraphdb \
  --replication-mode=replica \
  --replication-port=7690
How It Works
- Primary captures every mutation (vertex/edge create/delete, label add/remove, property set/remove) as a ReplicationDelta
- Deltas are batched and streamed to each Replica over TCP (port 7690)
- Replica applies deltas using name-based ID resolution (safe across independent NameIdMappers)
- Keepalive pings every 5 seconds detect dead connections, which are then re-established automatically
- Replica ACKs each batch with the Log Sequence Number (LSN) for lag monitoring
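Name-based resolution matters because the Primary and each Replica assign their own numeric IDs to label and property names. A raw ID from the Primary can therefore mean a different name, or nothing at all, on the Replica. A toy Python model of the idea follows. The class, the delta dict shape, and the assumption that vertex GIDs are carried unchanged are all hypothetical:

```python
from typing import Dict, List, Set


class NameIdMapper:
    """Toy name <-> id table; each server numbers names in its own order."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._names: List[str] = []

    def name_to_id(self, name: str) -> int:
        """Return the local id for name, assigning the next free id if new."""
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def id_to_name(self, ident: int) -> str:
        return self._names[ident]


def make_label_add_delta(primary: NameIdMapper, vertex_gid: int, label_id: int) -> dict:
    """Primary side: ship the label *name*, never the primary-local id."""
    return {"type": "LABEL_ADD", "vertex": vertex_gid, "label": primary.id_to_name(label_id)}


def apply_label_add(replica: NameIdMapper, labels: Dict[int, Set[int]], delta: dict) -> None:
    """Replica side: re-resolve the name through the replica's own mapper."""
    local_id = replica.name_to_id(delta["label"])
    labels.setdefault(delta["vertex"], set()).add(local_id)
```

In the test case, "Function" is id 1 on the Primary but id 0 on the Replica, so applying the raw id would attach the wrong label.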
Replicated Operations
| Delta Type | Description |
|---|---|
| VERTEX_CREATE | New node inserted into the vertex store |
| VERTEX_DELETE | Node removed from the vertex store |
| EDGE_CREATE | New edge with adjacency list linking |
| EDGE_DELETE | Edge removed from adjacency lists |
| LABEL_ADD | Label assigned to node (name-based resolution) |
| LABEL_REMOVE | Label removed from node |
| PROPERTY_SET | Property value set on node (name-based resolution) |
| PROPERTY_REMOVE | Property removed from node |
| Flag | Default | Description |
|---|---|---|
| --replication-mode | standalone | standalone, primary, or replica |
| --replication-replicas | (empty) | Comma-separated replica addresses (host:port) |
| --replication-port | 7690 | Port for incoming replication connections (replica mode) |
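The --replication-replicas value is a comma-separated host:port list. The following Python sketch parses and validates it, for example in a deployment script that generates the flag. It is not the server's parser, and it does not handle bracketed IPv6 literals:

```python
from typing import List, Tuple


def parse_replicas(flag_value: str) -> List[Tuple[str, int]]:
    """Parse 'host1:port1,host2:port2' into [(host, port), ...]."""
    replicas: List[Tuple[str, int]] = []
    for entry in flag_value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty flag value and trailing commas
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        port_num = int(port)
        if not 0 < port_num < 65536:
            raise ValueError(f"port out of range in {entry!r}")
        replicas.append((host, port_num))
    return replicas
```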
Server Flags
xrayGraphDB is configured via command-line flags. Flags can also be set via environment variables using the format XRAY_FLAG_NAME (uppercase, underscores instead of hyphens).
# Command-line
./bin/xraygraphdb-wrapper --bolt-port=7687 --data-directory=/data
# Environment variable equivalent
export XRAY_BOLT_PORT=7687
export XRAY_DATA_DIRECTORY=/data
./bin/xraygraphdb-wrapper
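The naming rule can be expressed as a small Python helper, for example to turn a list of flags into environment variables for a container definition. This sketch is based only on the rule stated above (XRAY_ prefix, uppercase, hyphens to underscores), and the helper names are hypothetical:

```python
from typing import Dict, Iterable


def flag_to_env(flag: str) -> str:
    """'--bolt-port' or '--bolt-port=7687' -> 'XRAY_BOLT_PORT'."""
    name = flag.lstrip("-").split("=", 1)[0]
    return "XRAY_" + name.upper().replace("-", "_")


def flags_to_env(flags: Iterable[str]) -> Dict[str, str]:
    """Map '--name=value' flags to {XRAY_NAME: value}."""
    env: Dict[str, str] = {}
    for flag in flags:
        name, sep, value = flag.partition("=")
        if not sep:
            raise ValueError(f"flag has no value: {flag!r}")
        env[flag_to_env(name)] = value
    return env
```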
Data & Storage Flags
| Flag | Default | Description |
|---|---|---|
| --data-directory | /var/lib/xraygraphdb | Directory for snapshots and WAL files |
| --storage-gc-cycle-sec | 30 | Garbage collection interval in seconds |
| --storage-snapshot-interval-sec | 300 | Automatic snapshot interval (0 to disable) |
| --storage-snapshot-on-exit | true | Take snapshot on graceful shutdown |
| --storage-recover-on-startup | true | Load latest snapshot on startup |
| --storage-snapshot-retention-count | 3 | Number of snapshots to keep |
| --storage-wal-enabled | true | Enable write-ahead logging |
Network Flags
| Flag | Default | Description |
|---|---|---|
| --bolt-port | 7687 | Port for the Bolt protocol |
| --bolt-address | 0.0.0.0 | Bind address for Bolt |
| --bolt-num-workers | (auto) | Number of Bolt worker threads |
| --bolt-cert-file | (none) | Path to TLS certificate for Bolt |
| --bolt-key-file | (none) | Path to TLS private key for Bolt |
| --xray-port | 7689 | Port for xrayProtocol |
| --xray-address | 0.0.0.0 | Bind address for xrayProtocol |
Snapshot & Persistence Flags
| Flag | Default | Description |
|---|---|---|
| --storage-snapshot-interval-sec | 300 | Seconds between automatic snapshots (0 = manual only) |
| --storage-snapshot-on-exit | true | Create snapshot during graceful shutdown |
| --storage-snapshot-retention-count | 3 | How many snapshot files to retain on disk |
| --storage-recover-on-startup | true | Automatically load latest snapshot on start |
| --storage-wal-enabled | true | Write-ahead log for crash recovery between snapshots |
| --storage-wal-file-size-kib | 20480 | Max WAL file size before rotation (KiB) |
Performance Flags
| Flag | Default | Description |
|---|---|---|
| --query-execution-timeout-sec | 180 | Max execution time per query (0 = no limit) |
| --plan-cache-size | 1024 | Number of compiled query plans to cache |
| --bolt-session-inactivity-timeout | 1800 | Idle session timeout in seconds |
| --storage-parallel-index-recovery | true | Rebuild indexes in parallel during recovery |
| --storage-parallel-schema-recovery | true | Rebuild schemas in parallel during recovery |
Auth Flags
| Flag | Default | Description |
|---|---|---|
| --auth-enabled | true | Enable authentication (set false for local dev only) |
| --auth-password-strength | 8 | Minimum password length |
| --auth-module-timeout-ms | 10000 | Timeout for external auth module calls |
The --auth-enabled=false flag is for isolated development environments only.
License Flags
| Flag | Default | Description |
|---|---|---|
| --license-key | (none) | Path to license key JSON file |
| --organization-name | (none) | Licensed organization name (must match key) |
Without a license key, xrayGraphDB operates in Community mode with all core features available. A license unlocks commercial features (HA clustering, per-tenant encryption, RBAC). Reads, writes, and startup are never blocked by license status.
Persistence Model
xrayGraphDB supports two storage engines: mmap (data on NVMe, paged into RAM on demand) and default (all data in RAM). Both engines persist data to disk via snapshots and WAL. See Storage Engine Selection for configuration details.
Snapshots
Snapshots are full-state dumps of the graph written to the configured --data-directory. They are triggered by:
- Automatic interval (--storage-snapshot-interval-sec, default 300s)
- Graceful shutdown (if --storage-snapshot-on-exit=true)
- Manual trigger via admin CLI
Recovery on Startup
When --storage-recover-on-startup=true, the server loads the most recent snapshot on start, then replays any WAL entries created after that snapshot. This provides durability with minimal data loss.
Never use kill -9 to stop xrayGraphDB. A forced kill skips the exit snapshot and may corrupt in-progress WAL writes. Always use SIGTERM, systemctl stop, or docker stop.
Snapshot File Layout
/var/lib/xraygraphdb/
  snapshots/
    snapshot-20260401T120000.bin
    snapshot-20260401T115500.bin
    snapshot-20260401T115000.bin
  wal/
    wal-000001.bin
    wal-000002.bin
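Snapshot names embed a YYYYMMDDTHHMMSS timestamp, so the newest snapshot, and the set that --storage-snapshot-retention-count would keep, can be derived from file names alone. The following Python sketch performs that selection and is useful for backup or monitoring scripts. It models the retention rule as described, not the server's actual cleanup code:

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

_SNAPSHOT = re.compile(r"^snapshot-(\d{8}T\d{6})\.bin$")


def split_by_retention(filenames: Iterable[str],
                       retention_count: int = 3) -> Tuple[List[str], List[str]]:
    """Return (keep, drop): the newest retention_count snapshots and the rest,
    each ordered newest first. Non-snapshot files are ignored."""
    stamped = []
    for name in filenames:
        match = _SNAPSHOT.match(name)
        if match:
            taken_at = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
            stamped.append((taken_at, name))
    stamped.sort(reverse=True)  # newest first
    keep = [name for _, name in stamped[:retention_count]]
    drop = [name for _, name in stamped[retention_count:]]
    return keep, drop
```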
Memory Configuration
Because the default storage engine keeps all data in RAM, memory planning is critical.
Memory Estimation
- Each node: ~100-300 bytes base + property values
- Each relationship: ~100-200 bytes base + property values
- Each index entry: ~50-100 bytes
- String properties: string length + 32 bytes overhead
- Vector properties (384-dim): ~1.5 KB per vector
As a rule of thumb, allocate 2x the estimated raw data size to account for indexes, query working memory, and GC overhead.
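Below is a worked version of these rules in Python that produces a low and a high estimate from the per-item ranges. It makes several assumptions:

- 384-dim vectors are float32 (384 × 4 = 1536 bytes, matching the ~1.5 KB figure).
- Non-string, non-vector property values are ignored.
- The 2x rule applies to everything listed, index entries included, which errs on the generous side.

The Workload fields are hypothetical inputs:

```python
from dataclasses import dataclass


@dataclass
class Workload:
    nodes: int
    relationships: int
    index_entries: int
    string_values: int      # number of string property values
    avg_string_len: int     # average string length in bytes
    vectors_384d: int       # number of 384-dim vector properties


def estimate_bytes(w: Workload, high: bool = False) -> int:
    """Suggested RAM in bytes: low or high end of the per-item ranges, times 2."""
    node_base = 300 if high else 100
    rel_base = 200 if high else 100
    index_entry = 100 if high else 50
    raw = (w.nodes * node_base
           + w.relationships * rel_base
           + w.index_entries * index_entry
           + w.string_values * (w.avg_string_len + 32)   # length + 32 bytes overhead
           + w.vectors_384d * 384 * 4)                     # float32 assumed: 1536 bytes
    return 2 * raw  # rule of thumb: 2x the raw estimate
```

Consider 1M nodes, 5M relationships, 1M index entries, and 1M 20-byte strings. The low end gives 2 × 702,000,000 = 1,404,000,000 bytes and the high end 2 × 1,452,000,000 = 2,904,000,000 bytes, or roughly 1.4 GB to 2.9 GB.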
Docker Memory Limits
# Limit container to 16 GB
docker run -d \
  --name xraygraphdb \
  --memory=16g \
  -p 7687:7687 \
  xraygraphdb:v4.9.4
User Management
xrayGraphDB supports built-in user management via Cypher commands. Users are authenticated against the internal user store.
// Create a new user
CREATE USER analyst IDENTIFIED BY "<strong-password>";
// Change a user's password
ALTER USER analyst SET PASSWORD "<new-password>";
// Drop a user
DROP USER analyst;
// List all users
SHOW USERS;
The admin user is created at first startup; set a strong password immediately. All examples in this documentation use admin / <your-password> as the credentials.
Role Management
Roles define permissions that are granted to users. A user can have multiple roles. Permissions are additive across roles.
// Create a role
CREATE ROLE analyst;
// Grant read access on all labels
GRANT READ ON LABELS * TO analyst;
// Grant read access on specific label
GRANT READ ON LABELS :Person TO analyst;
// Deny write access
DENY WRITE ON LABELS * TO analyst;
// Assign a role to a user
SET ROLE FOR analyst_user TO analyst;
// Revoke a permission
REVOKE READ ON LABELS :Secret FROM analyst;
// Show all roles and their privileges
SHOW PRIVILEGES;
// Drop a role
DROP ROLE analyst;
| Permission | Scope | Description |
|---|---|---|
| READ | LABELS, EDGE_TYPES | Read nodes/relationships of specified types |
| WRITE | LABELS, EDGE_TYPES | Create/update/delete specified types |
| CREATE | LABELS | Create nodes of specified labels |
| DELETE | LABELS | Delete nodes of specified labels |
| INDEX | GLOBAL | Create and drop indexes |
| AUTH | GLOBAL | Manage users and roles |
Admin CLI (xraygraph-admin)
The xraygraph-admin command-line tool provides administrative operations that cannot be performed through Cypher. It works offline against the license file or data directory, and none of the operations listed below require a running server.
| Command | Description | Requires Running Server |
|---|---|---|
| --validate-license <file> | Verify license key signature, expiry, and organization | No |
| --admin-reset --auth-token=<epoch> | Reset admin password. Auth token is the Unix epoch from the server's first-start timestamp. | No |
| --verify-integrity --auth-token=<epoch> | Check snapshot and WAL integrity hashes | No |
| --repair --auth-token=<epoch> | Attempt automatic repair of corrupted snapshot segments | No |
# Validate a license key file
xraygraph-admin --validate-license /path/to/license.json
# Reset admin password (requires auth token from first-start log)
xraygraph-admin \
  --admin-reset \
  --auth-token=1711929600 \
  --data-directory=/var/lib/xraygraphdb
# Verify data integrity
xraygraph-admin \
  --verify-integrity \
  --auth-token=1711929600 \
  --data-directory=/var/lib/xraygraphdb
# Attempt repair of corrupted data
xraygraph-admin \
  --repair \
  --auth-token=1711929600 \
  --data-directory=/var/lib/xraygraphdb
The --repair command modifies snapshot files. Always back up the data directory before running repair.
Backup & Recovery
Backup xrayGraphDB by copying the snapshot files from the data directory. The safest approach is to stop the server, copy the files, and restart.
Online Backup (server running)
# Trigger a snapshot first
# (connect via driver and run)
# FREE MEMORY;  -- triggers GC + snapshot
# Then copy the newest snapshot (file names sort by timestamp)
cp "$(ls /var/lib/xraygraphdb/snapshots/snapshot-*.bin | tail -n 1)" \
  /backup/xraygraphdb-$(date +%Y%m%d).bin
Offline Backup (server stopped)
# Stop the server (creates exit snapshot)
sudo systemctl stop xraygraphdb
# Copy the entire data directory (-a preserves ownership and permissions)
cp -a /var/lib/xraygraphdb /backup/xraygraphdb-$(date +%Y%m%d)
# Restart
sudo systemctl start xraygraphdb
Restore from Backup
# Stop the server
sudo systemctl stop xraygraphdb
# Replace data directory with backup (-a preserves ownership and permissions)
rm -rf /var/lib/xraygraphdb
cp -a /backup/xraygraphdb-20260401 /var/lib/xraygraphdb
# Restart (will load the restored snapshot)
sudo systemctl start xraygraphdb
License Management
xrayGraphDB uses Ed25519-signed JSON license keys. Community mode (no license) provides full query functionality with single-node deployment. Licensed mode unlocks HA clustering, per-tenant encryption (patent pending), and RBAC.
Install a License at Startup
./bin/xraygraphdb-wrapper \
  --license-key=/path/to/license.json \
  --organization-name="Your Company"
Validate a License Offline
xraygraph-admin --validate-license /path/to/license.json
# Output:
# Organization: Your Company
# Expires: 2027-04-01
# Features: ha, encryption, rbac
# Signature: VALID
License Key Format
{
"organization": "Your Company",
"issued": "2026-04-01T00:00:00Z",
"expires": "2027-04-01T00:00:00Z",
"features": ["ha", "encryption", "rbac"],
"signature": "<ed25519-signature>"
}
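The following Python sketch covers the non-cryptographic checks implied by the license format: organization must match --organization-name, and the expires timestamp must be in the future. It deliberately does not verify the Ed25519 signature, which needs a crypto library and the vendor's public key, so it complements rather than replaces xraygraph-admin --validate-license. The function name is hypothetical:

```python
import json
from datetime import datetime, timezone
from typing import List


def license_field_problems(license_json: str, organization: str, now: datetime) -> List[str]:
    """Return problems found in the license fields (empty list if none).
    Signature verification is intentionally out of scope."""
    lic = json.loads(license_json)
    problems: List[str] = []
    if lic.get("organization") != organization:
        problems.append("organization mismatch")
    expires = datetime.strptime(lic["expires"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if now >= expires:
        problems.append("expired")
    return problems
```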
Plugin System Overview
xrayGraphDB supports a generic plugin system (patent pending) for extending the database with external data sources, analyzers, and connectors. Out-of-process plugins (the data-source and connector types) run as isolated subprocesses via fork()+exec(), providing complete memory and library isolation from the core database process.
Key characteristics:
- Process isolation: Each subprocess plugin runs in its own process, so a crash in that plugin cannot take down the database.
- Library isolation: Plugins can bundle their own dependencies (including OpenSSL versions) without conflicting with the database.
- Database-backed licensing: Plugin licenses are stored in the database and managed via Cypher commands. No license files on disk.
- Hot management: Install, enable, disable, pause, and resume plugins at runtime without restarting the database.
- Auto-restart: Crashed plugins are automatically restarted with configurable backoff and max-retry limits.
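The auto-restart policy can be illustrated with a capped exponential backoff schedule. This is a Python sketch only. The document says backoff and the retry limit are configurable but gives neither the formula nor the parameter names, so base_sec, factor, and cap_sec are assumptions; max_failures follows the lifecycle diagram under Plugin Architecture:

```python
from typing import List


def restart_delays(max_failures: int, base_sec: float = 1.0,
                   factor: float = 2.0, cap_sec: float = 60.0) -> List[float]:
    """Delay in seconds before each automatic restart; entry i follows crash i+1.
    After max_failures restarts the supervisor gives up and the plugin stays down."""
    return [min(cap_sec, base_sec * factor ** attempt) for attempt in range(max_failures)]
```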
Plugin Architecture
Plugins live in a directory under the xrayGraphDB data path (default: <data-dir>/plugins/, configurable via --plugins-dir). Each plugin is a subdirectory containing a plugin.json manifest and its binaries.
Directory Structure
plugins/
  swim/
    plugin.json             # Manifest (name, version, type, entry point)
    bin/
      xraygraphdb-swim      # Plugin executable
    lib/
      libsolclient.so       # Plugin-specific dependencies
    config/
      swim.json             # Plugin configuration
    schema/
      aircraft.cypher       # Schema init (indexes, constraints)
Plugin Types
| Type | Communication | Description |
|---|---|---|
| data-source | Pipe (stdin/stdout) | Subprocess that writes data to the database via a pipe. The database reads serialized records and routes them to the appropriate storage handler. |
| analyzer | Shared library (dlopen) | In-process library that registers custom Cypher procedures. |
| connector | Bidirectional socket | Subprocess with two-way communication for request/response protocols. |
| extension | Shared library | In-process library that extends core database functionality. |
Lifecycle
Install (untar) → Scan → License → Enable → Running
↓
Pause ↔ Resume Stop → Disable
↓
Crash → Auto-restart (up to max_failures)
Installing Plugins
Plugins are distributed as tarballs. Extract into the plugins directory:
# Extract plugin into the plugins directory
tar -xzf xraygraphdb-plugin-swim-1.0.0.tar.gz \
  -C /var/lib/xraygraphdb/plugins/
# Tell the running database to rescan
# (connect via any Cypher client)
CALL xg.plugins_scan() YIELD message;
After scanning, the plugin appears in the list but is not yet running. Unless it is a community-tier plugin (no license key required), you must activate it with a license key before enabling it.
To remove a plugin, disable it first, then delete its directory:
// Disable and revoke license
CALL xg.plugins_revoke('swim') YIELD message;
rm -rf /var/lib/xraygraphdb/plugins/swim
Plugin Licensing
Plugin licenses are Ed25519-signed JSON keys, stored in the database (not on disk). This means:
- Licenses survive restarts (persisted in the system keyspace).
- Licenses can be managed remotely via any Cypher client.
- Enable/disable is instant — one command, no file management.
- Each plugin has an independent license with its own tier and expiration.
Activate a Plugin License
// Store and validate a license key (provided by eMTAi)
CALL xg.plugins_license(
  'swim',
  '{"plugin":"swim","tier":"enterprise","issued_to":"Your Org","issued_at":1744700000,"expires_at":0,"signing_key_id":"xg-key-2026-001","signature":"..."}'
) YIELD success, message;
The database validates the Ed25519 signature, checks the tier requirement (the plugin declares its minimum tier in plugin.json), and checks expiration. If valid, the license is stored and the plugin moves to licensed state.
Enable After Licensing
// Start the plugin subprocess
CALL xg.plugins_enable('swim') YIELD success, message;
License Tiers
| Tier | Rank | Description |
|---|---|---|
| community | 1 | Free plugins. No license key required. |
| enterprise | 2 | Commercial plugins. Requires a valid enterprise license key. |
| dod | 3 | Government/military plugins. Requires a DOD-tier license key. |
A license with a higher tier can activate any plugin that requires a lower tier. For example, a dod license activates plugins that require enterprise or community.
If expires_at is 0, the license never expires. Otherwise, it is an epoch timestamp in seconds. An expired license prevents the plugin from starting but does not kill a running plugin mid-flight; expiration is re-checked on each restart.
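The tier and expiry rules combine into a simple start-time check. The Python sketch below assumes the tier names and ranks from the table and the expires_at convention described above. Signature validation is omitted, and the function name is hypothetical:

```python
from typing import Optional

TIER_RANK = {"community": 1, "enterprise": 2, "dod": 3}


def may_start(required_tier: str, license_tier: Optional[str],
              expires_at: int, now_epoch: int) -> bool:
    """Start-time license check (signature validation not included)."""
    if required_tier == "community":
        return True  # community plugins need no license key
    if license_tier is None:
        return False
    if TIER_RANK[license_tier] < TIER_RANK[required_tier]:
        return False  # a higher tier may activate lower-tier plugins, not vice versa
    return expires_at == 0 or now_epoch < expires_at  # 0 = never expires
```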
Management Commands
All plugin management is done via Cypher procedures in the xg module.
List All Plugins
CALL xg.plugins_list() YIELD name, version, state, licensed, license_tier, license_issued_to, license_expires_at, pid, crash_count;
Return Columns
| Column | Type | Description |
|---|---|---|
| name | String | Plugin identifier (matches directory name) |
| display_name | String | Human-readable name |
| version | String | Plugin version (semver) |
| description | String | Plugin description |
| author | String | Plugin author |
| type | String | data-source, analyzer, connector, extension |
| state | String | discovered, licensed, unlicensed, starting, running, paused, stopped, crashed, error |
| licensed | String | "true" or "false" |
| license_tier | String | Tier of the installed license |
| license_issued_to | String | Organization name from license |
| license_expires_at | Integer | Epoch seconds (0 = never) |
| pid | Integer | OS process ID (-1 if not running) |
| crash_count | Integer | Number of times the plugin has crashed since last enable |
| last_error | String | Last error message (empty if healthy) |
All Management Procedures
| Procedure | Arguments | Description |
|---|---|---|
| xg.plugins_list() | none | List all discovered plugins with full details. |
| xg.plugins_license(name, key) | name: String, key: String (JSON) | Store and validate a license key for a plugin. |
| xg.plugins_enable(name) | name: String | Start a licensed plugin. Persists as enabled across restarts. |
| xg.plugins_disable(name) | name: String | Stop a plugin. The license is kept, but the plugin will not auto-start. |
| xg.plugins_revoke(name) | name: String | Stop a plugin and delete its stored license. |
| xg.plugins_scan() | none | Rescan the plugins directory for new or updated plugins. |
All management procedures return success (String: "true"/"false") and message (String) columns, except plugins_list, which returns the detail columns above.
Example: Full Plugin Lifecycle
-- 1. Scan after installing a new plugin
CALL xg.plugins_scan() YIELD message;
-- 2. Check what's available
CALL xg.plugins_list() YIELD name, state, licensed RETURN name, state, licensed;
-- 3. Activate with license key (provided by eMTAi)
CALL xg.plugins_license('swim', '{"plugin":"swim",...}') YIELD success;
-- 4. Enable (starts the subprocess)
CALL xg.plugins_enable('swim') YIELD success;
-- 5. Disable (stops the plugin; the license is kept)
-- CALL xg.plugins_disable('swim') YIELD success;
-- 6. Re-enable
-- CALL xg.plugins_enable('swim') YIELD success;
-- 7. Fully remove (stops the plugin and deletes its license)
-- CALL xg.plugins_revoke('swim') YIELD message;
Plugin Manifest (plugin.json)
Every plugin must include a plugin.json file in its root directory. This manifest declares the plugin's identity, type, entry point, and requirements.
{
"name": "swim",
"display_name": "FAA SWIM Consumer",
"version": "1.0.0",
"description": "FAA SWIM native consumer for aviation data.",
"author": "eMTAi LLC",
"license_tier": "enterprise",
"type": "data-source",
"entry_point": "bin/xraygraphdb-swim",
"communication": "pipe",
"config_schema": "config/swim.json",
"schema_init": "schema/aircraft.cypher",
"requires": {
"xraygraphdb": "4.2.0",
"databases": ["swim"]
},
"health_check": {
"interval_seconds": 30,
"max_failures": 5,
"restart_delay_seconds": 10
}
}
Manifest Fields
| Field | Required | Description |
|---|---|---|
| name | Yes | Unique plugin identifier. Must match the directory name. |
| display_name | No | Human-readable name. Defaults to name. |
| version | No | Semver version string. |
| description | No | Short description shown in plugins_list. |
| author | No | Author or organization. |
| license_tier | No | Minimum required tier: community, enterprise, or dod. Default: community. |
| type | No | Plugin type: data-source, analyzer, connector, extension. Default: data-source. |
| entry_point | Yes | Path to executable, relative to the plugin directory. |
| communication | No | Communication method: pipe, socket, dlopen. Default: pipe. |
| config_schema | No | Path to config file, relative to plugin directory. Passed as argv[2] to the entry point. |
| schema_init | No | Cypher file to run on first start (e.g., CREATE INDEX). |
| requires.xraygraphdb | No | Minimum xrayGraphDB version. |
| requires.databases | No | List of database names the plugin needs created. |
| health_check.interval_seconds | No | How often to check plugin health. Default: 30. |
| health_check.max_failures | No | Maximum consecutive crashes before disabling. Default: 3. |
| health_check.restart_delay_seconds | No | Seconds to wait before restarting a crashed plugin. Default: 5. |
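The required fields and defaults in the table above can be checked before a plugin is installed. The following Python sketch is a hypothetical pre-flight validator, not the database's own loader: the function load_manifest and the constant names are assumptions, the defaults and allowed values are copied from the Manifest Fields table, and requires.* and path fields are not validated.

```python
import json
import os

# Defaults and allowed values taken from the Manifest Fields table.
DEFAULTS = {"license_tier": "community", "type": "data-source", "communication": "pipe"}
ALLOWED = {
    "license_tier": {"community", "enterprise", "dod"},
    "type": {"data-source", "analyzer", "connector", "extension"},
    "communication": {"pipe", "socket", "dlopen"},
}
HEALTH_DEFAULTS = {"interval_seconds": 30, "max_failures": 3, "restart_delay_seconds": 5}


def load_manifest(plugin_dir: str) -> dict:
    """Hypothetical validator: read plugin.json, check required fields,
    and fill in documented defaults."""
    with open(os.path.join(plugin_dir, "plugin.json")) as f:
        manifest = json.load(f)
    for field in ("name", "entry_point"):
        if field not in manifest:
            raise ValueError(f"plugin.json is missing required field '{field}'")
    # name must match the plugin's directory name
    dir_name = os.path.basename(os.path.normpath(plugin_dir))
    if manifest["name"] != dir_name:
        raise ValueError(f"name {manifest['name']!r} does not match directory {dir_name!r}")
    manifest.setdefault("display_name", manifest["name"])
    for key, default in DEFAULTS.items():
        manifest.setdefault(key, default)
        if manifest[key] not in ALLOWED[key]:
            raise ValueError(f"invalid {key}: {manifest[key]!r}")
    # Merge any health_check overrides over the documented defaults.
    manifest["health_check"] = {**HEALTH_DEFAULTS, **manifest.get("health_check", {})}
    return manifest
```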
Developing Plugins
A data-source plugin is a standalone executable that writes serialized records to a pipe. The database launches it with:
# The database calls exec() with:
./bin/your-plugin <write_fd> [config_file_path]
Here, write_fd is the file descriptor number of the write end of a pipe. The plugin writes length-prefixed binary messages to this fd:
Wire Format (Pipe Protocol)
For each message:
[uint32_t payload_length]   // little-endian, max 65536 bytes
[payload_length bytes]      // plugin-specific serialization
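The framing above is simple enough to reproduce in a test tool or a plugin written in another language. This Python sketch shows one way to encode and incrementally decode frames; the function names encode_frame and decode_frames are hypothetical, and treating 65536 as an inclusive maximum is an assumption read from "max 65536 bytes".

```python
import struct
from typing import List

MAX_PAYLOAD = 65536            # maximum payload_length from the wire format
_PREFIX = struct.Struct("<I")  # uint32_t, little-endian


def encode_frame(payload: bytes) -> bytes:
    """Prefix a payload with its little-endian uint32 length."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds 65536 bytes")
    return _PREFIX.pack(len(payload)) + payload


def decode_frames(buffer: bytearray) -> List[bytes]:
    """Remove and return every complete frame at the front of buffer.
    A trailing partial frame is left in place for the next read."""
    frames = []
    while len(buffer) >= _PREFIX.size:
        (length,) = _PREFIX.unpack_from(buffer, 0)
        if length > MAX_PAYLOAD:
            raise ValueError(f"frame length {length} exceeds maximum")
        end = _PREFIX.size + length
        if len(buffer) < end:
            break  # wait for the rest of this frame
        frames.append(bytes(buffer[_PREFIX.size:end]))
        del buffer[:end]
    return frames


buf = bytearray(encode_frame(b"abc") + encode_frame(b"") + encode_frame(b"xyz")[:5])
print(decode_frames(buf))  # [b'abc', b'']
print(len(buf))            # 5 (the partial third frame is kept)
```

Keeping the partial frame in the buffer matters because a pipe read can return any number of bytes, so a single frame may arrive split across several reads.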
The database reads these messages in a dedicated reader thread and passes the raw bytes to a data callback. The callback deserializes the plugin-specific format and routes data to the appropriate storage handler.
Plugin Process Rules
- The plugin must not write to stdout; stdout is not connected. Use the pipe fd for data and stderr for logging.
- The plugin should handle SIGTERM for graceful shutdown. If the plugin is still running 3 seconds after SIGTERM, the database sends SIGKILL.
- The plugin can bundle any dependencies in its lib/ directory. The database sets LD_LIBRARY_PATH to include this directory before exec().
- The plugin runs as the same user as the database process.
- If the plugin exits unexpectedly, the database marks it as crashed and restarts it automatically after restart_delay_seconds, up to max_failures times.
Minimal Plugin Example (C)
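// my-sensor-plugin.c: writes sensor readings to xrayGraphDB over the pipe fd
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
static volatile sig_atomic_t running = 1;
static void handle_sigterm(int sig) { (void)sig; running = 0; }
// Write exactly n bytes, retrying on short writes and EINTR.
static int write_all(int fd, const void *buf, size_t n) {
    const uint8_t *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}
int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int write_fd = atoi(argv[1]);  // argv[2], if present, is the config file path
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sigterm;
    sigaction(SIGTERM, &sa, NULL);
    signal(SIGPIPE, SIG_IGN);  // report a closed pipe as EPIPE instead of dying
    while (running) {
        // Build your serialized record (placeholder bytes shown here)
        uint8_t payload[] = { 0x01, 0x02, 0x03 };
        uint32_t len = sizeof(payload);
        // The length prefix is little-endian regardless of host byte order
        uint8_t prefix[4] = {
            (uint8_t)(len & 0xFF), (uint8_t)((len >> 8) & 0xFF),
            (uint8_t)((len >> 16) & 0xFF), (uint8_t)((len >> 24) & 0xFF)
        };
        if (write_all(write_fd, prefix, sizeof prefix) < 0 ||
            write_all(write_fd, payload, len) < 0)
            break;
        sleep(1);
    }
    close(write_fd);
    return 0;
}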
Place your configuration file in config/ and reference it as config_schema in the manifest. The database passes the full path to your config file as argv[2].
Supported Neo4j Syntax
xrayGraphDB supports the openCypher standard plus commonly-used Neo4j extensions. The following table summarizes compatibility.
| Feature | Status | Notes |
|---|---|---|
| MATCH / RETURN / WHERE | Full | openCypher standard |
| CREATE / MERGE / SET / REMOVE / DELETE | Full | openCypher standard |
| Variable-length paths | Full | Including shortestPath and allShortestPaths |
| Aggregation functions | Full | count, sum, avg, min, max, collect, percentile, stDev |
| List comprehensions | Full | Including reduce, filter, map |
| Pattern comprehensions | Full | [(n)-->(m) \| m.name] |
| CASE expressions | Full | Simple and generic CASE |
| Named indexes (CREATE INDEX name FOR ...) | Full | Neo4j 4.x+ syntax |
| SHOW INDEX INFO | Full | |
| SHOW CONSTRAINT INFO | Full | |
| Explicit transactions (BEGIN/COMMIT/ROLLBACK) | Full | |
| EXPLAIN / PROFILE | Full | Query plan inspection |
| CALL procedures | Partial | Built-in procedures only; no APOC |
| Multi-database | Not supported | Single database per instance |
| Subqueries (CALL { ... }) | Not supported | Use WITH for query chaining |
| LOAD CSV | Not supported | Use driver-side bulk import instead |
Driver Compatibility
xrayGraphDB is wire-compatible with Neo4j Bolt protocol versions 3 through 5.6, so any driver that can negotiate a Bolt version in that range can connect. The recommended driver versions are listed in the Driver Compatibility table.
Connection URI Schemes
| Scheme | Behavior |
|---|---|
| bolt:// | Direct unencrypted connection |
| bolt+s:// | Direct TLS connection (verify server cert) |
| bolt+ssc:// | Direct TLS connection (self-signed cert accepted) |
| neo4j:// | Routing driver (resolves to bolt://, no cluster routing) |
| neo4j+s:// | Routing driver with TLS |
The neo4j:// scheme is accepted for compatibility. Because xrayGraphDB Community runs as a single instance, routing resolves to a direct connection. Cluster routing is available with a license.
Migration from Neo4j
Migrating from Neo4j to xrayGraphDB is straightforward for most applications.
Step 1: Export Data from Neo4j
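Use the Neo4j APOC export to extract your data as Cypher statements. neo4j-admin database dump also exports a database, but it produces a binary Neo4j archive rather than Cypher statements, so it cannot be replayed into xrayGraphDB.
# Using APOC in Neo4j to export Cypher statements
# Run inside Neo4j Browser:
# CALL apoc.export.cypher.all("/export/data.cypher", {})
# Or use neo4j-admin (binary archive for Neo4j, not Cypher statements)
neo4j-admin database dump neo4j --to-path=/export/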
Step 2: Import into xrayGraphDB
Feed the Cypher export statements into xrayGraphDB via a driver script.
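from neo4j import GraphDatabase

# Connect to xrayGraphDB over Bolt with the standard Neo4j Python driver
driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("admin", "<your-password>")
)

with open("data.cypher") as f:
    # Drop cypher-shell client commands such as :begin / :commit,
    # which are not Cypher statements
    script = "".join(line for line in f if not line.lstrip().startswith(":"))

# Naive split: assumes no string literal contains a semicolon
statements = script.split(";")

with driver.session() as session:
    for stmt in statements:
        stmt = stmt.strip()
        if stmt:
            # consume() surfaces any error for this statement immediately
            session.run(stmt).consume()

driver.close()
print("Import complete")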
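Splitting the export with split(";") breaks any statement whose string literal contains a semicolon, such as a name like 'A;B'. A quote-aware splitter avoids that. The sketch below is hypothetical (split_cypher_script is not part of any driver). It assumes backslash escapes inside '...' and "..." strings, ignores Cypher comments, and drops lines that start with ":" as cypher-shell client commands.

```python
from typing import List


def split_cypher_script(script: str) -> List[str]:
    """Split a Cypher script on top-level semicolons.

    Semicolons inside '...', "..." or `...` are kept. Lines that start with
    ':' (cypher-shell commands such as :begin / :commit) are dropped.
    Assumptions: backslash escapes apply inside quoted strings only;
    comments are not handled.
    """
    statements, current = [], []
    quote = None     # quote character we are currently inside, if any
    escaped = False  # previous character was a backslash inside a string
    for ch in script:
        if quote is not None:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\" and quote != "`":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif ch == ";":
            statements.append("".join(current))
            current = []
        else:
            current.append(ch)
    statements.append("".join(current))

    result = []
    for stmt in statements:
        lines = [ln for ln in stmt.splitlines() if not ln.strip().startswith(":")]
        cleaned = "\n".join(lines).strip()
        if cleaned:
            result.append(cleaned)
    return result
```

In the import script, replace the split(";") loop with a loop over split_cypher_script(f.read()).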
Step 3: Update Connection String
Update your application's connection string from the Neo4j URI to the xrayGraphDB URI. No other code changes are needed if you use standard Cypher.
Differences to Be Aware Of
- No APOC: APOC procedures are not available. Replace with equivalent Cypher or driver-side logic.
- No multi-database: xrayGraphDB runs a single database per instance.
- No subqueries: Replace CALL { ... } with WITH chaining.
- No LOAD CSV: Import data using driver scripts or bulk loaders.
- Internal IDs: Node and relationship IDs may differ. Never rely on internal IDs for application logic.