🔒 xrayGraphDB is now licensed-access only. Install instructions on this page reference URLs that no longer serve binaries; request licensed access to obtain them.

xrayGraphDB Documentation

Complete reference for xrayGraphDB v4.x stable and v5.0.0-alpha preview. Covers installation, query language, functions, protocols, server configuration, and administration.

v5.0.0-alpha — What's New (Preview Channel)

v5.0.0-alpha is the early-adopter channel. Stable production deployments stay on v4.9.6 BETA until v5 GA. The alpha image ships several engine and ops changes that affect how customers connect, ingest, and deploy.

Install via downloads (.deb, .tar.gz, or Docker). The deploy runbook and five known gotchas (apt-keyring, UID 999 bind-mount, mdadm wipefs, xgconsole HELLO v2, Community-vs-Enterprise gates) live in the repo at docs/operations/docker-deployment-runbook.md.

System Requirements

xrayGraphDB runs on 64-bit Linux, macOS, and Docker. Minimum and recommended specifications are listed below.

| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores, x86_64 or ARM64 | 8+ cores, modern x86_64 (post-2014) or ARM64 |
| RAM | 2 GB | 16 GB+ (in-memory engine) |
| Disk | 1 GB free (snapshots) | SSD, 10 GB+ for snapshots |
| OS | Ubuntu 22.04+, Debian 12+, macOS 13+ | Ubuntu 24.04 LTS |
| Docker | Docker Engine 24+ | Docker Engine 27+ |
| Kernel | Linux 5.15+ | Linux 6.x |
| GPU (optional) | None; the CPU-only path is fully supported | NVIDIA, compute capability sm_70+ (Volta and newer), CUDA driver 12.x or later |
GPU is optional. If you do have an NVIDIA GPU, install the driver before the database and verify it with xg-gpu-tester (see GPU Setup below). If you install the database first on a host with a half-working GPU, the daemon silently falls back to CPU and you do not get the analytics speedups.
Note: xrayGraphDB supports two storage engines. The mmap engine (recommended) stores data on NVMe and pages into RAM on demand — datasets can exceed physical RAM. The default (in-memory) engine keeps all data in RAM for lowest latency but requires sufficient memory. See Storage Engine Selection.

Host Kernel Tunables (required for production)

Stock Linux distributions ship kernel defaults that are too low for graph workloads. xrayGraphDB will start with the defaults but will crash within minutes on any non-trivial graph build (CSR pass 1 fragments jemalloc arenas; the kernel rejects munmap when the per-process VMA count exceeds the limit; jemalloc treats that as fatal). Set the following before loading datasets larger than a few million edges.

Why this matters — real incident. 2026-05-09 GPU bench server (NVIDIA RTX PRO 6000 Blackwell, 144 GB RAM, default Ubuntu 22.04): every bulk_import_file on Friendster (3.6 B edges) died at PASS1 ~60-90 s with <jemalloc>: Error in munmap(): Cannot allocate memory. The host had vm.max_map_count=65530 (the kernel default). Bumping to 1048576 let the build complete on the first try.

vm.max_map_count — required

Maximum number of memory-mapped regions per process. xrayGraphDB needs at least 262 144 for any graph beyond a few hundred million edges; 1 048 576 recommended for production. The kernel default is 65 530, which is too low. The same setting is documented as required by Elasticsearch, MongoDB, and ClickHouse.

Set live + persist
# Apply now (no reboot required)
sudo sysctl -w vm.max_map_count=1048576

# Persist across reboots
echo 'vm.max_map_count=1048576' | sudo tee /etc/sysctl.d/99-xraygraphdb.conf

# Verify
sysctl vm.max_map_count
# vm.max_map_count = 1048576
Docker / Kubernetes: vm.max_map_count is not a namespaced sysctl — it must be set on the host, not via --sysctl on docker run. If you are running xrayGraphDB in a container, run the sudo sysctl command above on the underlying host (or pass it through your cloud-init / DaemonSet).

Transparent Huge Pages — recommended

Leave THP at its kernel default (madvise on modern distros). xrayGraphDB explicitly opts in via madvise(MADV_HUGEPAGE) on its mmap'd files. Setting THP to always enables it system-wide for all processes (small RSS overhead, no correctness issue); setting it to never disables it for everyone, including xrayGraphDB's own madvise request. Do not change it unless you know you need to.

Verify
cat /sys/kernel/mm/transparent_hugepage/enabled
# expected: always [madvise] never  (madvise selected)

Optional: file-max + nofile (heavy WAL / replication workloads)

The default per-process open-file limit on most distros is 1024 soft / 4096 hard, which is fine for low-concurrency installs but cramped on heavy replication or many-tenant deployments. Bump to 65 536 in the systemd drop-in (LimitNOFILE=65536) if you see EMFILE in the logs.
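To see which limit the running daemon actually received (rather than your login shell's), you can read /proc/<pid>/limits. A minimal sketch, assuming you obtain the daemon's PID with something like systemctl show -p MainPID --value xraygraphdb; the helper names are hypothetical:

```python
def parse_open_files_limit(limits_text: str) -> tuple[str, str]:
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values stay strings because the kernel may report 'unlimited'.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' line found")


def needs_nofile_bump(limits_text: str, target: int = 65_536) -> bool:
    """True if the soft limit is below the suggested LimitNOFILE value."""
    soft, _hard = parse_open_files_limit(limits_text)
    return soft != "unlimited" and int(soft) < target
```

Usage: needs_nofile_bump(open(f"/proc/{pid}/limits").read()), run as root or as the xraygraphdb user.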

What the daemon checks at startup

The daemon reads /proc/sys/vm/max_map_count on every start and emits a [critical] log line if the value is below 65 536 (will-crash territory) and a [warn] if below 262 144. Look for these lines in journalctl -u xraygraphdb or docker logs immediately after startup — they include the exact remediation command. Silence is success.
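You can run the same check yourself before starting the daemon, for example from a provisioning script. This is a minimal sketch, not the daemon's own code: it reads /proc/sys/vm/max_map_count and applies the 65 536 / 262 144 thresholds described above; the function names are hypothetical.

```python
from pathlib import Path

CRITICAL_BELOW = 65_536   # the daemon logs [critical] below this
WARN_BELOW = 262_144      # the daemon logs [warn] below this
RECOMMENDED = 1_048_576   # recommended production value


def classify_max_map_count(value: int) -> str:
    """Map a vm.max_map_count value to the severity the daemon would log."""
    if value < CRITICAL_BELOW:
        return "critical"
    if value < WARN_BELOW:
        return "warn"
    return "ok"


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the live kernel value (Linux only)."""
    return int(Path(path).read_text().strip())
```

Usage: classify_max_map_count(read_max_map_count()); on anything other than "ok", apply sudo sysctl -w vm.max_map_count=1048576 as shown above.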

GPU Setup (optional, before install)

Skip this section if you do not have an NVIDIA GPU. CPU-only installs are fully supported and use the same packages.

If your host has an NVIDIA GPU, install the driver before the database. The xrayGraphDB daemon probes for CUDA on first start and silently falls back to CPU if the driver is missing or broken — so a half-installed driver looks like a working install but never engages the GPU. Verify the driver with the standalone xg-gpu-tester tool before continuing.

1. Install the NVIDIA driver

Ubuntu 22.04 / 24.04
# Latest production driver branch (550 as of 2026-Q1; check `apt search nvidia-driver`)
sudo apt update
sudo apt install -y nvidia-driver-550 nvidia-utils-550
sudo reboot
# after reboot:
nvidia-smi   # should print driver version + GPU model + utilization
Rocky Linux 9 / RHEL 9
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf -y module install nvidia-driver:latest-dkms
sudo reboot
nvidia-smi
Debian 12
# Add non-free firmware and contrib repos first if you have not already.
sudo apt update
sudo apt install -y nvidia-driver firmware-misc-nonfree
sudo reboot
# after reboot:
nvidia-smi

2. Verify with xg-gpu-tester

Download the standalone tester (less than 100 KB, no dependencies beyond glibc) and run it as root. Do this BEFORE installing xrayGraphDB so a failure is loud and obvious instead of hidden behind a silent CPU fallback.

Shell
# Download (replace amd64 with arm64 if applicable)
curl -fsSLO https://emtai-xray.emtailabs.com  # request licensed access
chmod +x xg-gpu-tester-amd64
sudo ./xg-gpu-tester-amd64

A healthy host prints something like:

Output
Driver API version: 12.4
CUDA devices: 1
  Device 0: NVIDIA RTX 4090
    Compute capability : sm_89
    SM count           : 128
    Total memory       : 23.99 GiB

Allocating 1 MiB on device 0... OK

[PASS] 1 CUDA device(s) visible. Primary: NVIDIA RTX 4090 (sm_89)

Verdict: GPU READY

Common failure modes and what to do:

| Exit code | Meaning | Fix |
|---|---|---|
| 1 | libcuda.so.1 not found | The driver isn't installed; install it (step 1) and reboot. |
| 2 | cuInit() failed | Either the driver is installed but its kernel module isn't loaded for the running kernel (reboot), or the user lacks permission on /dev/nvidia*. |
| 3 | Zero CUDA devices visible | Check lspci \| grep -i nvidia: the GPU may not be on the PCIe bus, or CUDA_VISIBLE_DEVICES is set empty. |
| 5 | 1 MiB allocation refused | Some other process is using all GPU memory; nvidia-smi will show the process. Free it before installing the database. |
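If you script host preparation, the exit-code table can be folded into a small wrapper. This is a sketch, not part of the product: it assumes exit code 0 accompanies the GPU READY verdict (the table only lists failure codes) and treats any unlisted code as unknown.

```python
import subprocess

# Failure codes from the table above; 0 = GPU READY is an assumption.
EXIT_CODES = {
    0: ("GPU READY", "Proceed with the xrayGraphDB install."),
    1: ("libcuda.so.1 not found", "Install the NVIDIA driver and reboot."),
    2: ("cuInit() failed", "Reboot, or grant the user access to /dev/nvidia*."),
    3: ("Zero CUDA devices visible", "Check lspci and CUDA_VISIBLE_DEVICES."),
    5: ("1 MiB allocation refused", "Free GPU memory held by another process (see nvidia-smi)."),
}


def explain(code: int) -> str:
    """Turn an xg-gpu-tester exit code into a one-line diagnosis."""
    meaning, fix = EXIT_CODES.get(
        code, ("unknown exit code", "Re-run the tester and read its output.")
    )
    return f"exit {code}: {meaning}. {fix}"


def run_tester(binary: str = "/usr/bin/xg-gpu-tester") -> int:
    """Run the tester (needs root) and return its exit code without raising."""
    return subprocess.run([binary], check=False).returncode
```

Usage: print(explain(run_tester())), and stop the install unless the code is 0.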

Only proceed to install xrayGraphDB after xg-gpu-tester reports GPU READY. The same binary ships with the database (/usr/bin/xg-gpu-tester) so you can re-run it any time after install.

GPU acceleration in clusters — today vs. v5

If you run a multi-node cluster where some nodes have a GPU and others do not, it matters which node receives the analytics query:

Today (v4.x): Each node runs analytics on its own hardware against its own copy of the data. There is no automatic forwarding of GPU-eligible operators to a GPU-equipped peer.
  • CALL xg.pagerank(...) sent to a node with a GPU → CUDA path, fast.
  • Same query sent to a node without a GPU → CPU path, slow — even if a sibling replica has a GPU sitting idle.
Operational pattern: route reads / analytics to the GPU-equipped node directly (env-var the client, or use a thin xrayProtocol-aware load balancer); route writes to MAIN. xray-vision already does this kind of host-aware routing.

Caveat: Replicas are read-only. Pure-read analytics work fine on a GPU-equipped replica. Analytics that materialize results back into the graph (e.g. SET p.rank = ... after PageRank) must run on MAIN, where the GPU may or may not be present.
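The host-pinning pattern can be sketched client-side as follows. The hostnames are hypothetical, and the keyword check is a deliberately conservative heuristic based on the write clauses listed under "Replicas are read-only", not the server's own classification:

```python
import re

# Hypothetical endpoints: substitute your own hosts.
MAIN = ("main.example.com", 7689)
GPU_READER = ("gpu-replica.example.com", 7689)

# Clauses that touch the write path and would be refused by a replica.
_WRITE_CLAUSES = re.compile(r"\b(CREATE|MERGE|SET|REMOVE|DELETE)\b", re.IGNORECASE)


def pick_endpoint(query: str) -> tuple[str, int]:
    """Send anything that might write to MAIN; send pure reads to the GPU replica.

    A false positive (e.g. the word 'set' inside a string literal) only costs
    running that query on MAIN, never a refused write on a replica.
    """
    return MAIN if _WRITE_CLAUSES.search(query) else GPU_READER
```

A PageRank call that ends in SET p.rank = ... is routed to MAIN, matching the caveat above; a pure CALL xg.pagerank(...) ... RETURN goes to the GPU replica.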
Coming in v5: cluster-aware GPU auto-routing. Every node will advertise GPU presence in SHOW REPLICAS metadata. With --gpu-affinity=auto, MAIN's planner will detect GPU-eligible operators (PageRank, Betweenness, Triangle Count, BFS-over-CSR, ...) and forward them over RPC to a GPU-equipped peer, returning rows back through the original session. Read-only safety is enforced automatically; queries that mutate refuse with a clear error if the local MAIN lacks a GPU. This removes the need for client-side host pinning — you point the client at MAIN, and the cluster figures out where to actually run the kernel.

Installation: Docker

Docker is the fastest way to start xrayGraphDB. The image ships with sensible defaults — storage path, ports, mmap engine, AES-256-GCM tenant crypto. Pull it, bootstrap an admin on the first run, and you're up.

Before you run anything, raise vm.max_map_count on the host. Docker can't override host kernel sysctls, and the daemon logs a [critical] line below 65,536 (will-crash territory) and a [warn] below 262,144. See Host Kernel Tunables above for the one-liner.

First run — bootstrap an admin

The daemon refuses connections without an admin user. Pass --init-admin-user and --init-admin-password on the first start to create one. These flags are bootstrap-only — they only create a user when no users exist, and become silent no-ops on every subsequent boot once the admin record is on disk.

Shell — first run only
# Pull (or `docker load < xraygraphdb-4.9.4-docker.tar.gz` for the offline tarball)
docker pull xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4

# First run — creates admin/<your-password> on initial boot
docker run -d \
  --name xraygraphdb \
  --restart unless-stopped \
  -p 7689:7689 \
  -v xraygraphdb-data:/var/lib/xraygraphdb \
  -v xraygraphdb-logs:/var/log/xraygraphdb \
  xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4 \
  --license-acknowledge-saved=true \
  --init-admin-user=admin \
  --init-admin-password='YourStrongPassword!23'

# Wait ~5s, then confirm the daemon is listening
docker logs xraygraphdb | grep "xrayProtocol listening"
# Expected: "xrayProtocol listening on 0.0.0.0:7689 with 4 workers"

Security: rotate the bootstrap flags off after the first boot. Once the daemon confirms it's listening, the admin record is persisted in /var/lib/xraygraphdb/auth/ and the --init-admin-* flags do nothing on restart. But they remain visible in three places — docker inspect xraygraphdb (full Args[]), /var/lib/docker/containers/<id>/config.v2.json on the host, and ps aux inside the container. Recreate the container without the bootstrap flags so the password isn't sitting in process args:

Shell — every run after the first
docker stop xraygraphdb && docker rm xraygraphdb
docker run -d \
  --name xraygraphdb \
  --restart unless-stopped \
  -p 7689:7689 \
  -v xraygraphdb-data:/var/lib/xraygraphdb \
  -v xraygraphdb-logs:/var/log/xraygraphdb \
  xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4 \
  --license-acknowledge-saved=true

The named volumes xraygraphdb-data and xraygraphdb-logs persist the database state and logs across container recreations. The data path inside the image is /var/lib/xraygraphdb (matches the daemon's default --data-directory).

GPU acceleration (optional)

For PageRank, BFS, triangle count, K-core, Louvain, and label propagation on a CUDA-capable GPU, add --gpus all and ensure nvidia-container-toolkit is installed on the host. The daemon dlopens libcuda.so.1 at runtime from the host driver — no CUDA toolkit ships in the image. Without --gpus, GPU procs fall back to CPU paths cleanly.

Shell
docker run -d \
  --gpus all \
  --shm-size 10g \
  --name xraygraphdb \
  -p 7689:7689 \
  -v xraygraphdb-data:/var/lib/xraygraphdb \
  xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4 \
  --license-acknowledge-saved=true

docker logs xraygraphdb | grep "GpuKernelManager"
# Expected: "GpuKernelManager: CUDA context ready on device 0 (NVIDIA ...)"

Enterprise license (optional)

If you have a license file, mount it read-only at /etc/xraygraphdb/license.xglicense. The daemon auto-detects it on startup, validates the signature, and re-encrypts it at rest with the local storage epoch on first load:

Shell
docker run -d \
  --name xraygraphdb \
  -p 7689:7689 \
  -v xraygraphdb-data:/var/lib/xraygraphdb \
  -v /path/to/license.xglicense:/etc/xraygraphdb/license.xglicense:ro \
  xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4 \
  --license-acknowledge-saved=true

docker logs xraygraphdb | grep "License loaded"
# Expected: "License loaded: xg-ent-... tier=enterprise org=..."

Without a license, all xg.* community-tier procedures (PageRank, BFS, triangle count, betweenness, Louvain, etc.) run unrestricted. The license unlocks xray.* (xray-vision-specific procs) and the commercial xgated.* namespace.

Docker Compose

YAML — docker-compose.yml
services:
  xraygraphdb:
    image: xraygraphdb.emtailabs.com/xraygraphdb:v4.9.4
    restart: unless-stopped
    ports:
      - "7689:7689"
    volumes:
      - xraygraphdb-data:/var/lib/xraygraphdb
      - xraygraphdb-logs:/var/log/xraygraphdb
      # Optional: Enterprise license
      # - ./license.xglicense:/etc/xraygraphdb/license.xglicense:ro
    command:
      - --license-acknowledge-saved=true
      # --- FIRST RUN ONLY: uncomment for the first `docker compose up`,
      # --- then comment back out and `docker compose up -d --force-recreate`.
      # - --init-admin-user=admin
      # - --init-admin-password=YourStrongPassword!23

volumes:
  xraygraphdb-data:
  xraygraphdb-logs:
Shell
docker compose up -d
docker compose logs -f xraygraphdb | grep "xrayProtocol listening"

Verify the Server

Shell
# Container status
docker ps --filter name=xraygraphdb

# Daemon ready check — xrayProtocol on 7689 (Bolt is OFF by default; opt-in via --bolt-server-name)
docker logs xraygraphdb | grep "xrayProtocol listening"
# Expected: "xrayProtocol listening on 0.0.0.0:7689 with 4 workers"

# Test connection (Python xray_protocol_client; HELLO must carry a database name)
python3 -c 'import xray_protocol_client as xg
conn = xg.connect(host="localhost", port=7689,
                  auth_token="admin:YourStrongPassword!23",
                  database="xraygraphdb")
print(conn.execute_query("MATCH (n) RETURN count(n) AS n_count"))'
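The docker logs check confirms the listener from the container's side. From a client host, a stdlib-only readiness probe like this sketch can gate scripts before they call xg.connect(...). It only confirms that the TCP port accepts connections, not that HELLO or authentication will succeed; the function and parameter names are illustrative.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True          # something is listening; close and report success
        except OSError:              # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Usage: call wait_for_port("localhost", 7689) and only run the connection test above once it returns True.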

Installation: Linux

Two install paths on Ubuntu 24.04 LTS: the .deb package (recommended — auto-resolves runtime deps via apt) or the portable .tar.gz (everything bundled, runs against the system glibc).

.deb — Ubuntu / Debian

Shell
# 1. Download
wget https://emtai-xray.emtailabs.com  # request licensed access (package: xraygraphdb_4.9.4_amd64.deb)

# 2. Install — use `apt install -f` so apt auto-resolves the libgdal34t64 + python3 deps.
#    `dpkg -i` directly will FAIL the first time with "depends on libgdal34t64; however ...".
sudo apt install -f ./xraygraphdb_4.9.4_amd64.deb

.tar.gz — portable build (any glibc ≥ 2.39)

The tarball ships every C/C++ runtime we need (libstdc++, libLLVM, libsolclient, libssl, libcrypto, libxraygraphdb_module_support) under lib/. The install.sh wrapper drops them at /usr/lib/xraygraphdb/lib, registers the path with ldconfig, installs the systemd unit, and apt-installs the one external dep (libgdal34t64).

Shell
wget https://emtai-xray.emtailabs.com  # request licensed access (archive: xraygraphdb-4.9.4-linux-x86_64.tar.gz)
tar xzf xraygraphdb-4.9.4-linux-x86_64.tar.gz
cd xraygraphdb-4.9.4-linux-x86_64
sudo ./install.sh

Configure — required systemd drop-in

The default /lib/systemd/system/xraygraphdb.service ships an empty ExecStart sentinel so you can layer site-local flags via a drop-in without forking the unit file. Create the drop-in before the first systemctl start:

Shell
sudo mkdir -p /etc/systemd/system/xraygraphdb.service.d
sudo tee /etc/systemd/system/xraygraphdb.service.d/local.conf >/dev/null <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/lib/xraygraphdb/xraygraphdb \
    --data-directory=/var/lib/xraygraphdb \
    --bolt-port=7687 \
    --xray-port=7689 \
    --storage-engine=mmap \
    --storage-properties-on-edges=true \
    --log-level=WARNING \
    --license-acknowledge-saved=true \
    --bolt_listen_mode=off
EOF

sudo systemctl daemon-reload
sudo systemctl start xraygraphdb
sudo systemctl status xraygraphdb
ss -tlnp | grep :7689     # xrayProtocol
About XG_ALLOW_PLAINTEXT_TENANT_METADATA=1 in the default service file.

xrayGraphDB encrypts per-tenant metadata at rest. On a fresh install there is no tenant encryption key configured yet, so the default /lib/systemd/system/xraygraphdb.service ships with Environment=XG_ALLOW_PLAINTEXT_TENANT_METADATA=1. This tells the daemon “there’s only one tenant on this host, store the metadata in plaintext — that’s fine.” Without it the daemon refuses to bootstrap and crashes with UnknownDatabaseException: "xraygraphdb" on first start.

You can leave it as-is for: single-tenant installs, evaluation, development, and the small-team / single-application deployments most users have. The env var is the supported default and is not a security regression on a single-tenant host.

You should change it for: multi-tenant production where tenants must not see each other's metadata. The procedure is (a) register a tenant encryption provider through the licensed admin API (HashiCorp Vault, AWS KMS, or your enterprise KMS — see the Licensed admin guide), then (b) remove XG_ALLOW_PLAINTEXT_TENANT_METADATA from the unit file and reload. The daemon will now write encrypted tenant metadata and refuse plaintext writes.

Do not just remove the env var without first registering an encryption provider — the daemon will crash-loop until one of the two is true.

Critical: Always stop xrayGraphDB with SIGTERM (graceful shutdown). Never use kill -9. A forced kill skips the final snapshot and may result in data loss on next recovery.

Storage Engine Selection

xrayGraphDB supports two storage engines, both built on eMTAi's patent-pending storage architecture. Set with --storage-engine on the command line or in the service file.

| Engine | Flag | Behavior | Best for |
|---|---|---|---|
| mmap | --storage-engine=mmap | Data stored on NVMe, paged into RAM on demand via the kernel page cache. Handles datasets larger than physical RAM. | Production, large datasets, bulk loading |
| default (in-memory) | --storage-engine=default | All data in RAM. No disk I/O during queries. Lowest possible latency (0.1 ms). | Low-latency workloads, datasets that fit in RAM |
Important — In-memory engine memory requirements: The default (in-memory) engine stores all vertices, edges, properties, and indexes in RAM. A dataset that uses 2GB on disk (mmap) may require 10-20GB in RAM due to in-memory index structures, property heap allocations, and per-record metadata. Plan for approximately 5-10x the on-disk size when using the in-memory engine.
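The 5-10x rule of thumb is easy to apply before choosing an engine. A minimal sketch with hypothetical helper names: it reproduces the 2 GB on disk, 10-20 GB in RAM example and checks the upper estimate against the 95% memory ceiling recommended under Memory Configuration.

```python
def in_memory_ram_estimate_gb(on_disk_gb: float, low: float = 5.0,
                              high: float = 10.0) -> tuple[float, float]:
    """Rule-of-thumb RAM range for the in-memory engine (5-10x the mmap on-disk size)."""
    if on_disk_gb < 0:
        raise ValueError("on_disk_gb must be non-negative")
    return on_disk_gb * low, on_disk_gb * high


def fits_in_memory(on_disk_gb: float, host_ram_gb: float, os_reserve: float = 0.05) -> bool:
    """True if even the high estimate fits under host RAM minus the OS reserve."""
    _low, high = in_memory_ram_estimate_gb(on_disk_gb)
    return high <= host_ram_gb * (1.0 - os_reserve)
```

If fits_in_memory returns False, prefer --storage-engine=mmap or a larger host.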

Memory Configuration

xrayGraphDB should be allowed to use most of the available RAM, but always reserve a safety margin for the operating system. If the database exhausts all memory, the OS OOM killer will terminate the process mid-operation, risking data corruption.

INI
# /etc/systemd/system/xraygraphdb.service.d/memory.conf
# RECOMMENDED: Use 95% of physical RAM. The remaining 5% keeps the
# OS responsive and prevents the OOM killer from crashing the database.
[Service]
MemoryMax=95%

# On shared servers, set an absolute limit with at least 20% headroom
# above your expected peak usage for query working memory and OS caches.
# Example for a 256GB server with ~150GB expected dataset:
# MemoryMax=200G

# WARNING: Do not use MemoryMax=infinity. If the database exhausts all
# memory, the OOM killer will terminate the process, potentially
# corrupting in-flight writes and WAL entries.

After changing service configuration, reload systemd:

Shell
sudo systemctl daemon-reload
sudo systemctl restart xraygraphdb

Docker: When running in Docker, set the container memory limit with --memory or deploy.resources.limits.memory in Compose. Always leave 5% of host RAM for the OS. For example, on a 64GB host: docker run --memory=60g.
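The same 95% arithmetic as a helper that produces a docker run --memory value, rounded down to whole GiB. This is a sketch; the name and the round-down choice are assumptions.

```python
import math


def container_memory_limit(host_ram_gib: int, os_reserve: float = 0.05) -> str:
    """Return a --memory value that leaves os_reserve of host RAM to the OS."""
    usable = math.floor(host_ram_gib * (1.0 - os_reserve))
    if usable < 1:
        raise ValueError("host too small to leave an OS reserve")
    return f"{usable}g"
```

For the 64 GB host above this yields 60g; for a 256 GB host, 243g.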

Installation: macOS

macOS is supported for development only. For production workloads use Linux or Docker.

Shell
# Download the macOS binary (Apple Silicon build shown)
curl -LO https://releases.emtailabs.com/xraygraphdb/xraygraphdb-v4.9.4-macos-arm64.tar.gz

# Extract and run
tar xzf xraygraphdb-v4.9.4-macos-arm64.tar.gz
cd xraygraphdb-v4.9.4
./bin/xraygraphdb-wrapper

On macOS you may also use Docker Desktop, which is the recommended approach for local development.

First Boot: Bootstrap Admin

Fresh installs ship with no users in the auth store. The daemon will not let you create the first admin user over the wire (chicken-and-egg), so you must run the one-time bootstrap helper before the database is usable. This step does not apply to replica nodes — replicas inherit the admin user from the cluster's MAIN; see Cluster Setup.

Shell
sudo xraygraphdb-bootstrap-admin

The helper:

  1. Generates a 24-character random password.
  2. Displays it once on your terminal in red.
  3. Asks you to retype it to confirm you copied it.
  4. Writes the username/password to /run/xraygraphdb/bootstrap.env — this is a tmpfs file, never on disk — and starts the daemon.
  5. Schedules an ExecStartPost hook that wipes the env file ~15 seconds after the daemon comes up.
The password is shown exactly once. The script does not write it to any persistent file on this server. Copy it to your password manager when prompted. If you lose it, the only recovery is to wipe /var/lib/xraygraphdb and re-run the bootstrap — the existing user data is unrecoverable.
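If you bootstrap through Docker's --init-admin-password flag instead, you have to supply the password yourself. A comparable 24-character random password can be generated with Python's secrets module; the helper's real alphabet is not documented, so the letters-and-digits alphabet here is an assumption chosen to avoid shell-quoting problems such as the ! in the example password.

```python
import secrets
import string

# Assumed alphabet: letters and digits only, safe to paste into shells and YAML.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 24) -> str:
    """Cryptographically random password built with secrets.choice."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```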

Loading datasets — where to put files

The default systemd unit ships with PrivateTmp=true for sandbox hardening. That gives the daemon its own private /tmp/ namespace, separate from the host's /tmp/. Anything you place in the host's /tmp/ is invisible to the daemon, and bulk-import calls against those paths fail-fast with 0 vertices / 0 edges in under a second — no error in the journal.

Do not put datasets in /tmp/. Even though the dataset file is world-readable, the daemon cannot see it. Use /var/lib/xraygraphdb/import/ instead — this path is in the unit's ReadWritePaths=, owned by the xraygraphdb user, and shares the daemon's namespace.
Shell
# Right way — daemon can read it:
sudo mkdir -p /var/lib/xraygraphdb/import
sudo mv ~/com-friendster.ungraph.txt /var/lib/xraygraphdb/import/
sudo chown -R xraygraphdb:xraygraphdb /var/lib/xraygraphdb/import

# Then in your bench/import script (Python client):
#   client.bulk_import_file("/var/lib/xraygraphdb/import/com-friendster.ungraph.txt")

# Wrong way — daemon's PrivateTmp namespace makes this invisible:
#   client.bulk_import_file("/tmp/com-friendster.ungraph.txt")    ← returns 0/0 in 1s

If you need a different dataset path (e.g. a large NVMe mount at /data/), add it to the systemd unit's ReadWritePaths= via a drop-in:

/etc/systemd/system/xraygraphdb.service.d/datasets.conf
[Service]
ReadWritePaths=/data

Then run sudo systemctl daemon-reload && sudo systemctl restart xraygraphdb, and the daemon can read and write under /data/.
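A cheap guard against the silent 0/0 result is to validate paths before calling bulk_import_file. A minimal sketch, assuming you stage datasets only under /var/lib/xraygraphdb/import plus whatever you added to ReadWritePaths=; it checks path strings lexically (symlinks are not resolved) and does not consult systemd. The function names are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath

TMP = PurePosixPath("/tmp")
DEFAULT_ROOTS = (PurePosixPath("/var/lib/xraygraphdb/import"),)


def _is_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    return path == root or root in path.parents


def check_import_path(path: str, allowed_roots=DEFAULT_ROOTS) -> PurePosixPath:
    """Return the normalized path, or raise ValueError if the daemon likely cannot see it."""
    p = PurePosixPath(posixpath.normpath(path))  # collapses '..' lexically
    if not p.is_absolute():
        raise ValueError(f"{path}: pass an absolute path")
    if _is_under(p, TMP):
        raise ValueError(f"{p}: under /tmp, which PrivateTmp=true hides from the daemon")
    roots = [PurePosixPath(r) for r in allowed_roots]
    if not any(_is_under(p, r) for r in roots):
        raise ValueError(f"{p}: not under any ReadWritePaths= root: {[str(r) for r in roots]}")
    return p
```

With a /data drop-in, call check_import_path(p, allowed_roots=DEFAULT_ROOTS + (PurePosixPath("/data"),)).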

If the host is going to join an existing cluster as a replica, skip the admin bootstrap (xraygraphdb-bootstrap-admin) and use xraygraphdb-cluster-join instead.

Cluster Setup: 3-Node

xrayGraphDB clusters are MAIN + N replicas. The MAIN owns the auth store, all databases, and accepts writes. Replicas are read-only mirrors that receive WAL pushes from the MAIN; they do not have an independent identity.

Replicas are wiped on join. When you turn an existing standalone server into a replica, all of its local users, passwords, databases, snapshots, and plugin licenses are permanently destroyed and replaced by what MAIN ships. There is no merge. Choose your MAIN deliberately.

Step 1 — Set up MAIN

Pick the server that will be MAIN, install xrayGraphDB normally (Docker or .deb), and bootstrap its admin user. Whatever admin password you set on MAIN becomes the cluster-wide admin password — replicas will receive it on first sync.

Add a drop-in to enable primary mode and pre-list the replicas (you may also use REGISTER REPLICA at runtime):

/etc/systemd/system/xraygraphdb.service.d/local.conf (on MAIN)
[Service]
ExecStart=
ExecStart=/usr/lib/xraygraphdb/xraygraphdb \
    --data-directory=/var/lib/xraygraphdb \
    --bolt-port=7687 \
    --xray-port=7689 \
    --storage-engine=mmap \
    --replication-mode=primary \
    --replication-replicas=replica1.example.com:7690,replica2.example.com:7690 \
    --license-acknowledge-saved=true \
    --bolt_listen_mode=off
Shell
sudo systemctl daemon-reload
sudo systemctl restart xraygraphdb
# Verify primary mode
echo "SHOW REPLICATION ROLE;" | xgconsole --auth=admin:YOUR_PW

Step 2 — Join each replica

On each replica host (after a fresh xrayGraphDB install — do not run xraygraphdb-bootstrap-admin on replicas):

Shell (on each replica)
sudo xraygraphdb-cluster-join \
    --main=main.example.com:7690 \
    --replica-port=7690 \
    --accept-wipe

The helper stops the local daemon, wipes /var/lib/xraygraphdb/{databases,auth,plugin_licenses,settings,replication,snapshots,wal}, writes a replica drop-in, and restarts. Without --accept-wipe it refuses; the daemon itself also refuses to start in replica mode against non-empty data unless that flag is passed (defense-in-depth in case someone hand-edits systemd).

Add --keep-snapshot-backup to move the data directory to a timestamped .bak path instead of deleting it; this is useful if you want a one-restart undo.

Step 3 — Register on MAIN (if not pre-listed)

If you did not bake replica addresses into MAIN's --replication-replicas, register them at runtime:

Cypher (on MAIN)
REGISTER REPLICA replica1 SYNC TO "replica1.example.com:7690";
REGISTER REPLICA replica2 ASYNC TO "replica2.example.com:7690";
SHOW REPLICAS;
// Both data_info and system_info should populate within a few seconds.

Pick SYNC for replicas where you need durable acknowledgement of every write before MAIN commits, and ASYNC for replicas where you tolerate lag but don't want them to slow MAIN down.

Step 4 — Verify replication

Cypher
// On MAIN — write a probe row
CREATE (:ReplProbe {ts: timestamp(), host: "main"});

// On each REPLICA — read it back (read-only)
MATCH (n:ReplProbe) RETURN n.ts, n.host;
// Should return the row written on MAIN; latency should be under a second on healthy networks.

If a replica is empty or stuck, check SHOW REPLICAS on MAIN: a healthy replica reports a populated data_info object and an increasing timestamp. A failed sync reports status: "invalid" — check both hosts' journals (journalctl -u xraygraphdb -n 200) and confirm the replica's port 7690 is reachable from MAIN.
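If you poll SHOW REPLICAS from monitoring, the health rule above can be checked by comparing two polls taken a few seconds apart while writes are flowing (for example right after the probe row). The row shape used here, a name, a status, and a data_info object with a timestamp, is an assumption about how your client returns the rows; the function name is hypothetical.

```python
from typing import Dict, List, Optional


def replica_problems(prev: List[dict], curr: List[dict]) -> Dict[str, str]:
    """Flag replicas that failed sync, have no data_info, or whose timestamp stalled."""
    prev_ts: Dict[str, Optional[int]] = {
        row["name"]: (row.get("data_info") or {}).get("timestamp") for row in prev
    }
    problems: Dict[str, str] = {}
    for row in curr:
        name = row["name"]
        ts = (row.get("data_info") or {}).get("timestamp")
        before = prev_ts.get(name)
        if row.get("status") == "invalid":
            problems[name] = "sync failed: check both journals and that MAIN can reach port 7690"
        elif ts is None:
            problems[name] = "data_info not populated"
        elif before is not None and ts <= before:
            problems[name] = "timestamp not increasing"
    return problems
```

On an idle cluster a flat timestamp is expected, so only alert on "timestamp not increasing" while MAIN is taking writes.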

Outage scenarios — what happens, what you do

The cluster has explicit, predictable behavior for every node-down case. Memorize this table.

| Scenario | Behavior today (v4.x) | Operator action |
|---|---|---|
| MAIN dies (no coordinators) | Replicas keep serving READS. Writes are refused with a clear error pointing at MAIN's address. No automatic promotion. | Pick a replica and run SET REPLICATION ROLE TO MAIN on it. Re-point the other replicas at the new MAIN with SET REPLICATION ROLE TO REPLICA WITH PORT 7690 ... --replication-primary=<new-main>:7690 (drop-in edit + restart). Cluster identity is preserved: users, passwords, and data continue from the survivor's last state. |
| MAIN dies (coordinator quorum, enterprise) | Coordinators detect missed heartbeats; a majority quorum auto-promotes a replica. Promotion is committed via Raft, so split-brain is impossible. Detection + promotion typically takes a few seconds. | None. Inspect SHOW INSTANCE on the new MAIN to confirm. |
| Replica down for minutes | For SYNC replicas: MAIN's writes block until the SYNC timeout (configurable, default 1s), then MAIN demotes that replica to ASYNC and continues. ASYNC replicas just lag. | None; the replica catches up via WAL when it returns. |
| Replica down for hours / days | Same as above. When the replica returns, it requests catch-up from MAIN. If MAIN's WAL still covers the gap, sync resumes seamlessly. | None. Confirm with SHOW REPLICAS that data_info.timestamp climbs. |
| Replica down for weeks / months | MAIN's WAL has rotated past the replica's last applied position. Catch-up is impossible without a re-seed. | Run DROP REPLICA <name> on MAIN, then on the returning host run xraygraphdb-cluster-join --main=<main>:7690 --accept-wipe. The replica re-bootstraps from MAIN's current snapshot. |
| Operator wants to dissolve the cluster | n/a (manual today; xraygraphdb-cluster-leave helper coming in v4.9.5). | On MAIN: DROP REPLICA <name> for each replica. On each replica: systemctl stop xraygraphdb && rm /etc/systemd/system/xraygraphdb.service.d/20-replica.conf && systemctl daemon-reload && systemctl start xraygraphdb. Each ex-replica becomes standalone with its own copy of the data (last sync point) plus the cluster's admin user. |
| Network partition | SYNC replicas time out and demote to ASYNC. ASYNC replicas continue lagging until the partition heals. MAIN never auto-fails-over without coordinators (no split-brain risk). | Verify with SHOW REPLICAS when the partition heals; replicas catch up automatically. |

SYNC vs ASYNC — pick consciously

Replicas are read-only — bench / write workload implications

Every replica refuses any query that touches the write path: CREATE, MERGE, SET, REMOVE, DELETE, bulk inserts, index/constraint creation, and analytics that materialize results back into properties (e.g. CALL xg.pagerank(...) YIELD ... SET p.rank = ...). The error is unambiguous: "Write transaction refused: this node is in REPLICA mode".

If you're running a bench suite that mixes write-heavy and read-heavy workloads, route writes to MAIN and pure-read benchmarks to a replica — or, if you want each host to be independently writable for like-for-like comparison, do not cluster them at all. Replicas only make sense when you want a read-scaled mirror of MAIN, not when you want N independent writable boxes.

Recovery & Diagnostics

xrayGraphDB v4.9.5+ ships a recovery toolchain so operators can self-serve diagnosis and rollback when storage corruption, interrupted bulk imports, or crash loops happen, without ssh-and-grep-journalctl. Five tools, three layers (prevention · detection · action); every destructive command is gated behind explicit acknowledgement and supports --dry-run.

The five tools at a glance

| Command | What it does | Modifies data? |
|---|---|---|
| xraygraphdb-doctor | Read-only diagnostic. Walks /var/lib/xraygraphdb/, systemd state, the crash-recovery hint, mmap sizes vs meta.json, WAL, snapshots, ports, disk. Prints a structured verdict + remediation. Exit 0..4 by severity. JSON mode (--json) for monitoring. | No |
| xraygraphdb-recover auto | Reads the .crash-recovery-hint.json the daemon wrote before std::terminate and dispatches the recommended action. | Yes |
| xraygraphdb-recover quarantine | Renames graph/, databases/, the buildlock, and the hint to .corrupt-*-<ts>; restarts the daemon empty. Auth, licenses, plugins, and replication are preserved. | Yes (rename, recoverable) |
| xraygraphdb-recover from-wal | Backs up the current graph/, drops a one-time --data-recovery-on-startup flag, restarts the daemon to replay WAL past the last clean checkpoint, then removes the drop-in. | Yes |
| xraygraphdb-recover from-snapshot | Restores from a snapshot dir (auto-picks the newest, or pass --path=). Quarantines the current state first; rolls back automatically on failure. | Yes |
| xraygraphdb-recover list-quarantines | Read-only inventory of .corrupt-* and .recovery-backup-* dirs with sizes and ages. | No |
| xraygraphdb-recover cleanup-quarantines | Deletes quarantine and backup dirs older than N days (default 30; --older-than-days=0 for "all"). Always confirms first. | Yes |
| xraygraphdb-watchdog | Opt-in auto-recovery via systemd ExecStartPre. After 3+ consecutive crashes with a hint file present, it runs recover auto -y and resets the failure counter. Always exits 0 so a watchdog hiccup never blocks startup. | Yes (when triggered) |

Layer 1 — Prevention: snapshot-on-bulk-import

Every bulk_import_file call now wraps the underlying CSR build in a rollback envelope. Before the build runs, if a previous CSR build exists for the tenant, the daemon atomically renames it to .csr-pre-import-<name>-<timestamp>/. If anything goes wrong — an exception, an interrupted process, a build that produces zero vertices against a non-empty input file (the classic PrivateTmp footgun where the daemon can't see the host's /tmp/) — the daemon removes the partial new build and renames the snapshot back into place. Filesystem rename is atomic on POSIX; there is no window where both directories are partially populated.

On success, the snapshot lingers on disk so an operator can review and either restore from it (if the new build proved unsatisfactory) or delete it (xraygraphdb-recover cleanup-quarantines --older-than-days=7).
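The rename-based envelope can be sketched in Python as below. This is a minimal illustration, not the daemon's code. It assumes one CSR directory per tenant on a single filesystem (rename is only atomic within one filesystem), and `build` is a hypothetical callable that writes a new CSR and returns its vertex count. It covers in-process failures only; a killed process would need the same restore at its next startup.

```python
import os
import shutil
import time


def import_with_rollback(csr_dir, build, input_nonempty):
    """Run build(csr_dir) inside a rename-based rollback envelope.

    Returns the snapshot path (or None) on success; the snapshot is kept
    so an operator can review it, restore from it, or delete it later.
    """
    parent, name = os.path.split(csr_dir.rstrip(os.sep))
    snapshot = None
    if os.path.isdir(csr_dir):
        stamp = time.strftime("%Y%m%d-%H%M%S")
        snapshot = os.path.join(parent, f".csr-pre-import-{name}-{stamp}")
        os.rename(csr_dir, snapshot)  # atomic on POSIX within one filesystem
    try:
        vertices = build(csr_dir)
        # The PrivateTmp footgun: empty build from a non-empty input file.
        if vertices == 0 and input_nonempty:
            raise RuntimeError("zero vertices from a non-empty input file")
        return snapshot
    except BaseException:  # includes KeyboardInterrupt
        shutil.rmtree(csr_dir, ignore_errors=True)  # drop the partial build
        if snapshot is not None:
            os.rename(snapshot, csr_dir)  # put the previous build back
        raise
```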

Layer 1.5 — Cooperative cancellation of async imports

An async import on Friendster-scale data (3.6 B edges, ~15 GB CSR payload) runs for 5–60 minutes. If you realize mid-flight that you sent the wrong file, the wrong directedness flag, or the wrong tenant, you do not need to systemctl restart xraygraphdb, which would kill every other tenant's open connections as collateral. Instead, the daemon exposes four surfaces (three cooperative cancel paths plus Prometheus observability); pick whichever matches your tooling. Full reference at docs/bulk-import-cancel.md.

Tenant scoping. All four surfaces enforce server-side isolation: a job's import_id is visible only to the tenant that started it. Cross-tenant lookups return found=false, indistinguishable from “unknown id” (audit #6890 — no enumeration oracle).

Surface 1 — xrayProtocol native (Python bench client, C++ stress client)

Wire opcode BULK_IMPORT_FILE_CANCEL (0x31) — one frame round-trip. Reply carries {import_id, found, cancellable, phase}. Phase values: PENDING (0), RUNNING (1), DONE (2), ERROR (3), CANCELLED (4). Always-in-sync byte-level spec at CALL xg.protocol_messages() YIELD opcode, body WHERE opcode = '0x31'.
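Client code that decodes the reply can map these phase values onto an enum. This is a minimal sketch; `Phase`, `TERMINAL`, and `is_terminal` are hypothetical client-side names, not part of the wire spec.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Phase values carried in async-import replies."""
    PENDING = 0
    RUNNING = 1
    DONE = 2
    ERROR = 3
    CANCELLED = 4


# Once a job reaches one of these, its phase no longer changes.
TERMINAL = {Phase.DONE, Phase.ERROR, Phase.CANCELLED}


def is_terminal(raw: int) -> bool:
    """Decode the raw phase byte and report whether polling can stop."""
    return Phase(raw) in TERMINAL
```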

python — xraygraphdb-bench/xray_protocol_client.py
import time

# `client` is an already-connected client from xray_protocol_client.py (setup not shown).
# Bench team: 30-second idle TTL so an abandoned tmux session releases the worker slot
import_id = client.bulk_import_file_async(
    "/var/lib/xraygraphdb/import/friendster.ungraph.txt",
    cancel_idle_timeout_ms=30000)

# Wrong file. Abort.
result = client.bulk_import_file_cancel(import_id)
assert result["found"] and result["cancellable"]

# Poll until a terminal phase. CANCELLED typically lands in <5 s, bounded by the CSR phase boundary.
# Each poll also resets the 30 s idle TTL.
while True:
    p = client.bulk_import_file_progress(import_id)
    if p["phase_name"] in ("DONE", "ERROR", "CANCELLED"):
        break
    time.sleep(0.5)

Surface 2 — Cypher procedures (Bolt, dbeaver, neo4j-driver, raw EXECUTE)

For tools that don't speak xrayProtocol natively, the bridge exposes three procs in the xg.* community-tier namespace. Same tenant scoping. Same phase semantics.

cypher
// "Which id was that again?"
CALL xg.imports_list()
YIELD import_id, phase, bytes_total, started_unix_ms
WHERE phase = 'RUNNING'
RETURN import_id, phase, bytes_total
ORDER BY started_unix_ms DESC;

// Abort the wrong-file import.
CALL xg.import_cancel(123) YIELD found, cancellable, phase
RETURN found, cancellable, phase;

// Confirm it landed.
CALL xg.import_progress(123) YIELD phase, error
RETURN phase, error;

Discover the full proc surface via the always-in-sync catalog: CALL xg.builtin_functions() YIELD name, signature, description, category WHERE category = 'Async Import' RETURN *.

Surface 3 — Cancel-on-idle TTL (opt-in)

Pass an optional trailing u32 cancel_idle_timeout_ms in the BULK_IMPORT_FILE_ASYNC body (0x2E) to have the server auto-cancel the job if no PROGRESS poll arrives within that budget. The field is absent by default, so older clients keep the documented “survives client disconnect” behaviour exactly. A per-job idle-watcher thread checks at a cadence of min(timeout/4, 5000) ms (clamped to ≥ 50 ms) and flips the cancel flag once the budget is exhausted. Worst-case time-to-cancel on a 30 s budget against Friendster-scale data is ~35 s: the 30 s budget plus up to one 5 s check interval (min(7500, 5000) ms).
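The cadence and worst-case figures follow from simple arithmetic. The sketch below is minimal and assumes integer division for timeout/4. Both function names are hypothetical, and the result ignores the extra wait until the CSR phase boundary after the flag flips.

```python
def watcher_cadence_ms(timeout_ms: int) -> int:
    """Idle-watcher check interval: min(timeout/4, 5000) ms, clamped to >= 50 ms."""
    return max(50, min(timeout_ms // 4, 5000))


def worst_case_cancel_ms(timeout_ms: int) -> int:
    """Budget plus at most one check interval before the cancel flag flips."""
    return timeout_ms + watcher_cadence_ms(timeout_ms)
```

For example, an 8 s budget is checked every 2 s. A 100 ms budget would give 25 ms, so the clamp raises the cadence to 50 ms.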

Surface 4 — Observability (Prometheus / Grafana)

Four counters under /metrics, one for job starts and one per terminal phase (completed, failed, cancelled), are related by the in-flight invariant:

BulkImportFileStarted - (BulkImportFileCompleted
                       + BulkImportFileFailed
                       + BulkImportFileCancelled) = in-flight count

Suggested Grafana panel for “dataset/flag mismatch upstream of the daemon”: rate(BulkImportFileCancelled[5m]) / rate(BulkImportFileStarted[5m]) — sustained > 10% means the bench pipeline is feeding wrong files, not that the database is broken.
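The invariant and the suggested alert ratio can be checked offline, for example against scraped counter samples. This is a minimal sketch. The function names are hypothetical, and `pipeline_suspect` evaluates a single sample, so "sustained" still has to be judged across several windows.

```python
def in_flight(started, completed, failed, cancelled):
    """In-flight imports derived from the four /metrics counters."""
    return started - (completed + failed + cancelled)


def cancel_ratio(cancelled_rate, started_rate):
    """Mirror of rate(Cancelled[5m]) / rate(Started[5m]); None when no imports started."""
    if started_rate == 0:
        return None
    return cancelled_rate / started_rate


def pipeline_suspect(ratio, threshold=0.10):
    """True when the cancel ratio exceeds 10% for this sample."""
    return ratio is not None and ratio > threshold
```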

Layer 2 — Detection: .crash-recovery-hint.json

Audit-#64 mmap-size traps (the most common storage-corruption crash class) now write a structured hint file before the daemon's std::terminate fires. Operators reading journalctl still see the C++ backtrace, but they don't need to parse it — /var/lib/xraygraphdb/.crash-recovery-hint.json contains everything xraygraphdb-doctor and xraygraphdb-recover auto need to dispatch the right action.

/var/lib/xraygraphdb/.crash-recovery-hint.json (example)
{
  "trap":                    "audit_64_mmap_size",
  "file":                    "/var/lib/xraygraphdb/graph/vertices.mmap",
  "expected_size_meta_json": 203630336,
  "actual_size_disk":        268435456,
  "requested_size":          6341138644081852544,
  "max_virtual_size":        1099511627776,
  "verdict":                 "header_corrupt",
  "recommended_action":      "xraygraphdb-recover --rebuild-vertices-from-wal",
  "timestamp":                "2026-05-06T14:21:31Z"
}
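Monitoring tooling can read the hint without parsing a backtrace. The sketch below is minimal and assumes only the fields shown in the example. The `summarize_hint` helper and its two size checks are illustrative; they are not how xraygraphdb-doctor is implemented.

```python
import json


def summarize_hint(text: str) -> dict:
    """Parse a .crash-recovery-hint.json payload and flag size anomalies."""
    hint = json.loads(text)
    return {
        "trap": hint["trap"],
        "file": hint["file"],
        # The on-disk file size disagrees with what meta.json recorded.
        "size_mismatch": hint["actual_size_disk"] != hint["expected_size_meta_json"],
        # A requested mapping beyond the virtual-size cap is implausible,
        # which is consistent with a corrupt header rather than real growth.
        "request_exceeds_cap": hint["requested_size"] > hint["max_virtual_size"],
        "verdict": hint["verdict"],
        "action": hint["recommended_action"],
    }
```

On a live host, pass it the contents of /var/lib/xraygraphdb/.crash-recovery-hint.json. With the example above, both checks come out true: 268435456 ≠ 203630336, and the requested size is far above the 1 TiB cap.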

Layer 3 — Action: the recover commands

Anything weird? Always start here:

Shell
sudo xraygraphdb-doctor

It reports what's wrong (daemon state, mmap consistency, auth store, WAL, snapshots, disk space) and ends with a structured verdict plus the recommended remediation.

Doctor said "structured recovery hint"? Just dispatch:

Shell
sudo xraygraphdb-recover auto --I-understand-this-modifies-data

This reads the hint and runs exactly the recommended action — quarantine, from-wal, or from-snapshot. If you want to pick one explicitly:

Shell
sudo xraygraphdb-recover quarantine --I-understand-this-modifies-data
sudo xraygraphdb-recover from-wal --I-understand-this-modifies-data
sudo xraygraphdb-recover from-snapshot --path=.csr-pre-import-csr___system__-20260506-141500 --I-understand-this-modifies-data

Every command takes --dry-run (prints every step without executing it) and -y/--yes (skips the interactive confirmation, for CI). Destructive commands refuse to run in a non-interactive shell without --yes.

What got quarantined — and how to free that disk space later:

Shell
sudo xraygraphdb-recover list-quarantines
sudo xraygraphdb-recover cleanup-quarantines --older-than-days=30 --I-understand-this-modifies-data
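The age filter behind cleanup-quarantines can be sketched as follows. The sketch assumes that age is measured from each directory's modification time, that only the two prefixes listed for list-quarantines are in scope, and that --older-than-days=0 selects everything. `cleanup_candidates` is a hypothetical helper that only lists candidates and deletes nothing.

```python
import os
import time

PREFIXES = (".corrupt-", ".recovery-backup-")


def cleanup_candidates(data_dir, older_than_days=30, now=None):
    """Return quarantine/backup dir names under data_dir older than N days."""
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    picked = []
    for entry in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, entry)
        if not entry.startswith(PREFIXES) or not os.path.isdir(path):
            continue
        # older_than_days=0 means "all", regardless of timestamps.
        if older_than_days == 0 or os.path.getmtime(path) < cutoff:
            picked.append(entry)
    return picked
```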

Opt-in auto-recovery (production)

For unattended deployments where 2am-on-Tuesday means nobody is watching, install the watchdog drop-in. Off by default; enabled by file presence:

Shell
sudo mkdir -p /etc/systemd/system/xraygraphdb.service.d
sudo cp /usr/share/xraygraphdb/40-auto-recover.conf.example \
        /etc/systemd/system/xraygraphdb.service.d/40-auto-recover.conf
sudo systemctl daemon-reload

That drop-in adds ExecStartPre=+-/usr/bin/xraygraphdb-watchdog --threshold=3 to the daemon unit. The watchdog reads systemctl show -p NRestarts; when it's ≥ 3 and a hint file is present, it runs xraygraphdb-recover auto --yes and resets the failure counter on success. Logs every event to syslog as xraygraphdb-watchdog for monitoring.

The watchdog always exits 0 so a watchdog hiccup never blocks the daemon's normal startup. It refuses to act when no hint file is present (a config error or port conflict isn't recoverable by the recover tool).
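The watchdog's gate comes down to a two-condition check. The sketch below is minimal and assumes the same semantics as --threshold=3. `should_recover` and `read_nrestarts` are hypothetical names, not the shipped watchdog's code. `read_nrestarts` needs a systemd host and is not called in the check itself.

```python
import subprocess


def should_recover(n_restarts: int, hint_present: bool, threshold: int = 3) -> bool:
    """Act only after >= threshold restarts AND when a hint file exists.

    Without a hint the failure is likely config or a port conflict,
    which the recover tool cannot fix, so the watchdog stands down.
    """
    return hint_present and n_restarts >= threshold


def read_nrestarts(unit: str = "xraygraphdb") -> int:
    """Read the unit's restart counter via `systemctl show -p NRestarts --value`."""
    out = subprocess.run(
        ["systemctl", "show", "-p", "NRestarts", "--value", unit],
        capture_output=True, text=True, check=True)
    return int(out.stdout.strip() or 0)
```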

Operator quick-reference card

Print this once and tape it to the rack:

Symptom | First command
Anything unexpected at all | sudo xraygraphdb-doctor
Doctor verdict mentions a hint file | sudo xraygraphdb-recover auto --I-understand-this-modifies-data
Daemon won't start, no hint file | journalctl -u xraygraphdb -n 200 --no-pager
Bulk import returns 0/0 in <1 s | Move the dataset out of /tmp/; the daemon's PrivateTmp=true hides it
Daemon up but admin login fails | sudo xraygraphdb-bootstrap-admin
Disk filling with .corrupt-* dirs | sudo xraygraphdb-recover cleanup-quarantines --older-than-days=7 -y
Hours into the incident, want a sitrep for monitoring | sudo xraygraphdb-doctor --json

Where the docs live on a server

The same content (this section + a quickstart cheatsheet) ships with the .deb at /usr/share/xraygraphdb/RECOVERY.md. Read it offline with cat /usr/share/xraygraphdb/RECOVERY.md when the daemon is down and you can't reach this site.

Key Rotation (v5 always-encrypted)

xrayGraphDB is always-encrypted as of v5 — there is no plaintext mode, no EncryptionMode::DISABLED, no flag to opt out. Encrypted storage uses a two-level key hierarchy: a cluster-wide KEK (root-of-trust, stored at $data_dir/.storage_epoch, wrapped by the configured KMS provider) and per-tenant DEKs (one current + N historical, stored in $data_dir/.tenant_keys/<tenant_id>.bin). Each rotation appends a key_rotated event to the SecurityLog (audit #7682), and the xg-security-verify CLI verifies the SHA-256 chain end-to-end. Full operator runbook at docs/key-rotation.md.
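A SHA-256 hash chain makes the log tamper-evident, because each row commits to its predecessor. The sketch below is a minimal illustration under stated assumptions: each row stores the previous digest and a digest over (previous digest + canonical JSON payload). The row layout, `GENESIS`, and the helper names are hypothetical, and the real security.log format and xg-security-verify logic may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed digest that precedes the first row


def row_digest(prev_digest: str, payload: dict) -> str:
    """SHA-256 over the previous digest plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_digest + body).encode()).hexdigest()


def append(log: list, payload: dict) -> None:
    prev = log[-1]["digest"] if log else GENESIS
    log.append({"prev": prev, "payload": payload,
                "digest": row_digest(prev, payload)})


def verify_chain(log: list) -> bool:
    """Recompute every link. An edited, reordered, or removed interior row
    breaks the chain; detecting a truncated tail needs an external anchor."""
    prev = GENESIS
    for row in log:
        if row["prev"] != prev or row["digest"] != row_digest(prev, row["payload"]):
            return False
        prev = row["digest"]
    return True
```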

Two Cypher procs, audit-trail-bound

Both procs are tenant-scoped server-side, and both append a tamper-evident row to $data_dir/security/security.log:

Rotate a tenant's DEK (90-day cadence)
CALL xg.tenant_key_rotate_dek('acme-corp')
YIELD tenant_id, ok, old_version, new_version, error_code
RETURN *;
——
→ tenant_id  | ok   | old_version | new_version | error_code
→ acme-corp  | true | 7           | 8           | 0

Increments current_write_version. New writes for this tenant use the new DEK; existing records stay decryptable via prior versions in the keyring. Typically <50 ms on .187-class hardware regardless of tenant size (it's a keyring-file update, not a record re-encryption pass).
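Rotation can stay this cheap because each record remembers which DEK version encrypted it. The sketch below is a minimal in-memory model of a versioned keyring. It assumes records are tagged with their DEK version. `Keyring` is hypothetical and holds raw key bytes for illustration only, with no KEK wrapping and no persistence.

```python
import secrets


class Keyring:
    """Toy per-tenant DEK keyring: one current write version plus history."""

    def __init__(self):
        self.keys = {1: secrets.token_bytes(32)}
        self.current_write_version = 1

    def rotate_dek(self):
        """Add a new DEK; old versions stay, so existing records remain readable."""
        old = self.current_write_version
        new = old + 1
        self.keys[new] = secrets.token_bytes(32)
        self.current_write_version = new
        return old, new

    def key_for_write(self):
        """New writes always use the current version."""
        return self.current_write_version, self.keys[self.current_write_version]

    def key_for_read(self, version):
        """Reads look up the version stored with the record (KeyError if unknown)."""
        return self.keys[version]
```

No record is touched during rotation, so its cost does not depend on tenant size.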

Re-wrap a tenant's keyring under a different KEK
CALL xg.tenant_key_rotate_kek('acme-corp', 'arn:aws:kms:us-east-1:…')
YIELD tenant_id, new_kek_ref, ok, error_code
RETURN *;

Unwraps each historical DEK with the previous KEK reference and re-wraps under new_kek_ref. Record bytes on disk are unchanged. Per-tenant operation; for cluster-wide rotation, iterate over your tenant list (the runbook ships a reference shell script).

When to rotate

Trigger | Cadence | Mandatory?
Routine hygiene | DEK 90 days, KEK 365 days | No
Suspected key compromise | Immediate | Yes
Admin offboarding with prior key access | <24 h | Yes
SOC2 / FedRAMP / HIPAA compliance window | Per audit cycle | Yes, if your program requires it
Post-restore from a backup taken before a security event | Before resuming traffic | Yes
Hardware key store migration | At cutover | Yes

The daemon does not rotate automatically — cadence is an operator policy decision. The runbook ships a reference systemd.timer for quarterly tenant-DEK rotation.

Federation v1 Design (in progress)

Status: design locked in for the v5 release on 2026-05-12. Implementation is multi-phase and not yet user-visible; single-node deployments are unaffected. Full design at docs/design/federation-v1-design.md. The operator-facing surface (Cypher procs, write/read mode flags) will land in a future release once Phases A–E ship.

TL;DR

Raft leader-follower. Writes go to the elected leader. Default commit policy is semi-sync (leader + one follower ack). Quorum mode for strict durability. Async mode for bulk ingest / benchmarks / dev. Failover uses Raft heartbeat/election with operator override. Minority partitions become read-only. No active-active writes in v1. No last-writer-wins for graph mutations.

Two replication planes

A core architectural decision — bandwidth and latency profiles differ enough that they get separate channels:

  1. WAL plane. Nodes / edges / properties / indexes / schema / metadata replicate through the Raft log. Every entry is consensus-committed; followers apply in commit_lsn order.
  2. Artifact plane. CSR builds, columnar segments, vector indexes, geo indexes, materialized views, analytics results replicate as versioned snapshots keyed by epoch/LSN on a separate channel (HTTP/2 streaming). The Raft log carries a manifest entry referencing the artifact; followers fetch out-of-band. Putting multi-GB CSR builds in the Raft log would crater throughput on every follower.

Write modes

Mode | Semantics | Use case
async | Leader commits locally; followers catch up eventually | Dev, benchmarks, bulk ingest, analytics caches
semi_sync (default) | Leader + 1 follower ack | Normal enterprise deployments
quorum | Majority ack before commit | Financial, compliance (DoD / SOC2 / FedRAMP / HIPAA), critical workloads
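The three modes differ in how many follower acknowledgements the leader waits for before it reports a commit. The sketch below is minimal and assumes standard Raft counting, where the leader counts toward the majority. `followers_required` is a hypothetical helper, not a federation API.

```python
def followers_required(mode: str, cluster_size: int) -> int:
    """Follower acks the leader waits for before reporting commit."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be >= 1")
    followers = cluster_size - 1
    if mode == "async":
        return 0
    if mode == "semi_sync":
        return min(1, followers)  # assumed: single node has nothing to wait for
    if mode == "quorum":
        majority = cluster_size // 2 + 1  # leader included in the majority
        return majority - 1
    raise ValueError(f"unknown write mode: {mode}")
```

Under this assumption, semi_sync and quorum wait for the same single follower on a three-node cluster. They diverge from four nodes up, where quorum waits for two or more followers.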

Read modes

Mode | Semantics
read_local (default) | Stale reads allowed; replication_lag_ms is visible in the BATCH reply header (negotiated via CAP_REPL_LAG)
read_leader | Routed to the current Raft leader; strongest consistency
read_majority_committed | Blocks until commit_lsn is covered by N/2+1 followers
read_snapshot_at_lsn(LSN) | Time travel for long-running analytical queries

Explicitly out of scope for v1

Configuration Flags Reference

xrayGraphDB exposes 183 command-line flags. They are typically set in the ExecStart= line of the systemd unit (/etc/systemd/system/xraygraphdb.service) or loaded en masse via --flag-file=/etc/xraygraphdb/xraygraphdb.conf. Every flag accepts both hyphen and underscore forms (--bolt-port = --bolt_port), and all flags can be set as environment variables using the XRAY_FLAG_NAME convention (uppercase, underscores).
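The naming rules can be sketched as follows. This is a minimal sketch that assumes the environment-variable form is XRAY_ followed by the flag name in upper case, with hyphens turned into underscores (so --bolt-port would become XRAY_BOLT_PORT). `canonical_flag` and `env_var_for` are hypothetical helpers.

```python
def canonical_flag(arg: str) -> str:
    """'--bolt_port=7687' and '--bolt-port' both normalize to 'bolt-port'."""
    name = arg.lstrip("-").split("=", 1)[0]
    return name.replace("_", "-")


def env_var_for(flag: str) -> str:
    """Assumed mapping: XRAY_ prefix, upper case, underscores."""
    return "XRAY_" + canonical_flag(flag).replace("-", "_").upper()
```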

Reference deployment. The flags marked [prod] below are the ones set in the canonical .187 production unit. A minimal-but-complete ExecStart= looks like this:
/etc/systemd/system/xraygraphdb.service (excerpt)
ExecStart=/usr/lib/xraygraphdb/xraygraphdb \
  --storage-engine=default \
  --data-directory=/neo4j/xraygraphdb \
  --bolt-port=7687 \
  --storage-properties-on-edges=true \
  --storage-wal-enabled=true \
  --storage-snapshot-on-exit=true \
  --storage-snapshot-interval-sec=300 \
  --data-recovery-on-startup=true \
  --xray-workers=128 \
  --log-level=WARNING \
  --also-log-to-stderr=true \
  --query-execution-timeout-sec=0 \
  --license-acknowledge-saved=true \
  --swim-enabled=true \
  --swim-user=<scds-user> \
  --swim-pass=@/etc/xraygraphdb/swim.pass \
  --swim-queues=<queue-spec>
License JSON to journal. Without --license-acknowledge-saved=true, the daemon prints the full license JSON to journalctl on every restart so the operator can save it externally. Once you have the license stored in a password manager / offline backup, set this flag to true in the systemd unit to suppress the dump.

Network — Bolt, xrayProtocol, WebSocket

Flag | Type | Default | Description
--bolt-address | string | "0.0.0.0" | IP address the Bolt server binds to.
--bolt-advertised-address | string | "" | Address advertised in the Bolt ROUTE routing table (host:port). Required behind NAT / proxy / container.
--bolt-cert-file | string | "" | Path to the TLS certificate for the Bolt server.
--bolt-debug-log | bool | false | Log Bolt negotiated version, state transitions, and message types.
--bolt-emergency-allowlist | string | "" | Comma-separated IPv4/IPv6/CIDR entries permitted to connect when --bolt-listen-mode=emergency.
--bolt-emergency-until | string | "" | ISO-8601 timestamp when the Bolt emergency window auto-closes.
--bolt-honor-routing-address | bool | false | Use the routing_context address from ROUTE requests instead of the server config.
--bolt-key-file | string | "" | Path to the TLS private key for the Bolt server.
--bolt-listen-mode | string | "off" | Bolt listener mode: off (do not bind 7687), emergency (allowlist + window), on (legacy: accept any peer). xrayProtocol on 7689 is the production transport.
--bolt-max-auth-retries | int32 | 3 | Maximum failed LOGON attempts before closing a Bolt v5.1+ connection.
--bolt-num-workers | int32 | 128 | Number of Bolt worker threads. Defaults to the host CPU count.
--bolt-port [prod] | int32 | 7687 | Port on which the Bolt server listens.
--bolt-routing-ttl | int32 | 300 | TTL in seconds for the synthetic routing table returned in ROUTE SUCCESS.
--bolt-server-agent | string | "xrayGraphDB/4.9.1" | Server agent string sent in Bolt HELLO SUCCESS. Set to a Neo4j-shaped string for maximum driver compatibility.
--bolt-server-name-for-init | string | (version banner) | Server name returned to the client in the Bolt INIT message.
--bolt-session-timeout | int32 | 0 | Idle Bolt session timeout in seconds; emitted as a connection.recv_timeout_seconds hint. 0 disables.
--bolt-telemetry-log | bool | false | Log Bolt TELEMETRY API values at INFO level (default is trace-only).
--bolt-ws-port | int32 | 7688 | Port for the WebSocket-to-Bolt bridge (browser clients, HA cluster traffic).
--bolt-ws-workers | int32 | 3 | Number of WebSocket bridge IO threads.
--xray-port | int32 | 7689 | xrayProtocol server port. Set to 0 to disable the xrayProtocol listener.
--xray-workers [prod] | int32 | 4 | Number of xrayProtocol worker threads. Production deployments size this to the host CPU count.
--xray-idle-timeout-sec | int32 | 300 | Seconds of inactivity before closing an xrayProtocol connection.
--xray-max-connections | int32 | 4096 | Maximum concurrent xrayProtocol connections before new connections are rejected.
--xray-preauth-buffer-kb | int32 | 8 | Per-connection receive buffer size (KiB) before authentication completes. Caps slowloris / OOM exposure.
--xray-recv-buffer-max-mb | int32 | 128 | Maximum per-connection receive buffer size in MiB.
--xray-proc-slow-log-threshold-ms | int64 | 1000 | xray.* procedure calls running longer than this emit a structured warning log on completion. 0 disables.
--websocket-address | string | "127.0.0.1" | Bind address for the WebSocket monitoring server. Set to 0.0.0.0 only behind a firewall.
--websocket-port | int32 | 0 | Port for the WebSocket monitoring server. 0 means use --monitoring-port.
--monitoring-address | string | "127.0.0.1" | Bind address for the monitoring WebSocket server.
--monitoring-port | int32 | 7444 | Port for the monitoring WebSocket server.
--metrics-address | string | "127.0.0.1" | Bind address for the Prometheus-style metrics endpoint. Set to 0.0.0.0 only behind a firewall / VPN.
--metrics-port | int32 | 9091 | Port for the metrics endpoint.
--metrics-auth-token | string | "" | Bearer token required on every /metrics request. When empty, any non-loopback bind is refused.
--rpc-peer-allowlist | string | "" | Comma-separated IPv4/IPv6/CIDR list permitted to connect to internal cluster RPC ports.
--rpc-shared-secret-path | string | "" | Path to the 32-byte cluster shared secret used to HMAC-SHA256 every RPC payload (mode 0600).
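A comma-separated IPv4/IPv6/CIDR allowlist, the format used by --rpc-peer-allowlist and --bolt-emergency-allowlist, can be evaluated with Python's standard ipaddress module. This is a minimal sketch. `parse_allowlist` and `is_allowed` are hypothetical helpers, and the daemon's own matcher may treat edge cases such as empty lists or IPv4-mapped IPv6 addresses differently.

```python
import ipaddress


def parse_allowlist(spec: str):
    """'10.0.0.0/8, 192.168.1.5, ::1' -> networks; bare IPs become /32 or /128."""
    return [ipaddress.ip_network(item.strip(), strict=False)
            for item in spec.split(",") if item.strip()]


def is_allowed(peer: str, networks) -> bool:
    """True if the peer address falls inside any allowlisted network.

    An IPv4 address tested against an IPv6 network (or vice versa) is
    simply not a member. Here an empty list admits nobody (assumption).
    """
    addr = ipaddress.ip_address(peer)
    return any(addr in net for net in networks)
```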

Storage

Flag | Type | Default | Description
--data-directory [prod] | string | "/var/lib/xraygraphdb" | Directory where all permanent data (snapshots, WAL, auth store) lives.
--data-recovery-on-startup [prod] | bool | true | Recover persisted data from snapshot + WAL on startup.
--csr-directory | string | "" | Base directory for CSR edge stores. Defaults to the parent of --data-directory when empty.
--storage-engine [prod] | string | "memory" | Storage backend: mmap (file-backed), memory (in-RAM SkipList), default/auto (RAM-aware: mmap below 128 GiB, memory at 128 GiB+).
--storage-mode | string | "IN_MEMORY_TRANSACTIONAL" | Default storage mode at startup: IN_MEMORY_ANALYTICAL, IN_MEMORY_TRANSACTIONAL, ON_DISK_TRANSACTIONAL, MMAP_TRANSACTIONAL.
--storage-ephemeral | bool | false | Disable WAL, snapshots, and recovery. Data is lost on restart. Use only for dev / test / disposable workloads.
--storage-properties-on-edges [prod] | bool | true | Allow edges to carry properties. Set false only for pure topology graphs to save memory.
--storage-wal-enabled [prod] | bool | true | Enable the write-ahead log for crash recovery between snapshots.
--storage-wal-file-size-kib | uint64 | 20480 | Target file size before a WAL segment rotates (KiB).
--storage-wal-file-flush-every-n-tx | uint64 | 100000 | fsync the WAL after this many transactions. Set to 1 for fully synchronous durability.
--storage-snapshot-on-exit [prod] | bool | true | Take a snapshot during clean shutdown.
--storage-snapshot-interval-sec [prod] | uint64 | 300 | Periodic snapshot interval in seconds. 0 disables periodic snapshots.
--storage-snapshot-interval | string | "" | Snapshot schedule via cron expression or period in seconds.
--storage-snapshot-retention-count | uint64 | 3 | How many snapshot files to retain on disk.
--storage-snapshot-thread-count | uint64 | 128 | Worker threads used to write snapshots when parallel snapshot is on.
--storage-parallel-snapshot-creation | bool | false | Create snapshots using --storage-snapshot-thread-count threads.
--storage-recovery-thread-count | uint64 | 128 | Threads used to recover persisted data from disk.
--storage-parallel-schema-recovery | bool | false | Rebuild indexes and constraints in parallel during recovery.
--storage-backup-dir-enabled | bool | true | Use the .old directory to retain the previous snapshot and WAL set.
--storage-gc-cycle-sec | uint64 | 30 | Garbage collector interval in seconds.
--storage-gc-aggressive | bool | false | Enable aggressive garbage collection.
--storage-python-gc-cycle-sec | uint64 | 180 | Full Python GC interval in seconds (for Python query modules).
--storage-access-timeout-sec | uint64 | 1 | Storage-level access timeout for a query in seconds.
--storage-items-per-batch | uint64 | 0 | Edges and vertices stored per batch in a snapshot file.
--storage-floating-point-resolution-bits | uint64 | 52 | Floating-point resolution bits for property encoding.
--storage-delta-on-identical-property-update | bool | true | Create a delta object even when a property is rewritten with the same value.
--storage-enable-edges-metadata | bool | false | Store extra edge metadata to accelerate certain traversals.
--storage-enable-schema-metadata | bool | false | Track resident labels and edge types as schema metadata.
--storage-automatic-edge-type-index-creation-enabled | bool | false | Auto-create edge-type indexes on relationships.
--storage-automatic-label-index-creation-enabled | bool | false | Auto-create label indexes on vertices.
--storage-property-store-compression-enabled | bool | false | Enable property-store compression.
--storage-property-store-compression-level | string | "mid" | Compression level for property storage: low, mid, high.
--storage-rocksdb-enable-thread-tracking | bool | false | Enable RocksDB thread-status tracking. Off by default for lower syscall overhead.
--storage-rocksdb-info-log-level | string | "INFO_LEVEL" | RocksDB info log level, from DEBUG_LEVEL through FATAL_LEVEL.
--schema-info-enabled | bool | false | Track run-time schema info (per-tenant label / edge-type roster).
--orphan-ttl-days | int32 | 30 | Days to keep orphaned database directories in .orphans/ before automatic purge. -1 disables auto-purge.
--skip-recovery | bool | false | Start with empty storage even if snapshot recovery fails. The next commit overwrites the corrupt snapshot.
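The RAM-aware default/auto rule for --storage-engine can be sketched as follows. The sketch assumes the rule compares total physical RAM against exactly 128 GiB. `pick_engine` is hypothetical, and the sysconf-based RAM probe works on Linux only.

```python
import os

GIB = 1024 ** 3
THRESHOLD = 128 * GIB


def pick_engine(requested: str, total_ram_bytes: int) -> str:
    """Resolve --storage-engine: explicit values pass through, default/auto is RAM-aware."""
    if requested in ("mmap", "memory"):
        return requested
    if requested in ("default", "auto"):
        return "mmap" if total_ram_bytes < THRESHOLD else "memory"
    raise ValueError(f"unknown storage engine: {requested}")


def total_ram_bytes() -> int:
    """Physical RAM via POSIX sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
```

Typical use: `pick_engine("default", total_ram_bytes())`.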

Authentication & License

Flag | Type | Default | Description
--admin-reset | string | "" | Reset the admin user password. Prefix with @ to read from a file. Requires --auth-token.
--admin-set-tenant | string | "" | Targeted recovery: assign a tenant_id to a single existing user (<username>=<tenant_id>). Requires --auth-token. Never pass =default.
--auth-token | string | "" | Ownership proof for admin recovery. Accepts the storage epoch key or a license signature hash. Prefix with @ to read from a file.
--auth-argon2-mem-cost | int32 | 65536 | Argon2id memory cost in KiB for new password hashes (default 64 MiB).
--auth-argon2-parallelism | int32 | 4 | Argon2id parallelism (threads / runtime concurrency) for new hashes.
--auth-argon2-lanes | int32 | 0 | Argon2id lanes (RFC 9106 hash-shape parameter). When 0 (default), uses --auth-argon2-parallelism. Set explicitly to decouple lanes from threads for reproducibility across thread-count changes (audit #9607).
--auth-argon2-time-cost | int32 | 3 | Argon2id iterations for new hashes.
--auth-bcrypt-max-verify-cost | int32 | 16 | Maximum bcrypt cost Verify() will honour. Higher costs are refused as a DoS amplifier guard. Range [4, 31]. Raise temporarily to authenticate imported high-cost hashes (e.g. Memgraph migration with cost=17/18 admin), then force-reset the affected users and lower it back (audit #9603).
--plugin-allow-cross-tier-compliance | string | "" | Comma-separated tier pair (the only valid value today is federal,dod) permitting cross-tier plugin loads when the operator carries BOTH compliance regimes. Default empty: federal/dod licenses load matching-tier plugins only, preserving the STIG/IL5 vs FedRAMP-moderate distinction. Rank-based downward inclusion (federal ⊇ enterprise ⊇ community) still applies (audit #9614).
--auth-lockout-threshold | int32 | 5 | Consecutive failed login attempts before exponential-backoff lockout kicks in.
--auth-lockout-max-delay-sec | int32 | 60 | Cap on the exponential-backoff delay between rate-limited authentication attempts.
--auth-lockout-table-max-size | int32 | 1000 | Maximum usernames tracked for failed-login rate limiting before LRU eviction.
--auth-module-mappings | string | "" | Map auth schemes to external modules: "<scheme>:<path>;…".
--auth-module-timeout-ms | int32 | 10000 | Timeout in milliseconds when waiting for an external auth module response.
--auth-password-permit-null | bool | false | Allow null / empty passwords. Not recommended.
--auth-password-strength-regex | string | ".+" | Regex the entire password must match.
--auth-reject-unsalted-sha256 | bool | false | Refuse to authenticate accounts whose stored hash is unsalted SHA-256, forcing a password reset. Narrower than --auth-reject-sha256-no-stretch; retained as a defence-in-depth gate for operators who explicitly opt into the broader migration window.
--auth-reject-sha256-no-stretch | bool | true | Default true in v5+. SHA-256 has no key stretching and is GPU-brute-forceable at ~1B/s, so xrayGraphDB refuses ANY SHA-256 hash (salted or not). To migrate a v4.x deploy with existing SHA-256 users: run CALL xg.security_list_weak_hash_accounts() to scope the affected users, then set this flag to false for one rotation window. Each successful SHA-256 verify rotates the stored hash inline to bcrypt; once enumeration returns zero rows, set the flag back to true (audit #9604).
--auth-allow-legacy-bcrypt-truncation | bool | false | When true, bcrypt verify retries with the supplied password truncated to 72 bytes if the strict path fails. This is the audit-#7620 backward-compat path for pre-#7009 accounts. Default false closes the audit-#9602 timing oracle (a doubled verify latency that fingerprints pre-#7009 accounts). Set true ONLY during an upgrade rotation window, then set the flag back to false; xrayGraphDB rotates any matched legacy hash inline, so the oracle is paid once per account.
--auth-tenant-migration-target | string | "" | Target tenant for the legacy empty-tenant_id user migration. Set to the org's default tenant when migrating.
--auth-user-or-role-name-regex | string | "[a-zA-Z0-9_.+-@]+" | Regex every username and role name must match.
--init-admin-user | string | "" | Bootstrap admin username. Creates the first user at startup if no users exist.
--init-admin-password | string | "" | Bootstrap admin password. Required when --init-admin-user is set.
--init-admin-tenant | string | "__system__" | Tenant id stamped on the bootstrap admin user. Empty or default is rejected (it would create a chicken-and-egg lockout).
--bulk-trust-tenant-id-from-users | string | "" | Comma-separated list of usernames whose BULK_INSERT_NODES / BULK_UPSERT_NODES requests are trusted to set the tenantId property explicitly via prop_names. For internal-pipeline service accounts that connect as one shared DB user but write rows for many tenants (e.g. xray-vision Rust pipeline). Other users still hit the standard reject. String type-check still applies. Empty default = guard applies to everyone.
--init-data-file | string | "" | Path to a Cypher script run after the server starts (data seed).
--init-file | string | "" | Path to a Cypher script run before the server starts (users / schema bootstrap).
--encryption-mode | string | "disabled" | Per-tenant encryption mode: disabled, optional, required.
--license-acknowledge-saved [prod] | bool | false | Operator confirmation that the license JSON was saved externally. Required to suppress the journal dump on systemd / container deploys.
--license-file | string | "" | Path to the xrayGraphDB A+ license file.
--validate-license | string | "" | Validate a license file or inline JSON / JWS without starting the server.
--lbac-legacy-allow-empty | bool | false | Pre-LBAC backward-compat: when true, a user with no LBAC entries gets implicit ALLOW on every label and edge type. Default false (secure-by-default DENY).
--repair | string | "" | Repair issues found by --verify-integrity. Value is all or comma-separated subsystems. Requires --auth-token.
--verify-integrity | string | "" | Run integrity checks. Value is all or comma-separated auth,snapshot,orphans. Requires --auth-token.
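Two of the bcrypt rules above can be sketched without hashing anything. The first reads the cost factor from a modular-crypt bcrypt hash ($2a$/$2b$/$2y$, two-digit cost, 53-character salt+digest) and applies the --auth-bcrypt-max-verify-cost ceiling. The second performs the 72-byte truncation that the legacy path retries with. The helper names are hypothetical, and this is a minimal sketch rather than the server's verifier.

```python
import re

# $2b$<cost>$<22-char salt><31-char digest>, bcrypt's own base64 alphabet.
BCRYPT_RE = re.compile(r"^\$2[aby]\$(\d{2})\$[./A-Za-z0-9]{53}$")


def bcrypt_cost(stored_hash: str) -> int:
    """Extract the cost factor from a stored bcrypt hash."""
    m = BCRYPT_RE.match(stored_hash)
    if not m:
        raise ValueError("not a bcrypt hash")
    return int(m.group(1))


def verify_allowed(stored_hash: str, max_verify_cost: int = 16) -> bool:
    """Refuse to verify hashes above the configured cost ceiling (DoS guard)."""
    if not 4 <= max_verify_cost <= 31:
        raise ValueError("max verify cost must be in [4, 31]")
    return bcrypt_cost(stored_hash) <= max_verify_cost


def legacy_truncate(password: str) -> bytes:
    """bcrypt only consumes the first 72 bytes of the UTF-8 encoded password."""
    return password.encode("utf-8")[:72]
```

A migrated cost=17 admin hash fails the default ceiling of 16, which is why the flag must be raised temporarily during the migration.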

Logging & Diagnostics

xrayGraphDB also inherits the standard glog flags. The most important is --also-log-to-stderr=true (used by the prod systemd unit), which mirrors the daemon log to stderr so that journalctl -u xraygraphdb shows the full output; --alsologtostderr is the separator-free alias.

Flag | Type | Default | Description
--log-level [prod] | string | "WARNING" | Minimum log level: TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL.
--log-file | string | "" | Path to the daemon log file. Empty defers to journald / stderr.
--log-format | string | "text" | Log format: text (human-readable) or json (one JSON object per line for log aggregators).
--log-retention-days | uint64 | 35 | Days a daily log file is preserved before rotation deletes it.
--logger-type | string | "sync" | Synchronous or asynchronous logger: sync, async.
--core-dump-directory | string | "/var/lib/xraygraphdb/cores" | Directory for core-dump files (mode 0700). Empty disables core-dump management.
--debug-query-plans | bool | false | DEBUG-log every candidate query plan considered by the planner.
--query-log-directory | string | "" | Directory where query logs are stored. Empty disables.
--nuraft-log-file | string | "" | File where NuRaft (cluster Raft) logs are written.
--shutdown-watchdog-seconds | int32 | 600 | Deadline for graceful shutdown. If exceeded, the daemon SIGABRTs and writes a per-thread diagnostic dump under <data-directory>/shutdown-diagnostics/. Must be less than systemd TimeoutStopSec.
--memory-warning-threshold | uint64 | 1024 | Free-RAM threshold (MiB) below which the daemon emits a warning. 0 disables.
--telemetry-enabled | bool | false | Enable telemetry reporting (CPU / memory / vertex-edge counts).
--support-email | string | "sme@emtailabs.com" | Support contact email for licensing, enterprise features, and production support.
--timezone | string | "UTC" | Instance timezone in IANA format.
--help | bool | false | Print help on every flag and exit.
--help-xml | bool | false | Print help in XML form and exit.
--version | bool | false | Print version and build info, then exit.

Audit Logging

Flag | Type | Default | Description
--audit-enabled | bool | false | Enable audit logging (requires enterprise license).
--audit-buffer-size | int32 | 100000 | Maximum entries held in the audit-log ring buffer.
--audit-buffer-flush-interval-ms | int32 | 200 | Audit buffer flush interval in milliseconds.
--audit-overflow-policy | string | "drop" | Behavior when the audit buffer is full: drop (warn), block (compliance mode), fail (fail the query).
--audit-retention-days | int32 | 90 | Days to retain rotated audit log files before automatic deletion. 0 disables auto-cleanup.
--audit-rotation-hour-utc | int32 | 0 | Hour of day (UTC, 0-23) at which the audit log rotates.
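The three --audit-overflow-policy behaviours can be modelled on a bounded queue. The sketch below is minimal and makes three assumptions: drop discards the new entry and counts it (a real implementation would also log a warning), fail raises so the caller can fail the query, and block waits for the flusher to free space. `AuditBuffer` and `AuditOverflow` are hypothetical names. The optional `block_timeout` exists only so the sketch can be exercised without a flusher thread.

```python
import queue


class AuditOverflow(Exception):
    """Raised under the 'fail' policy; the caller fails the query."""


class AuditBuffer:
    def __init__(self, size=100000, policy="drop"):
        if policy not in ("drop", "block", "fail"):
            raise ValueError(f"unknown overflow policy: {policy}")
        self.q = queue.Queue(maxsize=size)
        self.policy = policy
        self.dropped = 0

    def record(self, entry, block_timeout=None):
        if self.policy == "block":
            # Compliance mode: wait for the flusher to free space.
            self.q.put(entry, timeout=block_timeout)  # queue.Full on timeout
            return True
        try:
            self.q.put_nowait(entry)
            return True
        except queue.Full:
            if self.policy == "fail":
                raise AuditOverflow("audit buffer full") from None
            self.dropped += 1  # 'drop': keep serving queries, count the loss
            return False

    def flush(self):
        """Drain everything currently buffered (the writer side)."""
        out = []
        while True:
            try:
                out.append(self.q.get_nowait())
            except queue.Empty:
                return out
```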

Replication & Coordination

Flag | Type | Default | Description
--replication-mode | string | "standalone" | Replication role: standalone, primary, replica.
--replication-bind-address | string | "0.0.0.0" | Bind address for the replication server started after a MAIN-to-REPLICA demotion.
--replication-port | int32 | 7690 | Port for incoming replication connections (replica mode).
--replication-primary | string | "" | Primary server address (host:port) when running as a replica.
--replication-replicas | string | "" | Comma-separated replica addresses when running as primary.
--replication-replica-check-frequency-sec | uint64 | 1 | Interval in seconds between replica health-check pings. 0 disables.
--replication-restore-state-on-startup | bool | true | Re-attach previously registered replicas at startup.
--accept-wipe | bool | false | Acknowledge that joining a replica wipes local data. Required when starting in replica mode against a non-empty data directory.
--coordinator-id | int32 | 0 | Unique ID of the Raft server.
--coordinator-port | int32 | 0 | Port the Raft coordinator listens on.
--coordinator-hostname | string | "" | Hostname returned in SHOW INSTANCES.
--management-port | int32 | 0 | Port the coordinator management server listens on.
--instance-down-timeout-sec | uint32 | 15 | Seconds after which an instance is considered down by the coordinator.
--instance-health-check-frequency-sec | uint32 | 5 | Seconds between coordinator health-check pings to instances.

Plugin / Module System

Flag | Type | Default | Description
--query-modules-directory | string | "" | Directory (or comma-separated directories) from which custom query modules are loaded.
--query-callable-mappings-path | string | "" | Path to a JSON file of alias-to-procedure mappings, used to map missing procedure names onto existing procedures.
--kafka-bootstrap-servers | string | "" | Default Kafka broker list (comma-separated host:port) for stream sources.
--kms-provider | string | "local" | KMS provider for per-tenant encryption: local or aws. Requires --encryption-mode.
--stream-transaction-conflict-retries | uint32 | 30 | Number of times a stream transformation retries on a transaction conflict before giving up.
--stream-transaction-retry-interval | uint32 | 500 | Interval in milliseconds between stream transformation retries after a transaction conflict.

Query & Planner

Flag | Type | Default | Description
--query-execution-timeout-sec [prod] | double | 600 | Maximum query execution time in seconds. 0 disables the limit.
--query-cost-planner | bool | true | Use the cost-estimating query planner.
--query-max-plans | uint64 | 1000 | Maximum number of candidate plans the planner considers per query.
--query-plan-cache-max-size | int32 | 1000 | Maximum number of compiled query plans cached.
--cartesian-product-enabled | bool | true | Allow cartesian product expansion in the planner.
--isolation-level | string | "SNAPSHOT_ISOLATION" | Default transaction isolation: READ_COMMITTED, READ_UNCOMMITTED, or SNAPSHOT_ISOLATION.
--hops-limit-partial-results | bool | true | Return partial results when the hops limit is reached.
--enable-index-only-scans | bool | true | Read property values directly from the index, without touching the vertex store, when possible.
--enable-simd-filter | bool | true | Enable AVX2-accelerated property filtering, with a scalar fallback.
--simd-batch-size | int64 | 32 | Rows batched per SIMD filter operation. Must be at least 4 (AVX2 width).
--parallel-scan-threshold | int64 | 0 | Vertex-count threshold for auto-selecting parallel scan. 0 disables; -1 forces parallel scan everywhere.
--disable-gpu | bool | false | Disable GPU acceleration even if a CUDA device is detected. Analytics fall back to CPU.
--strict-unbound-identifier | bool | false | Throw QueryRuntimeException on an unmapped Identifier::symbol_pos_ instead of returning NULL.
--memory-limit | uint64 | 0 | Total memory limit in MiB. 0 = auto (100% of physical RAM if swap is enabled, 90% otherwise). Upper bound is roughly 1 PiB.
--experimental-config | string | "" | Experimental features configuration (JSON object).
--experimental-enabled | string | "" | Comma-separated experimental features to enable (e.g. planner-v2).
--v4-compiled-engine | bool | false | Enable the v4 JIT-compiled query engine for supported read-only queries.
--v4-engine-mode | string | "auto" | v4 routing mode: off, auto (route supported queries), forced (route everything).
--v4-shadow-compare | bool | false | Run the compiled engine in shadow mode alongside Volcano and log mismatches.
--cypher-variable-expand-allow-bfs-dedup | bool | false | Route Cypher MATCH (a)-[*lo..hi]->(b) variable-length expansions through the BFS-with-bitset-dedup operator instead of the Volcano DFS path. Deliberate semantic change: BFS dedup emits one row per reachable destination vertex (regardless of how many distinct walks lead to it), and the bound edge-list symbol is empty or lossy. Use only for reachability-style queries (e.g. RETURN DISTINCT b.id). On hub-heavy graphs (LDBC SF1 KNOWS, 1,532-degree Person vertices) the Volcano path explodes combinatorially at depth 4+; this flag is the operator-controlled escape hatch. Default false preserves Cypher's path-enumeration contract.
--xray-analytics-edges-per-thread | int64 | 50000000 | CSR analytics thread heuristic: one thread per N edges. Lower this value on higher-bandwidth hardware to use more threads.
--xray-analytics-max-threads | int32 | 32 | Absolute thread ceiling for CSR analytics. 0 = no cap.
--xray-analytics-min-threads | int32 | 4 | Floor on analytics threads for small graphs.
--xray-clustering-coefficient-sample-cap | int64 | 200 | Maximum neighbors sampled per vertex during clustering-coefficient computation before falling back to Fisher-Yates with bias correction. 0 disables sampling.
--xray-graph-stats-cache-ttl-sec | int32 | 300 | TTL in seconds for the graph-statistics summary result cache. Successive calls within this window return the cached result without recomputation. The cache key is (tenant id, label filter, vertex count, edge count), so writes that change either total invalidate the entry naturally. 0 disables the cache (recompute on every call). Each call emits three additional metrics (cache_hit, cache_age_ms, compute_time_ms) alongside the existing time_ms, so monitoring can distinguish a hit from a miss without query-side timing.
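The three --xray-analytics-* thread flags interact roughly as modeled below. This is a hypothetical sketch, not the engine's code. It assumes the edge-based estimate is rounded up and then clamped between the floor and the ceiling. It also ignores any limit the engine may apply based on the host's core count. The function name analytics_thread_count is illustrative.

```python
import math


def analytics_thread_count(edge_count: int,
                           edges_per_thread: int = 50_000_000,
                           min_threads: int = 4,
                           max_threads: int = 32) -> int:
    """Hypothetical model of the CSR analytics thread heuristic.

    Assumption: one thread per `edges_per_thread` edges (rounded up), raised to
    `min_threads`, then capped at `max_threads` unless max_threads == 0 (no cap).
    Defaults mirror the documented flag defaults.
    """
    if edges_per_thread <= 0:
        raise ValueError("edges_per_thread must be positive")
    threads = math.ceil(edge_count / edges_per_thread)  # lower N -> more threads
    threads = max(threads, min_threads)                 # floor for small graphs
    if max_threads > 0:
        threads = min(threads, max_threads)             # absolute ceiling
    return threads


# A 500M-edge graph: 500M / 50M = 10 threads, inside the [4, 32] window.
print(analytics_thread_count(500_000_000))  # 10
```

Under this model, halving --xray-analytics-edges-per-thread doubles the estimate before clamping. That matches the flag description's advice to lower the value on higher-bandwidth hardware.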

AWS Integration

Flag | Type | Default | Description
--aws-access-key | string | "" | AWS access key for the AWS integration.
--aws-secret-key | string | "" | AWS secret key for the AWS integration.
--aws-region | string | "" | AWS region for the AWS integration.
--aws-endpoint-url | string | "" | Override the AWS endpoint URL (custom regions, S3-compatible stores).

SWIM (FAA Flight Data Ingestion)

Flag | Type | Default | Description
--swim-enabled [prod] | bool | false | Enable the native FAA SWIM data consumer (Solace PubSub+). Requires --swim-user and --swim-pass.
--swim-user [prod] | string | "" | SWIM SCDS username (e.g. sme.emtailabs.com).
--swim-pass [prod] | string | "" | SWIM SCDS password. Prefix with @ to read the password from a file; inline values are exposed through the process's /proc/<pid>/cmdline.
--swim-queues [prod] | string | "" | Comma-separated queue specs NAME:BROKER_URL:VPN:QUEUE_NAME,….
--swim-database | string | "" | Database that SWIM ingestion writes Aircraft / flight data into. Empty targets a database named swim.
--swim-ttl-seconds | uint64 | 7200 | TTL in seconds for SWIM aircraft nodes. Nodes that receive no updates within this window are deleted.
--swim-cleanup-interval-seconds | uint64 | 300 | Interval in seconds between runs of the SWIM TTL cleanup thread.

Operational

Flag | Type | Default | Description
--daemonize | bool | false | Run as a Unix daemon (double-fork, detach from the terminal).
--flag-file | string | "/etc/xraygraphdb/xraygraphdb.conf" | Load flags from a file (one flag per line, gflags syntax).
--allow-load-csv | bool | false | Allow the LOAD CSV Cypher clause. Off by default for security.
--load-csv-allowed-paths | string | "" | Comma-separated directory prefixes LOAD CSV may read from. Empty rejects all local paths.
--file-download-conn-timeout-sec | uint64 | 10 | Timeout in seconds for establishing a connection to a remote server when downloading a file.
--force-recovery-past-corruption | bool | false | Deprecated. WAL recovery always skips past corrupt entries; transactions between a corrupt entry and the next valid entry are silently lost.

First Connection: Python

xrayGraphDB is compatible with the official Neo4j Python driver. Install it with pip and connect over the Bolt protocol.

Shell
pip install neo4j
Python
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("admin", "<your-password>")
)

with driver.session() as session:
    # Create a node
    session.run(
        "CREATE (n:Person {name: $name, age: $age})",
        name="Alice", age=30
    )

    # Read it back
    result = session.run(
        "MATCH (n:Person {name: $name}) RETURN n.name, n.age",
        name="Alice"
    )
    record = result.single()
    print(record["n.name"], record["n.age"])
    # Output: Alice 30

driver.close()

First Connection: JavaScript

Shell
npm install neo4j-driver
JavaScript
const neo4j = require('neo4j-driver');

// CommonJS modules do not allow top-level await, so wrap the work in an async function
async function main() {
  const driver = neo4j.driver(
    'bolt://localhost:7687',
    neo4j.auth.basic('admin', '<your-password>')
  );
  const session = driver.session();

  try {
    // Create a node
    await session.run(
      'CREATE (n:Person {name: $name, age: $age})',
      { name: 'Bob', age: 25 }
    );

    // Read it back
    const result = await session.run(
      'MATCH (n:Person {name: $name}) RETURN n',
      { name: 'Bob' }
    );

    console.log(result.records[0].get('n').properties);
  } finally {
    await session.close();
    await driver.close();
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

First Connection: Java

Add the Neo4j Java driver to your Maven or Gradle project.

XML
<!-- Maven dependency -->
<dependency>
  <groupId>org.neo4j.driver</groupId>
  <artifactId>neo4j-java-driver</artifactId>
  <version>5.x</version> <!-- replace 5.x with a specific 5.x release; Maven does not resolve wildcard versions -->
</dependency>
Java
import org.neo4j.driver.*;

public class XRayExample {
    public static void main(String[] args) {
        // Driver and Session are AutoCloseable, so both are closed even if a query throws
        try (var driver = GraphDatabase.driver(
                 "bolt://localhost:7687",
                 AuthTokens.basic("admin", "<your-password>"));
             var session = driver.session()) {

            session.run(
                "CREATE (n:Person {name: $name})",
                Values.parameters("name", "Carol")
            );

            var result = session.run(
                "MATCH (n:Person) RETURN n.name"
            );

            while (result.hasNext()) {
                System.out.println(result.next().get("n.name").asString());
            }
        }
    }
}

First Connection: Go

Shell
go get github.com/neo4j/neo4j-go-driver/v5
Go
package main

import (
    "context"
    "fmt"
    "github.com/neo4j/neo4j-go-driver/v5/neo4j"
)

func main() {
    ctx := context.Background()

    driver, err := neo4j.NewDriverWithContext(
        "bolt://localhost:7687",
        neo4j.BasicAuth("admin", "<your-password>", ""),
    )
    if err != nil { panic(err) }
    defer driver.Close(ctx)

    session := driver.NewSession(ctx, neo4j.SessionConfig{})
    defer session.Close(ctx)

    _, err = session.Run(ctx,
        "CREATE (n:Person {name: $name})",
        map[string]any{"name": "Dave"},
    )
    if err != nil { panic(err) }

    result, err := session.Run(ctx,
        "MATCH (n:Person) RETURN n.name", nil,
    )
    if err != nil { panic(err) }

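    for result.Next(ctx) {
        fmt.Println(result.Record().Values[0])
    }
    // Next returns false on errors as well as at the end of the stream
    if err := result.Err(); err != nil { panic(err) }
}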

First Connection: .NET

Shell
dotnet add package Neo4j.Driver
C#
using Neo4j.Driver;

await using var driver = GraphDatabase.Driver(
    "bolt://localhost:7687",
    AuthTokens.Basic("admin", "<your-password>")
);

// Declared after the driver, so it is disposed before the driver when the program ends
await using var session = driver.AsyncSession();

await session.RunAsync(
    "CREATE (n:Person {name: $name})",
    new { name = "Eve" }
);

var result = await session.RunAsync(
    "MATCH (n:Person) RETURN n.name"
);

var records = await result.ToListAsync();
foreach (var record in records)
{
    Console.WriteLine(record["n.name"].As<string>());
}

Quick Start Tutorial

This tutorial walks through creating a small graph, querying it, and cleaning up. It assumes you have a running xrayGraphDB instance and a client that can send Cypher to it, such as the Neo4j Python driver from First Connection: Python.

Cypher
// Step 1: Create some nodes
CREATE (alice:Person {name: "Alice", age: 30})
CREATE (bob:Person {name: "Bob", age: 25})
CREATE (carol:Person {name: "Carol", age: 35})
CREATE (proj:Project {name: "xrayGraphDB"})
RETURN alice, bob, carol, proj;

// Step 2: Create relationships
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS]->(b)
RETURN a, b;

MATCH (a:Person {name: "Alice"}), (p:Project {name: "xrayGraphDB"})
CREATE (a)-[:WORKS_ON {role: "lead"}]->(p)
RETURN a, p;

MATCH (b:Person {name: "Bob"}), (p:Project {name: "xrayGraphDB"})
CREATE (b)-[:WORKS_ON {role: "contributor"}]->(p)
RETURN b, p;

// Step 3: Query the graph
MATCH (p:Person)-[:WORKS_ON]->(proj:Project)
RETURN p.name, proj.name;

// Step 4: Update properties
MATCH (a:Person {name: "Alice"})
SET a.email = "alice@example.com"
RETURN a;

// Step 5: Clean up (removes only the Person and Project nodes, not the whole database)
MATCH (n) WHERE n:Person OR n:Project DETACH DELETE n;
Tip: Use parameterized queries in production code. Inline values in Cypher strings are shown here for readability but should be replaced with parameters ($name, $age, etc.) to prevent injection and improve plan cache hit rates.

MATCH

The MATCH clause is the primary read operation. It describes a pattern to find in the graph and binds matching subgraphs to variables.

Cypher
// Match all nodes with a specific label
MATCH (n:Person)
RETURN n;

// Match a relationship pattern
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN a.name, b.name;

// Match with relationship variable
MATCH (a:Person)-[r:WORKS_ON]->(p:Project)
RETURN a.name, r.role, p.name;

// Match any direction
MATCH (a:Person)-[:KNOWS]-(b:Person)
RETURN a.name, b.name;

// Match with multiple labels
MATCH (n:Person:Employee)
RETURN n;

Patterns can include any combination of nodes, relationships, and directions. Nodes are enclosed in parentheses (), relationships in square brackets [], and direction is indicated by arrows -> or <-.

WHERE

The WHERE clause filters results from MATCH patterns. It supports comparison operators, boolean logic, string matching, list predicates, and null checks.

Cypher
// Comparison operators
MATCH (n:Person)
WHERE n.age > 25 AND n.age <= 40
RETURN n.name, n.age;

// String matching
MATCH (n:Person)
WHERE n.name STARTS WITH "A"
RETURN n;

// Regular expression
MATCH (n:Person)
WHERE n.email =~ ".*@example\\.com"
RETURN n;

// Null checks
MATCH (n:Person)
WHERE n.email IS NOT NULL
RETURN n;

// IN list
MATCH (n:Person)
WHERE n.name IN ["Alice", "Bob", "Carol"]
RETURN n;

// Pattern predicates (exists)
MATCH (n:Person)
WHERE (n)-[:WORKS_ON]->()
RETURN n.name;
Operator | Description | Example
= | Equal | n.age = 30
<> | Not equal | n.name <> "Alice"
<, >, <=, >= | Comparison | n.age >= 18
AND, OR, NOT | Boolean logic | n.age > 20 AND n.active = true
IN | List membership | n.status IN ["active", "pending"]
STARTS WITH | String prefix | n.name STARTS WITH "Al"
ENDS WITH | String suffix | n.name ENDS WITH "ice"
CONTAINS | String contains | n.name CONTAINS "li"
=~ | Regex match | n.email =~ ".*@example\\.com"
IS NULL | Null check | n.deleted IS NULL
IS NOT NULL | Not null | n.email IS NOT NULL

RETURN

RETURN specifies which values to include in the result set. You can return nodes, relationships, properties, expressions, or aggregations.

Cypher
// Return specific properties
MATCH (n:Person)
RETURN n.name, n.age;

// Alias with AS
MATCH (n:Person)
RETURN n.name AS person_name, n.age AS years;

// Return all properties as a map
MATCH (n:Person)
RETURN properties(n);

// Return distinct values
MATCH (n:Person)-[:WORKS_ON]->(p:Project)
RETURN DISTINCT p.name;

// Expressions in RETURN
MATCH (n:Person)
RETURN n.name, n.age * 12 AS age_in_months;

ORDER BY / LIMIT / SKIP

Control the ordering and pagination of results.

Cypher
// Order by a property
MATCH (n:Person)
RETURN n.name, n.age
ORDER BY n.age DESC;

// Limit results
MATCH (n:Person)
RETURN n.name
ORDER BY n.name
LIMIT 10;

// Pagination with SKIP and LIMIT
MATCH (n:Person)
RETURN n.name
ORDER BY n.name
SKIP 20
LIMIT 10;

// Multiple sort keys
MATCH (n:Person)
RETURN n.name, n.age
ORDER BY n.age DESC, n.name ASC;

WITH

WITH acts as a pipeline separator, allowing you to chain query stages together. Variables not listed in WITH are not available in subsequent clauses.

Cypher
// Filter intermediate results
MATCH (p:Person)-[:WORKS_ON]->(proj:Project)
WITH proj, count(p) AS team_size
WHERE team_size > 3
RETURN proj.name, team_size
ORDER BY team_size DESC;

// Chain queries
MATCH (n:Person)
WITH n
ORDER BY n.age DESC
LIMIT 5
MATCH (n)-[:KNOWS]->(friend)
RETURN n.name, collect(friend.name) AS friends;

UNWIND

UNWIND expands a list into individual rows. Useful for bulk operations and working with list parameters.

Cypher
// Expand a list
UNWIND [1, 2, 3] AS x
RETURN x;

// Bulk create from parameters
UNWIND $people AS person
CREATE (n:Person {name: person.name, age: person.age});

// Combine with MATCH
UNWIND ["Alice", "Bob"] AS name
MATCH (n:Person {name: name})
RETURN n;

OPTIONAL MATCH

OPTIONAL MATCH works like MATCH but returns null for missing parts of the pattern instead of excluding the row entirely. Equivalent to a left outer join.

Cypher
// Return all people, even those without projects
MATCH (p:Person)
OPTIONAL MATCH (p)-[:WORKS_ON]->(proj:Project)
RETURN p.name, proj.name;

CREATE

CREATE adds new nodes and relationships to the graph. It always creates new elements (use MERGE to avoid duplicates).

Cypher
// Create a single node
CREATE (n:Person {name: "Frank", age: 28})
RETURN n;

// Create multiple nodes
CREATE (a:Person {name: "Grace"}),
       (b:Person {name: "Hank"});

// Create a node with multiple labels
CREATE (n:Person:Developer {name: "Ivy"});

// Create a relationship between existing nodes
MATCH (a:Person {name: "Grace"}), (b:Person {name: "Hank"})
CREATE (a)-[:KNOWS {since: 2024}]->(b)
RETURN a, b;

// Create a full path in one statement
CREATE (a:Module {name: "auth"})-[:IMPORTS]->(b:Module {name: "crypto"})
RETURN a, b;

MERGE

MERGE ensures a pattern exists in the graph. If the pattern is found, it is bound. If not found, it is created. Use ON CREATE SET and ON MATCH SET to conditionally set properties.

Cypher
// Merge a node (create if not exists)
MERGE (n:Person {name: "Alice"})
ON CREATE SET n.created = timestamp()
ON MATCH SET n.lastSeen = timestamp()
RETURN n;

// Merge a relationship
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
MERGE (a)-[r:KNOWS]->(b)
ON CREATE SET r.since = 2024
RETURN r;
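Note: MERGE matches the entire pattern as a unit. If a and b are already bound by a preceding MATCH or MERGE, MERGE (a)-[:KNOWS]->(b) creates only the missing relationship. If the endpoints are not bound, MERGE creates the whole pattern, including new nodes, whenever the full pattern is not found. This can produce duplicate nodes, so MATCH or MERGE the endpoints first.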

SET

SET updates properties on nodes and relationships, or adds labels to nodes.

Cypher
// Set a property
MATCH (n:Person {name: "Alice"})
SET n.age = 31
RETURN n;

// Set multiple properties
MATCH (n:Person {name: "Alice"})
SET n.age = 31, n.email = "alice@example.com"
RETURN n;

// Replace all properties with a map
MATCH (n:Person {name: "Alice"})
SET n = {name: "Alice", age: 31, active: true}
RETURN n;

// Merge properties (add without removing existing)
MATCH (n:Person {name: "Alice"})
SET n += {department: "engineering"}
RETURN n;

// Add a label
MATCH (n:Person {name: "Alice"})
SET n:Employee
RETURN n;

REMOVE

REMOVE deletes properties from nodes/relationships and removes labels from nodes.

Cypher
// Remove a property
MATCH (n:Person {name: "Alice"})
REMOVE n.email
RETURN n;

// Remove a label
MATCH (n:Person:Employee {name: "Alice"})
REMOVE n:Employee
RETURN labels(n);

DELETE / DETACH DELETE

DELETE removes nodes and relationships. A node cannot be deleted if it still has relationships. Use DETACH DELETE to delete a node and all its relationships in one operation.

Cypher
// Delete a relationship
MATCH (a:Person)-[r:KNOWS]->(b:Person)
WHERE a.name = "Alice" AND b.name = "Bob"
DELETE r;

// Delete a node (must have no relationships)
MATCH (n:Person {name: "Frank"})
DELETE n;

// Detach delete (node + all relationships)
MATCH (n:Person {name: "Alice"})
DETACH DELETE n;

// Delete all nodes and relationships in the database
MATCH (n) DETACH DELETE n;
Warning: MATCH (n) DETACH DELETE n removes the entire graph. There is no undo. Make a snapshot before running destructive queries on production data.

Variable-length Paths

Variable-length path patterns match paths of varying depth using the * syntax inside relationship brackets.

Cypher
// Paths of exactly 2 hops
MATCH (a:Person)-[:KNOWS*2]->(c:Person)
RETURN a.name, c.name;

// Paths of 1 to 5 hops
MATCH (a:Person)-[:KNOWS*1..5]->(c:Person)
RETURN a.name, c.name;

// Paths of any length (use with caution)
MATCH (a:Person {name: "Alice"})-[:KNOWS*]->(c:Person)
RETURN DISTINCT c.name;

// Capture the path
MATCH path = (a:Person {name: "Alice"})-[:KNOWS*1..3]->(c:Person)
RETURN path, length(path) AS hops;
Note: Unbounded variable-length paths (* without limits) can be expensive on large graphs. Always set an upper bound when possible.

shortestPath / allShortestPaths

Find the shortest path(s) between two nodes.

Cypher
// Find one shortest path
MATCH (a:Person {name: "Alice"}),
      (b:Person {name: "Eve"})
MATCH p = shortestPath((a)-[*..10]-(b))
RETURN p, length(p) AS hops;

// Find all shortest paths (same length)
MATCH (a:Person {name: "Alice"}),
      (b:Person {name: "Eve"})
MATCH p = allShortestPaths((a)-[*..10]-(b))
RETURN p;

// With relationship type filter
MATCH (a:Person {name: "Alice"}),
      (b:Person {name: "Eve"})
MATCH p = shortestPath((a)-[:KNOWS|WORKS_WITH*..10]-(b))
RETURN p;

BFS Traversal

Breadth-first search traversal is available for exploring graphs level by level. BFS guarantees that nodes are visited in order of increasing distance from the start node.

Cypher
// BFS from a single start node
MATCH (start:Person {name: "Alice"})
MATCH path = (start)-[:KNOWS BFS]->(target)
RETURN target.name, length(path) AS distance
ORDER BY distance;

BFS traversal is the default algorithm used by shortestPath. Use the explicit BFS syntax when you need to enumerate all reachable nodes by distance layer.

Aggregation

Aggregation functions operate on groups of rows. Non-aggregated columns in RETURN act as implicit group keys (similar to SQL GROUP BY).

Cypher
// Count
MATCH (n:Person)
RETURN count(n) AS total_people;

// Group and count
MATCH (p:Person)-[:WORKS_ON]->(proj:Project)
RETURN proj.name, count(p) AS team_size
ORDER BY team_size DESC;

// Sum, average, min, max
MATCH (n:Person)
RETURN
  sum(n.age) AS total_age,
  avg(n.age) AS avg_age,
  min(n.age) AS youngest,
  max(n.age) AS oldest;

// Collect into a list
MATCH (p:Person)-[:WORKS_ON]->(proj:Project)
RETURN proj.name, collect(p.name) AS members;

// Standard deviation and percentile
MATCH (n:Person)
RETURN
  stDev(n.age) AS std_dev,
  percentileCont(n.age, 0.5) AS median;
Function | Description | Example
count(expr) | Number of non-null values | count(n)
sum(expr) | Sum of numeric values | sum(n.salary)
avg(expr) | Average of numeric values | avg(n.age)
min(expr) | Minimum value | min(n.created)
max(expr) | Maximum value | max(n.score)
collect(expr) | Collect values into a list | collect(n.name)
percentileCont(expr, p) | Continuous percentile (interpolated) | percentileCont(n.age, 0.5)
percentileDisc(expr, p) | Discrete percentile (nearest value) | percentileDisc(n.age, 0.9)
stDev(expr) | Standard deviation (sample) | stDev(n.score)
stDevP(expr) | Standard deviation (population) | stDevP(n.score)

Indexing & Constraints

Indexes accelerate lookups by property value. Constraints enforce data integrity rules.

Cypher
// Create a label-property index
CREATE INDEX ON :Person(name);

// Neo4j-compatible named index syntax
CREATE INDEX person_name_idx
FOR (n:Person)
ON (n.name);

// Composite index
CREATE INDEX ON :Person(name, age);

// Drop an index
DROP INDEX ON :Person(name);

// Unique constraint
CREATE CONSTRAINT ON (n:Person) ASSERT n.email IS UNIQUE;

// Existence constraint
CREATE CONSTRAINT ON (n:Person) ASSERT EXISTS (n.name);

// Show index info
SHOW INDEX INFO;

// List all constraints
SHOW CONSTRAINT INFO;
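Tip: Create indexes on the properties you filter on in WHERE clauses and MATCH patterns. Without an index, the engine must scan every node with the given label. Each index adds some write and memory overhead, so index the properties your queries actually use.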

Transactions

xrayGraphDB supports both auto-commit transactions (single query) and explicit transactions (multi-query).

Auto-commit Transactions

Every query sent via session.run() runs in its own auto-commit transaction. If the query succeeds, it is committed. If it fails, it is rolled back.

Explicit Transactions

Use explicit transactions when you need to execute multiple queries atomically.

Python
with driver.session() as session:
    tx = session.begin_transaction()
    try:
        tx.run("CREATE (a:Account {id: $id, balance: $bal})",
               id="A001", bal=1000)
        tx.run("CREATE (a:Account {id: $id, balance: $bal})",
               id="A002", bal=500)
        tx.commit()
    except Exception:
        tx.rollback()
        raise
Cypher
// Explicit transaction commands (Bolt protocol)
BEGIN;
CREATE (n:Temp {data: "test"});
COMMIT;

// Or rollback
BEGIN;
CREATE (n:Temp {data: "test"});
ROLLBACK;
468 Built-in Functions and Procedures. xrayGraphDB ships with 442 scalar / aggregate functions, 14 procedures, 4 GFQL operators, and 8 reference info-pages across 23 categories — all queryable at runtime via CALL xg.builtin_functions(). The XRay-Vision plugin adds 68 more (queryable via CALL xg.xray_vision_builtin_functions()). Every entry below is available in the Community edition at no cost.

Aggregation Functions (7)

Aggregation functions summarize a collection of values into a single result: a number, or a list for histogram and the top-K functions. Most work with lists pre-collected via WITH ... collect(). The topK aggregation also supports streaming evaluation.

group_concat

group_concat(list, delimiter?) -> string

Concatenates list elements into a string with optional delimiter. If no delimiter is provided, elements are concatenated without separation. Null values are skipped.

Cypher
MATCH (n:Person)
WITH collect(n.name) AS names
RETURN group_concat(names, ", ") AS full_list

Time Complexity: O(N) where N is the list length.

See Also: collect(), split()
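The null-skipping and delimiter rules above can be pinned down with a small Python reference model. It assumes non-string elements are rendered with Python's str(). The engine's own number and boolean formatting may differ.

```python
from typing import Any, Optional, Sequence


def group_concat(values: Sequence[Any], delimiter: Optional[str] = None) -> str:
    """Reference model: drop nulls (None), then join with the delimiter or with nothing."""
    sep = "" if delimiter is None else delimiter
    # Assumption: str() stands in for the engine's value-to-string conversion.
    return sep.join(str(v) for v in values if v is not None)


print(group_concat(["Alice", None, "Bob"], ", "))  # Alice, Bob
print(group_concat(["a", "b", "c"]))               # abc
```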

histogram

histogram(values, bins) -> list

Produces a histogram of values divided into the given number of bins. Returns a list of bin counts. The value range is automatically calculated from min/max of the input.

Cypher
MATCH (flight:Flight)
WITH collect(flight.altitude_ft) AS altitudes
RETURN histogram(altitudes, 10) AS altitude_distribution

Parameters: values must be numeric (integer or float).

Output: List contains one count per bin. Bins are equal-width.
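The equal-width binning can be modeled in Python as below. Two edge cases are assumptions, not documented behavior:

- The maximum value is counted in the last bin; otherwise it would fall one bin past the end.
- When every value is identical, all of them land in the first bin.

An empty input returns all-zero counts here; the engine may return null instead.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], bins: int) -> List[int]:
    """Equal-width histogram of `values` over [min, max], one count per bin."""
    if bins <= 0:
        raise ValueError("bins must be positive")
    counts = [0] * bins
    if not values:
        return counts  # assumption: empty input yields all-zero counts
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    for v in values:
        if width == 0:
            idx = 0  # assumption: identical values all go to the first bin
        else:
            # The maximum maps to index == bins, so clamp it into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


print(histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))  # [2, 2, 2, 2, 3]
```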

percentilecont

percentilecont(list, percentile) -> float

Returns the linearly interpolated percentile value (continuous). For p=0.5, returns the median with interpolation. The input list does not need to be sorted; it is sorted internally.

Cypher
MATCH (a:Aircraft)
WITH collect(a.speed_kt) AS speeds
RETURN percentilecont(speeds, 0.95) AS speed_95th_percentile

Parameters: percentile must be between 0.0 and 1.0.

Interpolation: Uses linear interpolation between values.

See Also: percentiledisc(), quantile()
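A minimal sketch of the continuous percentile follows. It assumes the widely used (n - 1) * p rank definition, which is NumPy's default "linear" method. The text above only says the interpolation is linear, so the exact rank convention in xrayGraphDB is an assumption here.

```python
import math
from typing import Sequence


def percentile_cont(values: Sequence[float], p: float) -> float:
    """Linearly interpolated percentile using the (n - 1) * p fractional rank."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentile must be between 0.0 and 1.0")
    if not values:
        raise ValueError("empty input")
    xs = sorted(values)            # input need not be pre-sorted
    pos = (len(xs) - 1) * p        # fractional 0-based rank
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    # Interpolate between the two neighbouring order statistics.
    return xs[lower] + (xs[upper] - xs[lower]) * frac


print(percentile_cont([4, 1, 3, 2], 0.5))  # 2.5
```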

percentiledisc

percentiledisc(list, percentile) -> number

Returns the nearest-value percentile (discrete). Unlike percentilecont, this returns an actual value from the list without interpolation. For p=0.5, returns the median (middle value).

Cypher
MATCH (s:Sensor)
WITH collect(s.temperature) AS temps
RETURN percentiledisc(temps, 0.5) AS median_temp,
       percentiledisc(temps, 0.75) AS q3_temp

Parameters: percentile must be between 0.0 and 1.0.

Return Type: Always an actual value from the input list.

quantile

quantile(list, q) -> number

Returns the q-th quantile value from a sorted list. Quantile q is expressed as a value between 0 and 1. For q=0.5, returns the median. Uses the nearest-rank method.

Cypher
MATCH (metric:Metric)
WITH collect(metric.latency_ms) AS latencies
RETURN quantile(latencies, 0.99) AS p99_latency,
       quantile(latencies, 0.90) AS p90_latency

Parameters: q must be between 0.0 and 1.0.

Method: Uses nearest-rank (returns actual value, not interpolated).
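The nearest-rank method picks the element at 1-based rank ceil(q * n) of the sorted input, so the result is always a value that occurs in the list. A Python sketch follows. Two details are assumptions: q = 0 is clamped to the first element, and the input is sorted inside the function.

```python
import math
from typing import Sequence


def quantile_nearest_rank(values: Sequence[float], q: float) -> float:
    """Nearest-rank quantile: the element at 1-based rank ceil(q * n)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0.0 and 1.0")
    if not values:
        raise ValueError("empty input")
    xs = sorted(values)
    rank = max(1, math.ceil(q * len(xs)))  # clamp q = 0 to the first element
    return xs[rank - 1]


print(quantile_nearest_rank([3, 1, 2, 4], 0.75))  # 3
```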

topK

topK(list, k) -> list | topK(expr, k) as aggregation

Two modes: (1) As function: applies Space-Saving approximate top-K to a pre-collected list. O(N) time, O(K) memory. (2) As aggregation: streams values through Space-Saving during query execution. Equivalent to ClickHouse topK(K)(column).

Cypher (Function Mode)
MATCH (n:Transaction)
WITH collect(n.category) AS categories
RETURN topK(categories, 5) AS top_5_categories
Cypher (Aggregation Mode)
MATCH (n:Transaction)
RETURN topK(n.category, 5) AS most_common_categories

Algorithm: Space-Saving probabilistic algorithm. Results are approximate but ranked by frequency.

Time Complexity: O(N) function mode, O(1) per value in aggregation mode.

Space Complexity: O(K) regardless of input size.

Return Type: List of the K most frequent values, ordered by descending estimated frequency.
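Space-Saving keeps at most K counters. When an unseen value arrives and all counters are taken, the value with the smallest count is evicted. The newcomer inherits that count plus one, so counts can only be over-estimated.

The Python sketch below models function mode on a list. For readability it finds the minimum with a linear scan. The O(1)-per-value bound quoted above needs a dedicated structure, such as the stream-summary from the original Space-Saving paper. The function name is illustrative.

```python
from typing import Dict, Hashable, Iterable, List


def space_saving_top_k(stream: Iterable[Hashable], k: int) -> List[Hashable]:
    """Approximate top-k via Space-Saving: at most k counters, counts may be over-estimates."""
    if k <= 0:
        return []
    counts: Dict[Hashable, int] = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < k:
            counts[item] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count + 1.
            # O(k) scan here; a stream-summary structure makes this O(1).
            victim = min(counts, key=counts.__getitem__)
            floor = counts.pop(victim)
            counts[item] = floor + 1
    # Highest estimated count first.
    return sorted(counts, key=counts.__getitem__, reverse=True)


print(space_saving_top_k(list("caaaaabbb"), 2))  # ['a', 'b']
```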

topKExact

topKExact(list, k) -> list

Computes the exact K most frequent values by building a full frequency histogram and partially sorting it (partial_sort). Runs in O(N) + O(U log K), where U is the number of unique values. Use it when approximate top-K results are not acceptable.

Cypher
MATCH (event:Event)
WITH collect(event.event_type) AS event_types
RETURN topKExact(event_types, 10) AS exact_top_10_events

When to Use: When exact top-K rankings are required (e.g., compliance audits, financial reporting).

Time Complexity: O(N) to scan + O(U log K) to sort unique values.

Space Complexity: O(U) for the unique value histogram where U is unique count.

Return Type: List of K values sorted by descending frequency (exact).

See Also: topK() for a faster, approximate version
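In Python, the same exact approach maps onto collections.Counter for the O(N) histogram and heapq.nlargest for the O(U log K) selection, standing in for C++ partial_sort. How the engine breaks ties between equally frequent values is not documented. This sketch keeps first-seen order among ties.

```python
import heapq
from collections import Counter
from typing import Hashable, Iterable, List


def top_k_exact(values: Iterable[Hashable], k: int) -> List[Hashable]:
    """Exact top-k most frequent values, highest frequency first."""
    counts = Counter(values)  # O(N) exact histogram, insertion-ordered
    # O(U log K) selection; nlargest is stable, so ties keep first-seen order.
    best = heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
    return [item for item, _ in best]


print(top_k_exact(["x", "y", "x", "z", "x", "y"], 2))  # ['x', 'y']
```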

Aviation Functions (12)

Aviation functions provide computational support for aircraft tracking, performance analysis, and trajectory prediction. These are core to xrayGraphDB's SWIM (System Wide Information Management) integration and aviation-specific analytics.

angular_diff

angular_diff(angle_a, angle_b) -> float

Returns the shortest angular distance in degrees between two bearings, in the range 0-180. Handles wraparound at 360 degrees, so it always returns the smaller of the two angles between the bearings.

Cypher
MATCH (a:Aircraft)
WITH a.heading_deg AS actual_hdg, a.planned_hdg AS planned_hdg
RETURN angular_diff(actual_hdg, planned_hdg) AS heading_deviation

Range: Returns 0-180 degrees (always the shortest distance).

Use Case: Detect course deviations, verify navigation compliance.
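The wraparound handling reduces to a modulo and a reflection. A minimal Python sketch follows. It assumes inputs in degrees and accepts any real values, including negatives or values above 360.

```python
def angular_diff(angle_a: float, angle_b: float) -> float:
    """Smallest angle between two bearings, in degrees, in [0, 180]."""
    d = abs(angle_a - angle_b) % 360.0  # raw separation folded into [0, 360)
    return 360.0 - d if d > 180.0 else d  # take the shorter way around


print(angular_diff(350, 10))  # 20.0
```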

bank_angle

bank_angle(speed_kt, turn_rate_dps) -> float

Returns the estimated bank angle in degrees from speed and turn rate. Uses the standard coordinated-turn relation bank ≈ atan(turn_rate * speed / g), with turn rate in radians per second and speed in meters per second.

Cypher
MATCH (a:Aircraft)
RETURN bank_angle(a.ground_speed_kt, a.turn_rate_dps) AS estimated_bank_deg,
       a.altitude_ft,
       a.callsign

Parameters: speed_kt = ground speed in knots, turn_rate_dps = turn rate in degrees per second.

Physics: Derived from standard rate-of-turn formula.

Typical Values: 15-30 deg for normal turns, 40+ deg for aggressive maneuvers.
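Written out with explicit unit conversions, the relation tan(bank) = V * omega / g needs speed in m/s and turn rate in rad/s. The sketch below assumes standard gravity (9.80665 m/s²). It uses ground speed in place of true airspeed, as the function signature suggests; in wind, the two differ.

```python
import math

KT_TO_MPS = 1852.0 / 3600.0  # 1 knot = 1852 m per hour
G = 9.80665                  # standard gravity, m/s^2 (assumed constant)


def bank_angle(speed_kt: float, turn_rate_dps: float) -> float:
    """Coordinated-turn bank angle in degrees: tan(bank) = V * omega / g."""
    v = speed_kt * KT_TO_MPS               # ground speed standing in for airspeed
    omega = math.radians(turn_rate_dps)    # degrees/s -> rad/s
    return math.degrees(math.atan(v * omega / G))


# A standard-rate (3 deg/s) turn at 250 kt needs roughly 34-35 degrees of bank.
print(bank_angle(250, 3))
```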

deviation_score

deviation_score(predicted, actual, tolerance, sigmoid_centers?, sigmoid_steepness?) -> float

Returns an RMS or sigmoid-based deviation score between predicted and actual values. Scores range 0-1 where 0 = perfect agreement. Optional sigmoid shape control for non-linear sensitivity.

Cypher
MATCH (flight:Flight)-[:HAS_PREDICTION]->(pred:Prediction)
RETURN deviation_score(pred.altitude_ft, flight.altitude_ft, 500) AS altitude_score,
       deviation_score(pred.heading_deg, flight.heading_deg, 15) AS heading_score

Parameters: tolerance = acceptable deviation in same units as predicted/actual.

Optional: sigmoid_centers and sigmoid_steepness for non-linear scoring.

Return Range: 0.0 (perfect) to 1.0 (completely wrong).

distance_3d

distance_3d(lat1, lon1, alt1_ft, lat2, lon2, alt2_ft) -> float

Returns the true 3D slant range in meters between two positions. Combines great-circle distance with altitude difference. Critical for separation assurance.

Cypher
MATCH (a1:Aircraft)-[:NEAR]-(a2:Aircraft)
WHERE a1.id < a2.id
RETURN distance_3d(a1.lat, a1.lon, a1.altitude_ft,
                   a2.lat, a2.lon, a2.altitude_ft) AS separation_m

Output Unit: Meters (always).

Horizontal: Uses haversine for lat/lon distance.

Vertical: Altitude difference is converted from feet to meters for 3D calc.

Safety Critical: 1000 ft = ~305 m vertical separation threshold in many airspaces.
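A sketch of the computation described above:

- The horizontal leg is the haversine great-circle distance.
- The vertical leg is the altitude difference converted from feet to meters.
- The two legs are combined with Pythagoras.

The Earth radius (6,371 km mean radius) and the Pythagorean combination are assumptions. The engine may use a different radius or an ellipsoidal model.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius
FT_TO_M = 0.3048


def distance_3d(lat1: float, lon1: float, alt1_ft: float,
                lat2: float, lon2: float, alt2_ft: float) -> float:
    """Slant range in meters from haversine ground distance and altitude delta."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    horizontal = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))  # great-circle meters
    vertical = (alt2_ft - alt1_ft) * FT_TO_M                     # feet -> meters
    return math.hypot(horizontal, vertical)


print(round(distance_3d(0, 0, 0, 0, 0, 1000), 1))  # 304.8
```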

envelope_score

envelope_score(value, mean, stddev) -> float

Returns a 0-1 score of how many standard deviations a value lies from the mean, normalized so that 0 is at the mean and 1 is three or more standard deviations away (an outlier). Used for anomaly detection.

Cypher
MATCH (flight:Flight)
WITH avg(flight.ground_speed_kt) AS mean_speed,
     stDev(flight.ground_speed_kt) AS stddev_speed
MATCH (f:Flight)
RETURN f.callsign,
       envelope_score(f.ground_speed_kt, mean_speed, stddev_speed) AS anomaly_score

Interpretation: 0.0 = at mean, 0.33 = 1 stddev away, 1.0 = 3+ stddev (extreme outlier).

Use Case: Flag unusual aircraft behavior (excessive speed, altitude changes).
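Because 1 standard deviation maps to 0.33 and 3 or more map to 1.0, the score behaves like |z| / 3 clamped to [0, 1]. A sketch under that reading follows. The handling of stddev = 0 is an assumption, since the text above does not cover it.

```python
def envelope_score(value: float, mean: float, stddev: float) -> float:
    """|z| / 3 clamped to [0, 1]: 0 at the mean, 1 at three or more stddevs."""
    if stddev <= 0:
        # Assumption: with no spread, anything off the mean counts as an outlier.
        return 0.0 if value == mean else 1.0
    z = abs(value - mean) / stddev
    return min(z / 3.0, 1.0)


print(envelope_score(130, 100, 10))  # 1.0
```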

heading_rate_of_change

heading_rate_of_change(headings_list, timestamps_list) -> float

Returns the average rate of heading change in degrees per second, handling angular wraparound by differencing headings with circular arithmetic.

Cypher
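MATCH (a:Aircraft)
// Sort by time before collecting so both lists stay aligned and chronological
WITH a ORDER BY a.timestamp
WITH a.callsign AS callsign,
     collect(a.heading_deg) AS headings,
     collect(a.timestamp) AS times
RETURN callsign, heading_rate_of_change(headings, times) AS turn_rate_dps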

Angular Wraparound: Correctly handles transitions like 350° -> 10° (20° turn, not 340°).

Return Unit: Degrees per second.

Typical Values: 0-5 dps for normal flight, 10+ dps for aggressive turns.
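One way to get the wraparound-safe behavior is to sum per-sample wrapped heading changes. The sketch assumes that "average rate" means net signed heading change divided by total elapsed time, with timestamps in seconds. It also uses the same sign convention as signed_angular_diff() (positive = right turn). The engine may average per-interval rates or report magnitudes instead.

```python
def signed_delta(from_deg: float, to_deg: float) -> float:
    """Shortest signed turn from one heading to another, in [-180, 180)."""
    return (to_deg - from_deg + 180.0) % 360.0 - 180.0


def heading_rate_of_change(headings: list[float], timestamps: list[float]) -> float:
    """Net signed turn rate in degrees per second (timestamps assumed in seconds)."""
    if len(headings) != len(timestamps) or len(headings) < 2:
        raise ValueError("need at least two aligned heading/timestamp samples")
    # Sum step-by-step wrapped changes so 350 -> 10 counts as +20, not -340
    total_turn = sum(signed_delta(a, b) for a, b in zip(headings, headings[1:]))
    elapsed = timestamps[-1] - timestamps[0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return total_turn / elapsed


print(heading_rate_of_change([350, 10], [0, 10]))  # 20 degrees over 10 s -> 2.0
```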

project_3d

project_3d(lat, lon, alt_ft, heading, speed_kt, climb_fpm, turn_dps, accel_ktps, duration_sec) -> [lat, lon, alt, hdg, spd]

Projects a moving object forward along a 3D arc for the given duration. Returns predicted position, heading, and speed. Core for trajectory prediction and conflict detection.

Cypher
MATCH (a:Aircraft)
WITH a,
     project_3d(a.lat, a.lon, a.altitude_ft,
                a.heading_deg, a.ground_speed_kt, a.climb_rate_fpm,
                a.turn_rate_dps, a.acceleration_ktps, 300) AS projected
RETURN a.callsign,
       projected[0] AS predicted_lat,
       projected[1] AS predicted_lon,
       projected[2] AS predicted_alt_ft,
       projected[3] AS predicted_heading,
       projected[4] AS predicted_speed_kt

Parameters (in order):

  • lat, lon: Current position (WGS84)
  • alt_ft: Current altitude in feet
  • heading: Current heading in degrees (0-359)
  • speed_kt: Current ground speed in knots
  • climb_fpm: Climb rate in feet per minute
  • turn_dps: Turn rate in degrees per second
  • accel_ktps: Acceleration in knots per second
  • duration_sec: Projection time in seconds

Output: [lat, lon, alt_ft, heading_deg, speed_kt]

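Physics: Advances the position along the current heading for duration_sec while applying the constant climb rate, turn rate, and acceleration, so the path is a 3D arc whenever turn_dps is non-zero.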
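The projection can be approximated by simple dead reckoning with fixed time steps. This sketch is not the engine's integrator. It assumes a spherical Earth, converts each step's north/east displacement to degrees at the current latitude, and clamps speed at zero. The step_sec parameter is a hypothetical tuning knob that project_3d() itself does not take.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # assumed spherical Earth
KT_TO_MPS = 1852.0 / 3600.0    # knots -> meters per second


def project_3d(lat, lon, alt_ft, heading, speed_kt, climb_fpm,
               turn_dps, accel_ktps, duration_sec, step_sec=1.0):
    """Dead-reckon forward with constant climb, turn, and acceleration rates.

    Returns [lat, lon, alt_ft, heading_deg, speed_kt] after duration_sec.
    step_sec is a hypothetical integration step, not an engine parameter.
    """
    t = 0.0
    while t < duration_sec:
        dt = min(step_sec, duration_sec - t)
        hdg = math.radians(heading)
        dist_m = speed_kt * KT_TO_MPS * dt
        # Convert this step's north/east displacement into degrees at the current latitude
        dlat = math.degrees(dist_m * math.cos(hdg) / EARTH_RADIUS_M)
        dlon = math.degrees(dist_m * math.sin(hdg) /
                            (EARTH_RADIUS_M * math.cos(math.radians(lat))))
        lat, lon = lat + dlat, lon + dlon
        alt_ft += climb_fpm * dt / 60.0                  # ft/min -> ft for this step
        heading = (heading + turn_dps * dt) % 360.0      # follow the turn arc
        speed_kt = max(0.0, speed_kt + accel_ktps * dt)  # never negative
        t += dt
    return [lat, lon, alt_ft, heading, speed_kt]


print(project_3d(40.0, -74.0, 10000, 90, 250, 1500, 0, 0, 300))
```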

rate_of_change

rate_of_change(values_list, timestamps_list) -> float

Returns the average rate of change (derivative) from paired value/time lists. Uses linear regression over the time series for robust slope estimation.

Cypher
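MATCH (a:Aircraft)
WITH a.callsign AS callsign,
     collect(a.altitude_ft) AS altitudes,
     collect(a.timestamp) AS times
// Timestamps assumed in seconds, so the slope is ft/s; multiply by 60 for ft/min
RETURN callsign, rate_of_change(altitudes, times) * 60 AS climb_rate_fpm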

Calculation: Least-squares linear regression slope.

Unit: (units of values) per (unit of timestamps).

Example: With altitude in feet and timestamps in seconds, the result is in feet per second; multiply by 60 to get feet per minute.
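The least-squares slope is cov(t, v) / var(t). This standalone Python sketch computes it, assuming the function rejects fewer than two samples or identical timestamps. That error behavior is not documented.

```python
def rate_of_change(values: list[float], timestamps: list[float]) -> float:
    """Least-squares slope of values vs. timestamps (value units per timestamp unit)."""
    n = len(values)
    if n != len(timestamps) or n < 2:
        raise ValueError("need at least two aligned value/timestamp samples")
    mean_t = sum(timestamps) / n
    mean_v = sum(values) / n
    # slope = covariance(t, v) / variance(t)
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    den = sum((t - mean_t) ** 2 for t in timestamps)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


# Altitude in feet, timestamps in seconds -> feet per second
fps = rate_of_change([10000, 10030, 10060, 10090], [0, 1, 2, 3])
print(fps, fps * 60)  # 30.0 ft/s, 1800.0 ft/min
```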

signed_angular_diff

signed_angular_diff(angle_a, angle_b) -> float

Returns the signed shortest angular distance (-180 to +180) between two bearings. Positive = clockwise, negative = counter-clockwise.

Cypher
MATCH (a:Aircraft)
RETURN a.callsign,
       signed_angular_diff(a.heading_deg, a.planned_heading) AS turn_required_deg

Return Range: -180 to +180 degrees.

Sign Convention: Positive = turn right (clockwise), negative = turn left (counter-clockwise).

Example: signed_angular_diff(10, 350) = -20 (turn left 20°)
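The wraparound logic reduces to one modular expression. This Python sketch follows the sign convention above, measuring the turn from angle_a to angle_b. One boundary case is an assumption: at exactly 180 degrees apart, this version returns -180, while the engine may return +180.

```python
def signed_angular_diff(angle_a: float, angle_b: float) -> float:
    """Shortest signed turn from angle_a to angle_b: positive = clockwise (right)."""
    # Shift into [0, 360), then back into [-180, 180)
    return (angle_b - angle_a + 180.0) % 360.0 - 180.0


print(signed_angular_diff(10, 350))   # -20.0 (turn left 20 degrees)
print(signed_angular_diff(350, 10))   # 20.0 (turn right 20 degrees)
```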

speed_from_components

speed_from_components(vN_mps, vE_mps, vUp_mps) -> [gs_kt, vs_fpm, total_kt]

Converts N/E/Up velocity components to ground speed, vertical speed, and total speed. Reverses velocity_3d() for the speed terms; heading is not returned.

Cypher
MATCH (a:Aircraft)
WITH a,
     speed_from_components(a.velocity_north_mps, a.velocity_east_mps, a.velocity_up_mps) AS speeds
RETURN a.callsign,
       speeds[0] AS ground_speed_kt,
       speeds[1] AS vertical_speed_fpm,
       speeds[2] AS total_speed_kt

Input Units: All velocity components in meters per second (m/s).

Output: [ground_speed_kt, vertical_speed_fpm, total_speed_kt]

Conversion Factors: 1 m/s = 1.944 knots, 1 m/s = 196.85 feet/minute
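Using the conversion factors above, the computation can be sketched in Python. This sketch assumes ground speed is the horizontal magnitude of (vN, vE) and total speed is the full 3D magnitude, both converted to knots.

```python
import math

MPS_TO_KT = 3600.0 / 1852.0     # ~1.944 knots per m/s
MPS_TO_FPM = 60.0 / 0.3048      # ~196.85 ft/min per m/s


def speed_from_components(v_north_mps: float, v_east_mps: float, v_up_mps: float) -> list[float]:
    """Return [ground_speed_kt, vertical_speed_fpm, total_speed_kt]."""
    ground_mps = math.hypot(v_north_mps, v_east_mps)        # horizontal magnitude
    total_mps = math.sqrt(ground_mps ** 2 + v_up_mps ** 2)  # full 3D magnitude
    return [ground_mps * MPS_TO_KT, v_up_mps * MPS_TO_FPM, total_mps * MPS_TO_KT]


print(speed_from_components(3.0, 4.0, 0.0))  # 5 m/s ground speed -> ~9.72 kt
```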

turn_radius

turn_radius(speed_kt, turn_rate_dps) -> float

Returns the turn radius in meters from speed and turn rate. Uses standard aviation formula: radius = speed / turn_rate (with unit conversions).

Cypher
MATCH (a:Aircraft)
RETURN a.callsign,
       a.ground_speed_kt,
       a.turn_rate_dps,
       turn_radius(a.ground_speed_kt, a.turn_rate_dps) AS radius_m

Formula: r = (speed_kt * 0.5144) / (turn_rate_dps * π/180)

Output Unit: Meters.

Typical Values: In a standard-rate turn (3 dps), about 590 m at 60 kt and about 1,180 m at 120 kt.
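The formula above translates directly to Python. Two details are assumptions: the sketch uses the absolute turn rate, so left and right turns give the same radius, and it returns infinity for a zero turn rate (straight flight). The engine's handling of those cases is not documented.

```python
import math

KT_TO_MPS = 0.5144  # knots -> m/s, the factor used in the formula above


def turn_radius(speed_kt: float, turn_rate_dps: float) -> float:
    """Turn radius in meters: r = v / omega, with v in m/s and omega in rad/s."""
    if turn_rate_dps == 0:
        return math.inf  # assumption: straight flight has no finite radius
    omega_rad_s = abs(turn_rate_dps) * math.pi / 180.0
    return speed_kt * KT_TO_MPS / omega_rad_s


# Standard-rate turn (3 degrees per second)
print(round(turn_radius(60, 3)))    # 589
print(round(turn_radius(120, 3)))   # 1179
```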

velocity_3d

velocity_3d(ground_speed_kt, heading_deg, climb_rate_fpm) -> [vN, vE, vUp]

Decomposes ground speed, heading, and climb rate into North/East/Up velocity components in m/s. speed_from_components() converts these components back to speeds.

Cypher
MATCH (a:Aircraft)
WITH a,
     velocity_3d(a.ground_speed_kt, a.heading_deg, a.climb_rate_fpm) AS velocity_components
RETURN a.callsign,
       velocity_components[0] AS velocity_north_mps,
       velocity_components[1] AS velocity_east_mps,
       velocity_components[2] AS velocity_up_mps

Input Units: ground_speed_kt = knots, heading_deg = degrees (0-359), climb_rate_fpm = feet per minute.

Output Units: All components in meters per second (m/s).

Calculation: Decomposes horizontal (heading) into N/E, vertical into Up.
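The decomposition is a polar-to-Cartesian conversion. This Python sketch assumes heading is measured clockwise from north, so north = cos(heading) and east = sin(heading). It also assumes exact knot and foot conversion factors.

```python
import math

KT_TO_MPS = 1852.0 / 3600.0   # knots -> m/s
FPM_TO_MPS = 0.3048 / 60.0    # ft/min -> m/s


def velocity_3d(ground_speed_kt: float, heading_deg: float, climb_rate_fpm: float) -> list[float]:
    """Return [v_north, v_east, v_up] in m/s; heading is clockwise from north."""
    gs_mps = ground_speed_kt * KT_TO_MPS
    hdg = math.radians(heading_deg)
    return [gs_mps * math.cos(hdg),        # north component
            gs_mps * math.sin(hdg),        # east component
            climb_rate_fpm * FPM_TO_MPS]   # vertical component


print(velocity_3d(100, 90, 1000))  # north ~0, east ~51.44 m/s, up ~5.08 m/s
```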

Bitwise Functions (7)

Bitwise functions operate on integer representations at the bit level. Useful for flags, masks, permission systems, and low-level data encoding.

bit_and

bit_and(a, b) -> integer

Returns the bitwise AND of two integers. Each bit position in the result is 1 only if both input bits at that position are 1.

Cypher
MATCH (u:User)
WITH u, u.permissions AS perms,
     7 AS ADMIN_MASK  // Binary: 0111
RETURN u.username,
       bit_and(perms, ADMIN_MASK) AS admin_flags

Truth Table: 0&0=0, 0&1=0, 1&0=0, 1&1=1

Use Case: Extract subset of flags (e.g., check if specific bit is set).

Example: bit_and(15, 12) = bit_and(0b1111, 0b1100) = 0b1100 = 12

bit_not

bit_not(x) -> integer

Returns the bitwise NOT (complement) of an integer. Flips all bits: 0 becomes 1, 1 becomes 0. For signed integers, this is equivalent to -(x+1).

Cypher
MATCH (f:Flag)
RETURN f.id,
       f.status_bits,
       bit_not(f.status_bits) AS inverted_bits

Note: In two's complement (standard on most systems), bit_not(x) = -(x+1).

Example: bit_not(5) = bit_not(0b0101) = ...11111010 = -6 (in 64-bit two's complement)

bit_or

bit_or(a, b) -> integer

Returns the bitwise OR of two integers. Each bit position in the result is 1 if at least one of the input bits at that position is 1.

Cypher
MATCH (app:Application)
WITH app, app.read_perms AS read_bits,
     app.write_perms AS write_bits
RETURN app.name,
       bit_or(read_bits, write_bits) AS combined_perms

Truth Table: 0|0=0, 0|1=1, 1|0=1, 1|1=1

Use Case: Combine permission flags (set union).

Example: bit_or(12, 10) = bit_or(0b1100, 0b1010) = 0b1110 = 14

bit_shift_left

bit_shift_left(x, n) -> integer

Shifts bits of x to the left by n positions. Equivalent to multiplying by 2^n. New bits on the right are filled with zeros.

Cypher
MATCH (data:Bitfield)
RETURN data.id,
       data.mask,
       bit_shift_left(data.mask, 8) AS shifted_left_8

Arithmetic: bit_shift_left(x, n) = x * 2^n

Example: bit_shift_left(5, 2) = bit_shift_left(0b0101, 2) = 0b10100 = 20

Overflow: Bits shifted past the integer width are discarded.

bit_shift_right

bit_shift_right(x, n) -> integer

Shifts bits of x to the right by n positions. Equivalent to integer division by 2^n. Behavior depends on signedness: unsigned fills with zeros, signed (arithmetic shift) fills with sign bit.

Cypher
MATCH (sensor:Sensor)
WITH sensor, sensor.raw_value AS raw
RETURN sensor.id,
       raw,
       bit_and(bit_shift_right(raw, 4), 15) AS high_nibble  // bits 4-7 (high nibble of the low byte)

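Arithmetic: bit_shift_right(x, n) = floor(x / 2^n). For non-negative x this is ordinary integer division; for negative x it rounds toward negative infinity (e.g., bit_shift_right(-5, 1) = -3).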

Example: bit_shift_right(20, 2) = 20/4 = 5

Sign Extension: For negative numbers, the sign bit is extended (arithmetic right shift).
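Python integers have unlimited width, so reproducing the documented results requires wrapping values into a 64-bit range. This sketch assumes xrayGraphDB integers are signed 64-bit two's complement, consistent with the bit_not(5) = -6 example and the sign-extension note above. Behavior for shift counts of 64 or more is not documented and is not modeled here.

```python
def to_int64(x: int) -> int:
    """Wrap an arbitrary Python int into the signed 64-bit two's complement range."""
    return (x + 2**63) % 2**64 - 2**63


def bit_not(x: int) -> int:
    return to_int64(~x)          # equals -(x + 1)


def bit_shift_left(x: int, n: int) -> int:
    return to_int64(x << n)      # bits pushed past bit 63 are discarded


def bit_shift_right(x: int, n: int) -> int:
    return to_int64(x) >> n      # Python's >> is arithmetic: the sign bit is extended


print(bit_not(5))               # -6
print(bit_shift_left(5, 2))     # 20
print(bit_shift_right(20, 2))   # 5
print(bit_shift_right(-5, 1))   # -3 (floor, not truncation toward zero)
print(bit_shift_left(1, 63))    # -9223372036854775808 (sign bit set)
```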

bit_xor

bit_xor(a, b) -> integer

Returns the bitwise XOR (exclusive or) of two integers. Each bit position in the result is 1 if the input bits at that position differ (one is 1, the other is 0).

Cypher
MATCH (packet:Packet)
RETURN packet.id,
       bit_xor(packet.header, packet.checksum) AS verification

Truth Table: 0^0=0, 0^1=1, 1^0=1, 1^1=0

Use Case: Detect differences, toggle bits, checksums, encryption.

Property: a ^ a = 0, a ^ 0 = a (XOR is self-inverse).

Example: bit_xor(12, 10) = bit_xor(0b1100, 0b1010) = 0b0110 = 6

popcount

popcount(x) -> integer

Returns the number of set bits (1s) in an integer. Also known as "population count" or Hamming weight. Used for counting set flags, finding bit density.

Cypher
MATCH (perms:Permissions)
RETURN perms.username,
       perms.flags,
       popcount(perms.flags) AS number_of_permissions

Algorithm: Hardware-accelerated POPCNT instruction where available.

Time Complexity: O(1) on modern CPUs.

Example: popcount(15) = popcount(0b1111) = 4

Use Case: Count enabled features, measure permission breadth, find sparsity.
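The functions in this section combine into the usual flag-set idioms. This Python sketch uses a hypothetical permission layout (READ, WRITE, ADMIN, AUDIT are illustrative names, not part of xrayGraphDB). It also assumes popcount() counts bits within a 64-bit word, so negative inputs are masked to 64 bits first.

```python
# Hypothetical permission bits, for illustration only
READ, WRITE, ADMIN, AUDIT = 1 << 0, 1 << 1, 1 << 2, 1 << 3


def has_flag(flags: int, flag: int) -> bool:
    return (flags & flag) == flag               # bit_and: test a bit


def grant(flags: int, flag: int) -> int:
    return flags | flag                         # bit_or: set a bit


def toggle(flags: int, flag: int) -> int:
    return flags ^ flag                         # bit_xor: flip a bit


def popcount(flags: int) -> int:
    return bin(flags & (2**64 - 1)).count("1")  # count set bits in a 64-bit word


user = grant(READ, WRITE)                       # 0b0011
print(has_flag(user, ADMIN))                    # False
user = toggle(user, ADMIN)                      # 0b0111
print(has_flag(user, ADMIN), popcount(user))    # True 3
```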

Compatibility Functions (4)

Compatibility functions help developers migrate from SQL and other systems. They document how to express common patterns in xrayGraphDB's Cypher dialect.

GROUP_BY_TIME_BUCKET

SQL: GROUP BY DATE_TRUNC('hour', ts) -> Cypher: RETURN toStartOfHour(n.ts) AS hour, count(n)

Time-bucketed aggregation: wrap the timestamp in toStartOfMinute(), toStartOfHour(), or toStartOfDay() and return it alongside an aggregate; grouping is implicit in RETURN + aggregation. MySQL usually expresses hourly buckets with DATE_FORMAT(); ClickHouse's toStartOfHour() uses identical syntax.

SQL (Original)
SELECT DATE_TRUNC('hour', created_at) AS hour, COUNT(*) AS cnt
FROM events
GROUP BY DATE_TRUNC('hour', created_at)
ORDER BY hour DESC
Cypher (xrayGraphDB)
MATCH (e:Event)
RETURN toStartOfHour(e.created_at) AS hour, count(e) AS cnt
ORDER BY hour DESC

Time Functions Available:

  • toStartOfMinute() - Truncate to minute boundary
  • toStartOfHour() - Truncate to hour boundary (HH:00:00)
  • toStartOfDay() - Truncate to midnight

Aggregation: GROUP BY is implicit when using RETURN with aggregation functions (count, sum, avg, etc.).

MOVING_AVERAGE_WORKAROUND

SQL: AVG(value) OVER (ORDER BY ts ROWS 4 PRECEDING) -> Cypher: WITH collect(n.value) AS vals RETURN moving_avg(vals, 5)

Convert a SQL moving average to Cypher in three steps: (1) MATCH the rows and ORDER BY time, (2) collect() the values into a list, (3) apply moving_avg(list, window_size). The function returns a list of averages, one per input position. This replaces SQL window functions (for example in MySQL 8) and ClickHouse's groupArrayMovingAvg().

SQL (Original)
SELECT ts, value,
       AVG(value) OVER (ORDER BY ts ROWS 4 PRECEDING) AS moving_avg_5
FROM metrics
ORDER BY ts
Cypher (xrayGraphDB)
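MATCH (m:Metric)
WITH m.timestamp AS ts, m.value AS val
ORDER BY ts
WITH collect(ts) AS timestamps, collect(val) AS values
WITH timestamps, values, moving_avg(values, 5) AS moving_averages
UNWIND range(0, size(timestamps) - 1) AS i
RETURN timestamps[i] AS ts, values[i] AS value, moving_averages[i] AS moving_avg_5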

Pattern: (1) MATCH + ORDER BY to get sorted data, (2) collect() into a list, (3) apply the list function, (4) UNWIND to flatten results.

Window Size: moving_avg(list, 5) uses a 5-element window, matching ROWS 4 PRECEDING (the current row plus the 4 before it).

Limitation: No true SQL window functions (OVER/PARTITION BY) in Cypher—this workaround is the idiomatic approach.
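Mirroring the ROWS 4 PRECEDING frame, this Python sketch shows a trailing moving average. It assumes moving_avg() averages only the available values at the start of the list, as SQL does; the engine's edge handling may differ. A running sum keeps each step O(1).

```python
def moving_avg(values: list[float], window: int) -> list[float]:
    """Trailing moving average with one output per input position.

    Early positions average whatever values are available, like SQL's
    ROWS (window - 1) PRECEDING frame (an assumption about moving_avg()).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out


print(moving_avg([1, 2, 3, 4, 5, 6], 5))  # [1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
```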

TOPK_AGGREGATION

SQL: SELECT value, COUNT(*) FROM t GROUP BY value ORDER BY COUNT(*) DESC LIMIT K

xrayGraphDB supports topK natively as an aggregation: RETURN topK(n.category, 10) AS top_categories. This replaces the SQL GROUP BY + ORDER BY + LIMIT pattern. Also available as function: WITH collect(n.category) AS cats RETURN topK(cats, 10). ClickHouse equivalent: topK(K)(column).

SQL (Original)
SELECT category, COUNT(*) as freq
FROM transactions
GROUP BY category
ORDER BY freq DESC
LIMIT 10
Cypher (Aggregation Mode)
MATCH (t:Transaction)
RETURN topK(t.category, 10) AS top_10_categories
Cypher (Function Mode)
MATCH (t:Transaction)
WITH collect(t.category) AS categories
RETURN topK(categories, 10) AS top_10_categories

Two Modes: Use aggregation mode for streaming efficiency, function mode for pre-collected lists.

Algorithm: Space-Saving approximate top-K (a deterministic, fixed-memory counter algorithm; counts and rankings near the cutoff may be approximate).

For Exact Results: Use topKExact() instead.
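The Space-Saving algorithm keeps a fixed number of counters. When a new item arrives and all counters are in use, it evicts the smallest counter and the newcomer inherits that count plus one. This textbook version in Python assumes a `capacity` parameter; xrayGraphDB's actual counter budget is not documented. The example shows the algorithm's characteristic overestimate: "fees" appears once but reports a count of 2.

```python
def space_saving_topk(stream, k: int, capacity: int) -> list[tuple]:
    """Space-Saving heavy hitters: track at most `capacity` counters, return the top k.

    Counts for tracked items can overestimate, but never underestimate, true frequency.
    """
    counts: dict = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
        else:
            victim = min(counts, key=counts.get)  # smallest counter is evicted
            inherited = counts.pop(victim)
            counts[item] = inherited + 1          # newcomer inherits its count
    # Highest counts first; ties keep first-seen order because sorting is stable
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:k]


categories = ["fuel", "fuel", "catering", "fuel", "parking", "catering", "fees"]
print(space_saving_topk(categories, k=2, capacity=3))  # [('fuel', 3), ('catering', 2)]
```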

WINDOW_FUNCTION_WORKAROUND

SQL: AVG(val) OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)

xrayGraphDB does not have SQL-style window functions (OVER/PARTITION BY). Workaround: collect values first, then apply the scalar function. Example: WITH collect(n.value) AS vals RETURN moving_avg(vals, 5) AS averages. For partitioned windows: MATCH (n:Metric) WITH n.category AS cat, collect(n.value) AS vals RETURN cat, moving_avg(vals, 5). For sliding_window: WITH collect(n.value) AS vals RETURN sliding_window(vals, 5) AS windows.

SQL (Original)
SELECT category, ts, value,
       AVG(value) OVER (PARTITION BY category ORDER BY ts
                        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_avg_5
FROM metrics
Cypher (Partitioned Window)
MATCH (m:Metric)
WITH m ORDER BY m.timestamp
WITH m.category AS category, collect(m.value) AS values
WITH category, moving_avg(values, 5) AS moving_averages
RETURN category, moving_averages

Core Pattern: No direct OVER/PARTITION BY syntax. Instead: (1) MATCH + ORDER BY + collect() to gather rows in order, (2) apply a list function such as moving_avg() or sliding_window() to the list, (3) RETURN results.

For Ranking: UNWIND range(0, size(values) - 1) AS i and use i + 1 as the row number (the equivalent of ROW_NUMBER()).

For Partitioning: Include the partition key as a grouping key in the same WITH that calls collect().

Conditional Functions (7)

Conditional functions evaluate logic and return different values based on conditions. These are essential for expression-based filtering, transformation, and null handling.

case_when

case_when(condition, then_value, else_value) -> value

Returns then_value if condition is true, else_value otherwise. Simple ternary operator. Equivalent to SQL CASE WHEN ... THEN ... ELSE ... END.

Cypher
MATCH (a:Aircraft)
RETURN a.callsign,
       case_when(a.altitude_ft > 35000, 'HIGH', 'LOW') AS altitude_category,
       case_when(a.ground_speed_kt > 450, 'FAST', 'NORMAL') AS speed_category

Syntax: Simple ternary—no chaining for multiple conditions.

Null Handling: If condition is NULL, returns else_value.

Type Coercion: then_value and else_value should be same type.

For Multiple Conditions: Nest case_when() calls: case_when(a, v1, case_when(b, v2, v3))

choose

choose(index, values...) -> value

Returns the value at the given one-based index from the arguments. Index must be between 1 and the number of values. Returns null if index is out of range.

Cypher
MATCH (a:Aircraft)
WITH a,
     case_when(a.status = 'LANDED', 1, case_when(a.status = 'AIRBORNE', 2, 3)) AS status_code
RETURN a.callsign,
       choose(status_code, 'Landed', 'Airborne', 'Unknown') AS status_name

Indexing: One-based (index 1 = first value, index 2 = second value, etc.).

Out of Range: Returns null if index < 1 or index > number of values.

Type Consistency: All values should be the same type.

Use Case: Switch/case-like behavior for mapping numeric codes to labels.

coalesce_chain

coalesce_chain(values...) -> value

Returns the first non-null value from the arguments. Scans left-to-right and returns immediately upon finding a non-null value. Also known as COALESCE in SQL.

Cypher
MATCH (a:Aircraft)
RETURN a.callsign,
       coalesce_chain(a.icao_code, a.iata_code, a.flight_id) AS identifier,
       coalesce_chain(a.operator_name, a.airline, 'Unknown') AS operator

Short-Circuit: Stops evaluating as soon as a non-null value is found.

All Nulls: Returns null if all arguments are null.

Equivalent SQL: COALESCE(val1, val2, val3)

Use Case: Provide fallbacks for missing data.

greatest

greatest(values...) -> value

Returns the largest of all supplied values. Ignores null values (if all values are null, returns null). Works with numbers, strings, dates, and comparable types.

Cypher
MATCH (a:Aircraft)
RETURN a.callsign,
       greatest(a.max_speed_kt, a.cruise_speed_kt, a.approach_speed_kt) AS max_capability,
       greatest(a.altitude_ft, a.service_ceiling_ft) AS altitude_record

Null Handling: Skips null values. If all are null, returns null.

Type Consistency: All values should be comparable (same type or numeric).

String Comparison: Uses lexicographic (alphabetical) ordering.

Date Comparison: Uses chronological ordering.

ifnull

ifnull(value, default) -> value

Returns value if not null, otherwise returns default. Two-argument version of coalesce_chain(). Equivalent to SQL IFNULL() or COALESCE(value, default).

Cypher
MATCH (a:Aircraft)
RETURN a.callsign,
       ifnull(a.registration, 'N/A') AS registration,
       ifnull(a.manufacturer, 'Unknown') AS manufacturer,
       ifnull(a.weight_lbs, 0) AS weight

Syntax: Two arguments only. For more, use coalesce_chain().

Type Coercion: value and default should be the same type (or compatible).

Equivalent SQL: IFNULL(value, default) in MySQL, COALESCE(value, default) in standard SQL.

least

least(values...) -> value

Returns the smallest of all supplied values. Ignores null values (if all values are null, returns null). Works with numbers, strings, dates, and comparable types.

Cypher
MATCH (a:Aircraft)
RETURN a.callsign,
       least(a.min_speed_kt, a.landing_speed_kt, a.stall_speed_kt) AS min_safe_speed,
       least(a.created_at, a.first_flight_date) AS earliest_date

Null Handling: Skips null values. If all are null, returns null.

Type Consistency: All values should be comparable.

String Comparison: Lexicographic; returns the string that sorts first alphabetically.

Use Case: Find minimum threshold, earliest date, smallest ID.

nullif

nullif(a, b) -> value|null

Returns null if a equals b, otherwise returns a. Opposite of ifnull(). Useful for filtering out specific values (e.g., turn unknown values into nulls).

Cypher
MATCH (a:Aircraft)
RETURN a.callsign,
       nullif(a.manufacturer, 'Unknown') AS known_manufacturer,
       nullif(a.registration, 'N/A') AS valid_registration,
       nullif(a.altitude_ft, 0) AS non_zero_altitude

Comparison: Uses strict equality (a = b).

Use Case: Convert placeholder/sentinel values (Unknown, N/A, 0) to null.

Filtering: WHERE nullif(col, 'Unknown') IS NULL matches rows where col is either null or 'Unknown'.

Round Trip: ifnull(nullif(a, sentinel), sentinel) returns a unchanged for any non-null a: the sentinel is first converted to null, then restored.
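The round trip can be checked with a small Python model in which None stands in for Cypher null. The model assumes nullif() and ifnull() follow the semantics described above.

```python
from typing import Any, Optional


def nullif(a: Any, b: Any) -> Optional[Any]:
    return None if a == b else a          # sentinel -> null


def ifnull(value: Any, default: Any) -> Any:
    return default if value is None else value  # null -> default


# Round trip: the sentinel is nulled, then restored; other values pass through
for a in ["Boeing", "Unknown"]:
    assert ifnull(nullif(a, "Unknown"), "Unknown") == a
print(nullif("Unknown", "Unknown"), nullif("Airbus", "Unknown"))  # None Airbus
```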

Data Integrity (2)

Functions for detecting and resolving data quality issues in the graph.

xray.dedup

CALL xray.dedup(label, key_property) YIELD duplicates_found, vertices_deleted, vertices_kept

Finds and removes duplicate vertices with the same label and key property value. Keeps the lowest GID (oldest) per group.

Cypher
CALL xray.dedup('Aircraft', 'icao24')
YIELD duplicates_found, vertices_deleted, vertices_kept
RETURN duplicates_found, vertices_deleted, vertices_kept

Use Case: Remove duplicate aircraft identities in ADS-B streams where the same ICAO24 code may have been loaded multiple times.
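The keep-lowest-GID rule works like the following Python model over plain dicts. Vertices are assumed to carry a "gid" field, and vertices without the key property are assumed to be ignored. The exact meaning of the duplicates_found count is not documented, so the sketch returns only the kept and deleted GIDs.

```python
from collections import defaultdict


def dedup(vertices: list[dict], key_property: str) -> tuple[list[int], list[int]]:
    """Return (kept_gids, deleted_gids): the lowest GID survives in each key group."""
    groups: dict = defaultdict(list)
    for v in vertices:
        if key_property in v:              # assumption: vertices without the key are ignored
            groups[v[key_property]].append(v["gid"])
    kept, deleted = [], []
    for gids in groups.values():
        gids.sort()
        kept.append(gids[0])               # lowest GID = oldest vertex
        deleted.extend(gids[1:])           # the rest would be removed
    return kept, deleted


aircraft = [{"gid": 7, "icao24": "a1b2c3"}, {"gid": 3, "icao24": "a1b2c3"},
            {"gid": 5, "icao24": "4ca7e1"}]
print(dedup(aircraft, "icao24"))  # ([3, 5], [7])
```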

xray.upsert_behavior

BULK_UPSERT_NODES (0x27) via xrayProtocol

Upsert uses 3-tier atomic lookup: (1) Gid cache O(1), (2) label-property index O(1), (3) full scan O(N). Auto-creates property index on first upsert. BULK_INSERT (0x21) allows duplicates.

xrayProtocol (raw wire format)
BULK_UPSERT_NODES (0x27):
  [u32 rows=100]
  [u16 cols=3]
  [u8 type=0x05][u32 len=6]["icao24"]  // key_column (upsert key)
  [u8 type=0x05][u32 len=8]["callsign"]
  [u8 type=0x04][u32 len=0]            // altitude
  // then row data (first 2 of 100 rows shown)
  ["ABC123"] ["N123US"] [35000.0]
  ["ABC124"] ["N456UA"] [28500.0]

Per-property type tags (row-oriented value encoding): 0=String, 1=Int64, 2=Double, 3=Bool, 4=Null. As of v5.0, two recursive nested tags are available when the session negotiates CAP_TYPED_NESTED in HELLO: 5=List (u32 count + recursive typed values) and 6=Map<String,*> (u32 count, then per entry u16 key_len + key + recursive typed value). This extension makes it possible to populate the spec §7.1 envelope columns (source_refs and disclosure_notes, both List<String>) through this opcode. Pre-v5 clients had to omit those columns, so their writes were rejected when --xg-envelope-enforcement=on. Without CAP_TYPED_NESTED, behavior is unchanged: tag bytes greater than 4 still fall through to the legacy all-string length-prefix path.
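This Python sketch applies the tag table recursively to encode a single property value. Only the tag numbers and the List/Map framing come from the paragraph above. The following are all assumptions, not confirmed wire format: little-endian byte order, a u32 byte-length prefix on strings, 8-byte Int64/Double, and a 1-byte Bool.

```python
import struct

# Tag values from the table above; byte order and scalar widths are assumptions
TAG_STRING, TAG_INT64, TAG_DOUBLE, TAG_BOOL, TAG_NULL, TAG_LIST, TAG_MAP = range(7)


def encode_value(value) -> bytes:
    """Encode one property value as [u8 tag][payload], recursing into lists and maps."""
    if value is None:
        return struct.pack("<B", TAG_NULL)
    if isinstance(value, bool):                      # check bool before int
        return struct.pack("<BB", TAG_BOOL, int(value))
    if isinstance(value, int):
        return struct.pack("<Bq", TAG_INT64, value)
    if isinstance(value, float):
        return struct.pack("<Bd", TAG_DOUBLE, value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return struct.pack("<BI", TAG_STRING, len(raw)) + raw
    if isinstance(value, list):                      # requires CAP_TYPED_NESTED
        body = b"".join(encode_value(v) for v in value)
        return struct.pack("<BI", TAG_LIST, len(value)) + body
    if isinstance(value, dict):                      # Map<String,*>, requires CAP_TYPED_NESTED
        parts = [struct.pack("<BI", TAG_MAP, len(value))]
        for key, v in value.items():
            k = key.encode("utf-8")
            parts.append(struct.pack("<H", len(k)) + k + encode_value(v))
        return b"".join(parts)
    raise TypeError(f"unsupported type: {type(value).__name__}")


# A List<String> envelope column such as source_refs
print(encode_value(["adsb", "swim"]).hex())
```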

Note: First upsert column is the unique key. Index is auto-created. Use BULK_INSERT for initial load when duplicates are acceptable.

Data Retention (1)

Functions for managing data lifecycle and automatic deletion policies.

ttl.delete_expired

CALL ttl.delete_expired(label, timestamp_prop, max_age_days, exempt_prop?) YIELD deleted_count, scanned_count, exempt_count

Time-based data retention with optional exemption. Scans vertices with the given label, compares timestamp_prop against max_age_days, and detach-deletes expired vertices. If exempt_prop is provided, vertices with that property set to true are preserved. Equivalent to MySQL EVENT + DELETE WHERE created_at < DATE_SUB(NOW(), INTERVAL N DAY).

Cypher
CALL ttl.delete_expired('AdsReport', 'timestamp', 30, 'is_archived')
YIELD deleted_count, scanned_count, exempt_count
RETURN 'Deleted ' + deleted_count + ' reports older than 30 days. Scanned: ' + scanned_count + ', Exempt: ' + exempt_count

Use Case: Purge ADS-B reports older than 30 days while keeping archived reports. Call the procedure on a schedule to keep storage bounded.
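The retention rule boils down to a cutoff calculation: 30 days is 30 × 86,400,000 = 2,592,000,000 ms. This Python sketch models the decision for one vertex. It assumes timestamps are stored as epoch milliseconds, a vertex exactly at the cutoff is kept, and vertices without a timestamp are skipped. None of these edge cases is documented.

```python
MS_PER_DAY = 86_400_000


def ttl_cutoff_ms(now_ms: int, max_age_days: int) -> int:
    """Timestamps older than this are expired (epoch milliseconds assumed)."""
    return now_ms - max_age_days * MS_PER_DAY


def should_delete(vertex: dict, timestamp_prop: str, cutoff_ms: int, exempt_prop=None) -> bool:
    ts = vertex.get(timestamp_prop)
    if ts is None or ts >= cutoff_ms:
        return False                      # fresh, or no timestamp to judge by
    if exempt_prop is not None and vertex.get(exempt_prop) is True:
        return False                      # exempt: kept regardless of age
    return True


now_ms = 1_776_211_200_000                # 2026-04-15T00:00:00Z
cutoff = ttl_cutoff_ms(now_ms, 30)        # 2,592,000,000 ms earlier
reports = [
    {"timestamp": cutoff - 1},                        # expired
    {"timestamp": cutoff - 1, "is_archived": True},   # expired but exempt
    {"timestamp": cutoff + 1},                        # still fresh
]
print([should_delete(r, "timestamp", cutoff, "is_archived") for r in reports])  # [True, False, False]
```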

Database Management (7)

Multi-database administration, isolation, and connection routing.

CREATE DATABASE

CREATE DATABASE <name>

Creates a new named database with its own isolated storage, WAL, indexes, and data directory. Each database has completely separate vertices, edges, and properties. Requires Enterprise license. Example: CREATE DATABASE swim

Cypher
CREATE DATABASE swim

Effect: Creates a new database "swim" with isolated storage. Sessions can then select it with USE DATABASE swim.

DROP DATABASE

DROP DATABASE <name>

Permanently deletes a named database and all its data. Cannot drop the default database. Requires Enterprise license.

Cypher
DROP DATABASE swim

Warning: This is irreversible. All data in the database is permanently deleted. The default 'xraygraphdb' database cannot be dropped.

MULTI_DATABASE_BOLT

Bolt Multi-Database Usage

Python: session = driver.session(database='swim'). Node.js: session = driver.session({ database: 'swim' }). Java: session = driver.session(SessionConfig.forDatabase("swim")). If no database is specified, the session uses the default 'xraygraphdb' database.

Python
from neo4j import GraphDatabase

# Placeholder credentials; replace with your own
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('user', 'password'))

# Session bound to the swim database; closed automatically by the with-block
with driver.session(database='swim') as swim_session:
    record = swim_session.run('MATCH (a:Aircraft) RETURN count(a) AS count').single()
    print(record['count'])

# No database argument: the session uses the default 'xraygraphdb' database
with driver.session() as default_session:
    record = default_session.run('MATCH (n) RETURN count(n) AS total').single()
    print(record['total'])

driver.close()

Multi-DB Architecture: Each database is fully isolated. The same driver can manage multiple databases by opening different sessions.

MULTI_DATABASE_OVERVIEW

Multi-Database Architecture

Each database has completely isolated storage. Queries on one database CANNOT see or modify data in another database. Use cases: isolate SWIM aviation data from code intelligence, per-tenant data separation, staging vs production. SWIM uses the 'swim' database. XRay-Vision uses the 'xray' database. Default database is 'xraygraphdb'.

Cypher
// List all available databases
SHOW DATABASES

// Use specific database (Bolt: pass in session config)
// Via xrayProtocol: send database name in HELLO frame
// Example: Switch to swim database for aviation data
USE DATABASE swim
MATCH (a:Aircraft {icao24: 'ABC123'}) RETURN a

// Different session: switch to xray database for code intelligence
USE DATABASE xray
MATCH (m:Method {name: 'parseAltitude'}) RETURN m

Isolation Guarantee: No cross-database queries. Each database is independently indexed, cached, and transactional.

MULTI_DATABASE_XRAYPROTOCOL

xrayProtocol Multi-Database Usage

Pass database name in HELLO message: [u16 version][u16 caps][u32 token_len][token][u32 db_len][db_name]. The server routes all operations on that connection to the specified database. Empty db_name = default database.

xrayProtocol (raw wire format)
// HELLO frame selecting 'swim' database
[u32 payload_len=...]
[u8 msg_type=0x01]   // HELLO
[u8 flags=0x00]
[u16 query_id=0x0000]
[u16 version=1]
[u16 capabilities=0x0003]
[u32 token_len=9]["user:pass"]
[u32 db_len=4]["swim"]  // Route to swim database

// All subsequent EXECUTE frames on this connection target 'swim'
[u32 payload_len=...]
[u8 msg_type=0x03]  // EXECUTE
[u8 flags=0x00]
[u16 query_id=0x0001]
[u8 language=0]      // Cypher
[u32 query_len=34]   // MATCH (a:Aircraft) RETURN COUNT(a)
...

Connection Affinity: Once a connection selects a database in HELLO, all queries on that connection route to that database. No per-query override.
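This Python sketch packs the HELLO body layout given above ([u16 version][u16 caps][u32 token_len][token][u32 db_len][db_name]). It builds only the body, not the outer frame header (payload_len, msg_type, flags, query_id). Little-endian byte order and UTF-8 string encoding are assumptions.

```python
import struct


def hello_body(token: str, database: str = "", version: int = 1, caps: int = 0x0003) -> bytes:
    """Pack [u16 version][u16 caps][u32 token_len][token][u32 db_len][db_name].

    Little-endian byte order is an assumption; an empty database selects the default.
    """
    tok = token.encode("utf-8")
    db = database.encode("utf-8")
    return (struct.pack("<HHI", version, caps, len(tok)) + tok +
            struct.pack("<I", len(db)) + db)


body = hello_body("user:pass", "swim")
print(len(body))  # 2 + 2 + 4 + 9 + 4 + 4 = 25 bytes
```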

SHOW DATABASES

SHOW DATABASES

Lists all databases. Returns database name for each. The default database is named 'xraygraphdb'. Additional databases are created with CREATE DATABASE.

Cypher
SHOW DATABASES

Output: A result table with a 'name' column listing all available databases. A deployment hosting SWIM and XRay-Vision lists 'xraygraphdb', 'swim', and 'xray'.

USE DATABASE

USE DATABASE <name>

Switches the current session to a different database. All subsequent queries execute against the selected database. Via Bolt: pass database name in session options. Via xrayProtocol: pass database name in HELLO message.

Cypher
// Switch to swim database
USE DATABASE swim

// All subsequent queries operate on swim database
MATCH (a:Aircraft) RETURN COUNT(a) AS aircraft_count

// Switch to xray database
USE DATABASE xray

// Queries now operate on xray database
MATCH (c:Class {name: 'Parser'}) RETURN COUNT(c) AS class_count

Scope: Database selection is per-session. Each client connection (Bolt or xrayProtocol) maintains its own database context.

DateTime (Capital D & T)

Temporal data creation and manipulation functions with timezone support.

date

date(map?) -> date

Creates a Date from a map of components or the current date.

Cypher
// Current date
RETURN date() AS today

// Date from components
RETURN date({year: 2026, month: 4, day: 15}) AS flight_departure

// Use in graph query
MATCH (r:AdsReport)
WHERE r.received_date > date({year: 2026, month: 3, day: 1})
RETURN COUNT(r) AS reports_since_march

Type: Returns a Date object without time component. Useful for day-level comparisons.

date_to_millis

date_to_millis(date_string) -> integer

Converts a date string to epoch milliseconds.

Cypher
// Convert date string to milliseconds
RETURN date_to_millis('2026-04-15') AS millis

// Use for timestamp comparison
WITH date_to_millis('2026-03-01') AS march_start
MATCH (r:AdsReport)
WHERE r.timestamp_ms > march_start
RETURN COUNT(r) AS reports

Format: Input is ISO 8601 date format. Returns UNIX epoch milliseconds for storage/comparison.

datetime

datetime(map?) -> datetime

Creates a DateTime (with timezone) from a map or the current instant.

Cypher
// Current datetime with timezone
RETURN datetime() AS now

// DateTime with UTC timezone
RETURN datetime({year: 2026, month: 4, day: 15, hour: 14, minute: 30, second: 0, timezone: 'UTC'}) AS scheduled_event

// Use in event tracking
CREATE (e:Event {timestamp: datetime(), name: 'Takeoff', flight_id: 'UA123'})
RETURN e

Type: Returns ZonedDateTime. Includes date, time, and timezone information.

duration

duration(map) -> duration

Creates a Duration from a map of time components.

Cypher
// Flight duration of 5 hours 30 minutes
RETURN duration({hours: 5, minutes: 30}) AS flight_duration

// Use in calculations
MATCH (f:Flight)
WHERE f.scheduled_duration > duration({hours: 6})
RETURN f.callsign, f.scheduled_duration

Components: Supports years, months, weeks, days, hours, minutes, seconds, milliseconds.

duration_between

duration_between(dt1, dt2) -> duration

Returns the duration between two temporal values.

Cypher
// Calculate flight time
MATCH (f:Flight)
WITH f, duration_between(f.departure_time, f.arrival_time) AS elapsed
RETURN f.callsign, elapsed.hours + ' hours ' + (elapsed.minutes % 60) + ' minutes' AS duration

// Find long-running queries
MATCH (q:QueryLog)
WITH q, duration_between(q.start_time, q.end_time) AS exec_time
WHERE exec_time.seconds > 60
RETURN q.query_text, exec_time.seconds

Use Case: Measure elapsed time between two datetime values. Automatically handles timezone conversions.

epoch_millis

epoch_millis() -> integer

Returns the current epoch time in milliseconds.

Cypher
// Get current time as milliseconds
RETURN epoch_millis() AS current_timestamp_ms

// Use for performance tracking (carry start_ms through the aggregation)
WITH epoch_millis() AS start_ms
MATCH (a:Aircraft)
WITH start_ms, COUNT(a) AS aircraft_count
RETURN aircraft_count, (epoch_millis() - start_ms) + ' ms' AS execution_time

Type: Returns integer. Zero reference is 1970-01-01T00:00:00Z (UNIX epoch).

epoch_seconds

epoch_seconds() -> integer

Returns the current epoch time in seconds.

Cypher
// Get current time as seconds
RETURN epoch_seconds() AS current_timestamp_s

// Record timestamp in node
CREATE (a:AirspaceAlert {alert_id: 'AL001', detection_time: epoch_seconds()})
RETURN a

Precision: One-second granularity. Use epoch_millis() for sub-second precision.

localdatetime

localdatetime(map?) -> localdatetime

Creates a LocalDateTime from a map or the current date and time.

Cypher
// Current local datetime (no timezone)
RETURN localdatetime() AS local_now

// LocalDateTime at specific local time
RETURN localdatetime({year: 2026, month: 4, day: 15, hour: 14, minute: 30}) AS local_departure

// Use for scheduling without timezone conversion
MATCH (e:Event)
WHERE e.local_start_time > localdatetime({year: 2026, month: 4, day: 15, hour: 12, minute: 0})
RETURN e.name, e.local_start_time

Type: LocalDateTime without timezone. Useful for recording times independent of geographic location.

localtime

localtime(map?) -> localtime

Creates a LocalTime from a map of components or the current time.

Cypher
// Current local time
RETURN localtime() AS current_time

// Create time at 14:30
RETURN localtime({hour: 14, minute: 30, second: 0}) AS afternoon_time

// Find flights departing after 10:00 AM
MATCH (f:Flight)
WHERE f.departure_time > localtime({hour: 10, minute: 0})
RETURN f.callsign, f.departure_time

Type: LocalTime without date or timezone. Useful for daily schedules and time-of-day filters.

millis_to_date

millis_to_date(millis) -> string

Converts epoch milliseconds to a date string.

Cypher
// Convert milliseconds to readable date
RETURN millis_to_date(1745606400000) AS readable_date

// Use with stored timestamps
MATCH (r:AdsReport)
RETURN r.callsign, millis_to_date(r.timestamp_ms) AS report_date
LIMIT 10

Format: Returns ISO 8601 date string (YYYY-MM-DD). Inverse of date_to_millis().
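The two conversions can be modeled in Python to check values by hand. The sketch assumes both functions interpret dates at midnight UTC; the documentation above does not state a timezone.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_millis(date_string: str) -> int:
    """ISO 8601 date -> epoch ms at midnight UTC (UTC is an assumption)."""
    d = date.fromisoformat(date_string)
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return (midnight - EPOCH) // timedelta(milliseconds=1)


def millis_to_date(millis: int) -> str:
    """Epoch ms -> 'YYYY-MM-DD' in UTC, discarding the time of day."""
    return (EPOCH + timedelta(milliseconds=millis)).date().isoformat()


print(date_to_millis("2026-04-15"))     # 1776211200000
print(millis_to_date(1745606400000))    # 2025-04-25
```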

Datetime (Lowercase)

Additional datetime utility functions with aliases for JavaScript/ClickHouse compatibility.

epochMillis

epochMillis(datetime) -> integer

Returns epoch milliseconds from a datetime for JavaScript/Unix interop. Equivalent to MySQL UNIX_TIMESTAMP(dt)*1000.

Cypher
// Convert datetime to milliseconds for JavaScript
MATCH (e:Event)
RETURN e.name, epochMillis(e.event_time) AS timestamp_js

// Store in database with millisecond precision
CREATE (r:Report {recorded_at: epochMillis(datetime())})
RETURN r

JavaScript Interop: Use for APIs expecting Unix milliseconds, such as the value returned by Date.getTime().

now

now() -> datetime

Returns the current datetime. Alias for datetime() with current time. Equivalent to MySQL NOW() or ClickHouse now().

Cypher
// Get current time
RETURN now() AS server_time

// Record creation timestamp
CREATE (a:AirspaceAlert {alert_type: 'tfr', created_at: now(), alert_id: 'ALR20260415001'})
RETURN a

// Find recent activity
MATCH (a:Activity)
WHERE a.timestamp > now() - duration({days: 1})
RETURN a.description, a.timestamp

Alias: Equivalent to datetime(). Use for clarity in familiar SQL-style patterns.

toDate

toDate(datetime) -> date

Extracts the Date portion from a ZonedDateTime or LocalDateTime. Equivalent to MySQL DATE() or ClickHouse toDate().

Cypher
// Extract date from datetime
RETURN toDate(now()) AS today

// Group reports by date
MATCH (r:AdsReport)
WITH toDate(r.timestamp) AS report_date, COUNT(r) AS count
RETURN report_date, count
ORDER BY report_date DESC

// Find flights on a specific date
MATCH (f:Flight)
WHERE toDate(f.departure_time) = date({year: 2026, month: 4, day: 15})
RETURN f.callsign, f.departure_time

Use Case: Extract date component from timestamp for day-level grouping or filtering. Discards time and timezone.

toStartOfDay

toStartOfDay(datetime) -> datetime

Truncates a datetime to midnight. Equivalent to ClickHouse toStartOfDay(). MySQL DATE(dt) truncates to the same day but returns a DATE rather than a datetime.

Cypher
// Truncate to start of day
RETURN toStartOfDay(now()) AS midnight_today

// Count records since the start of today
MATCH (r:AdsReport)
WHERE r.timestamp >= toStartOfDay(now())
RETURN COUNT(r) AS today_reports

// Group by day
MATCH (r:AdsReport)
WITH toStartOfDay(r.timestamp) AS day, COUNT(r) AS daily_count
RETURN day, daily_count
ORDER BY day DESC

Result: Returns datetime at 00:00:00 of the same day. Preserves timezone.

toStartOfHour

toStartOfHour(datetime) -> datetime

Truncates a datetime to the start of the hour boundary. Equivalent to MySQL DATE_FORMAT(dt, '%Y-%m-%d %H:00:00') or ClickHouse toStartOfHour().

Cypher
// Truncate to start of hour
RETURN toStartOfHour(now()) AS hourly_boundary

// Hourly aggregation
MATCH (r:AdsReport)
WITH toStartOfHour(r.timestamp) AS hour, COUNT(r) AS hourly_count, AVG(r.altitude) AS avg_alt
RETURN hour, hourly_count, avg_alt
ORDER BY hour DESC

// Count reports since the start of the current hour
MATCH (r:AdsReport)
WHERE r.timestamp >= toStartOfHour(now())
RETURN COUNT(r) AS recent_reports

Use Case: Create hourly buckets for time-series analysis. Returns datetime at HH:00:00 of the same hour.

toStartOfMinute

toStartOfMinute(datetime) -> datetime

Truncates a datetime to the start of the minute. Equivalent to MySQL DATE_FORMAT(dt, '%Y-%m-%d %H:%i:00') or ClickHouse toStartOfMinute().

Cypher
// Truncate to start of minute
RETURN toStartOfMinute(now()) AS minute_boundary

// Minute-level aggregation, keeping only high-activity minutes
MATCH (r:AdsReport)
WITH toStartOfMinute(r.timestamp) AS minute, COUNT(r) AS reports_per_minute
WHERE reports_per_minute > 100  // high-activity threshold
RETURN minute, reports_per_minute
ORDER BY minute DESC

// Find events since the start of the current minute
MATCH (e:Event)
WHERE e.timestamp >= toStartOfMinute(now())
RETURN e.description, e.timestamp

Precision: Returns the datetime at HH:MM:00 of the same minute, with seconds and fractional seconds set to zero. Useful for sub-hourly bucketing of high-frequency data.
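For UTC values, toStartOfDay(), toStartOfHour() and toStartOfMinute() amount to flooring epoch milliseconds to a bucket size. The Python sketch below shows that arithmetic for intuition only. It assumes UTC: for datetimes in other timezones, the functions truncate on local boundaries, which can differ from the UTC ones computed here.

```python
MINUTE_MS = 60_000
HOUR_MS = 60 * MINUTE_MS
DAY_MS = 24 * HOUR_MS


def floor_to_bucket(epoch_ms: int, bucket_ms: int) -> int:
    """Round a UTC epoch-millisecond timestamp down to the start of its bucket.

    Python's % returns a non-negative result for a positive divisor,
    so this also floors correctly for timestamps before 1970.
    """
    return epoch_ms - epoch_ms % bucket_ms


# 2026-04-15T13:47:12.345Z
ts = 1_776_211_200_000 + 13 * HOUR_MS + 47 * MINUTE_MS + 12_345
print(floor_to_bucket(ts, DAY_MS))     # start of day    (like toStartOfDay)
print(floor_to_bucket(ts, HOUR_MS))    # start of hour   (like toStartOfHour)
print(floor_to_bucket(ts, MINUTE_MS))  # start of minute (like toStartOfMinute)
```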

GIS / Spatial Functions (33)

bbox_from_radius

bbox_from_radius(lat, lon, radius_m) -> [min_lat, min_lon, max_lat, max_lon]

Returns a bounding box around a point with the given radius. Useful for spatial filtering and geographic searches within a circular area.

Cypher
RETURN bbox_from_radius(40.7128, -74.0060, 5000) AS nyc_5km_bbox
// Returns: [40.66856, -74.06269, 40.75704, -73.94931]
// Bounding box for 5km radius around New York City

bearing

bearing(lat1, lon1, lat2, lon2) -> float

Returns the initial bearing in degrees (0-360) from one geographic point to another. North is 0°, East is 90°, South is 180°, West is 270°.

Cypher
RETURN bearing(40.7128, -74.0060, 51.5074, -0.1278) AS bearing_nyc_to_london
// Returns: 51.27 (approximately northeast)
// Initial bearing from NYC to London
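The standard spherical formula for initial bearing is short enough to sanity-check results on the client. The Python sketch below assumes a spherical Earth. Whether bearing() and geo.bearing use exactly this formula is not stated here.

```python
import math


def initial_bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial great-circle bearing in degrees: 0 = north, clockwise, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    # atan2 returns (-180, 180]; shift into [0, 360)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


print(round(initial_bearing(40.7128, -74.0060, 51.5074, -0.1278), 1))  # NYC -> London, about 51
print(round(initial_bearing(51.5074, -0.1278, 48.8566, 2.3522), 1))   # London -> Paris, about 148
```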

cartesian_to_polar

cartesian_to_polar(x, y) -> [r, theta]

Converts 2D Cartesian coordinates (x, y) to polar coordinates (radius, angle in radians). Angle is measured counter-clockwise from positive x-axis.

Cypher
RETURN cartesian_to_polar(3, 4) AS polar_coords
// Returns: [5.0, 0.9273]
// Converts point (3,4) to radius 5, angle ~53 degrees

deg_to_dms

deg_to_dms(decimal_degrees) -> string

Converts decimal degrees to degrees-minutes-seconds (DMS) format. Returns a formatted string like "40°42'46.08\"N".

Cypher
RETURN deg_to_dms(40.7128) AS latitude_dms
// Returns: "40°42'46.08\"N"
// Converts NYC latitude to DMS format

distance

distance(lat1, lon1, lat2, lon2) -> float

Neo4j-compatible alias for haversine_distance. Returns the great-circle distance in meters between two geographic points.

Cypher
RETURN distance(40.7128, -74.0060, 34.0522, -118.2437) AS distance_nyc_to_lax
// Returns: approximately 3940000 (about 3,940 km; the exact figure depends on the Earth radius used)
// Great-circle distance from New York City to downtown Los Angeles, in meters

dms_to_deg

dms_to_deg(dms_string) -> float

Converts a degrees-minutes-seconds string to decimal degrees. Supports formats like "40°42'46.08\"N" or "40 42 46.08".

Cypher
RETURN dms_to_deg("40°42'46.08\"N") AS decimal_latitude
// Returns: 40.7128
// Converts DMS format back to decimal degrees
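The DMS conversion is plain base-60 arithmetic. The Python sketch below works on already-split parts and does not parse strings the way dms_to_deg() does. It also does not handle the rounding edge case where seconds round up to 60.00.

```python
def deg_to_dms(deg: float, is_latitude: bool = True) -> tuple[int, int, float, str]:
    """Split decimal degrees into (degrees, minutes, seconds, hemisphere letter)."""
    if is_latitude:
        hemisphere = "N" if deg >= 0 else "S"
    else:
        hemisphere = "E" if deg >= 0 else "W"
    magnitude = abs(deg)
    degrees = int(magnitude)
    minutes_full = (magnitude - degrees) * 60   # fractional degrees -> minutes
    minutes = int(minutes_full)
    seconds = (minutes_full - minutes) * 60     # fractional minutes -> seconds
    return degrees, minutes, seconds, hemisphere


def dms_to_deg(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Combine DMS parts back into signed decimal degrees (S and W are negative)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


d, m, s, h = deg_to_dms(40.7128)
print(f"{d}°{m}'{s:.2f}\"{h}")  # 40°42'46.08"N
```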

ecef_to_wgs84

ecef_to_wgs84(x, y, z) -> [lat, lon, alt_m]

Converts Earth-Centered Earth-Fixed (ECEF) Cartesian coordinates to WGS84 geodetic coordinates (latitude, longitude, altitude).

Cypher
RETURN ecef_to_wgs84(1334636, -4653242, 4137881) AS wgs84_coords
// Returns: [40.7128, -74.0060, 0]
// Converts satellite/GPS ECEF coordinates to lat/lon

geo.bearing

geo.bearing(lat1, lon1, lat2, lon2) -> float

Returns the initial bearing in degrees from one point to another. Namespace variant of bearing function.

Cypher
RETURN geo.bearing(51.5074, -0.1278, 48.8566, 2.3522) AS bearing_london_to_paris
// Returns: approximately 148.2 (south-east)
// Initial bearing from London to Paris

geo.destination

geo.destination(lat, lon, bearing, distance_m) -> [lat, lon]

Computes the destination point given a start location, initial bearing (degrees), and distance (meters).

Cypher
RETURN geo.destination(40.7128, -74.0060, 45, 10000) AS dest_northeast_10km
// Returns: approximately [40.7764, -73.9221]
// Point 10km northeast of NYC

geo.distance

geo.distance(lat1, lon1, lat2, lon2) -> float

Returns the great-circle distance in meters between two lat/lon pairs. Namespace variant of haversine_distance.

Cypher
RETURN geo.distance(40.6413, -73.7781, 40.7769, -73.8740) AS distance_jfk_to_lga
// Returns: approximately 17100 (about 17 km)
// Distance between JFK and LaGuardia airports in meters

geo.is_ahead

geo.is_ahead(lat1, lon1, heading, lat2, lon2) -> bool

Returns true if the target point is ahead of the current heading direction. Useful for navigation and route following.

Cypher
RETURN geo.is_ahead(40.7128, -74.0060, 45, 40.80, -73.93) AS target_ahead
// Returns: true
// Check if northeast point is ahead when heading 45° (northeast)

geo.reachable_score

geo.reachable_score(lat, lon, speed, time, target_lat, target_lon) -> float

Scores whether a target is reachable given speed (m/s) and time (seconds) constraints. Returns 0.0 to 1.0 where 1.0 means easily reachable.

Cypher
RETURN geo.reachable_score(40.7128, -74.0060, 100, 3600, 40.9, -73.8) AS reachability
// Returns: 0.95
// Score for reaching a point about 27 km away at 100 m/s within 1 hour

geo.route_score

geo.route_score(path, waypoints) -> float

Scores how well a path follows a set of waypoints. Returns 0.0 to 1.0 where 1.0 means perfect alignment.

Cypher
WITH [[40.7128, -74.0060], [40.80, -73.93], [40.90, -73.85]] AS path,
     [[40.7128, -74.0060], [40.8, -73.93], [40.9, -73.85]] AS waypoints
RETURN geo.route_score(path, waypoints) AS alignment_score
// Returns: 0.98
// Near-perfect alignment between actual and expected waypoints

geo.to_mgrs

geo.to_mgrs(lat, lon) -> string

Converts WGS84 coordinates to Military Grid Reference System (MGRS) format. Used by military, emergency services, and surveying applications.

Cypher
RETURN geo.to_mgrs(40.7128, -74.0060) AS nyc_mgrs
// Returns: "18TWL8396007523"
// MGRS grid reference for New York City

geo.to_utm

geo.to_utm(lat, lon) -> map

Converts WGS84 coordinates to UTM (Universal Transverse Mercator) easting/northing and zone. Returns a map with easting, northing, and zone.

Cypher
RETURN geo.to_utm(40.7128, -74.0060) AS nyc_utm
// Returns: {easting: 583960, northing: 4507523, zone: 18}
// UTM coordinates for NYC (Zone 18N)

geo_destination

geo_destination(lat, lon, bearing_deg, distance_m) -> [lat, lon]

Computes the destination point from a start location, bearing (degrees), and distance (meters). Underscore variant of geo.destination.

Cypher
RETURN geo_destination(51.5074, -0.1278, 135, 50000) AS dest_southeast_50km
// Returns: approximately [51.188, 0.380]
// Point 50km southeast of London
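geo.destination and geo_destination solve the "direct" problem on a sphere: given a start point, an initial bearing and a distance, find the end point. A Python sketch of the standard great-circle formula, assuming a 6,371 km mean Earth radius (the engine's radius and Earth model are not documented here):

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius


def destination(lat: float, lon: float, bearing_deg: float, distance_m: float,
                radius_m: float = EARTH_RADIUS_M) -> tuple[float, float]:
    """End point after travelling distance_m along a great circle from (lat, lon)."""
    phi1, lmb1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_m / radius_m  # angular distance in radians
    sin_phi2 = (math.sin(phi1) * math.cos(delta)
                + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    phi2 = math.asin(sin_phi2)
    lmb2 = lmb1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * sin_phi2)
    # Normalize longitude to [-180, 180)
    lon2 = (math.degrees(lmb2) + 540.0) % 360.0 - 180.0
    return math.degrees(phi2), lon2


print(destination(51.5074, -0.1278, 135, 50_000))  # roughly (51.188, 0.380)
```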

geo_midpoint

geo_midpoint(lat1, lon1, lat2, lon2) -> [lat, lon]

Returns the geographic midpoint between two coordinates. Useful for calculating meeting points or center locations.

Cypher
RETURN geo_midpoint(40.7128, -74.0060, 34.0522, -118.2437) AS us_midpoint
// Returns: approximately [39.50, -97.16]
// Great-circle midpoint between New York City and Los Angeles

geohash_decode

geohash_decode(geohash) -> [lat, lon]

Decodes a geohash string back to a lat/lon pair. Geohashes provide spatial indexing with hierarchical precision.

Cypher
RETURN geohash_decode("dr5regw") AS decoded
// Returns: approximately [40.7133, -74.0060] (center of the geohash cell)
// Decodes geohash back to coordinates near NYC

geohash_encode

geohash_encode(lat, lon, precision?) -> string

Encodes a lat/lon pair as a geohash string. Precision defaults to 11 characters, a cell roughly 15 cm on a side. Each extra character shrinks the cell, so a longer string means higher precision.

Cypher
RETURN geohash_encode(40.7128, -74.0060, 7) AS nyc_geohash
// Returns: "dr5regw"
// 7-character geohash for NYC (~150m precision)
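Geohash encoding is a binary search that alternately halves the longitude and latitude ranges, starting with longitude. Every 5 resulting bits become one base-32 character. A minimal Python sketch of the standard algorithm follows. It is not xrayGraphDB's code, and returning the cell center from decoding is an assumption about what geohash_decode reports.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 11) -> str:
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, nbits, use_lon = [], 0, 0, True  # bits alternate, longitude first
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


def geohash_decode(gh: str) -> tuple[float, float]:
    """Return the center (lat, lon) of the geohash cell."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for ch in gh:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (lat_rng[0] + lat_rng[1]) / 2, (lon_rng[0] + lon_rng[1]) / 2


print(geohash_encode(40.7128, -74.0060, 7))  # dr5regw
print(geohash_decode("dr5regw"))
```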

haversine

haversine(lat1, lon1, lat2, lon2) -> float

Alias for haversine_distance. Returns the great-circle distance in meters between two geographic points.

Cypher
RETURN haversine(40.7128, -74.0060, 48.8566, 2.3522) AS distance_nyc_to_paris
// Returns: 5837000 (approximately 5837 km)
// Great-circle distance in meters

haversine_distance

haversine_distance(lat1, lon1, lat2, lon2) -> float

Returns the great-circle distance in meters between two geographic points. Uses the Haversine formula accounting for Earth's curvature.

Cypher
RETURN haversine_distance(37.7749, -122.4194, 34.0522, -118.2437) AS distance_sf_to_lax
// Returns: 559000 (approximately 559 km)
// Distance between San Francisco and Los Angeles
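The Haversine formula itself is only a few lines. This Python sketch reproduces it for client-side checks, assuming a spherical Earth with a 6,371 km mean radius. The radius xrayGraphDB uses is not stated, and a different value shifts results by a few tenths of a percent.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius; engines may use another value


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float,
                radius_m: float = EARTH_RADIUS_M) -> float:
    """Great-circle distance in meters between two lat/lon points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against a creeping just above 1.0 through rounding near antipodes
    return 2 * radius_m * math.asin(math.sqrt(min(1.0, a)))


print(round(haversine_m(37.7749, -122.4194, 34.0522, -118.2437) / 1000))  # SF -> LA, about 559 km
```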

mercator_to_wgs84

mercator_to_wgs84(x, y) -> [lat, lon]

Converts Web Mercator projection coordinates back to WGS84 lat/lon. Web Mercator is used by Google Maps and most web mapping libraries.

Cypher
RETURN mercator_to_wgs84(-8238310, 4970072) AS wgs84_coords
// Returns: approximately [40.7128, -74.0060]
// Converts Web Mercator to NYC coordinates

point

point(map) -> point

Creates a 2D or 3D geographic/cartesian point from a map. Map can contain x/y for Cartesian or latitude/longitude for geographic points.

Cypher
RETURN point({latitude: 40.7128, longitude: -74.0060}) AS nyc_point
// Returns: point{latitude: 40.7128, longitude: -74.0060}
// Creates a geographic point for NYC

point.distance

point.distance(p1, p2) -> float

Returns the geodesic distance in meters between two point objects. Works with both geographic and Cartesian points.

Cypher
WITH point({latitude: 40.7128, longitude: -74.0060}) AS p1,
     point({latitude: 40.9, longitude: -73.8}) AS p2
RETURN point.distance(p1, p2) AS distance_meters
// Returns: approximately 27100 (about 27 km)
// Distance between two geographic points

point.withinbbox

point.withinbbox(point, lowerLeft, upperRight) -> bool

Returns true if a point is within the bounding box defined by lowerLeft and upperRight corners.

Cypher
WITH point({latitude: 40.7128, longitude: -74.0060}) AS nyc,
     point({latitude: 40.5, longitude: -74.3}) AS lower_left,
     point({latitude: 41.0, longitude: -73.7}) AS upper_right
RETURN point.withinbbox(nyc, lower_left, upper_right) AS in_bbox
// Returns: true
// NYC is within the defined bounding box

polar_to_cartesian

polar_to_cartesian(r, theta) -> [x, y]

Converts polar coordinates (radius, angle in radians) to 2D Cartesian coordinates (x, y).

Cypher
RETURN polar_to_cartesian(5, 0.9273) AS cartesian_coords
// Returns: [3.0, 4.0]
// Converts polar (r=5, θ≈53°) to Cartesian (3,4)

to_geojson

to_geojson(lat, lon) -> string

Returns a GeoJSON Point string for the given coordinates. GeoJSON is a standard format for geographic data interchange.

Cypher
RETURN to_geojson(40.7128, -74.0060) AS geojson
// Returns: '{"type":"Point","coordinates":[-74.0060,40.7128]}'
// Standard GeoJSON format for NYC

to_wkt

to_wkt(lat, lon) -> string

Returns a WKT (Well-Known Text) POINT string for the given coordinates. WKT is a standard format for spatial data.

Cypher
RETURN to_wkt(40.7128, -74.0060) AS wkt
// Returns: 'POINT(-74.0060 40.7128)'
// Standard WKT format for NYC

utm_to_wgs84

utm_to_wgs84(easting, northing, zone, hemisphere) -> [lat, lon]

Converts UTM coordinates back to WGS84 lat/lon. Hemisphere should be "N" for northern or "S" for southern hemisphere.

Cypher
RETURN utm_to_wgs84(583960, 4507523, 18, "N") AS wgs84_coords
// Returns: [40.7128, -74.0060]
// Converts UTM Zone 18N back to NYC coordinates

utm_zone

utm_zone(lat, lon) -> integer

Returns the UTM zone number for a given coordinate. UTM divides Earth into 60 zones, each 6 degrees of longitude wide.

Cypher
RETURN utm_zone(40.7128, -74.0060) AS zone_number
// Returns: 18
// NYC is in UTM Zone 18

wgs84_to_ecef

wgs84_to_ecef(lat, lon, alt_m) -> [x, y, z]

Converts WGS84 geodetic coordinates (lat/lon/altitude) to Earth-Centered Earth-Fixed (ECEF) Cartesian coordinates. Used for satellite and GPS calculations.

Cypher
RETURN wgs84_to_ecef(40.7128, -74.0060, 10) AS ecef_coords
// Returns: [1334636, -4653242, 4137881]
// ECEF representation of NYC at 10m altitude

wgs84_to_mercator

wgs84_to_mercator(lat, lon) -> [x, y]

Converts WGS84 coordinates to Web Mercator projection. Web Mercator is standard for online mapping (Google Maps, OpenStreetMap, etc.).

Cypher
RETURN wgs84_to_mercator(40.7128, -74.0060) AS mercator_coords
// Returns: approximately [-8238310, 4970072]
// Web Mercator projection for NYC

wgs84_to_utm

wgs84_to_utm(lat, lon) -> [easting, northing, zone]

Converts WGS84 coordinates to UTM easting/northing and zone. Returns a list [easting, northing, zone] for the appropriate UTM zone.

Cypher
RETURN wgs84_to_utm(40.7128, -74.0060) AS utm_coords
// Returns: [583960, 4507523, 18]
// NYC in UTM Zone 18N coordinates

Graph Analytics Functions (27)

adamic_adar

adamic_adar(node1, node2) -> float

Returns the Adamic-Adar link prediction score between two nodes: the sum of 1/log(degree) over their shared neighbors, so a shared neighbor with few connections counts for more than a highly connected one. Commonly used in recommendation systems to predict likely connections.

Cypher
MATCH (alice:User {name: "Alice"}), (bob:User {name: "Bob"})
RETURN xg.adamic_adar(alice, bob) AS prediction_score

bridge_score

bridge_score(edge) -> float

Returns how likely an edge is a bridge between communities. A bridge edge connects different communities and has high importance in network connectivity.

Cypher
MATCH (a:User)-[edge:FOLLOWS]->(b:User)
RETURN edge, xg.bridge_score(edge) AS bridge_likelihood
ORDER BY bridge_likelihood DESC LIMIT 10

chain_avg_by

chain_avg_by(list, property) -> float

Averages a property across all elements in a list. Useful for aggregating metrics from a collection of nodes or relationships.

Cypher
MATCH (user:User)-[:POSTED]->(posts:Post)
WITH user, collect(posts) AS user_posts
RETURN user.name, xg.chain_avg_by(user_posts, "engagement_score") AS avg_engagement

chain_count_by

chain_count_by(list, property) -> map

Counts occurrences of each distinct property value. Returns a map where keys are property values and values are occurrence counts.

Cypher
MATCH (post:Post)<-[:POSTED]-(:User)
WITH collect(post) AS all_posts
RETURN xg.chain_count_by(all_posts, "category") AS category_distribution

chain_filter_eq

chain_filter_eq(list, property, value) -> list

Filters a list keeping elements where a property equals a value. Returns a new list containing only matching elements.

Cypher
MATCH (user:User)-[:RATED]->(movie:Movie)
WITH user, collect(movie) AS movies
RETURN user.name, xg.chain_filter_eq(movies, "genre", "Action") AS action_movies

chain_filter_gt

chain_filter_gt(list, property, threshold) -> list

Filters a list keeping elements where a property is greater than a threshold. Returns a new list with qualifying elements.

Cypher
MATCH (user:User)-[:RATED]->(movie:Movie)
WITH user, collect(movie) AS rated_movies
RETURN user.name, xg.chain_filter_gt(rated_movies, "rating", 7.5) AS highly_rated

chain_filter_lt

chain_filter_lt(list, property, threshold) -> list

Filters a list keeping elements where a property is less than a threshold. Returns a new list with matching elements.

Cypher
MATCH (person:Person)-[:HAS_TRANSACTION]->(txn:Transaction)
WITH person, collect(txn) AS transactions
RETURN person.name, xg.chain_filter_lt(transactions, "amount", 100.0) AS small_transactions

chain_group_by

chain_group_by(list, property) -> map

Groups list elements by a property into a map of lists. Keys are property values, values are lists of elements with that property value.

Cypher
MATCH (user:User)-[:COMPLETED]->(course:Course)
WITH user, collect(course) AS user_courses
RETURN user.name, xg.chain_group_by(user_courses, "difficulty_level") AS courses_by_level

chain_map

chain_map(list, property) -> list

Extracts a property from each element in a list (projection). Returns a new list containing only the specified property values.

Cypher
MATCH (team:Team)-[:HAS_MEMBER]->(player:Player)
WITH team, collect(player) AS members
RETURN team.name, xg.chain_map(members, "salary") AS all_salaries

chain_max_by

chain_max_by(list, property) -> value

Returns the element with the maximum value of a property. Useful for finding the highest-scoring or highest-valued item in a collection.

Cypher
MATCH (user:User)-[:AUTHORED]->(article:Article)
WITH user, collect(article) AS articles
RETURN user.name, xg.chain_max_by(articles, "view_count") AS most_viewed_article

chain_min_by

chain_min_by(list, property) -> value

Returns the element with the minimum value of a property. Useful for finding the lowest-scoring or lowest-valued item.

Cypher
MATCH (warehouse:Warehouse)-[:STORES]->(item:Item)
WITH warehouse, collect(item) AS inventory
RETURN warehouse.location, xg.chain_min_by(inventory, "stock_level") AS lowest_stock_item

chain_sort_by

chain_sort_by(list, property) -> list

Sorts a list of elements by a given property. Returns a new list ordered by the property value.

Cypher
MATCH (conference:Conference)-[:INCLUDES]->(session:Session)
WITH conference, collect(session) AS sessions
RETURN conference.name, xg.chain_sort_by(sessions, "start_time") AS ordered_schedule

chain_sum_by

chain_sum_by(list, property) -> number

Sums a property across all elements in a list. Returns the total of all property values.

Cypher
MATCH (customer:Customer)-[:MADE_PURCHASE]->(order:Order)
WITH customer, collect(order) AS purchases
RETURN customer.name, xg.chain_sum_by(purchases, "total_amount") AS lifetime_value

common_neighbor_count

common_neighbor_count(node1, node2) -> integer

Returns the number of common neighbors between two nodes. Useful for measuring similarity or predicting new connections.

Cypher
MATCH (person1:Person {name: "John"}), (person2:Person {name: "Jane"})
RETURN xg.common_neighbor_count(person1, person2) AS mutual_friends

community_modularity

community_modularity(partition) -> float

Returns the modularity score of a community partition. Higher scores indicate stronger community structure.

Cypher
MATCH (node:Node)
WHERE node.community_id IS NOT NULL
WITH collect(node) AS partition
RETURN xg.community_modularity(partition) AS modularity_score

edge_weight_normalize

edge_weight_normalize(edges, property) -> list

Normalizes edge weights to the 0-1 range by property. Returns a list of edges with normalized weight values.

Cypher
MATCH (a:Page)-[edge:LINKS_TO]->(b:Page)
WITH collect(edge) AS all_edges
RETURN xg.edge_weight_normalize(all_edges, "frequency") AS normalized_links

graph_density

graph_density(node_count, edge_count) -> float

Returns the density of a graph or subgraph. Density is calculated as edges / (nodes * (nodes - 1) / 2).

Cypher
MATCH (n:Node)
WITH count(DISTINCT n) AS total_nodes
MATCH ()-[e:CONNECTED]->()
WITH total_nodes, count(DISTINCT e) AS total_edges
RETURN xg.graph_density(total_nodes, total_edges) AS density
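The formula above counts each undirected edge once. In the worked Python version below, the `directed` switch and the 0.0 result for fewer than two nodes are illustrative additions. If the query counts directed relationships as in the example, n(n-1) is the matching denominator. With the undirected one, a graph with edges in both directions can score above 1.

```python
def graph_density(node_count: int, edge_count: int, directed: bool = False) -> float:
    """Fraction of possible edges present: m / (n(n-1)/2) undirected, m / (n(n-1)) directed."""
    if node_count < 2:
        return 0.0  # no possible edges; returning 0.0 is an illustrative choice
    possible = node_count * (node_count - 1)
    if not directed:
        possible //= 2
    return edge_count / possible


print(graph_density(4, 3))                 # 0.5: 3 of the 6 possible undirected edges
print(graph_density(4, 3, directed=True))  # 0.25: 3 of the 12 possible directed edges
```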

graph_diameter_approx

graph_diameter_approx(node_count?) -> float

Returns an approximate graph diameter. The optional node_count parameter can improve estimation accuracy.

Cypher
MATCH (n:Node)
WITH count(n) AS total_nodes
RETURN xg.graph_diameter_approx(total_nodes) AS estimated_diameter

graph_radius_approx

graph_radius_approx(node_count?) -> float

Returns an approximate graph radius. The optional node_count parameter can improve estimation accuracy.

Cypher
MATCH (n:Node)
WITH count(n) AS total_nodes
RETURN xg.graph_radius_approx(total_nodes) AS estimated_radius

hits_authority

hits_authority(node) -> float

Returns the HITS authority score of a node. Authority nodes are pointed to by many hub nodes (useful for finding quality sources).

Cypher
MATCH (page:WebPage)
RETURN page.url, xg.hits_authority(page) AS authority_score
ORDER BY authority_score DESC LIMIT 20

hits_hub

hits_hub(node) -> float

Returns the HITS hub score of a node. Hub nodes point to many authority nodes (useful for finding directory pages).

Cypher
MATCH (page:WebPage)
RETURN page.url, xg.hits_hub(page) AS hub_score
ORDER BY hub_score DESC LIMIT 20
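HITS is usually computed by power iteration. A node's authority score is the sum of the hub scores of the nodes linking to it, and its hub score is the sum of the authority scores of the nodes it links to. Both vectors are renormalized each round. The Python sketch below shows this textbook version on a hypothetical three-page link graph. How xrayGraphDB computes or caches these scores (iteration count, normalization, direction) is not documented here.

```python
import math


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 0.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the nodes that point to you
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum of authority scores of the nodes you point to
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


links = [("portal", "docs"), ("blog", "docs"), ("portal", "faq")]
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # docs portal
```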

influence_spread

influence_spread(node, steps?) -> float

Estimates the influence spread of a node in the graph. Returns a score representing how far influence propagates, optionally limited to a number of steps.

Cypher
MATCH (influencer:User {name: "Alice"})
RETURN xg.influence_spread(influencer, 3) AS reach_3_hops

katz_centrality

katz_centrality(node, alpha?, beta?) -> float

Returns the Katz centrality score of a node. Measures importance considering both direct and indirect connections. Alpha (damping) and beta (bias) parameters are optional.

Cypher
MATCH (person:Person)
RETURN person.name, xg.katz_centrality(person, 0.1, 1.0) AS centrality
ORDER BY centrality DESC

preferential_attachment

preferential_attachment(node1, node2) -> integer

Returns the preferential attachment score (degree product). Used to predict link formation based on node degrees in growing networks.

Cypher
MATCH (user1:User {name: "Alice"}), (user2:User {name: "Bob"})
RETURN xg.preferential_attachment(user1, user2) AS attachment_score

resource_allocation

resource_allocation(node1, node2) -> float

Returns the resource allocation index for link prediction: the sum of 1/degree over the two nodes' shared neighbors. Higher values indicate that a connection between the two nodes is more likely to form.

Cypher
MATCH (bob:User {name: "Bob"})--(:User)--(candidate:User)
WHERE candidate <> bob AND NOT (candidate)--(bob)
WITH DISTINCT bob, candidate
RETURN candidate.name, xg.resource_allocation(candidate, bob) AS link_score
ORDER BY link_score DESC LIMIT 10
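The link-prediction functions in this section (common_neighbor_count, adamic_adar, resource_allocation, preferential_attachment) have standard textbook definitions over neighbor sets. The Python sketch below implements those definitions on a small hypothetical graph. It treats edges as undirected, which may differ from how xrayGraphDB handles relationship direction, and it assumes the two nodes are distinct.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def common_neighbor_count(adj, a, b):
    return len(adj[a] & adj[b])


def adamic_adar(adj, a, b):
    # A shared neighbor z is adjacent to both a and b (a != b), so deg(z) >= 2 and log(deg(z)) > 0
    return sum(1.0 / math.log(len(adj[z])) for z in adj[a] & adj[b])


def resource_allocation(adj, a, b):
    return sum(1.0 / len(adj[z]) for z in adj[a] & adj[b])


def preferential_attachment(adj, a, b):
    return len(adj[a]) * len(adj[b])


adj = build_adjacency([("Alice", "Carol"), ("Alice", "Dave"), ("Bob", "Carol"),
                       ("Bob", "Dave"), ("Dave", "Eve")])
print(common_neighbor_count(adj, "Alice", "Bob"))          # 2
print(round(resource_allocation(adj, "Alice", "Bob"), 4))  # 0.8333
print(preferential_attachment(adj, "Alice", "Bob"))        # 4
```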

shortest_path_weight

shortest_path_weight(path) -> float

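Returns the total weight of a path by summing the edge weights along it. Note that shortestPath() selects the path with the fewest hops, which is not necessarily the path with the lowest total weight.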

Cypher
MATCH (origin:City {name: "New York"}), (destination:City {name: "Los Angeles"})
MATCH path = shortestPath((origin)-[:ROUTE*]->(destination))
RETURN xg.shortest_path_weight(path) AS total_distance

temporal_decay_rank

temporal_decay_rank(node, half_life_days) -> float

Returns a rank score that decays over time from last activity. Useful for time-sensitive rankings where recent activity is more valuable.

Cypher
MATCH (item:Item)
WHERE item.last_accessed IS NOT NULL
RETURN item.id, xg.temporal_decay_rank(item, 30) AS recency_score
ORDER BY recency_score DESC
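The description does not give the decay formula. A common choice for a half-life parameter is exponential decay, score = base × 0.5^(age / half_life). The Python sketch below uses that assumed formula to make the half_life_days argument concrete. The `base` score and how age is measured are assumptions, not documented behavior. With a 30-day half-life, an item last accessed 30 days ago scores half as much as one accessed today.

```python
def decay_score(age_days: float, half_life_days: float, base: float = 1.0) -> float:
    """Assumed exponential half-life decay: the score halves every half_life_days."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return base * 0.5 ** (age_days / half_life_days)


for age in (0, 30, 60, 90):
    print(age, decay_score(age, 30))  # scores: 1.0, 0.5, 0.25, 0.125
```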

Hash Functions (3)

crc32

crc32(value) -> integer

Returns the CRC-32 checksum of a value. Useful for data integrity verification and content-based routing in distributed graphs.

Cypher
MATCH (p:Person {name: 'Alice'})
RETURN p.name, crc32(p.email) AS email_hash
LIMIT 1
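xrayGraphDB does not state which CRC-32 variant it uses or how non-string values are serialized before hashing. This Python sketch assumes the common IEEE variant over UTF-8 bytes, the one zlib and most libraries implement. It computes the checksum bit by bit and cross-checks the result against zlib.crc32, which is one way to verify a stored checksum outside the database.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Reflected CRC-32 (IEEE polynomial 0xEDB88320), processed one bit at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, XOR in the polynomial
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


email = "alice@example.com".encode("utf-8")
print(hex(crc32_bitwise(email)), crc32_bitwise(email) == zlib.crc32(email))
```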

fnv1a

fnv1a(value) -> integer

Returns the FNV-1a hash of a value. Fast non-cryptographic hash with good distribution, ideal for bucketing and sharding.

Cypher
MATCH (d:Document)
RETURN d.id, fnv1a(d.content) % 8 AS shard_bucket
ORDER BY shard_bucket
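FNV-1a is simple enough to reproduce on the client, for example to predict which shard bucket a key lands in. The Python sketch below implements the 32-bit variant with the standard offset basis and prime. Whether xrayGraphDB's fnv1a() is 32- or 64-bit, and how it encodes non-string values, are assumptions to confirm before relying on matching results.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the hash within 32 bits
    return h


def shard_bucket(key: str, buckets: int = 8) -> int:
    """Map a key to one of `buckets` shards, analogous to fnv1a(x) % 8 in the query above."""
    return fnv1a_32(key.encode("utf-8")) % buckets


print(hex(fnv1a_32(b"a")), shard_bucket("doc-42"))
```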

murmur3

murmur3(value) -> integer

Returns the MurmurHash3 hash of a value. Excellent hash distribution, commonly used for deduplication and set operations.

Cypher
MATCH (:User)-[:POSTED]->(t:Tweet)
WITH murmur3(t.content) AS content_hash, collect(t) AS tweets
WHERE size(tweets) > 1
// Equal hashes mark candidate duplicates; compare content to rule out collisions
RETURN content_hash, [x IN tweets | x.id] AS candidate_duplicate_ids

List Functions (32)

keys

keys(map_or_node) -> list

Returns a list of all property keys from a map or node. Essential for schema introspection and dynamic property handling.

Cypher
MATCH (u:User)
WHERE size(keys(u)) > 10
RETURN u.id, keys(u) AS all_properties
ORDER BY u.created_at DESC
LIMIT 5

labels

labels(node) -> list

Returns a list of labels assigned to a node. Used for dynamic filtering and type introspection in polymorphic graphs.

Cypher
MATCH (n)
WHERE 'Account' IN labels(n)
RETURN n.id, labels(n) AS node_types
LIMIT 10

list.avg

list.avg(list) -> float

Returns the average of all numeric elements in a list. Perfect for computing average metrics from collected values.

Cypher
MATCH (s:Sensor)-[r:RECORDED]->(v:Value)
WHERE r.timestamp.year = 2026 AND r.timestamp.month = 4
WITH s.id AS sensor_id, collect(r.temperature) AS temps
RETURN sensor_id, list.avg(temps) AS avg_temperature
ORDER BY avg_temperature DESC

list.contains

list.contains(list, value) -> bool

Returns true if the list contains the given value. Efficient membership testing for filtering results.

Cypher
WITH ['active', 'pending', 'verified'] AS allowed_statuses
MATCH (a:Account)
WHERE list.contains(allowed_statuses, a.status)
RETURN a.id, a.status
LIMIT 20

list.difference

list.difference(list1, list2) -> list

Returns elements in list1 that are not in list2. Useful for set operations like finding removed items or exclusions.

Cypher
MATCH (u:User {id: 1})-[r:FOLLOWING]->(f:User)
WITH collect(f.id) AS current_following
WITH current_following, [2, 3, 4, 5] AS previous_following
RETURN list.difference(previous_following, current_following) AS unfollowed_ids

list.distinct

list.distinct(list) -> list

Returns a list with duplicate elements removed. Critical for deduplication in aggregation pipelines.

Cypher
MATCH (p:Post)-[r:LIKED_BY]->(u:User)
WITH p.id AS post_id, collect(u.category) AS categories
RETURN post_id, list.distinct(categories) AS unique_user_categories

list.drop

list.drop(list, n) -> list

Returns a list with the first n elements removed. Useful for pagination and skipping initial results.

Cypher
MATCH (a:Article)
WITH collect(a.id) AS all_articles
RETURN list.drop(all_articles, 10) AS articles_after_skip_10

list.flatten

list.flatten(nested_list) -> list

Flattens a nested list into a single-depth list. Essential for processing multi-level aggregations.

Cypher
MATCH (g:Group)-[:CONTAINS]->(t:Team)-[:HAS_MEMBER]->(p:Person)
WITH g, t, collect(p.id) AS team_member_ids
WITH g, collect(team_member_ids) AS nested_member_ids
RETURN g.id, list.flatten(nested_member_ids) AS all_member_ids

list.indexof

list.indexof(list, value) -> integer

Returns the zero-based index of the first occurrence of a value, or -1 if not found. Perfect for position-based logic.

Cypher
WITH ['red', 'green', 'blue', 'yellow'] AS colors
RETURN list.indexof(colors, 'blue') AS blue_position

list.intersect

list.intersect(list1, list2) -> list

Returns elements common to both lists. Critical for finding shared properties across entities.

Cypher
MATCH (u1:User {id: 1})-[r1:KNOWS]->(c1:Contact)
MATCH (u2:User {id: 2})-[r2:KNOWS]->(c2:Contact)
WITH collect(DISTINCT c1.id) AS friends_u1, collect(DISTINCT c2.id) AS friends_u2
RETURN list.intersect(friends_u1, friends_u2) AS mutual_friends

list.length

list.length(list) -> integer

Returns the number of elements in a list. Fundamental for counting aggregated results and filtering by size.

Cypher
MATCH (a:Author)-[r:WROTE]->(b:Book)
WITH a.name AS author, collect(b.id) AS books
WHERE list.length(books) > 5
RETURN author, list.length(books) AS book_count
ORDER BY book_count DESC

list.max

list.max(list) -> number

Returns the maximum value in a list. Essential for finding peak metrics and threshold analysis.

Cypher
MATCH (d:Device)-[r:RECORDED]->(m:Metric)
WITH d.id AS device_id, collect(r.value) AS metric_values
WHERE list.max(metric_values) > 100
RETURN device_id, list.max(metric_values) AS peak_value

list.median

list.median(list) -> number

Returns the median value of numeric elements in a list. Better than average for skewed distributions.

Cypher
MATCH (s:Store)-[r:SOLD]->(p:Product)
WHERE r.date.year = 2026 AND r.date.month = 4
WITH s.id AS store_id, collect(r.price) AS prices
RETURN store_id, list.median(prices) AS median_price
ORDER BY median_price

list.min

list.min(list) -> number

Returns the minimum value in a list. Essential for anomaly detection and finding lower bounds.

Cypher
MATCH (c:Cache)-[r:ACCESSED]->(i:Item)
WITH c.id AS cache_id, collect(r.latency_ms) AS latencies
WHERE list.min(latencies) > 10
RETURN cache_id, list.min(latencies) AS min_latency

list.percentile

list.percentile(list, percentile) -> number

Returns the value at the given percentile of a list. As the example shows, the percentile is passed as a fraction (0.95 for the 95th percentile). Critical for SLA and performance analysis.

Cypher
MATCH (api:APIEndpoint)-[r:INVOKED]->(t:Transaction)
WITH api.name AS endpoint, collect(r.response_time_ms) AS times
RETURN endpoint,
  list.percentile(times, 0.50) AS p50,
  list.percentile(times, 0.95) AS p95,
  list.percentile(times, 0.99) AS p99
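The description does not say how values that fall between ranks are handled. One common definition, linear interpolation between the two closest ranks (NumPy's default), is sketched in Python below, with q as a fraction as in the example. A nearest-rank implementation would return different values for small lists, so treat this method as an assumption.

```python
import math


def percentile(values: list[float], q: float) -> float:
    """Percentile with linear interpolation between closest ranks; q is a fraction in [0, 1]."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional index into the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


times = [120, 85, 300, 95, 110]
print(percentile(times, 0.50))  # 110.0
```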

list.product

list.product(list) -> number

Returns the product of all numeric elements in a list. Useful for growth rates and compound calculations.

Cypher
MATCH (inv:Investment)-[r:QUARTERLY_RETURN]->(q:Quarter)
WITH inv.name AS investment, collect(r.growth_factor) AS growth_factors
RETURN investment, list.product(growth_factors) AS total_growth
ORDER BY total_growth DESC

list.reverse

list.reverse(list) -> list

Returns a copy of a list in reverse order. Useful for timeline reversal and LIFO processing.

Cypher
MATCH (t:Timeline)-[:EVENT_AT]->(e:Event)
WITH e ORDER BY e.timestamp
WITH collect(e.timestamp) AS chronological
RETURN list.reverse(chronological) AS reverse_chronological

list.slice

list.slice(list, start, end?) -> list

Returns a sub-list from start index to optional end index. Essential for window operations and pagination.

Cypher
MATCH (q:Queue)-[:TASK_IN]->(t:Task)
WITH t ORDER BY t.id
WITH collect(t.id) AS all_tasks
RETURN list.slice(all_tasks, 10, 20) AS page_2_tasks

list.sort

list.sort(list) -> list

Returns a sorted copy of a list in ascending order. Foundation for ranking and ordered aggregation.

Cypher
MATCH (c:Country)-[r:EXPORTS]->(g:Good)
WITH c.name AS country, collect(r.value_usd) AS trade_values
RETURN country, list.sort(trade_values) AS sorted_exports
ORDER BY country

list.stddev

list.stddev(list) -> float

Returns the population standard deviation of numeric elements. Critical for variance analysis and quality control.

Cypher
MATCH (m:Machine)-[r:PRODUCED]->(p:Part)
WITH m.id AS machine_id, collect(r.weight_grams) AS weights
WHERE list.stddev(weights) > 2.5
RETURN machine_id, list.stddev(weights) AS weight_stddev

list.sum

list.sum(list) -> number

Returns the sum of all numeric elements in a list. Fundamental aggregation for totaling metrics.

Cypher
MATCH (o:Order)-[r:CONTAINS]->(i:Item)
WITH o.id AS order_id, collect(r.quantity * r.unit_price) AS amounts
RETURN order_id, list.sum(amounts) AS order_total
ORDER BY order_total DESC

list.take

list.take(list, n) -> list

Returns the first n elements of a list. Essential for LIMIT-like behavior on collected results.

Cypher
MATCH (u:User)-[r:VIEWED]->(v:Video)
WITH u, v, r ORDER BY r.timestamp DESC
WITH u.id AS user_id, collect(v.id) AS watched
RETURN user_id, list.take(watched, 5) AS last_5_videos

list.union

list.union(list1, list2) -> list

Returns the set union of two lists, without duplicates. Useful for combining data sources.

Cypher
MATCH (t:Team {id: 1})-[r:MEMBER_OF]->(p:Person)
MATCH (t)-[r2:AFFILIATE_OF]->(a:Affiliate)
WITH collect(p.id) AS core_members, collect(a.person_id) AS affiliates
RETURN list.union(core_members, affiliates) AS all_participants

list.variance

list.variance(list) -> float

Returns the population variance of numeric elements in a list. Essential for statistical analysis and risk assessment.

Cypher
MATCH (p:Portfolio)-[r:HOLDING]->(s:Stock)
WITH p.id AS portfolio_id, collect(r.return_percent) AS returns
RETURN portfolio_id, list.variance(returns) AS portfolio_variance
ORDER BY portfolio_variance DESC

list.zip

list.zip(list1, list2) -> list

Zips two lists into a list of pairs. Critical for aligned pairwise operations and correlation analysis.

Cypher
WITH ['A', 'B', 'C'] AS keys, [1, 2, 3] AS values
RETURN list.zip(keys, values) AS key_value_pairs

nodes

nodes(path) -> list

Returns all nodes in a path as a list. Essential for extracting node sequences from traversals.

Cypher
MATCH p = (a:Author {name: 'Alice'})-[:COLLABORATED*1..3]->(z:Author {name: 'Zoe'})
WITH p, nodes(p) AS collaboration_chain
RETURN [node IN collaboration_chain | node.name] AS author_names
LIMIT 1

range

range(start, end, step?) -> list

Generates a list of integers from start to end with optional step. Foundation for generating synthetic data and sequences.

Cypher
UNWIND range(1, 10, 2) AS odd_number
MATCH (n {id: odd_number})
RETURN collect(n.value) AS odd_values

range_generate

range_generate(start, end, step?) -> list

Generates a list of integers from start to end with an optional step. Alias for range().

Cypher
MATCH (m:Month)
WHERE m.number IN range_generate(1, 12, 3)
RETURN m.name

relationships

relationships(path) -> list

Returns all relationships in a path as a list. Critical for analyzing edge sequences and relationship types in traversals.

Cypher
MATCH p = (s:Source)-[*1..5]->(t:Target)
WITH p, relationships(p) AS edge_path
RETURN [rel IN edge_path | type(rel)] AS relationship_types

tail

tail(list) -> list

Returns all elements of a list except the first. Useful for skip-one processing and tail recursion patterns.

Cypher
MATCH (h:HeadNode)-[r:LINKED]->(n:NextNode)
WITH collect(n.id) AS all_nodes
RETURN tail(all_nodes) AS nodes_after_first

toset

toset(list) -> list

Removes duplicate elements from a list. Alias for list.distinct().

Cypher
MATCH (p:Publication)-[r:WRITTEN_BY]->(a:Author)
WITH collect(a.field) AS author_fields
RETURN toset(author_fields) AS unique_fields

uniformsample

uniformsample(list, count) -> list

Returns a uniformly random sample of elements from a list. Essential for statistical sampling and Monte Carlo analysis.

Cypher
MATCH (p:Patient)-[r:TEST_RESULT]->(t:Test)
WITH collect(r.value) AS all_results
RETURN uniformsample(all_results, 50) AS sample_results

Map Functions (6)

map_get

map_get(map, key, default?) -> value

Returns the value for a key from a map, or a default if the key is missing. Useful for safe property access without separate null checks.

Cypher
MATCH (c:Config)
RETURN c.id,
  map_get(c.settings, 'timeout', 30) AS timeout,
  map_get(c.settings, 'retries', 3) AS retries

map_has_key

map_has_key(map, key) -> bool

Returns true if the map contains the given key. Perfect for conditional logic based on key presence.

Cypher
MATCH (p:Product)
WHERE map_has_key(p.attributes, 'weight_kg')
RETURN p.id, p.attributes AS all_attributes
LIMIT 10

map_keys

map_keys(map) -> list

Returns all keys from a map as a list. Critical for schema inspection and dynamic property enumeration.

Cypher
MATCH (u:User {id: 123})
RETURN u.id, map_keys(u.profile) AS profile_fields

map_merge

map_merge(map1, map2) -> map

Merges two maps into one. For keys present in both maps, the value from the second map wins. Essential for configuration composition and patching.

Cypher
MATCH (u:User {id: 1})
WITH u.base_settings AS base, u.custom_overrides AS custom
RETURN map_merge(base, custom) AS final_settings

map_values

map_values(map) -> list

Returns all values from a map as a list. Useful for aggregating values across a property map.

Cypher
MATCH (s:Store {id: 1})
RETURN s.id, map_values(s.inventory) AS stock_counts

values

values(map) -> list

Returns all values from a map as a list. Alias for map_values, used interchangeably.

Cypher
MATCH (r:Report)
WITH r.metrics AS metrics_map
RETURN values(metrics_map) AS all_metrics

Math Functions (35)

abs

abs(x) -> number

Returns the absolute value of a number. Essential for magnitude calculations and distance metrics.

Cypher
MATCH (t:Transaction)
WHERE abs(t.balance_change) > 1000
RETURN t.id, abs(t.balance_change) AS magnitude
ORDER BY magnitude DESC

acos

acos(x) -> float

Returns the arc cosine of x in radians. Used in geometric and navigation calculations.

Cypher
WITH 0.5 AS cosine_value
RETURN acos(cosine_value) AS angle_radians

asin

asin(x) -> float

Returns the arc sine of x in radians. Used for angle recovery from sine values in spatial calculations.

Cypher
WITH 0.866 AS sine_value
RETURN asin(sine_value) AS angle_radians

atan

atan(x) -> float

Returns the arc tangent of x in radians. Used in slope and gradient calculations.

Cypher
MATCH (r:Road)
RETURN r.id, atan(r.slope) AS gradient_angle

atan2

atan2(y, x) -> float

Returns the arc tangent of y/x using the signs of both to determine the quadrant. Critical for bearing calculations.

Cypher
MATCH (a:Aircraft)-[r:POSITION]->(p:Point)
RETURN a.id, atan2(p.lon_change, p.lat_change) AS bearing_radians  // measured clockwise from north

cbrt

cbrt(x) -> float

Returns the cube root of x. Used in volume calculations and scaling operations.

Cypher
MATCH (c:Container)
RETURN c.id, cbrt(c.volume_cubic_units) AS side_length

ceil

ceil(x) -> integer

Rounds a number up to the nearest integer. Essential for allocation and quota calculations.

Cypher
MATCH (i:Invoice)
RETURN i.id, ceil(i.amount / 100.0) AS hundreds_required

clamp

clamp(value, min, max) -> number

Clamps a value between a minimum and maximum. Critical for enforcing range constraints.

Cypher
MATCH (s:Setting)
RETURN s.id, clamp(s.refresh_interval, 100, 5000) AS bounded_interval

cos

cos(x) -> float

Returns the cosine of x (in radians). Foundation for circular and oscillatory calculations.

Cypher
MATCH (w:Wave)
RETURN w.id, cos(w.phase_radians) AS amplitude

cosh

cosh(x) -> float

Returns the hyperbolic cosine of x. Used in exponential growth models and catenary curves.

Cypher
WITH 1.5 AS exponent
RETURN cosh(exponent) AS hyperbolic_value

degrees

degrees(radians) -> float

Converts radians to degrees. Essential for making angle calculations human-readable.

Cypher
MATCH (d:Direction)
RETURN d.id, degrees(d.heading_radians) AS heading_degrees

e

e() -> float

Returns the mathematical constant e (Euler's number, ~2.71828). Foundation for exponential functions.

Cypher
RETURN e() * e() AS e_squared

exp

exp(x) -> float

Returns e raised to the power of x. Critical for exponential growth and decay models.

Cypher
MATCH (g:Growth)
RETURN g.id, exp(g.growth_rate * g.time_years) AS multiplier

floor

floor(x) -> integer

Rounds a number down to the nearest integer. Essential for binning and conservative rounding.

Cypher
MATCH (p:Price)
RETURN p.id, floor(p.amount) AS whole_dollars

haversin

haversin(theta) -> float

Returns the haversine of an angle (in radians). Foundation for great-circle distance calculations.

Cypher
MATCH (g:GeoPoint)
RETURN g.id, haversin(g.latitude_radians) AS hav_value
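The haversine identity, haversin(theta) = sin^2(theta / 2), is what makes this function the building block of great-circle distance. The Python sketch below assumes a spherical Earth with a mean radius of 6,371 km. The radius constant and the great_circle_m helper are illustrative and are not part of xrayGraphDB.

```python
import math

# Assumed mean Earth radius in meters; not an engine constant.
EARTH_RADIUS_M = 6_371_000.0


def haversin(theta: float) -> float:
    """Haversine of an angle in radians: sin^2(theta / 2), i.e. (1 - cos(theta)) / 2."""
    return math.sin(theta / 2.0) ** 2


def great_circle_m(lat1: float, lon1: float, lat2: float, lon2: float,
                   radius: float = EARTH_RADIUS_M) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (haversin(phi2 - phi1)
         + math.cos(phi1) * math.cos(phi2) * haversin(math.radians(lon2 - lon1)))
    # Clamp so floating-point error cannot push sqrt/asin outside their domains.
    h = min(1.0, max(0.0, h))
    return 2.0 * radius * math.asin(math.sqrt(h))


# A quarter of the equator: about 10,007.5 km.
print(round(great_circle_m(0.0, 0.0, 0.0, 90.0) / 1000.0, 1))
```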

hypot

hypot(x, y) -> float

Returns the Euclidean distance sqrt(x*x + y*y). Perfect for 2D distance calculations.

Cypher
MATCH (p1:Point {id: 1}), (p2:Point {id: 2})
RETURN hypot(p1.x - p2.x, p1.y - p2.y) AS distance

lerp

lerp(a, b, t) -> float

Linearly interpolates between a and b by factor t (clamped 0-1). Essential for animation and gradual transitions.

Cypher
WITH 100 AS start, 200 AS end, 0.3 AS progress
RETURN lerp(start, end, progress) AS current_value
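As the entry describes it, the interpolation rule is a + (b - a) * t with t clamped to [0, 1]. Here is a small Python reference model of that rule for checking results. It is not the engine's code.

```python
def clamp(value, lo, hi):
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def lerp(a, b, t):
    """Linear interpolation a + (b - a) * t, with t clamped to [0, 1]."""
    t = clamp(t, 0.0, 1.0)
    return a + (b - a) * t


print(lerp(100, 200, 0.3))  # 130.0
print(lerp(100, 200, 1.5))  # 200.0 (t clamped to 1)
```

With the inputs from the Cypher example (100, 200, 0.3), the result is 130.0. A factor outside [0, 1] is clamped, so the result never leaves the range between a and b.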

log

log(x) -> float

Returns the natural logarithm (base e) of x. Essential for growth rate analysis and inverse exponentials.

Cypher
MATCH (m:Metric)
WHERE m.value > 0
RETURN m.id, log(m.value) AS log_value

log10

log10(x) -> float

Returns the base-10 logarithm of x. Used for magnitude scales like pH, Richter, and dB.

Cypher
MATCH (s:Sound)
RETURN s.id, 20 * log10(s.amplitude) AS decibels

log2

log2(x) -> float

Returns the base-2 logarithm of x. Used for bit/information calculations and complexity analysis.

Cypher
MATCH (d:Dataset)
RETURN d.id, log2(d.cardinality) AS information_bits

max_val

max_val(a, b) -> number

Returns the larger of two numeric values. Pairwise maximum for conditional logic.

Cypher
MATCH (c:Comparison)
RETURN c.id, max_val(c.value_a, c.value_b) AS larger

min_val

min_val(a, b) -> number

Returns the smaller of two numeric values. Pairwise minimum for constraint enforcement.

Cypher
MATCH (s:Schedule)
RETURN s.task_id, min_val(s.deadline, s.estimated_end) AS critical_date

mod

mod(dividend, divisor) -> number

Returns the remainder of integer division. Essential for cycling, sharding, and periodic calculations.

Cypher
MATCH (i:Item)
WHERE mod(i.id, 10) = 0
RETURN i.id AS every_tenth_item

pi

pi() -> float

Returns the mathematical constant pi (~3.14159). Foundation for circular and polar calculations.

Cypher
MATCH (c:Circle)
RETURN c.id, pi() * c.radius * c.radius AS area

power

power(base, exponent) -> float

Returns base raised to the given exponent. Essential for polynomial growth, scaling, and physics calculations.

Cypher
MATCH (c:Compound)
RETURN c.id, power(2, c.doubling_count) AS growth_factor

radians

radians(degrees) -> float

Converts degrees to radians. Essential for trigonometric function inputs.

Cypher
MATCH (d:Direction)
RETURN d.id, sin(radians(d.bearing_degrees)) AS sine_component

rand

rand() -> float

Returns a random float between 0.0 (inclusive) and 1.0 (exclusive). Foundation for stochastic sampling and randomization.

Cypher
MATCH (u:User)
WHERE rand() < 0.05
RETURN u.id AS sampled_user

round

round(x) -> integer

Rounds a number to the nearest integer. Useful for displaying and reporting metrics.

Cypher
MATCH (r:Rating)
RETURN r.id, round(r.score) AS rounded_score

sign

sign(x) -> integer

Returns the signum of a number: -1 (negative), 0 (zero), or 1 (positive). Perfect for trend detection.

Cypher
MATCH (t:Transaction)
RETURN t.id,
  CASE sign(t.amount)
    WHEN -1 THEN 'Withdrawal'
    WHEN 1 THEN 'Deposit'
    ELSE 'No Change'
  END AS transaction_type

sin

sin(x) -> float

Returns the sine of x (in radians). Foundation for circular motion and oscillation models.

Cypher
MATCH (o:OscillatingSignal)
RETURN o.id, sin(o.phase) AS amplitude

sinh

sinh(x) -> float

Returns the hyperbolic sine of x. Used in hyperbolic geometry and exponential models.

Cypher
WITH 2.0 AS exponent
RETURN sinh(exponent) AS hyperbolic_sine

sqrt

sqrt(x) -> float

Returns the square root of x. Essential for distance calculations and variance operations.

Cypher
MATCH (n:Node)
WHERE n.variance >= 0
RETURN n.id, sqrt(n.variance) AS standard_deviation

tan

tan(x) -> float

Returns the tangent of x (in radians). Used for slope and angle relationship calculations.

Cypher
MATCH (a:Angle)
RETURN a.id, tan(a.radians) AS slope

tanh

tanh(x) -> float

Returns the hyperbolic tangent of x. Used in neural networks and sigmoid-like compression functions.

Cypher
MATCH (v:Vector)
RETURN v.id, tanh(v.value) AS normalized_value

truncate

truncate(x, places) -> float

Truncates a number to the specified number of decimal places, dropping the remaining digits instead of rounding. Useful for financial and display formatting.

Cypher
MATCH (i:Interest)
RETURN i.id, truncate(i.rate, 4) AS rate_4_decimals

Materialized Views

Create, manage, and refresh materialized views for cached query results. Similar to ClickHouse MATERIALIZED VIEW.

mv.create

CALL mv.create(name, query) YIELD name, status

Creates a materialized view by storing a Cypher query definition as a :MaterializedView node. Equivalent to ClickHouse CREATE MATERIALIZED VIEW.

Cypher
CALL mv.create("top_users", "MATCH (u:User)-[c:CREATED]->(p:Post) RETURN u.id, COUNT(p) AS post_count ORDER BY post_count DESC LIMIT 100") YIELD name, status RETURN name, status;

mv.drop

CALL mv.drop(name) YIELD name, status

Drops a materialized view definition.

Cypher
CALL mv.drop("top_users") YIELD name, status RETURN name, status;

mv.due

CALL mv.due() YIELD name, query, overdue_sec

Returns materialized views that are overdue for refresh based on their interval.

Cypher
CALL mv.due() YIELD name, query, overdue_sec WHERE overdue_sec > 300 RETURN name, overdue_sec ORDER BY overdue_sec DESC;

mv.list

CALL mv.list() YIELD name, query

Lists all materialized view definitions.

Cypher
CALL mv.list() YIELD name, query RETURN name, size(query) AS query_length ORDER BY name;

mv.refresh

CALL mv.refresh(name) YIELD name, query, status

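Marks a materialized view as refreshed and returns its query. The caller is responsible for executing the returned query. Equivalent to ClickHouse SYSTEM REFRESH MATERIALIZED VIEW.

Cron-based refresh (runs every minute):

* * * * * echo "CALL mv.due() YIELD name WITH name CALL mv.refresh(name) YIELD status RETURN status;" | xgconsole

This cron line only marks each due view as refreshed. mv.refresh does not run the view's query itself, so you must also execute the returned query (as in the example below) whenever the cached results need to be recomputed.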

Cypher
CALL mv.refresh("top_users") YIELD name, query, status WITH name, query, status CALL apoc.cypher.run(query, {}) YIELD value RETURN name, status, COUNT(value) AS row_count;

mv.set_interval

CALL mv.set_interval(name, interval_sec) YIELD name, interval_sec

Sets the refresh interval in seconds for a materialized view.

Cypher
CALL mv.set_interval("top_users", 3600) YIELD name, interval_sec RETURN name, interval_sec;

Particle Physics

High-energy physics functions for particle detector analysis, event reconstruction, and kinematic calculations.

particle.aplanarity

particle.aplanarity(px_list, py_list, pz_list) -> float

Computes the aplanarity event-shape variable.

Cypher
MATCH (e:Event)-[:CONTAINS]->(p:Particle) WHERE e.id = "evt_001" WITH COLLECT(p.px) AS px, COLLECT(p.py) AS py, COLLECT(p.pz) AS pz RETURN particle.aplanarity(px, py, pz) AS aplanarity;

particle.corrected_mass

particle.corrected_mass(visible_mass, pt_miss) -> float

Returns the corrected mass accounting for missing transverse momentum.

Cypher
MATCH (e:Event {id: "evt_042"}) RETURN particle.corrected_mass(125.5, 47.3) AS corrected_mass;

particle.delta_r

particle.delta_r(eta1, phi1, eta2, phi2) -> float

Returns the angular separation deltaR in eta-phi space.

Cypher
MATCH (j1:Jet {id: "jet_1"}), (j2:Jet {id: "jet_2"}) RETURN particle.delta_r(j1.eta, j1.phi, j2.eta, j2.phi) AS separation_deltaR;
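deltaR is conventionally sqrt(deta^2 + dphi^2), with the azimuthal difference wrapped into [-pi, pi) so that angles on either side of the 0/2pi seam count as close. The following Python sketch implements that convention. Whether particle.delta_r wraps phi the same way is an assumption, so check it against a known pair.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular separation sqrt(d_eta^2 + d_phi^2) in eta-phi space."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


# Two jets just either side of phi = 0: d_eta = 0.3, wrapped d_phi = 0.2.
print(round(delta_r(0.5, 0.1, 0.2, 2.0 * math.pi - 0.1), 4))  # 0.3606
```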

particle.dira

particle.dira(momentum, flight_direction) -> float

Returns DIRA, the cosine of the angle between the momentum vector and the flight direction. Values close to 1 mean the momentum points along the flight path.

Cypher
MATCH (d:DecayChain {id: "B_decay_001"}) RETURN particle.dira([1.254, -0.387, 2.105], [0.891, -0.143, 0.883]) AS dira_cosine;

particle.flight_distance

particle.flight_distance(prod_vertex, decay_vertex) -> float

Returns the flight distance between production and decay vertices.

Cypher
MATCH (decay:Decay) RETURN particle.flight_distance([0.012, 0.008, 0.003], [3.450, 2.134, 1.678]) AS flight_distance_mm;
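DIRA and flight distance are both plain vector geometry: the cosine of the angle between two 3-vectors, and the Euclidean distance between two points. Here is a Python sketch for cross-checking the two functions. The helper names are illustrative, and units follow whatever the inputs use.

```python
import math


def dira(momentum, flight_direction):
    """Cosine of the angle between the momentum and flight-direction 3-vectors."""
    dot = sum(p * d for p, d in zip(momentum, flight_direction))
    norm_p = math.sqrt(sum(p * p for p in momentum))
    norm_d = math.sqrt(sum(d * d for d in flight_direction))
    if norm_p == 0.0 or norm_d == 0.0:
        raise ValueError("DIRA is undefined for a zero-length vector")
    return dot / (norm_p * norm_d)


def flight_distance(prod_vertex, decay_vertex):
    """Euclidean distance between production and decay vertices (input units)."""
    return math.dist(prod_vertex, decay_vertex)


print(round(dira([1.254, -0.387, 2.105], [0.891, -0.143, 0.883]), 3))
print(round(flight_distance([0.012, 0.008, 0.003], [3.450, 2.134, 1.678]), 3))
```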

particle.fox_wolfram

particle.fox_wolfram(px_list, py_list, pz_list, order) -> float

Computes a Fox-Wolfram moment of the given order.

Cypher
MATCH (e:Event)-[:CONTAINS]->(p:Particle) WHERE e.id = "evt_089" WITH COLLECT(p.px) AS px, COLLECT(p.py) AS py, COLLECT(p.pz) AS pz RETURN particle.fox_wolfram(px, py, pz, 2) AS fw_moment_2;

particle.impact_parameter

particle.impact_parameter(track, vertex) -> float

Returns the impact parameter distance of a track to a vertex.

Cypher
MATCH (t:Track {id: "trk_234"}), (v:Vertex {id: "pv_001"}) RETURN particle.impact_parameter(t, v) AS impact_parameter_um;

particle.invariant_mass

particle.invariant_mass(E1, px1, py1, pz1, E2, px2, py2, pz2) -> float

Computes the invariant mass of a two-particle system from 4-momenta.

Cypher
RETURN particle.invariant_mass(125.5, 0.8, 2.1, 45.3, 127.2, -1.2, 1.9, 44.7) AS invariant_mass_GeV;
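The invariant mass follows from the summed 4-momentum, m^2 = (E1 + E2)^2 - |p1 + p2|^2, in natural units (c = 1). The Python sketch below implements that formula. Clamping a slightly negative m^2 to zero is a choice made here, not documented engine behavior.

```python
import math


def invariant_mass(e1, px1, py1, pz1, e2, px2, py2, pz2):
    """Two-body invariant mass from 4-momenta (natural units, c = 1)."""
    e = e1 + e2
    px, py, pz = px1 + px2, py1 + py2, pz1 + pz2
    m2 = e * e - (px * px + py * py + pz * pz)
    # Measurement smearing can make m2 slightly negative; report 0 instead of NaN.
    return math.sqrt(max(m2, 0.0))


print(round(invariant_mass(125.5, 0.8, 2.1, 45.3, 127.2, -1.2, 1.9, 44.7), 2))  # 236.1
```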

particle.invariant_mass_pt

particle.invariant_mass_pt(pt1, eta1, phi1, m1, pt2, eta2, phi2, m2) -> float

Computes invariant mass from pT, eta, phi, and mass.

Cypher
MATCH (l1:Lepton {type: "muon"}), (l2:Lepton {type: "muon"}) WHERE l1.charge = 1 AND l2.charge = -1 RETURN particle.invariant_mass_pt(l1.pt, l1.eta, l1.phi, l1.mass, l2.pt, l2.eta, l2.phi, l2.mass) AS dilepton_mass_GeV;
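invariant_mass_pt takes collider coordinates instead of Cartesian components. Converting each particle with px = pT cos(phi), py = pT sin(phi), pz = pT sinh(eta) and E = sqrt(|p|^2 + m^2) reduces it to the ordinary 4-momentum calculation. Here is a self-contained Python sketch of that conversion; the function names are illustrative.

```python
import math


def four_vector(pt, eta, phi, mass):
    """(E, px, py, pz) from transverse momentum, pseudorapidity, azimuth, and mass."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass_pt(pt1, eta1, phi1, m1, pt2, eta2, phi2, m2):
    """Two-body invariant mass from collider coordinates (natural units, c = 1)."""
    e1, px1, py1, pz1 = four_vector(pt1, eta1, phi1, m1)
    e2, px2, py2, pz2 = four_vector(pt2, eta2, phi2, m2)
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    return math.sqrt(max(e * e - (px * px + py * py + pz * pz), 0.0))
```

For massless particles this agrees with the closed form m^2 = 2 pT1 pT2 (cosh(deta) - cos(dphi)), which makes a convenient spot check.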

particle.missing_et

particle.missing_et(px_list, py_list) -> float

Returns the missing transverse energy from lists of px/py.

Cypher
MATCH (e:Event {id: "evt_156"})-[:CONTAINS]->(p:Particle) WITH COLLECT(p.px) AS px, COLLECT(p.py) AS py RETURN particle.missing_et(px, py) AS missing_energy_GeV;
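Missing transverse energy is the magnitude of the negative vector sum of the visible transverse momenta. The Python sketch below assumes the px/py lists hold only the visible objects. The sign of the sum does not affect the magnitude.

```python
import math


def missing_et(px_list, py_list):
    """Magnitude of the negative vector sum of the visible transverse momenta."""
    if len(px_list) != len(py_list):
        raise ValueError("px and py lists must have the same length")
    mex = -sum(px_list)
    mey = -sum(py_list)
    return math.hypot(mex, mey)
```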

particle.pseudorapidity

particle.pseudorapidity(px, py, pz) -> float

Returns the pseudorapidity eta from a 3-momentum vector.

Cypher
MATCH (j:Jet {id: "jet_45"}) RETURN particle.pseudorapidity(j.px, j.py, j.pz) AS eta;

particle.rapidity

particle.rapidity(E, pz) -> float

Returns the rapidity y from energy and longitudinal momentum.

Cypher
MATCH (p:Particle {id: "pho_78"}) RETURN particle.rapidity(p.energy, p.pz) AS rapidity_y;
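Pseudorapidity depends only on direction, while rapidity also depends on energy. The two coincide for massless particles. Here is a Python sketch of the standard definitions, including pT as returned by particle.transverse_momentum. The error handling for edge cases is a choice made here.

```python
import math


def transverse_momentum(px: float, py: float) -> float:
    """pT = sqrt(px^2 + py^2)."""
    return math.hypot(px, py)


def pseudorapidity(px: float, py: float, pz: float) -> float:
    """eta = asinh(pz / pT), equivalent to -ln(tan(theta / 2))."""
    pt = transverse_momentum(px, py)
    if pt == 0.0:
        raise ValueError("eta is undefined along the beam axis (pT = 0)")
    return math.asinh(pz / pt)


def rapidity(energy: float, pz: float) -> float:
    """y = 0.5 * ln((E + pz) / (E - pz)); requires E > |pz|."""
    if energy <= abs(pz):
        raise ValueError("rapidity requires E > |pz|")
    return 0.5 * math.log((energy + pz) / (energy - pz))
```

With px = 3, py = 4, pz = 12 and a massless particle (E = |p| = 13), both functions return ln 5, about 1.609.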

particle.sphericity

particle.sphericity(px_list, py_list, pz_list) -> float

Computes the sphericity event-shape variable.

Cypher
MATCH (e:Event {id: "evt_203"})-[:CONTAINS]->(p:Particle) WITH COLLECT(p.px) AS px, COLLECT(p.py) AS py, COLLECT(p.pz) AS pz RETURN particle.sphericity(px, py, pz) AS sphericity_value;

particle.thrust

particle.thrust(px_list, py_list, pz_list) -> float

Computes the thrust event-shape variable.

Cypher
MATCH (e:Event)-[:CONTAINS]->(p:Particle) WHERE e.run_number = 2024 WITH e, COLLECT(p.px) AS px, COLLECT(p.py) AS py, COLLECT(p.pz) AS pz RETURN e.id, particle.thrust(px, py, pz) AS thrust ORDER BY thrust DESC LIMIT 1;

particle.transverse_momentum

particle.transverse_momentum(px, py) -> float

Returns the transverse momentum pT from px and py components.

Cypher
MATCH (l:Lepton {type: "muon"}) WHERE l.px IS NOT NULL RETURN l.id, particle.transverse_momentum(l.px, l.py) AS pT_GeV ORDER BY pT_GeV DESC LIMIT 10;

particle.vertex_chi2

particle.vertex_chi2(tracks_list) -> float

Returns the chi-squared of a vertex fit.

Cypher
MATCH (v:Vertex {id: "sv_001"})-[:CONTAINS]->(t:Track) WITH v, COLLECT(t) AS tracks WITH v, particle.vertex_chi2(tracks) AS chi2_fit WHERE chi2_fit < 10.0 RETURN v.id, chi2_fit;

Patent (Automated Test Generation)

Patent-protected automated test generation (ATG) functions for code coverage and mutation testing analysis.

atg_boundary_values

atg_boundary_values(node) -> list

Returns suggested boundary values for automated test inputs.

Cypher
MATCH (fn:Function {name: "process_transaction"}) RETURN fn.name, atg_boundary_values(fn) AS test_boundaries;

atg_coverage

atg_coverage(node) -> float

Returns the automated test generation coverage score for a code node.

Cypher
MATCH (fn:Function) RETURN fn.name, atg_coverage(fn) AS coverage_score ORDER BY coverage_score ASC LIMIT 20;

atg_mutation_score

atg_mutation_score(node) -> float

Returns the mutation testing score from ATG analysis.

Cypher
MATCH (fn:Function {module: "core_engine"}) WITH fn, atg_mutation_score(fn) AS mutation_score WHERE mutation_score > 0.85 RETURN fn.name, mutation_score ORDER BY mutation_score DESC;

atg_test_priority

atg_test_priority(node) -> float

Returns a priority score for which tests should be generated first.

Cypher
MATCH (fn:Function {in_hot_path: true}) RETURN fn.name, atg_test_priority(fn) AS priority ORDER BY priority DESC LIMIT 15;

Physics

Physics functions for ballistics, orbital mechanics, oceanography, and relativistic calculations.

mil.ballistic_range

mil.ballistic_range(velocity_mps, angle_deg) -> float

Computes the ballistic range in meters for a projectile.

Cypher
RETURN mil.ballistic_range(850.0, 35.0) AS range_meters;
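A drag-free, flat-ground model gives R = v^2 sin(2 theta) / g. The Python sketch below uses that model as a sanity check. This reference does not say whether mil.ballistic_range includes drag, launch height, or Earth curvature, so its results may differ. Air drag in particular shortens real ranges considerably at these speeds.

```python
import math

G0 = 9.80665  # standard gravity in m/s^2 (assumed; the engine's value is not documented)


def ballistic_range_vacuum(velocity_mps: float, angle_deg: float) -> float:
    """Drag-free range over flat ground: R = v^2 * sin(2 * theta) / g."""
    return velocity_mps ** 2 * math.sin(2.0 * math.radians(angle_deg)) / G0


print(round(ballistic_range_vacuum(850.0, 35.0) / 1000.0, 1))  # 69.2 (km)
```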

mil.radar_horizon

mil.radar_horizon(antenna_height_m, target_height_m?) -> float

Returns the radar horizon distance in meters.

Cypher
RETURN mil.radar_horizon(45.0, 8000.0) AS horizon_to_target_m, mil.radar_horizon(45.0) AS horizon_sea_level_m;
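A common radar-horizon model accounts for standard atmospheric refraction by using an effective Earth radius of 4/3 times the true radius. That gives d = sqrt(2 k R h_antenna) + sqrt(2 k R h_target). Here is a Python sketch of that model. The 4/3 factor and the 6,371 km radius are assumptions, not documented parameters of mil.radar_horizon.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius
K_FACTOR = 4.0 / 3.0          # assumed standard-refraction factor


def radar_horizon_m(antenna_height_m: float, target_height_m: float = 0.0) -> float:
    """Radar line-of-sight distance in meters over a smooth, refracting Earth."""
    effective_radius = K_FACTOR * EARTH_RADIUS_M
    return (math.sqrt(2.0 * effective_radius * antenna_height_m)
            + math.sqrt(2.0 * effective_radius * target_height_m))
```

For a 45 m antenna and a sea-level target this gives about 27.65 km.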

ocean.drift

ocean.drift(lat, lon, current_speed, current_dir, time_sec) -> [lat, lon]

Predicts position after drifting in an ocean current.

Cypher
MATCH (s:Ship {id: "SS_001"}) RETURN ocean.drift(s.lat, s.lon, 1.2, 45.0, 3600.0) AS drifted_position;

ocean.sound_speed

ocean.sound_speed(temp_c, salinity_psu, depth_m) -> float

Returns the speed of sound in seawater in m/s using the UNESCO formula.

Cypher
RETURN ocean.sound_speed(15.5, 35.0, 1000.0) AS sound_speed_ms;

orbital.escape_velocity

orbital.escape_velocity(radius_m, mass_kg) -> float

Returns the escape velocity in m/s at a given radius from a body.

Cypher
WITH 6.371e6 AS earth_radius_m, 5.972e24 AS earth_mass_kg RETURN orbital.escape_velocity(earth_radius_m, earth_mass_kg) AS escape_velocity_ms;
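Escape velocity is sqrt(2GM/r). The Python sketch below uses the CODATA 2018 value of G. The engine's constant may differ in the last digits.

```python
import math

G = 6.67430e-11  # gravitational constant, m^3 kg^-1 s^-2 (CODATA 2018)


def escape_velocity(radius_m: float, mass_kg: float) -> float:
    """Minimum speed to escape a body's gravity from distance radius_m: sqrt(2GM/r)."""
    return math.sqrt(2.0 * G * mass_kg / radius_m)


print(round(escape_velocity(6.371e6, 5.972e24)))  # 11186
```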

orbital.hohmann

orbital.hohmann(r1_m, r2_m, mass_kg) -> map

Computes delta-v values for a Hohmann transfer orbit between two radii.

Cypher
WITH 6.678e6 AS leo_m, 4.224e7 AS geo_m, 5.972e24 AS earth_mass_kg RETURN orbital.hohmann(leo_m, geo_m, earth_mass_kg) AS transfer_deltav;
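A Hohmann transfer needs two burns: one at r1 to enter the transfer ellipse and one at r2 to circularize. With mu = GM and a transfer semi-major axis of a = (r1 + r2) / 2, the vis-viva equation gives the ellipse speed at both radii. Here is a Python sketch. The map keys it returns (dv1, dv2, total, transfer_time_s) are illustrative and may not match the keys orbital.hohmann returns.

```python
import math

G = 6.67430e-11  # CODATA 2018


def hohmann(r1_m: float, r2_m: float, mass_kg: float) -> dict:
    """Delta-v for a two-burn Hohmann transfer between circular orbits r1 and r2."""
    mu = G * mass_kg
    a = (r1_m + r2_m) / 2.0                       # transfer-ellipse semi-major axis
    v1 = math.sqrt(mu / r1_m)                     # circular speed at r1
    v2 = math.sqrt(mu / r2_m)                     # circular speed at r2
    vt1 = math.sqrt(mu * (2.0 / r1_m - 1.0 / a))  # vis-viva: ellipse speed at r1
    vt2 = math.sqrt(mu * (2.0 / r2_m - 1.0 / a))  # vis-viva: ellipse speed at r2
    dv1, dv2 = abs(vt1 - v1), abs(v2 - vt2)
    return {
        "dv1": dv1,
        "dv2": dv2,
        "total": dv1 + dv2,
        "transfer_time_s": math.pi * math.sqrt(a ** 3 / mu),  # half the ellipse period
    }


t = hohmann(6.678e6, 4.224e7, 5.972e24)
print(round(t["dv1"]), round(t["dv2"]), round(t["total"]))
```

For the LEO-to-GEO radii in the example, the total comes to about 3.9 km/s.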

orbital.period

orbital.period(semi_major_axis_m, central_mass_kg) -> float

Returns the orbital period in seconds from Kepler's third law.

Cypher
WITH 4.224e7 AS geostationary_sma_m, 5.972e24 AS earth_mass_kg RETURN orbital.period(geostationary_sma_m, earth_mass_kg) / 86400.0 AS period_days;
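Kepler's third law gives T = 2 pi sqrt(a^3 / GM). Here is a Python sketch for cross-checking, using the CODATA 2018 value of G.

```python
import math

G = 6.67430e-11  # CODATA 2018


def orbital_period(semi_major_axis_m: float, central_mass_kg: float) -> float:
    """Kepler's third law: T = 2*pi*sqrt(a^3 / (G*M)), in seconds."""
    return 2.0 * math.pi * math.sqrt(semi_major_axis_m ** 3 / (G * central_mass_kg))


print(round(orbital_period(4.224e7, 5.972e24) / 86_400.0, 3))  # about 1.0 day
```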

orbital.velocity

orbital.velocity(radius_m, central_mass_kg) -> float

Returns the circular orbital velocity in m/s at a given radius.

Cypher
WITH 6.678e6 AS leo_orbit_m, 5.972e24 AS earth_mass_kg RETURN orbital.velocity(leo_orbit_m, earth_mass_kg) AS leo_velocity_ms;
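Circular orbital speed is sqrt(GM/r), which is the escape velocity at the same radius divided by sqrt(2). Here is a Python sketch using the CODATA 2018 value of G.

```python
import math

G = 6.67430e-11  # CODATA 2018


def circular_velocity(radius_m: float, central_mass_kg: float) -> float:
    """Speed of a circular orbit at radius_m: sqrt(G*M / r)."""
    return math.sqrt(G * central_mass_kg / radius_m)


v_leo = circular_velocity(6.678e6, 5.972e24)
# Escape velocity at the same radius is sqrt(2) times the circular speed.
v_esc = math.sqrt(2.0 * G * 5.972e24 / 6.678e6)
print(round(v_leo), round(v_esc / v_leo, 6))
```

At the 6,678 km radius from the example this is about 7.73 km/s.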

physics.gps_correction

physics.gps_correction(orbit_radius_m, velocity_mps) -> float

Returns the relativistic clock correction in seconds per second for GPS.

Cypher
WITH 2.662e7 AS gps_orbit_radius_m, 3874.0 AS gps_velocity_ms RETURN physics.gps_correction(gps_orbit_radius_m, gps_velocity_ms) AS relativistic_correction;
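The GPS clock correction combines two effects. The satellite clock sits higher in Earth's gravitational potential, so it runs fast. It also moves at orbital speed, so special-relativistic time dilation makes it run slow. The Python sketch below uses the weak-field approximation. It assumes the reference clock sits at Earth's mean radius and ignores Earth's rotation and orbital eccentricity. This reference does not document the exact reference point or sign convention of physics.gps_correction.

```python
G = 6.67430e-11          # CODATA 2018
C = 299_792_458.0        # speed of light, m/s
EARTH_MASS_KG = 5.972e24
EARTH_RADIUS_M = 6.371e6  # assumed ground-clock radius; Earth rotation ignored


def gps_rate_offset(orbit_radius_m: float, velocity_mps: float) -> float:
    """Fractional rate of an orbiting clock relative to a ground clock (weak field)."""
    gravitational = G * EARTH_MASS_KG / C ** 2 * (1.0 / EARTH_RADIUS_M - 1.0 / orbit_radius_m)
    kinematic = velocity_mps ** 2 / (2.0 * C ** 2)
    return gravitational - kinematic


drift_us_per_day = gps_rate_offset(2.662e7, 3874.0) * 86_400 * 1e6
print(f"{drift_us_per_day:.1f} us/day")  # about 38.5
```

With the example's orbit radius and speed, the net rate is about 4.5e-10. That is roughly 38 microseconds per day, the commonly quoted figure.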

physics.lorentz

physics.lorentz(velocity_mps) -> float

Returns the Lorentz factor gamma for a given velocity.

Cypher
WITH 0.9 * 299792458 AS relativistic_velocity_ms RETURN physics.lorentz(relativistic_velocity_ms) AS gamma_factor;
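The Lorentz factor is gamma = 1 / sqrt(1 - v^2 / c^2). Here is a Python sketch. Raising an error at or above c is a choice made here, because the engine's behavior for v >= c is not documented.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def lorentz_gamma(velocity_mps: float) -> float:
    """Lorentz factor 1 / sqrt(1 - beta^2) with beta = v / c."""
    beta = velocity_mps / C
    if abs(beta) >= 1.0:
        raise ValueError("speed must be below c")
    return 1.0 / math.sqrt(1.0 - beta * beta)


print(round(lorentz_gamma(0.9 * C), 3))  # 2.294
```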

physics.time_dilation

physics.time_dilation(velocity_mps) -> float

Returns the Lorentz time dilation factor at a given velocity.

Cypher
WITH 10000000.0 AS high_velocity_ms RETURN physics.time_dilation(high_velocity_ms) AS time_dilation_factor;

Predicate Functions

Type and existence check functions for validation and filtering.

isempty

isempty(value) -> bool

Returns true if a list, map, or string is empty.

Cypher
MATCH (u:User {id: 1})
WHERE isempty(u.tags)
RETURN u.name, u.tags AS empty_tags

isinf

isinf(x) -> bool

Returns true if the value is positive or negative infinity.

Cypher
MATCH (r:Reading {sensor_id: 'A1'})
WHERE NOT isinf(r.temperature)
RETURN r.timestamp, r.temperature
LIMIT 10

isnan

isnan(x) -> bool

Returns true if the value is NaN.

Cypher
MATCH (m:Measurement)
WHERE isnan(m.calibration_factor)
RETURN m.id, m.device, m.calibration_factor AS invalid_value
ORDER BY m.timestamp DESC

ProcessFlow Functions

Traverse and validate flow graphs, state machines, and decision trees.

xray.flow_trace

CALL xray.flow_trace(start_label, start_prop, start_val, max_depth) YIELD step, depth, node_name, node_type, condition, action

Traces execution paths through flow graphs (decision trees, state machines). Follows FLOW_EDGE, NEXT, DECISION, and TRANSITION edges.

Cypher
CALL xray.flow_trace('LoanApproval', 'loan_id', '12345', 5)
YIELD step, depth, node_name, node_type, condition, action
RETURN step, depth, node_name, node_type, condition, action
ORDER BY step

xray.flow_validate

CALL xray.flow_validate(flow_name) YIELD flow_name, node_count, start_nodes, end_nodes, orphan_nodes, valid, issues

Validates flow graph structure: checks for start/end nodes, orphans, and connectivity.

Cypher
CALL xray.flow_validate('OrderProcessing')
YIELD flow_name, node_count, valid, issues
WHERE NOT valid
RETURN flow_name, node_count, issues

Protocol Functions (xrayProtocol)

Wire format reference for xrayProtocol v1, the native transport on port 7689. Every message is an 8-byte header followed by a payload.

XRAYPROTOCOL_FRAME

Frame: [u32 payload_length][u8 msg_type][u8 flags][u16 query_id][payload...]

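Every xrayProtocol message is an 8-byte header followed by a payload. payload_length counts only the payload bytes, not the header. Bit 0 of flags marks an LZ4-compressed payload. query_id identifies which query a frame belongs to, so several queries can be multiplexed on one connection. Default port: 7689.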

Wire Format
HELLO Frame:
  [u32 0x0000002A]        // payload length: 42 bytes
  [u8 0x01]              // msg_type: HELLO
  [u8 0x00]              // flags: no compression
  [u16 0x0001 LE]        // query_id: 1
  [payload: 42 bytes]    // auth + version
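The header can be packed and parsed with Python's struct module. This sketch assumes every multi-byte header field is little-endian. The examples mark the u16 fields as LE, but payload_length's byte order is not stated, so verify it against a live server.

```python
import struct

# Header: u32 payload_length, u8 msg_type, u8 flags, u16 query_id.
# "<" = little-endian with no padding, so the header is exactly 8 bytes.
HEADER = struct.Struct("<IBBH")
MSG_HELLO = 0x01
FLAG_LZ4 = 0x01  # flags bit 0: payload is LZ4-compressed


def pack_frame(msg_type: int, payload: bytes, query_id: int = 0, flags: int = 0) -> bytes:
    """Prefix a payload with the 8-byte header; payload_length excludes the header."""
    return HEADER.pack(len(payload), msg_type, flags, query_id) + payload


def unpack_header(buf: bytes) -> tuple:
    """Return (payload_length, msg_type, flags, query_id) from the start of buf."""
    if len(buf) < HEADER.size:
        raise ValueError("a frame header needs 8 bytes")
    return HEADER.unpack_from(buf, 0)


frame = pack_frame(MSG_HELLO, b"\x00" * 42, query_id=1)
print(frame[:8].hex())  # 2a00000001000100
```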

XRAYPROTOCOL_HELLO

HELLO (0x01): [u16 version=1][u16 capabilities][u32 token_len][token_bytes][u32 db_len?][db_bytes?]

Client→Server. token_bytes carries the auth token as a UTF-8 'user:password' string. db_bytes names the target database and is optional (empty selects the default). The server responds with HELLO_OK (0x02).

Wire Format
HELLO payload (frame msg_type=0x01):
  [u16 0x0001 LE]        // version: 1
  [u16 0x0007 LE]        // capabilities: PROFILE|EXPLAIN|COMPRESSED
  [u32 0x0000000A LE]    // token_len: 10
  [10 bytes: "admin:pass"]
  [u32 0x00000000 LE]    // db_len: 0 (use default)
  // Server responds: HELLO_OK (0x02) frame
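Here is a Python sketch that builds the HELLO payload in the field order listed above. token_len counts the bytes of the UTF-8 token, so 'admin:pass' encodes as 10. Little-endian integers and the capability value 0x0007 are taken from the example. This reference does not define the individual capability bits. The sketch always sends db_len, using 0 for the default database.

```python
import struct


def hello_payload(user: str, password: str, database: str = "",
                  version: int = 1, capabilities: int = 0x0007) -> bytes:
    """HELLO (0x01) payload: version, capabilities, token, then optional database."""
    token = f"{user}:{password}".encode("utf-8")
    db = database.encode("utf-8")
    return (struct.pack("<HHI", version, capabilities, len(token)) + token
            + struct.pack("<I", len(db)) + db)  # db_len 0 selects the default database


payload = hello_payload("admin", "pass")
print(len(payload))  # 22: 2 + 2 + 4 + 10 + 4
```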

XRAYPROTOCOL_EXECUTE

EXECUTE (0x03): [u8 language][u32 query_len][query_bytes][u32 param_count][params...][u32 options]

Client→Server. language: 0=Cypher, 1=GFQL. params: each is String name + typed value. options: bitmask (1=PROFILE, 2=EXPLAIN, 4=READ_ONLY). Server responds: SCHEMA(0x04) + BATCH(0x05)* + COMPLETE(0x06), or ERROR(0x07).

Wire Format
EXECUTE payload (frame msg_type=0x03):
  [u8 0x00]              // language: Cypher
  [u32 0x0000001F LE]    // query_len: 31
  [31 bytes: "MATCH (n:User) RETURN n LIMIT 1"]
  [u32 0x00000000 LE]    // param_count: 0
  [u32 0x00000001 LE]    // options: PROFILE
  // Server responds: SCHEMA, BATCH(es), COMPLETE
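Here is a Python sketch of an EXECUTE payload without parameters. The language and option codes come from the description above. Parameter encoding (name plus typed value) is not fully specified here, so the sketch always sends param_count = 0.

```python
import struct

LANG_CYPHER, LANG_GFQL = 0, 1
OPT_PROFILE, OPT_EXPLAIN, OPT_READ_ONLY = 1, 2, 4  # options bitmask


def execute_payload(query: str, language: int = LANG_CYPHER, options: int = 0) -> bytes:
    """EXECUTE (0x03) payload with no parameters."""
    q = query.encode("utf-8")
    return (struct.pack("<BI", language, len(q)) + q
            + struct.pack("<I", 0)         # param_count: this sketch sends none
            + struct.pack("<I", options))  # e.g. OPT_PROFILE | OPT_READ_ONLY


p = execute_payload("MATCH (n:User) RETURN n LIMIT 1", options=OPT_PROFILE)
```

Encoding the example query with PROFILE reproduces the query_len of 31 (0x1F) shown above.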

XRAYPROTOCOL_RESULTS

SCHEMA(0x04): [u16 col_count][col_defs...] | BATCH(0x05): [u32 row_count][rows...] | COMPLETE(0x06): [u32 total_rows][u32 exec_us][u32 compile_us]

Server→Client result flow: one SCHEMA, zero or more BATCHes, one COMPLETE. Each value in BATCH: u8 type_tag + value bytes. Type tags match ColumnType enum (0x01=null, 0x02=bool, 0x03=int64, 0x04=double, 0x05=string).

Wire Format
SCHEMA + BATCH + COMPLETE payload chain:
  SCHEMA (0x04): [u16 0x0001 LE] [u8 0x03] [u32 0x00000001] ['n']
    // 1 column, type INT64 (0x03), name_len=1, name='n'
  BATCH (0x05): [u32 0x00000001 LE] [u8 0x03] [i64 0x000000000000012C LE]
    // 1 row, column type INT64, value=300
  COMPLETE (0x06): [u32 0x00000001 LE] [u32 0x000003E8 LE] [u32 0x000000FA LE]
    // total_rows=1, exec_time=1000 µs, compile_time=250 µs
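Here is a Python sketch that decodes the three result payloads in the sample above. It handles only the INT64 type tag used in the example. Encodings for other ColumnType values are left out rather than guessed, and little-endian integers are assumed.

```python
import struct

TAG_INT64 = 0x03


def parse_schema(buf: bytes) -> list:
    """SCHEMA (0x04): u16 col_count, then per column u8 type, u32 name_len, name."""
    (count,) = struct.unpack_from("<H", buf, 0)
    off, cols = 2, []
    for _ in range(count):
        tag, name_len = struct.unpack_from("<BI", buf, off)
        off += 5
        cols.append((buf[off:off + name_len].decode("utf-8"), tag))
        off += name_len
    return cols


def parse_batch_int64(buf: bytes, n_cols: int) -> list:
    """BATCH (0x05): u32 row_count, then per value a u8 type tag and its bytes.

    Only INT64 (tag 0x03, 8-byte signed) is decoded in this sketch.
    """
    (rows,) = struct.unpack_from("<I", buf, 0)
    off, out = 4, []
    for _ in range(rows):
        row = []
        for _ in range(n_cols):
            tag = buf[off]
            off += 1
            if tag != TAG_INT64:
                raise NotImplementedError(f"type tag {tag:#04x} not handled here")
            (value,) = struct.unpack_from("<q", buf, off)
            off += 8
            row.append(value)
        out.append(row)
    return out


def parse_complete(buf: bytes) -> tuple:
    """COMPLETE (0x06): (total_rows, exec_us, compile_us)."""
    return struct.unpack_from("<III", buf, 0)
```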

XRAYPROTOCOL_BULK

BULK_INSERT_NODES(0x21) / BULK_UPSERT_NODES(0x27): [u32 rows][u16 cols][col_defs...][col_data...]

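Columnar batch write. Open the batch with BULK_INSERT_BEGIN (0x20), send the BULK_INSERT_NODES or BULK_UPSERT_NODES frames, and close it with BULK_INSERT_COMMIT (0x24). The server acknowledges with 0x25 or reports an error with 0x26. Upserts use the first property as the key. They locate existing nodes with a 3-tier lookup: GID cache (O(1)), then label-property index (O(1)), then full scan (O(N)).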

Wire Format
BULK_INSERT_NODES (0x21) payload:
  [u32 0x00000002 LE]    // rows: 2
  [u16 0x0003 LE]        // cols: 3 (id, name, age)
  // Column definitions:
  [u8 0x05] [u32 0x00000002] ['id']       // STRING, name_len=2
  [u8 0x05] [u32 0x00000004] ['name']    // STRING, name_len=4
  [u8 0x03] [u32 0x00000003] ['age']     // INT64, name_len=3
  // Column data (row 1):
  [u8 0x05] [u32 0x00000002] ['42'] [u8 0x05] [u32 0x00000005] ['Alice'] [u8 0x03] [i64 28]
  // Column data (row 2):
  [u8 0x05] [u32 0x00000002] ['43'] [u8 0x05] [u32 0x00000003] ['Bob'] [u8 0x03] [i64 35]
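Here is a Python sketch that builds a BULK_INSERT_NODES payload in the example's layout: counts, then column definitions, then each row's values with a type tag. Computing every length prefix from the encoded bytes avoids hand-counting mistakes. Little-endian integers are assumed, and only the STRING and INT64 tags from the example are handled.

```python
import struct

T_INT64, T_STRING = 0x03, 0x05


def encode_value(tag: int, value) -> bytes:
    """One tagged value: u8 tag, then u32 length + UTF-8 bytes, or an i64."""
    if tag == T_STRING:
        data = value.encode("utf-8")
        return struct.pack("<BI", tag, len(data)) + data
    if tag == T_INT64:
        return struct.pack("<Bq", tag, value)
    raise ValueError(f"type tag {tag:#04x} not handled in this sketch")


def bulk_nodes_payload(columns, rows) -> bytes:
    """columns: [(name, tag), ...]; rows: value tuples in column order."""
    out = struct.pack("<IH", len(rows), len(columns))
    for name, tag in columns:
        name_bytes = name.encode("utf-8")
        out += struct.pack("<BI", tag, len(name_bytes)) + name_bytes
    for row in rows:
        for (_, tag), value in zip(columns, row):
            out += encode_value(tag, value)
    return out


cols = [("id", T_STRING), ("name", T_STRING), ("age", T_INT64)]
payload = bulk_nodes_payload(cols, [("42", "Alice", 28), ("43", "Bob", 35)])
```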

RAG/LLM Functions

Text processing, embeddings, semantic search, and context management for retrieval-augmented generation.

embed

embed(text) -> list

Generates a 384-dimensional vector embedding from text using ONNX or hash fallback.

Cypher
MATCH (doc:Document {id: 'doc_001'})
SET doc.embedding = embed(doc.content)
RETURN doc.id, size(doc.embedding) AS dimensions

vector_dot_product

vector_dot_product(vec1, vec2) -> float

Returns the dot product of two numeric vectors.

Cypher
WITH embed('machine learning') AS query_emb
MATCH (doc:Document)
WHERE doc.embedding IS NOT NULL
WITH doc, vector_dot_product(query_emb, doc.embedding) AS similarity
RETURN doc.title, similarity
ORDER BY similarity DESC
LIMIT 5

vector_normalize

vector_normalize(vec) -> list

Returns the L2-normalized unit vector.

Cypher
MATCH (doc:Document)
WHERE doc.embedding IS NOT NULL
SET doc.normalized_embedding = vector_normalize(doc.embedding)
RETURN count(doc) AS documents_normalized

vector_norm

vector_norm(vec) -> float

Returns the L2 (Euclidean) norm of a vector.

Cypher
WITH [3.0, 4.0] AS vec
RETURN vector_norm(vec) AS magnitude
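vector_dot_product only ranks by cosine similarity when both vectors have unit length. Otherwise, longer vectors score higher regardless of direction. Here is a Python sketch of the vector helpers on plain lists. This reference does not say whether embed() already returns unit-length vectors.

```python
import math


def vector_norm(v):
    """L2 (Euclidean) norm."""
    return math.sqrt(sum(x * x for x in v))


def vector_normalize(v):
    """Unit vector in the same direction; a zero vector has no direction."""
    n = vector_norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def vector_dot_product(a, b):
    """Sum of element-wise products; both vectors must have the same dimension."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the normalized vectors, in [-1, 1]."""
    return vector_dot_product(vector_normalize(a), vector_normalize(b))
```

The raw dot product of [1, 0] and [5, 0] is 5.0, but their cosine similarity is 1.0. This is why embeddings are usually normalized before ranking by dot product.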

vector_scale

vector_scale(vec, scalar) -> list

Returns a vector scaled by a scalar multiplier.

Cypher
WITH [1.0, 2.0, 3.0] AS vec
RETURN vector_scale(vec, 2.5) AS scaled

vector_add

vector_add(vec1, vec2) -> list

Returns the element-wise sum of two vectors.

Cypher
WITH [1.0, 2.0, 3.0] AS v1, [4.0, 5.0, 6.0] AS v2
RETURN vector_add(v1, v2) AS sum_vec

vector_dimension

vector_dimension(vec) -> integer

Returns the number of dimensions in a vector.

Cypher
MATCH (doc:Document)
WHERE doc.embedding IS NOT NULL
RETURN doc.id, vector_dimension(doc.embedding) AS dims
LIMIT 1

bm25_score

bm25_score(query, document, doc_count, avg_doc_len) -> float

Computes the BM25 relevance score for a query against a document.

Cypher
MATCH (doc:Document)
WITH doc,
     'information retrieval' AS query,
     1000 AS total_docs,
     450 AS avg_length
RETURN doc.title,
       bm25_score(query, doc.content, total_docs, avg_length) AS score
ORDER BY score DESC
LIMIT 10

tf_idf

tf_idf(term, document, corpus_size, doc_freq) -> float

Computes the TF-IDF score for a term in a document.

Cypher
WITH 'neural' AS term,
     'deep learning requires neural networks' AS doc,
     500 AS corpus_size,
     42 AS docs_with_term
RETURN tf_idf(term, doc, corpus_size, docs_with_term) AS score
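TF-IDF has several variants, and this reference does not say which one tf_idf uses. The Python sketch below implements one common form: the term count divided by document length, times IDF = ln(N / df). Whitespace tokenization, lower-casing, and returning 0 for a zero document frequency are assumptions made here.

```python
import math


def tf_idf(term: str, document: str, corpus_size: int, doc_freq: int) -> float:
    """Length-normalized term frequency times ln(corpus_size / doc_freq)."""
    words = document.lower().split()
    if not words or doc_freq <= 0:
        return 0.0
    tf = words.count(term.lower()) / len(words)
    idf = math.log(corpus_size / doc_freq)
    return tf * idf
```

In the example, 'neural' appears once in five words, so this variant scores 0.2 * ln(500 / 42), about 0.50.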

text_similarity

text_similarity(s1, s2) -> float

Returns a 0-1 similarity score between two text strings.

Cypher
WITH 'The quick brown fox' AS text1,
     'A fast brown fox' AS text2
RETURN text_similarity(text1, text2) AS similarity

levenshtein_distance

levenshtein_distance(s1, s2) -> integer

Returns the edit distance between two strings.

Cypher
WITH 'kitten' AS s1, 'sitting' AS s2
RETURN levenshtein_distance(s1, s2) AS edit_distance
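Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The standard dynamic-programming solution is sketched below in Python with a single rolling row. This reference does not say whether the engine counts bytes or Unicode characters.

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Minimum single-character insertions, deletions, or substitutions."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the DP row as short as the shorter string
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (c1 != c2)))  # substitution (free on match)
        prev = curr
    return prev[-1]
```

'kitten' to 'sitting' gives 3: two substitutions and one insertion.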

ngram_similarity

ngram_similarity(s1, s2, n?) -> float

Returns the n-gram similarity score between two strings.

Cypher
WITH 'analysis' AS word1, 'analytics' AS word2
RETURN ngram_similarity(word1, word2, 3) AS trigram_sim

relevance_score

relevance_score(query_vec, doc_vec) -> float

Returns a composite relevance score between a query and document vector.

Cypher
WITH embed('natural language processing') AS query_vec
MATCH (doc:Document)
WHERE doc.embedding IS NOT NULL
WITH doc, relevance_score(query_vec, doc.embedding) AS score
RETURN doc.title, score
ORDER BY score DESC
LIMIT 5

context_rank

context_rank(query, candidates) -> list

Ranks candidate texts by relevance to a query.

Cypher
WITH 'What is machine learning?' AS query,
     ['ML is a type of AI',
      'Cooking is fun',
      'Algorithms enable machine learning'] AS candidates
RETURN context_rank(query, candidates) AS ranked_candidates

extract_keywords

extract_keywords(text, max_keywords?) -> list

Extracts the most significant keywords from a text.

Cypher
MATCH (article:Article {id: 'art_123'})
RETURN extract_keywords(article.body, 10) AS top_keywords

text_chunk_by_size

text_chunk_by_size(text, max_chars) -> list

Splits text into chunks of at most max_chars characters.

Cypher
MATCH (doc:Document {id: 'doc_001'})
WITH text_chunk_by_size(doc.content, 256) AS chunks
UNWIND chunks AS chunk
RETURN chunk AS text_chunk
LIMIT 5

text_chunk_by_words

text_chunk_by_words(text, max_words) -> list

Splits text into chunks of at most max_words words.

Cypher
MATCH (doc:Document)
SET doc.chunks = text_chunk_by_words(doc.content, 100)
RETURN count(doc) AS documents_chunked

text_chunk_overlap

text_chunk_overlap(text, chunk_size, overlap) -> list

Splits text into overlapping chunks of given size and overlap.

Cypher
MATCH (manual:Manual {product: 'XR-2000'})
WITH text_chunk_overlap(manual.text, 512, 64) AS chunks
UNWIND chunks AS chunk
CREATE (chunk_node:Chunk {text: chunk})
RETURN count(chunk_node) AS chunks_created
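Here is a sliding-window sketch of overlapping chunking in Python. It assumes chunk_size and overlap are measured in characters and that each window advances by chunk_size - overlap. This reference does not document the engine's unit (characters, words, or tokens) or how it handles the final partial chunk.

```python
def chunk_overlap(text: str, chunk_size: int, overlap: int) -> list:
    """Split text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_overlap("abcdefghij", 4, 1))  # ['abcd', 'defg', 'ghij']
```

With the example's settings of 512 and 64, each chunk would start 448 characters after the previous one.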

estimate_tokens

estimate_tokens(text) -> integer

Estimates the token count for a text string.

Cypher
MATCH (prompt:Prompt {name: 'rag_context'})
RETURN estimate_tokens(prompt.template) AS token_count

fits_in_context

fits_in_context(text, max_tokens) -> bool

Returns true if the text fits within the given token limit.

Cypher
MATCH (doc:Document)
WHERE fits_in_context(doc.content, 2048)
RETURN doc.title, estimate_tokens(doc.content) AS tokens
LIMIT 20

truncate_to_tokens

truncate_to_tokens(text, max_tokens) -> string

Truncates text to fit within a token limit.

Cypher
MATCH (doc:Document {id: 'doc_001'})
WITH truncate_to_tokens(doc.content, 1024) AS truncated
RETURN truncated AS limited_context

context_utilization

context_utilization(text, max_tokens) -> float

Returns the fraction of the context window used by the text.

Cypher
MATCH (query:Query {id: 'q_789'})
WITH query, context_utilization(query.prompt, 4096) AS usage
WHERE usage > 0.8
RETURN query.id, usage AS context_fraction
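The four context-window functions above are related: utilization is the estimated token count divided by the budget, and a text fits when that ratio is at most 1.0. The Python sketch below shows only that relationship. It uses a deliberately crude, hypothetical tokenizer (one token per whitespace-separated word), so its counts will not match the engine's estimate_tokens.

```python
def estimate_tokens(text: str) -> int:
    # Hypothetical stand-in tokenizer: one token per whitespace-separated word.
    return len(text.split())


def fits_in_context(text: str, max_tokens: int) -> bool:
    return estimate_tokens(text) <= max_tokens


def truncate_to_tokens(text: str, max_tokens: int) -> str:
    # Keep the first max_tokens "tokens" under the same stand-in tokenizer.
    return " ".join(text.split()[:max_tokens])


def context_utilization(text: str, max_tokens: int) -> float:
    # Fraction of the budget used; values above 1.0 mean the text overflows.
    return estimate_tokens(text) / max_tokens


prompt = "one two three four five"
print(context_utilization(prompt, 4))  # 1.25
print(truncate_to_tokens(prompt, 4))   # one two three four
```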

format_prompt

format_prompt(template, variables_map) -> string

Substitutes variables into a prompt template string.

Cypher
WITH {user: 'Alice', topic: 'quantum computing'} AS vars
RETURN format_prompt(
  'Hello {user}, tell me about {topic}',
  vars
) AS formatted_prompt
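Placeholder substitution of this kind can be mimicked in Python with str.format_map. This is an analogy rather than the engine's code: how format_prompt handles a placeholder with no matching key is not documented here, so the sketch simply leaves such placeholders in place (an assumption).

```python
class _KeepMissing(dict):
    # Return unknown placeholders unchanged instead of raising KeyError.
    def __missing__(self, key):
        return "{" + key + "}"


def format_prompt(template: str, variables: dict) -> str:
    return template.format_map(_KeepMissing(variables))


print(format_prompt("Hello {user}, tell me about {topic}",
                    {"user": "Alice", "topic": "quantum computing"}))
# Hello Alice, tell me about quantum computing
```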

graph_context

graph_context(node, depth?) -> string

Returns a textual context summary of a node and its neighborhood.

Cypher
MATCH (person:Person {id: 'p_456'})
RETURN graph_context(person, 2) AS neighborhood_summary

word_count

word_count(text) -> integer

Returns the number of words in a text string.

Cypher
MATCH (article:Article)
WHERE word_count(article.body) > 500
RETURN article.title, word_count(article.body) AS word_len
ORDER BY word_len DESC
LIMIT 10

sentence_count

sentence_count(text) -> integer

Returns the number of sentences in a text string.

Cypher
MATCH (summary:Summary)
WITH summary, sentence_count(summary.content) AS sentences
WHERE sentences >= 5 AND sentences <= 20
RETURN summary.id, sentences

readability_score

readability_score(text) -> float

Returns the Flesch-Kincaid readability score of a text.

Cypher
MATCH (content:Content)
WITH content, readability_score(content.body) AS readability
WHERE readability < 12.0
RETURN content.id, readability AS flesch_kincaid_grade
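The example filters on values below 12.0 and labels the column flesch_kincaid_grade, which matches the Flesch-Kincaid grade-level formula. The Python sketch below shows that formula, assuming readability_score returns the grade-level variant (not Flesch Reading Ease) and taking word, sentence, and syllable counts as inputs. How the engine counts syllables is not documented here.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level computed from pre-counted words, sentences, syllables."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# 100 words, 5 sentences, 150 syllables: roughly a 10th-grade text.
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
```

Longer sentences and more syllables per word both raise the grade, which is why shorter, plainer text passes a threshold such as readability < 12.0.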

text_fingerprint

text_fingerprint(text) -> integer

Returns a fast hash fingerprint of a text string.

Cypher
MATCH (doc:Document)
RETURN doc.id, text_fingerprint(doc.content) AS content_hash
LIMIT 100

reciprocal_rank_fusion

reciprocal_rank_fusion(rank_lists, k?) -> list

Merges multiple ranked lists using reciprocal rank fusion.

Cypher
WITH [['doc_1', 'doc_2', 'doc_3'],
      ['doc_2', 'doc_1', 'doc_4']] AS rankings
RETURN reciprocal_rank_fusion(rankings, 60) AS fused_ranking
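Reciprocal rank fusion gives each item a score of 1 / (k + rank) from every list it appears in and sums those scores, so items that rank well across several lists rise to the top; a larger k flattens the gap between top and lower ranks. The Python sketch below implements that standard formula with 1-based ranks. The engine's default k and its tie-breaking order are not documented here; this sketch breaks ties by first appearance.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rank_lists, k=60):
    """Fuse ranked lists: score(item) = sum over lists of 1 / (k + rank), rank from 1."""
    scores = defaultdict(float)
    for ranking in rank_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=scores.get, reverse=True)


print(reciprocal_rank_fusion([["a", "b", "c"], ["c", "a"]]))  # ['a', 'c', 'b']
```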

Reactive Engine Functions (5)

engine.create_model

CALL engine.create_model(label, property, type, min_samples?, sigma_threshold?) YIELD label, property, type, status

Creates a statistical model bound to a (label, property) pair. The engine learns a baseline from writes and detects deviations inline during SetProperty, at O(K) cost per write, where K is bounded by the schema. Types: envelope, exact, frequency, trend, seasonal, hybrid, auto. The model is stored as a :Model graph node for persistence and replication.

Cypher
CALL engine.create_model("User", "age", "envelope", 10, 2.5) YIELD status
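To make "learns a baseline from writes and detects deviations inline" concrete, here is a minimal Python sketch of one plausible envelope-style model. It keeps a running mean and standard deviation (Welford's algorithm), stays silent until min_samples values have been seen, and then reports writes that fall more than sigma_threshold standard deviations from the mean. The class name, the mapping onto cold/learning/warm, and the check-then-update order are assumptions; the engine's envelope type may work differently.

```python
import math


class EnvelopeModel:
    """Hypothetical mean +/- sigma envelope over one (label, property) pair."""

    def __init__(self, min_samples: int = 10, sigma_threshold: float = 2.5):
        self.min_samples = min_samples
        self.sigma_threshold = sigma_threshold
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's algorithm)

    @property
    def status(self) -> str:
        # Assumed mapping onto the cold/learning/warm states shown by engine.show_models().
        if self.count == 0:
            return "cold"
        return "learning" if self.count < self.min_samples else "warm"

    def observe(self, value: float):
        """Check one write against the baseline, then fold it in.

        Returns the deviation in standard deviations when it exceeds
        sigma_threshold, else None. Nothing is flagged while learning.
        """
        deviation = None
        if self.count >= max(self.min_samples, 2):
            std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
            if std > 0:
                sigma = abs(value - self.mean) / std
                if sigma > self.sigma_threshold:
                    deviation = sigma
        # Welford update: constant work per write, no history kept.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return deviation


model = EnvelopeModel(min_samples=10, sigma_threshold=2.5)  # same knobs as the Cypher call
```

Feeding values through model.observe() returns None until ten values have been seen, then returns a sigma value for each write that lands outside the envelope.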

engine.drain_events

CALL engine.drain_events(limit?) YIELD vertex_gid, label_id, property_id, sigma, expected, actual, model_type, timestamp

Drains deviation events from the engine's lock-free ring buffer. Drained events are consumed and cannot be read again. Use it for alerting, dashboards, or integration with external systems. The buffer holds 100K events; when it overflows, the oldest events are evicted.

Cypher
CALL engine.drain_events(50) YIELD vertex_gid, sigma, expected, actual
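The consume-once, evict-oldest behaviour can be modeled in Python with a bounded deque. This only illustrates the semantics: the engine's buffer is lock-free, whereas this sketch is single-threaded, and the class name and oldest-first drain order are assumptions.

```python
from collections import deque


class DeviationEventBuffer:
    """Bounded event buffer with drain-once semantics (illustrative, not lock-free)."""

    def __init__(self, capacity: int = 100_000):
        # deque(maxlen=...) discards the oldest entry when a new one arrives at capacity.
        self._events = deque(maxlen=capacity)

    def push(self, event) -> None:
        self._events.append(event)

    def drain(self, limit=None) -> list:
        """Remove and return up to `limit` events, oldest first."""
        n = len(self._events) if limit is None else min(limit, len(self._events))
        return [self._events.popleft() for _ in range(n)]


buf = DeviationEventBuffer(capacity=3)
for i in range(5):
    buf.push(i)       # events 0 and 1 are evicted on overflow
print(buf.drain(2))   # [2, 3]
print(buf.drain())    # [4]
print(buf.drain())    # []
```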

engine.drop_model

CALL engine.drop_model(label, property) YIELD label, property, status

Drops a statistical model. Deviation checks stop immediately. Removes the :Model graph node.

Cypher
CALL engine.drop_model("User", "age") YIELD status

engine.show_model_types

CALL engine.show_model_types() YIELD type, fits, parameters, auto_detect_when

Lists all available model types with parameters and auto-detection heuristics. Use this to understand which model type fits your data.

Cypher
CALL engine.show_model_types() YIELD type, parameters

engine.show_models

CALL engine.show_models() YIELD label, property, type, status, sample_count

Lists all active models with current status (cold/learning/warm) and sample counts.

Cypher
CALL engine.show_models() YIELD label, property, type, status

Scalar Functions (31)

degree

degree(node) -> integer

Returns the total degree (in + out) of a node.

Cypher
MATCH (n:User) RETURN n.name, degree(n) ORDER BY degree(n) DESC

endnode

endnode(edge) -> node

Returns the end (target) node of a relationship.

Cypher
MATCH (s:User)-[r:FOLLOWS]->(t) RETURN s.name, endnode(r) as target

frombytestring

frombytestring(string) -> value

Decodes a byte-encoded string back to a value.

Cypher
RETURN frombytestring("SGVsbG8gV29ybGQ=") AS decoded_value

head

head(list) -> value

Returns the first element of a list.

Cypher
WITH [1, 2, 3, 4, 5] AS nums RETURN head(nums) AS first_element

id

id(node_or_edge) -> integer

Returns the internal ID of a node or relationship.

Cypher
MATCH (n:User) WHERE n.name = "Alice" RETURN id(n) AS user_id

indegree

indegree(node) -> integer

Returns the number of incoming edges of a node.

Cypher
MATCH (n:User) RETURN n.name, indegree(n) AS incoming_edges

json_extract

json_extract(json_string, path) -> value

Extracts a value from a JSON string using a path expression.

Cypher
WITH '{"user": "Alice", "age": 30}' AS json_str RETURN json_extract(json_str, "$.user")

last

last(list) -> value

Returns the last element of a list.

Cypher
WITH [10, 20, 30, 40] AS values RETURN last(values) AS final_element

length

length(value) -> integer

Alias for size(); returns the length of a list, path, or string.

Cypher
WITH [1, 2, 3, 4, 5] AS list RETURN length(list) AS list_size

outdegree

outdegree(node) -> integer

Returns the number of outgoing edges of a node.

Cypher
MATCH (n:User) RETURN n.name, outdegree(n) AS outgoing_edges

parse_float

parse_float(string) -> float

Parses a string to a floating-point number.

Cypher
WITH "3.14159" AS num_str RETURN parse_float(num_str) AS pi_value

parse_int

parse_int(string, radix?) -> integer

Parses a string to an integer with an optional radix.

Cypher
WITH "FF" AS hex_str RETURN parse_int(hex_str, 16) AS decimal_value

properties

properties(node_or_edge) -> map

Returns a map of all properties on a node or relationship.

Cypher
MATCH (n:User {name: "Bob"}) RETURN properties(n) AS user_properties

propertysize

propertysize(entity, key) -> integer

Returns the byte size of a specific property value.

Cypher
MATCH (n:User) RETURN n.name, propertysize(n, "description") AS size_bytes

randomuuid

randomuuid() -> string

Generates a random UUID v4 string.

Cypher
RETURN randomuuid() AS new_uuid

size

size(value) -> integer

Returns the size of a list, map, path, or string.

Cypher
MATCH (n:User)-[r]->() WITH n, collect(r) AS rels RETURN n.name, size(rels) AS rel_count

startnode

startnode(edge) -> node

Returns the start (source) node of a relationship.

Cypher
MATCH (s:User)-[r:FOLLOWS]->(t) RETURN startnode(r) as follower, s.name

timestamp

timestamp() -> integer

Returns the current epoch time in milliseconds.

Cypher
RETURN timestamp() AS current_time_ms

to_json

to_json(value) -> string

Serializes a value to a JSON string.

Cypher
MATCH (n:User {name: "Carol"}) RETURN to_json(properties(n)) AS json_output

toboolean

toboolean(value) -> bool

Converts a value to a boolean.

Cypher
WITH ["true", "false", "1", "0"] AS bool_strs RETURN [b in bool_strs | toboolean(b)]

tobooleanlist

tobooleanlist(list) -> list

Converts all elements in a list to booleans.

Cypher
WITH [1, 0, 1] AS int_list RETURN tobooleanlist(int_list)

tobytestring

tobytestring(value) -> string

Converts a value to a byte-encoded string.

Cypher
RETURN tobytestring("Hello") AS encoded

toenum

toenum(type, value) -> enum

Converts a type name and value to an enum.

Cypher
RETURN toenum("Status", "ACTIVE") AS status_enum

tofloat

tofloat(value) -> float

Converts a value to a floating-point number.

Cypher
WITH ["1.5", "2.7", "3.9"] AS nums RETURN [n in nums | tofloat(n)]

tofloatlist

tofloatlist(list) -> list

Converts all elements in a list to floats.

Cypher
WITH ["1.2", "3.4"] AS num_strs RETURN tofloatlist(num_strs)

tointeger

tointeger(value) -> integer

Converts a value to an integer.

Cypher
WITH "42" AS num_str RETURN tointeger(num_str) AS int_value

tointegerlist

tointegerlist(list) -> list

Converts all elements in a list to integers.

Cypher
WITH ["10", "20", "30"] AS int_strs RETURN tointegerlist(int_strs)

type

type(edge) -> string

Returns the type name of a relationship.

Cypher
MATCH (u:User)-[r:FOLLOWS]->(v) RETURN type(r) AS relationship_type

typeof

typeof(value) -> string

Returns the runtime type name of a value.

Cypher
RETURN [typeof(1), typeof("text"), typeof([1,2,3]), typeof({a: 1})]

uuid_generate

uuid_generate() -> string

Generates a new UUID v4 string.

Cypher
RETURN uuid_generate() AS generated_uuid

valuetype

valuetype(value) -> string

Returns the type name of any value as a string.

Cypher
RETURN valuetype(42) AS int_type, valuetype("text") AS string_type

String Functions (35)

base64_decode

base64_decode(base64_string) -> value

Decodes a Base64 string back to bytes.

Cypher
RETURN base64_decode("SGVsbG8gV29ybGQ=") AS decoded_text

base64_encode

base64_encode(value) -> string

Encodes a value as a Base64 string.

Cypher
RETURN base64_encode("Hello World") AS encoded_text

char_length

char_length(string) -> integer

Returns the number of characters in a string.

Cypher
WITH "xrayGraphDB" AS text RETURN char_length(text) AS char_count

concat

concat(values...) -> string

Concatenates all arguments into a single string.

Cypher
RETURN concat("Hello", " ", "World", "!") AS greeting

contains

contains(string, substring) -> bool

Returns true if the string contains the given substring.

Cypher
WITH "xrayGraphDB" AS text RETURN contains(text, "Graph") AS has_graph

ends_with_func

ends_with_func(string, suffix) -> bool

Returns true if the string ends with the given suffix.

Cypher
WITH "database.cpp" AS filename RETURN ends_with_func(filename, ".cpp") AS is_cpp_file

endswith

endswith(string, suffix) -> bool

Returns true if the string ends with the given suffix.

Cypher
WITH "query.cypher" AS file RETURN endswith(file, ".cypher") AS is_cypher

format_number

format_number(number, format) -> string

Formats a number according to a format pattern string.

Cypher
RETURN format_number(1234.5678, "#,##0.00") AS formatted

hex_decode

hex_decode(hex_string) -> value

Decodes a hexadecimal string back to bytes.

Cypher
RETURN hex_decode("48656c6c6f") AS decoded_hex

hex_encode

hex_encode(value) -> string

Encodes a value as a hexadecimal string.

Cypher
RETURN hex_encode("Hello") AS hex_encoded

indexof

indexof(string, substring) -> integer

Returns the zero-based index of the first occurrence of a substring, or -1 if the substring is not found.

Cypher
WITH "xrayGraphDB" AS text RETURN indexof(text, "Graph") AS graph_pos

join

join(list, delimiter) -> string

Joins list elements into a single string with a delimiter.

Cypher
WITH ["Alice", "Bob", "Carol"] AS names RETURN join(names, ", ") AS name_list

left

left(string, n) -> string

Returns the leftmost n characters of a string.

Cypher
WITH "xrayGraphDB" AS text RETURN left(text, 4) AS first_four

lpad

lpad(string, length, pad_char?) -> string

Left-pads a string to the specified length.

Cypher
WITH "42" AS num RETURN lpad(num, 5, "0") AS padded_num

ltrim

ltrim(string) -> string

Removes leading whitespace from a string.

Cypher
WITH "  leading spaces" AS text RETURN ltrim(text) AS trimmed

pad_number

pad_number(number, width, pad_char?) -> string

Formats a number as a zero-padded string of the given width.

Cypher
RETURN pad_number(7, 4, "0") AS zero_padded

regex_match

regex_match(string, pattern) -> list|null

Returns regex capture groups if the pattern matches, or null.

Cypher
WITH "user@example.com" AS email RETURN regex_match(email, "([^@]+)@(.+)") AS parts

repeat

repeat(string, count) -> string

Returns a string repeated the given number of times.

Cypher
WITH "x" AS char RETURN repeat(char, 10) AS repeated_x

replace

replace(string, search, replacement) -> string

Replaces all occurrences of search with replacement in a string.

Cypher
WITH "Hello World" AS text RETURN replace(text, "World", "xrayGraphDB") AS replaced

replace_string

replace_string(string, search, replacement) -> string

Replaces occurrences of a search string within a string.

Cypher
WITH "old value" AS text RETURN replace_string(text, "old", "new") AS new_text

reverse

reverse(string) -> string

Returns the string reversed.

Cypher
WITH "stressed" AS word RETURN reverse(word) AS reversed_word

reverse_string

reverse_string(string) -> string

Returns the string with characters in reverse order.

Cypher
WITH "GraphDB" AS text RETURN reverse_string(text) AS backwards

right

right(string, n) -> string

Returns the rightmost n characters of a string.

Cypher
WITH "xrayGraphDB" AS text RETURN right(text, 2) AS last_two

rpad

rpad(string, length, pad_char?) -> string

Right-pads a string to the specified length.

Cypher
WITH "42" AS num RETURN rpad(num, 5, ".") AS right_padded

rtrim

rtrim(string) -> string

Removes trailing whitespace from a string.

Cypher
WITH "trailing spaces  " AS text RETURN rtrim(text) AS trimmed

split

split(string, delimiter) -> list

Splits a string by delimiter into a list of strings.

Cypher
WITH "Alice,Bob,Carol" AS names RETURN split(names, ",") AS name_list

starts_with_func

starts_with_func(string, prefix) -> bool

Returns true if the string starts with the given prefix.

Cypher
WITH "xrayGraphDB" AS text RETURN starts_with_func(text, "xray") AS starts_x

startswith

startswith(string, prefix) -> bool

Returns true if the string starts with the given prefix.

Cypher
WITH "GraphQL" AS text RETURN startswith(text, "Graph") AS starts_g

substring

substring(string, start, length?) -> string

Returns the substring that starts at the given zero-based index, with an optional length.

Cypher
WITH "xrayGraphDB" AS text RETURN substring(text, 4, 5) AS substr

to_string

to_string(value) -> string

Converts any value to its string representation.

Cypher
RETURN to_string(42) AS num_as_string, to_string([1,2,3]) AS list_as_string

tolower

tolower(string) -> string

Converts a string to lowercase.

Cypher
WITH "XRAYGRAPHDB" AS text RETURN tolower(text) AS lowercase

tostring

tostring(value) -> string

Converts any value to its string representation.

Cypher
RETURN tostring(true) AS bool_string, tostring(3.14) AS float_string

tostringornull

tostringornull(value) -> string|null

Converts a value to a string, returning null if conversion fails.

Cypher
RETURN tostringornull(42) AS num_str, tostringornull(null) AS null_result

toupper

toupper(string) -> string

Converts a string to uppercase.

Cypher
WITH "xrayGraphDB" AS text RETURN toupper(text) AS uppercase

trim

trim(string) -> string

Removes leading and trailing whitespace from a string.

Cypher
WITH "  whitespace  " AS text RETURN trim(text) AS trimmed_text

System Functions

System functions provide database diagnostics, user identification, and snapshot management.

assert

assert(condition, message?) -> bool

Throws an exception if the condition is false. Useful for validating query results and enforcing invariants during execution.

Cypher
MATCH (n:User) WHERE assert(n.age > 0, "Age must be positive") RETURN n.name, n.age

counter

counter(name, initial?) -> integer

Returns an auto-incrementing counter for the given name. Each call increments the counter. Useful for generating sequential IDs or tracking operation counts.

Cypher
CREATE (n:Task {id: counter("task_id", 1000), name: "Process payment", created: timestamp()}) RETURN n.id, n.name

explain_analyze

explain_analyze(query_string) -> map

Returns an analysis of a query's execution plan, with cost estimates and predicted row counts, to help with query optimization and with understanding plan selection.

Cypher
RETURN explain_analyze("MATCH (n:Product) WHERE n.price > 100 RETURN n LIMIT 10") AS plan

gethopscounter

gethopscounter(path) -> integer

Returns the hop count from a variable-length path traversal. Counts the number of relationships in a path.

Cypher
MATCH p = (src:Airport {code: "LAX"})-[:ROUTE*1..5]->(dst:Airport {code: "JFK"}) RETURN gethopscounter(p) AS num_hops, length(p) AS path_length LIMIT 1

graph_diff

graph_diff(snapshot1, snapshot2) -> map

Returns the differences between two graph snapshots. Shows added, removed, and modified nodes and relationships.

Cypher
WITH graph_snapshot("Account") AS snap1, graph_snapshot("Account") AS snap2 RETURN graph_diff(snap1, snap2) AS changes
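graph_diff reports added, removed, and modified elements. Because the example captures both snapshots within a single query, the two would normally be identical and the diff empty; a diff becomes informative when the snapshots bracket some writes. The Python sketch below shows the comparison itself, assuming (hypothetically) that a snapshot can be treated as a map from element ID to property map; the real snapshot structure is not documented here.

```python
def graph_diff(before: dict, after: dict) -> dict:
    """Compare two {element_id: properties} snapshots (hypothetical shape)."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    modified = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return {"added": added, "removed": removed, "modified": modified}


snap1 = {1: {"balance": 100}, 2: {"balance": 50}}
snap2 = {1: {"balance": 120}, 3: {"balance": 10}}
print(graph_diff(snap1, snap2))  # {'added': [3], 'removed': [2], 'modified': [1]}
```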

graph_snapshot

graph_snapshot(label?) -> map

Captures a point-in-time snapshot of the graph or a label subset. Returns a snapshot object that can be compared with other snapshots.

Cypher
RETURN graph_snapshot("Transaction") AS txn_snapshot, graph_snapshot() AS full_snapshot

roles

roles() -> list

Returns the list of roles assigned to the current authenticated user. Used for role-based access control (RBAC) queries.

Cypher
RETURN username() AS current_user, roles() AS assigned_roles

username

username() -> string

Returns the current authenticated username. Returns NULL if running without authentication.

Cypher
MATCH (a:AuditLog) WHERE a.performed_by = username() AND a.timestamp > timestamp() - 86400000 RETURN COUNT(a) AS actions_last_24h

Unit Conversion Functions

Unit conversion functions handle standard measurements for distance, temperature, weight, and velocity, as used in aviation, logistics, and scientific applications.

c_to_f

c_to_f(c) -> float

Alias for celsius_to_fahrenheit. Converts a temperature from Celsius to Fahrenheit.

Cypher
MATCH (s:Sensor {location: "LAX_Terminal1"}) WHERE s.temperature_c IS NOT NULL RETURN s.location, s.temperature_c, c_to_f(s.temperature_c) AS temperature_f

celsius_to_fahrenheit

celsius_to_fahrenheit(c) -> float

Converts a temperature from Celsius to Fahrenheit. Formula: (C × 9/5) + 32

Cypher
RETURN celsius_to_fahrenheit(0) AS freezing, celsius_to_fahrenheit(25) AS room_temp, celsius_to_fahrenheit(100) AS boiling

convert

convert(value, from_unit, to_unit) -> float

Converts a value between compatible measurement units. Supports distance, weight, temperature, and velocity conversions.

Cypher
MATCH (f:Flight {id: "UA123"}) RETURN f.id, f.cruise_altitude_ft, convert(f.cruise_altitude_ft, "feet", "meters") AS cruise_altitude_m

f_to_c

f_to_c(f) -> float

Alias for fahrenheit_to_celsius. Converts a temperature from Fahrenheit to Celsius.

Cypher
MATCH (w:WeatherReport) WHERE w.temp_f > 95 RETURN w.station_code, w.temp_f, f_to_c(w.temp_f) AS temp_c

fahrenheit_to_celsius

fahrenheit_to_celsius(f) -> float

Converts a temperature from Fahrenheit to Celsius. Formula: (F - 32) × 5/9

Cypher
RETURN fahrenheit_to_celsius(32) AS freezing, fahrenheit_to_celsius(98.6) AS body_temp, fahrenheit_to_celsius(212) AS boiling

feet_to_meters

feet_to_meters(ft) -> float

Converts feet to meters. Factor: 1 foot = 0.3048 meters.

Cypher
MATCH (a:Aircraft {model: "B787"}) RETURN a.model, a.wingspan_ft, feet_to_meters(a.wingspan_ft) AS wingspan_m

fpm_to_mps

fpm_to_mps(fpm) -> float

Converts feet per minute to meters per second. Used for climb/descent rates in aviation. Factor: 1 ft/min = 0.3048 / 60 = 0.00508 m/s.

Cypher
MATCH (t:TrajectoryPoint) WHERE t.vertical_rate_fpm > 1000 RETURN t.timestamp, t.vertical_rate_fpm, fpm_to_mps(t.vertical_rate_fpm) AS vertical_rate_mps

ft_to_m

ft_to_m(ft) -> float

Alias for feet_to_meters. Converts feet to meters.

Cypher
MATCH (r:Runway {airport: "JFK"}) RETURN r.name, r.length_ft, ft_to_m(r.length_ft) AS length_m

kg_to_lb

kg_to_lb(kg) -> float

Alias for kilograms_to_pounds. Converts kilograms to pounds.

Cypher
MATCH (c:Cargo {id: "CARGO_001"}) RETURN c.id, c.weight_kg, kg_to_lb(c.weight_kg) AS weight_lb

kilograms_to_pounds

kilograms_to_pounds(kg) -> float

Converts kilograms to pounds. Factor: 1 kg = 2.20462 pounds.

Cypher
MATCH (p:Person) WHERE p.weight_kg IS NOT NULL RETURN p.name, p.weight_kg, kilograms_to_pounds(p.weight_kg) AS weight_lbs ORDER BY p.weight_kg DESC LIMIT 5

kilometers_to_miles

kilometers_to_miles(km) -> float

Converts kilometers to miles. Factor: 1 km = 0.621371 miles.

Cypher
MATCH (r:Route) WHERE r.distance_km > 5000 RETURN r.origin, r.destination, r.distance_km, kilometers_to_miles(r.distance_km) AS distance_mi

km_to_mi

km_to_mi(km) -> float

Alias for kilometers_to_miles. Converts kilometers to miles.

Cypher
MATCH (s:Segment {type: "LON_to_BOS"}) RETURN s.type, s.distance_km, km_to_mi(s.distance_km) AS distance_miles

knots_to_mps

knots_to_mps(knots) -> float

Converts knots to meters per second. Used for airspeed conversions in aviation. Factor: 1 knot = 0.51444 m/s.

Cypher
MATCH (v:Vector {source: "ADS-B"}) WHERE v.speed_knots > 450 RETURN v.timestamp, v.speed_knots, knots_to_mps(v.speed_knots) AS speed_mps

lb_to_kg

lb_to_kg(lb) -> float

Alias for pounds_to_kilograms. Converts pounds to kilograms.

Cypher
MATCH (i:Item {warehouse: "PHX"}) RETURN i.name, i.weight_lb, lb_to_kg(i.weight_lb) AS weight_kg

m_to_ft

m_to_ft(m) -> float

Alias for meters_to_feet. Converts meters to feet.

Cypher
MATCH (b:Building {city: "New York"}) RETURN b.name, b.height_m, m_to_ft(b.height_m) AS height_ft

meters_to_feet

meters_to_feet(m) -> float

Converts meters to feet. Factor: 1 meter = 3.28084 feet.

Cypher
MATCH (altimeter:Sensor) RETURN altimeter.id, altimeter.altitude_m, meters_to_feet(altimeter.altitude_m) AS altitude_ft

mi_to_km

mi_to_km(mi) -> float

Alias for miles_to_kilometers. Converts miles to kilometers.

Cypher
MATCH (c:City) WHERE c.distance_from_airport_mi > 10 RETURN c.name, c.distance_from_airport_mi, mi_to_km(c.distance_from_airport_mi) AS distance_km

miles_to_kilometers

miles_to_kilometers(mi) -> float

Converts miles to kilometers. Factor: 1 mile = 1.60934 kilometers.

Cypher
MATCH (route:Route {airline: "AA"}) WHERE route.distance_miles IS NOT NULL RETURN route.origin, route.destination, route.distance_miles, miles_to_kilometers(route.distance_miles) AS distance_km

mps_to_knots

mps_to_knots(mps) -> float

Converts meters per second to knots. Used for airspeed conversions in aviation. Factor: 1 m/s = 1.94384 knots.

Cypher
MATCH (wind:WeatherData {station: "KJFK"}) RETURN wind.station, wind.wind_speed_mps, mps_to_knots(wind.wind_speed_mps) AS wind_speed_knots

pounds_to_kilograms

pounds_to_kilograms(lb) -> float

Converts pounds to kilograms. Factor: 1 pound = 0.453592 kilograms.

Cypher
MATCH (aircraft:Aircraft) WHERE aircraft.max_weight_lb IS NOT NULL RETURN aircraft.registration, aircraft.max_weight_lb, pounds_to_kilograms(aircraft.max_weight_lb) AS max_weight_kg
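Every conversion in this section is a linear scaling (plus an offset for temperature), and each alias computes the same thing as its long-form name. For client-side checks, the formulas and factors quoted above can be restated in plain Python. This sketch does not call the database, and the constant names are illustrative.

```python
FT_TO_M = 0.3048           # exact by definition
KNOT_TO_MPS = 1852 / 3600  # 1 knot = 1 nautical mile (1852 m) per hour, about 0.51444 m/s


def c_to_f(c: float) -> float:
    return c * 9 / 5 + 32


def f_to_c(f: float) -> float:
    return (f - 32) * 5 / 9


def ft_to_m(ft: float) -> float:
    return ft * FT_TO_M


def m_to_ft(m: float) -> float:
    return m / FT_TO_M  # 1 / 0.3048 is about 3.28084


def fpm_to_mps(fpm: float) -> float:
    return fpm * FT_TO_M / 60  # feet per minute -> meters per second


def knots_to_mps(knots: float) -> float:
    return knots * KNOT_TO_MPS


def mps_to_knots(mps: float) -> float:
    return mps / KNOT_TO_MPS  # about 1.94384 knots per m/s


print(c_to_f(100))  # 212.0
```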

Procedure Reference

Complete reference for all 107 xrayGraphDB v4.9.4 procedures, organized by module prefix. Analytics procedures (PageRank, Betweenness, Triangle Count, Community Detection, etc.) are powered by xrayGraphDB's patent-pending high-performance graph engine.

db.* — Database Introspection (3 procedures)

CALL db.indexes()

db.indexes() :: (labelsOrTypes :: STRING, name :: STRING, properties :: STRING, type :: STRING)

Lists all database indexes including label indexes, property indexes, and relationship type indexes. Returns metadata about each index including the properties it covers and its type (e.g., HASH, RANGE, TEXT).

Cypher
CALL db.indexes() YIELD name, type, properties
RETURN name, type, properties
ORDER BY type;

CALL db.labels()

db.labels() :: (label :: STRING)

Returns all vertex labels currently defined in the database. Each row represents a distinct label name used by one or more vertices in the graph.

Cypher
CALL db.labels() YIELD label
RETURN label
ORDER BY label;

CALL db.relationshipTypes()

db.relationshipTypes() :: (relationshipType :: STRING)

Returns all relationship types currently defined in the database. Each row represents a distinct relationship type used by one or more edges in the graph.

Cypher
CALL db.relationshipTypes() YIELD relationshipType
RETURN relationshipType
ORDER BY relationshipType;

engine.* — Reactive Engine Management (5 procedures)

CALL engine.create_model()

engine.create_model(label :: STRING, property :: STRING, type :: STRING, min_samples :: INTEGER?, sigma_threshold :: FLOAT?) :: (label :: STRING, property :: STRING, status :: STRING, type :: STRING)

Creates an anomaly detection model for a specific property on vertices with a given label. The model learns a statistical baseline from writes and flags values that deviate from it; the type argument selects the model (see engine.show_model_types() for the available types). The optional parameters set the minimum number of training samples and the detection sensitivity (sigma_threshold).

Cypher
CALL engine.create_model('Aircraft', 'altitude', 'envelope', 100, 2.5)
YIELD label, property, status, type
RETURN label, property, status, type;

CALL engine.drain_events()

engine.drain_events(limit :: INTEGER?) :: (actual :: FLOAT, expected :: FLOAT, label_id :: INTEGER, model_type :: STRING, property_id :: INTEGER, sigma :: FLOAT, timestamp :: INTEGER, vertex_gid :: INTEGER)

Retrieves pending anomaly detection events from the reactive engine. Returns detected anomalies with actual vs expected values, deviation metrics in standard deviations (sigma), and affected vertex information. Optional limit parameter controls maximum events returned per call.

Cypher
CALL engine.drain_events(1000)
YIELD vertex_gid, actual, expected, sigma, timestamp
WHERE sigma > 3.0
RETURN vertex_gid, actual, expected, sigma, timestamp
ORDER BY timestamp DESC;

CALL engine.drop_model()

engine.drop_model(label :: STRING, property :: STRING) :: (label :: STRING, property :: STRING, status :: STRING)

Removes an anomaly detection model for a specific property on vertices with a given label. Stops all event detection for that model.

Cypher
CALL engine.drop_model('Aircraft', 'altitude')
YIELD label, property, status
RETURN label, property, status;

CALL engine.show_model_types()

engine.show_model_types() :: (auto_detect_when :: STRING, fits :: STRING, parameters :: STRING, type :: STRING)

Lists all available anomaly detection model types supported by the reactive engine. Shows when each model type is automatically selected, what data distributions it fits best, and required parameters.

Cypher
CALL engine.show_model_types() YIELD type, fits, parameters
RETURN type, fits, parameters
ORDER BY type;

CALL engine.show_models()

engine.show_models() :: (label :: STRING, property :: STRING, sample_count :: INTEGER, status :: STRING, type :: STRING)

Lists all active anomaly detection models currently running in the reactive engine. Shows model type, training status, and sample count for each model.

Cypher
CALL engine.show_models() YIELD label, property, status, sample_count
WHERE status = 'warm'
RETURN label, property, sample_count
ORDER BY sample_count DESC;

mv.* — Materialized Views (6 procedures)

CALL mv.create()

mv.create(name :: STRING, query :: STRING) :: (name :: STRING, status :: STRING)

Creates a materialized view that caches the results of a query so they can be read without re-running it. The cached results can be refreshed on demand with mv.refresh() or on a schedule set with mv.set_interval().

Cypher
CALL mv.create('flight_stats',
  'MATCH (f:Flight) RETURN COUNT(*) as total, AVG(f.altitude) as avg_alt')
YIELD name, status
RETURN name, status;

CALL mv.drop()

mv.drop(name :: STRING) :: (name :: STRING, status :: STRING)

Removes a materialized view and its cached results. The view name is no longer available for queries.

Cypher
CALL mv.drop('flight_stats')
YIELD name, status
RETURN name, status;

CALL mv.due()

mv.due() :: (name :: STRING, overdue_sec :: INTEGER, query :: STRING)

Lists materialized views that are due for refresh based on their scheduled interval. Shows how many seconds each view is overdue for an update.

Cypher
CALL mv.due() YIELD name, overdue_sec
WHERE overdue_sec > 0
RETURN name, overdue_sec
ORDER BY overdue_sec DESC;
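mv.due() reports how far each view is past its next scheduled refresh. Assuming overdue_sec is measured from the moment the view became due (last refresh time plus interval), the calculation looks like the Python sketch below. The input layout, a view name mapped to (last refresh in epoch seconds, interval or None), is hypothetical.

```python
import time


def overdue_views(views: dict, now=None) -> list:
    """Return (name, overdue_sec) pairs for views whose refresh interval has elapsed.

    Views with no interval (None) are never due.
    """
    now = time.time() if now is None else now
    due = []
    for name, (last_refresh, interval) in views.items():
        if interval is None:
            continue
        overdue = int(now - (last_refresh + interval))
        if overdue >= 0:
            due.append((name, overdue))
    # Most overdue first, matching the ORDER BY in the example above.
    return sorted(due, key=lambda item: item[1], reverse=True)


views = {"flight_stats": (1000, 300), "daily": (1200, 86400), "adhoc": (0, None)}
print(overdue_views(views, now=1500))  # [('flight_stats', 200)]
```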

CALL mv.list()

mv.list() :: (name :: STRING, query :: STRING)

Lists all materialized views in the database with their underlying queries.

Cypher
CALL mv.list() YIELD name, query
RETURN name, query;

CALL mv.refresh()

mv.refresh(name :: STRING) :: (name :: STRING, query :: STRING, status :: STRING)

Refreshes a materialized view by re-executing its query and updating the cached results.

Cypher
CALL mv.refresh('flight_stats')
YIELD name, status
RETURN name, status;

CALL mv.set_interval()

mv.set_interval(name :: STRING, interval_sec :: INTEGER) :: (interval_sec :: INTEGER, name :: STRING)

Sets the auto-refresh interval (in seconds) for a materialized view. The view will be automatically refreshed at this interval.

Cypher
CALL mv.set_interval('flight_stats', 300)
YIELD name, interval_sec
RETURN name, interval_sec;

repl.* — Replication (2 procedures)

CALL repl.set_sync_policy()

repl.set_sync_policy(address :: STRING, mode :: STRING, targets :: STRING) :: (status :: STRING)

Configures the replication synchronization policy for a replica: its address, its sync mode (ASYNC, SYNC, or SEMISYNC), and the target databases to replicate, given as a comma-separated list.

Cypher
CALL repl.set_sync_policy('192.168.1.100:7687', 'SEMISYNC', 'main,analytics')
YIELD status
RETURN status;

CALL repl.show_replicas()

repl.show_replicas() :: (acked_lsn :: INTEGER, address :: STRING, status :: STRING, sync_mode :: STRING, sync_targets :: STRING)

Lists all configured replicas and their replication status. Shows each replica's address, sync mode, acknowledged log sequence number (LSN), and target databases.

Cypher
CALL repl.show_replicas() YIELD address, status, sync_mode, acked_lsn
RETURN address, status, sync_mode, acked_lsn;

ttl.* — TTL/Data Retention (1 procedure)

CALL ttl.delete_expired()

ttl.delete_expired(label :: STRING, timestamp_property :: STRING, max_age_days :: INTEGER, exemption_property :: STRING) :: (deleted_count :: INTEGER, exempt_count :: INTEGER, scanned_count :: INTEGER)

Deletes vertices with a given label that are older than max_age_days based on a timestamp property. Respects an optional exemption property that can mark vertices to keep. Returns counts of vertices deleted, exempted, and scanned.

Cypher
CALL ttl.delete_expired('LogEntry', 'timestamp', 90, 'keep_permanently')
YIELD deleted_count, exempt_count, scanned_count
RETURN deleted_count, exempt_count, scanned_count;
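The selection rule behind ttl.delete_expired can be sketched in Python. The sketch assumes the timestamp property holds epoch milliseconds (the unit timestamp() returns), that a vertex is expired when it is older than max_age_days, and that exempt_count counts expired vertices kept because their exemption property is set. The function name and the dict-per-vertex input are hypothetical.

```python
import time

MS_PER_DAY = 86_400_000


def select_expired(vertices, timestamp_property, max_age_days, exemption_property, now_ms=None):
    """Split vertices into (to_delete, exempt) the way a TTL sweep might."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    cutoff = now_ms - max_age_days * MS_PER_DAY
    to_delete, exempt = [], []
    for v in vertices:
        ts = v.get(timestamp_property)
        if ts is None or ts >= cutoff:
            continue  # no timestamp, or not old enough yet
        # A truthy exemption property keeps an otherwise expired vertex.
        (exempt if v.get(exemption_property) else to_delete).append(v)
    return to_delete, exempt
```

In these terms, scanned_count would be the number of vertices examined, while deleted_count and exempt_count are the lengths of the two returned lists.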

xray.* — Graph Analytics

Compute-heavy procedures that run over the CSR-mmapped graph. Every procedure ships in the Community tier, and results stream back in the same xrayProtocol BATCH frames as ordinary Cypher rows. Analytics procedures are backed by a per-tenant persistent scratch pool: the first call after a daemon restart pays a one-time sizing cost (typically a few seconds), and every subsequent call takes the warm path.

Betweenness Centrality — three procedures, customer picks the precision

Three BC variants are available; choose by what you actually need. All three share the same output schema (node_id, centrality, name, time_ms) so they drop into existing dashboards without re-mapping.

Procedure | Best for | Friendster (3.6 B undirected edges)
xray.betweenness_centrality_sampled | Ranked BC distribution across the whole graph. Source-sampled; produces a graded centrality distribution. | Tens of seconds to minutes at ε = 0.05 (k ≈ 5,700 source-sampled BFSes).
xray.betweenness_pair_sampled | Triage ("is this vertex significant at all?") with output resolution set by customer knobs. Pair-sampled; produces an integer-multiple ladder. | Sub-second to ~17 s, depending on knobs.
xray.betweenness_pair_sampled_adaptive | Top-K leaderboard. Stops as soon as the top-K vertex set has stabilised across consecutive batches. | Faster than the non-adaptive version when top_k stabilises quickly; same fidelity caveat.

If you want a real ranked BC distribution on a large graph: use betweenness_centrality_sampled. If you want sub-second triage with a precision knob: use betweenness_pair_sampled with target_buckets. If you only care about the top-K bridges: use betweenness_pair_sampled_adaptive.

CALL xray.betweenness_centrality_sampled()

xray.betweenness_centrality_sampled(epsilon :: FLOAT, delta :: FLOAT, label :: STRING) :: (node_id :: INT, centrality :: FLOAT, name :: STRING, time_ms :: INT)

Source-sampled betweenness centrality with a formal (ε, δ) confidence bound. Samples k = ceil((log(N) + log(2/δ)) / (2ε²)) vertices uniformly at random as BFS sources; each BFS contributes counts to every vertex it reaches, so the resulting centrality column is graded — different vertices have different values across the full BC distribution. Cost is O(k·E). Use this when ranking matters more than wall time.

Defaults: ε = 0.05, δ = 0.05. Empty label means "every vertex".
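To make the sample-size formula and the O(k·E) cost concrete, here is a minimal Python sketch of source-sampled betweenness on a small graph. It is not the engine's implementation. It assumes natural logarithms in the k formula, an unweighted undirected adjacency dict, Brandes-style dependency accumulation for each sampled source, and an N/k rescaling of the raw sums. The engine's own normalization and CSR traversal are not documented here, and all function names are hypothetical.

```python
import math
import random
from collections import deque


def source_sample_size(n, epsilon=0.05, delta=0.05):
    """k = ceil((ln N + ln(2/delta)) / (2 * epsilon^2)); natural log is an assumption."""
    return math.ceil((math.log(n) + math.log(2.0 / delta)) / (2.0 * epsilon ** 2))


def brandes_from_sources(adj, sources):
    """Sum Brandes dependencies over the given BFS sources (unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in sources:
        order = []                      # vertices in non-decreasing distance from s
        preds = {v: [] for v in adj}    # shortest-path predecessors
        sigma = {v: 0 for v in adj}     # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        dep = {v: 0.0 for v in adj}
        for w in reversed(order):       # accumulate from the farthest vertex back toward s
            for v in preds[w]:
                dep[v] += sigma[v] / sigma[w] * (1.0 + dep[w])
            if w != s:
                bc[w] += dep[w]
    return bc


def sampled_betweenness(adj, epsilon=0.05, delta=0.05, seed=None):
    """Estimate BC from k uniformly sampled sources; each BFS is O(E), so O(k*E) total."""
    rng = random.Random(seed)
    vertices = list(adj)
    n = len(vertices)
    k = source_sample_size(n, epsilon, delta)
    sources = [rng.choice(vertices) for _ in range(k)]
    raw = brandes_from_sources(adj, sources)
    return {v: raw[v] * n / k for v in vertices}   # rescale toward the all-sources total
```

Because every sampled BFS adds a fractional dependency to every vertex it reaches, the estimates take many distinct values, which is why this variant produces a graded distribution.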

Cypher
CALL xray.betweenness_centrality_sampled(0.05, 0.05, '')
  YIELD node_id, centrality, name, time_ms
  RETURN node_id, centrality
  ORDER BY centrality DESC LIMIT 50;
GFQL
betweenness_centrality_sampled(epsilon=0.05, delta=0.05, label='')

Output fidelity: graded — every emitted vertex's centrality reflects its expected fraction of all-pairs-shortest-paths weight. Use this proc when downstream consumers rank or threshold on the centrality value itself.

CALL xray.betweenness_pair_sampled()

xray.betweenness_pair_sampled(epsilon :: FLOAT, delta :: FLOAT, label :: STRING, target_buckets :: INT, max_k_multiplier :: INT) :: (node_id :: INT, centrality :: FLOAT, name :: STRING, time_ms :: INT)

ABRA-style pair-sampled BC. Samples k = ceil(0.5·(VC + log(2/δ))/ε²) uniform-random vertex pairs, where VC is an upper bound on the VC dimension of the shortest-path sample space, and runs a bidirectional BFS per pair; the internal vertices on each pair's shortest path receive +1. This is sub-second on small-world graphs, but because the per-vertex count is integer-valued, a low k yields only a few distinct centrality values.

Three customer knobs:

  • epsilon, delta — accuracy bound (defaults 0.05, 0.05). Every estimate is within ε of the true BC with probability ≥ 1−δ.
  • target_buckets — minimum distinct centrality values the proc tries to emit (default 50). The proc auto-grows k beyond the (ε, δ) target until ≥ target_buckets distinct values are produced. Set to 0 to disable auto-grow (legacy behaviour).
  • max_k_multiplier — hard cap on auto-grow as a multiple of the (ε, δ) target k (default 8, max 64). Bounds runtime even when the graph is too uniform to reach target_buckets.

If target_buckets can't be reached within max_k_multiplier × k_target samples, the daemon emits a single WARN log line with concrete advice ("graph appears too uniform; lower ε, or use betweenness_centrality_sampled for a graded distribution"). The (ε, δ) accuracy bound is the floor — auto-grow only ever samples MORE pairs than the bound requires; the accuracy guarantee is preserved verbatim.
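The auto-grow rule can be sketched as a loop. This is a hypothetical Python illustration, not the engine's code: run_samples stands in for the pair sampler, k doubles each round (the real growth schedule is not documented), and each round re-samples from scratch for simplicity. It does keep the documented constraints: k never drops below the (ε, δ) target, target_buckets=0 disables growth, max_k_multiplier is capped at 64, and growth stops at max_k_multiplier × k_target with a warning.

```python
import logging
from typing import Callable, Dict, Tuple


def auto_grow(k_target: int, target_buckets: int, max_k_multiplier: int,
              run_samples: Callable[[int], Dict[int, int]]) -> Tuple[int, Dict[int, int]]:
    """Grow k until enough distinct centrality values appear or the cap is hit."""
    max_k_multiplier = max(1, min(max_k_multiplier, 64))   # documented maximum is 64
    k = k_target                      # never fewer samples than the (epsilon, delta) bound
    counts = run_samples(k)
    if target_buckets <= 0:
        return k, counts              # auto-grow disabled (legacy behaviour)
    k_cap = k_target * max_k_multiplier
    while len(set(counts.values())) < target_buckets and k < k_cap:
        k = min(2 * k, k_cap)         # doubling is an assumed schedule
        counts = run_samples(k)
    if len(set(counts.values())) < target_buckets:
        logging.warning("graph appears too uniform; lower epsilon, or use "
                        "betweenness_centrality_sampled for a graded distribution")
    return k, counts
```

Since k only ever increases from k_target, the accuracy bound of the first round still holds in every later round; growth only buys resolution.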

Cypher — fast triage (sub-second on small-world graphs)
CALL xray.betweenness_pair_sampled(0.05, 0.05, '', 0, 8)
  YIELD node_id, centrality, name, time_ms
  RETURN node_id, centrality
  ORDER BY centrality DESC LIMIT 50;
Cypher — push for finer ranking
CALL xray.betweenness_pair_sampled(0.05, 0.05, '', 200, 16)
  YIELD node_id, centrality, name, time_ms
  RETURN node_id, centrality
  ORDER BY centrality DESC LIMIT 200;
GFQL
betweenness_pair_sampled(epsilon=0.05, delta=0.05, label='', target_buckets=200)

Output fidelity caveat: at coarse precision (ε ≥ 0.05) on Friendster-scale graphs, the count column produces ~3-10 distinct values even with auto-grow at the maximum multiplier. This is mathematically correct (ABRA's per-vertex count is integer-valued and most vertices get 1-2 hits at small k) but is NOT a graded ranking. For graded ranking on large graphs, use xray.betweenness_centrality_sampled.
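To see where the integer-valued ladder comes from, here is a minimal Python sketch of one pair-sampled estimator: sample a uniform vertex pair, pick one of its shortest paths uniformly at random, and add +1 to each interior vertex. It assumes natural logarithms in the k formula, a VC bound supplied by the caller, and a plain forward BFS in place of the engine's bidirectional BFS. All names are hypothetical.

```python
import math
import random
from collections import deque


def pair_sample_size(vc, epsilon=0.05, delta=0.05):
    """k = ceil(0.5 * (VC + ln(2/delta)) / epsilon^2); natural log is an assumption."""
    return math.ceil(0.5 * (vc + math.log(2.0 / delta)) / epsilon ** 2)


def sample_shortest_path_interior(adj, s, t, rng):
    """Interior vertices of one s-t shortest path chosen uniformly at random ([] if none)."""
    dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            break                        # every predecessor of t has already been expanded
        for w in adj[v]:
            if w not in dist:
                dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]     # count shortest paths arriving through v
                preds[w].append(v)
    if t not in dist:
        return []
    interior, w = [], t
    while w != s:
        # step back to predecessor u with probability sigma[u] / sigma[w]
        r = rng.random() * sigma[w]
        for u in preds[w]:
            r -= sigma[u]
            if r < 0:
                break
        w = u
        if w != s:
            interior.append(w)
    return interior[::-1]


def pair_sampled_counts(adj, k, seed=None):
    """Integer hit counts per vertex after k uniform pair samples."""
    rng = random.Random(seed)
    vertices = list(adj)
    counts = {}
    for _ in range(k):
        s, t = rng.sample(vertices, 2)   # distinct uniform pair
        for v in sample_shortest_path_interior(adj, s, t, rng):
            counts[v] = counts.get(v, 0) + 1
    return counts
```

Each sample moves a vertex's count by exactly 1, so at small k most vertices sit at 1 or 2 hits and the estimate (count divided by k) collapses onto a handful of values.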

CALL xray.betweenness_pair_sampled_adaptive()

xray.betweenness_pair_sampled_adaptive(epsilon :: FLOAT, delta :: FLOAT, label :: STRING, top_k :: INT, stability_threshold :: FLOAT) :: (node_id :: INT, centrality :: FLOAT, name :: STRING, time_ms :: INT)

KADABRA-style adaptive BC. Runs pair-sampled batches until the top_k vertex set stabilises across consecutive batches (Jaccard similarity ≥ stability_threshold) or the (ε, δ) target k is reached. Early stopping is based on top-K membership, not on individual centrality values, so the same fidelity caveat as the non-adaptive variant applies.

Defaults: ε = 0.05, δ = 0.05, top_k = 50, stability_threshold = 0.95.
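The early-stop test compares consecutive top-K sets with Jaccard similarity. A minimal Python sketch, assuming a hypothetical run_batch callback that returns per-vertex hit counts for one batch of pair samples, and an ascending node_id tie-break for equal counts (the engine's tie-break is not documented here):

```python
from typing import Callable, Dict, Optional, Set, Tuple


def top_k_set(counts: Dict[int, int], k: int) -> Set[int]:
    """Top-k vertex ids by count; ties broken by ascending node id (an assumption)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {node for node, _ in ranked[:k]}


def jaccard(a: Set[int], b: Set[int]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def run_adaptive(run_batch: Callable[[int], Dict[int, int]], k_target: int, batch_size: int,
                 top_k: int = 50, stability_threshold: float = 0.95) -> Tuple[int, Dict[int, int]]:
    """Accumulate batches until consecutive top-K sets agree or k_target samples are used."""
    totals: Dict[int, int] = {}
    prev_top: Optional[Set[int]] = None
    samples = 0
    while samples < k_target:
        for node, hits in run_batch(batch_size).items():
            totals[node] = totals.get(node, 0) + hits
        samples += batch_size
        curr_top = top_k_set(totals, top_k)
        if prev_top is not None and jaccard(prev_top, curr_top) >= stability_threshold:
            break                      # top-K membership stopped changing: stop early
        prev_top = curr_top
    return samples, totals
```

Only set membership is compared, so two runs can stop at the same point while assigning quite different counts to the vertices inside the top-K.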

Cypher
CALL xray.betweenness_pair_sampled_adaptive(0.05, 0.05, '', 50, 0.95)
  YIELD node_id, centrality, name, time_ms
  RETURN node_id, centrality
  ORDER BY centrality DESC LIMIT 50;
GFQL
betweenness_pair_sampled_adaptive(epsilon=0.05, delta=0.05, label='', top_k=50, stability_threshold=0.95)

CALL xray.find_path_bidirectional()

xray.find_path_bidirectional(start_id :: INT, end_id :: INT, max_hops :: INT) :: (path_nodes :: STRING, hops :: INT, explored_nodes_fwd :: INT, explored_nodes_bwd :: INT)

Bidirectional BFS shortest path on an undirected (or symmetrised) CSR. Forward and backward BFS expand toward each other; when they meet (detected via a direction bitmap), the path is reconstructed. On small-world graphs with average branching factor b, a search to depth d visits roughly O(b^(d/2)) vertices instead of O(b^d), which at d ≈ 6 can mean five orders of magnitude fewer vertex touches than a unidirectional BFS.

Defaults: max_hops = 64.

Returns one row with the shortest path as a comma-separated list of GIDs from start to end, the hop count, and the number of vertices explored from each direction (useful for tuning).
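A minimal Python sketch of the same idea on an adjacency dict, returning the path, hop count, and explored-vertex counts per direction. It expands the smaller frontier one full layer at a time and stops at the first layer that touches the other side, which is enough to guarantee a shortest path in an unweighted graph. The engine's CSR layout and direction-bitmap meeting test are replaced here by plain Python dicts, and the function name only mirrors the procedure.

```python
def find_path_bidirectional(adj, start, end, max_hops=64):
    """Return (path, hops, explored_fwd, explored_bwd); path is None if no path within max_hops."""
    if start == end:
        return [start], 0, 1, 1
    parent_f, parent_b = {start: None}, {end: None}   # parent pointers toward start / end
    frontier_f, frontier_b = [start], [end]
    depth_f = depth_b = 0
    while frontier_f and frontier_b and depth_f + depth_b + 1 <= max_hops:
        forward = len(frontier_f) <= len(frontier_b)  # expand the smaller frontier
        if forward:
            frontier, parent, other = frontier_f, parent_f, parent_b
        else:
            frontier, parent, other = frontier_b, parent_b, parent_f
        meet = None                   # edge (v, w) joining this side to the other side
        next_frontier = []
        for v in frontier:            # expand one full BFS layer
            for w in adj.get(v, ()):
                if meet is None and w in other:
                    meet = (v, w)
                if w not in parent:
                    parent[w] = v
                    next_frontier.append(w)
        if meet is not None:
            v, w = meet
            f_end, b_end = (v, w) if forward else (w, v)
            path, x = [], f_end
            while x is not None:      # walk back to start
                path.append(x)
                x = parent_f[x]
            path.reverse()
            x = b_end
            while x is not None:      # walk forward to end
                path.append(x)
                x = parent_b[x]
            return path, len(path) - 1, len(parent_f), len(parent_b)
        if forward:
            frontier_f, depth_f = next_frontier, depth_f + 1
        else:
            frontier_b, depth_b = next_frontier, depth_b + 1
    return None, -1, len(parent_f), len(parent_b)
```

To produce the procedure's path_nodes format, join the list: ",".join(map(str, path)).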

Cypher
CALL xray.find_path_bidirectional(81306110, 20676652, 8)
  YIELD path_nodes, hops, explored_nodes_fwd, explored_nodes_bwd
  RETURN path_nodes, hops, explored_nodes_fwd, explored_nodes_bwd;
GFQL
find_path_bidirectional(start_id=81306110, end_id=20676652, max_hops=8)

Bench-team recipe — running BC the way we ran it

The 2026-05-05 sub-second BC validation on Friendster used these knobs and queries. Reproduce by running on the same dataset; numbers will vary by hardware.

Reproducibility — bit-identical runs across daemon restarts
// Each pair-sampled call seeds its RNG from (worker_id, batch_id, num_verts).
// Three runs at the same knob settings → bit-identical top-K. Use this to
// verify a fresh build doesn't silently change behaviour.
CALL xray.betweenness_pair_sampled(0.05, 0.05, '', 50, 8)
  YIELD node_id, centrality
  RETURN node_id, centrality
  ORDER BY centrality DESC, node_id ASC LIMIT 200;
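Deterministic per-batch seeding is what makes those reruns comparable. A minimal Python sketch, assuming the seed is derived by hashing the (worker_id, batch_id, num_verts) tuple with SHA-256. The engine's actual mixing function is not documented, and the names below are hypothetical.

```python
import hashlib
import random
from typing import List, Tuple


def batch_rng(worker_id: int, batch_id: int, num_verts: int) -> random.Random:
    """RNG whose seed depends only on (worker_id, batch_id, num_verts)."""
    key = f"{worker_id}:{batch_id}:{num_verts}".encode()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "little")  # illustrative mixing
    return random.Random(seed)


def batch_pairs(worker_id: int, batch_id: int, num_verts: int, count: int) -> List[Tuple[int, int]]:
    """Vertex pairs sampled for one batch; identical inputs give identical pairs."""
    rng = batch_rng(worker_id, batch_id, num_verts)
    return [(rng.randrange(num_verts), rng.randrange(num_verts)) for _ in range(count)]
```

Because the seed ignores wall-clock time and thread scheduling, the same knobs on the same graph replay the same pairs; a change in num_verts (for example after ingest) changes every seed.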
Triage vs ranking — pick the variant for what you need
// Sub-second indicator (3-10 distinct values on Friendster):
CALL xray.betweenness_pair_sampled(0.05, 0.05, '', 0, 8) YIELD node_id, centrality
RETURN node_id ORDER BY centrality DESC LIMIT 50;

// Ranked distribution (slower, graded centrality):
CALL xray.betweenness_centrality_sampled(0.05, 0.05, '') YIELD node_id, centrality
RETURN node_id, centrality ORDER BY centrality DESC LIMIT 50;

Production port: always 7689 on .187. Soak ports 17689 (TSAN) and 27689 (ASAN) run separate binaries — never validate correctness against soak ports.

xg.* — Module Management (18 procedures)

CALL xg.builtin_functions()

xg.builtin_functions() :: (category :: STRING, description :: STRING, name :: STRING, signature :: STRING)

Lists all built-in Cypher functions available in xrayGraphDB, including aggregation, scalar, string, list, and mathematical functions. Shows function signature and category.

Cypher
CALL xg.builtin_functions() YIELD name, category, signature
WHERE category = 'String'
RETURN name, signature
ORDER BY name;

CALL xg.create_module_file()

xg.create_module_file(filename :: STRING, content :: STRING) :: (path :: STRING)

Creates a new module file (Cypher or GFQL custom function) with the given content. Returns the full path to the created file.

Cypher
CALL xg.create_module_file('my_functions.cypher',
  'FUNCTION my_sqrt(x) RETURN sqrt(abs(x))')
YIELD path
RETURN path;

CALL xg.delete_module_file()

xg.delete_module_file(path :: STRING) :: ()

Deletes a module file from the module directory. The file is removed and no longer available.

Cypher
CALL xg.delete_module_file('modules/my_functions.cypher');

CALL xg.functions()

xg.functions() :: (description :: STRING, is_editable :: BOOLEAN, mode :: STRING, name :: STRING, path :: STRING, signature :: STRING)

Lists all custom functions loaded from module files. Shows function name, signature, path, and whether it is editable.

Cypher
CALL xg.functions() YIELD name, path, signature, is_editable
WHERE is_editable = true
RETURN name, path, signature;

CALL xg.get_module_file()

xg.get_module_file(path :: STRING) :: (content :: STRING)

Retrieves the full content of a module file by its path.

Cypher
CALL xg.get_module_file('modules/my_functions.cypher')
YIELD content
RETURN content;

CALL xg.get_module_files()

xg.get_module_files() :: (is_editable :: BOOLEAN, path :: STRING)

Lists all module files currently available, showing paths and whether each is editable.

Cypher
CALL xg.get_module_files() YIELD path, is_editable
RETURN path, is_editable;

CALL xg.load()

xg.load(module_name :: STRING) :: ()

Loads a specific module by name, making its functions and procedures available for execution.

Cypher
CALL xg.load('my_functions');

CALL xg.load_all()

xg.load_all() :: ()

Loads all available modules, making all custom functions and procedures available for execution.

Cypher
CALL xg.load_all();

CALL xg.plugins_disable()

xg.plugins_disable(name :: STRING) :: (message :: STRING, success :: STRING)

Disables a plugin, preventing it from running. The plugin state is changed to DISABLED.

Cypher
CALL xg.plugins_disable('xray-vision')
YIELD success, message
RETURN success, message;

CALL xg.plugins_enable()

xg.plugins_enable(name :: STRING) :: (message :: STRING, success :: STRING)

Enables a plugin, allowing it to run. The plugin state is changed to ENABLED.

Cypher
CALL xg.plugins_enable('xray-vision')
YIELD success, message
RETURN success, message;

CALL xg.plugins_license()

xg.plugins_license(name :: STRING, key :: STRING) :: (message :: STRING, success :: STRING)

Applies a license key to a plugin, enabling licensed features or extending the license expiry date.

Cypher
CALL xg.plugins_license('xray-vision', 'YOUR_LICENSE_KEY_HERE')
YIELD success, message
RETURN success, message;

CALL xg.plugins_list()

xg.plugins_list() :: (author :: STRING, crash_count :: INTEGER, description :: STRING, display_name :: STRING, last_error :: STRING, license_expires_at :: INTEGER, license_issued_to :: STRING, license_tier :: STRING, licensed :: STRING, name :: STRING, pid :: INTEGER, state :: STRING, type :: STRING, version :: STRING)

Lists all installed plugins with their metadata including author, version, license status, crash count, and current state (ENABLED, DISABLED, ERROR).

Cypher
CALL xg.plugins_list() YIELD name, version, state, licensed
RETURN name, version, state, licensed
ORDER BY name;

CALL xg.plugins_revoke()

xg.plugins_revoke(name :: STRING) :: (message :: STRING, success :: STRING)

Revokes the license from a plugin, reverting it to unlicensed state. The plugin continues to run but with limited functionality.

Cypher
CALL xg.plugins_revoke('xray-vision')
YIELD success, message
RETURN success, message;

CALL xg.plugins_scan()

xg.plugins_scan() :: (message :: STRING, success :: STRING)

Scans the plugins directory for new or updated plugins and reloads them.

Cypher
CALL xg.plugins_scan()
YIELD success, message
RETURN success, message;

CALL xg.procedures()

xg.procedures() :: (description :: STRING, is_editable :: BOOLEAN, is_write :: BOOLEAN, mode :: STRING, name :: STRING, path :: STRING, signature :: STRING)

Lists all custom procedures loaded from module files. Shows procedure name, signature, path, editability, and whether it performs write operations.

Cypher
CALL xg.procedures() YIELD name, path, is_write
WHERE is_write = true
RETURN name, path;

CALL xg.transformations()

xg.transformations() :: (is_editable :: BOOLEAN, name :: STRING, path :: STRING)

Lists all transformation definitions available in the database. Transformations are special operations for bulk graph modifications.

Cypher
CALL xg.transformations() YIELD name, path
RETURN name, path;

CALL xg.update_module_file()

xg.update_module_file(path :: STRING, content :: STRING) :: ()

Updates the content of an existing module file. The file is modified in-place and changes are immediately available.

Cypher
CALL xg.update_module_file('modules/my_functions.cypher',
  'FUNCTION my_sqrt(x) RETURN sqrt(abs(x) + 1)');

CALL xg.xray_vision_builtin_functions()

xg.xray_vision_builtin_functions() :: (category :: STRING, description :: STRING, name :: STRING, signature :: STRING)

Lists all XRay-Vision plugin functions, showing the available code analysis functions with their categories and signatures.

Cypher
CALL xg.xray_vision_builtin_functions() YIELD name, category
WHERE category = 'CodeMetrics'
RETURN name
ORDER BY name;

Note: XRay-Vision plugin required.

GFQL Overview

GFQL (Graph Frame Query Language) is a dataframe-native query language for graph traversal and analysis. It is designed for data scientists and developers who prefer chainable, functional-style operations over declarative pattern matching.

GFQL queries run natively inside the xrayGraphDB engine (patent pending) alongside Cypher. They operate on the same graph as Cypher queries, with the same transaction isolation guarantees.

Note: GFQL is available starting with xrayGraphDB v4.0. It can be used alongside Cypher in the same database without conflicts.

SET GFQL_CONTEXT

Before executing GFQL operations, set a query context that defines the working scope (labels, edge types, or property filters).

GFQL
// Set context to all Function nodes
SET GFQL_CONTEXT label='Function';

// Set context with edge filter
SET GFQL_CONTEXT label='Module', edge_type='IMPORTS';

// Set context with property filter
SET GFQL_CONTEXT label='Person', WHERE age > 25;

chain(), n(), e_forward(), e_reverse()

GFQL operations are chained together using a fluent API. The core primitives are:

| Function | Description | Example |
|---|---|---|
| chain() | Start a GFQL operation chain | chain() |
| n() | Select nodes (optionally filtered) | n(label='Person') |
| e_forward() | Traverse outgoing edges | e_forward(type='CALLS') |
| e_reverse() | Traverse incoming edges | e_reverse(type='CALLS') |
| .filter() | Filter current frame | .filter(complexity > 5) |
| .hop() | Multi-hop traversal | .hop(edge_type='CALLS', depth=3) |
| .select() | Project specific columns | .select('name', 'module') |
| .aggregate() | Group and aggregate | .aggregate(by='module', count='name') |

GFQL
// Find high-complexity functions and their callees
chain()
  .n(label='Function')
  .filter(complexity > 10)
  .e_forward(type='CALLS')
  .select('source.name', 'target.name');

// Multi-hop traversal
chain()
  .n(label='Module', name='auth')
  .hop(edge_type='IMPORTS', depth=3)
  .select('name', '_hop_depth');

// Aggregate by module
chain()
  .n(label='Function')
  .aggregate(by='module', count='name', avg_complexity='complexity');

Filter Predicates

GFQL supports the following predicates inside .filter() expressions:

| Operator | Description | Example |
|---|---|---|
| =, != | Equality / inequality | .filter(status = 'active') |
| >, <, >=, <= | Comparison | .filter(age >= 18) |
| AND, OR | Logical operators | .filter(age > 18 AND active = true) |
| NOT | Logical negation | .filter(NOT deleted) |
| IN | List membership | .filter(status IN ['active', 'pending']) |
| LIKE | Pattern matching (% wildcard) | .filter(name LIKE 'auth%') |
| IS NULL | Null check | .filter(email IS NULL) |
| IS NOT NULL | Not null | .filter(email IS NOT NULL) |

Bolt Protocol (port 7687)

Bolt is the primary protocol for xrayGraphDB. It supports the complete Cypher feature set including stored procedures (CALL...YIELD), DDL operations (CREATE INDEX, CREATE CONSTRAINT), EXPLAIN/PROFILE, and all query types. Use Bolt for application development and any operation that requires the full query engine.

xrayGraphDB is compatible with the Neo4j driver ecosystem: applications using one of the drivers listed below connect without code changes beyond the connection URI.

Supported Bolt Versions

| Version | Status | Notes |
|---|---|---|
| Bolt v3 | Supported | Minimum version, used by older drivers |
| Bolt v4.x | Supported | Multi-database messages accepted (single-db only) |
| Bolt v5.0-5.6 | Supported | Full feature support including notifications |

Bolt TLS is supported. Pass the --bolt-cert-file and --bolt-key-file flags to enable encrypted connections.

Driver Compatibility

| Driver | Language | Minimum Version | Status |
|---|---|---|---|
| neo4j-python-driver | Python | 5.0+ | Tested |
| neo4j-driver (npm) | JavaScript | 5.0+ | Tested |
| neo4j-java-driver | Java | 5.0+ | Tested |
| neo4j-go-driver | Go | 5.0+ | Tested |
| Neo4j.Driver (NuGet) | C# / .NET | 5.0+ | Tested |
| py2neo | Python | 2021.1+ | Compatible |
| neomodel | Python | 5.0+ | Compatible |

Bolt Connection Examples

All Neo4j drivers connect the same way. The only requirement is to point the driver at the xrayGraphDB Bolt port.

Connection Strings
# Unencrypted
bolt://localhost:7687

# With TLS (if server has cert configured)
bolt+s://xraygraphdb.example.com:7687

# Neo4j scheme (resolves to bolt://)
neo4j://localhost:7687

Authentication uses the same credentials as xrayGraphDB user management. The default admin account is admin with the password you set during setup.

xrayProtocol (port 7689)

xrayProtocol is xrayGraphDB's native binary protocol (patent pending), optimized for high-throughput, columnar data streaming. It delivers up to 24x higher throughput than Bolt for large result sets.

Important: Stored procedures (CALL...YIELD), EXPLAIN, PROFILE, and DDL commands (CREATE INDEX, CREATE CONSTRAINT) are currently Bolt-only. xrayProtocol supports MATCH, CREATE, MERGE, DELETE, and RETURN queries. Use Bolt for the full Cypher feature set.

Protocol Feature Comparison

| Feature | Bolt (7687) | xrayProtocol (7689) |
|---|---|---|
| MATCH / CREATE / MERGE / DELETE | Yes | Yes |
| RETURN with result streaming | Yes | Yes (columnar, up to 24x faster) |
| Parameters ($param) | Yes | Yes |
| Transactions (BEGIN/COMMIT) | Yes | Yes |
| CALL...YIELD (stored procedures) | Yes | Not yet |
| EXPLAIN / PROFILE | Yes | Not yet |
| CREATE INDEX / CONSTRAINT | Yes | Not yet |
| GFQL queries | Yes | Yes |
| LZ4 compression | No | Yes |
| Query pipelining | No | Yes |
| Neo4j driver compatible | Yes | No (native client) |
| Best for | Full feature set, stored procedures, DDL | Bulk analytics, large result sets, data pipelines |

When to Use xrayProtocol

When to Use Bolt

Connection String
# xrayProtocol URI
xray://localhost:7689

xrayProtocol Message Types

xrayProtocol uses a binary message format with the following core message types:

| Message | Direction | Description |
|---|---|---|
| HELLO | Client to Server | Initiate connection with auth credentials |
| WELCOME | Server to Client | Confirm authentication and protocol version |
| RUN | Client to Server | Execute a query with optional parameters |
| COLUMNS | Server to Client | Column metadata for the result set |
| CHUNK | Server to Client | Columnar data batch (Arrow-compatible) |
| DONE | Server to Client | Query complete with summary statistics |
| ERROR | Server to Client | Error response with code and message |
| GOODBYE | Either | Close the connection gracefully |

Connection Lifecycle

An xrayProtocol session follows this sequence:

Sequence
Client                           Server
  |                                  |
  |------ HELLO (auth) ------------>|
  |<----- WELCOME (version) --------|
  |                                  |
  |------ RUN (query, params) ----->|
  |<----- COLUMNS (metadata) -------|
  |<----- CHUNK (data batch 1) -----|
  |<----- CHUNK (data batch 2) -----|
  |<----- ...                       |
  |<----- DONE (summary) -----------|
  |                                  |
  |------ RUN (next query) -------->|
  |<----- ...                       |
  |                                  |
  |------ GOODBYE ----------------->|
  |<----- GOODBYE ------------------|
  |                                  |

Data is streamed in columnar CHUNK messages. Each CHUNK contains a batch of rows (default batch size: 8192). The client does not need to wait for all chunks before processing.
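A client can check that it follows the lifecycle above with a small state machine. This is a hypothetical Python sketch over message names only. It deliberately leaves out ERROR handling and query pipelining (the comparison table lists pipelining as supported, so a real client may send the next RUN before DONE arrives), because their exact sequencing is not described in this section.

```python
# (state, message) -> next state, following the sequence diagram above.
TRANSITIONS = {
    ("connected", "HELLO"): "authenticating",
    ("authenticating", "WELCOME"): "ready",
    ("ready", "RUN"): "awaiting_columns",
    ("awaiting_columns", "COLUMNS"): "streaming",
    ("streaming", "CHUNK"): "streaming",     # any number of CHUNK batches
    ("streaming", "DONE"): "ready",          # ready for the next RUN
    ("ready", "GOODBYE"): "closing",
    ("closing", "GOODBYE"): "closed",
}


def replay(messages, state="connected"):
    """Apply messages in order; raise ValueError on an out-of-order message."""
    for msg in messages:
        nxt = TRANSITIONS.get((state, msg))
        if nxt is None:
            raise ValueError(f"unexpected {msg} in state {state}")
        state = nxt
    return state
```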

Bulk Insert (xrayProtocol)

Bulk insert bypasses Cypher parsing entirely for maximum write throughput. Instead of individual CREATE statements, clients stream columnar batches of nodes and edges directly into the storage engine.

Performance: Bulk insert over xrayProtocol is 10-100x faster than batched Cypher CREATE statements over Bolt. A 100K node + 450K edge import that takes 26 minutes over Bolt completes in under 1 minute via bulk insert.

| Message | Code | Direction | Description |
|---|---|---|---|
| BULK_INSERT_BEGIN | 0x20 | Client → Server | Open a bulk insert session |
| BULK_INSERT_NODES | 0x21 | Client → Server | Columnar node batch (labels + properties) |
| BULK_INSERT_EDGES | 0x22 | Client → Server | Columnar edge batch (from, to, type, properties) |
| BULK_INSERT_DELETE | 0x23 | Client → Server | Batch delete by ID list |
| BULK_INSERT_COMMIT | 0x24 | Client → Server | Commit bulk session |
| BULK_INSERT_ACK | 0x25 | Server → Client | Batch acknowledged (count + timing) |
| BULK_INSERT_ERROR | 0x26 | Server → Client | Batch error |

Node Batch Wire Format (0x21)

Binary
// All integers are little-endian. Strings are length-prefixed UTF-8.
uint32  node_count          // number of nodes in this batch
uint32  prop_count          // number of properties per node

// Property name declarations
[prop_count]:
  uint32  name_len
  char[]  name              // e.g., "fnid", "name", "complexity"

// Per-node data
[node_count]:
  [prop_count]:
    uint32  value_len
    char[]  value           // string-encoded property value
  uint32  label_count
  [label_count]:
    uint32  label_len
    char[]  label           // e.g., "Function", "File"

Edge Batch Wire Format (0x22)

Binary
uint32  edge_count
uint32  prop_count

[prop_count]:
  uint32  name_len
  char[]  name

[edge_count]:
  uint32  from_id_len
  char[]  from_id           // source node identifier (e.g., fnid)
  uint32  to_id_len
  char[]  to_id             // target node identifier
  uint32  type_len
  char[]  type              // edge type (e.g., "CALLS", "IMPORTS")
  [prop_count]:
    uint32  value_len
    char[]  value
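The edge payload follows the same conventions. A matching Python sketch under the same assumptions (payload only, no outer frame, hypothetical names):

```python
import struct
from typing import Sequence, Tuple


def _lp(text: str) -> bytes:
    """uint32 little-endian length prefix followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def encode_edge_batch(prop_names: Sequence[str],
                      edges: Sequence[Tuple[str, str, str, Sequence[object]]]) -> bytes:
    """Payload for BULK_INSERT_EDGES; each edge is (from_id, to_id, type, values)."""
    out = bytearray(struct.pack("<II", len(edges), len(prop_names)))
    for name in prop_names:                 # property name declarations
        out += _lp(name)
    for from_id, to_id, edge_type, values in edges:
        if len(values) != len(prop_names):
            raise ValueError("each edge needs one value per declared property")
        out += _lp(str(from_id)) + _lp(str(to_id)) + _lp(edge_type)
        for value in values:
            out += _lp(str(value))          # string-encoded property values
    return bytes(out)
```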

ACK Response Payload

Binary
uint32  count               // number of items processed
double  milliseconds        // server-side processing time
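Decoding a batch ACK takes one struct call. This sketch assumes the payload is exactly the two fields above, packed little-endian with no padding (12 bytes), and that the double is an IEEE 754 binary64 value; decode_ack is a hypothetical name.

```python
import struct

ACK_LAYOUT = struct.Struct("<Id")   # uint32 count, float64 milliseconds; "<" means no padding


def decode_ack(payload: bytes):
    """Return (count, milliseconds) from a BULK_INSERT_ACK payload."""
    if len(payload) < ACK_LAYOUT.size:
        raise ValueError(f"ACK payload too short: {len(payload)} bytes")
    count, millis = ACK_LAYOUT.unpack_from(payload)
    return count, millis
```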

Session Flow

Sequence
Client                              Server
  |                                       |
  |------ HELLO (auth) ----------------->|
  |<----- HELLO_OK ----------------------|
  |                                       |
  |------ BULK_INSERT_BEGIN ------------>|
  |<----- BULK_INSERT_ACK --------------|
  |                                       |
  |------ BULK_INSERT_NODES (batch 1) -->|  1000 nodes
  |<----- BULK_INSERT_ACK {1000, 12ms} -|
  |------ BULK_INSERT_NODES (batch 2) -->|  1000 nodes
  |<----- BULK_INSERT_ACK {1000, 11ms} -|
  |                                       |
  |------ BULK_INSERT_EDGES (batch 1) -->|  5000 edges
  |<----- BULK_INSERT_ACK {5000, 45ms} -|
  |                                       |
  |------ BULK_INSERT_COMMIT ----------->|
  |<----- BULK_INSERT_ACK --------------|
  |                                       |
Recommended batch sizes: 1,000-5,000 nodes per batch, 5,000-10,000 edges per batch. The server commits each batch independently. The tenantId property is automatically set from the authenticated session.
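A client typically applies the batch-size guidance with a small planner. This hypothetical Python sketch only orders and sizes the messages, sending nodes before edges as in the session flow above (edges reference node identifiers). Encoding each batch, framing, and waiting for BULK_INSERT_ACK after each send are left out.

```python
from typing import List, Sequence, Tuple

# Message codes from the bulk insert table.
BULK_INSERT_BEGIN = 0x20
BULK_INSERT_NODES = 0x21
BULK_INSERT_EDGES = 0x22
BULK_INSERT_COMMIT = 0x24


def batches(items: Sequence, size: int) -> List[Sequence]:
    """Split items into consecutive slices of at most `size` elements."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def bulk_session_plan(nodes: Sequence, edges: Sequence,
                      node_batch: int = 1000, edge_batch: int = 5000) -> List[Tuple[int, Sequence]]:
    """Ordered (message_code, items) list mirroring the session flow."""
    plan: List[Tuple[int, Sequence]] = [(BULK_INSERT_BEGIN, [])]
    # Nodes first, so that edge batches can reference their identifiers.
    plan += [(BULK_INSERT_NODES, b) for b in batches(nodes, node_batch)]
    plan += [(BULK_INSERT_EDGES, b) for b in batches(edges, edge_batch)]
    plan.append((BULK_INSERT_COMMIT, []))
    return plan
```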

HA Replication

xrayGraphDB supports Primary-Replica high availability via delta streaming over xrayProtocol. Every write on the Primary is captured and streamed to Replicas in real time.

Setup

bash
# Primary server
xraygraphdb \
  --replication-mode=primary \
  --replication-replicas=replica-host:7690

# Replica server
xraygraphdb \
  --replication-mode=replica \
  --replication-port=7690

How It Works

Replicated Operations

| Delta Type | Description |
|---|---|
| VERTEX_CREATE | New node inserted into the vertex store |
| VERTEX_DELETE | Node removed from the vertex store |
| EDGE_CREATE | New edge with adjacency list linking |
| EDGE_DELETE | Edge removed from adjacency lists |
| LABEL_ADD | Label assigned to node (name-based resolution) |
| LABEL_REMOVE | Label removed from node |
| PROPERTY_SET | Property value set on node (name-based resolution) |
| PROPERTY_REMOVE | Property removed from node |

Replication Flags

| Flag | Default | Description |
|---|---|---|
| --replication-mode | standalone | standalone, primary, or replica |
| --replication-replicas | (empty) | Comma-separated replica addresses (host:port) |
| --replication-port | 7690 | Port for incoming replication connections (replica mode) |

Server Flags

xrayGraphDB is configured via command-line flags. Flags can also be set via environment variables using the format XRAY_FLAG_NAME (uppercase, underscores instead of hyphens).

Shell
# Command-line
./bin/xraygraphdb-wrapper --bolt-port=7687 --data-directory=/data

# Environment variable equivalent
export XRAY_BOLT_PORT=7687
export XRAY_DATA_DIRECTORY=/data
./bin/xraygraphdb-wrapper
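The flag-to-variable rule is mechanical, so deployment tooling can derive the names. A minimal Python sketch of the documented rule (XRAY_ prefix, uppercase, hyphens to underscores); it only accepts --flag=value pairs, and the helper names are hypothetical.

```python
from typing import Dict, Iterable


def flag_to_env(flag: str) -> str:
    """Map --bolt-port (or --bolt-port=7687) to XRAY_BOLT_PORT."""
    name = flag.split("=", 1)[0].lstrip("-")
    return "XRAY_" + name.replace("-", "_").upper()


def flags_to_env(flags: Iterable[str]) -> Dict[str, str]:
    """Turn ["--bolt-port=7687", ...] into {"XRAY_BOLT_PORT": "7687", ...}."""
    env = {}
    for flag in flags:
        name, sep, value = flag.partition("=")
        if not sep:
            raise ValueError(f"expected --flag=value, got {flag!r}")
        env[flag_to_env(name)] = value
    return env
```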

Data & Storage Flags

| Flag | Default | Description |
|---|---|---|
| --data-directory | /var/lib/xraygraphdb | Directory for snapshots and WAL files |
| --storage-gc-cycle-sec | 30 | Garbage collection interval in seconds |
| --storage-snapshot-interval-sec | 300 | Automatic snapshot interval (0 to disable) |
| --storage-snapshot-on-exit | true | Take snapshot on graceful shutdown |
| --storage-recover-on-startup | true | Load latest snapshot on startup |
| --storage-snapshot-retention-count | 3 | Number of snapshots to keep |
| --storage-wal-enabled | true | Enable write-ahead logging |

Network Flags

| Flag | Default | Description |
|---|---|---|
| --bolt-port | 7687 | Port for the Bolt protocol |
| --bolt-address | 0.0.0.0 | Bind address for Bolt |
| --bolt-num-workers | (auto) | Number of Bolt worker threads |
| --bolt-cert-file | (none) | Path to TLS certificate for Bolt |
| --bolt-key-file | (none) | Path to TLS private key for Bolt |
| --xray-port | 7689 | Port for xrayProtocol |
| --xray-address | 0.0.0.0 | Bind address for xrayProtocol |

Snapshot & Persistence Flags

| Flag | Default | Description |
|---|---|---|
| --storage-snapshot-interval-sec | 300 | Seconds between automatic snapshots (0 = manual only) |
| --storage-snapshot-on-exit | true | Create snapshot during graceful shutdown |
| --storage-snapshot-retention-count | 3 | How many snapshot files to retain on disk |
| --storage-recover-on-startup | true | Automatically load latest snapshot on start |
| --storage-wal-enabled | true | Write-ahead log for crash recovery between snapshots |
| --storage-wal-file-size-kib | 20480 | Max WAL file size before rotation (KiB) |

Performance Flags

| Flag | Default | Description |
|---|---|---|
| --query-execution-timeout-sec | 180 | Max execution time per query (0 = no limit) |
| --plan-cache-size | 1024 | Number of compiled query plans to cache |
| --bolt-session-inactivity-timeout | 1800 | Idle session timeout in seconds |
| --storage-parallel-index-recovery | true | Rebuild indexes in parallel during recovery |
| --storage-parallel-schema-recovery | true | Rebuild schemas in parallel during recovery |

Auth Flags

| Flag | Default | Description |
|---|---|---|
| --auth-enabled | true | Enable authentication (set false for local dev only) |
| --auth-password-strength | 8 | Minimum password length |
| --auth-module-timeout-ms | 10000 | Timeout for external auth module calls |

Warning: Never disable authentication in production. The --auth-enabled=false flag is for isolated development environments only.

License Flags

| Flag | Default | Description |
|---|---|---|
| --license-key | (none) | Path to license key JSON file |
| --organization-name | (none) | Licensed organization name (must match key) |

Without a license key, xrayGraphDB operates in Community mode with all core features available. A license unlocks commercial features (HA clustering, per-tenant encryption, RBAC). Reads, writes, and startup are never blocked by license status.

Persistence Model

xrayGraphDB supports two storage engines: mmap (data on NVMe, paged into RAM on demand) and default (all data in RAM). Both engines persist data to disk via snapshots and WAL. See Storage Engine Selection for configuration details.

Snapshots

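Snapshots are full-state dumps of the graph written to the configured --data-directory. Triggers include the automatic timer (every --storage-snapshot-interval-sec seconds, default 300; 0 disables it) and graceful shutdown (when --storage-snapshot-on-exit=true).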

Recovery on Startup

When --storage-recover-on-startup=true, the server loads the most recent snapshot on start, then replays any WAL entries created after that snapshot. This provides durability with minimal data loss.

Critical: Never use kill -9 to stop xrayGraphDB. A forced kill skips the exit snapshot and may corrupt in-progress WAL writes. Always use SIGTERM, systemctl stop, or docker stop.

Snapshot File Layout

Directory Structure
/var/lib/xraygraphdb/
  snapshots/
    snapshot-20260401T120000.bin
    snapshot-20260401T115500.bin
    snapshot-20260401T115000.bin
  wal/
    wal-000001.bin
    wal-000002.bin
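Because snapshot names embed a fixed-width YYYYMMDDTHHMMSS timestamp, scripts can find the newest snapshot, or the ones beyond --storage-snapshot-retention-count, with a plain string sort. A minimal Python sketch, assuming every snapshot follows the naming shown above; whether the server prunes in exactly this order is not documented here, and the helper names are hypothetical.

```python
import re

# Matches the documented naming: snapshot-YYYYMMDDTHHMMSS.bin
SNAPSHOT_NAME = re.compile(r"^snapshot-\d{8}T\d{6}\.bin$")


def newest_first(filenames):
    """Snapshot files sorted newest first; fixed-width timestamps sort correctly as strings."""
    return sorted((f for f in filenames if SNAPSHOT_NAME.match(f)), reverse=True)


def beyond_retention(filenames, retention_count=3):
    """Snapshots that fall outside the newest `retention_count` files."""
    return newest_first(filenames)[retention_count:]
```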

Memory Configuration

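With the default (in-memory) engine, xrayGraphDB stores all data in RAM, so memory planning is critical. The mmap engine pages data in from NVMe on demand and can hold datasets larger than physical RAM (see Storage Engine Selection).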

Memory Estimation

As a rule of thumb, allocate 2x the estimated raw data size to account for indexes, query working memory, and GC overhead.

Docker Memory Limits

Shell
# Limit container to 16 GB
docker run -d \
  --name xraygraphdb \
  --memory=16g \
  -p 7687:7687 \
  xraygraphdb:v4.9.4

User Management

xrayGraphDB supports built-in user management via Cypher commands. Users are authenticated against the internal user store.

Cypher
// Create a new user
CREATE USER analyst IDENTIFIED BY "<strong-password>";

// Change a user's password
ALTER USER analyst SET PASSWORD "<new-password>";

// Drop a user
DROP USER analyst;

// List all users
SHOW USERS;
Note: The admin user is created at first startup. Set a strong password immediately. All examples in this documentation use admin / <your-password> as the credentials.

Role Management

Roles define permissions that are granted to users. A user can have multiple roles. Permissions are additive across roles.

Cypher
// Create a role
CREATE ROLE analyst;

// Grant read access on all labels
GRANT READ ON LABELS * TO analyst;

// Grant read access on specific label
GRANT READ ON LABELS :Person TO analyst;

// Deny write access
DENY WRITE ON LABELS * TO analyst;

// Assign a role to a user
SET ROLE FOR analyst_user TO analyst;

// Revoke a permission
REVOKE READ ON LABELS :Secret FROM analyst;

// Show all roles and their privileges
SHOW PRIVILEGES;

// Drop a role
DROP ROLE analyst;

| Permission | Scope | Description |
|---|---|---|
| READ | LABELS, EDGE_TYPES | Read nodes/relationships of specified types |
| WRITE | LABELS, EDGE_TYPES | Create/update/delete specified types |
| CREATE | LABELS | Create nodes of specified labels |
| DELETE | LABELS | Delete nodes of specified labels |
| INDEX | GLOBAL | Create and drop indexes |
| AUTH | GLOBAL | Manage users and roles |

Admin CLI (xraygraph-admin)

The xraygraph-admin command-line tool provides administrative operations that cannot be performed through Cypher. It operates directly on the data directory, and none of the commands listed below require a running server.

| Command | Description | Requires Running Server |
|---|---|---|
| --validate-license <file> | Verify license key signature, expiry, and organization | No |
| --admin-reset --auth-token=<epoch> | Reset admin password. Auth token is the Unix epoch from the server's first-start timestamp. | No |
| --verify-integrity --auth-token=<epoch> | Check snapshot and WAL integrity hashes | No |
| --repair --auth-token=<epoch> | Attempt automatic repair of corrupted snapshot segments | No |

Shell
# Validate a license key file
xraygraph-admin --validate-license /path/to/license.json

# Reset admin password (requires auth token from first-start log)
xraygraph-admin \
  --admin-reset \
  --auth-token=1711929600 \
  --data-directory=/var/lib/xraygraphdb

# Verify data integrity
xraygraph-admin \
  --verify-integrity \
  --auth-token=1711929600 \
  --data-directory=/var/lib/xraygraphdb

# Attempt repair of corrupted data
xraygraph-admin \
  --repair \
  --auth-token=1711929600 \
  --data-directory=/var/lib/xraygraphdb
Warning: The --repair command modifies snapshot files. Always make a backup of the data directory before running repair.
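The --auth-token value is the first-start timestamp expressed as Unix-epoch seconds. A minimal Python sketch of the conversion, assuming the timestamp is available in ISO 8601 form (the exact log format is not documented here) and treating timestamps without an offset as UTC; the helper name is hypothetical.

```python
from datetime import datetime, timezone


def auth_token_from_timestamp(first_start: str) -> int:
    """Unix-epoch seconds for an ISO 8601 first-start timestamp."""
    dt = datetime.fromisoformat(first_start.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)   # assumption: naive timestamps are UTC
    return int(dt.timestamp())
```

For example, a first start at 2024-04-01T00:00:00Z gives 1711929600, the token used in the commands above.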

Backup & Recovery

Backup xrayGraphDB by copying the snapshot files from the data directory. The safest approach is to stop the server, copy the files, and restart.

Online Backup (server running)

Shell
# Trigger a snapshot first
# (connect via driver and run)
# FREE MEMORY;  -- triggers GC + snapshot

# Then copy the latest snapshot
# (snapshot files are timestamped, so pick the newest one)
LATEST=$(ls -t /var/lib/xraygraphdb/snapshots/snapshot-*.bin | head -n 1)
cp "$LATEST" /backup/xraygraphdb-$(date +%Y%m%d).bin

Offline Backup (server stopped)

Shell
# Stop the server (creates exit snapshot)
sudo systemctl stop xraygraphdb

# Copy the entire data directory (-a preserves ownership and permissions)
sudo cp -a /var/lib/xraygraphdb /backup/xraygraphdb-$(date +%Y%m%d)

# Restart
sudo systemctl start xraygraphdb

Restore from Backup

Shell
# Stop the server
sudo systemctl stop xraygraphdb

# Replace data directory with backup (-a keeps the backed-up ownership)
sudo rm -rf /var/lib/xraygraphdb
sudo cp -a /backup/xraygraphdb-20260401 /var/lib/xraygraphdb

# Restart (will load the restored snapshot)
sudo systemctl start xraygraphdb

License Management

xrayGraphDB uses Ed25519-signed JSON license keys. Community mode (no license) provides full query functionality with single-node deployment. Licensed mode unlocks HA clustering, per-tenant encryption (patent pending), and RBAC.

Install a License at Startup

Shell
./bin/xraygraphdb-wrapper \
  --license-key=/path/to/license.json \
  --organization-name="Your Company"

Validate a License Offline

Shell
xraygraph-admin --validate-license /path/to/license.json
# Output:
#   Organization: Your Company
#   Expires:      2027-04-01
#   Features:     ha, encryption, rbac
#   Signature:    VALID

License Key Format

JSON
{
  "organization": "Your Company",
  "issued": "2026-04-01T00:00:00Z",
  "expires": "2027-04-01T00:00:00Z",
  "features": ["ha", "encryption", "rbac"],
  "signature": "<ed25519-signature>"
}
Note: A license is never required for reads, writes, or server startup. xrayGraphDB Community is fully functional as a single-node graph database. Only horizontal scaling, per-tenant encryption, and fine-grained RBAC require a license.
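Because a license only gates specific features, deployment scripts may want a quick offline pre-flight check of the license file before relying on one of them. A minimal Python sketch, assuming the JSON layout shown above; the helper name is hypothetical, and it deliberately does not verify the Ed25519 signature (that needs the vendor's public key and the exact signed byte layout, neither documented here), so xraygraph-admin --validate-license remains the authoritative check.

```python
import json
from datetime import datetime, timezone


def license_enables(license_json: str, feature: str, now: datetime) -> bool:
    """Return True if the license lists `feature` and has not yet expired.

    Hypothetical pre-flight check only: the signature is NOT verified.
    `now` must be timezone-aware so it compares cleanly with "expires".
    """
    lic = json.loads(license_json)
    expires = datetime.fromisoformat(lic["expires"].replace("Z", "+00:00"))
    return feature in lic.get("features", []) and now < expires
```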

Plugin System Overview

xrayGraphDB supports a generic plugin system (patent pending) for extending the database with external data sources, analyzers, and connectors. Data-source and connector plugins run as isolated subprocesses via fork()+exec(), which gives them complete memory and library isolation from the core database process; analyzer and extension plugins are shared libraries loaded in-process (see Plugin Types).

Key characteristics:

Plugin Architecture

Plugins live in a directory under the xrayGraphDB data path (default: <data-dir>/plugins/, configurable via --plugins-dir). Each plugin is a subdirectory containing a plugin.json manifest and its binaries.

Directory Structure

Directory
plugins/
  swim/
    plugin.json              # Manifest (name, version, type, entry point)
    bin/
      xraygraphdb-swim       # Plugin executable
    lib/
      libsolclient.so        # Plugin-specific dependencies
    config/
      swim.json              # Plugin configuration
    schema/
      aircraft.cypher        # Schema init (indexes, constraints)

Plugin Types

| Type | Communication | Description |
|---|---|---|
| data-source | Pipe (write fd passed as argv[1]) | Subprocess that writes data to the database via a pipe. The database reads serialized records and routes them to the appropriate storage handler. |
| analyzer | Shared library (dlopen) | In-process library that registers custom Cypher procedures. |
| connector | Bidirectional socket | Subprocess with two-way communication for request/response protocols. |
| extension | Shared library | In-process library that extends core database functionality. |

Lifecycle

Lifecycle
Install (untar) → Scan → License → Enable → Running
                                                  ↓
                              Pause ↔ Resume    Stop → Disable
                                                  ↓
                              Crash → Auto-restart (up to max_failures)
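The crash branch of this lifecycle is governed by the health_check settings in plugin.json. The Python sketch below models one plausible reading of that policy: restart after restart_delay_seconds until the plugin has crashed max_failures times in a row, then disable it. The function name and the exact off-by-one boundary are assumptions, not the server's documented implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HealthCheck:
    # Defaults mirror the manifest-field table: 30 s interval, 3 failures, 5 s delay.
    interval_seconds: int = 30
    max_failures: int = 3
    restart_delay_seconds: int = 5


def on_crash(consecutive_crashes: int, hc: HealthCheck) -> tuple[str, int]:
    """Decide what happens after a crash (hypothetical policy).

    consecutive_crashes includes the crash that just happened.
    Returns ("restart", delay_seconds) or ("disable", 0).
    """
    if consecutive_crashes >= hc.max_failures:
        return ("disable", 0)
    return ("restart", hc.restart_delay_seconds)
```

Under this reading, the swim manifest's values (max_failures 5, restart_delay_seconds 10) restart the plugin after crashes one through four and disable it on the fifth.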

Installing Plugins

Plugins are distributed as tarballs. Extract into the plugins directory:

Shell
# Extract plugin into the plugins directory
tar -xzf xraygraphdb-plugin-swim-1.0.0.tar.gz \
    -C /var/lib/xraygraphdb/plugins/

# Tell the running database to rescan by running this
# Cypher statement from any Cypher client:
#   CALL xg.plugins_scan() YIELD message;

After scanning, the plugin appears in the plugin list but is not yet running. Unless it requires only the community tier, you must activate it with a license key before enabling it.

To remove a plugin, stop it and delete its stored license with xg.plugins_revoke, then delete its directory:

Cypher
-- Disable and revoke license
CALL xg.plugins_revoke('swim') YIELD message;
Shell
rm -rf /var/lib/xraygraphdb/plugins/swim

Plugin Licensing

Plugin licenses are Ed25519-signed JSON keys, stored in the database (not on disk). This means:

Activate a Plugin License

Cypher
-- Store and validate a license key (provided by eMTAi)
CALL xg.plugins_license(
  'swim',
  '{"plugin":"swim","tier":"enterprise","issued_to":"Your Org","issued_at":1744700000,"expires_at":0,"signing_key_id":"xg-key-2026-001","signature":"..."}'
) YIELD success, message;

The database validates the Ed25519 signature, checks the tier requirement (the plugin declares its minimum tier in plugin.json), and checks expiration. If valid, the license is stored and the plugin moves to licensed state.

Enable After Licensing

Cypher
-- Start the plugin subprocess
CALL xg.plugins_enable('swim') YIELD success, message;

License Tiers

| Tier | Rank | Description |
|---|---|---|
| community | 1 | Free plugins. No license key required. |
| enterprise | 2 | Commercial plugins. Requires a valid enterprise license key. |
| dod | 3 | Government/military plugins. Requires a DOD-tier license key. |

A license with a higher tier can activate any plugin that requires a lower tier. For example, a dod license activates plugins that require enterprise or community.

Expiration: If expires_at is 0, the license never expires. Otherwise, it is an epoch timestamp in seconds. An expired license prevents the plugin from starting, but does not kill a running plugin mid-flight. The expiration is re-checked on each restart.
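The tier ordering and expiration rules above reduce to two comparisons. A minimal Python sketch, assuming the rank values from the tier table and the expires_at semantics just described; the function name is hypothetical, and community plugins would skip this check entirely because they need no key.

```python
from __future__ import annotations

import time
from typing import Optional

TIER_RANK = {"community": 1, "enterprise": 2, "dod": 3}


def can_start(license_tier: str, required_tier: str,
              expires_at: int, now: Optional[float] = None) -> bool:
    """Check whether a stored plugin license allows the plugin to (re)start.

    Hypothetical helper: a higher-ranked tier satisfies any lower requirement;
    expires_at == 0 means "never expires", otherwise it is epoch seconds.
    """
    if TIER_RANK[license_tier] < TIER_RANK[required_tier]:
        return False
    if expires_at == 0:
        return True
    now = time.time() if now is None else now
    return now < expires_at
```

Because the check runs only at start, a license that expires while the plugin is running takes effect at the next restart.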

Management Commands

All plugin management is done via Cypher procedures in the xg module.

List All Plugins

Cypher
CALL xg.plugins_list()
YIELD name, version, state, licensed, license_tier,
      license_issued_to, license_expires_at, pid, crash_count;

Return Columns

| Column | Type | Description |
|---|---|---|
| name | String | Plugin identifier (matches directory name) |
| display_name | String | Human-readable name |
| version | String | Plugin version (semver) |
| description | String | Plugin description |
| author | String | Plugin author |
| type | String | data-source, analyzer, connector, extension |
| state | String | discovered, licensed, unlicensed, starting, running, paused, stopped, crashed, error |
| licensed | String | "true" or "false" |
| license_tier | String | Tier of the installed license |
| license_issued_to | String | Organization name from license |
| license_expires_at | Integer | Epoch seconds (0 = never) |
| pid | Integer | OS process ID (-1 if not running) |
| crash_count | Integer | Number of times the plugin has crashed since last enable |
| last_error | String | Last error message (empty if healthy) |
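Note that licensed is the string "true" or "false", not a Boolean, and pid uses -1 as a "not running" sentinel. When consuming plugins_list from a driver, a small normalization step keeps the string "false" from being treated as truthy. A minimal Python sketch, assuming each row arrives as a dict keyed by the column names above (for example from the neo4j driver's Record.data()); PluginStatus and parse_plugin_row are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class PluginStatus:
    name: str
    state: str
    licensed: bool
    pid: Optional[int]                 # None instead of the -1 sentinel
    license_expires_at: Optional[int]  # None when the column is 0 ("never")
    crash_count: int


def parse_plugin_row(row: Mapping[str, Any]) -> PluginStatus:
    """Convert one xg.plugins_list() row into typed Python values."""
    pid = int(row.get("pid", -1))
    expires = int(row.get("license_expires_at", 0))
    return PluginStatus(
        name=row["name"],
        state=row["state"],
        # licensed is a string column; compare text rather than truthiness
        licensed=str(row.get("licensed", "false")).lower() == "true",
        pid=None if pid == -1 else pid,
        license_expires_at=None if expires == 0 else expires,
        crash_count=int(row.get("crash_count", 0)),
    )
```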

All Management Procedures

| Procedure | Arguments | Description |
|---|---|---|
| xg.plugins_list() | none | List all discovered plugins with full details. |
| xg.plugins_license(name, key) | name: String, key: String (JSON) | Store and validate a license key for a plugin. |
| xg.plugins_enable(name) | name: String | Start a licensed plugin. Persists as enabled across restarts. |
| xg.plugins_disable(name) | name: String | Stop a plugin. License is kept but plugin will not auto-start. |
| xg.plugins_revoke(name) | name: String | Stop a plugin and delete its stored license. |
| xg.plugins_scan() | none | Rescan the plugins directory for new or updated plugins. |

All management procedures return success (String: "true"/"false") and message (String) columns, except plugins_list which returns the detail columns above.

Example: Full Plugin Lifecycle

Cypher
-- 1. Scan after installing a new plugin
CALL xg.plugins_scan() YIELD message;

-- 2. Check what's available
CALL xg.plugins_list()
YIELD name, state, licensed
RETURN name, state, licensed;

-- 3. Activate with license key (provided by eMTAi)
CALL xg.plugins_license('swim', '{"plugin":"swim",...}')
YIELD success;

-- 4. Enable (starts the subprocess)
CALL xg.plugins_enable('swim') YIELD success;

-- 5. Stop temporarily (license is kept; no auto-start)
-- CALL xg.plugins_disable('swim') YIELD success;

-- 6. Start again
-- CALL xg.plugins_enable('swim') YIELD success;

-- 7. Fully remove
-- CALL xg.plugins_revoke('swim') YIELD message;

Plugin Manifest (plugin.json)

Every plugin must include a plugin.json file in its root directory. This manifest declares the plugin's identity, type, entry point, and requirements.

JSON
{
  "name": "swim",
  "display_name": "FAA SWIM Consumer",
  "version": "1.0.0",
  "description": "FAA SWIM native consumer for aviation data.",
  "author": "eMTAi LLC",
  "license_tier": "enterprise",
  "type": "data-source",
  "entry_point": "bin/xraygraphdb-swim",
  "communication": "pipe",
  "config_schema": "config/swim.json",
  "schema_init": "schema/aircraft.cypher",

  "requires": {
    "xraygraphdb": "4.2.0",
    "databases": ["swim"]
  },

  "health_check": {
    "interval_seconds": 30,
    "max_failures": 5,
    "restart_delay_seconds": 10
  }
}

Manifest Fields

| Field | Required | Description |
|---|---|---|
| name | Yes | Unique plugin identifier. Must match the directory name. |
| display_name | No | Human-readable name. Defaults to name. |
| version | No | Semver version string. |
| description | No | Short description shown in plugins_list. |
| author | No | Author or organization. |
| license_tier | No | Minimum required tier: community, enterprise, or dod. Default: community. |
| type | No | Plugin type: data-source, analyzer, connector, extension. Default: data-source. |
| entry_point | Yes | Path to executable, relative to the plugin directory. |
| communication | No | Communication method: pipe, socket, dlopen. Default: pipe. |
| config_schema | No | Path to config file, relative to plugin directory. Passed as argv[2] to the entry point. |
| schema_init | No | Cypher file to run on first start (e.g., CREATE INDEX). |
| requires.xraygraphdb | No | Minimum xrayGraphDB version. |
| requires.databases | No | List of database names the plugin needs created. |
| health_check.interval_seconds | No | How often to check plugin health. Default: 30. |
| health_check.max_failures | No | Maximum consecutive crashes before disabling. Default: 3. |
| health_check.restart_delay_seconds | No | Seconds to wait before restarting a crashed plugin. Default: 5. |
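The field table implies a simple load step: check the two required fields, confirm that name matches the directory, and fill in defaults. The Python sketch below is a hypothetical pre-flight validator to run before shipping a plugin tarball; the accepted values come from the table, but the server's own validation may be stricter or looser.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

TIERS = {"community", "enterprise", "dod"}
TYPES = {"data-source", "analyzer", "connector", "extension"}
COMMUNICATION = {"pipe", "socket", "dlopen"}
HEALTH_DEFAULTS = {"interval_seconds": 30, "max_failures": 3, "restart_delay_seconds": 5}


def load_manifest(plugin_dir: Path) -> dict[str, Any]:
    """Read plugin.json, validate it, and apply the documented defaults."""
    m = json.loads((plugin_dir / "plugin.json").read_text(encoding="utf-8"))
    for field in ("name", "entry_point"):
        if field not in m:
            raise ValueError(f"missing required field: {field}")
    if m["name"] != plugin_dir.name:
        raise ValueError(f"name {m['name']!r} does not match directory {plugin_dir.name!r}")
    m.setdefault("display_name", m["name"])
    m.setdefault("license_tier", "community")
    m.setdefault("type", "data-source")
    m.setdefault("communication", "pipe")
    # Per-key defaults: a manifest may override only some health_check fields.
    m["health_check"] = {**HEALTH_DEFAULTS, **m.get("health_check", {})}
    if m["license_tier"] not in TIERS:
        raise ValueError(f"unknown license_tier: {m['license_tier']}")
    if m["type"] not in TYPES:
        raise ValueError(f"unknown type: {m['type']}")
    if m["communication"] not in COMMUNICATION:
        raise ValueError(f"unknown communication: {m['communication']}")
    if not (plugin_dir / m["entry_point"]).is_file():
        raise ValueError(f"entry_point not found: {m['entry_point']}")
    return m
```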

Developing Plugins

A data-source plugin is a standalone executable that writes serialized records to a pipe. The database launches it with:

Shell
# The database calls exec() with:
./bin/your-plugin <write_fd> [config_file_path]

Here, write_fd is the file descriptor number of the pipe's write end. The plugin writes length-prefixed binary messages to this fd:

Wire Format (Pipe Protocol)

Binary
For each message:
  [uint32_t payload_length]   // little-endian, max 65536 bytes
  [payload_length bytes]      // plugin-specific serialization

The database reads these messages in a dedicated reader thread and passes the raw bytes to a data callback. The callback deserializes the plugin-specific format and routes data to the appropriate storage handler.
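The framing can be exercised without a running database. The Python sketch below encodes messages the way a plugin must write them and decodes a byte stream the way the reader thread is described as doing. The function names are hypothetical, and the sketch treats 65536 as an inclusive maximum, which the format description does not state explicitly.

```python
import io
import struct
from typing import BinaryIO, Iterator

MAX_PAYLOAD = 65536            # documented maximum payload size in bytes
_HEADER = struct.Struct("<I")  # uint32 length prefix, little-endian


def encode_message(payload: bytes) -> bytes:
    """Prefix payload with its little-endian uint32 length."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError(f"payload of {len(payload)} bytes exceeds {MAX_PAYLOAD}")
    return _HEADER.pack(len(payload)) + payload


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes; pipes may return short reads."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError(f"stream ended after {len(buf)} of {n} bytes")
        buf += chunk
    return buf


def decode_messages(stream: BinaryIO) -> Iterator[bytes]:
    """Yield payloads until the writer closes the stream between messages."""
    while True:
        first = stream.read(1)
        if not first:
            return  # clean EOF on a message boundary
        (length,) = _HEADER.unpack(first + _read_exact(stream, _HEADER.size - 1))
        if length > MAX_PAYLOAD:
            raise ValueError(f"declared length {length} exceeds {MAX_PAYLOAD}")
        yield _read_exact(stream, length)


if __name__ == "__main__":
    stream = io.BytesIO(encode_message(b"hello") + encode_message(b"world"))
    print(list(decode_messages(stream)))  # [b'hello', b'world']
```

Against a real pipe, the stream would be the read end (for example os.fdopen(fd, "rb")) rather than a BytesIO.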

Plugin Process Rules

Minimal Plugin Example (C)

C
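// my-sensor-plugin.c: writes sensor readings to xrayGraphDB
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

static volatile sig_atomic_t running = 1;
static void handle_sigterm(int sig) { (void)sig; running = 0; }

// Write exactly n bytes, retrying on partial writes and EINTR.
static int write_all(int fd, const void *buf, size_t n) {
    const uint8_t *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int write_fd = atoi(argv[1]);
    // argv[2], if present, is the path to the plugin's config file.

    signal(SIGTERM, handle_sigterm);
    signal(SIGPIPE, SIG_IGN);  // a closed pipe returns EPIPE instead of killing us

    while (running) {
        // Build your serialized record (placeholder bytes)
        uint8_t payload[] = { 0x01, 0x02, 0x03 };
        uint32_t len = sizeof(payload);

        // Length prefix is little-endian per the wire format
        uint8_t prefix[4] = {
            (uint8_t)(len & 0xff), (uint8_t)((len >> 8) & 0xff),
            (uint8_t)((len >> 16) & 0xff), (uint8_t)((len >> 24) & 0xff)
        };
        if (write_all(write_fd, prefix, sizeof(prefix)) < 0 ||
            write_all(write_fd, payload, len) < 0)
            break;  // database closed the pipe; exit cleanly

        sleep(1);
    }

    close(write_fd);
    return 0;
}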
Configuration: If your plugin needs configuration (API keys, connection strings, etc.), put a JSON config file in config/ and reference it as config_schema in the manifest. The database passes the full path to your config file as argv[2].

Supported Neo4j Syntax

xrayGraphDB supports the openCypher standard plus commonly-used Neo4j extensions. The following table summarizes compatibility.

| Feature | Status | Notes |
|---|---|---|
| MATCH / RETURN / WHERE | Full | openCypher standard |
| CREATE / MERGE / SET / REMOVE / DELETE | Full | openCypher standard |
| Variable-length paths | Full | Including shortestPath and allShortestPaths |
| Aggregation functions | Full | count, sum, avg, min, max, collect, percentile, stDev |
| List comprehensions | Full | Including reduce, filter, map |
| Pattern comprehensions | Full | [(n)-->(m) \| m.name] |
| CASE expressions | Full | Simple and generic CASE |
| Named indexes (CREATE INDEX name FOR ...) | Full | Neo4j 4.x+ syntax |
| SHOW INDEX INFO | Full | |
| SHOW CONSTRAINT INFO | Full | |
| Explicit transactions (BEGIN/COMMIT/ROLLBACK) | Full | |
| EXPLAIN / PROFILE | Full | Query plan inspection |
| CALL procedures | Partial | Built-in procedures only; no APOC |
| Multi-database | Not supported | Single database per instance |
| Subqueries (CALL { ... }) | Not supported | Use WITH for query chaining |
| LOAD CSV | Not supported | Use driver-side bulk import instead |

Driver Compatibility

xrayGraphDB is wire-compatible with Neo4j Bolt protocol versions 3 through 5.6. Any driver that speaks Bolt can connect. The recommended driver versions are listed in the Driver Compatibility table.

Connection URI Schemes

| Scheme | Behavior |
|---|---|
| bolt:// | Direct unencrypted connection |
| bolt+s:// | Direct TLS connection (verify server cert) |
| bolt+ssc:// | Direct TLS connection (self-signed cert accepted) |
| neo4j:// | Routing driver (resolves to bolt://, no cluster routing) |
| neo4j+s:// | Routing driver with TLS |
Note: The neo4j:// scheme is accepted for compatibility. Since xrayGraphDB Community runs as a single instance, routing resolves to a direct connection. Cluster routing is available with a license.
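When an application reads its database URI from configuration, it can help to confirm which behavior the scheme selects before connecting. A small Python lookup mirroring the scheme table; the helper name is hypothetical, the 7687 fallback port is an assumption taken from the examples on this page, and schemes the table does not list (such as neo4j+ssc://) are rejected rather than guessed.

```python
from urllib.parse import urlsplit

# scheme -> (uses_tls, accepts_self_signed_cert, routing_driver), per the table
SCHEMES = {
    "bolt": (False, False, False),
    "bolt+s": (True, False, False),
    "bolt+ssc": (True, True, False),
    "neo4j": (False, False, True),
    "neo4j+s": (True, False, True),
}


def describe_uri(uri: str) -> dict:
    """Classify a connection URI by the scheme semantics in the table."""
    parts = urlsplit(uri)
    if parts.scheme not in SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    tls, self_signed, routing = SCHEMES[parts.scheme]
    return {
        "host": parts.hostname,
        "port": parts.port or 7687,  # assumed default Bolt port
        "tls": tls,
        "accepts_self_signed": self_signed,
        "routing": routing,
    }
```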

Migration from Neo4j

Migrating from Neo4j to xrayGraphDB is straightforward for most applications.

Step 1: Export Data from Neo4j

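Use APOC's apoc.export.cypher.all procedure to extract your data as Cypher statements. neo4j-admin database dump is still worth running as a Neo4j-side backup, but it writes a binary archive, not Cypher.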

Shell
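# Using APOC in Neo4j to export Cypher statements
# (requires apoc.export.file.enabled=true in apoc.conf)
# Run inside Neo4j Browser:
# CALL apoc.export.cypher.all("/export/data.cypher", {})

# Optional: neo4j-admin writes a binary .dump archive. Keep it as a
# Neo4j-side backup; it cannot be replayed into xrayGraphDB as Cypher.
neo4j-admin database dump neo4j --to-path=/export/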

Step 2: Import into xrayGraphDB

Feed the Cypher export statements into xrayGraphDB via a driver script.

Python
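from neo4j import GraphDatabase

with open("data.cypher", encoding="utf-8") as f:
    # Drop cypher-shell directives such as :begin / :commit (not Cypher)
    lines = [line for line in f if not line.lstrip().startswith(":")]

# Naive split: assumes no ';' appears inside string literals
statements = "".join(lines).split(";")

driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("admin", "<your-password>")
)

with driver.session() as session:
    for stmt in statements:
        stmt = stmt.strip()
        if stmt:
            session.run(stmt).consume()  # surfaces errors per statement

driver.close()
print("Import complete")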

Step 3: Update Connection String

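Update your application's connection string from the Neo4j URI to the xrayGraphDB URI. No other code changes should be needed as long as your queries avoid the features listed as unsupported in the compatibility table (for example APOC procedures, CALL { ... } subqueries, LOAD CSV, and multi-database sessions).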

Differences to Be Aware Of