RAG over a graph — in one Cypher query
xrayGraphDB ships native embed() and vector_dot_product() as first-class Cypher functions. In this demo you type a free-form symptom. The engine embeds your query, scores vector similarity against 274,065 real FDA warning labels, and then traverses (:Drug)-[:HAS_WARNING]->(:Warning) in the graph to surface the drugs whose warnings match most closely.
All in one transaction. One query. One engine.
How it works — the exact Cypher
Two Cypher functions do all the heavy lifting, and both are documented at docs.html#fn-rag_llm. Nothing proprietary, nothing hidden.
Step 1 — Embed once (offline)
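For every :Warning node, store a 384-dimensional vector produced by the engine's native embedder (ONNX model or hash-based). No external API call is involved. Run this at ingest time. Each run embeds at most 1,000 warnings, so repeat it until it returns embedded = 0: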
```cypher
// Embed up to 1,000 warnings that do not have a vector yet
MATCH (w:Warning)
WHERE w.embedding IS NULL
WITH w LIMIT 1000
SET w.embedding = embed(w.text)
RETURN count(w) AS embedded;
```
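The text does not say how the hash-based embedder works. One common technique it could be is feature hashing. In that scheme, each token is hashed into one of 384 buckets with a random ±1 sign, and the resulting vector is L2-normalized. The Python sketch below assumes exactly that. hash_embed and backfill are hypothetical names, not xrayGraphDB APIs. The loop mirrors how the LIMIT 1000 query must be re-run until a pass embeds nothing.

```python
import hashlib
import math
import re

DIM = 384  # matches the 384-dimensional vectors in Step 1


def hash_embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical feature-hashing embedder (not xrayGraphDB's embed())."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        bucket = int.from_bytes(digest[:4], "little") % dim  # which dimension
        sign = 1.0 if digest[4] & 1 else -1.0  # random sign limits collision bias
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec  # L2-normalize


def backfill(warnings: list[dict], batch_size: int = 1000) -> int:
    """One pass of the Cypher query: embed up to batch_size un-embedded warnings."""
    pending = [w for w in warnings if w.get("embedding") is None][:batch_size]
    for w in pending:
        w["embedding"] = hash_embed(w["text"])
    return len(pending)  # like RETURN count(w) AS embedded


# Re-run until a pass embeds nothing, exactly as you would re-run the query.
warnings = [{"text": f"example warning {i}"} for i in range(2500)]
passes = []
while (embedded := backfill(warnings)) > 0:
    passes.append(embedded)
print(passes)  # [1000, 1000, 500]
```

Feature hashing needs no model file, but it only captures shared words, not meaning. That makes it a weaker option than a learned model exported to ONNX.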
Step 2 — Search at query time
Embed the user's free-form text inline and match every (:Drug)-[:HAS_WARNING]->(:Warning) pair whose warning has a stored embedding. Score each pair by the dot product of the query vector and the warning vector, then return the 10 highest-scoring pairs. One round-trip. One transaction. No two-system sync.
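```cypher
// Embed the user's text once, inline
WITH embed($user_query) AS qv
// Walk from drugs to their embedded warnings
MATCH (d:Drug)-[:HAS_WARNING]->(w:Warning)
WHERE w.embedding IS NOT NULL
// Score each (drug, warning) pair against the query vector
WITH d, w, vector_dot_product(qv, w.embedding) AS similarity
ORDER BY similarity DESC
LIMIT 10
RETURN d.name, w.text, similarity;
```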
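To make the query's semantics concrete, here is a rough in-memory Python equivalent. It assumes vector_dot_product() computes a plain dot product, as its name suggests. The graph is a dict from drug to warning ids, the 3-dimensional vectors stand in for the real 384-dimensional ones, and the drug names are made up.

```python
def dot(a, b):
    """Plain dot product: what vector_dot_product() is assumed to compute."""
    return sum(x * y for x, y in zip(a, b))


def top_matches(query_vec, has_warning, warnings, k=10):
    """In-memory mirror of the Step 2 query."""
    rows = []
    for drug, warning_ids in has_warning.items():  # (d:Drug)-[:HAS_WARNING]->(w)
        for wid in warning_ids:
            w = warnings[wid]
            if w["embedding"] is None:  # WHERE w.embedding IS NOT NULL
                continue
            rows.append((drug, w["text"], dot(query_vec, w["embedding"])))
    rows.sort(key=lambda r: r[2], reverse=True)  # ORDER BY similarity DESC (stable)
    return rows[:k]  # LIMIT k


# Toy data: made-up drugs and 3-dim vectors standing in for 384-dim ones.
warnings = {
    "w1": {"text": "may cause drowsiness", "embedding": [1.0, 0.0, 0.0]},
    "w2": {"text": "risk of liver injury", "embedding": [0.0, 1.0, 0.0]},
    "w3": {"text": "not embedded yet", "embedding": None},
}
has_warning = {"DrugA": ["w1", "w3"], "DrugB": ["w2"], "DrugC": ["w1"]}
query_vec = [0.8, 0.6, 0.0]  # stand-in for embed($user_query)

for row in top_matches(query_vec, has_warning, warnings, k=2):
    print(row)
# ('DrugA', 'may cause drowsiness', 0.8)
# ('DrugC', 'may cause drowsiness', 0.8)
```

The query ranks (drug, warning) pairs rather than drugs. As a result, one warning linked to two drugs appears as two rows, as DrugA and DrugC do here.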
That is the entire retrieval side of the RAG pipeline. The query you copy and paste is exactly the query the demo runs.
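A dot product ranks results the same way as cosine similarity only when all vectors have the same length, typically unit length. The text does not state whether embed() returns normalized vectors. If it does not, longer vectors can outrank better-aligned ones. The sketch uses the textbook definitions of norm, normalization, and cosine similarity. The function names echo vector_norm and vector_normalize from the toolkit list, but their exact behavior in xrayGraphDB is an assumption here.

```python
import math


def vector_norm(v):
    """Euclidean (L2) length of v."""
    return math.sqrt(sum(x * x for x in v))


def vector_normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    n = vector_norm(v)
    return [x / n for x in v] if n else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: equal to the dot product of the two unit vectors."""
    return dot(a, b) / (vector_norm(a) * vector_norm(b))


q = [1.0, 0.0]
aligned_short = [3.0, 0.0]  # same direction as q, smaller magnitude
offset_long = [4.0, 4.0]    # 45 degrees away from q, larger magnitude

# Raw dot product rewards magnitude, so the off-direction vector wins...
print(dot(q, aligned_short), dot(q, offset_long))  # 3.0 4.0
# ...while cosine (or the dot product of normalized vectors) rewards direction.
print(cosine(q, aligned_short), round(cosine(q, offset_long), 3))  # 1.0 0.707
```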
Why one engine matters
| Concern | Separate vector DB (Pinecone / Weaviate / pgvector) plus graph DB | xrayGraphDB |
|---|---|---|
| Query atomicity | Two systems, two-phase or eventual consistency | Vector + graph in one Cypher transaction |
| Embedding pipeline | External API (OpenAI / Cohere / self-host) | embed() inline in Cypher — native ONNX |
| Join vector hits to graph | Vector-search-then-fetch via app-side glue | Built-in: MATCH (d:Drug)-[:HAS_WARNING]->(w:Warning) in the same query as the vector scoring |
| Encryption at rest | Varies, usually opt-in | AES-256-GCM per tenant, always-on, no opt-out |
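For contrast, here is roughly what the "vector-search-then-fetch" glue in the middle column looks like when the vector index and the graph live in separate systems. Both store classes are hypothetical in-memory stand-ins, not any vendor's client API. The sketch highlights the two round trips, the app-side join, and what happens when the two stores disagree.

```python
class HypotheticalVectorStore:
    """Stand-in for an external vector DB that only knows warning ids and vectors."""

    def __init__(self, vectors):
        self.vectors = vectors  # warning_id -> embedding
        self.calls = 0

    def query(self, qv, k):
        self.calls += 1

        def score(item):
            return sum(a * b for a, b in zip(qv, item[1]))

        ranked = sorted(self.vectors.items(), key=score, reverse=True)
        return [wid for wid, _ in ranked[:k]]


class HypotheticalGraphStore:
    """Stand-in for a separate graph DB holding warning_id -> linked drugs."""

    def __init__(self, edges):
        self.edges = edges
        self.calls = 0

    def drugs_for(self, warning_ids):
        self.calls += 1
        return {wid: self.edges.get(wid, []) for wid in warning_ids}


def search_two_systems(qv, vstore, gstore, k=10):
    hits = vstore.query(qv, k)       # round trip 1: vector search
    linked = gstore.drugs_for(hits)  # round trip 2: fetch graph neighbours
    # App-side join. A hit the graph store does not know about (e.g. a warning
    # indexed in one system but not yet written to the other) silently drops out.
    return [(drug, wid) for wid in hits for drug in linked[wid]]


vstore = HypotheticalVectorStore({"w1": [1.0, 0.0], "w2": [0.0, 1.0], "w3": [0.9, 0.1]})
gstore = HypotheticalGraphStore({"w1": ["DrugA"], "w2": ["DrugB"]})  # w3 not synced
print(search_two_systems([1.0, 0.0], vstore, gstore, k=2))  # [('DrugA', 'w1')]
```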
The full RAG/LLM toolkit (29 functions)
This demo uses two of them. All of them are documented at docs.html → RAG/LLM and are called the same way, as native Cypher functions with no extensions. They include:
embed, vector_dot_product, vector_normalize, vector_norm, vector_scale, vector_add, vector_dimension, relevance_score, context_rank, bm25_score, tf_idf, text_similarity, levenshtein_distance, ngram_similarity, extract_keywords, text_chunk_by_size
Data is real, not seeded
Every warning the query returns is a real FDA drug label from open.fda.gov, ingested by the public pipeline in demos/pharma/ingest/. There is no mock data and no curated subset chosen to make the demo look good. Your query hits the same data as every other pharma demo query.
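The ingest code itself lives in demos/pharma/ingest/ and is not reproduced here. As a rough, hypothetical sketch of that step: openFDA drug-label records carry warning text in a warnings array and harmonized drug names under openfda.generic_name and openfda.brand_name. The text does not say how the real pipeline maps those fields to :Drug and :Warning nodes. The one-Warning-per-string mapping and the generic-name-first choice below are therefore assumptions, and label_to_graph is an illustrative name.

```python
def label_to_graph(record):
    """Map one openFDA drug-label record to Drug names, Warning texts and
    HAS_WARNING edges. The mapping itself is hypothetical."""
    openfda = record.get("openfda", {})
    # Prefer generic names, fall back to brand names (an assumption).
    names = openfda.get("generic_name") or openfda.get("brand_name") or []
    texts = [t.strip() for t in record.get("warnings", []) if t.strip()]
    drugs = sorted(set(names))
    edges = [(d, t) for d in drugs for t in texts]  # (:Drug)-[:HAS_WARNING]->(:Warning)
    return drugs, texts, edges


# Made-up record in the openFDA label shape; not real label text.
sample = {
    "openfda": {"generic_name": ["EXAMPLE DRUG"], "brand_name": ["Examplex"]},
    "warnings": ["Example warning text.", "   "],
}
drugs, texts, edges = label_to_graph(sample)
print(drugs)  # ['EXAMPLE DRUG']
print(edges)  # [('EXAMPLE DRUG', 'Example warning text.')]
```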