Embedding-based RAG is a great default, but it is not a silver bullet. In many production systems, the hard questions are not semantic similarity problems; they are relationship reasoning problems.
In this post, I’ll show how to build graph-based RAG without embeddings using a graph database, and why this approach often outperforms embedding retrieval on complex enterprise queries.
Where Embedding RAG Struggles
Embedding retrieval works by mapping documents and queries into vectors and then doing nearest-neighbor search. That is excellent for semantic similarity, but several failure modes show up quickly in real workloads.
1. Multi-hop reasoning is weak
A query like “Which services owned by Team A call a deprecated payment API and are deployed in eu-west-1?” requires joining multiple facts:
- ownership relation
- service dependency relation
- API lifecycle relation
- deployment relation
Vector similarity can retrieve related text chunks, but it does not naturally model and traverse explicit relationships.
2. Ambiguous entities cause noisy retrieval
Terms like “gateway”, “edge”, or “auth” can refer to many entities. Embedding search often returns topically related but graph-irrelevant chunks.
3. Poor explainability
When a vector retriever picks chunk A over B, debugging why is difficult. Teams often need transparent answers like: “This result was selected because service X depends on API Y and API Y is marked deprecated.”
4. Metadata and constraints are second-class
Questions with strict constraints (region, version, owner, compliance tier) are awkward in pure vector search and often require ad hoc filtering.
Why Graph RAG Helps
Graph RAG stores knowledge as entities and relationships, then retrieves context by traversing those relationships with explicit constraints.
Instead of asking “what text is similar to this question?”, you ask:
- What entities are mentioned?
- How are they connected?
- Which paths satisfy business constraints?
This shifts retrieval from semantic proximity to structural relevance.
Minimal Graph Schema
You can model many enterprise knowledge domains with a compact schema:
```
(:Service {name, owner, tier, region})
(:API {name, version, status})
(:Team {name})
(:Incident {id, severity, date})
(:DocChunk {id, text, source})

(:Team)-[:OWNS]->(:Service)
(:Service)-[:CALLS]->(:API)
(:Service)-[:DEPLOYED_IN]->(:Region)
(:Service)-[:LINKED_TO]->(:DocChunk)
(:Incident)-[:IMPACTED]->(:Service)
```
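A cheap way to keep the graph clean during ingestion is to validate extracted edges against this schema before writing them. A minimal sketch, assuming an in-memory list of candidate edges; the `ALLOWED_EDGES` table mirrors the schema above, and the helper name is hypothetical:

```python
# Allowed (source label, relation, target label) triples from the schema above.
ALLOWED_EDGES = {
    ("Team", "OWNS", "Service"),
    ("Service", "CALLS", "API"),
    ("Service", "DEPLOYED_IN", "Region"),
    ("Service", "LINKED_TO", "DocChunk"),
    ("Incident", "IMPACTED", "Service"),
}

def validate_edge(src_label, rel, dst_label):
    """Return True if an extracted edge conforms to the schema."""
    return (src_label, rel, dst_label) in ALLOWED_EDGES

# Example: keep only schema-conformant edges before upserting.
candidates = [
    ("Team", "OWNS", "Service"),
    ("Service", "OWNS", "Team"),  # wrong direction: rejected
]
valid = [e for e in candidates if validate_edge(*e)]
```

Rejected edges are worth logging rather than silently dropping, since they usually point at extraction bugs.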
Even if your source data is documents, extracting entities and edges during ingestion creates a navigable knowledge graph for retrieval.
Graph Retrieval Query (Cypher)
Suppose a user asks:
“Which Team A services call deprecated APIs in eu-west-1, and what is the migration guidance?”
With a graph database like Neo4j, retrieval becomes explicit:
```cypher
MATCH (t:Team {name: $team})-[:OWNS]->(s:Service)-[:CALLS]->(a:API)
MATCH (s)-[:DEPLOYED_IN]->(:Region {name: $region})
WHERE a.status = 'deprecated'
OPTIONAL MATCH (s)-[:LINKED_TO]->(d:DocChunk)
RETURN s.name AS service,
       a.name AS api,
       a.version AS version,
       collect(DISTINCT d.text)[0..5] AS evidence
ORDER BY service;
```
This query returns grounded, constraint-aware evidence instead of semantically similar but potentially unrelated chunks.
End-to-End Graph RAG Flow
The generation step still uses an LLM; only retrieval changes.
1. **Question parsing**: extract entities and constraints from the question (team, API status, region, time range).
2. **Graph retrieval**: run Cypher queries and optional k-hop traversals to fetch relevant paths and evidence chunks.
3. **Context packaging**: build a context block with triples/paths and source evidence text.
4. **Answer generation**: prompt the LLM to answer only from the provided graph evidence.
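The question-parsing step can start very simply before you reach for a trained extractor. A minimal sketch, assuming known team and region vocabularies rather than an NER model; the vocabularies and function name here are illustrative:

```python
import re

KNOWN_TEAMS = {"Team A", "Team B"}
KNOWN_REGIONS = {"eu-west-1", "us-east-1"}

def parse_question(question):
    """Extract coarse entities/constraints by matching known vocabularies."""
    found = {"team": None, "region": None, "status": None}
    for team in KNOWN_TEAMS:
        if team.lower() in question.lower():
            found["team"] = team
    for region in KNOWN_REGIONS:
        if region in question:
            found["region"] = region
    if re.search(r"\bdeprecated\b", question, re.IGNORECASE):
        found["status"] = "deprecated"
    return found

q = "Which Team A services in eu-west-1 call deprecated APIs?"
parsed = parse_question(q)
```

The parsed fields map directly onto the `$team` and `$region` parameters of the retrieval query, which keeps the LLM out of the retrieval path entirely.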
Python Example: Graph Retriever + LLM
```python
from neo4j import GraphDatabase
from openai import OpenAI

SYSTEM_PROMPT = """You are a helpful assistant.
Answer only from the provided graph evidence.
If evidence is insufficient, say you do not know.
"""

CYPHER = """
MATCH (t:Team {name: $team})-[:OWNS]->(s:Service)-[:CALLS]->(a:API)
MATCH (s)-[:DEPLOYED_IN]->(:Region {name: $region})
WHERE a.status = 'deprecated'
OPTIONAL MATCH (s)-[:LINKED_TO]->(d:DocChunk)
RETURN s.name AS service,
       a.name AS api,
       a.version AS version,
       collect(DISTINCT d.text)[0..3] AS evidence
ORDER BY service
LIMIT 20
"""


def retrieve_graph(driver, team, region):
    """Run the retrieval query and return plain dicts."""
    with driver.session() as session:
        rows = session.run(CYPHER, team=team, region=region)
        return [record.data() for record in rows]


def build_context(rows):
    """Format graph rows into a numbered evidence block for the prompt."""
    blocks = []
    for i, row in enumerate(rows, start=1):
        blocks.append(
            "\n".join([
                f"[{i}] service={row['service']}",
                f"api={row['api']} version={row['version']}",
                "evidence:",
                *(row.get("evidence") or []),
            ])
        )
    return "\n\n".join(blocks)


def answer_question(team, region, question):
    driver = GraphDatabase.driver(
        "bolt://localhost:7687", auth=("neo4j", "password")
    )
    try:
        rows = retrieve_graph(driver, team=team, region=region)
    finally:
        driver.close()
    context = build_context(rows)

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": f"Question: {question}\n\nGraph Evidence:\n{context}",
            },
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    q = "Which Team A services in eu-west-1 call deprecated APIs?"
    print(answer_question("Team A", "eu-west-1", q))
```
How Graph Retrieval Solves Embedding Problems
1. Multi-hop questions become first-class
In graph RAG, multi-hop is just traversal depth. You can explicitly follow *n* edges and apply constraints at each step.
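To make the traversal-depth point concrete, here is a toy sketch of hop-by-hop traversal over an in-memory edge list, applying a relation type per hop. The data and helpers are hypothetical (and status is modeled as an edge here for simplicity, whereas the schema above stores it as a property); a real system would push this into Cypher:

```python
# Toy edge list: (source, relation, target).
EDGES = [
    ("Team A", "OWNS", "payments-service"),
    ("payments-service", "CALLS", "billing-v1"),
    ("billing-v1", "HAS_STATUS", "deprecated"),
]

def neighbors(node, rel):
    """All targets reachable from node via one edge of the given type."""
    return [dst for src, r, dst in EDGES if src == node and r == rel]

def follow(start, rels):
    """Follow a fixed sequence of relation types, one per hop."""
    frontier = [start]
    for rel in rels:
        frontier = [n for node in frontier for n in neighbors(node, rel)]
    return frontier

# Two hops: services owned by Team A, then the APIs they call.
apis = follow("Team A", ["OWNS", "CALLS"])
# A constraint applied at the final hop.
deprecated = [a for a in apis if "deprecated" in neighbors(a, "HAS_STATUS")]
```

Each additional hop is one more relation type in the list, which is exactly why multi-hop questions stop being a special case.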
2. Entity disambiguation improves
Nodes represent canonical entities (Service:payments-gateway), so retrieval is anchored to IDs, not just similar words.
3. Explanations are native
You can return the exact reasoning path:
Team A -> OWNS -> payments-service -> CALLS -> billing-v1 API (deprecated)
This gives auditable evidence for generated answers.
4. Constraints are easy and exact
Region, owner, version, status, and time windows are direct query predicates, not fuzzy ranking hints.
Practical Ingestion Strategy
Building a good graph is the hard part. A practical pipeline:
- Chunk documents
- Extract entities and relations (NER + relation extraction or rule-based parsing)
- Resolve entities to canonical IDs (dedup + alias mapping)
- Upsert nodes and edges
- Link evidence chunks back to entities for grounded generation
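Entity resolution is often the step that makes or breaks this pipeline. A minimal alias-mapping sketch, in which the alias table and canonical IDs are illustrative; real systems typically layer fuzzy matching on top of an exact table like this:

```python
# Alias table mapping lowercased surface forms to canonical IDs (illustrative).
ALIASES = {
    "payments gw": "service:payments-gateway",
    "payments-gateway": "service:payments-gateway",
    "pg": "service:payments-gateway",
    "billing api": "api:billing-v1",
}

def resolve(mention):
    """Map a raw mention to a canonical entity ID, or None if unknown."""
    return ALIASES.get(mention.strip().lower())

mentions = ["Payments GW", "billing API", "unknown thing"]
resolved = [resolve(m) for m in mentions]
```

Unresolved mentions (`None` here) should be queued for review rather than inserted as new nodes, otherwise the graph fragments into duplicates.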
Start with high-confidence rules for critical entity types (service names, API names, owners) before introducing model-based extraction.
Should You Replace Embeddings Completely?
Not always. The strongest setup in many systems is:
- Graph-first retrieval for constraint-heavy, relationship-heavy questions
- Embedding fallback for broad semantic exploration
- Fusion/re-ranking to combine both when needed
But if your workload is primarily operational reasoning over known entities, graph-only retrieval can be both simpler and more accurate.
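When you do combine graph and embedding retrieval, a simple rank-based fusion such as reciprocal rank fusion (RRF) is often enough. A sketch, assuming each retriever returns a ranked list of chunk IDs (the sample IDs are hypothetical):

```python
def rrf(ranked_lists, k=60):
    """Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

graph_hits = ["chunk-7", "chunk-2"]   # from graph-first retrieval
vector_hits = ["chunk-2", "chunk-9"]  # from embedding fallback
fused = rrf([graph_hits, vector_hits])
```

Items surfaced by both retrievers float to the top without any score normalization, which is why RRF is a common first choice for hybrid setups.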
Common Pitfalls in Graph RAG
- **Weak entity resolution**: if aliases are unresolved, your graph fragments and recall drops.
- **Overly dense graphs**: if everything connects to everything, traversal noise increases. Keep relation types meaningful.
- **No provenance links**: always keep links from graph facts back to source chunks for answer grounding.
- **Unbounded traversal**: apply depth limits and predicates, or query latency will grow quickly.
Conclusion
Embedding RAG is great for semantic similarity, but it often struggles with relationship-heavy reasoning, strict constraints, and explainability. Graph databases solve these problems by making entities and relationships explicit, queryable, and auditable.
If your users ask questions that sound like joins rather than summaries, graph-based RAG without embeddings is a strong architecture to adopt first.