GraphFusion-AI: Building a Hybrid RAG System with Vector + Graph Retrieval
Introduction
Traditional RAG (Retrieval-Augmented Generation) systems rely on a single retrieval strategy, usually vector similarity search. While effective, this approach can miss contextual relationships between pieces of information. GraphFusion-AI addresses this by offering three retrieval modes in a single platform: Normal RAG (vector search), Graph RAG (graph traversal), and Hybrid RAG (both combined).
In this post, I'll walk through the complete architecture, implementation details, and design decisions behind building this system.
Architecture Overview
┌─────────────────┐         ┌────────────────────────────┐
│                 │         │      FastAPI Backend       │
│    Streamlit    │  HTTP   │                            │
│    Frontend     │────────▶│  ┌──────────────────────┐  │
│                 │         │  │  LangGraph Workflow  │  │
│ • Chat UI       │         │  │                      │  │
│ • File Upload   │         │  │  ┌────────────────┐  │  │
│ • RAG Selector  │         │  │  │ Route by Mode  │  │  │
│                 │         │  │  └───────┬────────┘  │  │
└─────────────────┘         │  │          │           │  │
                            │  │    ┌─────┼─────┐     │  │
                            │  │    ▼     ▼     ▼     │  │
                            │  │ Normal Graph Hybrid  │  │
                            │  └────┬─────┬─────┬─────┘  │
                            │       │     │     │        │
                            └───────┼─────┼─────┼────────┘
                                    │     │     │
                          ┌─────────┘     │     └────────┐
                          ▼               ▼              ▼
                   ┌─────────────┐ ┌─────────────┐ ┌───────────┐
                   │  Pinecone   │ │    Neo4j    │ │  Both +   │
                   │  (Vectors)  │ │   (Graph)   │ │   Merge   │
                   └─────────────┘ └─────────────┘ └───────────┘
The Problem with Traditional RAG
Standard RAG pipelines work like this:
- Split documents into chunks
- Convert chunks into vector embeddings
- When a user asks a question, convert it to a vector
- Find the most similar chunks via cosine similarity
- Pass those chunks as context to an LLM
This works well for direct factual questions, but struggles with:
- Multi-hop reasoning: when the answer requires connecting information from multiple chunks
- Structural relationships: when document structure (sections, hierarchy) matters
- Entity relationships: when understanding how entities relate is key
💡 Key Insight: By storing documents in both a vector database AND a knowledge graph, we can combine the precision of semantic search with the relational power of graph traversal.
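The five-step vector pipeline described above can be sketched in a few lines. This is only an illustration: `embed` is a toy bag-of-words stand-in for a real embedding model (an assumption, so the snippet runs without API keys), and `retrieve` and the sample chunks are hypothetical names, not part of GraphFusion-AI.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical sample data
chunks = [
    "Pinecone stores vector embeddings",
    "Neo4j stores nodes and relationships",
    "Streamlit renders the chat interface",
]
context = retrieve("where are vector embeddings stored", chunks, k=1)
```

Each chunk is scored independently against the question, which is why this style of retrieval has no built-in way to follow a relationship from one chunk to another.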
Dual Ingestion Pipeline
When a document is uploaded, GraphFusion-AI processes it through a dual ingestion pipeline:
  Document Upload
         │
         ▼
┌─────────────────┐
│   OCR / Text    │ ← Tesseract for scanned docs
│   Extraction    │ ← pdf2image for PDF pages
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Text Chunking  │ ← 1000 chars, 200 overlap
│   (LangChain)   │
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌────────┐ ┌────────┐
│Pinecone│ │ Neo4j  │
│Upsert  │ │ Create │
│Vectors │ │ Nodes  │
└────────┘ └────────┘
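To make the "1000 chars, 200 overlap" setting concrete, here is a minimal fixed-window sketch. It is not LangChain's splitter (which the project uses and which also tries to break on separators); `chunk_text` is a hypothetical helper that only illustrates how window size and overlap interact.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, so the
        # last chunk is never fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

With the defaults, each window starts 800 characters after the previous one, so a sentence that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.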
Pinecone Ingestion
Each chunk is embedded using OpenAI's text-embedding-ada-002 model (1536 dimensions) and stored in Pinecone with metadata including the document ID, user email, and the original text.
# Generate embeddings for all chunks
vectors = embeddings.embed_documents(chunks)

# Upsert to Pinecone with metadata
records = [
    {
        "id": f"{doc_id}_{i}",
        "values": vector,
        "metadata": {"text": chunk, "doc_id": doc_id, "user": user_email},
    }
    for i, (vector, chunk) in enumerate(zip(vectors, chunks))
]
index.upsert(vectors=records)
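For long documents, sending every record in one request can run into Pinecone's request size limits, so upserts are commonly split into batches. The sketch below shows one way to do that; `build_records` and `batched` are hypothetical helpers, and the batch size of 100 is an illustrative assumption rather than a value taken from the project.

```python
from typing import Iterator


def build_records(
    doc_id: str,
    user_email: str,
    chunks: list[str],
    vectors: list[list[float]],
) -> list[dict]:
    """Pair each chunk with its vector, using the same record shape as above."""
    if len(chunks) != len(vectors):
        raise ValueError("chunks and vectors must have the same length")
    return [
        {
            "id": f"{doc_id}_{i}",
            "values": vector,
            "metadata": {"text": chunk, "doc_id": doc_id, "user": user_email},
        }
        for i, (vector, chunk) in enumerate(zip(vectors, chunks))
    ]


def batched(records: list[dict], batch_size: int = 100) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Usage sketch (index is the Pinecone index handle used in the snippet above):
# for batch in batched(records):
#     index.upsert(vectors=batch)
```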
Neo4j Graph Ingestion
In the same ingestion step, we create a graph structure in which each Chunk node points to its Document node via a PART_OF relationship:
# Create document node
session.run(
    "MERGE (d:Document {id: $id}) SET d.filename = $fn, d.user = $user",
    id=doc_id, fn=filename, user=user_email,
)

# Create chunk nodes with relationships
for i, chunk in enumerate(chunks):
    session.run(
        "MATCH (d:Document {id: $doc_id}) "
        "CREATE (c:Chunk {id: $cid, text: $text, index: $idx})-[:PART_OF]->(d)",
        doc_id=doc_id, cid=f"{doc_id}_{i}", text=chunk, idx=i,
    )
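The loop above issues one query per chunk. A possible alternative, sketched here under assumptions, sends all chunks in a single query with Cypher's `UNWIND`. Note two deliberate differences from the project's code: it uses `MERGE` instead of `CREATE` so that re-ingesting the same document does not duplicate chunk nodes, and `chunk_rows` and `INGEST_CHUNKS_QUERY` are hypothetical names.

```python
# One round trip for all chunks: UNWIND turns the $rows list into one row per chunk.
INGEST_CHUNKS_QUERY = """
MATCH (d:Document {id: $doc_id})
UNWIND $rows AS row
MERGE (c:Chunk {id: row.id})
SET c.text = row.text, c.index = row.index
MERGE (c)-[:PART_OF]->(d)
"""


def chunk_rows(doc_id: str, chunks: list[str]) -> list[dict]:
    """Build the $rows parameter, using the same chunk id scheme as above."""
    return [
        {"id": f"{doc_id}_{i}", "text": chunk, "index": i}
        for i, chunk in enumerate(chunks)
    ]


# Usage sketch (session is an open Neo4j driver session):
# session.run(INGEST_CHUNKS_QUERY, doc_id=doc_id, rows=chunk_rows(doc_id, chunks))
```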
Three RAG Strategies
1. Normal RAG (Vector Search)
Converts the user's question into a vector and finds the top-5 most semantically similar chunks from Pinecone.
Question → Embed → Query Pinecone (top-5) → Context → LLM → Answer
Best for: Direct factual questions, keyword-heavy queries, finding specific passages.
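The retrieval step might look roughly like the sketch below. It assumes a Pinecone index handle whose `query` method accepts `vector`, `top_k`, `include_metadata`, and `filter`, and whose response can be read with dictionary-style access. The `doc_id` metadata filter is an assumption based on the metadata stored at ingestion; the source does not say whether Normal RAG filters by document. `vector_context` and `embed_query` are hypothetical names.

```python
from typing import Any, Callable


def vector_context(
    question: str,
    embed_query: Callable[[str], list[float]],
    index: Any,
    doc_id: str,
    top_k: int = 5,
) -> list[str]:
    """Embed the question and return the text of the top_k matching chunks."""
    response = index.query(
        vector=embed_query(question),
        top_k=top_k,
        include_metadata=True,
        # Restrict matches to chunks of one document (assumed behavior)
        filter={"doc_id": {"$eq": doc_id}},
    )
    return [match["metadata"]["text"] for match in response["matches"]]
```

Passing the embedding function and index in as arguments keeps the function easy to exercise with stubs, without network access.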
2. Graph RAG (Knowledge Graph)
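Traverses the Neo4j graph with a Cypher query to fetch chunks belonging to the target document (up to 10). Note that this query selects chunks by document membership, not by similarity to the question.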
MATCH (c:Chunk)-[:PART_OF]->(d:Document {id: $doc_id})
RETURN c.text AS text LIMIT 10
Best for: Questions requiring broader document context, structural understanding, relationship-based queries.
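A small wrapper around that query could look like the following sketch. It assumes a Neo4j driver session whose `run` method accepts keyword parameters and yields records indexable by column name. The added `ORDER BY c.index` is a suggestion that is not in the original query: without it, which 10 chunks come back is not guaranteed. `graph_context` is a hypothetical name.

```python
from typing import Any

GRAPH_CONTEXT_QUERY = (
    "MATCH (c:Chunk)-[:PART_OF]->(d:Document {id: $doc_id}) "
    "RETURN c.text AS text ORDER BY c.index LIMIT $limit"
)


def graph_context(session: Any, doc_id: str, limit: int = 10) -> list[str]:
    """Return up to `limit` chunk texts for one document, in document order."""
    result = session.run(GRAPH_CONTEXT_QUERY, doc_id=doc_id, limit=limit)
    return [record["text"] for record in result]
```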
3. Hybrid RAG (Combined)
Executes both vector search AND graph traversal, then merges the results before sending to the LLM.
Question ──┬──▶ Pinecone (top-3) ──┐
           │                       ├──▶ Merge Context ──▶ LLM ──▶ Answer
           └──▶ Neo4j (top-5) ─────┘
Best for: Complex questions requiring both semantic similarity and structural context.
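The source does not spell out how the two result sets are merged. One simple possibility, shown here as an assumption, is an order-preserving union that drops chunks returned by both retrievers. `merge_contexts` is a hypothetical helper, and the default cap of 8 is just the sum of the top-3 and top-5 figures above.

```python
def merge_contexts(
    vector_chunks: list[str],
    graph_chunks: list[str],
    max_chunks: int = 8,
) -> list[str]:
    """Combine both result lists, vector hits first, skipping duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    for chunk in [*vector_chunks, *graph_chunks]:
        # Compare on whitespace-normalized text so trivial spacing
        # differences do not count as distinct chunks.
        key = " ".join(chunk.split())
        if key and key not in seen:
            seen.add(key)
            merged.append(chunk)
    return merged[:max_chunks]
```

Putting vector hits first keeps the most question-relevant passages at the top of the prompt, with graph chunks filling in surrounding document context.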
LangGraph Workflow Orchestration
Rather than using simple if/else logic, GraphFusion-AI uses LangGraph to orchestrate the RAG pipeline as a state machine:
from typing import TypedDict

from langgraph.graph import StateGraph, END


class RAGState(TypedDict):
    query: str
    mode: str  # "normal", "graph", or "hybrid"
    document_id: str | None
    response: str
    web_results: list


def build_rag_workflow():
    workflow = StateGraph(RAGState)
    # route_rag and web_search_node are node functions defined elsewhere
    workflow.add_node("route_rag", route_rag)
    workflow.add_node("web_search", web_search_node)
    workflow.set_entry_point("route_rag")
    workflow.add_edge("route_rag", END)
    # Note: as written, no edge leads into "web_search", so it is never
    # reached from the entry point.
    workflow.add_edge("web_search", END)
    return workflow.compile()
🔧 Why LangGraph? It makes the workflow extensible: adding a new step (reranking, fact-checking, multi-step reasoning) mostly comes down to adding a node and an edge, with little refactoring of existing steps.
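As one illustration of that extensibility, mode-based routing could be expressed as a function that returns the name of the next node, which is the kind of callable LangGraph's `add_conditional_edges` accepts. This is a sketch, not the project's code: the node names in `MODE_TO_NODE` and the function `pick_retriever` are hypothetical, and the wiring line is left as a comment so the snippet runs without LangGraph installed.

```python
from typing import TypedDict


class RouteState(TypedDict):
    mode: str  # "normal", "graph", or "hybrid"


# Hypothetical node names; the source does not show per-mode nodes.
MODE_TO_NODE = {
    "normal": "normal_rag",
    "graph": "graph_rag",
    "hybrid": "hybrid_rag",
}


def pick_retriever(state: RouteState) -> str:
    """Return the name of the node that should run next for this state."""
    mode = state.get("mode")
    if mode not in MODE_TO_NODE:
        raise ValueError(f"unknown RAG mode: {mode!r}")
    return MODE_TO_NODE[mode]


# Wiring sketch (requires langgraph and a node registered under each name):
# workflow.add_conditional_edges("route_rag", pick_retriever)
```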
OCR Pipeline with Tesseract
Many real-world documents are scanned PDFs or photos of printed pages. GraphFusion-AI handles this transparently:
import io

import pytesseract
from pdf2image import convert_from_bytes
from PIL import Image


def extract_text(content: bytes, filename: str) -> str:
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext == "pdf":
        # Render each PDF page to an image, then OCR it
        images = convert_from_bytes(content)
        return "\n".join(pytesseract.image_to_string(img) for img in images)
    elif ext in ("png", "jpg", "jpeg", "tiff", "bmp"):
        img = Image.open(io.BytesIO(content))
        return pytesseract.image_to_string(img)
    else:
        # Anything else is treated as plain text
        return content.decode("utf-8", errors="ignore")
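The pipeline uses pdf2image (backed by Poppler) to render PDF pages as images, then Tesseract extracts the text. Because every page is rasterized and OCR'd, the same code path handles both digital PDFs and scanned documents, at the cost of extra processing for PDFs that already contain a text layer.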
Authentication with Google SSO
Authentication is handled via Google OAuth 2.0 using the authlib library:
User ──▶ /auth/login ──▶ Google OAuth ──▶ /auth/callback
                                                │
                                                ▼
                                        Create JWT Token
                                                │
                                                ▼
                                      Redirect to Frontend
                                        with token in URL
Every subsequent API request includes the JWT in the Authorization: Bearer header. The backend validates it before processing any request.
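To show what "validates it" involves, here is a standard-library sketch of signing and checking an HS256 JWT. The source does not say which JWT library, algorithm, or claims the backend uses, so HS256, the `sub`/`iat`/`exp` claims, and the function names are all assumptions; a real service would normally rely on a maintained JWT library instead of hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: bytes, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()


def create_token(email: str, secret: str, ttl_seconds: int = 3600,
                 now: Optional[float] = None) -> str:
    """Build header.payload.signature with an expiry claim."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": email, "iat": issued_at, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = _sign(signing_input.encode("ascii"), secret)
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature and expiry check out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        provided = _b64url_decode(signature_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    # Constant-time comparison avoids leaking how many bytes matched
    if not hmac.compare_digest(provided, _sign(signing_input, secret)):
        raise ValueError("bad signature")
    header = json.loads(_b64url_decode(header_b64))
    payload = json.loads(_b64url_decode(payload_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    current = time.time() if now is None else now
    if payload["exp"] <= current:
        raise ValueError("token expired")
    return payload
```

The signature is checked before the payload is parsed, so claims from a tampered token are never trusted.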
Internet Search Integration
Sometimes the uploaded document doesn't contain all the information needed. GraphFusion-AI includes a Tavily-powered internet search feature that lets users find additional context on any topic:
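import asyncio
from tavily import TavilyClient

async def internet_search(query: str) -> list[dict]:
    client = TavilyClient(api_key=settings.tavily_api_key)
    # client.search is synchronous; run it in a thread so the event loop is not blocked
    results = await asyncio.to_thread(client.search, query, max_results=5)
    return [{"title": r["title"], "url": r["url"], "content": r["content"]}
            for r in results["results"]]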
Frontend: Streamlit Dashboard
The user interface is built with Streamlit, providing:
- Google SSO login: single-click authentication
- RAG mode selector: dropdown to switch between Normal/Graph/Hybrid
- Document upload: drag-and-drop with progress indication
- Chat interface: conversational Q&A with message history
- Internet search: sidebar search with clickable results
┌───────────────────────────────────────────────────┐
│ ┌──────────────┐ ┌──────────────────────────────┐ │
│ │ SIDEBAR      │ │ MAIN AREA                    │ │
│ │              │ │                              │ │
│ │ RAG Mode: ▼  │ │ 🧑 What is the revenue?      │ │
│ │ [Hybrid RAG] │ │                              │ │
│ │              │ │ 🤖 Based on the document,    │ │
│ │ ───────────  │ │    the revenue for Q4 was    │ │
│ │ Upload Doc   │ │    $2.3M, representing a     │ │
│ │ [Browse...]  │ │    15% increase...           │ │
│ │              │ │                              │ │
│ │ ───────────  │ │                              │ │
│ │ Web Search   │ │ ┌──────────────────────────┐ │ │
│ │ [...........]│ │ │ Ask a question...        │ │ │
│ │ [🔍 Search]  │ │ └──────────────────────────┘ │ │
│ └──────────────┘ └──────────────────────────────┘ │
└───────────────────────────────────────────────────┘
Deployment
Local (Docker Compose)
The entire stack runs locally with a single command:
docker-compose up --build
# Backend: http://localhost:8000
# Frontend: http://localhost:8501
# Neo4j: http://localhost:7474
Production (AWS ECS + Terraform)
For production, the project includes Terraform scripts that provision:
- ECS Fargate: serverless containers for backend and frontend
- Application Load Balancer: HTTPS termination, path-based routing
- Route53: DNS management
- S3: document storage
- ECR: container image registry
Key Design Decisions
| Decision | Rationale |
|---|---|
| Dual storage (Pinecone + Neo4j) | Each excels at different retrieval patterns |
| LangGraph over plain functions | Extensibility: easy to add reranking, routing, guardrails |
| Tesseract OCR | Open-source, no API costs, handles most document types |
| FastAPI + Streamlit separation | Independent scaling, API reusable by other clients |
| JWT over sessions | Stateless auth, works across services |
Performance Comparison
In testing with a 50-page technical document (accuracy is a qualitative rating; latencies are approximate):
| RAG Mode | Accuracy | Latency | Best For |
|---|---|---|---|
| Normal RAG | Good | ~2s | Factual lookups |
| Graph RAG | Good | ~1.5s | Structural queries |
| Hybrid RAG | Best | ~3s | Complex questions |
What's Next
- Add entity extraction during ingestion to create richer graph relationships
- Implement reranking with Cohere or a cross-encoder for better precision
- Add conversation memory for multi-turn follow-up questions
- Support multi-document chat: query across all uploaded documents
- Implement streaming responses for better UX
Try It Out
The full source code is available on GitHub. Deploy it locally in under 5 minutes with Docker Compose:
git clone https://github.com/bergerlijk07/GraphFusion-AI.git
cd GraphFusion-AI
cp .env.example .env # Fill in your API keys
docker-compose up --build
🚀 Try the Live Demo