GraphFusion-AI: Building a Hybrid RAG System with Vector + Graph Retrieval

June 17, 2026 Dhiman Sutradhar 15 min read
Python FastAPI LangGraph LangChain Neo4j Pinecone Streamlit Tesseract OCR OpenAI

Introduction

Traditional RAG (Retrieval-Augmented Generation) systems rely on a single retrieval strategy, usually vector similarity search. While effective, this approach can miss contextual relationships between pieces of information. GraphFusion-AI addresses this by offering three retrieval strategies in a single platform: Normal RAG, Graph RAG, and Hybrid RAG.

In this post, I'll walk through the complete architecture, implementation details, and design decisions behind the system.

Architecture Overview

┌─────────────────┐         ┌────────────────────────────┐
│                 │         │      FastAPI Backend       │
│   Streamlit     │  HTTP   │                            │
│   Frontend      │────────▶│  ┌──────────────────────┐  │
│                 │         │  │  LangGraph Workflow  │  │
│  • Chat UI      │         │  │                      │  │
│  • File Upload  │         │  │  ┌────────────────┐  │  │
│  • RAG Selector │         │  │  │  Route by Mode │  │  │
│                 │         │  │  └───────┬────────┘  │  │
└─────────────────┘         │  │          │           │  │
                            │  │    ┌─────┼─────┐     │  │
                            │  │    ▼     ▼     ▼     │  │
                            │  │ Normal Graph Hybrid  │  │
                            │  └────┬─────┬─────┬─────┘  │
                            │       │     │     │        │
                            └───────┼─────┼─────┼────────┘
                                    │     │     │
                          ┌─────────┘     │     └────────┐
                          ▼               ▼              ▼
                   ┌─────────────┐ ┌─────────────┐ ┌───────────┐
                   │  Pinecone   │ │    Neo4j    │ │  Both +   │
                   │  (Vectors)  │ │   (Graph)   │ │  Merge    │
                   └─────────────┘ └─────────────┘ └───────────┘

The Problem with Traditional RAG

Standard RAG pipelines work like this:

  1. Split documents into chunks
  2. Convert chunks into vector embeddings
  3. When a user asks a question, convert it to a vector
  4. Find the most similar chunks via cosine similarity
  5. Pass those chunks as context to an LLM
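To make steps 2 to 4 concrete, here is a minimal, dependency-free sketch of top-k retrieval by cosine similarity. It deliberately swaps the real embedding model for a toy bag-of-words "embedding" (lowercase word counts), so `embed`, `cosine`, and `top_k` are illustrative names, not part of GraphFusion-AI; only the ranking logic carries over to real dense vectors.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question: str, chunks: list[str], k: int = 5) -> list[tuple[float, str]]:
    """Steps 3-4: embed the question and rank chunks by cosine similarity."""
    q = embed(question)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "Revenue for Q4 was $2.3M.",
    "The office moved to a new building.",
    "Q4 revenue grew 15% year over year.",
]
for score, chunk in top_k("What was Q4 revenue?", chunks, k=2):
    print(f"{score:.2f}  {chunk}")
# 0.61  Revenue for Q4 was $2.3M.
# 0.33  Q4 revenue grew 15% year over year.
```

Step 5 would then paste the returned chunk texts into the LLM prompt.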

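This works well for direct factual questions, but it struggles when the answer depends on relationships between pieces of information spread across several chunks, because each chunk is retrieved independently by its similarity to the question.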

💡 Key Insight: By storing documents in both a vector database AND a knowledge graph, we can combine the precision of semantic search with the relational power of graph traversal.

Dual Ingestion Pipeline

When a document is uploaded, GraphFusion-AI processes it through a dual ingestion pipeline:

    Document Upload
         │
         ▼
┌─────────────────┐
│  OCR / Text     │  ← Tesseract for scanned docs
│  Extraction     │  ← pdf2image for PDF pages
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Text Chunking  │  ← 1000 chars, 200 overlap
│  (LangChain)    │
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌────────┐ ┌────────┐
│Pinecone│ │ Neo4j  │
│Upsert  │ │ Create │
│Vectors │ │ Nodes  │
└────────┘ └────────┘
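The "1000 chars, 200 overlap" setting in the chunking step means each window starts 800 characters after the previous one, so text near a boundary appears in two neighbouring chunks along with some surrounding context. LangChain's text splitters also try to break on separators such as paragraph breaks and spaces, so real chunks vary in length; the minimal sketch below ignores separators and shows only the size/overlap arithmetic. `chunk_text` is a hypothetical name, not a LangChain API.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # 800 with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
parts = chunk_text(sample)
print([len(p) for p in parts])  # [1000, 1000, 900]
```

The last 200 characters of each chunk are the first 200 of the next, which is what lets a retriever find a passage even when the chunk boundary falls in the middle of it.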
				

Pinecone Ingestion

Each chunk is embedded using OpenAI's text-embedding-ada-002 model (1536 dimensions) and stored in Pinecone with metadata including the document ID, user email, and the original text.

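import os

from langchain_openai import OpenAIEmbeddings
from pinecone import Pinecone

# Clients (env var names are illustrative; the project's config may differ)
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index(os.environ["PINECONE_INDEX"])

# Generate embeddings for all chunks (one 1536-dimension vector per chunk)
vectors = embeddings.embed_documents(chunks)

# Upsert to Pinecone with metadata; deterministic IDs mean re-uploading
# the same chunks overwrites them rather than adding duplicates
records = [
    {
        "id": f"{doc_id}_{i}",
        "values": vector,
        "metadata": {"text": chunk, "doc_id": doc_id, "user": user_email}
    }
    for i, (vector, chunk) in enumerate(zip(vectors, chunks))
]
# Upsert in batches of 100 to stay within Pinecone's per-request size limit
for start in range(0, len(records), 100):
    index.upsert(vectors=records[start:start + 100])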

Neo4j Graph Ingestion

Alongside the Pinecone upsert, we create a graph structure with Document nodes connected to Chunk nodes via PART_OF relationships:

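import os

from neo4j import GraphDatabase

# Connection settings (env var names are illustrative)
driver = GraphDatabase.driver(os.environ["NEO4J_URI"],
                              auth=(os.environ["NEO4J_USER"], os.environ["NEO4J_PASSWORD"]))

with driver.session() as session:
    # Create (or update) the document node
    session.run("MERGE (d:Document {id: $id}) SET d.filename = $fn, d.user = $user",
                id=doc_id, fn=filename, user=user_email)

    # Create chunk nodes with relationships; MERGE rather than CREATE
    # so re-uploading a document doesn't duplicate its chunks
    for i, chunk in enumerate(chunks):
        session.run(
            "MATCH (d:Document {id: $doc_id}) "
            "MERGE (c:Chunk {id: $cid}) SET c.text = $text, c.index = $idx "
            "MERGE (c)-[:PART_OF]->(d)",
            doc_id=doc_id, cid=f"{doc_id}_{i}", text=chunk, idx=i
        )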

Three RAG Strategies

1. Normal RAG (Vector Search)

Converts the user's question into a vector and finds the top-5 most semantically similar chunks from Pinecone.

Question → Embed → Query Pinecone (top-5) → Context → LLM → Answer

Best for: Direct factual questions, keyword-heavy queries, finding specific passages.
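The Normal RAG flow can be sketched as a single function with its three external calls injected. `normal_rag`, `embed`, `search`, and `llm` are hypothetical names standing in for the OpenAI embedding call, the Pinecone top-5 query, and the chat model, and the prompt template is an assumption, since the post does not show the real one. Injecting the calls keeps the pipeline testable without network access, as the fake components in the usage example show.

```python
from typing import Callable

Embed = Callable[[str], list[float]]
Search = Callable[[list[float], int], list[str]]
LLM = Callable[[str], str]


def normal_rag(question: str, embed: Embed, search: Search, llm: LLM, top_k: int = 5) -> str:
    """Question -> Embed -> Query (top-k) -> Context -> LLM -> Answer."""
    vector = embed(question)                      # Embed
    chunks = search(vector, top_k)                # Query the vector index
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    prompt = (                                    # Context -> prompt
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                            # LLM -> Answer


# Usage with fakes in place of OpenAI and Pinecone
corpus = ["Revenue for Q4 was $2.3M.", "Q4 revenue grew 15% year over year."]
seen_prompts: list[str] = []
requested_k: list[int] = []


def fake_search(vector: list[float], k: int) -> list[str]:
    requested_k.append(k)
    return corpus[:k]


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "Q4 revenue was $2.3M."


answer = normal_rag("What was Q4 revenue?", embed=lambda text: [0.0],
                    search=fake_search, llm=fake_llm)
print(answer)  # Q4 revenue was $2.3M.
```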

2. Graph RAG (Knowledge Graph)

Traverses the Neo4j graph with a Cypher query to retrieve the chunks belonging to the target document (up to 10, per the LIMIT clause).

MATCH (c:Chunk)-[:PART_OF]->(d:Document {id: $doc_id})
RETURN c.text AS text
ORDER BY c.index
LIMIT 10

Best for: Questions requiring broader document context, structural understanding, relationship-based queries.
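As an illustration of what that Cypher query returns, here is a plain-Python sketch over an in-memory edge list that mirrors the `(:Chunk)-[:PART_OF]->(:Document)` model from the ingestion step. The `Chunk` dataclass and the `chunks_for_document` helper are hypothetical. Sorting by the stored `index` makes the result deterministic, which in Cypher requires an explicit `ORDER BY c.index` before `LIMIT`; without it, which 10 chunks come back is not guaranteed.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    id: str
    text: str
    index: int


def chunks_for_document(chunks: dict[str, Chunk], part_of: list[tuple[str, str]],
                        doc_id: str, limit: int = 10) -> list[str]:
    """Follow PART_OF edges into `doc_id` and return chunk texts in document order."""
    members = [chunks[chunk_id] for chunk_id, target in part_of if target == doc_id]
    members.sort(key=lambda c: c.index)        # like ORDER BY c.index
    return [c.text for c in members[:limit]]   # like RETURN c.text ... LIMIT 10


chunks = {f"doc1_{i}": Chunk(f"doc1_{i}", f"chunk {i} of doc1", i) for i in range(12)}
chunks["doc2_0"] = Chunk("doc2_0", "chunk 0 of doc2", 0)
part_of = [(cid, cid.split("_")[0]) for cid in chunks]  # (chunk id, document id) edges
part_of.reverse()  # edge order should not affect the result

print(chunks_for_document(chunks, part_of, "doc1")[:3])
# ['chunk 0 of doc1', 'chunk 1 of doc1', 'chunk 2 of doc1']
```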

3. Hybrid RAG (Combined)

Executes both vector search AND graph traversal, then merges the results before sending to the LLM.

Question ──┬──▶ Pinecone (top-3) ──┐
           │                       ├──▶ Merge Context ──▶ LLM ──▶ Answer
           └──▶ Neo4j (top-5) ─────┘

Best for: Complex questions requiring both semantic similarity and structural context.
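A minimal sketch of the merge step, assuming each retriever returns `(chunk_id, text)` pairs. The ingestion code gives a chunk the same ID in both stores (`f"{doc_id}_{i}"`), so a chunk found by both Pinecone and Neo4j can be de-duplicated by ID. Keeping the vector hits first is one reasonable ordering choice, not the project's documented behaviour, and `merge_context` is a hypothetical name.

```python
def merge_context(vector_hits: list[tuple[str, str]],
                  graph_hits: list[tuple[str, str]]) -> str:
    """Merge Pinecone (top-3) and Neo4j (top-5) results, de-duplicated by chunk ID."""
    seen: set[str] = set()
    merged: list[str] = []
    for chunk_id, text in vector_hits + graph_hits:  # vector hits first
        if chunk_id not in seen:
            seen.add(chunk_id)
            merged.append(text)
    return "\n\n".join(merged)


vector_hits = [("doc1_7", "Q4 revenue: $2.3M"), ("doc1_2", "Intro"), ("doc1_9", "Outlook")]
graph_hits = [("doc1_0", "Title page"), ("doc1_1", "Summary"), ("doc1_2", "Intro")]
print(merge_context(vector_hits, graph_hits).split("\n\n"))
# ['Q4 revenue: $2.3M', 'Intro', 'Outlook', 'Title page', 'Summary']
```

Without the de-duplication, `doc1_2` would reach the LLM twice and waste context tokens.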

LangGraph Workflow Orchestration

Rather than using simple if/else logic, GraphFusion-AI uses LangGraph to orchestrate the RAG pipeline as a state machine:

from typing import TypedDict

from langgraph.graph import StateGraph, END

class RAGState(TypedDict):
    query: str
    mode: str           # "normal", "graph", or "hybrid"
    document_id: str | None
    response: str
    web_results: list

def build_rag_workflow():
    workflow = StateGraph(RAGState)
    # route_rag and web_search_node are node functions defined elsewhere;
    # each receives the current RAGState
    workflow.add_node("route_rag", route_rag)
    workflow.add_node("web_search", web_search_node)
    workflow.set_entry_point("route_rag")  # execution starts at the router
    workflow.add_edge("route_rag", END)
    workflow.add_edge("web_search", END)
    return workflow.compile()

🔧 Why LangGraph? It makes the workflow extensible: adding a step such as reranking, fact-checking, or multi-step reasoning means registering a new node and wiring it in with an edge, without refactoring the existing nodes.

OCR Pipeline with Tesseract

Many real-world documents are scanned PDFs or photos of printed pages. GraphFusion-AI handles this transparently:

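import io

import pytesseract
from pdf2image import convert_from_bytes
from PIL import Image

def extract_text(content: bytes, filename: str) -> str:
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext == "pdf":
        # Convert each PDF page to an image (pdf2image needs Poppler), then OCR
        images = convert_from_bytes(content)
        return "\n".join(pytesseract.image_to_string(img) for img in images)
    elif ext in ("png", "jpg", "jpeg", "tif", "tiff", "bmp"):
        # Image files go straight to Tesseract
        img = Image.open(io.BytesIO(content))
        return pytesseract.image_to_string(img)
    else:
        # Anything else is decoded as UTF-8 text, dropping undecodable bytes
        return content.decode("utf-8", errors="ignore")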

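The pipeline uses pdf2image (backed by Poppler) to render each PDF page as an image, then Tesseract extracts the text. Because every page is OCR'd, this works for both digital and scanned PDFs, although for digital PDFs it is slower than reading the embedded text layer directly and can introduce recognition errors.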

Authentication with Google SSO

Authentication is handled via Google OAuth 2.0 using the authlib library:

User ──▶ /auth/login ──▶ Google OAuth ──▶ /auth/callback
                                                │
                                                ▼
                                        Create JWT Token
                                                │
                                                ▼
                                      Redirect to Frontend
                                        with token in URL

Every subsequent API request includes the JWT in the Authorization: Bearer header. The backend validates it before processing any request.
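The post doesn't name the JWT library or signing algorithm. Assuming HS256 (HMAC-SHA256 with a shared secret), the issue/validate round trip looks roughly like this in standard-library Python; `create_jwt`, `verify_jwt`, and `bearer_token` are hypothetical helpers, and in production a maintained library such as PyJWT is the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_jwt(claims: dict, secret: bytes, ttl_seconds: int = 3600) -> str:
    """Issue an HS256 token carrying `claims` plus an `exp` expiry (Unix seconds)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "exp": int(time.time()) + ttl_seconds}
    signing_input = f"{_b64url(json.dumps(header).encode())}.{_b64url(json.dumps(payload).encode())}"
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: bytes) -> dict:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():  # a missing exp counts as expired
        raise ValueError("token expired")
    return claims


def bearer_token(authorization: str) -> str:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise ValueError("expected a Bearer token")
    return token


secret = b"change-me"
token = create_jwt({"sub": "user@example.com"}, secret)
print(verify_jwt(bearer_token(f"Bearer {token}"), secret)["sub"])  # user@example.com
```

Because the server only needs the shared secret to check a token, no session store is involved, which is the "stateless auth" property cited in the design decisions.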

Internet Search Integration

Sometimes the uploaded document doesn't contain all the information needed. GraphFusion-AI includes a Tavily-powered internet search feature that lets users find additional context on any topic:

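import asyncio
from tavily import TavilyClient

async def internet_search(query: str) -> list[dict]:
    client = TavilyClient(api_key=settings.tavily_api_key)
    # search() is synchronous; run it in a thread so it doesn't block the event loop
    results = await asyncio.to_thread(client.search, query, max_results=5)
    return [{"title": r["title"], "url": r["url"], "content": r["content"]}
            for r in results["results"]]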

Frontend: Streamlit Dashboard

The user interface is built with Streamlit, providing:

┌──────────────────────────────────────────────────────┐
│  ┌──────────────┐  ┌──────────────────────────────┐  │
│  │  SIDEBAR     │  │        MAIN AREA             │  │
│  │              │  │                              │  │
│  │ RAG Mode: ▼  │  │  🧑 What is the revenue?     │  │
│  │ [Hybrid RAG] │  │                              │  │
│  │              │  │  🤖 Based on the document,   │  │
│  │ ──────────── │  │     the revenue for Q4 was   │  │
│  │ Upload Doc   │  │     $2.3M, representing a    │  │
│  │ [Browse...]  │  │     15% increase...          │  │
│  │              │  │                              │  │
│  │ ──────────── │  │                              │  │
│  │ Web Search   │  │  ┌────────────────────────┐  │  │
│  │ [..........] │  │  │ Ask a question...      │  │  │
│  │ [🔍 Search]  │  │  └────────────────────────┘  │  │
│  └──────────────┘  └──────────────────────────────┘  │
└──────────────────────────────────────────────────────┘

Deployment

Local (Docker Compose)

The entire stack runs locally with a single command:

docker-compose up --build
# Backend:  http://localhost:8000
# Frontend: http://localhost:8501
# Neo4j:    http://localhost:7474

Production (AWS ECS + Terraform)

For production, the project includes Terraform scripts that provision the AWS ECS infrastructure.

Key Design Decisions

Decision                          Rationale
Dual storage (Pinecone + Neo4j)   Each excels at different retrieval patterns
LangGraph over plain functions    Extensibility: easy to add reranking, routing, guardrails
Tesseract OCR                     Open-source, no API costs, handles most document types
FastAPI + Streamlit separation    Independent scaling, API reusable by other clients
JWT over sessions                 Stateless auth, works across services

Performance Comparison

In testing with a 50-page technical document (accuracy rated qualitatively):

RAG Mode     Accuracy   Latency   Best For
Normal RAG   Good       ~2s       Factual lookups
Graph RAG    Good       ~1.5s     Structural queries
Hybrid RAG   Best       ~3s       Complex questions

What's Next

Try It Out

The full source code is available on GitHub. Deploy it locally in under 5 minutes with Docker Compose:

git clone https://github.com/bergerlijk07/GraphFusion-AI.git
cd GraphFusion-AI
cp .env.example .env    # Fill in your API keys
docker-compose up --build

Written by Dhiman Sutradhar, Technical Lead & Backend Architect
