<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Ambarish Ganguly's Blog]]></title><description><![CDATA[Ambarish Ganguly's Blog]]></description><link>https://blog.ambarishganguly.com</link><generator>RSS for Node</generator><lastBuildDate>Sat, 25 Apr 2026 08:16:33 GMT</lastBuildDate><atom:link href="https://blog.ambarishganguly.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Starter - GENAI]]></title><description><![CDATA[Business Requirement
I am a student and I want answers from my textbooks easily so that I can understand the subject better.
Goal

Understand Vector Database

Understand RAG

Understand how to product]]></description><link>https://blog.ambarishganguly.com/sample-assignment-genai</link><guid isPermaLink="true">https://blog.ambarishganguly.com/sample-assignment-genai</guid><dc:creator><![CDATA[Ambarish Ganguly]]></dc:creator><pubDate>Wed, 25 Feb 2026 16:30:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/603294be72a689446db3ef80/127e41d5-f775-4704-83d6-64bfe6d1963d.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Business Requirement</h2>
<p>I am a student and I want answers from my textbooks easily so that I can understand the subject better.</p>
<h2>Goal</h2>
<ol>
<li><p>Understand Vector Database</p>
</li>
<li><p>Understand RAG</p>
</li>
<li><p>Understand how to productionize a GENAI app to a certain degree</p>
</li>
</ol>
<h2>Ingestion</h2>
<ol>
<li><p>Get all PDFs from <a href="https://ncert.nic.in/textbook.php?kebo1=0-19">Class 11 Biology PDFs</a></p>
</li>
<li><p>Create a Qdrant vector database</p>
</li>
<li><p>Write a Python script to upload the PDFs into the database (a minimal sketch follows this list)</p>
</li>
<li><p>Do not use frameworks such as LangChain / LlamaIndex</p>
</li>
<li><p>Insert metadata such as Chapter, Page Number when you are inserting data into the database</p>
</li>
</ol>
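<p>As referenced in the list above, here is a minimal ingestion sketch without LangChain / LlamaIndex, using <code>pypdf</code>, <code>sentence-transformers</code>, and <code>qdrant-client</code>. The collection name, chapter label, and file name are illustrative assumptions, not taken from the repository:</p>
<pre><code class="lang-plaintext">import uuid

from pypdf import PdfReader
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(url="YOUR_QDRANT_URL", api_key="YOUR_QDRANT_KEY")

# Size the collection to the encoder's embedding dimension
client.recreate_collection(
    collection_name="BIOLOGY_CLASS_11",
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(),
        distance=models.Distance.COSINE,
    ),
)

# One chapter PDF downloaded from the NCERT site (file name is an example)
reader = PdfReader("kebo101.pdf")
points = []
for page_number, page in enumerate(reader.pages, start=1):
    text = (page.extract_text() or "").strip()
    if not text:
        continue
    points.append(models.PointStruct(
        id=str(uuid.uuid4()),
        vector=encoder.encode(text).tolist(),
        # Metadata such as chapter and page number goes into the payload
        payload={"chapter": "Chapter 1", "page_number": page_number,
                 "content": text},
    ))

client.upload_points(collection_name="BIOLOGY_CLASS_11", points=points)
</code></pre>
<p>Storing the chapter and page number in the payload is what later lets the search and LLM steps cite their sources.</p>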
<p><a href="https://github.com/ambarishg/JUWORKSHOP_JAN_2026/tree/main/QDRANT">Github for Qdrant Code</a></p>
<p>This directory collects helper scripts, configs, and notebooks for working with a Qdrant vector store, ingesting data, and prototyping RAG workflows.</p>
<table>
<thead>
<tr>
<th>File Name</th>
<th>File Description</th>
</tr>
</thead>
<tbody><tr>
<td><code>.env</code></td>
<td>Stores endpoint URLs, API keys, and model settings that <code>config_qdrant.py</code> loads for consistent configuration across notebooks.</td>
</tr>
<tr>
<td>01. <code>config_qdrant.py</code></td>
<td>Reads the shared environment variables and exposes a configured <code>QdrantClient</code> plus embedding/model metadata.</td>
</tr>
<tr>
<td>02. <code>connect.ipynb</code></td>
<td>Minimal notebook that imports <code>QdrantClient</code> and validates the hosted Qdrant connection using the shared config.</td>
</tr>
<tr>
<td>03. <code>create_collection.ipynb</code></td>
<td>Defines the BEES collection schema so that later ingestion and search notebooks can store vectors with metadata.</td>
</tr>
<tr>
<td>04. <code>documents_extraction.ipynb</code></td>
<td>Uses LangChain’s <code>PyPDFLoader</code> helpers to pull text from PDFs, clean it, and prepare it for embedding.</td>
</tr>
<tr>
<td>05. <code>ingest.ipynb</code></td>
<td>Illustrates iterating over local data sources and pushing documents plus embeddings into the configured Qdrant collection.</td>
</tr>
<tr>
<td>06. <code>advanced_rag_qdrant.ipynb</code></td>
<td>Walks through a multi-step RAG pipeline, combining ingestion, Qdrant vector search, and OpenAI completions.</td>
</tr>
<tr>
<td>07. <code>hybrid_search_create_collection.ipynb</code></td>
<td>Combines the collection creation steps with the hybrid search flow for an all-in-one run.</td>
</tr>
<tr>
<td>08. <code>hybrid_search.ipynb</code></td>
<td>Demonstrates a hybrid vector/text search flow built on the shared Qdrant setup.</td>
</tr>
<tr>
<td>09. <code>universal_hybrid_search_create_collection.ipynb</code></td>
<td>Builds a universal collection and immediately runs the universal hybrid search.</td>
</tr>
<tr>
<td>10. <code>universal_hybrid_search.ipynb</code></td>
<td>Shows universal hybrid search examples that can generalize beyond the Netflix/BEES datasets.</td>
</tr>
<tr>
<td>11. <code>netflix_hybrid_search_create_collection.ipynb</code></td>
<td>Builds the Netflix collection before running a hybrid search scenario tailored to that data.</td>
</tr>
<tr>
<td>12. <code>netflix.ipynb</code></td>
<td>Samples Netflix-specific prompts and retrieval logic against the provided title dataset.</td>
</tr>
<tr>
<td><code>netflix_titles.csv</code></td>
<td>Public Netflix title metadata that fuels the Netflix notebooks; includes genres, descriptions, and other columns.</td>
</tr>
</tbody></table>
<h2>Search</h2>
<ol>
<li><p>User asks a question</p>
</li>
<li><p>Use the question to search the Database</p>
</li>
<li><p>Use simple text search to get results</p>
</li>
<li><p>Use semantic search to get results</p>
</li>
<li><p>Use hybrid search to get results</p>
</li>
<li><p>Understand the difference between text search / semantic search / hybrid search</p>
</li>
<li><p>In the search results, show the metadata associated with each result</p>
</li>
<li><p>Advanced [ Implement RRF (Reciprocal Rank Fusion); a sketch follows this list ]</p>
</li>
</ol>
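<p>RRF (Reciprocal Rank Fusion) merges the rankings from text search and semantic search without needing their scores to be comparable. A minimal sketch, with illustrative document ids:</p>
<pre><code class="lang-plaintext">def rrf(result_lists, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) per list it appears in;
    k = 60 is the constant commonly used in the RRF literature.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a text-search ranking with a semantic-search ranking
print(rrf([["d3", "d1", "d2"], ["d1", "d3", "d4"]]))
</code></pre>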
<h2>LLM</h2>
<ol>
<li><p>User asks a question</p>
</li>
<li><p>Understand RAG</p>
</li>
<li><p>Use the search results and the LLM to get the answer</p>
</li>
<li><p>In the answer, show the portions of the text used to frame the answer [ the Chapter, Page Number ] (a sketch follows this list)</p>
</li>
<li><p>Understand how effectively the LLM and RAG are answering the question</p>
</li>
</ol>
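<p>A minimal sketch of the citation step referenced above: prefix each retrieved passage with the chapter and page number stored in its payload so the LLM can cite them. The <code>hits</code> variable and the payload field names are assumptions carried over from the ingestion sketch:</p>
<pre><code class="lang-plaintext"># hits: results of a Qdrant search over the collection built earlier
context = "\n---\n".join(
    f"[{h.payload['chapter']}, page {h.payload['page_number']}] "
    f"{h.payload['content']}"
    for h in hits
)

prompt = (
    "Answer using only the context below and cite the "
    "[Chapter, Page] markers you relied on.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
</code></pre>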
<h2>UI</h2>
<ol>
<li><p>Create screens to upload more documents</p>
</li>
<li><p>Create screens to have the user ask a question</p>
</li>
<li><p>Create a chatbot</p>
</li>
<li><p>Create a chatbot with memory (a sketch follows this list)</p>
</li>
</ol>
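<p>A minimal sketch of a chatbot with memory, here using Streamlit as one possible UI choice; the <code>answer_question</code> helper is a hypothetical placeholder:</p>
<pre><code class="lang-plaintext">import streamlit as st

def answer_question(question: str, history: list) -&gt; str:
    # Placeholder: wire in the search + LLM flow built earlier
    return f"(answer for: {question})"

st.title("Ask your Textbook")

if "messages" not in st.session_state:
    st.session_state.messages = []   # the chatbot's memory

# Replay the conversation so far on every rerun
for m in st.session_state.messages:
    st.chat_message(m["role"]).write(m["content"])

if question := st.chat_input("Ask a question"):
    st.session_state.messages.append({"role": "user", "content": question})
    st.chat_message("user").write(question)
    answer = answer_question(question, st.session_state.messages)
    st.session_state.messages.append({"role": "assistant", "content": answer})
    st.chat_message("assistant").write(answer)
</code></pre>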
<h2>FastAPI</h2>
<ol>
<li>Use FastAPI to expose the function (a sketch follows)</li>
</ol>
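<p>A minimal FastAPI sketch, again assuming the hypothetical <code>answer_question</code> helper:</p>
<pre><code class="lang-plaintext">from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    question: str

def answer_question(question: str) -&gt; str:
    # Placeholder: wire in the search + LLM flow built earlier
    return f"(answer for: {question})"

@app.post("/ask")
def ask(q: Question):
    return {"answer": answer_question(q.question)}
</code></pre>
<p>Run it with <code>uvicorn main:app --reload</code>; FastAPI serves interactive docs at <code>/docs</code>.</p>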
<h2>Docker</h2>
<ol>
<li>Make a Docker image of the FastAPI app (a sketch follows)</li>
</ol>
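<p>A minimal Dockerfile sketch, assuming the FastAPI app lives in <code>main.py</code> with its dependencies listed in <code>requirements.txt</code>:</p>
<pre><code class="lang-plaintext">FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
</code></pre>
<p>Build and run with <code>docker build -t genai-api .</code> and <code>docker run -p 8000:8000 genai-api</code>.</p>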
<h2>Advanced</h2>
<hr />
<h2>LangGraph</h2>
<ol>
<li><p>Understand LangGraph using the repo <a href="https://github.com/ambarishg/langchain_and_langraph">https://github.com/ambarishg/langchain_and_langraph</a></p>
</li>
<li><p>Inspiration from the <strong>LangGraph Complete Course for Beginners – Complex AI Agents with Python</strong></p>
</li>
</ol>
<h2>Agent Framework</h2>
<ol>
<li><p>Understand the Agent Framework using the repo <a href="https://github.com/ambarishg/agent-framework">https://github.com/ambarishg/agent-framework</a></p>
</li>
<li><p>Inspiration from Microsoft Agent framework samples</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Word Embeddings]]></title><description><![CDATA[From the TensorFlow documentation word embeddings documentation

Word embeddings give us a way to use an efficient, dense representation in which similar words have a similar encoding.
Importantly, you do not have to specify this encoding by hand. An...]]></description><link>https://blog.ambarishganguly.com/word-embeddings</link><guid isPermaLink="true">https://blog.ambarishganguly.com/word-embeddings</guid><category><![CDATA[#WordEmbeddings]]></category><dc:creator><![CDATA[Ambarish Ganguly]]></dc:creator><pubDate>Wed, 22 Oct 2025 10:28:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1761129165157/7ad54cf6-c127-4894-a88b-a2115736c476.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>From the TensorFlow documentation <a target="_blank" href="https://www.tensorflow.org/tutorials/text/word_embeddings">word embeddings documentation</a></p>
<blockquote>
<p>Word embeddings give us a way to use an efficient, dense representation in which similar words have a similar encoding.</p>
<p>Importantly, you do not have to specify this encoding by hand. An embedding is a dense vector of floating point values (the length of the vector is a parameter you specify).</p>
<p>Instead of specifying the values for the embedding manually, they are trainable parameters (weights learned by the model during training, in the same way a model learns weights for a dense layer).</p>
<p>It is common to see word embeddings that are 8-dimensional (for small datasets), up to 1024-dimensions when working with large datasets. A higher dimensional embedding can capture fine-grained relationships between words, but takes more data to learn.</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761128754200/98a2cd7c-978f-4cd7-8a45-f2a2c5a814ea.jpeg" alt class="image--center mx-auto" /></p>
<p>Let us explore the word embedding with some examples. We will use <strong>spacy</strong> for demonstration.</p>
<pre><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> spacy
<span class="hljs-keyword">from</span> sklearn.metrics.pairwise <span class="hljs-keyword">import</span> cosine_similarity
# Need to load the large model to get the vectors
nlp = spacy.load(<span class="hljs-string">'en_core_web_lg'</span>)

nlp(<span class="hljs-string">"queen"</span>).vector.shape
</code></pre><p>We find the word embedding of the single word <strong>queen</strong>: the result is a vector of shape <code>(300,)</code>. Therefore a single word is converted to 300 numerical values.</p>
<p>We find the similarity between the words using cosine similarity   </p>
<pre><code>cosine_similarity([nlp(<span class="hljs-string">"queen"</span>).vector],[nlp(<span class="hljs-string">"king"</span>).vector])
</code></pre><blockquote>
<p>0.725261</p>
</blockquote>
<pre><code>cosine_similarity([nlp(<span class="hljs-string">"queen"</span>).vector],[nlp(<span class="hljs-string">"mother"</span>).vector])
</code></pre><blockquote>
<p>0.44720313   </p>
</blockquote>
<pre><code>cosine_similarity([nlp(<span class="hljs-string">"queen"</span>).vector],[nlp(<span class="hljs-string">"princess"</span>).vector])
</code></pre><blockquote>
<p>0.6578181            </p>
</blockquote>
<p>We observe that the similarity between queen and king is the highest, followed by princess and then mother.</p>
<p>We will now see how we can use the similarity between sentences.</p>
<pre><code>x1 = nlp(<span class="hljs-string">"I am a software consultant"</span>).vector
x2 = nlp(<span class="hljs-string">"Hey ,me  data guy"</span>).vector
x3 = nlp(<span class="hljs-string">"Hey ,me  plumber"</span>).vector
</code></pre><pre><code>x1.shape , x2.shape , x3.shape
</code></pre><blockquote>
<p>((300,), (300,), (300,))    </p>
</blockquote>
<p>We find that the sentence vectors also have shape (300,), just like the individual word vectors. For a sentence, spaCy averages the word vectors, so a sentence is likewise represented by 300 numerical values.</p>
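<p>We can check the averaging directly: a <code>Doc</code>'s vector equals the mean of its token vectors.</p>
<pre><code>doc = nlp("I am a software consultant")
# Doc.vector is the average of the token vectors
np.allclose(doc.vector, np.mean([t.vector for t in doc], axis=0))
</code></pre><blockquote>
<p>True</p>
</blockquote>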
<pre><code>cosine_similarity([x1], [x2])
</code></pre><blockquote>
<p>0.7383951</p>
</blockquote>
<pre><code>cosine_similarity([x1], [x3])
</code></pre><blockquote>
<p>0.64217263   </p>
</blockquote>
<p>We see that the similarity between the software consultant sentence and the data guy sentence is higher than between the software consultant sentence and the plumber sentence.</p>
]]></content:encoded></item><item><title><![CDATA[Solving Business Problems with Agentic RAG]]></title><description><![CDATA[Business Problem
User queries cannot be answered satisfactorily using a single source. We can use a very effective technique called RAG( Retrieval Augmented Generation) for getting the results from a rich context and sending it to a LLM( Large Langua...]]></description><link>https://blog.ambarishganguly.com/agentic-rag</link><guid isPermaLink="true">https://blog.ambarishganguly.com/agentic-rag</guid><dc:creator><![CDATA[Ambarish Ganguly]]></dc:creator><pubDate>Fri, 25 Apr 2025 04:34:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745633910522/e5c3520a-abca-41d1-9d1d-cd104d287cce.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-business-problem">Business Problem</h1>
<p>User queries cannot be answered satisfactorily using a single source. We can use a very effective technique called <strong>RAG</strong> (Retrieval Augmented Generation) for getting the results from a rich context and sending it to an LLM (Large Language Model). However, the results can be improved if the answer is enriched using other sources. The answer to this problem is combining the power of Agents, which helps the RAG combine other sources and provide an appropriate answer.</p>
<p>Let us take a concrete example. You are a company that does <strong>Field Service Management</strong>. Your employees fix things. You have developed a mobile application which helps the field engineers ask questions on a knowledge repository [ you have used a Vector Database to load all your documents ].</p>
<blockquote>
<p>What happens if the information required is not present in the Vector Database?</p>
</blockquote>
<p>Here is where Agentic RAG comes in.</p>
<p>The Agent would first search the Vector Database and then try to find answers from different enterprise sources such as databases, applications, and file stores.</p>
<p>The Agent would use Tools to get this information.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">The code associated with the Blog is in the Github repository [ <a target="_self" href="https://github.com/ambarishg/crewai_azure">https://github.com/ambarishg/crewai_azure</a> ]</div>
</div>

<h1 id="heading-flow-of-the-blog">Flow of the Blog</h1>
<p>The Blog follows the <strong><em>Concept to Code philosophy</em></strong></p>
<p>The first part is conceptual and is meant for everyone, executives and technical readers alike. The second part goes into the implementation details, targeting the technical audience.</p>
<h1 id="heading-agentic-rag">Agentic RAG</h1>
<p>Agentic RAG combines the power of RAG with AI Agents.</p>
<h2 id="heading-simple-rag">Simple RAG</h2>
<p>Let us first look at the simple RAG</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745545106950/6f0f3664-3a4a-4851-b3a7-b6d425ac7b81.jpeg" alt class="image--center mx-auto" /></p>
<p><strong>Steps</strong></p>
<ol>
<li><p>The user provides the Query</p>
</li>
<li><p>The query is used to search in the Vector Database</p>
</li>
<li><p>The results are returned to the orchestrator which sends the results to the LLM</p>
</li>
<li><p>The LLM utilizes the results to provide the answer</p>
</li>
</ol>
<h2 id="heading-agentic-rag-1">Agentic RAG</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745545822512/4f752b59-3ff3-4e62-b0c8-dbf98baf5fba.jpeg" alt class="image--center mx-auto" /></p>
<p><strong>Steps</strong></p>
<ol>
<li><p>The user provides the Query</p>
</li>
<li><p>The query is used to search in the Vector Database</p>
</li>
<li><p>The results are returned to the orchestrator which sends the results to the LLM</p>
</li>
<li><p>The LLM utilizes the results to provide the answer.</p>
</li>
</ol>
<p>4a. <mark>Answer sent by the LLM is satisfactory, you are OK and no need to go to the next steps</mark></p>
<ol start="5">
<li><p>Answer sent by the LLM is not satisfactory</p>
</li>
<li><p>Send the user input to the <strong>Agent</strong></p>
</li>
<li><p>The Agent provides the answer</p>
</li>
</ol>
<p>We can optimize this data flow a bit by not sending the results to the LLM at first. We can check if the search results are satisfactory before sending it to the LLM. The modified steps are as follows</p>
<ol>
<li><p>The user provides the Query</p>
</li>
<li><p>The query is used to search in the Vector Database</p>
</li>
<li><p>The results are returned to the orchestrator</p>
</li>
<li><p>Check whether the search results returned by the Vector Database are satisfactory</p>
</li>
<li><p><mark>If the search results are satisfactory, send the search results to the LLM</mark></p>
</li>
</ol>
<p>5a. The LLM utilizes the results to provide the answer</p>
<ol start="6">
<li><p>If the search results are NOT satisfactory, follow the next steps</p>
</li>
<li><p>Send the user input to the <strong>Agent</strong></p>
</li>
<li><p>The Agent provides the answer</p>
</li>
</ol>
<h2 id="heading-agent">Agent</h2>
<p>An Agent is a composite system.</p>
<p>It has the following components</p>
<ol>
<li><p>LLM</p>
</li>
<li><p>Tools</p>
</li>
<li><p>Planning</p>
</li>
<li><p>Memory</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745548433727/41a392b2-ebda-42be-bcdc-ed9996067dbb.jpeg" alt class="image--center mx-auto" /></p>
<p>We would be using <strong>Crewai</strong> as the agentic framework for this blog.</p>
<h2 id="heading-crewai">Crewai</h2>
<p>Crewai has the following components</p>
<ul>
<li><p>Crew</p>
<ul>
<li><p>Top Level Organization</p>
</li>
<li><p>Manages Agents</p>
</li>
<li><p>Ensures collaboration</p>
</li>
</ul>
</li>
<li><p>Agents</p>
<ul>
<li><p>Specialized Team members</p>
</li>
<li><p>Have specific Roles</p>
</li>
<li><p>Has Tasks associated with it</p>
</li>
<li><p>Has Tools</p>
</li>
</ul>
</li>
<li><p>Tasks</p>
<ul>
<li><p>Uses specific tools</p>
</li>
<li><p>Has Individual assignments</p>
</li>
</ul>
</li>
</ul>
<p>Crewai has other components, but for this discussion we would like to keep things simple and restrict ourselves to these components only.</p>
<p>This agentic RAG would be done by Crewai using</p>
<ol>
<li><p>Crew</p>
</li>
<li><p>1 Simple Agent</p>
</li>
<li><p>1 Task</p>
</li>
<li><p>2 tools [ a tool to search the Vector Database, another tool to search the Internet ]</p>
</li>
</ol>
<p>When the user enters the query, the agent takes control and is responsible for using the Tasks and the Tools.</p>
<h3 id="heading-agent-1">Agent</h3>
<p>The Agent is a Router Assistant.</p>
<blockquote>
<p>You are an experienced assistant specializing in routing the flow to the appropriate tool.</p>
</blockquote>
<h3 id="heading-tasks">Tasks</h3>
<p>The Agent uses the Task. The Task does this</p>
<blockquote>
<p>Based on the user's questions: ask the VECTOR DB TOOL. If the answer is not found, ask the SEARCH TOOL for the answer</p>
</blockquote>
<p>Let us revisit</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745545822512/4f752b59-3ff3-4e62-b0c8-dbf98baf5fba.jpeg" alt class="image--center mx-auto" /></p>
<p>Note that the decision of choosing the VECTOR DB Tool or the Search Tool is taken by the Agent</p>
<h1 id="heading-implementation-details">Implementation details</h1>
<p>For this blog, we are using the following</p>
<ul>
<li><p>Vector Database - Qdrant Cloud</p>
</li>
<li><p>LLM - Azure Open AI</p>
</li>
<li><p>Agent Framework - Crew AI</p>
</li>
</ul>
<h3 id="heading-crew-ai-implementation-details">Crew AI Implementation details</h3>
<pre><code class="lang-plaintext"># Configure the LLM
llm = LLM(model=os.getenv("model"))
</code></pre>
<p><strong>Step 1</strong> - We first have to configure the LLM.</p>
<p><strong>Step 2</strong> The next step is to define the Agent</p>
<pre><code class="lang-plaintext"># Define the Agent
router_agent = Agent(
    role='Router Assistant',
    goal='Assistant to route to the appropriate tool',
    backstory="You are an experienced assistant specializing in routing the flow to the appropriate tool.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
    tools=[search_serper, rag_qdrant]  
)
</code></pre>
<p>Note the tools which we are using in this space</p>
<ol>
<li><p>VECTOR DB tool = rag_qdrant</p>
</li>
<li><p>SEARCH tool = search_serper</p>
</li>
</ol>
<p><strong>Step 3</strong> Define the Task</p>
<pre><code class="lang-plaintext"># Define the Task
task = Task(
    description=""""
    Based on the user's questions: {questions}, 
    ask the tool rag_qdrant. If the answer is not found, 
    ask the tool search for the answer 
    """,
    expected_output="A properly well worded answer from the tool which can be used by the user.",
    agent=router_agent
)
</code></pre>
<p><strong>Step 4</strong> Define the Crew</p>
<pre><code class="lang-plaintext"># Create the Crew
crew = Crew(
    agents=[router_agent],
    tasks=[task],
    verbose=True,
)
</code></pre>
<p><strong>Step 5</strong> Run the Crew</p>
<pre><code class="lang-plaintext"># User input for travel preferences
user_input = {
    "questions": "What is naked trust ?"
}

# Execute the Crew
result = crew.kickoff(inputs=user_input)
</code></pre>
<h2 id="heading-tools-implementation-details">Tools Implementation Details</h2>
<p>In this section we go into the details of the tools , Search Tool and the Vector DB Tool</p>
<h3 id="heading-search-tool">Search TOOL</h3>
<p>Let us explore in depth the Search Tool</p>
<pre><code class="lang-plaintext">@tool('SerperDevTool')
def search_serper(search_query: str):
    """Search the web for information on a given topic"""
    tool = SerperDevTool(
        search_url=SEARCH_URL,
        n_results=2,
    )
    # Run the search and return its results to the agent
    return tool.run(search_query=search_query)
</code></pre>
<p>The Search URL is "https://google.serper.dev/search"</p>
<p>Here we are using the <strong>SerperDevTool</strong> for searching Google</p>
<h3 id="heading-vector-db-tool">VECTOR DB TOOL</h3>
<pre><code class="lang-plaintext">@tool('RAG_QDRANT')
def rag_qdrant(search_query:str):
    """Gets the answer from Qdrant"""
    rag_helper = RAGSystem(AzureOpenAIManager())
    return (rag_helper.query(search_query))
</code></pre>
<p>The Vector DB Tool uses the Qdrant Vector DB for the search and Azure OpenAI as the LLM to generate the result. In the subsequent sections we will go deeper and see the internals of the RAGSystem class.</p>
<h3 id="heading-rag-system-details">RAG System Details</h3>
<p>It has 2 components</p>
<ul>
<li><p>Retrieve - retrieve results from the Qdrant DB</p>
</li>
<li><p>Query - get the response from the user query using the Vector DB and the LLM.</p>
</li>
</ul>
<pre><code class="lang-plaintext">import os
import logging

class RAGSystem:
    def __init__(self, generator: ILLMHelper):
        self.generator = generator
        # Initialize Qdrant components
        logging.basicConfig(level=logging.INFO)
        logging.info("Loading model: %s", os.getenv("MODEL_NAME"))


    def retrieve(self, query: str) -&gt; str:
        self.model = SentenceTransformer(os.getenv("MODEL_NAME"))
        self.client = qc.QdrantClient(
            url=os.getenv("QDRANT_URL"),
            api_key=os.getenv("QDRANT_KEY")
        )
        vector = self.model.encode(query, convert_to_tensor=True).tolist()
        results = self.client.search(
            collection_name=os.getenv("QDRANT_COLLECTION"),
            query_vector=vector,
            limit=5
        )
        return "\n---\n".join([r.payload["content"] for r in results])

    def query(self, query: str) -&gt; str:
        context = self.retrieve(query)
        return self.generator.generate(context, query)
</code></pre>
<h4 id="heading-init-method">Init method</h4>
<p>Initializes the LLM to be used by the class.</p>
<p>In this blog we have used the Azure Open AI model</p>
<h4 id="heading-retrieve-method">Retrieve method</h4>
<ol>
<li><p>Uses SentenceTransformer model to create the embedding</p>
</li>
<li><p>Searches the VECTOR DB using the embedding</p>
</li>
<li><p>Joins the content of the matching results into a single context string</p>
</li>
</ol>
<h4 id="heading-query-method">Query method</h4>
<ol>
<li><p>Uses the retrieve method to get the context</p>
</li>
<li><p>Uses the context and the query to get the response from the LLM (a sketch of the generator follows this list)</p>
</li>
</ol>
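<p>The generator passed into <code>RAGSystem</code> only needs a <code>generate(context, query)</code> method. A minimal sketch of what <code>AzureOpenAIManager</code> could look like with the <code>openai</code> SDK; the environment variable names here are illustrative assumptions, not taken from the repository:</p>
<pre><code class="lang-plaintext">import os
from openai import AzureOpenAI

class AzureOpenAIManager:  # implements ILLMHelper.generate
    def __init__(self):
        self.client = AzureOpenAI(
            api_key=os.getenv("AZURE_OPENAI_KEY"),
            api_version="2024-02-01",
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        )

    def generate(self, context: str, query: str) -&gt; str:
        response = self.client.chat.completions.create(
            model=os.getenv("AZURE_OPENAI_DEPLOYMENT"),
            messages=[
                {"role": "system",
                 "content": "Answer using only the given context."},
                {"role": "user",
                 "content": f"Context:\n{context}\n\nQuestion: {query}"},
            ],
        )
        return response.choices[0].message.content
</code></pre>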
<p><a target="_blank" href="https://github.com/ambarishg/crewai_azure">Github Repo for this Blog</a> is also provided</p>
]]></content:encoded></item><item><title><![CDATA[Why you should participate in Hackathons and Community Building]]></title><description><![CDATA[For the last couple of weeks, we participated in 2 events , a hackathon and a showcasing event. You might have guessed the outcome—we won what one of our friends called a "consolation prize." Reflecting on my life, I realize that, except for a few in...]]></description><link>https://blog.ambarishganguly.com/why-you-should-participate-in-hacks-competition</link><guid isPermaLink="true">https://blog.ambarishganguly.com/why-you-should-participate-in-hacks-competition</guid><category><![CDATA[hacking]]></category><category><![CDATA[#CommunityBuilding]]></category><dc:creator><![CDATA[Ambarish Ganguly]]></dc:creator><pubDate>Tue, 01 Oct 2024 12:06:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727794150148/6b4d7d88-1303-4573-88db-2ca10d889841.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For the last couple of weeks, we participated in 2 events , a hackathon and a showcasing event. You might have guessed the outcome—we won what one of our friends called a <strong>"consolation prize."</strong> Reflecting on my life, I realize that, except for a few instances, I have mostly won consolation prizes.</p>
<p>Through endless participation in these competitions, I have gained the following treasures:</p>
<ul>
<li><p>Discovered new ways of <strong>thinking. Thinking is perhaps the most important skill nowadays</strong></p>
</li>
<li><p>Discovered communities and all the nice people supporting the community organizing the hackathon. <mark>Having a feeling that you belong to a good #community is very surreal. </mark> My gratitude to all community builders / community leads for making our learning inclusive.</p>
</li>
</ul>
<p>What is more important is what happens <strong><em>after the competition</em></strong>. I prefer participating in competitions where the winning solutions are shared [ and what is even more beneficial is when the code is also shared ].</p>
<p>This is what I plan or usually do. There is always a difference between plan and execution, but at least having a plan helps the execution.</p>
<ul>
<li><p>Congratulate the winners and everyone. Express gratitude to the organizers and community</p>
</li>
<li><p><strong>Go through the winning solutions</strong></p>
</li>
<li><p>Add things from the solutions to your toolkit so that your knowledge grows</p>
</li>
<li><p>Use the knowledge in your work place as well as in future hackathons / competitions</p>
</li>
<li><p>If possible, do a better writeup of your own solution and share with the community [ Sharing is caring ]</p>
</li>
</ul>
<p>To add a personal touch to the article, let me share something which might inspire people like me who have spent most of their life getting consolation prizes.</p>
<p>After countless attempts in various competitions without success, I finally achieved a major victory one morning. Just a week later, I received another award. Although my streak of participating without wins continues, I do occasionally win awards.</p>
<p>I have been a Kaggler for more than a decade. Though I am a Kaggler, I am not a Grandmaster or anything. Still, the passion for competitions continues.</p>
<p>This week , Kaggle has made a very nice gesture for all Kagglers and has given <mark>badges and award</mark> acknowledgments for their contributions to the community.</p>
<p><mark>It feels surreal that the community remembers the efforts</mark></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727782896109/c0968f57-ec1a-41da-9a97-9f34730aea92.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727782944989/60b4ce4e-3e3d-499e-8f7f-50603041c36f.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727783032415/b9fba3a6-590c-4d76-835a-5da87185592a.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727783064936/9af76789-c782-43f8-9075-2167b3c0f96a.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727783442561/90cfb1b1-85d0-4ecf-b44d-ed294b90121b.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[Understanding BBC News Q&A with Advanced RAG and Microsoft Phi3]]></title><description><![CDATA[In this blog, we would be doing question and answering on a news data feed.
The blog has 2 parts

Conceptual
Implementation details which comes as expected with code as well as the full code link            

Please feel free to choose both or at lea...]]></description><link>https://blog.ambarishganguly.com/microsoft-phi3-revealed-a-deep-dive-into-advanced-rag-methods-with-bbc-news</link><guid isPermaLink="true">https://blog.ambarishganguly.com/microsoft-phi3-revealed-a-deep-dive-into-advanced-rag-methods-with-bbc-news</guid><category><![CDATA[advanced rag]]></category><category><![CDATA[phi3]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[semantic search]]></category><category><![CDATA[qdrant]]></category><dc:creator><![CDATA[Ambarish Ganguly]]></dc:creator><pubDate>Mon, 13 May 2024 17:59:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716016081612/45ab6274-f291-4f4e-a117-7d2042f3396d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this blog, we would be doing question and answering on a news data feed.
The blog has 2 parts</p>
<ul>
<li><code>Conceptual</code></li>
<li><code>Implementation details</code> which comes as expected with code as well as the full code link            </li>
</ul>
<p>Please feel free to choose both or at least the <code>Conceptual</code> section</p>
<p>For this we are using the <a target="_blank" href="https://www.kaggle.com/datasets/gpreda/bbc-news/versions/801">BBC News Dataset</a>. This is a <code>self-updating dataset</code>, updated daily.</p>
<p>We would be learning Simple and Advanced RAG [ <strong>Retrieval Augmented Generation</strong>] using a small language model <strong>Phi3  mini 128K instruct</strong> through this blog.</p>
<p>We would be asking questions like <code>What is the news in Ukraine</code> and the application will provide the <strong>answers</strong> using this technique.</p>
<p>The Phi-3-Mini-128K-Instruct is a <code>3.8 billion-parameter</code>, lightweight, state-of-the-art open model trained using the Phi-3 datasets. In comparison, GPT-4 is reported to have more than a trillion parameters, and the smallest Llama 3 model has 8 billion. Models such as Phi-3 are popularly known as <strong>SLMs</strong> [ <code>Small Language Models</code> ], while the likes of GPT-4 and GPT-3.5 Turbo are known as <strong>LLMs</strong> [ <code>Large Language Models</code> ].</p>
<p>The concept of <strong>Word Embeddings</strong> would be widely used in the blog.
From the TensorFlow documentation <a target="_blank" href="https://www.tensorflow.org/tutorials/text/word_embeddings">word embeddings documentation</a></p>
<blockquote>
<p>Word embeddings give us a way to use an <strong>efficient, dense</strong> representation in which similar words have a similar encoding.   </p>
<p>Importantly, you do not have to specify this encoding by hand. An embedding is a dense vector of floating point values (the length of the vector is a parameter you specify).   </p>
<p>Instead of specifying the values for the embedding manually, they are trainable parameters (weights learned by the model during training, in the same way a model learns weights for a dense layer).  </p>
<p>It is common to see word embeddings that are 8-dimensional (for small datasets), up to 1024-dimensions when working with large datasets. A higher dimensional embedding can capture fine-grained relationships between words, but takes more data to learn.        </p>
</blockquote>
<p><img src="https://i.imgur.com/tPWCPQ6.png" alt="Word Embeddings" /></p>
<p>RAG has 3 major components</p>
<ol>
<li><p>Ingestion</p>
</li>
<li><p>Querying</p>
</li>
<li><p>Generation </p>
</li>
</ol>
<hr />

<h2 id="heading-ingestion">Ingestion</h2>
<hr />

<p>For Ingestion, following are the key components</p>
<ol>
<li><p>Read the Data Source</p>
</li>
<li><p>Convert the read text into manageable chunks</p>
</li>
<li><p>Convert the manageable chunks into embeddings. This is a technique in which you convert text into an array of numbers</p>
</li>
<li><p>Store the embeddings into a vector database</p>
</li>
<li><p>Store the metadata such as the filename, text , and other relevant things in the vector database</p>
</li>
</ol>
<p><img src="https://i.imgur.com/vcccw0V.png" alt="Ingestion" /></p>
<hr />

<h2 id="heading-query-the-data-using-simple-rag">Query the data using Simple RAG</h2>
<hr />

<p>In the query component, we require 3 main components</p>
<ol>
<li><p><code>Orchestrating application</code> which is responsible for coordinating the interactions between the other components such as the user , vector database , Language Model .</p>
</li>
<li><p>Vector Database which stores the information</p>
</li>
<li><p>Language model which is helpful for generating the information after it has been provided <strong>contextual</strong> information</p>
</li>
</ol>
<hr />

<h2 id="heading-data-flow-of-a-simple-rag">Data Flow of a Simple RAG</h2>
<hr />

<ol>
<li><p>The user inputs the question . Example : <code>What is the news in Ukraine</code></p>
</li>
<li><p>The Orchestrating application uses an <strong>encoder</strong> to transform the text into an embedding. We have used the <code>all-MiniLM-L6-v2</code> model of Sentence Transformers as the encoder</p>
</li>
<li><p>The embedding is searched in the Vector database. In this case we have used the <strong>Qdrant</strong> database as the vector database</p>
</li>
<li><p>Search results are obtained from the vector database. We get the top K results from the vector database. The number of results to be obtained is configurable</p>
</li>
<li><p>A consolidated block of text, popularly called the <strong>context</strong>, is prepared from the answers. In our implementation this is done by concatenating the search results</p>
</li>
<li><p>This context is sent to the language model for generating the answers relevant for the context. In the implementation we have used a small language model <strong>Phi3</strong></p>
</li>
</ol>
<p><img src="https://i.imgur.com/b2xtcFG.png" alt="Simple RAG" /></p>
<hr />

<h2 id="heading-data-flow-of-a-advanced-rag">Data Flow of a Advanced RAG</h2>
<hr />

<p>The steps remain the same.</p>
<p>Except the following</p>
<p><code>Step 4</code> - Search results are obtained from the vector database. We get the top K2 results, where K2 is larger than K. The number of results to be obtained is configurable</p>
<p><code>Step 4A</code>. The results obtained are passed into a new type of block known as the <strong>cross-encoder</strong>, which distills the results down to a smaller set with high similarity between each result and the query. This smaller set can be the top K results.</p>
<p><img src="https://i.imgur.com/EM41f5k.png" alt="Advanced RAG" /></p>
<hr />

<h2 id="heading-implementation-details">Implementation details</h2>
<hr />

<p>For this implementation , we have used the following</p>
<ol>
<li><p>Dataset - <strong>BBC News</strong> dataset</p>
</li>
<li><p>Vector Database - Qdrant. We have used an in-memory version of Qdrant for demonstration</p>
</li>
<li><p>Language Model - Small language model <code>Phi3</code></p>
</li>
<li><p>Orchestrator application - Kaggle notebook</p>
</li>
</ol>
<h2 id="heading-setup">Setup</h2>
<h3 id="heading-install-the-python-libraries">Install the python libraries</h3>
<pre><code class="lang-plaintext">! pip install -U qdrant-client --quiet
! pip install -U sentence-transformers --quiet
</code></pre>
<h3 id="heading-imports">Imports</h3>
<pre><code class="lang-plaintext">from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer,CrossEncoder
</code></pre>
<h3 id="heading-sentence-transformer-encoder">Sentence Transformer Encoder</h3>
<p>Instantiate the sentence transformer encoder</p>
<pre><code class="lang-plaintext">encoder = SentenceTransformer("all-MiniLM-L6-v2")
</code></pre>
<h3 id="heading-create-the-qdrant-collection">Create the Qdrant Collection</h3>
<p>We are creating</p>
<ul>
<li><p>In memory qdrant collection</p>
</li>
<li><p>The collection name is BBC</p>
</li>
<li><p>The size of the vector embedding to be inserted is the dimension of the encoder. In this case, the dimension when evaluated is <code>384</code></p>
</li>
<li><p>Distance of similarity is <code>cosine</code></p>
</li>
</ul>
<pre><code class="lang-plaintext">qdrant = QdrantClient(":memory:")

qdrant.recreate_collection(
    collection_name="BBC",
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(),  # Vector size is defined by used model
        distance=models.Distance.COSINE,
    ),
)
</code></pre>
<h2 id="heading-data-ingestion">Data Ingestion</h2>
<h3 id="heading-read-the-dataset">Read the Dataset</h3>
<p>Read the BBC News Dataset</p>
<pre><code class="lang-plaintext">LIMIT = 500
df = pd.read_csv("/kaggle/input/bbc-news/bbc_news.csv")
docs = df[:LIMIT]
</code></pre>
<p><img src="https://i.imgur.com/ySY37Kx.png" alt="BBC News Dataset Rows" /></p>
<h3 id="heading-upload-the-documents-into-qdrant">Upload the documents into Qdrant</h3>
<pre><code class="lang-plaintext">import uuid
%%capture --no-display
qdrant.upload_points(
    collection_name="BBC",
    points=[
        models.PointStruct(
            id=str(uuid.uuid4()), 
            vector=encoder.encode(row[1]["title"]),
            payload={ "title":row[1]["title"] ,
                     "description":row[1]["description"] }
        )
        for row in docs.iterrows()
    ],
)
</code></pre>
<h3 id="heading-verify-the-documents-have-been-uploaded-into-qdrant">Verify the documents have been uploaded into Qdrant</h3>
<pre><code class="lang-plaintext">qdrant.count(
    collection_name="BBC",
    exact=True,
)
</code></pre>
<p>If you have reached this point, Congratulations 👌. You have completed the understanding of <strong>Data Ingestion into Qdrant</strong></p>
<h2 id="heading-query-the-qdrant-database">Query the Qdrant database</h2>
<h3 id="heading-query-for-the-user">Query for the user</h3>
<pre><code class="lang-plaintext">query_string = "Describe the news for Ukraine"
</code></pre>
<h3 id="heading-search-qdrant-for-the-query">Search Qdrant for the query</h3>
<p>For searching, note how we have converted the user input into an embedding</p>
<p><code>encoder.encode(query_string).tolist()</code></p>
<pre><code class="lang-plaintext">hits = qdrant.search(
    collection_name="BBC",
    query_vector=encoder.encode(query_string).tolist(),
    limit=35,
)

for hit in hits:
    print(hit.payload, "score:", hit.score)
</code></pre>
<h3 id="heading-refine-the-result-with-the-crossencoder">Refine the result with the CrossEncoder</h3>
<p>We now refine the results with the CrossEncoder.</p>
<p>In our implementation we have got K2 = 35 results from Qdrant. We use the Cross Encoder <code>cross-encoder/ms-marco-MiniLM-L-6-v2</code> to refine the results; after passing them through the cross encoder, we keep the top K = 5.</p>
<pre><code class="lang-plaintext">CROSSENCODER_MODEL_NAME = 'cross-encoder/ms-marco-MiniLM-L-6-v2'
RANKER_RESULTS_LIMIT = 5

user_input = query_string

contexts_list = []
for result in hits:
    contexts_list.append(result.payload['description'])

cross_encoder = CrossEncoder(CROSSENCODER_MODEL_NAME)
cross_inp = [[user_input, hit] for hit in contexts_list]
cross_scores = cross_encoder.predict(cross_inp)

cross_scores_text = []
cross_scores_length = len(cross_scores)
for i in range(cross_scores_length):
    d = {}
    d['score'] = cross_scores[i]
    d['text'] = contexts_list[i]
    cross_scores_text.append(d)

hits_selected = sorted(cross_scores_text, key=lambda x: x['score'], reverse=True)
contexts =""
hits = hits_selected[:RANKER_RESULTS_LIMIT]
</code></pre>
<h3 id="heading-create-the-context">Create the context</h3>
<p>We create the Context for RAG using the search results</p>
<pre><code class="lang-plaintext">contexts =""
for i in range(len(hits)):
    contexts  +=  hits[i]['text']+"\n---\n"
</code></pre>
<p>If you have reached this point, Congratulations 👌 👌 again. You have completed the understanding of <strong>Getting Results from Qdrant [ Vector Database ]</strong></p>
<hr />

<h2 id="heading-generate-the-answer-with-the-small-language-model">Generate the answer with the Small Language Model</h2>
<hr />
<p>Now that we have the context from the vector database Qdrant, we send it to our small language model <strong>Phi3</strong>.</p>
<p>We use the small language model <strong>microsoft/Phi-3-mini-128k-instruct</strong>.</p>
<p>From the Hugging Face model card </p>
<blockquote>
<p>The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family with the Mini version in two variants 4K and 128K which is the context length (in tokens) that it can support.</p>
</blockquote>
<p>From the <a target="_blank" href="https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/">Microsoft blog</a></p>
<blockquote>
<p>Thanks to their smaller size, Phi-3 models can be used in compute-limited inference environments. Phi-3-mini, in particular, can be used on-device, especially when further optimized with ONNX Runtime for cross-platform availability. The smaller size of Phi-3 models also makes fine-tuning or customization easier and more affordable. In addition, their lower computational needs make them a lower cost option with much better latency. The longer context window enables taking in and reasoning over large text content—documents, web pages, code, and more. Phi-3-mini demonstrates strong reasoning and logic capabilities, making it a good candidate for analytical tasks. </p>
</blockquote>
<pre><code><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(<span class="hljs-number">0</span>)

model = AutoModelForCausalLM.from_pretrained(
    <span class="hljs-string">"microsoft/Phi-3-mini-128k-instruct"</span>, 
    device_map=<span class="hljs-string">"cuda"</span>, 
    torch_dtype=<span class="hljs-string">"auto"</span>, 
    trust_remote_code=True, 
)
tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"microsoft/Phi-3-mini-128k-instruct"</span>)
</code></pre><h3 id="heading-create-the-prompt">Create the prompt</h3>
<p>The prompt is created with 2 components </p>
<ul>
<li>Context which we created in the section <code>Create the context</code>         </li>
<li>User input which is the user input         </li>
</ul>
<pre><code>prompt = f<span class="hljs-string">""</span><span class="hljs-string">"Answer based on context:\n\n{contexts}\n\n{user_input}"</span><span class="hljs-string">""</span>
</code></pre><h3 id="heading-create-the-message-template">Create the message template</h3>
<pre><code>messages = [
     {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: prompt},
]
</code></pre><h3 id="heading-generate-the-message">Generate the message</h3>
<pre><code>%%time
model_inputs = tokenizer.apply_chat_template(messages, return_tensors=<span class="hljs-string">"pt"</span>)
model_inputs =  model_inputs.to(<span class="hljs-string">'cuda'</span>)
generated_ids = model.generate(model_inputs, max_new_tokens=<span class="hljs-number">1000</span>, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
</code></pre><h3 id="heading-print-the-answer">Print the answer</h3>
<pre><code>print(decoded[<span class="hljs-number">0</span>].split(<span class="hljs-string">"&lt;|assistant|&gt;"</span>)[<span class="hljs-number">-1</span>].split(<span class="hljs-string">"&lt;|end|&gt;"</span>)[<span class="hljs-number">0</span>])
</code></pre><h2 id="heading-code">Code</h2>
<p>The code can be found in the <strong>Kaggle</strong> notebook 
<a target="_blank" href="https://www.kaggle.com/code/ambarish/bbc-news-advanced-rag-phi3">BBC NEWS Advanced RAG PHI3</a></p>
]]></content:encoded></item><item><title><![CDATA[Generative AI Playlist]]></title><description><![CDATA[RAG [ Retrieval-Augmented Generation ] with AWS Bedrock and Qdrant in 8 minutes
https://youtu.be/DIWYqTj4vj0]]></description><link>https://blog.ambarishganguly.com/generative-ai-playlist-1</link><guid isPermaLink="true">https://blog.ambarishganguly.com/generative-ai-playlist-1</guid><category><![CDATA[generative ai]]></category><category><![CDATA[Generative AI, OpenAI, Azure OpenAI, LLaMA-2, PaLM API, Vertex AI, DALL-E, ChatGPT, Whisper]]></category><dc:creator><![CDATA[Ambarish Ganguly]]></dc:creator><pubDate>Thu, 28 Dec 2023 14:01:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1703772813891/3e726328-99d3-4bff-867c-426d914d2280.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>RAG [ Retrieval-Augmented Generation ] with AWS Bedrock and Qdrant in 8 minutes</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/DIWYqTj4vj0">https://youtu.be/DIWYqTj4vj0</a></div>
]]></content:encoded></item><item><title><![CDATA[Question Answering System Very Simple using Azure Open AI]]></title><description><![CDATA[Overview
The session gives an overview of Azure AI and the position of Azure OpenAI in the Azure AI ecosystem. We focus on a question and answering system and show the different components of the built question answering system
✅ Header
✅ Context
✅ P...]]></description><link>https://blog.ambarishganguly.com/generative-ai-playlist</link><guid isPermaLink="true">https://blog.ambarishganguly.com/generative-ai-playlist</guid><category><![CDATA[generative ai]]></category><category><![CDATA[Azure OpenAI]]></category><dc:creator><![CDATA[Ambarish Ganguly]]></dc:creator><pubDate>Thu, 28 Dec 2023 13:55:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1703772614582/5015de9b-ca08-46b6-9a30-b14e21fcb8cd.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-overview">Overview</h1>
<p>The session gives an overview of Azure AI and the position of Azure OpenAI in the Azure AI ecosystem. We focus on a question and answering system and show the different components of the built question answering system</p>
<p>✅ Header</p>
<p>✅ Context</p>
<p>✅ Prompt</p>
<p>We also have a sneak peek at prompt engineering and delve into the details of different types of models</p>
<p>⭐ text davinci</p>
<p>⭐ gpt 3.5</p>
<p>⭐ gpt 4</p>
<p>We also look deeper in the code and the difference of implementations of the various models</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/x8t2mIQNnbw">https://youtu.be/x8t2mIQNnbw</a></div>
]]></content:encoded></item></channel></rss>