Architecting agentic RAG systems
Our newest course dropped: PhiloAgents (open-source, production agents, RAG, observability, you name it!)
Today, something wild happened:
21,000+ followers
190,000+ monthly views
Decoding ML has become one of the go-to places to learn how to actually ship AI systems.
And it’s all because of you.
The builders. The engineers. The founders. The ones tired of hype and looking for clarity.
Thank you for being here. For reading. Building. Asking smart questions. Pushing me to go deeper.
This week’s topics:
Architecting agentic RAG systems
Prompt engineering from zero to hero - Master the art of AI interaction
Our newest open-source course just dropped: PhiloAgents
Architecting agentic RAG systems
If you want to build agents that don’t break in production...
You must start with the most important pattern:
Agentic RAG
In the Second Brain AI Assistant course, we released the Agentic RAG module
... and today, I’m breaking down how it’s architected from the ground up.
What is the Agentic RAG module?
The Agentic RAG module takes a user query via a Gradio UI.
The output is a reasoned answer, generated through:
Semantic search from a vector DB
Multi-step reasoning via an agent
Optional summarization through a model/API
Online vs. Offline pipelines
Here’s the thing...
Most GenAI pipelines are offline.
They’re pre-scheduled, long-running jobs.
(Using tools such as ZenML)
But this module is online.
It runs as a standalone Python app and powers real-time user interactions.
We intentionally decoupled it from our offline feature/training pipelines to preserve a clean separation between ingestion and inference.
Agentic Layer: Tooling Breakdown
The agent is built using SmolAgents (by Hugging Face) and is powered by 3 tools:
"What can I do?" Tool
→ Helps users explore agent capabilities
Retriever Tool
→ Queries MongoDB's vector and text indexes (populated offline)
Summarization Tool
→ Hits a REST API for refining long-form web content
Each tool was picked to reflect real-world agent scenarios:
Python logic
DB queries
External API calls
The agent uses these tools iteratively to minimize cost and latency.
All reasoning happens in real time with full traceability via the Gradio UI.
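To make the three tool types concrete, here is a minimal sketch in plain Python. It deliberately does not use the SmolAgents API: the `Tool` dataclass, the in-memory `DOCUMENTS` list (standing in for the MongoDB indexes), the word-overlap ranking (standing in for vector search), and the truncating `summarize` (standing in for the REST call) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Tool:
    """Hypothetical tool wrapper: a name, a description, and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


# Hypothetical in-memory stand-in for the MongoDB indexes populated offline.
DOCUMENTS: List[str] = [
    "Agentic RAG lets an agent decide when to retrieve context.",
    "ZenML orchestrates offline feature and training pipelines.",
    "Gradio provides a simple UI for real-time inference apps.",
]


def _words(text: str) -> Set[str]:
    """Lowercase a text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, top_k: int = 2) -> str:
    """DB-query tool: rank documents by word overlap (toy replacement for vector search)."""
    query_words = _words(query)
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: len(query_words & _words(doc)),
        reverse=True,
    )
    return "\n".join(ranked[:top_k])


def summarize(text: str, max_words: int = 12) -> str:
    """External-API tool: truncation stands in for a call to a summarization service."""
    words = text.split()
    suffix = " ..." if len(words) > max_words else ""
    return " ".join(words[:max_words]) + suffix


def build_tools() -> Dict[str, Tool]:
    """Register the three tools under their names."""
    tools: Dict[str, Tool] = {}

    def what_can_i_do(_: str) -> str:
        """Pure-Python tool: list the registered capabilities."""
        return "\n".join(f"{t.name}: {t.description}" for t in tools.values())

    for tool in (
        Tool("what_can_i_do", "Explain what the agent can do", what_can_i_do),
        Tool("retriever", "Search the knowledge base", retrieve),
        Tool("summarizer", "Shorten long-form content", summarize),
    ):
        tools[tool.name] = tool
    return tools


tools = build_tools()
print(tools["what_can_i_do"].run(""))
print(tools["retriever"].run("agentic rag"))
```

Each tool is just a named callable that takes and returns text, which is why adding a fourth one later is cheap: register it, and the "What can I do?" tool picks it up automatically.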
What happens under the hood?
User submits a query
The agent decides: “Do I need context?”
If yes → queries the vector DB (retriever tool)
Retrieved chunks optionally go through summarization
The agent reasons → repeats if more context is needed to answer the question fully
Once confident → final response returned
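The steps above can be sketched as a small loop. This is a rough illustration, not the course's implementation: in the real module an LLM makes the "Do I need context?" decision, whereas here `needs_context`, `retrieve`, `summarize`, and `answer` are hypothetical callables passed in from outside, and the `max_steps` cap and the 50-word summarization threshold are assumptions.

```python
from typing import Callable, List


def run_agent(
    query: str,
    needs_context: Callable[[str, List[str]], bool],
    retrieve: Callable[[str], str],
    summarize: Callable[[str], str],
    answer: Callable[[str, List[str]], str],
    max_steps: int = 3,
    summarize_over_words: int = 50,
) -> str:
    """Retrieve (and optionally summarize) until the agent has enough context."""
    context: List[str] = []
    for _ in range(max_steps):
        # Step 2: the agent decides whether it needs (more) context.
        if not needs_context(query, context):
            break
        # Step 3: query the knowledge base through the retriever tool.
        chunks = retrieve(query)
        # Step 4: summarization is optional, applied only to long results.
        if len(chunks.split()) > summarize_over_words:
            chunks = summarize(chunks)
        context.append(chunks)
    # Step 6: produce the final response from whatever context was gathered.
    return answer(query, context)


# Toy stand-ins so the loop can run without an LLM or a database.
def demo_needs_context(query: str, context: List[str]) -> bool:
    return len(context) < 1


def demo_retrieve(query: str) -> str:
    return f"notes about {query}"


def demo_summarize(text: str) -> str:
    return " ".join(text.split()[:50])


def demo_answer(query: str, context: List[str]) -> str:
    return f"Answer to '{query}' using {len(context)} chunk(s)"


result = run_agent(
    "agentic RAG", demo_needs_context, demo_retrieve, demo_summarize, demo_answer
)
print(result)  # Answer to 'agentic RAG' using 1 chunk(s)
```

The step cap is what keeps an undecided agent from looping forever, which matters for the cost and latency goals mentioned earlier.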
The summarization model is swappable: we can switch between our custom small language model (hosted as a real-time API on Hugging Face) and OpenAI (as a fallback).
It’s modular, testable, and future-proof.
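One way to picture a swappable summarizer with a fallback is a wrapper that tries backends in order. This is only a sketch under assumptions: the text doesn't say whether the switch happens automatically or through configuration, and `with_fallback`, `custom_slm`, and `openai_stub` are hypothetical names that make no real network calls.

```python
from typing import Callable, List, Sequence

Summarizer = Callable[[str], str]


def with_fallback(backends: Sequence[Summarizer]) -> Summarizer:
    """Return a summarizer that tries each backend in order until one succeeds."""

    def summarize(text: str) -> str:
        errors: List[Exception] = []
        for backend in backends:
            try:
                return backend(text)
            except Exception as exc:  # any backend failure triggers the next one
                errors.append(exc)
        raise RuntimeError(f"All {len(errors)} summarization backends failed")

    return summarize


# Hypothetical stand-ins; real versions would call the hosted model APIs.
def custom_slm(text: str) -> str:
    raise ConnectionError("endpoint unavailable")  # simulate an outage


def openai_stub(text: str) -> str:
    return f"[fallback] {text[:20]}"


summarize = with_fallback([custom_slm, openai_stub])
print(summarize("A long article about agentic RAG systems"))
```

Because the agent only ever sees a single `summarize` callable, swapping the order or adding a third backend doesn't touch the agent code.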
Could we have used a simple workflow?
Yes.
But the agentic approach unlocks scalability and extensibility.
This is critical if you want to:
Add new tools
Support multi-turn reasoning
Layer in observability or eval logic later
But this is just the beginning.
We’ll be expanding this system with observability:
Evaluation
Prompt monitoring
If you want to learn how real AI agents are built, start with the Agentic RAG module.
You can learn more about this in lesson 6 of our free Second Brain AI Assistant course.
Prompt Engineering from Zero to Hero - Master the Art of AI Interaction (Affiliate)
If you think you “know” prompt engineering...
Think again.
I’ve been following the author for a while now - his GitHub repos and Substack have become go-to resources for AI practitioners.
He has a rare gift:
The ability to break down complex GenAI topics like he’s teaching a 7-year-old (without dumbing anything down).
... And now he’s done it again with a new eBook:
Prompt Engineering from Zero to Hero - Master the Art of AI Interaction

This isn’t just another “use more bullet points in your prompt” kind of guide.
It’s a practical deep dive with:
Code examples
Real-world exercises
Clear explanations of common mistakes
And the subtle mechanics behind great AI interaction
Get 20% off with code PAUL
Our newest open-source course just dropped: PhiloAgents
Everyone’s talking about agents.
Few are building systems that *actually work* in production.
But we’ve fixed that...
Our newest open-source course just dropped: PhiloAgents
It’s a collaboration between Miguel Otero Pedrido and Decoding ML.
Hands down, this is one of the most creative, technical builds we’ve released.
Forget scripted avatars.
In this course, you’ll simulate a village of AI philosophers (e.g., Plato, Aristotle, and Turing).
... and each of them is powered by a real AI agent.
They don’t follow a script.
They reflect. Argue. Debate.
And you don’t just play it.
You build it.
Here's what you’ll learn in the course:
How to architect production-ready agentic systems
How to implement agentic RAG apps using LangGraph (by LangChain)
How to deploy the agent using FastAPI, WebSockets and Docker
How to integrate Groq (LLM API), MongoDB (document database), and Opik (LLMOps)
How to observe real agentic applications (monitoring + evaluation)
How to bring NPCs to life in a game-like simulation
All wrapped into a 6-lesson course with video + written content.
The truth is that learning how to build agent systems in production can feel chaotic.
You’ve got RAG, agents, backend infra, frontend interfaces, monitoring, Docker, and more.
This course glues all those pieces together with a use case that’s equal parts playful and powerful.
The first lesson is already live. Start digging into our free course and learn to build production-ready agents ↓
Whenever you’re ready, there are 3 ways we can help you:
Perks: Exclusive discounts on our recommended learning resources
(books, live courses, self-paced courses and learning platforms).
The LLM Engineer’s Handbook: Our bestselling book, which teaches an end-to-end framework for building production-ready LLM and RAG applications, from data collection to deployment (get up to 20% off using our discount code).
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects and cover everything from system architecture to data collection, training and deployment.
Images
If not otherwise stated, all images are created by the author.