Architecting agentic RAG systems

Our newest course dropped: PhiloAgents (open-source, production agents, RAG, observability, you name it!)

Paul Iusztin

Apr 19, 2025

Today, something wild happened:

21,000+ followers
190,000+ monthly views

Decoding ML has become one of the go-to places to learn how to actually ship AI systems.

And it’s all because of you.

The builders. The engineers. The founders. The ones tired of hype and looking for clarity.

Thank you for being here. For reading. Building. Asking smart questions. Pushing me to go deeper.

This week’s topics:

Architecting agentic RAG systems
Prompt engineering from zero to hero - Master the art of AI interaction
Our newest open-source course just dropped: PhiloAgents

Architecting agentic RAG systems

If you want to build agents that don’t break in production...

You must start with the most important pattern:

Agentic RAG

In the Second Brain AI Assistant course, we released the Agentic RAG module

... and today, I’m breaking down how it’s architected from the ground up.

What is the Agentic RAG module?

The Agentic RAG module takes a user query via a Gradio UI.

The output is a reasoned answer, generated through:

Semantic search from a vector DB
Multi-step reasoning via an agent
Optional summarization through a model/API

Online vs. Offline pipelines

Here’s the thing...

Most GenAI pipelines are offline.

They’re pre-scheduled, long-running jobs.

(Using tools such as ZenML)

But this module is online.

It runs as a standalone Python app and powers real-time user interactions.

We intentionally decoupled it from our offline feature/training pipelines to preserve a clean separation between ingestion and inference.

Agentic Layer: Tooling Breakdown

The agent is built using SmolAgents (by Hugging Face) and is powered by 3 tools:

"What can I do?" Tool
→ Helps users explore agent capabilities
Retriever Tool
→ Queries MongoDB's vector and text indexes (populated offline)
Summarization Tool
→ Hits a REST API for refining long-form web content
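
To make the retriever tool more concrete, here is a minimal sketch of what a vector query against MongoDB could look like, assuming Atlas Vector Search and its `$vectorSearch` aggregation stage. It only builds the pipeline as plain Python data, so it runs without a database. The index name `vector_index`, the `embedding` and `content` fields, and the helper name are assumptions for illustration, not the course's actual schema.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    limit: int = 5,
    index_name: str = "vector_index",  # hypothetical index name
    embedding_path: str = "embedding",  # hypothetical field holding the vectors
) -> list[dict[str, Any]]:
    """Build a `$vectorSearch` aggregation pipeline (nothing is executed here)."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": embedding_path,
                "queryVector": query_vector,
                # Consider more candidates than we return to improve recall.
                "numCandidates": limit * 10,
                "limit": limit,
            }
        },
        # Keep only the fields the agent needs, plus the similarity score.
        {
            "$project": {
                "_id": 0,
                "content": 1,  # hypothetical field holding the chunk text
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=3)
print(pipeline[0]["$vectorSearch"])
```

With pymongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`. A text-index query would follow the same shape with a different first stage.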

Each tool was picked to reflect real-world agent scenarios:

Python logic
DB queries
External API calls
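
Here is a rough sketch of how those three tool flavors can sit behind one common interface. It is framework-agnostic on purpose and is not the SmolAgents API: the `Tool` dataclass, `make_tools`, and the fake search and summarize functions are all hypothetical stand-ins, so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """A hypothetical tool: a name, a description, and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


def make_tools(
    search: Callable[[str], list[str]],
    summarize: Callable[[str], str],
) -> dict[str, Tool]:
    tools: dict[str, Tool] = {}

    def what_can_i_do(_: str) -> str:
        # Python logic: list the registered tools and their descriptions.
        return "\n".join(f"{t.name}: {t.description}" for t in tools.values())

    def retrieve(query: str) -> str:
        # DB query: delegate to whatever search backend was injected.
        return "\n---\n".join(search(query))

    for tool in (
        Tool("what_can_i_do", "Lists what the agent can do.", what_can_i_do),
        Tool("retriever", "Fetches relevant chunks from the knowledge base.", retrieve),
        # External API call: the injected function would wrap the REST request.
        Tool("summarizer", "Condenses long text.", summarize),
    ):
        tools[tool.name] = tool
    return tools


def fake_search(query: str) -> list[str]:
    # Stand-in for the real database query.
    return [f"chunk about {query}"]


def fake_summarize(text: str) -> str:
    # Stand-in for the real REST API call.
    return text[:40]


tools = make_tools(fake_search, fake_summarize)
print(tools["what_can_i_do"].run(""))
```

Injecting the search and summarize functions keeps each tool testable on its own, without a live database or API.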

The agent calls these tools iteratively, and only when a step actually needs them, which helps keep cost and latency down.

All reasoning happens in real time with full traceability via the Gradio UI.

What happens under the hood?

User submits a query

The agent decides: “Do I need context?”

If yes → queries the vector DB (retriever tool)

Retrieved chunks optionally go through summarization

The agent reasons → repeats if more context is needed to answer the question fully

Once confident → final response returned
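
The steps above can be sketched as a small loop. This is a simplified illustration, not the module's actual code: in the real system an LLM makes the decisions, while here `needs_context`, `retrieve`, `summarize`, and `answer` are hypothetical callables passed in by the caller. The `max_steps` cap and the `max_chunk_chars` threshold are also assumptions added to keep the sketch bounded.

```python
from typing import Callable


def run_agent(
    query: str,
    needs_context: Callable[[str, list[str]], bool],
    retrieve: Callable[[str], list[str]],
    summarize: Callable[[str], str],
    answer: Callable[[str, list[str]], str],
    max_steps: int = 3,
    max_chunk_chars: int = 500,
) -> tuple[str, list[str]]:
    """Return the final answer plus a trace of the steps taken."""
    context: list[str] = []
    trace: list[str] = []
    for step in range(max_steps):
        # The agent decides: "Do I need (more) context?"
        if not needs_context(query, context):
            break
        trace.append(f"step {step}: retrieve")
        for chunk in retrieve(query):
            # Summarization is optional: only long chunks go through it.
            if len(chunk) > max_chunk_chars:
                trace.append(f"step {step}: summarize")
                chunk = summarize(chunk)
            context.append(chunk)
    trace.append("final answer")
    return answer(query, context), trace


result, trace = run_agent(
    "What is agentic RAG?",
    needs_context=lambda q, ctx: len(ctx) == 0,
    retrieve=lambda q: ["short chunk", "x" * 600],
    summarize=lambda text: text[:20],
    answer=lambda q, ctx: f"Answer based on {len(ctx)} chunks",
)
print(result)
print(trace)
```

Returning the trace alongside the answer is one simple way to get the kind of step-by-step traceability a UI can display.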

The summarization model is swappable: we can switch between our custom small language model (hosted as a real-time API on Hugging Face) and OpenAI (as a fallback).

It’s modular, testable, and future-proof.
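
One way to wire up that kind of swap is to treat every summarizer as a plain function and wrap two of them in a fallback. This is a minimal sketch under assumptions: it reads "fallback" as automatic failover on error, and `custom_slm` and `hosted_llm` are hypothetical stand-ins that make no real network calls.

```python
from typing import Callable

Summarizer = Callable[[str], str]


def with_fallback(primary: Summarizer, fallback: Summarizer) -> Summarizer:
    """Try the primary summarizer; on any error, use the fallback instead."""

    def summarize(text: str) -> str:
        try:
            return primary(text)
        except Exception:
            return fallback(text)

    return summarize


def custom_slm(text: str) -> str:
    # Stand-in for the custom small language model endpoint.
    # It simulates an outage so the fallback path is exercised.
    raise ConnectionError("endpoint unavailable")


def hosted_llm(text: str) -> str:
    # Stand-in for a hosted provider such as OpenAI.
    return "summary: " + text[:30]


summarize = with_fallback(custom_slm, hosted_llm)
print(summarize("Agentic RAG combines retrieval with multi-step reasoning."))
```

Because the agent only sees a `Summarizer`, swapping providers does not touch the rest of the pipeline.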

Could we have used a simple workflow?

Yes.

But the agentic approach unlocks scalability and extensibility.

This is critical if you want to:

Add new tools
Support multi-turn reasoning
Layer in observability or eval logic later

But this is just the beginning.

We’ll be expanding this system with observability:

Evaluation
Prompt monitoring

(powered by Opik from Comet)

If you want to learn how real AI agents are built, start with the Agentic RAG module.

You can learn more about this in lesson 6 of our free Second Brain AI Assistant course:

LLMOps for production agentic RAG

Paul Iusztin and Anca Ioana Muscalagiu

Mar 20

Prompt Engineering from Zero to Hero - Master the Art of AI Interaction (Affiliate)

If you think you “know” prompt engineering...

Think again.

I’ve been following Nir Diamant for a while now - his GitHub repos and Substack have become go-to resources for AI practitioners.

He has a rare gift:

The ability to break down complex GenAI topics like he’s teaching a 7-year-old (without dumbing anything down).

... And now he’s done it again with a new eBook:

Prompt Engineering from Zero to Hero - Master the Art of AI Interaction

Get your copy (20% off with code PAUL)

This isn’t just another “use more bullet points in your prompt” kind of guide.

It’s a practical deep dive with:

Code examples
Real-world exercises
Clear explanations of common mistakes
And the subtle mechanics behind great AI interaction

Our newest open-source course just dropped: PhiloAgents

Everyone’s talking about agents.

Few are building systems that *actually work* in production.

But we’ve fixed that...

Our newest open-source course just dropped: PhiloAgents

It’s a collaboration between Miguel Otero Pedrido from The Neural Maze and Decoding ML.

I can hands down say this is one of the most creative, technical builds we’ve released.

Forget scripted avatars.

In this course, you’ll simulate a village of AI philosophers (e.g., Plato, Aristotle, Turing and more).

... and each of them is powered by a real AI agent.

They don’t follow a script.
They reflect. Argue. Debate.

And you don’t just play it.
You build it.

Here's what you’ll learn in the course:

How to architect production-ready agentic systems
How to implement agentic RAG apps using LangGraph (by LangChain)
How to deploy the agent using FastAPI, WebSockets and Docker
How to integrate Groq (LLM API), MongoDB (document database), and Opik (LLMOps)
How to observe real agentic applications (monitoring + evaluation)
How to bring NPCs to life in a game-like simulation

All wrapped into a 6-lesson course with video + written content.

The truth is that learning how to build agent systems in production can feel chaotic.

You’ve got RAG, agents, backend infra, frontend interfaces, monitoring, Docker, and more.

This course glues all those pieces together with a use case that’s equal parts playful and powerful.

The first lesson is already live. Start digging into our free course and learn to build production-ready agents ↓

Build your gaming simulation AI agent

Paul Iusztin

Apr 3

Whenever you’re ready, there are 3 ways we can help you:

Perks: Exclusive discounts on our recommended learning resources (books, live courses, self-paced courses and learning platforms).
The LLM Engineer’s Handbook: Our bestselling book, which teaches you an end-to-end framework for building production-ready LLM and RAG applications, from data collection to deployment (get up to 20% off using our discount code).
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects and cover everything from system architecture to data collection, training and deployment.

Images

If not otherwise stated, all images are created by the author.
