Overview of Agentic Search and Deep Research

Agentic Search and Deep Research serve two very different purposes: one is about finding, the other about synthesizing. But in practice, they are often intertwined. Agentic Search is the “engine” that powers Deep Research, while Deep Research is the “full-service agency” (the complete project from plan to final report). Internally, each uses Large Language Models (LLMs) to interleave reasoning with tool use: the agent “thinks” about what it needs to find, “acts” by calling tools, and “reflects” on the results to refine its next actions.

Agentic Search is a sophisticated, iterative search methodology powered by multi-hop reasoning and tool use.

  • Multi-hop reasoning lets the agent fully exploit LLMs’ test-time-scaling capabilities to perform complex, multi-step actions. It can break a complex query into a logical chain, or refine its next action based on what it finds or what errors it encounters. It can also spawn multiple “sub-agents” to work in parallel on different parts of the query, then gather the results together.

    Notes: previous non-thinking models relied on explicit prompt engineering and the ReAct1 (Reason + Act) framework to achieve similar results. The current generation of reasoning models (e.g., o3/Gemini 3) can do this natively and seamlessly, with better accuracy, because they are trained on “browsing trajectories” and have much larger context windows.

  • Tool use (often called Function Calling) allows the agent to step outside its own static training data to get “ground truth.” Key tools for Agentic Search include:

    • Web Browsers: An integrated browsing tool or fine-grained control tools (searching, clicking, browsing, URL validation, etc.) to fetch live data (stock prices, news, weather).
    • Python Sandboxes (e.g., calculators): To ensure that when the agent finds two numbers, it doesn’t “hallucinate” the math but calculates the result exactly. Modern LLMs are fairly good at math, but they can still make mistakes, especially with large numbers or complex calculations; by executing code in a sandbox, the agent gets precise results.
    • File Parsers: To “read” a 100-page PDF or a complex CSV file it just downloaded.
    • Optional Internal Knowledge Bases (RAG APIs): To check whether public web info contradicts your private company data.
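The think–act–reflect loop over these tools can be sketched in a few lines. Everything here is a toy stand-in: `web_search` and `python_exec` are hypothetical stubs, not a real tool API, and a real agent would let the LLM choose each next action rather than follow a fixed plan.

```python
# Minimal sketch of an agentic tool-use (function calling) loop, with stubbed tools.
# `web_search` and `python_exec` are hypothetical stand-ins for real tool APIs.

def web_search(query: str) -> str:
    """Stub: a real implementation would call a live search API."""
    return f"[top results for: {query}]"

def python_exec(code: str) -> str:
    """Stub: a real implementation would run the code in an isolated sandbox."""
    return str(eval(code))  # exact arithmetic instead of LLM "mental math"

TOOLS = {"web_search": web_search, "python_exec": python_exec}

def run_agent(plan: list[tuple[str, str]]) -> list[str]:
    """Execute a sequence of (tool_name, argument) actions, collecting observations.

    A real agent would generate each action from the previous observation
    ("reflect"), instead of consuming a pre-written plan.
    """
    observations = []
    for tool_name, arg in plan:
        observations.append(TOOLS[tool_name](arg))
    return observations

obs = run_agent([
    ("web_search", "Company X headquarters location"),
    ("python_exec", "2 ** 10"),  # exact math via the sandbox
])
```

The key design point is that the model never computes or recalls facts itself inside the loop; every observation comes back from a tool, which is what grounds the final answer.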

A specific example of how Agentic Search works in practice: suppose you ask, “How does the latest GDP growth in the country where ‘Company X’ is headquartered affect their 2026 stock forecast?” The agent reasons about the query and realizes it needs to call the search tool to find where Company X is headquartered (e.g., France) and to gather analyst reports for Company X. It then fetches the latest GDP growth data for France for cross-validation. At each search step, it doesn’t take your query at face value; it interprets the intent and expands the search query to be more specific (e.g., “France GDP growth 2025” and “France GDP growth 2026” instead of just “GDP growth”). If the returned results are too long, it uses a summarization tool to extract the key data points. It can also execute Python code to analyze data tables it finds, or use multimodal vision to extract information from images and charts. Because LLMs are good at processing complicated information, the agent cross-references sources to ensure the data is reliable before passing it on. Once it has all the pieces, it combines the GDP data with the analyst reports to form a reasoned conclusion.

Without multi-hop, the AI would likely fail because no single webpage contains all those specific pieces of information linked together.

Technical Note: You might also hear the term RAG (Retrieval-Augmented Generation). Agentic Search is effectively a high-level implementation of RAG that has the autonomy to browse the live web rather than just a local database.
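The Company X walkthrough above can be sketched as chained lookups, where each hop’s answer parameterizes the next query. The knowledge-base contents below are invented purely for illustration; a real agent would hit live search instead of a dict.

```python
# Sketch of multi-hop reasoning: each hop's answer feeds the next query.
# KB contents are invented for illustration; a real agent queries the live web.

KB = {
    "Company X headquarters": "France",
    "France GDP growth 2025": "+1.1%",
    "Company X 2026 analyst outlook": "cautiously positive",
}

def search(query: str) -> str:
    return KB.get(query, "no result")

def multi_hop(company: str) -> dict:
    # Hop 1: resolve the headquarters country first...
    country = search(f"{company} headquarters")
    # Hop 2: ...then rewrite the next query using the resolved country.
    gdp = search(f"{country} GDP growth 2025")
    # Hop 3: gather the analyst outlook as an independent hop.
    outlook = search(f"{company} 2026 analyst outlook")
    return {"country": country, "gdp_growth": gdp, "outlook": outlook}
```

Note that no single entry links all three facts together; a one-shot retrieval over this corpus would fail, which is exactly the multi-hop point made above.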

Deep Research is an end-to-end autonomous workflow. It uses Agentic Search as a tool to accomplish a larger goal. The system aims to automate the time-consuming process of literature review and research synthesis.

Here is a general workflow showing how a Deep Research system (like OpenAI’s “Deep Research” or Gemini’s research mode) works with Agentic Search:

  • User Query: “Write a report on the impact of salt-water batteries on the EV market.”
  • Deep Research Layer:
    • Plans: It breaks the complex topic into a multi-chapter outline (e.g., Market Share, Chemistry, Pros/Cons, Key Players).
    • Executes: It runs an Agentic Search for each chapter.
      • Agentic Search Layer: For the “Key Players” section, the agent searches for companies, finds a list, realizes some are defunct, and performs a follow-up search to find active startups.
  • Synthesis: The research layer combines all those verified findings and writes a cohesive, structured document (like a 10-page report).
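The plan → execute → synthesize loop above can be sketched as follows. This is a toy orchestrator under stated assumptions: `plan` returns a fixed illustrative outline instead of an LLM-generated one, and `agentic_search` is a stub for a full per-chapter search run.

```python
# Sketch of the Deep Research plan -> execute -> synthesize workflow.
# `plan` and `agentic_search` are stubs; a real system would generate the
# outline with an LLM and run a full tool-use loop per chapter.

def plan(topic: str) -> list[str]:
    """Break the topic into a chapter outline (fixed here for illustration)."""
    return ["Market Share", "Chemistry", "Pros/Cons", "Key Players"]

def agentic_search(chapter: str) -> str:
    """Stub: one Agentic Search run per chapter, returning verified findings."""
    return f"Verified findings on {chapter}."

def deep_research(topic: str) -> str:
    chapters = plan(topic)
    # Execute: one search per chapter (a real system may run these concurrently).
    findings = {ch: agentic_search(ch) for ch in chapters}
    # Synthesize: combine per-chapter findings into one structured document.
    body = "\n\n".join(f"## {ch}\n{findings[ch]}" for ch in chapters)
    return f"# Report: {topic}\n\n{body}"

report = deep_research("Impact of salt-water batteries on the EV market")
```

The separation matters: the research layer owns the outline and the final document, while each chapter delegates fact-finding to its own Agentic Search run.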

The transition from basic keyword searches to modern “Deep Research” has been a rapid four-year evolution of architecture, moving from static information retrieval to autonomous, goal-oriented reasoning.


🗺️ The Development Roadmap: 2022–2026

Phase 1: Simple Search (The “Keyword” Era)

Timeline: Pre-2022

  • Technology: BM25 / Inverted Indexing. Search engines like Google and Bing matched specific keywords in your query to keywords on a page.
  • The Experience: You received a “10 blue links” result. The “intelligence” was purely in ranking the most popular or authoritative pages.
  • Limitation: It didn’t understand meaning. Searching for “how to fix a leaky faucet” might just return pages containing those exact words, even if they weren’t helpful tutorials.

Phase 2: Conversational Search (The “Naive RAG” Era)

Timeline: Late 2022 – 2023

  • Technology: Retrieval-Augmented Generation (RAG). When ChatGPT and early Perplexity launched, they added a “retrieval” step.
  1. The AI turns your question into a Vector Embedding (a mathematical representation of the meaning).
  2. It pulls 3–5 top snippets from the web.
  3. It “reads” them and summarizes an answer.
  • The Experience: Direct answers with citations. No more clicking through links.
  • Limitation: “One-and-done.” If the first search didn’t find the answer, the AI would often hallucinate or say it didn’t know. It couldn’t “try again.”
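The three-step pipeline above can be sketched end to end. As a simplifying assumption, bag-of-words vectors with cosine similarity stand in for learned embeddings, and the corpus is invented; the “one-and-done” property is the point: there is exactly one retrieval, with no retry.

```python
# Sketch of the naive "one-and-done" RAG pipeline: embed, retrieve top-k, answer.
# Bag-of-words Counters stand in for real vector embeddings; corpus is invented.

from collections import Counter
from math import sqrt

CORPUS = [
    "Replace the faucet washer to stop a leak.",
    "EV market share grew rapidly in 2023.",
    "BM25 ranks documents by keyword overlap.",
]

def embed(text: str) -> Counter:
    """Toy embedding: token counts (real systems use dense neural vectors)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Single retrieval pass: pull the top-k snippets, then generation begins.
    If these snippets miss the answer, a naive RAG system cannot try again."""
    q = embed(query)
    return sorted(CORPUS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

top = retrieve("how to fix a leaky faucet")
```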

Phase 3: Deep Search & Deep Research

Timeline: 2024 – 2026 (Current State)

  • Technology: System 2 Reasoning & Asynchronous Planning.

    • Thinking Models (o1/o3/Gemini 3): These models use Reinforcement Learning to “deliberate” for minutes before acting.
    • Multi-Step Queries: It could break a question like “Compare Tesla and BYD’s 2024 revenue” into two separate searches, then combine the data.
    • Self-Correction: If the AI retrieved a page and realized it was irrelevant, it would rewrite its own search query and try again.
    • Asynchronous Agents: The AI doesn’t just wait for a response; it spawns dozens of “sub-agents” to research different chapters of a report simultaneously.
  • The Experience (Deep Search): High accuracy for specific facts.

  • Limitation (Deep Search): Narrow scope. It was still “searching” for answers, not “investigating” a topic, and it lacked long-term planning.

  • Synthesis (Deep Research): It doesn’t just summarize; it analyzes. It looks for contradictions between sources, executes Python code to verify data, and creates structured 20-page reports.

  • The Experience (Deep Research): You provide a goal (e.g., “Perform a due diligence report on this startup”), and 10 minutes later you get a professional-grade document with 100+ citations.

  • Limitation (Deep Research): It still can’t “think” about the process of research. It is good at finding and summarizing, but it lacks a genuine “research strategy” and the ability to “pivot” its approach based on what it finds; it also lacks continuous operation and long-term planning.
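The self-correction behavior described above can be sketched as a retry loop: if the first retrieval looks irrelevant, the agent rewrites its own query and searches again. The result store and the relevance check are toy stand-ins, and all strings are invented for illustration.

```python
# Sketch of Phase-3 self-correction: if retrieved results look irrelevant,
# rewrite the query and retry instead of answering from the first attempt.
# RESULTS and the relevance check are toy stand-ins with invented data.

RESULTS = {
    "GDP growth": ["Global GDP overview (generic)"],
    "France GDP growth 2025": ["France GDP grew 1.1% in 2025 (illustrative)"],
}

def search(query: str) -> list[str]:
    return RESULTS.get(query, [])

def relevant(results: list[str], must_contain: str) -> bool:
    """Toy relevance check; a real agent asks the LLM to judge the snippets."""
    return any(must_contain in r for r in results)

def search_with_retry(query: str, must_contain: str,
                      rewrites: list[str]) -> list[str]:
    results = search(query)
    for rewritten in rewrites:          # self-correction loop
        if relevant(results, must_contain):
            break                       # good enough: stop retrying
        results = search(rewritten)     # rewrite the query and try again
    return results

hits = search_with_retry("GDP growth", "France", ["France GDP growth 2025"])
```

This is exactly what the Phase-2 “one-and-done” pipeline lacks: a check on the retrieved evidence and a second chance when it fails.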


  1. Yao, Shunyu, et al. “ReAct: Synergizing Reasoning and Acting in Language Models.” arXiv:2210.03629, 10 Mar. 2023. https://doi.org/10.48550/arXiv.2210.03629↩︎