LangChain Agents 101: Building Your First Autonomous Tool-User From Scratch

April 05, 2024

Alright, let's cut to the chase. You probably landed here searching for "LangChain Agents 101." The internet is awash with tutorials showing you how to initialize_agent and call it a day. But as someone who's building towards replicating cognitive architectures, particularly the modularity and efficiency of the human brain using Mixture-of-Experts (MoE) architectures, I've got a problem with that approach.

Why We're Not Using LangChain (and What We're Doing Instead)

Frankly, frameworks like LangChain, while offering a low barrier to entry, often become bloatware. They abstract away critical details, introduce unnecessary overhead, and limit the granular control essential for performance tuning and genuine understanding. My goal isn't just to string together APIs; it's to deeply understand and eventually replicate sophisticated intelligence. Bloated frameworks are just... noise in that pursuit.

So, while the title says "LangChain Agents 101," consider this "Agentic Systems 101: Building from First Principles." We're going to build a functional agent that reasons and uses tools – precisely what LangChain agents do – but we'll do it with raw API calls, emphasizing efficiency, clarity, and direct control. No magic initialize_agent function here. We're rolling up our sleeves.

This approach is crucial for anyone serious about AI research. When you're thinking about specialized "expert" modules in an MoE setup, you need to dictate exactly how they interact, what data they see, and how quickly they process it. Obscured abstractions are an enemy of progress here.

The Architecture: Replicating Cognitive Functions, Efficiently

At its core, an agent that can reason and use tools mimics a simplified cognitive loop: observe, think, act.

  1. The LLM as the "Frontal Lobe": This is our reasoning engine. It takes in the task, the available tools, and the history of interactions, then outputs a Thought and an Action (or a Final Answer). We interface with it directly via its API.
  2. The ReAct Framework: This is the cognitive pattern we'll follow. It's not a LangChain invention; it's a prompt engineering technique (a short example trace follows this list).
    • Thought: The LLM's internal monologue, planning the next step.
    • Action: The specific tool the LLM decides to use.
    • Action Input: The parameters for that tool.
    • Observation: The result returned by the tool, which feeds back into the LLM for the next Thought.
  3. Defining "Tools": The Sensory-Motor System: These are just functions that perform external tasks (like searching the web, querying a database, or running code). We'll define them as simple Python callables. When building MoE systems, these "tools" can be thought of as highly specialized expert networks or external peripherals.
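
To make the ReAct pattern concrete before writing any code, here's roughly what one full exchange looks like in plain text. Treat this trace as an illustration (the exact wording comes from whatever the LLM generates), not guaranteed output:

Thought: The user is asking for the capital of France. I should use the search tool.
Action: search
Action Input: capital of France
Observation: Paris is the capital and most populous city of France.
Thought: I have enough information to provide a final answer.
Final Answer: The capital of France is Paris.

Everything except the Observation line is generated by the LLM; the Observation is injected by our own code after running the tool, and the cycle repeats until the LLM emits a Final Answer.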

Building It Raw: Pythonic Purity

Let's get to the code. We'll use a simple requests call for our LLM interaction (mimicking an OpenAI API call) and standard Python functions for our tools. No unnecessary classes or abstractions that you didn't write yourself.

First, a minimal LLM interface. Replace the placeholder with your actual OpenAI API key and endpoint. For this example, a mock_call_llm function (selected automatically when no real key is configured) keeps everything runnable without credentials.

import json
import re
import requests # For making actual API calls
import os # To get API keys
from datetime import datetime # For the get_current_date tool
 
# --- 1. LLM Interface ---
# Replace with your actual OpenAI API key or a compatible service
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "YOUR_MOCK_API_KEY") 
OPENAI_ENDPOINT = "https://api.openai.com/v1/chat/completions" # Or your local LLM endpoint
 
def call_llm(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Makes a raw API call to an OpenAI-compatible LLM."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {OPENAI_API_KEY}",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0, # Keep it deterministic for agentic behavior
    }
    
    try:
        # In a real scenario, handle async with 'httpx' or 'aiohttp' for performance
        response = requests.post(OPENAI_ENDPOINT, headers=headers, json=payload)
        response.raise_for_status() # Raise an exception for bad status codes
        result = response.json()
        return result['choices'][0]['message']['content']
    except requests.exceptions.RequestException as e:
        print(f"Error calling LLM: {e}")
        return f"ERROR: Could not communicate with LLM: {e}"
    except KeyError:
        print(f"Error parsing LLM response: {response.json()}")
        return "ERROR: Malformed LLM response."
 
# --- MOCK LLM CALL (for local testing without an actual API key) ---
# Selected automatically (via LLM_FUNC below) when no real API key is configured.
def mock_call_llm(prompt: str) -> str:
    """A mock LLM for local testing. Simulates ReAct responses."""
    print("\n--- MOCK LLM CALL ---")
    if "what is the capital of France" in prompt.lower() and "Observation:" not in prompt:
        return "Thought: The user is asking for the capital of France. I should use the search tool.\nAction: search\nAction Input: capital of France"
    elif "capital of france" in prompt.lower() and "Observation:" in prompt:
         return "Thought: The search results clearly state that Paris is the capital. I have enough information now.\nFinal Answer: The capital of France is Paris."
    elif "who won the world series in 2020" in prompt.lower() and "Observation:" not in prompt:
        return "Thought: I need to find the winner of the 2020 World Series. The search tool is appropriate for this.\nAction: search\nAction Input: 2020 World Series winner"
    elif "2020 world series winner" in prompt.lower() and "Observation:" in prompt:
        return "Thought: The search results indicate the Los Angeles Dodgers won. I can now provide the final answer.\nFinal Answer: The Los Angeles Dodgers won the World Series in 2020."
    elif "current date" in prompt.lower():
        return "Thought: The user is asking for the current date. I should use the get_current_date tool.\nAction: get_current_date\nAction Input: today"
    elif "current date" in prompt.lower() and "Observation:" in prompt:
        return "Thought: I have the current date from the tool. I can provide the final answer.\nFinal Answer: The current date is October 26, 2023." # This will be dynamic with real tool
    else:
        return "Thought: I am unable to answer this question with the available tools or based on prior context. I need to be more explicit or use a different tool. Final Answer: I cannot answer this question based on the provided information and tools."
 
# Choose which LLM function to use
LLM_FUNC = call_llm if OPENAI_API_KEY != "YOUR_MOCK_API_KEY" else mock_call_llm
 
# --- 2. Tool Definitions ---
# These are simple functions. No special LangChain Tool wrappers needed.
def search_tool(query: str) -> str:
    """A search engine. Use this to answer questions about current events or facts.
    Input is the search query (e.g., "latest news").
    """
    print(f"DEBUG: Executing search for '{query}'...")
    # In a real system, you'd integrate with a search API (e.g., SerpApi, Google Custom Search).
    # For this example, we'll return a deterministic mock result.
    if "capital of France" in query:
        return "Observation: Paris is the capital and most populous city of France."
    elif "2020 World Series winner" in query:
        return "Observation: The Los Angeles Dodgers won the 2020 World Series, defeating the Tampa Bay Rays."
    else:
        return f"Observation: No specific search result found for '{query}'. (This is a mock search)."
 
def get_current_date(input_str: str) -> str:
    """Returns the current date. Input is ignored (can be 'today' or empty)."""
    print(f"DEBUG: Executing get_current_date with input '{input_str}'...")
    return f"Observation: The current date is {datetime.now().strftime('%B %d, %Y')}."
 
# Store tools in a dictionary for easy lookup and description generation
TOOLS = {
    "search": {
        "func": search_tool,
        "description": search_tool.__doc__.strip()
    },
    "get_current_date": {
        "func": get_current_date,
        "description": get_current_date.__doc__.strip()
    }
}
 
# --- 3. Agent Prompt Template ---
# This is crucial for guiding the LLM to follow the ReAct pattern.
def generate_agent_prompt(task: str, thought_history: list[str], current_observation: str, tools_dict: dict) -> str:
    """Generates the prompt for the LLM based on the task, history, and available tools."""
    tool_descriptions = "\n".join([
        f"{name}: {data['description']}"
        for name, data in tools_dict.items()
    ])
    
    # Format the history to be clean for the LLM
    history_str = "\n".join(thought_history)
    
    prompt = f"""You are an AI assistant designed to answer questions and solve problems using tools.
You have access to the following tools:
{tool_descriptions}
 
The current task is: "{task}"
 
You should strictly follow this format:
Thought: You must always think about what to do next.
Action: The name of the tool to use, must be one of [{', '.join(tools_dict.keys())}]
Action Input: The input string to the tool
Observation: The result from the tool
... (This Thought/Action/Action Input/Observation cycle can repeat N times)
Thought: I have enough information to provide a final answer.
Final Answer: The ultimate answer to the original question.
 
If you believe you have already answered the question, state the Final Answer.
 
Current conversation history:
{history_str}
{f"Observation: {current_observation}" if current_observation else ""}
 
Thought:
"""
    return prompt
 
# --- 4. The Agent Loop ---
# This is where the magic happens – we manually control the ReAct cycle.
def run_agent(task: str, max_iterations: int = 5) -> str:
    """Runs the ReAct agent loop."""
    thought_history = []
    current_observation = ""
    
    for i in range(max_iterations):
        print(f"\n--- Iteration {i+1}/{max_iterations} ---")
        prompt = generate_agent_prompt(task, thought_history, current_observation, TOOLS)
        
        # print(f"LLM Input:\n```\n{prompt}\n```") # Uncomment to see the full prompt
        
        llm_response = LLM_FUNC(prompt)
        print(f"LLM Output:\n```\n{llm_response}\n```")
        
        thought_history.append(llm_response) # Keep full LLM response for history
        
        # Parse LLM response for Thought, Action, Action Input
        thought_match = re.search(r"Thought: (.*?)(?=\nAction:|\nFinal Answer:|$)", llm_response, re.DOTALL)
        action_match = re.search(r"Action:\s*(\w+)", llm_response)
        action_input_match = re.search(r"Action Input: (.*)", llm_response)
        final_answer_match = re.search(r"Final Answer: (.*)", llm_response, re.DOTALL)
        
        thought = thought_match.group(1).strip() if thought_match else ""
        action = action_match.group(1).strip() if action_match else ""
        action_input = action_input_match.group(1).strip() if action_input_match else ""
        
        if final_answer_match:
            print(f"Agent providing Final Answer!")
            return final_answer_match.group(1).strip()
        
        if action:
            if action in TOOLS:
                print(f"Agent executing tool '{action}' with input '{action_input}'...")
                tool_func = TOOLS[action]["func"]
                current_observation = tool_func(action_input)
                print(f"Tool Observation: {current_observation}")
            else:
                current_observation = f"Observation: Error: Unknown tool '{action}'. Available tools: {', '.join(TOOLS.keys())}"
                print(current_observation)
        else:
            current_observation = "Observation: Error: LLM did not provide a valid Action/Action Input, or reached an ambiguous state."
            print(current_observation)
            
    return f"Agent reached maximum iterations ({max_iterations}) without a Final Answer. Last thought: {thought_history[-1] if thought_history else 'None'}"
 
# --- Example Usage ---
if __name__ == "__main__":
    print("\n--- Running Agent for: 'What is the capital of France?' ---")
    result1 = run_agent("What is the capital of France?", max_iterations=3)
    print(f"\nFinal Result 1: {result1}")
 
    print("\n--- Running Agent for: 'Who won the World Series in 2020?' ---")
    result2 = run_agent("Who won the World Series in 2020?", max_iterations=3)
    print(f"\nFinal Result 2: {result2}")
 
    print("\n--- Running Agent for: 'What is the current date?' ---")
    result3 = run_agent("What is the current date?", max_iterations=3)
    print(f"\nFinal Result 3: {result3}")
 
    print("\n--- Running Agent for: 'Tell me a joke.' (Should fail gracefully) ---")
    result4 = run_agent("Tell me a joke.", max_iterations=2)
    print(f"\nFinal Result 4: {result4}")

What I Learned (and What You Should Too)

This "from scratch" approach provides several key insights:

  1. ReAct is a Prompt Pattern, Not a Framework Feature: You don't need initialize_agent to implement ReAct. The power lies in meticulously crafting your prompt to guide the LLM's thought process and action selection. This direct control over prompt engineering is invaluable.
  2. Tools are Just Functions: Any callable Python function can be a tool. The "magic" is in how you describe it to the LLM and how you parse the LLM's output to call it. This simplicity aligns perfectly with building specialized modules in an MoE system – each expert could be a "tool" or an aggregation of tools (see the sketch just after this list).
  3. Performance and Debugging Control: When you own the agent loop, you control everything. You can inspect every prompt, every LLM response, every tool call, and its observation. This is critical for debugging complex reasoning chains and optimizing latency. Frameworks often obscure these details, making performance bottlenecks and tricky errors a nightmare to diagnose. When I'm thinking about deploying expert models, latency and resource utilization are paramount.
  4. No Unnecessary Dependencies: This agent needs requests (a single, lightweight third-party package for the actual LLM calls) and the standard-library re module for parsing. No heavy frameworks adding bloat to your requirements.txt. For production, I'd typically move this to TypeScript, leveraging its type safety and performance characteristics, but the core logic remains the same.
  5. Understanding the Fundamentals is Power: By building this yourself, you understand the core mechanics. This knowledge is transferable to any agentic system, regardless of the underlying LLM or the specific tools. It's the difference between knowing how to drive a car and knowing how to build an engine.
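
To see point 2 in practice, here's a minimal sketch of bolting an extra tool onto the agent above. The calculate function and its eval-based body are purely illustrative (and not hardened for untrusted input); the point is that registration is just one more dictionary entry:

def calculate(expression: str) -> str:
    """A calculator. Use this to evaluate simple arithmetic expressions.
    Input is a Python-style expression (e.g., "2 * (3 + 4)").
    """
    print(f"DEBUG: Evaluating '{expression}'...")
    try:
        # eval() is tolerable for a local demo; use a proper expression parser for anything real.
        result = eval(expression, {"__builtins__": {}}, {})
        return f"Observation: {expression} = {result}"
    except Exception as e:
        return f"Observation: Could not evaluate '{expression}': {e}"

TOOLS["calculate"] = {
    "func": calculate,
    "description": calculate.__doc__.strip()
}

The prompt picks the new tool up automatically, because the tool list and descriptions in generate_agent_prompt are rebuilt from the TOOLS dictionary on every call.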

This kind of granular control isn't just about avoiding "bloat." It's essential when you're thinking about modular, specialized intelligence. My pursuit of replicating the human brain's MoE architecture demands this level of direct interaction and understanding. We're not just building LLM wrappers; we're trying to engineer intelligence, and that requires precise control over every cognitive cycle.