Tuesday, May 5, 2026

Tool Calling in LangChain

In the previous article Tools in LangChain With Examples we saw how to create tools in LangChain and how to invoke them using the tool’s invoke() method. Every tool is a Runnable in LangChain, which means it automatically inherits the .invoke() method from the Runnable interface. That works great when you are the one calling the tool, but the real magic happens when you let the LLM itself decide which tool to use. That’s where tool calling in LangChain comes into the picture.

LangChain’s tool calling

Tools aren’t just standalone functions that you execute manually; they become extensions of the LLM’s reasoning. You bind tools to the model so it can choose the appropriate tool call to get the result.

Integrating tools with the LLM

In LangChain you can use the bind_tools() method to integrate your custom tools with the LLM by passing it the list of tools to bind to the model.

By passing a list of tools to bind_tools(), you expose their name, description, and input schema to the model. This allows the LLM to recognize when a user query matches a tool’s purpose and respond with a structured tool call, essentially a request to execute that tool.
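
To see exactly what bind_tools() exposes, you can inspect a tool’s metadata directly. Here is a small sketch using the multiply tool that appears later in this article (the exact shape of the args dict may vary by version):

from langchain_core.tools import tool

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

# These are the pieces bind_tools() sends to the model
print(multiply.name)         # multiply
print(multiply.description)  # Multiply two integers and return the result.
print(multiply.args)         # {'a': {'title': 'A', 'type': 'integer'}, 'b': ...}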

It’s important to understand that the LLM does not read or execute your code directly. When it recognizes that a user request matches a tool’s purpose, it sends back an AIMessage containing a tool call like:

{"name": "multiply", "args": {"a": 7, "b": 8}}

This is a signal that the tool should be run, but the execution itself is your responsibility.

Developer’s Role:

After receiving the tool call, the following activities need to be done manually:

  1. Parsing the tool call from the LLM’s response.
  2. Executing the tool with the provided arguments.
  3. Returning the result back to the model, typically as a ToolMessage, so it can continue reasoning and produce a polished answer.

If you want the orchestration to happen automatically, you’ll need to use an agent. Agents handle the loop of:

  1. Detecting tool calls,
  2. Running the tools,
  3. Feeding results back into the LLM,
  4. Continuing the conversation seamlessly.
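
As a preview of that automation, here is a minimal sketch using LangGraph’s prebuilt ReAct agent (this assumes the langgraph package is installed; the rest of this article does not use it):

from dotenv import load_dotenv
from langchain_core.tools import tool
from langchain_groq import ChatGroq
from langgraph.prebuilt import create_react_agent

load_dotenv()

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

llm = ChatGroq(model="llama-3.3-70b-versatile")

# The agent detects tool calls, runs the tools, feeds results back,
# and loops until the model produces a final answer
agent = create_react_agent(llm, [multiply])
result = agent.invoke({"messages": [("user", "What is 3 times 7?")]})
print(result["messages"][-1].content)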

In this article we’ll stick to the manual bind_tools() workflow, where the developer explicitly handles tool execution after the LLM emits a tool call. This approach helps you understand what’s happening under the hood before moving on to full agent automation.

Let’s make the bind_tools() workflow clear with an example; we’ll use a multiply tool that multiplies two numbers.

from langchain_core.tools import tool
from langchain_groq import ChatGroq
from dotenv import load_dotenv

load_dotenv()  # Load environment variables from .env file

# Multiplication tool
@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

# Initialize model
llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.5)

llm_with_tools = llm.bind_tools([multiply])

# Now the LLM can decide when to call the tool
response = llm_with_tools.invoke("What is 3 times 7?")
print(response)

Full Output

content='' additional_kwargs={'tool_calls': [{'id': 'kj0h3cwtm', 'function': {'arguments': '{"a":3,"b":7}', 'name': 'multiply'}, 
'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 228, 'total_tokens': 247,
'completion_time': 0.058486455, 'completion_tokens_details': None, 'prompt_time': 0.034593443, 'prompt_tokens_details': None,
'queue_time': 0.159857826, 'total_time': 0.093079898}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_45180df409', 'service_tier': 'on_demand', 'finish_reason': 'tool_calls', 'logprobs': None, 'model_provider': 'groq'} id='lc_run--019df212-a97c-7ab3-b5be-a48bdeb26b31-0' 
tool_calls=[{'name': 'multiply', 'args': {'a': 3, 'b': 7}, 'id': 'kj0h3cwtm', 'type': 'tool_call'}] invalid_tool_calls=[] usage_metadata={'input_tokens': 228, 'output_tokens': 19, 'total_tokens': 247}

If you inspect the output you will see that a tool call, with name and args, is returned by the model.

tool_calls=[{'name': 'multiply', 'args': {'a': 3, 'b': 7}, 'id': 'kj0h3cwtm', 'type': 'tool_call'}]

But LangChain won’t automatically execute the tool for you, as the LLM doesn’t actually read your code. You, as a developer, need to parse this tool call and invoke the required tool.

Given below is the complete code.

from langchain_core.tools import tool
from langchain_groq import ChatGroq
from dotenv import load_dotenv

load_dotenv()  # Load environment variables from .env file

# Multiplication tool
@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

# Initialize model
llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.5)

llm_with_tools = llm.bind_tools([multiply])

# Now the LLM can decide when to call the tool
response = llm_with_tools.invoke("What is 3 times 7?")
print(response)

# if tool call is requested, it will be in response.tool_calls
if response.tool_calls:
    # loop through tool calls and execute them
    for tc in response.tool_calls:
        if tc["name"] == "multiply":
            result = multiply.invoke(tc["args"])
            print("Tool result:", result)

Output

Tool result: 21
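
To complete the loop (step 3 of the developer’s role above), the tool result can be fed back to the model so it produces a conversational answer. Here is a minimal sketch that extends the code above; the HumanMessage/ToolMessage round trip is the standard LangChain pattern, though the exact wording of the final answer will vary:

from langchain_core.messages import HumanMessage, ToolMessage

query = "What is 3 times 7?"
messages = [HumanMessage(content=query)]
response = llm_with_tools.invoke(messages)
messages.append(response)  # AIMessage carrying the tool call

for tc in response.tool_calls:
    result = multiply.invoke(tc["args"])
    # ToolMessage ties the result to the tool call via tool_call_id
    messages.append(ToolMessage(content=str(result), tool_call_id=tc["id"]))

final = llm_with_tools.invoke(messages)
print(final.content)  # e.g. "3 times 7 is 21."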

Tool Calling with multiple tools

Here is another example where we’ll see the workflow with multiple tool calls. Let’s say we have two different tools, for addition and multiplication, and the response may contain calls to both tools or just one of them. The workflow is as given below.

  1. LLM receives the user’s query.
  2. It decides whether the query needs external help (tool calling).
  3. If yes, an appropriate tool call (with name and args) is returned in the AIMessage.
  4. The tool executes and returns a Result (e.g., live data, calculation, API response).
  5. That result is sent back to the LLM, which integrates it with the conversation.
  6. Finally, the LLM produces a polished, conversational answer for the user.

If you want to use a simple loop as shown in the previous example, then the code would be as given below.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_groq import ChatGroq
from dotenv import load_dotenv

load_dotenv()  # Load environment variables from .env file

# Multiplication tool
@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

# Addition tool
@tool
def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b

# Initialize model
llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.5)

# Bind tools
llm_with_tools = llm.bind_tools([multiply, add])

# System prompt template
system_prompt = """
You are an AI assistant with access to external tools.
Your job is to decide when to call a tool, execute it correctly,
and then provide a polished, conversational answer to the user.

Available tools:
- multiply(a: int, b: int): returns the product of two integers
- add(a: int, b: int): returns the sum of two integers

Guidelines:
1. Think step by step about whether a tool is needed.
2. When calling a tool, output a structured tool call in JSON.
3. If no tool is needed, answer directly
4. Do not expose internal reasoning or chain-of-thought.
5. Only output either a tool call or a final user-friendly answer.
"""

# Build prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("user", "{input}")
])

def multi_tool_feedback(query: str):
    # First LLM call: decide tool usage (apply the system prompt)
    first_response = llm_with_tools.invoke(prompt.format_messages(input=query))
    print("First response:", first_response)
    results = []
    if first_response.tool_calls:
        print("Tool calls:", first_response.tool_calls)
        for tool_call in first_response.tool_calls:
            tool_name = tool_call["name"]
            args = tool_call["args"]

            # Execute each tool
            if tool_name == "multiply":
                result = multiply.invoke(args)
            elif tool_name == "add":
                result = add.invoke(args)
            else:
                result = "Unknown tool"

            results.append(result)

        print("Tool results:", results)
        # Second LLM call: feed all results back
        final_answer = llm.invoke(
            f"User asked: {query}. The tools returned: {results}. "
            "Please combine these into a natural language answer."
        )
        return final_answer
    else:
        # No tool calls, just return the model’s text
        return first_response

response = multi_tool_feedback("Please multiply 5 and 3, then add 10 to the result.")
print("Final response:", response)  
print("Final Answer:", response.content)

Output

First response: content='' additional_kwargs={'tool_calls': [{'id': '1e8aj5waj', 'function': {'arguments': '{"a":5,"b":3}', 'name': 'multiply'}, 'type': 'function'}, 
{'id': 'dbpynva42', 'function': {'arguments': '{"a":15,"b":10}', 'name': 'add'}, 'type': 'function'}]} 
response_metadata={'token_usage': {'completion_tokens': 36, 'prompt_tokens': 295, 'total_tokens': 331, 'completion_time': 0.076297873, 'completion_tokens_details': None, 'prompt_time': 0.014930285, 'prompt_tokens_details': None, 'queue_time': 0.157809024, 'total_time': 0.091228158}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_3272ea2d91', 'service_tier': 'on_demand', 'finish_reason': 'tool_calls', 'logprobs': None, 'model_provider': 'groq'} id='lc_run--019df1cb-134e-7232-b741-b9e941c64389-0'
tool_calls=[{'name': 'multiply', 'args': {'a': 5, 'b': 3}, 'id': '1e8aj5waj', 'type': 'tool_call'}, {'name': 'add', 'args': {'a': 15, 'b': 10}, 'id': 'dbpynva42', 'type': 'tool_call'}] invalid_tool_calls=[] usage_metadata={'input_tokens': 295, 'output_tokens': 36, 'total_tokens': 331}
Tool calls: [{'name': 'multiply', 'args': {'a': 5, 'b': 3}, 'id': '1e8aj5waj', 'type': 'tool_call'}, {'name': 'add', 'args': {'a': 15, 'b': 10}, 'id': 'dbpynva42', 'type': 'tool_call'}]
Tool results: [15, 25]
Final response: content='To solve the problem, we first multiply 5 and 3, which equals 15. Then, we add 10 to the result, giving us a final answer of 25.' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 39, 'prompt_tokens': 73, 'total_tokens': 112, 'completion_time': 0.124071334, 'completion_tokens_details': None, 'prompt_time': 0.006693922, 'prompt_tokens_details': None, 'queue_time': 0.063180878, 'total_time': 0.130765256}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_dae98b5ecb', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'} id='lc_run--019df1cb-1507-7a33-aa60-6a5bd288b607-0' tool_calls=[] invalid_tool_calls=[] usage_metadata={'input_tokens': 73, 'output_tokens': 39, 'total_tokens': 112}
Final Answer: To solve the problem, we first multiply 5 and 3, which equals 15. Then, we add 10 to the result, giving us a final answer of 25.

Points to Note

  • The system message is made more descriptive to give clarity to the LLM about the tools and their arguments.
  • While executing the tools, their results are appended to a list.
  • That list and the original query are sent to the LLM again to get a final, polished answer.
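
A small refinement worth noting (a sketch, not part of the example above): instead of growing an if/elif chain per tool, you can dispatch through a name-to-tool mapping. This is a drop-in replacement for the loop inside multi_tool_feedback and scales better as you add tools:

# Map tool names to tool objects once
tool_map = {t.name: t for t in [multiply, add]}

for tool_call in first_response.tool_calls:
    selected_tool = tool_map.get(tool_call["name"])
    if selected_tool is not None:
        results.append(selected_tool.invoke(tool_call["args"]))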

Multi-tool calling with a chain

The same example can be done in a more structured way by creating a chain and using RunnableLambda to execute the tools.

from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_core.runnables import RunnableLambda
from dotenv import load_dotenv

load_dotenv()  # Load environment variables from .env file

# Define tools
@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

@tool
def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b

# Initialize model
llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.5)

# Bind tools
llm_with_tools = llm.bind_tools([multiply, add])

# System prompt template
system_prompt = """
You are an AI assistant with access to external tools.
Your job is to decide when to call a tool, execute it correctly,
and then provide a polished, conversational answer to the user.

Available tools:
- multiply(a: int, b: int): returns the product of two integers
- add(a: int, b: int): returns the sum of two integers

Guidelines:
1. Think step by step about whether a tool is needed.
2. When calling a tool, output a structured tool call in JSON.
3. If no tool is needed, answer directly
4. Do not expose internal reasoning or chain-of-thought.
5. Only output either a tool call or a final user-friendly answer.
"""

# Build prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("user", "{input}")
])

# Define tool executor
def execute_tools(output):
    print("LLM output after first call:", output)
    results = []
    if output.tool_calls:
        for tc in output.tool_calls:
            if tc["name"] == "add":
                results.append(add.invoke(tc["args"]))
            elif tc["name"] == "multiply":
                results.append(multiply.invoke(tc["args"]))
            
    return {"tool_results": results, "query": output.content}

# Feed results back into LLM
def final_answer(data):
    return llm.invoke(
        f"User asked: {data['query']}. Tools returned: {data['tool_results']}. "
        "Please combine these into a natural language answer."
    )

# Chain
chain = (prompt 
        | llm_with_tools 
        | RunnableLambda(execute_tools) 
        | RunnableLambda(final_answer)
)

response = chain.invoke({"input": "Please add 5 and 3, then multiply the result by 2."})
print("Final response:", response)  
print("Final Answer:", response.content)

Output

LLM output after first call: content='' additional_kwargs={'tool_calls': [{'id': 'vz71y3drr', 'function': {'arguments': '{"a":5,"b":3}', 'name': 'add'}, 'type': 'function'}, {'id': '14hvbadej', 'function': {'arguments': '{"a":8,"b":2}', 'name': 'multiply'}, 'type': 'function'}]} 
response_metadata={'token_usage': {'completion_tokens': 36, 'prompt_tokens': 441, 'total_tokens': 477, 'completion_time': 0.092313015, 'completion_tokens_details': None, 'prompt_time': 0.022607, 'prompt_tokens_details': None, 'queue_time': 0.04985095, 'total_time': 0.114920015}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_dae98b5ecb', 'service_tier': 'on_demand', 'finish_reason': 'tool_calls', 'logprobs': None, 'model_provider': 'groq'} id='lc_run--019df260-48af-7080-9360-37fc5caad39c-0'
tool_calls=[{'name': 'add', 'args': {'a': 5, 'b': 3}, 'id': 'vz71y3drr', 'type': 'tool_call'}, {'name': 'multiply', 'args': {'a': 8, 'b': 2}, 'id': '14hvbadej', 'type': 'tool_call'}] invalid_tool_calls=[] usage_metadata={'input_tokens': 441, 'output_tokens': 36, 'total_tokens': 477}
Final response: content='The tools returned two values: 8 and 16.' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 57, 'total_tokens': 70, 'completion_time': 0.034086635, 'completion_tokens_details': None, 'prompt_time': 0.003220668, 'prompt_tokens_details': None, 'queue_time': 0.161129671, 'total_time': 0.037307303}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_3272ea2d91', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'} id='lc_run--019df260-4b3a-73d2-b4ce-79a0fae5297a-0' tool_calls=[] invalid_tool_calls=[] usage_metadata={'input_tokens': 57, 'output_tokens': 13, 'total_tokens': 70}
Final Answer: The tools returned two values: 8 and 16.
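
Notice that the final answer is less polished than in the previous example. That’s because output.content is empty when the model emits tool calls, so final_answer receives no query text ("User asked: ."). A minimal fix (a sketch; execute_tools_with_query is a hypothetical variant of execute_tools) is to carry the original input through the chain in parallel with the LLM output:

# Run the prompt/LLM branch and pass the raw input through in parallel
def execute_tools_with_query(data):
    output = data["llm_output"]
    results = []
    for tc in output.tool_calls:
        if tc["name"] == "add":
            results.append(add.invoke(tc["args"]))
        elif tc["name"] == "multiply":
            results.append(multiply.invoke(tc["args"]))
    return {"tool_results": results, "query": data["query"]}

chain = (
    {"query": lambda x: x["input"], "llm_output": prompt | llm_with_tools}
    | RunnableLambda(execute_tools_with_query)
    | RunnableLambda(final_answer)
)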

InjectedToolArg in LangChain tools

When building tools in LangChain, not every argument should come from the LLM. Some values, like session IDs, user tokens, timestamps, or database handles, are sensitive or system-specific and must be injected at runtime. That’s exactly what InjectedToolArg in LangChain is designed for.

InjectedToolArg is a LangChain Core annotation that marks tool arguments as injected at runtime rather than generated by the LLM.

Arguments marked with InjectedToolArg are excluded from the schema sent to the model. The LLM never sees or generates them. That prevents the LLM from guessing or hallucinating values that should come from the environment (e.g., conversation IDs, request metadata, API keys).

Arguments marked with InjectedToolArg are provided by the developer, agent executor, or system context during tool execution.

InjectedToolArg LangChain example

Here is a simple example where the customer ID argument is marked as an InjectedToolArg, which means the LLM won’t see it and won’t try to guess it on its own. It has to be injected at runtime.

from typing import Annotated
from langchain_core.tools import tool, InjectedToolArg
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv

load_dotenv()

# Define a tool with one normal arg (LLM-supplied) and one 
# injected arg (runtime-supplied)
@tool
def escalate_to_human(
    reason: str,
    customer_id: Annotated[int, InjectedToolArg]
) -> str:
    """Escalate the conversation to a human operator."""
    return f"Escalating: {reason} (Customer ID: {customer_id})"

# Create the LLM
llm = ChatGoogleGenerativeAI(model="gemini-3.1-flash-lite-preview", temperature=0.2)

# Bind the tool to the LLM
llm_with_tools = llm.bind_tools([escalate_to_human])

query = "The customer is very upset, escalate this."

# LLM processes the query and emits a tool call
response = llm_with_tools.invoke(query)

print("LLM raw response:", response)

if response.tool_calls:
    for tc in response.tool_calls:
        # Inject runtime context (customer_id from your system/session)
        # hardcoding customer_id=12345 for demo purposes
        runtime_context = {"customer_id": 12345}
        merged_args = {**tc["args"], **runtime_context}
        result = escalate_to_human.invoke(merged_args)
        print("Tool execution result:", result)
        # Feed the tool result back into the LLM to get final response
        response = llm.invoke(
            f"""
            Greet the customer using Customer ID (get from context) and inquire about the issue. 
            Context: {result}
            """
        )

print("Final LLM Response:", response)

print("Final LLM answer:", response.content[0].get("text"))

Output

LLM raw response: content=[] additional_kwargs={'function_call': {'name': 'escalate_to_human', 'arguments': '{"reason": "The customer is very upset and has requested an escalation."}'}, '__gemini_function_call_thought_signatures__': {'312283e6-b7ba-4149-97dd-1c93e5e97d8a': 'EjQKMgEMOdbHi4YjgRE760fr4+mtE0xB7dY5NZ3b3Dw6aZy70QE9eyXU9eDGgGKvMN8yu5z2'}} response_metadata={'finish_reason': 'STOP', 'model_name': 'gemini-3.1-flash-lite-preview', 'safety_ratings': [], 'model_provider': 'google_genai'} id='lc_run--019df2d2-81f9-75f3-9a97-f44e2ab3e660-0' 
tool_calls=[{'name': 'escalate_to_human', 'args': {'reason': 'The customer is very upset and has requested an escalation.'}, 'id': '312283e6-b7ba-4149-97dd-1c93e5e97d8a', 'type': 'tool_call'}] invalid_tool_calls=[] usage_metadata={'input_tokens': 59, 'output_tokens': 29, 'total_tokens': 88, 'input_token_details': {'cache_read': 0}}
Tool execution result: Escalating: The customer is very upset and has requested an escalation. (Customer ID: 12345)
Final LLM Response: content=[{'type': 'text', 'text': 'Hello, Customer 12345. I understand that you are very upset and have requested an escalation. I am here to assist you—could you please tell me more about the issue you are experiencing so I can address your concerns immediately?', 'extras': {'signature': 'EjQKMgEMOdbHAXp2AIc7nJffkQs5eevUxSKQmBMqsO/cAn089Zu+2eOdbvLIF6uunch0lbwA'}}] additional_kwargs={} response_metadata={'finish_reason': 'STOP', 'model_name': 'gemini-3.1-flash-lite-preview', 'safety_ratings': [], 'model_provider': 'google_genai'} id='lc_run--019df2d2-870d-71d3-9b4b-56ad10346d96-0' tool_calls=[] invalid_tool_calls=[] usage_metadata={'input_tokens': 52, 'output_tokens': 50, 'total_tokens': 102, 'input_token_details': {'cache_read': 0}}
Final LLM answer: Hello, Customer 12345. I understand that you are very upset and have requested an escalation. I am here to assist you—could you please tell me more about the issue you are experiencing so I can address your concerns immediately?
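
One point worth emphasizing (a small sketch, not part of the run above): while customer_id is hidden from the model, the tool’s own input validation still requires it at execution time, which is why the runtime context is merged into the arguments before invoking:

# customer_id is required when executing the tool, even though it is
# excluded from the schema the model sees
print(escalate_to_human.invoke(
    {"reason": "Customer requested escalation", "customer_id": 42}
))
# Omitting it would raise a validation error:
# escalate_to_human.invoke({"reason": "Customer requested escalation"})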

That’s all for this topic Tool Calling in LangChain. If you have any doubt or any suggestions to make please drop a comment. Thanks!


Related Topics

  1. Citation Aware RAG Application in LangChain
  2. LangChain Conversational RAG with Multi-user Sessions
  3. RunnableBranch in LangChain With Examples
  4. Structured Output In LangChain
  5. Messages in LangChain

You may also like-

  1. Chatbot With Chat History - LangChain MessagesPlaceHolder
  2. Document Loaders in LangChain With Examples
  3. Array in Java With Examples
  4. Java CyclicBarrier With Examples
  5. Python assert Statement
  6. Multiple Inheritance in Python
  7. Spring Boot REST API Documentation - OpenAPI, Swagger
  8. Bean Scopes in Spring With Examples
