Wednesday, April 15, 2026

Pre-defined Functional Interfaces in Java

In our earlier post on Functional Interfaces in Java we saw how you can create custom functional interfaces and annotate them with the @FunctionalInterface annotation. However, you don’t need to define your own functional interface for every scenario. Java 8 introduced the java.util.function package, which defines many general-purpose pre-defined functional interfaces.

These built-in interfaces are widely used across the JDK, including the Collections framework, Java Stream API and in user defined code as well.

In this guide, we’ll go through these built-in functional interfaces so that you know which one to use in which context when writing lambda expressions in Java.


Pre-defined functional interfaces categorization

Functional interfaces defined in java.util.function package can be categorized into five types-

  1. Consumer- Consumes the passed argument and no value is returned.
  2. Supplier- Takes no argument and supplies a result.
  3. Function- Takes an argument and returns a result.
  4. Predicate- Evaluates a condition on the passed argument and returns a boolean result (true or false).
  5. Operators- A specialized form of Function where both input and output are of the same type.

Consumer functional interface

Consumer<T> represents a function that accepts a single input argument and returns no result. Consumer functional interface definition is as given below consisting of an abstract method accept() and a default method andThen().

@FunctionalInterface
public interface Consumer<T> {
  void accept(T t);
  default Consumer<T> andThen(Consumer<? super T> after) {
    Objects.requireNonNull(after);
    return (T t) -> { accept(t); after.accept(t); };
  }
}
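
As a small illustration of the andThen() default method shown above, the following sketch chains two consumers so that both act on the same input, in order. The class name ConsumerAndThenExample and the helper method runChain() are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ConsumerAndThenExample {
  // Hypothetical helper: runs two chained consumers on the same input
  // and returns what each of them recorded
  static List<String> runChain(String input) {
    List<String> log = new ArrayList<>();
    Consumer<String> upper = s -> log.add(s.toUpperCase());
    Consumer<String> length = s -> log.add("length=" + s.length());
    // andThen() returns a new Consumer: upper runs first, then length
    upper.andThen(length).accept(input);
    return log;
  }

  public static void main(String[] args) {
    System.out.println(runChain("java")); // [JAVA, length=4]
  }
}
```

Note that both consumers receive the original argument; a Consumer returns nothing, so there is no intermediate result to pass along.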

The following pre-defined interfaces are also categorized as Consumer because they share the same behavior: they consume the passed value(s) and return no result. Choose one based on the number of arguments or the data type.

  • BiConsumer<T,U>- Represents an operation that accepts two input arguments and returns no result.
  • DoubleConsumer- Represents an operation that accepts a single double-valued argument and returns no result.
  • IntConsumer- Represents an operation that accepts a single int-valued argument and returns no result.
  • LongConsumer- Represents an operation that accepts a single long-valued argument and returns no result.
  • ObjDoubleConsumer<T>- Represents an operation that accepts an object-valued and a double-valued argument, and returns no result.
  • ObjIntConsumer<T>- Represents an operation that accepts an object-valued and an int-valued argument, and returns no result.
  • ObjLongConsumer<T>- Represents an operation that accepts an object-valued and a long-valued argument, and returns no result.
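
To give a feel for the variants listed above, here is a minimal sketch that uses BiConsumer (two arguments) and IntConsumer (a primitive int argument). The class and method names are illustrative only; Map.forEach() is a JDK method that accepts a BiConsumer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.IntConsumer;

class ConsumerVariantsExample {
  // Hypothetical helper: formats map entries using a BiConsumer
  static String describe(Map<String, Integer> stock) {
    StringBuilder sb = new StringBuilder();
    // BiConsumer takes two arguments (key and value here) and returns nothing
    BiConsumer<String, Integer> appendEntry =
        (name, qty) -> sb.append(name).append('=').append(qty).append(';');
    stock.forEach(appendEntry);
    return sb.toString();
  }

  // Hypothetical helper: adds up values using an IntConsumer
  static int sum(int... values) {
    int[] total = {0};
    // IntConsumer works on primitive int, so no boxing to Integer is needed
    IntConsumer add = v -> total[0] += v;
    for (int v : values) {
      add.accept(v);
    }
    return total[0];
  }

  public static void main(String[] args) {
    Map<String, Integer> stock = new LinkedHashMap<>();
    stock.put("pen", 10);
    stock.put("book", 3);
    System.out.println(describe(stock)); // pen=10;book=3;
    System.out.println(sum(1, 2, 3));    // 6
  }
}
```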

Consumer functional interface Java example

In the example, the elements of a List are displayed using a lambda implementation of the Consumer functional interface.

import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ConsumerExample {
  public static void main(String[] args) {
    Consumer<String> consumer = s -> System.out.println(s);
    List<String> alphaList = Arrays.asList("A", "B", "C", "D");
    for(String str : alphaList) {
      // functional interface accept() method called
      consumer.accept(str);
    }
  }
}

Output

A
B
C
D

Supplier functional interface

Supplier<T> represents a function that takes no argument and supplies a result. The Supplier functional interface definition, given below, consists of a single abstract method get()-

@FunctionalInterface
public interface Supplier<T> {
  T get();
}

The following pre-defined interfaces are also categorized as Supplier because they share the same behavior: they take no argument and supply a result of a specific primitive type.

  • BooleanSupplier- Represents a supplier of boolean-valued results.
  • DoubleSupplier- Represents a supplier of double-valued results.
  • IntSupplier- Represents a supplier of int-valued results.
  • LongSupplier- Represents a supplier of long-valued results.
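
A Supplier is typically useful when a value should be produced only on demand. The sketch below, with made-up class and method names, passes a Supplier to Optional.orElseGet() (which calls it only when the Optional is empty) and uses an IntSupplier whose getAsInt() method returns a primitive int.

```java
import java.util.Optional;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

class SupplierVariantsExample {
  // Hypothetical helper: returns the name, or a supplied default when name is null
  static String nameOrDefault(String name) {
    Supplier<String> fallback = () -> "guest";
    // orElseGet() invokes the Supplier only if the Optional is empty
    return Optional.ofNullable(name).orElseGet(fallback);
  }

  // Hypothetical helper: an IntSupplier acting as a simple counter
  static int[] firstThree() {
    int[] counter = {0};
    IntSupplier next = () -> ++counter[0];
    return new int[] { next.getAsInt(), next.getAsInt(), next.getAsInt() };
  }

  public static void main(String[] args) {
    System.out.println(nameOrDefault(null));   // guest
    System.out.println(nameOrDefault("Ram"));  // Ram
    System.out.println(java.util.Arrays.toString(firstThree())); // [1, 2, 3]
  }
}
```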

Supplier functional interface Java example

In the example Supplier functional interface is implemented as a lambda expression to supply current date and time.

import java.time.LocalDateTime;
import java.util.function.Supplier;

public class SupplierExample {
  public static void main(String[] args) {
    Supplier<LocalDateTime> currDateTime = () -> LocalDateTime.now();
    System.out.println(currDateTime.get());
  }
}

Function functional interface

Function<T,R> represents a function that accepts one argument and produces a result. Function functional interface definition is as given below consisting of an abstract method apply(), two default methods compose(), andThen() and a static method identity().

@FunctionalInterface
public interface Function<T, R> {

  R apply(T t);

  default <V> Function<V, R> compose(Function<? super V, ? extends T> before) {
    Objects.requireNonNull(before);
    return (V v) -> apply(before.apply(v));
  }

  default <V> Function<T, V> andThen(Function<? super R, ? extends V> after) {
    Objects.requireNonNull(after);
    return (T t) -> after.apply(apply(t));
  }
  static <T> Function<T, T> identity() {
    return t -> t;
  }
}
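
The difference between compose() and andThen() in the definition above is only the order in which the two functions run. Here is a small sketch of that, using two illustrative functions (ADD_TWO and TIMES_THREE are names chosen for this example).

```java
import java.util.function.Function;

class FunctionCompositionExample {
  static final Function<Integer, Integer> ADD_TWO = n -> n + 2;
  static final Function<Integer, Integer> TIMES_THREE = n -> n * 3;

  public static void main(String[] args) {
    // andThen: ADD_TWO runs first, then TIMES_THREE -> (5 + 2) * 3
    System.out.println(ADD_TWO.andThen(TIMES_THREE).apply(5)); // 21
    // compose: TIMES_THREE runs first, then ADD_TWO -> (5 * 3) + 2
    System.out.println(ADD_TWO.compose(TIMES_THREE).apply(5)); // 17
    // identity: returns its argument unchanged
    System.out.println(Function.<Integer>identity().apply(5)); // 5
  }
}
```

In other words, f.compose(g) gives the same result as g.andThen(f).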

The following pre-defined interfaces are also categorized as Function because they share the same behavior: they accept argument(s) and produce a result.

  • BiFunction<T,U,R>- Represents a function that accepts two arguments and produces a result.
  • DoubleFunction<R>- Represents a function that accepts a double-valued argument and produces a result.
  • DoubleToIntFunction- Represents a function that accepts a double-valued argument and produces an int-valued result.
  • DoubleToLongFunction- Represents a function that accepts a double-valued argument and produces a long-valued result.
  • IntFunction<R>- Represents a function that accepts an int-valued argument and produces a result.
  • IntToDoubleFunction- Represents a function that accepts an int-valued argument and produces a double-valued result.
  • IntToLongFunction- Represents a function that accepts an int-valued argument and produces a long-valued result.
  • LongFunction<R>- Represents a function that accepts a long-valued argument and produces a result.
  • LongToDoubleFunction- Represents a function that accepts a long-valued argument and produces a double-valued result.
  • LongToIntFunction- Represents a function that accepts a long-valued argument and produces an int-valued result.
  • ToDoubleBiFunction<T,U>- Represents a function that accepts two arguments and produces a double-valued result.
  • ToDoubleFunction<T>- Represents a function that produces a double-valued result.
  • ToIntBiFunction<T,U>- Represents a function that accepts two arguments and produces an int-valued result.
  • ToIntFunction<T>- Represents a function that produces an int-valued result.
  • ToLongBiFunction<T,U>- Represents a function that accepts two arguments and produces a long-valued result.
  • ToLongFunction<T>- Represents a function that produces a long-valued result.
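
Here is a minimal sketch of three of the variants listed above: BiFunction (two arguments), ToIntFunction (primitive int result, via applyAsInt()) and IntFunction (primitive int argument). The class and helper method names are made up for the illustration.

```java
import java.util.function.BiFunction;
import java.util.function.IntFunction;
import java.util.function.ToIntFunction;

class FunctionVariantsExample {
  // BiFunction: two arguments in, one result out
  static String repeat(String s, int times) {
    BiFunction<String, Integer, String> repeater = (str, n) -> {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < n; i++) {
        sb.append(str);
      }
      return sb.toString();
    };
    return repeater.apply(s, times);
  }

  // ToIntFunction: object in, primitive int out (method is applyAsInt)
  static int length(String s) {
    ToIntFunction<String> len = String::length;
    return len.applyAsInt(s);
  }

  // IntFunction: primitive int in, object out
  static String label(int n) {
    IntFunction<String> toLabel = i -> "item-" + i;
    return toLabel.apply(n);
  }

  public static void main(String[] args) {
    System.out.println(repeat("ab", 3));     // ababab
    System.out.println(length("Interface")); // 9
    System.out.println(label(7));            // item-7
  }
}
```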

Function functional interface Java example

In the example a Function interface is implemented to return the length of the passed String.

import java.util.function.Function;

public class FunctionExample {
  public static void main(String[] args) {
    Function<String, Integer> function = (s) -> s.length();
    System.out.println("Length of String- " + function.apply("Interface"));
  }
}

Output

Length of String- 9

Predicate functional interface

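Predicate<T> represents a function that accepts one argument and produces a boolean result. The abstract method in the Predicate functional interface is boolean test(T t). The interface also provides the default methods and(), or() and negate() for combining predicates.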

The following pre-defined interfaces are also categorized as Predicate because they share the same behavior: they accept argument(s) and produce a boolean result.

  • BiPredicate<T,U>- Represents a predicate (boolean-valued function) of two arguments.
  • DoublePredicate- Represents a predicate (boolean-valued function) of one double-valued argument.
  • IntPredicate- Represents a predicate (boolean-valued function) of one int-valued argument.
  • LongPredicate- Represents a predicate (boolean-valued function) of one long-valued argument.
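
Predicates can be combined using the default methods and(), or() and negate() of the Predicate interface. The following sketch, with illustrative names, combines two simple predicates and also shows a BiPredicate that tests two arguments.

```java
import java.util.function.BiPredicate;
import java.util.function.Predicate;

class PredicateCompositionExample {
  static final Predicate<Integer> IS_EVEN = n -> n % 2 == 0;
  static final Predicate<Integer> IS_POSITIVE = n -> n > 0;

  // and(): true only if both predicates are true
  static boolean evenAndPositive(int n) {
    return IS_EVEN.and(IS_POSITIVE).test(n);
  }

  // negate(): logical NOT of the predicate
  static boolean odd(int n) {
    return IS_EVEN.negate().test(n);
  }

  // BiPredicate: boolean-valued function of two arguments
  static boolean longerThan(String s, int len) {
    BiPredicate<String, Integer> longer = (str, l) -> str.length() > l;
    return longer.test(s, len);
  }

  public static void main(String[] args) {
    System.out.println(evenAndPositive(6));    // true
    System.out.println(evenAndPositive(-4));   // false
    System.out.println(odd(11));               // true
    System.out.println(longerThan("Java", 3)); // true
  }
}
```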

Predicate functional interface Java Example

In the example a number is passed and true is returned if the number is even, otherwise false is returned.

import java.util.function.Predicate;

public class PredicateExample {
  public static void main(String[] args) {
    Predicate<Integer> predicate = (n) -> n%2 == 0;
    boolean val = predicate.test(6);
    System.out.println("Is Even- " + val);    
    System.out.println("Is Even- " + predicate.test(11));
  }
}

Output

Is Even- true
Is Even- false

Operator functional interfaces

Operator functional interfaces are specialized Function interfaces that always return a value of the same type as the passed argument(s). Each operator interface extends its Function counterpart: UnaryOperator<T> extends Function<T,T> and BinaryOperator<T> extends BiFunction<T,T,T>.

The following pre-defined Operator functional interfaces can be used in place of Function interfaces when the returned value has the same type as the passed argument(s).

  • BinaryOperator<T>- Represents an operation upon two operands of the same type, producing a result of the same type as the operands.
  • DoubleBinaryOperator- Represents an operation upon two double-valued operands and producing a double-valued result.
  • DoubleUnaryOperator- Represents an operation on a single double-valued operand that produces a double-valued result.
  • IntBinaryOperator- Represents an operation upon two int-valued operands and producing an int-valued result.
  • IntUnaryOperator- Represents an operation on a single int-valued operand that produces an int-valued result.
  • LongBinaryOperator- Represents an operation upon two long-valued operands and producing a long-valued result.
  • LongUnaryOperator- Represents an operation on a single long-valued operand that produces a long-valued result.
  • UnaryOperator<T>- Represents an operation on a single operand that produces a result of the same type as its operand.
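
As a quick sketch of the interfaces listed above, the following code uses BinaryOperator, IntBinaryOperator and UnaryOperator. The class and helper method names are illustrative; List.replaceAll() is a JDK method that accepts a UnaryOperator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.IntBinaryOperator;
import java.util.function.UnaryOperator;

class OperatorVariantsExample {
  // BinaryOperator: two operands and the result all have the same type
  static String longer(String a, String b) {
    BinaryOperator<String> pick = (x, y) -> x.length() >= y.length() ? x : y;
    return pick.apply(a, b);
  }

  // IntBinaryOperator: works on primitive ints (method is applyAsInt)
  static int max(int a, int b) {
    IntBinaryOperator op = Math::max;
    return op.applyAsInt(a, b);
  }

  // UnaryOperator passed to List.replaceAll() to transform each element
  static List<String> upperAll(List<String> input) {
    List<String> copy = new ArrayList<>(input);
    UnaryOperator<String> toUpper = String::toUpperCase;
    copy.replaceAll(toUpper);
    return copy;
  }

  public static void main(String[] args) {
    System.out.println(longer("lambda", "java"));          // lambda
    System.out.println(max(3, 9));                         // 9
    System.out.println(upperAll(Arrays.asList("a", "b"))); // [A, B]
  }
}
```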

UnaryOperator functional interface Java example

In the example UnaryOperator is implemented to return the square of the passed integer.

import java.util.function.UnaryOperator;

public class UnaryOperatorExample {
  public static void main(String[] args) {
    UnaryOperator<Integer> unaryOperator = (n) -> n*n;
    System.out.println("4 squared is- " + unaryOperator.apply(4));
    System.out.println("7 squared is- " + unaryOperator.apply(7));
  }
}

Output

4 squared is- 16
7 squared is- 49

That's all for this topic Pre-defined Functional Interfaces in Java. If you have any doubt or any suggestions to make please drop a comment. Thanks!


Embeddings in LangChain With Examples

We have already gone through two of the building blocks of creating a RAG pipeline, document loaders and text splitters in LangChain. In this article, we’ll explore how LangChain embeddings transform raw text into meaningful vectors that truly capture its semantic essence.

Embeddings in LangChain

In LangChain, embeddings are numerical representations of text that capture the inherent semantic meaning. This enables machines to perform semantic search, where comparisons are driven by meaning and concepts rather than mere keyword matches.

For creating such embeddings, embedding models (like OpenAIEmbeddings, GoogleGenerativeAIEmbeddings, OllamaEmbeddings) are used which transform raw text, such as a sentence, paragraph, or tweet, into a fixed-length vector of numbers that captures its semantic meaning.

What is semantic meaning

Now, the first question is what exactly is this "semantic meaning"? Consider the following four sentences.

  • I am running to the market.
  • I am heading to the market in a hurry.
  • I am on my way to the market.
  • I am rushing off to the market.

If you notice, all four sentences convey essentially the same meaning: movement toward the market, in most cases with a sense of urgency. So, in terms of embeddings, each version would produce embeddings that sit close together in semantic space, since they all express the same core intent: you’re moving toward the market.

This closeness is exactly what makes semantic search powerful: queries with slightly different wording but similar meaning will retrieve the same or related results.

How does an embedding model work

Let’s break down how an embedding model transforms raw text into vectors that capture its meaning, using the simple raw text "I am running to the market" as an example.

  1. Text input
    You start with the raw text: "I am running to the market".

  2. Tokenization
    The text is split into smaller units (tokens). Depending on the embedding model, these could be word by word (I, am, running...) or subwords ("run", "ning").

    For example, produced tokens may look like this- ["I", "am", "run", "ning", "to", "the", "market"]

  3. Mapping Tokens To IDs
    Each token is mapped to a unique integer ID using the model’s pre-defined vocabulary.

    For example, I - 101, am - 202, run - 305, ning - 402, etc.

    This id acts as an index in an embedding matrix.

  4. Embedding Lookup
    Each token ID is mapped to a dense vector from the model’s embedding matrix. These vectors are usually high-dimensional (e.g., 768 or 1536 dimensions).

    So, each token gets its own full-dimension vector. For our example tokens, we'll have vectors v_I, v_am, v_run and so on.

    These vectors already exist in the trained model; the magic is in how they are learned. During training-

    • Words that appear in similar contexts get vectors that are close together.
    • Relationships between words are encoded as vector arithmetic.

    Here is a simple program to show the embedding using GoogleGenerativeAIEmbeddings

    from langchain_google_genai import GoogleGenerativeAIEmbeddings
    
    from dotenv import load_dotenv
    
    load_dotenv()
    
    embeddings = GoogleGenerativeAIEmbeddings(model="gemini-embedding-2-preview")
    query = "I am running to the market"
    vector = embeddings.embed_query(query)
    
    # vector dimensions
    print(len(vector)) 
    # first 10 values
    print(vector[:10])   
    

    Output

    3072
    [0.013388703, -0.0026265276, -0.0013064864, 0.013196219, -0.0071006925, 0.0008229259, -0.009015757, 0.00064084254, 0.005457073, -0.0643481]
    

    Note that models don't return separate vectors for each word. The model processes the entire sentence and produces one unified vector that represents the meaning of the whole sentence. That is the Pooling / Final Representation step in the embedding model that combines token-level embeddings into a single sentence-level embedding.

    Embedding models give you a ready-to-use representation of the entire query because embedding API is designed for semantic search and comparison at the sentence or document level.

  5. How does semantic meaning emerge
    Words that appear in similar contexts get vectors that are close together. Take the often used example of "king", "queen", "man", "woman".

    \[v_{\text{king}} - v_{\text{man}} + v_{\text{woman}} \approx v_{\text{queen}}\]

    The model has already learnt the concept of royalty which is already encoded into the vector of king. Man is, well just a common man!

    When we do \(v_{\text{king}} - v_{\text{man}}\), difference is the concept of royalty.

    When vector of woman is added to it, the result lands near the vector for "queen". Concept of royalty is already encoded into the vector of queen.

    Similar analogies hold for geography (Paris – France + Italy \(\approx\) Rome) or verb tense (walk – walking + running \(\approx\) run). It shows embeddings capture a wide range of semantic relationships.

    Here’s a simple geometric visualization of how embeddings capture meaning with king, queen, man, woman. Imagine a 2D plane where one axis represents gender and the other represents royalty.

    [Figure: how embeddings capture meaning]

  6. Semantic proximity
    Consider two sentences that share nearly identical structure and meaning, for example

    • I am running to the market
    • I am walking to the market

    Both sentences have the same components:

    • Subject: “I”
    • Verb: movement toward a destination
    • Object: “the market”

    Since both verbs describe locomotion, their embeddings are near each other in the model’s learned space. That is the concept of Semantic proximity.

    The cosine similarity between these two vectors would be high, that is, the vectors would lie close together (provided they are embedded using the same model).

    [Figure: cosine similarity]

Metrics for comparing embeddings

Several metrics are commonly used to compare embeddings:

  1. Cosine similarity- Measures the angle between two vectors.
  2. Euclidean distance- Measures the straight-line distance between points.
  3. Dot product- Measures how much one vector projects onto another.

We can check this programmatically in LangChain using numpy to calculate cosine similarity.

from langchain_ollama import OllamaEmbeddings
import numpy as np

embeddings = OllamaEmbeddings(model="nomic-embed-text")

v1 = np.array(embeddings.embed_query("I am running to the market"))
v2 = np.array(embeddings.embed_query("I am walking to the market"))

# cosine similarity using numpy
cos_sim = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

print("Similarity is", cos_sim)

Output

Similarity is 0.82145

LangChain Embedding Interface

LangChain provides a standard interface for text embedding models (like OpenAIEmbeddings, GoogleGenerativeAIEmbeddings, OllamaEmbeddings) through the Embeddings interface.

Two main methods are:

  1. embed_documents(texts: List[str]): Embeds a list of documents. Returns a List[List[float]]
  2. embed_query(text: str): Embeds a single query. Returns a List[float]

What is the next step?

You can now store embeddings, which are high-dimensional numerical representations of data, in a vector database (like Pinecone, FAISS, Weaviate, ChromaDB) for semantic search or similarity matching.

That's all for this topic Embeddings in LangChain With Examples. If you have any doubt or any suggestions to make please drop a comment. Thanks!


Monday, April 13, 2026

Armstrong Number or Not Java Program

Checking whether a number is an Armstrong number is a classic fresher‑level Java interview question that tests both logical thinking and coding skills. An Armstrong number is a number that is equal to the sum of its digits, each raised to the power of the total number of digits.

For Example-

  • 371 is an Armstrong number because it has 3 digits, and
    371 = 3³ + 7³ + 1³ = 27 + 343 + 1 = 371
  • 9474 is also an Armstrong number since it has 4 digits, and
    9474 = 9⁴ + 4⁴ + 7⁴ + 4⁴ = 6561 + 256 + 2401 + 256 = 9474
  • By definition, 0 and 1 are considered Armstrong numbers too.

Check given number Armstrong number or not

So let's write a Java program to check whether a given number is an Armstrong number or not. We'll break down how the logic works step by step later.

import java.util.Scanner;

public class ArmstrongNumber {
  public static void main(String[] args) {
    System.out.println("Please enter a number : ");
    Scanner scanIn = new Scanner(System.in);
    int scanInput = scanIn.nextInt();
    boolean isArmstrong = checkForArmstrongNo(scanInput);
    if(isArmstrong){
     System.out.println(scanInput + "  is an Armstrong number");
    }else{
     System.out.println(scanInput + " is not an Armstrong number"); 
    }
    scanIn.close();
  }
 
  private static boolean checkForArmstrongNo(int number){
    // convert number to String
    String temp = number + "";
    int numLength = temp.length();
    int numCopy = number;
    int sum = 0;
    while(numCopy != 0 ){
      int remainder = numCopy % 10;
      // using Math.pow to get digit raised to the power
      // total number of digits
      sum = sum + (int)Math.pow(remainder, numLength);
      numCopy = numCopy/10;
    }
    System.out.println("sum is " + sum );
    return sum == number;
  }
}

Some outputs-

Please enter a number : 
125
sum is 134
125 is not an Armstrong number

Please enter a number : 
371
sum is 371
371  is an Armstrong number

Please enter a number : 
54748
sum is 54748
54748  is an Armstrong number

Armstrong number Java program explanation

In an Armstrong number Java program, the input number is first taken from the user. To determine the number of digits, the simplest approach is to convert the number into a string and use its length. This gives us the power to which each digit must be raised.

The logic works as follows:

  1. Extract digits one by one
    • Start from the last digit using the modulus operator (num % 10).
    • Raise this digit to the power of the total number of digits.
  2. Accumulate the sum
    • Add the powered value to a running total.
    • Reduce the number by one digit using integer division (num / 10).
  3. Repeat until all digits are processed
    • Continue the loop until the number becomes zero.
  4. Compare with the original number
    • If the accumulated sum equals the original number, it is an Armstrong number.
    • Otherwise, it is not.
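
The same steps can also be written using only integer arithmetic. The following is a sketch of one possible variation, not the program given above: it counts digits by repeated division instead of converting to a String, and raises digits to a power with a small multiplication loop instead of Math.pow(). The class name ArmstrongIntegerOnly and its methods are made up for this illustration, and treating negative numbers as non-Armstrong is an assumption of the sketch.

```java
class ArmstrongIntegerOnly {
  // Raises base to the power exp using integer multiplication only
  static long power(int base, int exp) {
    long result = 1;
    for (int i = 0; i < exp; i++) {
      result *= base;
    }
    return result;
  }

  static boolean isArmstrong(int number) {
    // Assumption of this sketch: negative numbers are not Armstrong numbers
    if (number < 0) {
      return false;
    }
    // Count digits by repeated division (0 is treated as having one digit)
    int digits = 1;
    for (int n = number; n >= 10; n /= 10) {
      digits++;
    }
    // Extract digits from the right and accumulate digit^digits
    long sum = 0;
    for (int n = number; n > 0; n /= 10) {
      sum += power(n % 10, digits);
    }
    return sum == number;
  }

  public static void main(String[] args) {
    System.out.println(isArmstrong(371));   // true
    System.out.println(isArmstrong(125));   // false
    System.out.println(isArmstrong(54748)); // true
  }
}
```

The sum is kept in a long so that the powers of the digits of any int value fit without overflow.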

That's all for this topic Armstrong Number or Not Java Program. If you have any doubt or any suggestions to make please drop a comment. Thanks!

Text Splitters in LangChain With Examples

When you are creating a Retrieval-Augmented Generation (RAG) pipeline, the first step is to load the data and split it. In the post Document Loaders in LangChain With Examples we saw different types of document loaders provided by LangChain. In this article we’ll see the different text splitters LangChain provides to break the loaded documents into smaller, manageable chunks.

Why do we need Text Splitters

The documents you load using document loaders may be very large in size and it is quite impractical to send the content of the whole document to the LLM to get relevant answers. Text splitters in LangChain help in breaking large documents into smaller, manageable chunks that models can process efficiently without losing context. They help overcome context window limits, improve retrieval accuracy, and enable better indexing and semantic understanding. Here are some of the benefits of splitting the documents.

  • Context window limit- LLMs have a maximum token limit, and feeding in an entire book or long document will exceed it. By splitting documents into smaller, semantically coherent chunks, you can select only the relevant chunks to send to the LLM instead of the entire document.
  • Token Efficiency- If you send the entire document (without any splitting), the LLM has to process every token, even irrelevant ones. That inflates cost and slows response time. With splitting + retrieval, only the relevant chunks are injected into the prompt. This means fewer tokens are consumed, lowering the overall cost.
  • Efficient Retrieval in RAG Pipelines- One of the steps in creating a RAG pipeline is to store the loaded documents in a vector database. Splitting documents into smaller chunks and storing those chunks (not the whole document as is) improves search granularity and helps ensure the right passage is retrieved from the vector DB.
  • Maintaining Semantic Coherence- There are TextSplitter classes in LangChain that don’t just cut text arbitrarily, they try to preserve contextual meaning. For example, splitting by paragraphs or semantic boundaries avoids breaking sentences mid-thought.

    Splitting at natural boundaries (sentences, paragraphs, sections) keeps ideas intact. That ultimately helps LLM to interpret the context correctly without guessing missing parts. This reduces hallucinations and increases factual accuracy.

Text splitters in LangChain

LangChain offers a variety of text splitters, each designed to serve different functionalities.

1. CharacterTextSplitter

CharacterTextSplitter is one of the simplest text-splitting utilities in LangChain. It divides text on a specified separator (default: "\n\n", meaning a paragraph break), with chunk length measured by the number of characters.

Instead of cutting arbitrarily at the exact character count, the splitter breaks the text at the separator and then merges the pieces into chunks of at most chunk_size characters. This ensures chunks end at natural boundaries (paragraphs, sentences, etc.), preserving meaning. For example, if chunk_size=1000, the splitter tries to fill each chunk up to 1000 characters, but breaks at the last separator before the limit to avoid cutting mid-paragraph or mid-sentence. Note that it splits only on the separator: a piece that contains no separator and is longer than chunk_size is kept as one oversized chunk.

CharacterTextSplitter is best for documents with a consistent and predictable structure, such as logs or lists where a single separator (like a newline) clearly defines boundaries.

text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)

Parameters-

  • separator: Used to identify split points. The default is "\n\n" (double newline), which aims to preserve paragraph integrity.
  • chunk_size: The maximum number of characters allowed in a single chunk.
  • chunk_overlap: The number of characters that consecutive chunks should share. This helps maintain semantic context across splits.
  • length_function: A function used to calculate the length of the chunks, defaulting to the standard Python len().

Methods that you can use-

  • .split_text- when you just have raw strings (plain text), it returns plain string chunks.
  • .split_documents- when you already have your text wrapped inside LangChain Document objects. If you have used a document loader in LangChain to load the document, you will have Document objects. In that case, use split_documents to break them into smaller Document chunks while preserving metadata.

LangChain CharacterTextSplitter Example

In the code, a space (" ") is used as the separator instead of the default.

from langchain_text_splitters import CharacterTextSplitter

# Sample text to split
text = """
Generative AI is a type of artificial intelligence that creates new, original content—such as text, images, video, audio, or code—by learning patterns from existing data. Unlike traditional AI that classifies or analyzes data, GenAI uses deep learning models to generate novel outputs that resemble the training data.
 
Key Aspects of Generative AI:

How it Works: These models (e.g., GANs, Transformers) are trained on massive datasets to understand underlying structures and probabilities. When prompted, they predict and generate new, human-like content.
"""

# Create a CharacterTextSplitter instance
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=20, separator=" ")

# Split the text into chunks
chunks = text_splitter.split_text(text)

print(f"Total chunks created: {len(chunks)}\n")

# Print the resulting chunks    
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk}\n")

Output

Total chunks created: 7

Chunk 1:
Generative AI is a type of artificial intelligence that creates new, original content—such as text,

Chunk 2:
as text, images, video, audio, or code—by learning patterns from existing data. Unlike traditional

Chunk 3:
Unlike traditional AI that classifies or analyzes data, GenAI uses deep learning models to generate

Chunk 4:
models to generate novel outputs that resemble the training data.

Key Aspects of Generative

Chunk 5:
of Generative AI:

How it Works: These models (e.g., GANs, Transformers) are trained on massive

Chunk 6:
trained on massive datasets to understand underlying structures and probabilities. When prompted,

Chunk 7:
When prompted, they predict and generate new, human-like content.

2. RecursiveCharacterTextSplitter

The RecursiveCharacterTextSplitter is the recommended default text splitter for generic text in LangChain. It splits documents by recursively trying a list of separators until the resulting chunks are within the specified size limit. The default list of separators is ["\n\n", "\n", " ", ""]

  • "\n\n": double newline (paragraphs)
  • "\n": single newline (lines)
  • " ": space (words)
  • "": empty string (individual characters)

How RecursiveCharacterTextSplitter Works

Instead of using a single separator, it uses a hierarchical list to preserve semantic context (paragraphs -> lines -> words -> characters):

  • It first attempts to split the text by the first character in its list (default is double newline \n\n for paragraphs).
  • Recursive Fallback: If any resulting chunk still exceeds the chunk_size, it moves to the next separator (e.g., single newline \n) and tries again only on that chunk.
  • Continue in the hierarchy: It repeats this process through the list (e.g., spaces then finally individual characters "") until the size requirement is met.

LangChain RecursiveCharacterTextSplitter Example

In this example first a PDF document is loaded using PyPDFLoader, then RecursiveCharacterTextSplitter is used to split it. Code assumes that the PDF document is inside the resources folder which resides in project root.

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
import os

def get_file_path(file_name):
    # Current script directory
    script_dir = os.path.dirname(os.path.abspath(__file__))

    # Project root is one level above
    project_root = os.path.dirname(script_dir)

    #print(f"Project root directory: {project_root}")
    file_path = os.path.join(project_root, "resources", file_name)
    return file_path

def load_documents(file_name):
    file_path = get_file_path(file_name)
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    print(f"Number of Documents: {len(documents)}")
    return documents

def split_documents(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
    )

    chunks = text_splitter.split_documents(documents)
    print(f"length of chunks {len(chunks)}")
    for i, chunk in enumerate(chunks[:3]):  # first 3 chunks
        # Chunk Lengths
        print(f"Chunk {i+1} length: {len(chunk.page_content)}")
        # Chunk Content
        #print(f"Chunk {i+1}:\n{chunk.page_content}...\n") 
        # Chunk Metadata
        #print(f"Chunk {i+1} metadata: {chunk.metadata}")

if __name__ == "__main__":
    documents = load_documents("Health Insurance Policy Clause.pdf")
    split_documents(documents)

Output

Number of Documents: 41
length of chunks 139
Chunk 1 length: 914
Chunk 2 length: 913
Chunk 3 length: 983

3. Code Text Splitter

Though LangChain provides specific code text splitter classes like PythonCodeTextSplitter for Python, the recommended approach is to use the RecursiveCharacterTextSplitter.from_language() method. Supported languages are stored in the langchain_text_splitters.Language enum. You pass a value from the enum to RecursiveCharacterTextSplitter.from_language() to instantiate a splitter that is tailored for a specific language. Here’s an example that creates a splitter for Python code using Language.PYTHON:

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_text_splitters import Language

PYTHON_CODE = """
def hello_world():
    print("Hello, World!")

# Call the function
hello_world()
"""

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=50, chunk_overlap=0
)
python_docs = python_splitter.create_documents([PYTHON_CODE])
print(python_docs)

Output

[Document(metadata={}, page_content='def hello_world():\n    print("Hello, World!")'), Document(metadata={}, page_content='# Call the function\nhello_world()')]

Note that in the above example, the create_documents() method is used. This method does both tasks in one go: it converts raw text to Document objects and splits them into chunks.

4. TokenTextSplitter

TokenTextSplitter class in LangChain is used to divide text into smaller chunks based on a specific number of tokens rather than characters.

LLMs have a strict token-based context window limit, and this class ensures that chunks don’t exceed the model’s maximum token limit.

How TokenTextSplitter Works

Raw text to tokens

The splitter first converts your text into tokens using the model’s tokenizer (e.g., GPT-3.5, GPT-4, or embedding models).

Chunking by token count

You specify chunk_size and chunk_overlap in terms of tokens. The splitter groups tokens into chunks of the given size, with overlap applied at the token level.

Convert tokens back

Each chunk of tokens is decoded back into a string. The result is a list of text chunks that align with token boundaries. By tokenizing first, the splitter ensures each chunk is within the desired token budget.

from langchain_text_splitters import TokenTextSplitter

text = """
Generative AI is a type of artificial intelligence that creates new, original content—such as text, images, video, audio, or code—by learning patterns from existing data. Unlike traditional AI that classifies or analyzes data, GenAI uses deep learning models to generate novel outputs that resemble the training data.
 
Key Aspects of Generative AI:

How it Works: These models (e.g., GANs, Transformers) are trained on massive datasets to understand underlying structures and probabilities. When prompted, they predict and generate new, human-like content.
"""

#cl100k_base is a tokenizer encoding provided by OpenAI’s tiktoken library.
text_splitter = TokenTextSplitter(
    encoding_name="cl100k_base",
    chunk_size=100,
    chunk_overlap=20
)

chunks = text_splitter.split_text(text)

print(f"total chunks {len(chunks)}")

for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk}\n")
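To make the three steps (tokenize, window by token count, decode) concrete, here is a minimal sketch in plain Python. It assumes a whitespace split as a stand-in tokenizer, which is not what TokenTextSplitter uses; the function name chunk_by_tokens is hypothetical.

```python
def chunk_by_tokens(text, chunk_size, chunk_overlap):
    """Sketch of token-window chunking with a stand-in tokenizer."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Step 1: raw text to tokens (assumption: whitespace split, not a model tokenizer)
    tokens = text.split()

    # Step 2: slide a window of chunk_size tokens, moving by chunk_size - chunk_overlap
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        # Step 3: convert the tokens of this window back to a string
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks


print(chunk_by_tokens("one two three four five six seven", chunk_size=4, chunk_overlap=1))
# ['one two three four', 'four five six seven']
```

The token "four" appears in both chunks because the overlap is applied at the token level, which is the same idea the splitter applies with real tokenizer output.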

Apart from these classes, LangChain has some specialized classes for splitting specific document types.

  1. Splitting JSON- RecursiveJsonSplitter splits JSON data while allowing control over chunk sizes.
  2. Splitting Markdown- MarkdownTextSplitter attempts to split the text along Markdown-formatted headings.
  3. Splitting HTML- LangChain provides three different text splitters that you can use to split HTML content effectively:
    • HTMLHeaderTextSplitter- Splits HTML text based on header tags (e.g., <h1>, <h2>, <h3>, etc.), and adds metadata for each header relevant to any given chunk.
    • HTMLSectionSplitter- Splits HTML into sections based on specified tags.
    • HTMLSemanticPreservingSplitter- Splits HTML content into manageable chunks while preserving the semantic structure of important elements like tables, lists, and other HTML components.

That's all for this topic Text Splitters in LangChain With Examples. If you have any doubt or any suggestions to make please drop a comment. Thanks!



Friday, April 10, 2026

Document Loaders in LangChain With Examples

In this article, we’ll explore LangChain Document Loaders and how they fit into the Retrieval-Augmented Generation (RAG) pipeline. LangChain provides specific modules for each of the four core RAG steps:

  • Data ingestion (Load & split): Use document loaders (e.g. PyPDFLoader, WebBaseLoader) to import your data and text splitters to break it into smaller, manageable chunks.
  • Indexing (Embed & store): LangChain provides wrappers for embedding models (like OpenAI or Hugging Face) and vector stores (like FAISS, Chroma, or Pinecone) to store your data as searchable vectors.
  • Retrieval: The retriever component identifies and fetches the most relevant document chunks based on a user’s query.
  • Generation: Chains or LCEL (LangChain Expression Language) combine the retrieved context with the user’s prompt and send it to an LLM to generate the final response.

LangChain Document loaders

As you can see, the first step is to load the documents. Document loaders provide a standard interface for reading data from different sources or different file formats. These sources can be Slack, Google Drive, Confluence, GitHub etc. You have classes to load data from text files, PDFs, Word documents, CSV files, web pages etc.

The documents are loaded in the form of Document objects that can then be used by other components like text splitters, embeddings, vector stores, LLMs etc.

Note that Document is also a class in LangChain which stores the text content of the document (page_content) and associated metadata (file name, source, page number etc.).

All the document loader classes implement the BaseLoader interface. Each document loader may define its own parameters, but they share a common API:

  • load()– Loads all documents at once.
  • lazy_load()– Streams documents lazily, useful for large datasets.
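The difference between the two methods can be pictured with a toy loader written in plain Python. Both Doc and LineLoader are hypothetical stand-ins, not LangChain classes; the sketch only illustrates the eager versus lazy pattern, with load() assumed to simply collect whatever lazy_load() yields.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class LineLoader:
    """Hypothetical loader that turns each input line into one Doc."""

    def __init__(self, lines: List[str]):
        self.lines = lines

    def lazy_load(self) -> Iterator[Doc]:
        # A generator: each Doc is created only when the caller asks for it
        for number, line in enumerate(self.lines):
            yield Doc(page_content=line, metadata={"line": number})

    def load(self) -> List[Doc]:
        # Eager variant: materialize everything into a list at once
        return list(self.lazy_load())


loader = LineLoader(["first line", "second line"])
print(len(loader.load()))                      # 2
print(next(loader.lazy_load()).page_content)   # first line
```

With a large source, iterating over lazy_load() keeps only one document in memory at a time, whereas load() holds the full list.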

Popular document loaders

LangChain offers over 200 integrations for different data types. We can categorize these loaders based on functionality. See the full list of document loaders here- https://docs.langchain.com/oss/python/integrations/document_loaders

  • File based:
    • TextLoader: Reads simple .txt files.
    • PyPDFLoader: Extracts text from PDFs page by page.
    • CSVLoader: Converts each row of a CSV into a separate document.
  • Web based:
    • WebBaseLoader: Uses BeautifulSoup to scrape and extract text from URLs.
    • Unstructured: Uses Unstructured to load and parse web pages
  • Directory-Level:
    • DirectoryLoader: Automatically detects and loads all files in a folder using appropriate sub-loaders.
  • Social platforms:
    • Twitter- This loader fetches the text from the Tweets of a list of Twitter users, using the tweepy Python package.
    • Reddit- This loader fetches the text from the Posts of Subreddits or Reddit users, using the praw Python package.

In this article we’ll see examples of some of the most frequently used document loaders in LangChain.

1. Text Loader

The TextLoader class is used to load text files. It is primarily meant for files with the .txt extension, but it can also be used for markdown files (.md) or for code files (e.g., .py, .js, .html).

LangChain TextLoader example

Suppose I have a file "genai.txt" under resources folder in project root directory. Keeping cross-platform compatibility and code portability (using relative paths) in view, file path is constructed using os.path.

from langchain_community.document_loaders import TextLoader
import os

# Current script directory
script_dir = os.path.dirname(os.path.abspath(__file__))

# Project root is one level above
project_root = os.path.dirname(script_dir)

print(f"Project root directory: {project_root}")

loader = TextLoader(os.path.join(project_root, "resources", "genai.txt"), encoding="utf-8")

documents = loader.load()

print(f"Number of Documents: {len(documents)}")
print(f"Type of Documents: {type(documents)}")
# Print first 500 characters of the first document
print(f"Content of first Document: {documents[0].page_content[:500]}...")
print(f"Metadata of first Document: {documents[0].metadata}")

Output

Number of Documents: 1
Type of Documents: <class 'list'>
Content of first Document: Generative AI is a type of artificial intelligence that creates new, original content—such as text, images, video, audio, or code—by learning patterns from existing data. Unlike traditional AI that classifies or analyzes data, GenAI uses deep learning models to generate novel outputs that resemble the training data.
Key Aspects of Generative AI:

    How it Works: These models (e.g., GANs, Transformers) are trained on massive datasets to understand underlying structures and probabilities. When ...
Metadata of first Document: {'source': 'D:\\Training content\\Python Training Content\\PythonML\\agent\\langchaindemos\\resources\\genai.txt'}

Points to note here:

  • TextLoader is imported from langchain_community.document_loaders package.
  • An object of TextLoader class is created passing it the path of the file, encoding is also passed to ensure handling of special characters (accents, symbols, non English scripts, emojis). Explicitly setting encoding="utf-8" avoids decoding errors or garbled text.
  • If your file is small, you’ll typically only get one Document, which is the case here. So, documents[0].page_content will contain the full text of the file.

2. PDF Loader

LangChain provides many different PDF loader classes for loading PDF files. Some of the classes with their use cases are given below.

Loader | Best Use Case | Description
PyPDFLoader | Simple PDFs with mostly text | Uses pypdf under the hood. Fast and lightweight, but can struggle with complex layouts, tables, or images.
PDFPlumberLoader | PDFs with structured layouts (tables, columns, forms) | Built on pdfplumber. Better at preserving layout and extracting tabular data. Slightly slower than PyPDF.
PyPDFDirectoryLoader | Batch loading multiple PDFs in a directory | Wraps PyPDFLoader for convenience. Ideal when you have a corpus of PDFs to ingest at once.
PyMuPDFLoader | Complex PDFs with mixed content (images, annotations, multi-column text) | Uses PyMuPDF. More powerful parsing, can handle embedded images and metadata. Good for research papers or scanned docs.
UnstructuredPDFLoader | Messy, scanned, or semi-structured PDFs | Uses the unstructured library. Best when PDFs are inconsistent, contain scanned text, or need aggressive cleaning. Often produces more reliable text for downstream NLP.

Let’s start with a simple example using PyPDFLoader. It requires the pypdf package, so install it using the pip install pypdf command, or add pypdf to the requirements.txt of your project and run pip install -r requirements.txt

from langchain_community.document_loaders import PyPDFLoader
import os

# Current script directory
script_dir = os.path.dirname(os.path.abspath(__file__))

# Project root is one level above
project_root = os.path.dirname(script_dir)

print(f"Project root directory: {project_root}")

loader = PyPDFLoader(os.path.join(project_root, "resources", "Health Insurance Policy Clause.pdf"))

documents = loader.load()

print(f"Number of Documents: {len(documents)}")
print(f"Type of Documents: {type(documents)}")

Output

Number of Documents: 41
Type of Documents: <class 'list'>

Note that PyPDFLoader creates one Document object per page of the PDF. In this example, a PDF of almost 1 MB with 41 pages was used, which resulted in 41 documents being loaded (indexes 0-40, one Document per page of the PDF).

  • Content: the text of each page is stored in documents[i].page_content.
  • Metadata: each Document also carries metadata like the page number and source file path.

3. CSV loader

The CSVLoader class in LangChain is used to load CSV data, creating one Document per row.

LangChain CSVLoader example

from langchain_community.document_loaders import CSVLoader
import os

# Current script directory
script_dir = os.path.dirname(os.path.abspath(__file__))

# Project root is one level above
project_root = os.path.dirname(script_dir)

print(f"Project root directory: {project_root}")

file_path = os.path.join(project_root, "resources", "50_Startups.csv")
loader = CSVLoader(file_path)

documents = loader.load()

print(f"Number of Documents: {len(documents)}")
print(f"Type of Documents: {type(documents)}")
# One document per row in the CSV file
print(f"Content of first Document: {documents[0].page_content}") 
print(f"Metadata of first Document: {documents[0].metadata}")

Output

Number of Documents: 50
Type of Documents: <class 'list'>
Content of first Document: R&D Spend: 165349.2
Administration: 136897.8
Marketing Spend: 471784.1
State: New York
Profit: 192261.83
Metadata of first Document: {'source': 'D:\\Training content\\Python Training Content\\PythonML\\agent\\langchaindemos\\resources\\50_Startups.csv', 'row': 0}

Since there are 50 records in the CSV file, 50 Document objects are created, one for each row.
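The row-to-document mapping seen in the output can be reproduced with the standard csv module. This is only a sketch of the idea under stated assumptions: the function rows_to_documents, the dictionary layout, and the sample data are all made up for illustration and are not CSVLoader's code.

```python
import csv
import io


def rows_to_documents(csv_text, source="in-memory.csv"):
    """Sketch: one document per CSV row, formatted as 'column: value' lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            {"page_content": content, "metadata": {"source": source, "row": row_number}}
        )
    return documents


# Made-up sample data, used only to show the shape of the result
sample_csv = "name,age\nKate,24\nJohn,30\n"
docs = rows_to_documents(sample_csv)
print(len(docs))                 # 2
print(docs[0]["page_content"])
# name: Kate
# age: 24
```

The header row supplies the keys, each data row becomes one document, and the row index is carried along as metadata.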

4. Web page loader

To load web pages, LangChain provides a WebBaseLoader class to load all text from HTML webpages into a document format.

In order to use the WebBaseLoader, apart from langchain-community python package, you also need to install beautifulsoup4 package.

To load a web page, pass its URL to WebBaseLoader.

loader = WebBaseLoader("https://www.example.com/")

If you want to load multiple web pages then you can pass in a list of pages to load from.

loader_multiple_pages = WebBaseLoader(
    ["https://www.example.com/", "https://google.com"]
)

LangChain WebBaseLoader example

from langchain_community.document_loaders import WebBaseLoader

# Example URL to load
url1 = "https://www.netjstech.com/2026/04/runablepassthrough-langchain-examples.html"   
url2 = "https://www.netjstech.com/2026/04/runnableparallel-in-langchain-example.html"

loader = WebBaseLoader([url1, url2])
documents = loader.load()
print(f"Number of Documents: {len(documents)}")

Output

Number of Documents: 2

Let’s feed this data to the LLM to get our queries answered based on the text loaded from the web page. By loading data of the web pages, we can provide context to the LLM.

from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_ollama import ChatOllama
from langchain_core.messages import SystemMessage
import os
from langchain_core.output_parsers import StrOutputParser
from dotenv import load_dotenv

load_dotenv()

def load_text_from_url(url: str):

    loader = WebBaseLoader(url)
    documents = loader.load()
    return documents

def generate_response(user_input: str) -> str:
    document = load_text_from_url("https://www.netjstech.com/2026/04/runablepassthrough-langchain-examples.html")
    system_message = SystemMessage(content="You are a helpful assistant that responds to user queries based on the provided context and nothing else.")
    human_message = HumanMessagePromptTemplate.from_template("Based on the given context: {context}, answer the question: {user_input}")
    prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    model = ChatOllama(model="llama3.1")
    chain = prompt | model | StrOutputParser()
    response = chain.invoke({"user_input": user_input, "context": document[0].page_content})
    return response

if __name__ == "__main__":
    response = generate_response("What is the main topic of the article and what are the key points discussed?")
    print(response)

Output

The main topic of the article is "RunablePassthrough in LangChain With Examples". The article discusses the concept of `RunnablePassthrough` in LangChain, a library used for building chain-based models. Here are the key points discussed:

1. **What is RunablePassthrough**: It's a simple runnable that returns its input unchanged, useful when you want to preserve the original input alongside other computed values.
2. **Example use case**: Preserving the original question in a RAG (Retrieval-Augmented Generation) pipeline to be used later in the prompt construction.
3. **Code example**: Demonstrating how to use `RunnablePassthrough` in a chain-based model, specifically with Pinecone vector store and OpenAI embeddings.
4. **`.assign` method**: Explaining how to add extra static or computed fields to the passthrough output using `.assign`, such as adding metadata like timestamp to the prompt.

Overall, the article provides an introduction to `RunnablePassthrough` in LangChain and demonstrates its usage with examples.

5. Directory loader

The DirectoryLoader class in LangChain is used to load all documents from a directory.

Key parameters that you can pass while creating object of the DirectoryLoader:

  • path: The path to the directory to load from.
  • glob: A pattern to filter which files to load (e.g., **/*.pdf for all PDFs including subfolders).
  • loader_cls: The specific LangChain Loader to use for each file; defaults to UnstructuredFileLoader.
  • use_multithreading: Set to True to speed up loading when dealing with many files.
  • silent_errors: If True, the loader will skip files that fail to load instead of raising an exception.
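To get a feel for what a glob pattern such as **/*.pdf selects, the following sketch uses Python's pathlib on a temporary folder. It only shows how pathlib interprets the pattern; that DirectoryLoader resolves its glob parameter in a similar way is an assumption, and the file names are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    base = Path(root)
    (base / "sub").mkdir()

    # Create a few empty files, one of them inside a subfolder
    for name in ("a.pdf", "notes.txt", "sub/b.pdf"):
        (base / name).touch()

    # "**" matches the base folder itself and any nested subfolders
    matched = sorted(
        path.relative_to(base).as_posix() for path in base.glob("**/*.pdf")
    )

print(matched)
# ['a.pdf', 'sub/b.pdf']
```

The text file is skipped, and the PDF in the subfolder is picked up because of the ** part of the pattern.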

For example, to load all Markdown files (including those in subfolders) from the data directory using the TextLoader class:

loader = DirectoryLoader(
    './data', 
    glob="**/*.md", 
    loader_cls=TextLoader,
    use_multithreading=True
)

LangChain DirectoryLoader example

from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
import os
print(os.getcwd())
# Load all .pdf files from the specified directory
loader = DirectoryLoader("./langchaindemos/resources", glob="**/*.pdf", loader_cls=PyPDFLoader)

documents = loader.load()

# Check the number of documents loaded
print(f"Loaded {len(documents)} documents.")

That's all for this topic Document Loaders in LangChain With Examples. If you have any doubt or any suggestions to make please drop a comment. Thanks!



Output Parsers in LangChain With Examples

In the post Structured Output In LangChain we saw how to use structured output to get response from LLM in structured format. In this tutorial we’ll see how to do the same thing using output parsers in LangChain.

Output from LLMs is, by default, free-form text. To get formatted output from LLMs, LangChain provides many OutputParser classes. The newer approach of using structured output to get reliable, schema-validated JSON is preferred, but initially many LLMs didn’t have native support for producing structured output. Output parsers emerged as an early solution to obtain structured output from LLMs.

Output parsers are still required when working with models that do not support structured output natively, or when you require lightweight parsing, additional processing or validation of the model's output beyond its inherent capabilities.

OutputParser Classes in LangChain

LangChain provides many OutputParser classes that parse the output of an LLM call into structured data. There is a variety of OutputParser classes because different use cases demand different ways of interpreting and structuring the raw text returned by an LLM. Some of the common OutputParser classes are listed below with examples. Note that the OutputParser classes provide a get_format_instructions() method, which is used to align the model’s output with the parser’s expectations.

1. StrOutputParser

It extracts the text content from a model message and returns it as a plain string, making it easy to chain with other components.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama

model = ChatOllama(model="llama3.1")

history_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a history expert."),
        ("human", "{query}"),
    ]
)

chain_history = history_prompt | model | StrOutputParser()
      

2. CommaSeparatedListOutputParser

Designed to parse the model's output, which is expected to be comma-separated text, into a Python list.

LangChain CommaSeparatedListOutputParser example

from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate
from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain_ollama import ChatOllama

model = ChatOllama(model="llama3.1")

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()

system_message = SystemMessagePromptTemplate.from_template("You are an expert {field} analyst")

human_message = HumanMessagePromptTemplate.from_template("List 5 important trends in {field}. \n{format_instructions}")

prompt = ChatPromptTemplate.from_messages([system_message, human_message]).partial(format_instructions=format_instructions)

chain = prompt | model | output_parser

result = chain.invoke({"field": "AI"})
print(result)

The partial() method is used to get a new ChatPromptTemplate with some input variables already filled in. Since format_instructions is static (it always comes from the parser), it can be set as a partial variable.
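What the parser does at the end of the chain can be approximated in a couple of lines of plain Python. This is a rough sketch, not the library's implementation; the function name parse_comma_separated and the sample reply are made up for illustration.

```python
def parse_comma_separated(reply):
    """Sketch: turn 'a, b, c' style text into a list of trimmed items."""
    return [item.strip() for item in reply.split(",") if item.strip()]


# Made-up text standing in for a model reply
print(parse_comma_separated("red, green ,blue"))
# ['red', 'green', 'blue']
```

This also shows why the format instructions matter: the parsing step only produces a clean list when the model actually answers with comma-separated values.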

3. JsonOutputParser

Parses the model's output into a JSON object.

LangChain JsonOutputParser example

from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_ollama import ChatOllama
from pydantic import BaseModel

# Define a schema for the JSON output
class Trend(BaseModel):
    name: str
    description: str


model = ChatOllama(model="llama3.1")
output_parser = JsonOutputParser(pydantic_object=Trend)

# Get format instructions from the parser
format_instructions = output_parser.get_format_instructions()


system_message = SystemMessagePromptTemplate.from_template("You are an expert {field} analyst")
human_message = HumanMessagePromptTemplate.from_template(
    "List one important trend in {field} for 2026.\n{format_instructions}"
)

# Create ChatPromptTemplate with partial variable for format_instructions
prompt = ChatPromptTemplate.from_messages([system_message, human_message]).partial(
    format_instructions=format_instructions
)

chain = prompt | model | output_parser

result = chain.invoke({"field": "AI"})
print(result)

Output

{'name': 'Increased Adoption of Edge AI', 'description': 'More organizations will implement AI models on edge devices to reduce latency and improve real-time processing capabilities'}

4. PydanticOutputParser

Uses Pydantic models to define and enforce a strict schema. It provides the most robust validation and type safety.

LangChain PydanticOutputParser example

from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate
from langchain_ollama import ChatOllama

class Trend(BaseModel):
    name: str
    description: str

output_parser = PydanticOutputParser(pydantic_object=Trend)

# Get format instructions from the parser
format_instructions = output_parser.get_format_instructions()

model = ChatOllama(model="llama3.1")

system_message = SystemMessagePromptTemplate.from_template("You are an expert {field} analyst")
human_message = HumanMessagePromptTemplate.from_template(
    "List one important trend in {field} for 2026.\n{format_instructions}"
)

# Create ChatPromptTemplate with partial variable for format_instructions
prompt = ChatPromptTemplate.from_messages([system_message, human_message]).partial(
    format_instructions=format_instructions
)

chain = prompt | model | output_parser

result = chain.invoke({"field": "AI"})
print(result)
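The extra step that schema enforcement adds on top of plain JSON parsing can be sketched with the standard library alone. This is an illustration of the idea, not how PydanticOutputParser is implemented: it uses a dataclass instead of a Pydantic model, and the function parse_trend and the sample replies are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Trend:
    name: str
    description: str


def parse_trend(raw_reply):
    """Sketch: parse JSON text, then check it against the expected fields."""
    data = json.loads(raw_reply)
    for key in ("name", "description"):
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], str):
            raise TypeError(f"field {key} must be a string")
    return Trend(name=data["name"], description=data["description"])


# Made-up replies standing in for model output
print(parse_trend('{"name": "Edge AI", "description": "Models running on devices"}'))

try:
    parse_trend('{"name": "Edge AI"}')
except ValueError as error:
    print(f"Rejected: {error}")   # Rejected: missing field: description
```

A plain JSON parser would accept the second reply as a valid dictionary; the validation step is what turns a missing field into an explicit error.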
 

That's all for this topic Output Parsers in LangChain With Examples. If you have any doubt or any suggestions to make please drop a comment. Thanks!



Thursday, April 9, 2026

PreparedStatement Interface in Java-JDBC

In the post Statement interface in Java we have already seen how you can create a Statement using Connection object and execute SQL statements. However, the Statement interface has a major limitation; it only works with static SQL queries and offers no direct way to pass parameters. Developers often resorted to string concatenation or StringBuilder to inject values, but this approach is error‑prone and vulnerable to SQL injection attacks.

To solve these issues, JDBC introduced the PreparedStatement Interface in Java, a sub‑interface of Statement designed for parameterized queries. With PreparedStatement, you can safely bind values to placeholders (?) in your SQL, making your code more secure, readable, and efficient. In this post we'll see how to use PreparedStatement in JDBC with examples.

Obtaining JDBC PreparedStatement object

You can create a PreparedStatement object by calling the prepareStatement() method of the Connection interface.

PreparedStatement preparedStatement = connection.prepareStatement(sql);

Advantages of using PreparedStatement in JDBC

  • Parameterized Queries: You can reuse the same SQL statement with different parameter values, reducing duplication.
  • Efficiency: Unlike a Statement object, a PreparedStatement is given the SQL statement when it is created. In most cases the SQL is sent to the DB right away, where it is compiled. When you later call an execute method, the DB can run the pre-compiled statement without compiling it again, which makes repeated executions more efficient.
  • Security: By separating SQL logic from parameter values, PreparedStatement helps prevent SQL injection attacks.
  • Cleaner Syntax: No need to break the SQL string into pieces or use messy string concatenation; parameters are set using methods like setInt(), setString(), etc.

Java PreparedStatement Example

Let’s see an example using PreparedStatement in JDBC. The DB used here is MySQL, the schema is netjs and the table is employee with columns id, age and name, where id is auto-generated.

In the code there are methods for insert, update, delete and select from the table.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JDBCPrepStmt {
  public static void main(String[] args) {
    Connection connection = null;
    try {
      // Loading driver
      Class.forName("com.mysql.jdbc.Driver");
    
      // Creating connection
      connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/netjs", 
                         "root", "admin");
      JDBCPrepStmt prep = new JDBCPrepStmt();
      prep.insertEmployee(connection, "Kate", 24);
      prep.updateEmployee(connection, 22, 30);
      prep.displayEmployee(connection, 22);
    
      //prep.deleteEmployee(connection, 24);
    } catch (ClassNotFoundException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    } catch (SQLException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }finally{
      if(connection != null){
        //closing connection 
        try {
          connection.close();
        } catch (SQLException e) {
          // TODO Auto-generated catch block
          e.printStackTrace();
        }
      } // if condition
    }// finally
  }
 
  // Method to insert
  private void insertEmployee(Connection connection, String name, int age) 
        throws SQLException{
    String insertSQL = "Insert into employee (name, age) values (?, ?)";
    PreparedStatement prepStmt = null;
    try {
      prepStmt = connection.prepareStatement(insertSQL);
      prepStmt.setString(1, name);
      prepStmt.setInt(2, age);
      int count = prepStmt.executeUpdate();
      System.out.println("Count of rows inserted " + count);
    }finally{
      if(prepStmt != null){
        prepStmt.close();
      }
    }
  }
 
  // Method to update
  private void updateEmployee(Connection connection, int id, int age) throws SQLException{
    String updateSQL = "Update employee set age = ? where id = ?";
    PreparedStatement prepStmt = null;
    try {
      prepStmt = connection.prepareStatement(updateSQL);
      prepStmt.setInt(1, age);
      prepStmt.setInt(2, id);
      int count = prepStmt.executeUpdate();
      System.out.println("Count of rows updated " + count);
    }finally{
      if(prepStmt != null){
        prepStmt.close();
      }
    }
  }
 
  // Method to delete
  private void deleteEmployee(Connection connection, int id) throws SQLException {
    String deleteSQL = "Delete from employee where id = ?";
    PreparedStatement prepStmt = null;
    try {
      prepStmt = connection.prepareStatement(deleteSQL);
      prepStmt.setInt(1, id);
      int count = prepStmt.executeUpdate();
      System.out.println("Count of rows deleted " + count);
    }finally{
      if(prepStmt != null){
        prepStmt.close();
      }
    }
  }
 
  // Method to retrieve
  private void displayEmployee(Connection connection, int id) throws SQLException{
    String selectSQL = "Select * from employee where id = ?";
    PreparedStatement prepStmt = null;
    try {
      prepStmt = connection.prepareStatement(selectSQL);
      prepStmt.setInt(1, id);
      ResultSet rs = prepStmt.executeQuery();
      while(rs.next()){
        System.out.println("id : " + rs.getInt("id") + " Name : " 
                      + rs.getString("name") + " Age : " + rs.getInt("age")); 
      }
    }finally{
      if(prepStmt != null){
        prepStmt.close();
      }
    }
  }
 
}

Points to note here:

Taking this example as reference, let’s go through some points to keep in mind when using PreparedStatement in JDBC.

  1. Parameterized statement– In the example all the SQL statements are parameterized, with '?' used as a placeholder for each value that is supplied later. For example-
    String insertSQL = "Insert into employee (name, age) values (?, ?)";
    
  2. Setter methods– Values for these placeholders are provided through setter methods. PreparedStatement has setter methods for different data types, such as setInt(), setString() and setDate().

    General form of the setter method-

    setXXX(int parameterIndex, value)
    

    Here parameterIndex is the position of the placeholder in the statement; the index starts from 1, not 0. For example-

    String insertSQL = "Insert into employee (name, age) values (?, ?)";
    

    For this SQL, where the first parameter is a String (name) and the second parameter is an int (age), you need to set the parameters on the PreparedStatement object as follows-

    prepStmt.setString(1, name);
    prepStmt.setInt(2, age);
    
  3. Executing PreparedStatement objects– The SQL is already supplied when the PreparedStatement is created, so its execute methods take no arguments. You can use one of the following methods for executing the queries.
    1. boolean execute()- Executes the SQL statement in this PreparedStatement object, which can be any kind of SQL statement and may return multiple results.
      Returns a boolean which is true if the first result is a ResultSet object; false if it is an update count or there are no results.
    2. ResultSet executeQuery()- Executes the SQL query in this PreparedStatement object and returns the ResultSet object generated by the query. If you want to execute a Select SQL query which returns results you should use this method.
    3. int executeUpdate()- Executes the SQL statement in this PreparedStatement object, which may be an INSERT, UPDATE, or DELETE statement or an SQL statement that returns nothing, such as an SQL DDL statement.
      Returns an int denoting either the row count for the rows that are inserted, updated or deleted, or 0 for SQL statements that return nothing.
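
The points above can be put together in a small sketch. It assumes the same employee table (name, age columns) used in the example. The class name PreparedStatementSketch and the method executeAndDescribe() are made up for illustration and are not part of the JDBC API. The first method binds the two placeholders by their 1-based index and calls executeUpdate(). The second method calls execute() and uses the returned boolean to decide whether to read a ResultSet or an update count. Try-with-resources (Java 7 and later) is used here as one possible alternative to the explicit finally blocks.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class, used only to illustrate the points above
final class PreparedStatementSketch {

  // Binds the two '?' placeholders by position (index starts from 1)
  // and returns the update count reported by executeUpdate()
  static int insertEmployee(Connection connection, String name, int age)
      throws SQLException {
    String insertSQL = "Insert into employee (name, age) values (?, ?)";
    // try-with-resources closes the PreparedStatement automatically
    try (PreparedStatement prepStmt = connection.prepareStatement(insertSQL)) {
      prepStmt.setString(1, name); // first placeholder: name
      prepStmt.setInt(2, age);     // second placeholder: age
      return prepStmt.executeUpdate();
    }
  }

  // Runs an already prepared statement with execute() and
  // describes whichever kind of result came back first
  static String executeAndDescribe(PreparedStatement prepStmt)
      throws SQLException {
    boolean hasResultSet = prepStmt.execute();
    if (hasResultSet) {
      // true: the first result is a ResultSet, so count its rows
      int rows = 0;
      try (ResultSet rs = prepStmt.getResultSet()) {
        while (rs.next()) {
          rows++;
        }
      }
      return "rows read: " + rows;
    }
    // false: the first result is an update count (or there is no result)
    return "update count: " + prepStmt.getUpdateCount();
  }
}
```

With an open Connection, a call such as PreparedStatementSketch.insertEmployee(connection, "Ram", 30) would return the number of rows inserted. When you already know the kind of statement, executeQuery() or executeUpdate() is usually simpler than execute(), since no check on the boolean result is needed.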
That's all for this topic PreparedStatement Interface in Java-JDBC. If you have any doubt or any suggestions to make please drop a comment. Thanks!
