Tuesday, July 26, 2022

Compact Strings in Java

In this post we’ll look at Compact Strings, a feature added in Java 9 that gives strings a more space-efficient internal representation.

Motivation for Compact Strings in Java

Before Java 9, the String class stored its characters in a char array, using two bytes for each character (UTF-16 encoding). Since String is one of the most used classes, String instances make up a major share of heap usage. It has been observed that most String objects contain only Latin-1 characters, which need only one byte each. Always storing strings as UTF-16 therefore means that half of the character storage in such strings goes unused.

Changes for Compact Strings

To make strings more space efficient, from Java 9 onward the internal representation of the String class has been changed from a UTF-16 char array to a byte array plus an encoding-flag field.

With the Compact Strings feature, characters are stored in one of two encodings, depending on the contents of the string:

  • ISO-8859-1/Latin-1 (one byte per character), or
  • UTF-16 (two bytes per character)

The encoding-flag field indicates which encoding is used.
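
The choice between the two encodings can be illustrated with a small sketch. This is not the JDK's code: the class CoderSketch and its methods chooseCoder and storageBytes are hypothetical names used only for illustration, and the sketch assumes the rule that a string qualifies for Latin-1 storage only when every char value fits in one byte (0xFF or less).

```java
// Hypothetical helper for illustration only; not part of the JDK.
final class CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Assumption: Latin-1 storage is possible only if every char is <= 0xFF.
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Bytes needed for the character data under the chosen coder:
    // one byte per char for Latin-1, two bytes per char for UTF-16.
    static int storageBytes(String s) {
        return chooseCoder(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String mixed = "hello\u4e16\u754c"; // last two chars are outside Latin-1
        System.out.println(chooseCoder(ascii) + " " + storageBytes(ascii)); // 0 5
        System.out.println(chooseCoder(mixed) + " " + storageBytes(mixed)); // 1 14
    }
}
```

Note that in this sketch a single character outside Latin-1 moves the whole string to two bytes per character; the encoding is chosen per string, not per character.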

These changes are visible in the source of the String class.

Before Java 9, characters were stored in a char[] array:

/** The value is used for character storage. */
 private final char value[]; 

From Java 9 onward, the storage is a byte[] array:

private final byte[] value;

The encoding-flag field is named coder and is of type byte:

private final byte coder;

coder can have one of these two values:

@Native static final byte LATIN1 = 0;
@Native static final byte UTF16  = 1;

Depending on whether the storage is Latin-1 or UTF-16, the methods of the String class need different implementations. These implementations live in two separate helper classes:

final class StringLatin1

final class StringUTF16

Methods of the String class check the value of the encoding-flag field (coder) and call the matching implementation. The compareTo() method shows the pattern:

public int compareTo(String anotherString) {
  byte v1[] = value;
  byte v2[] = anotherString.value;
  if (coder() == anotherString.coder()) {
    return isLatin1() ? StringLatin1.compareTo(v1, v2)
                        : StringUTF16.compareTo(v1, v2);
  }
  return isLatin1() ? StringLatin1.compareToUTF16(v1, v2)
                    : StringUTF16.compareToLatin1(v1, v2);
}
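
To see how a byte array plus a coder field can back ordinary String operations, here is a minimal sketch of the same dispatch idea applied to charAt(). The class MiniString and its helper methods are hypothetical and heavily simplified; in particular, the sketch assumes a fixed big-endian byte order for the two-byte case, which is a choice made here for simplicity and not a statement about the JDK's layout.

```java
// Hypothetical model for illustration only; not the JDK implementation.
final class MiniString {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value; // character data, 1 or 2 bytes per char
    private final byte coder;   // LATIN1 or UTF16

    MiniString(String s) {
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) { latin1 = false; break; }
        }
        if (latin1) {
            coder = LATIN1;
            value = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                value[i] = (byte) s.charAt(i);
            }
        } else {
            coder = UTF16;
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i] = (byte) (c >> 8); // high byte first (assumed order)
                value[2 * i + 1] = (byte) c;    // low byte second
            }
        }
    }

    boolean isLatin1() { return coder == LATIN1; }

    int byteCount() { return value.length; }

    int length() { return isLatin1() ? value.length : value.length / 2; }

    // Dispatch on the coder, in the same style as compareTo() above.
    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return isLatin1() ? latin1CharAt(index) : utf16CharAt(index);
    }

    private char latin1CharAt(int i) {
        return (char) (value[i] & 0xFF);
    }

    private char utf16CharAt(int i) {
        return (char) (((value[2 * i] & 0xFF) << 8) | (value[2 * i + 1] & 0xFF));
    }
}
```

With this model, new MiniString("abc") holds 3 bytes while new MiniString("a\u4e16") holds 4 bytes, and charAt() returns the same characters in both cases, so callers never see which encoding is in use.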

That's all for this topic, Compact Strings in Java. If you have any doubts or suggestions, please drop a comment. Thanks!
