How to Safely Truncate a String by Byte Length in Java (UTF-8 Solution)

By Errong Leng - March 26, 2026

In Java backend development, it’s common to enforce size limits on strings, such as:

Database column byte limits
API payload size constraints
Logging or message size restrictions

A typical approach is:


value.substring(0, n);

However, this is not safe when the limit is expressed in bytes: substring() counts Java chars (UTF-16 code units), not UTF-8 bytes.

The Core Problem: Characters ≠ Bytes

In UTF-8 encoding:

ASCII characters → 1 byte
Most Chinese characters → 3 bytes
Most emoji → 4 bytes (multi-code-point emoji sequences take more)

For example:


"hello" → 5 bytes
"你好" → 6 bytes
"😊" → 4 bytes

This means:

Truncating by character count does NOT guarantee byte size limits
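To make the gap concrete, here is a minimal sketch that compares String.length() (UTF-16 code units) with the UTF-8 encoded size for the three examples above. The class and helper names (ByteLengthDemo, utf8Length) are illustrative only, and the non-ASCII literals are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Hypothetical helper: number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "hello",          // five ASCII characters
            "\u4F60\u597D",   // the two Chinese characters from the example above
            "\uD83D\uDE0A"    // the smiling-face emoji, stored as a surrogate pair
        };
        for (String s : samples) {
            System.out.println("length()=" + s.length()
                    + ", code points=" + s.codePointCount(0, s.length())
                    + ", UTF-8 bytes=" + utf8Length(s));
        }
    }
}
```

Note that the emoji reports a length() of 2 but occupies 4 bytes, so neither the char count nor the code point count predicts the encoded size.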

---

Does Java Provide a Built-in Solution?

No. The Java standard library has no single method that truncates a string to a given byte length.

While Java offers low-level APIs such as:

CharsetEncoder
ByteBuffer
String.getBytes()

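None of them truncates in one call. String.getBytes() only gives you the encoded bytes, and while CharsetEncoder can encode into a fixed-size ByteBuffer and stop at a character boundary when the buffer fills, it takes several lines of setup and manual handling.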
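For comparison, the low-level route can be sketched as follows. This is an alternative to the byte-scanning utility shown later in this post, not the same method: it relies on the JDK's UTF-8 encoder checking the remaining buffer space before writing each character. The class and method names (EncoderTruncate, truncateWithEncoder) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncoderTruncate {
    // Hypothetical helper: longest prefix of value whose UTF-8 encoding fits in maxBytes.
    static String truncateWithEncoder(String value, int maxBytes) {
        if (value == null) return null;
        if (maxBytes <= 0) return "";
        // A UTF-16 code unit never needs more than 3 UTF-8 bytes, so this always fits.
        if ((long) value.length() * 3 <= maxBytes) return value;

        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        CharBuffer in = CharBuffer.wrap(value);

        // Encoding stops with OVERFLOW when the next character does not fit;
        // characters that did not fit are left unconsumed in the input buffer.
        encoder.encode(in, out, true);

        // in.position() is the number of chars that were fully encoded.
        return value.substring(0, in.position());
    }
}
```

Because the result is cut with substring() at the encoder's input position, a surrogate pair is either kept whole or dropped whole.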

---

Common Incorrect Approaches

❌ Approach 1: substring()


value.substring(0, 50);

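Problem: The limit counts UTF-16 code units, so 50 chars can encode to as many as 150 UTF-8 bytes. It can also split a surrogate pair and leave half an emoji behind.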

❌ Approach 2: Truncate byte[] directly


new String(bytes, 0, maxBytes, StandardCharsets.UTF_8);

Problems:

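May split a multi-byte character at the cut point
The decoder replaces the incomplete sequence with the replacement character U+FFFD (�)
U+FFFD itself encodes to 3 bytes, so the re-encoded string can exceed the limit again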
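The failure is easy to reproduce. The sketch below cuts the 6-byte encoding of the two-character Chinese example at 4 bytes, which lands one byte into the second character. NaiveByteCut and naiveCut are illustrative names.

```java
import java.nio.charset.StandardCharsets;

class NaiveByteCut {
    // Hypothetical helper reproducing Approach 2: cut the byte array, then decode.
    static String naiveCut(String value, int maxBytes) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int len = Math.min(maxBytes, bytes.length);
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes: E4 BD A0 | E5 A5 BD. A 4-byte cut keeps a dangling E5.
        String cut = naiveCut("\u4F60\u597D", 4);
        // The dangling lead byte decodes to U+FFFD.
        System.out.println("ends with U+FFFD: " + cut.endsWith("\uFFFD"));
        // Re-encoding yields 3 bytes for the first character plus 3 for U+FFFD.
        System.out.println("re-encoded bytes: " + cut.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

So the "truncated" value is both corrupted and, once written back out as UTF-8, larger than the 4-byte limit it was cut to.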

---

Correct Solution: Safe UTF-8 Byte Truncation

The correct approach must:

Respect the byte limit
Preserve valid UTF-8 encoding
Avoid cutting characters in half

Java Implementation


// Requires: import java.nio.charset.StandardCharsets;
public static String truncateUtf8(String value, int maxBytes) {
    if (value == null) return null;
    if (maxBytes <= 0) return "";

    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    if (bytes.length <= maxBytes) return value;

    int len = maxBytes;

    // Step 1: find the lead byte of the last character that starts within the limit.
    // Continuation bytes have the form 10xxxxxx, so step back over them.
    int start = maxBytes - 1;
    while (start > 0 && (bytes[start] & 0xC0) == 0x80) {
        start--;
    }

    // Step 2: determine that character's byte length from its lead byte
    int firstByte = bytes[start] & 0xFF;
    int charLength;

    if ((firstByte & 0x80) == 0x00) {
        charLength = 1;   // 0xxxxxxx: ASCII
    } else if ((firstByte & 0xE0) == 0xC0) {
        charLength = 2;   // 110xxxxx
    } else if ((firstByte & 0xF0) == 0xE0) {
        charLength = 3;   // 1110xxxx
    } else if ((firstByte & 0xF8) == 0xF0) {
        charLength = 4;   // 11110xxx
    } else {
        // Not a valid lead byte (should not happen for getBytes output): drop it
        return new String(bytes, 0, start, StandardCharsets.UTF_8);
    }

    // Step 3: if the full character does not fit, cut just before it
    if (start + charLength > maxBytes) {
        len = start;
    }

    return new String(bytes, 0, len, StandardCharsets.UTF_8);
}

---

Why This Works

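Detects UTF-8 character boundaries: continuation bytes always have the bit pattern 10xxxxxx, so stepping back over them finds a lead byte
Avoids partial multi-byte characters: the lead byte encodes the character's length, which shows whether the whole character fits
Ensures valid output within byte limits: the cut always lands at or before maxBytes, on a character boundary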
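The boundary test itself is a single mask-and-compare. Here is a small sketch, assuming well-formed UTF-8 input such as the output of String.getBytes(); Utf8Boundary, isContinuation and isBoundary are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8Boundary {
    // Continuation bytes have the bit pattern 10xxxxxx.
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // True if cutting just before utf8[index] keeps every character whole.
    // Assumes utf8 holds well-formed UTF-8.
    static boolean isBoundary(byte[] utf8, int index) {
        return index <= 0 || index >= utf8.length || !isContinuation(utf8[index]);
    }

    public static void main(String[] args) {
        // 'a' followed by one 3-byte Chinese character: 61 E4 BD A0
        byte[] bytes = "a\u4F60".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i <= bytes.length; i++) {
            System.out.println("cut at " + i + " -> boundary=" + isBoundary(bytes, i));
        }
    }
}
```

For this input, cuts at 0, 1 and 4 are boundaries, while cuts at 2 and 3 would land inside the 3-byte character.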

---

Example Use Cases

Preventing database insert errors (value too long for column)
Limiting API request/response payload size
Safe logging in distributed systems

---

Conclusion

Java has no single built-in call that safely truncates a string by byte length, so UTF-8 character boundaries have to be handled explicitly.

Character length ≠ byte length
UTF-8 requires boundary-aware handling
A small boundary-aware utility prevents oversized or corrupted values in backend systems

If your application handles international text or strict byte limits, this method is a must-have.
