How to Safely Truncate a Java String by Byte Length (A Complete UTF-8 Solution)

By Errong Leng - March 26, 2026


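In Java development we often need to cap the length of a string, for example when:

a database column has a byte-length limit
an API parameter has a size limit
a log entry or message has to be truncated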

Many people's first instinct is:


value.substring(0, n)

But once the limit is measured in UTF-8 bytes, this approach is not safe.

The core problem: character length ≠ byte length

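In UTF-8:

ASCII characters such as English letters → 1 byte
Accented Latin letters (é), Greek, Cyrillic → 2 bytes
Common Chinese characters → 3 bytes
Emoji → 4 bytes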

For example:


"hello" → 5 bytes
"你好" → 6 bytes
"😊" → 4 bytes

Therefore:

substring cuts by character (strictly, by UTF-16 code unit), so it cannot control the byte size
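The numbers above are easy to check. The sketch below (the class name `ByteLengthDemo` and its helper `utf8Bytes` are only for illustration) prints, for each sample, `String.length()` (UTF-16 code units, which is what `substring` indexes), the number of Unicode code points, and the UTF-8 byte count:

```java
import java.nio.charset.StandardCharsets;

final class ByteLengthDemo {
    // Number of bytes the string occupies when encoded as UTF-8
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "你好", "😊"};
        for (String s : samples) {
            int units = s.length();                            // UTF-16 code units
            int codePoints = s.codePointCount(0, s.length());  // Unicode characters
            System.out.println(s + ": length()=" + units
                    + ", codePoints=" + codePoints
                    + ", utf8Bytes=" + utf8Bytes(s));
        }
    }
}
```

For `"你好"` this reports a length of 2 but 6 bytes; for `"😊"` it reports `length()` 2 (the emoji is a surrogate pair in Java), a single code point, and 4 bytes.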

---

Does the Java standard library have a ready-made method?

The answer: there is no method you can call directly.

Java does provide:

CharsetEncoder
ByteBuffer
String.getBytes()

but all of them require manual handling and extra code, so none of them is a drop-in, one-call solution.
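For comparison, here is roughly what "manual handling" with `CharsetEncoder` and `ByteBuffer` can look like: encode into a buffer of exactly `maxBytes` bytes and stop when it overflows. This is a sketch, not the method this article recommends. It relies on the JDK's UTF-8 encoder consuming a character (including both halves of a surrogate pair) only when its complete encoding fits, so the input position after an overflow marks a safe cut. `EncoderTruncate` and `truncateWithEncoder` are hypothetical names:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncoderTruncate {
    // Hypothetical helper: longest prefix whose UTF-8 encoding fits in maxBytes
    static String truncateWithEncoder(String value, int maxBytes) {
        if (value == null) return null;
        if (maxBytes < 0) throw new IllegalArgumentException("maxBytes must be >= 0");

        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                // Like String.getBytes: replace lone surrogates instead of failing
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(value);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);

        // Encodes as much as fits; on OVERFLOW the input position stays
        // before the first character whose bytes did not fit
        encoder.encode(in, out, true);
        return value.substring(0, in.position());
    }

    public static void main(String[] args) {
        System.out.println(truncateWithEncoder("ab你好", 4)); // -> "ab" (你 would need 3 more bytes)
        System.out.println(truncateWithEncoder("ab你好", 5)); // -> "ab你"
    }
}
```

The code is short, but it does show why there is no single call: the encoder, the buffers, and the error actions all have to be wired up by hand.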

---

Wrong approaches (common pitfalls)

❌ Approach 1: substring


value.substring(0, 50);

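Problem: the result can exceed the byte limit. Fifty Chinese characters, for example, take 150 bytes. And because substring counts UTF-16 code units, it can also split an emoji's surrogate pair in half.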
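A small demonstration of both problems with `substring`. The strings, the 6-byte limit, and the class name `SubstringPitfall` are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

final class SubstringPitfall {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Pitfall 1: a 6-"character" cut of Chinese text is 18 bytes,
        // three times a 6-byte limit
        String cn = "你好世界你好世界";
        String cut = cn.substring(0, 6);
        System.out.println(utf8Bytes(cut));                              // 18

        // Pitfall 2: substring counts UTF-16 code units, so it can split an emoji
        String mixed = "a😊";
        String half = mixed.substring(0, 2);                             // "a" + a lone high surrogate
        System.out.println(Character.isHighSurrogate(half.charAt(1)));  // true
    }
}
```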

❌ Approach 2: truncating the byte[] directly


new String(bytes, 0, maxBytes, UTF_8);

Problem:

It can cut a multi-byte character in half
The broken tail decodes as garbage (�)
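Approach 2 in action. Besides the garbled tail there is a second, less obvious problem: the replacement character U+FFFD that the decoder inserts is itself 3 bytes in UTF-8, so the "truncated" string can end up longer than the limit. `ByteCutPitfall` and `naiveCut` are illustrative names:

```java
import java.nio.charset.StandardCharsets;

final class ByteCutPitfall {
    // The naive approach: cut the byte array at maxBytes and decode it
    static String naiveCut(String value, int maxBytes) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, 0, Math.min(maxBytes, bytes.length), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "你好" is E4 BD A0 E5 A5 BD; a 4-byte cut keeps only the first byte of 好
        String broken = naiveCut("你好", 4);
        // The dangling E5 is decoded as the replacement character U+FFFD
        System.out.println(broken.equals("你\uFFFD"));                         // true
        // U+FFFD is 3 bytes in UTF-8, so the result is now 6 bytes, over the 4-byte limit
        System.out.println(broken.getBytes(StandardCharsets.UTF_8).length);  // 6
    }
}
```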

---

Correct implementation: UTF-8-safe truncation by bytes


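import java.nio.charset.StandardCharsets;

public static String truncateUtf8(String value, int maxBytes) {
    if (value == null) return null;
    if (maxBytes < 0) throw new IllegalArgumentException("maxBytes must be >= 0");

    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    if (bytes.length <= maxBytes) return value;
    if (maxBytes == 0) return "";

    // Start at the last byte we are allowed to keep (index maxBytes - 1)
    // and walk back over continuation bytes (10xxxxxx) to the lead byte
    // of the character that byte belongs to
    int start = maxBytes - 1;
    while (start > 0 && (bytes[start] & 0xC0) == 0x80) {
        start--;
    }

    // The lead byte tells how many bytes the character occupies
    int firstByte = bytes[start] & 0xFF;
    int charLength;

    if ((firstByte & 0x80) == 0x00) {        // 0xxxxxxx: 1 byte (ASCII)
        charLength = 1;
    } else if ((firstByte & 0xE0) == 0xC0) { // 110xxxxx: 2 bytes
        charLength = 2;
    } else if ((firstByte & 0xF0) == 0xE0) { // 1110xxxx: 3 bytes (e.g. common Chinese characters)
        charLength = 3;
    } else if ((firstByte & 0xF8) == 0xF0) { // 11110xxx: 4 bytes (e.g. emoji)
        charLength = 4;
    } else {
        // Not a valid lead byte (getBytes(UTF_8) never produces one): cut before it
        return new String(bytes, 0, start, StandardCharsets.UTF_8);
    }

    // Keep the last character only if all of its bytes fit within maxBytes;
    // otherwise cut right before its lead byte
    int len = (start + charLength <= maxBytes) ? maxBytes : start;

    return new String(bytes, 0, len, StandardCharsets.UTF_8);
}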
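If you would rather not encode the whole string up front (for very long inputs, say), the same budget can be enforced by walking code points and adding up each one's UTF-8 length. This is an alternative sketch, not the byte-array method above, and the names are hypothetical. One assumption to note: lone surrogates are counted as 3 bytes here, while `String.getBytes(UTF_8)` writes them as a single `?`, so they are over-counted, which can only make the result shorter, never longer than the limit.

```java
final class CodePointTruncate {
    // UTF-8 length of one code point (lone surrogates land in the 3-byte branch)
    static int utf8Length(int cp) {
        if (cp < 0x80) return 1;
        if (cp < 0x800) return 2;
        if (cp < 0x10000) return 3;
        return 4;
    }

    // Longest prefix of value whose UTF-8 encoding fits in maxBytes, without building a byte[]
    static String truncateUtf8ByCodePoint(String value, int maxBytes) {
        if (value == null) return null;
        if (maxBytes < 0) throw new IllegalArgumentException("maxBytes must be >= 0");

        int used = 0; // UTF-8 bytes consumed so far
        int i = 0;    // char index just past the last kept code point
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            int n = utf8Length(cp);
            if (used + n > maxBytes) break;   // the next character would not fit
            used += n;
            i += Character.charCount(cp);     // 2 for a surrogate pair, otherwise 1
        }
        return value.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(truncateUtf8ByCodePoint("ab你好", 4)); // -> "ab"
        System.out.println(truncateUtf8ByCodePoint("ab你好", 5)); // -> "ab你"
    }
}
```

For strings without lone surrogates this keeps the same prefix as the byte-array approach; it simply trades one up-front `getBytes` allocation for a per-character loop.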

---

What does this method solve?

✅ Never exceeds the byte limit
✅ Never splits a multi-byte character
✅ Produces no garbled characters
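These three properties can be checked mechanically, which makes a handy unit test for whichever implementation you end up using. A sketch of a hypothetical checker (`isValidTruncation` is not a library method); the last check, that nothing was dropped needlessly, is an optional extra beyond the three points above:

```java
import java.nio.charset.StandardCharsets;

final class TruncationCheck {
    // True if `truncated` is an acceptable byte-limited truncation of `original`
    static boolean isValidTruncation(String original, String truncated, int maxBytes) {
        // No garbled characters: the result must be an unchanged prefix of the input
        if (!original.startsWith(truncated)) return false;

        // Byte limit: the UTF-8 encoding must fit
        if (truncated.getBytes(StandardCharsets.UTF_8).length > maxBytes) return false;

        // No split characters: the cut must not fall between the halves of a surrogate pair
        int n = truncated.length();
        if (n > 0 && n < original.length()
                && Character.isHighSurrogate(original.charAt(n - 1))
                && Character.isLowSurrogate(original.charAt(n))) {
            return false;
        }

        // Optional: if something was cut, the next code point really would not have fit
        if (n < original.length()) {
            int next = original.codePointAt(n);
            String longer = original.substring(0, n + Character.charCount(next));
            if (longer.getBytes(StandardCharsets.UTF_8).length <= maxBytes) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidTruncation("你好", "你", 4));       // true
        System.out.println(isValidTruncation("你好", "你\uFFFD", 4)); // false: garbled tail
        System.out.println(isValidTruncation("😊", "\uD83D", 4));     // false: split surrogate pair
    }
}
```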

---

Summary

Java has no built-in method for "safely truncating a string by bytes", so you have to write one yourself.

Character length ≠ byte length
UTF-8 must be cut on character boundaries
Production systems must take encoding into account

If your system deals with databases, internationalization, or API size limits, this method is close to essential.

