I’m excited to share a set of mobile apps I’ve recently completed and published on both the Google Play Store and the Apple App Store. These apps are designed with a simple goal in mind: to make meaningful, structured content more accessible, whether you’re studying theology or improving your English vocabulary. 📱 Now Available on Both Platforms All apps are live and available for download: Google Play Developer Page: https://play.google.com/store/apps/dev?id=5835943159853189043 Apple App Store Developer Page: https://apps.apple.com/ca/developer/q-z-l-corp/id1888794100 📖 Theology & Confession Study Apps For those interested in Reformed theology and classical Christian teachings, I’ve developed a series of apps that present foundational texts in a clean, focused reading format: The Belgic Confession Canons of Dort Heidelberg Catechism Westminster Shorter Catechism Each app is designed to provide a distraction-free experience, making it easier to read, reflect, and revisit these im...
解决数据库“value too long for column”错误:Java中基于字节的UTF-8字符串截断
- Get link
- X
- Other Apps
By
Errong Leng
-
解决数据库“value too long for column”错误:Java中基于字节的UTF-8字符串截断
如果你在 Java 后端开发中遇到过 “value too long for column” 的数据库错误,那么问题的根本原因可能并不像表面看起来那么简单。
本文将通过一个真实的生产事故,讲清楚 UTF-8 编码带来的坑,以及为什么字符串长度不等于字节长度,并提供一个安全按字节截断字符串的正确实现方式。
生产问题复盘
我们在生产环境遇到如下错误:
value too long for column
- 数据库字段限制:50 bytes
- 输入字符串长度:53 个字符
- 代码已做处理:截断为 50 个字符
看起来完全没问题,但插入数据库时依然失败。
根本原因:UTF-8 编码
字符数 ≠ 字节数
在 UTF-8 编码中:
- 英文字符(ASCII)→ 1 字节
- 中文字符 → 通常 3 字节
- Emoji → 4 字节
因此,一个 50 个字符的字符串,很可能超过 50 字节。
错误的实现方式
常见代码如下:
if (value.length() > 50) {
value = value.substring(0, 50);
}
这只能保证字符数,而无法保证字节数。
更严重的是,如果直接按字节截断,还可能:
- 截断多字节字符
- 生成非法 UTF-8 字符串
- 出现乱码或替换字符(�)
正确解决方案:按字节安全截断
正确做法必须满足:
- 遵守数据库的字节限制
- 保证结果是合法 UTF-8 字符串
- 不能截断一个字符的一部分
Java 实现(UTF-8安全截断)
public static String truncateUtf8(String value, int maxBytes) {
if (value == null) return null;
byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
if (bytes.length <= maxBytes) return value;
int len = maxBytes;
// 第一步:找到最后一个字符的起始位置
int start = len;
while (start > 0 && (bytes[start - 1] & 0xC0) == 0x80) {
start--;
}
// 第二步:判断该字符占用的字节数
int firstByte = bytes[start] & 0xFF;
int charLength;
if ((firstByte & 0x80) == 0x00) {
charLength = 1;
} else if ((firstByte & 0xE0) == 0xC0) {
charLength = 2;
} else if ((firstByte & 0xF0) == 0xE0) {
charLength = 3;
} else if ((firstByte & 0xF8) == 0xF0) {
charLength = 4;
} else {
return new String(bytes, 0, start, StandardCharsets.UTF_8);
}
// 第三步:确保字符完整
if (start + charLength > maxBytes) {
len = start;
}
return new String(bytes, 0, len, StandardCharsets.UTF_8);
}
为什么这个方法是正确的?
- 识别 UTF-8 字符边界
- 避免截断多字节字符
- 保证结果合法且不超过数据库限制
JUnit 测试用例
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;
import java.nio.charset.StandardCharsets;
public class Utf8TruncateTest {
@Test
void asciiShouldPass() {
String input = "abcdefghijklmnopqrstuvwxyz1234567890";
String result = truncateUtf8(input, 50);
assertEquals(input, result);
}
@Test
void asciiShouldTruncate() {
String input = "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ";
String result = truncateUtf8(input, 50);
assertTrue(result.getBytes(StandardCharsets.UTF_8).length <= 50);
}
@Test
void chineseCharactersShouldNotBreak() {
String input = "汉汉汉汉汉汉汉汉汉汉汉汉汉汉汉汉汉汉汉汉";
String result = truncateUtf8(input, 50);
assertTrue(result.getBytes(StandardCharsets.UTF_8).length <= 50);
assertDoesNotThrow(() -> result.getBytes(StandardCharsets.UTF_8));
}
@Test
void emojiShouldNotBreak() {
String input = "hello😊world😊test😊";
String result = truncateUtf8(input, 15);
assertTrue(result.getBytes(StandardCharsets.UTF_8).length <= 15);
}
@Test
void boundaryEdgeCaseFailsNaive() {
String input = "aaaaaaa汉";
String result = truncateUtf8(input, 9);
assertTrue(result.getBytes(StandardCharsets.UTF_8).length <= 9);
assertFalse(result.contains("�"));
}
}
关键总结
- 数据库字段限制通常是字节,不是字符
- UTF-8 编码会导致字符长度不等于字节长度
- 处理字符串时必须考虑编码问题
结论
如果数据库限制按字节,而代码按字符处理,那么生产事故只是时间问题。
在支持多语言的现代系统中,正确处理 UTF-8 是后端开发的基本功。
❤️ Support This Blog
If this post helped you, you can support my writing with a small donation. Thank you for reading.
Popular Posts
2026 Begins: Choosing to Stay on the Path as a Blogger
By
Errong Leng
-
Today is January 1st, 2026 . Before rushing into new goals, I spent some time looking back at the past 12 months of data for this blog — the quiet numbers that tell a very honest story. Looking Back at the Numbers Over the past year: 📈 Total views: ~44,600 👀 Daily average visits: ~200 💬 Comments: 19 💰 AdSense income: about $0.01 per month At first glance, these numbers may look small — especially the income. But when I zoom out, I see something more meaningful. This blog has been quietly read every single day . No viral posts. No aggressive promotion. No paid traffic. Just consistent readers arriving through search, bookmarks, and curiosity. That consistency matters. What the Traffic Graph Tells Me The graph over the last 12 months shows something important: Early months were quiet and uneven Mid-year brought steady growth Toward the end of the year, traffic became more stable , with clear spikes when certain posts resonated This tell...
Cross compile tensorflow for armv7l targets via bazel
By
三好Daddy
-
Health Checks and Scaling Strategies for Next.js in Kubernetes
By
Errong Leng
-
Health Checks and Scaling Strategies for Next.js in Kubernetes This is Part 6 and the final post of the series: Self-Hosting Next.js in Kubernetes (Without Vercel) . At this point, your Next.js standalone app: Builds cleanly Runs in a minimal Docker image Deploys correctly on Kubernetes / OpenShift Serves static assets properly Uses runtime configuration and secrets Now let’s make it resilient and scalable . Why Health Checks Matter Kubernetes relies on health checks to: Know when a pod is ready to receive traffic Restart unhealthy containers Safely roll out new versions Without proper probes, traffic can be sent to a pod that isn’t ready yet. Readiness Probe A readiness probe tells Kubernetes: “This pod can accept traffic.” For most Next.js apps, the root path works well: readinessProbe: httpGet: path: / port: 3000 initialDelaySeconds: 10 periodSeconds: 5 If your app depends on downstream services (APIs, d...
Comments
Post a Comment