How to Optimize a Slow JPA Query Processing 40K+ IDs (Real Performance Tuning Guide)

Handling large datasets efficiently is one of the most common challenges in backend engineering. In this post, I’ll walk through how a query processing 40,000+ IDs went from ~24 seconds down to a few seconds using practical optimization techniques.

All domain-specific details have been intentionally removed to focus purely on reusable engineering patterns.

📉 The Problem

A service fetches data using a JPA query with a large IN clause and joins between tables:

SELECT new SomeDto(
    A.primaryId,
    A.field1,
    A.field2,
    B.field3,
    C.metric1,
    C.metric2
)
FROM EntityA A
LEFT JOIN A.relatedEntity B
LEFT JOIN EntityC C ON C.refId = A.primaryId
WHERE A.primaryId IN :ids

Even with chunking (~900 IDs per query), total execution time was around 24 seconds.

🚨 Root Causes

Unoptimized JOIN conditions
Missing database indexes
Sequential query execution
ORM (JPA) overhead
Too many round trips to the database
Potential row multiplication from joins

🚀 Key Optimizations

1. Add Indexes on Join and Filter Columns

Indexes are the single biggest performance lever for large queries.

CREATE INDEX idx_a_primary_id ON TABLE_A (PRIMARY_ID);
CREATE INDEX idx_c_ref_id ON TABLE_C (REF_ID);

✅ Without indexes, the database performs full table scans for each chunk.

2. Avoid Non-Relational Joins

This pattern:

LEFT JOIN EntityC C ON C.refId = A.primaryId

can lead to inefficient execution because it's not using a mapped relationship.

Better approach:

@OneToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "REF_ID", referencedColumnName = "PRIMARY_ID")
private EntityC relatedData;

LEFT JOIN A.relatedData C

✅ Improves SQL generation and execution plan.

3. Use Native SQL for Heavy Queries

JPA constructor projections introduce overhead. Switching to native SQL often improves performance:

SELECT 
    A.PRIMARY_ID,
    A.FIELD_1,
    A.FIELD_2,
    B.FIELD_3,
    C.METRIC_1,
    C.METRIC_2
FROM TABLE_A A
LEFT JOIN TABLE_B B 
    ON B.FK_ID = A.FK_ID
LEFT JOIN TABLE_C C 
    ON C.REF_ID = A.PRIMARY_ID
WHERE A.PRIMARY_ID IN (:ids)

✅ Benefits:

Less ORM overhead
Better control over execution plans
Faster query execution

4. Parallelize Query Execution

Original logic executed chunks sequentially:

for (...) {
    callDatabase();
}

Improved version using parallel streams:

return IntStream.range(0, (ids.size() + 899) / 900)
    .parallel()
    .mapToObj(i -> {
        int start = i * 900;
        int end = Math.min(start + 900, ids.size());
        return repository.fetch(ids.subList(start, end));
    })
    .flatMap(List::stream)
    .toList();

✅ Reduces total execution time significantly

Note: Ensure your connection pool can handle concurrency.

5. Reduce Round Trips

Chunking is necessary due to database limits on IN clauses, but:

Use the largest safe chunk size (e.g., ~1000)
Avoid overly small chunks

✅ Fewer queries = better performance

6. Prevent Row Explosion

If joined tables contain multiple rows per key, results can multiply unexpectedly.

Fix: aggregate before joining

SELECT 
    REF_ID,
    SUM(METRIC_1) AS METRIC_1,
    SUM(METRIC_2) AS METRIC_2
FROM TABLE_C
GROUP BY REF_ID

✅ Reduces data volume and improves performance

7. Use Lightweight Projections

Instead of constructing DTOs, use projection interfaces:

public interface DataView {
    String getPrimaryId();
    BigDecimal getMetric1();
}

✅ Reduces object creation overhead

📊 Results

Execution time reduced from ~24s → ~3–6s
Lower database load
Improved scalability

✅ Optimization Checklist

✔ Index all join and filter columns
✔ Prefer native SQL for heavy workloads
✔ Avoid non-mapped joins
✔ Parallelize large operations
✔ Tune chunk sizes
✔ Analyze query execution plans

🧠 Final Thoughts

Performance issues in data-heavy applications are often not caused by business logic, but by how queries are structured and executed.

Understanding the interaction between ORM frameworks and the database is critical to building efficient systems.

When dealing with large datasets: optimize your queries first, then your code.

💬 Discussion

Have you run into similar performance challenges? What strategies worked for you?

❤️ Support This Blog

If this post helped you, you can support my writing with a small donation. Thank you for reading.

Coffee ☕ Thanks 🙏 Extra Support ❤️

Search This Blog

Faith & Code | Reflections by Errong

A New Collection of Thoughtful Learning Apps — Now Available on iOS & Android