A New Collection of Thoughtful Learning Apps — Now Available on iOS & Android

Image
I’m excited to share a set of mobile apps I’ve recently completed and published on both the Google Play Store and the Apple App Store. These apps are designed with a simple goal in mind: to make meaningful, structured content more accessible, whether you’re studying theology or improving your English vocabulary. 📱 Now Available on Both Platforms All apps are live and available for download: Google Play Developer Page: https://play.google.com/store/apps/dev?id=5835943159853189043 Apple App Store Developer Page: https://apps.apple.com/ca/developer/q-z-l-corp/id1888794100 📖 Theology & Confession Study Apps For those interested in Reformed theology and classical Christian teachings, I’ve developed a series of apps that present foundational texts in a clean, focused reading format: The Belgic Confession Canons of Dort Heidelberg Catechism Westminster Shorter Catechism Each app is designed to provide a distraction-free experience, making it easier to read, reflect, and revisit these im...

How to Optimize a Slow JPA Query Processing 40K+ IDs (Real Performance Tuning Guide)

How to Optimize a Slow JPA Query Processing 40K+ IDs (Real Performance Tuning Guide)

Handling large datasets efficiently is one of the most common challenges in backend engineering. In this post, I’ll walk through how a query processing 40,000+ IDs went from ~24 seconds down to a few seconds using practical optimization techniques.

All domain-specific details have been intentionally removed to focus purely on reusable engineering patterns.


📉 The Problem

A service fetches data using a JPA query with a large IN clause and joins between tables:

SELECT new SomeDto(
    A.primaryId,
    A.field1,
    A.field2,
    B.field3,
    C.metric1,
    C.metric2
)
FROM EntityA A
LEFT JOIN A.relatedEntity B
LEFT JOIN EntityC C ON C.refId = A.primaryId
WHERE A.primaryId IN :ids

Even with chunking (~900 IDs per query), total execution time was around 24 seconds.


🚨 Root Causes

  • Unoptimized JOIN conditions
  • Missing database indexes
  • Sequential query execution
  • ORM (JPA) overhead
  • Too many round trips to the database
  • Potential row multiplication from joins

🚀 Key Optimizations

1. Add Indexes on Join and Filter Columns

Indexes are the single biggest performance lever for large queries.

CREATE INDEX idx_a_primary_id ON TABLE_A (PRIMARY_ID);
CREATE INDEX idx_c_ref_id ON TABLE_C (REF_ID);

✅ Without indexes, the database performs full table scans for each chunk.


2. Avoid Non-Relational Joins

This pattern:

LEFT JOIN EntityC C ON C.refId = A.primaryId

can lead to inefficient execution because it's not using a mapped relationship.

Better approach:

@OneToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "REF_ID", referencedColumnName = "PRIMARY_ID")
private EntityC relatedData;
LEFT JOIN A.relatedData C

✅ Improves SQL generation and execution plan.


3. Use Native SQL for Heavy Queries

JPA constructor projections introduce overhead. Switching to native SQL often improves performance:

SELECT 
    A.PRIMARY_ID,
    A.FIELD_1,
    A.FIELD_2,
    B.FIELD_3,
    C.METRIC_1,
    C.METRIC_2
FROM TABLE_A A
LEFT JOIN TABLE_B B 
    ON B.FK_ID = A.FK_ID
LEFT JOIN TABLE_C C 
    ON C.REF_ID = A.PRIMARY_ID
WHERE A.PRIMARY_ID IN (:ids)

✅ Benefits:

  • Less ORM overhead
  • Better control over execution plans
  • Faster query execution

4. Parallelize Query Execution

Original logic executed chunks sequentially:

for (...) {
    callDatabase();
}

Improved version using parallel streams:

return IntStream.range(0, (ids.size() + 899) / 900)
    .parallel()
    .mapToObj(i -> {
        int start = i * 900;
        int end = Math.min(start + 900, ids.size());
        return repository.fetch(ids.subList(start, end));
    })
    .flatMap(List::stream)
    .toList();

✅ Reduces total execution time significantly

Note: Ensure your connection pool can handle concurrency.


5. Reduce Round Trips

Chunking is necessary due to database limits on IN clauses, but:

  • Use the largest safe chunk size (e.g., ~1000)
  • Avoid overly small chunks

✅ Fewer queries = better performance


6. Prevent Row Explosion

If joined tables contain multiple rows per key, results can multiply unexpectedly.

Fix: aggregate before joining

SELECT 
    REF_ID,
    SUM(METRIC_1) AS METRIC_1,
    SUM(METRIC_2) AS METRIC_2
FROM TABLE_C
GROUP BY REF_ID

✅ Reduces data volume and improves performance


7. Use Lightweight Projections

Instead of constructing DTOs, use projection interfaces:

public interface DataView {
    String getPrimaryId();
    BigDecimal getMetric1();
}

✅ Reduces object creation overhead


📊 Results

  • Execution time reduced from ~24s → ~3–6s
  • Lower database load
  • Improved scalability

✅ Optimization Checklist

  • ✔ Index all join and filter columns
  • ✔ Prefer native SQL for heavy workloads
  • ✔ Avoid non-mapped joins
  • ✔ Parallelize large operations
  • ✔ Tune chunk sizes
  • ✔ Analyze query execution plans

🧠 Final Thoughts

Performance issues in data-heavy applications are often not caused by business logic, but by how queries are structured and executed.

Understanding the interaction between ORM frameworks and the database is critical to building efficient systems.

When dealing with large datasets: optimize your queries first, then your code.


💬 Discussion

Have you run into similar performance challenges? What strategies worked for you?

❤️ Support This Blog


If this post helped you, you can support my writing with a small donation. Thank you for reading.


Comments

Popular Posts

A New Collection of Thoughtful Learning Apps — Now Available on iOS & Android

Swagger annotations for API that allows downloading files as zip