// ORM · entity lifecycle · N+1 · caching · JPQL · transactions · senior → principal
JPA (Jakarta Persistence API) is a specification (jakarta.persistence.*) defining how Java objects map to relational databases: entity mapping, JPQL, the EntityManager lifecycle, and transaction handling.
Hibernate is the dominant JPA implementation. It translates JPA calls into SQL, manages the Session, handles caching and dirty checking, and provides many extensions beyond the spec (HQL, @BatchSize, Envers, spatial types). The Criteria API itself is part of the JPA standard.
Spring Data JPA adds a repository abstraction on top of JPA — generating query implementations from method names (findByEmailAndStatus(...)) and reducing boilerplate EntityManager code.
In practice: code to JPA interfaces; use Hibernate extensions only when JPA can't express what you need, accepting the provider lock-in that entails.
Transient: new, unknown to Hibernate, no DB row.
Persistent (Managed) — associated with an open Session. All changes are tracked automatically via dirty checking; Hibernate flushes them to the DB at commit.
Detached — Session was closed or evict() called. Changes are no longer tracked. Accessing LAZY associations throws LazyInitializationException. Re-attach via merge() to create a new persistent copy.
Removed — remove() called on a persistent entity; the DB row will be deleted on flush/commit.
@PersistenceContext EntityManager em injects a thread-safe proxy to the current transaction-scoped EntityManager.
Default fetch types:
- @ManyToOne → EAGER
- @OneToOne → EAGER
- @OneToMany → LAZY
- @ManyToMany → LAZY
@ManyToOne being EAGER by default is a common trap: loading 100 orders automatically JOINs the customer table every time, even when it's not needed.
The N+1 problem: loading N entities then accessing a LAZY collection per entity fires N additional SELECTs. Fix with JOIN FETCH in JPQL, @BatchSize, or @EntityGraph. Never "fix" globally with EAGER — it causes over-fetching on all queries.
Second-level (L2) cache: opt-in and shared across Sessions. Annotate entities with @Cache(usage=READ_WRITE) and configure a provider (EhCache, Caffeine, Hazelcast, Redis). Ideal for reference data that changes rarely. Risk: stale entries when another node or a direct JDBC write changes the DB without going through Hibernate, so multi-node deployments need distributed invalidation.
Query cache — caches result-set identifiers for JPQL queries. Only useful with L2 cache; rarely worth the complexity.
@Transactional opens a transaction at method entry and commits or rolls back at exit via AOP proxy.
Propagation behaviours:
- REQUIRED (default): join an existing transaction or start a new one
- REQUIRES_NEW: always start a new transaction, suspending any outer one
- SUPPORTS: use a transaction if one exists; otherwise run non-transactionally
Rollback rules: by default Spring rolls back on unchecked exceptions (RuntimeException and Error) and commits on checked exceptions. Use rollbackFor = Exception.class to include checked exceptions.
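A minimal sketch of the default decision, assuming it matches Spring's DefaultTransactionAttribute.rollbackOn (RuntimeException or Error rolls back). RollbackRules and its methods are hypothetical names, and nothing here calls Spring; the second method is a simplified view of what rollbackFor = Exception.class changes.

```java
import java.io.IOException;

// Hypothetical helper mirroring Spring's default rollback rule; no Spring code involved.
class RollbackRules {
    // Default: unchecked exceptions and Errors roll back, checked exceptions commit.
    static boolean defaultRollsBack(Throwable ex) {
        return ex instanceof RuntimeException || ex instanceof Error;
    }

    // Simplified effect of rollbackFor = Exception.class: checked exceptions roll back too.
    static boolean rollsBackWithRollbackForException(Throwable ex) {
        return defaultRollsBack(ex) || ex instanceof Exception;
    }

    public static void main(String[] args) {
        System.out.println(defaultRollsBack(new IllegalStateException("boom")));        // true
        System.out.println(defaultRollsBack(new IOException("disk")));                  // false: commits
        System.out.println(rollsBackWithRollbackForException(new IOException("disk"))); // true
    }
}
```

The practical consequence: a service method that throws a checked exception after writing data will commit those writes unless the rollback rule is widened.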
Self-invocation trap: calling a @Transactional method from another method in the same bean bypasses the AOP proxy — the transaction annotation has no effect.
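The trap is easiest to see with a hand-rolled proxy. This is a stdlib-only sketch, assuming a JDK dynamic proxy as a stand-in for Spring's AOP proxy (CGLIB-based proxies also delegate to a separate target object, so self-calls skip them the same way). PaymentService, TxProxyDemo and the BEGIN/COMMIT log entries are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface PaymentService {
    void checkout();
    void recordPayment(); // imagine @Transactional(propagation = REQUIRES_NEW) here
}

class PaymentServiceImpl implements PaymentService {
    private final List<String> log;
    PaymentServiceImpl(List<String> log) { this.log = log; }

    @Override public void checkout() {
        log.add("checkout body");
        recordPayment(); // this.recordPayment(): a plain call on the target, not via the proxy
    }

    @Override public void recordPayment() { log.add("recordPayment body"); }
}

class TxProxyDemo {
    // Hypothetical "transaction advice": wraps every call that goes through the proxy.
    static PaymentService wrap(PaymentService target, List<String> log) {
        InvocationHandler advice = (proxy, method, args) -> {
            log.add("BEGIN " + method.getName());
            Object result = method.invoke(target, args);
            log.add("COMMIT " + method.getName());
            return result;
        };
        return (PaymentService) Proxy.newProxyInstance(
                PaymentService.class.getClassLoader(), new Class<?>[] {PaymentService.class}, advice);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        PaymentService service = wrap(new PaymentServiceImpl(log), log);
        service.checkout();
        System.out.println(log); // recordPayment ran inside checkout's "transaction", with no advice of its own
    }
}
```

Only checkout is wrapped; recordPayment never gets its own BEGIN/COMMIT. The usual fixes are to move the method to another bean or call it through the injected proxy.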
Optimistic locking (@Version) prevents lost updates without DB locks. Hibernate adds WHERE version = ? to every UPDATE. If another transaction committed first, 0 rows are updated → OptimisticLockException. The caller must catch and retry. Best for low write-contention workloads.
Pessimistic locking — em.find(Entity.class, id, PESSIMISTIC_WRITE) issues SELECT ... FOR UPDATE, serialising access at the DB level. Correct for high-contention scenarios (inventory decrement, seat booking) but reduces throughput and can cause deadlocks if not applied consistently.
Never rely on application-level if (stock > 0) checks — a concurrent transaction can pass the check and commit between your read and write.
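To make the race concrete, here is a stdlib-only sketch, assuming a hypothetical in-memory StockTable whose synchronized methods each stand for one SQL statement. The check-then-act path interleaves two transactions by hand; the conditional path models UPDATE ... SET stock = stock - 1 WHERE id = ? AND stock > 0.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory table; each synchronized method models one atomic SQL statement.
class StockTable {
    private final Map<Long, Integer> stock = new HashMap<>();

    synchronized void put(long id, int qty) { stock.put(id, qty); }
    synchronized int select(long id) { return stock.get(id); }        // SELECT stock ...
    synchronized void set(long id, int qty) { stock.put(id, qty); }   // UPDATE ... SET stock = ?

    // UPDATE product SET stock = stock - 1 WHERE id = ? AND stock > 0
    // Returns rows affected: 1 = sold, 0 = out of stock.
    synchronized int decrementIfPositive(long id) {
        int current = stock.get(id);
        if (current <= 0) return 0;
        stock.put(id, current - 1);
        return 1;
    }
}

class OversellDemo {
    // Two transactions interleave: both read 1, both pass the check, both write 0.
    static int checkThenAct(StockTable t, long id) {
        int sold = 0;
        int readByA = t.select(id);
        int readByB = t.select(id);
        if (readByA > 0) { t.set(id, readByA - 1); sold++; }
        if (readByB > 0) { t.set(id, readByB - 1); sold++; }
        return sold;
    }

    // Same two buyers, but the check and the write happen in one statement.
    static int conditionalUpdate(StockTable t, long id) {
        return t.decrementIfPositive(id) + t.decrementIfPositive(id);
    }

    public static void main(String[] args) {
        StockTable t = new StockTable();
        t.put(1L, 1);
        System.out.println("check-then-act sold " + checkThenAct(t, 1L) + " of 1");          // sold 2 of 1
        t.put(2L, 1);
        System.out.println("conditional UPDATE sold " + conditionalUpdate(t, 2L) + " of 1"); // sold 1 of 1
    }
}
```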
Fix N+1 with JOIN FETCH in JPQL. For multiple collections, use @BatchSize(size=25): Hibernate batches proxy initialisation with an IN clause, turning N queries into ceil(N/25). @EntityGraph gives per-repository control without altering entity defaults. Never change global fetch to EAGER; that trades N+1 for always-joining on every query.
LazyInitializationException is thrown when a LAZY association is touched after its Session has closed: typically outside the @Transactional boundary, or in an @Async method that received a detached entity from the calling thread. Fix: return DTOs from your service layer, use JOIN FETCH to load what you need inside the transaction, or explicitly initialise with Hibernate.initialize() before the Session closes. Do NOT enable OSIV to suppress this; it just hides the symptom.
Spring Boot sets spring.jpa.open-in-view=true by default, keeping the Hibernate Session open for the entire HTTP request so LAZY associations can be accessed in serialisers. At scale this silently causes N+1 queries outside transactions, holds DB connections for the full request duration (including I/O waits), and bleeds data access concerns into the web layer. Action: set spring.jpa.open-in-view=false and use service-layer DTOs.
Dirty checking persists every change to a managed entity at flush, with no save() call required. So loading an entity and calling a setter for a purely local computation silently persists that change. Fix: annotate read-only service methods with @Transactional(readOnly=true); Hibernate then skips dirty checking entirely, improving performance and preventing accidental writes.
Using getClass() instead of instanceof in equals() breaks proxy comparison, because Hibernate proxies are runtime subclasses of the entity. Using the DB-generated ID means two new (transient) entities with id=null are incorrectly equal in Sets/Maps, and the hashCode changes once the ID is assigned. Best practice: implement equals()/hashCode() on a natural business key (e.g., a UUID assigned before persist, or a unique domain attribute) rather than the surrogate DB key.
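A minimal sketch of the business-key approach, assuming a random UUID generated in the constructor (one of the options named above). Customer, simulateInsert and idBasedEquals are hypothetical, and the JPA annotations are omitted so the sketch runs on the JDK alone; in a real entity the UUID would also be a mapped, unique column.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

// Entity-shaped class; JPA annotations left out so it compiles without jakarta.persistence.
class Customer {
    private Long id;                                    // surrogate key: null until the DB assigns it
    private final UUID businessKey = UUID.randomUUID(); // assigned before persist, never changes

    Long getId() { return id; }
    UUID getBusinessKey() { return businessKey; }
    void simulateInsert(long generatedId) { this.id = generatedId; } // stands in for the DB

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Customer other)) return false;  // instanceof also accepts proxy subclasses
        return businessKey.equals(other.getBusinessKey()); // getter, so a lazy proxy would initialise
    }

    @Override
    public int hashCode() { return businessKey.hashCode(); }

    // The trap from the text: comparing surrogate ids treats all new entities as equal.
    static boolean idBasedEquals(Customer a, Customer b) { return Objects.equals(a.getId(), b.getId()); }

    public static void main(String[] args) {
        Customer a = new Customer();
        Customer b = new Customer();
        System.out.println(idBasedEquals(a, b)); // true: both ids are null
        System.out.println(a.equals(b));         // false: different business keys

        Set<Customer> set = new HashSet<>();
        set.add(a);
        a.simulateInsert(42L);
        System.out.println(set.contains(a));     // true: hashCode did not change on insert
    }
}
```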
CascadeType.ALL with orphanRemoval=true on a @OneToMany means removing a child from the Java collection triggers a DELETE in the DB. A seemingly innocent collection.clear() inside a transaction deletes every element. Use cascade types precisely: typically only PERSIST and MERGE are needed. Reserve REMOVE + orphanRemoval for true ownership relationships where the child cannot exist without the parent.

| Annotation / attribute | Purpose |
|---|---|
| @Entity | Marks a class as a JPA entity mapped to a DB table |
| @Table(name=...) | Overrides the default table name derived from class name |
| @Id | Designates the primary key field |
| @GeneratedValue | PK generation strategy: AUTO, IDENTITY, SEQUENCE, TABLE |
| @Column | Column override: name, nullable, unique, length, insertable, updatable |
| @Transient | Excludes a field from persistence — not mapped to any column |
| @Version | Optimistic lock version field (Long or Timestamp); auto-incremented on UPDATE |
| @Embeddable / @Embedded | Value type embedded inline in the owning entity's table (no separate PK) |
| @MappedSuperclass | Base class whose mappings are inherited but is not itself an entity |
| @Enumerated(STRING) | Persist enum by name. Avoid ORDINAL — breaks silently when enum order changes |
| @ManyToOne | Default EAGER. Always name the FK with @JoinColumn(name=...) explicitly |
| @OneToMany | Default LAZY. Non-owning side; requires mappedBy on bidirectional |
| @OneToOne | Default EAGER. LAZY works on the owning (FK) side; on the inverse (mappedBy) side, true LAZY needs bytecode enhancement |
| @ManyToMany | Default LAZY. Creates a join table; if the join table needs extra columns, model it as an @Entity with two @ManyToOne instead |
| mappedBy | Marks the inverse (non-owning) side; the owning side holds the FK column |
| @JoinColumn | Specifies FK column name, nullable, and constraint options |
| orphanRemoval=true | Deletes child entity from DB when removed from parent's collection |
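
Why ORDINAL breaks silently, as a stdlib-only sketch: StatusV1 and StatusV2 are hypothetical before/after versions of one enum where a constant was inserted at the front. The two read methods approximate what ORDINAL and STRING mapping do when loading a stored value; they are not Hibernate code.

```java
// Two versions of the same enum; V2 inserts PENDING at position 0.
enum StatusV1 { ACTIVE, SUSPENDED }
enum StatusV2 { PENDING, ACTIVE, SUSPENDED }

class EnumMappingDemo {
    // Roughly what @Enumerated(ORDINAL) does on read: index into values().
    static StatusV2 readOrdinal(int storedOrdinal) { return StatusV2.values()[storedOrdinal]; }

    // Roughly what @Enumerated(STRING) does on read: look up by name.
    static StatusV2 readName(String storedName) { return StatusV2.valueOf(storedName); }

    public static void main(String[] args) {
        int storedOrdinal = StatusV1.SUSPENDED.ordinal(); // 1, written before the enum changed
        String storedName = StatusV1.SUSPENDED.name();    // "SUSPENDED"
        System.out.println(readOrdinal(storedOrdinal));   // ACTIVE: wrong, and no error raised
        System.out.println(readName(storedName));         // SUSPENDED: still correct
    }
}
```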

| Cascade type | Effect |
|---|---|
| PERSIST | New children are saved when parent is persisted |
| MERGE | Detached children are merged when parent is merged |
| REMOVE | Children are deleted when parent is deleted |
| REFRESH | Children are reloaded from DB when parent is refreshed |
| DETACH | Children are detached from Session when parent is detached |
| ALL | All of the above. Use carefully — REMOVE can delete data unexpectedly |

| Inheritance strategy | Trade-offs |
|---|---|
| SINGLE_TABLE | All subclasses in one table with a discriminator column. No JOINs, fast queries. Downside: subtype-specific columns are nullable; cannot enforce NOT NULL. |
| JOINED | Parent table + one table per subclass joined by PK. Normalised, enforces constraints. Downside: polymorphic queries JOIN all subtype tables. |
| TABLE_PER_CLASS | Each concrete class gets its own full table. No JOINs for single-type. Downside: polymorphic queries use UNION — avoid this strategy. |

| Property | Guidance |
|---|---|
| spring.jpa.hibernate.ddl-auto | One of none, validate, update, create, create-drop. Use none or validate in production. Never update in prod. |
| spring.jpa.open-in-view | Disable (false) in production to avoid silent N+1 and connection exhaustion |
| spring.jpa.show-sql | Quick SQL logging; prefer logging.level.org.hibernate.SQL=DEBUG for control |
| hibernate.jdbc.batch_size | Batch INSERT/UPDATE statements (25–50). Also set hibernate.order_inserts=true and hibernate.order_updates=true. Insert batching is disabled for IDENTITY-generated ids |
| hibernate.default_batch_fetch_size | Batch-fetches LAZY collections using IN clauses — reduces N+1 without JOIN FETCH |
| hibernate.generate_statistics | Enables the Statistics API: query count, cache hit/miss, and timings, aggregated across the SessionFactory |

| Aspect | Hibernate / JPA | Spring Data JDBC | jOOQ | MyBatis | Raw JDBC |
|---|---|---|---|---|---|
| Abstraction | High — entity graph, ORM | Medium — aggregate roots | Medium — type-safe SQL DSL | Medium — SQL + mapping XML | Low — raw ResultSet |
| Learning curve | Steep (lifecycle, caching, N+1) | Low | Medium | Low–Medium | Low |
| SQL control | Low (generated SQL) | Medium | Full | Full | Full |
| N+1 risk | High — requires discipline | Low (explicit) | None | Manual | None |
| Complex queries | Awkward (JPQL limits) | Manual SQL required | Excellent | Excellent | Excellent |
| Bulk / batch ops | OK with tuning (StatelessSession) | Good | Excellent | Good | Best |
| Best for | Rich domain models, write path | Simple aggregates | Reporting, analytics | Legacy DB, tuned SQL | Performance-critical paths |

JPA (Jakarta Persistence API) is a specification: interfaces, annotations, and contracts defined by the Jakarta EE standard. It covers entity mapping, JPQL, the EntityManager lifecycle, and transaction handling.
Hibernate is the most widely used JPA implementation. It provides the actual SQL generation, caching, lazy loading proxies, dirty checking, and session management behind those interfaces.
The distinction matters because:
- Portability: code written against JPA interfaces can theoretically switch providers (EclipseLink, DataNucleus), though this rarely happens in practice.
- Extensions: Hibernate adds features beyond the spec (HQL, @BatchSize, Envers, spatial types). Using them creates Hibernate lock-in.
- Spring Data JPA sits above JPA, generating repository implementations from method names. It delegates to Hibernate but abstracts the EntityManager.
Best practice: code to JPA interfaces; use Hibernate extensions only when JPA cannot express what you need.
Transient: new, unknown to Hibernate, no DB row. Garbage collected when dereferenced.
Persistent (Managed): associated with an open Session. Hibernate tracks all changes via dirty checking and flushes them to the DB on commit or explicit flush(). LAZY association access works here.
Detached: the Session was closed or evict() was called. The entity has a DB row but is no longer tracked. Accessing LAZY associations throws LazyInitializationException. Re-attach via merge() (copies state onto a managed instance) or Hibernate's update() (reattaches the exact object; deprecated since Hibernate 6).
Removed: remove() called on a persistent entity. The DB row will be deleted at flush/commit.
Key implication: dirty checking only works on persistent entities. Modifications to detached entities are silently lost unless explicitly merged back.
The N+1 problem: loading N parent entities and then accessing a LAZY collection per entity fires 1 initial query + N subsequent queries:
```java
List<Order> orders = repo.findAll();           // 1 query
orders.forEach(o -> notify(o.getLineItems())); // N queries, one per order
```
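Detection: enable spring.jpa.properties.hibernate.generate_statistics=true and read the org.hibernate.stat.Statistics counters for the request: getPrepareStatementCount() counts every JDBC statement, including lazy loads, whereas getQueryExecutionCount() counts only executed JPQL/HQL queries. Or use datasource-proxy / p6spy to count raw JDBC calls.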
Fix strategies:
- JOIN FETCH (per query): SELECT o FROM Order o JOIN FETCH o.lineItems loads everything in one query. Watch for Cartesian products when joining multiple collections.
- @BatchSize(size=25) on the collection: when any uninitialised collection is accessed, Hibernate initialises up to 25 of them with one IN-clause query, reducing the total to ceil(N/25)+1 queries.
- @EntityGraph: per-repository override of the fetch plan without altering defaults.
- DTO projections: SELECT new OrderDTO(o.id, o.total) FROM Order o doesn't load the entity graph at all (JPQL requires the DTO's fully qualified class name in the constructor expression).
Never "fix" N+1 by changing the fetch type to EAGER globally — that trades N+1 in some queries for over-fetching in all queries.
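The query-count arithmetic behind these strategies can be checked with a stdlib-only model. This is a sketch, not Hibernate: QueryCounter is a hypothetical stand-in for the JDBC layer that only counts round-trips, and the SQL strings are labels.

```java
// Hypothetical round-trip counter standing in for the JDBC layer.
class QueryCounter {
    private int count;
    void execute(String sqlLabel) { count++; }
    int count() { return count; }
}

class FetchStrategies {
    // LAZY collection touched once per order: 1 + N.
    static int lazyPerEntity(int orders) {
        QueryCounter db = new QueryCounter();
        db.execute("SELECT ... FROM orders");
        for (int i = 0; i < orders; i++) {
            db.execute("SELECT ... FROM line_item WHERE order_id = ?");
        }
        return db.count();
    }

    // @BatchSize / default_batch_fetch_size: collections initialised in IN-clause chunks.
    static int batched(int orders, int batchSize) {
        QueryCounter db = new QueryCounter();
        db.execute("SELECT ... FROM orders");
        for (int start = 0; start < orders; start += batchSize) {
            db.execute("SELECT ... FROM line_item WHERE order_id IN (?, ?, ...)");
        }
        return db.count(); // 1 + ceil(N / batchSize)
    }

    // JOIN FETCH: parents and children come back in one result set, whatever N is.
    static int joinFetch(int orders) {
        QueryCounter db = new QueryCounter();
        db.execute("SELECT o FROM Order o JOIN FETCH o.lineItems");
        return db.count();
    }

    public static void main(String[] args) {
        int n = 100;
        System.out.println("lazy per entity: " + lazyPerEntity(n)); // 101
        System.out.println("batch size 25:   " + batched(n, 25));   // 5
        System.out.println("JOIN FETCH:      " + joinFetch(n));     // 1
    }
}
```

The model deliberately ignores row counts: JOIN FETCH wins on round-trips but returns one row per line item, which is where the Cartesian-product warning comes from.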
Measure first: prove the N+1 with a query count from datasource-proxy's QueryCountHolder or the Hibernate Statistics bean. The fix then drives a policy: every PR adding a collection-loading endpoint must document its fetch strategy.
Default fetch types:
- @ManyToOne → EAGER
- @OneToOne → EAGER
- @OneToMany → LAZY
- @ManyToMany → LAZY
@ManyToOne defaulting to EAGER means: every time you load an entity with a @ManyToOne field, Hibernate JOINs the associated table — even if you never use that association in the current use case. With a chain of EAGER associations (Order → Customer → Address → Country), a simple order load becomes a 4-table JOIN.
Best practice: explicitly set everything to LAZY and use JOIN FETCH / @EntityGraph per query to load only what that specific query needs:
```java
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "customer_id")
private Customer customer;
```
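This makes the data access explicit and prevents silent performance regressions as the entity graph grows.
Dirty checking: when Hibernate loads an entity it keeps a snapshot of its state. At flush time (on commit or an explicit flush()), it compares current field values against the snapshot. Any changed field generates a SQL UPDATE for that entity's row.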
This is automatic — you don't call save():
```java
@Transactional
public void changeEmail(Long id, String email) {
    User user = em.find(User.class, id);
    user.setEmail(email); // no save(): dirty checking flushes this change at commit
}
```
Trap: setting a field inside a transaction for a local purpose accidentally persists it. Always be intentional about modifications inside @Transactional methods.
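A stdlib-only sketch of the snapshot comparison described above, assuming a hypothetical MiniPersistenceContext rather than Hibernate's internals: it diffs each managed entity against its load-time snapshot at flush. Hibernate's default UPDATE lists all mapped columns unless the entity uses @DynamicUpdate; this sketch names only the changed ones for readability.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class User {
    final long id;
    String email;
    String name;

    User(long id, String email, String name) { this.id = id; this.email = email; this.name = name; }

    // Copy of the persistent fields, used both as the snapshot and as the current state.
    Map<String, Object> state() {
        Map<String, Object> s = new LinkedHashMap<>();
        s.put("email", email);
        s.put("name", name);
        return s;
    }
}

class MiniPersistenceContext {
    private final Map<Long, User> managed = new LinkedHashMap<>();
    private final Map<Long, Map<String, Object>> snapshots = new LinkedHashMap<>();
    final List<String> issuedSql = new ArrayList<>();

    // "Load" an entity: register it as managed and remember its state at load time.
    User load(long id, String email, String name) {
        User user = new User(id, email, name);
        managed.put(id, user);
        snapshots.put(id, user.state());
        return user;
    }

    // Flush: compare every managed entity with its snapshot; differences become an UPDATE.
    void flush() {
        for (User user : managed.values()) {
            Map<String, Object> before = snapshots.get(user.id);
            Map<String, Object> now = user.state();
            List<String> assignments = new ArrayList<>();
            for (Map.Entry<String, Object> field : now.entrySet()) {
                if (!Objects.equals(before.get(field.getKey()), field.getValue())) {
                    assignments.add(field.getKey() + " = ?");
                }
            }
            if (!assignments.isEmpty()) {
                issuedSql.add("UPDATE users SET " + String.join(", ", assignments) + " WHERE id = " + user.id);
                snapshots.put(user.id, now); // the written state becomes the new baseline
            }
        }
    }

    public static void main(String[] args) {
        MiniPersistenceContext pc = new MiniPersistenceContext();
        User ann = pc.load(1L, "ann@old.example", "Ann");
        pc.load(2L, "bob@example.com", "Bob");
        ann.email = "ann@new.example"; // no save() call anywhere
        pc.flush();
        pc.flush();                    // nothing changed since the last flush
        System.out.println(pc.issuedSql); // [UPDATE users SET email = ? WHERE id = 1]
    }
}
```

The loop over every managed entity is also why flush cost grows with the size of the persistence context, which the next paragraph addresses.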
Performance: dirty checking compares every field of every managed entity at flush time. When thousands of entities are loaded for bulk processing, this is expensive. Solutions: use @Transactional(readOnly=true) (Hibernate skips dirty checking), call em.clear() periodically in batch loops, or use StatelessSession for bulk operations.
First-level (L1) cache: the persistence context is an identity map, so repeated lookups of the same ID within one Session return the same instance:
```java
User a = em.find(User.class, 1L);
User b = em.find(User.class, 1L);
assert a == b; // true: only one SELECT is issued
```
It is always on and cannot be disabled. This guarantees object identity consistency within a transaction.
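The identity-map behaviour can be sketched without Hibernate. IdentityMapSession is a hypothetical class that keeps one instance per ID and only "queries" on a miss; its clear() plays the role of em.clear(), and the LongFunction stands in for the SQL SELECT.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical identity map modelling the L1 cache; not Hibernate's API.
class IdentityMapSession<T> {
    private final Map<Long, T> firstLevelCache = new HashMap<>();
    private final LongFunction<T> selectById; // stands in for the SQL SELECT
    private int selects;

    IdentityMapSession(LongFunction<T> selectById) { this.selectById = selectById; }

    T find(long id) {
        T cached = firstLevelCache.get(id);
        if (cached != null) return cached; // cache hit: no SQL, same instance
        selects++;
        T loaded = selectById.apply(id);
        firstLevelCache.put(id, loaded);
        return loaded;
    }

    int selects() { return selects; }
    void clear() { firstLevelCache.clear(); } // like em.clear(): later finds hit the DB again
}

class IdentityMapDemo {
    record UserRow(long id, String email) {}

    public static void main(String[] args) {
        IdentityMapSession<UserRow> session =
                new IdentityMapSession<>(id -> new UserRow(id, "user" + id + "@example.com"));
        UserRow a = session.find(1L);
        UserRow b = session.find(1L);
        System.out.println((a == b) + ", selects=" + session.selects()); // true, selects=1
        session.clear();
        System.out.println((session.find(1L) == a) + ", selects=" + session.selects()); // false, selects=2
    }
}
```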
Problem in batch processing: loading tens of thousands of entities in a loop accumulates them all in the L1 cache, causing memory pressure and eventually OOM. Fix: call em.flush(); em.clear() every N iterations. flush() writes pending changes to the DB (the transaction still commits at the end) and clear() releases the cache.
The L1 cache is not shared between Sessions. Cross-Session caching requires the optional second-level (L2) cache.
LazyInitializationException, typical causes:
- Accessing a LAZY association after the Session has closed, outside the @Transactional service method boundary
- @Async thread receiving an entity passed from the parent request thread
- OSIV disabled but service returns entities instead of DTOs
Prevention (pick the appropriate one):
1. DTOs / projections: map entities to records inside the transaction so no proxies cross the boundary
2. JOIN FETCH: load all associations you'll need within the active transaction scope
3. Hibernate.initialize(entity.getCollection()): explicit initialisation before the Session closes (use sparingly)
4. @Transactional boundary: ensure all lazy access happens within the transaction
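A stdlib-only sketch of why the exception appears and how the DTO option avoids it. FakeSession, LazyList, LazyLoadFailure and OrderService are hypothetical stand-ins for Hibernate's Session, its lazy collection wrapper and LazyInitializationException; the error message only imitates Hibernate's wording.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-ins, not Hibernate classes.
class FakeSession {
    private boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
}

class LazyLoadFailure extends RuntimeException {
    LazyLoadFailure(String message) { super(message); }
}

// A collection that loads on first access, and only while its Session is open.
class LazyList<E> {
    private final FakeSession session;
    private final Supplier<List<E>> loader;
    private List<E> loaded;

    LazyList(FakeSession session, Supplier<List<E>> loader) {
        this.session = session;
        this.loader = loader;
    }

    List<E> get() {
        if (loaded == null) {
            if (!session.isOpen()) throw new LazyLoadFailure("could not initialize proxy - no Session");
            loaded = loader.get(); // the deferred SELECT
        }
        return loaded;
    }
}

record OrderDto(long id, List<String> lineItems) {}

class OrderService {
    // DTO option: copy what the caller needs while the Session is open; nothing lazy escapes.
    static OrderDto loadForEmail(long id) {
        FakeSession session = new FakeSession();
        LazyList<String> items = new LazyList<>(session, () -> List.of("book", "pen"));
        try {
            return new OrderDto(id, List.copyOf(items.get()));
        } finally {
            session.close(); // transaction / Session boundary
        }
    }

    // The failing pattern: the uninitialised collection outlives its Session.
    static LazyList<String> loadLeaky() {
        FakeSession session = new FakeSession();
        LazyList<String> items = new LazyList<>(session, () -> List.of("book", "pen"));
        session.close();
        return items;
    }

    public static void main(String[] args) {
        System.out.println(loadForEmail(7L).lineItems()); // [book, pen]
        try {
            loadLeaky().get();
        } catch (LazyLoadFailure e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```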
Do NOT enable OSIV to suppress this error; it hides the symptom while causing N+1 queries at scale.
Inheritance mapping is chosen with @Inheritance(strategy = InheritanceType.XXX) on the root entity.
SINGLE_TABLE (default): all subclasses share one table with a discriminator column. Pros: simplest queries, no JOINs, best read performance. Cons: all subclass-specific columns are nullable; cannot enforce NOT NULL constraints on subtype fields; the schema is denormalised.
JOINED: parent class in one table; each subclass in its own table linked by PK/FK. Pros: fully normalised, enforces constraints per subtype, clean schema. Cons: polymorphic queries JOIN across all subtype tables.
TABLE_PER_CLASS: each concrete class gets its own complete table. Pros: no JOINs for single concrete-type queries. Cons: polymorphic queries require a UNION across every concrete table, which is slow. Generally avoid.
Decision guide:
- Few subtypes with many shared fields, few unique fields → SINGLE_TABLE
- Many distinct fields per subtype, data integrity matters → JOINED
- Polymorphic queries across all subtypes are never needed → could use TABLE_PER_CLASS, but JOINED is usually safer
@Version on a Long or Timestamp field enables optimistic locking:
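```java
import jakarta.persistence.*;

@Entity
public class Product {
    @Id private Long id;             // every entity needs a primary key
    @Version private Long version;   // incremented by Hibernate on every UPDATE
    private int stock;
}
```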
Hibernate adds AND version = ? to every UPDATE:
```sql
UPDATE product SET stock = ?, version = 2 WHERE id = ? AND version = 1
```
If another transaction committed in between, 0 rows are updated → Hibernate throws OptimisticLockException. The caller should catch and retry the operation, or surface a 409 Conflict to the client.
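A stdlib-only sketch of the version check and the catch-and-retry loop. VersionedTable, OptimisticConflict and OptimisticRetry are hypothetical; update() models the version-guarded UPDATE by returning rows affected, and the retry re-reads fresh state before recomputing, which is what a real retry must do as well.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical in-memory table modelling a @Version-checked UPDATE (no Hibernate involved).
class VersionedTable {
    record Row(int stock, long version) {}

    private final Map<Long, Row> rows = new HashMap<>();

    synchronized void insert(long id, int stock) { rows.put(id, new Row(stock, 0)); }
    synchronized Row select(long id) { return rows.get(id); }

    // Models: UPDATE product SET stock = ?, version = <expected + 1> WHERE id = ? AND version = <expected>
    // Returns rows affected: 0 means another transaction committed a newer version first.
    synchronized int update(long id, int newStock, long expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version() != expectedVersion) return 0;
        rows.put(id, new Row(newStock, expectedVersion + 1));
        return 1;
    }
}

class OptimisticConflict extends RuntimeException {
    OptimisticConflict(String message) { super(message); }
}

class OptimisticRetry {
    // Re-read, recompute, conditional write; give up after maxAttempts conflicts.
    static int applyWithRetry(VersionedTable table, long id, IntUnaryOperator change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            VersionedTable.Row row = table.select(id);
            if (table.update(id, change.applyAsInt(row.stock()), row.version()) == 1) {
                return attempt;
            }
        }
        throw new OptimisticConflict("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        VersionedTable table = new VersionedTable();
        table.insert(1L, 10);

        VersionedTable.Row seenByA = table.select(1L);                               // A reads version 0
        table.update(1L, 9, table.select(1L).version());                             // B commits first: version 1
        System.out.println(table.update(1L, seenByA.stock() - 1, seenByA.version())); // 0: A lost the race

        int attempts = applyWithRetry(table, 1L, stock -> stock - 1, 3);             // A retries on fresh state
        System.out.println(attempts + " " + table.select(1L));                       // 1 Row[stock=8, version=2]
    }
}
```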
Choose optimistic when: low-to-moderate write contention, reads heavily outnumber writes, and retry cost is acceptable.
Choose pessimistic (LockModeType.PESSIMISTIC_WRITE → SELECT ... FOR UPDATE) when: high write contention makes retries impractical (inventory, seat booking, financial balances), or when you cannot tolerate even momentary inconsistency.
Pessimistic locking serialises access at the DB level: correct, but it reduces throughput and risks deadlocks if not applied consistently across all code paths.
EntityManager.merge(detached) (JPA-standard):
- Loads the persistent copy from the Session L1 cache or the DB
- Copies the detached entity's state onto the persistent instance
- Returns the managed instance; the original argument remains detached
- With a @Version field, rejects a detached copy that another transaction modified since detachment (OptimisticLockException); without @Version, the detached state silently overwrites those changes
Session.update(detached) (Hibernate-specific):
- Reattaches the exact same object instance to the Session
- Throws NonUniqueObjectException if a different instance with the same ID already exists in the Session
- Does not read the current DB state first, so concurrent modifications go unnoticed unless a @Version column makes the UPDATE fail at flush
Always prefer merge() for JPA code: it's portable and, combined with @Version, guards against stale detached state. Use update() only when you need Hibernate-specific semantics and fully control the Session state (rare); Session.update() is deprecated as of Hibernate 6.
Detection pipeline:
1. Hibernate Statistics (hibernate.generate_statistics=true): exposes QueryExecutionCount, EntityLoadCount, CollectionFetchCount per SessionFactory. Wire into a Spring HandlerInterceptor to log counts per HTTP request (the counters are global, so per-request deltas are only exact when requests don't overlap, e.g. in tests).
2. datasource-proxy / p6spy: intercepts JDBC at the DataSource level. Gives exact SQL and counts regardless of ORM. With datasource-proxy, assert in integration tests:
   ```java
   assertThat(QueryCountHolder.getGrandTotal().getSelect()).isLessThanOrEqualTo(3);
   ```
3. APM agents (Datadog, Dynatrace): trace JDBC spans per transaction in production. Set an alert when any endpoint exceeds, e.g., 10 queries per request.
Fix hierarchy (apply in order):
1. JOIN FETCH in the repository query for the specific endpoint
2. @EntityGraph for per-method overrides without modifying entity defaults
3. @BatchSize(size=25) for collections too large to JOIN (Cartesian product risk)
4. DTO projections for read-only endpoints: bypass entity hydration entirely
5. spring.jpa.properties.hibernate.default_batch_fetch_size=25 as a safety net
Prevention: add query-count assertions to every integration test for collection-loading endpoints. Gate new collection endpoints with a fetch strategy checklist in PR reviews.
Choose jOOQ when:
- Complex reporting queries with aggregations, window functions, CTEs: JPQL can't express them, and native queries in JPA lose type safety and IDE support.
- You want compile-time SQL validation against the real schema.
- The query logic is complex enough that "what SQL will this generate?" should be obvious from the code, not a debugging exercise.
Choose Spring Data JDBC when:
- The aggregate is narrow (1–3 tables) with a clear DDD boundary.
- You want predictable, explicit SQL: save() always issues an INSERT or UPDATE, with no dirty checking, no proxy, no Session lifecycle surprises.
- A team new to JPA wants data access without Hibernate's footguns.
In practice, mix them within one application: Hibernate for the domain write path and rich object graphs; jOOQ or JDBC for reporting endpoints, analytics, or bulk export jobs. They share the same DataSource and @Transactional infrastructure.
The decisive question: does the Unit of Work / entity graph model add value here, or does it add accidental complexity?
Second-level (L2) cache, how it works:
- Scoped to the SessionFactory: shared across all Sessions in the same JVM.
- Annotate entities with @Cache(usage = CacheConcurrencyStrategy.READ_WRITE) and configure a provider (EhCache, Caffeine via JCache, Hazelcast).
- Hibernate checks L2 before querying the DB for em.find() calls.
- L2 entries are evicted automatically when Hibernate executes an UPDATE or DELETE through its own Session.
Multi-node risks:
- Stale data: direct JDBC updates, another service writing to the same table, or DB migrations bypass Hibernate's eviction, so the L2 cache serves stale data.
- Per-node isolation: without a distributed cache, each node has its own L2 region. Writes on Node A are invisible to Node B's cache, giving inconsistent reads.
- Fix: use a distributed provider (Hazelcast cluster, Redis via Redisson) so invalidation propagates cluster-wide.
- Memory pressure: caching large or frequently updated entities wastes heap and increases GC pause times.
Safe L2 candidates: slowly changing reference data — product categories, country codes, configuration values. Avoid caching inventory levels, user sessions, or anything requiring tight consistency.
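The per-node isolation risk can be sketched with plain maps. SharedDatabase, AppNode and renameCategory are hypothetical; each AppNode's localCache stands in for a node-local L2 region, and the peersToInvalidate list stands in for the cluster-wide invalidation a distributed provider performs for you.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: one shared database, each node with its own local L2-style cache.
class SharedDatabase {
    final Map<Long, String> categoryNames = new HashMap<>();
}

class AppNode {
    private final SharedDatabase db;
    private final Map<Long, String> localCache = new HashMap<>();

    AppNode(SharedDatabase db) { this.db = db; }

    // Read-through: serve from the local cache, fall back to the DB on a miss.
    String findCategory(long id) {
        return localCache.computeIfAbsent(id, db.categoryNames::get);
    }

    // Write through this node: evicts the local entry, plus any peers we are told to notify.
    void renameCategory(long id, String name, List<AppNode> peersToInvalidate) {
        db.categoryNames.put(id, name);
        localCache.remove(id);
        for (AppNode peer : peersToInvalidate) {
            peer.localCache.remove(id); // what a distributed provider's invalidation achieves
        }
    }

    public static void main(String[] args) {
        SharedDatabase db = new SharedDatabase();
        db.categoryNames.put(1L, "Books");
        AppNode a = new AppNode(db);
        AppNode b = new AppNode(db);
        a.findCategory(1L);
        b.findCategory(1L);                        // both nodes now cache "Books"

        a.renameCategory(1L, "Media", List.of());  // per-node cache: only A evicts
        System.out.println(a.findCategory(1L) + " / " + b.findCategory(1L)); // Media / Books

        a.renameCategory(1L, "Music", List.of(b)); // invalidation reaches B as well
        System.out.println(a.findCategory(1L) + " / " + b.findCategory(1L)); // Music / Music
    }
}
```

A direct write to db.categoryNames would bypass both nodes' eviction entirely, which is the "stale data" case above.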
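Bulk JPQL UPDATE (one statement, many rows):
```java
em.createQuery("UPDATE Product p SET p.price = p.price * 1.1 WHERE p.category = :c")
  .setParameter("c", "electronics")
  .executeUpdate(); // needs an active transaction; returns the number of rows updated
```
This bypasses the persistence context and lifecycle callbacks: Product instances already managed in the current Session keep their old values until refreshed or cleared. Hibernate invalidates the affected L2 regions for JPQL/HQL bulk statements; native SQL bulk updates may need manual L2 eviction.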
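Processing entities in chunks (flush-and-clear):
```java
for (int i = 0; i < entities.size(); i++) {
    em.persist(entities.get(i));          // new entities become managed
    if ((i + 1) % 50 == 0) {              // after each full batch, not at i == 0
        em.flush(); em.clear();           // write pending SQL, then empty the L1 cache
    }
}
em.flush(); em.clear();                   // final partial batch
```
Prevents L1 cache bloat. Enable JDBC batching (hibernate.jdbc.batch_size=50) to group the resulting INSERTs/UPDATEs into batch round-trips.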
Hibernate StatelessSession: no L1 cache, no dirty checking, no lazy loading proxy — pure CRUD. Ideal for high-volume imports where you don't need lifecycle hooks.
Spring Batch: for production-grade jobs, use JpaPagingItemReader / JdbcCursorItemReader with chunk-oriented processing. Provides restartability, skip/retry policies, and partitioned parallel execution, all critical for large datasets.
3. Application-level cache (Spring @Cacheable): explicitly manage a Redis or Caffeine cache for coarse entities using Spring Cache abstraction. Evict via @CacheEvict on write operations. Simpler to reason about than Hibernate L2 but requires manual coordination.
4. What NOT to cache: high-write or tightly consistent data — inventory, financial balances, user sessions. Invalidation complexity outweighs the read savings; tight DB indexes serve these better.
Operational concern: cold cache after a rolling deploy triggers a thundering herd. Mitigate with lazy warming + circuit breakers, or a background cache pre-warming job at startup before the instance enters the load balancer rotation.
Scenario: an endpoint (GET /users/{id}/orders) returns the last 50 orders with their line items and assigned agent. Response time is 80 ms at 10 RPS but degrades to 2 s at 100 RPS. Database CPU spikes proportionally to request rate. The service uses Spring Boot with JPA and MySQL.
- Diagnose: with hibernate.generate_statistics=true, confirm 101+ queries per request (1 for orders + 50 for line items + 50 for agents)
- Fix the collection load: SELECT o FROM Order o JOIN FETCH o.lineItems WHERE o.user.id = :uid ORDER BY o.createdAt DESC. Caveat: applying the 50-row limit (setMaxResults) to a collection JOIN FETCH makes Hibernate paginate in memory and log a warning, so select the 50 order ids first, then JOIN FETCH by id
- Add JOIN FETCH o.agent to the same query, but watch for a Cartesian product if both are collections; only one collection can be JOIN FETCHed safely
- For a second collection, put @BatchSize(size=50) on that field instead: initialisation is batched via an IN clause
- Set spring.jpa.properties.hibernate.default_batch_fetch_size=25 as a safety net across the whole app
- Disable OSIV (spring.jpa.open-in-view=false) to prevent the serialiser from firing LAZY queries silently
- Avoid: @OneToMany(fetch=EAGER) globally; it trades N+1 in some queries for always-joining in all
Scenario: a service loads an Order entity in a @Transactional method, then passes it to an @Async email method. The email template accesses order.getCustomer().getEmail() and iterates order.getLineItems(). LazyInitializationException is thrown intermittently, and more frequently under load.
- Fix: map the order to a DTO (customer email, line items) inside the transaction and pass the DTO, not the entity, to the @Async method
- Better: publish an OrderConfirmedEvent with the DTO payload inside the transaction; a separate event listener handles async sending, decoupled and testable (a @TransactionalEventListener runs after commit by default, so no email goes out for a rolled-back order)
- In tests, assertFalse(Hibernate.isInitialized(order.getLineItems())) before the boundary documents the contract
Scenario: many concurrent requests (up to 10k RPS) decrement the same product's stock.
- Optimistic locking (@Version): the losing transaction's UPDATE carries WHERE version = 1 → 0 rows → OptimisticLockException. Retry the loser. Works well at moderate contention.
- Pessimistic locking: em.find(Product.class, id, LockModeType.PESSIMISTIC_WRITE) issues SELECT ... FOR UPDATE, serialising access. Correct, but throughput drops as contention rises because requests queue on the row lock while holding connections; connection pool exhaustion is a real risk at 10k RPS.
- Atomic conditional UPDATE: UPDATE Product p SET p.stock = p.stock - 1 WHERE p.id = :id AND p.stock > 0 updates 0 rows if out of stock. Single atomic round-trip, no entity load, no application-held lock.
- Add a DB CHECK constraint (stock >= 0) as a last line of defence regardless of which approach is used
- Avoid: application-level if (stock > 0) as the only guard; under any concurrency the race condition will eventually surface