// auto-configuration · starters · actuator · JPA · embedded server · senior → principal
@SpringBootApplication combines @Configuration + @EnableAutoConfiguration + @ComponentScan. On startup, @EnableAutoConfiguration triggers AutoConfigurationImportSelector, which reads META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports from every jar on the classpath. Each candidate configuration is evaluated against its @Conditional* annotations:
- @ConditionalOnClass(DataSource.class) — only applies if the named class is on the classpath
- @ConditionalOnMissingBean(DataSource.class) — backs off if you defined your own bean
- @ConditionalOnProperty("spring.datasource.url") — only applies if the property is set
Adding spring-boot-starter-data-jpa to your pom automatically creates a DataSource, EntityManagerFactory, and TransactionManager — unless you override them. Debug with --debug flag or /actuator/conditions endpoint.
- spring-boot-starter-web → Spring MVC + embedded Tomcat + Jackson
- spring-boot-starter-data-jpa → Hibernate + Spring Data + HikariCP
- spring-boot-starter-security → Spring Security filter chain
- spring-boot-starter-actuator → Actuator endpoints + Micrometer
- spring-boot-starter-test → JUnit 5 + Mockito + AssertJ + MockMvc
To swap an embedded server (Tomcat → Undertow): exclude spring-boot-starter-tomcat from spring-boot-starter-web and add spring-boot-starter-undertow.
java -jar app.jar starts a production-capable HTTP server.
Thread-per-request (Spring MVC / servlet): each request occupies one thread from a bounded pool (server.tomcat.threads.max, default 200). Blocking I/O — DB queries, HTTP calls — holds the thread for its full duration. Throughput ceiling: thread_count / mean_response_time_seconds.
Event-loop (Spring WebFlux / Netty): a small number of threads (≈ CPU count) handle thousands of concurrent connections via non-blocking I/O. Higher concurrency under I/O-heavy workloads. Requires the reactive programming model end-to-end — a single blocking call in the pipeline blocks an event-loop thread.
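The thread-per-request ceiling above is simple arithmetic, and a few lines make the sensitivity to response time concrete. This is a minimal sketch: the class and method names are hypothetical, and it models only the upper bound (no queueing, no accept backlog).

```java
import java.time.Duration;

final class ThroughputCeiling {

    /** Upper bound on requests per second: thread_count / mean_response_time_seconds. */
    static double maxRequestsPerSecond(int threads, Duration meanResponseTime) {
        if (threads <= 0 || meanResponseTime.isZero() || meanResponseTime.isNegative()) {
            throw new IllegalArgumentException("threads and response time must be positive");
        }
        double seconds = meanResponseTime.toNanos() / 1_000_000_000.0;
        return threads / seconds;
    }

    public static void main(String[] args) {
        // Same pool (Tomcat's default of 200 threads), two different mean latencies.
        System.out.println(maxRequestsPerSecond(200, Duration.ofMillis(50)));
        System.out.println(maxRequestsPerSecond(200, Duration.ofMillis(500)));
    }
}
```

With the default pool of 200 threads, moving the mean response time from 50ms to 500ms cuts the ceiling by a factor of ten, which is why slow downstream I/O shows up as thread exhaustion.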
- /actuator/health — UP/DOWN status + component health (DB, Kafka, disk). Maps to Kubernetes liveness/readiness probes.
- /actuator/metrics — Micrometer metrics. Add micrometer-registry-prometheus for a Prometheus-compatible /actuator/prometheus scrape endpoint.
- /actuator/loggers — view and change log levels at runtime without restart.
- /actuator/env — all active properties and sources (values sanitized by default).
- /actuator/threaddump — full JVM thread dump for diagnosing deadlocks and saturation.
Default: only /health is exposed over HTTP (Boot 2.5+; earlier 2.x releases also exposed /info). Expose others explicitly with management.endpoints.web.exposure.include=health,metrics,loggers, and protect them with Spring Security.
Property sources, highest precedence first:
1. Command-line arguments (--server.port=9090)
2. Environment variables (SPRING_DATASOURCE_URL)
3. application-{profile}.yml
4. application.yml
Profiles (spring.profiles.active=prod): activate profile-specific config files and @Profile("prod")-gated beans.
@ConfigurationProperties("prefix") binds an entire config prefix to a typed POJO — type-safe, IDE-completable, JSR-303 validated with @Validated. Prefer it over @Value for any structured config block with two or more related properties.
Extend JpaRepository<Entity, ID> to get CRUD, paging, and sorting for free.
Query derivation: method names derive queries automatically — findByEmailAndStatus(String email, Status s) generates the JPQL at startup.
@Query: explicit JPQL or native SQL for complex queries.
Fetch strategies: FetchType.LAZY (default for collections) defers loading until the field is accessed. Accessing a lazy association after the transaction closes throws LazyInitializationException. Fix with @EntityGraph, JOIN FETCH, or DTOs via projections.
N+1 problem: loading 100 orders then calling order.getItems() on each triggers 100 additional queries. Enable logging.level.org.hibernate.SQL=DEBUG in development to catch this early.
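The query counts behind N+1 can be worked out directly. The sketch below is a simplified model, not Hibernate's actual loading logic: the class and method names are hypothetical, and the batch variant assumes collections load in groups of a fixed size, as the N/50+1 figure for @BatchSize elsewhere in this document implies.

```java
final class QueryCount {

    /** One query for the parents, then one per parent when its lazy collection is touched. */
    static int lazyPerParent(int parents) {
        return 1 + parents;
    }

    /** One query for the parents, then one per group of batchSize collections. */
    static int withBatchFetching(int parents, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return 1 + (parents + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(lazyPerParent(100));         // 100 orders, items loaded one order at a time
        System.out.println(withBatchFetching(100, 50)); // same 100 orders, batches of 50
    }
}
```

For 100 orders the model gives 101 queries without batching and 3 with a batch size of 50. A JOIN FETCH or @EntityGraph brings it down to a single query.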
@RestController = @Controller + @ResponseBody. Spring MVC maps requests to methods via @GetMapping, @PostMapping, etc.
Input validation: annotate DTOs with Bean Validation constraints (@NotNull, @Size, @Email) and add @Valid on the @RequestBody parameter. Spring throws MethodArgumentNotValidException on failure.
Global error handling: @ControllerAdvice + @ExceptionHandler centralizes exception-to-response mapping across all controllers. Use ProblemDetail (RFC 7807, native in Spring Boot 3) for a standards-compliant error format.
ResponseEntity<T>: full control over status code, headers, and body — use when the HTTP response must vary (201 with Location header, 204 No Content).
Spring Boot provides test slices — lightweight contexts that load only one layer, making tests faster and more focused.
- @SpringBootTest: full application context. Use for integration tests that span
multiple layers. Add webEnvironment=RANDOM_PORT for a real HTTP server. Slow.
- @WebMvcTest(Controller.class): web layer only — controllers, filters, security.
Inject MockMvc. Mock service dependencies with @MockBean.
- @DataJpaTest: JPA infrastructure + H2 (default). Tests repository queries in
isolation. Rolls back after each test.
- @MockBean: replaces a Spring bean with a Mockito mock in the context. Unlike
@Mock, it integrates with Spring DI — other beans get the mock autowired.
Testcontainers + @DataJpaTest with a real Postgres image catches database-specific behavior that H2 silently ignores (window functions, constraints, custom types).
@Transactional works via AOP proxy. When method A in the same class calls @Transactional method B, the call goes directly to this, bypassing the proxy — no transaction is created or joined. This is the most common Spring gotcha. Fix: inject self via @Autowired, move B to another Spring bean, or use AspectJ weaving (load-time or compile-time) which instruments this calls directly.
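The self-call bypass can be reproduced without Spring. The sketch below uses a plain JDK dynamic proxy as a stand-in for Spring's transactional proxy. OrderOps, PlainOrderOps, and SelfCallDemo are hypothetical names, and the interceptor only records method names where a real one would begin a transaction.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical service interface; imagine audit() carries @Transactional. */
interface OrderOps {
    void placeOrder();
    void audit();
}

final class PlainOrderOps implements OrderOps {
    @Override
    public void placeOrder() {
        audit(); // same as this.audit(): a direct call on the target, never seen by the proxy
    }

    @Override
    public void audit() {
        // work that was supposed to run inside a transaction
    }
}

final class SelfCallDemo {

    /** Returns the method names the proxy intercepted while serving one external call. */
    static List<String> interceptedDuring(boolean callAuditDirectly) {
        List<String> intercepted = new ArrayList<>();
        OrderOps target = new PlainOrderOps();
        OrderOps proxy = (OrderOps) Proxy.newProxyInstance(
                OrderOps.class.getClassLoader(),
                new Class<?>[] {OrderOps.class},
                (self, method, args) -> {
                    intercepted.add(method.getName()); // stand-in for "begin transaction"
                    return method.invoke(target, args);
                });
        if (callAuditDirectly) {
            proxy.audit();      // external call: goes through the proxy
        } else {
            proxy.placeOrder(); // audit() is then reached via this, not via the proxy
        }
        return intercepted;
    }

    public static void main(String[] args) {
        System.out.println(interceptedDuring(false));
        System.out.println(interceptedDuring(true));
    }
}
```

Calling placeOrder() through the proxy records only placeOrder, so the nested audit() call gets no interception. Calling audit() through the proxy is intercepted as expected, which is why moving the method to another bean (and so calling it through that bean's proxy) fixes the problem.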
@OneToMany(fetch = LAZY) proxies load from the DB only when accessed. Accessing them after the @Transactional method returns (in a controller, in toString(), or during Jackson serialization) throws LazyInitializationException. Fix: use DTOs/projections, @EntityGraph, or JOIN FETCH. Avoid spring.jpa.open-in-view=true (the "Open Session In View" anti-pattern) — it masks the root cause and causes silent N+1 queries.
SecurityContextHolder uses a ThreadLocal by default. When @Async dispatches work to another thread, the security context is not copied, so the async thread sees an empty SecurityContextHolder. Any code calling SecurityContextHolder.getContext().getAuthentication() in an async method (e.g., to get the current user for auditing) gets null. Fix: configure DelegatingSecurityContextAsyncTaskExecutor as the @Async executor, or set the holder strategy to MODE_INHERITABLETHREADLOCAL (which only helps when the worker thread is created by the calling thread, so it is unreliable with pooled executors).
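The underlying mechanism is ordinary ThreadLocal behavior, which a JDK-only sketch can show. CURRENT_USER below is a hypothetical stand-in for the security context, and withCallerContext illustrates the capture-and-restore idea behind a delegating executor. It is not Spring Security's implementation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ContextPropagationDemo {

    // Stand-in for SecurityContextHolder's default ThreadLocal storage.
    static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Captures the caller's value now and restores it on whichever thread runs the task. */
    static <T> Callable<T> withCallerContext(Callable<T> task) {
        String captured = CURRENT_USER.get(); // read on the submitting thread
        return () -> {
            CURRENT_USER.set(captured);
            try {
                return task.call();
            } finally {
                CURRENT_USER.remove(); // do not leak the value into the next pooled task
            }
        };
    }

    /** Reads CURRENT_USER on a worker thread, with or without propagation. */
    static String userSeenByWorker(boolean propagate) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CURRENT_USER.set("alice");
        try {
            Callable<String> read = CURRENT_USER::get;
            return pool.submit(propagate ? withCallerContext(read) : read).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            CURRENT_USER.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(userSeenByWorker(false));
        System.out.println(userSeenByWorker(true));
    }
}
```

Without the wrapper the worker thread reads null. With it, the worker sees the caller's value for the duration of the task and the value is cleared afterwards, which matters when threads are reused.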
NoSuchBeanDefinitionException or duplicate bean errors. Run with --debug to see the Conditions Evaluation Report: which configurations matched, which were excluded, and why. The fix is usually: add @ConditionalOnMissingBean to your bean definition, or explicitly exclude an auto-configuration class via @SpringBootApplication(exclude = ...).
Spring Boot 3 moves from javax.* packages to jakarta.*. Any import of javax.persistence.*, javax.servlet.*, or javax.validation.* fails to compile. Transitive dependencies on pre-Jakarta libraries also break. Run OpenRewrite's UpgradeSpringBoot_3_0 recipe to automate most renames. Verify each transitive dependency (Hibernate 6+, Flyway 9+, MapStruct 1.5+) supports Jakarta EE before upgrading.
spring.main.lazy-initialization=true defers bean creation to first use, reducing startup time by 5–15s. The cost: a misconfigured datasource, missing property, or broken bean definition only surfaces on the first request that touches it — not at startup. In production, this means a deployment can succeed and the pod can become "Ready" while harboring a fatal configuration error. Use lazy init only if startup time is a hard requirement, and compensate with a smoke test that exercises every major flow on deploy.
| Annotation | What it does |
|---|---|
| @SpringBootApplication | Entry point. Combines @Configuration + @EnableAutoConfiguration + @ComponentScan(basePackage of annotated class). |
| @RestController | @Controller + @ResponseBody. All return values serialized to the HTTP response body (JSON by default via Jackson). |
| @Service / @Repository | @Component specializations. @Repository additionally translates JPA/JDBC exceptions to Spring's DataAccessException hierarchy. |
| @Autowired | Dependency injection. Prefer constructor injection — Spring injects implicitly if there is only one constructor. Avoid field injection (untestable without Spring context). |
| @ConfigurationProperties("prefix") | Binds a config prefix to a typed POJO. Type-safe, IDE-completable, validated with @Validated. Preferred over @Value for structured config. |
| @Value("${prop}") | Inject a single property. Supports SpEL and defaults: @Value("${timeout:5000}"). Use only for one-off single properties. |
| @Transactional | Proxy-based AOP transaction. propagation controls join/create/suspend; isolation controls concurrency. readOnly=true hints optimizer. Self-calls bypass proxy. |
| @Query | Explicit JPQL or SQL on a Spring Data repository method. nativeQuery=true for raw SQL. Overrides derived query name parsing. |
| @EntityGraph(attributePaths) | Declares which LAZY associations to eagerly fetch for a specific repository method. Prevents N+1 without changing the entity mapping. |
| @ControllerAdvice + @ExceptionHandler | @ControllerAdvice applies globally to all controllers. @ExceptionHandler maps an exception type to an HTTP response. Most-specific handler wins. |
| @Valid / @Validated | @Valid triggers Bean Validation on @RequestBody or method parameters. Failure throws MethodArgumentNotValidException (400). |
| @Async | Run a method on a separate executor thread pool. Requires @EnableAsync on a @Configuration class. Return void or Future/CompletableFuture. |

| Endpoint | What it provides |
|---|---|
| /actuator/health | Liveness + readiness. K8s probe paths: /actuator/health/liveness and /actuator/health/readiness. Component health (DB, Kafka, disk) shown when management.endpoint.health.show-details=always. |
| /actuator/metrics | Micrometer metrics as JSON. Add micrometer-registry-prometheus for /actuator/prometheus scrape endpoint. Query: /actuator/metrics/http.server.requests?tag=status:500 |
| /actuator/loggers | GET: current log levels. POST {"configuredLevel": "DEBUG"} to /actuator/loggers/{logger name}: change a logger's level at runtime — invaluable for debugging production without restart. |
| /actuator/threaddump | Full JVM thread dump. Diagnose deadlocks and thread pool saturation without a connected profiler. |
| /actuator/env | All active properties, sources, and precedence order. Sensitive values masked. Restrict to internal network — exposes config secrets otherwise. |
| /actuator/conditions | Conditions evaluation report at runtime — same output as --debug flag. Shows which auto-configurations applied and why others were skipped. |
| /actuator/mappings | All @RequestMapping routes registered. Useful for verifying route configuration and discovering endpoints. |
| /actuator/heapdump | Downloads a .hprof heap dump. Large response; restrict access tightly. Use only for diagnosing memory leaks. |

| HikariCP property | Default and tuning guidance |
|---|---|
| maximum-pool-size | Default 10. Max connections per pool. Set based on DB max_connections ÷ number of instances. More is not better — over-provisioning starves other services. |
| minimum-idle | Default = maximumPoolSize (pool never shrinks). Lower for bursty workloads to release idle connections. |
| connection-timeout | Default 30000ms. Time to wait for a connection from pool before throwing. Lower to 5000ms to fail fast under saturation. |
| max-lifetime | Default 1800000ms (30 min). Replace connections before DB/firewall kills them. Set below DB wait_timeout. |
| keepalive-time | Default 0 (disabled). Periodic test query on idle connections to prevent silent firewall drops. Enable if connections go stale mid-flight. |
| spring.datasource.hikari.* | All HikariCP settings via Boot properties: spring.datasource.hikari.maximum-pool-size=20, spring.datasource.hikari.connection-timeout=5000 |
| Dimension | Spring Boot | Quarkus | Micronaut | Vert.x |
|---|---|---|---|---|
| Startup (JVM) | 3–15s (runtime context loading) | 0.5–2s (build-time processing) | 0.5–2s (build-time processing) | <200ms (minimal framework overhead) |
| Startup (native) | ~100ms (GraalVM; officially supported since Boot 3) | ~10–50ms (first-class native) | ~50ms (first-class native) | N/A |
| Memory footprint | 200–400 MB baseline | 50–200 MB | 50–200 MB | <50 MB |
| Ecosystem | Largest — MVC, Data, Security, Batch, Cloud, Integration | ~500 extensions | Growing — Data, Security, Functions | Focused on reactive I/O |
| AOP model | Runtime CGLIB/JDK proxy — self-call bypass gotcha | Build-time bytecode — no self-call issue | Build-time AOP — no self-call issue | No AOP — explicit composition |
| GraalVM native | Supported but complex — reflection hints required | First-class, best-in-class | First-class | Limited |
| Best for | Existing Java shops, rich ecosystem, complex enterprise features | Kubernetes-native, serverless, teams wanting Spring DX with fast startup | Fast startup, AWS Lambda, teams avoiding runtime reflection | High-concurrency reactive services, polyglot teams |
@SpringBootApplication composes @Configuration, @EnableAutoConfiguration, and @ComponentScan. On startup:
1. @ComponentScan registers beans from the annotated class's package and sub-packages.
2. @EnableAutoConfiguration loads AutoConfigurationImportSelector, which reads
META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports
from every jar on the classpath — every candidate auto-configuration class name.
3. Each candidate is evaluated against its @Conditional* annotations. For example, DataSourceAutoConfiguration:
   - @ConditionalOnClass(DataSource.class) — skipped if the required classes are not on the classpath
   - @ConditionalOnMissingBean(DataSource.class) — backed off if you defined your own
4. Matching configurations create beans. Because your own configuration is processed before auto-configuration, @ConditionalOnMissingBean detects your beans and the auto-configuration backs off.
Result: spring-boot-starter-data-jpa in pom.xml automatically wires DataSource, EntityManagerFactory, and TransactionManager without any XML or @Bean methods.
The Conditions Evaluation Report (--debug or /actuator/conditions) is the diagnostic tool. It shows positive matches (applied), negative matches (skipped and why), and unconditional classes. When a bean you expect is missing, the negative matches section tells you exactly which @ConditionalOn* failed.
When writing a shared library: always guard auto-configuration beans with @ConditionalOnMissingBean so downstream apps can override your defaults without touching the library code. This is the same pattern all Spring Boot starters follow.
Spring wraps @Transactional beans in a CGLIB proxy. External callers go through the proxy, which opens a transaction (or joins/suspends depending on propagation), runs the method, then commits or rolls back.
Failure 1 — self-calls: if method A calls @Transactional method B within the same bean, the call goes via this, bypassing the proxy — no transaction created. Fix: inject self or move B to a separate bean.
Failure 2 — wrong exception type: rollback triggers on unchecked exceptions (RuntimeException and Error) by default. Checked exceptions (IOException, SQLException) do NOT roll back. Override: @Transactional(rollbackFor = IOException.class).
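The default rollback decision can be written down as a small predicate. This is a simplified sketch with hypothetical names: it ignores noRollbackFor and the way Spring picks the closest matching rule, and only captures the default plus rollbackFor overrides.

```java
final class RollbackRule {

    /** Default: roll back on unchecked exceptions (RuntimeException or Error) only. */
    static boolean rollsBackByDefault(Throwable thrown) {
        return thrown instanceof RuntimeException || thrown instanceof Error;
    }

    /** With rollbackFor: the listed types and their subclasses also trigger rollback. */
    @SafeVarargs
    static boolean rollsBack(Throwable thrown, Class<? extends Throwable>... rollbackFor) {
        for (Class<? extends Throwable> type : rollbackFor) {
            if (type.isInstance(thrown)) {
                return true;
            }
        }
        return rollsBackByDefault(thrown);
    }

    public static void main(String[] args) {
        System.out.println(rollsBackByDefault(new IllegalStateException("unchecked")));
        System.out.println(rollsBackByDefault(new java.io.IOException("checked")));
        System.out.println(rollsBack(new java.io.IOException("checked"), java.io.IOException.class));
    }
}
```

The practical consequence is that a service method declaring throws IOException commits its partial work when that exception escapes, unless rollbackFor says otherwise.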
Other pitfalls: @Transactional on private methods (proxy can't intercept), and on final classes (CGLIB cannot subclass them).
Propagation modes matter at the Staff level:
- REQUIRED (default): join existing or create new. Use for most service methods.
- REQUIRES_NEW: always creates a new transaction, suspending the outer one. Use when an operation must commit independently — audit logs that must persist even if the main transaction rolls back.
- NOT_SUPPORTED: run without any transaction. Use for DDL statements or reads that must bypass the transaction manager.
Isolation levels: READ_COMMITTED is the PostgreSQL default and correct for most OLTP. Move to REPEATABLE_READ only when you need consistent reads across multiple statements in one transaction. SERIALIZABLE for strong consistency — rare, expensive.
@Value("${some.property}") injects a single property. It supports SpEL and defaults: @Value("${timeout:5000}"). Downsides: scattered injection points, no type safety, no IDE completion for property names, and hard to test without a Spring context.
@ConfigurationProperties("app.cache") binds an entire prefix to a typed POJO. Benefits: type conversion (timeout=5000 → private Duration timeout), JSR-303 validation with @Validated, IDE auto-completion via annotation processor, and constructor binding for immutable config (Spring Boot 2.2+).
Rule: @Value for a single one-off property. @ConfigurationProperties for any structured configuration block with two or more related properties.
In a shared library, expose @ConfigurationProperties beans through auto-configuration with @ConditionalOnMissingBean. Downstream applications override your library's defaults by defining their own @ConfigurationProperties bean — no fork required. Pair with the spring-boot-configuration-processor dependency to generate IDE metadata, so users get auto-completion and documentation in application.yml for your library's config prefix.
@SpringBootTest: loads the full application context. Use for cross-layer integration tests. Add webEnvironment=RANDOM_PORT to start a real HTTP server. Slowest to start — use sparingly.
@WebMvcTest(SomeController.class): loads only the web layer — controllers, filters, @ControllerAdvice, security config. MockMvc sends HTTP without starting a server. Service dependencies need @MockBean. Fast and focused. Use to test request/response mapping, validation, HTTP status codes, and security rules.
@DataJpaTest: loads only JPA infrastructure + H2 by default. Each test rolls back. Use to test @Query methods and repository behavior.
@MockBean places a Mockito mock in the Spring context, so it gets autowired into other beans. Different from plain @Mock, which has no Spring context integration.
A common anti-pattern is annotating every test with @SpringBootTest: the context starts in 20s and CI takes 15 minutes. Fix: slice-first. @WebMvcTest for controller logic, @DataJpaTest for queries, plain unit tests (no Spring) for business logic. Only use @SpringBootTest when testing cross-layer integration that can't be verified in slices.
Testcontainers + @DataJpaTest with real Postgres catches database-specific behavior H2 silently ignores: window functions, JSON column types, index usage differences. The cost is a Docker dependency in CI — worth it for data-heavy services.
HikariCP is tuned through spring.datasource.hikari.*:
- maximum-pool-size (default 10): max connections. Base on DB max_connections ÷ instance count.
- connection-timeout (default 30s): time to wait for a pool connection before throwing. Lower to 5s to fail fast under saturation.
- max-lifetime (default 30min): replace connections before DB kills them. Set below the database's connection timeout.
- keepalive-time: periodic test query on idle connections to prevent firewall drops.
Monitor hikaricp.connections.pending (connections waiting on a pool slot) and hikaricp.connections.active via Actuator metrics. A non-zero pending count means the pool is undersized or queries are slow.
The HikariCP sizing guideline, (cpu_cores × 2) + effective_disk_spindles, is the right size for most OLTP — typically 10–20, not 100.
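The two sizing rules in this section, the (cpu_cores × 2) + effective_disk_spindles guideline and the DB max_connections ÷ instance count cap, combine naturally into one calculation. A minimal sketch, assuming both inputs refer to the database server and that the smaller of the two results should win; the class and method names are hypothetical.

```java
final class PoolSizing {

    /** Guideline from the text: (cpu_cores × 2) + effective_disk_spindles. */
    static int guidelineSize(int cpuCores, int effectiveSpindles) {
        return cpuCores * 2 + effectiveSpindles;
    }

    /** Share of the database's connection limit available to each application instance. */
    static int perInstanceCap(int dbMaxConnections, int instances) {
        if (instances <= 0) {
            throw new IllegalArgumentException("instances must be positive");
        }
        return dbMaxConnections / instances;
    }

    /** Candidate maximum-pool-size: the guideline, never above the per-instance share. */
    static int maximumPoolSize(int cpuCores, int effectiveSpindles, int dbMaxConnections, int instances) {
        return Math.min(guidelineSize(cpuCores, effectiveSpindles),
                perInstanceCap(dbMaxConnections, instances));
    }

    public static void main(String[] args) {
        // Illustrative inputs: 8 cores, 1 spindle, max_connections=100, 10 instances.
        System.out.println(maximumPoolSize(8, 1, 100, 10));
    }
}
```

With 8 cores and one spindle the guideline suggests 17, but ten instances sharing 100 database connections can only take 10 each, so the cap decides. Treat the result as a starting point to validate against hikaricp.connections.pending under load.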
If hikaricp.connections.pending is non-zero, the first fix is query optimization, not a bigger pool. Slow queries hold connections longer. A query that takes 500ms under load instead of 50ms reduces effective pool capacity by 10x.
@ControllerAdvice applies across all controllers. @ExceptionHandler(Type.class) handles specific exceptions and returns a ResponseEntity:
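```java
@ControllerAdvice
public class GlobalExceptionHandler {
    @ExceptionHandler(ResourceNotFoundException.class) // application-defined exception; handler names are illustrative
    public ResponseEntity<ProblemDetail> handleNotFound(ResourceNotFoundException ex) {
        return ResponseEntity.status(HttpStatus.NOT_FOUND).body(ProblemDetail.forStatusAndDetail(HttpStatus.NOT_FOUND, "Resource not found"));
    }
    @ExceptionHandler(MethodArgumentNotValidException.class) // thrown when @Valid fails on a @RequestBody
    public ResponseEntity<ProblemDetail> handleValidation(MethodArgumentNotValidException ex) {
        return ResponseEntity.status(HttpStatus.BAD_REQUEST).body(ProblemDetail.forStatusAndDetail(HttpStatus.BAD_REQUEST, "Request validation failed"));
    }
    @ExceptionHandler(Exception.class) // fallback: log ex server-side, never echo its message to the client
    public ResponseEntity<ProblemDetail> handleUnexpected(Exception ex) {
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(ProblemDetail.forStatusAndDetail(HttpStatus.INTERNAL_SERVER_ERROR, "Unexpected error"));
    }
}
```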
Use ProblemDetail (RFC 7807, built into Spring Boot 3) rather than a custom error DTO: it's a standard format, tools understand it, and it's extensible. Never expose internal exception messages or stack traces in the response body: that leaks implementation details and is a security risk.
Spring MVC on Tomcat serves requests from a bounded thread pool (server.tomcat.threads.max, default 200). Each HTTP request holds one thread until the response is sent, including all blocking I/O. Throughput ceiling: threads / mean_response_seconds. With 200 threads, a 50ms average gives 4,000 req/s and a 500ms average gives 400 req/s: same thread count, 10x less throughput.
Diagnose exhaustion:
- /actuator/metrics/tomcat.threads.busy — if it consistently equals max, threads are saturated.
- /actuator/threaddump — look for many request threads stuck in the same JDBC, HTTP client, or lock-acquisition frames. Threads waiting on a pool or lock report WAITING, TIMED_WAITING, or BLOCKED; threads blocked in a socket read often report RUNNABLE. The shared frame reveals the blocking call.
- /actuator/metrics/hikaricp.connections.pending — if > 0, threads are waiting for DB connections.
- Application logs for Unable to acquire JDBC Connection.
Raising server.tomcat.threads.max is the wrong first move. More threads that are all waiting on a slow database just means more threads waiting. The root cause is almost always slow downstream I/O — optimize queries, add caches, tune the connection pool.
The architectural fix for consistently I/O-heavy workloads is Spring WebFlux on Netty. A handful of event-loop threads handle thousands of concurrent connections via non-blocking I/O. But WebFlux requires reactive programming end-to-end — any blocking call (JDBC, RestTemplate) blocks an event-loop thread and degrades everything. Choose the model at design time, not after a production incident.
Enable: management.health.livenessstate.enabled=true and management.health.readinessstate.enabled=true (auto-enabled when running in Kubernetes).
- /actuator/health/liveness — is the app's internal state still valid? Failure causes K8s
to restart the pod. Reserve for unrecoverable states only: deadlock, OOM, corrupted
in-memory state. Do NOT put transient external dependencies here.
- /actuator/health/readiness — is the app ready for traffic? Failure removes the pod from
the load balancer without restarting. DB connectivity, Kafka broker availability, and
cache warm-up belong here.
Add a startupProbe pointing at /actuator/health/liveness with a high failureThreshold to give Spring Boot time to start without the pod entering a restart loop. Only after the startup probe succeeds does K8s start evaluating readinessProbe.
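How high failureThreshold needs to be follows from the probe arithmetic. The sketch below assumes the simple model budget = failureThreshold × periodSeconds and ignores initialDelaySeconds and probe timeouts. The names are hypothetical, and the 30 × 5 figures match the example used later in this document.

```java
import java.time.Duration;

final class ProbeBudget {

    /** Time a startupProbe tolerates before Kubernetes restarts the container. */
    static Duration startupBudget(int failureThreshold, int periodSeconds) {
        return Duration.ofSeconds((long) failureThreshold * periodSeconds);
    }

    /** True when the budget exceeds the slowest startup you expect to see. */
    static boolean coversStartup(Duration budget, Duration worstCaseStartup) {
        return budget.compareTo(worstCaseStartup) > 0;
    }

    public static void main(String[] args) {
        Duration budget = startupBudget(30, 5);
        System.out.println(budget.getSeconds());
        System.out.println(coversStartup(budget, Duration.ofSeconds(90)));
    }
}
```

A threshold of 30 with a 5-second period gives a 150-second budget, which comfortably covers a 90-second startup while leaving liveness checks fast once the application is up.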
Custom checks: implement HealthIndicator and publish AvailabilityChangeEvent from application code to signal state transitions.
Loading 100 orders and then calling order.getItems() on each triggers 101 queries (1 for the orders, 100 for the items). Hibernate makes each look like a Java property access, so the problem is invisible without SQL logging.
Detect: logging.level.org.hibernate.SQL=DEBUG. In tests, use a query counter (Hibernate Statistics, a custom StatementInspector, or Hypersistence Utils) that asserts a maximum query count.
Fix options:
- JOIN FETCH: @Query("SELECT o FROM Order o JOIN FETCH o.items WHERE o.id IN :ids")
- @EntityGraph(attributePaths = "items") on the repository method
- Projection: SELECT new OrderSummary(o.id, i.name) FROM Order o JOIN o.items i
- @BatchSize(size = 50) on the collection — N+1 becomes N/50+1
Enable graceful shutdown with server.shutdown=graceful (Spring Boot 2.3+). On SIGTERM, Spring Boot:
1. Marks the context as "not running" — K8s readiness probe fails, traffic stops routing.
2. Waits up to spring.lifecycle.timeout-per-shutdown-phase (default 30s) for in-flight requests to complete.
3. Closes the servlet container (drains connections).
4. Destroys Spring beans (closes DataSource, Kafka producers, etc.).
5. JVM exits.
In Kubernetes: terminationGracePeriodSeconds in the pod spec must be longer than timeout-per-shutdown-phase. If K8s's SIGKILL arrives before Spring finishes draining, in-flight requests abort. Set terminationGracePeriodSeconds: 60 and timeout-per-shutdown-phase: 50s as a safe starting point.
Shutdown order follows SmartLifecycle phase ordering. If you define custom beans with shutdown logic, implement SmartLifecycle and set appropriate getPhase() values to ensure they close in the right order. Test graceful shutdown under load: send requests during shutdown and verify zero 500s, and raise the log level (for example with --debug) to observe the shutdown sequence.
1. tomcat.threads.busy metric — if it consistently equals max, threads are saturated.
2. /actuator/threaddump — look for threads WAITING on JDBC socket read, HTTP client, or lock. This reveals the specific blocking call.
3. hikaricp.connections.pending — if > 0, threads are waiting for DB pool slots.
4. Application logs for Unable to acquire JDBC Connection.
Fix sequence:
- Short-term: increase server.tomcat.threads.max to 400 to absorb spikes.
Not a root fix — just buying time.
- Root cause A — slow queries: enable SQL logging, find the slow query, add indexes,
optimize the query. A 500ms query holding a thread is 10x worse than a 50ms one.
- Root cause B — slow downstream HTTP calls: every outbound HTTP call must have a
read timeout. A call with no timeout holds a thread indefinitely. Add Resilience4j
circuit breaker to fail-fast after the downstream is degraded.
- Root cause C — too few DB connections: tune hikari.maximum-pool-size based on DB max_connections ÷ instance count. Monitor hikaricp.connections.pending.
Combine timeouts, circuit breakers, and bulkheads (@Bulkhead to limit concurrent calls into a specific dependency). These three patterns together prevent cascade failures. Add management.health.circuitbreakers.enabled=true to expose Resilience4j circuit state in /actuator/health.
The defining change in the Boot 2 → 3 migration is the Jakarta EE namespace move (javax.* → jakarta.*).
Phase 1 — Prepare on Boot 2.7.x (latest Boot 2):
- Eliminate all deprecated API usages Boot 2.7 flags — they're removed in Boot 3.
- Audit all javax.* imports and transitive dependencies for Jakarta compatibility.
- Spring Security: replace WebSecurityConfigurerAdapter (removed) with the lambda DSL.
Phase 2 — Package rename:
- Run OpenRewrite UpgradeSpringBoot_3_0 recipe: automates javax.persistence → jakarta.persistence, javax.servlet → jakarta.servlet, etc.
- Update dependency versions: Hibernate 6+, Flyway 9+, MapStruct 1.5+.
Phase 3 — Boot 3 compile and verify:
- Fix remaining compile errors (libraries OpenRewrite didn't cover).
- Spring Security: antMatchers → requestMatchers.
- Actuator: some metric names changed (check Micrometer 1.10+ migration notes).
- Run full integration test suite including Testcontainers paths.
Mixed javax and jakarta classes on the classpath cause ClassCastException at runtime when objects cross the namespace boundary. Use mvn dependency:tree | grep javax to audit before upgrading. Libraries not yet on Jakarta (some enterprise libs, older generated clients) must be replaced or wrapped.
The migration is also an opportunity: with Boot 3, GraalVM native image support becomes first-class. If startup time or memory is a concern, plan native compilation as part of the migration, not a separate future project.
1. The client sends an Idempotency-Key: <UUID> header with each POST.
2. Controller (or an @Aspect) checks a processed_requests table (or Redis):
if the key exists, return the cached response immediately.
3. If new: execute the operation and store (key, response, expires_at) in the same
database transaction as the main operation.
4. Set TTL on stored keys (24–72h) to bound table growth.
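The steps above can be sketched with an in-memory map standing in for the processed_requests table. This is an illustration under stated assumptions, not a production design: IdempotencyStore is a hypothetical class, responses are plain strings, and ConcurrentHashMap.compute plays the role that a single database transaction plays in the real flow, making "check the key, run the operation, store the result" one atomic step per key.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class IdempotencyStore {

    /** Cached response plus the moment it stops being valid (step 4: TTL). */
    private record Entry(String response, Instant expiresAt) {}

    private final ConcurrentHashMap<String, Entry> processed = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Clock clock;

    IdempotencyStore(Duration ttl, Clock clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    /**
     * Returns the cached response for a known, unexpired key; otherwise runs the
     * operation once and records its response. If the operation throws, nothing is
     * stored, which mirrors a rolled-back transaction.
     */
    String execute(String idempotencyKey, Supplier<String> operation) {
        Instant now = clock.instant();
        Entry entry = processed.compute(idempotencyKey, (key, existing) ->
                existing != null && existing.expiresAt().isAfter(now)
                        ? existing
                        : new Entry(operation.get(), now.plus(ttl)));
        return entry.response();
    }
}
```

A retry with the same key gets the first response back and the operation does not run again. In a real service the map would be the database table (or Redis), and the operation should stay short because compute holds a per-key lock while it runs.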
The atomicity requirement: writing the idempotency record and the main operation must be in the same transaction. If you write the record before the operation and crash, the key is "done" but nothing happened. If you write after and crash, the second call re-executes. Atomic write via @Transactional covering both is the correct pattern.
Dependencies:
- micrometer-tracing-bridge-otel — OpenTelemetry bridge
- opentelemetry-exporter-otlp — OTLP exporter for Jaeger/Tempo/Grafana
Configuration:
```yaml
management:
  tracing:
    sampling:
      probability: 1.0 # 100% in dev; use 0.01–0.1 in prod
logging:
  pattern:
    correlation: "[${spring.application.name},%X{traceId:-},%X{spanId:-}] "
```
Auto-instrumented: RestTemplate, WebClient, @Scheduled, Kafka listeners, JDBC. Custom spans: inject Tracer bean or annotate with @NewSpan.
Log/trace correlation: the traceId in MDC appears in every log line, so a single failing request's traceId from the trace backend surfaces all log lines across all services for that request.
Head-based sampling at probability=0.01 misses rare slow requests. Use tail-based sampling (for example, the OpenTelemetry Collector with the tail sampling processor; Jaeger's remote sampling adapts rates but still decides at the head): sample 100% of error traces and slow traces (>500ms), 1% of everything else. This captures failures without drowning the backend. Without log correlation, distributed tracing tells you where time was spent but not why it failed — ship traceId into your log aggregation system and ensure it's indexed for queries.
Version drift: independently maintained pom.xml files mean 5 different Spring Boot versions, 3 different Jackson versions, inconsistent security patches. Solution: a company-internal BOM (Bill of Materials) declaring canonical versions for Spring Boot, shared libraries, and security dependencies. CI enforces compliance — builds fail on version drift.
Configuration proliferation: each service has its own application.yml. Secrets leak into config files, environments diverge silently. Solution: centralized configuration (Spring Cloud Config backed by Git, or HashiCorp Vault + K8s ConfigMaps). Environment-specific config managed centrally. Secrets not in source control.
Observability standardization: 100 services with different log formats, metric names, and trace ID fields are operationally untenable. Mandate: structured JSON logging with traceId + spanId in MDC, Micrometer metric naming conventions, standard Actuator config. A shared logging library (spring-boot-starter-company-logging) enforces this.
Service catalog and ownership: some services will be abandoned. Solution: a service catalog (Backstage) with declared ownership, SLA, and dependency map. Services with no owner get deprecated and decommissioned.
Native image compiles Spring Boot to a native binary via GraalVM's ahead-of-time (AOT) compilation. Results: startup in 50–150ms vs 5–15s JVM, memory 50–200MB vs 300–600MB, no JIT warm-up — consistent latency from the first request.
What changes:
- Build time: native compilation takes 5–20 minutes (full AOT analysis).
- Spring AOT (Spring Boot Maven plugin with native profile) processes Spring's reflection-heavy internals at build time. Most Spring features work unchanged.
- Dynamic reflection, proxies, and resource loading need @RegisterReflectionForBinding hints or native reflect-config.json entries.
What breaks:
- Libraries using heavy runtime reflection (some Hibernate features, JasperReports, custom serializers) need explicit hints or don't support native.
- CGLIB runtime proxies replaced by AOT-generated proxies — most work, some edge cases don't.
- No HotSwap; JMX and heap dumps are unavailable unless monitoring support is enabled at build time.
- Peak throughput often lower than JIT-warmed JVM — native gives consistent latency, JIT eventually optimizes hot paths further for steady-state throughput.
Right call when: serverless/FaaS, scale-to-zero Kubernetes, CLI tools, or startup time is a hard business requirement. Wrong call for long-running high-throughput services where JIT's warm optimization exceeds native steady-state performance.
mvn -Pnative test must run independently from JVM tests; a JVM test pass does not mean native passes. This doubles test time. Factor the CI cost into the decision. If startup time matters but not enough to justify native, spring.main.lazy-initialization=true is a 5–15s improvement with no compilation overhead.
tomcat.threads.busy consistently equals max. hikaricp.connections.pending is non-zero — threads are waiting for DB connections on top of slow queries.
Raise server.tomcat.threads.max to 400 and server.tomcat.accept-count to 200. This absorbs spikes but doesn't fix the root cause.
Enable logging.level.org.hibernate.SQL=DEBUG temporarily to identify queries that take 500ms+ under load. A missing index is the most common culprit. Add the index and validate with EXPLAIN ANALYZE.
Add @Bulkhead on the DB interaction layer. Limit concurrent calls to 100 (matching pool size). Requests beyond the limit get fast 503s instead of queueing indefinitely — bounded degradation is better than unbounded.
Autoscale on tomcat.threads.busy > 150 (75% utilization) with a 60s scale-up window. The 90s scale-out means the bulkhead must absorb the gap.
OrderService.createOrder() is @Transactional — it saves the Order entity via JPA, then calls @Async NotificationService.sendConfirmation(order) to email the customer. In production, emails intermittently fail with LazyInitializationException when sendConfirmation accesses order.getCustomer().getEmail().
@Transactional on createOrder() closes the Hibernate session when the method returns. The Order entity passed to the async thread is a detached JPA entity. Its @ManyToOne(fetch = LAZY) Customer association cannot load — the session is gone.
Change sendConfirmation to accept SendConfirmationCommand(orderId, customerEmail, customerName). The service layer resolves all needed data inside the transaction before dispatching. The async method receives a plain DTO with no JPA dependency.
Alternatively, in createOrder() (still inside the transaction), call order.getCustomer().getEmail() explicitly to trigger the load. Hibernate caches the loaded association on the entity — the detached entity carries the value into the async thread.
Alternatively, sendConfirmation receives orderId and calls orderRepository.findById(orderId) in a new @Transactional context. Simple and safe, but requires one DB query in the async path.
If sendConfirmation accesses SecurityContextHolder (e.g., for audit logging), configure DelegatingSecurityContextAsyncTaskExecutor as the @Async executor.
Do not set spring.jpa.open-in-view=true as the fix: it extends the session to the HTTP request lifecycle but doesn't cover async threads, and causes silent N+1 queries in controllers.
Do not switch to FetchType.EAGER: it loads all associations for every query, causing massive over-fetching across the entire service.
Do not catch LazyInitializationException and return null: that masks the bug.
The deployment sets initialDelaySeconds=30 on both liveness and readiness probes. The pod passes liveness but fails readiness, gets restarted repeatedly before becoming ready, and enters a crash loop.
Enable spring.jpa.show-sql=true and logging.level.org.springframework=DEBUG briefly to see where startup time goes. Common culprits: Flyway running 100+ migrations, Hibernate schema validation scanning all tables, Config server handshake adding 5s, @ComponentScan scanning unrelated packages.
Add a startupProbe in the K8s deployment spec pointing at /actuator/health/liveness with failureThreshold=30, periodSeconds=5 (150s budget). K8s only starts evaluating readinessProbe after the startup probe passes. This stops the crash loop immediately while you fix startup time properly.
spring.main.lazy-initialization=true defers bean creation to first request, saving 5–15s. Caveat: startup failures surface on first request. Compensate with a readiness warm-up request in the deployment pipeline.
Set spring.cloud.config.fail-fast=true and spring.cloud.config.retry.max-attempts=3 with short intervals. A slow config server can hang the startup for 30s+ with default retry behavior.
Do not raise initialDelaySeconds to 90s: it delays detection of a genuinely broken pod, because K8s won't restart a deadlocked service for 90s.
Never use spring.jpa.hibernate.ddl-auto=create-drop anywhere near production: it drops the entire schema on shutdown.