Accelerating Java and Spring Boot Web App Development with AI Coding Agents — Repository Practices for the Claude Code and Codex Era
In 2026, asking how to bring Claude Code, OpenAI Codex, Cursor, and GitHub Copilot into a Spring Boot team is an everyday conversation. The arrival of Spring Boot 4.0 and Java 25 LTS has many teams redrawing their stack, and “while we are at it, can we set this up to be AI-native from day one?” tends to land on the same agenda.
The trap most teams walk into is letting surface-level debates about tool selection or which model to use swallow all the discussion, while the actual repository remains untouched. In practice, tool and model choice combined contribute maybe 10 to 20 percent of the outcome. The rest comes from shaping the codebase so the AI cannot misread it.
This post lays out what to do, in order, when you want to run AI coding agents fast on a Java and Spring Boot web application: design, CI guardrails, Model Context Protocol (MCP) integration, and security. It builds on the ecosystem map I wrote earlier in Famous OSS Web Applications Built with Java and Spring Boot, and applies equally to greenfield projects and modernization of existing codebases.
Why Java and Spring Boot Need Some Setup Before the AI
Java is more agent-friendly than people give it credit for. Annotation-driven programming with @RestController, @Service, @Repository, and @Transactional, the strict type system, and the explicit dependency graph declared by Maven and Gradle all act as readable context for an AI. Spring Boot’s layered conventions (Controller → Service → Repository) give the model a strong prior on “what to write next.”
That said, Java is one of the hardest languages for AI to write securely. Veracode’s research shows that while syntactic correctness from current models clears 95 percent, the security pass rate for Java sits at around 28 percent (the lowest of the surveyed languages). The reason is that AI models still pull in patterns from the training data: SQL built by string concatenation, mixed javax.* and jakarta.* imports, Lombok-era boilerplate. They are deeply baked into the corpus.
So Java plus Spring Boot, left untreated, drifts into “fast but broken” output. With proper setup, it becomes “fast, type-safe, auditable” output. The lever is a three-layer stack: a design the AI can read, a unified set of instruction files the AI reads, and CI guardrails that enforce both mechanically. The rest of this post walks through the three layers.
Pin the Design Up Front — Spring Modulith and Hexagonal
The first decision is versions and architecture. As of May 2026, here is a sensible baseline.
| Item | Recommended | Why |
|---|---|---|
| Spring Boot | 4.0.x (leading edge) or 3.5.x (conservative) | OSS support for 3.5.x ends 2026/06/30. New projects should target 4.0.x; pick 3.5.x only when third-party libs are not yet ready |
| Java | 21 LTS (stable) or 25 LTS (modern) | Virtual Threads and Pattern Matching are stable in 21. 25 is first-class in Spring Boot 4.0 |
| Build | Gradle Kotlin DSL | Type-safe build.gradle.kts works well with IDE completion and prevents AI from inventing broken XML hierarchies |
| JDK vendor | Eclipse Temurin or Amazon Corretto | Free, well-supported, and the default in Codex / Devin sandboxes |
Maven overwhelmingly dominates AI training data and Spring’s official samples, so Codex and Cursor reach for pom.xml by default. If you go with Gradle, you have to write “this project uses Gradle Kotlin DSL; do not create pom.xml” into your AGENTS.md or CLAUDE.md, or you will fight the same correction every session.
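A minimal sketch of what that rule might look like in AGENTS.md (the wording is illustrative, not canonical):

```markdown
## Build tool
- This project uses **Gradle with the Kotlin DSL** (`build.gradle.kts`). Do NOT create a `pom.xml`.
- Run builds and tests via the wrapper (`./gradlew`), never a globally installed Gradle or Maven.
```

Stating it as a hard constraint, rather than a preference, is what stops the agent from "helpfully" generating Maven files mid-session.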
For architecture, the current sweet spot is Spring Modulith 2.0 combined with hexagonal layers (domain → application → adapter.in/out) inside each module. A flat layered monolith teaches AI agents to “place code in the right layer” but lets them cross package boundaries the moment a feature grows. Going microservices on day one wastes operational budget. A modular monolith is the middle path: it gives you explicit boundaries for the AI, and you can carve a service out later by extracting a single module.
```
com.example.app
├── AppApplication.java
├── design/                    # 1 module = 1 feature (Bounded Context)
│   ├── api/                   # Driving Port (@NamedInterface)
│   ├── domain/                # Spring-free
│   │   ├── model/
│   │   └── service/
│   ├── application/           # Use cases
│   └── adapter/
│       ├── in/web/            # @RestController
│       ├── in/event/          # @ApplicationModuleListener
│       └── out/persistence/   # JPA / jOOQ
├── catalog/
├── workspace/
└── shared/                    # Cross-module utility (kept minimal)
```

Direct sub-packages are public; everything else is package-private (Modulith's default). Cross-module communication goes through Application Events (@ApplicationModuleListener) only, never direct @Autowired of another module's @Service. With these two rules, the ArchUnit and Modulith verify checks introduced later will mechanically catch any boundary crossing the AI tries to commit.
Keep Per-Agent Instruction Files Unified
Hand-maintaining a separate CLAUDE.md for Claude Code, AGENTS.md for Codex, .github/copilot-instructions.md for Copilot, and .cursor/rules/ for Cursor will drift, and each agent will start operating from different assumptions. Pick one as the source of truth and symlink the others to it, so Spring Boot-specific rules like “use Gradle Kotlin DSL, never create pom.xml” or “never edit existing V*.sql files” read identically from every agent’s perspective.
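One way to set that up, assuming AGENTS.md is chosen as the source of truth (paths follow each tool's documented location):

```shell
# From the repository root. Assumption: AGENTS.md already exists as the source of truth.
[ -f AGENTS.md ] || touch AGENTS.md                   # placeholder so the links resolve
mkdir -p .github

ln -sf AGENTS.md CLAUDE.md                            # Claude Code
ln -sf ../AGENTS.md .github/copilot-instructions.md   # GitHub Copilot

# Cursor's .cursor/rules/ expects .mdc rule files, so a thin pointer file
# that says "follow AGENTS.md" may work better there than a symlink.
```

Commit the symlinks; every agent then reads the same text, and edits land in one file.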
Add Roles with Claude Code Skills, Subagents, and Slash Commands
Claude Code has four extension points: .claude/skills/ (auto-loaded knowledge packs), .claude/agents/ (specialist agents with isolated context), .claude/commands/ (slash commands), and .claude/hooks/ (lifecycle hooks). Combined with AGENTS.md, they let you encode both repository-wide rules and task-specific lieutenants.
```
.claude/
├── agents/
│   ├── spring-boot-engineer.md   # Implementation
│   ├── test-automator.md         # JUnit 5 + Testcontainers
│   ├── security-engineer.md      # Spring Security / OWASP (Opus recommended)
│   ├── db-migrator.md            # Flyway script specialist
│   └── code-reviewer.md          # PR reviewer (Sonnet)
├── skills/
│   ├── spring-boot-core/SKILL.md
│   ├── jpa-patterns/SKILL.md
│   ├── flyway-migrations/SKILL.md
│   ├── spring-security/SKILL.md
│   ├── testcontainers/SKILL.md
│   └── archunit-rules/SKILL.md
├── commands/
│   ├── plan.md
│   ├── tdd.md
│   ├── code-review.md
│   ├── api-design.md
│   └── build-fix.md
└── settings.local.json
```

A db-migrator subagent is a good example. The job (adding a Flyway migration) is dangerous if done casually but easy to constrain.
```markdown
---
name: db-migrator
description: Use when a schema change is needed. Adds a new Flyway V*.sql, never edits existing ones, regenerates jOOQ if used, runs tests.
tools: Read, Grep, Glob, Edit, Write, Bash
model: sonnet
---
You are a careful database migrator for Spring Boot + Flyway + PostgreSQL.

When invoked:
1. Inspect `src/main/resources/db/migration/` to find the latest version number.
2. Create `V{yyyyMMddHHmm}__{snake_case_description}.sql` with idempotent DDL where possible.
3. NEVER modify existing V*.sql files (Flyway checksum will break).
4. Run `./gradlew flywayMigrate -Pdev` and `./gradlew test`.
5. If using jOOQ, run `./gradlew generateJooq`.
6. Summarize the change with rollback notes for the human reviewer.
```

The pleasant part of this design is that the main Claude Code session never has to be told "do not touch the database." Database mutations only happen when you delegate to db-migrator. Skills, on the other hand, hold conventions like "to avoid the JPA N+1 problem on @OneToMany, use JOIN FETCH or @EntityGraph," which Claude auto-loads as needed.
For slash commands, five cover most of the ground: /plan (drafting an implementation plan), /tdd (red-green-refactor loop), /code-review, /api-design, and /build-fix. Bundle them as a Claude Code plugin and you can ship the same workflow across the team.
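A minimal /tdd command file might look like this (contents are illustrative; frontmatter follows Claude Code's slash-command format):

```markdown
---
description: Red-green-refactor loop for a single behavior
---
Implement the requested behavior using strict TDD:

1. Write ONE failing JUnit 5 test that pins the behavior (red). Run `./gradlew test` and confirm it fails.
2. Write the minimal production code to make it pass (green). Run `./gradlew test` again.
3. Refactor while keeping all tests green. Finish with `./gradlew check`.

Never weaken or delete a test to make it pass; fix the implementation instead.
```

The last line matters: agents under pressure to "make the build green" will otherwise take the shortcut of editing the test.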
Bake the Guardrails into CI — Spotless, ArchUnit, Modulith verify
The faster you let AI write code, the louder your CI quality gate has to be. The ideal is that the moment AI crosses a layer, reverses a dependency direction, or sneaks in a System.out.println, ./gradlew check immediately fails.
The minimum tool set looks like this.
| Purpose | Tool | Why |
|---|---|---|
| Formatting | Spotless + Spring Java Format | Same style as Spring’s official codebase, applied automatically. Editor-independent, AI-output friendly |
| Lint | Checkstyle (minimal rules) | Spotless covers most of it; keep Checkstyle for naming and Javadoc |
| Bug detection | Error Prone + NullAway | Catch NPE and API misuse at compile time |
| Architecture | ArchUnit + Spring Modulith verify | CI fails the moment AI crosses a layer. The most important one |
| Coverage | JaCoCo | Enforce 80% line / 70% branch in CI |
| SAST | CodeQL + Semgrep | Combining the two raises detection coverage on OWASP-style issues |
| Dependency scanning | OWASP Dependency-Check + Trivy | NVD coverage plus filesystem and container scanning |
| Secret scanning | Gitleaks + GitHub Push Protection | Catch .env and API key leaks before push |
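As one way to wire the coverage gate from the table into the build, a hedged build.gradle.kts sketch (assumes the `jacoco` plugin is applied; thresholds match the 80% line / 70% branch targets above):

```kotlin
// build.gradle.kts: fail the build below the coverage floor, and make `check` enforce it.
tasks.jacocoTestCoverageVerification {
    violationRules {
        rule {
            limit {
                counter = "LINE"
                minimum = "0.80".toBigDecimal()   // 80% line coverage
            }
        }
        rule {
            limit {
                counter = "BRANCH"
                minimum = "0.70".toBigDecimal()   // 70% branch coverage
            }
        }
    }
}
tasks.check {
    dependsOn(tasks.jacocoTestCoverageVerification)
}
```

With this in place, a single `./gradlew check` carries the coverage gate along with everything else.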
ArchUnit rules look like declarative tests you can ship to CI. With these in place, Java code that calls a repository directly from a controller or annotates a domain class with @Entity triggers a build failure on the next push.
```java
import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;
import static com.tngtech.archunit.library.Architectures.layeredArchitecture;

import com.tngtech.archunit.core.importer.ImportOption.DoNotIncludeTests;
import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;
import org.junit.jupiter.api.Test;
import org.springframework.modulith.core.ApplicationModules;

@AnalyzeClasses(packages = "com.example.app", importOptions = DoNotIncludeTests.class)
class ArchitectureTest {

    @ArchTest
    static final ArchRule layered = layeredArchitecture().consideringAllDependencies()
            .layer("Domain").definedBy("..domain..")
            .layer("Application").definedBy("..application..")
            .layer("AdapterIn").definedBy("..adapter.in..")
            .layer("AdapterOut").definedBy("..adapter.out..")
            .whereLayer("Application").mayOnlyBeAccessedByLayers("AdapterIn")
            .whereLayer("Domain").mayOnlyBeAccessedByLayers("Application", "AdapterOut");

    @ArchTest
    static final ArchRule domainIsFrameworkFree = noClasses()
            .that().resideInAPackage("..domain..")
            .should().dependOnClassesThat().resideInAnyPackage(
                    "org.springframework..", "jakarta.persistence..", "com.fasterxml.jackson..");

    @ArchTest
    static final ArchRule controllersDontCallRepositories = noClasses()
            .that().resideInAPackage("..adapter.in.web..")
            .should().dependOnClassesThat().resideInAPackage("..adapter.out.persistence..");

    @Test
    void modulesAreVerified() {
        ApplicationModules.of(AppApplication.class).verify();
    }
}
```

For pre-commit, Lefthook is light (a single Go binary, cross-platform) and pairs well with Java backend projects.
```yaml
# lefthook.yml
pre-commit:
  parallel: true
  commands:
    spotless:
      glob: "*.java"
      run: ./gradlew spotlessApply
      stage_fixed: true
    no-system-out:
      glob: "src/main/java/**/*.java"
      run: |
        if grep -rn "System\.out\.println" {staged_files}; then
          echo "System.out.println is forbidden. Use SLF4J Logger."; exit 1
        fi

pre-push:
  commands:
    test:
      run: ./gradlew check
```

In AGENTS.md, set the contract explicitly: "the definition of done is that the code compiles, tests pass, static analysis passes, and the dependency scan passes; the verification command is ./gradlew check." When there is a single verification command, Claude Code and Codex can decide for themselves when a task is finished, which removes a huge chunk of human handoff.
Let AI Write Tests with Testcontainers and TDD
The strongest insurance for AI-generated code is the combination of test-driven development and Testcontainers. Spring Boot 3.1 and later ship @ServiceConnection, which lets you spin up real PostgreSQL or Redis containers in tests without ever touching configuration files.
```java
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.springframework.context.annotation.Bean;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.PostgreSQLContainer;

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {

    @Bean
    @ServiceConnection
    PostgreSQLContainer<?> postgresContainer() {
        return new PostgreSQLContainer<>("postgres:16-alpine").withReuse(true);
    }

    @Bean
    @ServiceConnection(name = "redis")
    GenericContainer<?> redisContainer() {
        return new GenericContainer<>("redis:7-alpine")
                .withExposedPorts(6379)
                .withReuse(true);
    }
}
```

```java
// src/test/java — use Testcontainers for local development too
import org.springframework.boot.SpringApplication;

public class TestAppApplication {

    public static void main(String[] args) {
        SpringApplication.from(AppApplication::main)
                .with(TestcontainersConfiguration.class)
                .run(args);
    }
}
```

Run ./gradlew bootTestRun and AI agents stop hitting the "no local DB" wall. Set testcontainers.reuse.enable=true in .testcontainers.properties and the second iteration onwards is fast enough that the perceived feedback loop changes character entirely.
For test granularity: Web layer with @WebMvcTest plus MockMvc, repository layer with @DataJpaTest plus Testcontainers, integration tests with @SpringBootTest plus @Import(TestcontainersConfiguration.class). Be explicit in AGENTS.md: “do not use @SpringBootTest for controller tests; use the web slice.” This single line stops AI from inflating CI time by reaching for the heaviest test type by default.
When running TDD with Claude Code or Codex, package the loop into a /tdd slash command so red-green-refactor is one invocation. Claude Code in particular is good at reading mvn test or ./gradlew test output, parsing stack traces, and self-correcting until tests pass; if your tests are right, the implementation quality from the AI is usually fine. Parasoft has documented an AI-driven TDD case study where unit test creation became 100% faster and coverage on the most business-critical microservices climbed from 20 to 85 percent within weeks.
Tighten the Feedback Loop — Gradle Build Cache and Spring Boot DevTools
Java plus Spring Boot has a longer AI feedback loop than TypeScript or Python, languages where AI has abundant training data and is at its strongest. JVM startup, Spring bean initialization, and Testcontainers container boot are unavoidable overheads. To run AI coding agents at speed, you have to engineer this loop down from day one. Otherwise, every time Claude Code or Codex waits 3-5 minutes for ./gradlew test, idle time accumulates without supervision and the experience stops feeling fast.
Use the Gradle Cache and Daemon to the Fullest
Drop the following into gradle.properties.
```properties
org.gradle.daemon=true
org.gradle.parallel=true
org.gradle.caching=true
org.gradle.configuration-cache=true
org.gradle.jvmargs=-Xmx4g -XX:+UseG1GC
```

- daemon: keep Gradle resident so JVM startup is reused across invocations
- parallel: build sub-projects in a multi-module setup in parallel
- caching: build cache; reuse outputs from any previously executed task
- configuration-cache: cache the parsed build configuration, so second-run startup drops to seconds
For larger teams, run a shared build cache server like Develocity or an OSS Build Cache node so CI and developer machines share artifacts. First-run builds also speed up several-fold.
Put AI on Continuous Build
Run ./gradlew test --continuous (short form -t) in a separate terminal. Gradle watches for file changes and re-runs only the affected tasks. The moment an AI agent edits and saves a file, tests are running in the background, so the perceived feedback latency is essentially zero. A PostToolUse Hook in Claude Code that runs only the test for the edited file works as an alternative, but Continuous Build is simpler and harder to break.
Spring Boot DevTools and bootTestRun
Spring Boot DevTools added as developmentOnly watches the classpath and performs an automatic restart with a separate classloader. That is 5-10× faster than a cold start, so changes to @RestController responses or @Service logic land in seconds.
```kotlin
dependencies {
    developmentOnly("org.springframework.boot:spring-boot-devtools")
}
```

Combine ./gradlew bootTestRun with Testcontainers @ServiceConnection (covered earlier) and the local DB also pays its startup cost only once. By the time the AI says "let me try it," the endpoint is already updated.
Push Tests Toward the Cheap End
The single biggest lever, in practice, is making tests physically cheap. @SpringBootTest boots the entire application context (5-10 seconds per test), while @WebMvcTest boots only the web layer and @DataJpaTest only the JPA layer (under 0.5 seconds per test).
| Test type | Annotation | Per-test time |
|---|---|---|
| Pure unit | @ExtendWith(MockitoExtension.class) | < 0.1 s |
| Web slice | @WebMvcTest | 0.3-0.8 s |
| JPA slice | @DataJpaTest | 0.5-1.5 s |
| Integration | @SpringBootTest | 5-15 s |
Narrow execution with ./gradlew test --tests "com.example.app.design.*", parallelize with maxParallelForks, and push heavy JaCoCo measurement to CI only. The local feedback loop will not match TypeScript, but it gets close enough to feel useful.
```kotlin
tasks.test {
    useJUnitPlatform()
    maxParallelForks = (Runtime.getRuntime().availableProcessors() / 2).coerceAtLeast(1)
}
```

Manage AI-Side Timeouts
Tightening the loop only helps if the AI does not sit blocked. Claude Code and Codex have default shell timeouts, so wrap commands explicitly with timeout 60s ./gradlew test --tests TargetTest, or run Continuous Build in a separate terminal and have the AI just read the result report at build/reports/tests/test/index.html. That keeps “the conversation stalls because the build is running” from happening.
Hand the Running App to AI through Spring AI MCP
AI coding agents do a lot just by reading source code, but giving them visibility into the running application’s state pushes debugging accuracy up another level. That is what MCP enables, and Spring AI 1.1 (GA November 2025) ships spring-ai-starter-mcp-server-webmvc, which turns your Spring Boot application itself into an MCP server.
```java
@Service
public class SpecTools {

    private final SpecQueryService specQueryService;

    public SpecTools(SpecQueryService specQueryService) {
        this.specQueryService = specQueryService;
    }

    @McpTool(description = "List design specs for a project")
    public List<SpecSummary> listSpecs(
            @McpToolParam(description = "Project ID") String projectId
    ) {
        return specQueryService.list(projectId);
    }
}
```

Any Java method tagged with @McpTool becomes a tool the AI can call from natural language. With this in place, "investigate why recent user signups are failing and fix it" goes from a vague prompt to an actionable task: the AI queries the running database's error log via MCP, identifies the validation error, refactors the source, and runs tests, all in one sitting. To put OAuth2 in front of the MCP endpoint, add the Spring AI community's org.springaicommunity:mcp-server-security-spring-boot module (v0.1.8 as of April 2026; it ships a Boot starter and a Quick Start, though it has not yet been folded into Spring AI proper) and configure spring.security.oauth2.resourceserver.jwt.issuer-uri. Streamable HTTP, Resource Indicators, and Dynamic Client Registration all come along automatically.
The cardinal rule with MCP is: never hand AI production credentials. The only thing AI should physically touch is a Devcontainer-internal Testcontainers stack or a read-only staging replica. Anything else is a hazard you will regret.
Ship AI-Generated Code Safely — Close the Java/Spring-Specific Gaps
As mentioned earlier, Java is one of the hardest languages for AI to produce secure code in. The reasons are structural: 30 years of training data carrying old unsafe patterns, the javax / jakarta split, and Java-specific vulnerability classes around deserialization, XXE, and SpEL. CI and review should focus on the patterns specific to Java and Spring, not the generic OWASP list.
| Vulnerability | Common AI mistake | Why it is Java/Spring-specific |
|---|---|---|
| CWE-502 (Insecure Deserialization) | ObjectInputStream.readObject() on untrusted input | Java’s native serialization has been an RCE breeding ground for a decade. Apache Commons Collections and Spring4Shell are descendants |
| CWE-611 (XXE) | Calling DocumentBuilderFactory.newInstance() as-is | Java’s standard XML parsers resolve external entities by default and need an explicit disallow-doctype-decl to be safe |
| SpEL Injection | User input flowing into @Value("#{...}") or @PreAuthorize("...") | Spring Expression Language evaluates strings dynamically. Spring4Shell (CVE-2022-22965) is in this family |
| CWE-117 (Log Injection) | logger.info("input: " + userInput) style unsanitized logs | The legacy of Log4Shell (CVE-2021-44228) |
| Loose Spring Security configuration | http.csrf().disable(), permitAll(), @CrossOrigin(origins = "*") | Tutorials are full of “just make it work” snippets that AI faithfully reproduces |
| JPA raw-SQL concatenation | @Query("SELECT ... WHERE name = " + name) | AI proposes string concatenation instead of derived queries or ?1 placeholders |
To catch these mechanically in CI, layer Java/Spring-specific rule packs on top of generic SAST. The CodeQL Java/Kotlin query pack covers SpEL injection, Spring Security bypass, and JPA query injection, and the Semgrep Spring ruleset catches csrf().disable() and permitAll() immediately. Adding SpotBugs Find Security Bugs gives you bytecode-level detection of XXE and SQL injection. OWASP Dependency-Check and Trivy then guard the Java-specific supply chain, blocking versions that still carry known CVEs like Log4j.
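As a concrete instance of the CWE-611 row above, here is a plain-Java sketch of a hardened parser factory; the feature URIs are the standard Xerces ones recommended for XXE prevention:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Sketch: a DocumentBuilderFactory hardened against XXE, in contrast to the
// bare newInstance() call AI tends to emit.
public class SafeXml {

    public static DocumentBuilderFactory safeFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Forbid DOCTYPE declarations entirely: blocks XXE and billion-laughs in one move.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Belt and braces: also disable external entity resolution explicitly.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }
}
```

Putting this in a shared skill or utility class, and banning raw `DocumentBuilderFactory.newInstance()` via Semgrep, closes the gap from both directions.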
The other Java + Spring Boot-specific defense worth setting up is physical credential isolation via @Profile and application-{profile}.yml. AI agents only ever see application-test.yml (pointing at Testcontainers-backed internal DBs) and application-dev.yml. application-prod.yml lives outside the AI sandbox entirely.
```yaml
# application-test.yml — safe for AI to touch
spring:
  datasource:
    url: jdbc:tc:postgresql:16-alpine:///test
```

```yaml
# application-prod.yml — physically out of the AI's reach
spring:
  datasource:
    url: ${DATABASE_URL}
    username: ${DATABASE_USERNAME}
    password: ${DATABASE_PASSWORD}
```

Hide application-prod* via Devcontainer setup, .gitignore, and .cursorignore, and only ever boot with --spring.profiles.active=test or dev. State explicitly in AGENTS.md: "do not edit anything under @Profile("prod")" and "production values are read only via environment variables."
Wrapping Up
Accelerating Java and Spring Boot web app development with AI coding agents is not really about choosing the right model or tool. It is about three layers stacked from day one: a design AI can read (Spring Modulith plus hexagonal), a unified set of agent instruction files, and guardrails that physically restrain AI (ArchUnit, Modulith verify, Spotless, CI gates). That stack is where the bulk of the speedup comes from.
On top of it, secure quality with Testcontainers and TDD, and hand the running app to AI through Spring AI 1.1’s MCP Server. Once that is in place, AI agents stop being assistants and start behaving like autonomous teammates that hold the repository’s context in their head.
The role of an engineer is shifting from writing all the code yourself to designing the environment in which AI keeps writing high-quality code. Java and Spring Boot, with their explicit conventions and strong type system, are among the best-suited combinations for taking that shift on.
That is the lay of the land for accelerating Java and Spring Boot web app development with AI coding agents.