Accelerating Java and Spring Boot Web App Development with AI Coding Agents — Repository Practices for the Claude Code and Codex Era

Tadashi Shigeoka ·  Wed, January 28, 2026

In 2026, asking how to bring Claude Code, OpenAI Codex, Cursor, and GitHub Copilot into a Spring Boot team is an everyday conversation. The arrival of Spring Boot 4.0 and Java 25 LTS has many teams redrawing their stack, and “while we are at it, can we set this up to be AI-native from day one?” tends to land on the same agenda.

The trap most teams walk into is letting surface-level debates about tool selection or which model to use swallow all the discussion, while the actual repository remains untouched. In practice, tool and model choice combined contribute maybe 10 to 20 percent of the outcome. The rest comes from shaping the codebase so the AI cannot misread it.

This post lays out what to do, in order, when you want to run AI coding agents fast on a Java and Spring Boot web application: design, CI guardrails, Model Context Protocol (MCP) integration, and security. It builds on the ecosystem map I wrote earlier in Famous OSS Web Applications Built with Java and Spring Boot, and applies equally to greenfield projects and modernization of existing codebases.

Why Java and Spring Boot Need Some Setup Before the AI

Java is more agent-friendly than people give it credit for. Annotation-driven programming with @RestController, @Service, @Repository, and @Transactional, the strict type system, and the explicit dependency graph declared by Maven and Gradle all act as readable context for an AI. Spring Boot’s layered conventions (Controller → Service → Repository) give the model a strong prior on “what to write next.”

That said, Java is one of the hardest languages for AI to write securely. Veracode’s research shows that while syntactic correctness from current models clears 95 percent, the security pass rate for Java sits at around 28 percent, the lowest of the surveyed languages. The reason is that AI models still reproduce patterns baked deep into their training corpus: SQL built by string concatenation, mixed javax.* and jakarta.* imports, and Lombok-era boilerplate.

So Java plus Spring Boot, left untreated, drifts into “fast but broken” output. With proper setup, it becomes “fast, type-safe, auditable” output. The lever is a three-layer stack: a design the AI can read, a unified set of instruction files the AI reads, and CI guardrails that enforce both mechanically. The rest of this post walks through the three layers.

Pin the Design Up Front — Spring Modulith and Hexagonal

The first decision is versions and architecture. As of early 2026, here is a sensible baseline.

| Item | Recommended | Why |
| --- | --- | --- |
| Spring Boot | 4.0.x (leading edge) or 3.5.x (conservative) | OSS support for 3.5.x ends 2026/06/30. New projects should target 4.0.x; pick 3.5.x only when third-party libs are not yet ready |
| Java | 21 LTS (stable) or 25 LTS (modern) | Virtual Threads and Pattern Matching are stable in 21. 25 is first-class in Spring Boot 4.0 |
| Build | Gradle Kotlin DSL | Type-safe build.gradle.kts works well with IDE completion and prevents AI from inventing broken XML hierarchies |
| JDK vendor | Eclipse Temurin or Amazon Corretto | Free, well-supported, and the default in Codex / Devin sandboxes |

Maven overwhelmingly dominates AI training data and Spring’s official samples, so Codex and Cursor reach for pom.xml by default. If you go with Gradle, you have to write “this project uses Gradle Kotlin DSL; do not create pom.xml” into your AGENTS.md or CLAUDE.md, or you will fight the same correction every session.
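You can also enforce that instruction mechanically instead of repeating it. A minimal sketch of such a guardrail (the class name and messages are invented here; this is not a standard tool) that flags the two regressions Maven-trained models most often produce — a stray pom.xml and pre-Boot-3 javax.persistence imports:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public final class RepoConventionCheck {

    // Flags the two regressions Maven-trained models most often introduce:
    // a stray pom.xml, and javax.persistence imports that should be
    // jakarta.persistence on Spring Boot 3+.
    public static List<String> violations(Path repoRoot) throws IOException {
        try (Stream<Path> files = Files.walk(repoRoot)) {
            return files.filter(Files::isRegularFile).flatMap(p -> {
                String name = p.getFileName().toString();
                if (name.equals("pom.xml")) {
                    return Stream.of(p + ": Maven build file; this repo is Gradle Kotlin DSL only");
                }
                if (name.endsWith(".java") && readString(p).contains("import javax.persistence")) {
                    return Stream.of(p + ": javax.persistence import; use jakarta.persistence");
                }
                return Stream.empty();
            }).toList();
        }
    }

    private static String readString(Path p) {
        try {
            return Files.readString(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrap it in a JUnit test and ./gradlew check fails the moment an agent regresses, instead of a human catching it in review.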

For architecture, the current sweet spot is Spring Modulith 2.0 combined with hexagonal layers (domain → application → adapter.in/out) inside each module. A flat layered monolith teaches AI agents to “place code in the right layer” but lets them cross package boundaries the moment a feature grows. Going microservices on day one wastes operational budget. A modular monolith is the middle path: it gives you explicit boundaries for the AI, and you can carve a service out later by extracting a single module.

com.example.app
├── AppApplication.java
├── design/                       # 1 module = 1 feature (Bounded Context)
│   ├── api/                      # Driving Port (@NamedInterface)
│   ├── domain/                   # Spring-free
│   │   ├── model/
│   │   └── service/
│   ├── application/              # Use cases
│   └── adapter/
│       ├── in/web/               # @RestController
│       ├── in/event/             # @ApplicationModuleListener
│       └── out/persistence/      # JPA / jOOQ
├── catalog/
├── workspace/
└── shared/                       # Cross-module utility (kept minimal)

By Modulith’s defaults, only a module’s base package — plus anything explicitly exported via @NamedInterface, such as api/ above — is visible to other modules; nested packages stay module-internal. Cross-module communication goes through Application Events (@ApplicationModuleListener) only, never a direct @Autowired of another module’s @Service. With these two rules, the ArchUnit and Modulith verify checks introduced later will mechanically catch any boundary crossing the AI tries to commit.
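Concretely, the event rule looks like this — a sketch with invented names (PlaceOrderService, OrderPlaced, StockAdjuster), not code from any particular project:

```java
// design module — publishes a fact, knows nothing about who listens
@Service
public class PlaceOrderService {

    private final ApplicationEventPublisher events;

    public PlaceOrderService(ApplicationEventPublisher events) {
        this.events = events;
    }

    @Transactional
    public void placeOrder(OrderId id) {
        // ... domain logic ...
        events.publishEvent(new OrderPlaced(id));
    }
}

// catalog module — reacts without importing anything from design's internals
@Service
class StockAdjuster {

    @ApplicationModuleListener
    void on(OrderPlaced event) {
        // adjust stock for the placed order
    }
}
```

@ApplicationModuleListener is Spring Modulith’s shorthand for an asynchronous, transactional event listener that fires after the publishing transaction commits, so neither module ever holds a reference to the other’s @Service.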

Keep Per-Agent Instruction Files Unified

Hand-maintain a separate CLAUDE.md for Claude Code, AGENTS.md for Codex, .github/copilot-instructions.md for Copilot, and .cursor/rules/ for Cursor, and they will drift until each agent operates from different assumptions. Pick one as the source of truth and symlink the others to it, so Spring Boot-specific rules like “use Gradle Kotlin DSL, never create pom.xml” or “never edit existing V*.sql files” read identically to every agent.

Add Roles with Claude Code Skills, Subagents, and Slash Commands

Claude Code has four extension points: .claude/skills/ (auto-loaded knowledge packs), .claude/agents/ (specialist agents with isolated context), .claude/commands/ (slash commands), and .claude/hooks/ (lifecycle hooks). Combined with AGENTS.md, they let you encode both repository-wide rules and task-specific lieutenants.

.claude/
├── agents/
│   ├── spring-boot-engineer.md    # Implementation
│   ├── test-automator.md          # JUnit 5 + Testcontainers
│   ├── security-engineer.md       # Spring Security / OWASP (Opus recommended)
│   ├── db-migrator.md             # Flyway script specialist
│   └── code-reviewer.md           # PR reviewer (Sonnet)
├── skills/
│   ├── spring-boot-core/SKILL.md
│   ├── jpa-patterns/SKILL.md
│   ├── flyway-migrations/SKILL.md
│   ├── spring-security/SKILL.md
│   ├── testcontainers/SKILL.md
│   └── archunit-rules/SKILL.md
├── commands/
│   ├── plan.md
│   ├── tdd.md
│   ├── code-review.md
│   ├── api-design.md
│   └── build-fix.md
└── settings.local.json

A db-migrator subagent is a good example. The job (adding a Flyway migration) is dangerous if done casually but easy to constrain.

---
name: db-migrator
description: Use when a schema change is needed. Adds a new Flyway V*.sql, never edits existing ones, regenerates jOOQ if used, runs tests.
tools: Read, Grep, Glob, Edit, Write, Bash
model: sonnet
---
You are a careful database migrator for Spring Boot + Flyway + PostgreSQL.
 
When invoked:
1. Inspect `src/main/resources/db/migration/` to find the latest version number.
2. Create `V{yyyyMMddHHmm}__{snake_case_description}.sql` with idempotent DDL where possible.
3. NEVER modify existing V*.sql files (Flyway checksum will break).
4. Run `./gradlew flywayMigrate -Pdev` and `./gradlew test`.
5. If using jOOQ, run `./gradlew generateJooq`.
6. Summarize the change with rollback notes for the human reviewer.
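Step 2’s timestamped versions are worth a note: sequential V1, V2 numbering collides the moment two branches each add a migration, while a yyyyMMddHHmm timestamp rarely does. As a plain-Java sketch of the naming rule (the helper class is illustrative, not part of Flyway):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public final class MigrationNames {

    private static final DateTimeFormatter VERSION =
        DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    // Builds a Flyway filename such as V202601281430__add_users_table.sql.
    // Timestamp versions stay monotonic and rarely collide across branches,
    // unlike sequential V1, V2 numbering.
    public static String migrationFileName(LocalDateTime now, String snakeCaseDescription) {
        return "V" + VERSION.format(now) + "__" + snakeCaseDescription + ".sql";
    }
}
```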

The pleasant part of this design is that the main Claude Code session never has to be told “do not touch the database.” Database mutations only happen when you delegate to db-migrator. Skills, on the other hand, hold conventions like “to avoid the JPA N+1 problem on @OneToMany, use JOIN FETCH or @EntityGraph,” which Claude auto-loads as needed.
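The N+1 convention in that skill can be stated as a code pair — illustrative entity and repository names, not from any real project:

```java
public interface OrderRepository extends JpaRepository<Order, Long> {

    // N+1: findAll() followed by order.getItems() issues one query per order.
    // JOIN FETCH loads orders and their items in a single query instead.
    @Query("SELECT DISTINCT o FROM Order o JOIN FETCH o.items WHERE o.status = :status")
    List<Order> findWithItemsByStatus(@Param("status") OrderStatus status);

    // Equivalent declarative form: keep the derived query, but tell JPA
    // which association to fetch eagerly for this call only.
    @EntityGraph(attributePaths = "items")
    List<Order> findByStatus(OrderStatus status);
}
```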

For slash commands, five cover most of the ground: /plan (drafting an implementation plan), /tdd (red-green-refactor loop), /code-review, /api-design, and /build-fix. Bundle them as a Claude Code plugin and you can ship the same workflow across the team.

Bake the Guardrails into CI — Spotless, ArchUnit, Modulith verify

The faster you let AI write code, the louder your CI quality gate has to be. The ideal is that the moment AI crosses a layer, reverses a dependency direction, or sneaks in a System.out.println, ./gradlew check immediately fails.

The minimum tool set looks like this.

| Purpose | Tool | Why |
| --- | --- | --- |
| Formatting | Spotless + Spring Java Format | Same style as Spring’s official codebase, applied automatically. Editor-independent, AI-output friendly |
| Lint | Checkstyle (minimal rules) | Spotless covers most of it; keep Checkstyle for naming and Javadoc |
| Bug detection | Error Prone + NullAway | Catch NPE and API misuse at compile time |
| Architecture | ArchUnit + Spring Modulith verify | CI fails the moment AI crosses a layer. The most important one |
| Coverage | JaCoCo | Enforce 80% line / 70% branch in CI |
| SAST | CodeQL + Semgrep | Combining the two raises detection coverage on OWASP-style issues |
| Dependency scanning | OWASP Dependency-Check + Trivy | NVD coverage plus filesystem and container scanning |
| Secret scanning | Gitleaks + GitHub Push Protection | Catch .env and API key leaks before push |

ArchUnit rules are ordinary declarative tests, so they ship to CI with no extra wiring. With these in place, Java code that calls a repository directly from a controller or annotates a domain class with @Entity triggers a build failure on the next push.

import com.tngtech.archunit.core.importer.ImportOption.DoNotIncludeTests;
import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;
import org.junit.jupiter.api.Test;
import org.springframework.modulith.core.ApplicationModules;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;
import static com.tngtech.archunit.library.Architectures.layeredArchitecture;

@AnalyzeClasses(packages = "com.example.app", importOptions = DoNotIncludeTests.class)
class ArchitectureTest {

    @ArchTest
    static final ArchRule layered = layeredArchitecture().consideringAllDependencies()
        .layer("Domain").definedBy("..domain..")
        .layer("Application").definedBy("..application..")
        .layer("AdapterIn").definedBy("..adapter.in..")
        .layer("AdapterOut").definedBy("..adapter.out..")
        .whereLayer("Application").mayOnlyBeAccessedByLayers("AdapterIn")
        .whereLayer("Domain").mayOnlyBeAccessedByLayers("Application", "AdapterOut");

    @ArchTest
    static final ArchRule domainIsFrameworkFree = noClasses()
        .that().resideInAPackage("..domain..")
        .should().dependOnClassesThat().resideInAnyPackage(
            "org.springframework..", "jakarta.persistence..", "com.fasterxml.jackson..");

    @ArchTest
    static final ArchRule controllersDontCallRepositories = noClasses()
        .that().resideInAPackage("..adapter.in.web..")
        .should().dependOnClassesThat().resideInAPackage("..adapter.out.persistence..");

    @Test
    void modulesAreVerified() {
        ApplicationModules.of(AppApplication.class).verify();
    }
}

For pre-commit, Lefthook is light (Go binary, cross-platform) and pairs well with Java backend projects.

# lefthook.yml
pre-commit:
  parallel: true
  commands:
    spotless:
      glob: "*.java"
      run: ./gradlew spotlessApply
      stage_fixed: true
    no-system-out:
      glob: "src/main/java/**/*.java"
      run: |
        if grep -rn "System\.out\.println" {staged_files}; then
          echo "System.out.println is forbidden. Use SLF4J Logger."; exit 1
        fi
pre-push:
  commands:
    test:
      run: ./gradlew check

In AGENTS.md, set the contract explicitly: the definition of done is that the code compiles, tests pass, static analysis passes, and the dependency scan passes, and the verification command is ./gradlew check. With a single verification command, Claude Code and Codex can decide for themselves when a task is finished, which removes a huge chunk of human handoff.

Let AI Write Tests with Testcontainers and TDD

The strongest insurance for AI-generated code is the combination of test-driven development and Testcontainers. Spring Boot 3.1 and later ship @ServiceConnection, which lets you spin up real PostgreSQL or Redis containers in tests without ever touching configuration files.

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {
 
    @Bean
    @ServiceConnection
    PostgreSQLContainer<?> postgresContainer() {
        return new PostgreSQLContainer<>("postgres:16-alpine").withReuse(true);
    }
 
    @Bean
    @ServiceConnection(name = "redis")
    GenericContainer<?> redisContainer() {
        return new GenericContainer<>("redis:7-alpine")
            .withExposedPorts(6379)
            .withReuse(true);
    }
}
 
// Use Testcontainers for local development too (src/test/java)
public class TestAppApplication {
    public static void main(String[] args) {
        SpringApplication.from(AppApplication::main)
            .with(TestcontainersConfiguration.class)
            .run(args);
    }
}

Run ./gradlew bootTestRun and AI agents stop hitting the “no local DB” wall. Set testcontainers.reuse.enable=true in .testcontainers.properties and the second iteration onwards is fast enough that the perceived feedback loop changes character entirely.

For test granularity: Web layer with @WebMvcTest plus MockMvc, repository layer with @DataJpaTest plus Testcontainers, integration tests with @SpringBootTest plus @Import(TestcontainersConfiguration.class). Be explicit in AGENTS.md: “do not use @SpringBootTest for controller tests; use the web slice.” This single line stops AI from inflating CI time by reaching for the heaviest test type by default.

When running TDD with Claude Code or Codex, package the loop into a /tdd slash command so red-green-refactor is one invocation. Claude Code in particular is good at reading mvn test or ./gradlew test output, parsing stack traces, and self-correcting until tests pass; if your tests are right, the implementation quality from the AI is usually fine. Parasoft has documented an AI-driven TDD case study where unit test creation became 100% faster and coverage on the most business-critical microservices climbed from 20 to 85 percent within weeks.

Tighten the Feedback Loop — Gradle Build Cache and Spring Boot DevTools

Java plus Spring Boot has a longer AI feedback loop than TypeScript or Python, languages where AI has abundant training data and is at its strongest. JVM startup, Spring bean initialization, and Testcontainers container boot are unavoidable overheads. To run AI coding agents at speed, you have to engineer this loop down from day one. Otherwise, every time Claude Code or Codex waits 3-5 minutes for ./gradlew test, idle time accumulates without supervision and the experience stops feeling fast.

Use the Gradle Cache and Daemon to the Fullest

Drop the following into gradle.properties.

org.gradle.daemon=true
org.gradle.parallel=true
org.gradle.caching=true
org.gradle.configuration-cache=true
org.gradle.jvmargs=-Xmx4g -XX:+UseG1GC
  • daemon: keep Gradle resident so JVM startup is reused across invocations
  • parallel: build sub-projects in a multi-module setup in parallel
  • caching: build cache. Reuse outputs from any previously executed task
  • configuration-cache: cache the parsed build configuration. Second-run startup drops to seconds

For larger teams, run a shared build cache server like Develocity or an OSS Build Cache node so CI and developer machines share artifacts. First-run builds also speed up several-fold.

Put AI on Continuous Build

Run ./gradlew test --continuous (short form -t) in a separate terminal. Gradle watches for file changes and re-runs only the affected tasks. The moment an AI agent edits and saves a file, tests are running in the background, so the perceived feedback latency is essentially zero. A PostToolUse Hook in Claude Code that runs only the test for the edited file works as an alternative, but Continuous Build is simpler and harder to break.

Spring Boot DevTools and bootTestRun

Spring Boot DevTools added as developmentOnly watches the classpath and performs an automatic restart with a separate classloader. That is 5-10× faster than a cold start, so changes to @RestController responses or @Service logic land in seconds.

dependencies {
    developmentOnly("org.springframework.boot:spring-boot-devtools")
}

Combine ./gradlew bootTestRun with Testcontainers @ServiceConnection (covered earlier) and the local DB also pays its startup cost only once. By the time the AI says “let me try it,” the endpoint is already updated.

Push Tests Toward the Cheap End

The single biggest lever, in practice, is making tests physically cheap. @SpringBootTest boots the entire application context and costs seconds per test, while @WebMvcTest boots only the web layer and @DataJpaTest only the JPA layer, typically well under a second each.

| Test type | Annotation | Per-test time |
| --- | --- | --- |
| Pure unit | @ExtendWith(MockitoExtension.class) | < 0.1 s |
| Web slice | @WebMvcTest | 0.3-0.8 s |
| JPA slice | @DataJpaTest | 0.5-1.5 s |
| Integration | @SpringBootTest | 5-15 s |

Narrow execution with ./gradlew test --tests "com.example.app.design.*", parallelize with maxParallelForks, and push heavy JaCoCo measurement to CI only. The local feedback loop will not match TypeScript, but it gets close enough to feel useful.

tasks.test {
    useJUnitPlatform()
    maxParallelForks = (Runtime.getRuntime().availableProcessors() / 2).coerceAtLeast(1)
}

Manage AI-Side Timeouts

Tightening the loop only helps if the AI does not sit blocked. Claude Code and Codex have default shell timeouts, so wrap commands explicitly with timeout 60s ./gradlew test --tests TargetTest, or run Continuous Build in a separate terminal and have the AI just read the result report at build/reports/tests/test/index.html. That keeps “the conversation stalls because the build is running” from happening.

Hand the Running App to AI through Spring AI MCP

AI coding agents do a lot just by reading source code, but giving them visibility into the running application’s state pushes debugging accuracy up another level. That is what MCP enables, and Spring AI 1.1 (GA November 2025) ships spring-ai-starter-mcp-server-webmvc, which turns your Spring Boot application itself into an MCP server.

@Service
public class SpecTools {
 
    private final SpecQueryService specQueryService;
 
    public SpecTools(SpecQueryService specQueryService) {
        this.specQueryService = specQueryService;
    }
 
    @McpTool(description = "List design specs for a project")
    public List<SpecSummary> listSpecs(
        @McpToolParam(description = "Project ID") String projectId
    ) {
        return specQueryService.list(projectId);
    }
}

Any Java method tagged with @McpTool becomes a tool the AI can call from natural language. With this in place, “investigate why recent user signups are failing and fix it” goes from a vague prompt to an actionable task: the AI queries the running application’s error state via MCP, identifies the validation error, fixes the source, and runs the tests, all in one sitting. To put OAuth2 in front of the MCP endpoint, add the Spring AI community’s org.springaicommunity:mcp-server-security-spring-boot (v0.1.8 at the time of writing, with a Boot starter Quick Start in place, though not yet officially folded into Spring AI proper) and configure spring.security.oauth2.resourceserver.jwt.issuer-uri. Streamable HTTP, Resource Indicators, and Dynamic Client Registration all come along automatically.

The cardinal rule with MCP is: never hand AI production credentials. The only thing AI should physically touch is a Devcontainer-internal Testcontainers stack or a read-only staging replica. Anything else is a hazard you will regret.

Ship AI-Generated Code Safely — Close the Java/Spring-Specific Gaps

As mentioned earlier, Java is one of the hardest languages for AI to produce secure code in. The reasons are structural: 30 years of training data carrying old unsafe patterns, the javax / jakarta split, and Java-specific vulnerability classes around deserialization, XXE, and SpEL. CI and review should focus on the patterns specific to Java and Spring, not the generic OWASP list.

| Vulnerability | Common AI mistake | Why it is Java/Spring-specific |
| --- | --- | --- |
| CWE-502 (Insecure Deserialization) | ObjectInputStream.readObject() on untrusted input | Java’s native serialization has been an RCE breeding ground for a decade. Apache Commons Collections and Spring4Shell are descendants |
| CWE-611 (XXE) | Calling DocumentBuilderFactory.newInstance() as-is | Java’s standard XML parsers resolve external entities by default and need an explicit disallow-doctype-decl to be safe |
| SpEL Injection | User input flowing into @Value("#{...}") or @PreAuthorize("...") | Spring Expression Language evaluates strings dynamically. Spring4Shell (CVE-2022-22965) is in this family |
| CWE-117 (Log Injection) | logger.info("input: " + userInput) style unsanitized logs | The legacy of Log4Shell (CVE-2021-44228) |
| Loose Spring Security configuration | http.csrf().disable(), permitAll(), @CrossOrigin(origins = "*") | Tutorials are full of “just make it work” snippets that AI faithfully reproduces |
| JPA raw-SQL concatenation | @Query("SELECT ... WHERE name = " + name) | AI proposes string concatenation instead of derived queries or ?1 placeholders |
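Most of these have a known safe form. For the CWE-611 row, the JDK’s DOM parser can be hardened in a couple of lines; a sketch (the wrapper class name is mine), assuming you never need DOCTYPE declarations in inbound XML:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public final class SafeXml {

    // Rejecting DOCTYPE declarations outright disables external entity
    // resolution (XXE), which the JDK parsers otherwise allow by default.
    public static DocumentBuilder hardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }
}
```

Route every parse of untrusted XML through this kind of factory and the AI has no opportunity to call DocumentBuilderFactory.newInstance() bare.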

To catch these mechanically in CI, layer Java/Spring-specific rule packs on top of generic SAST. The CodeQL Java/Kotlin query pack covers SpEL injection, Spring Security bypass, and JPA query injection, and the Semgrep Spring ruleset catches csrf().disable() and permitAll() immediately. Adding SpotBugs Find Security Bugs gives you bytecode-level detection of XXE and SQL injection. OWASP Dependency-Check and Trivy then guard the Java-specific supply chain, blocking versions that still carry known CVEs like Log4j.
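Some of the fixes are one-liners the scanners can then verify. For CWE-117, strip line breaks before anything user-controlled reaches the logger; a minimal sketch (the helper class is invented here for illustration):

```java
public final class LogSanitizer {

    // CWE-117: strip CR/LF so user-controlled input cannot forge extra
    // log lines; replace with a visible marker rather than silently dropping.
    public static String forLog(String userInput) {
        if (userInput == null) {
            return "null";
        }
        return userInput.replaceAll("[\r\n]", "_");
    }
}
```

Usage: logger.info("input: {}", LogSanitizer.forLog(userInput)) — parameterized logging plus sanitization, instead of string concatenation.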

The other Java + Spring Boot-specific defense worth setting up is physical credential isolation via @Profile and application-{profile}.yml. AI agents only ever see application-test.yml (pointing at Testcontainers-backed internal DBs) and application-dev.yml. application-prod.yml lives outside the AI sandbox entirely.

# application-test.yml — safe for AI to touch
spring:
  datasource:
    url: jdbc:tc:postgresql:16-alpine:///test
 
# application-prod.yml — physically out of the AI's reach
spring:
  datasource:
    url: ${DATABASE_URL}
    username: ${DATABASE_USERNAME}
    password: ${DATABASE_PASSWORD}

Hide application-prod* via Devcontainer setup, .gitignore, and .cursorignore, and only ever boot with --spring.profiles.active=test or dev. State explicitly in AGENTS.md: “do not edit anything under @Profile("prod")” and “production values are read only via environment variables.”

Wrapping Up

Accelerating Java and Spring Boot web app development with AI coding agents is not really about choosing the right model or tool. It is about three layers stacked from day one: a design AI can read (Spring Modulith plus hexagonal), a unified set of agent instruction files, and guardrails that physically restrain AI (ArchUnit, Modulith verify, Spotless, CI gates). That stack is where the bulk of the speedup comes from.

On top of it, secure quality with Testcontainers and TDD, and hand the running app to AI through Spring AI 1.1’s MCP Server. Once that is in place, AI agents stop being assistants and start behaving like autonomous teammates that hold the repository’s context in their head.

The role of an engineer is shifting from writing all the code yourself to designing the environment in which AI keeps writing high-quality code. Java and Spring Boot, with their explicit conventions and strong type system, are among the best-suited combinations for taking that shift on.

That is the lay of the land for accelerating Java and Spring Boot web app development with AI coding agents.
