Local LLM Development by Japanese Companies: A Comprehensive Survey of Domestic AI Models
Japanese companies have developed over 30 major LLM (Large Language Model) variants as of March 2026, forming a substantial ecosystem. From NTT’s fully scratch-built “tsuzumi” to PFN’s SSM-hybrid “PLaMo,” the approaches are diverse. This article provides a comprehensive overview of these models’ technical characteristics, industry-specific deployments, cost structures, and future outlook.
Why “Local LLM” Now?
Three key factors drive Japanese companies toward local LLMs:
| Factor | Details |
|---|---|
| Data Sovereignty & Security | Strict enforcement of Japan’s Act on Protection of Personal Information (APPI) in finance, healthcare, and manufacturing requires eliminating the risk of input data being used for external model training |
| Cost Structure | Pay-per-use cloud API pricing can reach tens of millions of yen per month at enterprise scale; on-premises becomes cheaper within approximately 3 years |
| Japanese Language Optimization | Global models inadequately handle Japanese-specific features like honorifics, subject omission, and domain terminology |
The government supports domestic LLM development through METI’s GENIAC (Generative AI Accelerator Challenge) project and, in December 2025, announced a 1 trillion yen (approximately $7 billion) investment in AI and semiconductors over five years.
Major Domestic LLM Models at a Glance
| Developer | Model | Released | Parameters | Approach | License | Key Strength | URL |
|---|---|---|---|---|---|---|---|
| Rakuten | Rakuten AI 3.0 | 2026-03-17 | ~700B (MoE) | MoE (Mistral-based) | Apache 2.0 | Largest open-weight domestic model | HuggingFace |
| NEC | cotomi v3 | 2026-03-09 | ~13B | Proprietary architecture | Commercial | 10x faster inference than GPT-4, AI agent capability | Official |
| PFN | PLaMo 2.2 Prime | 2026-01 | 31B | Scratch-built (SSM+SWA) | PLaMo Community | GPT-5.1 equivalent on JFBench, 150+ municipalities | HuggingFace |
| NTT | tsuzumi 2 | 2025-10-20 | 30B | Scratch-built | Commercial | Runs on single GPU, 10x domain adaptation efficiency | Official |
| Stockmark | Stockmark-2-100B | 2025-09 | 100B | Scratch-built | MIT | Business-focused, 90% accuracy vs GPT-4o’s 88% | HuggingFace |
| ELYZA | Shortcut-1.0-Qwen-32B | 2025-07 | 32B | Qwen adaptation | Open (HF) | GPT-4o equivalent, medical-specialized model | HuggingFace |
| rinna | Bakeneko 32B | 2025-02 | 32B | Qwen adaptation | Apache 2.0 | 6M+ downloads, published inference optimization data | HuggingFace |
| Fujitsu | Takane | 2024-09-30 | ~104B (Cohere-based) | Co-developed | Commercial | JGLUE world record, 1-bit quantization | Official |
| CyberAgent | CALM3-22B-Chat | 2024-07 | 22B | Scratch-built | Apache 2.0 | 70B-equivalent performance at 22B | HuggingFace |
| SB Intuitions | Sarashina | 2024-06-14 | Up to 460B (MoE) | Scratch-built | API + Research | Largest domestic model, 1T parameter model in development | HuggingFace |
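Two of the largest models in the table (Rakuten AI 3.0, Sarashina2-8x70B) use Mixture-of-Experts (MoE) architectures. The sketch below — plain Python with toy dimensions, and every name hypothetical — shows the core idea of top-k routing: only a few experts run per token, which is why total parameter counts can far exceed the compute actually spent per step.

```python
import math, random

def moe_forward(x, gate_w, experts, k=2):
    """Route a token vector x to its top-k experts and mix their outputs."""
    # Router scores: one logit per expert (dot product of x with that expert's gate row).
    logits = [sum(xi * wi for xi, wi in zip(x, row)) for row in gate_w]
    top = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    exps = [math.exp(logits[i]) for i in top]
    z = sum(exps)
    weights = [e / z for e in exps]            # softmax over the selected k only
    # Only k experts run per token -- that is how a ~700B-parameter MoE
    # activates far fewer parameters per step than a dense model of that size.
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        for j, v in enumerate(experts[i](x)):
            out[j] += w * v
    return out

random.seed(0)
d, n = 4, 3
x = [random.gauss(0, 1) for _ in range(d)]
gate_w = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]  # one gate row per expert
experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(n)]  # toy expert functions
y = moe_forward(x, gate_w, experts)
print(len(y))  # 4
```

Real routers also add load-balancing losses so tokens spread evenly across experts; that detail is omitted here.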
Top 5 Enterprise LLM Strategies
NTT “tsuzumi”: Single-GPU Scratch-Built Model
NTT’s tsuzumi is built on 40+ years of NLP research. The tsuzumi 2 (released October 2025) has 30 billion parameters but runs on a single H100 GPU (~$35,000 hardware). It achieves an 81.3% win rate against GPT-3.5 and requires 10x less training data for domain adaptation compared to competitors. NTT’s AI-related orders reached 67 billion yen in FY2025 Q1.
NEC “cotomi”: Lightweight Agent-Oriented Model
NEC’s cotomi achieves 10x faster inference than GPT-4 with just ~13B parameters. The “cotomi Act” agent technology scored 80.4% on WebArena (exceeding human performance of 78.2%). It was selected for the Digital Agency’s “Government AI” initiative in March 2026. cotomi Pro runs on just 2 GPUs.
Fujitsu “Takane”: JGLUE World Record Holder
Co-developed with Canada’s Cohere and based on Command R+ (~104B parameters), Takane holds the world’s highest score on the JGLUE benchmark. Fujitsu’s 1-bit quantization technology maintains 89% accuracy while cutting memory consumption by 94%.
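The 94% figure is consistent with back-of-envelope arithmetic: moving from 16-bit to ~1-bit weights shrinks weight storage by a factor of 16. The sketch below is illustrative only — real 1-bit schemes add scale factors and keep some layers at higher precision, so actual savings land near, not exactly at, this bound.

```python
def model_memory_gb(params_billion, bits_per_weight):
    """Approximate weight memory: parameters x bits, ignoring activations and KV cache."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9  # bytes -> GB

fp16 = model_memory_gb(104, 16)   # a Takane-scale model at fp16
onebit = model_memory_gb(104, 1)  # the same model at ~1 bit per weight
print(round(fp16), round(onebit), round(100 * (1 - onebit / fp16)))
# 208 GB vs 13 GB: a ~94% reduction, matching the figure cited above.
```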
SB Intuitions “Sarashina”: Scaling to 1 Trillion Parameters
Sarashina2-8x70B reaches ~460B parameters with MoE architecture, with a 1 trillion parameter model under development. The training data design is extensively documented: Japanese:English:Code = 5:4:1, 2.1T training tokens. Infrastructure includes NVIDIA DGX SuperPOD with 4,000+ Blackwell GPUs, backed by ~$1.2 billion in investment (2023–2025).
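Given the published 5:4:1 mix, the 2.1T-token budget splits as below; the helper function is purely illustrative.

```python
def token_budget(total_tokens, ratios):
    """Split a training-token budget according to corpus-mix ratios."""
    s = sum(ratios.values())
    return {name: total_tokens * r / s for name, r in ratios.items()}

# Sarashina's documented mix: Japanese:English:Code = 5:4:1 over 2.1T tokens.
budget = token_budget(2.1e12, {"japanese": 5, "english": 4, "code": 1})
for name, tokens in budget.items():
    print(f"{name}: {tokens / 1e12:.2f}T tokens")
# japanese: 1.05T, english: 0.84T, code: 0.21T
```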
Rakuten “Rakuten AI 3.0”: Largest Open-Weight Domestic Model
Released in March 2026 with ~700B MoE parameters under Apache 2.0, Rakuten AI 3.0 is the only frontier-class LLM from a major Japanese corporation released with fully open weights. It outperforms GPT-4o on Japanese benchmarks, and Rakuten targets a 90% cost reduction across its ecosystem.
Startups and Mid-Size Players
PFN “PLaMo”: SSM-Hybrid Architecture
PLaMo 2 uses a Selective State Space Model (SSM) + Sliding Window Attention (SWA) hybrid. PLaMo 2.2 Prime 31B achieved GPT-5.1 equivalent on JFBench and is deployed in 150+ municipalities via QommonsAI. PLaMo Lite (1B) runs on edge devices.
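The sliding-window half of the hybrid is easy to visualize: each token attends only to the previous `window` tokens, so attention cost grows linearly with sequence length instead of quadratically. A toy mask sketch (hypothetical, not PFN's implementation):

```python
def swa_mask(seq_len, window):
    """Causal sliding-window attention mask: True where attention is allowed.

    Token i may attend to tokens j with i - window < j <= i. Bounding the
    window keeps per-token attention cost constant -- the efficiency idea
    behind pairing SWA with an SSM for long-range context.
    """
    return [[i - window < j <= i for j in range(seq_len)]
            for i in range(seq_len)]

mask = swa_mask(6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# Each row allows at most 3 positions, ending at the diagonal.
```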
ELYZA: Diffusion-Based LLM Pioneer
ELYZA’s January 2026 release of ELYZA-LLM-Diffusion generates text from noise using diffusion models rather than traditional autoregressive methods. Their medical model achieved the top score on IgakuQA (Japan’s medical licensing exam benchmark).
Stockmark: 100B MIT-Licensed Business Model
Stockmark-2-100B is a 100B parameter scratch-built model released under MIT license — the most permissive license among domestic LLMs of this scale. It achieves 90% accuracy on business Q&A (vs GPT-4o’s 88%) and is used by Toyota, Panasonic, Nissin, and Suntory.
Other Notable Players
- CyberAgent CALM3-22B-Chat: A 22B parameter scratch-built model achieving performance equivalent to Meta Llama-3-70B-Instruct (70B), released under Apache 2.0
- rinna Bakeneko 32B: Over 6M downloads. Published inference benchmarks on T4 GPUs, with int8 quantization reducing VRAM to just 3.8GB
- Sakana AI: Founded by co-authors of “Attention Is All You Need.” Uses Evolutionary Model Merge to build models without gradient-based training
- LINE japanese-large-lm: Released under Apache 2.0. Trained on 650GB of public corpora and internal web crawl data
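Sakana AI's Evolutionary Model Merge can be illustrated, very loosely, as a search over merge coefficients rather than gradient descent on weights. The toy sketch below substitutes random search for a real evolutionary optimizer (e.g. CMA-ES) and a dummy fitness function for actual benchmark scoring; every name and number in it is hypothetical.

```python
import random

def merge(models, coeffs):
    """Per-parameter weighted average of the same tensor position across models."""
    total = sum(coeffs)
    return [sum(c * m[i] for c, m in zip(coeffs, models)) / total
            for i in range(len(models[0]))]

def fitness(weights):
    """Stand-in evaluation; a real run would score a benchmark such as JGLUE."""
    return -sum((w - 1.0) ** 2 for w in weights)  # best when weights are near 1.0

random.seed(0)
models = [[0.0] * 4, [2.0] * 4]                 # two toy "parent" checkpoints
best, best_score = None, float("-inf")
for _ in range(50):                              # random-search stand-in for
    c = [random.random(), random.random()]       # an evolutionary optimizer
    score = fitness(merge(models, c))
    if score > best_score:
        best, best_score = c, score
print(best_score > -0.5)  # True: the search converges on a near-even mix
```

The key property: no gradients flow anywhere — only forward evaluations of merged candidates — which is why the method can combine existing open checkpoints cheaply.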
GENIAC Project and Government Support
GENIAC’s Evolution
| Phase | Projects | Focus | Notable |
|---|---|---|---|
| Phase 1-2 | ~30 | Foundation models, domain data | Woven by Toyota (urban spatiotemporal), Ricoh (document understanding), Turing (autonomous driving VLM) |
| Phase 3 | 24 | Production deployment, agents | Airion (PLC auto-programming), Arivexis (drug discovery), Direava (surgical AI) |
| Phase 4 | Open call | Further scale-up | Announced January 2026 |
Digital Agency “Government AI”: 180,000 Staff Deployment
The Digital Agency built the “Gennai” generative AI platform and selected 7 vendors in March 2026 (NTT Data’s tsuzumi 2, KDDI/ELYZA’s Llama-3.1-ELYZA-JP-70B, PFN’s PLaMo 2.0 Prime, NEC cotomi v3, etc.) for deployment to ~180,000 government staff.
Industry-Specific Deployments
Finance
- Mizuho Financial Group + SB Intuitions: Co-developing a finance-specialized LLM based on Sarashina
- MUFG + Sakana AI: Financial AI partnership leveraging evolutionary model merge technology
Healthcare
- Mie University Hospital + NTT West: tsuzumi-based nursing/physician note summarization for shift handover efficiency
- ELYZA-LLM-Med: Top score on IgakuQA medical licensing exam benchmark
Municipalities
- 150+ local governments: Deployed PFN’s PLaMo via “QommonsAI” for administrative operations
Education
- Tokyo Online University: Adopted NTT tsuzumi as an on-campus LLM platform to keep academic data within the institution
Cost Structure
On-Premises vs Cloud API
| Cost Item | Small Scale | Large Scale |
|---|---|---|
| Hardware (GPU, servers) | ~$35,000 | ~$350,000 |
| Software/maintenance (annual) | ~$3,500 | ~$35,000 |
| Operations staff (annual) | ~$550,000 | ~$2.6M |
Lightweight models like tsuzumi can run on older GPUs (A100, etc.) or in mixed CPU/GPU environments, significantly reducing hardware costs. For enterprises spending tens of millions of yen per month on cloud APIs, on-premises becomes cheaper within roughly three years.
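The ~3-year claim can be sanity-checked against the table above. The sketch below takes the large-scale column plus an assumed cloud bill of $230k/month (~35M yen, i.e. within "tens of millions of yen"); every input is an assumption to replace with your own figures.

```python
def breakeven_months(hardware, annual_onprem, monthly_cloud):
    """First month where cumulative on-prem cost drops below cumulative cloud spend."""
    for month in range(1, 121):
        onprem = hardware + annual_onprem * month / 12  # one-time + recurring
        cloud = monthly_cloud * month
        if onprem < cloud:
            return month
    return None  # never breaks even within 10 years

# Large-scale figures from the table above; cloud spend is an assumption.
m = breakeven_months(hardware=350_000,
                     annual_onprem=35_000 + 2_600_000,  # software + operations staff
                     monthly_cloud=230_000)
print(m)  # 34 months -- roughly the ~3-year figure cited above
```

Note how sensitive the result is to the operations-staff line: if staffing is a sunk cost either way, break-even comes much sooner; if cloud spend is lower, it may never come.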
Cost by License Type
| License | Representative Models | Characteristics |
|---|---|---|
| Apache 2.0 / MIT | Rakuten AI 3.0, Stockmark-2-100B, CALM3 | Completely free. Commercial use allowed. Maximum flexibility |
| Community License | PLaMo 2 8B | Free for companies with less than 1B yen annual revenue. Balances sustainability with openness |
| Commercial Service | tsuzumi 2, cotomi v3, Takane | API/on-premises contracts. Includes enterprise support |
| Non-commercial | Sarashina2-8x70B (MoE) | Research use only. Commercial use prohibited |
Technical Trends
Multimodal Integration
- Woven by Toyota: Urban spatiotemporal understanding from 600M video-language data points (85.41% on Kinetics400)
- Ricoh: Visual understanding of complex business documents with charts and tables
- Turing: Real-time visual language model for autonomous driving
Edge SLMs
PLaMo Lite (1B) and Rakuten AI 2.0 mini (1.5B) are designed for edge devices and mobile terminals, completing processing on-device for privacy and low latency.
Competitiveness and Challenges
Performance Gap
On the Nejumi Leaderboard 4 (December 2025), GPT-5.2 (0.8285) leads domestic models by ~0.13 points. However, on Japanese-specific benchmarks, PLaMo 2.2 Prime matches GPT-5.1 and Rakuten AI 3.0 outperforms GPT-4o.
Core Value Proposition
The essential value of domestic LLMs lies not in general-purpose performance but in:
- Data sovereignty: Domestic processing of financial, medical, and defense data
- Cost efficiency: Single-GPU operation (tsuzumi 2)
- Few-shot domain adaptation: 10x less training data needed (tsuzumi 2)
- Legal transparency: Scratch-built models avoid copyright risks
Remaining Challenges
- Investment gap: Japan’s 1 trillion yen (5 years) is less than OpenAI’s annual investment alone
- AI talent shortage: Global competition for ML engineers, compounded by language barriers
- Energy/infrastructure constraints: Large-scale training demands massive power and data center capacity
Conclusion: The Hybrid Coexistence Strategy
Japan’s domestic LLM development is converging on a “hybrid coexistence” strategy — using GPT, Claude, and Gemini for general tasks while deploying domestic LLMs for confidential data processing, regulated industries, on-premises environments, and Japanese-specific tasks. NTT’s tsuzumi 2 running on a single GPU and PFN’s PLaMo Translation offered as a monthly subscription demonstrate that “bigger is not always better” in AI. Japan’s LLM ecosystem is maturing as a distinctive system built on data sovereignty, efficiency, and domain specialization — not as an imitation of global counterparts.
That’s all from the Gemba, where I surveyed the landscape of local LLM development by Japanese companies.