Local LLM Development by Japanese Companies: A Comprehensive Survey of Domestic AI Models
Japanese companies have developed over 30 major LLM (Large Language Model) variants as of March 2026, forming a substantial ecosystem. From NTT’s fully scratch-built “tsuzumi” to PFN’s SSM-hybrid “PLaMo,” the approaches are diverse. This article provides a comprehensive overview of these models’ technical characteristics, industry-specific deployments, cost structures, and future outlook.
Why “Local LLM” Now?
Three key factors drive Japanese companies toward local LLMs:
| Factor | Details |
|---|---|
| Data Sovereignty & Security | Strict enforcement of Japan’s Act on Protection of Personal Information (APPI) in finance, healthcare, and manufacturing requires eliminating the risk of input data being used for external model training |
| Cost Structure | Pay-per-use cloud API pricing can reach tens of millions of yen per month at enterprise scale; on-premises becomes cheaper within approximately 3 years |
| Japanese Language Optimization | Global models inadequately handle Japanese-specific features like honorifics, subject omission, and domain terminology |
The government supports domestic LLM development through METI’s GENIAC (Generative AI Accelerator Challenge) project and, in December 2025, announced a 1 trillion yen (approximately $7 billion) investment in AI and semiconductors over five years.
Major Domestic LLM Models at a Glance
| Developer | Model | Released | Parameters | Approach | License | Key Strength | URL |
|---|---|---|---|---|---|---|---|
| Rakuten | Rakuten AI 3.0 | 2026-03-17 | ~700B (MoE) | MoE (Mistral-based) | Apache 2.0 | Largest open-weight domestic model | HuggingFace |
| NEC | cotomi v3 | 2026-03-09 | ~13B | Proprietary architecture | Commercial | 10x faster inference than GPT-4, AI agent capability | Official |
| PFN | PLaMo 2.2 Prime | 2026-01 | 31B | Scratch-built (SSM+SWA) | PLaMo Community | GPT-5.1 equivalent on JFBench, 150+ municipalities | HuggingFace |
| NTT | tsuzumi 2 | 2025-10-20 | 30B | Scratch-built | Commercial | Runs on single GPU, 10x domain adaptation efficiency | Official |
| Stockmark | Stockmark-2-100B | 2025-09 | 100B | Scratch-built | MIT | Business-focused, 90% accuracy vs GPT-4o’s 88% | HuggingFace |
| ELYZA | Shortcut-1.0-Qwen-32B | 2025-07 | 32B | Qwen adaptation | Open (HF) | GPT-4o equivalent, medical-specialized model | HuggingFace |
| rinna | Bakeneko 32B | 2025-02 | 32B | Qwen adaptation | Apache 2.0 | 6M+ downloads, published inference optimization data | HuggingFace |
| Fujitsu | Takane | 2024-09-30 | ~104B (Cohere-based) | Co-developed | Commercial | JGLUE world record, 1-bit quantization | Official |
| CyberAgent | CALM3-22B-Chat | 2024-07 | 22B | Scratch-built | Apache 2.0 | 70B-equivalent performance at 22B | HuggingFace |
| SB Intuitions | Sarashina | 2024-06-14 | Up to 460B (MoE) | Scratch-built | API + Research | Largest domestic model, 1T parameter model in development | HuggingFace |
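Two of the largest models in the table (Rakuten AI 3.0, Sarashina2-8x70B) use Mixture-of-Experts (MoE) architectures. The sketch below — plain Python with toy dimensions, and every name hypothetical — shows the core idea of top-k routing: only a few experts run per token, which is why total parameter counts can far exceed the compute actually spent per step.

```python
import math, random

def moe_forward(x, gate_w, experts, k=2):
    """Route a token vector x to its top-k experts and mix their outputs."""
    # Router scores: one logit per expert (dot product of x with that expert's gate row).
    logits = [sum(xi * wi for xi, wi in zip(x, row)) for row in gate_w]
    top = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    exps = [math.exp(logits[i]) for i in top]
    z = sum(exps)
    weights = [e / z for e in exps]            # softmax over the selected k only
    # Only k experts run per token -- that is how a ~700B-parameter MoE
    # activates far fewer parameters per step than a dense model of that size.
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        for j, v in enumerate(experts[i](x)):
            out[j] += w * v
    return out

random.seed(0)
d, n = 4, 3
x = [random.gauss(0, 1) for _ in range(d)]
gate_w = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]  # one gate row per expert
experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(n)]  # toy expert functions
y = moe_forward(x, gate_w, experts)
print(len(y))  # 4
```

Real routers also add load-balancing losses so tokens spread evenly across experts; that detail is omitted here.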
Top 5 Enterprise LLM Strategies
NTT “tsuzumi”: Single-GPU Scratch-Built Model
NTT’s tsuzumi is built on 40+ years of NLP research. The tsuzumi 2 (released October 2025) has 30 billion parameters but runs on a single H100 GPU (~$35,000 hardware). It achieves an 81.3% win rate against GPT-3.5 and requires 10x less training data for domain adaptation compared to competitors. NTT’s AI-related orders reached 67 billion yen in FY2025 Q1.
NEC “cotomi”: Lightweight Agent-Oriented Model
NEC’s cotomi achieves 10x faster inference than GPT-4 with just ~13B parameters. The “cotomi Act” agent technology scored 80.4% on WebArena (exceeding human performance of 78.2%). It was selected for the Digital Agency’s “Government AI” initiative in March 2026. cotomi Pro runs on just 2 GPUs.
Fujitsu “Takane”: JGLUE World Record Holder
Co-developed with Canada’s Cohere and based on Command R+ (~104B parameters), Takane holds the world’s highest score on the JGLUE benchmark. Fujitsu’s 1-bit quantization technology maintains 89% accuracy while cutting memory consumption by 94%.
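The 94% figure is consistent with back-of-envelope arithmetic: moving from 16-bit to ~1-bit weights shrinks weight storage by a factor of 16. The sketch below is illustrative only — real 1-bit schemes add scale factors and keep some layers at higher precision, so actual savings land near, not exactly at, this bound.

```python
def model_memory_gb(params_billion, bits_per_weight):
    """Approximate weight memory: parameters x bits, ignoring activations and KV cache."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9  # bytes -> GB

fp16 = model_memory_gb(104, 16)   # a Takane-scale model at fp16
onebit = model_memory_gb(104, 1)  # the same model at ~1 bit per weight
print(round(fp16), round(onebit), round(100 * (1 - onebit / fp16)))
# 208 GB vs 13 GB: a ~94% reduction, matching the figure cited above.
```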
SB Intuitions “Sarashina”: Scaling to 1 Trillion Parameters
Sarashina2-8x70B reaches ~460B parameters with MoE architecture, with a 1 trillion parameter model under development. The training data design is extensively documented: Japanese:English:Code = 5:4:1, 2.1T training tokens. Infrastructure includes NVIDIA DGX SuperPOD with 4,000+ Blackwell GPUs, backed by ~$1.2 billion in investment (2023–2025).
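Given the published 5:4:1 mix, the 2.1T-token budget splits as below; the helper function is purely illustrative.

```python
def token_budget(total_tokens, ratios):
    """Split a training-token budget according to corpus-mix ratios."""
    s = sum(ratios.values())
    return {name: total_tokens * r / s for name, r in ratios.items()}

# Sarashina's documented mix: Japanese:English:Code = 5:4:1 over 2.1T tokens.
budget = token_budget(2.1e12, {"japanese": 5, "english": 4, "code": 1})
for name, tokens in budget.items():
    print(f"{name}: {tokens / 1e12:.2f}T tokens")
# japanese: 1.05T, english: 0.84T, code: 0.21T
```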
Rakuten “Rakuten AI 3.0”: Largest Open-Weight Domestic Model
Released in March 2026 with ~700B MoE parameters under Apache 2.0, Rakuten AI 3.0 is the only frontier-class LLM from a major Japanese corporation released with fully open weights. It outperforms GPT-4o on Japanese benchmarks, and Rakuten targets a 90% cost reduction across its ecosystem.
Startups and Mid-Size Players
PFN “PLaMo”: SSM-Hybrid Architecture
PLaMo 2 uses a Selective State Space Model (SSM) + Sliding Window Attention (SWA) hybrid. PLaMo 2.2 Prime 31B achieved GPT-5.1 equivalent on JFBench and is deployed in 150+ municipalities via QommonsAI. PLaMo Lite (1B) runs on edge devices.
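The sliding-window half of the hybrid is easy to visualize: each token attends only to the previous `window` tokens, so attention cost grows linearly with sequence length instead of quadratically. A toy mask sketch (hypothetical, not PFN's implementation):

```python
def swa_mask(seq_len, window):
    """Causal sliding-window attention mask: True where attention is allowed.

    Token i may attend to tokens j with i - window < j <= i. Bounding the
    window keeps per-token attention cost constant -- the efficiency idea
    behind pairing SWA with an SSM for long-range context.
    """
    return [[i - window < j <= i for j in range(seq_len)]
            for i in range(seq_len)]

mask = swa_mask(6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# Each row allows at most 3 positions, ending at the diagonal.
```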
ELYZA: Diffusion-Based LLM Pioneer
ELYZA’s January 2026 release of ELYZA-LLM-Diffusion generates text from noise using diffusion models rather than traditional autoregressive methods. Their medical model achieved the top score on IgakuQA (Japan’s medical licensing exam benchmark).
Stockmark: 100B MIT-Licensed Business Model
Stockmark-2-100B is a 100B parameter scratch-built model released under MIT license — the most permissive license among domestic LLMs of this scale. It achieves 90% accuracy on business Q&A (vs GPT-4o’s 88%) and is used by Toyota, Panasonic, Nissin, and Suntory.
Other Notable Players
- CyberAgent CALM3-22B-Chat: A 22B parameter scratch-built model achieving performance equivalent to Meta Llama-3-70B-Instruct (70B), released under Apache 2.0
- rinna Bakeneko 32B: Over 6M downloads. Published inference benchmarks on T4 GPUs, with int8 quantization reducing VRAM to just 3.8GB
- Sakana AI: Founded by co-authors of “Attention Is All You Need.” Uses Evolutionary Model Merge to build models without gradient-based training
- LINE japanese-large-lm: Released under Apache 2.0. Trained on 650GB of public corpora and internal web crawl data
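Sakana AI's Evolutionary Model Merge can be illustrated, very loosely, as a search over merge coefficients rather than gradient descent on weights. The toy sketch below substitutes random search for a real evolutionary optimizer (e.g. CMA-ES) and a dummy fitness function for actual benchmark scoring; every name and number in it is hypothetical.

```python
import random

def merge(models, coeffs):
    """Per-parameter weighted average of the same tensor position across models."""
    total = sum(coeffs)
    return [sum(c * m[i] for c, m in zip(coeffs, models)) / total
            for i in range(len(models[0]))]

def fitness(weights):
    """Stand-in evaluation; a real run would score a benchmark such as JGLUE."""
    return -sum((w - 1.0) ** 2 for w in weights)  # best when weights are near 1.0

random.seed(0)
models = [[0.0] * 4, [2.0] * 4]                 # two toy "parent" checkpoints
best, best_score = None, float("-inf")
for _ in range(50):                              # random-search stand-in for
    c = [random.random(), random.random()]       # an evolutionary optimizer
    score = fitness(merge(models, c))
    if score > best_score:
        best, best_score = c, score
print(best_score > -0.5)  # True: the search converges on a near-even mix
```

The key property: no gradients flow anywhere — only forward evaluations of merged candidates — which is why the method can combine existing open checkpoints cheaply.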
GENIAC Project and Government Support
GENIAC’s Evolution
| Phase | Projects | Focus | Notable |
|---|---|---|---|
| Phase 1-2 | ~30 | Foundation models, domain data | Woven by Toyota (urban spatiotemporal), Ricoh (document understanding), Turing (autonomous driving VLM) |
| Phase 3 | 24 | Production deployment, agents | Airion (PLC auto-programming), Arivexis (drug discovery), Direava (surgical AI) |
| Phase 4 | Open call | Further scale-up | Announced January 2026 |
Digital Agency “Government AI”: 180,000 Staff Deployment
The Digital Agency built the “Gennai” generative AI platform and selected 7 vendors in March 2026 (NTT Data’s tsuzumi 2, KDDI/ELYZA’s Llama-3.1-ELYZA-JP-70B, PFN’s PLaMo 2.0 Prime, NEC cotomi v3, etc.) for deployment to ~180,000 government staff.
Industry-Specific Deployments
Finance
- Mizuho Financial Group + SB Intuitions: Co-developing a finance-specialized LLM based on Sarashina
- MUFG + Sakana AI: Financial AI partnership leveraging evolutionary model merge technology
Healthcare
- Mie University Hospital + NTT West: tsuzumi-based nursing/physician note summarization for shift handover efficiency
- ELYZA-LLM-Med: Top score on IgakuQA medical licensing exam benchmark
Municipalities
- 150+ local governments: Deployed PFN’s PLaMo via “QommonsAI” for administrative operations
Education
- Tokyo Online University: Adopted NTT tsuzumi as an on-campus LLM platform to keep academic data within the institution
Cost Structure
On-Premises vs Cloud API
| Cost Item | Small Scale | Large Scale |
|---|---|---|
| Hardware (GPU, servers) | ~$35,000 | ~$350,000 |
| Software/maintenance (annual) | ~$3,500 | ~$35,000 |
| Operations staff (annual) | ~$550,000 | ~$2.6M |
Lightweight models like tsuzumi can run on older GPUs (A100, etc.) or in mixed CPU/GPU environments, significantly reducing hardware costs. For enterprises spending tens of millions of yen per month on cloud APIs, on-premises becomes cheaper within roughly three years.
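The ~3-year claim can be sanity-checked against the table above. The sketch below takes the large-scale column plus an assumed cloud bill of $230k/month (~35M yen, i.e. within "tens of millions of yen"); every input is an assumption to replace with your own figures.

```python
def breakeven_months(hardware, annual_onprem, monthly_cloud):
    """First month where cumulative on-prem cost drops below cumulative cloud spend."""
    for month in range(1, 121):
        onprem = hardware + annual_onprem * month / 12  # one-time + recurring
        cloud = monthly_cloud * month
        if onprem < cloud:
            return month
    return None  # never breaks even within 10 years

# Large-scale figures from the table above; cloud spend is an assumption.
m = breakeven_months(hardware=350_000,
                     annual_onprem=35_000 + 2_600_000,  # software + operations staff
                     monthly_cloud=230_000)
print(m)  # 34 months -- roughly the ~3-year figure cited above
```

Note how sensitive the result is to the operations-staff line: if staffing is a sunk cost either way, break-even comes much sooner; if cloud spend is lower, it may never come.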
Cost by License Type
| License | Representative Models | Characteristics |
|---|---|---|
| Apache 2.0 / MIT | Rakuten AI 3.0, Stockmark-2-100B, CALM3 | Completely free. Commercial use allowed. Maximum flexibility |
| Community License | PLaMo 2 8B | Free for companies with less than 1B yen annual revenue. Balances sustainability with openness |
| Commercial Service | tsuzumi 2, cotomi v3, Takane | API/on-premises contracts. Includes enterprise support |
| Non-commercial | Sarashina2-8x70B (MoE) | Research use only. Commercial use prohibited |
Technical Trends
Multimodal Integration
- Woven by Toyota: Urban spatiotemporal understanding from 600M video-language data points (85.41% on Kinetics400)
- Ricoh: Visual understanding of complex business documents with charts and tables
- Turing: Real-time visual language model for autonomous driving
Edge SLMs
PLaMo Lite (1B) and Rakuten AI 2.0 mini (1.5B) are designed for edge devices and mobile terminals, completing processing on-device for privacy and low latency.
Competitiveness and Challenges
Performance Gap
On the Nejumi Leaderboard 4 (December 2025), GPT-5.2 (0.8285) leads domestic models by ~0.13 points. However, on Japanese-specific benchmarks, PLaMo 2.2 Prime matches GPT-5.1 and Rakuten AI 3.0 outperforms GPT-4o.
Core Value Proposition
The essential value of domestic LLMs lies not in general-purpose performance but in:
- Data sovereignty: Domestic processing of financial, medical, and defense data
- Cost efficiency: Single-GPU operation (tsuzumi 2)
- Few-shot domain adaptation: 10x less training data needed (tsuzumi 2)
- Legal transparency: Scratch-built models avoid copyright risks
Remaining Challenges
- Investment gap: Japan’s 1 trillion yen (5 years) is less than OpenAI’s annual investment alone
- AI talent shortage: Global competition for ML engineers, compounded by language barriers
- Energy/infrastructure constraints: Large-scale training demands massive power and data center capacity
Conclusion: The Hybrid Coexistence Strategy
Japan’s domestic LLM development is converging on a “hybrid coexistence” strategy — using GPT, Claude, and Gemini for general tasks while deploying domestic LLMs for confidential data processing, regulated industries, on-premises environments, and Japanese-specific tasks. NTT’s tsuzumi 2 running on a single GPU and PFN’s PLaMo Translation offered as a monthly subscription demonstrate that “bigger is not always better” in AI. Japan’s LLM ecosystem is maturing as a distinctive system built on data sovereignty, efficiency, and domain specialization — not as an imitation of global counterparts.
That’s all from the Gemba, where I surveyed the landscape of local LLM development by Japanese companies.