Local LLM Development by Japanese Companies: A Comprehensive Survey of Domestic AI Models

Tadashi Shigeoka ·  Tue, March 17, 2026

Japanese companies have developed more than 30 major large language model (LLM) variants as of March 2026, forming a substantial ecosystem. From NTT’s fully scratch-built “tsuzumi” to PFN’s SSM-hybrid “PLaMo,” the approaches are diverse. This article provides a comprehensive overview of these models’ technical characteristics, industry-specific deployments, cost structures, and future outlook.

Why “Local LLM” Now?

Three key factors drive Japanese companies toward local LLMs:

| Factor | Details |
| --- | --- |
| Data Sovereignty & Security | Strict enforcement of Japan’s Act on the Protection of Personal Information (APPI) in finance, healthcare, and manufacturing requires eliminating the risk of input data being used for external model training |
| Cost Structure | Pay-per-use cloud API pricing can reach tens of millions of yen per month at enterprise scale; on-premises becomes cheaper within approximately 3 years |
| Japanese Language Optimization | Global models handle Japanese-specific features such as honorifics, subject omission, and domain terminology inadequately |

The government supports domestic LLM development through METI’s GENIAC (Generative AI Accelerator Challenge) project and announced a 1 trillion yen (approximately $7 billion) investment in AI and semiconductors over 5 years in December 2025.

Major Domestic LLM Models at a Glance

| Developer | Model | Released | Parameters | Approach | License | Key Strength | URL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Rakuten | Rakuten AI 3.0 | 2026-03-17 | ~700B (MoE) | MoE (Mistral-based) | Apache 2.0 | Largest open-weight domestic model | HuggingFace |
| NEC | cotomi v3 | 2026-03-09 | ~13B | Proprietary architecture | Commercial | 10x faster inference than GPT-4, AI agent capability | Official |
| PFN | PLaMo 2.2 Prime | 2026-01 | 31B | Scratch-built (SSM+SWA) | PLaMo Community | GPT-5.1 equivalent on JFBench, 150+ municipalities | HuggingFace |
| NTT | tsuzumi 2 | 2025-10-20 | 30B | Scratch-built | Commercial | Runs on single GPU, 10x domain adaptation efficiency | Official |
| Stockmark | Stockmark-2-100B | 2025-09 | 100B | Scratch-built | MIT | Business-focused, 90% accuracy vs GPT-4o’s 88% | HuggingFace |
| ELYZA | Shortcut-1.0-Qwen-32B | 2025-07 | 32B | Qwen adaptation | Open (HF) | GPT-4o equivalent, medical-specialized model | HuggingFace |
| rinna | Bakeneko 32B | 2025-02 | 32B | Qwen adaptation | Apache 2.0 | 6M+ downloads, published inference optimization data | HuggingFace |
| Fujitsu | Takane | 2024-09-30 | ~104B (Cohere-based) | Co-developed | Commercial | JGLUE world record, 1-bit quantization | Official |
| CyberAgent | CALM3-22B-Chat | 2024-07 | 22B | Scratch-built | Apache 2.0 | 70B-equivalent performance at 22B | HuggingFace |
| SB Intuitions | Sarashina | 2024-06-14 | Up to 460B (MoE) | Scratch-built | API + Research | Largest domestic model, 1T parameter model in development | HuggingFace |

Top 5 Enterprise LLM Strategies

NTT “tsuzumi”: Single-GPU Scratch-Built Model

NTT’s tsuzumi is built on 40+ years of NLP research. The tsuzumi 2 (released October 2025) has 30 billion parameters but runs on a single H100 GPU (~$35,000 hardware). It achieves an 81.3% win rate against GPT-3.5 and requires 10x less training data for domain adaptation compared to competitors. NTT’s AI-related orders reached 67 billion yen in FY2025 Q1.
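The single-GPU claim can be sanity-checked with simple memory arithmetic. The sketch below is an illustration, not NTT's deployment configuration: it compares weight storage at common precisions against an 80 GB H100, ignoring activations and KV cache, so real headroom is smaller than shown.

```python
# Back-of-envelope check: does a 30B-parameter model's weight
# memory fit in a single 80 GB H100 at common precisions?
# Activations and KV cache are ignored (real headroom is smaller).

PARAMS = 30e9          # tsuzumi 2 parameter count from the article
H100_MEM_GB = 80       # H100 80GB variant

for name, bits in (("fp16", 16), ("int8", 8), ("int4", 4)):
    gb = PARAMS * bits / 8 / 1e9          # bytes -> GB of weights
    verdict = "fits" if gb < H100_MEM_GB else "does not fit"
    print(f"{name}: {gb:.0f} GB of weights -> {verdict} in {H100_MEM_GB} GB")
```

At fp16 the weights alone take about 60 GB, which is why a 30B model sits right at the edge of what one 80 GB card can serve.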

NEC “cotomi”: Lightweight Agent-Oriented Model

NEC’s cotomi achieves 10x faster inference than GPT-4 with just ~13B parameters. The “cotomi Act” agent technology scored 80.4% on WebArena (exceeding human performance of 78.2%). It was selected for the Digital Agency’s “Government AI” initiative in March 2026. cotomi Pro runs on just 2 GPUs.

Fujitsu “Takane”: JGLUE World Record Holder

Co-developed with Canada’s Cohere using the Command R+ (~104B parameters) as a base, Takane holds the world’s highest score on the JGLUE benchmark. Fujitsu’s 1-bit quantization technology maintains 89% accuracy while reducing memory consumption by 94%.
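The reported 94% memory reduction lines up with naive bit-count arithmetic. The sketch below is a back-of-envelope estimate only, not Fujitsu's actual quantization scheme; real 1-bit schemes carry extra overhead (scaling factors, outlier weights) that this ignores.

```python
# Back-of-envelope: weight memory for a ~104B-parameter model at
# fp16 vs 1-bit precision. The naive ratio roughly matches the
# 94% reduction cited in the article; real schemes have overhead.

PARAMS = 104e9  # Command R+ scale, per the article

def weights_gb(bits_per_param: float) -> float:
    """Weight storage only, in GB (activations/KV cache excluded)."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16 = weights_gb(16)      # ~208 GB
one_bit = weights_gb(1)    # ~13 GB
reduction = 1 - one_bit / fp16

print(f"fp16: {fp16:.0f} GB, 1-bit: {one_bit:.0f} GB, "
      f"reduction: {reduction:.1%}")
```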

SB Intuitions “Sarashina”: Scaling to 1 Trillion Parameters

Sarashina2-8x70B reaches ~460B parameters with MoE architecture, with a 1 trillion parameter model under development. The training data design is extensively documented: Japanese:English:Code = 5:4:1, 2.1T training tokens. Infrastructure includes NVIDIA DGX SuperPOD with 4,000+ Blackwell GPUs, backed by ~$1.2 billion in investment (2023–2025).
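Given the published mix, the per-language token budgets follow directly. A quick derivation (the 5:4:1 ratio and 2.1T total are from the article; the per-language counts below are computed from them):

```python
# Derive per-language token budgets from the published training mix:
# Japanese:English:Code = 5:4:1 over 2.1T total training tokens.

TOTAL_TOKENS = 2.1e12                      # 2.1 trillion tokens
RATIOS = {"japanese": 5, "english": 4, "code": 1}

denom = sum(RATIOS.values())               # 10
budget = {lang: TOTAL_TOKENS * r / denom for lang, r in RATIOS.items()}

for lang, tokens in budget.items():
    print(f"{lang:>8}: {tokens / 1e12:.2f}T tokens")
# japanese: 1.05T, english: 0.84T, code: 0.21T
```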

Rakuten “Rakuten AI 3.0”: Largest Open-Weight Domestic Model

Released March 2026 with ~700B MoE parameters under Apache 2.0, Rakuten AI 3.0 is the only frontier-class LLM from a major Japanese corporation released with fully open weights. It outperforms GPT-4o on Japanese benchmarks and targets a 90% cost reduction across Rakuten’s ecosystem.

Startups and Mid-Size Players

PFN “PLaMo”: SSM-Hybrid Architecture

PLaMo 2 uses a Selective State Space Model (SSM) + Sliding Window Attention (SWA) hybrid. PLaMo 2.2 Prime 31B achieved GPT-5.1 equivalent on JFBench and is deployed in 150+ municipalities via QommonsAI. PLaMo Lite (1B) runs on edge devices.
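The sliding-window half of the hybrid can be illustrated with a toy attention mask. This is a generic sketch of the windowing idea, not PFN's implementation: each token attends only to itself and a fixed number of predecessors, so attention cost grows linearly with sequence length instead of quadratically.

```python
# Toy sliding-window attention (SWA) mask: query position i may
# attend to key position j only if j is causal (j <= i) and within
# the last `window` positions. Generic illustration, not PLaMo's code.

import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: True where attention is allowed."""
    i = np.arange(seq_len)[:, None]   # query positions (rows)
    j = np.arange(seq_len)[None, :]   # key positions (columns)
    causal = j <= i                   # no looking ahead
    in_window = (i - j) < window      # only the last `window` tokens
    return causal & in_window

mask = sliding_window_mask(seq_len=6, window=3)
print(mask.astype(int))
# Each row has at most 3 ones: the token itself plus 2 predecessors.
```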

ELYZA: Diffusion-Based LLM Pioneer

ELYZA’s January 2026 release of ELYZA-LLM-Diffusion generates text from noise using diffusion models rather than traditional autoregressive methods. Their medical model achieved the top score on IgakuQA (Japan’s medical licensing exam benchmark).

Stockmark: 100B MIT-Licensed Business Model

Stockmark-2-100B is a 100B parameter scratch-built model released under MIT license — the most permissive license among domestic LLMs of this scale. It achieves 90% accuracy on business Q&A (vs GPT-4o’s 88%) and is used by Toyota, Panasonic, Nissin, and Suntory.

Other Notable Players

  • CyberAgent CALM3-22B-Chat: A 22B parameter scratch-built model achieving performance equivalent to Meta Llama-3-70B-Instruct (70B), released under Apache 2.0
  • rinna Bakeneko 32B: Over 6M downloads. Published inference benchmarks on T4 GPUs, with int8 quantization reducing VRAM to just 3.8GB
  • Sakana AI: Founded by co-authors of “Attention Is All You Need.” Uses Evolutionary Model Merge to build models without gradient-based training
  • LINE japanese-large-lm: Released under Apache 2.0. Trained on 650GB of public corpora and internal web crawl data
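Sakana AI's gradient-free approach can be illustrated in miniature. The toy sketch below mixes two weight sets with per-layer coefficients and tunes those coefficients by naive random search against a stand-in fitness function; the real method uses evolutionary algorithms over much richer merge recipes, so treat this purely as an illustration of the "search instead of train" idea.

```python
# Toy illustration of gradient-free model merging: interpolate two
# weight sets with per-layer coefficients, then search over the
# coefficients with random search. NOT Sakana AI's actual method.

import random

# Toy "models": two layers, each a list of scalar weights.
model_a = {"layer0": [1.0, 2.0], "layer1": [3.0, 4.0]}
model_b = {"layer0": [5.0, 6.0], "layer1": [7.0, 8.0]}

def merge(a, b, coeffs):
    """Per-layer linear interpolation: w = c * a + (1 - c) * b."""
    return {name: [c * x + (1 - c) * y for x, y in zip(a[name], b[name])]
            for name, c in coeffs.items()}

def fitness(weights):
    # Stand-in objective: prefer merged weights near 4.0. A real
    # search would score the merged model on held-out benchmarks.
    return -sum(abs(w - 4.0) for layer in weights.values() for w in layer)

random.seed(0)
best_coeffs, best_score = None, float("-inf")
for _ in range(200):  # naive random search in place of a real EA
    coeffs = {name: random.random() for name in model_a}
    score = fitness(merge(model_a, model_b, coeffs))
    if score > best_score:
        best_coeffs, best_score = coeffs, score

print({k: round(v, 2) for k, v in best_coeffs.items()}, round(best_score, 2))
```

No gradients are computed anywhere: the only signal is the fitness score of each candidate merge, which is what makes the approach cheap relative to full training.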

GENIAC Project and Government Support

GENIAC’s Evolution

| Phase | Projects | Focus | Notable |
| --- | --- | --- | --- |
| Phase 1-2 | ~30 | Foundation models, domain data | Woven by Toyota (urban spatiotemporal), Ricoh (document understanding), Turing (autonomous driving VLM) |
| Phase 3 | 24 | Production deployment, agents | Airion (PLC auto-programming), Arivexis (drug discovery), Direava (surgical AI) |
| Phase 4 | Open | Further scale-up | Announced January 2026 |

Digital Agency “Government AI”: 180,000 Staff Deployment

The Digital Agency built the “Gennai” generative AI platform and selected 7 vendors in March 2026 (NTT Data’s tsuzumi 2, KDDI/ELYZA’s Llama-3.1-ELYZA-JP-70B, PFN’s PLaMo 2.0 Prime, NEC cotomi v3, etc.) for deployment to ~180,000 government staff.

Industry-Specific Deployments

Finance

  • Mizuho Financial Group + SB Intuitions: Co-developing a finance-specialized LLM based on Sarashina
  • MUFG + Sakana AI: Financial AI partnership leveraging evolutionary model merge technology

Healthcare

  • Mie University Hospital + NTT West: tsuzumi-based nursing/physician note summarization for shift handover efficiency
  • ELYZA-LLM-Med: Top score on IgakuQA medical licensing exam benchmark

Municipalities

  • 150+ local governments: Deployed PFN’s PLaMo via “QommonsAI” for administrative operations

Education

  • Tokyo Online University: Adopted NTT tsuzumi as an on-campus LLM platform to keep academic data within the institution

Cost Structure

On-Premises vs Cloud API

| Cost Item | Small Scale | Large Scale |
| --- | --- | --- |
| Hardware (GPU, servers) | ~$35,000 | ~$350,000 |
| Software/maintenance (annual) | ~$3,500 | ~$35,000 |
| Operations staff (annual) | ~$550,000 | ~$2.6M |

Lightweight models like tsuzumi can run on older GPUs (A100, etc.) or CPU-mixed environments, significantly reducing hardware costs. For enterprises spending tens of millions of yen monthly on cloud APIs, on-premises becomes cheaper within ~3 years.
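The break-even comparison can be framed as a small calculator. All inputs below are assumptions for illustration (the cloud spend in particular is a hypothetical figure, not a quoted price), and the result is highly sensitive to staffing costs and actual API usage.

```python
# Simple break-even calculator for on-premises vs cloud API spend.
# On-prem = one-time hardware + recurring fixed costs; break-even is
# when cumulative on-prem cost drops below cumulative cloud cost.

def breakeven_years(hardware: float, annual_fixed: float,
                    cloud_annual: float) -> float:
    """Years until on-prem total undercuts cloud spend (inf if never)."""
    if cloud_annual <= annual_fixed:
        return float("inf")   # recurring on-prem costs alone exceed cloud
    return hardware / (cloud_annual - annual_fixed)

# Large-scale figures from the cost table, with an assumed
# (hypothetical) cloud API spend of ~$260k/month.
years = breakeven_years(
    hardware=350_000,
    annual_fixed=35_000 + 2_600_000,   # software/maintenance + staff
    cloud_annual=260_000 * 12,
)
print(f"break-even after ~{years:.1f} years")
```

Because staffing dominates the recurring cost, the crossover point depends far more on the annual spend gap than on the hardware outlay.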

Cost by License Type

| License | Representative Models | Characteristics |
| --- | --- | --- |
| Apache 2.0 / MIT | Rakuten AI 3.0, Stockmark-2-100B, CALM3 | Completely free. Commercial use allowed. Maximum flexibility |
| Community License | PLaMo 2 8B | Free for companies with less than 1B yen annual revenue. Balances sustainability with openness |
| Commercial Service | tsuzumi 2, cotomi v3, Takane | API/on-premises contracts. Includes enterprise support |
| Non-commercial | Sarashina2-8x70B (MoE) | Research use only. Commercial use prohibited |

Multimodal Integration

  • Woven by Toyota: Urban spatiotemporal understanding from 600M video-language data points (85.41% on Kinetics400)
  • Ricoh: Visual understanding of complex business documents with charts and tables
  • Turing: Real-time visual language model for autonomous driving

Edge SLMs

PLaMo Lite (1B) and Rakuten AI 2.0 mini (1.5B) are designed for edge devices and mobile terminals, completing processing on-device for privacy and low latency.

Competitiveness and Challenges

Performance Gap

On the Nejumi Leaderboard 4 (December 2025), GPT-5.2 (0.8285) leads domestic models by ~0.13 points. However, on Japanese-specific benchmarks, PLaMo 2.2 Prime matches GPT-5.1 and Rakuten AI 3.0 outperforms GPT-4o.

Core Value Proposition

The essential value of domestic LLMs lies not in general-purpose performance but in:

  1. Data sovereignty: Domestic processing of financial, medical, and defense data
  2. Cost efficiency: Single-GPU operation (tsuzumi 2)
  3. Few-shot domain adaptation: 10x less training data needed (tsuzumi 2)
  4. Legal transparency: Scratch-built models avoid copyright risks

Remaining Challenges

  • Investment gap: Japan’s 1 trillion yen (5 years) is less than OpenAI’s annual investment alone
  • AI talent shortage: Global competition for ML engineers, compounded by language barriers
  • Energy/infrastructure constraints: Large-scale training demands massive power and data center capacity

Conclusion: The Hybrid Coexistence Strategy

Japan’s domestic LLM development is converging on a “hybrid coexistence” strategy — using GPT, Claude, and Gemini for general tasks while deploying domestic LLMs for confidential data processing, regulated industries, on-premises environments, and Japanese-specific tasks. NTT’s tsuzumi 2 running on a single GPU and PFN’s PLaMo Translation offered as a monthly subscription demonstrate that “bigger is not always better” in AI. Japan’s LLM ecosystem is maturing as a distinctive system built on data sovereignty, efficiency, and domain specialization — not as an imitation of global counterparts.

That’s all from the Gemba, where I surveyed the landscape of local LLM development by Japanese companies.