Prompt Serve — Enterprise AI Platform

Data Sheet · version 2026.05

This document summarises the Prompt Serve platform characteristics for reference in the Terms of Reference (TOR), proposal, statement of work (SOW), and compliance sheets for Thai government agencies, state enterprises, and large organisations.

Product name (canonical)	Prompt Serve
Category	Enterprise AI Platform (on-premise / private cloud)
Deployment	Docker-based, Linux x86_64
Architecture	Microservices, container-orchestrated
Licence	Commercial Licence — Group of companies
Languages supported	Thai, English (UI + LLM responses)
Standards compliance	PDPA, OWASP Top 10, ISO/IEC 27001 alignment
Audit log retention	Minimum 90 days (configurable up to 7 years)

1. Technical Specifications

1.1 Component Matrix

Component	Technology	Port	Database	Function
Portal (Frontend)	React 18 + Vite + TypeScript + Tailwind	3000	-	UI for admins and users - manage channels, agents, workflows, and KB. Supports TH/EN and light/dark themes.
Portal Service (Backend)	Kotlin + Spring Boot 3.5 WebFlux	8080	PostgreSQL	REST/SSE API, OAuth2 proxy, user/role management, BFF to AI Gateway.
AI Gateway	Go 1.25 + eino framework	4000 / 4001	SQLite 3	LLM proxy, agent runtime, workflow engine, RAG and OCR pipeline.
Agent Runtime	Go + eino	(internal)	-	Agentic execution with tool calling and multi-step reasoning.
PyWorker	Python 3.11 + FastAPI	(internal)	-	Document processing and OCR orchestration.
Hermes Sidecar	Python 3.11 + FastAPI	(internal)	SQLite 3	Self-learning skills, memory, and scheduler.
Documentation Portal	Next.js 15 + Fumadocs	3012	-	User manual, API spec, and data sheet for internal teams and partners.

1.2 External Stack Dependencies

Service	Version	Function	Self-hosted
Keycloak	24.x	Identity provider, OIDC/SAML, user federation, SSO.	Yes
PostgreSQL	16.x	Portal Service primary database.	Yes
OpenSearch	2.13+	RAG (vector + full-text), traces, usage logs, audit events.	Yes
MinIO	Latest stable	S3-compatible object storage for documents, uploads, fixtures.	Yes
Presidio Analyzer	Latest stable	PII detection (Microsoft OSS, port 5001).	Yes
Caddy	2.8.x	TLS termination, reverse proxy, rate limiting, Let's Encrypt.	Yes
Infinity Embedding	Latest	bge-m3 embeddings (production VPS).	Optional (OpenRouter fallback)

1.3 LLM Provider Support

Provider	Models	API Compatibility
OpenAI	GPT-4o, GPT-4o-mini, o1, o3	Native
Anthropic	Claude Opus 4.x, Sonnet 4.x, Haiku 4.x	Native
Google	Gemini 2.5 Pro, Flash, Pro Vision	Native (VLM for OCR)
OpenRouter	100+ models routed	Universal
Typhoon	Typhoon-2 (SCB 10X, Thai-tuned)	OpenAI-compatible
Local / On-premise	vLLM, Ollama, LM Studio, Infinity	OpenAI-compatible
BYOK (Bring Your Own Key)	Any OpenAI-compatible endpoint	Configurable

1.4 Performance Targets

Metric	Target	Notes
Chat completion latency (p50)	< 2 seconds	Streaming start; full response depends on model and length.
Chat completion latency (p95)	< 5 seconds	Streaming start.
Embedding latency (p95)	< 500 ms	bge-m3 on Infinity (single GPU).
OCR latency (p95)	< 8 seconds	Per page, Gemini Flash VLM.
RAG search latency (p95)	< 300 ms	Hybrid vector + BM25.
API throughput	500+ RPS	Per AI Gateway node, scalable.
Concurrent users	1,000+	Per Portal Service node.
Document upload	200 MB max	Single file, configurable.
Workflow execution	30 min max (sync), unlimited (async)	Async pattern recommended for long jobs.

1.5 Storage Capacity

Item	Default allocation	Scaling
PostgreSQL	20 GB	Up to 2 TB (RDS-compatible)
SQLite (AI Gateway)	5 GB	WAL mode, auto-vacuum
OpenSearch	100 GB	Multi-node cluster, scalable to TB.
MinIO	500 GB	Distributed mode, scalable to PB.
KB Documents per bucket	Unlimited	Practical: 100K+ docs per bucket.
Vector embeddings per index	10M+	OpenSearch k-NN scalable.

1.6 Protocols & Standards

Layer	Protocol / Standard
Transport	HTTPS (TLS 1.3), HTTP/2, HTTP/3 (QUIC)
API	REST/JSON (OpenAPI 3.1), Server-Sent Events (SSE), WebSocket (A2UI).
Auth	OAuth 2.1, OIDC, SAML 2.0, Bearer JWT (RS256).
LLM	OpenAI Chat Completions v1, Anthropic Messages v1, custom.
Tool calling	OpenAI tools format, MCP (Model Context Protocol).
Storage	S3 API v4 (SigV4 signing)
Search	OpenSearch v2 API, k-NN (HNSW), BM25
Webhooks	Standard HTTP + HMAC-SHA256 signature verification.

2. Functional Capabilities

2.1 Core AI Features

Feature	Status
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Roadmap
	Production

2.2 Agent Capabilities

Feature	Status
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production

2.3 Workflow Engine

Feature	Status
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production

2.4 Knowledge Base (RAG)

Feature	Status
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Beta
	Production

2.5 Channel & Integration

Feature	Status
	Production
	Production
	Production
	Production
	Production
	Production
	Production

2.6 Evaluation Framework

Feature	Status
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production

2.7 Observability

Feature	Status
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production

2.8 Security & Compliance

Feature	Status
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production

2.9 Identity & SSO

Feature	Status
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production

2.10 Administration

Feature	Status
	Production
	Production
	Production
	Production
	Production
	Production
	Production
	Production

3. Compliance & Standards

3.1 Personal Data Protection Act (PDPA)

Requirement	Implementation
	Per-user audit trail, identity resolve API.
	Presidio integration (Thai + English).
	Configurable per virtual key/channel.
	Full retention, queryable.
	DELETE endpoints for user data, memory, history.
	Configurable consent flow in Portal.
	On-premise / private cloud deployment.
	None by default (Thai-hosted LLMs available).

3.2 OWASP Top 10 (2021)

OWASP	Mitigation
	RBAC + tenant isolation + per-resource auth checks.
	TLS 1.3, AES-256 at rest, RS256 JWT signing.
	Parameterized queries, input validation, sandboxed code execution.
	Defense-in-depth (Caddy + Gateway + RBAC).
	Hardened Docker images, .env validation, secret rotation playbook.
	Dependency scanning (security_ops skill), regular updates.
	Keycloak (proven OIDC), MFA-ready, brute-force rate limit.
	Signed Docker images, SBOM tracking.
	OpenSearch audit log + alerts.
	URL allowlist in HTTP node, no recursion to internal services.

3.3 ISO/IEC 27001 Alignment

Control Area	Coverage
	Documented in deployment guide.
	Inventory via Portal admin.
	RBAC + Keycloak + audit log.
	TLS + AES + JWT RS256.
	Patching, backup, change mgmt via deploy.sh.
	TLS termination at edge.
	Submodule-based versioning, signed releases.
	Incident response playbook in knowledge_base/lessons_learned.md.
	PDPA, OWASP, sector-specific (Thai law).

3.4 Thai Government Standards

Standard	Compliance
	Audit log + access control + encryption.
	RBAC + audit + retention.
	On-premise option, Thai-hosted LLMs.
	OIDC/SAML federation, custom IDP support.

3.5 Audit Log Retention

Event type	Default retention
User login/logout	90 days · max 7 years
API requests	90 days · max 7 years
Admin actions (CRUD)	1 year · max 7 years
Workflow approvals	1 year · max 7 years
PII masking events	1 year · max 7 years
LLM I/O full content	30 days · max 1 year

4. Deployment & Operations

4.1 Hardware Specifications

Minimum (POC / Development)

Component	Spec
CPU	8 cores (Intel Xeon / AMD EPYC equivalent)
RAM	16 GB
Disk	200 GB SSD
Network	100 Mbps
OS	Ubuntu 22.04 LTS / RHEL 9

Recommended (Production, ~100 concurrent users)

Component	Spec
CPU	16-32 cores
RAM	64 GB (128 GB with local embedding)
Disk	1 TB NVMe SSD + 2 TB HDD (sequential storage)
GPU (optional)	NVIDIA A10 / L4 (for local embedding/LLM)
Network	1 Gbps internal, 100 Mbps internet
OS	Ubuntu 22.04 LTS

Large Scale (Production, 1,000+ users)

Component	Spec
Architecture	Multi-node Docker Swarm / Kubernetes
AI Gateway nodes	3+ (load balanced)
Portal Service nodes	2+ (active-active)
PostgreSQL	HA cluster (primary + 2 replicas)
OpenSearch	3-node cluster minimum
MinIO	Distributed mode (4+ nodes)
Cache layer	Redis (optional)

4.2 Software Requirements

Item	Version
Docker Engine	24.x or newer
Docker Compose	v2.20 or newer
Linux Kernel	5.15 or newer
OpenSSL	3.0 or newer
Git	2.40 or newer (for source deployment)

4.3 Network Topology

4.4 Subdomain Layout (per partner)

Subdomain	Purpose	Auth
app.{domain}	Portal (admin + user UI)	Keycloak SSO
hub.{domain}	App launcher	Optional
chat.{domain}	Main chat application	Keycloak SSO
s3.{domain}	MinIO presigned URLs	SigV4
gw.{domain}	AI Gateway public proxy	VK / Channel key
doc.{domain}	Documentation portal	SSO (manual + API), public (datasheet)
{app}.{domain}	Customer-specific apps (FPO, TCC, FDA, etc.)	Per-app role

4.5 Scaling Guidance

Scale tier	Concurrent users	Setup
Pilot	10-50	Single VPS, all stacks co-located
Small Production	50-500	Dedicated DB host + app VPS
Medium Production	500-5,000	Multi-node Docker Swarm, replicated DB
Enterprise	5,000+	Kubernetes, HA stack, multi-region

4.6 Backup & Disaster Recovery

Item	Notes	RTO	RPO
PostgreSQL	Daily full + WAL streaming	1 hour	5 minutes
SQLite (AI Gateway)	Hourly snapshot to MinIO	1 hour	1 hour
OpenSearch	Snapshot to S3-compatible	4 hours	1 hour
MinIO	Replication to secondary site	2 hours	Real-time
Configuration	Git repo + .env.uat templates	30 minutes	Latest commit

4.7 Update & Patching

Dependency scanning: Weekly automated scan via the security_ops skill (Trivy + pip-audit/npm-audit/gosec).
Security patches: Within 7 days of CVE published (CVSS >= 7.0 triggers immediate ticket).
Feature updates: Monthly release cycle aligned with upstream LLM provider model releases.
Major versions: Quarterly cadence.
Deployment: Zero-downtime rolling updates via deploy.sh with health checks gating each container restart.

4.8 Monitoring & Alerting

Health checks: /health (AI Gateway), /actuator/health (Portal Service).
Metrics: Prometheus-compatible (optional).
Logs: JSON-structured via Docker log driver, integrates with Loki/ELK if available.
Alerts: Configurable webhook into Slack/email/PagerDuty.

4.9 Supported Deployment Modes

Mode	Description
On-premise	Customer's data center, full air-gapped possible
Private Cloud	Customer's AWS/Azure/GCP account
Hybrid	Frontend in cloud + LLM on-premise (data residency)
Managed (SaaS)	Group of companies managed hosting

4.10 Pricing & Licensing Model

Perpetual licence + annual maintenance & support.
Subscription (annual / multi-year).
Pay-per-use (LLM tokens + storage).
Hybrid model (licence + token bundle).

For more information:

Website: https://promptserve.tech
Documentation: https://doc.promptserve.tech
User Manual (login): https://doc.promptserve.tech/docs
API Specification (login): https://doc.promptserve.tech/api-spec/ai-gateway

This document is for reference in TOR and proposal documents only. Actual specifications in each project may be adjusted to match the procuring organisation's specific requirements.