Docs

Integrating with IPDesk OS

There are two integration paths: a Skill lets an AI agent learn your OS, and MCP lets the AI operate the OS directly.

Skill

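The Skill is the fastest way for an AI agent to learn IPDesk OS. Feed this Markdown to Claude / GPT / Gemini and the AI automatically knows which AI Actions, platform routes, and external connections exist on your OS, and can use them directly for the rest of the conversation.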

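Why start with the Skill? You don't need to write a line of code or deploy a server. In about 5 minutes your AI agent knows the whole OS. It suits firm principals and attorneys who use AI for day-to-day firm work.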

1. Load the Skill into your AI agent

# Claude / GPT / Gemini — paste this URL into the chat
https://ipdesk.ai/skill/ipos-sdk-converter/raw

# Claude Code / Codex CLI — pipe it directly
curl -s https://ipdesk.ai/skill/ipos-sdk-converter/raw | claude

# curl raw markdown
curl -H "Accept: text/markdown" https://ipdesk.ai/skill/ipos-sdk-converter/raw

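2. What the AI learns automatically

  • SDK packaging guide: how to convert an existing Streamlit app / Python script / Excel macro into an IPOS app
  • Callable AI Actions (auto-generated): slug / name / description / input_type / output_shape, fetched from the ai_actions table at runtime
  • Available platform Routes (auto-generated): server-side endpoints that customer apps can call
  • Known external connections (auto-generated): Gmail / Drive / OneDrive / Dropbox / USPTO and other externals you can connect

Why it updates automatically: the last three tables (AI Actions / Routes / Externals) are re-fetched from the DB and the catalog on every request, so your AI agent always sees the platform's actual current state. After an admin publishes a new agent, there is no need to re-push docs.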

Full Skill content

Download raw markdown

Below is the full content the AI agent will read. If your AI client (for example Claude Code or Codex CLI) supports URL fetch, giving it the raw link directly saves tokens.

Quick start: do you have MCP?

If you are an AI agent and your client (Claude Desktop / Cursor / Claude Code) supports MCP, prefer calling tools from the IPOS MCP server (https://ipdesk.ai/mcp) over reading this entire document:

  • list_ai_actions instead of grepping the auto-generated agent table
  • list_platform_routes instead of the route catalog section
  • create_app / upload_zip / poll_deployment instead of curl commands
  • get_skill to come back and read this when you need decision-level context

If MCP is not available, continue reading — every operation in this doc maps to a documented /api/ext/v1/... endpoint that you can hit with a Bearer token.


IPOS SDK Converter

Convert any existing app into an IPOS OS static app via two structured handoffs.

Core principle: every app connected to IPOS OS must be Agent-controllable. The OS Agent must be able to read the app's complete runtime state and invoke every user-visible control through a declared control surface. An app that only opens in an iframe is not fully converted.

When to use

Invoke this SKILL when:

  • A customer says "I have an existing app and want it on IPOS OS / 想把我的工具放到 IPOS"
  • Someone needs to scaffold a new app in apps/<slug>-app/ in the email-processor repo
  • An IPOS app deploy is failing — see references/gotchas.md first
  • A new platform route is being designed (Q3 of the diagnosis tree)

Architecture in one paragraph

In addition to the three execution buckets below, every converted app also has a mandatory Agent Control Surface: complete app state plus commands for every user-visible control. Treat it as a required control bucket even when the app is otherwise pure static zip code.

Every app capability lands in exactly one of three execution buckets: zip (JS/TS/React/Pyodide, runs in browser, no server), platform route (server-side endpoints maintained by IPOS — OCR, CORS proxies, etc.), or platform-js SDK (IPOS data read/write via postMessage to the shell). Customer servers: never. If a server-side capability is needed and no platform route exists, customer files a Platform Route Request → IPOS builds it → customer's zip never changes.

| Bucket | What lives here | Maintained by |
| --- | --- | --- |
| zip | JS/TS logic, React UI, Pyodide Python | Customer (static, no server) |
| Platform route | Server-side capabilities (OCR, CORS proxy, AI calls) | IPOS (shared, centrally maintained) |
| platform-js SDK | IPOS data read/write, session identity | IPOS |
| Agent Control Surface | Complete app state snapshot + commands the Agent may invoke | Customer defines; IPOS shell transports/exposes |

Session Persistence (opt-in)

Apps may declare "session_persistence": true in their manifest to participate in cross-device session restore. Doing so adds two standard postMessage contracts: app → OS sends { kind: 'ipos:request', type: 'session.snapshot', payload: { state, schemaVersion } }; OS → app sends { type: 'ipos.session.restore', windowId, payload: { state, schemaVersion } } once after the iframe handshake when a prior snapshot exists. Window geometry is restored automatically; only in-app state (form drafts, current view, pinned items) requires opt-in. See references/session-persistence.md for the full contract.
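
The two message shapes above can be modelled as plain types before any postMessage wiring is written. The following is a minimal sketch, not the platform's own code: the field names come from the contract above, while `AppState`, `buildSnapshot`, `readRestore`, the numeric `schemaVersion`, the string `windowId`, and the policy of ignoring a snapshot whose version does not match are all assumptions made for illustration.

```typescript
// Hypothetical in-app state; replace with your app's real draft/view fields.
interface AppState { draft: string; view: string }

// app → OS: shape taken from the contract above.
interface SnapshotRequest<S> {
  kind: 'ipos:request'
  type: 'session.snapshot'
  payload: { state: S; schemaVersion: number } // numeric version is an assumption
}

// OS → app: shape taken from the contract above.
interface RestoreMessage<S> {
  type: 'ipos.session.restore'
  windowId: string // string type is an assumption
  payload: { state: S; schemaVersion: number }
}

// Build the message an app would post to the OS shell.
function buildSnapshot<S>(state: S, schemaVersion: number): SnapshotRequest<S> {
  return { kind: 'ipos:request', type: 'session.snapshot', payload: { state, schemaVersion } }
}

// Return the restored state, or null when the message is not a restore
// message or was written by a different schema version (assumed policy).
function readRestore<S>(msg: unknown, expectedVersion: number): S | null {
  if (typeof msg !== 'object' || msg === null) return null
  const m = msg as Partial<RestoreMessage<S>>
  if (m.type !== 'ipos.session.restore' || !m.payload) return null
  if (m.payload.schemaVersion !== expectedVersion) return null
  return m.payload.state
}
```

Keeping both helpers free of `window` access makes them unit-testable; references/session-persistence.md remains the authority on the actual contract.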

How to run this SKILL

With a non-technical audience: Open part1-diagnosis.md, walk through the feature inventory and decision tree together. Output is a filled SDK Spec Sheet.

With a developer: Open part2-implementation.md. Hand them the SDK Spec Sheet from Part 1. They scaffold + implement + deploy.

Before designing a new route: Open platform-route-catalog.md to see what already exists, then references/route-design-patterns.md for the recipes.

Before finishing any app: Open references/agent-control-surface.md. Every app needs a complete state contract, action contract, manifest declarations, and verification.

Before deploy (Part 2 final step): Open references/flow-compliance-report.md and produce a plain-language HTML report for the customer comparing their original flow vs. the IPOS flow, with a compliance checklist. Ask the customer where to put it (default docs/<slug>-flow-compliance.html); do not bundle it into the zip.

When something breaks: Open references/gotchas.md — every issue hit during real deploys is catalogued there with root cause and fix.

For a worked example of Pattern A/B (external API proxy): references/g1001-walkthrough.md walks through converting a Streamlit Python app (g1001) into an SDK app. It covers all three buckets, including a CORS-blocked external API that required two new platform routes (OCR and a USPTO proxy) plus a Cloudflare Worker to bypass IP blocking.

For a worked example (Pattern E — AI analysis + file storage): references/scan-doc-walkthrough.md walks through converting scan-doc-organizer (a Streamlit app that uses Gemini Vision to split combined scanned PDFs) into an SDK app. Covers: sending PDF inline to Gemini without rendering, SchemaType enum gotcha, platform.appFileUpload() to save to 檔案總管, credentials: 'include' for direct platform route calls, and author_org_id setup for org-owned apps.

Hard rules (don't violate)

  1. No customer-managed servers. If a feature needs server-side execution, it goes through a platform route. Period.

  2. No direct DB credentials in app code. All IPOS data goes through platform.db() or platform routes.

  3. Bucket every capability before writing code. The SDK Spec Sheet is the contract; without it, you're guessing.

  4. Pyodide is a zip-bucket option for pure Python only. It is never a workaround for missing platform routes — Pyodide can't make CORS-blocked HTTP calls or hold secrets.

  5. Bump version in platform.json for every redeploy. Same version + new zip = same deploymentId returned (idempotency); old failed deploy will not be retried.

  6. Agent control is mandatory. Every app must expose complete runtime state and every user-visible control through the Agent Control Surface. Server apps may use /api/agent/*; static apps must use platform-js postMessage state/action bridging through the OS shell.

  7. No hidden UI-only state. If a user can see or change it, the Agent must be able to observe it and trigger the equivalent control unless the SDK Spec Sheet explicitly marks it as sensitive/non-controllable with a reason.

Quick reference: the deploy commands

# Build (use bash, NOT PowerShell Compress-Archive — see gotchas.md)
cd apps/<slug>-app
npm run build                     # produces out/
zip -r ../<slug>-<version>.zip platform.json out/

# Deploy via API (more reliable than DevConsole UI)
TOKEN='ipos_live_<your-dev-token>'
curl -X POST "https://ipdesk.ai/api/ext/v1/apps/<slug>/deploy" \
  -H "Authorization: Bearer $TOKEN" \
  -F "zip=@../<slug>-<version>.zip" \
  -F "manifest=__from_zip__"
# Returns: { "deploymentId": "...", "status": "building" }

# Poll until live (usually 5-15 seconds)
curl "https://ipdesk.ai/api/ext/v1/deployments/<deploymentId>" \
  -H "Authorization: Bearer $TOKEN"
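
The polling step can also be scripted. This is a rough sketch under stated assumptions: `pollDeployment` and `DeploymentStatus` are hypothetical names, the `status` and `runtime_url` fields follow the responses shown in this document, and treating every status other than "building" as terminal is an assumption rather than documented behaviour.

```typescript
// Fields follow the responses shown in this document; other fields may exist.
type DeploymentStatus = { status: string; runtime_url?: string }

// Calls getStatus until the deployment leaves "building" or attempts run out.
// getStatus is injected so the loop can be tested without network access.
async function pollDeployment(
  getStatus: () => Promise<DeploymentStatus>,
  attempts = 10,
  delayMs = 3000,
): Promise<DeploymentStatus> {
  for (let i = 0; i < attempts; i++) {
    const deployment = await getStatus()
    // Assumption: anything other than "building" (live, failed, ...) is final.
    if (deployment.status !== 'building') return deployment
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
    }
  }
  throw new Error(`Deployment still building after ${attempts} attempts`)
}
```

In real use, `getStatus` would wrap a `fetch` to `/api/ext/v1/deployments/<deploymentId>` with the same Bearer token as the curl call above.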

AI Centralization (MANDATORY — read this first if your app touches AI)

Core rule: Every AI call from a customer app MUST go through IPOS's centralized ai_actions framework. No customer app may bring its own AI provider key, install a provider SDK (@google/generative-ai, @anthropic-ai/sdk, @mistralai/mistralai, openai direct, etc.), or fetch api.openai.com / api.anthropic.com / api.mistral.ai / generativelanguage.googleapis.com directly.

This is enforced two ways:

  • CI-side: npm run lint:ai-gate runs 3 ESLint custom rules over app/api/platform/v1/** and red-lights direct SDK imports / provider env vars / forbidden hostnames.
  • Runtime-side: AI calls go through executeAction(slug, orgId, ...) or executeImageAction(slug, orgId, ...) from @/lib/ai/actions. Those resolve provider/model/key from the ai_providers + ai_actions DB tables.

Why this matters

  • Admin manages everything in DB. Adding a new AI agent = INSERT into ai_actions, not a code change. Toggling a provider = admin UI click.
  • Customers don't pay token cost. IPOS absorbs it. No quota to track in your app.
  • Model swapping is invisible. If we move legacy_ocr from Mistral to a cheaper alternative tomorrow, your app keeps working — same slug, different underlying model.
  • Per-org governance. Admin can disable any slug for any org via /admin/ai/grants if they're abusing it. Your app sees a clean error and surfaces it to the user.

Step 1 — Discover existing AI agents BEFORE inventing new ones

Customer agents writing apps must fetch the live agent catalog before declaring uses_ai_actions:

curl https://ipdesk.ai/skill/ipos-sdk-converter/raw

The bottom of that markdown contains an auto-generated table of every ai_actions slug currently registered (with name / description / input_type / output_shape / example_use_case — model_id is intentionally hidden). Reuse an existing slug whenever possible. Inventing a duplicate slug pollutes admin's slug namespace and forces them to wire up a provider/model for a redundant agent.

Step 2 — Declare slugs in platform.json

In your app's platform.json:

{
  "slug": "my-app",
  "version": "1.0.0",
  "ui_entry": { ... },
  "uses_routes": ["/api/platform/v1/ocr"],
  "uses_ai_actions": ["legacy_ocr", "scan_doc_classify"]
}

Both v1 (ManifestSchema) and v2 (BundleManifestV2Schema) accept this field; default is []. Validation: every slug must be a non-empty string. Schema enforces z.array(z.string().min(1)).optional().default([]).
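
To check a manifest locally before uploading, the same rule can be written without any dependency. This is a sketch only: `parseUsesAiActions` is a hypothetical helper that mirrors the stated schema (optional array of strings with length at least 1, defaulting to an empty array); it is not the platform's validator.

```typescript
// Mirrors z.array(z.string().min(1)).optional().default([]) in plain TypeScript.
function parseUsesAiActions(manifest: { uses_ai_actions?: unknown }): string[] {
  const raw = manifest.uses_ai_actions
  // Field omitted: fall back to the documented default.
  if (raw === undefined) return []
  if (!Array.isArray(raw)) throw new Error('uses_ai_actions must be an array')
  return raw.map((slug: unknown, i: number): string => {
    if (typeof slug !== 'string' || slug.length < 1) {
      throw new Error(`uses_ai_actions[${i}] must be a non-empty string`)
    }
    return slug
  })
}
```

Like `.min(1)`, this checks length only, so a whitespace-only slug would still pass; tighten that locally if you want stricter hygiene.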

Step 3 — What happens at install time

Behavior depends on whether the upload is admin path (uploading via /admin/os) or org path (uploading as an org user):

| Scenario | Slug exists in ai_actions | Slug missing | Grants written |
| --- | --- | --- | --- |
| Admin path (author_type='platform') | ✅ install proceeds | ❌ install REJECTED with: Official app references unknown ai_action slug(s): X. Add them via /admin/ai/actions before installing. | None (Q6 bypass: platform apps skip the grant layer entirely) |
| Org path (author_type='org') | ✅ install proceeds | ✅ stub ai_actions row auto-created with is_enabled=true, auto_created_from_app=app.id, NULL provider/model. /admin/ai/actions shows it with a yellow banner (in Chinese) that reads "Auto-created from a customer app; please fill in provider/model" | ai_action_grants(org_id, action_slug, status='enabled') upserted for every slug |

Practical implication: when you upload via the org path with a brand-new slug, the slug exists but isn't actually runnable yet — the auto-created stub has no provider/model. Calling executeAction('your_new_slug', orgId, ...) throws Action 'your_new_slug' has no provider/model configured. Admin must visit /admin/ai/actions and wire up provider + model before the agent works.
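
The install-time rules can be summarised as a small pure function. This is an illustrative model of the table in this step, not the installer's real code: `planInstall` and `InstallPlan` are hypothetical names, and joining several unknown slugs with a comma in the error text is an assumption.

```typescript
type AuthorType = 'platform' | 'org'

interface InstallPlan {
  ok: boolean
  error?: string
  stubsToCreate: string[]  // slugs that would get an auto-created stub ai_actions row
  grantsToUpsert: string[] // slugs that would get an ai_action_grants row
}

function planInstall(authorType: AuthorType, declared: string[], existing: Set<string>): InstallPlan {
  const missing = declared.filter((slug) => !existing.has(slug))
  if (authorType === 'platform') {
    // Admin path: unknown slugs reject the install; no grants are written.
    if (missing.length > 0) {
      return {
        ok: false,
        error: `Official app references unknown ai_action slug(s): ${missing.join(', ')}. Add them via /admin/ai/actions before installing.`,
        stubsToCreate: [],
        grantsToUpsert: [],
      }
    }
    return { ok: true, stubsToCreate: [], grantsToUpsert: [] }
  }
  // Org path: unknown slugs become stubs; every declared slug gets a grant.
  return { ok: true, stubsToCreate: missing, grantsToUpsert: [...declared] }
}
```

A stub created on the org path is still not runnable until an admin wires up provider and model, so a successful plan does not imply the action can execute.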

Step 4 — Calling the AI from a platform route (server-side only)

Platform routes (under app/api/platform/v1/**) call AI like this:

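import { NextRequest, NextResponse } from 'next/server'
import { executeAction, executeImageAction } from '@/lib/ai/actions'
import { buildRequestContext } from '@/lib/ipos/request-context'

export async function POST(req: NextRequest) {
  const rc = await buildRequestContext(req)
  // err() is assumed to be a local helper that returns the { error: { code, message } } envelope
  if (!rc.userId || !rc.orgId) return err('AUTH', 'Unauthorized', 401)

  // Illustrative body shape; use whatever fields your route actually accepts
  const { userMessage, pdfBase64 } = await req.json()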

  // Text-in / text-out:
  const r = await executeAction(
    'classify',
    rc.orgId,
    userMessage,                      // user-side input
    undefined,                        // optional system prompt override
    { callerAuthorType: rc.appAuthorType ?? 'org' },
  )

  // Image / PDF in / text out (multimodal):
  const r2 = await executeImageAction(
    'legacy_ocr',
    rc.orgId,
    'Please OCR this PDF',
    pdfBase64,
    'application/pdf',
    undefined,
    { callerAuthorType: rc.appAuthorType ?? 'org' },
  )

  return NextResponse.json({ data: { text: r.content } })
}

Always pass callerAuthorType: rc.appAuthorType ?? 'org' so the grant gate runs for org-installed apps and is bypassed for platform apps. The appAuthorType field is populated by buildRequestContext via a 2-hop lookup (ipos_installations → ipos_apps.author_type).

Step 5 — What you MUST NOT do

These will fail npm run lint:ai-gate (and conceptually break the centralization promise):

// ❌ direct SDK import in app/api/platform/v1/**
import { GoogleGenerativeAI } from '@google/generative-ai'

// ❌ direct provider env var
const k = process.env.MISTRAL_API_KEY

// ❌ direct fetch to provider host
await fetch('https://api.openai.com/v1/chat/completions', { ... })

The only escape hatch is a top-of-line // eslint-disable-next-line ai-centralization/* -- ai-gate-allowlist: <reason> and that should be reserved for orchestrator-tier code that genuinely needs a feature executeAction doesn't yet support (e.g., OpenAI tool_calls). For ALL customer-facing AI features → centralize.

Step 6 — Calling AI from inside the zip (browser-side)

Don't do it directly. Browser-side code can't hold a provider key, and it shouldn't touch a provider URL even if it could (CORS, key leakage, no logging, no quota tracking).

Instead: have the browser call a platform route (which you may need to request — see "Platform Route Catalog" → "Requesting a new route") that wraps executeAction server-side.

Step 7 — Surfacing admin disable cleanly

If admin disables a grant, your app's call to executeAction throws:

Error: Action 'classify' has been disabled by admin for this org.

Catch it and show a user-friendly message:

try {
  const r = await fetch('/api/platform/v1/my-route', { ... })
  if (!r.ok) {
    const err = await r.json()
    if (err?.error?.message?.includes('disabled by admin')) {
      setError('AI 功能已被管理員停用,請聯絡您的組織管理員')
      return
    }
    throw new Error(err?.error?.message ?? `HTTP ${r.status}`)
  }
} catch (e) { ... }
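
The message matching can live in one helper so every call site reports errors the same way. A minimal sketch, assuming the route passes the executeAction message through in the `{ error: { code, message } }` envelope; `describeRouteError` and the English wording of the returned strings are hypothetical.

```typescript
interface ErrorEnvelope { error?: { code?: string; message?: string } }

// Turns a failed platform-route response into a message suitable for the user.
function describeRouteError(status: number, body: ErrorEnvelope | null): string {
  const message = body?.error?.message
  if (message !== undefined && message.indexOf('disabled by admin') !== -1) {
    return 'AI features have been disabled by an administrator. Please contact your organization admin.'
  }
  // Stub slugs created on the org path fail with this message until an admin configures them.
  if (message !== undefined && message.indexOf('has no provider/model configured') !== -1) {
    return 'This AI action has not been configured yet. Please ask an IPOS admin to set its provider and model.'
  }
  // Fall back to the raw message, or the HTTP status when the body was not JSON.
  return message ?? `HTTP ${status}`
}
```

Matching on message text is brittle; if the envelope's `code` field distinguishes these cases, prefer switching on that instead.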

Quick reference — currently-registered slugs (snapshot, may be stale)

The live agent table is auto-appended to the bottom of this markdown at request time. Use that as the source of truth. Snapshot at time of writing:

| slug | input | use case |
| --- | --- | --- |
| classify | text | Classify an email body into an action category |
| legacy_ocr | PDF (base64) | Mistral OCR via OpenRouter: full-page OCR |
| scan_doc_classify | text | Scan-doc Layer A: classify a document into doc_type/court/case_number etc. |
| scan_doc_classify_pro | text | Scan-doc Layer A pro tier: higher-accuracy classifier |
| scan_doc_layer_b_ask | image | Scan-doc Layer B: high-DPI sub-region re-ask for stamps/dates/case nums |
| scan_doc_ocr_primary | image | Tier 1 OCR: Qianfan-OCR-Fast (free) |
| scan_doc_ocr_fallback_1 | image | Tier 2 OCR: Qwen2.5-VL 72B |
| scan_doc_ocr_fallback_2 | image | Tier 3 OCR: Gemini 2.5 Flash |

The auto-generated table below this document supersedes this snapshot.


Part 1 — Diagnosis

Part 1: Capability Diagnosis (Non-Technical)

Who fills this: Business owner, PM, or anyone who understands what the app does — no coding knowledge needed.

Output: A filled SDK Spec Sheet, ready to hand off to a developer.


Step 1: Feature Inventory

List every capability of your app, one row per capability. Think in terms of what the app does, not how it's built.

Good rows (specific capabilities):

  • "Extract text from uploaded PDF"
  • "Download PDFs from USPTO website"
  • "Show a list of patent numbers the user can edit"
  • "Save the patent numbers to the shared org database"

Bad rows (too coarse — split them):

  • "Process the document" — what does processing mean?
  • "The main workflow" — not a capability
  • "Handle errors" — that's plumbing, not a feature

| Feature | What it does (one sentence) |
| --- | --- |
| (one capability per row) | (specific user-visible action) |

Step 1A: Agent State and Control Inventory

Every converted app must be controllable by the OS Agent. Inventory all runtime state and all user-visible controls before deciding buckets.

State means anything the user can see or that affects the workflow:

  • Textarea/input values
  • Uploaded file metadata
  • Current step/view/tab/modal
  • Selected row/item/email/case/document
  • Toggle/slider/dropdown values
  • Parsed/derived data
  • Progress and pending jobs
  • Success/failure results
  • Validation and runtime errors
  • Whether buttons/actions are enabled or disabled

Controls means anything the user can do:

  • Type/set a value
  • Click a button
  • Toggle a checkbox
  • Move a slider
  • Select/remove/reorder a row
  • Start/cancel/reset a workflow
  • Download/export/save/push to DB

Fill both tables:

| State field | What user sees / why it matters | Sensitive or large? | Agent representation |
| --- | --- | --- | --- |
| (e.g. rawInput) | (textarea contents) | No | full string |
| (e.g. uploadedPdf) | (selected file) | Large | name, size, page count only |

| Control | Equivalent Agent action | Params | Disabled when |
| --- | --- | --- | --- |
| (e.g. Start button) | start | none | no valid input / already running |
| (e.g. textarea edit) | set_input | { value: string } | workflow running |

If something is intentionally not exposed to the Agent, write the reason here. "It is React state" is not a valid reason.
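
The example rows above translate directly into a projection from internal UI state to what the Agent sees. This sketch is hypothetical throughout: `UiState`, `AgentContext`, and `toAgentContext` are illustrative names, and treating "valid input" as "non-empty after trimming" is an assumption.

```typescript
// Hypothetical internal UI state, matching the example rows above.
interface UiState {
  rawInput: string
  uploadedPdf: { name: string; size: number; pageCount: number; bytes: Uint8Array } | null
  running: boolean
}

// What the Agent is allowed to see: full text, but only metadata for the large file.
interface AgentContext {
  rawInput: string
  uploadedPdf: { name: string; size: number; pageCount: number } | null
  actions: { start: { enabled: boolean }; set_input: { enabled: boolean } }
}

function toAgentContext(ui: UiState): AgentContext {
  const hasInput = ui.rawInput.trim().length > 0 // assumed definition of "valid input"
  return {
    rawInput: ui.rawInput,
    uploadedPdf: ui.uploadedPdf
      ? { name: ui.uploadedPdf.name, size: ui.uploadedPdf.size, pageCount: ui.uploadedPdf.pageCount }
      : null,
    actions: {
      // "Disabled when" column: no valid input / already running
      start: { enabled: hasInput && !ui.running },
      // "Disabled when" column: workflow running
      set_input: { enabled: !ui.running },
    },
  }
}
```

Deriving the enabled flags in one place keeps the Agent's view and the buttons' disabled state from drifting apart.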


Step 2: Decision Tree (run for each feature)

For each feature, answer in order:

Q1: Can this run entirely in a web browser with no external HTTP calls?
    Pure UI · data transformation · calculations · file manipulation
    → YES: Bucket = zip
    → NO: continue

Q2: Does the feature read or write data already stored in IPOS?
    cases · emails · prior_art · documents · etc.
    → YES: Bucket = platform-js SDK (use platform.db())
    → NO: continue

Q3: Does the IPOS Platform Route Catalog have a matching route?
    See platform-route-catalog.md
    → YES: Bucket = platform route (note the endpoint)
    → NO: continue

Q4: File a Platform Route Request to IPOS.
    IPOS builds it → adds to catalog → then Bucket = platform route.
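
Because the questions are answered strictly in order, the tree reduces to a short function. A sketch with hypothetical names (`FeatureAnswers`, `bucketFor`); the three booleans correspond to Q1 through Q3, and the last outcome represents Q4.

```typescript
type Bucket = 'zip' | 'platform-js SDK' | 'platform route' | 'file Platform Route Request'

interface FeatureAnswers {
  runsInBrowserWithoutHttp: boolean // Q1
  touchesIposData: boolean          // Q2
  catalogHasRoute: boolean          // Q3
}

// Earlier questions win: a pure-browser feature is "zip" even if a route exists.
function bucketFor(answers: FeatureAnswers): Bucket {
  if (answers.runsInBrowserWithoutHttp) return 'zip'
  if (answers.touchesIposData) return 'platform-js SDK'
  if (answers.catalogHasRoute) return 'platform route'
  return 'file Platform Route Request' // Q4: becomes "platform route" once IPOS builds it
}
```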

Common patterns

| If the feature is... | Bucket | Why |
| --- | --- | --- |
| Parsing text in the browser | zip | No HTTP needed |
| Math, regex, formatting | zip | Pure JS |
| OCR / Vision / LLM call | platform route | Holds API key server-side |
| Fetch from external API blocked by CORS | platform route | Browser can't bypass CORS |
| Fetch from external API blocked by IP | platform route + CF Worker | Datacenter IPs often blocked; see gotchas |
| Saving to org's case / email / prior_art | platform-js SDK | platform.db('org:<table>') |
| Analyze document with AI (Vision/LLM) | platform route | API key is platform-managed (set by IPOS admin on Zeabur, not by the app developer). App calls the route; key is invisible to it. See Pattern E in route-design-patterns.md |
| Save output files to 檔案總管 (File Explorer) | platform-js SDK | platform.appFileUpload(); requires org_write: ["app_files"] in platform.json.permissions |
| File download to user's machine | zip | <a download> after fetching bytes |
| Reading user's session info | platform-js SDK | platform.session.{user,org} |
| Real-time updates from server | ⚠️ not yet supported | WebSocket relay is future work |

Scope Decision: Strict Port vs Feature Expansion

Before filling the spec sheet, make one explicit decision — this prevents scope creep from derailing the conversion.

Strict port (default): Recreate the original app feature-for-feature. Nothing added, nothing removed. Every row in Step 3 maps to something the original app already did.

Feature expansion: Add new capabilities during the conversion. Only do this when the addition is clearly adjacent AND small. Each expansion must survive: "Would I build this separately if the port were already done?" If no → cut it.

When expanding scope, add a Scope column to the spec sheet to flag what's new:

| Feature | Scope | Bucket | Resource |
| --- | --- | --- | --- |
| (original feature) | PORT | ... | ... |
| (new addition) | NEW | ... | ... |

PORT rows must achieve parity with the original. NEW rows require explicit stakeholder sign-off before a developer starts. When in doubt: default to strict port, ship it, then handle expansions as a follow-up.


Step 3: Fill the SDK Spec Sheet

Combine inventory + decisions into one table:

| Feature | Needs server? | Bucket | Resource |
| --- | --- | --- | --- |
| (feature name) | Yes / No / IPOS data | zip / platform route / platform-js SDK | (library, endpoint, or SDK method) |

Then add the Agent Control Sheet:

| Requirement | Contract |
| --- | --- |
| Complete state schema | (link/list all fields from Step 1A) |
| Action schema | (list every action ID and params) |
| Transport | server /api/agent/* OR static platform-js bridge |
| Manifest hints | platform.json.agent_actions entries |
| Verification | how the Agent will read state and invoke actions |

Worked example (g1001 — USPTO patent extractor)

| Feature | Needs server? | Bucket | Resource |
| --- | --- | --- | --- |
| Upload PDF | No | zip | <input type="file"> |
| Extract text from PDF | No | zip | pdfjs-dist |
| Detect scanned PDF | No | zip | pdfjs-dist (text length heuristic) |
| OCR scanned PDF | Yes (API key) | platform route | /api/platform/v1/ocr |
| Filter PTO-892 section | No | zip | JS regex |
| Extract patent numbers | No | zip | JS regex |
| Display + edit list | No | zip | React |
| Download USPTO PDFs | Yes (CORS + IP block) | platform route | /api/platform/v1/uspto-proxy |
| Package as ZIP | No | zip | jszip |
| Push patent numbers to org DB | IPOS data | platform-js SDK | platform.db('org:prior_art').insert() |

Agent Control Sheet example:

| Requirement | Contract |
| --- | --- |
| Complete state schema | step, file metadata, extracted patents, edited patents, download progress, DB push result, errors |
| Action schema | set_file, extract, set_patents, start_download, cancel, reset |
| Transport | static platform-js bridge |
| Manifest hints | get_context plus all action IDs |
| Verification | Agent can inspect current step and invoke every workflow button/action |

Step 4: List New Platform Routes Needed

Scan the Resource column. Any row pointing at a route that's not in platform-route-catalog.md → file a request:

| Route needed | Inputs | Expected output | App(s) | Why server-side? |
| --- | --- | --- | --- | --- |
| (path or one-line description) | (JSON / file / params) | (JSON / file bytes) | (app name) | (CORS / API key / IP block / ...) |

The "Why server-side?" column is critical — it tells the route designer whether they need a CF Worker (IP block), a credential vault (API key), or just a CORS proxy.


Handoff Checklist

Before passing the SDK Spec Sheet to a developer:

  • [ ] Every app feature has exactly one row
  • [ ] Every "platform route" row has a specific endpoint (existing in catalog or confirmed by IPOS)
  • [ ] No row says "customer's own server" / "Lambda we maintain" / "our backend"
  • [ ] IPOS data features all use platform SDK (not direct DB credentials)
  • [ ] Pyodide rows acknowledge it's pure-Python only (no C extensions, no network)
  • [ ] New routes (Step 4) are filed and have a "why server-side" reason
  • [ ] Every visible state field is represented in the Agent state schema
  • [ ] Every user-visible control has an equivalent Agent action
  • [ ] Static apps identify the platform-js bridge requirement; server apps identify /api/agent/* endpoints
  • [ ] platform.json.agent_actions is planned for every context/action entry

The completed sheet is the developer's contract. They should be able to implement each row without re-asking what the app does.

Part 2 — Implementation

Part 2: Implementation Guide (Technical)

Who reads this: The developer who received the SDK Spec Sheet from Part 1.

Work through each row of the SDK Spec Sheet. Each bucket has a recipe below. Then scaffold, build, deploy, verify.


Mandatory: Agent Control Surface

Before coding UI internals, read references/agent-control-surface.md and turn the Agent Control Sheet into code. Every app must expose:

  • Complete runtime state: all visible inputs, selections, current step/view, progress, results, errors, and enabled/disabled action state.
  • Complete controls: every button, toggle, slider, text input, selection, start/cancel/reset/download/save action must have an Agent-callable equivalent.
  • Manifest hints: every state/action entry must be declared in platform.json.agent_actions.

Server/runtime app recipe:

GET  /api/agent/context
POST /api/agent/action

Static zip app recipe:

// Desired platform-js bridge API. If this does not exist yet in the repo,
// the app is not fully Agent-controllable and the platform prerequisite must
// be documented before marking the conversion complete.
platform.agent.updateContext(agentContext)
platform.agent.registerAction('set_input', async ({ value }) => { ... })
platform.agent.registerAction('start', async () => { ... })
platform.agent.registerAction('cancel', async () => { ... })
platform.agent.registerAction('reset', async () => { ... })

Do not use DOM scraping, localStorage, or iframe inspection as the control surface. Static apps must push state and receive actions through platform-js postMessage via the OS shell.
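
Until the bridge API is confirmed in the repo, action handlers can still be written and tested against a local stand-in. The class below is purely hypothetical (`LocalAgentSurface` is not part of platform-js); it imitates the `updateContext` / `registerAction` shape sketched above so handlers can later be registered on the real bridge unchanged, and it adds a check that every action ID declared in the manifest has a handler.

```typescript
type ActionHandler = (params: Record<string, unknown>) => Promise<void> | void

// In-memory stand-in for the desired platform.agent bridge (hypothetical).
class LocalAgentSurface<C> {
  private actions = new Map<string, ActionHandler>()
  private context: C

  constructor(initial: C) {
    this.context = initial
  }

  updateContext(next: C): void {
    this.context = next
  }

  getContext(): C {
    return this.context
  }

  registerAction(id: string, handler: ActionHandler): void {
    this.actions.set(id, handler)
  }

  // What the OS shell would trigger when the Agent invokes an action.
  async invoke(id: string, params: Record<string, unknown> = {}): Promise<void> {
    const handler = this.actions.get(id)
    if (!handler) throw new Error(`Unknown agent action: ${id}`)
    await handler(params)
  }

  // Declared agent_actions IDs that have no handler; get_context is served by getContext().
  missingActions(declaredIds: string[]): string[] {
    return declaredIds.filter((id) => id !== 'get_context' && !this.actions.has(id))
  }
}
```

Running `missingActions` against `platform.json.agent_actions` in a test is one way to catch a declared control that the app never wired up.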


Recipe: Bucket = zip (JS/TS logic)

Place pure logic in apps/<slug>-app/src/lib/, import in components.

// src/lib/normalize-patent.ts
// Trim first so leading whitespace cannot hide the "US" prefix from the anchored regex
export function normalizePatentNumber(raw: string): string {
  return raw.trim().replace(/^US/i, '').replace(/,/g, '').trim().replace(/^0+/, '') || '0'
}

If the original is Python and pure (no C extensions, no filesystem, no subprocess), prefer porting to TypeScript. Use Pyodide only when the Python is too gnarly to port (numpy heavy / regex with Python-specific lookbehind / etc.).

Recipe: Bucket = zip (React UI)

Mirror the structure of apps/g1001-app/src/. Use plain React, Next.js Pages Router (not App Router — static export support is more reliable on Pages).

// src/components/MyView.tsx
import React from 'react'

interface Props { items: string[]; onSelect: (item: string) => void }

export default function MyView({ items, onSelect }: Props) {
  return (
    <ul>
      {items.map((item) => (
        <li key={item}>
          {item} <button onClick={() => onSelect(item)}>選</button>
        </li>
      ))}
    </ul>
  )
}

Recipe: Bucket = zip (Pyodide — pure Python)

Use only when porting to JS would lose a critical dependency. Limits: no C extensions (no pandas w/ native, no cv2, no pyarrow), no network calls (Pyodide can't bypass CORS or hold secrets — that's what platform routes are for).

// src/lib/pyodide-runner.ts
let pyodide: Awaited<ReturnType<typeof import('pyodide').loadPyodide>> | null = null

export async function runPython(code: string, inputs: Record<string, unknown> = {}): Promise<unknown> {
  if (!pyodide) {
    const { loadPyodide } = await import('pyodide')
    pyodide = await loadPyodide({ indexURL: 'https://cdn.jsdelivr.net/pyodide/v0.26.4/full/' })
  }
  for (const [k, v] of Object.entries(inputs)) pyodide.globals.set(k, v)
  return pyodide.runPython(code)
}

Recipe: Bucket = platform route (JSON in/out)

const res = await fetch('/api/platform/v1/<route>', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ /* per catalog */ }),
})
const body = await res.json().catch(() => null)
if (!res.ok) {
  // Errors come back as { error: { code, message } }
  throw new Error(body?.error?.message ?? `${res.status}`)
}
const { data } = body

Recipe: Bucket = platform route (binary in/out)

For routes returning file bytes:

const res = await fetch('/api/platform/v1/uspto-proxy', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ patent_number: 'US9999999' }),
})
if (!res.ok) {
  // Try to parse JSON error envelope before giving up
  const err = await res.json().catch(() => null)
  throw new Error(err?.error?.message ?? `Download failed: ${res.status}`)
}
const buffer = await res.arrayBuffer()

Recipe: Bucket = platform-js SDK

Copy apps/g1001-app/src/vendor/platform-js.ts into your app's src/vendor/. Then in your top-level component:

import { useEffect, useState } from 'react'
import { Platform } from './vendor/platform-js'
import manifest from '../platform.json'

const [platform, setPlatform] = useState<Platform | null>(null)
useEffect(() => {
  Platform.connect({ targetOrigin: '*' })
    .then(async (p) => {
      setPlatform(p)
      // REQUIRED: always include version so the OS title bar and Agent can identify
      // which build is running. Format: "<App Name> v<version>"
      await p.setTitle(`${manifest.name} v${manifest.version}`)
    })
    .catch(console.error)
}, [])
if (!platform) return <div>Connecting...</div>

// Read
const rows = await platform.db('org:prior_art').select<PriorArt>()

// Insert (single row only — see gotchas if you have an array)
await platform.db('org:prior_art').insert({ patent_number: 'US9999999', country: 'US' })

// Update by id
await platform.db('org:cases').update(caseId, { status: 'in_progress' })

// Notify (toast in OS shell)
await platform.notify('已推送 5 個專利', 'success')

// Session
const { user, org } = platform.session

Mandatory: setTitle must always be called on connect with the format `${manifest.name} v${manifest.version}`. Never hardcode the version string — always read it from platform.json so bumping the manifest is the single source of truth.

⚠️ Three SDK gotchas — all hit during real builds:

  1. .insert() only accepts a single object — not an array. To insert N rows, loop for (const r of rows) await platform.db(...).insert(r). See references/gotchas.md.
  2. Only whitelisted columns work. lib/platform/org-table-config.ts declares insertable_columns per table. Sending an unlisted column = 400. New table column ≠ writeable until config updated.
  3. Duplicate-key errors come back as db_error 500 — pattern-match the message (/duplicate key|unique constraint/i) to dedupe gracefully.
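
Gotchas 1 and 3 combine naturally into one helper: insert rows one at a time and count duplicates instead of failing on them. A minimal sketch, assuming the SDK call rejects with an Error whose message contains the database text; `insertAll` and `InsertSummary` are hypothetical names, and the insert function is injected so the helper does not depend on platform-js.

```typescript
const DUPLICATE_KEY = /duplicate key|unique constraint/i

interface InsertSummary { inserted: number; duplicates: number }

// Inserts rows sequentially; duplicate-key failures are counted, anything else is rethrown.
async function insertAll<T>(
  rows: T[],
  insertOne: (row: T) => Promise<unknown>,
): Promise<InsertSummary> {
  const summary: InsertSummary = { inserted: 0, duplicates: 0 }
  for (const row of rows) {
    try {
      await insertOne(row)
      summary.inserted++
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e)
      if (!DUPLICATE_KEY.test(message)) throw e
      summary.duplicates++
    }
  }
  return summary
}
```

In an app this would be called as `insertAll(rows, (r) => platform.db('org:prior_art').insert(r))`, and the summary can feed the `platform.notify` toast.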

App Scaffold

Use apps/g1001-app/ as your template. Required files:

apps/<slug>-app/
├── platform.json          # manifest — schema below
├── package.json           # next 14.2.33, react 18, ...
├── next.config.js         # output: 'export', basePath, assetPrefix — see template
├── tsconfig.json
├── pages/
│   ├── _app.tsx           # minimal wrapper
│   └── index.tsx          # dynamic(() => import('../src/MyApp'), { ssr: false })
└── src/
    ├── MyApp.tsx          # top-level, calls Platform.connect()
    ├── vendor/
    │   └── platform-js.ts # copy from g1001-app
    ├── steps/ or components/
    └── lib/

platform.json (required fields all validated on upload)

{
  "schema_version": "1.0",
  "slug": "<your-slug>",
  "name": "<Display Name>",
  "description": "<optional>",
  "version": "1.0.0",
  "runtime": "nextjs",
  "window": {
    "default_width": 820,
    "default_height": 640,
    "resizable": true
  },
  "permissions": {
    "org_write": ["<table>"],
    "org_read": ["<table>"]
  },
  "agent_actions": [
    {
      "id": "get_context",
      "description": "Returns the complete AgentContext snapshot for the current app window."
    },
    {
      "id": "<action_id>",
      "description": "Agent-callable equivalent of a user-visible control. Include params in plain language."
    }
  ]
}

⚠️ Bump version for every redeploy. The deploy endpoint is idempotent on (slug, version) — if you re-upload with the same version, you get back the previous deploymentId (including FAILED ones). New version = new deploy.
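
A release script can bump the version before each redeploy so a failed deploy is never returned from the idempotency cache. This sketch assumes the manifest uses a three-part numeric version like the "1.0.0" shown above; `bumpPatch` is a hypothetical helper, and any other scheme that yields a new version string would serve the same purpose.

```typescript
// Increments the last component of a MAJOR.MINOR.PATCH version string.
function bumpPatch(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version)
  if (!match) throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`)
  return `${match[1]}.${match[2]}.${Number(match[3]) + 1}`
}
```

Write the result back to platform.json before zipping; because the title is built from `manifest.version`, the running window then shows the new build automatically.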

Cross-device session persistence (optional)

If your app has user-meaningful in-app state (drafts, view selection, pinned items) that should survive a browser close on machine A and re-open on machine B, add "session_persistence": true to your manifest and implement the session.snapshot postMessage emission + ipos.session.restore handler. See references/session-persistence.md.

next.config.js (template — exact)

const platform = require('./platform.json');
const basePath = `/api/app-static/${platform.slug}`;

module.exports = {
  reactStrictMode: true,
  output: 'export',
  basePath,
  assetPrefix: basePath,
  trailingSlash: true,
  images: { unoptimized: true },
  generateBuildId: async () => null,
};

Missing basePath/assetPrefix → all _next/static/... assets 404 after install. The IPOS shell serves the app from /api/app-static/<slug>/ so paths must be prefixed.


Build, Package, Deploy

cd apps/<slug>-app

# 1. Install + build → produces out/
npm install
npm run build

# 2. Zip — MUST use bash zip, NOT PowerShell Compress-Archive
#    PowerShell creates backslash entries which DevConsole rejects as "Malicious entry"
#    out/ MUST be a sub-folder inside the zip, not the zip's root
zip -r ../<slug>-<version>.zip platform.json out/

# Verify the zip looks right
unzip -l ../<slug>-<version>.zip | head
# Expected:
#   platform.json
#   out/
#   out/index.html
#   out/_next/...
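The layout rules in the comments above can also be checked in code. The sketch below assumes you already have the entry names as strings (for example, parsed from the `unzip -l` output); `checkZipEntries` is a hypothetical helper, and DevConsole may enforce further rules that are not listed here.

```typescript
// Flags the two packaging mistakes described above: backslash separators
// and a missing platform.json or out/index.html.
function checkZipEntries(entries: string[]): string[] {
  const problems: string[] = []
  for (const entry of entries) {
    if (entry.includes('\\')) problems.push(`backslash in entry: ${entry}`)
  }
  if (!entries.includes('platform.json')) problems.push('platform.json must be at the zip root')
  if (!entries.includes('out/index.html')) problems.push('out/index.html is missing')
  return problems
}
```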

Deploy via API (recommended — DevConsole UI is fragile)

TOKEN='ipos_live_<your-dev-token>'   # localStorage.ipos_dev_token in DevConsole
SLUG='<slug>'

# Step 1: register app (skip if exists; 409 = already exists, that's fine)
curl -X POST 'https://ipdesk.ai/api/ext/v1/apps' \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d "{\"slug\":\"$SLUG\",\"name\":\"...\",\"description\":\"...\"}"

# Step 2: upload + deploy
curl -X POST "https://ipdesk.ai/api/ext/v1/apps/$SLUG/deploy" \
  -H "Authorization: Bearer $TOKEN" \
  -F "zip=@./<slug>-<version>.zip" \
  -F "manifest=__from_zip__"
# Returns: { "deploymentId": "...", "status": "building" }

# Step 3: poll until live (5-15 sec typical)
DID='<deploymentId from step 2>'
for i in $(seq 1 10); do
  curl -s "https://ipdesk.ai/api/ext/v1/deployments/$DID" \
    -H "Authorization: Bearer $TOKEN"
  echo
  sleep 3
done
# Expect: status: "live", runtime_url: "/api/app-static/<slug>/"

# Step 4: stream build log (if it failed)
curl -N "https://ipdesk.ai/api/ext/v1/deployments/$DID/log/stream" \
  -H "Authorization: Bearer $TOKEN"
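Step 3 can also be written as a loop that stops as soon as the deployment is live. This is a sketch under stated assumptions: the status call is injected as `getStatus`, the response is assumed to expose `status` at the top level, and only `"live"` is treated as terminal by default because no other final status value is documented here.

```typescript
type DeploymentStatus = { status: string; runtime_url?: string }

// Polls until the status is in `terminal` or the attempts run out,
// then returns the last response seen.
async function pollDeployment(
  getStatus: () => Promise<DeploymentStatus>,
  opts: { attempts?: number; delayMs?: number; terminal?: string[] } = {},
): Promise<DeploymentStatus> {
  const attempts = opts.attempts ?? 10
  const delayMs = opts.delayMs ?? 3000
  const terminal = opts.terminal ?? ['live']
  let last: DeploymentStatus = { status: 'unknown' }
  for (let i = 0; i < attempts; i++) {
    last = await getStatus()
    if (terminal.includes(last.status)) return last
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
  }
  return last
}

// Usage sketch (token and id are placeholders):
// const final = await pollDeployment(() =>
//   fetch(`https://ipdesk.ai/api/ext/v1/deployments/${id}`, {
//     headers: { Authorization: `Bearer ${token}` },
//   }).then((r) => r.json()),
// )
```

If the final status is not `"live"`, stream the build log as in step 4.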

Deploy via DevConsole UI (alternative)

  1. Open IPOS OS → Apps → 開發者主控台 (Developer Console)
  2. Drop zip into the dropzone
  3. Click 上傳並部署 (Upload and Deploy)
  4. Watch build log stream

If the UI says "Malicious entry" → see references/gotchas.md (PowerShell zip).


Flow Compliance Report (give the customer an HTML they can read)

Before declaring the conversion complete, produce a plain-language HTML report for the customer. Full instructions + template in references/flow-compliance-report.md.

Steps:

  1. Ask the customer where to put it. Folder structure varies per customer. Default: docs/<slug>-flow-compliance.html. Confirm before writing.
  2. Do not include this file in the deploy zip — it's documentation, not app code.
  3. Audience is the customer, not engineers. No endpoint, route, iframe, manifest, bucket, postMessage jargon — translate every term.
  4. Cover sections A–F from the template:
    • A. Their original flow (in their own words)
    • B. The new flow on IPOS
    • C. Compliance checklist (✅ / ⚠️ / ❌ against IPOS hard rules)
    • D. Remediation plan for every ⚠️ / ❌ (who does what, by when, can they still use it in the meantime?)
    • E. What the OS Agent can see and do (translate every agent_actions entry into "things you can ask the assistant to do")
    • F. Which IPOS cloud services (platform routes) this app uses, and that IPOS — not the customer — maintains them
  5. Update the report once after deploy succeeds: fill in the actual deploymentId and version, flip status from "draft" to "live".

Verification Checklist

  • [ ] Agent can read the complete state snapshot for the current window/app
  • [ ] Agent can invoke every user-visible control through declared actions
  • [ ] platform.json.agent_actions includes get_context plus every action ID
  • [ ] Disabled Agent actions return a reason instead of silently no-oping
  • [ ] Window title shows <App Name> v<version> (from platform.json — not hardcoded)
  • [ ] App opens in OS iframe without console errors
  • [ ] Every row of the SDK Spec Sheet works end-to-end
  • [ ] _next/static/... assets load (no 404s — confirms basePath is correct)
  • [ ] iframe sandbox in components/os/Window.tsx includes allow-downloads if app triggers downloads
  • [ ] Re-running the flow doesn't crash on duplicate inserts (dedupe via /duplicate key/i match)
  • [ ] No raw API keys / DB credentials anywhere in the app source (grep -rE 'insforge|sk_|service_role' apps/<slug>-app/src/)
  • [ ] platform.notify(...) actually shows in the OS toast
  • [ ] After OS reload, the app still works (storage bucket persistence)
  • [ ] Flow Compliance HTML report exists at the customer-confirmed location, written in plain language, with all ⚠️/❌ items having a remediation plan

Common bugs and where they're documented

| Symptom | Where to look |
| --- | --- |
| Malicious entry: 404\index.html on upload | references/gotchas.md → "PowerShell zip" |
| does not allow inserts via SDK | references/gotchas.md → "Read-only table" |
| column "..." is not in insertable_columns | references/gotchas.md → "Read-only table" (column whitelist enforcement) |
| 502 Bad Gateway from a platform route | references/gotchas.md → "Cloudflare 502 stripping" |
| 403 from upstream API in platform route | references/gotchas.md → "Datacenter IP block / CF Worker" |
| iframe download blocked | references/gotchas.md → "iframe allow-downloads" |
| Same deploymentId on retry | references/gotchas.md → "Idempotent deploy by version" |
| _next/static/* 404 | next.config.js template above (basePath / assetPrefix) |
| cloudClient: ... not set 500 | env vars: INSFORGE_API_KEY + NEXT_PUBLIC_INSFORGE_BASE_URL on Zeabur |

For each, references/gotchas.md has the root cause and the fix.

Platform Route Catalog


Platform routes are server-side endpoints maintained by IPOS. Apps inside the OS iframe call them with fetch() — auth is handled automatically via the shell token (no Bearer needed).

Live routes

| Route | Method | Purpose | Request body | Response | Auth |
| --- | --- | --- | --- | --- | --- |
| /api/platform/v1/ocr | POST | OCR a (scanned) PDF via Mistral | { pdf_base64: string } | { data: { text: string } } | shell token / API key |
| /api/platform/v1/uspto-proxy | POST | Fetch USPTO patent PDF (CORS + IP bypass) | { patent_number: string } | PDF bytes (application/pdf) | shell token / API key |
| /api/platform/v1/prior-art/bulk | POST | Upsert prior art + link to case | { patent_numbers: string[], country?: string, case_id?: string, oa_date?: string } | { data: { inserted, linked, prior_art[] } } | Bearer (headless) |
| /api/platform/v1/search | GET | Cross-entity search (cases, emails, files) | ?q=...&entity=...&limit=... | { data: { results[] } } | shell token / API key |
| /api/platform/v1/org/<table> | GET/POST/PATCH | Generic CRUD on whitelisted tables | see lib/platform/org-table-config.ts | { rows? row? } | shell token only |
| /api/platform/v1/org/<table>/bulk | PATCH | Bulk update by id list (≤100) | { ids: string[], data: {...} } | { updated, rows } | shell token only |
| /api/platform/v1/scan-doc/analyze | POST | Analyze combined PDF with Gemini Vision: detect document boundaries, case numbers, dates, suggested filenames | { pdf_b64: string } (max 10 MB) | { data: { documents: [{ doc_type, court, case_number, receipt_date, doc_date, start_page, end_page, suggested_filename, has_attachments, attachment_start_page }], page_count: number } } | shell token / cookie session |
| /api/platform/v1/scan-doc/split | POST | Split a combined PDF into per-document PDFs using explicit page lists | { pdf_b64: string, documents: [{ filename: string, pages: number[] }] } | { data: { files: [{ filename: string, pdf_b64: string }] } } | shell token / cookie session |
| /api/app-static/[slug]/[...path] | GET | Serve static app files from bucket | (none) | file bytes | iframe origin |

/api/platform/v1/scan-doc/analyze: sends the PDF inline to Gemini 2.5 Flash Vision via inlineData, so no server-side PDF rendering is needed, and uses responseSchema with the SchemaType enum to get structured JSON output. The maximum PDF size is 10 MB, and the response contains one entry per logical document found in the combined scan. Note: the current implementation reads the GEMINI_API_KEY env var directly as a temporary shortcut. Ideally it should call executeAction() from lib/ai/actions.ts (the AI Actions system), so that the admin can toggle the model at /admin/ai/actions and the app shows "AI 尚未啟動,請洽管理員" ("AI is not enabled yet; please contact your administrator") when the action is disabled. This is the refactor target once executeActionMultimodal() is available.

/api/platform/v1/scan-doc/split — pure server-side PDF splitting via pdf-lib. No external API calls. Accepts the same PDF bytes + an array of { filename, pages: number[] } where pages is an explicit list of 1-indexed page numbers in output order (supports page reordering). Returns each document as base64 for the app to upload via platform.appFileUpload().
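The two scan-doc routes are designed to be chained: the page ranges returned by analyze become the explicit page lists that split expects. The sketch below assumes that `start_page` and `end_page` are 1-indexed and inclusive, which the catalog does not state; `toSplitDocuments` is a hypothetical helper.

```typescript
// Subset of the analyze response fields used here.
interface AnalyzedDoc { start_page: number; end_page: number; suggested_filename: string }
// Shape of one entry in the split request's `documents` array.
interface SplitDoc { filename: string; pages: number[] }

// Expands each [start_page, end_page] range into an explicit 1-indexed page list.
function toSplitDocuments(docs: AnalyzedDoc[]): SplitDoc[] {
  return docs.map((doc) => {
    if (doc.end_page < doc.start_page) {
      throw new Error(`end_page before start_page for "${doc.suggested_filename}"`)
    }
    const pages: number[] = []
    for (let p = doc.start_page; p <= doc.end_page; p++) pages.push(p)
    return { filename: doc.suggested_filename, pages }
  })
}
```

Because split takes explicit page lists, the app can let the user reorder or drop pages in each list before sending the request.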

Notes per route

/api/platform/v1/uspto-proxy — internally fetches via USPTO_PROXY_URL env var (Cloudflare Worker at uspto-proxy.<sub>.workers.dev) because USPTO blocks Zeabur datacenter IPs. Returns 503 (NOT 502 — Cloudflare strips the body of 502 responses) on failure. See references/route-design-patterns.md → "External API blocked by IP" pattern.

/api/platform/v1/org/<table> — only tables listed in lib/platform/org-table-config.ts are exposed. Table needs insertable_columns: [...] to allow POST and writable_columns: [...] to allow PATCH. Both default to disabled. The route always injects org_id server-side; never accept id or org_id from the caller.

/api/platform/v1/prior-art/bulk vs platform.db('org:prior_art').insert() — the bulk route is for headless callers (Bearer-authed); the SDK is for in-iframe apps. Both work; pick based on caller.

How to call a platform route from inside an app

// In any component or lib — shell token added automatically by the SDK proxy
const res = await fetch('/api/platform/v1/ocr', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ pdf_base64: base64String }),
})
const { data, error } = await res.json()
if (!res.ok || error) throw new Error(error?.message ?? `HTTP ${res.status}`)
console.log(data.text)
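The `base64String` in the OCR call has to be produced from the file bytes first. The following is a dependency-free sketch, assuming the route expects plain base64 without a `data:` URL prefix (the catalog does not say); `bytesToBase64` is a hypothetical helper.

```typescript
const BASE64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Encodes raw bytes as standard base64 with '=' padding.
function bytesToBase64(bytes: Uint8Array): string {
  let out = ''
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length
    const has2 = i + 2 < bytes.length
    const b0 = bytes[i]
    const b1 = has1 ? bytes[i + 1] : 0
    const b2 = has2 ? bytes[i + 2] : 0
    out += BASE64_ALPHABET[b0 >> 2]
    out += BASE64_ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)]
    out += has1 ? BASE64_ALPHABET[((b1 & 0x0f) << 2) | (b2 >> 6)] : '='
    out += has2 ? BASE64_ALPHABET[b2 & 0x3f] : '='
  }
  return out
}

// Usage sketch in the browser, where `file` is a File from an <input>:
// const base64String = bytesToBase64(new Uint8Array(await file.arrayBuffer()))
```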

For routes that return file bytes (e.g. uspto-proxy):

const res = await fetch('/api/platform/v1/uspto-proxy', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ patent_number: 'US9999999' }),
})
if (!res.ok) {
  const err = await res.json().catch(() => null)
  throw new Error(err?.error?.message ?? `Download failed: ${res.status}`)
}
const buffer = await res.arrayBuffer()

Requesting a new route

If the diagnosis tree (Part 1) reaches Q4, file a Platform Route Request with:

  1. What — one-sentence description of what the route should do
  2. Inputs — JSON fields, file types, max sizes
  3. Outputs — JSON shape, content-type, max size
  4. Why server-side — pick the right reason:
    • cors — external API has no Access-Control-Allow-Origin
    • ip_block — external API blocks datacenter IPs (needs CF Worker proxy)
    • secret — needs an API key/credential held by IPOS
    • compute — needs server CPU/RAM that's too heavy for browser
    • cross_org_join — needs a SQL query that spans tables an SDK call can't reach
  5. Which apps need the route

When IPOS builds the route, it lands in this catalog. The customer's zip never changes — only the platform gains capability.
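The five items of a Platform Route Request map naturally onto a typed record. The shape below is only an illustration: this document does not define a machine-readable request format, so `PlatformRouteRequest` and `missingRouteRequestFields` are hypothetical names; only the five reason keywords come from the list above.

```typescript
type ServerSideReason = 'cors' | 'ip_block' | 'secret' | 'compute' | 'cross_org_join'

interface PlatformRouteRequest {
  what: string      // one-sentence description
  inputs: string    // JSON fields, file types, max sizes
  outputs: string   // JSON shape, content-type, max size
  reason: ServerSideReason
  apps: string[]    // slugs of the apps that need the route
}

const SERVER_SIDE_REASONS: readonly string[] = ['cors', 'ip_block', 'secret', 'compute', 'cross_org_join']

// Returns the names of the items that are still missing or invalid.
function missingRouteRequestFields(req: Partial<PlatformRouteRequest>): string[] {
  const missing: string[] = []
  if (!req.what?.trim()) missing.push('what')
  if (!req.inputs?.trim()) missing.push('inputs')
  if (!req.outputs?.trim()) missing.push('outputs')
  if (!req.reason || !SERVER_SIDE_REASONS.includes(req.reason)) missing.push('reason')
  if (!req.apps || req.apps.length === 0) missing.push('apps')
  return missing
}
```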

Pattern decision: where new routes belong

/api/platform/v1/<route>     # public to all installed apps via shell token
/api/ext/v1/<route>          # Bearer-authed, headless callers (CI, scripts, webhooks)
/api/v1/<route>              # IPDesk first-party UI only (cookie session)

A route serving an SDK app should always live under /api/platform/v1/. If it also needs to be callable from a CI script, expose a parallel /api/ext/v1/<route> that wraps the same business logic. Don't expose /api/v1/ to apps — those use the user's session cookie which the iframe doesn't carry.

Reference: Agent Control Surface

Agent Control Surface

Every app connected to IPOS OS must expose a complete Agent Control Surface. The Agent must be able to know what the app is doing and invoke every user-visible control. Treat this as part of the app contract, not an optional enhancement.


Required deliverables

For every app, define these four artifacts before implementation is considered complete:

  1. State schema - a serializable snapshot of all runtime state the UI uses.
  2. Action schema - every user-visible control as an Agent-callable command.
  3. Manifest declarations - platform.json.agent_actions describing context and commands.
  4. Verification - tests or manual proof that the Agent can read state and invoke controls.

If a state field or action is deliberately excluded, document why. Valid reasons are narrow: secrets, raw file bytes too large to serialize, or dangerous actions that require an explicit human confirmation step. "It lives in React state" is not a valid exclusion.


State contract

The state snapshot must include everything needed for the Agent to answer: "What is the user looking at, what has happened, and what can be done next?"

Minimum fields:

interface AgentContext {
  app: {
    slug: string
    version: string
    route: string
  }
  ui: {
    currentView: string
    currentStep?: string
    focusedControl?: string | null
    modal?: string | null
  }
  inputs: Record<string, unknown>
  derived: Record<string, unknown>
  progress: Record<string, unknown>
  results: Record<string, unknown>
  errors: Array<{ code?: string; message: string; target?: string }>
  capabilities: {
    availableActions: string[]
    disabledActions: Record<string, string>
  }
  updatedAt: string
}

Guidelines:

  • Include textarea/input values, selected rows, current step, toggles, progress, success/failure counts, validation errors, and pending async operations.
  • Include summaries for large data, plus IDs/names/counts. Do not send raw file blobs or huge binary payloads.
  • Include enough data for the Agent to continue the workflow without guessing.
  • Use stable action IDs and state field names; do not encode UI labels as API contracts.
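As an illustration of the state contract, the sketch below derives an `AgentContext` from a small piece of app state. The `AppState` fields, the `'/'` route, and the disabled-reason wording are assumptions made for the example; the `AgentContext` interface repeats the minimum fields listed above.

```typescript
interface AgentContext {
  app: { slug: string; version: string; route: string }
  ui: { currentView: string; currentStep?: string; focusedControl?: string | null; modal?: string | null }
  inputs: Record<string, unknown>
  derived: Record<string, unknown>
  progress: Record<string, unknown>
  results: Record<string, unknown>
  errors: Array<{ code?: string; message: string; target?: string }>
  capabilities: { availableActions: string[]; disabledActions: Record<string, string> }
  updatedAt: string
}

// Hypothetical state of a small batch tool.
interface AppState {
  rawInput: string
  delayMs: number
  running: boolean
  done: number
  total: number
  errors: string[]
}

const ALL_ACTION_IDS = ['set_input', 'set_delay_ms', 'start', 'cancel', 'reset']

function buildAgentContext(s: AppState, slug: string, version: string): AgentContext {
  // Every unavailable action gets a reason, so the Agent never has to guess.
  const disabledActions: Record<string, string> = {}
  if (s.running) disabledActions.start = 'already running'
  else if (s.rawInput.trim().length === 0) disabledActions.start = 'input is empty'
  if (!s.running) disabledActions.cancel = 'nothing is running'
  return {
    app: { slug, version, route: '/' },
    ui: { currentView: s.running ? 'progress' : 'input' },
    inputs: { rawInput: s.rawInput, delayMs: s.delayMs },
    derived: { lineCount: s.rawInput.split('\n').filter((line) => line.trim()).length },
    progress: { running: s.running, done: s.done, total: s.total },
    results: {},
    errors: s.errors.map((message) => ({ message })),
    capabilities: {
      availableActions: ALL_ACTION_IDS.filter((id) => !(id in disabledActions)),
      disabledActions,
    },
    updatedAt: new Date().toISOString(),
  }
}
```

The result is plain JSON-serializable data, so it can be passed over postMessage or returned from an HTTP endpoint unchanged.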

Action contract

Every user-visible control needs an equivalent action.

Examples:

| UI control | Agent action |
| --- | --- |
| Type into textarea | set_input |
| Change delay slider | set_delay_ms |
| Toggle push-to-DB checkbox | set_push_db |
| Start button | start |
| Cancel button | cancel |
| Reset button | reset |
| Select row | select_row |
| Remove row | remove_row |
| Download/export | download or export |

Action shape:

interface AgentActionRequest {
  action: string
  params?: Record<string, unknown>
}

interface AgentActionResult {
  ok: boolean
  state: AgentContext
  error?: { code: string; message: string }
}

Rules:

  • Actions should return the updated state snapshot.
  • Actions must share validation with the UI path where possible.
  • Dangerous or irreversible actions must support dryRun or require an explicit confirmation parameter.
  • If the UI button is disabled, the action should return a structured disabled reason, not silently no-op.
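The rules above can be enforced in one dispatcher so that individual handlers stay small. This is a sketch, not the platform's implementation: the `ActionSpec` shape, the error codes, and the `confirm` parameter name are assumptions, and the state type is left generic rather than tied to `AgentContext`.

```typescript
interface AgentActionRequest { action: string; params?: Record<string, unknown> }
interface AgentActionResult<S> { ok: boolean; state: S; error?: { code: string; message: string } }

interface ActionSpec<S> {
  // Returns a reason when the matching UI control would be disabled.
  disabledReason?: (state: S) => string | null
  // Marks irreversible actions that need confirm: true.
  dangerous?: boolean
  run: (state: S, params: Record<string, unknown>) => S
}

function dispatch<S>(
  state: S,
  actions: Record<string, ActionSpec<S>>,
  req: AgentActionRequest,
): AgentActionResult<S> {
  const known = Object.prototype.hasOwnProperty.call(actions, req.action)
  if (!known) {
    return { ok: false, state, error: { code: 'UNKNOWN_ACTION', message: `no action "${req.action}"` } }
  }
  const spec = actions[req.action]
  const reason = spec.disabledReason ? spec.disabledReason(state) : null
  if (reason) return { ok: false, state, error: { code: 'ACTION_DISABLED', message: reason } }
  const params = req.params ?? {}
  // dryRun validates the request and reports the current state without changing it.
  if (params.dryRun === true) return { ok: true, state }
  if (spec.dangerous && params.confirm !== true) {
    return { ok: false, state, error: { code: 'CONFIRMATION_REQUIRED', message: 'pass confirm: true' } }
  }
  return { ok: true, state: spec.run(state, params) }
}
```

Every branch returns a state snapshot, so the Agent always sees the current state even when the action is refused.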

Server app pattern

Server/runtime apps can expose HTTP endpoints directly:

GET  /api/agent/context
POST /api/agent/action

GET /api/agent/context returns the current AgentContext. POST /api/agent/action accepts { action, params }, performs the same state transition as the UI, and returns { ok, state, error? }.

These endpoints must be shell-authenticated. Do not expose them as public unauthenticated APIs.


Static zip app pattern

Static apps have no server-side runtime, so they cannot implement /api/agent/context by themselves. They must push state to the OS shell through platform-js and receive action commands from the shell.

Required platform-js capabilities:

platform.agent.updateContext(context)
platform.agent.registerAction(actionId, handler)

Until these SDK methods exist in the repo, static apps are not fully Agent-controllable. Do not mark the conversion complete; document the missing platform-js bridge as a platform prerequisite.

Expected static app flow:

  1. Top-level app builds an AgentContext from React state.
  2. On every meaningful state change, app calls platform.agent.updateContext.
  3. App registers handlers for every action in the Action contract.
  4. OS shell stores the latest context per window/app/installation.
  5. OS Agent reads that shell-stored context and dispatches actions through the same bridge.

Do not solve this with localStorage, DOM scraping, or direct iframe inspection. The supported path is explicit postMessage state/action bridging.
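Until the bridge ships, the expected flow can still be prototyped against an in-memory stand-in. Everything below is hypothetical: the handler signature, the `AgentBridge` interface, and the `FakeShell` class are assumptions that only mirror the two method names listed above, and they do not describe the real platform-js API.

```typescript
type ActionHandler<C> = (params: Record<string, unknown>) => C

// Hypothetical bridge shape, mirroring the two methods named above.
interface AgentBridge<C> {
  updateContext(context: C): void
  registerAction(actionId: string, handler: ActionHandler<C>): void
}

// In-memory stand-in for the OS shell (step 4: stores the latest context).
class FakeShell<C> implements AgentBridge<C> {
  latest: C | null = null
  private handlers = new Map<string, ActionHandler<C>>()

  updateContext(context: C): void {
    this.latest = context
  }

  registerAction(actionId: string, handler: ActionHandler<C>): void {
    this.handlers.set(actionId, handler)
  }

  // Step 5: what the OS Agent would do to invoke an action.
  dispatch(actionId: string, params: Record<string, unknown> = {}): C {
    const handler = this.handlers.get(actionId)
    if (!handler) throw new Error(`action not registered: ${actionId}`)
    return handler(params)
  }
}

// App side (steps 1 to 3): push context on every change, register every action.
function connectApp(bridge: AgentBridge<{ input: string }>): void {
  let state = { input: '' }
  const commit = (next: { input: string }) => {
    state = next
    bridge.updateContext(state)
    return state
  }
  commit(state)
  bridge.registerAction('set_input', (params) => commit({ input: String(params.value ?? '') }))
  bridge.registerAction('reset', () => commit({ input: '' }))
}
```

Writing the app against an interface like this keeps the swap to the real bridge limited to one file once the SDK methods exist.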


Manifest declarations

Every app must declare Agent capabilities in platform.json:

{
  "agent_actions": [
    {
      "id": "get_context",
      "description": "Returns the complete AgentContext snapshot for the current app window."
    },
    {
      "id": "set_input",
      "description": "Sets the main input text. Params: { value: string }"
    },
    {
      "id": "start",
      "description": "Starts the main workflow using the current state."
    },
    {
      "id": "cancel",
      "description": "Cancels the running workflow if one is active."
    },
    {
      "id": "reset",
      "description": "Resets the workflow to its initial state."
    }
  ]
}

Use exact action IDs from the Action contract. The indexer reads platform.json.agent_actions into ipos_app_capability_index.agent_hints, so missing declarations make the app invisible to the Agent.
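Since a missing declaration hides an action from the Agent, it is worth diffing the manifest against the handlers the app actually registers. A minimal sketch follows, assuming you can list both sets of IDs as strings; `diffActionIds` is a hypothetical helper, and whether `get_context` is registered by the app or provided by the shell is not specified here.

```typescript
// Compares manifest-declared action IDs with the IDs the app registers handlers for.
function diffActionIds(
  declared: string[],
  registered: string[],
): { undeclared: string[]; unimplemented: string[] } {
  const declaredSet = new Set(declared)
  const registeredSet = new Set(registered)
  return {
    // Handlers the Agent cannot discover because the manifest omits them.
    undeclared: registered.filter((id) => !declaredSet.has(id)),
    // Declarations with no handler behind them.
    unimplemented: declared.filter((id) => !registeredSet.has(id)),
  }
}
```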


Example: g1002 required state/actions

State:

  • rawInput
  • parsed patent list and validation errors
  • delayMs
  • pushDb
  • phase
  • per-row status, size, source, error
  • cancel/running status
  • success/failure/total counts
  • whether each action is currently available

Actions:

  • set_input({ value })
  • set_delay_ms({ value })
  • set_push_db({ value })
  • start()
  • cancel()
  • reset()

Without the static-app platform-js bridge, g1002 can be opened by the Agent but cannot be considered Agent-controllable.



Standard commands: session persistence

These two postMessage shapes are used when the app declares "session_persistence": true in its manifest. They are OS-level contracts, not Agent-level actions, but they share the same postMessage transport and are documented here for completeness.

session.snapshot (app → OS)

Required for: apps that declare session_persistence: true in the manifest.

Direction: app posts to parent (OS shell).

Shape: { kind: 'ipos:request', type: 'session.snapshot', id, payload: { state: unknown, schemaVersion: number } }

When to send: any time persistable app state has changed. The OS debounces to one write per second per window, so over-eager calls are safe.

Returns: standard ipos:response ack with { ok: true }. Errors (e.g. oversize payload >64KB) are silently dropped server-side — the previous good snapshot remains.

ipos.session.restore (OS → app)

Required for: apps that declare session_persistence: true.

Direction: OS posts to app iframe.

Shape: { type: 'ipos.session.restore', windowId, payload: { state: unknown, schemaVersion: number } }

When received: once, after the iframe completes its handshake request, if a prior snapshot exists. Not sent when the user has never used the app before.

Behavior: apply or migrate the prior state. Throwing inside the handler is a silent no-op (OS does not retry); apps should fail gracefully and continue with the default empty state.
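The two message shapes can be handled with a pair of small pure functions. This is a sketch under stated assumptions: `windowId` is taken to be a string, the 64KB limit is approximated by the character count of the serialized payload, and a snapshot from a newer schema version is discarded; none of these details are confirmed by this document.

```typescript
interface SnapshotMessage {
  kind: 'ipos:request'
  type: 'session.snapshot'
  id: string
  payload: { state: unknown; schemaVersion: number }
}

interface RestoreMessage {
  type: 'ipos.session.restore'
  windowId: string // assumed to be a string
  payload: { state: unknown; schemaVersion: number }
}

const MAX_SNAPSHOT_CHARS = 64 * 1024 // rough stand-in for the 64KB limit

// Builds the app -> OS message, or returns null when the payload looks oversize,
// because oversize snapshots would be dropped silently anyway.
function makeSnapshotMessage(id: string, state: unknown, schemaVersion: number): SnapshotMessage | null {
  const payload = { state, schemaVersion }
  if (JSON.stringify(payload).length > MAX_SNAPSHOT_CHARS) return null
  return { kind: 'ipos:request', type: 'session.snapshot', id, payload }
}

// Applies an OS -> app restore; any failure falls back to the default state.
function applyRestore<S>(
  msg: RestoreMessage,
  currentVersion: number,
  migrate: (state: unknown, fromVersion: number) => S,
  fallback: S,
): S {
  try {
    if (msg.payload.schemaVersion > currentVersion) return fallback
    return migrate(msg.payload.state, msg.payload.schemaVersion)
  } catch {
    return fallback
  }
}
```

In the app, the snapshot message would be sent with `window.parent.postMessage(...)` and the restore handled in a `message` event listener.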


Verification checklist

  • [ ] platform.json.agent_actions lists get_context plus every action.
  • [ ] Agent can retrieve current state after opening the app.
  • [ ] State includes all visible inputs, selected items, progress, results, and errors.
  • [ ] Agent can invoke every UI control through an action.
  • [ ] Actions return updated state.
  • [ ] Disabled actions return a reason.
  • [ ] Sensitive/large fields are summarized and documented.
  • [ ] Static app uses platform-js bridge; server app uses shell-authenticated /api/agent/* endpoints.
  • [ ] If session_persistence: true: both session.snapshot emission and ipos.session.restore handling are implemented and tested across schema bumps.

Reference: Route Design Patterns

Route Design Patterns

Recipes for the most common platform route scenarios, ordered from simplest to most complex. For env var setup, see gotchas #16. For the CF Worker deploy flow, see gotchas #8 and #15.


Pattern A: Simple CORS proxy

Use when: External API works from any server but browser fetch() fails with CORS error (no Access-Control-Allow-Origin header).

Route file template

// app/api/platform/v1/<route-name>/route.ts
import { NextRequest } from 'next/server'
import { buildRequestContext } from '@/lib/platform/build-request-context'
import { errJson } from '@/lib/platform/err-json'

export async function POST(req: NextRequest) {
  const ctx = await buildRequestContext(req)
  if (!ctx.ok) return ctx.errorResponse

  let body: { field: string }
  try { body = await req.json() } catch { return errJson('BAD_REQUEST', 'invalid JSON', 400) }
  if (!body.field) return errJson('BAD_REQUEST', 'missing field', 400)

  try {
    const upstream = await fetch('https://api.example.com/endpoint', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ field: body.field }),
      signal: AbortSignal.timeout(25_000),   // Zeabur limit is 30s; 25s leaves room
    })
    if (!upstream.ok) {
      const text = await upstream.text().catch(() => '')
      return errJson('UPSTREAM_ERROR', `upstream ${upstream.status}: ${text.slice(0, 200)}`, 503)
    }

    // JSON response variant:
    return Response.json({ data: await upstream.json() })

    // Binary response variant (PDF, etc.):
    // return new Response(await upstream.arrayBuffer(), {
    //   headers: { 'Content-Type': upstream.headers.get('Content-Type') ?? 'application/pdf' },
    // })
  } catch (err) {
    return errJson('PROXY_ERROR', err instanceof Error ? err.message : 'network error', 503)
  }
}

Checklist:

  • buildRequestContext always runs first — validates caller identity
  • AbortSignal.timeout(25_000) — Zeabur hard-kills at 30s; leave 5s margin
  • Return 503 (not 502) on upstream failure — Cloudflare strips 502 bodies (gotchas #7)
  • text.slice(0, 200) — caps upstream error so sensitive page HTML doesn't leak to the caller

Pattern B: IP-blocked external API (CF Worker egress proxy)

Use when: External API returns 403 from Zeabur's datacenter IPs but works from your local machine. User-Agent alone does not fix it.

How to confirm: curl <url> locally → 200. Same curl via Zeabur exec → 403 or connection refused.

Examples: USPTO full-text images, some government portals.

Step 1 — Write the CF Worker

// infra/cf-workers/<upstream>-proxy/worker.js
export default {
  async fetch(req) {
    const u = new URL(req.url)
    const path = u.pathname.replace(/^\//, '')   // strip leading slash; pass the rest as upstream path
    const upstream = await fetch(`https://upstream.example.com/${path}`, {
      headers: { 'User-Agent': 'Mozilla/5.0 (compatible; YourApp/1.0)' },
      cf: { cacheEverything: true, cacheTtl: 86400 },
    })
    return new Response(upstream.body, {
      status: upstream.status,
      headers: {
        'Content-Type': upstream.headers.get('Content-Type') ?? 'application/octet-stream',
        'Cache-Control': 'public, max-age=86400',
      },
    })
  },
}

# infra/cf-workers/<upstream>-proxy/wrangler.toml
name = "<upstream>-proxy"
main = "worker.js"
compatibility_date = "2024-01-01"

# Deploy (once, not part of app deploy cycle)
cd infra/cf-workers/<upstream>-proxy
CLOUDFLARE_API_TOKEN=<token> npx wrangler deploy
# Prints: https://<upstream>-proxy.<sub>.workers.dev  ← copy this

Step 2 — Set the env var in Zeabur

Naming convention: <UPSTREAM>_PROXY_URL in ALL_CAPS_SNAKE_CASE.

# Use gotchas #16 curl mutation, or Zeabur dashboard
<UPSTREAM>_PROXY_URL=https://<upstream>-proxy.<sub>.workers.dev

Step 3 — Write the server helper

// lib/platform/<upstream>.ts
export async function fetchFromUpstream(path: string): Promise<ArrayBuffer> {
  const proxyBase = process.env.<UPSTREAM>_PROXY_URL?.replace(/\/$/, '')
  // Local dev: proxyBase is unset → falls back to direct (works from laptop, not from Zeabur)
  const url = proxyBase
    ? `${proxyBase}/${path}`
    : `https://upstream.example.com/${path}`
  const res = await fetch(url, { signal: AbortSignal.timeout(25_000) })
  if (!res.ok) throw new Error(`upstream ${res.status} for ${path}`)
  return res.arrayBuffer()
}

Step 4 — Use the helper from the route

Call fetchFromUpstream(path) inside the route's try block, same as Pattern A.

Live example: infra/cf-workers/uspto-proxy/ + lib/platform/uspto.ts


Pattern C: Two-step scrape proxy

Use when: The external API has no direct download URL — you must fetch a page first, parse a URL from it, then fetch the actual resource.

Example: Google Patents — the patent page HTML contains <meta name="citation_pdf_url" content="...">. You parse that to get the real PDF URL.

Worker (two fetches per user request)

// infra/cf-workers/google-patents-proxy/worker.js
export default {
  async fetch(req) {
    const u = new URL(req.url)
    const patentId = u.pathname.replace(/^\//, '')   // e.g. "US7654321B2"

    // Step 1: fetch the HTML patent page
    const pageRes = await fetch(`https://patents.google.com/patent/${patentId}`, {
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept': 'text/html,application/xhtml+xml',
      },
    })
    if (!pageRes.ok) return new Response(`page fetch failed: ${pageRes.status}`, { status: pageRes.status })

    // Step 2: parse the PDF URL out of the meta tag
    const html = await pageRes.text()
    const match = html.match(/<meta\s+name="citation_pdf_url"\s+content="([^"]+)"/)
    if (!match) return new Response('no citation_pdf_url meta tag found', { status: 404 })

    // Step 3: fetch the actual PDF
    const pdfRes = await fetch(match[1], {
      headers: { 'User-Agent': 'Mozilla/5.0 (compatible)' },
      cf: { cacheEverything: true, cacheTtl: 3600 },
    })
    if (!pdfRes.ok) return new Response(`pdf fetch failed: ${pdfRes.status}`, { status: pdfRes.status })

    return new Response(pdfRes.body, {
      headers: { 'Content-Type': 'application/pdf', 'Cache-Control': 'public, max-age=3600' },
    })
  },
}

Fragility note: Two-step workers depend on the source page's HTML structure. If the meta tag format changes, the worker silently 404s. Add a comment in the route README about what to check if it breaks.
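One way to reduce this fragility is to parse the tag's attributes instead of matching one fixed attribute order. The function below is a sketch of that idea, not the code in the live worker: `extractCitationPdfUrl` is a hypothetical helper that accepts either attribute order and single or double quotes, and it still returns `null` if the page drops the meta tag entirely.

```typescript
// Finds <meta name="citation_pdf_url" content="..."> regardless of attribute
// order or quote style. Returns null when no such tag is present.
function extractCitationPdfUrl(html: string): string | null {
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? []
  for (const tag of metaTags) {
    const attrs = new Map<string, string>()
    const attrRe = /([a-zA-Z_:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g
    let m: RegExpExecArray | null
    while ((m = attrRe.exec(tag)) !== null) {
      const value = m[2] !== undefined ? m[2] : m[3]
      attrs.set(m[1].toLowerCase(), value ?? '')
    }
    if (attrs.get('name') === 'citation_pdf_url') {
      const content = attrs.get('content')
      if (content) return content
    }
  }
  return null
}
```

Returning `null` explicitly lets the worker log a distinct "meta tag missing" message, which makes a format change on the source page easier to spot.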

Live example: infra/cf-workers/google-patents-proxy/


Pattern D: Multi-source fallback

Use when: The same data can be fetched from multiple sources, and you want resilience when the primary fails.

Client-side vs server-side — which to pick

| | Client-side (zip) | Server-side (platform route) |
| --- | --- | --- |
| How calls are made | Separate fetch() per source | Route tries all sources internally |
| User sees which source worked | Yes: show progress per source | No: one call, returns result or error |
| Changing source order | Requires zip redeploy | Push to main; no zip change |
| Debugging | Source name visible in UI | Logged server-side only |
| Use when | User benefits from seeing per-source status | User just needs the file; hide complexity |

Client-side fallback (in the zip)

// apps/<slug>-app/src/lib/try-sources.ts
interface Source { name: string; fetch: () => Promise<ArrayBuffer> }

export async function downloadWithFallback(
  sources: Source[],
): Promise<{ buffer: ArrayBuffer; source: string }> {
  const errors: string[] = []
  for (const { name, fetch } of sources) {
    try {
      const buffer = await fetch()
      return { buffer, source: name }
    } catch (err) {
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`)
    }
  }
  throw new Error(errors.join(' | '))
}

// Usage:
const { buffer, source } = await downloadWithFallback([
  { name: 'primary',  fetch: () => fetchFromPrimary(id) },
  { name: 'fallback', fetch: () => fetchFromFallback(id) },
])

Server-side fallback (inside platform route)

// Inside the POST handler
// The loop variable is named fetchSource so it does not shadow the global fetch()
const sources = [fetchFromPrimary, fetchFromFallback, fetchFromTertiary]
let lastErr = new Error('no sources')
for (const fetchSource of sources) {
  try { return new Response(await fetchSource(id), { headers: { 'Content-Type': 'application/pdf' } }) }
  catch (err) { lastErr = err instanceof Error ? err : new Error(String(err)) }
}
return errJson('ALL_SOURCES_FAILED', lastErr.message, 503)

Live example (client-side): apps/g1002-app/src/lib/try-sources.ts — Google Patents (with suffix) → Google Patents (bare) → USPTO, with per-row progress shown to user.


Pattern E: AI analysis via AI Actions system + file storage

Use when: The app needs to call an AI model (text, vision, or multimodal) and optionally save output files to 檔案總管 (the File Explorer, backed by the app_files table).

Why server-side: AI provider credentials are held by the platform. The browser never touches an API key.


How IPOS AI works

The platform has an AI Actions framework. Each "action" is a row in the ai_actions table:

| Column | Meaning |
| --- | --- |
| slug | Unique ID the route uses to look up the action (scan-doc-analyze, ocr, …) |
| provider_id | Points to ai_providers row (OpenRouter, BazaarLink, …) |
| model_id | Model string sent to the provider (google/gemini-2.5-flash, etc.) |
| is_enabled | Gate: if false, route returns AI_NOT_ENABLED; app shows "AI 尚未啟動,請洽管理員" ("AI is not enabled yet; please contact your administrator") |
| system_prompt | Default system prompt; route can override per-call |
| config | JSON blob for extra provider settings |

Admin configures actions at https://ipdesk.ai/admin/ai/actions. App developers never configure providers or API keys — they only know the action slug.

Providers (ai_providers table) store base_url + encrypted api_key. Both OpenRouter and BazaarLink expose an OpenAI-compatible API, so the same client code works for any model they route to (Gemini, Claude, GPT-4o, etc.).


Platform-side: call executeAction

// app/api/platform/v1/<slug>/analyze/route.ts
import { NextRequest } from 'next/server'
import { buildRequestContext } from '@/lib/platform/build-request-context'
import { errJson } from '@/lib/platform/err-json'
import { executeAction } from '@/lib/ai/actions'

export async function POST(req: NextRequest) {
  const ctx = await buildRequestContext(req)
  if (!ctx.ok) return ctx.errorResponse

  let body: { text?: string }
  try { body = await req.json() } catch { return errJson('BAD_REQUEST', 'invalid JSON', 400) }
  if (!body.text) return errJson('BAD_REQUEST', 'text is required', 400)

  try {
    const result = await executeAction(
      'your-action-slug',   // matches ai_actions.slug
      ctx.orgId,            // for usage logging
      body.text,            // user message
    )
    return Response.json({ data: { content: result.content } })
  } catch (err) {
    const msg = err instanceof Error ? err.message : 'unknown error'
    if (msg.includes('disabled') || msg.includes('not found')) {
      return errJson('AI_NOT_ENABLED', msg, 503)
    }
    return errJson('AI_ERROR', msg, 500)
  }
}

executeAction (in lib/ai/actions.ts) handles everything: slug lookup, is_enabled check, key decryption, OpenAI-compatible call, usage logging. If is_enabled is false it throws — the route maps that to AI_NOT_ENABLED HTTP 503.


App side: handle AI_NOT_ENABLED

const res = await fetch('/api/platform/v1/<slug>/analyze', {
  method: 'POST',
  credentials: 'include',   // required for iframe → platform route auth
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: userInput }),
})
const { data, error } = await res.json()

if (!res.ok || error) {
  if (error?.code === 'AI_NOT_ENABLED') {
    setError('AI 功能尚未啟動,請洽管理員')
  } else {
    setError(error?.message ?? `Error ${res.status}`)
  }
  return
}
// use data.content

Every AI-powered app must handle AI_NOT_ENABLED and display a friendly message. Never show raw API errors to the user.
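One way to keep raw API errors away from users is to map known error codes to fixed wording and fall back to a generic message for everything else. The sketch below assumes the `{ code, message }` error envelope used in the templates above; the English strings and the `toUserMessage` helper are hypothetical, and only the AI_NOT_ENABLED text comes from this document.

```typescript
interface PlatformError { code?: string; message?: string }

// Fixed, user-safe wording per known error code.
const USER_MESSAGES: Record<string, string> = {
  AI_NOT_ENABLED: 'AI 功能尚未啟動,請洽管理員',
  AI_ERROR: 'The AI service could not complete this request. Please try again later.',
  BAD_REQUEST: 'The request was incomplete. Please check your input.',
}

// Never returns error.message, so upstream text cannot leak into the UI.
function toUserMessage(httpStatus: number, error: PlatformError | null | undefined): string {
  const code = error?.code
  if (code && Object.prototype.hasOwnProperty.call(USER_MESSAGES, code)) {
    return USER_MESSAGES[code]
  }
  return `Something went wrong (HTTP ${httpStatus}).`
}
```

The raw `error.message` can still be written to the console or a log for debugging; it just should not reach the user-facing error state.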


Processing / splitting route template (no AI — pure compute)

When the second route just transforms files without AI (e.g. PDF splitting), it uses no AI Actions:

// app/api/platform/v1/<slug>/split/route.ts
import { NextRequest } from 'next/server'
import { PDFDocument as PdfLibDoc } from 'pdf-lib'
import { buildRequestContext } from '@/lib/platform/build-request-context'
import { errJson } from '@/lib/platform/err-json'

type SplitDoc = { filename: string; pages: number[] }   // pages are 1-based

export async function POST(req: NextRequest) {
  const ctx = await buildRequestContext(req)
  if (!ctx.ok) return ctx.errorResponse

  // Malformed JSON should be a 400, not an unhandled 500
  let body: { pdf_b64?: string; documents?: SplitDoc[] }
  try { body = await req.json() } catch { return errJson('BAD_REQUEST', 'invalid JSON', 400) }
  const { pdf_b64, documents } = body
  if (!pdf_b64 || !Array.isArray(documents) || documents.length === 0)
    return errJson('BAD_REQUEST', 'pdf_b64 and documents[] required', 400)

  const srcDoc = await PdfLibDoc.load(Buffer.from(pdf_b64, 'base64'), { ignoreEncryption: true })
  const pageCount = srcDoc.getPageCount()
  const files: { filename: string; pdf_b64: string }[] = []

  for (const doc of documents) {
    if (!Array.isArray(doc.pages) || doc.pages.length === 0)
      return errJson('BAD_REQUEST', `pages must be a non-empty array for "${doc.filename}"`, 400)
    // Reject out-of-range pages up front so they do not fail later inside copyPages
    if (doc.pages.some((p) => !Number.isInteger(p) || p < 1 || p > pageCount))
      return errJson('BAD_REQUEST', `pages must be between 1 and ${pageCount} for "${doc.filename}"`, 400)

    const outDoc = await PdfLibDoc.create()
    const copied = await outDoc.copyPages(srcDoc, doc.pages.map((p) => p - 1))   // pdf-lib indices are 0-based
    copied.forEach((page) => outDoc.addPage(page))
    const bytes = await outDoc.save()
    files.push({ filename: doc.filename, pdf_b64: Buffer.from(bytes).toString('base64') })
  }

  return Response.json({ data: { files } })
}
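The split route expects documents[] entries with a filename and a list of 1-based pages. If the preceding analysis step returns page ranges (the test mock later in this document uses the shape { title, start_page, end_page }), the app needs to expand them first. The sketch below is one way to do that; toSplitDocuments and both type names are hypothetical, and wiring the analyze output into the split input this way is an assumption rather than something the platform prescribes.

```typescript
type AnalyzedDoc = { title: string; start_page: number; end_page: number }
type SplitRequestDoc = { filename: string; pages: number[] }

function toSplitDocuments(docs: AnalyzedDoc[]): SplitRequestDoc[] {
  return docs.map((d) => {
    const valid =
      Number.isInteger(d.start_page) && Number.isInteger(d.end_page) &&
      d.start_page >= 1 && d.end_page >= d.start_page
    if (!valid) throw new Error(`invalid page range for "${d.title}"`)
    // Expand the inclusive range [start_page, end_page] into explicit page numbers.
    const pages = Array.from({ length: d.end_page - d.start_page + 1 }, (_, i) => d.start_page + i)
    // Replace characters that are unsafe in file names; fall back to a fixed name if nothing is left.
    const filename = d.title.trim().replace(/[\\/:*?"<>|]+/g, '_') || 'untitled'
    return { filename, pages }
  })
}
```

The result can be sent as the documents field of the split request body, next to pdf_b64.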

App side: save output files to 檔案總管

After the split/processing route returns files, upload each one via the platform SDK:

for (const file of splitResult.files) {
  await platform.appFileUpload({
    name: `${file.filename}.pdf`,
    mime_type: 'application/pdf',
    content_b64: file.pdf_b64,
    metadata: { source: 'your-app-slug', original_file: uploadedFileName },
  })
}

platform.appFileUpload() requires org_write: ["app_files"] in platform.json.permissions. Files appear immediately in 檔案總管 for that org.


Multimodal / Vision (PDF inline to AI)

executeAction handles text-only prompts. For Vision/multimodal (sending a PDF or image to the AI), a future executeActionMultimodal() function will be added to lib/ai/actions.ts. Until then, multimodal routes may use a platform-managed env var directly (e.g. GEMINI_API_KEY on Zeabur) as a temporary shortcut — but this bypasses the AI Actions system (no is_enabled gate, no admin model selection, no usage logging).

Tech debt note: The current scan-doc/analyze route uses this shortcut (direct GEMINI_API_KEY). Once executeActionMultimodal() is available it should be refactored. New multimodal routes should wait for executeActionMultimodal() rather than repeating the shortcut.

If using the shortcut temporarily, the SchemaType enum gotcha still applies — see the test mock pattern below.

Test mock for @google/generative-ai (shortcut routes only)

vi.mock('@google/generative-ai', () => ({
  SchemaType: {           // REQUIRED — omitting causes SchemaType.ARRAY to be undefined at import
    ARRAY: 'array',
    OBJECT: 'object',
    STRING: 'string',
    INTEGER: 'integer',
    BOOLEAN: 'boolean',
    NUMBER: 'number',
  },
  GoogleGenerativeAI: vi.fn().mockImplementation(() => ({
    getGenerativeModel: vi.fn().mockReturnValue({
      generateContent: vi.fn().mockResolvedValue({
        response: { text: () => JSON.stringify([{ title: 'Test Doc', start_page: 1, end_page: 2 }]) },
      }),
    }),
  })),
}))

Admin setup checklist (when building a new AI route)

  1. Decide the action slug (e.g. invoice-extract, contract-summarize)
  2. Ask the platform admin to create the row in ai_actions at /admin/ai/actions:
    • slug: your chosen slug
    • provider_id: pick OpenRouter or BazaarLink
    • model_id: the model string (e.g. google/gemini-2.5-flash)
    • is_enabled: true to activate
    • system_prompt: the domain-specific instruction
  3. Call executeAction(slug, orgId, userMessage) from your route
  4. Your app handles AI_NOT_ENABLED (shows "AI 尚未啟動,請洽管理員")

Live example: app/api/platform/v1/scan-doc/analyze/route.ts and split/route.ts — scan-doc is a worked example of this pattern (note: analyze currently uses the direct-key shortcut; split has no AI). See references/scan-doc-walkthrough.md.


Test recipe

Every new platform route needs:

1. Unit tests for the helper (tests/<slug>/<helper>.test.ts)

import { vi, describe, it, expect, beforeEach } from 'vitest'
import { fetchFromUpstream } from '../../lib/platform/<upstream>'

beforeEach(() => { vi.restoreAllMocks() })

describe('fetchFromUpstream', () => {
  it('returns buffer on 200', async () => {
    vi.stubGlobal('fetch', async () => new Response(new Uint8Array([1, 2, 3]).buffer))
    const buf = await fetchFromUpstream('test-path')
    expect(buf.byteLength).toBe(3)
  })

  it('throws with status code on non-200', async () => {
    vi.stubGlobal('fetch', async () => new Response(null, { status: 404 }))
    await expect(fetchFromUpstream('bad')).rejects.toThrow('404')
  })

  it('respects <UPSTREAM>_PROXY_URL env var', async () => {
    process.env.<UPSTREAM>_PROXY_URL = 'https://proxy.example.com'
    let calledUrl = ''
    vi.stubGlobal('fetch', async (url: string) => { calledUrl = url; return new Response(new ArrayBuffer(0)) })
    await fetchFromUpstream('foo/bar').catch(() => {})
    expect(calledUrl).toContain('proxy.example.com/foo/bar')
    delete process.env.<UPSTREAM>_PROXY_URL
  })
})

2. Smoke test (manual, run after each Zeabur deploy)

TOKEN='ipos_live_<your-token>'

# Happy path
curl -s -X POST https://ipdesk.ai/api/platform/v1/<route-name> \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"<field>": "<test-value>"}' \
  -o /tmp/result.bin \
  -w "HTTP=%{http_code} BYTES=%{size_download} CT=%{content_type}\n"

# For PDF responses: check magic bytes
head -c 4 /tmp/result.bin | xxd   # should show: 25 50 44 46 = %PDF

# Error path — bad input should return 400 JSON, not 500 HTML
curl -s -X POST https://ipdesk.ai/api/platform/v1/<route-name> \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}' | jq .

Env var naming convention

| Type | Convention | Example |
| --- | --- | --- |
| CF Worker egress proxy URL | <UPSTREAM>_PROXY_URL | USPTO_PROXY_URL, GOOGLE_PATENTS_PROXY_URL |
| Third-party API key (held by IPOS) | <SERVICE>_API_KEY | MISTRAL_API_KEY |
| Feature flag / toggle | <FEATURE>_ENABLED | PRIOR_ART_PUSH_ENABLED |

Always set via Zeabur API mutation (gotchas #16) or the Zeabur dashboard — never commit production values into .env.local or the codebase.

After setting a new env var, trigger a redeploy: push a noop commit, or use zeabur-restart skill if no code change is needed.

Reference: Gotchas

Gotchas — Real Issues Hit During Deploys

Every entry here was debugged from a real failure. Each entry: symptom → root cause → fix. New one? Add it after fixing it.


1. PowerShell zip → "Malicious entry: 404\index.html"

Symptom: DevConsole upload shows Malicious entry: 404\index.html (note the backslash) in red.

Root cause: Windows PowerShell Compress-Archive writes ZIP entries with backslash separators (out\404\index.html). The IPOS extractor flags backslash entries as path-traversal attempts.

Fix: Use bash zip instead (Git Bash / WSL works on Windows).

# Wrong (PowerShell)
Compress-Archive -Path platform.json,out -DestinationPath app.zip   # ❌ backslashes

# Right (bash zip)
cd apps/<slug>-app
zip -r ../<slug>-<version>.zip platform.json out/                    # ✅ forward slashes

Verify with unzip -l app.zip — entries should look like out/_next/..., never out\_next\....


2. ZIP structure: out/ must be a sub-folder, not the root contents

Symptom: Build fails with npm error code ENOENT ... package.json — the static-adapter doesn't find out/ and falls back to running npm install (which then fails because no package.json is in the zip).

Root cause: lib/app-deploy/static-adapter.ts looks for <extractedDir>/out/. If you zipped the contents of out/ directly (so index.html is at zip root), there's no out/ folder and the adapter thinks it needs to build.

Fix: Zip with out/ as a sub-folder, alongside platform.json:

# Wrong: zipped contents of out/ at zip root
cd apps/<slug>-app/out && zip -r ../app.zip .                # ❌

# Right: zip platform.json + out/ sibling
cd apps/<slug>-app && zip -r ../app.zip platform.json out/   # ✅

Verify with unzip -l app.zip | head — first lines should be platform.json and out/.
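Gotchas 1 and 2 can both be caught before upload by checking the entry names of the zip. The function below is a sketch of such a check over a plain list of entry names (for example, parsed from unzip -l output). checkZipEntries is a hypothetical helper, not part of the IPOS tooling, and it only mirrors the two rules described above.

```typescript
// Returns a list of human-readable problems; an empty list means the layout looks right.
function checkZipEntries(entries: string[]): string[] {
  const problems: string[] = []

  // Gotcha 1: backslash separators are flagged by the extractor.
  const backslashed = entries.filter((e) => e.includes('\\'))
  if (backslashed.length > 0)
    problems.push(`backslash separators (e.g. ${backslashed[0]}); re-zip with bash zip`)

  // Gotcha 2: platform.json and out/ must sit side by side at the zip root.
  if (!entries.includes('platform.json'))
    problems.push('platform.json is missing from the zip root')
  if (!entries.some((e) => e.startsWith('out/')))
    problems.push('no out/ sub-folder; zip platform.json and out/ together')

  return problems
}
```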


3. Same deploymentId returned on retry — old failed deploy, not new

Symptom: You uploaded a fixed zip but GET /api/ext/v1/deployments/<id> still shows the old failed status. The deploy endpoint returns the same deploymentId you got the first time.

Root cause: startDeployment is idempotent on (slug, version). Same version + new zip = same deploymentId returned (the original — including FAILED ones).

Fix: Bump version in platform.json before re-uploading. Even just 1.0.0 → 1.0.1. Build → zip → deploy → new deploymentId.
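If the version bump is scripted, a patch increment is enough to get a fresh deploymentId. A minimal sketch, assuming platform.json uses a plain x.y.z version string; bumpPatch is a hypothetical helper name.

```typescript
// Increments the patch component of a plain x.y.z version string.
function bumpPatch(version: string): string {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(version)
  if (!m) throw new Error(`not a plain x.y.z version: ${version}`)
  return `${m[1]}.${m[2]}.${Number(m[3]) + 1}`
}
```

The returned string is what would be written back to the version field of platform.json before rebuilding and zipping.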


4. <table> does not allow inserts via SDK (read-only table)

Symptom: App calls platform.db('org:<table>').insert(...) and gets back { error: { code: 'read_only', message: '<table> does not allow inserts via SDK' } }.

Root cause: lib/platform/org-table-config.ts doesn't list insertable_columns for that table → INSERT is disabled by default.

Fix: Edit lib/platform/org-table-config.ts, add insertable_columns: [...] to the table entry, including only columns the app legitimately needs to write. id, org_id, created_at are ALWAYS server-managed and never go in the whitelist. Then push to main and let Zeabur redeploy.

// Before
prior_art: {
  columns: [...],
  order_by: '...',
  default_limit: 500,
},
// After
prior_art: {
  columns: [...],
  order_by: '...',
  default_limit: 500,
  insertable_columns: ['patent_number', 'country', 'title'],
},

5. column "..." is not in insertable_columns (column whitelist)

Symptom: 400 { code: 'bad_request', message: 'column "source" is not in insertable_columns for prior_art' }.

Root cause: App is trying to insert a column not in the whitelist. Either the column doesn't exist on the table at all, or it does exist but isn't safe to expose via SDK.

Fix: Either drop the column from your insert payload (if it's bogus — e.g. source: 'g1001' when prior_art has no source column), or add it to insertable_columns if legitimately needed. Check the migration for the table to see what columns actually exist.
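During development it can help to find the offending columns before the request is sent. The sketch below compares a payload against a whitelist that you pass in by hand; disallowedColumns is a hypothetical helper, and the app has no way to read the server-side insertable_columns config, so the list in the usage example is simply copied from the prior_art entry shown in gotcha 4.

```typescript
// Returns the payload keys that are not in the given whitelist.
function disallowedColumns(row: Record<string, unknown>, insertable: readonly string[]): string[] {
  const allowed = new Set(insertable)
  return Object.keys(row).filter((col) => !allowed.has(col))
}

// Example: 'source' is not whitelisted for prior_art, so it is reported.
const extra = disallowedColumns(
  { patent_number: '9943073', country: 'US', source: 'g1001' },
  ['patent_number', 'country', 'title'],
)
```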


6. SDK .insert(array) silently drops all but first row

Symptom: App passes an array to .insert(). Either the route 400s ("data must be object") or only one row lands in the DB.

Root cause: platform.db().insert() accepts a single Record<string, unknown>, not an array. The SDK type signature is (data: Record<string, unknown>): Promise<T>. TypeScript lets you force an array through that signature with an as unknown as Record<string, unknown> cast; the route then receives an array as body.data and rejects it.

Fix: Loop:

for (const row of rows) {
  try {
    await platform.db('org:<table>').insert(row)
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err)
    if (/duplicate key|unique constraint/i.test(msg)) continue   // dedupe
    throw err
  }
}

For genuinely large batches needing transactional semantics, use the dedicated bulk endpoint (e.g. /api/platform/v1/prior-art/bulk) — but those are Bearer-authed, not shell-token; only useful from headless callers.


7. Cloudflare strips 502 response bodies

Symptom: The platform route is supposed to return a JSON error envelope, but the client sees Cloudflare's "error code: 502" HTML page. Logs show the route ran and returned errJson('PROXY_ERROR', msg, 502) correctly.

Root cause: Cloudflare in front of the origin replaces 502 responses from origin with its own bad-gateway HTML page, throwing away your JSON body.

Fix: Don't return 502 from your origin code — return 503 instead.

// Wrong
return errJson('PROXY_ERROR', msg, 502)   // ❌ CF eats the body

// Right
return errJson('PROXY_ERROR', msg, 503)   // ✅ CF passes through

Same applies to other CF "interception" status codes if you find them. 503 is reliably proxied.


8. External API blocks Zeabur datacenter IP (403)

Symptom: Platform route returns 502 / 503 PROXY_ERROR upstream 403. Curling the external API from your local machine returns 200, but from Zeabur returns 403. User-Agent doesn't help.

Root cause: External service has IP-range blocking on cloud-provider ASNs. Common offenders: gov sites (USPTO), some CDN-protected APIs.

Fix: Run a Cloudflare Worker as an egress proxy. CF edge IPs are not blocked. Pattern:

// Worker (worker.js)
export default {
  async fetch(req) {
    const u = new URL(req.url)
    const path = u.pathname.replace(/^\//, '')
    const r = await fetch(`https://upstream.example.com/${path}`, {
      headers: { 'User-Agent': 'Mozilla/5.0 (compatible; YourApp/1.0)' },
      cf: { cacheEverything: true, cacheTtl: 86400 },
    })
    return new Response(r.body, {
      status: r.status,
      headers: { 'Content-Type': r.headers.get('Content-Type') ?? 'application/octet-stream' },
    })
  },
}

Deploy: CLOUDFLARE_API_TOKEN=<token> npx wrangler deploy. Set <UPSTREAM>_PROXY_URL=https://<name>.<sub>.workers.dev on Zeabur. Have your platform route call process.env.<UPSTREAM>_PROXY_URL when set, else fall back direct.

Live example: infra/cf-workers/uspto-proxy/ + lib/platform/uspto.ts (env var USPTO_PROXY_URL).
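The "use the proxy when set, else go direct" rule can be isolated in a pure function so it is easy to test. This is a sketch under assumptions: resolveUpstreamUrl is a hypothetical name, and the real lib/platform/uspto.ts may structure this differently.

```typescript
// Picks the proxy base when configured, otherwise the direct upstream base,
// and joins it with the path using exactly one slash.
function resolveUpstreamUrl(path: string, directBase: string, proxyUrl?: string): string {
  const base = (proxyUrl && proxyUrl.trim()) || directBase
  return `${base.replace(/\/+$/, '')}/${path.replace(/^\/+/, '')}`
}
```

A route helper would call it as `resolveUpstreamUrl(path, 'https://upstream.example.com', process.env.USPTO_PROXY_URL)`, substituting the real upstream base and the env var for that upstream.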


9. iframe download blocked: "Download is disallowed"

Symptom: Console shows Download is disallowed. The frame initiating or instantiating the download is sandboxed. Code does <a href={url} download>...</a> or URL.createObjectURL + a.click().

Root cause: components/os/Window.tsx iframe sandbox doesn't include allow-downloads.

Fix: Add allow-downloads to the sandbox attribute:

// Before
sandbox="allow-scripts allow-forms allow-same-origin allow-popups"
// After
sandbox="allow-scripts allow-forms allow-same-origin allow-popups allow-downloads"

10. _next/static/... 404s after deploy

Symptom: App opens, blank page or missing JS, console shows 404s like /_next/static/chunks/.... Browser network tab shows requests going to /_next/... not /api/app-static/<slug>/_next/....

Root cause: next.config.js is missing basePath and assetPrefix. Without them, Next.js generates absolute paths starting with /_next/, but the OS shell serves the app from /api/app-static/<slug>/.

Fix: Use the exact template from part2-implementation.md:

const platform = require('./platform.json');
const basePath = `/api/app-static/${platform.slug}`;
module.exports = {
  output: 'export',
  basePath,
  assetPrefix: basePath,
  trailingSlash: true,
  images: { unoptimized: true },
  generateBuildId: async () => null,
};

11. InsForge .single() throws on 0 rows

Symptom: Code does .single() on a query that should be optional, gets uncaught exception that escapes the route's try-catch.

Root cause: InsForge's .single() (unlike PostgREST's) throws when 0 rows match instead of returning null. The exception propagates past route handlers if not caught.

Fix: Use .limit(1) + array access:

// Wrong
const { data } = await admin.database.from('x').select('*').eq('id', id).single()
// Right
const { data: rows } = await admin.database.from('x').select('*').eq('id', id).limit(1)
const row = Array.isArray(rows) ? rows[0] : null

And always wrap in try-catch when failure is non-fatal.


12. Platform 500s with empty body (cold start)

Symptom: First request to /api/ext/v1/* after a quiet period returns 500 with no body. Subsequent requests work.

Root cause: Zeabur container cold start. Module-level imports (cloudClient() initialization, etc.) take 1-2s; first request can race the init.

Fix: Just retry. Don't conclude the API is broken from a single 500. Verify with curl /api/ext/v1/me after the first 500 — if it returns 200, the cold start is the cause.
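For scripted callers, the "just retry" advice can be wrapped in a small helper. The sketch below retries only on HTTP 500 and takes the fetch implementation as a parameter so it can be tested. fetchWithColdStartRetry, the retry count, and the delay are all assumptions, not platform defaults.

```typescript
type FetchLike = (url: string) => Promise<{ status: number }>

// Retries a request a few times when the response is a 500, waiting between attempts.
async function fetchWithColdStartRetry(
  url: string,
  fetchImpl: FetchLike,
  retries = 2,      // assumption: two extra attempts are enough for a cold start
  delayMs = 1500,   // assumption: roughly the 1-2s init time mentioned above
): Promise<{ status: number }> {
  let res = await fetchImpl(url)
  for (let attempt = 0; attempt < retries && res.status === 500; attempt++) {
    await new Promise((resolve) => setTimeout(resolve, delayMs))
    res = await fetchImpl(url)
  }
  return res
}
```

If the last attempt still returns 500, the helper returns that response unchanged, at which point the failure is probably not a cold start.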


13. pdfjs-dist (or other dep) build fails: "Cannot find module"

Symptom: Zeabur build fails with Type error: Cannot find module 'pdfjs-dist' or its corresponding type declarations. The package IS in apps/<slug>-app/package.json but root tsconfig.json is type-checking the file.

Root cause: Root tsconfig.json includes **/*.ts, which sweeps in apps/<slug>-app/src/lib/*.ts. Those files import pdfjs-dist which is only in the app's local package.json, not root.

Fix: Add to root tsconfig.json:

{
  "exclude": ["apps/**/*", "node_modules", ".next"]
}

Or run npm install <dep> at root too. Excluding is cleaner.


14. npm install pdfjs-dist outputs error in PowerShell with tail

Symptom: npm install ... | tail -5 fails with tail not recognized.

Root cause: tail is not a PowerShell cmdlet.

Fix: Use Select-Object -Last 5 in PowerShell or pipe through bash. Annoying but stable: npm install foo 2>&1 | Select-Object -Last 5.


15. Cloudflare token verification before deploy

Before deploying a CF Worker, verify the token works:

curl -s "https://api.cloudflare.com/client/v4/user/tokens/verify" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"
# Expect: {"result":{"id":"...","status":"active"},"success":true,...}

If success:false → token wrong / expired. If 403 → token doesn't have Workers Scripts:Edit. Re-create with the "Edit Cloudflare Workers" template.

Then deploy:

cd infra/cf-workers/<name>
CLOUDFLARE_API_TOKEN=<token> npx wrangler deploy

The output prints the https://<name>.<sub>.workers.dev URL. Copy that into Zeabur env vars.


16. Setting Zeabur env vars via API (no dashboard needed)

curl -s -X POST 'https://api.zeabur.com/graphql' \
  -H "Authorization: Bearer $ZEABUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d "{\"query\":\"mutation { createEnvironmentVariable(serviceID: \\\"$SVC\\\", environmentID: \\\"$ENV\\\", key: \\\"$K\\\", value: \\\"$V\\\") { key value } }\"}"

The mutation is createEnvironmentVariable, not addEnvironmentVariable (will get a "did you mean" hint if wrong). Update existing var: updateEnvironmentVariable. After mutation, env vars apply to the next deploy — push a noop commit if you need to force a redeploy now.
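The nested shell escaping in the curl command is easy to get wrong when a value contains quotes. When calling from a script, the request body can be built with JSON.stringify instead. This is a sketch: buildCreateEnvVarBody is a hypothetical helper, and it relies on GraphQL string literals using the same escape sequences that JSON.stringify produces. The mutation and argument names are the ones from the curl example above.

```typescript
// Builds the JSON request body for the createEnvironmentVariable mutation.
function buildCreateEnvVarBody(serviceID: string, environmentID: string, key: string, value: string): string {
  // JSON.stringify yields a double-quoted, escaped literal that is also valid as a GraphQL string.
  const q = (s: string) => JSON.stringify(s)
  const query =
    `mutation { createEnvironmentVariable(serviceID: ${q(serviceID)}, ` +
    `environmentID: ${q(environmentID)}, key: ${q(key)}, value: ${q(value)}) { key value } }`
  return JSON.stringify({ query })
}
```

The returned string is sent as the POST body to https://api.zeabur.com/graphql with the same Authorization and Content-Type headers as the curl call.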


18. Pre-push hook failure on first run (transient)

Symptom: git push fails with a hook error (e.g., npm run test:coverage:ci exits non-zero, or tsc --noEmit times out) even though the code is clean. Re-running the same push immediately succeeds.

Root cause: Pre-push hooks run in the same process as the git operation. On cold starts (first push after a long idle, or a fresh terminal), Node.js/TypeScript module resolution can race against the hook timer, especially on Windows with large repos.

Fix: Just retry. Push again with the same command. If the second attempt fails with an actual error message (not a timeout/ENOENT), then investigate.

Anti-pattern: --no-verify to skip the hook. Only use --no-verify when you know the error is a known flake AND you'll fix the root cause immediately after.


19. @google/generative-ai SchemaType: use enum, not string literals

Symptom: Zeabur build fails with TS2322: Type '"array"' is not assignable to type 'SchemaType'. Locally tsc --noEmit also fails. A @ts-expect-error suppressor added above the schema const triggers TS2578: Unused '@ts-expect-error' directive (because the error it was suppressing now appears on a different line).

Root cause: @google/generative-ai >= 0.24 requires SchemaType enum values (.ARRAY, .OBJECT, .STRING, .INTEGER, .BOOLEAN) instead of raw string literals. The SDK previously accepted strings via a looser union type; stricter TypeScript versions now reject them.

Fix:

// Wrong
import { GoogleGenerativeAI, type Schema } from '@google/generative-ai'
const schema: Schema = { type: 'array', items: { type: 'object', ... } }  // ❌

// Right
import { GoogleGenerativeAI, SchemaType } from '@google/generative-ai'
import type { Schema } from '@google/generative-ai'
const schema: Schema = { type: SchemaType.ARRAY, items: { type: SchemaType.OBJECT, ... } }  // ✅

Also fix the test mock: If you mock @google/generative-ai in vitest, the mock factory must export SchemaType — otherwise the route file imports SchemaType as undefined at module evaluation time, and SchemaType.ARRAY throws a runtime error that fails all tests:

vi.mock('@google/generative-ai', () => ({
  SchemaType: { ARRAY: 'array', OBJECT: 'object', STRING: 'string', INTEGER: 'integer', BOOLEAN: 'boolean', NUMBER: 'number' },
  GoogleGenerativeAI: vi.fn().mockImplementation(() => ({ ... })),
}))

20. ipos_apps.author_org_id must be set for org-owned apps

Symptom: App deploy call returns 403 or the app doesn't appear in the OS store for the intended org. The deploy went through but permissions fail at the org boundary.

Root cause: ipos_apps.author_org_id = null means the app is treated as a platform app (owned by IPOS itself). Org-level permission checks (org_write: ["app_files"] etc.) resolve differently for platform apps vs org-owned apps. An org-owned app dogfooding its own platform must have author_org_id set to the org's UUID.

Fix: After inserting the row in ipos_apps, update it:

UPDATE ipos_apps SET author_org_id = '<your-org-uuid>' WHERE slug = '<your-slug>';

Or include it in the INSERT:

await admin.database.from('ipos_apps').insert({
  slug: 'your-slug',
  name: 'Your App',
  author_org_id: '<your-org-uuid>',  // ← required for org-owned apps
  ...
})

To find your org UUID: SELECT id FROM organizations WHERE name = 'Your Org Name' via InsForge admin or zeabur-service-exec.


21. Direct fetch() from iframe to platform routes needs credentials: 'include'

Symptom: Platform route returns 401 when called from the app. The app is running in the OS iframe. buildRequestContext can't find a userId.

Root cause: The platform-js SDK's internal methods (like platform.db(), platform.appFileUpload()) use the shell-injected auth token automatically. But if the app makes a direct fetch() call to a platform route (e.g. /api/platform/v1/scan-doc/analyze), the browser won't include the session cookie unless explicitly told to.

The app and its platform routes are on the same origin (ipdesk.ai), so credentials: 'include' works — the InsForge session cookie is present and buildRequestContext can read it.

Fix: Always include credentials: 'include' when calling platform routes directly from the zip app:

const res = await fetch('/api/platform/v1/<route>', {
  method: 'POST',
  credentials: 'include',           // ← required for cookie-session auth
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ ... }),
})

This only applies to direct fetch() calls. If you call a route through platform.db() or platform.appFileUpload(), auth is handled automatically.


17. Patent number normalization: USPTO uses bare integer

Symptom: Calling USPTO with 09943073 (8-digit padded) → 404. Calling with 9943073 (7-digit bare) → 200.

Root cause: image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/<n> expects bare integer with no leading zeros, no commas, no US prefix.

Fix: Normalize before calling:

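// Despite the name, this strips padding: whitespace, a leading "US", commas, and leading zeros.
export function padPatentNumber(raw: string): string {
  return raw.trim().replace(/^US/i, '').replace(/,/g, '').trim().replace(/^0+/, '') || '0'
}

US09943073 → 9943073 → 200.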

Reference: Flow Compliance Report

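Flow Compliance Report (HTML deliverable for the customer)

Purpose: once the Part 2 implementation is complete, produce an HTML report to hand to the customer. It lets a non-engineer customer understand how their original tool or workflow was moved onto IPOS, where it complies with IPOS rules, and where it needs adjusting.

Assumed reader: the customer knows Excel and can use a browser, but may not know what an API, a bucket, or postMessage is. Write the whole report in plain language: say "button" instead of "control", and "network request" instead of "HTTP route".


When to produce it

  • Write it before Part 2 "Build, Package, Deploy" (so the spec can still be changed if the deploy fails)
  • Update it once more after a successful deploy (fill in the actual deploymentId and version)

Steps

  1. Ask the customer where to put it. Every customer's directory layout is different, so ask first:

    "Where should this HTML flow document go? The default is docs/<slug>-flow-compliance.html. Is that OK, or would you like it somewhere else?"

  2. Write the file only after they confirm. Do not put it in the zip (it is documentation, not part of the app).
  3. Suggested file name: <slug>-flow-compliance.html (e.g. g1001-flow-compliance.html)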

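Three main sections (A + B + C, written in words the customer understands)

A. Your original workflow (Before)

Describe how the customer does things today, as a flow diagram or numbered steps. For example:

  • "Open the Excel file → paste the patent numbers → run the macro → wait 5 minutes → results are saved to D:\output\"
  • "Open Streamlit → upload a PDF → click OCR → download the CSV"

Writing tip: use the words the customer would use themselves; do not translate them into technical language.

B. The new workflow after moving to IPOS (After)

Show how the same task is done on IPOS:

  • Which window do you open?
  • Which button do you click?
  • Where is the data saved? (IPOS cloud / downloaded by the customer / handled by the OS Agent)
  • Which steps no longer need doing (because IPOS handles them automatically)?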

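C. Compliant / non-compliant comparison (Compliance Checklist)

Use a three-column table: Item | Status | Explanation (plain language)

It must cover these checks (phrased in words the customer understands):

| Technical check | Plain-language wording |
| --- | --- |
| No customer-managed servers | "You never have to start up or maintain a server for this tool" |
| No DB credentials in the code | "No database password is hard-coded anywhere inside" |
| Platform routes used | "Cloud services that IPOS runs for you (OCR / translation / external lookups)" |
| Agent Control Surface is complete | "The OS Agent (your assistant) can press every button and see every field for you" |
| No hidden UI state | "There is no hidden data that the Agent cannot see" |
| Every redeploy bumps the version | "Every update gets a new version number, so it is never confused with an older one" |

Each row has one of three statuses:

  • ✅ Compliant: one sentence on how it is achieved
  • ⚠️ Partially compliant: why, and where the customer needs to help (e.g. "We need sample PDFs from you to calibrate the OCR")
  • ❌ Non-compliant: why, and what IPOS or the customer has to do to close the gap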
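If the report is generated by a script rather than edited by hand, each checklist row can be rendered from data. The sketch below is an illustration only: renderChecklistRow, escapeHtml, and the Status type are hypothetical, while the ok / warn / fail class names and the Chinese status labels are the ones used in the HTML template in this reference.

```typescript
type Status = 'ok' | 'warn' | 'fail'

// Labels match the status cells in the HTML template.
const LABEL: Record<Status, string> = { ok: '✅ 符合', warn: '⚠️ 部分符合', fail: '❌ 不符合' }

// Escapes the characters that would otherwise be read as HTML markup.
function escapeHtml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;')
}

// Renders one <tr> with the three columns: item, status, explanation.
function renderChecklistRow(item: string, status: Status, note: string): string {
  return (
    `<tr><td>${escapeHtml(item)}</td>` +
    `<td><span class="${status}">${LABEL[status]}</span></td>` +
    `<td>${escapeHtml(note)}</td></tr>`
  )
}
```

Escaping matters here because the item and explanation text come from people, and a stray < in a sentence would otherwise break the table.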

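D. Remediation plan for non-compliant items (Remediation Plan)

If there is any ⚠️ / ❌, list for each item:

  • What the problem is (plain language)
  • Who handles it (customer / IPOS / together)
  • How it is expected to be handled (e.g. "IPOS will add an OCR cloud service, expected in X weeks")
  • Can the tool be used before this is fixed? Where will it get stuck?

E. What the Agent can do (capability list for the customer)

Translate platform.json.agent_actions into plain language, for example:

  • ❌ Do not write: set_patent_number(value: string)
  • ✅ Write: "Fill in the patent number"

List what the Agent can see (state) and what it can do (actions), so the customer knows "I can ask the OS assistant to do these things".

F. IPOS cloud services used (Platform Routes)

If the app uses a platform route provided by IPOS (OCR, translation, external API proxy, etc.), list here:

  • Service name (plain language)
  • What it is for
  • Maintained by IPOS, so the customer does not have to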

HTML template (copy and adapt)

<!DOCTYPE html>
<html lang="zh-TW">
<head>
<meta charset="UTF-8">
<title>{{App 名稱}} — IPOS 流程對照與規則檢查</title>
<style>
  body { font-family: -apple-system, "Segoe UI", "Microsoft JhengHei", sans-serif;
         max-width: 880px; margin: 40px auto; padding: 0 20px; color: #222;
         line-height: 1.7; }
  h1 { border-bottom: 3px solid #2563eb; padding-bottom: 8px; }
  h2 { color: #1e40af; margin-top: 40px; }
  h3 { color: #374151; }
  .meta { background: #f3f4f6; padding: 12px 16px; border-radius: 8px;
          font-size: 14px; color: #555; }
  table { width: 100%; border-collapse: collapse; margin: 16px 0; }
  th, td { border: 1px solid #d1d5db; padding: 10px 12px; text-align: left;
           vertical-align: top; }
  th { background: #f9fafb; }
  .ok { color: #15803d; font-weight: 600; }
  .warn { color: #b45309; font-weight: 600; }
  .fail { color: #b91c1c; font-weight: 600; }
  .flow { background: #eff6ff; padding: 16px 20px; border-left: 4px solid #2563eb;
          border-radius: 4px; margin: 12px 0; }
  .flow ol { margin: 0; padding-left: 20px; }
  code { background: #f3f4f6; padding: 2px 6px; border-radius: 3px;
         font-size: 0.9em; }
</style>
</head>
<body>

<h1>{{App 名稱}} — 流程對照與規則檢查</h1>

<div class="meta">
  <strong>客戶:</strong>{{客戶名}}<br>
  <strong>App slug:</strong><code>{{slug}}</code><br>
  <strong>版本:</strong>{{version}}<br>
  <strong>產出日期:</strong>{{YYYY-MM-DD}}<br>
  <strong>狀態:</strong>{{草稿 / 已上線 deploymentId=xxx}}
</div>

<h2>1. 你原本是怎麼做這件事的</h2>
<div class="flow">
  <ol>
    <li>{{步驟 1,用客戶自己的話}}</li>
    <li>{{步驟 2}}</li>
    <li>{{...}}</li>
  </ol>
</div>
<p>{{原本流程的痛點:例如要裝 Python、要等很久、檔案散落、換電腦就不能用 ...}}</p>

<h2>2. 搬到 IPOS 之後變這樣</h2>
<div class="flow">
  <ol>
    <li>{{步驟 1,例:在 IPOS 桌面打開 g1001 視窗}}</li>
    <li>{{步驟 2}}</li>
    <li>{{...}}</li>
  </ol>
</div>
<p>{{改善點:不用裝環境、瀏覽器就能跑、結果自動存到 IPOS 雲端、可以叫 Agent 幫你做 ...}}</p>

<h2>3. IPOS 規則檢查(你的 app 符合嗎?)</h2>
<table>
  <thead>
    <tr><th>檢查項目</th><th>狀態</th><th>說明</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>整個工具不用自己開伺服器</td>
      <td><span class="ok">✅ 符合</span></td>
      <td>{{說明}}</td>
    </tr>
    <tr>
      <td>程式碼裡沒有任何資料庫密碼</td>
      <td><span class="ok">✅ 符合</span></td>
      <td>{{說明}}</td>
    </tr>
    <tr>
      <td>OS 小幫手(Agent)看得到所有欄位、按得了所有按鈕</td>
      <td><span class="ok">✅ 符合</span></td>
      <td>{{說明}}</td>
    </tr>
    <tr>
      <td>每次更新會升版號</td>
      <td><span class="ok">✅ 符合</span></td>
      <td>目前版本 {{version}}</td>
    </tr>
    <tr>
      <td>{{範例:OCR 功能}}</td>
      <td><span class="warn">⚠️ 部分符合</span></td>
      <td>{{IPOS 還在做這個雲端服務,預計 X 週後完成}}</td>
    </tr>
  </tbody>
</table>

<h2>4. 還沒符合的部分要怎麼處理</h2>
<table>
  <thead>
    <tr><th>問題</th><th>誰處理</th><th>怎麼處理</th><th>在這之前能用嗎?</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>{{問題白話描述}}</td>
      <td>{{IPOS / 你 / 一起}}</td>
      <td>{{做法}}</td>
      <td>{{能用,只是 X 功能會跳提示 / 不能用,要等}}</td>
    </tr>
  </tbody>
</table>

<h2>5. OS 小幫手(Agent)能幫你做什麼</h2>
<p>你可以對 OS 小幫手說這些話,它會幫你操作這個 app:</p>
<ul>
  <li>「{{範例:把專利號碼填成 US12345678}}」</li>
  <li>「{{範例:開始下載}}」</li>
  <li>「{{範例:取消 / 重設}}」</li>
</ul>
<p>它也看得到這些東西(不用你截圖給它):</p>
<ul>
  <li>{{目前填了什麼}}</li>
  <li>{{現在跑到第幾步}}</li>
  <li>{{有沒有錯誤訊息}}</li>
</ul>

<h2>6. 用到哪些 IPOS 雲端服務</h2>
<table>
  <thead>
    <tr><th>服務</th><th>做什麼用</th><th>誰維護</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>{{範例:OCR 文字辨識}}</td>
      <td>把 PDF 裡的圖片轉成可搜尋的文字</td>
      <td>IPOS(你不用管)</td>
    </tr>
  </tbody>
</table>

<h2>7. 結論</h2>
<p>{{一段話總結:這個 app 是否完全符合 IPOS 規則、可以正式使用、還是有等待事項。}}</p>

</body>
</html>

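Writing checklist (review before you deliver)

  • [ ] The report contains no terms like "API / endpoint / route / postMessage / iframe / manifest / bucket" (unless they are explained)
  • [ ] Every ⚠️ / ❌ has a matching remediation plan (who, how, when)
  • [ ] All Agent actions are translated into "things you can say to the OS assistant"
  • [ ] After reading, the customer knows: (1) can I start using it, (2) what is still pending, (3) do I need to do anything myself
  • [ ] If the deploy has succeeded, the 狀態 (status) field at the top includes the deploymentId and version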

Worked example: g1001 (Pattern A/B — external API proxy)

g1001 Walkthrough — Complete Worked Example

g1001 is the reference implementation for ipos-sdk-converter. Originally a Streamlit Python app that extracts US patent numbers from USPTO PTO-892 Office Actions and downloads each patent as a PDF; converted to an IPOS SDK app with full feature parity, zero customer servers, and shipped end-to-end in apps/g1001-app/.

This is the doc to read when you want to see what a successful conversion looks like.


Original app summary

Stack: Python 3, Streamlit, PyMuPDF (PDF extract), Mistral OCR, requests (USPTO download).

Features:

  1. Upload a PDF (PTO-892 form)
  2. Extract text with PyMuPDF
  3. Detect scanned PDFs and run Mistral OCR fallback
  4. Filter the "U.S. PATENT DOCUMENTS" section
  5. Extract all US patent numbers via regex
  6. Show editable list (user adds/removes patents)
  7. Download all patent PDFs from USPTO into a ZIP
  8. Push patent numbers to the IPOS database

Customer pain: Each user needed local Python install + dependency dance every time. Different team members had different versions. Couldn't share results.


Part 1 — SDK Spec Sheet (filled by non-technical owner)

| Feature | Needs server? | Bucket | Resource |
| --- | --- | --- | --- |
| Upload PDF | No | zip | <input type="file"> |
| Extract text from PDF | No | zip | pdfjs-dist |
| Detect scanned PDF | No | zip | pdfjs-dist (text length < threshold) |
| OCR scanned PDF | Yes (API key + compute) | platform route | /api/platform/v1/ocr |
| Filter PTO-892 section | No | zip | JS regex |
| Extract patent numbers | No | zip | JS regex |
| Display + edit list | No | zip | React |
| Download USPTO PDFs | Yes (CORS + IP block) | platform route | /api/platform/v1/uspto-proxy |
| Package as ZIP | No | zip | jszip |
| Push patents to org DB | IPOS data | platform-js SDK | platform.db('org:prior_art').insert() |

New routes that needed building: /api/platform/v1/ocr and /api/platform/v1/uspto-proxy.


Part 2 — Implementation outcomes

Repo structure

apps/g1001-app/
├── platform.json               # slug: g1001, name: G1001 專利擷取
├── package.json                # next 14.2.33, react 18, pdfjs-dist 4.10, jszip 3.10
├── next.config.js              # output: 'export', basePath, assetPrefix
├── pages/
│   ├── _app.tsx
│   └── index.tsx               # dynamic(() => import('../src/G1001App'), { ssr: false })
└── src/
    ├── G1001App.tsx            # step router 1 → 2 → 2.5 → 3
    ├── vendor/platform-js.ts   # SDK copy
    ├── steps/
    │   ├── Step1Upload.tsx     # file input
    │   ├── Step2Extract.tsx    # pdfjs + OCR fallback via /ocr
    │   ├── Step25EditList.tsx  # editable patent list
    │   └── Step3Download.tsx   # USPTO proxy + JSZip + platform DB push + manual fallback link
    └── lib/
        ├── normalize-patent.ts # strip US prefix, commas, leading zeros
        ├── pto892-parser.ts    # regex: filter section + extract numbers
        └── pdf-extract.ts      # pdfjs-dist wrapper, isScanned detection

Platform routes built

CF Worker (added during implementation)

USPTO blocks Zeabur's datacenter IPs. We discovered this when 9/9 patents 502'd in production despite working locally. Fix:

  • infra/cf-workers/uspto-proxy/: a 30-line Cloudflare Worker that fetches from USPTO via CF's edge IPs (not blocked) and caches responses for 1 day.
  • Set USPTO_PROXY_URL=https://uspto-proxy.<sub>.workers.dev on Zeabur.
  • lib/platform/uspto.ts reads the env var; it uses the Worker if the variable is set and falls back to a direct fetch otherwise.
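
The env-var switch in the last bullet can be sketched as a small pure function. This is not the contents of lib/platform/uspto.ts: the function name, the return shape, and the placeholder direct base URL are all assumptions made for illustration.

```typescript
// Hypothetical view of the environment; only USPTO_PROXY_URL comes from the text above.
type UsptoEnv = { USPTO_PROXY_URL?: string };

// Placeholder only: the real USPTO host and path are not reproduced here.
const DIRECT_BASE_PLACEHOLDER = 'https://uspto.example.invalid';

function resolveUsptoBase(
  env: UsptoEnv,
  directBase: string = DIRECT_BASE_PLACEHOLDER,
): { base: string; viaWorker: boolean } {
  const proxy = env.USPTO_PROXY_URL?.trim();
  if (proxy) {
    // Worker configured: route through it (trailing slashes removed for clean joins).
    return { base: proxy.replace(/\/+$/, ''), viaWorker: true };
  }
  // Worker not configured: fall back to calling USPTO directly.
  return { base: directBase, viaWorker: false };
}
```

Keeping the decision in one function means the route handler only joins a path onto `base` and never needs to know which mode is active.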

This is now a generic pattern — see gotchas.md → "External API blocks Zeabur datacenter IP".

Browser-side fallback

Even with the CF Worker, Step3Download.tsx keeps a fallback: if the proxy returns 503, it renders a "直接下載 ↗" ("Download directly") link to USPTO for the user to click (their browser IP isn't blocked, and CORS doesn't apply to navigation). Patent numbers are still pushed to the IPOS DB even when PDFs aren't fetched server-side. This is the "graceful degradation when CF Worker URL is unset" path.
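
The fallback decision can be expressed as a pure function over the proxy's HTTP status. The type and function names below are hypothetical, and `manualHref` stands for whatever USPTO URL the app would link to; this is a sketch of the branching, not the code in Step3Download.tsx.

```typescript
// Hypothetical outcome type for one patent download attempt.
type DownloadOutcome =
  | { kind: 'downloaded' }
  | { kind: 'manual-link'; href: string }
  | { kind: 'error'; status: number };

function classifyProxyResponse(status: number, manualHref: string): DownloadOutcome {
  if (status >= 200 && status < 300) return { kind: 'downloaded' };
  // 503 is the signal described above: show a link the user opens in their own browser.
  if (status === 503) return { kind: 'manual-link', href: manualHref };
  return { kind: 'error', status };
}
```

In every branch the patent number itself can still be pushed to the org DB, since that step does not depend on the PDF fetch.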


Key porting decisions (Python → TypeScript)

| Original | Replacement | Why |
|---|---|---|
| PyMuPDF (fitz) | pdfjs-dist | C extension; can't run in browser. Same conceptual API: load PDF, iterate pages, extract text items. |
| Mistral OCR via Python SDK | /api/platform/v1/ocr | Holds API key server-side. App sends base64 PDF, gets text back. |
| requests.get('https://image-ppubs.uspto.gov/...') | /api/platform/v1/uspto-proxy → CF Worker → USPTO | Two layers: browser CORS (no headers) + Zeabur IP block (datacenter ASN). Both solved by routing through CF edge. |
| ipdesk_sdk.py.push(rows) (was never wired up) | platform.db('org:prior_art').insert(row) looped per row | SDK takes single row; loop + /duplicate key/i dedupe on retries. |
| Streamlit step state | React useState + step router in G1001App.tsx | Same UX, runs in browser instead of Streamlit server. |
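
The "loop + /duplicate key/i dedupe" decision can be sketched as follows. The SDK call is injected as a function so the sketch stays self-contained; `InsertOne`, `pushRows`, and the assumption that a duplicate surfaces as a thrown error whose message matches /duplicate key/i are illustrative, not confirmed SDK behavior.

```typescript
type Row = Record<string, unknown>;
// Hypothetical minimal view of the SDK call: inserts exactly one row.
type InsertOne = (row: Row) => Promise<unknown>;

async function pushRows(
  insertOne: InsertOne,
  rows: Row[],
): Promise<{ inserted: number; duplicates: number }> {
  let inserted = 0;
  let duplicates = 0;
  for (const row of rows) {
    try {
      await insertOne(row); // one row per call; the route rejects arrays
      inserted += 1;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (/duplicate key/i.test(message)) {
        duplicates += 1; // row already exists from an earlier attempt, safe to skip
        continue;
      }
      throw err; // anything else is a real failure
    }
  }
  return { inserted, duplicates };
}
```

In the app this would be called roughly as `pushRows((row) => platform.db('org:prior_art').insert(row), rows)`, assuming insert() returns a promise.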

Bugs encountered (and the gotchas they spawned)

Every bug below caused at least one user-visible failure. All are now in references/gotchas.md.

| Bug | Symptom | Fix in this app |
|---|---|---|
| PowerShell zip | "Malicious entry: 404\index.html" | Use bash zip -r ../app.zip platform.json out/ |
| Wrong zip structure | Build tries npm install, fails ENOENT | Zip with out/ as sub-folder, not root |
| Same deploymentId returned | New zip silently re-uses old failed deploy | Bump version in platform.json |
| prior_art read-only | does not allow inserts via SDK | Add insertable_columns to lib/platform/org-table-config.ts |
| .insert(array) cast | TS lets it compile, runtime route 400s | Loop one row at a time |
| source: 'g1001' extra column | column "source" not in insertable_columns | prior_art has no source column; drop it |
| 502 from proxy | Cloudflare replaces body with HTML page | Return 503 from origin |
| 403 from USPTO | Datacenter IP blocked | CF Worker proxy + USPTO_PROXY_URL env |
| iframe download blocked | "Download is disallowed" | Add allow-downloads to sandbox in components/os/Window.tsx |
| pdfjs-dist not found | Root tsconfig sweeps in app/.ts files | npm install pdfjs-dist at root OR exclude apps/** in tsconfig |

What customers learn from this example

  1. Any Python app can convert — even ones with C-extension deps (PyMuPDF), API keys (Mistral), and CORS+IP-blocked external APIs (USPTO).
  2. Server-side needs map cleanly to platform routes. No customer ever runs a Python server.
  3. The diagnosis sheet front-loads decisions. If they had skipped Part 1 and dived into code, they'd have built a Lambda for USPTO downloads — wrong solution.
  4. CF Workers are a generic escape hatch for IP-blocked APIs. First time we hit it, we fixed it. Now it's a pattern.
  5. Each gotcha encountered → permanent skill knowledge. The next app conversion won't re-hit any of these.

Production state (as of last verified)

| Item | Value |
|---|---|
| g1001 version | 1.0.5 |
| Deploy ID | f7053709-636a-428e-990e-0005a7f9e56a (status: live) |
| Runtime URL | /api/app-static/g1001/ |
| CF Worker URL | https://uspto-proxy.sku772003.workers.dev |
| Test patent | 9943073 → 200 OK, 332652 bytes, %PDF magic ✅ |
| End-to-end | 9/9 patents downloaded, ZIP delivered, 9 rows in prior_art |

To redeploy this app from scratch: see part2-implementation.md → "Build, Package, Deploy".

Worked example: scan-doc (Pattern E — AI analysis + file storage)

scan-doc Walkthrough — AI Document Analysis + File Storage

scan-doc (掃描文書分割, "scanned document splitting") is the second reference implementation for ipos-sdk-converter. It converts a Streamlit-based Python tool (scan-doc-organizer) that uses Gemini Vision to identify legal document boundaries in combined scanned PDFs, then splits them and saves each document as a separate file to 檔案總管 (the org's file manager).

This walkthrough demonstrates Pattern E: AI analysis with structured output + platform.appFileUpload() — the pattern to use whenever an app needs to call an AI Vision API and store output files in the org's file manager.


Original app summary

Stack: Python 3, Streamlit, google-genai Python SDK, pdf2image, pypdf, Pillow.

Features:

  1. Upload a combined PDF (scanned legal documents, multiple per file)
  2. Send PDF to Gemini Vision for analysis — identify each document's page range, type, case number, court, suggested filename
  3. Show editable table of identified documents (user can adjust page ranges / filenames)
  4. Split the PDF into per-document files using explicit page lists
  5. Download the split files as a ZIP

Customer pain: Tool ran only on the developer's laptop. Required Python + poppler + API key setup. Output ZIPs were ephemeral — no org-level storage, no audit trail.


Part 1 — SDK Spec Sheet

| Feature | Needs server? | Bucket | Resource |
|---|---|---|---|
| Upload PDF (max 10 MB) | No | zip | <input type="file"> |
| Validate PDF format | No | zip | Magic bytes check %PDF- in JS |
| Send PDF to Gemini Vision for analysis | Yes (platform API key) | platform route | /api/platform/v1/scan-doc/analyze |
| Show editable document table | No | zip | React |
| Validate page ranges (start ≤ end) | No | zip | JS pre-flight check |
| Split PDF into per-document files | Yes (compute) | platform route | /api/platform/v1/scan-doc/split |
| Save split PDFs to 檔案總管 | IPOS data | platform-js SDK | platform.appFileUpload() |

New routes built: /api/platform/v1/scan-doc/analyze and /api/platform/v1/scan-doc/split.
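
The two client-side checks in the table (the %PDF- magic bytes and start ≤ end) are small enough to sketch directly. The names `hasPdfMagic`, `PageRange`, and `invalidRanges` are hypothetical, and the sketch assumes 1-based page numbers, which the source does not state.

```typescript
// "%PDF-" as bytes.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

function hasPdfMagic(bytes: Uint8Array): boolean {
  return bytes.length >= PDF_MAGIC.length && PDF_MAGIC.every((b, i) => bytes[i] === b);
}

// Hypothetical row shape for the editable document table (1-based pages assumed).
interface PageRange {
  start: number;
  end: number;
}

// Returns the indexes of rows that fail the pre-flight check.
function invalidRanges(ranges: PageRange[], pageCount: number): number[] {
  const bad: number[] = [];
  ranges.forEach((r, i) => {
    const ok =
      Number.isInteger(r.start) &&
      Number.isInteger(r.end) &&
      r.start >= 1 &&
      r.start <= r.end &&
      r.end <= pageCount;
    if (!ok) bad.push(i);
  });
  return bad;
}
```

In the browser, the first bytes of a picked file can be read with `new Uint8Array(await file.slice(0, 5).arrayBuffer())` and passed to `hasPdfMagic` before any upload happens.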

Key decisions vs g1001:

  • g1001: external API calls (USPTO) → needs CF Worker for IP block
  • scan-doc: AI model call (Gemini) → should ideally go through the AI Actions system (executeAction). The current implementation reads the GEMINI_API_KEY env var directly as a shortcut; this is tech debt because it bypasses the AI Actions is_enabled gate and usage logging. See the tech debt note in route-design-patterns.md, Pattern E.

Part 2 — Implementation outcomes

Repo structure

apps/scan-doc-app/
├── platform.json               # slug: scan-doc, permissions: { org_write: ["app_files"] }
├── package.json                # next 14.2.33, react 18, pdfjs-dist 4.10
├── next.config.js              # output: 'export', basePath, assetPrefix
├── pages/
│   ├── _app.tsx
│   └── index.tsx               # dynamic(() => import('../src/ScanDocApp'), { ssr: false })
└── src/
    ├── ScanDocApp.tsx          # Platform.connect(), state machine, agent actions
    ├── vendor/platform-js.ts   # SDK copy (same as g1001)
    └── steps/
        ├── Step1Upload.tsx     # file picker → validate → call analyze route
        ├── Step2Analyzing.tsx  # spinner while route runs
        └── Step3Results.tsx    # editable table → call split → appFileUpload loop

Platform routes built

  • app/api/platform/v1/scan-doc/analyze/route.ts — validates PDF (magic bytes + pdf-lib page count), sends as inlineData to Gemini 2.5 Flash with responseSchema using SchemaType enum. Returns { data: { documents, page_count } }.
  • app/api/platform/v1/scan-doc/split/route.ts — pure pdf-lib splitting; no external calls. Accepts explicit pages: number[] per document for flexible page reordering. Returns { data: { files: [{ filename, pdf_b64 }] } }.

What the AI actually does

Gemini 2.5 Flash Vision analyzes the PDF inline (no rendering needed — Gemini accepts inlineData: { mimeType: 'application/pdf', data: base64 }). The prompt instructs it to identify each independent legal document within the combined scan by looking for:

  • Receipt stamps (收文章) — each stamp indicates a new document
  • Different case numbers (案號) — different numbers = different documents
  • Court identifiers, document type, dates

It returns a structured JSON array (enforced via responseSchema) with page boundaries and suggested filenames. The user can correct any misidentifications in the editable table before splitting.

File storage flow

After splitting, the app loops platform.appFileUpload() per document:

for (const file of splitResult.files) {
  await platform.appFileUpload({
    name: `${file.filename}.pdf`,
    mime_type: 'application/pdf',
    content_b64: file.pdf_b64,
    metadata: { source: 'scan-doc', original_file: result.file_name },
  })
}

Files land in app_files (org-scoped) and appear immediately in 檔案總管. platform.json declares permissions: { org_write: ["app_files"] } to authorize this.


Bugs encountered (and gotchas they spawned)

| Bug | Symptom | Fix |
|---|---|---|
| SchemaType string literals in @google/generative-ai | TS2322: Type '"array"' not assignable to 'SchemaType' → Zeabur build fails | Use SchemaType.ARRAY enum. See gotchas #19 |
| Test mock missing SchemaType | All 6 analyze tests fail: SchemaType is undefined at module eval | Add SchemaType: { ARRAY: 'array', ... } to vi.mock('@google/generative-ai', () => ({ ... })). See gotchas #19 |
| author_org_id = null in ipos_apps | Deploy 403 / wrong permission scope | Set author_org_id to org UUID. See gotchas #20 |
| PowerShell zip backslash entries | "Malicious entry: 404\index.html" on upload | Use bash zip -r. See gotchas #1 |
| Direct fetch needs cookie auth | Platform route returns 401 from iframe | Add credentials: 'include' to every direct fetch() to platform routes. See gotchas #21 |
| Page range inversion (start > end) | Silent empty output document | Pre-flight JS validation before calling split route |
| doc.pages not validated server-side | Empty page array slips through, pdf-lib produces zero-page PDF | Validate Array.isArray(doc.pages) && doc.pages.length > 0 in split route |
| Buffer.from() doesn't throw on bad base64 | Invalid base64 treated as garbage bytes | Check %PDF- magic bytes after decode |
| pdf-lib load() fails on some real-world PDFs | pageCount is wrong | Wrap in try-catch, fall back to 1; page count is informational only |
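
The server-side `doc.pages` check from the table can be sketched as a validator that returns an error message or null. The `SplitDoc` shape is inferred from the route description (a filename plus an explicit pages list), and the function name, the messages, and the 1-based page assumption are illustrative.

```typescript
// Hypothetical request item shape for the split route.
interface SplitDoc {
  filename: string;
  pages: number[];
}

// Returns null when the payload is usable, otherwise a message suitable for a 400 response.
function validateSplitDocs(docs: unknown): string | null {
  if (!Array.isArray(docs) || docs.length === 0) {
    return 'documents must be a non-empty array';
  }
  for (let i = 0; i < docs.length; i++) {
    const doc = docs[i] as Partial<SplitDoc> | null;
    // The check named in the table: pages must be an array with at least one entry.
    if (!doc || !Array.isArray(doc.pages) || doc.pages.length === 0) {
      return `documents[${i}].pages must be a non-empty array`;
    }
    // Assumption: page numbers are 1-based positive integers.
    if (!doc.pages.every((p) => Number.isInteger(p) && p >= 1)) {
      return `documents[${i}].pages must contain positive integers`;
    }
  }
  return null;
}
```

Rejecting the request up front keeps the zero-page PDF from ever being produced, instead of detecting it after the split.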

Key porting decisions (Python → TypeScript)

| Original | Replacement | Why |
|---|---|---|
| google-genai Python SDK + pdf2image for rendering | /api/platform/v1/scan-doc/analyze with inlineData | Gemini API accepts PDF inline, so no page rendering is needed. API key kept server-side. |
| pypdf.PdfReader / PdfWriter for splitting | /api/platform/v1/scan-doc/split with pdf-lib | pdf-lib is pure JS, works in Node.js without native deps. Supports page reordering via explicit page list. |
| Streamlit editable table + ZIP download | React editable table + platform.appFileUpload() loop | No ZIP download needed; files go directly into the org's 檔案總管. User gets permanent storage instead of a one-time download. |
| Local file saved on laptop | platform.appFileUpload() → app_files | All processed docs are now org-accessible and auditable. |

Production state

| Item | Value |
|---|---|
| App slug | scan-doc |
| Version | 1.0.0 |
| Deploy ID | da93f5db-1544-4bee-a1ba-0fcd4d7cb062 (status: live) |
| Runtime URL | /api/app-static/scan-doc/ |
| Test cases | 10/10 (6 analyze + 4 split) |
| AI model | Gemini 2.5 Flash |
| Env var required | GEMINI_API_KEY on Zeabur |

To redeploy: see part2-implementation.md → "Build, Package, Deploy". Bump version in platform.json before re-uploading.

Reference: Session Persistence (opt-in cross-device state)

Session Persistence — Opt-in Contract

Audience: developers shipping an IPOS SDK app who want their app to remember its in-flight state across devices.

Status: opt-in. Apps that do not opt in still get cross-device window geometry (position/size/z-order) — the OS handles that without your help.

When to opt in

Opt in when your app has user-meaningful in-app state that should survive closing the browser on machine A and re-opening on machine B:

  • Form drafts (subject + body of an unfinished email, a partially-filled wizard)
  • Current view / tab / filter selection
  • Pinned items or local sort order
  • Anything the user would consider "where I left off"

Do not opt in for:

  • Auth tokens, refresh tokens, session keys (regenerate from cookies)
  • Live WebSocket connections, in-flight uploads (re-establish on mount)
  • Large blobs (limit is 64 KB JSON per window — uploads or images go in storage, not session state)
  • Data already in the user's database (case lists, email cache — re-fetch)

How to opt in

1. Manifest

Add to platform.json (or v2 bundle manifest):

{ "session_persistence": true }

2. Implement two control-surface commands

The OS speaks two standard postMessage shapes for session persistence:

OS → app (on mount, if a prior snapshot exists):

{ "type": "ipos.session.restore",
  "windowId": "...",
  "payload": { "state": <your state>, "schemaVersion": 3 } }

App → OS (when persistable state changes):

{ "kind": "ipos:request",
  "type": "session.snapshot",
  "id": <request id>,
  "payload": { "state": <your state>, "schemaVersion": 3 } }

The OS debounces incoming snapshots: repeated messages within ~1 second coalesce into a single network write. The OS acknowledges each snapshot with a normal ipos:response.
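
The two message shapes can be given TypeScript types, with a guard for incoming messages and a builder for outgoing ones. These types are inferred from the JSON examples above and are not an official SDK typing; in particular, treating `id` as a number is an assumption.

```typescript
interface SessionPayload<S> {
  state: S;
  schemaVersion: number;
}

interface RestoreMessage<S> {
  type: 'ipos.session.restore';
  windowId: string;
  payload: SessionPayload<S>;
}

interface SnapshotRequest<S> {
  kind: 'ipos:request';
  type: 'session.snapshot';
  id: number; // assumption: any unique request id
  payload: SessionPayload<S>;
}

// Narrowing guard for use inside a window "message" listener.
function isRestoreMessage(data: unknown): data is RestoreMessage<unknown> {
  if (typeof data !== 'object' || data === null) return false;
  const msg = data as Record<string, unknown>;
  const payload = msg.payload as Record<string, unknown> | null | undefined;
  return (
    msg.type === 'ipos.session.restore' &&
    typeof payload === 'object' &&
    payload !== null &&
    typeof payload.schemaVersion === 'number'
  );
}

function buildSnapshot<S>(id: number, state: S, schemaVersion: number): SnapshotRequest<S> {
  return { kind: 'ipos:request', type: 'session.snapshot', id, payload: { state, schemaVersion } };
}
```

A listener would call `isRestoreMessage(event.data)` first and ignore everything else, since other postMessage traffic shares the same channel.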

3. Trigger a snapshot when state changes

If your SDK helper exposes ipos.notifySnapshot(), call it on every relevant state change. Otherwise post manually:

window.parent.postMessage(
  { kind: 'ipos:request', type: 'session.snapshot', id: Date.now(),
    payload: { state: getCurrentDraft(), schemaVersion: 3 } },
  '*',
);

Size, frequency, and schema rules

| Rule | Limit |
|---|---|
| state JSON size (after stringify) | 64 KB |
| Snapshot frequency | OS debounces to 1 per second per window |
| schemaVersion | Plain integer. Increase when state shape changes. |

Oversized payloads return 413 from the server; your app receives no acknowledgement and the previous good snapshot remains the persisted state.
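
Because an oversized snapshot is dropped without an acknowledgement, it can help to measure the payload before posting it. A minimal sketch, assuming "64 KB" means 65,536 bytes of UTF-8 and that the limit applies to the stringified state; both are assumptions, and the helper names are hypothetical.

```typescript
const MAX_STATE_BYTES = 64 * 1024; // assumption: 64 KB = 65,536 bytes

// Counts UTF-8 bytes without relying on TextEncoder or Buffer.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      bytes += 4; // surrogate pair encodes one 4-byte code point
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

function fitsSnapshotLimit(state: unknown): boolean {
  const json = JSON.stringify(state);
  if (json === undefined) return false; // e.g. state was undefined or a function
  return utf8ByteLength(json) <= MAX_STATE_BYTES;
}
```

Counting bytes rather than `string.length` matters for CJK text, where each character takes three bytes in UTF-8.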

Schema migration

Snapshots can outlive code deploys. When you change the state shape, bump schemaVersion. Your restore handler decides:

  • If schemaVersion === current → apply.
  • If older → migrate or discard.
  • If newer (rolled back deploy) → discard.

The OS never inspects the state JSON, so migration is entirely your responsibility.
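
The three-way decision above (apply, migrate, discard) can be sketched as one function. The names `Snapshot`, `Migrator`, and `resolveRestoredState` are hypothetical; the migrator is supplied by the app and returns null when it cannot upgrade an old snapshot.

```typescript
type Snapshot = { state: unknown; schemaVersion: number };
// App-supplied upgrade function; returns null if the old shape cannot be migrated.
type Migrator<S> = (oldState: unknown, fromVersion: number) => S | null;

function resolveRestoredState<S>(
  snapshot: Snapshot,
  currentVersion: number,
  migrate: Migrator<S>,
): S | null {
  if (snapshot.schemaVersion === currentVersion) {
    return snapshot.state as S; // same shape: apply as-is
  }
  if (snapshot.schemaVersion < currentVersion) {
    try {
      return migrate(snapshot.state, snapshot.schemaVersion); // older: migrate or discard
    } catch {
      return null; // a failing migration must not break restore
    }
  }
  return null; // newer than this build (rolled-back deploy): discard
}
```

A null result means "start fresh", so the caller never has to distinguish between a missing snapshot and an unusable one.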

What about multiple devices open at once?

state is overwritten on every snapshot: last write wins. The expected pattern is that the user is actively typing on one device at a time. Concurrent edits in two browsers will lose the older write. This is acceptable because (a) the user-visible window has only one active editor at a time, and (b) the value of "survives a browser close" is high while the value of "merges concurrent edits in two browsers" is low.

Verification checklist

When adding session persistence to an app, verify:

  • [ ] Manifest declares session_persistence: true.
  • [ ] App emits session.snapshot messages with JSON ≤ 64 KB.
  • [ ] App handles ipos.session.restore for current and prior schema versions.
  • [ ] No auth tokens, refresh tokens, or live connection IDs appear in state.
  • [ ] Sensitive PII filtered out of state (the value is stored server-side under the user's row).
  • [ ] On a fresh device + same user, opening the app shows the last-saved state.
  • [ ] If the schema is bumped, old snapshots are migrated or discarded gracefully (no thrown errors in restore).

Authenticated calls to /api/platform/v1/* — shell-token contract (validator R002)

Rule: every fetch from a static IPOS app to /api/platform/v1/* MUST attach the shell token. Without it, the platform route returns 401 unauthenticated with the message "no session or bearer token".

How the shell delivers the token: when the OS shell opens your app's iframe, it appends ?shell_token=<jwt> to the URL. You read it from window.location.search, then add the x-ipos-shell-token header on every platform-route call.

Canonical pattern (mirrored from apps/os-map-app/src/useTopology.ts):

// lib/api.ts
function readShellToken(): string | null {
  if (typeof window === 'undefined') return null;
  return new URLSearchParams(window.location.search).get('shell_token');
}

async function callPlatform<T>(url: string): Promise<T> {
  const headers: Record<string, string> = { accept: 'application/json' };
  const token = readShellToken();
  if (token) headers['x-ipos-shell-token'] = token;
  const r = await fetch(url, { headers });
  return r.json();
}

For SSE endpoints (EventSource cannot set headers), use ?shell_token=… as a query param — the platform routes that support SSE also accept query-string fallback (e.g. /api/platform/v1/topology/events?shell_token=…).
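
For the SSE case, the token has to be appended to the URL rather than sent as a header. A minimal sketch of that step; `withShellToken` is a hypothetical helper, not part of the SDK.

```typescript
// Appends ?shell_token=... (or &shell_token=...) when a token is available.
function withShellToken(url: string, token: string | null): string {
  if (!token) return url; // no token: leave the URL untouched
  const separator = url.indexOf('?') === -1 ? '?' : '&';
  return `${url}${separator}shell_token=${encodeURIComponent(token)}`;
}
```

Combined with the helper from the canonical pattern, an SSE connection would be opened roughly as `new EventSource(withShellToken('/api/platform/v1/topology/events', readShellToken()))`.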

Validator enforcement (R002, added 2026-05-17):

@ipdesk/ipos-validator ships R002 in defaultRules. It scans lib/, src/, components/, pages/, and app/ for files that reference '/api/platform/v1/' but do NOT contain any of shell_token / x-ipos-shell-token / x-ipos-shell, and emits a warning diagnostic for each. The rule runs at the cli, pack, upload, and build layers. The CLI command ipos pack surfaces the warning before you ship; the server-side upload also runs it as defense in depth.

If your app routes its calls through a shared helper imported from another file, R002 may raise a false positive on the calling file. To avoid that, make sure the file that references '/api/platform/v1/' also imports or references a token marker so R002 sees it.
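
To see why the false positive happens, it helps to approximate the per-file check as described above. This is an illustrative approximation of the rule's stated behavior, not the validator's actual implementation; the function name is hypothetical.

```typescript
const PLATFORM_PREFIX = '/api/platform/v1/';
// Markers listed in the rule description.
const TOKEN_MARKERS = ['shell_token', 'x-ipos-shell-token', 'x-ipos-shell'];

// True when a file's source would be flagged: it mentions a platform route
// but contains none of the token markers.
function wouldTriggerR002(source: string): boolean {
  if (source.indexOf(PLATFORM_PREFIX) === -1) return false;
  return !TOKEN_MARKERS.some((marker) => source.indexOf(marker) !== -1);
}
```

Because the check is textual and per file, a file that only passes a route string to a helper is flagged even though the helper attaches the token.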

Past incident: ipdesk-legal v1.1.0 shipped without this, and every search showed "no session or bearer token" in the UI. v1.1.1 patched lib/api.ts to include readShellToken(). R002 now catches this class of bug pre-ship.


Available AI Agents (auto-generated)

This section is generated at request time from the live ai_actions table. Customer agents reading this catalog should reuse an existing slug when possible rather than declaring a new one.

| slug | name | description | input_type | output_shape | example_use_case |
|---|---|---|---|---|---|
| classify | 郵件自動分類 | Automatically classifies emails and applies labels | text | text | |
| composer_agent | Composer Agent | Agent App Composer - converts natural language to Workspace DAG | text | text | |
| extract | 資料移轉助手 | Parses uploaded files (Excel/Word/PDF) and automatically maps fields to the system schema | pdf | text | |
| legacy_ocr | 通用 PDF OCR (legacy) | Replaces the legacy v1 /api/platform/v1/ocr direct fetch to api.mistral.ai. Routes through the centralized ai_providers/ai_actions framework so operators can swap models without code changes. | image | text | |
| legal-search | 台灣法律查詢 | Uses IPDesk Legal's built-in tools to search Taiwan's national laws and regulations, Judicial Yuan judgments, Grand Justices interpretations, and Constitutional Court decisions, and answers with citations by article / JID / interpretation number. | text | text | |
| os_copilot | OS Copilot Agent | IPOS desktop AI assistant: operates installed apps and runs skills via tool-calling. | text | text | |
| scan_doc_classify | 掃描文書辨識 (Layer A) | Tier 1/2 - Gemini 2.5 Flash via OR for legal-doc classification | text | text | |
| scan_doc_classify_pro | 掃描文書辨識 Pro (Layer A) | Tier 3 - Gemini 2.5 Pro via OR (escalates when flash leaves uncovered pages) | text | text | |
| scan_doc_layer_b_ask | 掃描文書 Layer B 重判讀 | Sub-region high-DPI image ask | image | text | |
| scan_doc_ocr_fallback_1 | 掃描文書 OCR 後備一 | Tier 2 - Qwen2.5-VL 72B | image | text | |
| scan_doc_ocr_fallback_2 | 掃描文書 OCR 後備二 | Tier 3 - Gemini 2.5 Flash | image | text | |
| scan_doc_ocr_primary | 掃描文書 OCR 主力 | Tier 1 - Qianfan-OCR-Fast (free) | image | text | |

Available Platform Routes (auto-generated)

| id | label | serviceId |
|---|---|---|
| _log | _log | svc:_log |
| app-files | App Files | svc:app-files |
| capabilities | Capabilities | svc:capabilities |
| chains | Chains | svc:chains |
| classify | Classify | svc:classify |
| docx | Docx | svc:docx |
| email | Email | svc:email |
| fetch | Fetch | svc:fetch |
| g1002 | G1002 | svc:g1002 |
| google-patents-pdf | Google Patents Pdf | svc:google-patents-pdf |
| legal | Legal | svc:legal |
| me | Me | svc:me |
| notify | Notify | svc:notify |
| ocr | Ocr | svc:ocr |
| org | Org | svc:org |
| pdf | Pdf | svc:pdf |
| public | Public | svc:public |
| queue | Queue | svc:queue |
| scan-doc | Scan Doc | svc:scan-doc |
| search | Search | svc:search |
| storage | Storage | svc:storage |
| sync | Sync | svc:sync |
| tables | Tables | svc:tables |
| topology | Topology | svc:topology |
| uspto-proxy | Uspto Proxy | svc:uspto-proxy |
| views | Views | svc:views |
| webhook | Webhook | svc:webhook |
| xlsx | Xlsx | svc:xlsx |

Available Externals (auto-generated)

| id | label |
|---|---|
| ext:ai-agent-service | AI agent service (admin-managed) |
| ext:gemini-direct | Google Gemini API (direct, no admin proxy) |
| ext:anthropic-direct | Anthropic Claude API (direct, no admin proxy) |
| ext:openai-direct | OpenAI API (direct, no admin proxy) |
| ext:gcp-documentai | Google Cloud Document AI |
| ext:uspto | USPTO (United States Patent and Trademark Office) |
| ext:mistral | Mistral AI (direct, no admin proxy) |
| ext:google-patents | Google Patents |
| ext:aws-s3 | AWS S3 |
| ext:stripe | Stripe |
